Tag Archives: BigData

Moving data from oracle to HDFS

I divided the scope into two stages:

Stage -1 | Initial data load from Oracle database to HDFS (for bulk load)

  • Here I used oracle native utility (Copy2hadoop) for reducing the friction

Stage -2 | Synchronize data from Oracle to Hadoop (delta changes)

  • Here I used SharePlex, comparatively effective third party tool (Quest/Dell) for data replication

Stage -1 | Initial data load from Oracle database to HDFS

Generally, we have two tools for data movement from the Oracle Database to HDFS

Sqoop and Copy2Hadoop :

Copy2Hadoop has advantages over sqoop, In my case we have complex query(Functions, PL/SQL, Joins) or even views (like a sub-case for complex queries). So, it is recommended to use Copy2Hadoop for the initial load. Since it is oracle native utility it will save our time and effort.

Copy2Hadoop creates datapump files that could have any Oracle datatype which is supported by Oracle External table and store data in Oracle’s datatype. When we convert datatype from Oracle format to the Java format (like sqoop does) there is always risk that some information would be converted incorrectly. Copy2Hadoop insures us from that.

Below are the high-level steps to copy the data:

Note:
Create Oracle Data Pump files from Oracle Database table data that we want to move. 
We can do this by exporting the data from the Oracle Database table using the oracle_datapump
access driver (Using Oracle Data Pump files generated by other database utilities will
not be accessible by Hive tables.  We need to use the oracle_datapump access driver to export the data.)

• Create Oracle Data Pump files and the ORACLE_DATAPUMP access driver.

• Copy the Oracle Data Pump files from source system to HDFS.

• Create Hive tables to read the Oracle Data Pump files.

What Is Copy to Hadoop?
Oracle Big Data SQL includes the Oracle Copy to Hadoop utility. This utility makes it simple to identify and copy Oracle data to the Hadoop Distributed File System. It can be accessed either through a command-line interface or via Oracle SQL Developer. Data exported to the Hadoop cluster by Copy to Hadoop is stored in Oracle Data Pump format. This format optimizes queries thru Big Data SQL:

• The data is stored as Oracle data types – eliminating data type conversions

• The data is queried directly – without requiring the overhead associated with Java SerDes

After generating Data Pump format files from the tables and copying the files to HDFS, you can use Apache Hive to query the data. Hive can process the data locally without accessing Oracle Database. When the Oracle table changes, you can refresh the copy in Hadoop. Copy to Hadoop is primarily useful for Oracle tables that are relatively static, and thus do not require frequent refreshes.

Copy to Hadoop is licensed under Oracle Big Data SQL. You must have an Oracle Big Data SQL license in order to use utility

Stage -2 | Synchronize data from Oracle to Hadoop

Using SharePlex to replicate the data in near real-time.

What is SharePlex ?

SharePlex is the complete enterprise solution for Oracle database replication, including high availability, scalability, data integration and reporting use cases.

Why SharePlex ?

• Redo Log Based Database Replication
• Read the redo log for changes
• Ship the changes from source to target in near-real time
• Requires source and target have identical data to start
• Backup/copy consistent to an SCN
• Maintain synchronization between databases

SharePlex architecture

Oracle Multi-tenant Application Containers – Create Application Container:

 Application containers:
An application container consists of an application root and one or more
application PDBs. The container stores data for a specific application, which
contains common data and metadata. You can upgrade or patch the
application once in the application root, and then synchronize the application
PDBs with the root.

Annotation 2019-02-03 141100

Reference book:
Oracle Database 12c Release 2 Multitenant by Franck Pachot, Vit Špinka, Anton Els.

Create Application root:

Login to the CDB root as SYSDBA and create an application root named shan_app.

[oracle@multisrv1 ~]$ ps -ef|grep pmon
grid 4986 1 0 22:10 ? 00:00:00 asm_pmon_+ASM
oracle 5086 1 0 22:10 ? 00:00:00 ora_pmon_CDB1
oracle 6241 6216 0 22:39 pts/0 00:00:00 grep pmon
[oracle@multisrv1 ~]$ sqlplus sys/oracle@cdb1 as sysdba

SQL*Plus: Release 12.2.0.1.0 Production on Sun Feb 3 22:39:37 2019

Copyright (c) 1982, 2016, Oracle. All rights reserved.


Connected to:
Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production

SQL>

CREATE PLUGGABLE DATABASE shan_app AS APPLICATION CONTAINER
ADMIN USER shan_appadm IDENTIFIED BY oracle;
ALTER PLUGGABLE DATABASE shan_app OPEN;
col name format a10
SELECT CON_ID, NAME, OPEN_MODE
FROM V$PDBS WHERE APPLICATION_ROOT='YES';
CON_ID NAME OPEN_MODE
---------- ---------- ----------
5 SHAN_APP READ ONLY

— save the state of shan_app (otherwise, it will be closed when you reboot):

ALTER PLUGGABLE DATABASE shan_app SAVE STATE;

View the tablespaces and the datafiles created for the application container shan_app.
Obtain the CON_ID value from the query in the previous step.
Observe that the tablespaces SYSTEM, SYSAUX, and UNDOTBS1 are created.

SQL> SELECT FILE_NAME, TABLESPACE_NAME FROM CDB_DATA_FILES WHERE CON_ID=5;

FILE_NAME
--------------------------------------------------------------------------------
TABLESPACE_NAME
------------------------------
+DATA/CDB1/80F305B39D922445E0535800A8C090A6/DATAFILE/system.275.999230255
SYSTEM

+DATA/CDB1/80F305B39D922445E0535800A8C090A6/DATAFILE/sysaux.276.999230255
SYSAUX

+DATA/CDB1/80F305B39D922445E0535800A8C090A6/DATAFILE/undotbs1.274.999230255
UNDOTBS1


Switch the current container to the application root and check which privilege is granted to SHAN_APPADM
Observe that the application root administrator is only granted the role PDB_DBA. This role has three privileges granted to it.
ALTER SESSION SET CONTAINER=SHAN_APP;
-- check the roles granted to the user:
col grantee format a10
col granted_role format a15
SELECT GRANTEE, GRANTED_ROLE, COMMON
FROM DBA_ROLE_PRIVS where GRANTEE ='SHAN_APPADM';
-- check the privileges granted to the role:
col role format a10
col privilege format a30
SELECT ROLE, PRIVILEGE, ADMIN_OPTION, COMMON, INHERITED
FROM ROLE_SYS_PRIVS WHERE ROLE='PDB_DBA';

Configure the tnsnames.ora file to allow connecting to the application root through the listener.

vi tnsnames.ora
SHAN_APP =
(DESCRIPTION =
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST = srv1)(PORT = 1521))
)
(CONNECT_DATA =
(SERVICE_NAME = shan_app.localdomain)
)
)

Test the configuration

conn sys/oracle@shan_app as sysdba

Install an application in the application root

Login to the application root as SYSDBA.

conn sys/oracle@shan_app as sysdba

Begin installing the HR_APP application.
Note: When you install a new application in an application root, you always start with this statement.

ALTER PLUGGABLE DATABASE APPLICATION hr_app BEGIN INSTALL '1.0';

Create the application tablespace (HR_TBS) and create the application owner user (HR).
The application owner (HR) should have the privileges enough to control the application objects, like the application tables, indexes, sequences, PL/SQL program units… etc.
The application owner is common user in the application root.

CREATE TABLESPACE hr_tbs;
CREATE USER HR IDENTIFIED BY oracle DEFAULT TABLESPACE HR_TBS QUOTA UNLIMITED ON HR_TBS CONTAINER = ALL;
GRANT CREATE SESSION, CREATE TABLE, CREATE SEQUENCE, CREATE VIEW, CREATE PROCEDURE, CREATE TRIGGER TO HR;

Switch the current schema to the application user.

ALTER SESSION SET CURRENT_SCHEMA=hr;

Run the script file.

@<run the sql for the appication objects creation>.sql

End the application installation, if all the commands in the script successfully run.

ALTER PLUGGABLE DATABASE APPLICATION hr_app END INSTALL '1.0';

Verify that the application has been successfully created.

column app_name format a15
column app_version format a10
column app_status format a15
SELECT APP_NAME, APP_VERSION, APP_STATUS FROM DBA_APPLICATIONS WHERE APP_IMPLICIT='N';