In this PoC the source database is MySQL and the target is GCP BigQuery. We will use Oracle GoldenGate for MySQL on the source, and on the target server Oracle GoldenGate for Big Data with the BigQuery Handler.
Note:- In my case the source and target are on the same server.
Below are the prerequisites to be completed on the source server where the Oracle or MySQL DB is running.
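The detailed prerequisite list is not reproduced here, but one key source-side requirement for GoldenGate capture on MySQL is row-format binary logging. A minimal my.cnf excerpt along these lines (the path and server-id are illustrative, matching the binlog location used later in TRANLOGOPTIONS):

[mysqld]
# Binary logging must be enabled for GoldenGate to capture changes
log-bin=/var/lib/mysql/binlog
binlog_format=ROW
server-id=1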
Below are the prerequisites to be completed on the target server where we have to install Oracle GoldenGate for Big Data.
First we will do all the activity on the source server: create the initial-load extract process, the CDC extract process, and the CDC pump process.
Step 1:- Create the initial-load extract (file-to-replicat method).
GGSCI (pocgg) 9> edit param EINLOAD
GGSCI (pocgg) 10> view param EINLOAD

SOURCEISTABLE
SOURCEDB sourcegg@pocgg, USERID ggsuser_S, PASSWORD Ggsuser$1234
RMTHOST 10.128.0.6, MGRPORT 7820
RMTFILE /GGbigdata/OGGHOME/dirdat/ia, megabytes 200
TABLE sourcegg.categories;
TABLE sourcegg.customers;
TABLE sourcegg.departments;
TABLE sourcegg.order_items;
TABLE sourcegg.orders;
TABLE sourcegg.products;
Step 2:- Create and run the change data capture (CDC) extract process.
GGSCI (pocgg) 14> edit param EGCPBIG
GGSCI (pocgg) 15> view param EGCPBIG

EXTRACT EGCPBIG
SOURCEDB sourcegg@pocgg, USERID ggsuser_S, PASSWORD Ggsuser$1234
DISCARDFILE ./dirrpt/egcpbig.dsc, append, MEGABYTES 10
TRANLOGOPTIONS ALTLOGDEST "/var/lib/mysql/binlog.index"
EXTTRAIL /oggmysql/OGG19c/dirdat/ed
TABLE sourcegg.categories;
TABLE sourcegg.customers;
TABLE sourcegg.departments;
TABLE sourcegg.order_items;
TABLE sourcegg.orders;
TABLE sourcegg.products;

Before starting the extract process, check supplemental logging on the tables. For MySQL, ADD TRANDATA owner.tablename does not work. By default GoldenGate determines the row identifier based on:
1. Primary key.
2. First unique key, alphanumerically, that does not contain a timestamp or non-materialized computed column.
3. If neither a primary key nor a unique key exists, Oracle GoldenGate constructs a pseudo key of all columns that the database allows to be used in a unique key.

GGSCI (pocgg) 5> dblogin SOURCEDB sourcegg@pocgg, USERID ggsuser_S, PASSWORD Ggsuser$1234
Successfully logged into database.

In our case all tables have a primary key column. Now add and start the extract process.

GGSCI (pocgg DBLOGIN as ggsuser_S) 12> ADD EXTRACT EGCPBIG, TRANLOG, BEGIN NOW
EXTRACT added.

GGSCI (pocgg DBLOGIN as ggsuser_S) 13> ADD EXTTRAIL /oggmysql/OGG19c/dirdat/ed, EXTRACT EGCPBIG
EXTTRAIL added.

GGSCI (pocgg DBLOGIN as ggsuser_S) 14> info all

Program     Status      Group       Lag at Chkpt    Time Since Chkpt
MANAGER     RUNNING
EXTRACT     STOPPED     EGCPBIG     00:00:00        00:00:07
REPLICAT    RUNNING     RCDC1       00:00:00        00:00:02
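EGCPBIG still shows as STOPPED above. Per the "add and start" step, it can be started as follows (the prompt number is illustrative):

GGSCI (pocgg DBLOGIN as ggsuser_S) 15> START EXTRACT EGCPBIG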
Step 3:- Create and run the change data capture pump process.
GGSCI (pocgg) 8> edit param PUMPGCP
GGSCI (pocgg) 9> view param PUMPGCP

EXTRACT PUMPGCP
SOURCEDB sourcegg@pocgg, USERID ggsuser_S, PASSWORD Ggsuser$1234
DISCARDFILE ./dirrpt/pumpgcp.dsc, append, MEGABYTES 10
RMTHOST 10.128.0.6, MGRPORT 7820
RMTTRAIL /GGbigdata/OGGHOME/dirdat/rd
TABLE sourcegg.categories;
TABLE sourcegg.customers;
TABLE sourcegg.departments;
TABLE sourcegg.order_items;
TABLE sourcegg.orders;
TABLE sourcegg.products;

GGSCI (pocgg) 27> ADD EXTRACT PUMPGCP, EXTTRAILSOURCE /oggmysql/OGG19c/dirdat/ed
EXTRACT added.

GGSCI (pocgg) 28> ADD RMTTRAIL /GGbigdata/OGGHOME/dirdat/rd, EXTRACT PUMPGCP
RMTTRAIL added.
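The pump can then be started the same way (prompt number illustrative):

GGSCI (pocgg) 29> START EXTRACT PUMPGCP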
Step 4:- Now start the initial-load extract process.
[opc@pocgg OGG19c]$ extract paramfile /oggmysql/OGG19c/dirprm/einload.prm reportfile /oggmysql/OGG19c/dirrpt/einload.rpt

Check the file size of the trail:
[opc@pocgg OGG19c]$ ls -ltrh /GGbigdata/OGGHOME/dirdat/ia*
-rw-r-----. 1 opc mysql 30M Nov 20 18:18 /GGbigdata/OGGHOME/dirdat/ia000000

Also check the report file for the run status.
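For example, the end of the report file (path taken from the extract command above) shows the run statistics:

[opc@pocgg OGG19c]$ tail -50 /oggmysql/OGG19c/dirrpt/einload.rpt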
B. Now we create the initial-load and CDC replicat processes on the target server.
Step 5:- Create the initial-load replicat process for the BigQuery adapter.
We need to download the latest version of the Java client library for BigQuery; below is the link.
https://developers.google.com/api-client-library/java/apis/bigquery/v2
I will share the list of Java client library jars; keep all of those jars in a directory (/home/opc/bigquery/libs/). This path will be the value of the gg.classpath property.
Eg:- gg.classpath=/home/opc/bigquery/libs/*
In the bigquery.props file we have to set several property values taken from the GCP console.
Eg:- gg.handler.bigquery.projectId=oracle-******8106
For "gg.handler.bigquery.datasetId" property go to BigQuery Console. On left side you will see your project id detail, click on it, Then create dataset give it a name and location. So value for gg.handler.bigquery.datasetId=CDC_GG
for "gg.handler.bigquery.credentialsFile" property Go to IAM & Admin console, then go to Service Accounts and create a new service account and download the file. Transfer that file to your VM.
gg.handler.bigquery.credentialsFile=/home/opc/oracle-****8106-717fb0cdacad.json
The final property file will look something like this.
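A minimal sketch of dirprm/bigquery.props, assuming the standard BigQuery Handler properties and the values gathered above (the logging and javawriter lines are the usual OGG for Big Data boilerplate):

gg.handlerlist=bigquery
gg.handler.bigquery.type=bigquery

# Values gathered from the GCP console (redacted here)
gg.handler.bigquery.projectId=oracle-******8106
gg.handler.bigquery.datasetId=CDC_GG
gg.handler.bigquery.credentialsFile=/home/opc/oracle-****8106-717fb0cdacad.json

# Directory holding the BigQuery Java client jars
gg.classpath=/home/opc/bigquery/libs/*

gg.log=log4j
gg.log.level=INFO
goldengate.userexit.writers=javawriter
javawriter.bootoptions=-Xmx512m -Xms32m -Djava.class.path=ggjava/ggjava.jar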
Point the initial-load replicat parameter file at this property file (a minimal sketch follows) and then start it with the command below.
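A sketch of dirprm/rilogcp.prm, assuming a one-off (SPECIALRUN) replicat reading the initial-load file written in Step 1; adapt names and paths to your environment:

-- Initial-load replicat, run once from the command line
SPECIALRUN
END RUNTIME
TARGETDB LIBFILE libggjava.so SET property=dirprm/bigquery.props
EXTFILE /GGbigdata/OGGHOME/dirdat/ia
REPORTCOUNT EVERY 1 MINUTES, RATE
GROUPTRANSOPS 1000
MAP sourcegg.*, TARGET sourcegg.*;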
replicat paramfile /GGbigdata/OGGHOME/dirprm/rilogcp.prm reportfile /GGbigdata/OGGHOME/dirrpt/rilogcp.rpt
The initial-load replication was successful.
We are able to see the data in BigQuery.
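As a quick check (assuming the Google Cloud SDK is installed and the handler created a table named orders in the CDC_GG dataset), a row count can be run with the bq CLI:

bq query --use_legacy_sql=false 'SELECT COUNT(*) AS cnt FROM CDC_GG.orders'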
Step 6:- Create the CDC replicat process for the BigQuery adapter.
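The parameter file for this step is not shown in detail; a minimal sketch, assuming a replicat named RGCPBIG that reuses the same bigquery.props and reads the pump's remote trail (prompt numbers illustrative):

GGSCI (pocgg) 30> edit param rgcpbig

REPLICAT RGCPBIG
TARGETDB LIBFILE libggjava.so SET property=dirprm/bigquery.props
REPORTCOUNT EVERY 1 MINUTES, RATE
GROUPTRANSOPS 1000
MAP sourcegg.*, TARGET sourcegg.*;

GGSCI (pocgg) 31> ADD REPLICAT RGCPBIG, EXTTRAIL /GGbigdata/OGGHOME/dirdat/rd
GGSCI (pocgg) 32> START REPLICAT RGCPBIG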
Now do some transactions on the source database.
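For example (the column names are illustrative; any DML on the captured tables will do):

INSERT INTO sourcegg.categories (category_id, category_department_id, category_name)
VALUES (59, 8, 'Test Category');
UPDATE sourcegg.customers SET customer_city = 'Austin' WHERE customer_id = 1;
COMMIT;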
You can start your own load testing from here on.
Note:
1. Initial load of larger tables will take a long time using GoldenGate.
2. If Oracle GoldenGate for the RDBMS is installed in the client DC and GoldenGate for Big Data is installed on a GCP VM, make sure the network speed between these VMs is high. Otherwise you may see lag in the pump process.