Change Data Delivery Combines InfoSphere Change Data Capture (ICDC) and DataStage

Posted by Frank Fillmore on August 9, 2010 under DB2 for i, InfoSphere.

InfoSphere Change Data Delivery combines legacy products from two different IBM acquisitions: DataMirror Transformation Server (now InfoSphere Change Data Capture) and Ascential DataStage (now InfoSphere DataStage).  A recent customer engagement highlights the benefits – and a few challenges – of this hybrid replication/ETL (extract, transform, load) combination.

The B2B eCommerce business problem: a large, privately-held regional retailer wanted to gather data from each of its retail outlets into a single web presence.  This “bricks and clicks” approach would enable just-in-time delivery of its products from either a warehouse or the nearest store to the businesses that install the products for end-customers.  Each store had a small System i to handle local transactions, but there are hundreds of stores.  Standard replication from each of the stores to a single website repository would be unwieldy at best.  Given the astronomical licensing charges for replication software at each of the stores as well, direct replication was a technical and financial non-starter.

The IBM Change Data Delivery (CDD) offering makes local replication “free”.  InfoSphere Change Data Capture (ICDC) software can be installed on each of the (in this case) System i store transaction servers at no cost.  Data is replicated from journaled application tables/physical files to Consistent Change Data (CCD) tables on the same server.  Then DataStage reaches up periodically from a mid-tier server to gather the data from the CCD tables and populate the web application database – in this case MySQL.  Once the data is loaded into MySQL from a CCD table, the CCD table is pruned.

We employ ICDC Live Audit replication so that an application table INSERT, followed immediately by an UPDATE to the same row, followed by a DELETE of that row would record three distinct rows in the CCD tables.  Each would be identified by a timestamp and an “entity type” (e.g. INSERT, UPDATE) of the activity.  DataStage reads the Live Audit records and makes the appropriate change to the target MySQL database tables.
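To make the Live Audit behavior concrete, here is a minimal Python sketch of the INSERT, UPDATE, DELETE sequence described above.  It is not ICDC's actual record layout: the column names (audit_ts, op, key, qty) and the dictionary-based target are assumptions made purely for illustration.

```python
from datetime import datetime, timedelta

# Hypothetical Live Audit rows for one application row (key 42).
# Column names are illustrative, not the actual ICDC audit columns.
t0 = datetime(2010, 8, 9, 12, 0, 0)
audit_rows = [
    {"audit_ts": t0, "op": "INSERT", "key": 42, "qty": 5},
    {"audit_ts": t0 + timedelta(milliseconds=10), "op": "UPDATE", "key": 42, "qty": 7},
    {"audit_ts": t0 + timedelta(milliseconds=20), "op": "DELETE", "key": 42, "qty": 7},
]

def replay(rows, target):
    """Apply audit rows to a dict-based target in timestamp order."""
    for row in sorted(rows, key=lambda r: r["audit_ts"]):
        if row["op"] in ("INSERT", "UPDATE"):
            target[row["key"]] = row["qty"]
        elif row["op"] == "DELETE":
            target.pop(row["key"], None)
    return target

# All three changes are kept as separate audit rows; replaying them in
# order leaves the target without the row, just like the source table.
target = replay(audit_rows, {})
```

The point of the sketch is that nothing is collapsed: the consumer sees every intermediate change and has to apply them in order to end up in the same state as the source.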

There are 17 application tables of interest: 10 on the warehouse System i and 7 on each of the store System i servers.  So there are 17 DataStage jobs.  Each DataStage job has four Stages or nodes:

  • the first ODBC stage that uses an SQL SELECT to read from a particular System i CCD table
  • a Transform stage to place each of the Live Audit row types (i.e. INSERT, UPDATE, DELETE) on one of three links to a second ODBC stage
  • the second ODBC stage with the appropriate MySQL Data Manipulation Language (DML) statement (i.e. INSERT, UPDATE, DELETE) for each link
  • a third and final ODBC stage branching from the Transform stage to prune the CCD table
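The four stages above can be approximated in a few lines of Python.  This is only a sketch of the job's logic, not the DataStage job itself: in-memory SQLite databases stand in for both the System i CCD table and the MySQL target, and the table names (item_ccd, item) and columns are hypothetical.  It also applies changes row by row in timestamp order rather than over three separate links, and prunes only the rows it actually read; both are assumptions of this sketch.

```python
import sqlite3

# In-memory SQLite stands in for BOTH the System i CCD table and the MySQL
# target; table and column names below are hypothetical.
src = sqlite3.connect(":memory:")
tgt = sqlite3.connect(":memory:")
src.execute(
    "CREATE TABLE item_ccd (audit_ts INTEGER, op TEXT, item_id INTEGER, qty INTEGER)"
)
tgt.execute("CREATE TABLE item (item_id INTEGER PRIMARY KEY, qty INTEGER)")
src.executemany(
    "INSERT INTO item_ccd VALUES (?, ?, ?, ?)",
    [(1, "INSERT", 10, 5), (2, "INSERT", 11, 3), (3, "UPDATE", 10, 9), (4, "DELETE", 11, 3)],
)

def run_job(src, tgt):
    """One pass of the job: read, route, apply, prune. Returns rows processed."""
    # Stage 1: read the CCD table in change order.
    rows = src.execute(
        "SELECT audit_ts, op, item_id, qty FROM item_ccd ORDER BY audit_ts"
    ).fetchall()
    if not rows:
        return 0
    # Stages 2 and 3: route each row by operation and apply the matching DML.
    for _ts, op, item_id, qty in rows:
        if op == "INSERT":
            tgt.execute("INSERT INTO item (item_id, qty) VALUES (?, ?)", (item_id, qty))
        elif op == "UPDATE":
            tgt.execute("UPDATE item SET qty = ? WHERE item_id = ?", (qty, item_id))
        elif op == "DELETE":
            tgt.execute("DELETE FROM item WHERE item_id = ?", (item_id,))
    tgt.commit()
    # Stage 4: prune only what was read, so rows arriving mid-run survive.
    src.execute("DELETE FROM item_ccd WHERE audit_ts <= ?", (rows[-1][0],))
    src.commit()
    return len(rows)

processed = run_job(src, tgt)
```

Pruning up to the last timestamp read, rather than emptying the table, is a defensive choice in this sketch: any change captured while the job is running stays in the CCD table for the next pass.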

All of the DataStage jobs have exactly the same structure.  They have to be exceedingly efficient because we execute them so frequently (roughly one per second).

DataStage Job

The entity types in the ICDC Live Audit CCD tables are as follows:

  • RR – an INSERT record caused by the refresh of a table in a subscription
  • PT – a regular application INSERT
  • PX – an extra INSERT (found only on the System i)
  • UB – before image of an UPDATE
  • UP – after image of an UPDATE
  • DL – a DELETE

There are also two Live Audit control records, RS (which signals a full refresh) and CR (which is equivalent to a TRUNCATE), both of which I was able to ignore.
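One way to picture how these codes drive the Transform stage is a simple lookup from code to target action, sketched here in Python.  The names ACTION_BY_CODE and route are hypothetical, and skipping the UB before image is an assumption of this sketch (the UP after image is enough when the target row is updated by key); the actual job may treat it differently.

```python
# Live Audit codes from the list above mapped to a target action.
# Skipping UB (before image) is an assumption of this sketch: the UP after
# image alone is enough when the target is updated by key.
ACTION_BY_CODE = {
    "RR": "INSERT",  # refresh insert
    "PT": "INSERT",  # regular application insert
    "PX": "INSERT",  # extra insert (System i only)
    "UB": None,      # before image of an update, skipped here
    "UP": "UPDATE",  # after image of an update
    "DL": "DELETE",
    "RS": None,      # control record, ignored
    "CR": None,      # control record, ignored
}

def route(code):
    """Return the DML verb for a Live Audit code, or None to skip the row."""
    try:
        return ACTION_BY_CODE[code]
    except KeyError:
        raise ValueError(f"unexpected Live Audit code: {code!r}")

codes = ["RR", "PT", "UB", "UP", "DL", "CR"]
actions = [route(c) for c in codes if route(c) is not None]
```

Raising on an unknown code, rather than silently dropping the row, makes it obvious if a code that is not in the list ever shows up in a CCD table.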

The “secret sauce” is the SQL SELECT statement used to gather the data from each CCD table on the System i servers.