Feeding a Netezza Warehouse

Posted by Frank Fillmore on March 9, 2011 under InfoSphere, Netezza, Q-Replication.

By now you’ve probably heard that IBM has acquired Netezza.  So how do I get my data into the darn thing?

We’ve worked a lot with IBM change data capture technologies: SQL Replication, Q Replication, and InfoSphere Change Data Capture (ICDC).  We also work with InfoSphere Warehouse, IBM’s internally developed data warehousing platform.  Usually we use one of the former to feed the latter.  As changes occur in a transactional database, the deltas are shipped in near-real time to the target data warehouse.  There might be some transformation, cleansing, or other manipulation (that’s the “T” in “ETL”) for which we would use DataStage, but a surprisingly large number of customers just copy data to the reporting and analysis platform with little massaging.

Enter Netezza.  The usual “drip-feed” methods listed above (i.e. as soon as a row changes in the source, send it to and update the target) don’t mesh as well here.  Netezza is optimized for bulk data loading, so one-row-at-a-time INSERT/UPDATE/DELETE propagation by a replication technology isn’t efficient.  The solution is to create intermediate mini-bulk delimited files that can be ingested, say, every five minutes.
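To make the mini-bulk idea concrete, here's a minimal sketch of the batching side: captured deltas accumulate in memory and get flushed to a new pipe-delimited file on a timer.  The `MiniBatchWriter` class, the operation codes (`I`/`U`/`D`), and the output directory are all hypothetical illustrations; in a real deployment the rows would arrive from the replication feed, not a Python list.

```python
import csv
import time
from pathlib import Path

class MiniBatchWriter:
    """Hypothetical sketch: accumulate change-capture rows and flush
    them to a fresh delimited file every `interval` seconds, ready
    for a bulk load into Netezza."""

    def __init__(self, out_dir, interval=300):
        self.out_dir = Path(out_dir)
        self.out_dir.mkdir(parents=True, exist_ok=True)
        self.interval = interval
        self.rows = []
        self.last_flush = time.time()

    def add(self, row):
        # Buffer the delta; flush once the batch window has elapsed.
        self.rows.append(row)
        if time.time() - self.last_flush >= self.interval:
            self.flush()

    def flush(self):
        # Write buffered rows as one pipe-delimited file and reset.
        if not self.rows:
            return None
        path = self.out_dir / f"deltas_{int(time.time())}.csv"
        with path.open("w", newline="") as f:
            csv.writer(f, delimiter="|").writerows(self.rows)
        self.rows = []
        self.last_flush = time.time()
        return path

# Demo: three captured changes (insert, update, delete), flushed at once.
w = MiniBatchWriter("nz_feed", interval=300)
for r in [("I", 101, "widget"), ("U", 102, "gadget"), ("D", 103, "")]:
    w.add(r)
print(w.flush())
```

The five-minute window is a tuning knob: shorter windows mean fresher data in the warehouse, longer windows mean fewer, larger (and therefore more load-efficient) files.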

There are a couple of ways to create these files.  One is to use InfoSphere Data Event Publisher (EP).  Think of EP as Q Replication without the Apply component that posts the deltas to a target database.  EP can publish XML or delimited files.  Voilà!  Our Netezza problem is solved.  If you need complex business rules applied to transform the data, use ICDC or Q Replication to feed DataStage and have it create the delimited files.
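Once EP (or DataStage) has produced a delimited file, the remaining step is the bulk load itself.  A sketch of building that invocation follows, using Netezza's nzload utility; the flag names shown are an assumption to verify against your Netezza release, and the database, table, and file names are made up for illustration.

```python
def build_nzload_command(path, db, table, delim="|"):
    """Assemble an nzload command line for one mini-batch file.
    Flag names are assumed from the nzload utility; check them
    against your Netezza release before running for real."""
    return ["nzload", "-db", db, "-t", table,
            "-df", str(path), "-delim", delim]

# Demo: build (but don't execute) the load command for one batch file.
cmd = build_nzload_command("nz_feed/deltas_1.csv", "SALESDW", "ORDERS")
print(" ".join(cmd))
```

In production you'd hand the command to a scheduler or wrap it in `subprocess.run`, archiving or deleting each file after a successful load so batches are ingested exactly once.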