Feeding a Netezza Warehouse

Posted by Frank Fillmore on March 9, 2011 under InfoSphere, Netezza, Q-Replication.

By now you’ve probably heard that IBM has acquired Netezza.  So how do I get my data into the darn thing?

We’ve worked a lot with IBM change data capture technologies: SQL Replication, Q Replication, and InfoSphere Change Data Capture (ICDC).  We also work with InfoSphere Warehouse, IBM’s internally developed data warehousing platform.  Usually we use one of the former to feed the latter.  As changes occur in a transactional database, the deltas are shipped near-real-time to the target data warehouse.  There might be some transformation, cleansing, or other manipulation (that’s the “T” in “ETL”) for which we would use DataStage, but a (surprisingly large) number of customers just copy data to the reporting and analysis platform with little massaging.

Enter Netezza.  The usual “drip-feed” methods listed above (i.e., as soon as a row changes in the source, ship the change and apply it to the target) don’t play as well.  Netezza is optimized for bulk data loading.  One-row-at-a-time INSERT/UPDATE/DELETE propagation by a replication technology isn’t efficient.  The solution is to create intermediate mini-bulk delimited files that can be ingested, say, every five minutes.

There are a couple of ways to create these files.  One is to use InfoSphere Data Event Publisher (EP).  Think of EP as Q Replication without the Apply component that posts the deltas to a target database.  EP can publish XML or delimited files.  Voilà!  Our Netezza problem is solved.  If you need complex business rules applied to transform the data, use ICDC or Q Replication to feed DataStage and have it create the delimited files.
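
To make the mini-bulk idea concrete, here is a rough Python sketch of the batching step only.  It is not how EP or DataStage work internally.  The record layout (ts, op, key, value), the pipe delimiter, and the function names are all assumptions made up for illustration.  The sketch groups captured changes into five-minute windows and renders each window as delimited text that a bulk loader could pick up.

```python
import csv
import io
from collections import defaultdict

WINDOW_SECONDS = 300  # five-minute mini-batches (assumed interval)


def window_start(epoch_seconds, window=WINDOW_SECONDS):
    """Return the start of the time window that contains the timestamp."""
    return epoch_seconds - (epoch_seconds % window)


def batch_changes(changes, window=WINDOW_SECONDS):
    """Group change records by window start.

    Each change is a dict with hypothetical keys:
    'ts' (epoch seconds), 'op' (I/U/D), 'key', and 'value'.
    """
    batches = defaultdict(list)
    for change in changes:
        batches[window_start(change["ts"], window)].append(change)
    return dict(batches)


def to_delimited(rows, delimiter="|"):
    """Render one batch as delimited text, one change per line."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=delimiter, lineterminator="\n")
    for row in rows:
        writer.writerow([row["ts"], row["op"], row["key"], row["value"]])
    return buf.getvalue()
```

In practice each rendered batch would be written to its own file and handed to the warehouse's bulk load facility on a schedule.  That last step is deliberately left out here because it depends on your environment.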

“Putting Data Where You Need It: The Options” @ IOD

Posted by Frank Fillmore on October 15, 2010 under DB2 Education, Information on Demand Conference, InfoSphere, Q-Replication.

For those planning to attend the IBM Information On Demand Conference in Las Vegas beginning October 24, my colleague Kim May will be presenting a survey of IBM replication technologies.  Session TOD-2708 will be delivered on Wednesday, October 27 at 3:15 p.m. (local time) in the Breakers L room.  Check the conference schedule to confirm.

“IBM has three distinct replication technologies: SQL Replication, InfoSphere Replication Server (Q Replication) and InfoSphere Change Data Capture (ICDC). What are the strengths and weaknesses of each tool? What data sources and targets are supported by each product? Learn about customer use cases for each technology, from data warehousing to eCommerce to high-availability. Understand how to choose the right tool for your business.”