How to Handle the “V” for Volume in Big Data on DB2 for z/OS

Posted by Frank Fillmore on March 14, 2013 under Big Data, DB2 for z/OS, IDAA, Netezza. Tags: , .

One of the challenges of Big Data is, well, its bigness.  There are two approaches you should consider – one fairly new, the other not yet delivered – for your data volume issues on DB2 for z/OS.

The first technique exploits the IBM DB2 Analytics Accelerator (IDAA).  IDAA is a Netezza (now PureData System for Analytics) appliance lashed together with DB2 for z/OS using a high-speed connection.  The same data typically resides in both DB2 and Netezza.  The DB2 optimizer determines if a particular dynamic query can be addressed at a lower cost by offloading that query to Netezza.

Until November 2012, all of the data needed to be present on both Netezza and DB2.  In IDAA version 3, that restriction has been removed.  The attached presentation describes offloading older data to the Netezza and eliminating it from DB2.  In a common use case the most current data (say, the last 30 days) remains in DB2, but older data (days 31 to the preceding 2 years) reside *only* in Netezza.  This is called the High Performance Storage Saver (HPSS).  IDAA HPSS

Based on the cost of mainframe DASD storage and other factors, it might be less expensive to store stale data in a “hot” back-end like the Netezza part of an IDAA than to continue to store that data natively in DB2 for z/OS.  From an API standpoint, you run your queries as if all of the data resides locally in DB2 for z/OS.  That way you can change which data reside in each of the IDAA components with out impacting applications or SQL changes.

The second technique exploits temporal data functionality delivered in DB2 10 for z/OS.  Temporal tables enable you to see values for attributes of a particular business object (account, part, employee, customer, etc.) at a specific point-in-time (PIT).  As SQL UPDATEs are processed against a transaction table, deltas are recorded in a history table associated with the transaction table.  That way you can issue a query like:

SELECT coverage_amt FROM policy FOR SYSTEM_TIME AS OF '12-01-2010' WHERE id = 111

and see the coverage_amt at a point-in-time regardless of any intervening changes.  DB2 11 for z/OS – not yet Generally Available (GA), but in beta with Early Support Program (ESP) customers – builds on this technology by extending it to “archive” data.  The use case is the same as the one described above: for processing efficiency you want the last 30 days of transaction data to reside in one DB2 table and older data (days 31 to the preceding 2 years) to be stored in another.  Think insurance claims: most activity regarding a claim occurs within the first 30 days of a loss.  But for business and compliance reasons, you want to retain up to 2 years of data on spinning disks; even older data might be kept on offline storage.  Using this technique, all of the data resides natively in two different DB2 for z/OS tables (current and archive).  The benefit this time is segregating commonly accessed data for processing efficiency.  As was the case with IDAA/HPSS, the location of the data is transparent to the SQL.  You write the query for the data you want and DB2 determines whether it resides in the current data table, archive table, or both.

In summary

 

Available

Data segregated by

Data movement

Administrative overhead

Static SQL

Value prop

IDAA/HPSS

Today

DB2 and IDAA

Bulk load or replication

Minimal

No

Cost savings

DB2 11 for z/OS

GA

Current and Archive tables

Via DELETE or REORG DISCARD

None

Yes

Processing efficiency