DB2 Gold Consultant Perspectives on Large Table Design in DB2 10 for z/OS

Posted by Frank Fillmore on December 30, 2012 under DB2 for z/OS, DB2 Gold Consultants. Tags: .

I have just begun a new engagement working with a large, integrated financial services firm on a project migrating IMS and VSAM data to DB2 10 for z/OS.  There had been a lot of discussion regarding whether to break a multi-billion row set of historical data into separate physical tables using schemas (e.g. SCHEMA1.MY_TAB, SCHEMA2.MY_TAB,… ) with each table containing a portion of the data *or* use Range Partitioning.  I reached out to my DB2 Gold Consultant colleagues for their opinions and received four detailed replies.

Martin Hubel, DB2 Gold Consultant:
Use range partitioning.  Almost all of the objections and disadvantages to range partitioning have been removed starting in DB2 V4, and continue to be removed in later versions of DB2 up to including V10.

The use of separate tables is cumbersome, complex, and much harder to maintain.  It is typically people with long memories and short experience who recommend this.

Joel Goldstein, DB2 Gold Consultant:
My first question is – have they re-designed for DB2 or trying to port the data to DB2?  If their IMS systems were decently designed, simply running the application under DB2 will cost at least 2x the CPU based on my experiences, and maybe more.

From a performance perspective, (excluding how the applications must actually access and process the data), range partitioning should give the best performance – but the info provided doesn’t give us any idea (i.e. under IMS is most of their access using HIDAM or HDAM?).

Physically separating the tables into subsets by schema and using an application algorithm to determine which tables to access might be a good option, if this can point to ONE table to access.  If multiple tables, performance dies.

An updateable view with many UNIONs for DML that span subsets—  performance disaster for an OLTP application.

Jürgen Glag, DB2 Gold Consultant:
First design rule: keep the design as simple=elegant as possible.  I would never start with the alternative with separate physical tables

Which benefits does the client expect from the solution with separate tables over the partitioned table?  Why does he think that his homegrown solution which is not tested for robustness will be better than a product (e.g. DB2 for z/OS range partitioning)?

William Miller, DB2 Gold Consultant
In my experience, separating tables into subschemas is NOT the best solution.  I saw this used at the Commerzbank.  While it was a good solution for the problem they had back in the early days when DB2 had limits on the size of the table, it is not a good solution now.  There is a lot of unnecessary overhead involved.  When I query this table with this date, what underlying table do I need to access?  And what happens when I join two tables, where do I find the correct data?  These are all questions that have to be answered by fallible human coding.

Today’s technology allow for truly massive numbers of rows in a single table.  Most often we can divide data into ‘active’ and ‘historical’, the active bit containing open contracts, the historical contains closed contracts.  Due to government regulation you can’t throw away data that is no longer active.  But large amounts of historical data make queries slower (more index levels, etc). 

A truly better solution is the v10 functionality of temporal tables.  I used this technique at the Deutsche Bundesbank.  Basically, you have two identical tables that can be accessed by SQL.  Data that is no longer current automagically wanders from the active table to the historical table.  SQL can access all data, or a range of values.  There is no need for application logic to move active data to a history table.  The active data and the historical data can be indexed differently.  So you have a lot of flexibility here.

The problem with this solution is that it is not very adaptable to current applications.  Every company has its own way of historizing data.  And nobody is interested in throwing away a chunk of application logic and retesting everything anew.  It is impossible to get a company to rewrite a corner of the application that just works.  But from what I read, you are in a way starting from scratch.  You can design temporal data into your solution.

2 Comments so far

  1. Joe Geller December 31, 2012 9:00 am

    Frank, the consensus is clearly to go with range partitioning, but there are some other fundamental questions that need to be asked:
    1) Will this be a straight port of the db design (each IMS segment becomes a DB2 table whose key is the concatenated keys of the hierarchy), or will there be an opportunity for redesign? One thing you don’t want to do is create system generated keys for each dependent table – the loss of clustering could lead to disastrous performance in comparison to the IMS system.
    2) They should certainly consider rewritting some of the code to take advantage of DB2’s set processing.
    3) If they are using HDAM, then HASH access (new with V10) could be a consideration for the table that was the root segment.

  2. Brian Lynch January 2, 2013 4:52 pm

    I am contemplating the re-design of a 200 partition table (currently writing to partition 176) with 700 million rows, 38 columns, and 7 NPIs. The objective is to archive data but keep it available to the operational data store. Thanks for sharing this discussion. I appreciate it.
    Brian Lynch

Leave a Comment