IBM InfoSphere Data Replication (IIDR)

Posted by Frank Fillmore on April 10, 2013 under DB2 for i, DB2 for Linux Unix Windows, DB2 for z/OS, IBM Information Management Software Sales, Information on Demand Conference, InfoSphere, Q-Replication.

IBM has bundled its replication technology into a single package: IBM InfoSphere Data Replication (IIDR).  IIDR combines three components:

  • SQL Replication (heritage DataPropagator-Relational)
    • Easy to set up
    • Staging tables
  • InfoSphere Replication Server (IRS – heritage Q Replication)
    • High volume, low latency
    • Native Oracle and DB2 sources and targets
    • WebSphere MQ transport layer
  • InfoSphere Change Data Capture (ICDC – heritage DataMirror)
    • Broadest set of heterogeneous sources and targets
    • TCP/IP transport layer

There are a few important take-aways.

  1. IBM substituted an implementation decision for a buying decision.  For quite a while SQL Replication was bundled free with DB2 for LUW.  ICDC and IRS were separately purchasable technologies with a lot of functional overlap.  In 2010 my colleague, Kim May, delivered an IBM Information on Demand (IOD) presentation distinguishing between the three.  Now you purchase the IIDR bundle and determine which technology is best suited for a particular use case.  As with most IBM software on distributed platforms, the cost is based on Processor Value Units (PVUs).
  2. New feature/functionality will be built into IIDR rather than the former heritage ICDC and IRS packaging.
  3. There will be a convergence of the technologies over time.  Many of the prospective changes are still IBM Confidential, but IBM is looking to consolidate components where it makes sense to do so.  There is a long-term roadmap that I hope IBM will be sharing shortly.
  4. Upgrading and migration paths are a work-in-progress.  If you currently own ICDC or IRS and want to move up to IIDR, contact The Fillmore Group for pricing and implementation assistance.

Flipboard #BigData Magazine @ http://flip.it/JPdgy

Posted by Frank Fillmore on March 28, 2013 under Big Data.

Flipboard is an online content and social network aggregation app for Apple iPad, iPhone and Google Android devices.  It’s a smart, interactive Reader’s Digest for the Internet age.  Flipboard v2.0, just released, allows you to be DeWitt (or Lila) Wallace and create your own “magazine” of Tweets, Facebook postings, Instagrams, and content from providers like TechCrunch and the Wall Street Journal.  Think of it as a personal content “playlist” around a particular theme.

My BigData magazine can be found here.  Become a reader and peruse the content I have collected from around the web.  I’ll be adding new content every day or so.

Career Advice for a Friend #BigData

Posted by Frank Fillmore on March 25, 2013 under Big Data, IBM Information Management Software Sales.

My friend and (épée) fencing teammate, Alan, e-mailed me the following question: “I’m in the middle of a Master’s degree program (Information Systems).  I’m about half-way through, and I’m stuck on picking a concentration.  Specifically, I’m stuck between Data Warehousing and Business Intelligence (DW & BI) and Information Security (InfoSec).”

Here is my reply:

First, it’s difficult to predict future demand, but InfoSec is a growing, evolving field.  We work with IBM appliances called Guardium (which provides trusted user auditing) and Identity Insight (“who is who; who knows who”).

The data warehouse space is, I believe, being overtaken by “Big Data”.  The classic DW development cycle from data modeling and star schema design to Extract/Transform/Load (ETL) to dashboards and Key Performance Indicators (KPIs) is time-consuming and expensive.  It might be replaced in many instances by Hadoop (MapReduce).

There is, of course, room and requirements for both development models, but in 2013 I would want to build a career around InfoSec or BigData rather than the maturing DataWarehouse paradigm.

How to Handle the “V” for Volume in Big Data on DB2 for z/OS

Posted by Frank Fillmore on March 14, 2013 under Big Data, DB2 for z/OS, IDAA, Netezza.

One of the challenges of Big Data is, well, its bigness.  There are two approaches you should consider – one fairly new, the other not yet delivered – for your data volume issues on DB2 for z/OS.

The first technique exploits the IBM DB2 Analytics Accelerator (IDAA).  IDAA is a Netezza (now PureData System for Analytics) appliance lashed together with DB2 for z/OS using a high-speed connection.  The same data typically resides in both DB2 and Netezza.  The DB2 optimizer determines if a particular dynamic query can be addressed at a lower cost by offloading that query to Netezza.
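The routing decision is controlled by a DB2 special register.  Here is a minimal sketch (the register value shown is one of several your release may support; the table and query are illustrative, not from any particular workload):

-- Allow DB2 for z/OS to consider offloading eligible dynamic queries
-- to the accelerator for this connection.
SET CURRENT QUERY ACCELERATION = ENABLE;

-- An ordinary analytic query; the optimizer decides whether it runs
-- natively in DB2 or on the Netezza back-end.
SELECT region, SUM(sales_amt) AS total_sales
  FROM sales_history
 GROUP BY region;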

Until November 2012, all of the data needed to be present on both Netezza and DB2.  In IDAA version 3, that restriction has been removed.  The attached presentation describes offloading older data to the Netezza appliance and eliminating it from DB2.  In a common use case the most current data (say, the last 30 days) remains in DB2, while older data (days 31 through the preceding 2 years) resides *only* in Netezza.  This is called the High Performance Storage Saver (HPSS).  IDAA HPSS

Based on the cost of mainframe DASD storage and other factors, it might be less expensive to store stale data in a “hot” back-end like the Netezza part of an IDAA than to continue to store that data natively in DB2 for z/OS.  From an API standpoint, you run your queries as if all of the data resides locally in DB2 for z/OS.  That way you can change which data reside in each of the IDAA components without impacting applications or requiring SQL changes.

The second technique exploits temporal data functionality delivered in DB2 10 for z/OS.  Temporal tables enable you to see values for attributes of a particular business object (account, part, employee, customer, etc.) at a specific point-in-time (PIT).  As SQL UPDATEs are processed against a transaction table, deltas are recorded in a history table associated with the transaction table.  That way you can issue a query like:

SELECT coverage_amt FROM policy FOR SYSTEM_TIME AS OF '12-01-2010' WHERE id = 111

and see the coverage_amt at a point-in-time regardless of any intervening changes.  DB2 11 for z/OS – not yet Generally Available (GA), but in beta with Early Support Program (ESP) customers – builds on this technology by extending it to “archive” data.  The use case is the same as the one described above: for processing efficiency you want the last 30 days of transaction data to reside in one DB2 table and older data (days 31 to the preceding 2 years) to be stored in another.  Think insurance claims: most activity regarding a claim occurs within the first 30 days of a loss.  But for business and compliance reasons, you want to retain up to 2 years of data on spinning disks; even older data might be kept on offline storage.  Using this technique, all of the data resides natively in two different DB2 for z/OS tables (current and archive).  The benefit this time is segregating commonly accessed data for processing efficiency.  As was the case with IDAA/HPSS, the location of the data is transparent to the SQL.  You write the query for the data you want and DB2 determines whether it resides in the current data table, archive table, or both.
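For reference, here is a minimal sketch of the system-period temporal setup behind a query like the one above (table and column names are illustrative; on your release you may prefer to define the history table with explicit columns rather than LIKE):

CREATE TABLE policy
  (id           INTEGER       NOT NULL PRIMARY KEY,
   coverage_amt DECIMAL(11,2) NOT NULL,
   sys_start    TIMESTAMP(12) NOT NULL GENERATED ALWAYS AS ROW BEGIN,
   sys_end      TIMESTAMP(12) NOT NULL GENERATED ALWAYS AS ROW END,
   trans_start  TIMESTAMP(12) GENERATED ALWAYS AS TRANSACTION START ID,
   PERIOD SYSTEM_TIME (sys_start, sys_end));

CREATE TABLE policy_hist LIKE policy;

-- Once versioning is enabled, UPDATEs and DELETEs against POLICY record the
-- prior row versions in POLICY_HIST, which is what makes the
-- FOR SYSTEM_TIME AS OF query above possible.
ALTER TABLE policy ADD VERSIONING USE HISTORY TABLE policy_hist;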

In summary:

                          IDAA/HPSS                  DB2 11 for z/OS
  Available               Today                      GA
  Data segregated by      DB2 and IDAA               Current and Archive tables
  Data movement           Bulk load or replication   Via DELETE or REORG DISCARD
  Administrative overhead Minimal                    None
  Static SQL              No                         Yes
  Value prop              Cost savings               Processing efficiency

What Your IBM Competitive Sales Specialist Wants You To Know (Recording and Presentation Material)

Posted by Frank Fillmore on March 7, 2013 under DB2 Education, DB2 for Linux Unix Windows, DB2 Migrations, IBM DB2 Services, IBM Information Management Software Sales, IBM Mid Market Customers, InfoSphere, Optim, Oracle.

On February 28, 2013 Kim May of The Fillmore Group and Bill Kincaid of IBM collaborated on an excellent DB2 for LUW presentation detailing:

  • administrative tool functionality
  • new features
  • new product bundles that deliver better value at a lower cost

Many thanks to all who attended.  If you or a colleague weren’t able to join us, the recording of the webinar can be found on ChannelDB2.  The presentation materials are found here:

Mid Atlantic VUG Presentation Bill Kincaid 2.28.13

Kim May named IBM Champion for 2013

Posted by Frank Fillmore on February 27, 2013 under Baltimore Washington DB2 Users Group, DB2 Education, IBM Champion, InfoSphere.

The Fillmore Group’s Kim May has been named an IBM Champion for the 5th straight year.  Click here for Kim’s developerWorks profile.  Kim has been especially active in the IBM Information Management (IM) Regional Users Group (RUG) community.

Kim has recently collaborated with IBM Education to deliver the first public Big Data classes on BigInsights and InfoSphere Streams.  These will be appearing on TFG’s 2H13 training schedule.

Hearty, well-earned congratulations to Kim!

Parallel asntdiff Dramatically Improves Performance for Q Rep on z/OS

Posted by Frank Fillmore on February 11, 2013 under DB2 for z/OS, InfoSphere, Q-Replication.

The asntdiff utility is used to compare the contents of two tables, primarily in the context of SQL and Q Replication.  The problem has been that very large tables can take a very long time to compare.  Moreover, in a dynamic replication environment, the data you are comparing might rapidly be changing.  There were attempts at fixing this problem, primarily by using the asntdiff -f parameter to provide a smaller range of rows to be compared.  Using the -f input file, SQL queries subset the data on a key value (e.g. account number) or transaction timestamp.  The bottom line is that many customers abandoned using asntdiff against production tables with millions or billions of rows.

Parallel asntdiff provides orders-of-magnitude improvements in performance.  The key differences between the processing of the legacy asntdiff and the parallel version are as follows:

  • Parallel processing: partitioning, difference detection, and reporting all in parallel
  • Compare multiple partitions in parallel (source and target table) and retrieve rows from each partition also in parallel from source and target
  • Minimize traffic (only send data needed)
  • Exploit computational capabilities of systems (checksum calculations, etc.)
  • Use a Stored Procedure (ASNTDSP) which runs at source and target systems
  • Avoid storing intermediate results

The results are dramatic.  A European bank killed a legacy asntdiff compare utility run after 17 hours; it might have run for 30.  Parallel asntdiff completed in 5 minutes.

There are some restrictions and advisories:

  1. The source and target tables must have the same code page and collation sequences.
  2. For optimal performance, increase the system resource limit for application threads and set NUMTHREADS to 21.  The minimal requirement is 6 threads.
  3. Since asntdiff determines the NUMBLOCKS parameter automatically by querying the catalog, grant the SELECT privilege on the following catalog tables: SYSIBM.SYSTABLES, SYSIBM.SYSKEYS, and SYSIBM.SYSINDEXES (see the example below).
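For example (ASNUSER is an illustrative authorization ID; substitute whatever ID asntdiff runs under):

-- Catalog access so the utility can size its work blocks automatically.
GRANT SELECT ON SYSIBM.SYSTABLES  TO ASNUSER;
GRANT SELECT ON SYSIBM.SYSKEYS    TO ASNUSER;
GRANT SELECT ON SYSIBM.SYSINDEXES TO ASNUSER;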

There are others, but the topic is well worth exploring to speed up asntdiff – or to be able to use it at all.

Thanks to Jayanti Mahapatra and Xiao Li of IBM’s Silicon Valley Lab for some of the material presented here.

“Updating Netezza with Data from Q Replication”

Posted by Frank Fillmore on January 2, 2013 under DB2 for Linux Unix Windows, DB2 for z/OS, IDAA, InfoSphere, Netezza, Q-Replication.

In cooperation with IBM, The Fillmore Group performed a proof-of-concept for a large, well-known midwestern retailer in 2011 to demonstrate the use of the IBM InfoSphere Data Replication (IIDR) Event Publisher (EP) feature to feed OLTP data into a Netezza appliance.  There is now a native changed data capture Apply component for Netezza built on technology IBM acquired from DataMirror.  This provides for heterogeneous near-real-time OLTP replication to a Netezza appliance.  However – and this is a significant “however” – your shop might already be using Q Replication.  If you are replicating from a DB2 OLTP database, does it make sense to introduce a different replication technology just to keep your Netezza appliance current?

The answer is found in the developerWorks post “Updating Netezza with Data from Q Replication”.  Built on the template developed by The Fillmore Group, IBM has published a Java consumer that provides mini-batches of comma-delimited data files created by IIDR Event Publisher for ingestion into Netezza using native Netezza utilities.  Note that this approach works for both the new PureData System for Analytics and the IBM DB2 Analytics Accelerator (IDAA), both of which are built on Netezza technology.
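As a rough sketch of the ingestion side only, one way to load a comma-delimited mini-batch into Netezza is an external table (object names and the file path are assumptions for illustration; the developerWorks article contains IBM’s actual consumer code, which handles the full change stream rather than inserts alone):

-- Expose the delimited file produced by Event Publisher as an external table.
CREATE EXTERNAL TABLE ext_orders
  SAMEAS orders
  USING (DATAOBJECT ('/stage/orders_batch_0001.csv') DELIMITER ',');

-- Apply the mini-batch (treated here as inserts only, for illustration).
INSERT INTO orders SELECT * FROM ext_orders;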

Kudos to IBMer Cecile Madsen of Silicon Valley Lab for her work on this technique.

DB2 Gold Consultant Perspectives on Large Table Design in DB2 10 for z/OS

Posted by Frank Fillmore on December 30, 2012 under DB2 for z/OS, DB2 Gold Consultants.

I have just begun a new engagement working with a large, integrated financial services firm on a project migrating IMS and VSAM data to DB2 10 for z/OS.  There had been a lot of discussion regarding whether to break a multi-billion row set of historical data into separate physical tables using schemas (e.g. SCHEMA1.MY_TAB, SCHEMA2.MY_TAB,… ) with each table containing a portion of the data *or* use Range Partitioning.  I reached out to my DB2 Gold Consultant colleagues for their opinions and received four detailed replies.
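For concreteness, a range-partitioned design for this data would look roughly like the following (column names, data types, and limit keys are purely illustrative):

-- One table, partitioned on a date limit key, instead of SCHEMA1.MY_TAB,
-- SCHEMA2.MY_TAB, ... ; applications and utilities see a single table.
CREATE TABLE HIST.MY_TAB
  (acct_id   BIGINT        NOT NULL,
   tran_date DATE          NOT NULL,
   tran_amt  DECIMAL(11,2) NOT NULL)
  PARTITION BY RANGE (tran_date)
    (PARTITION 1 ENDING AT ('2010-12-31'),
     PARTITION 2 ENDING AT ('2011-12-31'),
     PARTITION 3 ENDING AT ('2012-12-31'));

The replies follow.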

Martin Hubel, DB2 Gold Consultant:
Use range partitioning.  Almost all of the objections and disadvantages to range partitioning have been removed starting in DB2 V4, and they continue to be removed in later versions of DB2 up to and including V10.

The use of separate tables is cumbersome, complex, and much harder to maintain.  It is typically people with long memories and short experience who recommend this.

Joel Goldstein, DB2 Gold Consultant:
My first question is – have they re-designed for DB2, or are they trying to port the data to DB2?  If their IMS systems were decently designed, simply running the application under DB2 will cost at least 2x the CPU based on my experiences, and maybe more.

From a performance perspective, (excluding how the applications must actually access and process the data), range partitioning should give the best performance – but the info provided doesn’t give us any idea (i.e. under IMS is most of their access using HIDAM or HDAM?).

Physically separating the tables into subsets by schema and using an application algorithm to determine which tables to access might be a good option, if this can point to ONE table to access.  If multiple tables, performance dies.

An updateable view with many UNIONs for DML that spans subsets would be a performance disaster for an OLTP application.

Jürgen Glag, DB2 Gold Consultant:
First design rule: keep the design as simple (= elegant) as possible.  I would never start with the alternative of separate physical tables.

What benefits does the client expect from the separate-tables solution over the partitioned table?  Why does he think that his homegrown solution, which is not tested for robustness, will be better than a product feature (e.g. DB2 for z/OS range partitioning)?

William Miller, DB2 Gold Consultant:
In my experience, separating tables into subschemas is NOT the best solution.  I saw this used at the Commerzbank.  While it was a good solution for the problem they had back in the early days when DB2 had limits on the size of the table, it is not a good solution now.  There is a lot of unnecessary overhead involved.  When I query this table with this date, what underlying table do I need to access?  And what happens when I join two tables, where do I find the correct data?  These are all questions that have to be answered by fallible human coding.

Today’s technology allows for truly massive numbers of rows in a single table.  Most often we can divide data into ‘active’ and ‘historical’: the active portion contains open contracts, the historical portion contains closed contracts.  Due to government regulation you can’t throw away data that is no longer active.  But large amounts of historical data make queries slower (more index levels, etc.).

A truly better solution is the v10 functionality of temporal tables.  I used this technique at the Deutsche Bundesbank.  Basically, you have two identical tables that can be accessed by SQL.  Data that is no longer current automagically wanders from the active table to the historical table.  SQL can access all data, or a range of values.  There is no need for application logic to move active data to a history table.  The active data and the historical data can be indexed differently.  So you have a lot of flexibility here.

The problem with this solution is that it is not very adaptable to current applications.  Every company has its own way of historizing data.  And nobody is interested in throwing away a chunk of application logic and retesting everything anew.  It is impossible to get a company to rewrite a corner of the application that just works.  But from what I read, you are in a way starting from scratch.  You can design temporal data into your solution.

IBM IM Sellers: Partner to Win with The Fillmore Group in 4Q12

Posted by Frank Fillmore on November 28, 2012 under DB2 for Linux Unix Windows, DB2 Gold Consultants, DB2 Migrations, IBM Champion, IBM DB2 Services, IBM Information Management Software Sales, IBM Mid Market Customers, Information on Demand Conference, Oracle.

The fourth quarter of 2012 will close in just over one month.  What will you close between now and then?  If you attended the Break Free Forum at the IBM Information on Demand (IOD) conference in Las Vegas last month, you heard The Fillmore Group’s founder and principal, Frank Fillmore, give real-world perspectives on successful Oracle to DB2 migrations – including a project at a worldwide Top 50 integrated financial services firm.

Listen to a brief ten-minute podcast discussion between Frank Fillmore and Kim May, TFG Vice President of Business Development, that provides an overview of the steps involved in a successful migration – from initial validation to production cutover.

Oracle Takeout – Partner with TFG

If you have prospects that can benefit from these perspectives, contact Kim at kim.may@thefillmoregroup.com or (001) 443-956-0288.