Change Data Delivery Combines InfoSphere Change Data Capture (ICDC) and DataStage

Posted by Frank Fillmore on August 9, 2010 under DB2 for i, InfoSphere. Tags: , , , , , , , , .

InfoSphere Change Data Delivery combines legacy products from two different IBM acquisitions: Data Mirror Transformation Server (now InfoSphere Change Data Capture) and Ascential DataStage (now InfoSphere DataStage).  A recent customer engagement highlights the benefits - and a few challenges - of this hybrid replication/ETL (extract, transform, load) combination.

The B2B eCommerce business problem: a large, privately-held regional retailer wanted to gather data from each of their retail outlets into a single web presence.  This “bricks and clicks” approach would enable just-in-time delivery of their products from either a warehouse or the nearest store to the businesses that install the products for end-customers.  Each store had a small System i to handle local transactions.  But there are hundreds of stores.  Standard replication from each of the stores to a single website repository would be unwieldy at best.  Coupled with the astronomical licensing charges for replication software at each of the stores, direct replication was a technical and financial non-starter.

The IBM Change Data Delivery (CDD) offering makes local replication “free”.  InfoSphere Change Data Capture (ICDC) software can be installed on each of the (in this case) System i store transaction servers at no cost.  Data is replicated from journaled application tables/physical files to Consistent Change Data (CCD) tables on the same server.  Then DataStage reaches up periodically from a mid-tier server to gather the data from the CCD tables and populate the web application database - in this case MySQL.  Once the data is loaded into MySQL from a CCD table, the CCD table is pruned.

We employ ICDC Live Audit replication so that an application table INSERT, followed immediately by an UPDATE to the same row, followed by a DELETE of that row would record three distinct rows in the CCD tables.  Each would be identified by a timestamp and an “entity type” (e.g. INSERT, UPDATE) of the activity.  DataStage reads the Live Audit records and makes the appropriate change to the target MySQL database tables.

There are 17 application tables of interest: 10 on the warehouse System i and 7 on each of the store System i servers.  So there are 17 DataStage jobs.  Each DataStage job has four Stages or nodes:

  • the first ODBC stage that uses an SQL SELECT to read from a particular System i CCD table
  • a Transform stage to place each of the Live Audit row types (i.e. INSERT, UPDATE, DELETE) on one of three links to a second ODBC stage
  • the second ODBC stage with the appropriate MySQL Data Manipulation Language (DML) statement (i.e. INSERT, UPDATE, DELETE) for each link
  • a third and final ODBC stage branching from the Transform stage to prune the CCD table

All of the DataStage jobs have exactly the same structure.  They had to be exceedingly efficient because we are executing them so frequently (~ one per second).

DataStage Job

DataStage Job

The entity types in the ICDC Live Audit CCD tables are as follows:

  • RR - an INSERT record caused by the refresh of a table in a subscription
  • PT - a regular application INSERT
  • PX - an extra INSERT (found only on the System i)
  • UB - before image of an UPDATE
  • UP - after image of an UPDATE
  • DL - a DELETE

There are also two Live Audit control records: RS (which signals a full refresh) and CR (which is equivalent to a TRUNCATE) which I was able to ignore.

The “secret sauce” is the SQL SELECT statement used to gather the data from each CCD table on the System i servers.  Read More…

IBM zEnterprise Has Benefits on Many Levels

Posted by Frank Fillmore on July 23, 2010 under DB2 for z/OS, InfoSphere, Optim, Q-Replication. Tags: , , , , , .

I attended IBM’s zEnterprise announcement in New York yesterday.  IBM Senior VP Steve Mills said it was the most important announcement IBM had ever made in its impact on saving customers money.  He also said IBM spent US$1.5 billion dollars on the zEnterprise research and development effort over the past several years.  So, as movie reviewers ask about the latest blockbuster: “Can you see the money up on the screen?”.  The answer comes in a few loosely coupled parts.

  1. Oskar Schindler said you must have a “clever accountant”.  Mills made it clear that organizations that can accurately allocate their IT expenses will see the most benefit from zEnterprise.  zEnterprise delivers System z quality of services (QOS) across heterogeneous architectures: the aforementioned System z as well as Power 7 blade servers and (eventually) System x blades.  The problem for most organizations is that System z “mainframe” costs have been capitalized from central IT budgets for over four decades.  As the PC revolution unfolded since the early 1980’s, most of the costs for networking, systems administrator salaries, PCs themselves and the software they run have been expensed out of departmental budgets.  Organizations with the discipline to accurately accumulate these costs certainly will be able to see the benefit of deploying the zEnterprise platform.  Interestingly, the table talk at lunch indicated that some IBMers see the sweet-spot for zEnterprise in the rapidly growing economies of China and Russia.  The reason?  Tight budgetary control and hierarchical, centralized decision-making in state and quasi-state enterprises (think Gazprom) will help them “get it” immediately.  I would not be surprised to see zEnterprise adoption in emerging and growing economies exceed that of North America in the next two years.
  2. IBM has been able to run Linux on System z hardware using Virtual Servers and z/VM for a decade.  And the System z has been able to dispatch workloads to specialty engines within System z such as the Integrated Facility for Linux, zIIP, and zAAP for years.  Think of zEnterprise as extending that dispatching capability out of the physical System z box to discrete blade servers.  IBM’s goal is to move away from the “you can do everything on System z” posture - which in reality was a losing, rear-guard action - to embracing disparate architectures and acknowledging that maybe a print server really should run under Linux on an x86 platform.  Yet you can benefit from the centralized management and security of the System z.  This is workload integration at the chip, firmware, hypervisor, and middleware levels.  A pretty neat trick.
  3. So what can I do with the zEnterprise?  Here are two relatively simple scenarios.

The Online Travel Portal  One well-known travel reservation site front-ends their expensive Oracle transaction servers with MySQL running on cheap x86 hardware.  While you’re noodling around trying to figure out the best intinerary, all you’re seeing is data replicated from Oracle to MySQL on a near-real-time basis.  When you enter your credit card number and hit “Purchase”, you’re routed to the Oracle OLTP server.  This is called “database tiering”.  I can now architect the same topology on System z with DB2 for z/OS on the back-end and x86 blades running the DB2 Express-C freeware database.  On the zEnterprise platform, these databases will communicate over a 10Gb private, secure network with extraordinarily low latency.  Ever get the “That seat is no longer available” message?  It might be a thing of the past with zEnterprise.

The Hospitality Company  This organization runs their centralized reservation systems on DB2 for z/OS already.  In order to support their frequent-guest affinity program portal, they have WebSphere Application Server running on a separate System p AIX server.  The only problem is that sometimes transactions hang to the point that the JVMs have to be recycled.  The Java programmers say their code is tightly written and the DB2 for z/OS database administrations say that the incoming SQL requests are satisfied sub-second.  While zEnterprise alone would not resolve this problem - see pureQuery and the lyrically named Optim Performance Manager Extended Edition - the application and the database servers will be as tightly coupled as possible while each runs on the optimal platform.  Since the transfer points and the servers themselves are under unified management, an entire layer of complexity (and potential breakage) will be eliminated.

The real buzz in the announcement for me is the IBM Smart Analytics Optimizer (ISAO).  For a generation as a DB2 database administrator, I’ve told my clients that OLTP and ad hoc query workloads should not be intermingled.  The solution has been to make copies of the data using replication technologies - InfoSphere Change Data Capture and Q Replication among them.  This approach has been a boon to DBAs, and software, storage, and server salesmen everywhere.  When it achieves its full promise, ISAO will evaluate incoming database requests and dispatch them along with the data needed to satisfy the request to the appropriate platform server.  DB2 for z/OS will serve as a centralized front-end for all workloads: OLTP, OLAP, ad hoc query, etc.  ISAO will transparently run the workload on the optimal platform and return the result set to the requesting application.  Organizations will be able to dismantle the miasma of extracts, FTPs, and other artifices now necessary to keep analytic workloads from bogging down OLTP.  And they’ll reduce complexity.  And save a ton of cash.

So on whose door will IBM knock first?  Clearly the System z installed base will be getting lots of attention.  But could Facebook or some other enterprise with orders-of-magnitude scaling issues (500 million Facebook users and counting) benefit from zEnterprise?  Surprisingly, the answer is Yes!  Facebook needs to manage lots of unstructured data (pictures, videos, et al) , but they also have the need for complex analytics.  First, to target online advertising ever more precisely, but also to serve larger societal needs.  Let’s say a man declares he needs a reduction in child support because he’s nearly broke.  The local social services agency unleases a smart agent to run against social networking sites and comes up with pictures on Facebook from the man’s recent two-week vacation in Hawaii.  Too big brother-ish?  A topic for another day.

Summer Services Special

Posted by Kim May on July 21, 2010 under DB2 Connect, DB2 for Linux Unix Windows, DB2 for VSE&VM, DB2 for i, DB2 for z/OS, IBM DB2 Services, InfoSphere, Optim, Q-Replication, SQL Tuning. Tags: , , , , , , , , , , , .

As much as I dislike the relentless repetition that’s part of the nature of the world of blogs and twitter and listserves and email blasts, here I go with a shameless pitch for a TFG special services offering I emailed to several DB2 users earlier today.  I am doing this because, at the end of the day, the rate disparity in today’s DB2 services market baffles me.  Are the ridiculously high rates being charged eroding product adoption?  I am afraid so, which is why we are offering a summer services special. 

Read More…

InfoSphere Data Architect - Help for ERWIN Users

Posted by Kim May on July 2, 2010 under DB2 Education, InfoSphere, Optim, Uncategorized. Tags: , , , .

While the IBM Optim teams are excited to see customers adopting InfoSphere Data Architect, everyone recognizes the journey from CA ERwin to IDA can be frustrating.   The Optim team has been working to develop more resource materials for ERwin users and recently emailed a new Tips & Tricks guide developed by Joe Cullen.  If you find this valuable please let us know, and we will let you know when the new materials are released.  Thanks to Joe Cullen for offering to share this with the community!

IDA Tips&Tricks by Joe Cullen

InfoSphere Change Data Capture Luncheon May 6th

Posted by Kim May on April 18, 2010 under Baltimore Washington DB2 Users Group, DB2 Connect, DB2 for Linux Unix Windows, DB2 for i, DB2 for z/OS, InfoSphere, Oracle. Tags: , , , , .

If you will be in Baltimore on May 6th please join us!

Do you know the cost of capturing a change in your production database systems?  Does it cost an extra $100,000 per month to capture that change - and then is the change moved to a data warehouse that’s left many users dissatisfied?  We have a solution, InfoSphere Change Data Capture.

The Fillmore Group is teaming with IBM for a *free* lunch session at Orioles Park at Camden Yards, from 11am to 1pm, where we will explain how InfoSphere Change Data Capture can do an intelligent capture on heterogeneous databases to:  1) create an audit trail detailing who did what and when, 2) replace inefficient staging of changes on your mainframe, 3) eliminate the overhead and instability of ad hoc queries, 4) remove the inefficiency of triggers and message queues, and 5) reduce mainframe costs by $1 million per year - reliably feed downstream ETL, MDM, or SOA applications.

If you are interested in attending, click here for more information.  See you at the Yard!

IBM Replication for Partitioned DB2 Databases

Posted by Frank Fillmore on March 19, 2010 under DB2 for Linux Unix Windows, DB2 for z/OS, InfoSphere, Q-Replication. Tags: , , .

IBMer David Tolleson does a terrific job of outlining the configuration options for SQL and Q Replication when the source or target DB2 database is partitioned.  This was formerly know as Data Partitioning Feature (DPF).  Today it is part of InfoSphere Warehouse.

Check out David’s presentation.

Dubuntu (DB2 + Ubuntu) Appliance

Posted by Frank Fillmore on March 7, 2010 under DB2 for Linux Unix Windows, IBM Smart Analytics System, InfoSphere. Tags: , , , , .

I once got in trouble with a woman I was dating because I bought her an “appliance” for Christmas.  It was a combination VCR/DVD player (this was a while ago) and she liked movies.  Go figure.  I’ve also gotten in trouble with my wife for *not* buying an appliance - in this case a new, energy efficient clothes dryer; she pays the electric bills.  So you would think that I would steer clear of appliances.  You would be wrong.

Database “appliances” are integrated packages of software including operating system, database software (e.g. DB2), and other useful features.  Sometimes these appliances are pre-configured and optimized on hardware servers as well.  Some prominent examples are IBM’s Smart Analytics System (ISAS) and Oracle’s Exadata.  ISAS is the follow-on to IBM’s Balanced Configuration Unit (BCU) which clearly needed a new, better name.  ISAS combines InfoSphere Warehouse with IBM server hardware and a back-end storage system that provide a complete out-of-the-box solution.  The systems are “balanced” in terms of configured machine resources (i.e. CPU, memory, SAN) based on projected data volumes so you don’t have to spend weeks trying to figure all of that out.  That job has been done for you.  You can begin achieving ROI shortly after powering on the unit.

Appliances don’t always include hardware, however.  Some flavors of ISAS are VMWare images that can be deployed on existing hardware.  This benefits organizations which have established hardware standards in their enterprises and don’t want to deviate.

Want to get started with an appliance (and not get in trouble with your “significant other”)?  Check out this link on Channel DB2.  Anil Mahadev has put together Dubuntu, an all-in-one VMWare image DB2 appliance built on the open source Linux operating system Ubuntu.  It’s all *free*.  This is not a trial.  Using open source application development tools and DB2 Express-C, Anil has put together quite a cool package.

You can download Dubuntu at http://www.idug.org/anil-dubuntu/dubuntu-appliance.html from the IDUG website.  This is perfect for small - even one-person - development shops that can’t invest in expensive up-front software licensing.  If you are a DB2 for z/OS specialist and want to expand your skills to the distributed platforms, here’s a terrific tool.

With apologies to Cole Porter: “Do do that Dubuntu that you do so well.”

Take Me Out to the Ball*park*

Posted by Kim May on March 4, 2010 under DB2 Education, InfoSphere, Optim. Tags: , , , , , .

The countdown to the first day of Spring is below 20!  Spring can’t come soon enough for me, as here in Baltimore we’ve had over 80 inches of snow - our average is 18.  In preparation for Spring I am coordinating events with the local IBM teams to introduce customers to two great technologies we hope everyone will consider implementing in 2010:  InfoSphere Change Data Capture (CDC) and Optim Data Growth Solutions. 

Both events are planned for Baltimore’s beautiful Camden Yards, home of the Baltimore Orioles.  The tentative date for the CDC event is Thursday, May 6th, with a PoT scheduled two weeks later on May 20th.  I am working on getting the Optim dates on the calendar.  If you will be in the Baltimore/Washington area please try to join us - both topics should be of value to organizations looking for ways to better control their data.  CDC’s niche is in quickly moving data to where it’s most effective, while Optim Data Growth Solutions can help any organization with a packrat in the IT Department.  You know who you are.

Hope to see you there!

Q Replication Dashboard - v9.7.1

Posted by Frank Fillmore on February 8, 2010 under InfoSphere, Q-Replication. Tags: .

Be sure to upgrade to the latest version of the Q Replication Dashboard v9.7.1.  More flexibility is provided - including monitoring Oracle Q Replication sources.  Check the link for all of the details.

DRDA Performance for Q Replication ASNTDIFF Utility on DB2 for z/OS

Posted by Frank Fillmore on February 8, 2010 under DB2 for z/OS, InfoSphere, Q-Replication, SQL Tuning. Tags: , .

As you know, I work with IBM’s Q Replication technology - a lot.  Q Replication functionality is delivered in InfoSphere Replication Server.  The challenges are amplified when working on DB2 for z/OS with *really* large tables.  One financial institution at which I am working has tables with over 1 billion rows and hundreds of partitions.  Of course, DB2 for z/OS can manage tables of that size, but what about the tooling?

Q Replication comes with a utility called ASNTDIFF.  ASNTDIFF compares a checksum of rows in the source and target tables being replicated to validate that there are no discrepancies.  Challenge #1 is that when replicating between DB2 for z/OS subsystems, ASNTDIFF runs under Unix Systems Services (USS) that provides the Unix APIs enabled in z/OS.  There are considerations for USS applications that will form the basis for another post.

Challenge #2 is that ASNTDIFF retrieves the rows from the remote system (Application Server or AS) using a three-part-name query across a Distributed Relational Database Architecture (DRDA) connection.  For example:

SELECT * FROM <location>.<schema>.<tablename>

where location is found in the DRDA Communications Database (CDB) portion of the DB2 for z/OS Application Requestor (AR) catalog tables.  You typically run the ASNTDIFF utility on the replication target DB2 for z/OS server.  That’s because the CDB has probably already been configured to support cursor-based loading of the target tables.  Why is this a challenge?  Well, a three-part-name query across a DRDA pipe against a 1 billion row table ran for about 18 hours.  Ouch!

So the basic problem is: how can I get three-part-name queries running across a DRDA connection between DB2 for z/OS subsystems to run faster?  I asked a couple of IBMers and Jim Pickel pointed me in the direction of exploiting OPTIMZE FOR n ROWS.  There’s a good explanation of this in “Limiting the number of DRDA network transmissions”.  Right now the ASNTDIFF utility adds OPTIMIZE for 1000 ROWS and FOR READ ONLY to every ASNTDIFF query.  We’re experiementing with the recommendations in the “Limiting…” document to see if we need to override this hardcoded parameter.  I’ll keep you posted as to our findings.