DB2 Connect Virtual Briefing Replay

Posted by Frank Fillmore on August 24, 2010 under DB2 Connect, DB2 Education, DB2 for z/OS.

Thanks to all who attended the Optim Virtual Briefing on DB2 Connect.  If you were not one of the 250 attendees last Thursday, the replay and materials are available here.

For more in-depth training, consider the following IBM authorized training classes.

  • CF602 “DB2 Connect 9 to DB2 for z/OS DRDA Implementation with TCP/IP” December 6, 2010
  • CF632 “DB2 Connect 9 for DB2 for z/OS Problem Determination and Performance” September 20, 2010

Change Data Delivery Combines InfoSphere Change Data Capture (ICDC) and DataStage

Posted by Frank Fillmore on August 9, 2010 under DB2 for i, InfoSphere. Tags: , , , , , , , , .

InfoSphere Change Data Delivery combines legacy products from two different IBM acquisitions: Data Mirror Transformation Server (now InfoSphere Change Data Capture) and Ascential DataStage (now InfoSphere DataStage).  A recent customer engagement highlights the benefits - and a few challenges - of this hybrid replication/ETL (extract, transform, load) combination.

The B2B eCommerce business problem: a large, privately-held regional retailer wanted to gather data from each of their retail outlets into a single web presence.  This “bricks and clicks” approach would enable just-in-time delivery of their products from either a warehouse or the nearest store to the businesses that install the products for end-customers.  Each store had a small System i to handle local transactions.  But there are hundreds of stores.  Standard replication from each of the stores to a single website repository would be unwieldy at best.  Coupled with the astronomical licensing charges for replication software at each of the stores, direct replication was a technical and financial non-starter.

The IBM Change Data Delivery (CDD) offering makes local replication “free”.  InfoSphere Change Data Capture (ICDC) software can be installed on each of the (in this case) System i store transaction servers at no cost.  Data is replicated from journaled application tables/physical files to Consistent Change Data (CCD) tables on the same server.  Then DataStage reaches up periodically from a mid-tier server to gather the data from the CCD tables and populate the web application database - in this case MySQL.  Once the data is loaded into MySQL from a CCD table, the CCD table is pruned.

We employ ICDC Live Audit replication so that an application table INSERT, followed immediately by an UPDATE to the same row, followed by a DELETE of that row would record three distinct rows in the CCD tables.  Each would be identified by a timestamp and an “entity type” (e.g. INSERT, UPDATE) of the activity.  DataStage reads the Live Audit records and makes the appropriate change to the target MySQL database tables.

There are 17 application tables of interest: 10 on the warehouse System i and 7 on each of the store System i servers.  So there are 17 DataStage jobs.  Each DataStage job has four Stages or nodes:

  • the first ODBC stage that uses an SQL SELECT to read from a particular System i CCD table
  • a Transform stage to place each of the Live Audit row types (i.e. INSERT, UPDATE, DELETE) on one of three links to a second ODBC stage
  • the second ODBC stage with the appropriate MySQL Data Manipulation Language (DML) statement (i.e. INSERT, UPDATE, DELETE) for each link
  • a third and final ODBC stage branching from the Transform stage to prune the CCD table

All of the DataStage jobs have exactly the same structure.  They had to be exceedingly efficient because we are executing them so frequently (~ one per second).

DataStage Job

DataStage Job

The entity types in the ICDC Live Audit CCD tables are as follows:

  • RR - an INSERT record caused by the refresh of a table in a subscription
  • PT - a regular application INSERT
  • PX - an extra INSERT (found only on the System i)
  • UB - before image of an UPDATE
  • UP - after image of an UPDATE
  • DL - a DELETE

There are also two Live Audit control records: RS (which signals a full refresh) and CR (which is equivalent to a TRUNCATE) which I was able to ignore.

The “secret sauce” is the SQL SELECT statement used to gather the data from each CCD table on the System i servers.  Read More…

IBM zEnterprise Has Benefits on Many Levels

Posted by Frank Fillmore on July 23, 2010 under DB2 for z/OS, InfoSphere, Optim, Q-Replication. Tags: , , , , , .

I attended IBM’s zEnterprise announcement in New York yesterday.  IBM Senior VP Steve Mills said it was the most important announcement IBM had ever made in its impact on saving customers money.  He also said IBM spent US$1.5 billion dollars on the zEnterprise research and development effort over the past several years.  So, as movie reviewers ask about the latest blockbuster: “Can you see the money up on the screen?”.  The answer comes in a few loosely coupled parts.

  1. Oskar Schindler said you must have a “clever accountant”.  Mills made it clear that organizations that can accurately allocate their IT expenses will see the most benefit from zEnterprise.  zEnterprise delivers System z quality of services (QOS) across heterogeneous architectures: the aforementioned System z as well as Power 7 blade servers and (eventually) System x blades.  The problem for most organizations is that System z “mainframe” costs have been capitalized from central IT budgets for over four decades.  As the PC revolution unfolded since the early 1980’s, most of the costs for networking, systems administrator salaries, PCs themselves and the software they run have been expensed out of departmental budgets.  Organizations with the discipline to accurately accumulate these costs certainly will be able to see the benefit of deploying the zEnterprise platform.  Interestingly, the table talk at lunch indicated that some IBMers see the sweet-spot for zEnterprise in the rapidly growing economies of China and Russia.  The reason?  Tight budgetary control and hierarchical, centralized decision-making in state and quasi-state enterprises (think Gazprom) will help them “get it” immediately.  I would not be surprised to see zEnterprise adoption in emerging and growing economies exceed that of North America in the next two years.
  2. IBM has been able to run Linux on System z hardware using Virtual Servers and z/VM for a decade.  And the System z has been able to dispatch workloads to specialty engines within System z such as the Integrated Facility for Linux, zIIP, and zAAP for years.  Think of zEnterprise as extending that dispatching capability out of the physical System z box to discrete blade servers.  IBM’s goal is to move away from the “you can do everything on System z” posture - which in reality was a losing, rear-guard action - to embracing disparate architectures and acknowledging that maybe a print server really should run under Linux on an x86 platform.  Yet you can benefit from the centralized management and security of the System z.  This is workload integration at the chip, firmware, hypervisor, and middleware levels.  A pretty neat trick.
  3. So what can I do with the zEnterprise?  Here are two relatively simple scenarios.

The Online Travel Portal  One well-known travel reservation site front-ends their expensive Oracle transaction servers with MySQL running on cheap x86 hardware.  While you’re noodling around trying to figure out the best intinerary, all you’re seeing is data replicated from Oracle to MySQL on a near-real-time basis.  When you enter your credit card number and hit “Purchase”, you’re routed to the Oracle OLTP server.  This is called “database tiering”.  I can now architect the same topology on System z with DB2 for z/OS on the back-end and x86 blades running the DB2 Express-C freeware database.  On the zEnterprise platform, these databases will communicate over a 10Gb private, secure network with extraordinarily low latency.  Ever get the “That seat is no longer available” message?  It might be a thing of the past with zEnterprise.

The Hospitality Company  This organization runs their centralized reservation systems on DB2 for z/OS already.  In order to support their frequent-guest affinity program portal, they have WebSphere Application Server running on a separate System p AIX server.  The only problem is that sometimes transactions hang to the point that the JVMs have to be recycled.  The Java programmers say their code is tightly written and the DB2 for z/OS database administrations say that the incoming SQL requests are satisfied sub-second.  While zEnterprise alone would not resolve this problem - see pureQuery and the lyrically named Optim Performance Manager Extended Edition - the application and the database servers will be as tightly coupled as possible while each runs on the optimal platform.  Since the transfer points and the servers themselves are under unified management, an entire layer of complexity (and potential breakage) will be eliminated.

The real buzz in the announcement for me is the IBM Smart Analytics Optimizer (ISAO).  For a generation as a DB2 database administrator, I’ve told my clients that OLTP and ad hoc query workloads should not be intermingled.  The solution has been to make copies of the data using replication technologies - InfoSphere Change Data Capture and Q Replication among them.  This approach has been a boon to DBAs, and software, storage, and server salesmen everywhere.  When it achieves its full promise, ISAO will evaluate incoming database requests and dispatch them along with the data needed to satisfy the request to the appropriate platform server.  DB2 for z/OS will serve as a centralized front-end for all workloads: OLTP, OLAP, ad hoc query, etc.  ISAO will transparently run the workload on the optimal platform and return the result set to the requesting application.  Organizations will be able to dismantle the miasma of extracts, FTPs, and other artifices now necessary to keep analytic workloads from bogging down OLTP.  And they’ll reduce complexity.  And save a ton of cash.

So on whose door will IBM knock first?  Clearly the System z installed base will be getting lots of attention.  But could Facebook or some other enterprise with orders-of-magnitude scaling issues (500 million Facebook users and counting) benefit from zEnterprise?  Surprisingly, the answer is Yes!  Facebook needs to manage lots of unstructured data (pictures, videos, et al) , but they also have the need for complex analytics.  First, to target online advertising ever more precisely, but also to serve larger societal needs.  Let’s say a man declares he needs a reduction in child support because he’s nearly broke.  The local social services agency unleases a smart agent to run against social networking sites and comes up with pictures on Facebook from the man’s recent two-week vacation in Hawaii.  Too big brother-ish?  A topic for another day.

Information Champion at Information On Demand Conference

Posted by Frank Fillmore on May 6, 2010 under Authorized Training Partner, Baltimore Washington DB2 Users Group, DB2 Education, Information on Demand Conference. Tags: .

My colleague, Kim May, was interviewed at last year’s IBM Information On Demand Conference.  If “brevity is the soul of wit”, Kim’s summary of The Fillmore Group’s philosophy and the value of the IOD conference is soulfully witty indeed!

Kudos to Kim.

DB2 Connect Redux

Posted by Frank Fillmore on March 29, 2010 under Authorized Training Partner, DB2 Connect, DB2 Education, DB2 for VSE&VM, DB2 for i, DB2 for z/OS. Tags: , .

I’ve been seeing a lot of an old friend lately: DB2 Connect.  This is the software which enables Distributed Relational Database Architecture (DRDA) Application Requestor (AR) or “client” applications running on Linux, Unix, and Windows platforms to access DB2 ”host” databases on System z and System i.  More precisely, DB2 Connect accesses an Application Server (AS) which can be DB2 for z/OS, DB2 Server for VSE&VM, or DB2 for i (Wait, shouldn’t that be DB2 for “me”?)

What we know today as DB2 Connect started out almost 20 years ago as Distributed Database Connection Services (DDCS).  Then it was amazing to access host DB2 data using a spreadsheet.  Of course it took an experienced consultant a few weeks to install and configure DDCS and it’s supporting software which relied on the Systems Network Architecture (SNA) communications protocol.  It was so difficult, The Fillmore Group had a services offering called “Client/Server in 10 Days” where we guaranteed to get DDCS working in your shop in two weeks.  Now you can install and configure DB2 Connect using TCP/IP in about an hour.

So what does DB2 Connect actually do?  At its most basic, DB2 Connect is a protocol converter that changes the character coding on the host (Extended Binary Coded Decimal Interchange Code or EBCDIC) into the equivalent used on Linux, Unix, and Windows platforms (American Standard Code for Information Interchange or ASCII).  It does lots of other things and can be a source of lots of troubleshooting and performance tuning, but EBCDIC to ASCII translation is a core function.

So what’s happening lately?  First, after a few years of no publicly available training in DB2 Connect, The Fillmore Group is holding classes: CF602 “DB2 Connect 9 to DB2 for z/OS DRDA Implementation with TCP/IP” and CF632 “DB2 Connect 9 for DB2 for z/OS Problem Determination and Performance”.  The CF602 beginning June 21 is actually nearing capacity, so if this is something you need, sign up to attend now.  CF632 starts May 17.

As a bonus, I’ve attached a DB2 Connect configuration worksheet.  If you know what it all means and can use it right away, great!  Have at it.  If not, we’ll see you in class.

Second, on the consulting side of the house we’re seeing lots of DB2 Connect activity - especially with DB2 for i.  Having a relational database directly integrated into the i5/OS operating system, System i always has been a bit of a land unto itself.  Now we’re seeing lots of activity integrating System i data into eCommerce and Business Intelligence applications.  If this is what you’re doing, let us know how we can help.

IBM Replication for Partitioned DB2 Databases

Posted by Frank Fillmore on March 19, 2010 under DB2 for Linux Unix Windows, DB2 for z/OS, InfoSphere, Q-Replication. Tags: , , .

IBMer David Tolleson does a terrific job of outlining the configuration options for SQL and Q Replication when the source or target DB2 database is partitioned.  This was formerly know as Data Partitioning Feature (DPF).  Today it is part of InfoSphere Warehouse.

Check out David’s presentation.

Dubuntu (DB2 + Ubuntu) Appliance

Posted by Frank Fillmore on March 7, 2010 under DB2 for Linux Unix Windows, IBM Smart Analytics System, InfoSphere. Tags: , , , , .

I once got in trouble with a woman I was dating because I bought her an “appliance” for Christmas.  It was a combination VCR/DVD player (this was a while ago) and she liked movies.  Go figure.  I’ve also gotten in trouble with my wife for *not* buying an appliance - in this case a new, energy efficient clothes dryer; she pays the electric bills.  So you would think that I would steer clear of appliances.  You would be wrong.

Database “appliances” are integrated packages of software including operating system, database software (e.g. DB2), and other useful features.  Sometimes these appliances are pre-configured and optimized on hardware servers as well.  Some prominent examples are IBM’s Smart Analytics System (ISAS) and Oracle’s Exadata.  ISAS is the follow-on to IBM’s Balanced Configuration Unit (BCU) which clearly needed a new, better name.  ISAS combines InfoSphere Warehouse with IBM server hardware and a back-end storage system that provide a complete out-of-the-box solution.  The systems are “balanced” in terms of configured machine resources (i.e. CPU, memory, SAN) based on projected data volumes so you don’t have to spend weeks trying to figure all of that out.  That job has been done for you.  You can begin achieving ROI shortly after powering on the unit.

Appliances don’t always include hardware, however.  Some flavors of ISAS are VMWare images that can be deployed on existing hardware.  This benefits organizations which have established hardware standards in their enterprises and don’t want to deviate.

Want to get started with an appliance (and not get in trouble with your “significant other”)?  Check out this link on Channel DB2.  Anil Mahadev has put together Dubuntu, an all-in-one VMWare image DB2 appliance built on the open source Linux operating system Ubuntu.  It’s all *free*.  This is not a trial.  Using open source application development tools and DB2 Express-C, Anil has put together quite a cool package.

You can download Dubuntu at http://www.idug.org/anil-dubuntu/dubuntu-appliance.html from the IDUG website.  This is perfect for small - even one-person - development shops that can’t invest in expensive up-front software licensing.  If you are a DB2 for z/OS specialist and want to expand your skills to the distributed platforms, here’s a terrific tool.

With apologies to Cole Porter: “Do do that Dubuntu that you do so well.”

Q Replication Dashboard - v9.7.1

Posted by Frank Fillmore on February 8, 2010 under InfoSphere, Q-Replication. Tags: .

Be sure to upgrade to the latest version of the Q Replication Dashboard v9.7.1.  More flexibility is provided - including monitoring Oracle Q Replication sources.  Check the link for all of the details.

DRDA Performance for Q Replication ASNTDIFF Utility on DB2 for z/OS

Posted by Frank Fillmore on February 8, 2010 under DB2 for z/OS, InfoSphere, Q-Replication, SQL Tuning. Tags: , .

As you know, I work with IBM’s Q Replication technology - a lot.  Q Replication functionality is delivered in InfoSphere Replication Server.  The challenges are amplified when working on DB2 for z/OS with *really* large tables.  One financial institution at which I am working has tables with over 1 billion rows and hundreds of partitions.  Of course, DB2 for z/OS can manage tables of that size, but what about the tooling?

Q Replication comes with a utility called ASNTDIFF.  ASNTDIFF compares a checksum of rows in the source and target tables being replicated to validate that there are no discrepancies.  Challenge #1 is that when replicating between DB2 for z/OS subsystems, ASNTDIFF runs under Unix Systems Services (USS) that provides the Unix APIs enabled in z/OS.  There are considerations for USS applications that will form the basis for another post.

Challenge #2 is that ASNTDIFF retrieves the rows from the remote system (Application Server or AS) using a three-part-name query across a Distributed Relational Database Architecture (DRDA) connection.  For example:

SELECT * FROM <location>.<schema>.<tablename>

where location is found in the DRDA Communications Database (CDB) portion of the DB2 for z/OS Application Requestor (AR) catalog tables.  You typically run the ASNTDIFF utility on the replication target DB2 for z/OS server.  That’s because the CDB has probably already been configured to support cursor-based loading of the target tables.  Why is this a challenge?  Well, a three-part-name query across a DRDA pipe against a 1 billion row table ran for about 18 hours.  Ouch!

So the basic problem is: how can I get three-part-name queries running across a DRDA connection between DB2 for z/OS subsystems to run faster?  I asked a couple of IBMers and Jim Pickel pointed me in the direction of exploiting OPTIMZE FOR n ROWS.  There’s a good explanation of this in “Limiting the number of DRDA network transmissions”.  Right now the ASNTDIFF utility adds OPTIMIZE for 1000 ROWS and FOR READ ONLY to every ASNTDIFF query.  We’re experiementing with the recommendations in the “Limiting…” document to see if we need to override this hardcoded parameter.  I’ll keep you posted as to our findings.

Learn How to Become an IBM Information Champion

Posted by Frank Fillmore on October 28, 2009 under Uncategorized.

My colleague, Kim May, will be leading a birds-of-a-feather (BOF) session at IBM’s Information On Demand Conference today - ”Becoming an Information Champion for Dummies”.  The particulars are:

BOF-1631A; Wednesday, October 28, 6:00 p.m.; Tradewinds F

Hope to see you there.