Forensic data analysis and extraction

Posted by Frank Fillmore on February 19, 2009 under Uncategorized.

I usually blog about what I learned in the past week that I had hoped to find someplace in a manual: for example, how to perform a specific technical task like changing a hostname on a DB2 server.  This post is a little different for me.

In the past couple of months we have been approached by three different customers (two of which are quite large), all with the same problem: “We have data that we can’t access.”  The three specific challenges were:

  1. An independent consultant had a DB2 Universal Database version 5 backup file.  He needed to review the data, but didn’t have the DB2 version 5 code (which hasn’t been supported or available for about a decade) required to restore the backup.
  2. A large financial services firm had a legacy CommonStore Content Manager e-mail repository which hadn’t worked for a while.  The firm had gone forward with different e-mail server and archiving technology, but for legal and regulatory reasons needed to occasionally access older e-mail on the non-functioning server.
  3. A large hospitality industry conglomerate de-commissioned a DB2 for OS/390 server.  The customer requested that the DB2 table data be retained (again, for legal and regulatory reasons), so the vendor gave them a bunch of 3490 tape cartridges containing DB2 Image Copy files.  Eventually the customer needed to actually look at the data, but didn’t have a 3490 cartridge reader.  Or DB2 for OS/390.  Or a mainframe computer.
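One small piece of the mainframe case can be sketched in a few lines.  Character data that originates on OS/390 is normally encoded in EBCDIC rather than ASCII, so bytes pulled off such media look like gibberish until they are converted.  The snippet below is only an illustration, not the method we used: the function name `ebcdic_to_text` is hypothetical, code page 037 is an assumption (the real code page depends on how the source system was configured), and it does nothing to interpret the page structure of a DB2 Image Copy.

```python
def ebcdic_to_text(raw: bytes, codepage: str = "cp037") -> str:
    """Decode EBCDIC bytes into a Python string.

    Assumption: the data uses code page 037 (US/Canada EBCDIC).
    Trailing blanks are stripped because fixed-length character
    fields are usually padded with spaces.
    """
    return raw.decode(codepage).rstrip()


if __name__ == "__main__":
    # Build a sample EBCDIC field by encoding a known string, then decode it.
    sample = "FILLMORE  ".encode("cp037")
    print(sample.hex())            # raw EBCDIC bytes in hex
    print(ebcdic_to_text(sample))  # the readable text
```

If the decoded text still looks wrong, trying a different EBCDIC code page (Python ships several, such as `cp500` and `cp1140`) is a reasonable next step.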

The good news is that we’ve addressed two of the challenges and are close to completing the third.  The common thread is media, format, and software evolution and obsolescence.  Since The Fillmore Group has been working with IBM Information Management software products for over 22 years, we have old software code, access to DB2 on a variety of server platforms and at various software versions, and experience with how all of this stuff actually worked way back when President Obama was still at Harvard.

This is a problem to which anyone (like me) who currently has music stored on cassette tapes, vinyl albums, CDs, and an iPod can relate.  And it’s only going to get worse.  For thousands of years of civilization, the common API for storing and transmitting information was clay tablets and then papyrus.  With the exponential growth of the volume of data, the variety of storage media, and the constant advance of software technology, combined with the growing duration and complexity of records retention requirements, lots of enterprises are going to encounter a problem like the examples I’ve listed above.

There’s no simple solution.  It would be prohibitively expensive to routinely migrate *all* of an enterprise’s data assets to the latest version of any particular data repository (or format, or medium).  So we’ll have to address the opportunities manually, one by one.

I’m interested to hear whether anyone else has had to solve a similar “forensic” data recovery or extraction puzzle.