Dr. Sheldon Cooper of “The Big Bang Theory” regularly records a YouTube video called “Fun with Flags” so I’m going to call this blog post “Fun with SQL”. Sheldon may be a future Nobel Prize winner, but I’ve been named an IBM Champion for Analytics and System z for 2019. 🙂
The problem statement: a large US government agency is contemplating migrating a packaged application from System z to Linux. The current and future database repositories are Db2 for z/OS and Db2 for LUW respectively. As a System z Champion I’ve presented on the continued relevancy and efficacy of the IBM mainframe, but that’s not the focus of this post. One of the challenges of porting data is the conversion of character encoding from EBCDIC (Extended Binary Coded Decimal Interchange Code) which is the legacy encoding protocol used on IBM mainframes and System i to Unicode. Db2 for z/OS supports Unicode, but I’m not going to address anything other than EBCDIC encoding at the source and Unicode at the target (Sorry, ASCII). In brief, EBCDIC represents an 8 bit binary encoding scheme that can represent 256 different characters. For example: the letter “F” (one of my faves) is represented as 1100 0110 in binary and C6 in hexadecimal. For national language support amid the variety of Single Byte Character Sets (SBCS) used around the world – think the German umlaut (i.e. ä ö ü) – the 256 byte universe of EBCDIC wasn’t nearly enough. The Unicode solution is to use more than 1 byte to represent some characters. For example: the “¢” cent sign takes one byte to represent in EBCDIC, but 2 bytes in Unicode.
select length(ebcdic_str('¢')) as EBCDIC, length(unicode_str('¢')) as Unicode from sysibm.sysdummy1;
EBCDIC UNICODE
1 2
This problem has manifested itself as an error when moving a CHAR(5) column of data from Db2 for z/OS to Db2 for LUW. If, say, the pesky “¢” is present in the string and all 5 characters are significant (i.e. no trailing blanks), the string won’t fit in the same column width on the target platform. Most customers find this out the hard way when the Db2 for LUW LOAD command or IMPORT utility posts errors. There are two possible solutions:
- data cleanup (Do we really need the pesky ‘¢’ in a Comments column?)
- lengthen the target columns to accommodate Unicode expansion
Both can be time-consuming and expensive and will depend on a variety of factors beyond the scope of this blog post. The purpose here today is to identify the extent of the problem before you actually try to port the data. I’ve constructed this SQL statement to do just that. Read More…