May 2002 Technical Tip - When is –32,104 equal to +25,201?
Many non-programmers process massive amounts of data on a daily basis. Often these files come from sources outside of their company, and go to users who likewise are outside of the organization. Along the way, records are added or dropped and formats are changed. It has been our observation that when these employees do not understand data representation, quality suffers!
Many of these employees risk corrupting their data when they transfer files from the mainframe to the PC. They often aren't aware that they are dealing with two different coding schemes, or of the impact this has on the file transfer process.
These coding schemes are EBCDIC and ASCII. Generally, we can think of EBCDIC as the mainframe code and ASCII as the PC code. Technically, the word "code" is incorrect here, as these schemes are really "substitution ciphers". Even the Morse Code is really a cipher! You may have worked with ciphers as a kid. A common one replaces each letter with its ordinal position in the alphabet: A=1, B=2, C=3, … , Z=26. As a kid, I could encipher CAB as 3-1-2.
EBCDIC and ASCII are just different ciphers. In EBCDIC, CAB would be 195-193-194 (or hexadecimal X'C3C1C2'), whereas in ASCII, CAB would be 67-65-66 (or hexadecimal X'434142').
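To see the two schemes side by side, the short Python sketch below encodes CAB both ways. It assumes the mainframe uses EBCDIC code page 037 (Python's built-in cp037 codec); your site may use a different EBCDIC code page, and some characters may then map differently.

```python
# Encode the same three letters under both schemes and compare the byte values.
# Assumption: EBCDIC code page 037 (Python codec name "cp037").
word = "CAB"

ebcdic_bytes = word.encode("cp037")
ascii_bytes = word.encode("ascii")

print("EBCDIC:", list(ebcdic_bytes), ebcdic_bytes.hex().upper())
print("ASCII: ", list(ascii_bytes), ascii_bytes.hex().upper())
# EBCDIC: [195, 193, 194] C3C1C2
# ASCII:  [67, 65, 66] 434142
```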
The file transfer process is really a very dumb process; that is, it has no idea what is being transferred. The file transfer program transfers – and translates – one byte at a time. This is not a problem if the file being transferred contains "text" data only. But if the file contains binary or packed decimal fields, problems will likely occur. Binary and packed decimal fields must be converted to "text" before file transfers!
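As a rough illustration of what "converting to text" means, here is a minimal Python sketch. The record layout is hypothetical (a 2-byte binary halfword followed by a 3-byte packed decimal field), as are the function names. In practice this conversion would be done on the mainframe with whatever utility or program your shop uses; the sketch only shows the logic.

```python
import struct

def unpack_packed_decimal(field: bytes) -> int:
    """Decode a packed decimal field: two digits per byte, sign in the last nibble."""
    digits = []
    for byte in field[:-1]:
        digits.append(byte >> 4)      # high nibble is a digit
        digits.append(byte & 0x0F)    # low nibble is a digit
    digits.append(field[-1] >> 4)     # last byte: high nibble is the final digit
    sign_nibble = field[-1] & 0x0F    # last byte: low nibble is the sign
    value = int("".join(str(d) for d in digits))
    # X'B' and X'D' mark negative values; other sign nibbles are treated as positive.
    return -value if sign_nibble in (0x0B, 0x0D) else value

def record_to_text(record: bytes) -> str:
    """Turn the hypothetical 5-byte record into a plain text line."""
    (quantity,) = struct.unpack(">h", record[0:2])   # signed halfword, high-order byte first
    amount = unpack_packed_decimal(record[2:5])
    return f"{quantity},{amount}"

record = bytes.fromhex("01F4" "12345C")   # halfword 500, packed decimal +12345
text_line = record_to_text(record)
print(text_line)   # 500,12345
```

Once every field is text like this, a byte-by-byte translation from EBCDIC to ASCII changes only how the digits are encoded, not their values.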
Consider the following example. Assume –32,104 is stored on the mainframe as a binary halfword. This would occupy two bytes as X'8298'. This file needs to be downloaded to a PC. But, as stated before, the file transfer program will translate one byte at a time. In EBCDIC, X'82' is a lower-case letter 'b'. The file transfer program attempts to find the equivalent character in ASCII. In ASCII, a lower-case letter 'b' is X'62'. Likewise, in EBCDIC, X'98' is a lower-case letter 'q'. In ASCII, a lower-case letter 'q' is X'71'. So X'8298' on the mainframe becomes X'6271' on the PC.
On the mainframe side, X'8298' was the number –32,104 but it was also the letters 'bq'; it's an issue of context. On the PC side, the letters 'bq' are represented as X'6271' which, if read as a binary halfword with the high-order byte first (as on the mainframe), is the number +25,201. If the PC-based program which would process this downloaded data was expecting a binary halfword, the value of that halfword has changed!
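The whole mishap can be reproduced in a few lines of Python. This is only a sketch: it assumes EBCDIC code page 037 (the cp037 codec), and the function name translate_ebcdic_to_ascii is made up for the illustration. It mimics a text-mode transfer by translating each byte as if it were a character.

```python
import struct

def translate_ebcdic_to_ascii(data: bytes) -> bytes:
    """Translate byte by byte, as a text-mode transfer would (cp037 assumed)."""
    return data.decode("cp037").encode("latin-1")

# The mainframe stores -32,104 as a signed halfword, high-order byte first.
original = struct.pack(">h", -32104)
print(original.hex().upper())      # 8298

# The transfer program sees two characters, 'b' and 'q', and translates them.
translated = translate_ebcdic_to_ascii(original)
print(translated, translated.hex().upper())   # b'bq' 6271

# Reading the translated bytes back as a halfword gives a different number.
(value_after_transfer,) = struct.unpack(">h", translated)
print(value_after_transfer)        # 25201
```

The sketch reads the result with the high-order byte first to match the arithmetic above. A program that reads the low-order byte first would get yet another value, but in either case the original –32,104 is gone.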
If you or your associates need to learn more, then we recommend our Data Representation course. The purpose of this one-day course is to familiarize the student with the various ways in which data can be stored. Topics include number systems, ASCII vs. EBCDIC, character data, numeric data (zoned decimal, packed decimal, binary, and stripped-packed), reading record layouts, and reading dumps.