June 2006 Technical Tip: Self-documenting data files in SAS

Often when I create a small data file for use with SAS, I will create a corresponding text file which describes that data. For example, here is orditem.txt:

This is orditem.txt containing
documentation for the orditem.dat file.

1-4 Invoice number
5-7 Item number
8-9 Quantity ordered

...and here is orditem.dat, the data file described by orditem.txt:

111011804
111024703
111056112
111156102
111195604
111211805
111224703
111395608
111424704
111456103
111495604
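The layout in orditem.txt maps directly onto an INPUT statement with column pointers. As a minimal sketch, assuming the data file has been saved as c:/data/orditem.dat (a hypothetical path) and that invoice and item are kept as character values while quantity is numeric, the file could be read like this:

```sas
filename orditem 'c:/data/orditem.dat';  /* hypothetical path          */

data orditem;
  infile orditem;
  input @1 invoice $4.   /* columns 1-4: invoice number   */
        @5 item    $3.   /* columns 5-7: item number      */
        @8 qty      2.;  /* columns 8-9: quantity ordered */
run;

proc print data=orditem;
run;
```

With this arrangement the column layout is recorded only in orditem.txt and in the program; the data file itself says nothing about what it contains.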

No one would question the appropriateness of having a documentation file such as orditem.txt. But how much better is it to have the documentation embedded within the data file itself! Consider, for example, selfdoc.dat which follows. Lines with an asterisk in the first column are to be treated as comments, not data:

*--------------------------------
* selfdoc.dat
* order line item detail file
* 1-4 Invoice number
* 5-7 Item number
* 8-9 Quantity ordered
*--------------------------------
111011804
111024703
111056112
111156102
111195604
111211805
111224703
111395608
111424704
111456103
111495604

Reading such a file and bypassing the documentation lines is trivial if you make use of the line-hold specifier. The line-hold specifier is a single trailing at sign (@) on an INPUT statement. It holds the current record so that the next INPUT statement in the same pass through the DATA step reads from that same physical record instead of advancing to the next one. Here's the SAS program which reads selfdoc.dat:

filename orditem 'c:/data/selfdoc.dat';

data orditem;                 /* sas file output                   */
  infile orditem;             /* flat file input                   */
  input @1 star $1. @;        /* note line hold trailing @ sign    */
  if star = '*' then delete;  /* star in clm 1 is comment in data  */
  drop star;                  /* don't include star clm in output  */
  input @1 invoice $4.        /* note how I can re-read clm 1      */
        @5 item    $3.
        @8 qty      2.;
  length qty 3;               /* default is 8 bytes of storage!    */
run;

proc print data=orditem;
run;
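To experiment with the trailing @ without an external file, the same logic can be sketched with in-stream data. This is an illustration only: the data set name demo and the use of DATALINES are assumptions, not part of the original program, and only the first three data records are included.

```sas
data demo;
  infile datalines;
  input @1 star $1. @;         /* hold the record after reading clm 1 */
  if star = '*' then delete;   /* comment line: skip it, read no more */
  drop star;
  input @1 invoice $4.         /* same record, back at column 1       */
        @5 item    $3.
        @8 qty      2.;
datalines;
*--------------------------------
* 1-4 Invoice number
* 5-7 Item number
* 8-9 Quantity ordered
*--------------------------------
111011804
111024703
111056112
;
run;

proc print data=demo;
run;
```

The five comment lines should be discarded, leaving three observations. When DELETE executes, SAS returns to the top of the DATA step and the held record is released, so the next pass starts on a new line.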

We hope you will consider Caliber Data Training when you are in need of high quality SAS training.


Written by Bill Qualls. Copyright © 2006 by Caliber Data Training 800.938.1222