April 2002 Technical Tip – Creating a Frequency Distribution: SAS or SyncSort?

Many of our clients use SAS. SAS is a truly remarkable package, but is a classic example of a tool often being used for other than its intended purpose. SAS is a statistics package, but many sites use it only as a report writer. Not that there's anything wrong with that…. But SAS requires a lot of resources. If CPU usage is a concern for you as it was for one of our clients, you might consider converting some of your SAS jobs to something else. What follows is a rather bizarre example.

I had just completed teaching a series of SyncSort classes for one of our clients. The students were actively encouraged by their managers to apply what they had learned in class in an effort to reduce CPU usage. One of my students wrote to me saying, "I currently use SAS to produce a frequency distribution showing the different values taken on by a given field, and the number of times each of those values occurs. Can this be done more efficiently with SyncSort?" The answer is YES!

For the benefit of those readers who know SAS, here's the original SAS program:

DATA INFILE1/VIEW=INFILE1;
  INFILE DATAIN;
  INPUT  @13  CYCLE  $2.  ;
  OUTPUT;
  LABEL CYCLE='WHAT BILLING CYCLE FILE HAS';
  RUN;
PROC FREQ DATA=INFILE1;
  TABLE CYCLE/MISSING MISSPRINT NOCOL NOROW;
  RUN;

You can achieve the same results with SyncSort by employing the following trick. Use the INREC command to create a new record containing the field to be tabulated and a field containing a value of one. Use the SUM command to add those ones, breaking by the field to be tabulated. Finally, use the OUTREC command to format the results. Here's the complete SyncSort code:

INREC  FIELDS=(1:13,2,         BILLING CYCLE ON INPUT IN 13-14
               3:X'0000001C')  PACKED DECIMAL ONE (FOUR BYTES)
SORT   FIELDS=(1,2,CH,A)       BILLING CYCLE AFTER INREC IN 1-2
SUM    FIELDS=(3,4,PD)         SUMMING "ONES" TO GET A COUNT
OUTREC FIELDS=(1:C'CYCLE ',    LITERAL, JUST TO MAKE IT PRETTY
               7:1,2,          BILLING CYCLE AFTER INREC
               9:C' OCCURRED ',             ANOTHER LITERAL
              19:3,4,PD,EDZ9=(Z,ZZZ,ZZ9),   COUNT, FORMATTED
              28:C' TIMES.')                LAST LITERAL

The SyncSort output isn't as "pretty" as the output from SAS, but is very adequate for internal use. And the results were impressive. We ran a before and after test with 2,482,833 records. The CPU time using SAS was 40 seconds, whereas the CPU time using SyncSort was 6 seconds. That's an 85% savings!

Now SyncSort can't do everything SAS can do. Indeed, it wasn't designed to do so. And I don't generally advocate using "the wrong tool." But this is a great example of where being clever can result in significant savings.

Caliber Data Training is pleased to offer training in SyncSort as well as SAS. We hope you will consider us when deciding upon a training provider.


Go to the articles index. Written by Bill Qualls. Copyright © 2003 by Caliber Data Training 800.938.1222