April 2002 Technical Tip – Creating a Frequency Distribution: SAS or SyncSort?
Many of our clients use SAS. SAS is a truly remarkable package, but is a classic example of a tool often being used for other than its intended purpose. SAS is a statistics package, but many sites use it only as a report writer. Not that there's anything wrong with that…. But SAS requires a lot of resources. If CPU usage is a concern for you as it was for one of our clients, you might consider converting some of your SAS jobs to something else. What follows is a rather bizarre example.
I had just completed teaching a series of SyncSort classes for one of our clients. The students were actively encouraged by their managers to apply what they had learned in class in an effort to reduce CPU usage. One of my students wrote to me saying, "I currently use SAS to produce a frequency distribution showing the different values taken on by a given field, and the number of times each of those values occurs. Can this be done more efficiently with SyncSort?" The answer is YES!
For the benefit of those readers who know SAS, here's the original SAS program:
You can achieve the same results with SyncSort by employing the following trick. Use the INREC command to create a new record containing the field to be tabulated and a field containing a value of one. Use the SUM command to add those ones, breaking by the field to be tabulated. Finally, use the OUTREC command to format the results. Here's the complete SyncSort code:
The SyncSort output isn't as "pretty" as the output from SAS, but is very adequate for internal use. And the results were impressive. We ran a before and after test with 2,482,833 records. The CPU time using SAS was 40 seconds, whereas the CPU time using SyncSort was 6 seconds. That's an 85% savings!
Now SyncSort can't do everything SAS can do. Indeed, it wasn't designed to do so. And I don't generally advocate using "the wrong tool." But this is a great example of where being clever can result in significant savings.