November 2007 Technical Tip SAS: Parsing and Counting

Today I received the following email from a friend of mine:

I was wondering if you could give me a hand with a difficult data issue? I have a character field of 6 bytes which represents 6 different product types. Product 1 = A, Product 2 = B, (3=C, 4=D, 5=E, 6=F) and 0 is non purchase. I was wondering if you know of a way to quickly scan the field and if A is represented, flag Product1 as a hit. Same for B through F (2-6). The really poor formatting makes it very difficult. You would think that B should be on column 2 but I guess they weren't that smart. I've attached a SAS dataset with the example.

For some reason I was unable to open the attached dataset using SAS, so I "peeked" at it with Notepad and was able to determine how the data was stored. I then wrote a quick DATA step to create a test dataset.

I like this example because it shows the use of RETAIN, END= and OUTPUT. Here's my solution...


* Create test dataset; 
data product; 
input @1 prods $6.; 
datalines; 
ABCDEF 
ABCDE 
ABCD 
ABC 
AB 
A 
FEDCBA 
FEDCB 
FEDC 
FED 
FE 
F 
AFBECD 
AFBEC 
AFBE 
AFB 
AF 
A 
; 
run; 

* Show generated test dataset; 
proc print data=product; 
run; 

* produce count by product (How many As? Bs? ... Fs?); 
data counts (keep = A B C D E F); 
set product; 
retain A B C D E F; 
set product end=lastobs; 
do i = 1 to 6; 
   p = substr(prods, i, 1); 
   if p = 'A' then A+1; 
      else if p = 'B' then B+1; 
      else if p = 'C' then C+1; 
      else if p = 'D' then D+1; 
      else if p = 'E' then E+1; 
      else if p = 'F' then F+1; 
end; 
if (lastobs) then output; 
run; 

* show counts by product; 
proc print data=counts; 
run; 

Comments:

  • RETAIN'ed variables are not reset with each iteration of the DATA step. Recall the special syntax for summing variables: A+1; is the same as A=A+1; But variables summed in this way are automatically retained, so the use of RETAIN here was not required. Still, I think it's a better coding style.
  • The END= option on the SET statement let's you set a variable (in this case LASTOBS) which can be tested within the DATA step. This allows for special end of file processing.
  • The OUTPUT statement causes the current record to be written to the output SAS dataset. Recall that when an explicit OUTPUT is used in a DATA step, the implicit output which normally takes place at the end of the data step does not happen. I only want to show the counts one time, after all records have been processed.

We hope you will consider Caliber Data Training when deciding upon a SAS training provider.


Go to the articles index. Written by Bill Qualls. Copyright © 2007 by Caliber Data Training 800.938.1222