|
November 2007 Technical Tip – SAS: Parsing and Counting
Today I received the following email from a friend of mine:
|
I was wondering if you could give me a hand with a difficult data issue? I have a character field of 6 bytes which represents 6 different product types. Product 1 = A, Product 2 = B, (3=C, 4=D, 5=E, 6=F) and 0 is non purchase. I was wondering if you know of a way to quickly scan the field and if A is represented, flag Product1 as a hit. Same for B through F (2-6). The really poor formatting makes it very difficult. You would think that B should be on column 2 but I guess they weren't that smart. I've attached a SAS dataset with the example.
|
For some reason I was unable to open the attached dataset using SAS, so I "peeked" at it with Notepad and was able to determine how the data was stored. I then wrote a quick DATA step to create a test dataset.
I like this example because it shows the use of RETAIN, END= and OUTPUT. Here's my solution...
* Create test dataset;
data product;
input @1 prods $6.;
datalines;
ABCDEF
ABCDE
ABCD
ABC
AB
A
FEDCBA
FEDCB
FEDC
FED
FE
F
AFBECD
AFBEC
AFBE
AFB
AF
A
;
run;
* Show generated test dataset;
proc print data=product;
run;
* produce count by product (How many As? Bs? ... Fs?);
data counts (keep = A B C D E F);
set product;
retain A B C D E F;
set product end=lastobs;
do i = 1 to 6;
p = substr(prods, i, 1);
if p = 'A' then A+1;
else if p = 'B' then B+1;
else if p = 'C' then C+1;
else if p = 'D' then D+1;
else if p = 'E' then E+1;
else if p = 'F' then F+1;
end;
if (lastobs) then output;
run;
* show counts by product;
proc print data=counts;
run;
|
Comments:
- RETAIN'ed variables are not reset with each iteration of the DATA step. Recall the special syntax for summing variables: A+1; is the same as A=A+1; But variables summed in this way are automatically retained, so the use of RETAIN here was not required. Still, I think it's a better coding style.
- The END= option on the SET statement let's you set a variable (in this case LASTOBS) which can be tested within the DATA step. This allows for special end of file processing.
- The OUTPUT statement causes the current record to be written to the output SAS dataset. Recall that when an explicit OUTPUT is used in a DATA step, the implicit output which normally takes place at the end of the data step does not happen. I only want to show the counts one time, after all records have been processed.
We hope you will consider Caliber Data Training when deciding upon a SAS training provider.
Go to the articles index.
Written by Bill Qualls. Copyright © 2007 by Caliber Data Training 800.938.1222
|