|
February 2005 Technical Tip – SAS: Multiple Records from a Single
Observation
My task of writing these monthly articles was made easier this month as
I received a request from a former student. Kevin emailed the following
question to me:
Hello Bill,
Hope this message finds you well. I have a question for you.
I received a text file which is at customer level: each
customer with multiple accounts is a line as the following:
ID Acct. Bal. Acct. Bal. Acct. Bal
01 11 $20 12 $30 13 $35
02 21 $15 22 $25
My question is how I can turn the data into an account level
layout so each account is a line like the following:
ID Acct. Bal.
01 11 $20
01 12 $30
01 13 $35
02 21 $15
02 22 $25
My goal is to eventually use SAS code to calculate average
and total balances, etc. on this data set. Please help.
Thanks very much,
Kevin
|
It's not often I receive a request complete with test data and expected
output! Gotta love it! I put Kevin's test data into file
C:\TEMP\ACCOUNTS.TXT as follows. I added a couple of records because I
wanted to make sure the logic worked for id's with only one account:
01 11 $20 12 $30 13 $35
02 21 $15 22 $25
03 31 $30
04 41 $10 42 $20 43 $30 44 $40 45 $50
05 51 $15
|
Here is the SAS source code:
data accounts;
infile "c:\temp\accounts.txt" missover;
input id $ acct @;
do while (acct ne .);
input bal : dollar. @;
output;
input acct @;
end;
format bal dollar.;
proc print data=accounts;
run;
|
Here is the output from PROC PRINT – just as Kevin asked:
The SAS System
Obs id acct bal
1 01 11 $20
2 01 12 $30
3 01 13 $35
4 02 21 $15
5 02 22 $25
|
Space is limited but a few quick comments are appropriate:
- The MISSOVER option on the INFILE statement tells SAS that any
attempt to read beyond the end of the record should result in missing
values rather than going to the next record.
- The single trailing at sign (@) on the INPUT statements tells SAS to
allow multiple INPUT statements for the same record within a single
iteration through the DATA step. I like to think of this as "Wait! I'm
not done with this record!" (The single trailing at sign (@) releases a
record when control returns to the top of the DATA step.)
- (acct ne .) is read as "acct is not a missing value"
- The dollar sign following id on the first INPUT statement tells SAS
to treat id as a character field: I am assuming from Kevin's letter that
leading zeroes on the id are significant.
- The colon following bal on the second INPUT tells SAS that this is
formatted; the dollar. informat tells SAS to treat it as a number
with a leading dollar sign.
- The OUTPUT statement writes a record. Normally SAS will
automatically output a record for each iteration through the DATA step.
But if an OUTPUT statement appears anywhere in the DATA step, a
record is written only when an OUTPUT statement occurs.
What about Kevin's stated goal "to eventually use SAS code to calculate
average and total balances, etc. on this data set"? Well, you'll have to
tune in next month
for that! Want to know more about SAS? Give us a call! Mainframe or PC,
you can count on Caliber Data Training for top quality education.
Go to the articles index. Written
by Bill Qualls. Copyright © 2005 by Caliber Data Training
800.938.1222 |