|
March 2005 Technical Tip – SAS: A Single Record from Multiple
Observations
Last month I told you about a letter I received from a former student.
Kevin emailed the following question to me:
Hello Bill,
Hope this message finds you well. I have a question for you.
I received a text file which is at customer level: each
customer with multiple accounts is a line as the following:
ID Acct. Bal. Acct. Bal. Acct. Bal
01 11 $20 12 $30 13 $35
02 21 $15 22 $25
My question is how I can turn the data into an account level
layout so each account is a line like the following:
ID Acct. Bal.
01 11 $20
01 12 $30
01 13 $35
02 21 $15
02 22 $25
My goal is to eventually use SAS code to calculate average
and total balances, etc. on this data set. Please help.
Thanks very much,
Kevin
|
Last month I showed you how to create the SAS
data set per Kevin's request: now it's time to address his stated goal
"to eventually use SAS code to calculate average and total balances, etc.
on this data set." So here's the input SAS data set created last
month:
Obs id acct bal
1 01 11 $20
2 01 12 $30
3 01 13 $35
4 02 21 $15
5 02 22 $25
|
Here is the SAS source code to process this data set:
data summary (keep = id count total);
set accounts (rename=(id=in_id)) end=last;
retain id count total;
if _N_ = 1 then id = in_id;
if (id NE in_id) then do;
output;
total=0;
count=0;
id = in_id;
end;
count+1;
total+bal;
if last then output;
proc print data=summary;
run;
|
Here is the output from PROC PRINT:
The SAS System
Obs id count total
1 01 3 85
2 02 2 40
3 03 1 30
4 04 5 150
5 05 1 15
|
Space is limited but a few quick comments are appropriate:
- The purpose of the KEEP statement is to make sure only the three
variables mentioned are written to the output data set. Without the KEEP
statement all variables (including in_id and acct) would be included.
- if last then output; writes the last record to the output
data set. The condition last is set by the end=last on the
SET statement at the top of the DATA step.
- The RETAIN statement tells SAS not to reinitialize these fields with
each iteration of the DATA step.
- This task requires that I have a hold field so I can compare the id
from one record to the next. I renamed the id on the input to in_id so I
could use the original variable name (id) for the output.
- if _n_ = 1 then id = in_id; initializes the id to that found
on the first record.
- As mentioned last month, the OUTPUT statement writes a record.
Normally SAS will automatically output a record for each iteration
through the DATA step. But if an OUTPUT statement appears
anywhere in the DATA step, a record is written only when
an OUTPUT statement occurs.
- count+1; and total+bal; are examples of the SAS sum
statement which is similar to Java's count++; and
total+=bal;
Want to know more about SAS? Give us a call! Mainframe or PC, you can
count on Caliber Data Training for top quality education.
Go to the articles index. Written
by Bill Qualls. Copyright © 2005 by Caliber Data Training
800.938.1222 |