March 2005 Technical Tip SAS: A Single Record from Multiple Observations

Last month I told you about a letter I received from a former student. Kevin emailed the following question to me:

Hello Bill,

Hope this message finds you well. I have a question for you.
I received a text file which is at customer level: each
customer with multiple accounts is a line as the following:

   ID  Acct.  Bal.   Acct.  Bal.  Acct.  Bal
   01   11    $20    12    $30    13    $35
   02   21    $15    22    $25   

My question is how I can turn the data into an account level
layout so each account is a line like the following:

   ID  Acct.  Bal.
   01  11    $20
   01  12    $30
   01  13    $35
   02  21    $15
   02  22    $25

My goal is to eventually use SAS code to calculate average
and total balances, etc. on this data set.  Please help.  

Thanks very much,
Kevin

Last month I showed you how to create the SAS data set per Kevin's request: now it's time to address his stated goal "to eventually use SAS code to calculate average and total balances, etc. on this data set." So here's the input SAS data set created last month:

Obs    id    acct    bal

 1     01     11     $20
 2     01     12     $30
 3     01     13     $35
 4     02     21     $15
 5     02     22     $25

Here is the SAS source code to process this data set:

data summary (keep = id count total);
   set accounts (rename=(id=in_id)) end=last;
   retain id count total;

   if _N_ = 1 then id = in_id;

   if (id NE in_id) then do;
      output;
      total=0;
      count=0;
      id = in_id;
   end;

   count+1;
   total+bal;
   if last then output;

proc print data=summary;

run;

Here is the output from PROC PRINT:

     The SAS System

Obs    id    count    total

 1     01      3        85
 2     02      2        40
 3     03      1        30
 4     04      5       150
 5     05      1        15

Space is limited but a few quick comments are appropriate:

  • The purpose of the KEEP statement is to make sure only the three variables mentioned are written to the output data set. Without the KEEP statement all variables (including in_id and acct) would be included.
  • if last then output; writes the last record to the output data set. The condition last is set by the end=last on the SET statement at the top of the DATA step.
  • The RETAIN statement tells SAS not to reinitialize these fields with each iteration of the DATA step.
  • This task requires that I have a hold field so I can compare the id from one record to the next. I renamed the id on the input to in_id so I could use the original variable name (id) for the output.
  • if _n_ = 1 then id = in_id; initializes the id to that found on the first record.
  • As mentioned last month, the OUTPUT statement writes a record. Normally SAS will automatically output a record for each iteration through the DATA step. But if an OUTPUT statement appears anywhere in the DATA step, a record is written only when an OUTPUT statement occurs.
  • count+1; and total+bal; are examples of the SAS sum statement which is similar to Java's count++; and total+=bal;

Want to know more about SAS? Give us a call! Mainframe or PC, you can count on Caliber Data Training for top quality education.


Go to the articles index. Written by Bill Qualls. Copyright © 2005 by Caliber Data Training 800.938.1222