Proc Transpose
ABSTRACT
PROC TRANSPOSE can be used to rotate (transpose) SAS data sets. This procedure
transforms data from rows to columns or from columns to rows. However, PROC
TRANSPOSE has a limitation: it does not work as required when the VAR variable has
multiple values for the same ID value within a BY group. This paper demonstrates how
the transpose can be done in that case without losing any record in the output
data set.
PROC TRANSPOSE
PROC TRANSPOSE <DATA=input-data-set> <LABEL=label> <LET>
<NAME=name> <OUT=output-data-set> <PREFIX=prefix>;
BY <DESCENDING> variable(s) <NOTSORTED>;
COPY variable(s);
ID variable;
IDLABEL variable;
VAR variable(s);
Options
DATA= input-data-set names the SAS data set to transpose.
Default: most recently created SAS data set
LABEL= label specifies a name for the variable in the output data set that contains the
label of the variable that is being transposed to create the current observation.
Default: _LABEL_
NAME= name specifies the name for the variable in the output data set that contains the
name of the variable being transposed to create the current observation.
Default: _NAME_
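As a minimal sketch of the statements above, the step below turns one row per UPC and TYPE into one row per UPC with one column per TYPE. The data set names LONG and WIDE are hypothetical. This works only because each TYPE occurs once per UPC.

```sas
/* hypothetical long-format data: one value per UPC and TYPE */
data long ;
  input upc chrtyp $ chrvl $ ;
  datalines;
1 A 1
1 B 2
2 B 4
2 C 5
;
run ;

/* rows to columns: one output row per UPC, one column per TYPE */
proc transpose data=long out=wide(drop=_name_) ;
  by upc ;      /* input must be sorted by the BY variable */
  id chrtyp ;   /* values of CHRTYP become the new column names */
  var chrvl ;   /* values of CHRVL fill the new columns */
run ;
```

WIDE has the columns UPC, A, B and C; a TYPE that does not occur for a UPC is left missing.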
PROBLEM
Suppose we have a SAS data set (AA, created by the DATA step that precedes the macro
call below) in which a UPC can have multiple TYPEs and a TYPE can have multiple values.
We are required to restructure the data by UPC so that the output has one column for
each TYPE, and the rows for each UPC represent ALL possible combinations of the
values for that UPC (also called a Cartesian expansion). The desired output is:
UPC  A  B  C  D
1    1  2  .  .
1    1  3  .  .
2    .  4  5  .
2    .  4  6  .
3    8  1  9  1
3    8  1  2  1
3    8  4  9  1
3    8  4  2  1
3    8  8  9  1
3    8  8  2  1
3    9  1  9  1
3    9  1  2  1
3    9  4  9  1
3    9  4  2  1
3    9  8  9  1
3    9  8  2  1
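The number of rows per UPC follows directly from the Cartesian expansion: it is the product of the number of values of each TYPE present for that UPC. Writing T_u for the TYPEs of UPC u and n_{u,t} for the number of values of TYPE t (notation introduced here for illustration only):

```latex
N_u = \prod_{t \in T_u} n_{u,t}, \qquad
N_1 = n_{1,A}\,n_{1,B} = 1 \cdot 2 = 2, \quad
N_2 = n_{2,B}\,n_{2,C} = 1 \cdot 2 = 2, \quad
N_3 = n_{3,A}\,n_{3,B}\,n_{3,C}\,n_{3,D} = 2 \cdot 3 \cdot 2 \cdot 1 = 12
```

This gives 2 + 2 + 12 = 16 rows in total, as in the table above.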
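A simple PROC TRANSPOSE with BY UPC and ID TYPE does not help: it stops with an error
because an ID value occurs more than once in the same BY group.
We can use the LET option, which allows duplicate values of an ID variable. The result is: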
UPC _NAME_ A B C D
1 value 1 3 . .
2 value . 4 6 .
3 value 9 8 2 1
But this is not what we are looking for. With LET, PROC TRANSPOSE keeps only the last
occurrence of a particular ID value within the data set or BY group, so the other values
are lost.
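The two attempts above can be sketched as follows on the first two UPCs of the example data. The data set names DEMO, SIMPLE and WITH_LET are hypothetical. The LET step runs and keeps B=3 for UPC 1 and C=6 for UPC 2, as in the table above; the plain step is placed last because it stops with an error. Here _NAME_ holds CHRVL, the name of the transposed variable.

```sas
/* first two UPCs of the example data (hypothetical data set name DEMO) */
data demo ;
  input upc chrtyp $ chrvl $ ;
  datalines;
1 A 1
1 B 2
1 B 3
2 B 4
2 C 5
2 C 6
;
run ;

/* LET: runs, but keeps only the last value of each ID value in a BY group */
proc transpose data=demo out=with_let let ;
  by upc ;
  id chrtyp ;
  var chrvl ;
run ;

/* no LET: stops with an ERROR because the ID value B occurs twice
   in the BY group UPC=1 */
proc transpose data=demo out=simple ;
  by upc ;
  id chrtyp ;
  var chrvl ;
run ;
```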
SOLUTION
A solution to this problem is to first separate single-value and multiple-value records.
Sort data set AA by UPC and TYPE and split it into two data sets: SV, containing the UPC
and TYPE combinations that have only one value, and MV, containing the combinations
that have multiple values. For MV, generate all combinations and assign an index to
each one. Then combine SV and MV, repeating each single value for every index of its
UPC. Each unique combination of UPC, TYPE and VALUE now carries an index, so the data
can be transposed by UPC and INDEX.
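%macro sort(ds,by) ;
  /* sort a data set in place */
  proc sort data=&ds ;
    by &by ;
  run ;
%mend sort ;
%macro nobs(ds,mvar) ;
  /* store the number of observations of &ds in global macro variable &mvar */
  %global &mvar ;
  %let &mvar = 0 ;
  data _null_ ;
    if 0 then set &ds nobs=nobs ;
    call symputx("&mvar", nobs) ;
    stop ;
  run ;
%mend nobs ;
%macro xtrans(in,out) ;
  %sort(&in, upc chrtyp) ;
  /* split by single and multiple values */
  data sv mv ;
    set &in ;
    by upc chrtyp ;
    if first.chrtyp and last.chrtyp
      then output sv ;
      else output mv ;
  run ;
  /* if there are multiple values: expand, merge with the single values, transpose */
  %nobs(mv, nobs) ;
  %if &nobs %then %do ;
    /* Expansion step reconstructed from the description in the text; the
       original listing is incomplete here. At most 25 TYPEs per UPC, and
       (an added assumption) at most 50 values per TYPE. */
    data
      tmp_xtrans (keep=upc pi_index output_type output_value sortedby=upc pi_index)
      sv_xid1 (keep=upc pi_index sortedby=upc pi_index)
      ;
      set mv ;
      by upc chrtyp ;
      length output_type $ 32 output_value $ 200 ;
      array buf_field_names {25} $ 32 _temporary_ ;
      array buf_field_counts {25} _temporary_ ;
      array buf_field_values {25, 50} $ 200 _temporary_ ;
      retain num_fields 0 ;
      /* buffer the TYPEs and values of the current UPC */
      if first.upc then num_fields = 0 ;
      if first.chrtyp then do ;
        num_fields = num_fields + 1 ;
        buf_field_names{num_fields} = chrtyp ;
        buf_field_counts{num_fields} = 0 ;
      end ;
      buf_field_counts{num_fields} = buf_field_counts{num_fields} + 1 ;
      buf_field_values{num_fields, buf_field_counts{num_fields}} = chrvl ;
      if last.upc then do ;
        /* number of combinations = product of the value counts */
        ncomb = 1 ;
        do f = 1 to num_fields ;
          ncomb = ncomb * buf_field_counts{f} ;
        end ;
        /* decode each index into one value per TYPE; last TYPE varies fastest */
        do pi_index = 1 to ncomb ;
          rem = pi_index - 1 ;
          do f = num_fields to 1 by -1 ;
            k = mod(rem, buf_field_counts{f}) + 1 ;
            rem = floor(rem / buf_field_counts{f}) ;
            output_type = buf_field_names{f} ;
            output_value = buf_field_values{f, k} ;
            output tmp_xtrans ;
          end ;
          output sv_xid1 ;
        end ;
      end ;
      /* done with this UPC */
    run ;
    /* repeat each single value for every combination index of its UPC */
    proc sql ;
      create table sv2 as
      select a.*, b.pi_index
      from sv as a left join sv_xid1 as b
        on a.upc = b.upc
      order by a.upc, b.pi_index ;
    quit ;
    data mv_t ;
      length chrtyp $ 32 chrvl $ 200 ;
      set sv2 tmp_xtrans(rename=(output_type=chrtyp output_value=chrvl)) ;
      by upc pi_index ;
    run ;
    /* one output row per UPC and combination index */
    proc transpose data=mv_t out=&out.(drop=_: pi_index) ;
      by upc pi_index ;
      id chrtyp ;
      var chrvl ;
    run ;
  %end ;
  %else %do ;
    /* every UPC and TYPE combination has a single value */
    proc transpose data=sv out=&out.(bufno=4 drop=_: ) ;
      by upc ;
      id chrtyp ;
      var chrvl ;
    run ;
  %end ;
%mend xtrans ;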
data AA;
infile datalines dlm = ',';
input upc chrtyp $ chrvl $;
datalines;
001, A, 1
001, B, 2
001, B, 3
002, B, 4
002, C, 5
002, C, 6
003, A, 8
003, A, 9
003, B, 1
003, B, 4
003, B, 8
003, C, 9
003, C, 2
003, D, 1
;
run;
/* Call Macro */
%xtrans(AA,BB);
NUM_FIELDS contains the number of unique TYPEs for a UPC in data set MV. The
BUF_FIELD_COUNTS array contains the number of occurrences of each TYPE for a UPC, the
BUF_FIELD_NAMES array contains the names of all unique TYPEs for a UPC, and the
BUF_FIELD_VALUES array contains all values of each TYPE for a UPC. In the above example
it is assumed that the maximum number of TYPEs is 25.
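The text does not spell out how each combination receives its index. One common way, assumed here and not necessarily what the original listing did, is mixed-radix counting: index i runs from 1 to the product of the value counts, and repeated MOD and division turn it into one value position per TYPE, with the last TYPE varying fastest. The hypothetical step below applies this to the multiple-valued TYPEs of UPC 3 (A: 8, 9; B: 1, 4, 8; C: 9, 2), using arrays named as in the text; the data set COMBOS and the array OUT_VALS are illustrative names.

```sas
data combos (keep=pi_index A B C) ;
  /* buffered TYPEs of UPC 3: value counts per TYPE */
  array buf_field_counts {3} _temporary_ (2 3 2) ;
  /* values per TYPE, one row each for A, B and C (filled row by row) */
  array buf_field_values {3, 3} $ 8 _temporary_
        ('8' '9' ' '  '1' '4' '8'  '9' '2' ' ') ;
  array out_vals {3} $ 8 A B C ;
  num_fields = 3 ;
  /* number of combinations = product of the value counts */
  ncomb = 1 ;
  do f = 1 to num_fields ;
    ncomb = ncomb * buf_field_counts{f} ;
  end ;
  do pi_index = 1 to ncomb ;
    rem = pi_index - 1 ;
    do f = num_fields to 1 by -1 ;   /* last TYPE varies fastest */
      k = mod(rem, buf_field_counts{f}) + 1 ;
      rem = floor(rem / buf_field_counts{f}) ;
      out_vals{f} = buf_field_values{f, k} ;
    end ;
    output ;
  end ;
  stop ;
run ;
```

COMBOS holds 12 rows in the same order as the UPC 3 rows of the desired output (without the single-valued TYPE D), from A=8 B=1 C=9 to A=9 B=8 C=2.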
CONCLUSION
This is a powerful programming technique that can be used to transpose a data set in
which the VAR variable has multiple occurrences for the same ID/BY values. The code
provides a basic programming structure for such transposes and, with a little
modification, can be adapted to produce the desired output.
CONTACT INFORMATION
Your comments and questions are valued and encouraged. Contact the author at: