Proc Report Basics

PROC REPORT is a powerful procedure that allows a programmer to do lists, subsets, statistics, and computations all within one procedure. It has 3 basic steps and 4 optional steps. Its syntax is as follows:

Step 1 - Call Procedure
   PROC REPORT DATA=dataset options ;
Option 1 - Subset Data
   WHERE condition ;
Option 2 - BY Statement
   BY variable1 variable2 ;
   title1 "State = #byval1" ;
Step 2 - Assign Report Variables
   COLUMN variable1 variable2 ... ;
Step 3 - Define the Report Variables
   DEFINE variable / type FORMAT=format 'label' WIDTH=numeric options ;
Option 3 - Defining Computed Variables
   COMPUTE variable ;
      variable = expression ;
   ENDCOMP ;
Option 4 - Adding Summary Lines
   BREAK AFTER variable / SUMMARIZE options ;
   RBREAK AFTER / SUMMARIZE options ;
RUN ;

Let's now take a closer look at each step.

Step 1 - Call Procedure

As with nearly all PROC statements, the first part of PROC REPORT identifies the dataset that SAS will be using. This is done with the DATA=dataset option. Also like most PROCs, a list of options can follow (see Table A). I will touch upon the most common.

NOWD runs the report without opening the interactive REPORT window. Next come HEADLINE and HEADSKIP. These two options do what their names suggest: HEADLINE draws a line below the column headers, while HEADSKIP skips one line between the column headers and the first row of data.

There are four other common options. One is SPLIT='character'. This option wraps the column header (and, for columns defined with the FLOW option, the column contents) at the character specified, so be sure to use an uncommon character such as ~ or /. Another option is SPACING=numeric, which specifies the number of spaces between columns. The other two options are for page formatting: PS=numeric for page size and LS=numeric for line size. If one has already set PS and LS in a global OPTIONS statement, then one does not have to add them to the procedure call.

Putting these options together, the procedure call statement reads:

proc report data=dataset nowd headline headskip split='~'
            spacing=2 ls=142 ps=53;

The statement above will be the same for almost every PROC REPORT one creates.

Option 1 - Subset Data

Similar to other PROCs, PROC REPORT allows one to subset the dataset by using any valid WHERE clause. For example, if we only wanted a subset of New York providers, the statement would be:

where state = 'NY' ;

A WHERE clause can test several variables by using AND or OR between the conditions:

where state = 'NY' and provider = 'HOSP1' ;

Step 2 - Assign Report Variables

The next step is to identify the columns of the report. First, write the term COLUMN, followed by an ordered list of the variables that will make up your columns. The order of the list must match the order in which the columns will appear in the report. There are three additional elements of note.

First, to add a frequency count (the number of observations that make up each row) to the report, add an N to the COLUMN statement. For example, if we wanted to include the frequency of CPTCODE for each PROVIDER, our COLUMN statement would read:

column provider cptcode n ;

Second, to compute more than one statistic on the same variable, create a new column variable (an alias) and tell SAS that it is equal to the original variable. For example, if we wanted the mean of PAYAMT as well as the sum, then we create a new column variable such as AVGPAY and write PAYAMT=AVGPAY. Our COLUMN statement would look like this:

column provider cptcode n payamt payamt=avgpay ;

Third, if we are adding a computed column (a variable that is not on the dataset and has to be calculated), then we need to include the computed variable's name in the list where it will be situated. For example, to include a column called NEWRATE after PAYAMT that is 90% of PAYAMT, our COLUMN statement would read:

column provider cptcode n payamt newrate payamt=avgpay ;
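The pieces covered so far can be tried end to end. The following is a minimal sketch, not part of the original program. Its assumptions:

- The dataset WORK.SAMPLE and its values are invented.
- The OUT= option (see Table A) is added only so the result can be checked.
- The DEFINE statements, explained in Step 3 below, are kept to a minimum.

```sas
/* Hypothetical sample claims; every value here is invented for illustration */
data work.sample;
   input state $ provider $ cptcode $ payamt;
   datalines;
NY HOSP1 RMBRD 1200
NY HOSP1 RMBRD 800
NY HOSP1 RMOBS 1500
NY HOSP2 RMBRD 900
NJ HOSP3 RMPED 700
;
run;

/* Step 1: call the procedure; OUT= also saves the report rows to a dataset */
proc report data=work.sample nowd headline headskip split='~' spacing=2
            out=work.step2_out;
   /* Option 1: keep only the New York claims */
   where state = 'NY';
   /* Step 2: CPTCODE within PROVIDER, a frequency count (N), the PAYAMT  */
   /* sum, and the alias AVGPAY so PAYAMT can carry a second statistic   */
   column provider cptcode n payamt payamt=avgpay;
   /* Minimal Step 3: usage and statistic for each column */
   define provider / group 'Provider';
   define cptcode  / group 'CPT~Code';
   define n        / 'Claims';
   define payamt   / analysis sum  format=dollar10. 'Paid~Amount';
   define avgpay   / analysis mean format=dollar10. 'Avg~Paid';
run;
```

With these rows, the WHERE clause keeps the four New York claims. The report should then show three rows:

- HOSP1/RMBRD: 2 claims, $2,000 paid, $1,000 average.
- HOSP1/RMOBS: 1 claim, $1,500.
- HOSP2/RMBRD: 1 claim, $900.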
Option 2 - BY Statement

The BY statement can be used if one wants to separate the report into page sections by a variable or group of variables. The dataset, however, must be sorted by the BY variable(s) before the program invokes PROC REPORT. The statement is simply:

by variable1 variable2 ;

Step 3 - Define the Report Variables (type, format, 'label', width)

The next step is what makes PROC REPORT seem long and complicated. In fact, it performs the same actions one would perform in Excel, Access, PowerPoint, or any other report-writing tool: with all tools, one needs to define one's report variables. In PROC REPORT, one does this by writing a DEFINE statement for each column (variable) that will be in the report.

The syntax for defining the report variables:

DEFINE variable / type FORMAT=format 'label' WIDTH=numeric options ;

There are six "types" for defining a variable: GROUP, ORDER, DISPLAY, ANALYSIS, COMPUTED, and ACROSS (see Table B). I will discuss the most commonly used types.

GROUP is used to specify the class variables. As with a GROUP BY clause in SQL, the report will be summarized by this variable. Each class, or summary, variable will have this define type.

ORDER sorts the data by this variable, much like the ORDER BY clause in SQL, but every observation is still listed rather than summarized. Since one can also control the sort order with a DEFINE option, I personally use the type GROUP instead of ORDER.

DISPLAY lists the values as they appear in the data. This is the default type for character variables; numeric variables default to ANALYSIS with the SUM statistic.

ANALYSIS statistic tells SAS to perform the specified statistical function on the variable. A list of available statistics can be found in Table C.

COMPUTED informs SAS that the variable does not appear on the dataset and will need to be calculated. This calculation occurs later in the COMPUTE block.

After the define type, one states the format, if any, for SAS to use for the variable by writing FORMAT=format. Next comes the header, or label, of the column in quotes. The last term used is WIDTH=numeric, which specifies the width of the column. A line for a GROUP variable would read:

define provider / group format=$hosp. 'Hospital' width=15 ;

An example of how one would perform a statistical function on a variable:

define payamt / analysis mean format=dollar8. 'AVG~PAYAMT' width=10 ;

For additional options that one can add to the DEFINE statement, please see Table D.

Option 3 - Defining Computed Variables

The ability to compute new variables without having to add them to the dataset is one of PROC REPORT's strongest attributes. With COMPUTE blocks, one can create a new column, calculate percentages within groups, perform arithmetic on or between columns, or even print text for conditional outcomes.

If one is adding a computed variable, then one calculates the computed variable in a COMPUTE block after the last DEFINE statement:

COMPUTE variable ;
   variable = expression ;
ENDCOMP ;

If the computed variable is based on a calculation of an ANALYSIS type variable, then one needs to refer to that variable in the COMPUTE block as variable.statistic. For example, if we want to create a new column called NEWRATE that is 90% of PAYAMT, then the DEFINE statements would be:

define payamt / analysis sum format=dollar8. 'PAID~AMOUNT' width=10 ;
define newrate / computed format=dollar8. 'NEW RATE' width=10 ;

while the COMPUTE block would read:

compute newrate ;
   newrate = payamt.sum * 0.9 ;
endcomp ;

If the computed variable is based on N (the frequency), then use N by itself, without any statistic extension (for example, perctn = n/totn). For examples of how to calculate percentages within groups and for entire reports, please see the section "Two Useful COMPUTE Statements".
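Option 2, Step 3, and Option 3 can be combined as in the following sketch. It is an illustration under stated assumptions:

- WORK.SAMPLE2 and its values are invented.
- PERCLAIM (paid amount per claim) is a hypothetical computed column. It is added only to show N referenced without a statistic extension.
- NOBYLINE is set so that the #BYVAL1 title, rather than SAS's default BY line, labels each state's section.

```sas
/* Hypothetical sample claims; values invented for illustration */
data work.sample2;
   input state $ provider $ payamt;
   datalines;
NJ HOSP3 700
NJ HOSP3 500
NY HOSP1 1200
NY HOSP1 800
NY HOSP2 900
;
run;

/* Option 2: the data must be sorted by the BY variable first */
proc sort data=work.sample2;
   by state;
run;

options nobyline;                    /* the title carries the BY value instead */
title1 "State = #byval1";

proc report data=work.sample2 nowd headline headskip split='~' spacing=2
            out=work.step3_out;
   by state;
   column provider n payamt newrate perclaim;
   define provider / group 'Provider' width=10;
   define n        / format=comma6. 'Claims' width=6;
   define payamt   / analysis sum format=dollar8. 'Paid~Amount' width=10;
   define newrate  / computed format=dollar8. 'New~Rate' width=10;
   define perclaim / computed format=dollar8. 'Paid per~Claim' width=10;

   /* ANALYSIS variable: referenced as variable.statistic */
   compute newrate;
      newrate = payamt.sum * 0.9;
   endcomp;

   /* N is referenced on its own, with no statistic extension */
   compute perclaim;
      perclaim = payamt.sum / n;
   endcomp;
run;

options byline;                      /* restore the defaults */
title1;
```

The report should print an NJ section followed by an NY section:

- NJ, HOSP3: 2 claims, $1,200 paid, $1,080 new rate, $600 per claim.
- NY, HOSP1: 2 claims, $2,000 paid, $1,800 new rate, $1,000 per claim.
- NY, HOSP2: 1 claim, $900 paid, $810 new rate, $900 per claim.

Both COMPUTE blocks refer only to columns to their left in the COLUMN statement. PROC REPORT builds each row from left to right, so columns further right have no value yet.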
Option 4 - Adding Summary Lines

Another good tool of PROC REPORT is the ability to insert summary lines between groups, as well as a summary line for the entire report. Two statements perform these functions: BREAK AFTER and RBREAK.

BREAK AFTER variable tells SAS to create a break after the last row of each unique value of the variable. To create a summary line, add a slash, then the term SUMMARIZE:

BREAK AFTER variable / SUMMARIZE options ;

There are other options that one may add after the term SUMMARIZE (see Table E). The most common are OL, UL, DOL, DUL, and SKIP:

- OL places an OverLine above the summary numbers.
- UL places an UnderLine below the summary numbers.
- DOL places a Double OverLine above the summary numbers.
- DUL places a Double UnderLine below the summary numbers.
- SKIP writes a blank line between the summary numbers and the next row.

One can summarize within every GROUP variable by adding a separate BREAK AFTER statement for each variable. For example, if we were summarizing by STATE and by PROVIDER, then our statements would read:

break after provider / summarize ol ul skip ;
break after state / summarize ol ul skip ;

To add a total summary line for the report, use RBREAK AFTER with no variable specified. SUMMARIZE and the same options as for BREAK AFTER apply. For example:

rbreak after / summarize dol dul ;

Two Useful COMPUTE Statements

Percent within a Group

One can compute a new column variable that is the percentage of observations for a variable within a group. The following creates a new column variable PERCTN as the percentage of each PROVIDER's services that fall in each CPTCODE.

First, we call the procedure, assign the report variables, and define the report variables:

proc report data=dataset nowd headline headskip
            split='~' spacing=2 ls=152 ps=53 ;
column provider cptcode n payamt perctn ;
define provider / group format=$hosp. 'Hospital' width=15 ;
define cptcode / group 'CPT~Code' width=5 ;
define n / format=comma6. 'Admits' width=8 ;
define payamt / analysis sum format=dollar8. 'Paid~Amount' width=10 ;
define perctn / computed format=percent8.1 '% of~Admits' ;

Second, we need to compute the total number of observations for each PROVIDER and retain the result as TOTN:

compute before provider ;
   totn = n ;
endcomp ;

Then, compute PERCTN as the number of observations (frequency) for each CPTCODE (N) divided by the provider's total number of observations (TOTN):

compute perctn ;
   perctn = n/totn ;
endcomp ;

Percent for Entire Report

One can also create a new column variable that is the percentage of observations for a variable for the entire report. Using our example, we will create the new variable PERCTALL that is the percentage of services for each CPTCODE across all PROVIDERs.

First, we call the procedure, assign the report variables, and define the report variables:

proc report data=dataset nowd headline headskip
            split='~' spacing=2 ls=152 ps=53 ;
column provider cptcode n payamt perctall ;
define provider / group format=$hosp. 'Hospital' width=15 ;
define cptcode / group 'CPT~Code' width=5 ;
define n / format=comma6. 'Admits' width=8 ;
define payamt / analysis sum format=dollar8. 'Paid~Amount' width=10 ;
define perctall / computed format=percent8.1 '% of~Admits' ;

Second, we need to calculate the total number of observations for the whole report and retain the result as TOTALL:

compute before ;
   totall = n ;
endcomp ;
Then, compute PERCTALL as the number of observations for each CPTCODE (N) divided by the total number of observations (TOTALL):

compute perctall ;
   perctall = n/totall ;
endcomp ;

Creating a Medical Utilization Report

Using the information provided above, one can construct the medical utilization report found in Example 1. First, one needs to create a test dataset of claims.

Generic Data Builder

The following program will create a generic dataset of claims data consisting of PROVIDER (stored as HOSP), CPTCODE (procedure code), LOS (length of stay), and PAYAMT. This method can be used to create any kind of dataset, using PROC FORMAT to convert random numbers into data values.

/* Establishing maximum and minimum values */
%let minhosp=1;     *Minimum hospital range;
%let maxhosp=4;     *Maximum hospital range;
%let mincpt=1;      *Minimum CPT-code range;
%let maxcpt=4;      *Maximum CPT-code range;
%let payrate=1200;  *Pay rate amount;
%let minlos=1;      *Minimum length of stay;
%let maxlos=10;     *Maximum length of stay;
%let numclms=3000;  *Number of claims;

/* Creating formats for data values */
proc format ;
   value hosp 1='HOSP1'
              2='HOSP2'
              3='HOSP3'
              4='HOSP4';
   value cpt  1='RMBRD'
              2='RMOBS'
              3='RMPED'
              4='RMPSY';
run;

/* Numbers are selected at random within set minimum and maximum
   ranges. These numbers are then turned into data values according
   to the created formats. */
data claims (drop=i);
   do i=1 to &numclms;
      hosp=put(ceil((&maxhosp-&minhosp+1)*ranuni(0) + &minhosp-1), hosp.);
      cptcode=put(ceil((&maxcpt-&mincpt+1)*rantri(0,.001) + &mincpt-1), cpt.);
      los=ceil((&maxlos-&minlos+1)*rantri(0,.0001) + &minlos-1);
      payamt=los*ranuni(0)*&payrate;
      output;
   end;
   format payamt 15.2;
run;

*Note: The RANTRI function gives a random number from a triangular distribution. I used this function to create "real" test data by skewing the data distribution toward shorter lengths of stay and more RMBRDs (regular room and board) and RMOBSs (obstetrics room and board).

Now that we have our dataset, we can use what we have learned and create the following report:

**Call Procedure**;
proc report data=claims nowd headline headskip split='~' spacing=2
            ls=142 ps=53;

**Assign Report Variables**;
column cptcode hosp n payamt los los=alos vlos los=mlos apay
       payamt=mpay pcta;

**Define Report Variables**;
define cptcode / group "Cpt~Code" width=5;
define hosp / group "Hospital" width=10;
define n / format=comma6. "Admits" width=6;
define payamt / analysis sum format=dollar12. "Paid~Amount" width=11;
define los / analysis sum format=comma6. "Total~LOS" width=6;
define alos / analysis mean format=comma6.2 "ALOS" width=6;
define vlos / computed format=comma6.2 "Diff from~ALOS" width=9;
define mlos / analysis max format=comma6. "Max~LOS" width=6;
define apay / computed format=dollar8. "Avg Paid~per Day" width=8;
define mpay / analysis mean format=dollar8. "Avg Paid~Per Admit" width=10;
define pcta / computed format=percent8.1 "% of~Total Admits";

**Define Computed Variables**;
compute before;
   admits=n;                  /* report-level total number of admits */
endcomp;

compute before cptcode;
   proclos=los.sum/n;         /* average LOS for the current CPT code */
endcomp;

compute vlos;
   vlos=_c6_-proclos;         /* _C6_ is the 6th column, ALOS */
endcomp;

compute apay;
   apay=payamt.sum/los.sum;
endcomp;

compute pcta;
   pcta=n/admits;
endcomp;

**Add Summary Lines**;
break after cptcode / summarize dol dul suppress skip;
run;

Your output will be similar to Example 1. Your numbers will be different because the test data are generated at random.

PROC REPORT Option Tables

Table A - Options when Calling PROC REPORT

Usage         Purpose
HEADLINE      Places a line under the column names.
HEADSKIP      Skips a line between the column names and the first row.
SPLIT='c'     Wraps the column name after the specified character.
NOWD          No windows option: runs without the interactive REPORT window.
LS=           Sets the line size.
PS=           Sets the page size.
SPACING=      Specifies the spacing between columns.
NOHEADER      Does not print column names.
MISSING       Treats missing values as valid values of group, order, and
              across variables instead of dropping them.
OUTREPT=      Stores the report definition in a catalog entry.
REPORT=       Calls a specific stored report definition.
OUT=          Places the output in the specified dataset.
COLWIDTH=     Specifies the default column width for numeric and computed
              variables.
LIST          Writes the report definition to the log.
NAMED         Writes "name=" before each value.
WRAP          Displays one value from each column, on consecutive lines if
              necessary, before printing the next value of the first column.
PANELS=       Specifies the number of panels on a page.
PSPACE=       Specifies the space between panels.
VARDEF=       Specifies the divisor to use when calculating variances
              (N, DF, WDF, WEIGHT).
SHOWALL       Overrides the NOPRINT and NOZERO define options.
BOX           Puts boxes around all columns and rows.

Table B - Types for Defining Variables

Table D - Additional Options When Defining a Column
Example 1

                                                                                                     % OF
CPT                               PAID   TOTAL          DIFF FROM     MAX  AVG PAID    AVG PAID     TOTAL
CODE   HOSPITAL    ADMITS       AMOUNT     LOS    ALOS       ALOS     LOS   PER DAY   PER ADMIT    ADMITS
---------------------------------------------------------------------------------------------------------
RMBRD  HOSP1          287     $596,842   1,010    3.52      -0.25       9      $591      $2,080     10.0%
       HOSP2          248     $590,667     951    3.83       0.07      10      $621      $2,382      8.7%
       HOSP3          233     $510,718     932    4.00       0.23      10      $548      $2,192      8.1%
       HOSP4          252     $567,657     952    3.78       0.01      10      $596      $2,253      8.8%
                   ======  ===========  ======  ======  =========  ======  ========  ==========  ========
                    1,020   $2,265,882   3,845    3.77       0.00      10      $589      $2,221     35.6%
                   ======  ===========  ======  ======  =========  ======  ========  ==========  ========
RMOBS  HOSP1          224     $531,765     921    4.11       0.28      10      $577      $2,374      7.8%
       HOSP2          174     $403,207     641    3.68      -0.15      10      $629      $2,317      6.1%
       HOSP3          216     $552,363     818    3.79      -0.05      10      $675      $2,557      7.5%
       HOSP4          194     $438,121     718    3.70      -0.13      10      $610      $2,258      6.8%
                   ======  ===========  ======  ======  =========  ======  ========  ==========  ========
                      808   $1,925,456   3,098    3.83       0.00      10      $622      $2,383     28.2%
                   ======  ===========  ======  ======  =========  ======  ========  ==========  ========
RMPED  HOSP1          152     $310,656     532    3.50      -0.26      10      $584      $2,044      5.3%
       HOSP2          150     $339,310     552    3.68      -0.08       9      $615      $2,262      5.2%
       HOSP3          174     $415,001     669    3.84       0.08      10      $620      $2,385      6.1%
       HOSP4          150     $367,301     601    4.01       0.25      10      $611      $2,449      5.2%
                   ======  ===========  ======  ======  =========  ======  ========  ==========  ========
                      626   $1,432,269   2,354    3.76       0.00      10      $608      $2,288     21.9%
                   ======  ===========  ======  ======  =========  ======  ========  ==========  ========
RMPSY  HOSP1          123     $285,258     469    3.81      -0.09      10      $608      $2,319      4.3%
       HOSP2           86     $213,784     362    4.21       0.31      10      $591      $2,486      3.0%
       HOSP3          108     $265,003     406    3.76      -0.14       9      $653      $2,454      3.8%
       HOSP4           94     $237,527     366    3.89      -0.01      10      $649      $2,527      3.3%
                   ======  ===========  ======  ======  =========  ======  ========  ==========  ========
                      411   $1,001,573   1,603    3.90       0.00      10      $625      $2,437     14.3%
                   ======  ===========  ======  ======  =========  ======  ========  ==========  ========
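The % of Total Admits column in Example 1 divides each row's N by the report-wide total saved in COMPUTE BEFORE. One way to check that logic is the OUT= option from Table A. It writes the report rows to a dataset, along with a _BREAK_ variable that is blank for detail rows.

The sketch below is illustrative only. WORK.MINI is a hypothetical stand-in for the randomly generated CLAIMS data, and WORK.UTIL_OUT is a hypothetical output name.

```sas
/* Hypothetical mini claims file standing in for the generated CLAIMS data */
data work.mini;
   input hosp $ cptcode $;
   datalines;
HOSP1 RMBRD
HOSP1 RMBRD
HOSP2 RMBRD
HOSP1 RMOBS
;
run;

/* Same PCTA logic as the Example 1 program, with OUT= capturing the rows */
proc report data=work.mini nowd split='~' out=work.util_out;
   column cptcode hosp n pcta;
   define cptcode / group 'Cpt~Code';
   define hosp    / group 'Hospital';
   define n       / 'Admits';
   define pcta    / computed format=percent8.1 '% of~Total~Admits';
   compute before;            /* report level: N is the grand total here */
      admits = n;
   endcomp;
   compute pcta;
      pcta = n / admits;
   endcomp;
   break after cptcode / summarize;
run;

/* Detail rows have a blank _BREAK_; their percentages should sum to 1 */
proc means data=work.util_out sum;
   where _break_ = ' ';
   var n pcta;
run;
```

For the four invented claims, the detail rows are:

- RMBRD/HOSP1: 50.0%
- RMBRD/HOSP2: 25.0%
- RMOBS/HOSP1: 25.0%

PROC MEANS should therefore report sums of 4 admits and 1 (100%).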