Proc Report Basics

This document provides a concise introduction to the basics of PROC REPORT in SAS for generating reports. It outlines the basic syntax of PROC REPORT in 3 steps and 4 optional steps. The steps include calling the procedure, assigning report variables, and defining the report variables. It also describes how to subset data, use BY groups, define computed variables, and add summary lines. The goal is to teach the essential components of PROC REPORT in a short paper for both beginners and advanced users to produce reports without extensive pre-processing or post-processing of data.


SO YOU WANT TO LEARN PROC REPORT

Chris Moriak, Boehringer Ingelheim, Ridgefield, CT

ABSTRACT

Producing attractive reports is essential when working
with either internal or external clients. By now, you have
probably heard about PROC REPORT in the non-windowing
environment, but you may have never used it.
If you want to learn PROC REPORT but do not have
either the time or the money to take a SAS course or read
the extensive manual, then this paper is for you. I have
condensed the basic components and major options of
PROC REPORT into this short paper. This paper is for
both beginners, with its step-by-step instructions, and
advanced users, with its reference section containing
nearly every available option. The paper also contains
two methods for easily calculating percentages within
groups and for the entire report, and a useful medical
utilization report.

INTRODUCTION

What is fantastic about PROC REPORT is that by
learning the basics, one can create seemingly
complicated reports. This paper introduces the basics of
displaying data, using PROC REPORT to generate
statistics, creating new columns, and adding summary
lines. When finished reading this paper, one will see that
by knowing the basic concepts that are outlined in this
paper, one can produce Example 1 with no pre- or
post-processing of the data, using just one PROC REPORT.

BASIC REPORT SYNTAX

PROC REPORT is a powerful procedure that allows a
programmer to do lists, subsets, statistics, and
computations all within one procedure. It has 3 basic
steps and 4 optional steps. Its syntax is as follows:

Step 1 - Call Procedure

    PROC REPORT DATA=dataset options ;

Option 1 - Subset Data

    WHERE variable = value ;

Option 2 - BY Statement

    BY variable1 variable2 ;

Step 2 - Assign Report Variables

    COLUMN variable1 variable2 variableN ;

Step 3 - Define the Report Variables

    DEFINE variable / type FORMAT= 'label'
           WIDTH= options ;

Option 3 - Define New Computed Variables

    COMPUTE variable ;
       variable = statements ;
    ENDCOMP ;

Option 4 - Add Summary Lines

    BREAK AFTER variable / options ;
    RBREAK AFTER / options ;

    RUN ;

Let's now take a closer look at each step.

Step 1 - Call Procedure

As with nearly all PROC statements, the first part of
PROC REPORT is to identify the dataset that SAS will be
using. This is done with the DATA=dataset option.
Also like most PROCs, a list of options can follow (See
Table A). I will touch upon the most common.

The first option is the windows/no windows option. This
option tells SAS whether to send the output to the
interactive Report window (similar to SAS/ASSIST), or
to send the output to the Output window. The default is
windows. Much of the time, one will want to avoid
sending output to an interactive window, especially when
the program is running in batch, or when the report is in
production or a validated status. Therefore, this paper
concentrates on the non-windowing environment. The
option for this is NOWD, short for NOWINDOWS.

Next come HEADLINE and HEADSKIP. These two
options do what their names suggest. HEADLINE
draws a line below the column headers, while
HEADSKIP skips one line between the column headers
and the first row of data.

There are four other common options. One is
SPLIT='character'. This option wraps the column
header and contents at the character specified, so be sure
to use an uncommon character such as ~ or /. Another
option is SPACING=numeric. This specifies the number
of spaces between columns. The other two options are
for page formatting: PS=numeric for page-size and
LS=numeric for line-size. If one has already set PS and
LS in a global OPTIONS statement, then one does not have
to add them to the call procedure line.

Putting these options together, the procedure call
statement reads:

    proc report data=dataset nowd headline headskip split='~'
         spacing=2 ls=142 ps=53;

The statement above will be the same for nearly every
PROC REPORT one creates.

Option 1 - Subset Data

Similar to other PROCs, PROC REPORT allows one to
subset the data set by using any valid WHERE clause.
For example, if we only wanted a subset of New York
providers, the statement would be:

    where state = 'NY' ;

This WHERE clause can be used on several variables by
using AND or OR between the clauses:

    where state = 'NY' and provider = 'HOSP1' ;

Option 2 - BY Statement

The BY statement can be used if one wants to separate
the report into page sections by a variable or group of
variables. The dataset, however, must be sorted by the
BY variable(s) before the program invokes PROC
REPORT. The statement is simply:

    by variable1 variable2 ;

One can print the value of a BY variable by using #byvaln
in the TITLE statement, where n is the position of the
variable in the BY statement:

    title1 "State = #byval1" ;

Step 2 - Assign Report Variables

The next step is to identify the columns of the report.
First, write the term COLUMN, followed by an ordered
list of variables that will comprise your columns. The
order of the list must match the order in which the
columns will appear in the report. There are three
additional elements of note.

First, if one wants to add the frequency (number of
non-missing values) of a variable in the report, one
accomplishes this by adding an N to the COLUMN
statement. For example, if we wanted to include the
frequency of CPTCODE for each PROVIDER, our
COLUMN statement would read:

    column provider cptcode n ;

Second, if one wants to perform multiple statistics on a
specific variable, then one needs to create a new column
variable (an alias) and tell SAS that it is equal to the
variable. For example, if we wanted the mean of PAYAMT
as well as the sum, then we need to create a new column
variable such as AVGPAY and tell SAS that
PAYAMT=AVGPAY. Our column statement would look like
this:

    column provider cptcode n payamt payamt=avgpay ;

Third, if we are adding a computed column (a variable
that is not on the dataset and has to be calculated), then
we need to include the computed variable name in the list
where it should appear. For example, if we were to include
a column called NEWRATE after PAYAMT that is 90% of
PAYAMT, then our COLUMN statement would read:

    column provider cptcode n payamt newrate
           payamt=avgpay;

Step 3 - Define the Report Variables (type, format,
'label' width)

The next step is what makes PROC REPORT seem long
and complicated. On the contrary, it is performing the
same actions as one would do in Excel, Access,
PowerPoint, or any other tool for report writing. With all
tools, one needs to define one's report variables. In
PROC REPORT, one does this by using the term
DEFINE for each column (variable) that will be in the
report.

The syntax for defining the report variables:

    DEFINE variable / type FORMAT= 'label'
           WIDTH= options ;

There are six "types" for defining a variable: GROUP,
ORDER, DISPLAY, ANALYSIS, COMPUTED, and
ACROSS (See Table B). I will discuss the most
commonly used types.

GROUP is used to specify the class variables. As with
GROUP BY in SQL, the report will be summarized by this
variable. Each class, or summary, variable will have this
define type.

ORDER sorts the data by this variable, much like the
ORDER BY clause in SQL. Since one can also control the
sort order by a define option, I personally use the type
GROUP instead of ORDER.

DISPLAY lists the values as they appear in the data. This
is the default type for character variables; numeric
variables default to ANALYSIS with the SUM statistic.

ANALYSIS statistic tells SAS to perform the specified
statistical function on the variable. A list of available
statistics can be found in Table C.

COMPUTED informs SAS that the variable does not
appear on the dataset and will need to be calculated. This
calculation occurs later in the COMPUTE block.

After the define type, one states the format, if any, for
SAS to use for the variable by writing FORMAT=format.
Next comes the header, or label, of the column in quotes.
The last term used is WIDTH=numeric, which specifies
the width of the column. An example line would read:

    define cptcode / group format=$code. "CPT~CODE"
           width=30 ;

Notice the tilde between "CPT" and "CODE". This is
our SPLIT character, and PROC REPORT will place
"CPT" and "CODE" on separate lines.

An example of how one would perform a statistical
function on a variable:

    define payamt / analysis mean format=dollar8.
           "AVG~PAYAMT" width=10 ;

For additional options that one can add to the DEFINE
statement, please see Table D.

Option 3 - Defining Computed Variables

The ability to compute new variables without having to
add them to the dataset is one of PROC REPORT's
strongest attributes. With COMPUTE blocks, one can
create a new column, calculate percentages within groups,
perform arithmetic on or between columns, or even print
text for conditional outcomes.

If one is adding a computed variable, then one calculates
the computed variable in a COMPUTE block after the
last DEFINE statement:

    COMPUTE variable ;
       variable = statement ;
    ENDCOMP ;

If the computed variable is based on a calculation of an
ANALYSIS type variable, then one needs to refer to that
variable in the COMPUTE block as variable.statistic.
For example, if we want to create a new column called
NEWRATE that is 90% of PAYAMT, then the define
statements would be:

    define payamt / analysis sum format=dollar8.
           "PAID~AMOUNT" width=10 ;
    define newrate / computed format=dollar8.
           "NEW RATE" width=10 ;

While the COMPUTE block would read:

    compute newrate ;
       newrate = payamt.sum * 0.9 ;
    endcomp;

If the computed variable is based on N (the frequency),
then use N without any statistic extension:

    compute newvar ;
       newvar = N/100 ;
    endcomp;

One can even refer to another column by the keyword
_cn_, where n is the column number.

For examples on how to calculate percentages within
groups and for entire reports, please see the section
"Two Useful COMPUTE Statements".
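As a hedged illustration of Steps 1 through 3 together with Options 1
through 3, the following sketch strings the statements into one runnable
program. The dataset WORK.CLAIMS_DEMO, its values, and the column labels
are hypothetical and exist only for this example. The NOBYLINE system
option is an addition of mine: it suppresses the default BY line so that
#byval1 in the title is the only place the BY value prints.

```sas
/* Hypothetical test data: names and values are illustrative only */
data work.claims_demo;
    input state $ provider $ cptcode $ payamt;
    datalines;
NY HOSP1 RMBRD 1200
NY HOSP1 RMBRD 1800
NY HOSP1 RMOBS 2400
NY HOSP2 RMBRD 900
CT HOSP3 RMPED 1500
;
run;

/* The BY statement requires input sorted by the BY variable */
proc sort data=work.claims_demo;
    by state;
run;

options nobyline;            /* assumption: let the title carry the BY value */
title1 "State = #byval1";

/* Step 1: call the procedure in the non-windowing environment */
proc report data=work.claims_demo nowd headline headskip
            split='~' spacing=2;
    where state = 'NY';      /* Option 1: subset the data            */
    by state;                /* Option 2: one report section per state */

    /* Step 2: N, a computed column, and an alias of PAYAMT */
    column provider cptcode n payamt newrate payamt=avgpay;

    /* Step 3: one DEFINE per column */
    define provider / group 'Provider' width=10;
    define cptcode  / group 'CPT~Code' width=8;
    define n        / format=comma6. 'Claims' width=6;
    define payamt   / analysis sum format=dollar8. 'Paid~Amount' width=10;
    define newrate  / computed format=dollar8. 'New~Rate' width=10;
    define avgpay   / analysis mean format=dollar8. 'Avg~Paid' width=10;

    /* Option 3: NEWRATE is 90% of the summed PAYAMT on each row */
    compute newrate;
        newrate = payamt.sum * 0.9;
    endcomp;
run;

/* Restore the settings changed for this example */
title1;
options byline;
```

Because NEWRATE sits to the right of PAYAMT in the COLUMN statement,
payamt.sum is already available when the COMPUTE block runs for each row.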
Option 4 - Adding Summary Lines

Another good tool of PROC REPORT is the ability to
insert summary lines between groups, as well as insert a
summary line for the entire report. Two commands
perform these functions: BREAK AFTER and RBREAK.
BREAK AFTER variable tells SAS to create a break
after the last row of each unique value of the variable. To
create a summary line, add a slash, then the term
SUMMARIZE:

    BREAK AFTER variable / SUMMARIZE options ;

There are other options that one may add after the term
SUMMARIZE (See Table E). The most common are OL,
UL, DOL, DUL, and SKIP. The option OL will place an
OverLine above the summary numbers, while UL will
place an UnderLine below the summary numbers.
Likewise, DOL will place a Double OverLine above the
summary numbers, while DUL will place a Double
UnderLine below the summary numbers. The term SKIP
will skip a line between the summary numbers and the
next row.

One can summarize within all GROUP types by adding a
separate BREAK AFTER command for each variable.
For example, if we were summarizing by STATE and by
PROVIDER, then our statements would read:

    break after provider / summarize ol ul skip ;
    break after state / summarize ol ul skip ;

To add a total summary line for the report, the term
RBREAK AFTER is used with no variable specified. The
term SUMMARIZE and the same options as for BREAK
AFTER can apply. For example:

    rbreak after / summarize dol dul ;

Two Useful COMPUTE Statements

Percent within a Group

One can compute a new column variable that is the
percentage of observations for a variable within a group.
The following creates a new column variable PERCTN as
the percentage of services for each PROVIDER.

First, we call the procedure, assign the report variables,
and define the report variables:

    proc report data=dataset nowd headline headskip
         split='~' spacing=2 ls=152 ps=53 ;
    column provider n payamt perctn ;
    define provider / group format=$hosp. 'Hospital'
           width=15 ;
    define n / format=comma6. 'Admits' width=8 ;
    define payamt / analysis sum format=dollar8.
           'Paid~Amount' width=10 ;
    define perctn / computed format=percent8.1
           '% of~Admits' ;

Second, we need to compute the total number of
observations for each PROVIDER and retain the result as
TOTN:

    compute before provider ;
       totn = n ;
    endcomp;

Then, compute PERCTN as the number of observations
(frequency) for each CPTCODE (N) divided by the
total number of observations (TOTN):

    compute perctn ;
       perctn = n/totn ;
    endcomp ;

Percent for Entire Report

One can also create a new column variable that is the
percentage of observations for a variable for the entire
report. Using our example, we will create the new
variable PERCTALL that is the percentage of services
for each CPTCODE for all PROVIDERs.

First, we call the procedure, assign the report variables,
and define the report variables:

    proc report data=dataset nowd headline headskip
         split='~' spacing=2 ls=152 ps=53 ;
    column provider n payamt perctall ;
    define provider / group format=$hosp. 'Hospital'
           width=15 ;
    define n / format=comma6. 'Admits' width=8 ;
    define payamt / analysis sum format=dollar8.
           'Paid~Amount' width=10 ;
    define perctall / computed format=percent8.1
           '% of~Admits' ;

Second, we need to calculate the total number of
observations and retain the result as TOTALL.

    compute before ;
       totall = n ;
    endcomp;
Then, compute PERCTALL as the number of
observations for each CPTCODE (N) divided by the
total number of observations (TOTALL).

    compute perctall ;
       perctall = n/totall ;
    endcomp ;

Creating a Medical Utilization Report

Using the information provided above, one can
construct the medical utilization report found in Example
1. First, one needs to create a test dataset of claims.

Generic Data Builder

The following program will create a generic dataset of
claims data consisting of HOSP (the provider), CPTCODE
(procedure code), LOS (length of stay), and PAYAMT.
This method can be used to create any kind of dataset,
using PROC FORMAT to convert random numbers into
data values.

    /* Establishing maximum and minimum values */

    %let minhosp=1;     *Minimum hospital range;
    %let maxhosp=4;     *Maximum hospital range;
    %let mincpt=1;      *Minimum CPT-code range;
    %let maxcpt=4;      *Maximum CPT-code range;
    %let payrate=1200;  *Pay rate amount;
    %let minlos=1;      *Minimum length of stay;
    %let maxlos=10;     *Maximum length of stay;
    %let numclms=3000;  *Number of claims;

    /* Creating formats for data values */

    proc format ;
       value hosp 1='HOSP1'
                  2='HOSP2'
                  3='HOSP3'
                  4='HOSP4';
       value cpt
                  1='RMBRD'
                  2='RMOBS'
                  3='RMPED'
                  4='RMPSY';
    run;

    /* Numbers are selected at random with
       set minimum and maximum ranges. These
       numbers are then turned into data
       values according to created formats. */

    data claims (drop= i);
       do i=1 to &numclms ;
          hosp=put((ceil((&maxhosp-&minhosp+1)*ranuni(0)
                   + &minhosp-1)),hosp.);
          cptcode=put((ceil((&maxcpt-&mincpt+1)*rantri(0,.001)
                   + &mincpt-1)),cpt.);
          los=(ceil((&maxlos-&minlos+1)*rantri(0,.0001)
                   + &minlos-1));
          payamt=los*ranuni(0)*&payrate;
          output;
       end;
       format payamt 15.2;
    run;

*Note: The RANTRI function gives a random number
using a triangular distribution. I used this function to
create "real" test data by skewing the data distribution to
provide shorter lengths of stay and more RMBRDs
(regular room and board) and RMOBSs (obstetrics room
and board).

Now that we have our dataset, we can use what we have
learned and create the following report:

    **Call Procedure**;
    proc report data=claims nowd headline
         headskip split='~' spacing=2 ls=142
         ps=53;

    **Assign Report Variables**;
    column cptcode hosp n payamt los
           los=alos vlos los=mlos apay
           payamt=mpay pcta;

    **Define Report Variables**;
    define cptcode / group "Cpt~Code"
           width=5;
    define hosp / group "Hospital"
           width=10;
    define n / format=comma6. "Admits"
           width=6;
    define payamt / analysis sum
           format=dollar12. "Paid~Amount"
           width=11;
    define los / analysis sum
           format=comma6. "Total~LOS" width=6;
    define alos / analysis mean
           format=comma6.2 "ALOS" width=6 ;
    define vlos / computed format=comma6.2
           "Diff from~ALOS" width=9;
    define mlos / analysis max
           format=comma6. "Max~LOS" width=6;
    define apay / computed format=dollar8.
           "Avg Paid~per Day" width=8;
    define mpay / analysis mean
           format=dollar8. "Avg Paid~Per Admit"
           width=10;
    define pcta / computed
           format=percent8.1 "% of~Total Admits";

    **Define Computed Variables**;
    compute before;
       admits=n;
    endcomp;

    compute before cptcode;
       proclos=los.sum/n;
    endcomp;

    compute vlos;
       vlos=_c6_-proclos;
    endcomp;

    compute apay;
       apay=payamt.sum/los.sum;
    endcomp;

    compute pcta;
       pcta=n/admits;
    endcomp;

    **Add Summary Lines**;
    break after cptcode /summarize dol dul
          suppress skip;

    run;

Your output will be similar to Example 1. Your numbers
will be different due to the creation of the random test
data.

PROC REPORT Option Tables

Table A - Options when Calling PROC REPORT

Usage       Purpose
HEADLINE    Places a line under the column names.
HEADSKIP    Skips a line between the column names and the
            first row.
SPLIT=''    Wraps the column name after the specified
            character.
NOWD        No windows option.
LS=         Spacing for line-size.
PS=         Spacing for page-size.
SPACING=    Specifies spacing between columns.
NOHEADER    Does not print column names.
MISSING     Does not drop missing values in calculations and
            groupings.
OUTREPT=    Stores report definition in a catalog entry.
REPORT=     Calls specific stored report definition.
OUT=        Places output in specified dataset.
COLWIDTH=   Specifies default column width for numeric and
            computed variables.
LIST        Writes report definition to the log.
NAMED       Writes "name=" before value.
WRAP        Displays one value of each column, continuing on
            consecutive lines, before printing the next row
            value.
PANELS=     Specifies number of panels on a page.
PSPACE=     Space between panels.
VARDEF=     Specifies the divisor to use when calculating
            variances (n, df, wdf, weight).
SHOWALL     Overrides NOPRINT and NOZERO define options.
BOX         Puts boxes around all columns and rows.
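To make two of the Table A options concrete, here is a minimal sketch
that combines MISSING and OUT= on the procedure call. The dataset names
WORK.ADMITS_DEMO and WORK.REPORT_OUT and all data values are hypothetical.
The fourth record has a missing PROVIDER, so it appears as its own group
only because MISSING is specified.

```sas
/* Hypothetical data: the fourth record has a missing PROVIDER */
data work.admits_demo;
    input provider $ payamt;
    datalines;
HOSP1 1200
HOSP1 1800
HOSP2 900
. 500
;
run;

/* MISSING keeps the record with the blank PROVIDER as a group;   */
/* OUT= writes the report rows to a dataset as well as printing.  */
proc report data=work.admits_demo nowd split='~'
            missing out=work.report_out;
    column provider n payamt;
    define provider / group 'Provider' width=10;
    define n        / format=comma6. 'Claims' width=6;
    define payamt   / analysis sum format=dollar8. 'Paid~Amount' width=10;
run;

/* Inspect the dataset that OUT= created */
proc print data=work.report_out;
run;
```

Running the report a second time without MISSING is a quick way to see
the difference: the record with the blank PROVIDER is then left out of
the grouped report.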
Table B - Types for Defining Variables

Usage       Purpose
GROUP       Summarizes rows based on group variable values
            (class variable).
ORDER       Orders rows by variable.
ANALYSIS    Perform a statistical function.
ACROSS      Tabular reports with variable values as column
            headers.
DISPLAY     List values as they appear in the dataset.
COMPUTED    Create calculated variables not on data set.

Table C - Types of Statistical Functions

Statistic   Definition
N           Count non-missing observations
NMISS       Count of missing observations
MEAN        Mean
STD         Standard deviation
MIN         Minimum
MAX         Maximum
RANGE       Range (Maximum-Minimum)
SUM         Sum
USS         Uncorrected sum of squares
CSS         Sum of squares corrected for the mean
STDERR      Standard error of mean
CV          Percent coefficient of variation
T           Student's t-value when population mean=0
PRT         Two tailed p-value for Student's t-value
VAR         Variance
SUMWGT      Sum of weights

Table D - Additional Options When Defining a Column

Option      Purpose
CENTER      Centers values within column.
DESCENDING  Orders rows in descending order.
FLOW        Wraps the value of a character variable within
            the column.
RIGHT       Right justifies values within a column.
LEFT        Left justifies values within a column.
NOPRINT     Suppresses printing of the variable.
NOZERO      Suppresses printing of a column if all values are
            0 or missing.
ORDER=      Controls ordering of rows (Data, Formatted, Freq,
            Internal).
SPACING=    Spacing between columns.
PAGE        Places column and all columns to the left on a
            separate page.
ID          Prints column on subsequent pages.

Table E - Options to Use with BREAK or RBREAK

Option      Purpose
OL          Overline
UL          Underline
DOL         Double Overline
DUL         Double Underline
SUMMARIZE   Summarizes the column.
SKIP        Skip a line before the next row
PAGE        Skip a page before the next row
SUPPRESS    Does not write the name of the summarizing
            variable
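As a hedged illustration of how options from Tables D and E combine, the
sketch below uses ORDER= and DESCENDING on a DEFINE statement and
SUMMARIZE, OL, DOL, SKIP, and SUPPRESS on the break statements. The
dataset WORK.BREAK_DEMO and its values are hypothetical and exist only
for this example.

```sas
/* Hypothetical test data: names and values are illustrative only */
data work.break_demo;
    input state $ provider $ payamt;
    datalines;
NY HOSP1 1200
NY HOSP1 1800
NY HOSP2 900
CT HOSP3 1500
CT HOSP3 700
;
run;

proc report data=work.break_demo nowd headline headskip split='~';
    column state provider n payamt;

    /* Table D options: sort STATE by its stored value, highest first */
    define state    / group order=internal descending 'State' width=5;
    define provider / group 'Provider' width=10;
    define n        / format=comma6. 'Claims' width=6;
    define payamt   / analysis sum format=dollar10. 'Paid~Amount' width=11;

    /* Table E options: one subtotal per STATE, overlined, followed   */
    /* by a blank line, without repeating the STATE value on the line */
    break after state / summarize ol skip suppress;

    /* Grand total for the whole report with a double overline */
    rbreak after / summarize dol;
run;
```

With this data, NY prints before CT because of DESCENDING, each state is
followed by its own subtotal row, and the last row carries the report
totals.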
CONCLUSION

PROC REPORT has a lot to offer, and I consider it to be
one of the most powerful procedures in SAS. Before
writing that next report using PROC TABULATE or
DATA _NULL_, try PROC REPORT. There are books
with excellent examples of how to produce most any kind
of report. Please see my references for more
information on this procedure.

REFERENCES

SAS Institute Inc., SAS Guide to Report Writing:
Examples, Version 6, First Edition, Cary, NC: SAS
Institute Inc., 1994. 220 pp.

SAS Institute Inc., SAS Technical Report P-258, Using
the REPORT Procedure in a Nonwindowing
Environment, Release 6.07, Cary, NC: SAS Institute
Inc., 1993. 276 pp.

SAS Institute Inc., SAS Guide to the Report Procedure:
Reference, Release 6.11, Cary, NC: SAS Institute Inc.,
1995. 123 pp.

ACKNOWLEDGEMENTS

I would like to thank John Adams for his research into
solving this problem.

SAS is a registered trademark of SAS Institute Inc., Cary,
NC.

CONTACT INFORMATION

The author welcomes comments, suggestions, and
questions by phone 203-798-4239, or by e-mail
[email protected].

Example 1
The SAS System

% OF
CPT PAID TOTAL DIFF FROM MAX AVG PAID AVG PAID TOTAL
CODE HOSPITAL ADMITS AMOUNT LOS ALOS ALOS LOS PER DAY PER ADMIT ADMITS
---------------------------------------------------------------------------------------------------------

RMBRD HOSP1 287 $596,842 1,010 3.52 -0.25 9 $591 $2,080 10.0%
HOSP2 248 $590,667 951 3.83 0.07 10 $621 $2,382 8.7%
HOSP3 233 $510,718 932 4.00 0.23 10 $548 $2,192 8.1%
HOSP4 252 $567,657 952 3.78 0.01 10 $596 $2,253 8.8%
====== =========== ====== ====== ========= ====== ======== ========== ========
1,020 $2,265,882 3,845 3.77 0.00 10 $589 $2,221 35.6%
====== =========== ====== ====== ========= ====== ======== ========== ========

RMOBS HOSP1 224 $531,765 921 4.11 0.28 10 $577 $2,374 7.8%
HOSP2 174 $403,207 641 3.68 -0.15 10 $629 $2,317 6.1%
HOSP3 216 $552,363 818 3.79 -0.05 10 $675 $2,557 7.5%
HOSP4 194 $438,121 718 3.70 -0.13 10 $610 $2,258 6.8%
====== =========== ====== ====== ========= ====== ======== ========== ========
808 $1,925,456 3,098 3.83 0.00 10 $622 $2,383 28.2%
====== =========== ====== ====== ========= ====== ======== ========== ========

RMPED HOSP1 152 $310,656 532 3.50 -0.26 10 $584 $2,044 5.3%
HOSP2 150 $339,310 552 3.68 -0.08 9 $615 $2,262 5.2%
HOSP3 174 $415,001 669 3.84 0.08 10 $620 $2,385 6.1%
HOSP4 150 $367,301 601 4.01 0.25 10 $611 $2,449 5.2%
====== =========== ====== ====== ========= ====== ======== ========== ========
626 $1,432,269 2,354 3.76 0.00 10 $608 $2,288 21.9%
====== =========== ====== ====== ========= ====== ======== ========== ========

RMPSY HOSP1 123 $285,258 469 3.81 -0.09 10 $608 $2,319 4.3%
HOSP2 86 $213,784 362 4.21 0.31 10 $591 $2,486 3.0%
HOSP3 108 $265,003 406 3.76 -0.14 9 $653 $2,454 3.8%
HOSP4 94 $237,527 366 3.89 -0.01 10 $649 $2,527 3.3%
====== =========== ====== ====== ========= ====== ======== ========== ========
411 $1,001,573 1,603 3.90 0.00 10 $625 $2,437 14.3%
====== =========== ====== ====== ========= ====== ======== ========== ========
