SAS Tips
https://www.tutorialspoint.com/sas/sas_overview.htm
SAS is platform independent, which means you can run it on any major
operating system, such as Linux or Windows. SAS programmers drive it by
applying sequences of operations to SAS data sets in order to produce
reports for data analysis.
Over the years SAS has added numerous solutions to its product portfolio.
It has solutions for Data Governance, Data Quality, Big Data Analytics, Text
Mining, Fraud Management, Health Science and more, covering most
business domains.
SAS is designed to work on large data sets. With the help of SAS software
you can perform operations such as:
• Data Management
• Statistical Analysis
• Business Planning
• Quality Improvement
• Application Development
• Data extraction
• Data transformation
2 Permanent Library
we want them.
SAS on WINDOWS -
SAS FOLDERS,SAS PROGRAMS,LOG,RESULTS (errors, warning, notes)
SAS Code Autocomplete
This is a very powerful feature which helps in getting the correct syntax.
==============
The SAS Programming involves first creating/reading the data sets into
the memory and then doing the analysis on this data. We need to
understand the flow in which a program is written to achieve this.
The steps are written in the following sequence to create a SAS program:
a DATA step, a PROC step and an output step.
A typical SAS program uses these steps to read the input data, analyse
the data and give the output of the analysis. A RUN statement at the end
of each step completes the step and triggers its execution.
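The sequence described above can be sketched as a small program. This is a minimal illustration only; the data set name demo, its variables and its three records are made up for the example.

```sas
/* DATA step: read three hypothetical records into a temporary data set */
data work.demo;
   input id name $ score;
   datalines;
1 Ann 82
2 Bob 75
3 Cy 91
;
run;

/* PROC step: analyse the data (mean of the numeric variable score) */
proc means data=work.demo mean;
   var score;
run;

/* Output: print the observations */
proc print data=work.demo noobs;
run;
```

Each step ends with its own RUN statement, so SAS executes the steps one after another.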
PROC Step
Syntax
PROC procedure_name options; /* procedure_name is the name of the PROC */
RUN;
The data from the data sets can be displayed with conditional output
statements.
Syntax
PROC PRINT DATA = data_set;
OPTIONS;
RUN;
PROC PRINT DATA = TEMP;
WHERE SALARY > 700;
RUN;
The three components of any SAS program - Statements, Variables and
Data sets follow the below rules on Syntax.
SAS Statements
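• Many SAS statements can be on the same line, with each statement
ending with a semicolon.
Variables in SAS represent a column in the SAS data set. Variable names
can be up to 32 characters long, cannot include blanks, and must start
with a letter (A through Z, not case sensitive) or an underscore (_).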
Example
# Valid Variable Names
REVENUE_YEAR
MaxVal
_Length
The DATA statement marks the creation of a new SAS data set. The rules
for naming the data set are as follows.
• A single word after the DATA statement indicates a temporary data set
name, which means the data set is erased at the end of the session.
• The data set name can be prefixed with a library name, which makes it a
permanent data set that persists after the session is over.
• If the SAS data set name is omitted, SAS creates a temporary data set
with a generated name such as DATA1, DATA2, and so on.
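The three naming rules above can be sketched as follows. The library name mylib and its path are placeholders chosen for this illustration; the directory must already exist for the LIBNAME statement to succeed.

```sas
/* Temporary: one-level name, stored in WORK, erased at session end */
data scores;
   x = 1;
run;

/* Permanent: two-level name; the path below is a placeholder */
libname mylib '/folders/myfolders/mylib';
data mylib.scores;
   set scores;
run;

/* No name given: SAS generates a name such as DATA1 */
data;
   x = 2;
run;
```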
The SAS programs, data files and the results of the programs are saved
with various extensions.
• *.sas − It represents the SAS code file, which can be edited using the SAS
editor or any text editor.
• *.log − It represents the SAS log file, which contains information such as
errors, warnings, and data set details for a submitted SAS program.
• *.sas7bdat − It represents the SAS data file, which contains a SAS data set.
Comments in SAS
* This is comment ;
/* This is first line of the comment
* This is second line of the comment */
variables
INPUT ID SALARY COMM_PERCENT;
INPUT VAR1 $ VAR2 $ VAR3 $;
INPUT VAR1 DATE11. VAR2 MMDDYY10. ;
STRING VARIABLES
data string_examples;
LENGTH string1 $ 6 String2 $ 5;
/*String variables of length 6 and 5 */
String1 = 'Hello';
String2 = 'World';
Joined_strings = String1 ||String2 ;
run;
proc print data = string_examples noobs;
run;
————
SUBSTRN
This function extracts a substring using a start position and a length
(not an end position). If no length is given, it extracts all the characters
through the end of the string. In the syntax, p1 is the start position and
p2 is the length.
Syntax
SUBSTRN('stringval',p1,p2)
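A short sketch of both call forms, using the literal 'stringval' from the syntax line; the variable names first3 and rest are made up for the illustration.

```sas
data _null_;
   text   = 'stringval';
   first3 = substrn(text, 1, 3);   /* start at 1, length 3 */
   rest   = substrn(text, 4);      /* from position 4 to the end */
   put first3= rest=;              /* log: first3=str rest=ingval */
run;
```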
————
TRIMN
Arrays in SAS are used to store and retrieve a series of values using an
index value. The index represents the location in a reserved memory area.
Syntax
ARRAY ARRAY-NAME(SUBSCRIPT) ($) VARIABLE-LIST ARRAY-VALUES
• ARRAY-NAME is the name of the array, which follows the same rules as
variable names.
• ARRAY-VALUES are the actual values that are stored in the array.
—————
DATA array_example;
INPUT a1 $ a2 $ a3 $ a4 $ a5 $;
ARRAY colours(5) $ a1-a5;
mix = a1||'+'||a2;
DATALINES;
yello pink orange green blue
;
RUN;
PROC PRINT DATA = array_example;
RUN;
————
DATA array_example_OF;
INPUT A1 A2 A3 A4;
ARRAY A(4) A1-A4;
A_SUM = SUM(OF A(*));
A_MEAN = MEAN(OF A(*));
A_MIN = MIN(OF A(*));
DATALINES;
21 4 52 11
96 25 42 6
;
RUN;
PROC PRINT DATA = array_example_OF;
RUN;
—————
DATA array_in_example;
INPUT A1 $ A2 $ A3 $ A4 $;
ARRAY COLOURS(4) A1-A4;
IF 'yellow' IN COLOURS THEN available = 'Yes';
ELSE available = 'No';
DATALINES;
Orange pink violet yellow
;
RUN;
PROC PRINT DATA = array_in_example;
RUN;
====================
DATA MYDATA1;
input x 6.; /*maximum width of the data*/
format x 6.3;
datalines;
8722
93.2
.1122
15.116
;
RUN;
PROC PRINT DATA = MYDATA1;
RUN;
RESULT: 8722.0, 93.200, 0.112, 15.116
———————
DATA MYDATA2;
input x 6.; /*maximum width of the data*/
format x 5.2;
datalines;
8722
93.2
.1122
15.116
;
RUN;
PROC PRINT DATA = MYDATA2;
RUN;
RESULT: 8722, 93.20, 0.11, 15.12
===============
DATA MYDATA3;
input x 6.; /*maximum width of the data*/
format x DOLLAR10.2;
datalines;
8722
93.2
.1122
15.116
;
RUN;
PROC PRINT DATA = MYDATA3;
RUN;
RESULT: $8,722.00, $93.20, $0.11, $15.12
===============
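ARITHMETIC
DATA MYDATA1;
/* Widths must cover each value: 11.21 needs 5 columns; COL2 starts in column 7. */
/* A value without a decimal point would be divided by 10 by the 4.1 informat, */
/* so the second record is written as 11.0. */
input @1 COL1 5.2 @7 COL2 4.1;
Add_result = COL1+COL2;
Sub_result = COL1-COL2;
Mult_result = COL1*COL2;
Div_result = COL1/COL2;
Expo_result = COL1**COL2;
datalines;
11.21 5.3
3.11  11.0
;
RUN;
PROC PRINT DATA = MYDATA1;
RUN;
——
RESULT (Add_result): 16.51, 14.11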
===========
LOGICAL OPERATOR
DATA MYDATA1;
/* COL2 is read from column 7, so each second value must start in column 7 */
input @1 COL1 5.2 @7 COL2 4.1;
and_=(COL1 > 10 & COL2 > 5 );
or_ = (COL1 > 12 | COL2 > 15 );
not_ = ~( COL2 > 7 );
datalines;
11.21 5.3
3.11  11.4
;
RUN;
PROC PRINT DATA = MYDATA1;
RUN;
============
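DATA MYDATA1;
/* COL2 is read from column 7, so each second value must start in column 7 */
input @1 COL1 5.2 @7 COL2 4.1;
EQ_ = (COL1 = 11.21);
NEQ_= (COL1 ^= 11.21);
GT_ = (COL2 >= 8);  /* greater than or equal to */
LT_ = (COL2 <= 12); /* less than or equal to */
IN_ = COL2 in( 6.2,5.3,12 );
datalines;
11.21 5.3
3.11  11.4
;
RUN;
PROC PRINT DATA = MYDATA1;
RUN;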
==========
Operators Precedence
=========
==========
SAS - Macros
They are called global macro variables because they can be accessed by any
SAS program available in the SAS environment. In general they are
system-assigned variables; a general example is the system date. It should
be noted that the macro [...] processor.
In the statement %LET macro-variable-name = value; the value field can take
any numeric, text or date value as required by the program. The macro
variable name is any valid SAS name.
%LET make_name = 'Audi';
%LET type_name = 'Sports';
proc print data = sashelp.cars;
where make = &make_name and type = &type_name ;
TITLE "Sales as of &SYSDAY &SYSDATE";
run;
==========
Macro Programs
Format:
/* Creating a macro program */
%MACRO <macro name>(Param1, Param2, ..., Paramn);
Macro Statements;
%MEND;
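A minimal sketch of the format above applied to the SASHELP.CARS data set used elsewhere in these notes. The macro name show_cars and its two parameters are hypothetical names chosen for illustration.

```sas
/* Hypothetical macro: prints cars of a given make and type */
%macro show_cars(make_name, type_name);
   proc print data=sashelp.cars;
      where make = "&make_name" and type = "&type_name";
      title "Cars: &make_name &type_name";
   run;
   title;   /* clear the title afterwards */
%mend show_cars;

/* Call the macro with positional parameters */
%show_cars(Audi, Sports)
```

Here the parameter values are passed without quotes and the quotes are added inside the macro with double quotation marks, so that the macro variables resolve.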
Macro %PUT
The %PUT statement writes text or macro variable values to the SAS log,
for example the value of a macro variable named 'today'.
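A minimal sketch of %PUT, assuming the macro variable today is filled from the current date with %SYSFUNC (the original example's way of setting it is not preserved in these notes).

```sas
/* Store today's date, formatted as DATE9., in the macro variable today */
%let today = %sysfunc(today(), date9.);

/* Write the value to the SAS log */
%put today=&today;
```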
===
Macro %RETURN
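The %RETURN statement causes normal termination of the currently
executing macro. In the tutorial's example, a macro named check_condition
terminates when the value of the variable "val" becomes 10 and continues
otherwise.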
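The original body of check_condition is not preserved here, so the following is only a sketch of how such a macro could look; the parameter val and the log messages are assumptions.

```sas
%macro check_condition(val);
   %if &val = 10 %then %do;
      %put val is 10 so the macro terminates;
      %return;
   %end;
   %put val is &val so the macro continues;
%mend check_condition;

%check_condition(10)
%check_condition(3)
```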
Macro %END
The %END statement marks the end of a %DO group. In the tutorial's
example, a macro named test takes a user input and runs the DO loop using
this input value. The end of the DO loop is achieved through the %END
statement.
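A sketch of such a macro, assuming the user input is passed as a parameter named finish (a hypothetical name) and that the loop simply writes its counter to the log.

```sas
%macro test(finish);
   %do i = 1 %to &finish;
      %put The value of i is &i;
   %end;   /* closes the %DO loop */
%mend test;

%test(5)
```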
========
DATE TIME FORMAT
A smaller width will give incorrect results. With SAS V9, there is a generic
date informat, anydtdte15., which can process most date inputs.
Input Date          Width   Informat
03/11/2014          10      mmddyy10.
03/11/14            8       mmddyy8.
December 11, 2012   20      worddate20.
14mar2011           9       date9.
14-mar-2011         11      date11.
14-mar-2011         15      anydtdte15.
DATA TEMP;
INPUT @1 DOJ1 mmddyy10. @12 DOJ2 mmddyy10.;
format DOJ1 date11. DOJ2 worddate20. ;
DATALINES;
01/12/2012 02/11/1998
;
PROC PRINT DATA = TEMP;
RUN;
========
SAS can read data from various sources which includes many file formats.
• Delimited Data
• Excel Data
• Hierarchical Data
========
Example
In the below example we read the comma-delimited data file named emp.csv
from the local environment. A plain text file such as emp_data.txt is read
with the same INFILE and INPUT statements.
data TEMP;
infile
'/folders/myfolders/sasuser.v94/TutorialsPoint/emp.csv' dlm=",";
input empID empName $ Salary Dept $ DOJ :date9. ; /* colon: read up to the next delimiter */
format DOJ date9.;
run;
PROC PRINT DATA = TEMP;
RUN;
======
SAS can directly read an Excel file using the import facility. As seen in the
chapter SAS data sets, it can handle a wide variety of file types, including
MS Excel. The example reads the file emp.xls from the local environment.
Example
FILENAME REFFILE
"/folders/myfolders/TutorialsPoint/emp.xls"
TERMSTR = CR;
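The import step itself is missing from these notes. A minimal sketch with PROC IMPORT follows, assuming the same file path and an output data set named WORK.IMPORT (an assumed name). DBMS=XLS needs the SAS/ACCESS Interface to PC Files, so this may not run on every installation.

```sas
proc import datafile="/folders/myfolders/TutorialsPoint/emp.xls"
      dbms=xls
      out=work.import
      replace;
   getnames=yes;   /* first row holds the variable names */
run;

/* Inspect the variables that were created */
proc contents data=work.import;
run;
```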
PROC EXPORT - built-in EXPORT procedure to output data sets in a
variety of formats
SAS can write data sets in different formats. It can write data from SAS
files to a normal text file. These files can be read by other software
programs. SAS uses PROC EXPORT to write data sets.
We will use the SAS data set named cars available in the SASHELP library.
We export it as a space delimited text file with the code as shown in the
following program.
proc export data = sashelp.cars
outfile = '/folders/myfolders/sasuser.v94/TutorialsPoint/car_data.txt'
dbms = dlm;
delimiter = ' ';
run;
======
In order to write a comma-delimited file we can use the dbms option with
the value csv.
In order to write a tab-delimited file we can use the dbms option with
the value tab.
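Both variants can be sketched as below. The output file names car_data.csv and car_tab.txt are placeholders, and the folder is the one used in the space-delimited example.

```sas
/* Comma-delimited output */
proc export data=sashelp.cars
      outfile='/folders/myfolders/sasuser.v94/TutorialsPoint/car_data.csv'
      dbms=csv
      replace;
run;

/* Tab-delimited output */
proc export data=sashelp.cars
      outfile='/folders/myfolders/sasuser.v94/TutorialsPoint/car_tab.txt'
      dbms=tab
      replace;
run;
```

The REPLACE option overwrites an existing file of the same name; leave it out if that is not wanted.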
=====
======
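Multiple SAS data sets can be concatenated to give a single data set using
the SET statement. The total number of observations in the concatenated
data set is the sum of the number of observations in the original data
sets. The observations appear in order: all observations from the first
data set, followed by those from the second, and so on.
Ideally all the combining data sets have the same variables, but in case they
have a different number of variables, then in the result all the variables
appear, with missing values for the data set that lacks them.
Syntax
SET data-set-1 data-set-2 data-set-3 ... ;
Example
Consider the employee data of an organization, which is available in two
different data sets, one for the IT department and another for the non-IT
department. We concatenate both the data sets using the SET statement
as shown below.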
DATA ITDEPT;
INPUT empid name $ salary ;
DATALINES;
1 Rick 623.3
3 Mike 611.5
6 Tusar 578.6
;
RUN;
DATA NON_ITDEPT;
INPUT empid name $ salary ;
DATALINES;
2 Dan 515.2
4 Ryan 729.1
5 Gary 843.25
7 Pranab 632.8
8 Rasmi 722.5
;
RUN;
DATA All_Dept;
SET ITDEPT NON_ITDEPT;
RUN;
PROC PRINT DATA = All_Dept;
RUN;
Scenarios
When we have many variations in the data sets for concatenation, the
variables in the result can differ, but the total number of observations in
the concatenated data set is always the sum of the observations in each
data set.
If one of the original data sets has more variables than another, the data
sets still get combined, but for observations from the smaller data set
those variables appear as missing.
Example
In below example the first data set has an extra variable named DOJ. In
the result the value of DOJ for second data set will appear as missing.
DATA ITDEPT;
INPUT empid name $ salary DOJ date9. ;
DATALINES;
1 Rick 623.3 02APR2001
3 Mike 611.5 21OCT2000
6 Tusar 578.6 01MAR2009
;
RUN;
DATA NON_ITDEPT;
INPUT empid name $ salary ;
DATALINES;
2 Dan 515.2
4 Ryan 729.1
5 Gary 843.25
7 Pranab 632.8
8 Rasmi 722.5
;
RUN;
DATA All_Dept;
SET ITDEPT NON_ITDEPT;
RUN;
PROC PRINT DATA = All_Dept;
RUN;
In this scenario the data sets have the same number of variables but a
variable name differs between them. In that case a normal concatenation
will produce all the variables in the result set, giving missing results
for the two variables which differ. While we may not change the variable
name in the original data sets, we can apply the RENAME= data set option
in the concatenated data set we create. That will produce the same result
as a normal concatenation but with one new variable name in place of the
two different names.
Example
In the below example the data set ITDEPT has the variable name ename
whereas the data set NON_ITDEPT has the variable name empname. Both are
renamed to Employee as shown below.
DATA ITDEPT;
INPUT empid ename $ salary ;
DATALINES;
1 Rick 623.3
3 Mike 611.5
6 Tusar 578.6
;
RUN;
DATA NON_ITDEPT;
INPUT empid empname $ salary ;
DATALINES;
2 Dan 515.2
4 Ryan 729.1
5 Gary 843.25
7 Pranab 632.8
8 Rasmi 722.5
;
RUN;
DATA All_Dept;
SET ITDEPT(RENAME =(ename = Employee) ) NON_ITDEPT(RENAME
=(empname = Employee) );
RUN;
PROC PRINT DATA = All_Dept;
RUN;
If the variable lengths in the two data sets are different, then the
concatenated data set will have values in which some data is truncated
for the variable with the smaller length. This happens if the first data
set has the smaller length. To solve this we apply the higher length to
both the data sets, using a LENGTH statement before the SET statement.
Example
In the tutorial's example the variable ename is of length 5 in the first
data set and longer in the second.
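The original code for this case is not preserved. The sketch below assumes a length of 5 in the first data set and 7 in the second (the 7 is an assumption) and reuses a few employee rows from the earlier examples.

```sas
data itdept;
   length ename $ 5;
   input empid ename $ salary;
   datalines;
1 Rick 623.3
3 Mike 611.5
;
run;

data non_itdept;
   length ename $ 7;
   input empid ename $ salary;
   datalines;
7 Pranab 632.8
8 Rasmi 722.5
;
run;

data all_dept;
   length ename $ 7;      /* declare the longer length before SET */
   set itdept non_itdept;
run;

proc print data=all_dept;
run;
```

Without the LENGTH statement, ename would take its length from the first data set in the SET statement and "Pranab" would be cut to 5 characters.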
The total number of observations in the merged data set is often less than
the sum of the number of observations in the original data sets, because
the variables from both data sets get merged as one record when the values
of the common variable match.
There are two prerequisites for merging data sets, given below −
• input data sets must have at least one common variable to merge on.
• input data sets must be sorted by the common variable(s) that will be
used to merge.
Data Merging
# Data set 1
ID NAME SALARY
1 Rick 623.3
2 Dan 515.2
3 Mike 611.5
4 Ryan 729.1
5 Gary 843.25
6 Tusar 578.6
7 Pranab 632.8
8 Rasmi 722.5
# Data set 2
ID DEPT
1 IT
2 OPS
3 IT
4 HR
5 FIN
6 IT
7 OPS
8 FIN
The merged result is produced by a MERGE statement in which the common
variable (ID) is used in the BY statement. Please note that the
observations in both data sets must already be sorted by ID.
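A minimal sketch of that merge, using the first three rows of the two tables above. The data set names SALARY and DEPT follow the later IN= example; the PROC SORT steps are included only as a safeguard, since the rows are already in ID order.

```sas
data salary;
   input id name $ salary;
   datalines;
1 Rick 623.3
2 Dan 515.2
3 Mike 611.5
;
run;

data dept;
   input id dept $;
   datalines;
1 IT
2 OPS
3 IT
;
run;

/* MERGE with BY requires both inputs sorted by the BY variable */
proc sort data=salary; by id; run;
proc sort data=dept;   by id; run;

data all_details;
   merge salary dept;
   by id;
run;

proc print data=all_details;
run;
```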
There may be cases when some values of the common variable will not
match between the data sets. In such cases the data sets still get merged,
but the result contains missing values.
Example
Consider the case of employee ID 3 missing from the data set SALARY and
employee ID 6 missing from the data set DEPT. When the merge is applied,
ID 3 has missing NAME and SALARY values and ID 6 has a missing DEPT value
in the result.
To avoid the missing values in the result we can consider keeping only the
observations with matched values for the common variable.
Example
In the below example, the IN= option keeps only the observations where
the values from both the data sets SALARY and DEPT match.
DATA All_details;
MERGE SALARY(IN = a) DEPT(IN = b);
BY empid;
/* Keep an observation only if it has a match in both SALARY and DEPT */
IF a = 1 and b = 1;
RUN;
PROC PRINT DATA = All_details;
RUN;
Upon execution of the above SAS program with the above changed part,
only the observations whose empid is present in both data sets remain
in the result.
Subsetting a SAS data set means extracting a part of the data set by
selecting fewer variables, fewer observations, or both. Subsetting of
variables is done using the KEEP and DROP statements, while subsetting
of observations is done using the DELETE statement.
Also the resulting data from the subsetting operation is held in a new data
set which can be used for further analysis. Sub setting is mainly used for
the purpose of analyzing a part of the data set without using those
variables or observations which may not be relevant to the analysis.
Example
Consider the below SAS data set containing the employee details of an
organization. If we are only interested in the Name and
Department values from the data set, then we can use the below code.
DATA Employee;
INPUT empid ename $ salary DEPT $ ;
DATALINES;
1 Rick 623.3 IT
2 Dan 515.2 OPS
3 Mike 611.5 IT
4 Ryan 729.1 HR
5 Gary 843.25 FIN
6 Tusar 578.6 IT
7 Pranab 632.8 OPS
8 Rasmi 722.5 FIN
;
RUN;
DATA OnlyDept;
SET Employee;
KEEP ename DEPT; /* or, equivalently: DROP empid salary; */
RUN;
PROC PRINT DATA = OnlyDept;
RUN;
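Subsetting observations with the DELETE statement can be sketched in the same way. The cutoff of 700 and the output data set name high_paid are chosen only for illustration; the rows are a subset of the Employee data above.

```sas
data employee;
   input empid ename $ salary dept $;
   datalines;
1 Rick 623.3 IT
2 Dan 515.2 OPS
4 Ryan 729.1 HR
5 Gary 843.25 FIN
;
run;

data high_paid;
   set employee;
   if salary < 700 then delete;   /* drop observations below the cutoff */
run;

proc print data=high_paid;
run;
```

With these four rows, only the observations for Ryan and Gary remain.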
======
======
=======
SAS - SQL
==========
SAS - ODS
ODS HTML
PATH = '/folders/myfolders/sasuser.v94/TutorialsPoint/'
FILE = 'CARS2.html'
STYLE = EGDefault;
proc SQL;
select make, model, invoice
from sashelp.cars
where make in ('Audi','BMW')
and type = 'Sports'
;
quit;
proc SQL;
select make,mean(horsepower)as meanhp
from sashelp.cars
where make in ('Audi','BMW')
group by make;
quit;
ODS HTML CLOSE;
https://www.tutorialspoint.com/sas/sas_output_delivery_system.htm
========
The SAS software component used for simulation is called SAS Simulation
Studio. Its graphical user interface provides a full set of tools for
building, executing, and analyzing the results of discrete event
simulation models.
Histograms
A histogram is a graphical display of data using bars of different heights.
It groups the various numbers in the data set into ranges.
It also represents an estimate of the probability distribution of
a continuous variable.
In SAS, PROC UNIVARIATE is used to create histograms through its
HISTOGRAM statement.
Example
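The original example is not preserved; a minimal sketch, assuming the horsepower variable of SASHELP.CARS as the analysis variable, could look like this.

```sas
/* NOPRINT suppresses the statistics tables; the histogram is still drawn */
proc univariate data=sashelp.cars noprint;
   histogram horsepower / normal;   /* overlay a fitted normal curve */
run;
```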
=========
A stacked bar chart is a bar chart in which a variable from the data set is
calculated with respect to another variable.
Example
The tutorial's script creates a stacked bar chart where the lengths of the
cars are calculated for each car type. The group option specifies the
second variable.
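A sketch of such a chart with PROC SGPLOT on SASHELP.CARS; the choice of PROC SGPLOT and the title text are assumptions, since the original script is missing.

```sas
proc sgplot data=sashelp.cars;
   vbar length / group=type;   /* grouped bars are stacked by default */
   title 'Lengths of cars by type';
run;
title;   /* clear the title */
```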
The clustered bar chart is created to show how the values of a variable
are spread across a category.
Example
The tutorial's script creates a clustered bar chart where the length of the
cars is clustered around the car type. So we see two adjacent bars at
length 191, one for the car type 'Sedan' and another for the car type
'Wagon'.
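The clustered variant differs from a stacked chart only in the GROUPDISPLAY= option. As before, PROC SGPLOT and the title are assumptions made for this sketch.

```sas
proc sgplot data=sashelp.cars;
   vbar length / group=type groupdisplay=cluster;   /* side-by-side bars */
   title 'Cluster of cars by type';
run;
title;   /* clear the title */
```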
======
Pie Charts
A pie chart is a representation of values as slices of a circle with
different colors. The slices are labeled, and the number
corresponding to each slice is also represented in the chart.
Labels can be placed outside the circle.
In SAS the pie chart is created using PROC TEMPLATE.
Grouped Pie Chart - two full circles (one per car maker name), one inside the other.
Scatter Plots
A scatterplot is a type of graph which uses values from two variables
plotted in a Cartesian plane. It is usually used to find out the relationship
between two variables. In SAS we use PROC SGSCATTER to create
scatterplots.
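A minimal sketch with PROC SGSCATTER, assuming horsepower and invoice from SASHELP.CARS as the two variables and car type as an optional grouping variable.

```sas
proc sgscatter data=sashelp.cars;
   plot horsepower*invoice / group=type;   /* one marker style per car type */
   title 'Horsepower vs. invoice by car type';
run;
title;   /* clear the title */
```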
======
Box Plots
A simple box plot is created using PROC SGPLOT and a paneled box plot is
created using PROC SGPANEL.
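Both forms can be sketched on SASHELP.CARS. The variables horsepower, type and origin are chosen only for illustration.

```sas
/* Simple box plot: one box per car type */
proc sgplot data=sashelp.cars;
   vbox horsepower / category=type;
run;

/* Paneled box plot: one panel per region of origin */
proc sgpanel data=sashelp.cars;
   panelby origin;
   vbox horsepower / category=type;
run;
```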
=========
Arithmetic Mean
For example, we can find the mean of all the numeric variables in the
SAS data set named CARS, specifying the maximum digits after the decimal
point with the MAXDEC= option.
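A minimal sketch, assuming the CARS data set is SASHELP.CARS and two decimal places are wanted.

```sas
/* Mean of every numeric variable, shown with at most 2 decimals */
proc means data=sashelp.cars mean maxdec=2;
run;
```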
Standard Deviation
PROC MEANS
To measure the SD using PROC MEANS we choose the STD option in the
PROC step. It brings out the SD values for each numeric variable present
in the data set.
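A minimal sketch on SASHELP.CARS; the MAXDEC= value is an arbitrary choice.

```sas
/* Standard deviation of every numeric variable */
proc means data=sashelp.cars std maxdec=2;
run;
```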
Syntax
data _null_;
text = "I love SAS Programming";
result = scan(text,2);
put result=;
run;
RESULT: result=love
———
Since we wish to find the second last word in the string, we have
mentioned -2 in the second argument of the SCAN function.
data _null_;
text = "I love SAS Programming";
result = scan(text,-2);
put result=;
run;
RESULT: result=SAS
The SAS program returns "SAS" as the second-to-last word.
There are two ways to scan from right to left in the SCAN function.
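The notes do not say which two ways are meant. A likely reading, offered here as an assumption, is a negative count versus the 'b' (backward) modifier of SCAN. The variable names are made up for the sketch.

```sas
data _null_;
   text = "I love SAS Programming";
   by_negative = scan(text, -2);            /* negative count scans from the right */
   by_modifier = scan(text, 2, ' ', 'b');   /* 'b' modifier scans backward */
   put by_negative= by_modifier=;
run;
```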
—————
SCAN : Convert a String into Multiple Observations
Suppose you have a string that consists of multiple substrings delimited
by commas, and you wish to transform it into multiple observations
(rows).
data readin;
input text $30.;
datalines;
live, love, laugh, repeat
;
run;
data readin2(keep=word);
set readin;
/* one output observation per comma-separated word */
do i = 1 to countw(text, ',');
word = scan(text, i, ',');
output;
end;
run;
proc print;
run;
Explanation
1. A DO loop is initiated with the variable 'i' iterating from 1 to the number
of words in the 'text' variable, separated by commas. This is done using
the COUNTW function, which counts the words in a string (not the characters).
2. Within the loop, the SCAN function is used to extract each word from the
'text' variable based on the current value of 'i' and the comma delimiter.
The extracted word is then assigned to the 'word' variable.
3. The OUTPUT statement writes one observation for each extracted word.
======
CALL SYMPUTX does not generate a note in the SAS log when the
second argument is numeric, whereas CALL SYMPUT produces a log note
stating the conversion of numeric values to character values. CALL
SYMPUTX also removes both leading and trailing blanks from the value.
In both routines the first argument is the macro variable being created
or assigned a value, and the second argument is the value to assign.
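The difference can be sketched with a numeric value. The macro variable names count_x and count_p are hypothetical, and the brackets in the %PUT line are there only to make leading blanks visible.

```sas
data _null_;
   n = 42;
   call symputx('count_x', n);   /* no conversion note; blanks removed */
   call symput('count_p', n);    /* conversion note; value keeps leading blanks */
run;

%put [&count_x] [&count_p];
```

In the log, the first bracket holds 42 with no padding, while the second shows 42 preceded by blanks.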
TIPS;
———