SAS Tips


SAS stands for Statistical Analysis System.

https://www.tutorialspoint.com/sas/sas_overview.htm

SAS takes an extensive programming approach to data transformation
and analysis rather than a pure drag, drop and connect approach.
That makes it stand out from the crowd, as it gives much finer control over
data manipulation.
SAS has a very large number of components customized for specific
industries and data analysis tasks.

SAS is platform independent, which means you can run it on most
operating systems, including Linux and Windows. SAS programs are written
by SAS programmers, who apply sequences of operations to SAS data sets
to produce reports for data analysis.

Over the years SAS has added numerous solutions to its product portfolio.
It has solutions for Data Governance, Data Quality, Big Data Analytics, Text
Mining, Fraud Management, Health Science and more, covering a wide range
of business domains.

SAS is designed to work on large data sets. With the help of SAS software
you can perform various operations on the data, such as −

• Data Management

• Statistical Analysis

• Report generation with graphics

• Business Planning

• Operations Research and project Management

• Quality Improvement

• Application Development

• Data extraction

• Data transformation

• Data updating and modification

There are two types of libraries available in SAS −

1 Temporary or Work Library

This is the default library of SAS. Every data set that we create is stored
in the Work library if we do not assign any other library to it. You can
check the Work library in the Explorer window. Its contents last only as
long as the session runs: if you end the session and start the software
again, a data set that was stored only in the Work library will no longer
be there.

2 Permanent Library

These are the permanent libraries of SAS. We can create a new SAS library
by using the SAS utilities or by writing a LIBNAME statement in the editor
window. These libraries are called permanent because a data set saved in
them remains available after the session ends, for as long as we want it.
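
The difference between the two library types can be sketched as below. This is a minimal illustration, not a fixed recipe: the folder path and the library name mylib are assumptions, and sashelp.cars is the sample data set used elsewhere in these notes.

```sas
/* Hypothetical folder: replace with a directory that exists on your system. */
LIBNAME mylib '/folders/myfolders/mylib';

/* One-level name: stored in WORK and removed when the session ends. */
DATA temp_cars;
   SET sashelp.cars;
RUN;

/* Two-level name: stored in MYLIB and still available in the next session. */
DATA mylib.perm_cars;
   SET sashelp.cars;
RUN;
```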

SAS on Windows -
SAS folders, SAS programs, log (errors, warnings, notes), results
SAS Code Autocomplete

This is a very powerful feature which helps in getting the correct syntax
of SAS keywords and also provides a link to the documentation for each
keyword.

==============
SAS programming involves first creating or reading the data sets into
memory and then doing the analysis on this data. We need to
understand the flow in which a program is written to achieve this.

The steps of a SAS program are written in the following sequence: the DATA
step, the PROC step and the OUTPUT step.

A typical SAS program uses these steps to read the input data, analyse the
data and give the output of the analysis. The RUN statement at the end of
each step completes the execution of that step.


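DATA Step

Syntax
DATA data_set_name;          /* Name the data set. */
INPUT var1 var2 var3;        /* Define the variables (separated by spaces). */
new_var = expression;        /* Create new variables. */
LABEL var1 = 'label text';   /* Assign labels to variables. */
/* Enter the data on the lines after DATALINES and close them with a semicolon. */
DATALINES;
;
RUN;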

PROC Step

Syntax
PROC procedure_name options; /* The name of the proc. */
RUN;

The OUTPUT Step

The data from the data sets can be displayed with conditional output

statements.

Syntax
PROC PRINT DATA = data_set;
/* optional statements, for example WHERE */
RUN;

Example
PROC PRINT DATA = TEMP;
WHERE SALARY > 700;
RUN;
The three components of any SAS program (statements, variables and
data sets) follow the syntax rules below.

SAS Statements

• Statements can start anywhere and end anywhere. A semicolon at the

end of the last line marks the end of the statement.

• Many SAS statements can be on the same line, with each statement

ending with a semicolon.

• Space can be used to separate the components in a SAS program


statement.

• SAS keywords are not case sensitive.

• Each DATA or PROC step should end with a RUN statement (some
procedures, such as PROC SQL, end with QUIT instead).

SAS Variable Names

Variables in SAS represent a column in the SAS data set. The variable

names follow the below rules.

• It can be maximum 32 characters long.

• It can not include blanks.

• It must start with the letters A through Z (not case sensitive) or an

underscore (_).

• Can include numbers but not as the first character.

• Variable names are case insensitive.

Example
# Valid Variable Names
REVENUE_YEAR
MaxVal
_Length

# Invalid Variable Names
Miles Per Liter # Contains spaces.
RainFall% # Contains a special character other than underscore.
90_high # Starts with a number.

SAS Data Set

The DATA statement marks the creation of a new SAS data set. The rules

for DATA set creation are as below.

• A single word after the DATA statement indicates a temporary data set
name, which means the data set gets erased at the end of the session.

• The data set name can be prefixed with a library name, which makes it a
permanent data set that persists after the session is over.

• If the SAS data set name is omitted then SAS creates a temporary data set
with a name generated by SAS, like DATA1, DATA2 etc.

SAS File Extensions

The SAS programs, data files and the results of the programs are saved

with various extensions in windows.

• *.sas − It represents the SAS code file, which can be edited using the SAS
Editor or any text editor.

• *.log − It represents the SAS log file. It contains information such as
errors, warnings, and data set details for a submitted SAS program.

• *.mht / *.html − It represents the SAS results file.

• *.sas7bdat − It represents the SAS data file, which contains a SAS data set
including variable names, labels, and the results of calculations.

Comments in SAS
* This is comment ;
/* This is first line of the comment
* This is second line of the comment */
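Variables
INPUT ID SALARY COMM_PERCENT;        /* numeric variables */
INPUT VAR1 $ VAR2 $ VAR3 $;          /* character variables ($ after the name) */
INPUT VAR1 DATE11. VAR2 MMDDYY10. ;  /* date variables read with date informats */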
STRING VARIABLES
data string_examples;
LENGTH string1 $ 6 String2 $ 5;
/*String variables of length 6 and 5 */
String1 = 'Hello';
String2 = 'World';
Joined_strings = String1 ||String2 ;
run;
proc print data = string_examples noobs;
run;

————

SUBSTRN

This function extracts a substring using a start position and a length. If
no length is given, it extracts all the characters up to the end of the
string.

Syntax
SUBSTRN('stringval',p1,p2)

Following is the description of the parameters used −

• stringval is the value of the string variable.

• p1 is the start position of extraction.

• p2 is the number of characters to extract (optional).
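
A short sketch of SUBSTRN in a DATA step, in the style of the string examples above. The data set and variable names are made up for illustration; note that the third argument of SUBSTRN is a length, counted from the start position.

```sas
data substrn_example;
   length string1 $ 11;
   string1 = 'Hello World';
   first_five = substrn(string1, 1, 5);  /* start at position 1, take 5 characters: Hello */
   from_seven = substrn(string1, 7);     /* no length given: rest of the string, World */
run;
proc print data = substrn_example noobs;
run;
```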

————

TRIMN

This function removes the trailing spaces from a string.


data string_examples;
LENGTH string1 $ 7 ;
String1='Hello ';
length_string1 = lengthc(String1);
length_trimmed_string = lengthc(TRIMN(String1));
run;
proc print data = string_examples noobs;
run;
—————

Arrays in SAS are used to store and retrieve a series of values using an

index value. The index represents the location in a reserved memory area.

Syntax

In SAS an array is declared by using the following syntax −


ARRAY ARRAY-NAME(SUBSCRIPT) ($) VARIABLE-LIST ARRAY-VALUES

In the above syntax −

• ARRAY is the SAS keyword to declare an array.

• ARRAY-NAME is the name of the array which follows the same rule as

variable names.

• SUBSCRIPT is the number of values the array is going to store.

• ($) is an optional parameter to be used only if the array is going to store


character values.

• VARIABLE-LIST is the optional list of variables which are the place

holders for array values.

• ARRAY-VALUES are the actual values that are stored in the array. They

can be declared here or can be read from a file or dataline.
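
The syntax elements above can be sketched as follows. This is an illustrative example only: the data set name, the variable names and the weights are assumptions. It declares one array with initial values (ARRAY-VALUES) and one array over a variable list, then walks through both with an index.

```sas
data array_loop_example;
   /* Temporary array holding three initial values. */
   array weights(3) _temporary_ (0.2 0.3 0.5);
   /* Array over the variable list s1, s2, s3. */
   array scores(3) s1-s3;
   input s1 s2 s3;
   weighted = 0;
   do i = 1 to dim(scores);   /* dim() returns the number of elements */
      weighted = weighted + weights(i) * scores(i);
   end;
   drop i;
   datalines;
80 90 70
60 75 85
;
run;
proc print data = array_loop_example;
run;
```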

—————
DATA array_example;
INPUT a1 $ a2 $ a3 $ a4 $ a5 $;
ARRAY colours(5) $ a1-a5;
mix = a1||'+'||a2;
DATALINES;
yellow pink orange green blue
;
RUN;
PROC PRINT DATA = array_example;
RUN;
————
DATA array_example_OF;
INPUT A1 A2 A3 A4;
ARRAY A(4) A1-A4;
A_SUM = SUM(OF A(*));
A_MEAN = MEAN(OF A(*));
A_MIN = MIN(OF A(*));
DATALINES;
21 4 52 11
96 25 42 6
;
RUN;
PROC PRINT DATA = array_example_OF;
RUN;
—————
DATA array_in_example;
INPUT A1 $ A2 $ A3 $ A4 $;
ARRAY COLOURS(4) A1-A4;
IF 'yellow' IN COLOURS THEN available = 'Yes';ELSE available = 'No';
DATALINES;
Orange pink violet yellow
;
RUN;
PROC PRINT DATA = array_in_example;
RUN;
====================
DATA MYDATA1;
input x 6.; /* maximum width of the data */
format x 6.3;
datalines;
8722
93.2
.1122
15.116
;
RUN;
PROC PRINT DATA = MYDATA1;
RUN;
RESULT: 8722.0, 93.200, 0.112, 15.116
———————
DATA MYDATA2;
input x 6.; /* maximum width of the data */
format x 5.2;
datalines;
8722
93.2
.1122
15.116
;
RUN;
PROC PRINT DATA = MYDATA2;
RUN;
RESULT: 8722, 93.20, 0.11, 15.12
===============
DATA MYDATA3;
input x 6.; /* maximum width of the data */
format x DOLLAR10.2;
datalines;
8722
93.2
.1122
15.116
;
RUN;
PROC PRINT DATA = MYDATA3;
RUN;
RESULT: $8,722.00, $93.20, $0.11, $15.12
===============
ARITHMETIC
DATA MYDATA1;
input @1 COL1 5.2 @7 COL2 3.1; /* width 5 so that 11.21 is read in full */
Add_result = COL1+COL2;
Sub_result = COL1-COL2;
Mult_result = COL1*COL2;
Div_result = COL1/COL2;
Expo_result = COL1**COL2;
datalines;
11.21 5.3
3.11 11
;
PROC PRINT DATA = MYDATA1;
RUN;
——
RESULT 16.51, 6.1
===========
LOGICAL OPERATOR
DATA MYDATA1;
input @1 COL1 5.2 @7 COL2 4.1;
and_=(COL1 > 10 & COL2 > 5 );
or_ = (COL1 > 12 | COL2 > 15 );
not_ = ~( COL2 > 7 );
datalines;
11.21 5.3
3.11 11.4
;
PROC PRINT DATA = MYDATA1;
RUN;
============
DATA MYDATA1;
input @1 COL1 5.2 @7 COL2 4.1;
EQ_ = (COL1 = 11.21);
NEQ_= (COL1 ^= 11.21);
GT_ = (COL2 >= 8);
LT_ = (COL2 <= 12);
IN_ = COL2 in( 6.2,5.3,12 );
datalines;
11.21 5.3
3.11 11.4
;
PROC PRINT DATA = MYDATA1;
RUN;
==========

Operators Precedence

Operator precedence indicates the order in which the operators in a
complex expression are evaluated. The table below lists the groups in
order of priority (Group I is evaluated first) and gives the order of
evaluation within each group.

Group      Order           Symbols
Group I    Right to Left   ** + - NOT MIN MAX   (+ and - as prefix operators)
Group II   Left to Right   * /
Group III  Left to Right   + -
Group IV   Left to Right   ||
Group V    Left to Right   < <= = >= >
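
The effect of these groups can be checked with a small DATA step. This is only a sketch; the variable names are arbitrary and the values are written to the log with PUT.

```sas
data _null_;
   a = 2 + 3 * 4;     /* multiplication (Group II) runs before addition (Group III): 14 */
   b = 2 ** 3 ** 2;   /* Group I is evaluated right to left: 2 ** (3 ** 2) = 512 */
   c = (1 + 2 < 4);   /* addition runs before the comparison (Group V): 3 < 4 gives 1 */
   put a= b= c=;
run;
```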

=========
==========

SAS - Macros

SAS has a powerful programming feature called macros, which lets us avoid
repeating sections of code by defining them once and using them again
whenever needed.
Macros are declared at the beginning of a SAS program and called later
in the body of the program. Macro variables can be global or local in scope.

Global Macro Variables

They are called global macro variables because they can be accessed
anywhere in the current SAS session. In general they are system-assigned
variables which are accessed by multiple programs. A common example is
the system date. It should be noted that the macro processor is the SAS
system module that processes macros, and the SAS macro language is how
you communicate with the processor.

Example


proc print data = sashelp.cars;
where make = 'Audi' and type = 'Sports' ;
TITLE "Sales as of &SYSDAY &SYSDATE";
run;
=========

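User-defined macro variables are declared with the below syntax. A variable
created with %LET in open code, as in this example, is global; one created
inside a macro definition is normally local to that macro.

%LET macro_variable_name = value;

Here the value can be any numeric, text or date value as required
by the program. The macro variable name is any valid SAS name.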
%LET make_name = 'Audi';
%LET type_name = 'Sports';
proc print data = sashelp.cars;
where make = &make_name and type = &type_name ;
TITLE "Sales as of &SYSDAY &SYSDATE";
run;
==========

Macro Programs
Format:
/* Creating a macro program. */
%MACRO <macro name>(Param1, Param2, ..., Paramn);

Macro Statements;

%MEND;

/* Calling a macro program. */
%MacroName (Value1, Value2, ..., Valuen);
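
As a concrete sketch of this format, the macro below wraps the PROC PRINT example used earlier for macro variables. The macro name show_cars and its parameter names are assumptions made for illustration.

```sas
/* Define the macro: the two parameters become local macro variables. */
%macro show_cars(make_name, type_name);
   proc print data = sashelp.cars;
      /* Double quotes let the macro variables resolve inside the string. */
      where make = "&make_name" and type = "&type_name";
      title "Cars: &make_name &type_name";
   run;
%mend show_cars;

/* Call the macro with values for both parameters. */
%show_cars(Audi, Sports)
```

Passing the values as parameters means the same macro can be called again for another make or type without editing the PROC step.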
============

Macro %PUT

This macro statement writes text or macro variable information to

the SAS log. In the below example the value of the variable 'today' is

written to the program log.


data _null_;
CALL SYMPUT ('today',
TRIM(PUT("&sysdate"d,worddate22.)));
run;
Write the macro variable to the log as below
%put &today;

===

Macro %RETURN

Execution of this statement causes normal termination of the currently
executing macro when a certain condition evaluates to true. In the below
example, when the value of the variable "val" is 10 the macro terminates;
otherwise it continues.


%macro check_condition(val);
%if &val = 10 %then %return;
data p;
x = 34.2;
run;

%mend check_condition;

Calling the Macro like below


%check_condition(11) ;
======

Macro %END

This macro definition contains a %DO %WHILE loop that ends, as

required, with a %END statement. In the below example the macro

named test takes a user input and runs the DO loop using this

input value. The end of DO loop is achieved through the %end statement

while the end of macro is achieved through %mend statement.


%macro test(finish);
%let i = 1;
%do %while (&i <&finish);
%put the value of i is &i;
%let i=%eval(&i+1);
%end;
%mend test;
Calling the Macro like below
%test(5)

========
DATE TIME FORMAT

The informat width must cover the whole input value; a smaller width will
give an incorrect result. Since SAS V9 there is a generic date informat,
anydtdte15., which can read most common date layouts.

Input Date          Date width   Informat
03/11/2014          10           mmddyy10.
03/11/14            8            mmddyy8.
December 11, 2012   20           worddate20.
14mar2011           9            date9.
14-mar-2011         11           date11.
14-mar-2011         15           anydtdte15.

DATA TEMP;
INPUT @1 DOJ1 mmddyy10. @12 DOJ2 mmddyy10.;
format DOJ1 date11. DOJ2 worddate20. ;
DATALINES;
01/12/2012 02/11/1998
;
PROC PRINT DATA = TEMP;
RUN;
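
The generic informat from the table can be sketched as below. The data set name and the sample dates are assumptions; how an ambiguous value such as 03/11/2014 is interpreted (month first or day first) depends on the DATESTYLE system option of the session.

```sas
data any_dates;
   input doj anydtdte15.;   /* one informat for several date layouts */
   format doj date9.;       /* display every value the same way */
   datalines;
14mar2011
14-mar-2011
03/11/2014
;
run;
proc print data = any_dates;
run;
```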

========

SAS - Read Raw Data

SAS can read data from various sources, which include many file formats.

The file formats used in the SAS environment are listed below.

• ASCII(Text) Data Set

• Delimited Data

• Excel Data

• Hierarchical Data

========

Example

In the below example we read the data file named emp_data.txt from

the local environment.


data TEMP;
infile
'/folders/myfolders/sasuser.v94/TutorialsPoint/emp_data.txt';
input empID empName $ Salary Dept $ DOJ :date9. ; /* colon: read DOJ as list input with the date9. informat */
format DOJ date9.;
run;
PROC PRINT DATA = TEMP;
RUN;

Reading Delimited Data


In this case we use the dlm option in the infile statement

Example

In the below example we read the data file named emp.csv from the local

environment.
data TEMP;
infile
'/folders/myfolders/sasuser.v94/TutorialsPoint/emp.csv' dlm=",";
input empID empName $ Salary Dept $ DOJ :date9. ; /* colon: stop at the delimiter instead of reading a fixed 9 columns */
format DOJ date9.;
run;
PROC PRINT DATA = TEMP;
RUN;
======

Reading Excel Data

SAS can directly read an Excel file using the import facility. As seen in the
chapter SAS data sets, it can handle a wide variety of file types including
MS Excel. The example assumes the file emp.xls is available locally in the
SAS environment.

Example
FILENAME REFFILE
"/folders/myfolders/TutorialsPoint/emp.xls"
TERMSTR = CR;

PROC IMPORT DATAFILE = REFFILE
DBMS = XLS
OUT = WORK.IMPORT;
GETNAMES = YES;
RUN;
PROC PRINT DATA = WORK.IMPORT;
RUN;
The above code reads the data from the Excel file and gives the same output
as the above two file types.


======

PROC EXPORT - built-in procedure to write data sets out in a variety of
formats
SAS can write data sets in different formats. It can write data from SAS
files to normal text files, which can be read by other software
programs. SAS uses PROC EXPORT to write data sets.

We will use the SAS data set named cars available in the SASHELP library.

We export it as a space delimited text file with the code as shown in the

following program.
proc export data = sashelp.cars
outfile = '/folders/myfolders/sasuser.v94/TutorialsPoint/car_data.txt'
dbms = dlm;
delimiter = ' ';
run;

======

Writing a CSV file

In order to write a comma delimited file we can use the dbms option with a
value "csv". The following code writes the file car_data.csv.


proc export data = sashelp.cars
outfile = '/folders/myfolders/sasuser.v94/TutorialsPoint/car_data.csv'
dbms = csv;
run;
=======

Writing a tab delimited file

In order to write a tab delimited file we can use the dbms option with a
value "tab". The following code writes the file car_tab.txt.
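
The code for this case is not included in these notes. A sketch that follows the same pattern as the CSV export above, assuming the same output folder, could look like this:

```sas
/* Same pattern as the CSV export, with dbms = tab and a .txt target file. */
proc export data = sashelp.cars
   outfile = '/folders/myfolders/sasuser.v94/TutorialsPoint/car_tab.txt'
   dbms = tab;
run;
```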

=====


SAS - Concatenate Data Sets


Multiple SAS data sets can be concatenated to give a single data set using

the SET statement. The total number of observations in the concatenated

data set is the sum of the number of observations in the original data

sets. The order of observations is sequential. All observations from the


first data set are followed by all observations from the second data set,

and so on.

Ideally all the combining data sets have same variables, but in case they

have different number of variables, then in the result all the variables

appear, with missing values for the smaller data set.

Syntax

The basic syntax for SET statement in SAS is −


SET data-set1 data-set2 data-set3 ...;

Following is the description of the parameters used −

• data-set1, data-set2, ... are data set names written one after another.

Example

Consider the employee data of an organization which is available in two
different data sets, one for the IT department and another for the non-IT
departments. To get the complete details of all the employees we
concatenate both the data sets using the SET statement as shown below.
DATA ITDEPT;
INPUT empid name $ salary ;
DATALINES;
1 Rick 623.3
3 Mike 611.5
6 Tusar 578.6
;
RUN;
DATA NON_ITDEPT;
INPUT empid name $ salary ;
DATALINES;
2 Dan 515.2
4 Ryan 729.1
5 Gary 843.25
7 Pranab 632.8
8 Rasmi 722.5
;
RUN;
DATA All_Dept;
SET ITDEPT NON_ITDEPT;
RUN;
PROC PRINT DATA = All_Dept;
RUN;


Scenarios

When the data sets being concatenated differ in structure, the variables
in the result can differ, but the total number of observations in the
concatenated data set is always the sum of the observations in each data
set. We consider several such scenarios below.

Different number of variables

If one of the original data sets has more variables than the other,
the data sets still get combined, but for the observations from the
smaller data set those variables appear as missing.

Example

In below example the first data set has an extra variable named DOJ. In

the result the value of DOJ for second data set will appear as missing.
DATA ITDEPT;
INPUT empid name $ salary DOJ date9. ;
DATALINES;
1 Rick 623.3 02APR2001
3 Mike 611.5 21OCT2000
6 Tusar 578.6 01MAR2009
;
RUN;
DATA NON_ITDEPT;
INPUT empid name $ salary ;
DATALINES;
2 Dan 515.2
4 Ryan 729.1
5 Gary 843.25
7 Pranab 632.8
8 Rasmi 722.5
;
RUN;
DATA All_Dept;
SET ITDEPT NON_ITDEPT;
RUN;
PROC PRINT DATA = All_Dept;
RUN;


Different variable name

In this scenario the data sets have the same number of variables but a
variable name differs between them. In that case a normal concatenation
will produce all the variables in the result set, with missing values for
the two variables which differ. While we may not change the variable
name in the original data sets, we can apply the RENAME= data set option
in the concatenated data set we create. That will produce the same result
as a normal concatenation, but with one new variable name in place
of the two different variable names present in the original data sets.

Example

In the below example data set ITDEPT has the variable
name ename whereas the data set NON_ITDEPT has the variable
name empname. Both of these variables hold the same kind of value
(character). We apply the RENAME= option in the SET statement as
shown below.
DATA ITDEPT;
INPUT empid ename $ salary ;
DATALINES;
1 Rick 623.3
3 Mike 611.5
6 Tusar 578.6
;
RUN;
DATA NON_ITDEPT;
INPUT empid empname $ salary ;
DATALINES;
2 Dan 515.2
4 Ryan 729.1
5 Gary 843.25
7 Pranab 632.8
8 Rasmi 722.5
;
RUN;
DATA All_Dept;
SET ITDEPT(RENAME =(ename = Employee) ) NON_ITDEPT(RENAME
=(empname = Employee) );
RUN;
PROC PRINT DATA = All_Dept;
RUN;


Different variable lengths

If the length of a variable differs between the two data sets, the
concatenated data set can contain truncated values for that variable.
This happens if the first data set has the smaller length, because the
first data set determines the length. To solve this we apply the larger
length to both data sets as shown below.

Example

In the below example the variable ename is of length 5 in the first data
set and 7 in the second. When concatenating we apply the LENGTH
statement in the concatenated data set to set the ename length to 7.


DATA ITDEPT;
INPUT empid 1-2 ename $ 3-7 salary 8-14 ;
DATALINES;
1 Rick 623.3
3 Mike 611.5
6 Tusar 578.6
;
RUN;
DATA NON_ITDEPT;
INPUT empid 1-2 ename $ 3-9 salary 10-16 ;
DATALINES;
2 Dan    515.2
4 Ryan   729.1
5 Gary   843.25
7 Pranab 632.8
8 Rasmi  722.5
;
RUN;
DATA All_Dept;
LENGTH ename $ 7 ;
SET ITDEPT NON_ITDEPT ;
RUN;
PROC PRINT DATA = All_Dept;
RUN;
=======

Merge Data Sets by common field


Multiple SAS data sets can be merged based on a specific common
variable to give a single data set. This is done using
the MERGE statement and BY statement.

The total number of observations in the merged data set is often less than
the sum of the number of observations in the original data sets. This is
because the variables from both data sets are merged into one record
when there is a match in the value of the common variable.

There are two prerequisites for merging data sets, given below −

• input data sets must have at least one common variable to merge on.

• input data sets must be sorted by the common variable(s) that will be
used to merge on.
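
When the inputs are not already in order, the second prerequisite can be met with PROC SORT before the merge. The sketch below uses small made-up data sets (the names salary_unsorted, dept_unsorted and so on are assumptions) to show the sequence.

```sas
data salary_unsorted;
   input empid name $ salary;
   datalines;
3 Mike 611.5
1 Rick 623.3
2 Dan 515.2
;
run;

data dept_unsorted;
   input empid dept $;
   datalines;
2 OPS
3 IT
1 IT
;
run;

/* Sort both inputs by the common variable before merging. */
proc sort data = salary_unsorted out = salary_sorted;
   by empid;
run;
proc sort data = dept_unsorted out = dept_sorted;
   by empid;
run;

data all_details_sorted;
   merge salary_sorted dept_sorted;
   by empid;
run;
proc print data = all_details_sorted;
run;
```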

Data Merging
# Data set 1
ID NAME SALARY
1 Rick 623.3
2 Dan 515.2
3 Mike 611.5
4 Ryan 729.1
5 Gary 843.25
6 Tusar 578.6
7 Pranab 632.8
8 Rasmi 722.5

# Data set 2
ID DEPT
1 IT
2 OPS
3 IT
4 HR
5 FIN
6 IT
7 OPS
8 FIN

# Merged data set


ID NAME SALARY DEPT
1 Rick 623.3 IT
2 Dan 515.2 OPS
3 Mike 611.5 IT
4 Ryan 729.1 HR
5 Gary 843.25 FIN
6 Tusar 578.6 IT
7 Pranab 632.8 OPS
8 Rasmi 722.5 FIN

The above result is achieved by using the following code in which the

common variable (ID) is used in the BY statement. Please note that the

observations in both the datasets are already sorted in ID column.


DATA SALARY;
INPUT empid name $ salary ;
DATALINES;
1 Rick 623.3
2 Dan 515.2
3 Mike 611.5
4 Ryan 729.1
5 Gary 843.25
6 Tusar 578.6
7 Pranab 632.8
8 Rasmi 722.5
;
RUN;
DATA DEPT;
INPUT empid dEPT $ ;
DATALINES;
1 IT
2 OPS
3 IT
4 HR
5 FIN
6 IT
7 OPS
8 FIN
;
RUN;
DATA All_details;
MERGE SALARY DEPT;
BY empid;
RUN;
PROC PRINT DATA = All_details;
RUN;

Missing Values in the Matching Column

There may be cases when some values of the common variable will not

match between the data sets. In such cases the data sets still get merged

but give missing values in the result.

Example
Consider the case of employee ID 3 missing from the data set SALARY and
employee ID 6 missing from the data set DEPT. When the above code is
applied, we get the below result.

ID NAME SALARY DEPT
1 Rick 623.3 IT
2 Dan 515.2 OPS
3 . . IT
4 Ryan 729.1 HR
5 Gary 843.25 FIN
6 Tusar 578.6 .
7 Pranab 632.8 OPS
8 Rasmi 722.5 FIN

Merging only the Matches

To avoid the missing values in the result we can keep only the
observations with matched values for the common variable. That is
achieved by using the IN= data set option in the MERGE statement of the
SAS program.

Example

In the below example, the IN= variables a and b are used to keep only the
observations where the values from both the data sets SALARY and DEPT match.
DATA All_details;
MERGE SALARY(IN = a) DEPT(IN = b);
BY empid;
IF a = 1 and b = 1; /* keep a record only if it is present in both SALARY and DEPT */
RUN;
PROC PRINT DATA = All_details;
RUN;

Upon execution of the above SAS program with the above changed part,

we get the following output.


1 Rick 623.3 IT
2 Dan 515.2 OPS
4 Ryan 729.1 HR
5 Gary 843.25 FIN
7 Pranab 632.8 OPS
8 Rasmi 722.5 FIN
=======

Subsetting Data Sets

Subsetting a SAS data set means extracting a part of the data set by
selecting fewer variables, fewer observations, or both. Subsetting of
variables is done using the KEEP and DROP statements, while subsetting of
observations is done using the DELETE statement.
The resulting data from the subsetting operation is held in a new data
set which can be used for further analysis. Subsetting is mainly used to
analyze a part of the data set without the variables or observations
that are not relevant to the analysis.

Example

Consider the below SAS data set containing the employee details of an

organization. If we are interested only in getting the Name and

Department values from the data set, then we can use the below code.
DATA Employee;
INPUT empid ename $ salary DEPT $ ;
DATALINES;
1 Rick 623.3 IT
2 Dan 515.2 OPS
3 Mike 611.5 IT
4 Ryan 729.1 HR
5 Gary 843.25 FIN
6 Tusar 578.6 IT
7 Pranab 632.8 OPS
8 Rasmi 722.5 FIN
;
RUN;
DATA OnlyDept;
SET Employee;
KEEP ename DEPT; /* equivalent: DROP empid salary; */
RUN;
PROC PRINT DATA = OnlyDept;
RUN;
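
Subsetting of observations with the DELETE statement can be sketched in the same way. The example below is an illustration on sashelp.cars; the data set name big_engines and the threshold of 300 are assumptions.

```sas
data big_engines;
   set sashelp.cars;
   /* Drop every observation that fails the test; the rest are written out. */
   if horsepower < 300 then delete;
   /* KEEP can be combined with DELETE to subset variables as well. */
   keep make model horsepower;
run;
proc print data = big_engines;
run;
```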

======

Sort Data Sets


Data sets in SAS can be sorted on any of the variables present in them.
This helps both in data analysis and in performing other operations like
merging. Sorting can happen on a single variable as well as on multiple
variables. The SAS procedure used to carry out the sorting is named
PROC SORT. When the OUT= option is used, the result after sorting is
stored in a new data set and the original data set remains unchanged.
DATA Employee;
INPUT empid name $ salary DEPT $ ;
DATALINES;
1 Rick 623.3 IT
2 Dan 515.2 OPS
3 Mike 611.5 IT
4 Ryan 729.1 HR
5 Gary 843.25 FIN
6 Tusar 578.6 IT
7 Pranab 632.8 OPS
8 Rasmi 722.5 FIN
;
RUN;

PROC SORT DATA = Employee OUT = Sorted_sal ;


BY salary;
RUN ;
Reverse order
PROC SORT DATA = Employee OUT = Sorted_sal_reverse ;
BY DESCENDING salary;
RUN ;

PROC PRINT DATA = Sorted_sal;
RUN ;

Sorting Multiple Variables


PROC SORT DATA = Employee OUT = Sorted_dept_sal ;
BY salary DEPT;
RUN ;
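
Ascending and descending keys can also be mixed in one BY statement. A minimal sketch on sashelp.cars follows (the output data set name is an assumption): DESCENDING applies only to the variable that follows it.

```sas
/* Sort by car type (ascending), and within each type by horsepower, highest first. */
proc sort data = sashelp.cars out = cars_by_type_hp;
   by type descending horsepower;
run;
proc print data = cars_by_type_hp;
   var type make model horsepower;
run;
```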

======

Format Data Sets


DATA Employee;
INPUT empid name $ salary DEPT $ ;
format name $upcase9. ;
/* DATALINES and RUN as in the Employee example above */

We can also use PROC FORMAT with a VALUE statement to format data.
proc format;
value $DEP 'IT' = 'Information Technology'
'OPS'= 'Operations' ;
RUN;
PROC PRINT DATA = Employee;
format name $upcase9. DEPT $DEP.;
RUN;

=======

SAS - SQL
==========

SAS - ODS

The output from a SAS program can be converted to more user
friendly forms like .html or PDF. This is done by using
the ODS statement available in SAS. ODS stands for Output Delivery
System. It is mostly used to format the output data of a SAS program into
reports which are easy to read and understand. That also helps in
sharing the output with other platforms and software. It can also
combine the results from multiple PROC steps in one single
file.

Common destinations: ODS HTML, ODS RTF, ODS PDF

ODS HTML
PATH = '/folders/myfolders/sasuser.v94/TutorialsPoint/'
FILE = 'CARS2.html'
STYLE = EGDefault;
proc SQL;
select make, model, invoice
from sashelp.cars
where make in ('Audi','BMW')
and type = 'Sports'
;
quit;

proc SQL;
select make,mean(horsepower)as meanhp
from sashelp.cars
where make in ('Audi','BMW')
group by make;
quit;
ODS HTML CLOSE;

When the above code is executed we get the following result −

https://www.tutorialspoint.com/sas/sas_output_delivery_system.htm
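
The other destinations follow the same open and close pattern as ODS HTML. A sketch for ODS PDF is given below; the file path is hypothetical, and the two PROC steps only illustrate that several results end up in one file.

```sas
/* Hypothetical output path: adjust to a folder that exists on your system. */
ods pdf file = '/folders/myfolders/cars_report.pdf';

proc print data = sashelp.cars (obs = 5);
run;

proc means data = sashelp.cars mean;
   var horsepower;
run;

/* Closing the destination writes both results into the single PDF file. */
ods pdf close;
```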
========

SAS Simulation Studio

The SAS software component which is used for creating SAS simulations is
called SAS Simulation Studio. Its graphical user interface provides a full
set of tools for building, executing, and analyzing the results of discrete
event simulation models.

The different types of statistical distributions and tasks to which SAS
simulation can be applied are listed below.

• SIMULATE DATA FROM A CONTINUOUS DISTRIBUTION

• SIMULATE DATA FROM A DISCRETE DISTRIBUTION

• SIMULATE DATA FROM A MIXTURE OF DISTRIBUTIONS

• SIMULATE DATA FROM A COMPLEX DISTRIBUTION

• SIMULATE DATA FROM A MULTIVARIATE DISTRIBUTION

• APPROXIMATE A SAMPLING DISTRIBUTION

• ASSESS REGRESSION ESTIMATES

Histograms
A histogram is a graphical display of data using bars of different heights.
It groups the numbers in the data set into ranges. It also represents an
estimate of the probability distribution of a continuous variable.
In SAS, PROC UNIVARIATE is used to create histograms with the options
below.

Example

In the below example, we consider the minimum and maximum values of


the variable horsepower and take a range of 50. So the values form a

group in steps of 50.


proc univariate data = sashelp.cars;
histogram horsepower
/ midpoints = 176 to 350 by 50;
run;

Histogram with Curve Fitting
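
No code is given for this case in these notes. One way to sketch it, assuming a normal curve is the fit wanted, is to add the NORMAL option to the HISTOGRAM statement of the previous example:

```sas
proc univariate data = sashelp.cars noprint;
   /* NORMAL overlays a fitted normal curve; mu and sigma are estimated from the data. */
   histogram horsepower
      / normal (mu = est sigma = est)
        midpoints = 176 to 350 by 50;
run;
```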

=========

Stacked Bar chart

A stacked bar chart is a bar chart in which the bars for one variable of the
data set are subdivided by the values of another variable.

Example

The below script will create a stacked bar chart where the lengths of the
cars are counted for each car type. We use the group option to specify
the second variable. The data set work.cars1 is assumed to have been
created earlier from the car data.


proc SGPLOT data = work.cars1;
vbar length /group = type ;
title 'Lengths of Cars by Types';
run;
quit;

Clustered Bar chart

The clustered bar chart is created to show how the values of a variable
are spread across the categories of another variable.

Example

The below script will create a clustered bar chart where the length of the
cars is clustered around the car type. So we see two adjacent bars at
length 191, one for the car type 'Sedan' and another for the car type
'Wagon'.
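
The script itself is missing from these notes. A sketch under the assumption that the chart is drawn with PROC SGPLOT, as for the stacked bar chart, is shown below. It reads sashelp.cars directly and restricts it to two car types, so the bars may differ from the original example.

```sas
proc sgplot data = sashelp.cars;
   where type in ('Sedan', 'Wagon');
   /* GROUPDISPLAY = CLUSTER places the group bars side by side instead of stacking them. */
   vbar length / group = type groupdisplay = cluster;
   title 'Cluster of Cars by Types';
run;
```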
======

Pie Charts
A pie-chart is a representation of values as slices of a circle with
different colors. The slices are labeled and the numbers
corresponding to each slice is also represented in the chart.
Labels mentioned outside the circle.
In SAS the pie chart is created using PROC TEMPLATE

Pie Chart with Data Labels - % include fractions … Labels(car type)

specified inside the circle.


In this pie chart we show both the fractional value and the percentage
value for each slice. We also move the labels inside the chart. The
appearance of the chart is modified using the DATASKIN option, which
applies one of the built-in styles available in the SAS environment.

Grouped Pie Chart - two full circles (car maker name), one inside the other, showing the types of cars sold in percentages


In this pie chart the value of the variable presented in the graph is
grouped with respect to another variable of the same data set. Each group
becomes one circle and the chart has as many concentric circles as the
number of groups available.
======

Scatter Plots
A scatterplot is a type of graph which uses values from two variables
plotted in a Cartesian plane. It is usually used to find out the relationship
between two variables. In SAS we use PROC SGSCATTER to create
scatterplots.
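
A minimal sketch, assuming the sashelp.cars data and the variables
horsepower and weight (the choice of variables and the grouping by car
type are illustrative):

```sas
title 'Horsepower vs. Weight by Car Type';
proc sgscatter data = sashelp.cars;
   /* horsepower on the vertical axis, weight on the horizontal axis */
   plot horsepower * weight / group = type grid;
run;
title;   /* clear the title so it does not carry over to later output */
```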
======

Box Plots
A simple boxplot is created using PROC SGPLOT, and a paneled boxplot is
created using PROC SGPANEL.
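
A minimal sketch of a simple boxplot, assuming the sashelp.cars data
with one box of horsepower per car type (the variables are an
illustrative choice):

```sas
proc sgplot data = sashelp.cars;
   /* one vertical box per value of type */
   vbox horsepower / category = type;
run;
```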

Boxplot in Vertical Panels


Boxplot in Horizontal Panels
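
The two panel layouts named above can be sketched with PROC SGPANEL.
This assumes the sashelp.cars data paneled by origin, and it assumes
that "vertical" means panels stacked in one column and "horizontal"
means panels placed in one row; the original examples may have used
different variables.

```sas
/* vertical panels: one column, panels stacked top to bottom */
proc sgpanel data = sashelp.cars;
   panelby origin / columns = 1 novarname;
   vbox horsepower / category = type;
run;

/* horizontal panels: one row, panels side by side */
proc sgpanel data = sashelp.cars;
   panelby origin / rows = 1 novarname;
   vbox horsepower / category = type;
run;
```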

=========

Arithmetic Mean

In the example below we find the mean of all the numeric variables in the
SAS dataset named CARS. We set the maximum number of digits after the
decimal point to 2 and also find the sum of those variables.


PROC MEANS DATA = sashelp.CARS Mean SUM MAXDEC=2;
RUN;
=========

Standard Deviation

Standard deviation (SD) is a measure of how varied the data in a data
set is. Mathematically, it measures how far each value is from the
mean value of the data set. A standard deviation value close to 0 indicates
that the data points tend to be very close to the mean of the data set, and
a high standard deviation indicates that the data points are spread out
over a wider range of values.

PROC MEANS

To measure the SD using PROC MEANS we choose the STD option in the
PROC step. It reports the SD for each numeric variable present
in the data set.

Syntax

The basic syntax for calculating standard deviation in SAS is −


PROC means DATA = dataset STD;
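
A minimal sketch of this syntax in use, assuming the sashelp.cars data
and restricting the output to two variables with a VAR statement (the
variables and MAXDEC = 2 are illustrative choices):

```sas
proc means data = sashelp.cars std maxdec = 2;
   /* without a VAR statement, every numeric variable is reported */
   var horsepower weight;
run;
```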
========

QUESTIONS AND ANSWERS

The SCAN function

The SCAN function in SAS provides a simple and convenient way to parse
out words from character strings. It can be used to select individual
words from text, or from variables that contain text, and then store those
words in new variables.
The example below uses DATA _NULL_, so no dataset is created; it only
shows how the SCAN function works.

data _null_;
text = "I love SAS Programming";
result = scan(text,2);
put result=;
run;
RESULT: result=love

———

When you specify a negative number in the second argument of the
function (the number of the word to return), SAS starts scanning from the
right. For example, -1 means the last word of the string.

Since we wish to find the second-to-last word in the string, we pass -2
as the second argument of the SCAN function.

data _null_;
text = "I love SAS Programming";
result = scan(text,-2);
put result=;
run;

RESULT: result=SAS

The SAS program returns "SAS" as the second-to-last word.
There are two ways to scan from right to left in the SCAN function: a
negative count in the second argument, as above, or the B modifier in the
fourth argument.
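
The notes do not show the second way. As far as the SAS documentation
describes it, the B modifier in the fourth argument makes SCAN count
words from the right even when the count is positive. A minimal sketch
under that assumption, using a blank as the only delimiter:

```sas
data _null_;
   text = "I love SAS Programming";
   /* 'b' scans backward, so the positive count 2 is taken from the right */
   result = scan(text, 2, ' ', 'b');
   put result=;
run;
```

This is meant to return the same word as scan(text, -2).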

—————
SCAN : Convert a String into Multiple Observations
Suppose you have a string that consists of multiple substrings delimited
by commas, and you wish to transform it into multiple observations
(rows).
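data readin;
   input text $30.;
   datalines;
live, love, laugh, repeat
;
run;

data readin2(keep=word);
   set readin;
   /* one iteration per comma-separated word */
   do i = 1 to countw(text, ',');
      /* strip() removes the blank that follows each comma */
      word = strip(scan(text, i, ','));
      output;
   end;
run;

proc print data = readin2;
run;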

Explanation
1. A DO loop is started with the variable 'i' iterating from 1 to the number
of words in the 'text' variable, where words are separated by commas. The
count comes from the COUNTW function, which counts the words in the
string, not the characters.

2. Within the loop, the SCAN function is used to extract each word from the
'text' variable based on the current value of 'i' and the comma delimiter.
The extracted word is then assigned to the 'word' variable.

3. The OUTPUT statement stores each word as a separate observation in
the new dataset named 'readin2'. It runs inside the loop.

4. In the new dataset named 'readin2', we keep only the variable 'word'.
The variables 'i' and 'text' are not retained.

======

Difference between CALL SYMPUT and CALL SYMPUTX

CALL SYMPUTX does not generate a note in the SAS log when the
second argument is numeric, whereas CALL SYMPUT produces a log note
stating that numeric values have been converted to character values. CALL
SYMPUTX also removes both leading and trailing blanks.

What does CALL SYMPUTX do in SAS?

SYMPUTX assigns a value to a macro variable and removes both leading
and trailing blanks.

What is the syntax of SYMPUT in SAS?

The syntax for SYMPUT is the following: CALL SYMPUT(macrovar, value);
The first argument gives the name of the macro variable being created or
assigned a value, and the second argument provides the value.
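
The difference between the two routines can be seen in a short sketch.
The macro variable names m1 and m2 are hypothetical, and the brackets in
the %PUT statement are only there to make blanks visible in the log:

```sas
data _null_;
   x = 123;
   call symput('m1', x);    /* numeric value converted to character; a note is written to the log */
   call symputx('m2', x);   /* no conversion note; leading and trailing blanks removed */
run;

/* compare the two values, including any leading blanks */
%put [&m1] [&m2];
```

With CALL SYMPUT the numeric value is converted with a 12-character
format, so the value in m1 is expected to carry leading blanks, while m2
holds just the digits.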

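TIPS

SYMPUT - numeric to character (with a conversion note in the log)

SYMPUTX - numeric to character (no note; leading and trailing blanks removed)
PUT = NtoC
INPUT = CtoN
DO.. WHILE - The condition is checked at the top of the loop. The loop
continues until the condition is false
DO.. UNTIL - The condition is checked at the end of each iteration, so the
loop executes at least once. The loop continues until the condition is TRUE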
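
A minimal sketch of the difference between the two loops, using a
starting value for which both conditions say "stop" immediately (the
variable i and the limit 5 are illustrative):

```sas
data _null_;
   i = 10;
   do while (i < 5);      /* tested at the top: the body never runs */
      put 'WHILE: ' i=;
      i = i + 1;
   end;

   i = 10;
   do until (i >= 5);     /* tested at the bottom: the body runs once */
      put 'UNTIL: ' i=;
      i = i + 1;
   end;
run;
```

Only the UNTIL loop writes a line to the log, because its condition is
not checked until the first pass is complete.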
———-
If you use an explicit OUTPUT statement inside a loop, it writes one
observation for each iteration, instead of the single observation that is
written at the end of the DATA step and shown by PROC PRINT.
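
A minimal sketch of this behavior, assuming a hypothetical dataset named
squares:

```sas
data squares;
   do i = 1 to 3;
      sq = i * i;
      output;     /* writes one observation per iteration */
   end;
run;

proc print data = squares;
run;
```

With the OUTPUT statement the dataset has three observations. If the
OUTPUT statement is removed, only one observation is written, at the end
of the DATA step, after the loop has finished.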

What is the BEST12. format in SAS?

The BEST12. format specification says to display a numeric value
using 12 characters. The BEST format attempts to determine the "best"
way to display that particular value in 12 characters, so a larger value
might use scientific notation.
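
A minimal sketch for trying this out, with one value that fits in 12
characters and one that does not (both values are illustrative):

```sas
data _null_;
   x = 1234.5678;           /* fits in 12 characters */
   y = 123456789012345;     /* 15 digits: too wide for 12 characters */
   put x= best12. y= best12.;
run;
```

The log shows how SAS chooses to write each value within the 12
available characters.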

———

What is the difference between formats and informats in SAS?


Informats are used to tell SAS how to read a variable, whereas
formats are used to tell SAS how to display or write the values of a variable.
Informats are mainly used when you read in sample data created with a
CARDS/DATALINES statement, or when you read or import data from
an external file (Text/Excel/CSV).
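
A minimal sketch that uses both, assuming a hypothetical dataset named
sales with a date and an amount read from DATALINES:

```sas
data sales;
   /* informats tell SAS how to READ the raw text */
   input saledate : mmddyy10. amount : comma10.;
   /* formats tell SAS how to DISPLAY the stored values */
   format saledate date9. amount dollar12.2;
   datalines;
01/15/2024 1,250.50
02/03/2024 980
;
run;

proc print data = sales;
run;
```

The MMDDYY10. and COMMA10. informats turn the text into numeric values,
and the DATE9. and DOLLAR12.2 formats control how those values are
printed.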
———
MACROS - are handled by the macro processor, a SAS SYSTEM MODULE
that turns macro code into SAS code before the compiler sees it.
SAS macros construct input for the SAS compiler. Some functions of the
SAS macro processor are to pass symbolic values between SAS
statements and steps, to establish default symbolic values, to
conditionally execute SAS steps, and to invoke very long, complex code in
a quick, short way. It should be noted that the macro processor is the SAS
system module that processes macros, and the SAS macro language is
how you communicate with the processor.
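
A minimal sketch of two of these functions, passing symbolic values into
a step and establishing a default value. The macro name show_means and
its parameters are hypothetical:

```sas
/* hypothetical macro: runs PROC MEANS on any dataset/variable pair */
%macro show_means(ds, var, stats = mean std);
   proc means data = &ds &stats maxdec = 2;
      var &var;
   run;
%mend show_means;

/* uses the default statistics (mean and std) */
%show_means(sashelp.cars, horsepower)

/* overrides the default with mean and sum */
%show_means(sashelp.cars, weight, stats = mean sum)
```

Each call expands into a complete PROC MEANS step, so a long block of
code can be invoked with a single short line.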
