Quick Reference: SAS Programming 1: Essentials
LIBNAME output-libref 'physical-file-name';
PROC COPY IN=input-libref OUT=output-libref;
   SELECT input-data-set1 input-data-set2;
RUN;
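As a minimal sketch of the syntax above, the following step copies two data sets from the SASHELP library. The libref MYCOPY is a hypothetical name; to keep the example runnable without a site-specific path, it is pointed at the physical location of the WORK library by using the PATHNAME function.

```sas
/* MYCOPY is a hypothetical libref. It points at the WORK folder so   */
/* that the example does not depend on a directory that may not exist. */
libname mycopy "%sysfunc(pathname(work))";

/* Copy only the CLASS and CARS data sets from SASHELP to MYCOPY. */
proc copy in=sashelp out=mycopy;
   select class cars;
run;
```

In practice the LIBNAME statement would normally name a permanent directory instead of the WORK location.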
Creating Variables
variable=expression;
Copyright 2010 SAS Institute Inc., Cary, NC, USA. All rights reserved.
Functions
WEEKDAY(SAS-date)
YEAR(SAS-date)
QTR(SAS-date)
MONTH(SAS-date)
TODAY()
MDY(month, day, year)
UPCASE(argument)
SUM(argument1, argument2, ...)
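The functions listed above can be combined with assignment statements in a DATA step. The sketch below is illustrative only: the data set name WORK.FUNC_DEMO and every variable name are assumptions, not names from this reference.

```sas
/* Hypothetical data set and variable names, used only for illustration. */
data work.func_demo;
   Hired       = mdy(3, 15, 2010);   /* SAS date value for March 15, 2010 */
   HireYear    = year(Hired);        /* 2010 */
   HireQtr     = qtr(Hired);         /* 1 */
   HireMonth   = month(Hired);       /* 3 */
   HireWeekday = weekday(Hired);     /* 1=Sunday through 7=Saturday */
   CurrentDate = today();            /* date on which the step runs */
   LastName    = upcase('smith');    /* SMITH */
   Total       = sum(100, 50, 25);   /* 175 */
   format Hired CurrentDate date9.;
run;

proc print data=work.func_demo;
run;
```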
Printing Data
PROC PRINT DATA=SAS-data-set <option(s)>;
   VAR variable(s);
   BY BY-variable(s);
RUN;

PROC MEANS DATA=SAS-data-set <statistic(s)> <option(s)>;
   CLASS classification-variable(s);
   VAR analysis-variable(s);
RUN;

PROC SUMMARY DATA=SAS-data-set <statistic(s)> <option(s)>;
   VAR analysis-variable(s);
   CLASS classification-variable(s);
RUN;

FORMAT variable(s) format;

PROC TABULATE DATA=SAS-data-set <option(s)>;
   CLASS classification-variable(s);
   VAR analysis-variable(s);
   TABLE page-expression, row-expression, column-expression </ option(s)>;
   <additional statements>
RUN;
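To make the procedure syntax above concrete, here is a short sketch that uses the SASHELP.CLASS sample data set that ships with SAS. The choice of variables and statistics is an assumption made for illustration.

```sas
/* List selected variables. */
proc print data=sashelp.class;
   var Name Age Height;
run;

/* Summary statistics for each value of Sex. */
proc means data=sashelp.class n mean min max maxdec=1;
   class Sex;
   var Height Weight;
run;

/* Two-dimensional table: Age in the rows, mean Height by Sex in the columns. */
proc tabulate data=sashelp.class;
   class Sex Age;
   var Height;
   table Age, Sex*Height*mean;
run;
```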
Subsetting Data
WHERE where-expression;
PROC FORMAT;
   VALUE format-name value-or-range1='formatted-value1'
                     value-or-range2='formatted-value2'
                     ...;
RUN;
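A user-defined format is created once with PROC FORMAT and then applied with a FORMAT statement. The sketch below assumes a hypothetical format name, AGEGRP, and hypothetical labels, and applies them to the SASHELP.CLASS sample data set.

```sas
/* AGEGRP and its labels are hypothetical names chosen for this example. */
proc format;
   value agegrp
      low-12  = 'Child'
      13-15   = 'Teen'
      16-high = 'Older teen';
run;

/* Apply the format; the stored Age values are unchanged. */
proc print data=sashelp.class;
   var Name Age;
   format Age agegrp.;
run;
```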
Creating Graphs
GOPTIONS <options-list>;
ODS _ALL_ CLOSE;
PROC GPLOT DATA=SAS-data-set;
   PLOT vertical-variable*horizontal-variable </ option(s)>;
   SYMBOL<1...255> <options>;
RUN;
QUIT;
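A minimal scatter plot might look like the sketch below. It assumes that SAS/GRAPH is licensed and uses the SASHELP.CLASS sample data set; the symbol settings are arbitrary choices for illustration.

```sas
/* Reset graphics options to their defaults. */
goptions reset=all;

/* Plot each observation as a blue dot. */
symbol1 value=dot color=blue;

proc gplot data=sashelp.class;
   plot Weight*Height;   /* vertical variable * horizontal variable */
run;
quit;
```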
Operators

Arithmetic Operators

Operator   Action            Example         Priority
-          negative prefix   negative=-x;    I
**         exponentiation    raise=x**y;     I
*          multiplication    mult=x*y;       II
/          division          divide=x/y;     II
+          addition          sum=x+y;        III
-          subtraction       diff=x-y;       III

Logical Operators

Operator   Meaning
AND or &   and, both. If both expressions are true, then the compound expression is true.
OR or |    or, either. If either expression is true, then the compound expression is true.
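The priority groups determine the order of evaluation when parentheses are absent. The following sketch, with hypothetical variable names, writes the results to the log so that the effect of priority and of the logical operators can be checked.

```sas
data _null_;
   x = 3;
   y = 2;
   a = x + y * 2;             /* multiplication first: 3 + 4 = 7 */
   b = (x + y) * 2;           /* parentheses first: 5 * 2 = 10   */
   c = (x > 1 and y > 5);     /* only one condition is true: 0   */
   d = (x > 1 or y > 5);      /* at least one condition is true: 1 */
   put a= b= c= d=;           /* writes a=7 b=10 c=0 d=1 to the log */
run;
```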
Comparison Operators

Symbol(s)    Mnemonic   Definition
=            EQ         equal to
^= ¬= ~=     NE         not equal to
>            GT         greater than
<            LT         less than
>=           GE         greater than or equal to
<=           LE         less than or equal to
             IN         equal to one of a list
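Comparison operators are most often used in a WHERE statement to subset data. The sketch below uses the SASHELP.CLASS sample data set; the specific conditions are assumptions chosen for illustration.

```sas
/* Symbols: observations for females aged 13 or older. */
proc print data=sashelp.class;
   where Age >= 13 and Sex = 'F';
run;

/* Mnemonics and IN: ages 11 or 12, or the student named Alfred. */
proc print data=sashelp.class;
   where Age in (11, 12) or Name eq 'Alfred';
run;
```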
Format       Definition
$w.          writes standard character data.
w.d          writes standard numeric data.
COMMAw.d     writes numeric values with a comma that separates every three digits and a period that separates the decimal fraction.
COMMAXw.d    writes numeric values with a period that separates every three digits and a comma that separates the decimal fraction.
DOLLARw.d    writes numeric values with a leading dollar sign, a comma that separates every three digits, and a period that separates the decimal fraction.
EUROXw.d     writes numeric values with a leading euro symbol (€), a period that separates every three digits, and a comma that separates the decimal fraction.

Format       Stored Value   Displayed Value
MMDDYY6.     0              010160
MMDDYY8.     0              01/01/60
MMDDYY10.    0              01/01/1960
DDMMYY6.     365            311260
DDMMYY8.     365            31/12/60
DDMMYY10.    365            31/12/1960
DATE7.       -1             31DEC59
DATE9.       -1             31DEC1959
WORDDATE.    0              January 1, 1960
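Formats change only how a value is displayed, not what is stored. The sketch below writes a few stored values to the log with the standard SAS formats MMDDYY10., DDMMYY10., DATE9., COMMA, DOLLAR, and COMMAX; the variable names are hypothetical.

```sas
data _null_;
   d0     = 0;            /* January 1, 1960   */
   d365   = 365;          /* December 31, 1960 */
   dm1    = -1;           /* December 31, 1959 */
   amount = 1234567.891;

   put d0   mmddyy10.;        /* 01/01/1960    */
   put d365 ddmmyy10.;        /* 31/12/1960    */
   put dm1  date9.;           /* 31DEC1959     */
   put amount comma12.2;      /* 1,234,567.89  */
   put amount dollar13.2;     /* $1,234,567.89 */
   put amount commax12.2;     /* 1.234.567,89  */
run;
```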
Informat     Definition
$w.          reads standard character data.
w.d          reads standard numeric data.
COMMAw.d     reads nonstandard numeric data and removes embedded commas, blanks, dollar signs, percent signs, and dashes.
COMMAXw.d    reads nonstandard numeric data and removes embedded periods, blanks, dollar signs, percent signs, and dashes.
EUROXw.d     reads nonstandard numeric data and removes embedded characters in European currency.
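Informats work in the opposite direction from formats: they convert text into stored values. As a small sketch, the INPUT function below applies the COMMA and COMMAX informats to two character strings; the variable names are assumptions.

```sas
data _null_;
   us = input('$1,234.50', comma10.);   /* strips $ and commas: 1234.5      */
   eu = input('1.234,50', commax10.);   /* period groups, comma decimal: 1234.5 */
   put us= eu=;                         /* writes us=1234.5 eu=1234.5 to the log */
run;
```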
SAS Functions

SAS Date Functions

These date functions extract date information from the date value that SAS stores.

Date Function       Value Extracted        Value Returned
YEAR(SAS-date)      the year               a four-digit year
QTR(SAS-date)       the quarter            a number from 1 to 4
MONTH(SAS-date)     the month              a number from 1 to 12
DAY(SAS-date)       the day of the month   a number from 1 to 31
WEEKDAY(SAS-date)   the day of the week    a number from 1 to 7 (1=Sunday)

These date functions create a SAS date value.

Date Function       SAS Date Value Created
TODAY()             the current date
Statistical Functions

Function   Syntax                         Calculates
SUM        sum(argument, argument,...)    sum of values
MEAN       mean(argument, argument,...)   average of non-missing values
MIN        min(argument, argument,...)    minimum value
MAX        max(argument, argument,...)    maximum value
VAR        var(argument, argument,...)    variance of the values
STD        std(argument, argument,...)    standard deviation of the values
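These functions skip missing arguments, which is the main practical difference from the arithmetic operators. The sketch below, with hypothetical variable names, contrasts the SUM function with the + operator when one value is missing.

```sas
data _null_;
   q1 = 10;
   q2 = .;                         /* missing value */
   q3 = 30;

   total_fn = sum(q1, q2, q3);     /* 40: the missing value is ignored   */
   total_op = q1 + q2 + q3;        /* missing: + propagates missing values */
   average  = mean(q1, q2, q3);    /* 20: average of the two non-missing values */
   smallest = min(q1, q2, q3);     /* 10 */
   largest  = max(q1, q2, q3);     /* 30 */

   put total_fn= total_op= average= smallest= largest=;
run;
```

The assignment to total_op also causes SAS to write a note to the log stating that missing values were generated.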
PROC APPEND Compared with the SET Statement

Speed
   PROC APPEND:    Is faster because it does not process observations in the BASE= data set.
   SET statement:  Is slower because it processes all observations in all input data sets.

Number of data sets
   PROC APPEND:    Is limited to two input data sets in one PROC APPEND step.
   SET statement:  Can concatenate any number of input data sets in one DATA step.

Combining data sets that contain different variables
   PROC APPEND:    Uses all variables in the BASE= data set. If necessary, assigns missing values to observations from the DATA= data set. Drops any variables found only in the DATA= data set.
   SET statement:  Uses all variables in all input data sets. If necessary, assigns missing values.

PROC APPEND Handling of Mismatched Variables

DATA= data set variables have a different type than the variables in the BASE= data set.
   Replaces all values for the variable in the DATA= data set with missing values and keeps the variable type of the variable specified in the BASE= data set.

DATA= data set variables are longer than the variables in the BASE= data set.
   Truncates values from the DATA= data set to fit them into the length that is specified in the BASE= data set.
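The two approaches compared above can be sketched side by side. The WORK data set names below are hypothetical, and both inputs are built from the SASHELP.CLASS sample data set so that their variables match.

```sas
/* Build two hypothetical input data sets with identical variables. */
data work.girls work.boys;
   set sashelp.class;
   if Sex = 'F' then output work.girls;
   else output work.boys;
run;

/* SET statement: reads every observation from both input data sets. */
data work.all_set;
   set work.girls work.boys;
run;

/* PROC APPEND: adds the DATA= observations to the end of BASE=   */
/* without reading the observations that are already in BASE=.    */
data work.all_append;        /* copy, so WORK.GIRLS stays unchanged */
   set work.girls;
run;

proc append base=work.all_append data=work.boys;
run;
```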
PROC MEANS Statistic Keywords

Keyword           Description
CLM               two-sided confidence limit for the mean
CSS               corrected sum of squares
CV                coefficient of variation
KURTOSIS | KURT   kurtosis
LCLM              one-sided confidence limit below the mean
MAX               maximum value
MEAN              average
MIN               minimum value
N                 number of observations with non-missing values
NMISS             number of observations with missing values
RANGE             range
SKEWNESS | SKEW   skewness
STDERR            standard error of the mean
SUM               sum
SUMWGT            sum of the weight variable values
UCLM              one-sided confidence limit above the mean
USS               uncorrected sum of squares
VAR               variance

Keyword           Description
MEDIAN | P50      median or 50th percentile
P1                1st percentile
P5                5th percentile
P10               10th percentile
Q1 | P25          lower quartile or 25th percentile
Q3 | P75          upper quartile or 75th percentile
P90               90th percentile
P95               95th percentile
P99               99th percentile
QRANGE            difference between upper and lower quartiles: Q3-Q1

Keyword           Description
PROBT             probability of a greater absolute value for the t value
T                 Student's t for testing the hypothesis that the population mean is 0
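Statistic keywords are listed in the PROC MEANS statement to replace the default statistics. The sketch below requests a mix of descriptive, quantile, and hypothesis-testing keywords for the SASHELP.CLASS sample data set; the selection is arbitrary.

```sas
/* Request specific statistics instead of the defaults. */
proc means data=sashelp.class
           n mean median q1 q3 qrange clm t probt maxdec=2;
   class Sex;
   var Height;
run;
```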
PROC TABULATE Statistic Keywords

Keyword           Description
COLPCTN           percentage of a value in a single cell in relation to the total values in the column
COLPCTSUM         percentage of a sum in a single cell in relation to the total sum in the column
CSS               sum of squares corrected for the mean
CV                percent coefficient of variation
KURTOSIS | KURT   kurtosis
LCLM              one-sided confidence limit below the mean
MAX               maximum value
MEAN              average
MIN               minimum value
MODE              most frequent value
N                 number of observations with non-missing values
NMISS             number of observations with missing values
PAGEPCTN          percentage of a value in a single cell in relation to the total of the values in the page
PAGEPCTSUM        percentage of a sum in a single cell in relation to the total of the values in the page
PCTN              percentage that one frequency represents of another frequency (can specify a denominator definition)
PCTSUM            percentage that one sum represents of another sum (can specify a denominator definition)
RANGE             range
REPPCTN           percentage of a value in a single cell in relation to the total of the value in the report
REPPCTSUM         percentage of a sum in a single cell in relation to the total of the value in the report
ROWPCTN           percentage of a value in a single cell in relation to the total values in the row
ROWPCTSUM         percentage of a sum in a single cell in relation to the total sum in the row
SKEWNESS | SKEW   skewness
STDDEV | STD      standard deviation
STDERR            standard error of the mean
SUM               sum
SUMWGT            sum of the weights
UCLM              one-sided confidence limit above the mean
USS               uncorrected sum of squares
VAR               variance

Keyword           Description
MEDIAN | P50      median or 50th percentile
P1                1st percentile
P5                5th percentile
P10               10th percentile
Q1 | P25          lower quartile or 25th percentile
Q3 | P75          upper quartile or 75th percentile
P90               90th percentile
P95               95th percentile
P99               99th percentile
QRANGE            interquartile range (difference between upper and lower quartiles)

Keyword           Description
PROBT             probability of a greater absolute value for the t value
T                 Student's t for testing the hypothesis that the population mean is 0
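In PROC TABULATE, statistic keywords are crossed with class or analysis variables inside the TABLE statement. The following sketch uses the SASHELP.CLASS sample data set; the table layout is an assumption chosen to show a count, a column percentage, and two analysis statistics together.

```sas
proc tabulate data=sashelp.class;
   class Sex Age;
   var Height;
   /* Rows: each Age plus an overall row.                         */
   /* Columns: count and column percentage within each Sex, then  */
   /* the mean and median of Height.                              */
   table Age all,
         Sex*(n colpctn) Height*(mean median);
run;
```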