/* L1 Controlling Input and Output */
/*******************************************************************************
1. Outputting Multiple Observations
*******************************************************************************/
/* You can control when SAS writes an observation to a SAS data set by using
explicit OUTPUT statements in a DATA step. When an explicit OUTPUT statement is
used, implicit output does not occur at the bottom of the DATA step.
OUTPUT <SAS-data-set(s)>;
The syntax for the OUTPUT statement begins with the keyword OUTPUT. Optionally, the
keyword can be followed by the data set name to which the observation should be
written. If you do not specify a data set name in the OUTPUT statement, the
observation is written to the data set named in the DATA statement.*/
data forecast;
/*The SET statement reads the first observation in the SAS data set growth into
the program data vector.*/
set orion.growth;
/*Additional programming statements create the variable Year and calculate the
total number of employees at the end of the first year. */
year=1;
Total_Employees=Total_Employees*(1+Increase);
/*The first OUTPUT statement directs SAS to write the contents of the PDV to the
output data set. No data set is listed in the OUTPUT statement, so SAS writes the
observation to the 'forecast' data set.*/
output;
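/*Additional statements set Year to 2 and calculate the total number of employees
at the end of the second year. The second OUTPUT statement then directs SAS to
write the contents of the PDV to the forecast data set. Now there are two
observations in the forecast data set from one observation that SAS read from the
growth data set.*/
year=2;
Total_Employees=Total_Employees*(1+Increase);
output;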
/*An implicit RETURN statement returns processing to the top of the DATA step,
and SAS reads the next observation from the growth data set.*/
run;
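/* A minimal sketch of the rule that an explicit OUTPUT statement turns off the
implicit output at the bottom of the DATA step. The data sets work.growth_demo and
work.big_growth and all values are made up for illustration and are not part of the
orion library. */
data work.growth_demo;
   input Department $ Total_Employees Increase;
   datalines;
Sales 100 0.25
Admin 40 0.10
Eng 60 0.30
;
run;

/* Because an OUTPUT statement appears in this step, an observation is written only
when the IF condition is true. With the demo values, only the Sales and Eng rows
should be written. */
data work.big_growth;
   set work.growth_demo;
   if Increase>=0.2 then output;
run;

proc print data=work.big_growth;
run;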
/*---PRACTICE 1---*/
/*read orion.prices and create a new data set named work.price_increase*/
DATA work.price_increase;
SET orion.prices;
/*use explicit OUTPUT statements to output three observations for each input
observation*/
Year=1;
Unit_Price=Unit_Price * Factor;
OUTPUT;
Year=2;
Unit_Price=Unit_Price * Factor;
OUTPUT;
Year=3;
Unit_Price=Unit_Price * Factor;
OUTPUT;
RUN;
/*Print the new data set and include only Product_ID, Unit_Price, and Year in the
report*/
PROC PRINT DATA=work.price_increase;
VAR Product_ID Unit_Price Year;
RUN;
/*---PRACTICE 2---*/
/*Read orion.discount and use explicit OUTPUT statements to create a data set,
work.extended, that lists all discounts for the Happy Holidays promotion. Output
two observations for each observation read from the input data.*/
DATA work.extended;
SET orion.discount;
/*Print the new data set with an appropriate title and view the results.*/
title 'the Happy Holidays promotion products';
proc print data=work.extended;
/* var Product_ID Unit_Price Year;*/
run;
title;
/*******************************************************************************
2. Writing to Multiple SAS Data Sets
*******************************************************************************/
/* 2.1 To create more than one data set, you specify the names of the SAS data sets
you want to create in the DATA statement. Separate the data set names with a space.
You can use OUTPUT statements with IF-THEN-ELSE statements to conditionally write
observations to a specific data set based on the value of a variable in the input
data set.
For conditional processing, it's most efficient to check for values in order of
decreasing frequency. For example, if most observations have Country='US', check
Country='US' first.
If the values of Country are miscoded as lowercase in some of the input data, you
can fix the data, revise the conditional logic to look for both uppercase and
lowercase values, or use a function such as UPCASE in the IF statement to compare
the uppercase form of Country.
SELECT <(select-expression)>;
WHEN-1 (when-expression-1 <…, when-expression-n>) statement;
WHEN-n (when-expression-1 <…, when-expression-n>) statement;
<OTHERWISE statement;>
END;
The optional SELECT expression specifies any valid SAS expression. Often a variable
name is used as the SELECT expression. When you specify a SELECT expression, SAS
evaluates the expression and then compares the result to each when-expression. When
a true condition is encountered, the associated statement is executed and the
remaining WHEN statements are skipped.
If you omit the SELECT expression, SAS evaluates each when-expression until it
finds a true condition, then behaves as described above. This form of SELECT is
useful when you want to check the value of more than one variable using a compound
condition, or check for an inequality. One thing to keep in mind is that SAS
executes WHEN statements in the order that you write them and once a when-
expression is true, no other when-expressions are evaluated.
A null OTHERWISE statement can be useful when you want to ignore certain values.
For example, if you only want to create data sets for employees in the United
States and Australia, then you would want to ignore values for other countries.
You can execute multiple statements when a when-expression is true by using DO-END
groups in a SELECT Group.
The way SAS evaluates a WHEN expression in a SELECT group depends on whether or not
you specify a SELECT expression. When you specify a SELECT expression in the SELECT
statement, SAS finds the value of the SELECT expression and then compares the value
to each WHEN expression to return a value of true or false.
SELECT (Country);
WHEN ('US') OUTPUT usa;
WHEN ('AU') OUTPUT australia;
OTHERWISE OUTPUT other;
END;
Less efficient: omitting the SELECT expression
SELECT;
WHEN (Country='US') OUTPUT usa;
WHEN (Country='AU') OUTPUT australia;
OTHERWISE OUTPUT other;
END;
There are times when you cannot use a SELECT expression. For example, you might
want to check the condition of more than one variable in a WHEN expression. Keep in
mind that SAS evaluates WHEN statements in the order that you write them, and once
a WHEN expression is true, no other WHEN expressions are evaluated.
SELECT;
WHEN (Country='US') OUTPUT usa;
WHEN (Country='AU' and City='Melbourne') OUTPUT newOffice;
WHEN (Country='AU') OUTPUT australia;
OTHERWISE OUTPUT other;
END;
The australia data set will contain all observations in which Country is 'AU' and
City is NOT 'Melbourne'. This is the result that we want. If you reverse the order
of the second and third WHEN statements, all observations in which Country='AU'
will be written to the australia data set and no observations will be written to
the newOffice data set. */
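/* The effect of WHEN order can be sketched with a small, self-contained example.
The data set work.addr_demo, the output data set names, and all values are
hypothetical stand-ins for orion.employee_addresses. */
data work.addr_demo;
   infile datalines dlm=',';
   input Name :$10. City :$12. Country :$2.;
   datalines;
Ann,Melbourne,AU
Bob,Sydney,AU
Cat,Miami,US
Dov,Paris,FR
;
run;

/* The compound Melbourne test comes before the general 'AU' test, so Ann should go
to work.newoffice_demo and Bob to work.australia_demo. Swapping those two WHEN
statements would send both of them to work.australia_demo. */
data work.usa_demo work.newoffice_demo work.australia_demo work.other_demo;
   set work.addr_demo;
   select;
      when (Country='US') output work.usa_demo;
      when (Country='AU' and City='Melbourne') output work.newoffice_demo;
      when (Country='AU') output work.australia_demo;
      otherwise output work.other_demo;
   end;
run;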
/*---PRACTICE 3---*/
/* Read orion.employee_organization and create the data sets work.admin,
work.stock, and work.purchasing.*/
DATA work.admin work.stock work.purchasing;
SET orion.employee_organization;
/* or */
data work.admin work.stock work.purchasing;
set orion.employee_organization;
if Department='Administration' then output work.admin;
else if Department='Stock & Shipping' then output work.stock;
else if Department='Purchasing' then output work.purchasing;
run;
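/* The same routing can also be sketched with a SELECT group and a null OTHERWISE
statement. The data set work.org_demo, the _demo output names, and the rows are
made up for illustration. They stand in for orion.employee_organization, whose
actual contents are not shown in these notes. */
data work.org_demo;
   infile datalines dlm=',';
   input Employee_ID Department :$20.;
   datalines;
101,Administration
102,Stock & Shipping
103,Purchasing
104,Sales
;
run;

/* The null OTHERWISE statement ignores every other department (here, Sales).
Without an OTHERWISE statement, SAS issues an error when no WHEN expression is
true. */
data work.admin_demo work.stock_demo work.purchasing_demo;
   set work.org_demo;
   select (Department);
      when ('Administration') output work.admin_demo;
      when ('Stock & Shipping') output work.stock_demo;
      when ('Purchasing') output work.purchasing_demo;
      otherwise;
   end;
run;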
/*---PRACTICE 4---*/
/* Read orion.orders and create three data sets named work.fast, work.slow, and
work.veryslow. */
DATA work.fast work.slow work.veryslow;
SET orion.orders;
/*Create a variable ShipDays that is the number of days between when the
order was placed and when the order was delivered.*/
ShipDays=Delivery_Date-Order_Date;
/*******************************************************************************
* 3. Controlling Variable Input and Output
*******************************************************************************/
/* By default, SAS writes all variables from the input data set to every data set
listed in the DATA statement.
You can use DROP and KEEP statements to control which variables are written to
output data sets. DROP and KEEP statements affect all output data sets listed in
the DATA statement. */
/* When you use the DROP= or KEEP= data set options in a DATA statement,
the DROP= and KEEP= data set options specify the variables to drop or keep
in each output data set.
Remember, the dropped variables are still in the program data vector, and therefore
available for processing in the DATA step.
When you use the DROP= or KEEP= data set options in a SET statement,
the variables are dropped on input.
In other words, they are not read into the program data vector, so they are
not available for processing. */
/* With no DROP or KEEP, this step writes all variables to each data set. */
data usa australia other;
set orion.employee_addresses;
if Country='US' then
output usa;
else if Country='AU' then
output australia;
else
output other;
run;
/* You can use both DROP and KEEP statements and DROP= and KEEP= options in the
same step, but do not try to drop and keep the same variable.
If you use a DROP or KEEP statement at the same time as a data set option, the
statement is applied first.*/
/* 3.2 Controlling Variable Input Using the DROP= and KEEP= options
in the SET statement (an input data set) */
DATA usa;
SET orion.employee_addresses (DROP=
Street_ID Street_Number Street_Name Country);
/* additional SAS statements */
run;
/* Remember that when you associate the DROP= and KEEP= data set options with an
output data set, the variables are still available for processing.
In contrast, when you associate these options with an input data set in a SET
statement, the variables are not read into the program data vector, and therefore
they are not available for processing.
For cases where you don't need all the variables in an input data set, this is an
efficient way to drop them so that they aren't processed at all. */
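/* A short sketch of the difference between dropping on output and dropping on
input. The data set work.trip_demo, its variables, and its values are hypothetical. */
data work.trip_demo;
   input TripID Miles Gallons;
   datalines;
1 300 10
2 240 12
;
run;

/* DROP= on the output data set: Miles and Gallons are still in the PDV, so MPG is
calculated, and the two variables are left out only when the observation is
written. */
data work.mpg_ok(drop=Miles Gallons);
   set work.trip_demo;
   MPG=Miles/Gallons;
run;

/* DROP= on the input data set: Miles is never read into the PDV, so the assignment
uses an uninitialized Miles and MPG should be missing for every observation. */
data work.mpg_missing;
   set work.trip_demo(drop=Miles);
   MPG=Miles/Gallons;
run;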
/* Example
You want to drop 'Employee_ID' and 'Country' from every data set, and you want to
drop 'State' from the australia data set. You can do this by using a combination of
options and statements. Let's start over with the code that creates the three data
sets with all nine variables.
You can use the DROP= data set option (1) in the SET statement to drop Employee_ID
from the input data because it's not used for processing in the DATA step. */
data usa
australia(DROP=State) /* (3)*/
other;
DROP Country; /* (2) */
set orion.employee_addresses
(DROP=Employee_ID); /* (1) */
if Country='US' then
output usa;
else if Country='AU' then
output australia;
else
output other;
run;
/* Next you want to drop Country from every output data set, but the variable needs
to be available for processing.
Here's a question: What's the simplest way to drop Country from all three output
data sets? It's easiest to use a DROP statement (2).
You could use the DROP= data set option to drop the variable from each output data
set individually:
data usa(DROP=Country) australia(DROP=State Country) other(DROP=Country);
but it's more concise to use a DROP statement.
Finally, you use the DROP= data set option (3) to drop 'State' from the australia
data set.
When the code compiles, only the Employee_ID variable is dropped from the input
data, and all other variables are included in the program data vector and are
available for processing. */
/* Other Example:
The SAS data set car has the variables CarID, CarType, Miles, and Gallons. The
following DATA step creates the ratings data set with only the variables CarType
and MPG. */
/* One way: drop CarID on input, and drop Miles and Gallons on output after they
have been used to calculate MPG. */
data ratings;
set car(drop=CarID);
drop Miles Gallons;
MPG=Miles/Gallons;
run;
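/* Another way to get the same two variables, sketched with the KEEP= data set
option on the output data set. The data sets work.car_demo and work.ratings_demo
and the values are made-up stand-ins for car and ratings. */
data work.car_demo;
   input CarID CarType $ Miles Gallons;
   datalines;
1 Sedan 300 10
2 Truck 240 12
;
run;

/* All four input variables are read into the PDV, so Miles and Gallons are
available for the calculation. Only CarType and MPG are written. */
data work.ratings_demo(keep=CarType MPG);
   set work.car_demo;
   MPG=Miles/Gallons;
run;

proc print data=work.ratings_demo;
run;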
/******************************************************************************
* 4. Controlling Which Observations Are Read
******************************************************************************/
/* You can use the OBS= and FIRSTOBS= data set options to limit the number of
observations that SAS processes.
The FIRSTOBS= data set option specifies a starting point for processing an input
data set. By default, FIRSTOBS=1.
The OBS= data set option specifies the number of the last observation to process.
It does not specify how many observations should be processed.
You can use FIRSTOBS= and OBS= together to define a range of observations for SAS
to process.
SAS_data_set_name(OBS=n)
E.g. the (OBS=100) data set option in a SET statement causes the DATA step to stop
processing after observation 100.
SAS_data_set_name(FIRSTOBS=n)
E.g. the (FIRSTOBS=20) data set option specifies a starting point for processing an
input data set, so the SET statement starts reading at observation number 20 and
continues until the last observation is read.
SAS_data_set_name(FIRSTOBS=n OBS=n)
Used together, the options define a range of observations in the data set.
E.g. (FIRSTOBS=50 OBS=100) causes the SET statement to read 51 observations from
the data set. Processing begins with observation 50 and ends after observation 100.
Both the FIRSTOBS= and the OBS= options are used with input data sets, for example
in a SET statement. You cannot use either option with output data sets.
When you limit the number of observations that SAS reads from input data, the
number of observations in your output data is also limited. */
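/* A minimal sketch of the point that OBS= names the last observation to read and
not a count. The data sets work.ten_demo and work.range_demo are hypothetical. The
DO loop is used here only to generate ten observations with ID=1 to 10. */
data work.ten_demo;
   do ID=1 to 10;
      output;
   end;
run;

/* Reading starts at observation 3 and stops after observation 6, so 6-3+1 = 4
observations are read (ID=3, 4, 5, 6), not 6. */
data work.range_demo;
   set work.ten_demo(firstobs=3 obs=6);
run;

proc print data=work.range_demo;
run;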
/* 4.2 Use FIRSTOBS= and OBS= options in an INFILE statement
You can also use FIRSTOBS= and OBS= options in an INFILE statement to control which
records are read when you read raw data files. */
DATA employees;
INFILE 'emps.dat' FIRSTOBS=11 OBS=15;
INPUT @1 EmpID 8. @9 EmpName $40. @153 Country $2.;
RUN;
PROC PRINT DATA=employees;
RUN;
/* Notice that the syntax is different. In an INFILE statement, the options follow
the filename, but they are not enclosed in parentheses. */
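/* The INFILE form can be tried without an external file by reading in-stream
records. This is only a sketch: it assumes that FIRSTOBS= and OBS= apply to INFILE
DATALINES the same way they apply to a file reference, and the data set
work.records_demo and its records are made up. */
data work.records_demo;
   infile datalines firstobs=2 obs=4;
   input EmpID EmpName $;
   datalines;
1 Ann
2 Bob
3 Cat
4 Dov
5 Eve
;
run;

/* Under that assumption, records 2 through 4 should be read. */
proc print data=work.records_demo;
run;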
/* 4.3 Use FIRSTOBS= and OBS= options in SAS procedures (e.g. a PROC PRINT step)
You can also use FIRSTOBS= and OBS= in a procedure step to limit the number of
observations that are processed. */
DATA new;
SET old(FIRSTOBS=100 OBS=200);
RUN;
PROC PRINT DATA=new(OBS=50);
RUN;
/* The data set options in the SET statement direct SAS to begin reading at
observation 100 and stop after observation 200. The data set option in the PROC
PRINT step directs SAS to stop printing after 50 observations. */
/*---PRACTICE 5---*/
/*
* Practice L1-5: Specify Variables and Observations
*/
/*
Task
In this practice, you create two data sets based on the value of a variable in the
input data. You specify which variables to include in the output data sets and you
specify the observations to print.
The data set orion.employee_organization contains information on employee job
titles, departments, and managers. Create two data sets: one for the Sales
department and another for the Executive department. */
/*---PRACTICE 6---*/
/*
* Practice L1-6: Specify Variables and Observations
*/
DATA work.instore (KEEP=Order_ID Customer_ID Order_Date)
work.delivery (KEEP=Order_ID Customer_ID Order_Date ShipDays);
SET orion.orders;
WHERE Order_Type=1;
ShipDays=Delivery_Date - Order_Date;
IF ShipDays=0 THEN
OUTPUT work.instore;
ELSE IF ShipDays>0 THEN
OUTPUT work.delivery;
RUN;
/*********************************************************
* Sample Programs
**********************************************************/
/* Writing to Multiple Data Sets (Using a SELECT Group with a DO-END Group in the
WHEN statement) */
data usa australia other;
set orion.employee_addresses;
select (upcase(Country));
when ('US') do;
Benefits=1;
output usa;
end;
when ('AU') do;
Benefits=2;
output australia;
end;
otherwise do;
Benefits=0;
output other;
end;
end;
run;