interview3
· What SAS statements would you code to read an external raw data file into a
DATA step?
The INFILE statement, which identifies the file, and the INPUT statement, which reads the values into variables.
· Are you familiar with special input delimiters? How are they used?
DLM= and DSD are the delimiter options that I’ve used. They are specified in the INFILE statement. DLM= names the delimiter character.
Comma-separated values (CSV) files are a common type of file that can be
read with the DSD option. DSD sets the default delimiter to a comma, treats two delimiters in a row as a missing value, and removes quotation marks from quoted values.
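As a minimal sketch of the two options described above (the data set names csv_demo and pipe_demo and the in-stream data are made up for illustration):

```sas
/* DSD: comma is the default delimiter, ",," is read as a missing value, */
/* and quotation marks around a value are removed                        */
data csv_demo;
   infile datalines dsd;
   input name :$10. age city :$15.;
   datalines;
Ann,34,Boston
Bob,,Chicago
"Lee, Jr.",51,Denver
;
run;

/* DLM=: name any other delimiter character, here a pipe */
data pipe_demo;
   infile datalines dlm='|';
   input name $ age;
   datalines;
Ann|34
Bob|28
;
run;
```

In the first step the second record should get a missing age, and the quoted name should keep its embedded comma.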
· If reading a variable length file with fixed input, how would you prevent
SAS from reading the next record if the last variable didn't
have a value?
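By using the MISSOVER option in the INFILE statement. It sets the remaining variables to missing
instead of reading the next record. If some data lines are shorter than others, use the TRUNCOVER
option instead, which also keeps a partial value when the record ends in the middle of a field.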
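A small sketch of TRUNCOVER with column input follows. The fileref raw, the data set name fixed, and the sample records are assumptions for illustration; the temporary file is written first so that the records really have different lengths.

```sas
filename raw temp;   /* hypothetical temporary file */

/* Write three records of different lengths */
data _null_;
   file raw;
   put '101 Ann      250';
   put '102 Bob';
   put '103 Cy        75';
run;

/* TRUNCOVER: do not go to the next record when a line is short, */
/* and keep whatever part of a field is present                  */
data fixed;
   infile raw truncover;
   input id 1-3 name $ 5-12 amount 14-18;
run;
```

Replacing TRUNCOVER with MISSOVER in this sketch would also stop SAS from moving to the next record, but a field that is cut off by the end of the record would be set to missing rather than read partially.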
· What is the difference between an informat and a format? Name three informats or
formats.
· Name and describe three SAS functions that you have used, if any?
· How would you code the criteria to restrict the output to be produced?
Use NOPRINT option.
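One way this might look, assuming the sample data set SASHELP.CLASS that ships with SAS and a hypothetical output data set height_stats:

```sas
/* NOPRINT suppresses the printed report; the results go only to the */
/* data set named in the OUTPUT statement                            */
proc means data=sashelp.class noprint;
   var height;
   output out=height_stats mean=avg_height;
run;
```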
· What is the purpose of the trailing @ and the @@? How would you use them?
A single trailing @ holds the current line for the next INPUT statement in the same iteration of the DATA step.
A double trailing @@ holds the line across iterations of the DATA step, until the end of the line is reached.
Double trailing @@: When you have multiple observations per line of raw data, use a
double trailing at sign (@@) at the end of the INPUT statement. This line-hold
specifier acts like a stop sign telling SAS, “Stop, hold that line of raw data.”
Trailing @: By using @ without specifying a column, it is as if you are telling SAS, “Stay tuned
for more information. Don’t touch that dial.” SAS holds the line of data until it reaches either
the end of the DATA step or an INPUT statement that does not end with a trailing @.
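The two line-hold specifiers can be sketched as follows; the data set names, variables, and in-stream data are invented for the example.

```sas
/* Double trailing @@: several observations on each raw data line */
data scores;
   input name $ score @@;
   datalines;
Ann 90 Bob 85 Cy 77
Dee 60 Eve 95
;
run;

/* Single trailing @: read part of the line, test it, */
/* then read the rest of the same line only if needed */
data seniors;
   input type $ @;
   if type = 'S' then do;
      input name $ age;
      output;
   end;
   datalines;
S Ann 70
J Bob 20
S Cy 65
;
run;
```

The first step should create one observation per name and score pair, and the second should keep only the lines whose first field is S.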
· Under what circumstances would you code a SELECT construct instead of IF statements?
When you have a long series of mutually exclusive conditions and the comparison is numeric,
using a SELECT group is slightly more efficient than using IF-THEN or IF-THEN-ELSE
statements because CPU time is reduced.
SELECT group:
SELECT: begins a SELECT group.
WHEN: identifies SAS statements that are executed when a particular condition is true.
OTHERWISE (optional): specifies a statement to be executed if no WHEN condition is met.
END: ends a SELECT group.
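A short sketch of a SELECT group for mutually exclusive numeric conditions, using a made-up score variable and grade cutoffs:

```sas
data graded;
   input score;
   length grade $ 1;
   select;
      when (score >= 90) grade = 'A';   /* first true WHEN wins */
      when (score >= 80) grade = 'B';
      otherwise          grade = 'C';   /* no WHEN condition was met */
   end;
   datalines;
95
83
40
;
run;
```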
· If you don't want any SAS output from a DATA step, how would you code the DATA
statement to prevent SAS from producing a data set?
DATA _NULL_;
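For illustration, a step of this kind might look like the following, assuming the sample data set SASHELP.CLASS that ships with SAS:

```sas
/* _NULL_ runs the step without creating a data set; */
/* PUT writes the values to the SAS log instead      */
data _null_;
   set sashelp.class;
   put name= age=;
run;
```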
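· What is the one statement to set the criteria of data that can be coded in any
step?
WHERE statement: it selects observations and can be coded in both DATA steps and PROC steps.
The OPTIONS statement is also part of a SAS program and affects all steps
that follow it, but it sets system options rather than criteria for the data.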
· Have you ever linked SAS code? If so, describe the link and any required statements used
to either process the code or the step itself.
· How would you include common or reuse code to be processed along with your
statements?
By using SAS macros, or by using the %INCLUDE statement to bring in code stored in an external file.
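A minimal sketch of reusable code as a macro; the macro name print_first and its parameters are hypothetical, and SASHELP.CLASS is the sample data set that ships with SAS.

```sas
/* Define the reusable code once */
%macro print_first(ds, n=5);
   proc print data=&ds (obs=&n);
   run;
%mend print_first;

/* Call it wherever it is needed */
%print_first(sashelp.class)
%print_first(sashelp.class, n=3)
```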
· When looking for data contained in a character string of 150 bytes, which function is the
best to locate that data: scan, index, or indexc?
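INDEX. It returns the position of the first occurrence of a given substring, whereas SCAN extracts the nth word and INDEXC searches for any one of a set of individual characters.
· If you have a data set that contains 100 variables, but you need only five of those,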
· Code a PROC SORT on a data set containing State, District and County as the primary
variables, along with several numeric variables.
proc sort data=one;
   by State District County;
run;
· How would you code a merge that will keep only the observations that have matches from
both sets.
Use the IN= data set option on each data set in the MERGE statement, then use a subsetting
IF statement to keep an observation only when both IN= variables are true.
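A sketch of a match-only merge; the data set names left, right, and matched and the sample values are made up for illustration.

```sas
data left;
   input id x;
   datalines;
1 10
2 20
3 30
;
run;

data right;
   input id y;
   datalines;
2 200
3 300
4 400
;
run;

/* Both inputs must be sorted by the BY variable (they already are here). */
/* IN= creates temporary flags that are 1 when the data set contributes   */
/* to the current observation.                                            */
data matched;
   merge left(in=a) right(in=b);
   by id;
   if a and b;   /* keep only IDs present in both data sets */
run;
```

With this sample data, only the IDs that appear in both inputs (2 and 3) should be kept.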
· How would you code a merge that will write the matches of both to one data set, the non-
matches from the left-most data.
· What is the Program Data Vector (PDV)? What are its functions?
Function: to store the current observation. The PDV (Program Data Vector) is a logical area in memory where
SAS builds a data set, one observation at a time. When SAS processes a DATA step it goes through two
phases: the compilation phase and the execution phase. During the compilation phase the input buffer is
created to hold a record from the external file. After the input buffer is created, the PDV is created. The
PDV contains two automatic variables, _N_ and _ERROR_.
The logical Program Data Vector (PDV) is a set of buffers that includes all
variables referenced either explicitly or implicitly in the DATA step. It is created at compile
time, then used at execution time as the location where the working values of
variables are stored as they are processed by the DATA step program (source:
http://www2.sas.com/proceedings/sugi24/Posters/p235-24.pdf).
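One way to look at the contents of the PDV is to write every variable to the log on each iteration. This is only a sketch with invented variables x, y, and total:

```sas
data _null_;
   input x y;
   total = x + y;
   put _all_;   /* writes all PDV variables, including _N_ and _ERROR_ */
   datalines;
1 2
3 4
;
run;
```

The log should show one line per iteration, with the automatic variables _N_ and _ERROR_ listed alongside x, y, and total.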
· In the flow of DATA step processing, what is the first action in a typical DATA Step?
The DATA step begins with a DATA statement. Each time the DATA
statement executes, a new iteration of the DATA step begins, and the _N_
automatic variable is incremented by 1.
· What is _n_?
It is an automatic counter variable in SAS.
Note: Both _N_ and _ERROR_ are always available to you in the DATA step.
_N_ indicates the number of times SAS has looped through the DATA step. This is not necessarily
equal to the observation number, since a simple subsetting IF statement can
change the relationship between the observation number and the number of iterations of the DATA
step. _ERROR_ has a value of 1 if there is an error in the data for that observation and
0 if there is not.
_N_ is an implicit variable created by SAS during data processing. It counts the iterations
SAS has made through the DATA step so far. It is available only in the DATA step, not in PROCs,
and it is not written to the output data set. E.g., if we want to pick every third record in
a data set, we can use _N_ as follows:
data new;
   set old;
   /* subsetting IF: keeps observations 1, 4, 7, ... of old */
   if mod(_n_, 3) = 1;
run;
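Note: If we use a WHERE clause to subset the input, _N_ will not yield the required result, because WHERE filters observations before they reach the DATA step and _N_ then counts only the observations that pass the filter.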
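The difference can be sketched with the sample data set SASHELP.CLASS that ships with SAS; the output data set names and the variable iteration are hypothetical.

```sas
/* WHERE filters before the DATA step sees the observation, */
/* so _N_ counts only the observations that pass the filter */
data where_demo;
   set sashelp.class;
   where sex = 'F';
   iteration = _n_;
run;

/* A subsetting IF runs after the observation is read, */
/* so _N_ still counts every observation in the input  */
data if_demo;
   set sashelp.class;
   if sex = 'F';
   iteration = _n_;
run;
```

In where_demo the iteration values should run 1, 2, 3, and so on without gaps, while in if_demo they should match the original positions of the kept observations.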
Posted by sarath at 4:45 PM