Assignment 1
Deliverable
Answer each question in SAS and generate a PDF summary of your SAS program and its output. Include
your name, ID, and the date at the beginning of your program.
Introduction
The head of Marketing wants to know which customers have the highest propensity for buying a
Certificate of Deposit (CD) from the institution. The goal of this assignment is to create part of an
analytical data mart by combining information from many data domains and sources.
Define the library name "mylib" and specify its location using the libname statement.
Use proc import with the datafile= option to import customer_banking_info_promo.xlsx into a SAS
dataset named customer_banking_info_promo under mylib.
Reference:
https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/proc/n02nz0e7cykqhun14hcppfmd0558.htm#n02nz0e7cykqhun14hcppfmd0558
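The two steps above can be sketched as follows. The folder path is a placeholder assumption; replace it with the location of your own files. The dbms=xlsx engine is assumed to be available in your SAS installation.

```sas
/* Assumed folder: replace with the location of your own files. */
libname mylib "/home/your_user/assignment1";

/* Import the Excel workbook into a SAS dataset stored in mylib. */
proc import datafile="/home/your_user/assignment1/customer_banking_info_promo.xlsx"
    out=mylib.customer_banking_info_promo
    dbms=xlsx
    replace;          /* overwrite the dataset if it already exists */
    getnames=yes;     /* read variable names from the first row */
run;
```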
Print the first five rows of the dataset: add the dataset option (obs=5) after the dataset name in
the proc print statement.
Reference:
https://documentation.sas.com/doc/en/vdmmlcdc/8.1/ledsoptsref/p0h5nwbig8mobbn1u0dwtdo0c0a0.htm#n15y2or0sz9ttdn15o2okmj6yt3b
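A minimal sketch of the print step, assuming the dataset was imported into mylib under the name given above:

```sas
/* obs=5 is a dataset option, so it goes in parentheses right after the dataset name. */
proc print data=mylib.customer_banking_info_promo (obs=5);
run;
```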
Q2. Examine the variable Customer ID. Check the type and format.
Use the proc contents procedure to examine the variables and their types. Its output also lists
further details, such as each variable's length and format.
Reference:
http://support.sas.com/documentation/cdl/en/proc/65145/HTML/default/viewer.htm#p120panelmbpren1m0j2n77s9f67.htm
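As a sketch, the check might look like this. It assumes the dataset being examined is the one imported into mylib earlier; substitute whichever dataset holds the Customer ID variable.

```sas
/* Lists every variable with its type (Num or Char), length, and format. */
proc contents data=mylib.customer_banking_info_promo;
run;
```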
duration: last contact duration, in seconds (numeric). Important note: this attribute highly affects the
output target (e.g., if duration=0 then y='no'). Yet, the duration is not known before a call is
performed. Also, after the end of the call y is obviously known. Thus, this input should only be
included for benchmark purposes and should be discarded if the intention is to have a realistic
predictive model.
References:
Rename option: https://online.stat.psu.edu/stat481/lesson/13
Drop option: https://online.stat.psu.edu/stat481/lesson/13/13.2
Use proc contents to examine the list of variables as before. You will see that customer_id1
is numeric with a length of 8. This is important to check because this column will be used to
merge the datasets.
Within a data step, perform the following:
Keep the output dataset name the same as the input dataset name
(customer_banking_info)
Rename "customer_id1" to customer_id
Print the first 5 observations in the dataset
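The data step described above might be sketched as follows. Storing customer_banking_info in mylib is an assumption; adjust the libref if your copy lives elsewhere.

```sas
/* Same name for input and output, so the dataset is replaced in place. */
data mylib.customer_banking_info;
    /* rename= on the set statement applies the new name as the data is read. */
    set mylib.customer_banking_info (rename=(customer_id1=customer_id));
run;

/* Confirm the result by printing the first 5 observations. */
proc print data=mylib.customer_banking_info (obs=5);
run;
```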
Before merging multiple datasets, make sure the common column has the same type in every dataset.
In customer_banking_info_promo, customer_id is defined as character. You are given sample data
step code to run:
Reference: http://support.sas.com/kb/24/590.html
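The sample code itself is not reproduced in this handout. One common way to do the conversion is the input function, sketched below. This is an illustration and not necessarily the sample you were given. The temporary name customer_id_char is hypothetical, and the 32. informat assumes the IDs contain digits only.

```sas
data mylib.customer_banking_info_promo;
    /* Move the character column aside so the name customer_id is free. */
    set mylib.customer_banking_info_promo (rename=(customer_id=customer_id_char));

    /* input() reads the character value with a numeric informat. */
    customer_id = input(customer_id_char, 32.);

    /* Remove the temporary character column from the output. */
    drop customer_id_char;
run;
```

A variable's type cannot be changed in place, which is why the sketch renames the old column and creates a new numeric one.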
Check the customer_id variable type again by using proc contents or proc means to see
the list of numerical variables
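A short sketch of the proc means check, assuming the converted dataset is in mylib. With no var statement, proc means reports on every numeric variable, so customer_id should appear in the output once it is numeric.

```sas
/* n and nmiss show how many values converted and how many became missing. */
proc means data=mylib.customer_banking_info_promo n nmiss;
run;
```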
Join the three sources of data into a single SAS data set.
Sort each of the datasets by customer_id
Merge the three datasets using the merge statement within a data step. Name the new
dataset "customer_all"
Print the first five observations.
References:
https://online.stat.psu.edu/stat481/lesson/15
https://online.stat.psu.edu/stat481/lesson/16
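The sort and merge steps might be sketched as follows. This handout names only two of the three datasets, so third_source is a hypothetical placeholder for the remaining one. Keeping all datasets in mylib is also an assumption.

```sas
/* Each input must be sorted by the key before a merge with a by statement. */
proc sort data=mylib.customer_banking_info;
    by customer_id;
run;

proc sort data=mylib.customer_banking_info_promo;
    by customer_id;
run;

proc sort data=mylib.third_source;   /* hypothetical name for the third dataset */
    by customer_id;
run;

/* Match-merge the three datasets on customer_id. */
data mylib.customer_all;
    merge mylib.customer_banking_info
          mylib.customer_banking_info_promo
          mylib.third_source;
    by customer_id;
run;

proc print data=mylib.customer_all (obs=5);
run;
```

The merge only works as intended when customer_id has the same type in all three datasets, which is why the type check comes first.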