Assignment 1

Bank Marketing Case Study


Loading & Merging Data
Learning Outcomes
1. Load data from input files in various formats to combine information from many data
domains and sources
2. Rename columns and convert column types from character to numeric to prepare for
merging
3. Merge SAS datasets to obtain a data-warehouse-ready version for analysis

Deliverable
Answer each question in SAS and generate a PDF summary of your SAS file and its output. Include
your name, ID and date at the beginning of your program.

Introduction
The head of Marketing wants to know which customers have the highest propensity for buying a
Certificate of Deposit (CD) from the institution. The goal of this assignment is to create part of an
analytical data mart by combining information from many data domains and sources.

Q1. Load data from customer_banking_info_promo.xlsx

 Define the library name "mylib" and specify its location using libname.
 Use proc import with the datafile option to import customer_banking_info_promo.xlsx into a SAS dataset
named customer_banking_info_promo under mylib.

Reference:
https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/proc/n02nz0e7cykqhun14hcppfmd0558.htm#n02nz0e7cykqhun14hcppfmd0558

 Print the first five rows of the dataset by adding (obs=5) after the dataset name in proc print.

Reference:
https://documentation.sas.com/doc/en/vdmmlcdc/8.1/ledsoptsref/p0h5nwbig8mobbn1u0dwtdo0c0a0.htm#n15y2or0sz9ttdn15o2okmj6yt3b
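
The Q1 steps could be sketched roughly as follows. The folder path is a placeholder (an assumption, not part of the assignment), and dbms=xlsx assumes your SAS installation can read Excel files.

```sas
/* Assumption: replace this placeholder path with your own data folder. */
libname mylib "/path/to/your/data";

/* Import the Excel workbook into a SAS dataset stored under mylib. */
proc import datafile="/path/to/your/data/customer_banking_info_promo.xlsx"
    out=mylib.customer_banking_info_promo
    dbms=xlsx
    replace;          /* overwrite the dataset if it already exists */
    getnames=yes;     /* assumes the first row holds the column names */
run;

/* (obs=5) is a data set option, so it follows the dataset name. */
proc print data=mylib.customer_banking_info_promo (obs=5);
run;
```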

Q2. Examine the variable Customer ID. Check the type and format.

 Use the proc contents procedure to examine the variables and their types. Its output also lists
further details, such as each variable's length and format.
Reference: http://support.sas.com/documentation/cdl/en/proc/65145/HTML/default/viewer.htm#p120panelmbpren1m0j2n77s9f67.htm
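
A minimal sketch of the Q2 check, assuming the dataset was imported into mylib as in Q1:

```sas
/* List every variable with its type, length and format. */
proc contents data=mylib.customer_banking_info_promo;
run;
```

In the resulting variable table, the Type column shows Char or Num and the Format column shows any assigned format.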

Q3. Delete/Rename Columns

 Look at the description of the different attributes
here: https://archive.ics.uci.edu/ml/datasets/bank+marketing

duration: last contact duration, in seconds (numeric). Important note: this attribute highly affects the
output target (e.g., if duration=0 then y='no'). Yet, the duration is not known before a call is
performed. Also, after the end of the call y is obviously known. Thus, this input should only be
included for benchmark purposes and should be discarded if the intention is to have a realistic
predictive model.

 Within a data step, perform the following:


 Keep the output dataset name the same as the input (customer_banking_info_promo)
 Rename "customer_id2" to customer_id
 Drop the column "duration" from the dataset.
 Print the first 5 observations in the dataset

References:

Rename option: https://online.stat.psu.edu/stat481/lesson/13

Drop option: https://online.stat.psu.edu/stat481/lesson/13/13.2
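
One possible shape for the Q3 data step is sketched below. It assumes the dataset lives in mylib and uses the rename and drop statements; the equivalent data set options would work as well.

```sas
/* Overwrite the dataset in place: same name on the data and set statements. */
data mylib.customer_banking_info_promo;
    set mylib.customer_banking_info_promo;
    rename customer_id2 = customer_id;  /* applied when the output is written */
    drop duration;                      /* not known before a call is made */
run;

proc print data=mylib.customer_banking_info_promo (obs=5);
run;
```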

Q4. Load data from customer_banking_info.csv

 Load the data and print the first five rows.
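
A hedged sketch for Q4, again using a placeholder path and assuming the CSV file has a header row:

```sas
/* Assumption: replace the placeholder path with the location of your CSV file. */
proc import datafile="/path/to/your/data/customer_banking_info.csv"
    out=mylib.customer_banking_info
    dbms=csv
    replace;
    getnames=yes;   /* assumes the first row holds the column names */
run;

proc print data=mylib.customer_banking_info (obs=5);
run;
```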

Q5. Renaming columns

 Use proc contents to examine the list of variables as before. You will see that customer_id1
is numerical with len=8. This is important to check as this column will be used to merge the
datasets.
 Within a data step, perform the following:
 Keep the output dataset name the same as the input dataset name
(customer_banking_info)
 Rename "customer_id1" as customer_id
 Print the first 5 observations in the dataset

Q6. SAS data from customer_demographics.sas7bdat

 Print the first 5 rows of customer_demographics.sas7bdat


 Use proc contents and examine the list of variables. What is the type of customer_id?
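
Because a .sas7bdat file is already a SAS dataset, no import step is needed. The sketch below assumes customer_demographics.sas7bdat has been placed in the folder that mylib points to.

```sas
/* A .sas7bdat file in the library folder is addressed as libref.name. */
proc print data=mylib.customer_demographics (obs=5);
run;

/* Check the type of customer_id in the variable table. */
proc contents data=mylib.customer_demographics;
run;
```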

Q7. Convert from character to numeric type

Before merging multiple datasets, the common column between the datasets should be of the same
type.
In customer_banking_info_promo, customer_id is defined as character. You are given sample data
step code to run:

 The output dataset name is customer_banking_info_promocv


 To convert customer_id to numeric variable, we use the input function.

Reference: http://support.sas.com/kb/24/590.html

 Check the customer_id variable type again by using proc contents or proc means to see
the list of numerical variables
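
The conversion could look roughly like the sketch below. This is not the sample code supplied with the assignment; the temporary name customer_id_char and the best12. informat are illustrative assumptions.

```sas
data mylib.customer_banking_info_promocv;
    /* Rename the character column on the way in so the numeric one can reuse its name. */
    set mylib.customer_banking_info_promo (rename=(customer_id=customer_id_char));
    /* input() reads the character value with a numeric informat. */
    customer_id = input(customer_id_char, best12.);
    drop customer_id_char;  /* hypothetical helper column, no longer needed */
run;

/* Confirm that customer_id is now listed as Num. */
proc contents data=mylib.customer_banking_info_promocv;
run;
```

A variable's type cannot be changed in place, which is why the sketch creates a new numeric variable and drops the character one.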

Q8. Data Merging

 Join the three sources of data into a single SAS data set.
 Sort each of the datasets by customer_id
 Merge the three datasets using the merge statement within a data step. Name the new
dataset "customer_all"
 Print the first five observations.

References

https://online.stat.psu.edu/stat481/lesson/15

https://online.stat.psu.edu/stat481/lesson/16
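
A minimal sketch of the Q8 sort-and-merge, assuming all three datasets sit in mylib and that customer_id is numeric in each of them (check the Q6 output for customer_demographics before relying on this):

```sas
/* Each input to a match-merge must be sorted by the BY variable. */
proc sort data=mylib.customer_banking_info_promocv; by customer_id; run;
proc sort data=mylib.customer_banking_info;         by customer_id; run;
proc sort data=mylib.customer_demographics;         by customer_id; run;

/* Match-merge the three sources on customer_id. */
data mylib.customer_all;
    merge mylib.customer_banking_info_promocv
          mylib.customer_banking_info
          mylib.customer_demographics;
    by customer_id;
run;

proc print data=mylib.customer_all (obs=5);
run;
```

Without the by statement, merge pairs rows by position instead of by key, so the by customer_id line is what makes this a match-merge.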
