SMDM-Project Sample Business Report

This document analyzes automobile sales data to understand customer preferences and characteristics. It includes univariate and multivariate statistical analysis of the data, answering key questions and providing recommendations. Various statistical techniques are used to study relationships between variables like customer age, income, gender and the make of car purchased.

Uploaded by

Janhavi Gupta
Copyright: © All Rights Reserved

Statistical Methods for Decision Making

Project Report

This file is meant for personal use by [email protected] only.


Sharing or publishing the contents in part or full is liable for legal action.
Contents

S.no Topics

1 Problem - Austo Automobile Analysis
1.1 Problem Definition
1.2 Data Overview
1.3 Univariate Analysis
1.4 Multivariate Analysis
1.5 Answer Key Questions
1.6 Conclusion and Recommendations
2 Problem - Framing Analytics Problem
2.1 Problem Definition
2.2 Top 5 Features

List of Tables

No Name of the Table

1 Top five rows of dataset
2 Basic Information of dataset
3 Numerical summarization of dataset
4 Value Counts of the Categorical Variables

List of Figures

No Name of Figure

1 Univariate Analysis of Age
2 Univariate Analysis of Salary
3 Univariate Analysis of Partner
4 Univariate Analysis of Total Salary
5 Univariate Analysis of Price
6 Univariate Analysis of Gender
7 Univariate Analysis of Profession
8 Univariate Analysis of Marital Status
9 Univariate Analysis of Education
10 Univariate Analysis of Personal Loan
11 Univariate Analysis of Number of dependents
12 Univariate Analysis of House Loan
13 Univariate Analysis of Partner Working
14 Univariate Analysis of Make
15 Correlation of Numerical Variables
16 Relationship between Numerical Variables
17 Make vs Age Plot
18 Make vs Price Plot
19 Make vs Salary Plot
20 Make vs Education Plot
21 Make vs Number of Dependents Plot
22 Make vs Profession Plot
23 Make vs Personal Loan Plot
24 Make vs House Loan Plot
25 Make vs Gender Plot
26 Make vs Marital Status Plot
27 Make vs Gender Plot
28 Make vs Profession Plot
29 Make vs Profession Plot (Male)

Project 1

Problem Definition
Context

In the 21st century, cars are an important mode of transportation that gives us personal control and autonomy. In day-to-day life, people use cars for commuting to work, shopping, visiting family and friends, etc. Research shows that more than 76% of people refrain from traveling somewhere if they don't have a car. Most people buy different types of cars based on their day-to-day necessities and preferences. So, it is essential for automobile companies to analyze the preferences of their customers before launching a car model into the market. Austo, a UK-based automobile company, aspires to grow its business into the US market after successfully establishing its footprint in the European market.

To become familiar with the types of cars preferred by customers and the factors influencing car purchase behavior in the US market, Austo has contracted a consulting firm. Based on various market surveys, the consulting firm has created a dataset of 3 major types of cars that are extensively used across the US market. They have collected various details of the car owners, which can be analyzed to understand the automobile market of the US.

Objective

Austo's management team wants to understand buyer demand and trends in the US market. They want to build customer profiles based on the analysis to identify new purchase opportunities so that they can adjust the business strategy and production to meet certain demand levels. Further, the analysis will be a good way for management to understand the dynamics of a new market. Suppose you are a Data Scientist working at the consulting firm that has been contracted by Austo. You are given the task of creating buyer profiles for different types of cars with the available data, as well as a set of recommendations for Austo. Perform the data analysis to generate useful insights that will help the automobile company grow its business.

Data Description
austo_automobile.csv: The dataset contains buyer data corresponding to different types of products (cars).

Data Dictionary
● Age: Age of the customer
● Gender: Gender of the customer
● Profession: Indicates whether the customer is a salaried or business person
● Marital_status: Marital status of the customer
● Education: Refers to the highest level of education completed by the customer
● No_of_dependents: Number of dependents (partner/children/spouse) of the customer
● Personal_loan: Indicates whether the customer availed a personal loan or not
● House_loan: Indicates whether the customer availed a house loan or not
● Partner_working: Indicates whether the customer's partner is working or not
● Salary: Annual Salary of the customer
● Partner_salary: Annual Salary of the customer's partner
● Total_salary: Annual household income (Salary + Partner_salary) of the customer's family
● Price: Price of the car
● Make: Car type (Hatchback/Sedan/SUV)

Data Overview

Load the required packages, set the working directory, and load the data file.

The dataset has 1581 rows and 14 columns. It is always good practice to view a sample of the rows. A simple way to do that is to use the head() function.
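A minimal sketch of this step, assuming the analysis is done in Python with pandas and the file is read with `pd.read_csv`. The real file is not reproduced here, so the example reads a tiny in-memory sample; `SAMPLE_CSV`, `load_data` and all sample values are hypothetical and only illustrate the 14-column layout from the data dictionary.

```python
import io
import pandas as pd

# Hypothetical three-row sample standing in for austo_automobile.csv;
# the values are illustrative, not taken from the real dataset.
SAMPLE_CSV = """Age,Gender,Profession,Marital_status,Education,No_of_dependents,Personal_loan,House_loan,Partner_working,Salary,Partner_salary,Total_salary,Price,Make
53,Male,Business,Married,Post Graduate,4,No,No,Yes,80000,50000,130000,61000,SUV
31,Female,Salaried,Married,Graduate,2,Yes,No,No,60000,0,60000,33000,Sedan
24,Male,Salaried,Single,Graduate,0,No,Yes,No,45000,0,45000,21000,Hatchback
"""

def load_data(source):
    """Read the CSV from a path or file-like object into a DataFrame."""
    return pd.read_csv(source)

# With the real file this would be: df = load_data("austo_automobile.csv")
df = load_data(io.StringIO(SAMPLE_CSV))

print(df.shape)   # (rows, columns)
print(df.head())  # up to the first five rows
```

On the real file, `df.shape` would be expected to report the 1581 rows and 14 columns mentioned above.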

Table 1: Top five rows of the dataset

Table 2: Basic Information of the Dataset

A quick look at the dataset information tells us that there are 6 numerical and 8 categorical variables. There are a few Null records in two variables, Gender and Partner_salary, which are analyzed in detail in the next section. There are no duplicate records in the dataset.
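The checks described here can be sketched as follows, assuming pandas. The small DataFrame is a hypothetical stand-in with a few deliberately missing entries; on the real data the same three calls would produce the counts reported in this section.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature frame with a few deliberately missing entries.
df = pd.DataFrame({
    "Age": [28, 35, 41, 30],
    "Gender": ["Male", None, "Female", "Male"],
    "Partner_working": ["Yes", "No", "Yes", "No"],
    "Salary": [50000, 62000, 71000, 55000],
    "Partner_salary": [30000, 0, np.nan, 0],
    "Total_salary": [80000, 62000, 96000, 55000],
})

# Split columns by type, count Nulls per column, count duplicated rows.
numeric_cols = df.select_dtypes(include="number").columns.tolist()
categorical_cols = df.select_dtypes(exclude="number").columns.tolist()
null_counts = df.isnull().sum()
duplicate_rows = int(df.duplicated().sum())

print(numeric_cols, categorical_cols)
print(null_counts[null_counts > 0])
print(duplicate_rows)
```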

Missing value treatment

Inspecting Null Values -

There are Nulls in the Gender and Partner_salary variables.

● Gender - total 53 Nulls
● Partner_salary - total 106 Nulls

Handling Nulls -

Nulls are usually handled by the following techniques:
● (a) If the proportion of Null values is more than 60% of the total number of records in a column, then drop the column. Here you assume that the column is uninformative.
● (b) If any row is missing a large number of values across columns, then that row may also be dropped.
● (c) Otherwise, the missing values may be imputed.

For the given data, neither (a) nor (b) is applicable, since the proportion of Null values in any column is small (53 of 1581 rows, about 3.4%, for Gender and 106 of 1581, about 6.7%, for Partner_salary) and no row contains a large number of missing observations.
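The three rules can be sketched as a small helper, assuming pandas. The function name `missing_value_plan` and the 50% row cut-off are assumptions made for illustration (the text only says "a large number"); the 60% column cut-off is the one stated above.

```python
import numpy as np
import pandas as pd

def missing_value_plan(df, col_threshold=0.60, row_threshold=0.50):
    """Split columns and rows into 'drop' and 'impute' groups.

    col_threshold follows the 60% rule stated above; row_threshold is an
    assumed cut-off, since the text does not give a number for rows.
    """
    col_null_frac = df.isnull().mean()        # share of Nulls per column
    row_null_frac = df.isnull().mean(axis=1)  # share of Nulls per row
    cols_to_drop = col_null_frac[col_null_frac > col_threshold].index.tolist()
    rows_to_drop = row_null_frac[row_null_frac > row_threshold].index.tolist()
    cols_to_impute = col_null_frac[
        (col_null_frac > 0) & (col_null_frac <= col_threshold)
    ].index.tolist()
    return cols_to_drop, rows_to_drop, cols_to_impute

# Hypothetical sample: one Null in Gender, one in Partner_salary.
df = pd.DataFrame({
    "Gender": ["Male", None, "Female", "Male"],
    "Salary": [50000, 62000, 71000, 55000],
    "Partner_salary": [30000, 0, np.nan, 0],
})
cols_to_drop, rows_to_drop, cols_to_impute = missing_value_plan(df)
print(cols_to_drop, rows_to_drop, cols_to_impute)
```

In this sample nothing is dropped and both incomplete columns are marked for imputation, which mirrors the conclusion reached for the real data.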

Simple rules for imputation:
● For categorical variables, we can impute the Nulls with the majority class. For the current dataset, Null values in the ‘Gender’ field are imputed with ‘Male’ (Male being the majority class).
● For continuous variables, it is possible to impute the Null values with the mean/median of the variable, depending on the nature of the distribution. However, more efficient imputation is possible if variables are internally related.

The three variables on salary are related to one another:

Total_salary = Salary + Partner_salary

Also, non-zero values in the Partner_salary field are possible only if the binary variable Partner_working is ‘Yes’. Hence for this data, we do a rule-based imputation instead of the mean/median imputation:
● If Partner_working = ‘No’ then Partner_salary = 0
● If Partner_working = ‘Yes’ then Partner_salary = Total_salary - Salary
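A minimal sketch of both imputation rules, assuming pandas and NumPy. The helper name `impute_missing` and the sample rows are hypothetical; the logic follows the majority-class rule for Gender and the salary rule for Partner_salary described above.

```python
import numpy as np
import pandas as pd

def impute_missing(df):
    """Apply the mode rule to Gender and the salary rule to Partner_salary."""
    out = df.copy()
    # Categorical: fill Nulls with the majority class.
    majority = out["Gender"].mode().iloc[0]
    out["Gender"] = out["Gender"].fillna(majority)
    # Continuous: Partner_salary is 0 when the partner does not work,
    # otherwise it is what remains of Total_salary after Salary.
    rule_value = np.where(
        out["Partner_working"] == "Yes",
        out["Total_salary"] - out["Salary"],
        0,
    )
    out["Partner_salary"] = out["Partner_salary"].fillna(
        pd.Series(rule_value, index=out.index)
    )
    return out

# Hypothetical rows; values are illustrative only.
df = pd.DataFrame({
    "Gender": ["Male", None, "Female", "Male"],
    "Partner_working": ["Yes", "No", "Yes", "No"],
    "Salary": [50000, 62000, 71000, 55000],
    "Partner_salary": [30000, np.nan, np.nan, 0],
    "Total_salary": [80000, 62000, 96000, 55000],
})
clean = impute_missing(df)
print(clean[["Gender", "Partner_salary"]])
```

Only the Null entries are filled; existing Partner_salary values are left untouched, so the rule cannot overwrite recorded data.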
Statistical Summary

Inspecting the Summary Statistics of the Dataset (Numerical fields)

Table 3: Numerical summarization of the dataset

Observations:
● The average age of the customers is around 32 years. 75% of the customers are below 38 years and the minimum age is 22. This indicates that most buyers of new cars are in the age group 22-38.
● 50% of the customers have at least 2 dependents.
● The salary of the customer lies between 30,000 and 90,000, with an average of around 60,000 and a standard deviation of 14,278. The mean salary is almost equal to the median, which suggests that the salary distribution is symmetrical.
● At least 25% of the customers' partners are not working. The average partner salary is around 20,000. The mean partner salary is less than the median, which suggests that the partner salary distribution is left-skewed.
● The average household salary of the customer is around 80,000, with a standard deviation of around 25,000. The mean is approximately equal to the median, which suggests that the household salary distribution is symmetrical.
● The price of the car lies in the range of 18,000 to 70,000, with an average of around 36,000. The mean price is greater than the median, which suggests that the price distribution is slightly right-skewed.
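The mean-versus-median reasoning used in these observations can be sketched as follows, assuming pandas. The helper `skew_direction`, its 2% tolerance and the sample values are assumptions for illustration; comparing mean and median is a rule of thumb, not a formal test of skewness.

```python
import pandas as pd

def skew_direction(series, tolerance=0.02):
    """Classify skew by comparing mean and median.

    tolerance is an assumed relative gap below which the two are
    treated as 'almost equal'.
    """
    mean, median = series.mean(), series.median()
    if abs(mean - median) <= tolerance * abs(median):
        return "approximately symmetric"
    return "right-skewed" if mean > median else "left-skewed"

# Hypothetical values, chosen only to exercise each branch.
df = pd.DataFrame({
    "Salary": [30000, 45000, 60000, 75000, 90000],
    "Partner_salary": [0, 20000, 30000, 30000, 30000],
    "Price": [18000, 22000, 30000, 40000, 70000],
})

# describe() gives the summary table; the last column adds the heuristic.
summary = df.describe().T[["mean", "50%", "std", "min", "max"]]
summary["shape"] = [skew_direction(df[c]) for c in summary.index]
print(summary)
```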

Checking for anomalous values in categorical variables

Determining the unique values of each categorical variable lets us check whether any junk/garbage values are present. This check can also help us identify any data entry issues.
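One way to run this check, assuming pandas, is to print the value counts of every non-numeric column. The sample frame is hypothetical and deliberately contains a couple of misspelled labels so that the check has something to find.

```python
import pandas as pd

# Hypothetical sample containing the kind of junk values this check catches.
df = pd.DataFrame({
    "Gender": ["Male", "Female", "Femal", "Male", "Femle", "Female"],
    "Profession": ["Salaried", "Business", "Salaried", "Salaried", "Business", "Salaried"],
    "Age": [25, 31, 40, 28, 36, 52],
})

categorical_cols = df.select_dtypes(exclude="number").columns
value_counts = {col: df[col].value_counts(dropna=False) for col in categorical_cols}

for col, counts in value_counts.items():
    print(f"--- {col} ---")
    print(counts)
```

Passing `dropna=False` keeps Nulls visible as their own category, which is useful before any imputation has been done.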

Table 4: Value Counts of the Categorical Variables

From the value counts of the Gender variable, we find two instances of possible data entry issues. The word Female has been misspelled as ‘Femle’ and ‘Femal’.

For the current dataset, we are confident that the category Female has been misspelled, so we can go ahead and replace these records with the correct spelling, i.e. ‘Female’. However, in real-world data, the issues might not always be this straightforward; rectifying them might need thorough inspection and domain knowledge.

The rest of the categorical fields seem to be free from any such issues.
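A minimal sketch of the correction, assuming pandas. The mapping `GENDER_FIXES` is a hypothetical name; it contains exactly the two misspellings reported above, and the sample Series is illustrative.

```python
import pandas as pd

# Misspellings reported above, mapped to the intended category.
GENDER_FIXES = {"Femle": "Female", "Femal": "Female"}

# Hypothetical sample values.
gender = pd.Series(["Male", "Femle", "Female", "Femal", "Male"], name="Gender")
gender_clean = gender.replace(GENDER_FIXES)

print(gender_clean.value_counts())
```

An explicit mapping is used rather than fuzzy matching so that only the labels that were inspected and confirmed get changed.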

Univariate Analysis

For the univariate analysis, we will look at boxplots and histograms to get a better understanding of the distributions.

Numerical variables

● Observations on Age

Figure 1: Univariate analysis of Age

Observations:
● The distribution of Age is right-skewed.
● From the boxplot we can see that the second quartile (Q2) is less than 30, which means at least 50% of customers in the dataset are below the age of 30.
● There are a few outliers in this variable.
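The quantities read off the boxplot can also be computed directly. This is a sketch assuming pandas and the common boxplot convention of flagging points more than 1.5 times the interquartile range beyond the quartiles; the helper `boxplot_stats` and the ages are hypothetical.

```python
import pandas as pd

def boxplot_stats(series, whisker=1.5):
    """Return quartiles and the points a standard boxplot flags as outliers."""
    q1, q2, q3 = series.quantile([0.25, 0.50, 0.75])
    iqr = q3 - q1
    lower, upper = q1 - whisker * iqr, q3 + whisker * iqr
    outliers = series[(series < lower) | (series > upper)]
    return {"Q1": q1, "Q2": q2, "Q3": q3, "outliers": outliers.tolist()}

# Hypothetical ages, not the real data.
age = pd.Series([22, 23, 25, 26, 27, 28, 29, 31, 33, 36, 54], name="Age")
stats = boxplot_stats(age)
print(stats)
print("skew:", round(age.skew(), 2))  # a positive value means right-skewed
```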

● Observations on Salary

Figure 2: Univariate analysis of Salary

Observations:
● The salary of the customer lies between 30,000 and 90,000, with an average of around 60,000.
● The mean salary is almost equal to the median.

● Observations on Partner's salary

Figure 3: Univariate analysis of Partner_salary

Observations:
● Around 45% of the customers' partners do not work. Hence, their salary is 0.
● Most of the working partners earn in the range of 20,000-60,000.

● Observations on Total salary

Figure 4: Univariate analysis of Total_salary

Observations:
● The total salary of the customer's household is approximately normally distributed, with an average of around 80,000.
● The mean is almost equal to the median.
● There are a few outliers in this variable. However, we will not treat them, as they are valid values.

● Observations on Price

Figure 5: Univariate analysis of Price

Observations:
● Most of the cars cost in the range 20,000-40,000.
● The mean price of the cars is greater than the median. This indicates that the car price is right-skewed.

Categorical variables

● Observations on Gender

Figure 6: Univariate analysis of Gender

Observations:
● There are more male customers (around 79%) than female customers (around 21%).

● Observations on Profession

Figure 7: Univariate analysis of Profession

Observations:
● There are more salaried customers (around 57%) than business persons (around 43%).

● Observations on Marital Status

Figure 8: Univariate analysis of Marital Status

Observations:
● 91.3% of customers are married. Only 8.7% are single.

● Observations on Education

Figure 9: Univariate analysis of Education

Observations:
● Around 38% of customers are graduates, whereas 62% have completed their post-graduation.

● Observations on Personal Loan

Figure 10: Univariate analysis of Personal Loan

Observations:
● Around 50% of the customers have a personal loan.

● Observations on Number of dependents

Figure 11: Univariate analysis of Number of dependents

Observations:
● Around 84% of the customers have at least 2 dependents.

● Observations on House loan

Figure 12: Univariate analysis of House Loan

Observations:
● Around 33% of the customers have a house loan.

● Observations on Partner working

Figure 13: Univariate analysis of Partner working

Observations:
● Around 55% of the customers have working partners.

● Observations on Make

Figure 14: Univariate analysis of Make

Observations:
● Sales of Sedan and Hatchback cars are higher than sales of SUVs.
● Only 15% of the customers buy SUVs.

Insights
● Sedan is the most preferred purchase, followed by Hatchback and SUV.
● The number of customers with a working partner is slightly higher than the number with a non-working partner or no partner. A total of 713 customers have Partner_working as ‘No’, of which 138 are ‘Single’.
● The number of customers who did not take a House Loan is almost double the number who did.
● The data contains a very small proportion of Single customers compared to Married customers.
● The count of Salaried customers is slightly higher than that of Business customers.
● The majority of the customers in the dataset are Post Graduates.
● From the bar plot of the No_of_dependents variable we can infer that the majority of the customers have either 2 or 3 dependents, followed by 1 or 4 dependents. Very few customers have no dependents.

Multivariate Analysis

● Correlation of Numerical Variables

Figure 15: Correlation of numerical variables

Observations:
● Age is moderately correlated with the customer's salary. This is expected, as customers in the higher age group tend to earn more than younger customers.
● Age is highly correlated with the price of the car. It is possible that customers in the higher age group tend to buy costlier cars.
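The correlation matrix behind such a figure can be computed as sketched below, assuming pandas and the Pearson coefficient (the report does not state which coefficient was used). The sample values are hypothetical.

```python
import pandas as pd

# Hypothetical numeric columns; values are illustrative only.
df = pd.DataFrame({
    "Age": [23, 27, 31, 36, 42, 50],
    "Salary": [45000, 52000, 58000, 70000, 66000, 85000],
    "Price": [19000, 22000, 30000, 35000, 52000, 65000],
    "Make": ["Hatchback", "Hatchback", "Sedan", "Sedan", "SUV", "SUV"],
})

# Pearson correlation between every pair of numeric columns.
corr = df.select_dtypes(include="number").corr(method="pearson")
print(corr.round(2))
```

Correlation only measures linear association, so a high Age-Price value is consistent with, but does not prove, the explanation offered above.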

● Show the relationship between numerical variables

Figure 16: Relationship between numerical variables

Observations:
● Customers with higher household salaries prefer SUVs and Sedans, whereas customers with lower household salaries prefer Hatchbacks.
● Customers in the higher age group prefer SUVs, whereas young customers prefer Hatchbacks.
● Let's analyze it further to get more insights.

Find relationship between Numerical and Categorical variables

● Make vs Age

Figure 17: Make vs Age Plot

Observations:
● SUV is preferred by customers in the age group 35-60.
● Sedan is preferred by customers in the age group 30-45.
● Hatchback is preferred by the younger customers in the age group 22-30.
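Age ranges per car type like those above can be read from a grouped summary as well as from a plot. This is a sketch assuming pandas; the sample ages are hypothetical and were chosen only to mirror the pattern described.

```python
import pandas as pd

# Hypothetical buyers; ages chosen only to mirror the pattern described.
df = pd.DataFrame({
    "Make": ["Hatchback", "Hatchback", "Hatchback",
             "Sedan", "Sedan", "Sedan",
             "SUV", "SUV", "SUV"],
    "Age": [22, 26, 30, 30, 37, 45, 35, 48, 60],
})

# Minimum, median and maximum Age within each car type.
age_by_make = df.groupby("Make")["Age"].agg(["min", "median", "max"])
print(age_by_make)
```

Swapping "Age" for "Price" or "Salary" gives the equivalent summary for the other numerical-versus-Make comparisons.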
● Make vs Price

Figure 18: Make vs Price

Observations:
● SUV is the costliest type of car among the three car types. The price range of the SUVs is 50,000-70,000.
● Sedan is costlier than hatchback-type cars.
● Hatchback is the most affordable car, ranging between 15,000 and 35,000.

● Make vs Salary

Figure 19: Make vs Salary

Observations:
● SUV is the costliest type of car among the three car types. Hence, customers with higher household incomes prefer to buy SUVs.

● Make vs Education

Figure 20: Make vs Education

Observations:
● Customers with higher education tend to buy more cars. As observed, Post Graduates have purchased more cars of every type.

● Make vs Number of dependents

Figure 21: Make vs Number of dependents

Observations:
● Customers with 3 or more dependents are more likely to buy a Hatchback or SUV.
● Sedan cars are purchased by customers with 1 or 2 dependents.

● Make vs Profession

Figure 22: Make vs Profession

Observations:
● Salaried customers buy more cars than customers who own their businesses.
● Sales of the Hatchback are almost the same for Business and Salaried individuals. Hatchback is popular in both professions.

● Make vs Personal loan

Figure 23: Make vs Personal loan

Observations:
● Few SUV customers have personal loans.
● For Hatchback and Sedan, personal loans are equally distributed among the customers.

● Make vs House loan

Figure 24: Make vs House loan

Observations:
● SUV customers do not have a house loan.
● More Hatchback customers have a house loan compared to Sedan customers.

● Make vs Gender

Figure 25: Make vs Gender

Observations:
● Females prefer SUVs and are least likely to buy a Hatchback.
● Males prefer a Sedan or Hatchback.
● SUV is the least preferred type among males.

● Make vs Marital status

Figure 26: Make vs Marital status

Observations:
● A married person most likely prefers a Sedan or Hatchback, whereas a single person most likely prefers a Hatchback.

Answer Key Questions

● Do men tend to prefer SUVs more compared to women?

Analyzing the proportion of SUV purchases for both genders, we get:

Proportion of females buying SUVs = 0.52 (Number of females who bought SUVs / Total number of females)

Proportion of males buying SUVs = 0.09 (Number of males who bought SUVs / Total number of males)

Figure 27: Make vs Gender

Since 0.52 is far greater than 0.09, women are more likely than men to buy an SUV. Hence the statement made by Steve Rogers is incorrect.
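Within-gender proportions of this kind can be obtained with a row-normalized cross-tabulation. This is a sketch assuming pandas; the purchases are hypothetical, so the printed shares will not match the 0.52 and 0.09 reported for the real data.

```python
import pandas as pd

# Hypothetical purchases; counts are illustrative, not the real data.
df = pd.DataFrame({
    "Gender": ["Female", "Female", "Female", "Female",
               "Male", "Male", "Male", "Male", "Male", "Male"],
    "Make": ["SUV", "SUV", "Sedan", "Hatchback",
             "SUV", "Sedan", "Sedan", "Sedan", "Hatchback", "Hatchback"],
})

# Each row sums to 1: share of each Make within a gender.
make_share = pd.crosstab(df["Gender"], df["Make"], normalize="index")
suv_share = make_share["SUV"]
print(suv_share.round(2))
```

Normalizing by row (`normalize="index"`) matters here: dividing by the total number of customers instead would let the larger male group dominate the comparison.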
● What is the likelihood of a salaried person buying a Sedan?

Analyzing the proportion of each car Make purchased by salaried customers, we get:

Proportion of Hatchbacks purchased = 0.32 (Total Hatchbacks bought by salaried / Total Cars purchased by salaried)

Proportion of SUVs purchased = 0.23 (Total SUVs bought by salaried / Total Cars purchased by salaried)

Proportion of Sedans purchased = 0.44 (Total Sedans bought by salaried / Total Cars purchased by salaried)

To confirm the conclusion visually, we draw a count plot with Profession on the x-axis and Make as the hue parameter.

Figure 28: Make vs Profession

From the above results and chart, it is evident that a salaried person is more likely to buy a Sedan than any other car type. Hence the statement made by Ned Stark is correct.
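The likelihood asked for here is a conditional proportion: the share of each Make among salaried customers only. A minimal sketch, assuming pandas and hypothetical purchases, filters the rows first and then normalizes the counts.

```python
import pandas as pd

# Hypothetical purchases; values are illustrative only.
df = pd.DataFrame({
    "Profession": ["Salaried"] * 5 + ["Business"] * 3,
    "Make": ["Sedan", "Sedan", "Sedan", "Hatchback", "SUV",
             "Hatchback", "Sedan", "SUV"],
})

# Keep salaried customers, then take each Make's share of their purchases.
salaried = df[df["Profession"] == "Salaried"]
make_share = salaried["Make"].value_counts(normalize=True)
print(make_share)
```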

● What evidence or data supports Sheldon Cooper's claim that a
salaried male is an easier target for an SUV sale over a Sedan sale?

Calculating the total number of cars purchased by salaried male customers for each Make, we get:

Proportion of Hatchbacks = 277/672 = 0.41 (Total Hatchbacks purchased / Total Cars purchased)

Proportion of SUVs = 90/672 = 0.13 (Total SUVs purchased / Total Cars purchased)

Proportion of Sedans = 305/672 = 0.45 (Total Sedans purchased / Total Cars purchased)

To confirm the conclusion visually, we draw a count plot with Profession on the x-axis and Make as the hue parameter, for the male customers only.

Figure 29: Make vs Profession (Male)

From the above results and chart, it is evident that a salaried male prefers a Sedan over an SUV. Hence the statement made by Sheldon Cooper is incorrect.
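The arithmetic above can be reproduced from the three reported counts alone. This short check uses plain Python; the counts are the ones given in this section, and the variable names are chosen for illustration.

```python
# Counts of cars bought by salaried male customers, as reported above.
counts = {"Hatchback": 277, "SUV": 90, "Sedan": 305}

total = sum(counts.values())
shares = {make: n / total for make, n in counts.items()}

for make, share in shares.items():
    print(f"{make}: {counts[make]}/{total} = {share:.2f}")

# The Make with the largest share is the easiest sale in this segment.
most_likely = max(shares, key=shares.get)
print(most_likely)
```

The SUV share is the smallest of the three, which is why the claim that a salaried male is an easier target for an SUV than for a Sedan does not hold.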

● How does the amount spent on purchasing automobiles vary across gender?

Females are more likely to buy SUVs and on average spend more on cars than males: 47705 units against 32416 units.

Mean Price across Gender:

Female = 47705
Male = 32416

Median Price across Gender:

Female = 49000
Male = 29000

Both the mean and the median Price are higher for Female customers than for Male customers.
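Mean and median Price per group can be computed with a single grouped aggregation. This is a sketch assuming pandas; the helper `price_summary` and the sample rows are hypothetical, so the output differs from the figures reported for the real data.

```python
import pandas as pd

def price_summary(df, by):
    """Mean and median Price for each level of the grouping column."""
    return df.groupby(by)["Price"].agg(["mean", "median"])

# Hypothetical purchases; values are illustrative only.
df = pd.DataFrame({
    "Gender": ["Female", "Female", "Female", "Male", "Male", "Male"],
    "Personal_loan": ["No", "Yes", "No", "Yes", "No", "Yes"],
    "Price": [49000, 52000, 40000, 29000, 35000, 26000],
})

by_gender = price_summary(df, "Gender")
print(by_gender)
```

The same helper applies to any other categorical column, for example `price_summary(df, "Personal_loan")`.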

● How much money was spent on purchasing automobiles by individuals who took a personal loan?

Mean Price across Personal Loan:

Personal Loan: No = 36742
Personal Loan: Yes = 34457

Median Price across Personal Loan:

Personal Loan: No = 32000
Personal Loan: Yes = 31000

The mean and median Price for purchases made by customers without a Personal Loan are slightly higher than for customers who have one.

To increase spending by customers with Personal Loans, the business can look to make the interest rate cheaper (for automobile purchases) or ease the repayment terms.

l
tia
● How does having a working partner influence the purchase of
higher-priced cars?

Mean of Price across Partner_working:

Partner_working: No = 36000
Partner_working: Yes = 35267

Median of Price across Partner_working:

Partner_working: No = 31000
Partner_working: Yes = 31000

The mean and median Price of the purchased automobile are almost the same
across the Partner_working categories (the means differ by about 2% and
the medians are equal), indicating that whether the partner is working has
little or no effect on the price of the car purchased by the customer.

Conclusion and Recommendations

Conclusions
Hatchback:
● An affordable and general-purpose car that can be used by a wide
range of users.
● It can be considered an entry-level car, generally targeted at the
younger population with an average income of 55k.

Sedan:
● Slightly costlier compared to hatchback-type cars.
● The product also generally targets customers in their 30s who have a
slightly higher income.
● The product is suitable for single customers.

SUV:
● A costly car that will excite car lovers.
● It has a higher price point and is more suitable for customers who do
not have any kind of loan.
● The buyers in this segment are older, salaried individuals.

Business Recommendations
● Austo should first launch the affordable Hatchback model in the US
market, targeting the younger population. This car type can be the
flagship product that brings in profits for the company, as most
young US customers prefer this model.
● Then, Austo should launch a good and affordable Sedan model. The
company needs to engage in more marketing for this model and
should try to lure the younger age group customers into buying this
model.

● After the successful launch of these models, the company can launch
the SUV model with a competitive pricing strategy to gain more
profits from the US automobile market. SUVs can be targeted at
people in the 35-60 age group, as most SUV customers fall in this
age range.

Project 2- Framing Analytics Problem

Problem Definition

CONTEXT

A bank can generate revenue in a variety of ways, such as by charging
interest and transaction fees and by offering financial advice. Interest
charged on the capital that the bank lends out to customers has
historically been the most significant method of revenue generation. The
bank earns profits from the difference between the interest rates it pays
on deposits and other sources of funds, and the interest rates it charges
on the loans it gives out. GODIGT Bank is a mid-sized private bank that
deals in all kinds of banking products, such as savings accounts, current
accounts, and investment products, among other offerings. The bank also
cross-sells asset products to its existing customers through personal
loans, auto loans, business loans, etc., and to do so it uses various
communication methods including cold calling, e-mails, and
recommendations on net banking and mobile banking. GODIGT Bank also has a
set of customers who were given credit cards based on risk policy and
customer category class, but due to huge competition in the credit card
market, the bank is observing high attrition in credit card spending. The
bank makes money only if customers spend more on credit cards. Given the
attrition, the bank wants to revisit its credit card policy and make sure
that the card given to each customer is the right credit card. The bank
will make a profit only through the customers that show higher intent
toward a recommended credit card. (Higher intent means that consumers
would want to use the card and hence not attrite.)

Objective

You are a Data Scientist at the company, and the Data Science team has
shared some data. You are expected to find the key variables that have a
vital impact on the analysis and that will help the company improve its
business.

Data Description

Credit Card Data for GODIGIT Bank

Data Dictionary:
userid - Unique bank customer-id
card_no - Masked credit card number
card_bin_no - Credit card IIN number
Issuer - Card network issuer
card_type - Credit card type
card_source_data - Credit card sourcing date
high_networth - Customer category based on their net-worth value (A: High to E:
Low)
active_30 - Savings/Current/Salary etc. account activity in last 30 days
active_60 - Savings/Current/Salary etc. account activity in last 60 days
active_90 - Savings/Current/Salary etc. account activity in last 90 days
cc_active30 - Credit Card activity in the last 30 days
cc_active60 - Credit Card activity in the last 60 days
cc_active90 - Credit Card activity in the last 90 days
hotlist_flag - Whether card is hot-listed (any problem noted on the card)
widget_products - Number of convenience products customer holds (dc, cc,
net-banking active, mobile banking active, wallet active, etc.)

engagement_products - Number of investment/loan products the customer
holds (FD, RD, Personal loan, auto loan)
annual_income_at_source - Annual income recorded in the credit card
application
other_bank_cc_holding - Whether the customer holds another bank credit card
bank_vintage - Vintage with the bank (in months) as on Tth month
T+1_month_activity - Whether customer uses credit card in T+1 month (future)
T+2_month_activity - Whether customer uses credit card in T+2 month (future)
T+3_month_activity - Whether customer uses credit card in T+3 month (future)
T+6_month_activity - Whether customer uses credit card in T+6 month (future)
T+12_month_activity - Whether customer uses credit card in T+12 month (future)
Transactor_revolver - Revolver: Customer who carries balances over from one
month to the next. Transactor: Customer who pays off their balances in full every
month.
avg_spends_l3m - Average credit card spends in last 3 months
Occupation_at_source - Occupation recorded at the time of credit card
application
cc_limit - Current credit card limit

*All above data has been recorded as on Tth month, excluding T+1_month_activity,
T+2_month_activity, T+3_month_activity, T+6_month_activity, T+12_month_activity

Top 5 important variables with justification.
● annual_income_at_source - Annual income plays a big role in the
purchasing power of an individual hence is a vital piece of info.
Income can be used by the banks to make better decisions in areas
such as risk profiling, targeted ads, campaigns, offers, loan limits etc.
● cc_limit – Defining the credit card limit for customers on the basis
of their attributes (such as income, CIBIL Score, etc.) is part of the
Risk Management practice wherein banks try to minimize the number
of defaulters. Banks seek a quantifiable answer to the query
“How much is too much?”

● cc_active30 – Flag variables such as cc_active30 and cc_active60 can
be used to understand how frequently the customer uses the credit
card, whether the account is dormant, or whether the customer is
experiencing any issues leading to reduced usage of the card.

● T+1_month_activity – Flag variables such as T+1_month_activity can
be used to plan out campaigns and promotional offers so as to
increase activity in the credit card.

● avg_spends_l3m – The avg_spends_l3m variable can give important
insights into customer spending behavior. It can be used to identify
whether the credit card is the customer's primary or secondary card,
i.e. high spend indicates a primary account whereas lower spend would
suggest a secondary account. Campaigns can be rolled out on the basis
of customer preference, and customized offers can be given to
encourage customers to use the credit account more frequently.

A few variables are unimportant from an analysis point of view and are
merely customer/account identifiers:

1. userid
2. card_no
3. card_bin_no
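
One possible way to act on this distinction is to sort the columns into roles before any analysis. The sketch below is an assumption-laden illustration, not part of the original report: the column names come from the data dictionary, while the function `split_columns` and the rule of treating every `T+..._month_activity` column as a future observation (following the footnote to the data dictionary) are choices made here.

```python
# Identifier columns named in the list above; they carry no analytical signal.
IDENTIFIER_COLUMNS = {"userid", "card_no", "card_bin_no"}


def split_columns(columns):
    """Sort column names into identifiers, future activity flags, and features.

    Assumption: any column named T+<n>_month_activity is observed after
    month T, so it is kept apart from the attributes recorded as on month T.
    """
    roles = {"identifier": [], "future_target": [], "feature": []}
    for name in columns:
        if name in IDENTIFIER_COLUMNS:
            roles["identifier"].append(name)
        elif name.startswith("T+") and name.endswith("_month_activity"):
            roles["future_target"].append(name)
        else:
            roles["feature"].append(name)
    return roles


# A subset of the data dictionary, used only to demonstrate the helper.
columns = [
    "userid", "card_no", "card_bin_no",
    "annual_income_at_source", "cc_limit", "cc_active30",
    "avg_spends_l3m", "T+1_month_activity", "T+12_month_activity",
]
roles = split_columns(columns)
for role, names in roles.items():
    print(role, names)
```

Keeping the future activity flags separate makes it explicit which variables describe the customer as on month T and which describe later card usage.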
