SMDM Project Report - Set2

The document details an analysis of customer purchase data for an auto company. It describes cleaning the data by handling duplicates, discrepancies and null values. Univariate and bivariate analysis were performed on the variables through visualizations to understand relationships between age, price, salary and make. Insights from the analysis can help improve the company's marketing campaign.

Uploaded by

priyada16

Problem 1: Austo Motor Company Analysis

1.1 Problem Statement

Analysts are required to explore the data and reflect on the insights. Clear writing is an integral part of a
good report. Note that the explanations must be such that readers with minimal knowledge of analytics are
able to grasp the insights.
Austo Motor Company is a leading car manufacturer specializing in SUV, Sedan, and Hatchback models. In its
recent board meeting, concerns were raised by the members on the efficiency of the marketing campaign
currently being used. The board decides to rope in an analytics professional to improve the existing
campaign.
You as an analyst have been tasked with performing a thorough analysis of the data and coming up with
insights to improve the marketing campaign.
The instructions below are given to help you complete the project – Dataset - Link

Sample of the Dataset:

1.2 Analysis Report

1.2.1.1 What is the important technical information about the dataset that a database administrator would
be interested in? (Hint: Information about the size of the dataset and the nature of the variables)

1.2.1.1.1 Detailed Data Description:

Exploring the dataset reveals the following in-depth information on the Austo Motor customer base:
- A total of 14 variables
- A total of 1581 purchase entries
- That is, a size of 1581 entries x 14 variables
- 12 variables with no null values and 2 variables with null values
- 8 string-type variables and 6 numeric variables

Output of descriptive analysis of the Dataset:

a. Data columns (total 14 columns):

 #   Column            Non-Null Count  Dtype
---  ------            --------------  -----
 0   Age               1581 non-null   int64
 1   Gender            1528 non-null   object
 2   Profession        1581 non-null   object
 3   Marital_status    1581 non-null   object
 4   Education         1581 non-null   object
 5   No_of_Dependents  1581 non-null   int64
 6   Personal_loan     1581 non-null   object
 7   House_loan        1581 non-null   object
 8   Partner_working   1581 non-null   object
 9   Salary            1581 non-null   int64
 10  Partner_salary    1475 non-null   float64
 11  Total_salary      1581 non-null   int64
 12  Price             1581 non-null   int64
 13  Make              1581 non-null   object

b. dtypes: float64(1), int64(5), object(8)
c. RangeIndex: 1581 entries, 0 to 1580
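
A report like the one above can be produced with a few pandas calls. The following is a minimal sketch on a small synthetic frame; the values are illustrative stand-ins, not rows from the Austo data, and only the column names are taken from the report.

```python
import pandas as pd

# Synthetic stand-in for the Austo data (values are illustrative only).
df = pd.DataFrame({
    "Age": [53, 30, 41],
    "Gender": ["Male", None, "Female"],
    "Salary": [99300, 45000, 70000],
    "Partner_salary": [70700.0, None, 0.0],
    "Make": ["SUV", "Hatchback", "Sedan"],
})

n_rows, n_cols = df.shape                                  # size of the dataset
numeric_cols = df.select_dtypes(include="number").columns.tolist()
cols_with_nulls = df.columns[df.isnull().any()].tolist()   # columns holding any null

df.info()                                                  # per-column non-null counts and dtypes
print(n_rows, n_cols, numeric_cols, cols_with_nulls)
```

On the real file, the same calls would report the entry count, the split between numeric and string-type variables, and which columns contain nulls.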
1.2.2 Take a critical look at the data and do a preliminary analysis of the variables. Do a quality
check of the data so that the variables are consistent. Are there any discrepancies present in
the data? If yes, perform preliminary treatment of data.
Let us examine each of the variables of the dataset, perform a preliminary treatment, and remove any
discrepancies.

1.2.2.1.1 Duplicate Data Treatment

Checking the data for duplicate rows yields the result
Number of duplicate rows = 0

Since there are no identical rows in the dataset, no action is needed on the dataset
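
The duplicate check can be sketched as below. The report does not show the exact call that was used, so `DataFrame.duplicated` is an assumption, and the frame is synthetic with one deliberately repeated row.

```python
import pandas as pd

# Synthetic frame: the second row repeats the first.
df = pd.DataFrame({
    "Age": [30, 30, 45],
    "Make": ["Sedan", "Sedan", "SUV"],
    "Price": [25000, 25000, 60000],
})

# duplicated() flags every row that is identical to an earlier row.
n_duplicates = int(df.duplicated().sum())
print("Number of duplicate rows =", n_duplicates)

# Only needed when the count is non-zero.
deduped = df.drop_duplicates().reset_index(drop=True)
```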

1.2.2.1.2 Discrepancies Treatment

- Exploratory analysis of the object-type variables in the dataset shows a discrepancy in the Gender
column only

- Further analysis of the Gender variable shows that the dataset contains inconsistent entries for the
'Female' value

- The inconsistent 'Female' entries are then treated to give the final updated Gender counts
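
The Gender clean-up can be sketched as follows. The misspelled variants used here ("Femal", "Femle") are hypothetical placeholders; on the real data they would be read off the `value_counts()` output first.

```python
import pandas as pd

# Synthetic Gender column with hypothetical inconsistent spellings.
gender = pd.Series(["Male", "Female", "Femal", "Femle", "Male", None], name="Gender")

# Step 1: list every distinct label, including missing values.
print(gender.value_counts(dropna=False))

# Step 2: map each inconsistent label onto the canonical one.
fixes = {"Femal": "Female", "Femle": "Female"}   # hypothetical mapping
cleaned = gender.replace(fixes)

print(cleaned.value_counts())
```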

1.2.2.1.3 Null Data Treatment

Exploring the counts of null values:

Variable            Null Count
Age                 0
Gender              53
Profession          0
Marital_status      0
Education           0
No_of_Dependents    0
Personal_loan       0
House_loan          0
Partner_working     0
Salary              0
Partner_salary      106
Total_salary        0
Price               0
Make                0

From the above table, we can see that the two variables below contain null values.

1. Partner_salary contains a total of 106 null rows

- Further analysis of the Partner_salary column shows a direct relation with 'Partner_working': when the
customer has no partner who is earning, the column is marked as null
- The Total_salary variable implies that
Total_salary = Salary + Partner_salary

- We can ignore/drop this column; if it is required for future reference, we can always reconstruct
this variable with the formula
Partner_salary = Total_salary - Salary
2. The Gender column contains a total of 53 null rows

To remove the null data in the Gender column,

- Male is the most frequently occurring value, with a frequency of 1199
- We can set the NaN values to "Male" to remove the null occurrences
- Post treatment, the Gender column has no null values and the Male count becomes 1252
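
Both null treatments described above can be sketched in a few lines. The frame is synthetic; the reconstruction step simply applies the identity Partner_salary = Total_salary - Salary, and filling Gender with the mode is the approach the report describes.

```python
import pandas as pd

# Synthetic rows; nulls placed in the same two columns as in the report.
df = pd.DataFrame({
    "Gender": ["Male", None, "Female", "Male"],
    "Salary": [50000, 60000, 40000, 70000],
    "Partner_salary": [20000.0, None, 0.0, None],
    "Total_salary": [70000, 60000, 40000, 100000],
})

null_counts = df.isnull().sum()          # nulls per column before treatment

# Rebuild Partner_salary from the identity stated in the report.
df["Partner_salary"] = df["Total_salary"] - df["Salary"]

# Fill missing Gender with the most frequent category (the mode).
top_gender = df["Gender"].mode()[0]
df["Gender"] = df["Gender"].fillna(top_gender)

print(null_counts)
print(df)
```

Mode imputation inflates the majority class, so it is worth keeping the original null count on record in case the Gender split is analysed later.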

1.2.2.1.4 Treated Final Dataset

After the preliminary analysis and treatment of the dataset:

- The variables and the entries of the purchase history of the Austo Motor Company
- Final dataset with values of the top 5 rows:
1.2.3 Explore all the features of the data separately by using appropriate visualizations and draw
insights that can be utilized by the business.
1.2.3.1 Univariate analysis of the Variables

1.2.3.1.1 Age

1.2.3.1.2 Gender

1.2.3.1.3 Profession
1.2.3.1.4 Marital_status

1.2.3.1.5 Education
1.2.3.1.6 No_of_Dependents

1.2.3.1.7 Personal_loan

1.2.3.1.8 House_loan
1.2.3.1.9 Partner_working

1.2.3.1.10 Salary
1.2.3.1.11 Total_salary

1.2.3.1.12 Price
1.2.3.1.13 Make

1.2.3.2 Descriptive Summary & Insights:


Examining the central tendency of each of the variables, we can summarize the variables according to
their type

1.2.3.2.1 Numerical Variables:

Exploration of the numerical variables is summarized below

1. Age – The customer base ranges in age from 22 to 54, with a mean age of 32
2. No_of_Dependents – The number of dependents ranges from 0 to 4, with a median (50th percentile) of 2
dependents
3. Salary – The salary of the individual customer ranges from 30K to 99K, with 75% of customers earning
between 30K and 71K
4. Total_salary – The customer's salary combined with their partner's earnings reaches a maximum of 171K
5. Price – The price of the vehicle ranges from 18K to 70K, with a standard deviation of 13K
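
Figures of this kind usually come from `DataFrame.describe`. The sketch below runs it on synthetic numbers chosen only to mimic the ranges quoted above; they are not the real data.

```python
import pandas as pd

# Synthetic values that only mimic the quoted ranges.
df = pd.DataFrame({
    "Age": [22, 30, 32, 40, 54],
    "No_of_Dependents": [0, 2, 2, 3, 4],
    "Price": [18000, 25000, 31000, 50000, 70000],
})

# One row per variable: count, mean, std, min, quartiles, max.
summary = df.describe().T
print(summary[["min", "50%", "max", "std"]])
```

The `50%` row is the median, meaning half of the customers are at or below that value. It does not mean that 50% of customers have exactly that value.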

1.2.3.2.2 Non-Numeric/Categorical Variables:

Exploration of the non-numerical variables is summarized below

1. Gender – Male customers dominate the list with a count of 1252
2. Profession – Customers include Salaried and Business professionals
3. Marital_status – 1443 of the customers who purchased a vehicle were married
4. Education – Between Graduates and Post Graduates, Post Graduates top the list with 985
purchases
5. Personal_loan – 792 customers with a personal loan purchased a vehicle
6. House_loan – 1054 customers with no housing loan purchased a vehicle
7. Partner_working – 868 of the overall 1581 customers who purchased a vehicle had a working
partner
8. Make – The make of the car has 3 variants, with Sedan being the most preferred.
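
The top category and its count for each non-numeric variable can be pulled out with `value_counts`. This is a sketch on synthetic rows; only the column names come from the report.

```python
import pandas as pd

# Synthetic categorical columns.
df = pd.DataFrame({
    "Gender": ["Male", "Male", "Female", "Male"],
    "Make": ["Sedan", "SUV", "Sedan", "Hatchback"],
})

cat_cols = ["Gender", "Make"]

# For each column: (most frequent label, its frequency).
top = {}
for col in cat_cols:
    counts = df[col].value_counts()
    top[col] = (counts.idxmax(), int(counts.max()))

print(top)
```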
1.2.4 Understanding the relationships among the variables in the dataset is crucial for every
analytical project. Perform analysis on the data fields to gain deeper insights. Comment on
your understanding of the data.
1.2.4.1 Bivariate & Multivariate Analysis

1.2.4.1.1 Correlation between the Numeric Variables

We can see from the heatmap below that Age and Price are highly correlated and play a major
role in the purchase of a vehicle
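
The matrix behind such a heatmap is a pairwise correlation of the numeric columns. A minimal sketch follows, assuming Pearson correlation (the pandas default; the report does not state which method was used) and synthetic values.

```python
import pandas as pd

# Synthetic rows in which Price rises with Age.
df = pd.DataFrame({
    "Age": [22, 28, 35, 44, 54],
    "Salary": [40000, 52000, 50000, 75000, 90000],
    "Price": [18000, 24000, 33000, 52000, 68000],
    "Make": ["Hatchback", "Hatchback", "Sedan", "SUV", "SUV"],
})

# Keep numeric columns only, then compute pairwise Pearson correlations.
corr = df.select_dtypes(include="number").corr()
print(corr.round(2))
```

The resulting frame can be passed to a plotting library to draw the heatmap. A high coefficient shows association, not that age causes the purchase.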

1.2.4.1.2 Price and Make analysis

We can see from the histogram above and the box plot below of Price vs Make that
- The SUV make sits at the higher end of the price range and has a few low-price outliers
- Sedan is the most purchased make, with a mid price range of 17k to 55k
- Hatchback has the lowest price range, from 17k to a maximum of 35k
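
The numbers a box plot draws can also be tabulated directly. The sketch below uses synthetic prices and assumes the usual 1.5 x IQR rule for flagging low outliers, which is the common box plot convention but is not confirmed by the report.

```python
import pandas as pd

# Synthetic prices; the 20000 SUV is placed to act as a low outlier.
df = pd.DataFrame({
    "Make": ["SUV"] * 5 + ["Sedan"] * 3 + ["Hatchback"] * 2,
    "Price": [20000, 58000, 60000, 62000, 64000,
              25000, 35000, 45000,
              18000, 30000],
})

def low_outliers(prices):
    """Count values below Q1 - 1.5 * IQR (the lower box plot fence)."""
    q1, q3 = prices.quantile(0.25), prices.quantile(0.75)
    return int((prices < q1 - 1.5 * (q3 - q1)).sum())

price_by_make = df.groupby("Make")["Price"].agg(["count", "min", "median", "max"])
outliers = df.groupby("Make")["Price"].apply(low_outliers)

print(price_by_make)
print(outliers)
```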

1.2.4.1.3 Make Vs Age Analysis

We can see from the box plot below that

- Customers in the age range 35 to 54 are the ones who prefer the SUV make
- Sedan and Hatchback are preferred by customers in the age range 22 to 45
- Customers aged 22 to 30 mostly prefer a Hatchback over other makes

Further analysis along with Salary shows that

- Salary does not have a great impact on the purchase of a particular make
- Age is a major influence on the choice of make
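
One way to check age-based preferences numerically is to bin Age and cross-tabulate against Make. In this sketch the band edges are an assumption chosen to echo the ranges quoted above, and the rows are synthetic.

```python
import pandas as pd

# Synthetic customers.
df = pd.DataFrame({
    "Age": [23, 27, 29, 33, 38, 41, 47, 52],
    "Make": ["Hatchback", "Hatchback", "Sedan", "Sedan",
             "Sedan", "SUV", "SUV", "SUV"],
})

# Assumed bands: (21, 30], (30, 45], (45, 54].
bands = pd.cut(df["Age"], bins=[21, 30, 45, 54],
               labels=["22-30", "31-45", "46-54"])

make_by_band = pd.crosstab(bands, df["Make"])   # purchases per band and make
top_make = make_by_band.idxmax(axis=1)          # most purchased make per band

print(make_by_band)
print(top_make)
```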

1.2.4.1.4 Pairplot Analysis based on the Heatmap

Analysing a pair plot of the 'Age', 'Marital_status', 'No_of_Dependents', 'Salary', 'Price' and 'Make' variables

- Age, Price and Salary are positively correlated
- The rest of the variables have minimal to no correlation
1.2.5 Employees working on the existing marketing campaign have made the following remarks.
Based on the data and your analysis state whether you agree or disagree with their
observations. Justify your answer Based on the data available.
1.2.5.1 E1) Steve Roger says “Men prefer SUV by a large margin, compared to the women”
The countplot below implies that the statement made by Steve Roger is

- Quite contradictory to the available dataset
- SUVs are preferred more by female customers than by male customers, though the margin is minimal
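
Because male customers far outnumber female customers in this dataset, raw counts and within-gender shares can tell different stories. The sketch below shows both on synthetic rows; it illustrates the comparison and does not reproduce the report's figures.

```python
import pandas as pd

# Synthetic purchases: more men overall, but a higher SUV share among women.
df = pd.DataFrame({
    "Gender": ["Male", "Male", "Male", "Male", "Female", "Female"],
    "Make": ["SUV", "Sedan", "Sedan", "Hatchback", "SUV", "Sedan"],
})

counts = pd.crosstab(df["Gender"], df["Make"])                      # what a countplot shows
shares = pd.crosstab(df["Gender"], df["Make"], normalize="index")   # share within each gender

print(counts)
print(shares.round(2))
```

Stating which of the two views backs a claim such as "women prefer SUVs" avoids ambiguity when the groups differ greatly in size.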

1.2.5.2 E2) Ned Stark believes that a salaried person is more likely to buy a Sedan.
The countplot below helps to assess Ned Stark's statement

- The statement is plausible, as salaried customers have a higher rate of Sedan purchases
- But there is only a minimal margin between salaried and business customers in their preference
for a Sedan
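
The "rate of Sedan purchases" per profession can be computed as the mean of a boolean flag within each group. A sketch on synthetic rows:

```python
import pandas as pd

# Synthetic purchases.
df = pd.DataFrame({
    "Profession": ["Salaried", "Salaried", "Salaried", "Business", "Business"],
    "Make": ["Sedan", "Sedan", "SUV", "Sedan", "Hatchback"],
})

# Mean of a True/False flag = fraction of the group that bought a Sedan.
is_sedan = df["Make"] == "Sedan"
sedan_share = is_sedan.groupby(df["Profession"]).mean()

print(sedan_share.round(3))
```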
1.2.5.3 E3) Sheldon Cooper does not believe any of them; he claims that a salaried male is an easier target
for a SUV sale over a Sedan Sale.

The mapping below lets us check Sheldon Cooper's belief

- The data contradicts it; the choice appears to be a matter of personal preference
- Female salaried professionals prefer SUVs more than male salaried professionals do
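
Sheldon Cooper's claim concerns a single segment, so it can be checked by cross-tabulating Make against Gender and Profession together and reading off the salaried-male row. The rows below are synthetic.

```python
import pandas as pd

# Synthetic purchases.
df = pd.DataFrame({
    "Gender": ["Male", "Male", "Male", "Female", "Female", "Male"],
    "Profession": ["Salaried", "Salaried", "Salaried",
                   "Salaried", "Salaried", "Business"],
    "Make": ["Sedan", "Sedan", "SUV", "SUV", "Sedan", "SUV"],
})

# Rows: (Gender, Profession) pairs; columns: Make.
table = pd.crosstab([df["Gender"], df["Profession"]], df["Make"])
salaried_male = table.loc[("Male", "Salaried")]

print(table)
print("Salaried male, SUV vs Sedan:", salaried_male["SUV"], salaried_male["Sedan"])
```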
1.2.6 From the given data, comment on the amount spent on purchasing automobiles across the
following categories. Comment on how a Business can utilize the results from this exercise.
Give justification along with presenting metrics/charts used for arriving at the conclusions.
1.2.6.1 F1) Gender
Based on gender, we can conclude that

- Male customers make up the major share of vehicle purchases compared with female customers
- We can analyse the preferences of the male customer base further and provide customization that
might increase the likelihood of purchase

1.2.6.2 F2) Personal_loan


- The Personal_loan category has no impact on the purchase of a vehicle
- Customers with and without a personal loan both purchase a vehicle based
on their personal preferences
- This data cannot be used at present to analyse the market trend
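
The amount spent per category, for both Gender and Personal_loan, can be tabulated with one small helper. `spend_summary` is a hypothetical name introduced for this sketch, and the rows are synthetic.

```python
import pandas as pd

# Synthetic purchases.
df = pd.DataFrame({
    "Gender": ["Male", "Male", "Female", "Female"],
    "Personal_loan": ["Yes", "No", "Yes", "No"],
    "Price": [30000, 40000, 50000, 60000],
})

def spend_summary(frame, by):
    """Total, mean and count of Price for each level of column `by`."""
    return frame.groupby(by)["Price"].agg(total="sum", mean="mean", n="count")

by_gender = spend_summary(df, "Gender")
by_loan = spend_summary(df, "Personal_loan")

print(by_gender)
print(by_loan)
```

Reporting the mean next to the total helps separate "this group buys more cars" from "this group buys more expensive cars".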
1.2.7 From the current data set comment if having a working partner leads to the purchase of a
higher-priced car.
Analysing the boxplot of Partner_working against Price, with Make as a marker,

- There is no indication that a working partner leads to the purchase of a higher-priced car
- The purchases are spread across all price ranges
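
A quick numeric companion to the boxplot is to compare typical prices between the two Partner_working groups. The sketch below uses synthetic prices built so that the groups are close, mirroring the conclusion above rather than proving it.

```python
import pandas as pd

# Synthetic purchases with similar price spreads in both groups.
df = pd.DataFrame({
    "Partner_working": ["Yes", "Yes", "Yes", "No", "No", "No"],
    "Price": [20000, 35000, 60000, 22000, 36000, 58000],
})

price_by_partner = df.groupby("Partner_working")["Price"].agg(["median", "mean"])

# Difference in median price: working partner minus no working partner.
gap = price_by_partner.loc["Yes", "median"] - price_by_partner.loc["No", "median"]

print(price_by_partner)
print("Median gap:", gap)
```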
1.2.8 The main objective of this analysis is to devise an improved marketing strategy to send
targeted information to different groups of potential buyers present in the data. For the
current analysis use the Gender and Marital_status - fields to arrive at groups with similar
purchase history.

Analysing Gender and Marital_status together, we arrive at the market strategy below

- Married females prefer SUV over other makes
- Hatchback has the largest share among single males
- Single females prioritize Sedan over the other makes
- Married males prefer Sedan over the other makes
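
The four segments can be derived by grouping on both fields and taking the most frequent Make in each group. The rows in this sketch are synthetic and arranged to echo the findings listed above, so it shows the method and is not evidence for them.

```python
import pandas as pd

# Synthetic purchases.
df = pd.DataFrame({
    "Gender": ["Male", "Male", "Male", "Female", "Female", "Female"],
    "Marital_status": ["Married", "Married", "Single",
                       "Married", "Married", "Single"],
    "Make": ["Sedan", "Sedan", "Hatchback", "SUV", "SUV", "Sedan"],
    "Price": [30000, 34000, 20000, 60000, 62000, 33000],
})

# One row per (Gender, Marital_status) segment.
segments = df.groupby(["Gender", "Marital_status"]).agg(
    customers=("Make", "count"),
    top_make=("Make", lambda s: s.value_counts().idxmax()),
    mean_price=("Price", "mean"),
)

print(segments)
```

Each row of `segments` maps to one targeted message, for example the top make and a typical price point for that group.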
