SMDM Project Report - Set2
1.2.1.1 What is the important technical information about the dataset that a database administrator would
be interested in? (Hint: Information about the size of the dataset and the nature of the variables)
Exploring the dataset, we can infer the following about the Autso Motor customer base:
- A total of 14 variables
- A total of 1581 Purchase entries
- Overall shape: 1581 entries x 14 variables
- 12 variables with no null values and 2 variables with null values
- 8 string-type variables and 6 numeric variables
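A summary like the one above could be gathered along the following lines. This is a minimal sketch that assumes the data is loaded into a pandas DataFrame; the four-row frame below is a toy stand-in with made-up values, not the actual purchase data.

```python
import pandas as pd

# Toy stand-in for the purchase data; the values are illustrative only.
df = pd.DataFrame({
    "Age": [30, 45, 28, 52],
    "Gender": ["Male", "Female", None, "Male"],
    "Salary": [60000, 80000, 55000, 90000],
    "Make": ["Hatchback", "SUV", "Sedan", "SUV"],
})

# Number of entries (rows) and variables (columns).
n_rows, n_cols = df.shape

# Variables that contain at least one null value.
null_counts = df.isnull().sum()
cols_with_nulls = null_counts[null_counts > 0].index.tolist()

# Split the variables into numeric and non-numeric (string) types.
numeric_cols = df.select_dtypes(include="number").columns.tolist()
string_cols = df.select_dtypes(exclude="number").columns.tolist()

print(n_rows, n_cols, cols_with_nulls, numeric_cols, string_cols)
```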
Validating the data with a duplicate-row check yields the result
Number of duplicate rows = 0
Since there are no identical rows in the dataset, no action is needed to remove duplicates.
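A duplicate check of this kind might look like the sketch below, assuming pandas. The toy frame deliberately contains one repeated row so that the check has something to find; the report's dataset has none.

```python
import pandas as pd

# Toy data with one intentionally repeated row (rows 0 and 2 are identical).
df = pd.DataFrame({
    "Age": [30, 45, 30],
    "Make": ["Sedan", "SUV", "Sedan"],
    "Price": [30000, 60000, 30000],
})

# duplicated() flags every row that repeats an earlier row.
n_duplicates = int(df.duplicated().sum())
print("Number of duplicate rows =", n_duplicates)

# If duplicates were present, they could be dropped like this.
deduplicated = df.drop_duplicates()
```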
- Exploratory analysis of the object (string) variables in the dataset shows a discrepancy in the
Gender column only
- On further analysis of the Gender variable, we could see that the dataset contains
- After treating the inconsistent entries for the ‘Female’ value, we have the final updated dataset with
Gender count as
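The treatment of inconsistent Gender labels could be sketched as follows, assuming pandas. The misspellings "Femal" and "Femle" are hypothetical placeholders, since the report does not list the exact inconsistent values.

```python
import pandas as pd

# Toy Gender column; "Femal" and "Femle" are hypothetical misspellings.
gender = pd.Series(["Male", "Femal", "Female", "Femle", "Male"], name="Gender")

# Inspect the distinct labels before cleaning.
before = gender.value_counts()

# Map each inconsistent label onto the canonical "Female" value.
cleaned = gender.replace({"Femal": "Female", "Femle": "Female"})
after = cleaned.value_counts()

print(before)
print(after)
```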
From the above table, we can see that the two variables below contain null values.
- We can ignore/drop the Partner_salary column; if it is required for future reference, we can always
reconstruct this variable with the formula
Partner_salary = Total_salary - Salary
2. The Gender column contains a total of 53 null rows
- The Male category is the most frequently occurring value, with a frequency of 1199
- We can replace the NaN values with "Male" to remove the null occurrences
- Post Treatment
- The Variables and the Entries of the Purchase history of the Autso Motor Company
- Final Dataset with values of top 5 rows:
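The null-value treatment described in this section could be sketched as below, assuming pandas. The rows are toy values chosen so that Total_salary equals Salary plus Partner_salary; they are not taken from the dataset.

```python
import pandas as pd

# Toy rows for illustration only; not taken from the report's dataset.
df = pd.DataFrame({
    "Gender": ["Male", None, "Female", "Male"],
    "Salary": [60000, 80000, 55000, 70000],
    "Partner_salary": [20000, None, 0, 30000],
    "Total_salary": [80000, 110000, 55000, 100000],
})

# Drop the column that contains nulls but is derivable from other columns.
df = df.drop(columns=["Partner_salary"])

# Fill missing Gender values with the most frequent category (the mode).
most_frequent_gender = df["Gender"].mode()[0]
df["Gender"] = df["Gender"].fillna(most_frequent_gender)

# If the dropped variable is needed later, rebuild it from the stated formula.
df["Partner_salary"] = df["Total_salary"] - df["Salary"]

remaining_nulls = int(df.isnull().sum().sum())
print(df.head())
```

Filling with the mode keeps every row but slightly inflates the majority category, which is worth bearing in mind when reading the Gender counts.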
1.2.3 Explore all the features of the data separately by using appropriate visualizations and draw
insights that can be utilized by the business.
1.2.3.1 Univariate analysis of the Variables
1.2.3.1.1 Age
1.2.3.1.2 Gender
1.2.3.1.3 Profession
1.2.3.1.4 Marital_status
1.2.3.1.5 Education
1.2.3.1.6 No_of_Dependents
1.2.3.1.7 Personal_loan
1.2.3.1.8 House_loan
1.2.3.1.9 Partner_working
1.2.3.1.10 Salary
1.2.3.1.11 Total_salary
1.2.3.1.12 Price
1.2.3.1.13 Make
1. Gender – Male customers dominate the customer base with a count of 1252
2. No_of_Dependents – The number of dependents ranges from 0 to 4, with 50% of customers having 2
dependents
3. Marital_status – 1443 of the customers who purchased a vehicle were married
4. Education – Between Graduates and Post Graduates, Post Graduates top the list with 985
purchases
5. Personal_loan – 792 customers with a personal loan purchased a vehicle
6. House_loan – 1054 customers with no housing loan purchased a vehicle
7. Partner_working – 868 of the overall 1581 customers who purchased a vehicle had a working
partner
8. Make – The make of the car has 3 variations, with Sedan being the most preferred.
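Counts and ranges like those listed above could be obtained with simple frequency tables. The sketch below assumes pandas and uses a toy frame with made-up values for two of the variables.

```python
import pandas as pd

# Toy data for two of the variables; the values are illustrative only.
df = pd.DataFrame({
    "Make": ["Sedan", "SUV", "Sedan", "Hatchback", "Sedan"],
    "No_of_Dependents": [2, 3, 2, 0, 4],
})

# Frequency table for a categorical variable and its most common value.
make_counts = df["Make"].value_counts()
top_make = make_counts.idxmax()

# Range and median for a discrete numeric variable.
dependents_range = (df["No_of_Dependents"].min(), df["No_of_Dependents"].max())
dependents_median = df["No_of_Dependents"].median()

print(make_counts)
print(top_make, dependents_range, dependents_median)
```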
1.2.4 Understanding the relationships among the variables in the dataset is crucial for every
analytical project. Perform analysis on the data fields to gain deeper insights. Comment on
your understanding of the data.
1.2.4.1 Bivariate & Multivariate Analysis
We can understand from the below Heatmap that Age and Price are highly correlated and play a major
role in the purchase of a Vehicle
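The correlation matrix behind such a heatmap could be computed as in the sketch below, assuming pandas. The six rows are toy values constructed so that Age and Price rise together; they are not the report's data.

```python
import pandas as pd

# Toy numeric data in which Price rises with Age; illustrative only.
df = pd.DataFrame({
    "Age": [25, 30, 35, 40, 45, 50],
    "Salary": [52000, 75000, 61000, 58000, 80000, 67000],
    "Price": [20000, 24000, 33000, 41000, 52000, 60000],
})

# Pairwise Pearson correlations between the numeric variables.
corr = df[["Age", "Salary", "Price"]].corr()
age_price_corr = corr.loc["Age", "Price"]

print(corr.round(2))
```

Passing `corr` to a plotting function such as `seaborn.heatmap(corr, annot=True)` would render the matrix as a heatmap.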
From the above Histplot and the below box plot of Price vs Make, we can see that:
- SUV is at the higher end of the price range and has a few outliers at the low end
- Sedan is the most purchased make, with a mid price range of 17k to 55k
- Hatchback is in the lowest price range, from 17k to a maximum of 35k
- Customers in the age range 35 to 54 are the ones who prefer the SUV make
- Sedan and Hatchback are preferred by customers in the age range 22 to 45
- Customers aged 22 to 30 mostly prefer a Hatchback over the other makes
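The price and age ranges per make read off the plots could also be tabulated numerically. This is a minimal sketch assuming pandas, with toy values that are not the report's data.

```python
import pandas as pd

# Toy purchases; the values are illustrative only.
df = pd.DataFrame({
    "Make": ["SUV", "SUV", "Sedan", "Sedan", "Hatchback", "Hatchback"],
    "Age": [40, 50, 30, 42, 24, 28],
    "Price": [55000, 68000, 30000, 45000, 20000, 26000],
})

# Minimum, median and maximum of Price and Age within each Make.
summary = df.groupby("Make")[["Price", "Age"]].agg(["min", "median", "max"])

print(summary)
```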
Extending the analysis to include Salary, we can determine that:
- Salary does not have a great impact on the purchase of a particular Make.
- Age has a major influence on the choice of Make
1.2.5.2 E2) Ned Stark believes that a salaried person is more likely to buy a Sedan.
The Countplot below helps to evaluate Ned Stark's statement
- The statement is plausible, as Salaried customers have a higher rate of Sedan purchases
- However, there is only a minimal margin between Salaried and Business customers in their
preference for a Sedan
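The comparison behind the countplot could be expressed as a cross-tabulation of Profession against Make. The sketch below assumes pandas and uses toy rows; the resulting shares are illustrative and say nothing about the real data.

```python
import pandas as pd

# Toy purchases; the values are illustrative only.
df = pd.DataFrame({
    "Profession": ["Salaried", "Salaried", "Salaried", "Salaried",
                   "Business", "Business", "Business", "Business"],
    "Make": ["Sedan", "Sedan", "Sedan", "SUV",
             "Sedan", "Sedan", "SUV", "Hatchback"],
})

# Raw purchase counts per Profession and Make.
counts = pd.crosstab(df["Profession"], df["Make"])

# Share of each Make within a Profession (each row sums to 1).
shares = pd.crosstab(df["Profession"], df["Make"], normalize="index")

print(counts)
print(shares.round(2))
```

Comparing row-normalized shares rather than raw counts avoids a misleading result when one profession has many more customers than the other.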
1.2.5.3 E3) Sheldon Cooper does not believe any of them; he claims that a salaried male is an easier target
for a SUV sale over a Sedan Sale.
- Male customers make up a larger share of vehicle buyers than female customers
- We can analyse the preferences of the Male customer base in more detail and provide customization
that might increase the likelihood of purchase
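One way to test the claim about salaried male customers directly is to filter that segment and compare SUV and Sedan counts. The sketch below assumes pandas; the toy rows are invented, so the outcome shown here does not reflect the report's dataset.

```python
import pandas as pd

# Toy purchases; the values are illustrative only.
df = pd.DataFrame({
    "Gender": ["Male", "Male", "Male", "Male", "Female", "Female", "Male"],
    "Profession": ["Salaried", "Salaried", "Salaried", "Business",
                   "Salaried", "Salaried", "Salaried"],
    "Make": ["Sedan", "Sedan", "SUV", "SUV", "SUV", "Sedan", "Hatchback"],
})

# Keep only salaried male customers.
salaried_male = df[(df["Gender"] == "Male") & (df["Profession"] == "Salaried")]

# Count purchases per Make within that segment.
make_counts = salaried_male["Make"].value_counts()
suv_sales = int(make_counts.get("SUV", 0))
sedan_sales = int(make_counts.get("Sedan", 0))

# The claim holds for this segment only if SUV purchases exceed Sedan purchases.
claim_supported = suv_sales > sedan_sales
print(suv_sales, sedan_sales, claim_supported)
```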
Based on the analysis of Gender and Marital_status, we can propose the below market strategy