Data Research Using Marpho Technique

This document summarizes a data analysis of a dataset containing over 3.5 million records of individual car details from European countries between 2015 and 2017. The dataset was cleaned by removing missing or invalid data, classifying cars by age, and creating a new table with 1.25 million cleaned records. The analysis found that Audi, Mercedes-Benz, BMW and Volkswagen had the largest volumes, that most cars were manufactured in 2015, followed by 2012 and 2014, and that diesel vehicles had higher average prices than gasoline vehicles.

Uploaded by dannydongappa

Data Research Using Marpho technique-1

Summary:
This report is a data analysis of a cars dataset, a collection of personal car data gathered from several
European countries between 2015 and 2017. The dataset contains 3,552,913 records capturing individual
car details such as make, model, colour, body type, fuel, transmission, doors, seating, engine,
manufacture year and price, plus listing information such as creation date, year and last login date. The
dataset is then refined through data cleaning to improve data quality.

Data cleaning:
1 The original dataset contains 3.5 million records.

2 Analysis of the maker and model columns found that 0.5 million records are missing make and model.
The price column contained values that were too low or too high.

After filtering out records with missing make and model and records priced too low or too high
(the query below keeps prices above 10,000 and below 100,000 EUR), 1.25 million records remain.

3 A few columns (body_type, color_slug, fuel_type, transmission) have missing data,
which is replaced with the dummy value “UNKNOWN”.
4 An additional column called Car_Class is created from the manufacture year to classify cars as
Vintage, Antique, Classic, Old or Latest.
5 A new table is created from the cleanup and the additional column described in
steps 1 to 4 above:
CREATE TABLE CARS_CLEAN1 AS
SELECT maker,
       model,
       mileage,
       manufacture_year,
       stk_year,
       -- Blank text values become 'UNKNOWN' (NULLs are not matched by this test)
       CASE
           WHEN rtrim(body_type) = '' THEN 'UNKNOWN'
           ELSE body_type
       END AS body_type,
       CASE
           WHEN rtrim(color_slug) = '' THEN 'UNKNOWN'
           ELSE color_slug
       END AS color_slug,
       CASE
           WHEN rtrim(fuel_type) = '' THEN 'UNKNOWN'
           ELSE fuel_type
       END AS fuel_type,
       CASE
           WHEN rtrim(transmission) = '' THEN 'UNKNOWN'
           ELSE transmission
       END AS transmission,
       price_eur,
       -- Age class from the manufacture year; the first matching branch wins
       CASE
           WHEN manufacture_year > 1918 AND manufacture_year <= 1930 THEN 'Vintage'
           WHEN manufacture_year > 1930 AND manufacture_year <= 1975 THEN 'Antique'
           WHEN manufacture_year > 1975 AND manufacture_year <= 1990 THEN 'Classic'
           WHEN manufacture_year > 1990 AND manufacture_year <= 2010 THEN 'Old'
           WHEN manufacture_year > 2010 AND manufacture_year <= 2015 THEN 'Latest'
           -- Years outside 1919-2015 keep the year itself, cast to text so every
           -- branch returns the same type (cast syntax may vary by database)
           ELSE CAST(manufacture_year AS VARCHAR(10))
       END AS Car_Class
FROM carsfull
WHERE rtrim(maker) <> ''      -- drop records with no maker
  AND price_eur > 10000       -- drop prices that are too low
  AND price_eur < 100000;     -- drop prices that are too high
The new table has 1.25 million records.

The new table CARS_CLEAN1 is used for all further analysis of the clean data.
Additional categorical columns could be created in the same way from other data ranges.
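
The cleaning rules in steps 1 to 5 can also be written outside SQL, which makes the boundary cases easy to check. The following is a minimal Python sketch and not the author's implementation: the function names fill_unknown, car_class and keep_row are hypothetical, the thresholds are copied from the CARS_CLEAN1 query, and the 'Latest' class is assumed to cover 2011 to 2015.

```python
from typing import Optional


def fill_unknown(value: Optional[str]) -> str:
    """Replace a missing or blank text value with the dummy value 'UNKNOWN'."""
    # Assumption: missing values arrive as None or as blank strings.
    if value is None or value.rstrip() == "":
        return "UNKNOWN"
    return value


def car_class(year: Optional[int]) -> Optional[str]:
    """Bucket a manufacture year into the Car_Class labels."""
    if year is None:
        return None  # no year, so no class can be assigned
    if 1918 < year <= 1930:
        return "Vintage"
    if 1930 < year <= 1975:
        return "Antique"
    if 1975 < year <= 1990:
        return "Classic"
    if 1990 < year <= 2010:
        return "Old"
    if 2010 < year <= 2015:
        return "Latest"
    # Outside every range: fall back to the year itself as text,
    # mirroring the ELSE branch of the query.
    return str(year)


def keep_row(maker: Optional[str], price_eur: Optional[float]) -> bool:
    """Apply the WHERE filter: non-blank maker, price strictly between 10,000 and 100,000 EUR."""
    if maker is None or maker.rstrip() == "":
        return False
    if price_eur is None:
        return False
    return 10000 < price_eur < 100000


if __name__ == "__main__":
    print(car_class(2010), car_class(2011), fill_unknown(" "), keep_row("audi", 25000))
```

Writing the ranges as chained comparisons makes it visible that each year falls into at most one class and that years after 2015 are left unclassified.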
Analysis:
There are 44 makers; the largest volumes of cars come from Audi, Mercedes-Benz, BMW and Volkswagen.

The largest volume of cars is from 2015, followed by 2012 and 2014.

The mileage data is not clean enough to support a meaningful interpretation.


Analysis of the new Car_Class column shows that the majority of cars are Latest models and that volumes for
the Vintage, Antique and Classic classes are very low. A large number of records also have no classification available.
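
The volume figures in this section are simple group counts. A minimal sketch of how such counts could be produced, assuming the cleaned rows are available as dictionaries keyed by column name; the helper volume_by is hypothetical and the sample rows are invented for illustration, not taken from the dataset:

```python
from collections import Counter

# Invented sample rows, for illustration only.
rows = [
    {"maker": "audi", "manufacture_year": 2015, "car_class": "Latest"},
    {"maker": "audi", "manufacture_year": 2015, "car_class": "Latest"},
    {"maker": "audi", "manufacture_year": 2012, "car_class": "Latest"},
    {"maker": "bmw", "manufacture_year": 2015, "car_class": "Latest"},
    {"maker": "bmw", "manufacture_year": 2008, "car_class": "Old"},
    {"maker": "volkswagen", "manufacture_year": 2014, "car_class": "Latest"},
]


def volume_by(rows, column):
    """Count rows per distinct value of `column`, largest group first."""
    return Counter(row[column] for row in rows).most_common()


if __name__ == "__main__":
    for column in ("maker", "manufacture_year", "car_class"):
        print(column, volume_by(rows, column))
```

The same helper serves the maker, year and Car_Class breakdowns, since each is a count over a single column.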
Report:

The Maker vs Volume Pareto chart shows that Audi, Mercedes-Benz, BMW and Volkswagen have the largest
volumes of cars in the market.

The Maker vs Average Price Pareto chart shows that Lamborghini, Rolls-Royce, Tesla, Aston Martin,
Bentley, Porsche and Maserati have the highest average car prices in the market.

Models with a high average price have lower volumes than models with a lower average price.
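
A Pareto chart orders the categories from largest to smallest and overlays the cumulative share of the total. The table behind such a chart could be computed roughly as follows; this is a sketch under assumptions, with a hypothetical helper pareto_table and invented counts that do not come from the dataset:

```python
def pareto_table(counts):
    """Return (label, count, cumulative_percent) tuples, largest count first."""
    total = sum(counts.values())
    if total == 0:
        return []  # nothing to rank
    table = []
    running = 0
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    for label, count in ranked:
        running += count
        table.append((label, count, round(100.0 * running / total, 1)))
    return table


# Invented counts, for illustration only.
sample_counts = {"volkswagen": 20, "audi": 40, "tesla": 10, "bmw": 30}

if __name__ == "__main__":
    for label, count, cumulative in pareto_table(sample_counts):
        print(f"{label:12s} {count:5d} {cumulative:6.1f}%")
```

Passing average prices per maker instead of counts gives the ordering used for a Maker vs Average Price chart, although a cumulative share is less meaningful for averages than for volumes.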
Fuel Type vs Price:
Diesel vehicles are pricier than gasoline vehicles.
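
The fuel type comparison is an average of price_eur per fuel_type group. A minimal sketch, assuming rows are dictionaries with the CARS_CLEAN1 column names; the helper average_price_by is hypothetical and the prices are invented for illustration:

```python
from collections import defaultdict


def average_price_by(rows, column):
    """Return the mean price_eur for each distinct value of `column`."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        sums[row[column]] += row["price_eur"]
        counts[row[column]] += 1
    return {key: sums[key] / counts[key] for key in sums}


# Invented sample rows, for illustration only.
fuel_rows = [
    {"fuel_type": "diesel", "price_eur": 20000.0},
    {"fuel_type": "diesel", "price_eur": 30000.0},
    {"fuel_type": "gasoline", "price_eur": 15000.0},
    {"fuel_type": "gasoline", "price_eur": 25000.0},
]

if __name__ == "__main__":
    print(average_price_by(fuel_rows, "fuel_type"))
```

Because the cleaned table only keeps prices between 10,000 and 100,000 EUR, any average computed this way describes that price band rather than the full market.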

Conclusion:

There is a large volume of diesel cars, and their average price is greater than that of gasoline cars. Car models with
a higher average price are lower in volume than models with a lower average price.

Audi, Mercedes-Benz, BMW and Volkswagen have large volumes compared to the rest of the car makers.
