Lab1 for module3- Python code (1)
Hoora Fakhrmoosavy
The US Cars Dataset contains scraped data from the online North
American Car auction. It contains information about 28 car brands for sale
in the US. In this post, we will perform exploratory data analysis on the US
Cars Dataset.
Next, let’s remove the default display limits for Pandas data frames:
import pandas as pd
pd.set_option('display.max_columns', None)
years = df.year.unique()
We can also look at the most common brands for white cars:
from collections import Counter
print(dict(Counter(df_d1['brand']).most_common(5)))
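The step that builds df_d1 is not shown here, so the following is only a minimal sketch of the idea on toy data. It assumes the dataset has a `color` column; that column name, the variable names, and the sample rows are illustrative assumptions, not taken from the lab's data.

```python
from collections import Counter

import pandas as pd

# Toy stand-in for the lab's DataFrame; `color` is an assumed column name.
cars = pd.DataFrame({
    "brand": ["ford", "dodge", "ford", "nissan", "ford", "dodge", "ford"],
    "color": ["white", "white", "black", "white", "white", "white", "white"],
})

# Keep only the white cars.
white_cars = cars[cars["color"] == "white"]

# Count brands among the white cars and keep the two most common.
top_white = dict(Counter(white_cars["brand"]).most_common(2))
print(top_white)
```

The same pattern should work for any other color by changing the filter value.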
df.info()
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
sns.set_style('darkgrid')
matplotlib.rcParams['font.size'] = 14
matplotlib.rcParams['figure.figsize'] = (8, 6)
matplotlib.rcParams['figure.facecolor'] = '#00000000'
Let’s find popular models:
import plotly.express as px
fig.show()
A better way to study this relationship is to consider the age of the car rather than the year it was released.
Let's add another column to the dataframe for the age of the car. The age is calculated with the help of Python's
datetime library.
import datetime
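The age calculation described above can be sketched as follows. This is a minimal example on toy data, assuming age means the current calendar year minus the release year; the column name `age` and the sample rows are assumptions, and only the `year` column name comes from the lab.

```python
import datetime

import pandas as pd

# Toy stand-in for the lab's DataFrame.
cars = pd.DataFrame({"year": [2008, 2015, 2019], "price": [2500, 12000, 21000]})

# Age in years, relative to the current calendar year.
current_year = datetime.datetime.now().year
cars["age"] = current_year - cars["year"]

print(cars)
```

Because the age depends on the date the code is run, the values will shift by one each year.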
On the logarithmic scale, the visualization becomes much clearer than before, and the inverse relationship is
more obvious.
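# Count the cars per model and sort so that the most frequent models come first
models = df.groupby('model')['model'].count().sort_values(ascending=False)
models = pd.DataFrame(models)
# Keep the five most frequent models
models = models.head(5)
models.plot.bar()
plt.title('Preferred models')
plt.xlabel('Models')
plt.ylabel('No. of Cars');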
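Finding the top brands in our dataset:
# Count the cars per brand and sort so that the most frequent brands come first
topbrands = df.groupby('brand')['brand'].count().sort_values(ascending=False)
topbrands = pd.DataFrame(topbrands)
# Keep the ten most frequent brands
topbrands = topbrands.head(10)
topbrands.plot.bar()
plt.title('Famous Brands')
plt.xlabel('Brands')
plt.ylabel('No. of Cars');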
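One detail worth keeping in mind for "top N" counts: groupby(...).count() returns the groups in alphabetical order of their keys, so calling head() on it does not by itself select the largest counts. The sketch below, on toy data, contrasts that with value_counts(), which sorts by frequency. Only the `brand` column name comes from the lab; the rows and variable names are made up for illustration.

```python
import pandas as pd

# Toy data for illustration only.
cars = pd.DataFrame({"brand": ["ford", "bmw", "ford", "dodge", "ford", "dodge"]})

# groupby(...).count() keeps the brands in alphabetical order.
alphabetical = cars.groupby("brand")["brand"].count()

# value_counts() sorts by frequency, largest first.
by_frequency = cars["brand"].value_counts()

# The two most frequent brands.
top_two = by_frequency.head(2)
print(top_two)
```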
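Most expensive car brands:
# Average price per brand, sorted so that the most expensive brands come first
expensive = df.groupby('brand')['price'].mean().sort_values(ascending=False)
expensive = pd.DataFrame(expensive)
# Keep the ten brands with the highest average price
expensive = expensive.head(10)
expensive.plot.bar()
plt.title('Expensive Brands')
plt.xlabel('Car Brands')
plt.ylabel('Average Price');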
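As a compact alternative for ranking brands by average price, Series.nlargest can pick the highest means directly. This is a small sketch on invented rows; the `brand` and `price` column names match the lab, everything else is assumed for illustration.

```python
import pandas as pd

# Toy data for illustration only.
cars = pd.DataFrame({
    "brand": ["ford", "ford", "bmw", "bmw", "dodge"],
    "price": [10000, 20000, 30000, 50000, 12000],
})

# Mean price per brand, then the two highest averages.
mean_price = cars.groupby("brand")["price"].mean()
most_expensive = mean_price.nlargest(2)
print(most_expensive)
```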
Let’s look at the distribution of price:
plt.title('Distribution of Price')
plt.xlabel('Price')
plt.ylabel('No. of Samples')
plt.xlim(1000, 5000);
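# categorical_column and limit are assumed to be defined earlier, e.g. as function parameters
keys = []
# Collect the `limit` most common values of the categorical column
for i in dict(Counter(df[categorical_column].values).most_common(limit)):
    keys.append(i)
print(keys)
# Keep only the rows whose category is among the most common ones
df_new = df[df[categorical_column].isin(keys)]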
sns.set()
plt.show()
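The filtering step in the fragment above can be wrapped in a small helper. The sketch below is one possible shape, not the lab's own function: the name `keep_top_categories`, its return value, and the toy data are all assumptions made for illustration.

```python
from collections import Counter

import pandas as pd


def keep_top_categories(df, categorical_column, limit):
    """Return the rows whose category is among the `limit` most common, plus those categories."""
    counts = Counter(df[categorical_column].values)
    keys = [value for value, _ in counts.most_common(limit)]
    return df[df[categorical_column].isin(keys)], keys


# Toy data for illustration only.
cars = pd.DataFrame({
    "brand": ["ford", "ford", "ford", "dodge", "dodge", "bmw"],
    "price": [10000, 12000, 9000, 15000, 14000, 40000],
})

df_new, keys = keep_top_categories(cars, "brand", 2)
print(keys)
print(df_new)
```

Restricting the data to the most common categories first keeps a per-category plot, such as a box plot, readable when the column has many distinct values.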
sns.set(style='darkgrid')
sns.boxplot(x='brand', y='price', data=df).set_title("Price Distribution of Different Brands")
Question 3: Among release years after 2000, which years have the cheapest cars
(on average) in the dataset?
Question 4: Which brands' cars have covered the most mileage on the road?