Dsbda 10
import numpy as np
import pandas as pd
df = pd.read_csv('iris.csv')
df.head()
How many features are there and what are their types
(e.g., numeric, nominal)?
In [3]: df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 6 columns):
 #   Column         Non-Null Count  Dtype
---  ------         --------------  -----
 0   Id             150 non-null    int64
 1   SepalLengthCm  150 non-null    float64
 2   SepalWidthCm   150 non-null    float64
 3   PetalLengthCm  150 non-null    float64
 4   PetalWidthCm   150 non-null    float64
 5   Species        150 non-null    object
dtypes: float64(4), int64(1), object(1)
memory usage: 7.2+ KB
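The question above can also be answered programmatically. The following is a minimal sketch, not part of the original notebook: the helper name summarize_feature_types is hypothetical, and the small toy frame only mimics the column layout of iris.csv with made-up values so that the snippet runs on its own. It assumes that pandas numeric dtypes correspond to numeric features and that everything else is nominal.

```python
import pandas as pd


def summarize_feature_types(df: pd.DataFrame) -> dict:
    """Group column names into numeric and nominal, based on pandas dtypes."""
    numeric = df.select_dtypes(include="number").columns.tolist()
    nominal = df.select_dtypes(exclude="number").columns.tolist()
    return {"numeric": numeric, "nominal": nominal}


# Toy stand-in with the same columns as iris.csv (values are made up,
# not taken from the real dataset).
toy = pd.DataFrame({
    "Id": [1, 2, 3],
    "SepalLengthCm": [5.0, 6.0, 7.0],
    "SepalWidthCm": [3.0, 3.5, 2.5],
    "PetalLengthCm": [1.5, 4.5, 6.0],
    "PetalWidthCm": [0.2, 1.5, 2.0],
    "Species": ["a", "b", "c"],
})

types = summarize_feature_types(toy)
print(types["numeric"])
print(types["nominal"])
```

Applied to the real df, this should list Id and the four measurement columns as numeric and Species as nominal. Note that Id is a row identifier rather than a measured feature, so it is reasonable to count four numeric features and one nominal feature.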
In [4]: np.unique(df["Species"])
In [5]: df.describe()
Out[5]: Id SepalLengthCm SepalWidthCm PetalLengthCm PetalWidthCm
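import matplotlib.pyplot as plt
# 2x2 grid: one histogram per measurement column (column 0 is Id, so skip it)
fig, axes = plt.subplots(2, 2)
for i in range(4):
    x, y = i // 2, i % 2  # row and column of the subplot
    axes[x, y].hist(df[df.columns[i + 1]])
    axes[x, y].set_title(f"Distribution of {df.columns[i + 1][:-2]}")  # drop the "Cm" suffix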
Looking closely at box 2, 1.5 times the interquartile range (IQR) is roughly 0.75,
so values above the upper fence (third quartile + 1.5 × IQR), i.e. roughly 4.05,
are considered outliers. Values below the lower fence (first quartile − 1.5 × IQR)
are outliers as well. Outliers in the other boxplots can be found the same way.
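The boxplot outlier rule can also be checked numerically instead of being read off the plot. The sketch below is an illustration under stated assumptions, not the notebook's own code: the helper names iqr_fences and find_outliers are hypothetical, the fences use the conventional 1.5 × IQR multiplier (which is also matplotlib's default whisker setting), and the sample series is made up so that the snippet is self-contained.

```python
import pandas as pd


def iqr_fences(values: pd.Series, k: float = 1.5) -> tuple:
    """Return (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def find_outliers(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Return the entries that fall outside the IQR fences."""
    lower, upper = iqr_fences(values, k)
    return values[(values < lower) | (values > upper)]


# Made-up sample: nine evenly spaced values plus one obvious outlier.
sample = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 30.0])

lower, upper = iqr_fences(sample)
outliers = find_outliers(sample)
print(lower, upper)
print(outliers.tolist())
```

With the real data, calling find_outliers(df["SepalWidthCm"]) would list the points drawn beyond the whiskers of the corresponding box, assuming the boxplot was drawn with the default whisker length.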