U19ADS2035-Python For Data Science Laboratory Page No:17
Ex. No:4
PERFORM ALL BASIC DATA PREPROCESSING STEPS ON THE GIVEN DATASET
Date:
PROGRAM:
# 4. Perform all basic data preprocessing steps on the given dataset.
import pandas as pd
train = pd.read_csv("C:/Users/admin/Downloads/train.csv")
# Work on a copy so the data as loaded stays unchanged.
df = train.copy()
print("The first 5 rows:\n")
print(df.head())
print("last 5 rows:\n")
print(df.tail())
print("n_samples x n_features\n")
print(df.shape)
print("List of all the columns\n")
print(df.columns)
print("Rows index\n")
print(df.index)
print("General description of dataset.\n")
print(df.describe())
print("Counting null values in whole dataset:\n")
print(df.isnull().sum())
print("Counting null value on a particular column:\n")
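# Bare expression: the count is displayed only in a REPL or as the last line
# of a notebook cell; when run as a script it prints nothing.
df['Age'].isnull().sum()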
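# Handle missing values: replace nulls in 'Age' with the column mean.
# Assigning the result back avoids inplace=True on a column selection,
# which newer pandas versions warn about.
df['Age'] = df['Age'].fillna(df['Age'].mean())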
print("After Handling Null values:\n")
print(df['Age'].isnull().sum())
print(df.head())
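The mean-imputation step in the program above can be checked on a small in-memory frame, without needing train.csv. This is only a minimal sketch: the frame `toy` and its four values are hypothetical stand-ins for the 'Age' column, not data from the actual dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the 'Age' column of train.csv.
toy = pd.DataFrame({"Age": [20.0, np.nan, 40.0, np.nan]})

# Series.mean() skips NaN by default, so the mean is (20 + 40) / 2 = 30.
age_mean = toy["Age"].mean()

# Fill the gaps and assign the result back to the column.
toy["Age"] = toy["Age"].fillna(age_mean)

print(toy["Age"].tolist())        # [20.0, 30.0, 40.0, 30.0]
print(toy["Age"].isnull().sum())  # 0
```

The mean is computed from the non-null values only, so the existing ages are left as they were and only the gaps change. The recorded output of the full program follows under OUTPUT.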
OUTPUT:
The first 5 rows:
   PassengerId  Survived  Pclass  ...     Fare Cabin  Embarked
0            1         0       3  ...   7.2500   NaN         S
1            2         1       1  ...  71.2833   C85         C
2            3         1       3  ...   7.9250   NaN         S
3            4         1       1  ...  53.1000  C123         S
4            5         0       3  ...   8.0500   NaN         S
[5 rows x 12 columns]
last 5 rows:
     PassengerId  Survived  Pclass  ...   Fare Cabin  Embarked
886          887         0       2  ...  13.00   NaN         S
887          888         1       1  ...  30.00   B42         S
888          889         0       3  ...  23.45   NaN         S
889          890         1       1  ...  30.00  C148         C
890          891         0       3  ...   7.75   NaN         Q
[5 rows x 12 columns]
n_samples x n_features
(891, 12)
List of all the columns
Index(['PassengerId', 'Survived', 'Pclass', 'Name', 'Sex', 'Age', 'SibSp',
       'Parch', 'Ticket', 'Fare', 'Cabin', 'Embarked'],
      dtype='object')
Rows index
RangeIndex(start=0, stop=891, step=1)
General description of dataset.
[8 rows x 7 columns]
Counting null values in whole dataset:
PassengerId      0
Survived         0
Pclass           0
Name             0
Sex              0
Age            177
SibSp            0
Parch            0
Ticket           0
Fare             0
Cabin          687
Embarked         2
dtype: int64
Counting null value on a particular column:
After Handling Null values:
0
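   PassengerId  Survived  Pclass  ...     Fare Cabin  Embarked
0            1         0       3  ...   7.2500   NaN         S
1            2         1       1  ...  71.2833   C85         C
2            3         1       3  ...   7.9250   NaN         S
3            4         1       1  ...  53.1000  C123         S
4            5         0       3  ...   8.0500   NaN         S
[5 rows x 12 columns]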
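The null counts above show that 'Cabin' (687 nulls) and 'Embarked' (2 nulls) still contain missing values after 'Age' is filled. One possible way to handle them is sketched below. It is not part of the original program: the frame `toy`, the `HasCabin` column, and the choice of mode imputation for 'Embarked' and dropping 'Cabin' are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for two columns of train.csv.
toy = pd.DataFrame({
    "Cabin": [np.nan, "C85", np.nan, np.nan],
    "Embarked": ["S", "C", np.nan, "S"],
})

# Categorical column with few gaps: fill with the most frequent value.
# mode() ignores NaN by default and returns a Series, so take its first entry.
most_common = toy["Embarked"].mode()[0]
toy["Embarked"] = toy["Embarked"].fillna(most_common)

# Column that is mostly missing: keep a flag recording whether a value
# was present (HasCabin is a hypothetical name), then drop the column.
toy["HasCabin"] = toy["Cabin"].notnull()
toy = toy.drop(columns=["Cabin"])

# Every remaining column should now report zero nulls.
print(toy.isnull().sum())
```

Filling with the mode suits a categorical column with only a few gaps, while a column that is missing in most rows is usually safer to drop or reduce to a presence flag than to impute.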