U19ADS2035-Python For Data Science Laboratory Page No:17

Uploaded by

sailesh lal

Register Number:61781922110041

Ex. No:4
PERFORM ALL BASIC DATA PREPROCESSING STEPS ON THE GIVEN DATASET
Date:

PROGRAM:
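# Ex. 4: Perform all basic data preprocessing steps on the given dataset.
import pandas as pd

# Load the dataset and work on a copy so the original frame stays untouched
train = pd.read_csv("C:/Users/admin/Downloads/train.csv")
df = train.copy()

# Inspect the data: first and last rows, shape, column labels, row index, summary statistics
print("The first 5 rows:\n")
print(df.head())
print("last 5 rows:\n")
print(df.tail())
print("n_samples x n_features\n")
print(df.shape)
print("List of all the columns\n")
print(df.columns)
print("Rows index\n")
print(df.index)
print("General description of dataset.\n")
print(df.describe())

# Count missing values per column, then for the 'Age' column alone
print("Counting null values in whole dataset:\n")
print(df.isnull().sum())
print("Counting null value on a particular column:\n")
# Bare expression: evaluated, but shown only in an interactive session, not when run as a script
df['Age'].isnull().sum()

# Handling missing values: replace missing ages with the column mean.
# The result is assigned back rather than using inplace=True on a column
# selection, which recent pandas versions warn about.
df['Age'] = df['Age'].fillna(df['Age'].mean())
print("After Handling Null values:\n")
print(df['Age'].isnull().sum())
print(df.head())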
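
The mean-imputation step can be tried without train.csv. The following is a minimal sketch on a small hypothetical frame; the helper name fill_with_mean and the toy Age values are assumptions made for illustration and are not part of the lab program.

```python
import pandas as pd


def fill_with_mean(frame, column):
    """Return a copy of frame with NaN in `column` replaced by the column mean."""
    result = frame.copy()
    # Series.mean() skips NaN by default, so only the known values are averaged
    result[column] = result[column].fillna(result[column].mean())
    return result


# Hypothetical toy data standing in for the 'Age' column of the dataset
toy = pd.DataFrame({"Age": [22.0, None, 26.0, None, 36.0]})
filled = fill_with_mean(toy, "Age")

print(toy["Age"].isnull().sum())     # missing values before filling
print(filled["Age"].isnull().sum())  # missing values after filling
print(filled["Age"].tolist())
```

Because the helper works on a copy, the input frame keeps its missing values, which makes the before and after counts easy to compare.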

OUTPUT:
The first 5 rows:
PassengerId Survived Pclass ... Fare Cabin Embarked
0 1 0 3 ... 7.2500 NaN S
1 2 1 1 ... 71.2833 C85 C
2 3 1 3 ... 7.9250 NaN S
3 4 1 1 ... 53.1000 C123 S
4 5 0 3 ... 8.0500 NaN S

[5 rows x 12 columns]
last 5 rows:
PassengerId Survived Pclass ... Fare Cabin Embarked
886 887 0 2 ... 13.00 NaN S
887 888 1 1 ... 30.00 B42 S
888 889 0 3 ... 23.45 NaN S
889 890 1 1 ... 30.00 C148 C
890 891 0 3 ... 7.75 NaN Q

[5 rows x 12 columns]
n_samples x n_features
(891, 12)
List of all the columns
Index(['PassengerId', 'Survived', 'Pclass', 'Name', 'Sex', 'Age', 'SibSp',
'Parch', 'Ticket', 'Fare', 'Cabin', 'Embarked'],
dtype='object')
Rows index
RangeIndex(start=0, stop=891, step=1)
General description of dataset.
PassengerId Survived Pclass ... SibSp Parch Fare
count 891.000000 891.000000 891.000000 ... 891.000000 891.000000 891.000000
mean 446.000000 0.383838 2.308642 ... 0.523008 0.381594 32.204208
std 257.353842 0.486592 0.836071 ... 1.102743 0.806057 49.693429
min 1.000000 0.000000 1.000000 ... 0.000000 0.000000 0.000000
25% 223.500000 0.000000 2.000000 ... 0.000000 0.000000 7.910400
50% 446.000000 0.000000 3.000000 ... 0.000000 0.000000 14.454200
75% 668.500000 1.000000 3.000000 ... 1.000000 0.000000 31.000000
max 891.000000 1.000000 3.000000 ... 8.000000 6.000000 512.329200

[8 rows x 7 columns]
Counting null values in whole dataset:

PassengerId 0
Survived 0
Pclass 0
Name 0
Sex 0
Age 177
SibSp 0
Parch 0
Ticket 0
Fare 0
Cabin 687
Embarked 2
dtype: int64
Counting null value on a particular column:
After Handling Null values:
0
PassengerId Survived Pclass ... Fare Cabin Embarked
0 1 0 3 ... 7.2500 NaN S
1 2 1 1 ... 71.2833 C85 C
2 3 1 3 ... 7.9250 NaN S
3 4 1 1 ... 53.1000 C123 S
4 5 0 3 ... 8.0500 NaN S

[5 rows x 12 columns]
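
The null counts in the output show that Cabin (687 missing) and Embarked (2 missing) still contain nulls after Age is filled. One common way to handle them is sketched below on hypothetical toy rows: drop a column that is mostly empty and fill a categorical column with its most frequent value. The 0.5 threshold and the toy values are assumptions, and this step is not part of the lab program.

```python
import pandas as pd

# Hypothetical toy rows mimicking the two columns that still contain nulls
toy = pd.DataFrame({
    "Cabin": [None, "C85", None, None],
    "Embarked": ["S", "C", None, "S"],
})

# Fraction of missing values in each column
missing_ratio = toy.isnull().mean()

# Drop columns where more than half of the values are missing (assumed threshold)
mostly_empty = missing_ratio[missing_ratio > 0.5].index
cleaned = toy.drop(columns=mostly_empty)

# Fill the categorical column with its most frequent value (the mode)
most_common = cleaned["Embarked"].mode()[0]
cleaned["Embarked"] = cleaned["Embarked"].fillna(most_common)

print(cleaned)
print(cleaned.isnull().sum())
```

Whether to drop or fill a column depends on how much of it is missing and on how the column will be used later, so the threshold should be chosen per dataset.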
