ml dataset performance

Uploaded by

Rutuja Jadhav


[1]: import pandas as pd

[6]: # 1. Load the dataset

df = pd.read_csv("titanic.csv")

[8]: # 2. Inspect the data

# Print the first few rows
print(df.head())

   PassengerId  Survived  Pclass  \
0            1         0       3
1            2         1       1
2            3         1       3
3            4         1       1
4            5         0       3

                                              Name     Sex   Age  SibSp  \
0                          Braund, Mr. Owen Harris    male  22.0      1
1  Cumings, Mrs. John Bradley (Florence Briggs Th…  female  38.0      1
2                           Heikkinen, Miss. Laina  female  26.0      0
3     Futrelle, Mrs. Jacques Heath (Lily May Peel)  female  35.0      1
4                         Allen, Mr. William Henry    male  35.0      0

   Parch            Ticket     Fare Cabin Embarked
0      0         A/5 21171   7.2500   NaN        S
1      0          PC 17599  71.2833   C85        C
2      0  STON/O2. 3101282   7.9250   NaN        S
3      0            113803  53.1000  C123        S
4      0            373450   8.0500   NaN        S

[4]: # Print a summary of the dataset.
# df.info() writes its report directly and returns None, so it needs no print().

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 12 columns):
 #   Column       Non-Null Count  Dtype
---  ------       --------------  -----
 0   PassengerId  891 non-null    int64
 1   Survived     891 non-null    int64
 2   Pclass       891 non-null    int64
 3   Name         891 non-null    object
 4   Sex          891 non-null    object
 5   Age          714 non-null    float64
 6   SibSp        891 non-null    int64
 7   Parch        891 non-null    int64
 8   Ticket       891 non-null    object
 9   Fare         891 non-null    float64
 10  Cabin        204 non-null    object
 11  Embarked     889 non-null    object
dtypes: float64(2), int64(5), object(5)
memory usage: 83.7+ KB

[5]: #to print summary statistics

print(df.describe())

       PassengerId    Survived      Pclass         Age       SibSp  \
count   891.000000  891.000000  891.000000  714.000000  891.000000
mean    446.000000    0.383838    2.308642   29.699118    0.523008
std     257.353842    0.486592    0.836071   14.526497    1.102743
min       1.000000    0.000000    1.000000    0.420000    0.000000
25%     223.500000    0.000000    2.000000   20.125000    0.000000
50%     446.000000    0.000000    3.000000   28.000000    0.000000
75%     668.500000    1.000000    3.000000   38.000000    1.000000
max     891.000000    1.000000    3.000000   80.000000    8.000000

            Parch        Fare
count  891.000000  891.000000
mean     0.381594   32.204208
std      0.806057   49.693429
min      0.000000    0.000000
25%      0.000000    7.910400
50%      0.000000   14.454200
75%      0.000000   31.000000
max      6.000000  512.329200

[9]: # 3. Clean the data

# Rename columns to more descriptive names
df.rename(columns={'Pclass': 'PassengerClass', 'SibSp': 'SiblingSpouses'}, inplace=True)

[10]: # Drop the unnecessary columns

df.drop(['Cabin', 'Ticket'], axis=1, inplace=True)

[11]: #check for duplicates
print(f"Duplicates:{df.duplicated().sum()}")
df.drop_duplicates(inplace=True)

Duplicates:0

[12]: #4.handling missing values

#check for missing values
print(df.isnull().sum())

PassengerId 0
Survived 0
PassengerClass 0
Name 0
Sex 0
Age 177
SiblingSpouses 0
Parch 0
Fare 0
Embarked 2
dtype: int64
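
The raw counts above are easier to judge as a share of all rows: 177 missing Age values out of 891 is roughly 19.9%, while 2 missing Embarked values is about 0.2%. A minimal sketch of that calculation, using a small made-up frame (`toy` and its values are assumptions, not the Titanic data):

```python
import pandas as pd

# Hypothetical stand-in for the Titanic frame; the values are illustrative only.
toy = pd.DataFrame({
    "Age": [22.0, None, 26.0, None],
    "Embarked": ["S", "C", None, "S"],
    "Fare": [7.25, 71.28, 7.92, 53.10],
})

# isnull() yields booleans, so the column mean is the fraction of missing entries.
missing_pct = (toy.isnull().mean() * 100).round(1)
print(missing_pct)
```

A column with a high missing share (such as Cabin, with only 204 of 891 values present) is a reasonable candidate for dropping, whereas a low share can be filled or the few affected rows removed.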

[14]: # Fill missing 'Age' with the median.
# Assign the result back rather than calling fillna(inplace=True) on df['Age']:
# the chained inplace form triggers a pandas FutureWarning and is documented
# to stop working in pandas 3.0.

df['Age'] = df['Age'].fillna(df['Age'].median())
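
Filling a single column without chained assignment can be done in two ways: assign the filled column back, or call `fillna` on the whole frame with a per-column dict. The sketch below shows both on a small made-up frame (`toy`, `filled_a` and `filled_b` are illustrative names, not part of the notebook):

```python
import pandas as pd

# Hypothetical data; the median of the non-missing ages [22, 38, 26] is 26.
toy = pd.DataFrame({"Age": [22.0, None, 38.0, None, 26.0]})

# Option 1: assign the filled column back to the frame.
filled_a = toy.copy()
filled_a["Age"] = filled_a["Age"].fillna(filled_a["Age"].median())

# Option 2: call fillna on the frame with a {column: value} mapping.
filled_b = toy.fillna({"Age": toy["Age"].median()})

print(filled_a["Age"].tolist())
```

Both options return the same result; the first reads more naturally when only one column is involved, the second scales to several columns in one call.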

[15]: # Drop rows with missing 'Embarked'

df.dropna(subset=['Embarked'], inplace=True)

[16]: # 5. Perform basic DataFrame operations

# Selecting and filtering
# Select passengers older than 30
older_passenger = df[df['Age'] > 30]
print(older_passenger.head())

PassengerId Survived PassengerClass \
1 2 1 1
3 4 1 1
4 5 0 3
6 7 0 1
11 12 1 1

Name Sex Age \

1 Cumings, Mrs. John Bradley (Florence Briggs Th… female 38.0
3 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0
4 Allen, Mr. William Henry male 35.0
6 McCarthy, Mr. Timothy J male 54.0
11 Bonnell, Miss. Elizabeth female 58.0

SiblingSpouses Parch Fare Embarked

1 1 0 71.2833 C
3 1 0 53.1000 S
4 0 0 8.0500 S
6 0 0 51.8625 S
11 0 0 26.5500 S

[17]: #Sorting
#sort passenger by age
df.sort_values(by='Age',ascending=False,inplace=True)
print(df.head())

PassengerId Survived PassengerClass \

630 631 1 1
851 852 0 3
493 494 0 1
96 97 0 1
116 117 0 3

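Name Sex Age SiblingSpouses Parch \

630 Barkworth, Mr. Algernon Henry Wilson male 80.0 0 0
851 Svensson, Mr. Johan male 74.0 0 0
493 Artagaveytia, Mr. Ramon male 71.0 0 0
96 Goldschmidt, Mr. George B male 71.0 0 0
116 Connors, Mr. Patrick male 70.5 0 0

Fare Embarked
630 30.0000 S
851 7.7750 S
493 49.5042 C
96 34.6542 C
116 7.7500 Q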

[19]: #Aggregation
#group by passengerClass and find the average fare
avg_fare=df.groupby('PassengerClass')['Fare'].mean()
print(avg_fare)

PassengerClass
1 84.193516
2 20.662183
3 13.675550
Name: Fare, dtype: float64
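
Several per-class statistics can also be computed in one pass with named aggregation instead of one `groupby` call per statistic. This is a sketch on made-up rows (`toy` and the output column names `avg_fare`, `total_fare`, `survival_rate` are assumptions chosen for illustration):

```python
import pandas as pd

# Hypothetical rows using the notebook's renamed column PassengerClass.
toy = pd.DataFrame({
    "PassengerClass": [1, 1, 2, 3, 3],
    "Fare": [80.0, 60.0, 20.0, 8.0, 12.0],
    "Survived": [1, 0, 1, 0, 0],
})

# Named aggregation: new_column=(source_column, aggregation_function)
summary = toy.groupby("PassengerClass").agg(
    avg_fare=("Fare", "mean"),
    total_fare=("Fare", "sum"),
    survival_rate=("Survived", "mean"),
)
print(summary)
```

The result is a single DataFrame indexed by class, which is easier to compare side by side than three separate Series.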

[20]: #display 10 first rows

df.head(10)

[20]: PassengerId Survived PassengerClass \

630 631 1 1
851 852 0 3
493 494 0 1
96 97 0 1
116 117 0 3
672 673 0 2
745 746 0 1
33 34 0 2
456 457 0 1
54 55 0 1

Name Sex Age SiblingSpouses Parch \

630 Barkworth, Mr. Algernon Henry Wilson male 80.0 0 0
851 Svensson, Mr. Johan male 74.0 0 0
493 Artagaveytia, Mr. Ramon male 71.0 0 0
96 Goldschmidt, Mr. George B male 71.0 0 0
116 Connors, Mr. Patrick male 70.5 0 0
672 Mitchell, Mr. Henry Michael male 70.0 0 0
745 Crosby, Capt. Edward Gifford male 70.0 1 1
33 Wheadon, Mr. Edward H male 66.0 0 0
456 Millet, Mr. Francis Davis male 65.0 0 0
54 Ostby, Mr. Engelhart Cornelius male 65.0 0 1

Fare Embarked
630 30.0000 S
851 7.7750 S
493 49.5042 C
96 34.6542 C
116 7.7500 Q
672 10.5000 S
745 71.0000 S
33 10.5000 S

456 26.5500 S
54 61.9792 C

[21]: # Print the total number of rows and columns

df.shape

[21]: (889, 10)

[22]: #print missing values

df.isnull().sum()

[22]: PassengerId 0
Survived 0
PassengerClass 0
Name 0
Sex 0
Age 0
SiblingSpouses 0
Parch 0
Fare 0
Embarked 0
dtype: int64

[23]: # Rename (a no-op at this point: Pclass and SibSp were already renamed in step 3)
df.rename(columns={'Pclass': 'PassengerClass', 'SibSp': 'SiblingSpouses'}, inplace=True)

[24]: df

[24]: PassengerId Survived PassengerClass \

630 631 1 1
851 852 0 3
493 494 0 1
96 97 0 1
116 117 0 3
.. … … …
831 832 1 2
644 645 1 3
469 470 1 3
755 756 1 2
803 804 1 3

Name Sex Age SiblingSpouses \

630 Barkworth, Mr. Algernon Henry Wilson male 80.00 0
851 Svensson, Mr. Johan male 74.00 0
493 Artagaveytia, Mr. Ramon male 71.00 0
96 Goldschmidt, Mr. George B male 71.00 0

116 Connors, Mr. Patrick male 70.50 0
.. … … … …
831 Richards, Master. George Sibley male 0.83 1
644 Baclini, Miss. Eugenie female 0.75 2
469 Baclini, Miss. Helene Barbara female 0.75 2
755 Hamalainen, Master. Viljo male 0.67 1
803 Thomas, Master. Assad Alexander male 0.42 0

Parch Fare Embarked

630 0 30.0000 S
851 0 7.7750 S
493 0 49.5042 C
96 0 34.6542 C
116 0 7.7500 Q
.. … … …
831 1 18.7500 S
644 1 19.2583 C
469 1 19.2583 C
755 1 14.5000 S
803 1 8.5167 C

[889 rows x 10 columns]

[29]: #drop rows with missing values in the Embarked columns

df.dropna(subset=['Embarked'],inplace=True)

[30]: #select the Name,Age and Fare columns and display first 5 rows
selected_columns=df[['Name','Age','Fare']]
print(selected_columns.head())

Name Age Fare

630 Barkworth, Mr. Algernon Henry Wilson 80.0 30.0000
851 Svensson, Mr. Johan 74.0 7.7750
493 Artagaveytia, Mr. Ramon 71.0 49.5042
96 Goldschmidt, Mr. George B 71.0 34.6542
116 Connors, Mr. Patrick 70.5 7.7500

[34]: # Passengers aged above 30 who paid a fare greater than 50

passenger = df[(df['Age'] > 30) & (df['Fare'] > 50)]
print(passenger)

PassengerId Survived PassengerClass \

745 746 0 1
54 55 0 1
438 439 0 1
275 276 1 1
366 367 1 1

.. … … …
867 868 0 1
215 216 1 1
671 672 0 1
318 319 1 1
690 691 1 1

Name Sex Age \

745 Crosby, Capt. Edward Gifford male 70.0
54 Ostby, Mr. Engelhart Cornelius male 65.0
438 Fortune, Mr. Mark male 64.0
275 Andrews, Miss. Kornelia Theodosia female 63.0
366 Warren, Mrs. Frank Manley (Anna Sophia Atkinson) female 60.0
.. … … …
867 Roebling, Mr. Washington Augustus II male 31.0
215 Newell, Miss. Madeleine female 31.0
671 Davidson, Mr. Thornton male 31.0
318 Wick, Miss. Mary Natalie female 31.0
690 Dick, Mr. Albert Adrian male 31.0

SiblingSpouses Parch Fare Embarked

745 1 1 71.0000 S
54 0 1 61.9792 C
438 1 4 263.0000 S
275 1 0 77.9583 S
366 1 0 75.2500 C
.. … … … …
867 0 0 50.4958 S
215 1 0 113.2750 C
671 1 0 52.0000 S
318 0 2 164.8667 S
690 1 0 57.0000 S

[81 rows x 10 columns]
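
A combined condition like this one can also be written with `DataFrame.query`, which some readers find easier to scan than bracketed boolean masks. A small sketch on made-up values (`toy`, `mask_result` and `query_result` are illustrative names):

```python
import pandas as pd

# Hypothetical rows; only index 2 satisfies both strict comparisons.
toy = pd.DataFrame({
    "Age": [25.0, 35.0, 64.0, 31.0],
    "Fare": [100.0, 8.05, 263.0, 50.0],
})

# Boolean mask: each comparison needs its own parentheses because & binds tightly.
mask_result = toy[(toy["Age"] > 30) & (toy["Fare"] > 50)]

# Equivalent query string: column names are referenced directly.
query_result = toy.query("Age > 30 and Fare > 50")

print(query_result)
```

Both forms select the same rows; note that a fare of exactly 50 is excluded because the comparison is strict.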

[37]: #sort the dataset by fare in descending order and display the top 10 passengers
s_passenger=df.sort_values(by='Fare',ascending=False)
T_passenger=s_passenger.head(10)
print(T_passenger)

PassengerId Survived PassengerClass \

737 738 1 1
258 259 1 1
679 680 1 1
27 28 0 1
341 342 1 1
88 89 1 1
438 439 0 1

311 312 1 1
742 743 1 1
299 300 1 1

Name Sex Age \

737 Lesurer, Mr. Gustave J male 35.0
258 Ward, Miss. Anna female 35.0
679 Cardeza, Mr. Thomas Drake Martinez male 36.0
27 Fortune, Mr. Charles Alexander male 19.0
341 Fortune, Miss. Alice Elizabeth female 24.0
88 Fortune, Miss. Mabel Helen female 23.0
438 Fortune, Mr. Mark male 64.0
311 Ryerson, Miss. Emily Borie female 18.0
742 Ryerson, Miss. Susan Parker "Suzette" female 21.0
299 Baxter, Mrs. James (Helene DeLaudeniere Chaput) female 50.0

SiblingSpouses Parch Fare Embarked

737 0 0 512.3292 C
258 0 0 512.3292 C
679 0 1 512.3292 C
27 3 2 263.0000 S
341 3 2 263.0000 S
88 3 2 263.0000 S
438 1 4 263.0000 S
311 2 2 262.3750 C
742 2 2 262.3750 C
299 0 1 247.5208 C
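
Sorting and then taking `head(n)` can be shortened with `DataFrame.nlargest`, which returns the top rows for a column directly. A sketch on made-up data (`toy` and its values are assumptions):

```python
import pandas as pd

# Hypothetical fares with no ties, so both approaches return identical rows.
toy = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Fare": [7.25, 512.33, 263.0, 30.0],
})

# Approach used in the notebook: sort descending, then take the first rows.
top_sorted = toy.sort_values(by="Fare", ascending=False).head(2)

# Shorter equivalent: ask for the n largest values of one column.
top_nlargest = toy.nlargest(2, "Fare")

print(top_nlargest)
```

When fares are tied, as several are in the Titanic output above, the order among tied rows may differ between the two approaches.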

[41]: # Average age of passengers in each class

avg_age=df.groupby('PassengerClass')['Age'].mean()
print("Average Age by PassengerClass:",avg_age)

Average Age by PassengerClass: PassengerClass

1 36.688879
2 29.765380
3 25.932627
Name: Age, dtype: float64

[42]: #the total fare paid by passengers in each class

total_fare=df.groupby('PassengerClass')['Fare'].sum()
print("Total fare by Passenger",total_fare)

Total fare by Passenger PassengerClass

1 18017.4125
2 3801.8417
3 6714.6951
Name: Fare, dtype: float64

[43]: # Find the survival rate for each PassengerClass
survival_rate=df.groupby('PassengerClass')['Survived'].mean()
print("survival rate by PassengerClass:",survival_rate)

survival rate by PassengerClass: PassengerClass

1 0.626168
2 0.472826
3 0.242363
Name: Survived, dtype: float64

[44]: # Check for duplicate rows

duplicates = df.duplicated()
print("Number of duplicate rows:", duplicates.sum())

Number of duplicate rows: 0

[50]: # Remove duplicates if they exist and verify the total number of rows
df = df.drop_duplicates()
print(f"Total number of rows after removing duplicates: {len(df)}")

Total number of rows after removing duplicates: 889

[51]: # Count the rows that contain at least one missing value

missing=df.isnull().any(axis=1).sum()
print("number of rows with missing values:",missing)

number of rows with missing values: 0

[54]: # 1. Add a new column FamilySize by summing the SiblingSpouses and Parch columns
df['FamilySize'] = df['SiblingSpouses'] + df['Parch']

[56]: # Create a new column IsAlone that is True if FamilySize is 0, otherwise False
df['IsAlone'] = df['FamilySize'] == 0
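
One possible use of the derived columns is to compare survival between passengers travelling alone and those with family aboard. The sketch below repeats the two derivations on made-up rows and groups by the new flag (`toy` and its values are assumptions, so the resulting rates say nothing about the real dataset):

```python
import pandas as pd

# Hypothetical passengers; FamilySize counts relatives aboard, not the passenger.
toy = pd.DataFrame({
    "SiblingSpouses": [1, 0, 0, 3],
    "Parch": [0, 0, 0, 2],
    "Survived": [1, 0, 1, 1],
})

toy["FamilySize"] = toy["SiblingSpouses"] + toy["Parch"]
toy["IsAlone"] = toy["FamilySize"] == 0

# The mean of a 0/1 column is the share of survivors in each group.
rate = toy.groupby("IsAlone")["Survived"].mean()
print(rate)
```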

[57]: # Save the cleaned dataset to a new CSV file named cleaned_titanic.csv
df.to_csv('cleaned_titanic.csv', index=False)
print("Dataset cleaned and saved as 'cleaned_titanic.csv'.")

Dataset cleaned and saved as 'cleaned_titanic.csv'.
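
After saving, a quick reload confirms that the file was written with the expected shape and column types. A minimal sketch, assuming a temporary directory and a tiny made-up frame rather than the real cleaned dataset:

```python
import os
import tempfile

import pandas as pd

# Hypothetical miniature of the cleaned frame.
toy = pd.DataFrame({
    "PassengerId": [1, 2],
    "Fare": [7.25, 71.2833],
    "IsAlone": [False, True],
})

# Write to a throwaway directory so the check leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "cleaned_titanic.csv")
    toy.to_csv(path, index=False)  # index=False keeps the row index out of the file
    reloaded = pd.read_csv(path)

print(reloaded.shape)
print(reloaded.dtypes)
```

Because the file was written with `index=False`, the reloaded frame has the same columns as the original and no extra unnamed index column.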
