
EMERALD VALLEY PUBLIC SCHOOL

ACADEMIC YEAR : 2024 – 25

PROJECT REPORT ON

TITANIC SURVIVAL PREDICTION


USING MACHINE LEARNING

Submitted to:
Mrs. Priya P, B.Sc., M.C.A., B.Ed., M.Phil.
PGT (CS)
Emerald Valley Public School,
Salem – 636008
Tamil Nadu
EMERALD VALLEY PUBLIC SCHOOL

CERTIFICATE

This is to certify that NIVASINI M V, Roll No. :

has successfully completed the project work entitled “TITANIC
SURVIVAL PREDICTION USING MACHINE LEARNING” in
the subject Data Science (844) as laid down in the regulations of CBSE
for the purpose of the Practical Examination in Class XII, to be held at
Emerald Valley Public School, Yercaud Foothills, Salem – 636008,
during the academic year 2024 – 25.

Priya P

Name :
Signature :
Date :
ACKNOWLEDGEMENT

First and foremost, I owe my wholehearted thanks to my parents for their love,
encouragement and moral support for completing this project.

I sincerely appreciate our Principal Mr. K. Manimaran for permitting access to the well-
equipped lab and the resources required for the project.

I am deeply appreciative of my project mentor, Priya P, for offering invaluable guidance
and motivation throughout the project. She carefully monitored my progress, clarified my
uncertainties, and provided constructive feedback that improved the quality of my
project.

My special thanks to my classmates, who were incredibly helpful. They assisted me at
various stages of the project by providing useful insights, engaging in brainstorming
sessions, and offering support.

The encouragement from my teacher, principal and friends was invaluable. I will always

remain grateful for their support.


TITANIC SURVIVAL PREDICTION

USING MACHINE LEARNING

PROJECT DONE BY : NIVASINI M V


CONTENTS

SERIAL NO.   DESCRIPTION

1            PROBLEM DEFINITION
2            REQUIREMENTS
3            INTRODUCTION
4            EXPLORATORY DATA ANALYSIS
5            SOURCE CODE
6            PREDICTION
7            VISUALIZATION
8            BIBLIOGRAPHY
TITANIC SURVIVAL PREDICTION
USING MACHINE LEARNING

PROBLEM DEFINITION

The Titanic Survival Prediction problem is a classic
machine learning challenge that involves predicting whether a
passenger survived or perished in the tragic sinking of the
RMS Titanic. The dataset typically consists of historical
passenger information, such as age, gender, class, ticket fare,
and other attributes, alongside the target variable indicating
whether the passenger survived the disaster (binary outcome:
1 = survived, 0 = did not survive). This problem is widely
used in data science education and competitions (such as on
Kaggle) to help beginners practice machine learning concepts,
data preprocessing, and model evaluation.
The goal of this problem is to develop a machine learning
model that can predict whether a passenger survived or not
based on a set of features describing the passenger’s attributes.
HARDWARE AND SOFTWARE
REQUIREMENTS

HARDWARE REQUIRED
• Printer, to print the required documents of the project
• Drive
• Processor: Intel i5
• RAM: 4 GB and above
• Hard Disk: 1 TB

SOFTWARE REQUIRED
• Operating System: Windows 11
• Jupyter Notebook
• Python
• Visual Studio Code
• MS Word (for preparing and presenting the project)
INTRODUCTION
Introduction to Machine Learning

Machine Learning (ML) is a branch of artificial intelligence (AI) that enables systems to
automatically learn and improve from experience without being explicitly programmed.
In other words, machine learning allows computers to identify patterns, make decisions,
and improve performance through exposure to data, rather than following strict, pre-
defined rules.

At its core, machine learning involves the development of algorithms that can analyze
data, learn from it, and then make predictions or decisions based on that learning.
Machine learning has become a cornerstone of AI, and it is used in a variety of fields
such as healthcare, finance, e-commerce, entertainment, and more.

Key Concepts in Machine Learning

1. Data: The foundation of machine learning. It consists of input features (also called
variables or attributes) and labels (the outcome you want to predict). The quality
and quantity of data are critical for the success of machine learning models.

2. Algorithms: Machine learning algorithms are the methods used to find patterns in
the data. There are different types of algorithms based on the problem you're trying
to solve.

3. Model: A model is the output of a machine learning algorithm after it has been
trained on data. It represents the patterns or relationships the algorithm has learned
and is used to make predictions or decisions on new data.
4. Training: The process of feeding data into a machine learning algorithm to help it
learn the underlying patterns.

5. Testing: After a model has been trained, it is evaluated on new, unseen data (test
data) to assess its performance and ability to generalize to real-world situations.
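
As a minimal sketch of the training and testing steps (using scikit-learn, which the project itself uses later), the example below trains a model on one part of a small built-in toy dataset and tests it on the held-out part. It is illustrative only and is not part of the Titanic code presented later in this report.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Data: input features X and labels y from a small built-in dataset.
X, y = load_iris(return_X_y=True)

# Hold out 20% of the rows as unseen test data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Training: the algorithm learns patterns from the training split only.
model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Testing: performance is measured on data the model has never seen.
print("Test accuracy:", model.score(X_test, y_test))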

Types of Machine Learning

Machine learning is typically classified into three major categories:

1. Supervised Learning:
o In supervised learning, the algorithm is trained on a labeled dataset, meaning
the input data is paired with the correct output (or label).
o The goal is for the model to learn a mapping from inputs to outputs, so it can
predict the output for new, unseen inputs.
o Examples: Classification (e.g., spam email detection) and Regression (e.g.,
predicting house prices based on features like size, location, etc.).

2. Unsupervised Learning:
o In unsupervised learning, the algorithm is given data without labels and
must find the underlying structure or patterns in the data on its own.
o The goal is often to discover hidden structures like clusters or associations in
the data.
o Examples: Clustering (e.g., grouping similar customers based on purchasing
behavior) and Dimensionality Reduction (e.g., reducing the number of
variables in a dataset).

3. Reinforcement Learning:
o In reinforcement learning, an agent learns by interacting with an environment,
receiving rewards or penalties for its actions and adjusting its behaviour to
maximize the cumulative reward over time.
o Examples: game playing and robotics control.
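
A minimal sketch of the first two styles on toy data (illustrative only, using scikit-learn): the supervised model is given labels, while the clustering model must find the two groups by itself.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

X = np.array([[1.0], [2.0], [3.0], [8.0], [9.0], [10.0]])

# Supervised: labels y are provided and the model learns the mapping X -> y.
y = np.array([0, 0, 0, 1, 1, 1])
clf = LogisticRegression().fit(X, y)
print(clf.predict([[2.5], [9.5]]))   # expected: [0 1]

# Unsupervised: no labels; KMeans groups the points into two clusters on its own.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)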
Data Requirements

The Titanic Survival Prediction task is a binary classification problem: given a set of
features (e.g., gender, age, class), the model must classify whether a passenger
survived (1) or did not survive (0).

The Titanic dataset typically includes the following features (columns):

PassengerId: Unique identifier for each passenger (typically not used for prediction).
Pclass: Passenger class (1, 2, or 3), representing the socio-economic status of the
passenger, from 1st class (highest) to 3rd class (lowest).
Name: Name of the passenger (can be useful for extracting titles such as Mr., Mrs.,
etc., but typically not used directly).
Sex: Gender of the passenger (categorical: male or female).
Age: Age of the passenger (continuous variable, often with missing values).
SibSp: Number of siblings or spouses aboard the Titanic.
Parch: Number of parents or children aboard the Titanic.
Ticket: Ticket number (can be mined for patterns, but often not used directly).
Fare: The fare the passenger paid for the ticket (continuous variable).
Cabin: Cabin number (with many missing values; can be partially used or omitted).
Embarked: Port of embarkation (categorical: C = Cherbourg, Q = Queenstown, S =
Southampton).
Survived (target): Whether the passenger survived (1) or did not survive (0).

Additional features, such as Title (extracted from the Name column) or family size
(SibSp + Parch), may also be engineered during preprocessing, as shown in the sketch below.
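
As an illustration of this kind of feature engineering, the short sketch below derives a Title and a FamilySize column; it assumes the Kaggle train.csv file has been loaded into a DataFrame named df (a name chosen here only for illustration).

import pandas as pd

df = pd.read_csv("train.csv")   # assumes the Kaggle training file is available

# FamilySize: siblings/spouses plus parents/children aboard
# (some versions also add 1 for the passenger themselves).
df["FamilySize"] = df["SibSp"] + df["Parch"]

# Title: the honorific between the surname and the dot in the Name column,
# e.g. "Braund, Mr. Owen Harris" -> "Mr".
df["Title"] = df["Name"].str.extract(r" ([A-Za-z]+)\.", expand=False)

print(df[["Name", "Title", "FamilySize"]].head())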
Types of Models Used:

The Titanic survival prediction is a binary classification problem, where several machine
learning algorithms can be employed:
Logistic Regression: A simple and interpretable model for binary classification.
Decision Trees: A non-linear model that splits the data based on feature values. Often
prone to overfitting.
Random Forest: An ensemble of decision trees that reduces overfitting by averaging
multiple trees.
Support Vector Machines (SVMs): A classifier that finds the optimal hyperplane to
separate classes.
K-Nearest Neighbors (KNN): A simple algorithm that classifies based on the majority
class of the nearest neighbors.
Gradient Boosting (XGBoost, LightGBM, CatBoost): Ensemble methods that combine
multiple weak learners (decision trees) to improve accuracy.
Neural Networks: Deep learning models, though generally not necessary for this dataset’s
complexity, can still be used for more advanced approaches.
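
Before committing to one of these models, a quick cross-validated comparison is often useful. The sketch below is illustrative: it assumes the Kaggle train.csv is available and uses a deliberately minimal preprocessing, just enough to compare a few of the classifiers listed above.

import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

# Minimal preprocessing just for this comparison (assumes train.csv is present):
# keep a few numeric-friendly columns, encode Sex, and fill missing ages.
data = pd.read_csv("train.csv")
data["Sex"] = data["Sex"].map({"male": 0, "female": 1})
data["Age"] = data["Age"].fillna(data["Age"].median())
X = data[["Pclass", "Sex", "Age", "SibSp", "Parch", "Fare"]]
y = data["Survived"]

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
    "KNN": KNeighborsClassifier(),
}

# 5-fold cross-validation gives a more stable estimate than a single split.
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    print(f"{name}: mean accuracy = {scores.mean():.3f}")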
DATA SET
1. gender_submission

2. train

Exploratory Data Analysis (EDA)


1. Data Collection: Obtain the Titanic dataset, which is typically available as CSV files,
from sources like Kaggle or other open data repositories.

2. Data Preprocessing:

Handling Missing Data: Many columns, like Age and Cabin, have missing values that
need to be imputed or removed.

Categorical Encoding: Convert categorical features (like Sex and Embarked) into
numerical form using one-hot encoding or label encoding.

Feature Engineering: Create new features such as family size (SibSp + Parch) or extract
titles from the Name column (e.g., Mr, Mrs, Miss).

Scaling/Normalization: Some algorithms, like Logistic Regression or SVM, may benefit
from feature scaling, especially for continuous features (e.g., Age, Fare).

3. Model Selection: Choose machine learning models suitable for classification tasks
(e.g., Random Forest, Logistic Regression, Gradient Boosting).

4. Model Training: Train the model on the preprocessed training dataset using the
Survived column as the target.
5. Model Evaluation:

Accuracy: The proportion of correctly predicted survival statuses (survived or not).

Confusion Matrix: To better understand the classification performance, including false
positives and false negatives.

Precision, Recall, F1-Score: Especially important if the dataset is imbalanced (i.e., more
passengers did not survive).

ROC-AUC: The area under the Receiver Operating Characteristic curve, which is useful
for evaluating classification models, especially when dealing with class imbalance.
(A short sketch of these metrics, together with grid-search tuning, follows step 7 below.)

6. Model Tuning: Hyperparameter tuning using techniques such as grid search or random
search to improve model performance.

7. Prediction and Submission: After training and evaluating the model, predict the
survival status for the test dataset and prepare the output in the required format (usually a
CSV file with PassengerId and Survived predictions).
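
The sketch below illustrates steps 5 and 6 together: it evaluates a classifier with the metrics listed above and then runs a small grid search. It is illustrative only; it assumes the Kaggle train.csv is available and uses a minimal preprocessing chosen here for brevity, not the full pipeline from the source code later in this report.

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             classification_report, roc_auc_score)
from sklearn.model_selection import GridSearchCV, train_test_split

# Minimal preprocessing for illustration (assumes train.csv is present).
data = pd.read_csv("train.csv")
data["Sex"] = data["Sex"].map({"male": 0, "female": 1})
data["Age"] = data["Age"].fillna(data["Age"].median())
X = data[["Pclass", "Sex", "Age", "SibSp", "Parch", "Fare"]]
y = data["Survived"]
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Step 5: train one model and evaluate it with several metrics.
model = RandomForestClassifier(random_state=0).fit(X_train, y_train)
pred = model.predict(X_val)
print("Accuracy:", accuracy_score(y_val, pred))
print("Confusion matrix:\n", confusion_matrix(y_val, pred))
print(classification_report(y_val, pred))          # precision, recall, F1 per class
print("ROC-AUC:", roc_auc_score(y_val, model.predict_proba(X_val)[:, 1]))

# Step 6: tune a few hyperparameters with grid search.
param_grid = {"n_estimators": [100, 300], "max_depth": [4, 8, None]}
search = GridSearchCV(RandomForestClassifier(random_state=0),
                      param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)
print("Best parameters:", search.best_params_)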

Challenges and Considerations:

Handling Missing Data: Many features (like Age and Cabin) have missing values. How
missing values are imputed (mean, median, mode, or other methods) can significantly
affect model performance.

Class Imbalance: If the dataset is imbalanced (e.g., many more passengers did not
survive), special care must be taken to avoid biased models that predict one class more
often than the other.

Feature Engineering: Some features (like Name) need to be parsed into useful
information (e.g., extracting titles like Mr, Mrs) to improve model performance.
Model Interpretability: While models like decision trees provide interpretability, more
complex models (e.g., neural networks or gradient boosting) may be harder to explain but
often offer higher accuracy.

Overfitting: Overfitting to the training data can happen, especially when the model is too
complex. Regularization methods (e.g., pruning decision trees, using ensemble methods)
can help mitigate overfitting.
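
One practical way to see overfitting is to compare training and validation accuracy for an unconstrained decision tree against a depth-limited one. The sketch below is illustrative and reuses the same minimal preprocessing as the earlier sketches; a large gap between training and validation accuracy for the deep tree signals overfitting.

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Minimal preprocessing for illustration (assumes train.csv is present).
data = pd.read_csv("train.csv")
data["Sex"] = data["Sex"].map({"male": 0, "female": 1})
data["Age"] = data["Age"].fillna(data["Age"].median())
X = data[["Pclass", "Sex", "Age", "SibSp", "Parch", "Fare"]]
y = data["Survived"]
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=0)

# An unconstrained tree can memorize the training data; limiting depth
# (a simple form of pruning/regularization) usually generalizes better.
for name, tree in [("deep tree", DecisionTreeClassifier(random_state=0)),
                   ("pruned tree", DecisionTreeClassifier(max_depth=4, random_state=0))]:
    tree.fit(X_train, y_train)
    print(name,
          "| train acc:", round(tree.score(X_train, y_train), 3),
          "| val acc:", round(tree.score(X_val, y_val), 3))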

SOURCE CODE
import warnings
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
plt.style.use('fivethirtyeight')
%matplotlib inline
warnings.filterwarnings('ignore')

train = pd.read_csv('train.csv')
test = pd.read_csv('test.csv')

# To know number of columns and rows


train.shape
# (891, 12)

train.info()

train.isnull().sum()

f, ax = plt.subplots(1, 2, figsize=(12, 4))


train['Survived'].value_counts().plot.pie(
explode=[0, 0.1], autopct='%1.1f%%', ax=ax[0], shadow=False)
ax[0].set_title('Survivors (1) and the dead (0)')
ax[0].set_ylabel('')
sns.countplot(x='Survived', data=train, ax=ax[1])
ax[1].set_ylabel('Quantity')
ax[1].set_title('Survivors (1) and the dead (0)')
plt.show()

f, ax = plt.subplots(1, 2, figsize=(12, 4))


train[['Sex', 'Survived']].groupby(['Sex']).mean().plot.bar(ax=ax[0])
ax[0].set_title('Survivors by sex')
sns.countplot(x='Sex', hue='Survived', data=train, ax=ax[1])
ax[1].set_ylabel('Quantity')
ax[1].set_title('Survived (1) and deceased (0): men and women')
plt.show()
# Create a new column cabinbool indicating
# if the cabin value was given or was NaN
train["CabinBool"] = (train["Cabin"].notnull().astype('int'))
test["CabinBool"] = (test["Cabin"].notnull().astype('int'))

# Delete the column 'Cabin' from test


# and train dataset
train = train.drop(['Cabin'], axis=1)
test = test.drop(['Cabin'], axis=1)

train = train.drop(['Ticket'], axis=1)


test = test.drop(['Ticket'], axis=1)

# replacing the missing values in


# the Embarked feature with S
train = train.fillna({"Embarked": "S"})

# sort the ages into logical categories


train["Age"] = train["Age"].fillna(-0.5)
test["Age"] = test["Age"].fillna(-0.5)
bins = [-1, 0, 5, 12, 18, 24, 35, 60, np.inf]
labels = ['Unknown', 'Baby', 'Child', 'Teenager',
'Student', 'Young Adult', 'Adult', 'Senior']
train['AgeGroup'] = pd.cut(train["Age"], bins, labels=labels)
test['AgeGroup'] = pd.cut(test["Age"], bins, labels=labels)

# create a combined group of both datasets


combine = [train, test]

# extract a title for each Name in the
# train and test datasets
for dataset in combine:
    dataset['Title'] = dataset.Name.str.extract(r' ([A-Za-z]+)\.', expand=False)

pd.crosstab(train['Title'], train['Sex'])

# replace various titles with more common names
for dataset in combine:
    dataset['Title'] = dataset['Title'].replace(['Lady', 'Capt', 'Col',
                                                 'Don', 'Dr', 'Major',
                                                 'Rev', 'Jonkheer', 'Dona'],
                                                'Rare')
    dataset['Title'] = dataset['Title'].replace(
        ['Countess', 'Lady', 'Sir'], 'Royal')
    dataset['Title'] = dataset['Title'].replace('Mlle', 'Miss')
    dataset['Title'] = dataset['Title'].replace('Ms', 'Miss')
    dataset['Title'] = dataset['Title'].replace('Mme', 'Mrs')

train[['Title', 'Survived']].groupby(['Title'], as_index=False).mean()

# map each of the title groups to a numerical value
title_mapping = {"Mr": 1, "Miss": 2, "Mrs": 3,
                 "Master": 4, "Royal": 5, "Rare": 6}
for dataset in combine:
    dataset['Title'] = dataset['Title'].map(title_mapping)
    dataset['Title'] = dataset['Title'].fillna(0)
mr_age = train[train["Title"] == 1]["AgeGroup"].mode() # Young Adult
miss_age = train[train["Title"] == 2]["AgeGroup"].mode() # Student
mrs_age = train[train["Title"] == 3]["AgeGroup"].mode() # Adult
master_age = train[train["Title"] == 4]["AgeGroup"].mode() # Baby
royal_age = train[train["Title"] == 5]["AgeGroup"].mode() # Adult
rare_age = train[train["Title"] == 6]["AgeGroup"].mode() # Adult

age_title_mapping = {1: "Young Adult", 2: "Student",
                     3: "Adult", 4: "Baby", 5: "Adult", 6: "Adult"}

# fill unknown age groups using the typical age group for each title
for x in range(len(train["AgeGroup"])):
    if train["AgeGroup"][x] == "Unknown":
        train.loc[x, "AgeGroup"] = age_title_mapping[train["Title"][x]]
for x in range(len(test["AgeGroup"])):
    if test["AgeGroup"][x] == "Unknown":
        test.loc[x, "AgeGroup"] = age_title_mapping[test["Title"][x]]

# map each Age value to a numerical value


age_mapping = {'Baby': 1, 'Child': 2, 'Teenager': 3,
'Student': 4, 'Young Adult': 5, 'Adult': 6,
'Senior': 7}
train['AgeGroup'] = train['AgeGroup'].map(age_mapping)
test['AgeGroup'] = test['AgeGroup'].map(age_mapping)

train.head()

# dropping the Age feature for now, might change


train = train.drop(['Age'], axis=1)
test = test.drop(['Age'], axis=1)

train = train.drop(['Name'], axis=1)


test = test.drop(['Name'], axis=1)

sex_mapping = {"male": 0, "female": 1}


train['Sex'] = train['Sex'].map(sex_mapping)
test['Sex'] = test['Sex'].map(sex_mapping)

embarked_mapping = {"S": 1, "C": 2, "Q": 3}


train['Embarked'] = train['Embarked'].map(embarked_mapping)
test['Embarked'] = test['Embarked'].map(embarked_mapping)

# fill the missing Fare value in the test set with the mean fare of that Pclass
for x in range(len(test["Fare"])):
    if pd.isnull(test["Fare"][x]):
        pclass = test["Pclass"][x]  # Pclass = 3
        test.loc[x, "Fare"] = round(
            train[train["Pclass"] == pclass]["Fare"].mean(), 4)

# map Fare values into groups of


# numerical values
train['FareBand'] = pd.qcut(train['Fare'], 4,
labels=[1, 2, 3, 4])
test['FareBand'] = pd.qcut(test['Fare'], 4,
labels=[1, 2, 3, 4])

# drop Fare values


train = train.drop(['Fare'], axis=1)
test = test.drop(['Fare'], axis=1)

from sklearn.model_selection import train_test_split

# Drop the Survived and PassengerId


# column from the trainset
predictors = train.drop(['Survived', 'PassengerId'], axis=1)
target = train["Survived"]
x_train, x_val, y_train, y_val = train_test_split(
predictors, target, test_size=0.2, random_state=0)

from sklearn.ensemble import RandomForestClassifier


from sklearn.metrics import accuracy_score

randomforest = RandomForestClassifier()

# Fit the training data along with its output


randomforest.fit(x_train, y_train)
y_pred = randomforest.predict(x_val)

# Find the accuracy score of the model


acc_randomforest = round(accuracy_score(y_pred, y_val) * 100, 2)
print(acc_randomforest)

ids = test['PassengerId']
predictions = randomforest.predict(test.drop('PassengerId', axis=1))

# set the output as a dataframe and convert


# to csv file named resultfile.csv
output = pd.DataFrame({'PassengerId': ids, 'Survived': predictions})
output.to_csv('resultfile.csv', index=False)
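
Beyond the single accuracy figure above, the validation split can also be inspected with a confusion matrix and a per-class report. The cell below is a small optional addition, not part of the original code: it assumes it runs in the same notebook, after the cells above, so that randomforest, x_val and y_val are already defined.

# Optional follow-up cell: inspect the validation predictions in more detail.
from sklearn.metrics import confusion_matrix, classification_report

val_pred = randomforest.predict(x_val)

# Rows = actual class (0 = did not survive, 1 = survived), columns = predicted class.
print(confusion_matrix(y_val, val_pred))

# Precision, recall and F1-score for each class.
print(classification_report(y_val, val_pred))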
Prediction

We are provided with the testing dataset on which we have to perform the
prediction. To predict, we pass the test dataset into our trained model and
save the result into a CSV file with two columns: PassengerId and Survived.
PassengerId is the identifier of each passenger in the test data, and the
Survived column will be either 0 or 1.

PassengerId  Survived
892 0
893 1
894 0
895 0
896 1
897 0
898 1
899 0
900 1
901 0
902 0
903 0
904 1
905 0
906 1
907 1
908 0
909 0
910 1
911 1
912 0
913 0
914 1
915 0
916 1
917 0
918 1
919 0
920 0
921 0
922 0
923 0
924 1
925 1
926 0
927 0
928 1
929 1
930 0
931 0
932 0
933 0
934 0
935 1
936 1
937 0
938 0
939 0
940 1
941 1
942 0
943 0
944 1
945 1
946 0
947 0
948 0
949 0
950 0
951 1
952 0
953 0
954 0
955 1
956 0
957 1
958 1
959 0
960 0
961 1
962 1
963 0
964 1
965 0
966 1
967 0
968 0
969 1
970 0
971 1
972 0
973 0
974 0
975 0
976 0
977 0
978 1
979 1
980 1
981 0
982 1
983 0
984 1
985 0
986 0
987 0
988 1
989 0
990 1
991 0
992 1
993 0
994 0
995 0
996 1
997 0
998 0
999 0
1000 0
1001 0
1002 0
1003 1
1004 1
1005 1
1006 1
1007 0
1008 0
1009 1
1010 0
1011 1
1012 1
1013 0
1014 1
1015 0
1016 0
1017 1
1018 0
1019 1
1020 0
1021 0
1022 0
1023 0
1024 1
1025 0
1026 0
1027 0
1028 0
1029 0
1030 1
1031 0
1032 1
1033 1
1034 0
1035 0
1036 0
1037 0
1038 0
1039 0
1040 0
1041 0
1042 1
1043 0
1044 0
1045 1
1046 0
1047 0
1048 1
1049 1
1050 0
1051 1
1052 1
1053 0
1054 1
1055 0
1056 0
1057 1
1058 0
1059 0
1060 1
1061 1
1062 0
1063 0
1064 0
1065 0
1066 0
1067 1
1068 1
1069 0
1070 1
1071 1
1072 0
1073 0
1074 1
1075 0
1076 1
1077 0
1078 1
1079 0
1080 1
1081 0
1082 0
1083 0
1084 0
1085 0
1086 0
1087 0
1088 0
1089 1
1090 0
1091 1
1092 1
1093 0
1094 0
1095 1
1096 0
1097 0
1098 1
1099 0
1100 1
1101 0
1102 0
1103 0
1104 0
1105 1
1106 1
1107 0
1108 1
1109 0
1110 1
1111 0
1112 1
1113 0
1114 1
1115 0
1116 1
1117 1
1118 0
1119 1
1120 0
1121 0
1122 0
1123 1
1124 0
1125 0
1126 0
1127 0
1128 0
1129 0
1130 1
1131 1
1132 1
1133 1
1134 0
1135 0
1136 0
1137 0
1138 1
1139 0
1140 1
1141 1
1142 1
1143 0
1144 0
1145 0
1146 0
1147 0
1148 0
1149 0
1150 1
1151 0
1152 0
1153 0
1154 1
1155 1
1156 0
1157 0
1158 0
1159 0
1160 1
1161 0
1162 0
1163 0
1164 1
1165 1
1166 0
1167 1
1168 0
1169 0
1170 0
1171 0
1172 1
1173 0
1174 1
1175 1
1176 1
1177 0
1178 0
1179 0
1180 0
1181 0
1182 0
1183 1
1184 0
1185 0
1186 0
1187 0
1188 1
1189 0
1190 0
1191 0
1192 0
1193 0
1194 0
1195 0
1196 1
1197 1
1198 0
1199 0
1200 0
1201 1
1202 0
1203 0
1204 0
1205 1
1206 1
1207 1
1208 0
1209 0
1210 0
1211 0
1212 0
1213 0
1214 0
1215 0
1216 1
1217 0
1218 1
1219 0
1220 0
1221 0
1222 1
1223 0
1224 0
1225 1
1226 0
1227 0
1228 0
1229 0
1230 0
1231 0
1232 0
1233 0
1234 0
1235 1
1236 0
1237 1
1238 0
1239 1
1240 0
1241 1
1242 1
1243 0
1244 0
1245 0
1246 1
1247 0
1248 1
1249 0
1250 0
1251 1
1252 0
1253 1
1254 1
1255 0
1256 1
1257 1
1258 0
1259 1
1260 1
1261 0
1262 0
1263 1
1264 0
1265 0
1266 1
1267 1
1268 1
1269 0
1270 0
1271 0
1272 0
1273 0
1274 1
1275 1
1276 0
1277 1
1278 0
1279 0
1280 0
1281 0
1282 0
1283 1
1284 0
1285 0
1286 0
1287 1
1288 0
1289 1
1290 0
1291 0
1292 1
1293 0
1294 1
1295 0
1296 0
1297 0
1298 0
1299 0
1300 1
1301 1
1302 1
1303 1
1304 1
1305 0
1306 1
1307 0
1308 0
1309 0
