5. DATA CLEANING
Before applying any type of data analytics to the dataset, the data is first cleaned. There are some missing values in the dataset which need to be handled. In attributes like Age, Cabin and Embarked, missing values are replaced with random samples drawn from the existing values of the same attribute. [15]
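As an illustrative sketch (not code from the paper), this imputation can be done in R as follows; the data frame name titanic and the standard Kaggle column names are assumptions:

    # Assumes the Kaggle Titanic data is loaded into a data frame named titanic
    set.seed(42)  # fix the random draws so the cleaning step is reproducible

    # Replace missing Age values with random samples from the observed ages
    missing_age <- is.na(titanic$Age)
    titanic$Age[missing_age] <- sample(titanic$Age[!missing_age],
                                       sum(missing_age), replace = TRUE)

    # Embarked is blank (empty string) where missing in this dataset
    missing_emb <- is.na(titanic$Embarked) | titanic$Embarked == ""
    titanic$Embarked[missing_emb] <- sample(titanic$Embarked[!missing_emb],
                                            sum(missing_emb), replace = TRUE)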
In the case of the Fare column, we found that there is one passenger with a missing fare, having passenger id 1044. To put a meaningful value into the Fare column, we first found the Embarked and Pclass values of this passenger. The median was then calculated over the fares of all passengers whose port of embarkation and Pclass were the same as those of passenger 1044.
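A minimal R sketch of this median imputation, under the same assumptions as above (passenger id 1044 belongs to the test file, so the combined train and test data is assumed):

    # Locate the single passenger with a missing Fare
    p <- titanic[titanic$PassengerId == 1044, ]

    # Median fare over passengers with the same Embarked port and Pclass
    same_group <- titanic$Embarked == p$Embarked & titanic$Pclass == p$Pclass
    titanic$Fare[titanic$PassengerId == 1044] <-
      median(titanic$Fare[same_group], na.rm = TRUE)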
Sex versus Survival
From Fig. 6 it is clear that females are more likely to survive than males. We calculated that the survival rates of females and males are 74.20382% and 18.89081% respectively.
In a similar way, the relationship between survival and the other attributes, such as fare, cabin, title, family, Pclass and Embarked, is found. We extracted the title from the attribute 'name' and combined parch and sibsp. In this way we are able to decide the emphasis of each attribute on the survival of a passenger.
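The survival rates quoted above can be reproduced with a one-line aggregation; a sketch, again assuming the titanic data frame:

    # Mean of the 0/1 Survived column per sex, expressed as a percentage
    round(100 * tapply(titanic$Survived, titanic$Sex, mean, na.rm = TRUE), 5)
    #   female     male
    # 74.20382 18.89081   (values reported in the paper)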
7. METHODOLOGY
7.1 Feature Engineering
Feature engineering is the most important part of the data analytics process. It deals with selecting the features that are used in training and making predictions. In feature engineering, domain knowledge is used to find features in the dataset which are helpful in building a machine learning model. It helps in understanding the dataset in terms of modeling. A bad feature selection may lead to a less accurate or poor predictive model. The accuracy and the predictive power depend on the choice of correct features. It also filters out all the unused or redundant features.

Based on the exploratory analysis above, the following features are used: age, sex, cabin, title, Pclass, family size (parch plus sibsp columns), fare and embarked. The Survival column is chosen as the response column. These features are selected because their values have an impact on the rate of survival. These features will be the value of "x" in the bar-plots. If the wrong features were selected, then even a good algorithm may produce bad predictions. Therefore, feature engineering acts like a backbone in building an accurate predictive model.
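For instance, the family-size and title features described above could be derived as follows (an illustrative sketch, not code from the paper):

    # Family size: siblings/spouses plus parents/children
    titanic$FamilySize <- titanic$SibSp + titanic$Parch

    # Title: the token between the comma and the period in the Name column,
    # e.g. "Braund, Mr. Owen Harris" -> "Mr"
    titanic$Title <- sub(".*, ([A-Za-z ]+)\\..*", "\\1", titanic$Name)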
7.2 Machine Learning Models
Various machine learning models are implemented to validate and predict survival.
7.2.1 Logistic Regression
Logistic regression is a technique which works best when the dependent variable is dichotomous (binary or categorical). [23] Describing the data and explaining the relationship between one dependent binary variable and one or more nominal, ordinal, interval or ratio-level independent variables is done with the help of logistic regression. It is used to solve binary classification problems; some real-life examples are spam detection (predicting whether an email is spam or not), health (predicting whether a given mass of tissue is benign or malignant) and marketing (predicting whether a given user will buy an insurance product or not).
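In R, such a model can be fitted with glm(); a minimal sketch using the features from Section 7.1 (the exact formula is an assumption):

    # Binomial family gives logistic regression; Survived is the 0/1 response
    fit_glm <- glm(Survived ~ Age + Sex + Pclass + Fare + FamilySize + Embarked,
                   data = titanic, family = binomial)

    # Predicted survival probabilities, thresholded at 0.5
    pred_glm <- as.integer(predict(fit_glm, type = "response") > 0.5)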
7.2.2 Decision Tree
A decision tree is a supervised learning algorithm. It is generally used in problems based on classification and is suitable for both categorical and continuous input and output variables. Each internal node represents a single input variable (x) and a split point on that variable. The dependent variable (y) is present at the leaf nodes. For example: suppose there are two independent (input) variables (x), height in centimeters and weight in kilograms, and the task is to find the gender of a person based on the given data (a hypothetical example, for demonstration purposes only).

Fig 7: Example of a Decision Tree

There are two types of decision tree, based on the type of target variable:
1. Categorical Variable Decision Tree: a tree in which the target variable has categorical values.
2. Continuous Variable Decision Tree: a tree in which the target variable has continuous values.
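A minimal classification-tree sketch using the rpart package (the feature set is assumed, as in Section 7.1):

    library(rpart)

    # method = "class" builds a categorical-variable decision tree
    fit_tree <- rpart(Survived ~ Age + Sex + Pclass + Fare + FamilySize,
                      data = titanic, method = "class")
    pred_tree <- predict(fit_tree, type = "class")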
7.2.3 Random Forest
The random forest algorithm is a supervised classification algorithm. The algorithm basically builds a forest with a large number of trees; the higher the number of trees in the forest, the higher the accuracy of the results. The random forest algorithm can be used for both classification and regression problems. For instance, it will take random samples of 100 observations and 5 randomly chosen initial variables to build a model. The same process is repeated a number of times, and then the final prediction is made according to the observations. The final prediction is a function (mean) of each individual prediction.
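A sketch with the randomForest package (the tree count and feature set are illustrative assumptions):

    library(randomForest)

    # Each tree is grown on a bootstrap sample of the observations, with a
    # random subset of the variables considered at each split
    titanic$Sex <- factor(titanic$Sex)   # categorical inputs must be factors
    fit_rf <- randomForest(factor(Survived) ~ Age + Sex + Pclass + Fare + FamilySize,
                           data = titanic, ntree = 500)
    pred_rf <- predict(fit_rf)           # out-of-bag predictions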
7.2.4 Support Vector Machine
Support Vector Machine (SVM) is a supervised machine learning algorithm. It is used to solve both classification and regression problems. Classification is performed by constructing hyperplanes in a multidimensional space that separate cases of different class labels. For categorical variables a dummy variable is created, with values of either 0 or 1. So a categorical dependent variable consisting of three levels, say (A, B, C), can be represented by a set of three dummy variables:

A: {1, 0, 0}; B: {0, 1, 0}; C: {0, 0, 1}
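A sketch using the svm() function from the e1071 package; its formula interface performs the dummy coding of factor inputs shown above internally:

    library(e1071)

    # One 0/1 dummy column per factor level, as in the A/B/C example
    head(model.matrix(~ Embarked - 1, data = titanic))

    fit_svm <- svm(factor(Survived) ~ Age + Sex + Pclass + Fare + FamilySize,
                   data = titanic, kernel = "radial")
    pred_svm <- predict(fit_svm)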
8. MODEL EVALUATION
The accuracy of each model is evaluated using a "confusion matrix". A confusion matrix is a table layout that allows one to visualize the correctness and the performance of an algorithm.

8.1 Confusion Matrix
A confusion matrix is a method to verify how accurately the classification model works. It gives the actual number of predictions which were correct or incorrect when compared to the actual outcomes in the data. The matrix is of order N*N, where N is the number of class values. The performance of such models is commonly evaluated using the data in the matrix.

Sensitivity: the percentage of actual positives which are correctly identified; it is complementary to the false negative rate. Sensitivity = true positives / (true positives + false negatives). The ideal value for sensitivity is 1.0 and the minimum value is 0.0.

Specificity: the proportion of negatives which are correctly identified; it is complementary to the false positive rate. Specificity = true negatives / (true negatives + false positives). The ideal value for specificity is 1.0 and the least value is 0.0.

Positive Predictive Value: a performance measure of the statistical test. It is the ratio of true positives (cases predicted positive whose actual outcome is positive) to the sum of true positives and false positives (cases predicted positive whose actual outcome is negative).

Negative Predictive Value: the ratio of true negatives (cases predicted negative whose actual outcome is negative) to the sum of true negatives and false negatives (cases predicted negative whose actual outcome is positive).
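These quantities follow directly from the four cells of the matrix; a sketch for the logistic regression predictions from Section 7.2.1 (computed on the training rows):

    # Rows: predicted class; columns: actual class
    cm <- table(Predicted = pred_glm, Actual = titanic$Survived)

    TP <- cm["1", "1"]; TN <- cm["0", "0"]
    FP <- cm["1", "0"]; FN <- cm["0", "1"]

    sensitivity <- TP / (TP + FN)   # true positive rate
    specificity <- TN / (TN + FP)   # true negative rate
    ppv         <- TP / (TP + FP)   # positive predictive value
    npv         <- TN / (TN + FN)   # negative predictive value
    accuracy    <- (TP + TN) / sum(cm)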
9. PREDICTION
Here we can choose any of the models to predict the survival of the test sample. Since we have evaluated all the models using the confusion matrix, we predict using the model which has the highest accuracy.

We performed prediction on the test dataset using the logistic regression model and SVM.
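A sketch of this final step, assuming the test set has been cleaned and engineered in the same way as the training data (the data frame name test is an assumption):

    # Predict survival for the unseen passengers with the best-scoring model
    test$Survived <- as.integer(
      predict(fit_glm, newdata = test, type = "response") > 0.5)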
11. CONCLUSION
Data cleaning is the first step while performing data analysis. Exploratory data analytics helps one to understand the dataset and the dependency among the attributes. EDA is used to figure out the relationships between the features of the dataset. This is done by using various graphical techniques; the ones used above are ggplots and histograms.

By applying EDA, some conclusions are drawn and facts are found.

There is a high influence of age on survival. We can see from Table 2 that as age increases, survival decreases.

It can be seen that the survival rate of females is very high (approx. 74%) while the survival rate of males is very low. This fact can also be verified by extracting titles (Mr, Mrs, Ms, etc.) from the name column. The survival rate for the title Mr. is approximately 16%, while the survival rate for Mrs. is 79%.

We can also see survival versus Pclass in the following table:

Table 3. Passenger Class vs. Survival Rate

    Passenger class    Survival rate (%)
    1                  62.96296
    2                  47.28261
    3                  24.23625

We found that passengers who were travelling in first class were more likely to survive.

We combined the parch and sibsp columns to know the family size of a particular passenger. We found that the survival rate increases when family size lies between 0 and 3, but when family size becomes greater than 3, the survival rate decreases.

Similarly, it is found that passengers who had more cabins had a higher survival rate.

Table 4. Fare Group vs. Survival Rate

    Fare group    Survival rate (%)
    (0,50]        32.40223
    (50,100]      65.42056
    (100,150]     79.16667
    (150,200]     66.66667
    (200,250]     63.63636
    (250,300]     66.66667
    (500,550]     100

From these figures we can say that the higher the fare, the higher the survival rate.

In feature engineering, the actual parameters to be used while designing the training and prediction models are found on the basis of the exploratory data analytics process. Machine learning models predict which passengers survived. The logistic regression technique is used for making predictions in this classification problem.
The confusion matrix gives the accuracy of all the models; logistic regression proves to be the best among them, with an accuracy of 0.837261504. This means that the predictive power of logistic regression on this dataset, with the chosen features, is very high.

It must be clearly stated that the accuracy of the models may vary when the choice of features is different. In general, logistic regression and support vector machines are the models which give a good level of accuracy when it comes to classification problems.
12. FUTURE WORK
This project involves the implementation of data analytics and machine learning. The work can be used as a reference for learning the implementation of EDA and machine learning from the very basics.

In future, the idea can be extended by building a more advanced graphical user interface with the help of newer libraries such as shiny in R. An interactive page can be made, i.e. if the value of an attribute is changed on a scale, the values in the corresponding graph (ggplot or histogram) also change. More focused conclusions can also be drawn by combining the results we obtained.
13. REFERENCES
[1] Analyzing Titanic Disaster using Machine Learning Algorithms, Computing, Communication and Automation (ICCCA), 2017 International Conference on, 21 December 2017, IEEE.
[2] Eric Lam, Chongxuan Tang, "Titanic Machine Learning From Disaster", LamTang-Titanic Machine Learning From Disaster, 2012.
[3] S. Cicoria, J. Sherlock, M. Muniswamaiah, L. Clarke, "Classification of Titanic Passenger Data and Chances of Surviving the Disaster", Proceedings of Student-Faculty Research Day, CSIS, pp. 1-6, May 2014.
[4] Corinna Cortes, Vladimir Vapnik, "Support-Vector Networks", Machine Learning, Volume 20, Issue 3, pp. 273-297.
[5] L. Breiman, "Random Forests", Machine Learning, 2001; Ng, CS229 Notes, Stanford University, 2012.
[6] S. J. Russell, P. Norvig, "Artificial Intelligence: A Modern Approach", 2016.
[7] Lonnie Stevans, David L. Gleicher, "Who Survived the Titanic? A Logistic Regression Analysis", International Journal of Maritime History, December 2004.
[8] Michael Aaron Whitley, "Using Statistical Learning to Predict Survival of Passengers on the RMS Titanic", 2015.
[9] Kunal Vyas, Zeshi Zheng, Lin Li, "Titanic: Machine Learning From Disaster", 2015.
[10] Xiaodong Yang, "EECS 349 Titanic: Machine Learning From Disaster", Northwestern University.
[11] Tryambak Chatterlee, "Prediction of Survivors in Titanic Dataset: A Comparative Study using Machine Learning Algorithms", IJERMT, 2017.
[12] Chao-Ying Joanne Peng, Kuk Lida Lee, Gary M. Ingersoll, "An Introduction to Logistic Regression Analysis and Reporting", April 2010.
[13] Zhenyan Liu, Yifei Zeng, Yida Yan, Pengfei Zhang, Yong Wang, "Machine Learning for Analyzing Malware", Journal of Cyber Security and Mobility, Vol. 6, Issue 3, July 2017.
[14] Andy Liaw, Matthew Wiener, "Classification and Regression by randomForest", Vol. 2/3, December 2002.
[15] Galit Shmueli, Otto R. Koppius, "Predictive Analytics in Information Systems Research", MIS Quarterly, Vol. 35, No. 3, September 2011, pp. 553-572.
[16] John D. Kelleher, Brian Mac Namee, Aoife D'Arcy, "Fundamentals of Machine Learning for Predictive Data Analytics: Algorithms".
[17] Neeraj Bhargava, Girja Sharma, "Decision Tree Analysis on J48 Algorithm for Data Mining", Volume 3, Issue 6, June 2013.
[18] Ian H. Witten, Eibe Frank, Mark A. Hall, Christopher J. Pal, "Data Mining: Practical Machine Learning Tools and Techniques".
[19] D. W. Hosmer, T. Hosmer, S. Le Cessie, S. Lemeshow, "A Comparison of Goodness-of-Fit Tests for the Logistic Regression Model".
[20] L. Breiman, "Random Forests", Machine Learning 45:5-32, 2001.
[21] Stuart J. Russell, Peter Norvig, "Artificial Intelligence: A Modern Approach", Pearson Education, 2003, pp. 697-702.
[22] Corinna Cortes, Vladimir N. Vapnik, "Support-Vector Networks", Machine Learning, 20, 1995.
[23] A. Unwin, H. Hofmann, "GUI and Command-line: Conflict or Synergy?", in K. Berk, M. Pourahmadi (eds.), Computing Science and Statistics, 1999.
[24] Mark R. Segal, "Machine Learning Benchmarks and Random Forest Regression", 2004.
[25] Proceedings of Student-Faculty Research Day, CSIS, Pace University, May 2nd, 2014.