Report TSP

This document is a mini project report on predicting survival on the Titanic using machine learning algorithms. It summarizes exploratory data analysis performed on the Titanic dataset including examining distributions of variables like age and fare. Logistic regression is implemented and achieves an accuracy of 82% in predicting survivors. The report concludes logistic regression is well-suited for this binary classification problem and performs better than other models based on the chosen features.

Uploaded by

Nishit Chaudhary


A MINI PROJECT REPORT (KCS 354)

CONNECT FOUR DISCS USING JAVA

Submitted by

AVI CHAUDHARY (1901920130052)

RAHUL MOURYA (1901920130135)

Submitted to

MR. ANAND BHUSHAN PANDEY

(Assistant Professor GLBITM Greater Noida)

Department of Information Technology

G. L. Bajaj Institute of Technology and Management
Greater Noida, Uttar Pradesh.
(2020-21)
TABLE OF CONTENTS

SR. NO CONTENT

1 INTRODUCTION
  QUICK GLANCE ON DATA
  NUMERICAL VARIABLES
  DATA DISTRIBUTION
2 STORYTELLING
3 IMPLEMENTATION FOR PREDICTING ACCURACY
4 CONCLUSION

INTRODUCTION

The sinking of the Titanic, which caused the death of more than a thousand
passengers and crew, is one of the deadliest accidents in history. The loss of
life was largely due to the shortage of lifeboats. A striking observation from
the incident is that some people were more likely to survive than others:
children and women, for example, were given priority in the rescue. The main
objective of this project is first to uncover patterns and previously unknown
information by performing exploratory data analysis on the available training
data, and then to apply different machine learning models and classifiers to
complete the analysis.

This predicts which people were more likely to survive. The results of applying
the machine learning algorithms are then analyzed on the basis of performance
and accuracy.

Exploratory Data Analysis (EDA) is the process of understanding a data set by
summarizing its main characteristics, often by plotting them visually. This
step is especially important before modeling the data in order to apply
machine learning. Plots used in EDA include histograms, box plots, scatter
plots and many more. Exploring the data often takes considerable time. Through
the process of EDA, we can also refine the problem statement for our data set,
which is very important.

EDA is used to:

● Give insight into a data set.

● Understand the underlying structure.

● Extract important parameters and the relationships that hold between them.

● Test underlying assumptions.

It is good practice to understand the data first and to gather as many insights
from it as possible. EDA is all about making sense of the data in hand before
getting our hands dirty with it.

1) A quick glance at the data:

First, we will import the necessary packages and load the data set.

Fig 1 : Glance on data

In the train data, there are 891 passengers, and the average survival rate
is 38%. Age ranges from 0.42 to 80, and the average is about 30 years. At
least 50% of passengers did not have siblings / spouses aboard the
Titanic, and at least 75% of passengers did not have parents / children
aboard the Titanic. The fare varies a lot.
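
The loading and summary step can be sketched as follows. This is a minimal illustration, assuming pandas is the package used; the helper name load_passengers and the four sample rows are hypothetical stand-ins for the real train file, so the printed numbers will not match the figures quoted above.

```python
import io

import pandas as pd

# Tiny illustrative stand-in for the training file (NOT the real 891 rows).
SAMPLE_CSV = """PassengerId,Survived,Pclass,Sex,Age,SibSp,Parch,Fare
1,0,3,male,22,1,0,7.25
2,1,1,female,38,1,0,71.2833
3,1,3,female,26,0,0,7.925
4,0,3,male,,0,0,8.05
"""


def load_passengers(source):
    """Read a Titanic-style CSV from a file path or a file-like object."""
    return pd.read_csv(source)


train = load_passengers(io.StringIO(SAMPLE_CSV))

# describe() reports count, mean, min, quartiles and max per numeric column.
summary = train[["Survived", "Age", "SibSp", "Parch", "Fare"]].describe()

# The mean of a 0/1 column is the share of ones, i.e. the survival rate.
survival_rate = train["Survived"].mean()

print(summary)
print(f"Survival rate: {survival_rate:.0%}")
```

With the real data, passing the path of the training CSV to load_passengers instead of the in-memory sample would produce the summary discussed in this section.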

Fig 2 : Train data

Above is a list of passengers with a $0 fare. We spot-checked a few
passengers to see whether the $0 fare was intended.

Passengers who share the same ticket number seem to be in the same
traveling group. We can create a boolean variable for traveling group to
see whether people who travelled in groups were more likely to survive.
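
One possible way to list the $0 fares and build the traveling-group flag is sketched below, assuming pandas. The column name InGroup and the sample rows are illustrative assumptions, not part of the original analysis.

```python
import pandas as pd

# Illustrative rows only; names and tickets are made up.
tickets = pd.DataFrame({
    "Name": ["A", "B", "C", "D", "E"],
    "Ticket": ["PC 1", "PC 1", "T 0", "A/5 2", "A/5 3"],
    "Fare": [50.0, 50.0, 0.0, 7.25, 8.05],
})

# Passengers whose recorded fare is exactly zero.
zero_fare = tickets[tickets["Fare"] == 0]

# duplicated(keep=False) marks every row whose ticket number occurs more
# than once, so the flag is True for all members of a shared ticket.
tickets["InGroup"] = tickets["Ticket"].duplicated(keep=False)

print(zero_fare)
print(tickets[["Name", "Ticket", "InGroup"]])
```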

Fig 3 : Missing data

20% of Age data is missing, 77% of Cabin data is missing, and 0.2% of
Embarked data is missing. We will need to handle the missing data before
modeling. This is also covered in the Feature Engineering article.
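
The missing-data percentages can be computed per column as in the sketch below. It assumes pandas and uses a small made-up frame, so the percentages it prints are not the 20%, 77% and 0.2% reported above.

```python
import numpy as np
import pandas as pd

# Made-up sample with gaps in Age and Cabin.
df = pd.DataFrame({
    "Age": [22.0, np.nan, 26.0, 35.0, np.nan],
    "Cabin": [None, "C85", None, None, None],
    "Embarked": ["S", "C", "S", "S", "Q"],
})

# isna() gives a True/False table; the column mean of that table is the
# fraction of missing entries, which we express as a percentage.
missing_pct = df.isna().mean().mul(100).round(1)

print(missing_pct)
```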

2) Numerical Variables:

In the box plots, survivors and victims have similar quartiles for Age
and SibSp. Compared to victims, survivors were more likely to have
parents / children aboard the Titanic and to have relatively more
expensive tickets.
A box plot provides a quick view of numerical data through quartiles.
Let us also check the data distribution using histograms to uncover
additional patterns.
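
The quartiles that a box plot draws can also be read off numerically. The sketch below assumes pandas and uses invented fares, so it only illustrates the comparison between the two groups, not the actual values.

```python
import pandas as pd

# Invented fares: four victims (Survived = 0) and four survivors (Survived = 1).
df = pd.DataFrame({
    "Survived": [0, 0, 0, 0, 1, 1, 1, 1],
    "Fare": [7.0, 8.0, 9.0, 10.0, 20.0, 30.0, 40.0, 50.0],
})

# describe() per group includes the 25%, 50% (median) and 75% quartiles,
# which are the edges and centre line of each box in a box plot.
quartiles = df.groupby("Survived")["Fare"].describe()[["25%", "50%", "75%"]]

print(quartiles)
```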

Fig 4 : Box Plot


3) Data distribution:

Fig 5 : Distribution plot

When comparing the distributions of two sets of data, it is preferable to use
the relative frequency instead of the absolute frequency. Using Age as an
example, the histogram with absolute frequency suggests that there were
a lot more victims than survivors in the age group of 20–30, which partly
reflects there being more victims than survivors overall.

Fig 6 : Relative Frequency of age

In the histogram of relative frequency for age, what really stands out is the
age group < 10. Children were more likely to survive than passengers in
the other age groups.
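
The difference between absolute and relative frequency can be shown with a short sketch. It assumes NumPy; the helper relative_frequency and the age lists are illustrative only.

```python
import numpy as np

# Age bins of ten years each (the last bin includes its right edge).
bins = [0, 10, 20, 30, 40]

# Made-up ages; there are twice as many victims as survivors.
victim_ages = np.array([5, 22, 24, 25, 27, 28, 29, 35])
survivor_ages = np.array([3, 4, 25, 33])


def relative_frequency(ages, bins):
    """Share of the group falling into each bin (the shares sum to 1)."""
    counts, _ = np.histogram(ages, bins=bins)
    return counts / counts.sum()


victim_rel = relative_frequency(victim_ages, bins)
survivor_rel = relative_frequency(survivor_ages, bins)

print("victims:  ", victim_rel)
print("survivors:", survivor_rel)
```

Dividing each group's counts by that group's own total removes the effect of group size, so the two histograms can be compared bin by bin.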

Fig 7 : Pie Plot for Survived data

From the pie plots, we can tell that passengers with missing
age were more likely to be victims.

Fig 8 : Pie plot for missing age

Regarding feature engineering for Age, I'll probably create
a categorical variable including categories for Children,
Adult, Senior and Missing Values respectively.
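
Such a categorical variable could be built as in the sketch below, assuming pandas. The cut points (under 18 for Children, 60 and over for Senior) are assumptions chosen for illustration; the report does not state them.

```python
import numpy as np
import pandas as pd

ages = pd.Series([4.0, 30.0, np.nan, 70.0, 17.0])

# Assumed boundaries: Children [0, 18), Adult [18, 60), Senior [60, inf).
age_group = pd.cut(
    ages,
    bins=[0, 18, 60, np.inf],
    right=False,
    labels=["Children", "Adult", "Senior"],
)

# Missing ages stay NaN after cut(); give them their own category.
age_group = age_group.cat.add_categories(["Missing"]).fillna("Missing")

print(age_group.tolist())
```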

STORYTELLING

Fig 9 : Null values

The columns 'Age' and 'Cabin' have null values. While
'Cabin' has a huge number of null values, 'Age' has a moderate
number of null values.

We need to form a logic to impute the missing values of the
'Age' column. We shall come back to it later, after
understanding the relation between 'Age' and various other
variables.

Let us try to find out whether the dependent variable 'Survived' has
any relation with the variable 'Sex'. To do so, we use a
factor plot.

Fig 10 : Factorplot

Inference: As we know from the movie as well as from the story of the
Titanic, females were given priority when passengers were being saved.
The above graph tells the same story: more male passengers died than
female ones.
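
The quantity behind this kind of plot is the survival rate per group, which can be computed directly. The sketch assumes pandas and uses invented rows, so the rates are illustrative only.

```python
import pandas as pd

# Invented sample: four female and four male passengers.
df = pd.DataFrame({
    "Sex": ["female"] * 4 + ["male"] * 4,
    "Survived": [1, 1, 1, 0, 0, 0, 0, 1],
})

# The mean of the 0/1 Survived column within each group is its survival rate.
rate_by_sex = df.groupby("Sex")["Survived"].mean()

print(rate_by_sex)
```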

Similarly, let us see how the variable 'Pclass' is related
to the variable 'Survived'.

Fig 11 : Plot to find victim according to class

The graph tells us that passengers in Pclass 1 were the most likely to
survive. Pclass 1 was meant for richer people, while passengers in
Pclass 3, which was relatively cheaper than class 1, were the most likely
victims.
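
The class comparison can also be tabulated as row-wise proportions. The sketch below assumes pandas and uses invented rows; it shows the mechanics of the table, not the real survival rates.

```python
import pandas as pd

# Invented sample: four passengers in class 1 and four in class 3.
df = pd.DataFrame({
    "Pclass": [1, 1, 1, 1, 3, 3, 3, 3],
    "Survived": [1, 1, 1, 0, 0, 0, 0, 1],
})

# normalize="index" divides each row by its total, so every row sums to 1
# and the column labelled 1 is the survival rate of that class.
class_table = pd.crosstab(df["Pclass"], df["Survived"], normalize="index")

print(class_table)
```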

Fig 12 : Number of Sibling or spouse

Here the 'SibSp' variable refers to the number of siblings or spouses
accompanying the person. We can see that most people
came alone.

Fig 13 : Boxplot

Now we need a way to fill the missing values of the variable
'Age'. Here we segregate the 'Age' variable according to the
'Pclass' variable, as the 'Age' and 'Pclass' columns were found
to be related. We draw a boxplot that shows the typical (median)
age for each Pclass.
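
One way to turn this idea into an imputation step is sketched below, assuming pandas. Filling with the per-class median is an assumption made for illustration (it is the centre line a boxplot shows); the report does not state the exact fill values used, and the column AgeFilled is a hypothetical name.

```python
import numpy as np
import pandas as pd

# Invented sample with one missing age in each class.
df = pd.DataFrame({
    "Pclass": [1, 1, 1, 3, 3, 3],
    "Age": [40.0, 36.0, np.nan, 20.0, np.nan, 24.0],
})

# transform("median") returns, for every row, the median age of that row's
# class (missing values are ignored when the median is computed).
class_median = df.groupby("Pclass")["Age"].transform("median")

# Keep known ages and fill only the gaps with the class median.
df["AgeFilled"] = df["Age"].fillna(class_median)

print(df)
```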

IMPLEMENTATION FOR PREDICTING ACCURACY

HENCE, ACCURACY OF THE PREDICTION = 0.82, i.e. 82%
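
The implementation itself is not reproduced in this report. As a rough sketch of how such an accuracy figure is typically obtained, the example below fits a logistic regression with scikit-learn on synthetic data. Everything in it is an assumption: the two features, the rule that generates the labels, the 70/30 split and the random seeds. It will not reproduce the 0.82 reported here.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic passengers: NOT derived from the real passenger list.
rng = np.random.default_rng(0)
n = 200
is_female = rng.integers(0, 2, size=n)   # 0 = male, 1 = female
pclass = rng.integers(1, 4, size=n)      # classes 1, 2, 3

# Made-up rule with noise: women and higher classes survive more often.
score = 2.0 * is_female - 0.8 * (pclass - 2) - 1.0 + rng.normal(0, 1, size=n)
survived = (score > 0).astype(int)

X = np.column_stack([is_female, pclass])

# Hold out 30% of the rows to estimate accuracy on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, survived, test_size=0.3, random_state=42
)

model = LogisticRegression()
model.fit(X_train, y_train)

predictions = model.predict(X_test)
accuracy = accuracy_score(y_test, predictions)

print(f"Accuracy on the held-out rows: {accuracy:.2f}")
```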

CONCLUSION

Logistic regression achieves an accuracy of about 82%. It is well suited
to a binary dependent variable, that is, a variable whose output takes one
of two values such as yes or no, or true or false.

In conclusion, we can say that this data gives us information about the
travellers and whether they survived or not.

The accuracy of each model is computed from its confusion matrix; logistic
regression proves to be the best among them, with an accuracy of 0.8272.
This indicates that logistic regression has strong predictive power on
this dataset with the chosen features.
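
For reference, accuracy follows from the four cells of a binary confusion matrix as (TP + TN) / (TP + TN + FP + FN). The sketch below works this through on made-up labels; the function names are illustrative and the result is not the figure from this report.

```python
def confusion_counts(y_true, y_pred):
    """Count true negatives, false positives, false negatives, true positives."""
    tn = fp = fn = tp = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == 1 and predicted == 1:
            tp += 1
        elif actual == 0 and predicted == 0:
            tn += 1
        elif actual == 0 and predicted == 1:
            fp += 1
        else:
            fn += 1
    return tn, fp, fn, tp


def accuracy_from_counts(tn, fp, fn, tp):
    """Share of all predictions that were correct."""
    return (tp + tn) / (tn + fp + fn + tp)


# Made-up labels: 1 = survived, 0 = did not survive.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

counts = confusion_counts(y_true, y_pred)
accuracy = accuracy_from_counts(*counts)

print(counts, accuracy)
```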

Note that the accuracy of the models may vary when a different set of
features is chosen. Logistic regression and support vector machines are
models that generally give a good level of accuracy on classification
problems.

I really hope this has been a great read and a source of inspiration to
develop and innovate.

