
Group Project

This project analyzes Titanic passenger data to determine factors affecting survival rates. Descriptive analysis shows females and higher classes had higher survival rates. Correlation found survival negatively correlated with class and age, and positively with sex. Logistic regression predicted survival and found class, age, and sex were statistically significant with survival.

Uploaded by

ivan valencia

BUSI group project 1

Group Project: Titanic Dataset

Busi 650: Business Analytics

March 29, 2023

Background

On April 10, 1912, the Titanic set sail from Southampton with about 2,240 passengers and crew on board, ranging in age from five months to eighty and drawn from different classes of society. On April 14, 1912, the ship hit an iceberg and broke apart, killing more than 1,500 crew and passengers (National Oceanic and Atmospheric Administration, n.d.). Since then, analysts have been trying to identify whether more people could have survived and which factors led to surviving or not surviving the wreck.

This project uses data gathered from Kaggle on 891 passengers, or about 40% of those on the ship. The passengers come from different classes, ages, and sexes and embarked at different ports. We apply descriptive, diagnostic, and predictive analysis to this data to help determine the likelihood of survival.

The objective of the Project

This project uses data to analyze the survival rate based on age, sex, and class. We want to determine how these factors contributed to whether or not people survived the wreck. By closely examining age, sex, class, and port of embarkation through logistic regression, we hope to conclude which factor was most strongly associated with survival. The results of this project can be used to examine the myths and suggestions surrounding the sinking, including the belief that passengers did not have an equal chance of survival.

Data source

The data was gathered from the Kaggle website:

https://www.kaggle.com/competitions/titanic/overview

Data attributes

The dataset contains information for 891 passengers. For each passenger it records the survival outcome and four attributes relevant to determining whether or not they would likely survive.

Attribute    Description

Survival     Whether or not a passenger survived (0: No; 1: Yes)

Age          The age of each passenger

Sex          Male (0) or Female (1)

Class        The division of passengers based on their socioeconomic status (1: typically wealthy passengers with access to luxurious amenities; 2: middle-class passengers with fewer amenities; 3: mostly immigrants)

Embark       The port from which each passenger started their voyage (1: S, Southampton; 2: C, Cherbourg; 3: Q, Queenstown)

Methodology

We used logistic regression in Excel miner to build a predictive model.

Data preparation and Processing


Once the data was gathered from Kaggle, we saw that approximately 23.5% of the entries were missing. We used an imputation technique to complete the dataset by following these steps:

Step 1: Remove the unsuitable data.

First, we removed the fields that cannot be converted into numerical values, such as Cabin, Name, and Ticket.

Step 2: Fill in all the blanks.

Second, we filled in the blanks with the average value. We selected all the blank cells using the Go To Special tool in Excel.

Step 3: Assign numeric codes.

Third, we converted all text data into numerical form using the Replace tool in Excel. For Sex, we coded male = 0 and female = 1. For Embarked, we coded S = 1, C = 2, and Q = 3.

The average age of 30 was imputed for passengers with a missing age to complete the dataset. The model was then fitted on this completed training set so it could learn the underlying patterns and relationships needed for an accurate predictive model. The model was also run against a test file to help us assess its ability to handle new data.
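The three preparation steps above were carried out in Excel. As a rough illustration only, the same logic can be sketched in Python. The records below are made-up toy rows that borrow the Kaggle column names; the function name `prepare` is hypothetical and is not part of the original workflow.

```python
from statistics import mean

# Toy records (values are invented for illustration, not taken from the dataset).
rows = [
    {"Survived": 0, "Pclass": 3, "Name": "A", "Sex": "male",
     "Age": 22.0, "Ticket": "T1", "Cabin": None, "Embarked": "S"},
    {"Survived": 1, "Pclass": 1, "Name": "B", "Sex": "female",
     "Age": None, "Ticket": "T2", "Cabin": "C85", "Embarked": "C"},
    {"Survived": 1, "Pclass": 2, "Name": "C", "Sex": "female",
     "Age": 38.0, "Ticket": "T3", "Cabin": None, "Embarked": "Q"},
]

DROP = ("Name", "Ticket", "Cabin")          # Step 1: non-numeric fields
SEX_CODE = {"male": 0, "female": 1}         # Step 3: codes used in the report
EMBARKED_CODE = {"S": 1, "C": 2, "Q": 3}    # Step 3: codes used in the report


def prepare(records):
    """Apply the three preparation steps to a list of passenger dicts."""
    # Step 1: drop fields that cannot be converted to numbers.
    cleaned = [{k: v for k, v in r.items() if k not in DROP} for r in records]
    # Step 2: fill missing ages with the average of the known ages.
    avg_age = mean(r["Age"] for r in cleaned if r["Age"] is not None)
    for r in cleaned:
        if r["Age"] is None:
            r["Age"] = avg_age
        # Step 3: replace text categories with numeric codes.
        r["Sex"] = SEX_CODE[r["Sex"]]
        r["Embarked"] = EMBARKED_CODE[r["Embarked"]]
    return cleaned


prepared = prepare(rows)
```

In this toy example the known ages are 22 and 38, so the missing age is filled with their average, 30.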

Descriptive Analysis

Figure 1

Total Count of Survivors by Gender


Figure 1 shows that of the 891 passengers in the dataset, 577 were male and the remaining 314 were female.

Figure 2

Survivors and Non-Survivors by Class

Figure 2 shows that more people from class 3 died, a total of 372, compared with classes 2 and 1, where 97 and 80 people died, respectively. On the other hand, more people from class 1 survived, a total of 136 passengers, than from class 3, where 119 people survived. Class 2 had the fewest survivors, 87 passengers.
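Figure 2 reports counts, and the classes differ widely in size, so it can help to convert the counts into a survival rate per class. The sketch below does this using only the numbers quoted for Figure 2; the variable and function names are illustrative.

```python
# Counts quoted for Figure 2: class -> (survived, died).
counts_by_class = {1: (136, 80), 2: (87, 97), 3: (119, 372)}


def survival_rate(survived, died):
    """Share of a group that survived."""
    return survived / (survived + died)


rates = {c: survival_rate(s, d) for c, (s, d) in counts_by_class.items()}

for c, rate in sorted(rates.items()):
    print(f"Class {c}: {rate:.1%} survived")
```

With these counts the rates are about 63.0% for class 1, 47.3% for class 2, and 24.2% for class 3, so the survival rate falls steadily from class 1 to class 3.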

Figure 3

Survivors and Non-Survivors by Age

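Figure 3 shows survivors and non-survivors by age. A total of 140 passengers aged 30 did not survive, while 62 passengers of the same age survived. This view does not consider other factors such as sex or class. The large count at age 30 reflects the imputation step, in which the average age of 30 was assigned to passengers with a missing age. Most passengers, whether or not they survived, were between the ages of 15 and 55, and the largest single group of survivors was aged 30.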

Figure 4

Total Survivors and Non-Survivors by Gender

Figure 4 shows the total survivors and non-survivors by gender. In the data, 468 males did not survive, compared with 81 females. Conversely, 233 females survived, compared with 109 males. More females than males survived, even though there were fewer females in the dataset.
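As a quick arithmetic check, the counts quoted for Figure 4 can be turned into the percentages cited in the Conclusion (65% male, 19% of males surviving, 74% of females surviving). This is a minimal sketch; the variable names are illustrative.

```python
# Counts quoted for Figure 4.
males = {"survived": 109, "died": 468}
females = {"survived": 233, "died": 81}

total_males = sum(males.values())      # all males in the dataset
total_females = sum(females.values())  # all females in the dataset

# Share of the dataset that is male, and survival rate within each sex.
male_share = total_males / (total_males + total_females)
male_survival = males["survived"] / total_males
female_survival = females["survived"] / total_females

print(f"Male share: {male_share:.0%}")
print(f"Male survival: {male_survival:.0%}")
print(f"Female survival: {female_survival:.0%}")
```

The totals recovered this way, 577 males and 314 females, also match the counts given for Figure 1.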

Correlation

Figure 5

Correlation with variables


Figure 5 shows the correlations between survived, class, sex, and age. The correlation between survived and class is -0.338481036, a negative correlation: the likelihood of survival falls as class rises from 1 to 3. The correlation between sex and survival is positive at 0.543351381. Because sex is coded male = 0 and female = 1, this means females had a higher chance of surviving than males. The correlation between age and survival is -0.070657231, a weak negative association, which suggests that older passengers may have had a slightly lower chance of surviving than younger ones.
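Two of the correlations above can be reproduced from the counts quoted for Figures 2 and 4, because a Pearson correlation depends only on how many passengers fall into each (value, outcome) cell. The sketch below is an illustration under that assumption; the helper name `pearson_from_counts` is hypothetical, and the age correlation is not reproduced because the per-age counts are not given in the text.

```python
from math import sqrt


def pearson_from_counts(cells):
    """Pearson r for grouped data given as (x, y, count) cells."""
    n = sum(c for _, _, c in cells)
    mean_x = sum(x * c for x, _, c in cells) / n
    mean_y = sum(y * c for _, y, c in cells) / n
    # Weighted sums of cross-products and squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) * c for x, y, c in cells)
    sxx = sum((x - mean_x) ** 2 * c for x, _, c in cells)
    syy = sum((y - mean_y) ** 2 * c for _, y, c in cells)
    return sxy / sqrt(sxx * syy)


# (class, survived, count), from the Figure 2 counts.
class_cells = [(1, 1, 136), (1, 0, 80), (2, 1, 87),
               (2, 0, 97), (3, 1, 119), (3, 0, 372)]
# (sex, survived, count) with male = 0 and female = 1, from the Figure 4 counts.
sex_cells = [(1, 1, 233), (1, 0, 81), (0, 1, 109), (0, 0, 468)]

r_class = pearson_from_counts(class_cells)
r_sex = pearson_from_counts(sex_cells)
print(round(r_class, 4), round(r_sex, 4))
```

The two results agree with the class and sex correlations reported for Figure 5 to four decimal places.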

Predictive Analysis

Logistic Regression

H0: Fewer males survived than females.

H1: More males survived than females.

H0: More males survived in class 3.

H1: Fewer males survived in class 3.

H0: Fewer people between the ages of 20 and 30 survived.

H1: More people between the ages of 20 and 30 survived.

H0: People between the ages of 60 and 70 in class 1 did not survive.

H1: People between the ages of 60 and 70 in class 1 survived.

The output below shows that the p-values for class, age, and sex are all very low. This gives us grounds to reject the null hypotheses for these variables in favor of the alternatives, which say that there is a statistically significant relationship between the independent and dependent variables. It should be noted that the p-value describes a probability and not a certainty (Andrade, 2019). The other variables, SibSp (siblings and spouses), Parch (parents and children), Fare, and Embarked, all have a p-value greater than 0.05, which may indicate that these variables are not significant to the outcome of surviving or not.

Figure 6

Logistic Regression Summary Output


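Logistic model formula:

P = 1 / (1 + e^-(b0 + b1*X1 + b2*X2 + b3*X3 + b4*X4 + b5*X5 + b6*X6 + b7*X7))

Here, P is the predicted probability of survival, b0 is the intercept, b1 to b7 are the fitted coefficients of the corresponding variables, and

X1 = Pclass, X2 = Sex, X3 = Age, X4 = SibSp, X5 = Parch, X6 = Fare, X7 = Embarked.

Using the above model, we predicted survival for the passengers in the data and then applied the model to the test file as a cross-check.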
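To make the formula concrete, the sketch below evaluates it for one passenger. The intercept and coefficients are hypothetical placeholders chosen only for illustration; they are NOT the fitted values from Figure 6, which are not reproduced in the text. The function name and the example passenger are also assumptions.

```python
from math import exp


def predict_probability(intercept, coefficients, features):
    """Logistic model: P = 1 / (1 + e^-(intercept + sum(coef * x)))."""
    z = intercept + sum(coefficients[name] * features[name]
                        for name in coefficients)
    return 1.0 / (1.0 + exp(-z))


# HYPOTHETICAL values for illustration only, not the Figure 6 estimates.
intercept = 2.0
coefficients = {"Pclass": -1.0, "Sex": 2.5, "Age": -0.03, "SibSp": -0.3,
                "Parch": -0.1, "Fare": 0.002, "Embarked": 0.1}

# Made-up passenger, coded as in the report (male = 0, S = 1).
passenger = {"Pclass": 3, "Sex": 0, "Age": 22, "SibSp": 1,
             "Parch": 0, "Fare": 7.25, "Embarked": 1}

p_male = predict_probability(intercept, coefficients, passenger)
# Same passenger with Sex switched to female (1).
p_female = predict_probability(intercept, coefficients,
                               {**passenger, "Sex": 1})
```

With a positive Sex coefficient, as assumed here, switching the same passenger from male to female raises the predicted probability, which is the direction of effect the report describes.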

Data Model

Given the data in Figure 6, the model predicted a probability of 0.09 that passenger 1 survived. The predicted and actual outcomes are both 0, so the label is True; this means the model correctly predicted that the passenger did not survive. Further, it gave passenger 2 a probability of 0.91 of surviving. The predicted and actual outcomes are both 1, so the label is again True; the model correctly predicted that the passenger would survive. Overall, the model predicted the survival or non-survival of passengers with an acceptable accuracy of 80%.
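The step from a predicted probability to a True or False label, and from labels to an accuracy figure, can be sketched as follows. The 0.5 cutoff is an assumption (the report does not state the threshold used), and only the first two probabilities come from the text; the remaining values are invented so the toy example has more than two rows.

```python
def classify(probability, threshold=0.5):
    """Turn a predicted probability into a 0/1 outcome (threshold is assumed)."""
    return 1 if probability >= threshold else 0


def accuracy(probabilities, actual, threshold=0.5):
    """Share of passengers whose predicted outcome matches the actual one."""
    predicted = [classify(p, threshold) for p in probabilities]
    hits = sum(1 for pred, act in zip(predicted, actual) if pred == act)
    return hits / len(actual)


# 0.09 and 0.91 are quoted in the report; the other three rows are made up.
probabilities = [0.09, 0.91, 0.62, 0.30, 0.55]
actual = [0, 1, 1, 0, 0]

labels = [classify(p) == a for p, a in zip(probabilities, actual)]
toy_accuracy = accuracy(probabilities, actual)
```

In this toy sample four of the five predictions match, so the accuracy is 0.8. That the figure equals the reported 80% is a coincidence of the invented rows, not a reproduction of the model's result.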

Results and Discussion

The predictive accuracy of 80% for the logistic model is high enough to be considered acceptable. Therefore, this model can be used to predict whether a passenger was likely or unlikely to live based on their age, sex, and class. From the findings, we can expect that a male passenger was more likely to die than a female passenger. This may be because males were heads of household and responsible for the family; they would therefore have let the females board a lifeboat first and followed later if they could be accommodated. More passengers between the ages of 15 and 37 survived, possibly because they were more agile and better able to swim or hold on to wreckage when lifeboats were lacking and rescue was far away. Additionally, passenger class was significantly related to survival, which may be linked to the societal norms of the time, when the wealthy were given first preference regardless of age and gender. Based on reports, lifeboats were launched from the first- and second-class decks, which may explain why most survivors came from classes 1 and 2 (Henderson, 1998). The number of class 3 survivors (119) was surprisingly higher than the number of class 2 survivors (87); this might be due to their age and ability to reach a lifeboat after it was launched, although class 3 also had far more passengers in the dataset (491 compared with 184). Because most women survived, it is plausible that the class 1 survivors were mostly female.

Test data:

We then ran the logistic regression model on the test file. For the passengers in that file, the model predicts that 63% would not survive and the remaining 37% would survive.

Conclusion

The sinking of the Titanic in 1912 is one of the best-known and most serious maritime accidents in history. We were commissioned to carry out a descriptive and predictive analysis based on information about the passengers on that ship.

Regarding the descriptive analysis, although the passengers in the dataset were predominantly male (65%), only 19% of the males survived. In comparison, 74% of the females survived. A substantial difference is also observed in survival according to the class in which passengers traveled: deaths in third class far exceeded those in second and first class.

For the predictive analysis, a logistic regression model was used, with passenger survival or death as the dependent variable. The independent variables considered were gender, age, and the class in which passengers traveled. For class, the results show that class 3 passengers had the lowest probability of survival and class 1 passengers had the highest. For sex, the findings indicate that women were more likely to survive than men.


References

Andrade, C. (2019). The p value and statistical significance: Misunderstandings, explanations, challenges, and alternatives. Indian Journal of Psychological Medicine. Retrieved from https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6532382/

Henderson, J. R. (1998, June 6). Titanic: Demographics of the passengers. Retrieved March 26, 2023, from http://www.icyousee.org/titanic.html

National Oceanic and Atmospheric Administration. (n.d.). R.M.S. Titanic - History and significance. Retrieved from https://www.noaa.gov/gc-international-section/rms-titanic-history-and-significance
