Terminal Assessment 2 DAP

The document outlines a data analysis project using the Titanic Survival Dataset, which includes 1,300 passenger records and aims to analyze factors influencing survival rates. It details the steps of data cleaning, exploratory data analysis (EDA), statistical testing, and key findings, such as significant differences in survival rates based on passenger class and embarkation location. The analysis concludes with actionable insights for real-world applications and suggestions for future research directions.

Uploaded by

Ariane Rubio

DATA ANALYSIS USING PYTHON

REFERENCE: https://www.kaggle.com/datasets/karanprajapati7/titanic-survival-dataset?resource=download

DATASET DESCRIPTION

Dataset Name: TITANIC SURVIVAL DATASET

The Titanic Survival Dataset contains 1,300 rows and 12 columns of passenger records from the ill-fated Titanic voyage. Key features include Survived (survival status), Pclass (passenger class), Sex (gender), Age (age of the passenger), and Fare (ticket fare). The goal is to analyze the factors influencing survival rates among passengers.

3. DATA CLEANING & PREPARATION

Load the dataset using pandas and inspect its structure
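
The loading step can be sketched as follows. This is a minimal illustration, not the authors' original code: the inline CSV is a small stand-in for the Kaggle file, and the file name `titanic.csv` as well as the column names beyond those listed in the dataset description are assumptions.

```python
import io

import pandas as pd

# Illustrative stand-in for the Kaggle CSV. With the real file this would be
# pd.read_csv("titanic.csv"); the file name and full column list are assumptions.
SAMPLE_CSV = """PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5 21171,7.25,,S
2,1,1,"Cumings, Mrs. John Bradley",female,38,1,0,PC 17599,71.2833,C85,C
3,1,3,"Heikkinen, Miss. Laina",female,26,0,0,STON/O2. 3101282,7.925,,S
4,0,3,"Moran, Mr. James",male,,0,0,330877,8.4583,,Q
"""

df = pd.read_csv(io.StringIO(SAMPLE_CSV))

print(df.shape)   # (number of rows, number of columns)
print(df.head())  # first few passenger records
```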


DATA CLEANING & PREPARATION: INITIAL DATA INSPECTION
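
An initial inspection along these lines might look like the sketch below. The toy DataFrame is hypothetical and only uses column names mentioned in the slides; the exact calls used in the original screenshots are not known.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; values are made up for illustration.
df = pd.DataFrame({
    "Survived": [0, 1, 1, 0, 1],
    "Pclass":   [3, 1, 3, 3, 2],
    "Sex":      ["male", "female", "female", "male", "female"],
    "Age":      [22.0, 38.0, 26.0, np.nan, 14.0],
    "Fare":     [7.25, 71.28, 7.93, 8.46, 30.07],
    "Embarked": ["S", "C", "S", "Q", None],
})

df.info()                  # dtypes and non-null counts per column
missing = df.isna().sum()  # number of missing values per column
print(missing)
```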
DATA CLEANING & PREPARATION: HANDLE MISSING VALUES
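
The conclusion slide states that missing ages were imputed and that rows with missing Cabin and Embarked data were dropped. A minimal sketch of that step is shown here; median imputation is an assumption, since the slides do not say which imputation method was used, and the data is a hypothetical toy sample.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with gaps in Age, Cabin and Embarked.
df = pd.DataFrame({
    "Age":      [22.0, np.nan, 26.0, 35.0, np.nan],
    "Cabin":    ["C85", "E46", None, "B28", "C123"],
    "Embarked": ["S", "C", "S", None, "Q"],
})

# Assumption: fill missing ages with the median of the observed ages.
median_age = df["Age"].median()
df["Age"] = df["Age"].fillna(median_age)

# Drop rows where Cabin or Embarked is missing.
cleaned = df.dropna(subset=["Cabin", "Embarked"]).reset_index(drop=True)

print(cleaned)
```

Dropping every row without a Cabin value can remove a large share of the data, which is consistent with the challenge the conclusion mentions about missing Cabin data.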
DATA CLEANING & PREPARATION: REMOVE DUPLICATES
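
Duplicate removal in pandas can be sketched as below, using a hypothetical toy frame in which one row is repeated exactly.

```python
import pandas as pd

# Hypothetical toy data: the third row repeats the first one.
df = pd.DataFrame({
    "Name":   ["A", "B", "A", "C"],
    "Age":    [22, 38, 22, 26],
    "Pclass": [3, 1, 3, 3],
})

n_dupes = int(df.duplicated().sum())  # rows identical to an earlier row
deduped = df.drop_duplicates().reset_index(drop=True)

print(f"duplicates found: {n_dupes}, rows kept: {len(deduped)}")
```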
DATA CLEANING & PREPARATION: ENCODING CATEGORICAL VARIABLES
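
The encoding scheme used in the original is not recoverable from the slides. The sketch below assumes integer codes for Sex and Embarked and also shows a one-hot alternative for Embarked; the specific code values are illustrative choices.

```python
import pandas as pd

# Hypothetical toy data.
df = pd.DataFrame({
    "Sex":      ["male", "female", "female", "male"],
    "Embarked": ["S", "C", "Q", "S"],
})

# Assumption: simple integer codes (the mapping values are arbitrary).
df["Sex_encoded"] = df["Sex"].map({"male": 0, "female": 1})
df["Embarked_encoded"] = df["Embarked"].map({"S": 0, "C": 1, "Q": 2})

# Alternative: one-hot columns, which avoid implying an order among ports.
one_hot = pd.get_dummies(df["Embarked"], prefix="Embarked")

print(df)
print(one_hot)
```

Integer codes impose an artificial ordering on the ports, which may be the kind of information loss from encoding Embarked that the conclusion refers to.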
4. EXPLORATORY DATA ANALYSIS (EDA)

Perform EDA to uncover insights about the dataset


4.1 SUMMARY STATISTICS
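
Summary statistics are typically produced with `DataFrame.describe()`. The sketch below uses hypothetical toy values, not figures from the dataset.

```python
import pandas as pd

# Hypothetical toy data.
df = pd.DataFrame({
    "Age":  [22, 38, 26, 35, 29],
    "Fare": [7.25, 71.28, 7.93, 53.10, 8.05],
})

# Count, mean, standard deviation, min, quartiles and max per numeric column.
summary = df.describe()
print(summary)
```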
EXAMPLE 1: DISTRIBUTION OF AGE:
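
A histogram of Age could be drawn as in this sketch. The ages are hypothetical, and the 10-year bin width and the use of matplotlib are assumptions, since the original plot is not preserved.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the script also runs headless
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical ages for illustration.
ages = pd.Series([2, 19, 22, 24, 26, 28, 30, 33, 35, 38, 45, 54, 61], name="Age")

fig, ax = plt.subplots(figsize=(6, 4))
# Assumption: 10-year bins from 0 to 80.
counts, bin_edges, _ = ax.hist(ages, bins=list(range(0, 81, 10)), edgecolor="black")
ax.set_xlabel("Age")
ax.set_ylabel("Number of passengers")
ax.set_title("Distribution of Age")

print(counts)  # passengers per age bin
```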

4.2 UNIVARIATE VISUALIZATIONS

EXAMPLE 2: COUNT OF PASSENGERS BY EMBARKED:
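
Passenger counts per embarkation port can be computed with `value_counts()` and shown as a bar chart. This is a sketch on hypothetical data; the port codes S, C and Q are assumed to stand for Southampton, Cherbourg and Queenstown.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the script also runs headless
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical port codes for illustration.
embarked = pd.Series(["S", "S", "C", "S", "Q", "S", "C", "S"], name="Embarked")

counts = embarked.value_counts()  # sorted from most to least frequent

fig, ax = plt.subplots(figsize=(5, 4))
ax.bar(counts.index, counts.values)
ax.set_xlabel("Embarked")
ax.set_ylabel("Number of passengers")
ax.set_title("Count of Passengers by Embarked")

print(counts)
```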


EXAMPLE: BOXPLOT OF AGE BY PCLASS:
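
A boxplot of Age grouped by Pclass might be built as follows. The data is hypothetical and matplotlib is an assumed plotting library; the original may have used a different one.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the script also runs headless
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy data: three passengers per class.
df = pd.DataFrame({
    "Pclass": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "Age":    [38, 45, 54, 29, 31, 36, 19, 22, 26],
})

# One array of ages per class, in ascending class order.
groups = [g["Age"].to_numpy() for _, g in df.groupby("Pclass")]

fig, ax = plt.subplots(figsize=(5, 4))
ax.boxplot(groups)
ax.set_xticklabels(["1st", "2nd", "3rd"])
ax.set_xlabel("Pclass")
ax.set_ylabel("Age")
ax.set_title("Age by Pclass")

medians = df.groupby("Pclass")["Age"].median()  # the centre line of each box
print(medians)
```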

4.3 BIVARIATE VISUALIZATIONS

EXAMPLE: SURVIVAL RATES BY PCLASS AND EMBARKED:
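
Survival rates by Pclass and Embarked are the mean of Survived within each group. The sketch below computes them with a pivot table on hypothetical data and draws a grouped bar chart; the plotting choices are assumptions.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the script also runs headless
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy data.
df = pd.DataFrame({
    "Pclass":   [1, 1, 1, 1, 3, 3, 3, 3],
    "Embarked": ["C", "C", "S", "S", "C", "C", "S", "S"],
    "Survived": [1, 1, 1, 0, 1, 0, 0, 0],
})

# Mean of the 0/1 Survived column equals the survival rate per group.
rates = df.pivot_table(index="Pclass", columns="Embarked",
                       values="Survived", aggfunc="mean")

fig, ax = plt.subplots(figsize=(6, 4))
rates.plot(kind="bar", ax=ax)
ax.set_ylabel("Survival rate")
ax.set_title("Survival Rates by Pclass and Embarked")

print(rates)
```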

4.4 MULTIVARIATE VISUALIZATION
5. STATISTICAL ANALYSIS

Explanation of the statistical techniques used and their results.
HYPOTHESIS TESTING (T-TEST)

INTERPRETATION: THE P-VALUE INDICATES WHETHER THERE IS A SIGNIFICANT DIFFERENCE IN SURVIVAL RATES BETWEEN PASSENGERS IN 1ST CLASS AND 3RD CLASS. IF THE P-VALUE IS LESS THAN 0.05, WE REJECT THE NULL HYPOTHESIS AND CONCLUDE THAT THERE IS A SIGNIFICANT DIFFERENCE IN SURVIVAL RATES.
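
A two-sample t-test of this kind can be sketched with `scipy.stats.ttest_ind`. The data is hypothetical, and the use of Welch's variant (`equal_var=False`) is an assumption, since the slides do not state which form of the test was run.

```python
import pandas as pd
from scipy import stats

# Hypothetical toy data: eight passengers in each of 1st and 3rd class.
df = pd.DataFrame({
    "Pclass":   [1] * 8 + [3] * 8,
    "Survived": [1, 1, 1, 0, 1, 1, 0, 1,
                 0, 0, 1, 0, 0, 0, 1, 0],
})

first = df.loc[df["Pclass"] == 1, "Survived"]
third = df.loc[df["Pclass"] == 3, "Survived"]

# Assumption: Welch's t-test, which does not require equal variances.
t_stat, p_value = stats.ttest_ind(first, third, equal_var=False)

alpha = 0.05
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```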
CHI-SQUARE TEST

INTERPRETATION: THE CHI-SQUARE TEST DETERMINES WHETHER THE EMBARKED LOCATION AND SURVIVED STATUS ARE SIGNIFICANTLY RELATED. IF THE P-VALUE IS LESS THAN 0.05, WE REJECT THE NULL HYPOTHESIS, INDICATING A RELATIONSHIP BETWEEN BOARDING LOCATION AND SURVIVAL.
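
A chi-square test of independence between Embarked and Survived can be sketched with a contingency table and `scipy.stats.chi2_contingency`. The data below is hypothetical and far smaller than the real dataset, so the resulting p-value is only illustrative.

```python
import pandas as pd
from scipy import stats

# Hypothetical toy data.
df = pd.DataFrame({
    "Embarked": ["C"] * 4 + ["Q"] * 3 + ["S"] * 5,
    "Survived": [1, 1, 1, 0,
                 0, 1, 0,
                 0, 0, 1, 0, 0],
})

# Rows: embarkation port; columns: survival status (0 or 1).
table = pd.crosstab(df["Embarked"], df["Survived"])

chi2, p_value, dof, expected = stats.chi2_contingency(table)

print(table)
print(f"chi2 = {chi2:.3f}, dof = {dof}, p = {p_value:.4f}")
```

With counts this small the expected frequencies fall below the usual rule of thumb of 5 per cell, so on real data it is worth checking `expected` before trusting the p-value.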
ANOVA

INTERPRETATION: THE ANOVA TEST EXAMINES WHETHER THERE ARE SIGNIFICANT DIFFERENCES IN AGE ACROSS THE DIFFERENT TICKET CLASSES (PCLASS). A P-VALUE LESS THAN 0.05 SUGGESTS THAT THE AVERAGE AGE DIFFERS BETWEEN AT LEAST TWO OF THE CLASSES.
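
A one-way ANOVA on Age across the three ticket classes can be sketched with `scipy.stats.f_oneway`. The ages are hypothetical and chosen only to illustrate the call.

```python
import pandas as pd
from scipy import stats

# Hypothetical toy data: four passengers per class.
df = pd.DataFrame({
    "Pclass": [1] * 4 + [2] * 4 + [3] * 4,
    "Age":    [38, 45, 54, 41,
               29, 31, 36, 27,
               19, 22, 26, 24],
})

# One array of ages per class, passed as separate samples to the test.
groups = [g["Age"].to_numpy() for _, g in df.groupby("Pclass")]
f_stat, p_value = stats.f_oneway(*groups)

print(f"F = {f_stat:.3f}, p = {p_value:.4f}")
```

A significant result only says that at least one class mean differs; a post-hoc comparison would be needed to say which classes differ.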
6. RESULTS AND INSIGHTS

Explanation of the statistical techniques used and their results.
KEY FINDINGS

SUMMARIZE INSIGHTS FROM EDA

The Titanic dataset's EDA reveals most passengers were 20–40 years old, averaging 30. First-class passengers had the highest survival rate and were generally older than third-class passengers. Most embarked in Southampton, with Cherbourg showing the best survival rate.

HIGHLIGHT RESULTS OF STATISTICAL TESTS

Statistical tests revealed significant findings:
T-test: Higher survival rates in 1st vs. 3rd class (p < 0.05).
Chi-square: Survival linked to embarkation, with Cherbourg showing higher rates (p < 0.05).
ANOVA: Older average age in 1st vs. 3rd class (p < 0.05).

DISTRIBUTION OF AGE

THE MAJORITY OF PASSENGERS WERE YOUNG ADULTS, WHICH MAY BE IMPORTANT WHEN ANALYZING FACTORS INFLUENCING SURVIVAL.

COUNT OF PASSENGERS BY EMBARKED LOCATION

THIS DISTRIBUTION HELPS CONTEXTUALIZE THE SIZE OF THE GROUP FROM EACH LOCATION, WHICH COULD BE RELEVANT FOR UNDERSTANDING REGIONAL FACTORS INFLUENCING SURVIVAL RATES.

BOXPLOT OF AGE BY PCLASS

THE SIGNIFICANT AGE VARIATION BETWEEN CLASSES SHOWS THAT 1ST CLASS PASSENGERS WERE TYPICALLY OLDER, POTENTIALLY WITH MORE FINANCIAL MEANS AND RESOURCES FOR SURVIVAL.

BARPLOT OF SURVIVAL RATES BY PCLASS AND EMBARKED

THE ANALYSIS HIGHLIGHTS THE STARK DIFFERENCES IN SURVIVAL RATES, WHICH COULD BE DUE TO THE ACCESSIBILITY OF LIFEBOATS, CLASS PRIVILEGE, AND LOCATION-SPECIFIC FACTORS DURING THE TITANIC’S DISASTER.

ACTIONABLE INSIGHTS

Discuss how the results could be applied in real-world scenarios (e.g., targeting
high-performing branches or improving underperforming ones).

01
These findings not only help to better understand the Titanic dataset but also provide a
framework for applying statistical analysis and visual insights to real-world business
scenarios. By targeting high-performing groups and addressing challenges in
underperforming segments, businesses can make more informed decisions that optimize
performance and improve overall outcomes.
7. CONCLUSION AND KEY TAKEAWAYS

Summary of insights and potential applications of the findings.

SUMMARIZE THE ANALYSIS PROCESS, KEY INSIGHTS, AND ANY CHALLENGES FACED.

The analysis involved data cleaning by imputing missing ages, dropping rows with missing Cabin and Embarked data, removing duplicates, and encoding categorical variables. EDA highlighted age distribution, survival rates by class and embarkation, and age differences by class. Statistical tests confirmed significant differences in survival rates (class and embarkation) and age. Most passengers were 20–40 years old; 1st class had the highest survival rates and older passengers. Cherbourg had the best survival rate. Challenges included handling missing Cabin data and potential information loss from encoding Embarked.

SUGGEST IMPROVEMENTS FOR FUTURE ANALYSIS OR ADDITIONAL QUESTIONS THAT COULD BE EXPLORED.

Future analysis could include feature engineering (e.g., family size, passenger titles), predictive models (e.g., logistic regression, decision trees), and advanced tests like Mann-Whitney U or Kaplan-Meier analysis. Exploring cabin data, such as deck information, may reveal additional survival patterns.

THANK YOU!

ALAMEDA, KHAIZA
MIBULOS, PRINCE RVIC
RUBIO, ANGELICA ELLAINE
