Random Forest Algorithm - Titanic Dataset

The document discusses using the Random Forest Classifier to predict Titanic passenger survival based on features like age, sex, fare, and class. It outlines the model's accuracy, precision, recall, and F1-score, emphasizing the importance of features such as age, sex, and fare in survival predictions. The goal is to understand how the model makes decisions while practicing classification problems with a real-world dataset.

Uploaded by

venkat Mohan
Copyright
© All Rights Reserved

RANDOM FOREST-TITANIC DATASET

This project uses the Titanic dataset to predict whether a passenger survived, based on features
like age, sex, fare, and class.
We used a machine learning algorithm, the Random Forest Classifier, to build the prediction
model.

🌲 Why Random Forest?


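•It’s an ensemble model: many decision trees are trained, and their votes are combined into one prediction.
•Usually gives high accuracy and works with mixed numeric and categorical features once text is encoded as numbers (here, rows with missing values are dropped first).
•Reports an importance score for each feature, showing which ones drive the predictions.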

📊 Why the Titanic dataset?


•It’s one of the most popular beginner datasets.
•Real-world dataset with meaningful features.
•Perfect for practicing classification problems.
🎯 Goal: Predict whether a passenger survived or not using machine learning, and
understand how the model makes decisions.
Import Libraries
Used to:
•Load and process data (pandas, seaborn)
•Split data into training/testing (train_test_split)
•Build model (RandomForestClassifier)
•Handle text → numbers (LabelEncoder)
•Evaluate performance (accuracy_score, classification_report)
•Plot graphs (matplotlib, seaborn)
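The slide's import code did not survive extraction. A minimal sketch of the imports matching the list above, assuming pandas, seaborn, matplotlib, and scikit-learn are installed:

```python
# Load and process data
import pandas as pd
import seaborn as sns

# Plot graphs
import matplotlib.pyplot as plt

# Split data, build the model, encode text, and evaluate performance
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import accuracy_score, classification_report
```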
Explanation:
•Load Titanic dataset using seaborn
•Remove unwanted columns

•dropna(inplace=True) removes rows with missing values
(inplace=True makes the change directly to the dataframe instead of returning a copy)
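The cleaning steps can be sketched as follows. This is an illustration, not the original slide code: the small dataframe is made-up stand-in data, and the choice of "deck" as the unwanted column is an assumption, since the slides do not say which columns were removed.

```python
import pandas as pd

# Toy stand-in for the Titanic data (values are made up for illustration).
# With seaborn installed and network access, the real data can be loaded with:
#   import seaborn as sns
#   df = sns.load_dataset("titanic")
df = pd.DataFrame({
    "survived": [0, 1, 1, 0, 1],
    "sex": ["male", "female", "female", "male", "female"],
    "age": [22.0, 38.0, None, 35.0, 27.0],
    "fare": [7.25, 71.28, 7.92, 8.05, 11.13],
    "deck": [None, "C", None, None, None],
})

# Remove an unwanted column (which columns to drop is an assumption here).
df.drop(["deck"], axis=1, inplace=True)

# Drop every row that still has a missing value; inplace=True modifies df itself.
df.dropna(inplace=True)

print(df.shape)
```

Dropping a mostly empty column before calling dropna keeps more rows, because dropna removes a row if any of its remaining columns is missing.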
•Loop through each column
•If column contains text (object) → convert to numbers with LabelEncoder
Example: "female" → 0, "male" → 1 (LabelEncoder numbers the labels in sorted order)
•Makes data suitable for machine learning models
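A minimal sketch of this encoding loop on a made-up dataframe. The column names and values are illustrative assumptions. The check below tests for non-numeric columns rather than the object dtype specifically, so that it also covers pandas versions that store text under a dedicated string dtype.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Made-up sample rows with two text columns and one numeric column.
df = pd.DataFrame({
    "sex": ["male", "female", "female", "male"],
    "embarked": ["S", "C", "S", "Q"],
    "age": [22.0, 38.0, 26.0, 35.0],
})

for col in df.columns:
    # Text columns are the non-numeric ones; numeric columns are left alone.
    if not pd.api.types.is_numeric_dtype(df[col]):
        # A fresh encoder per column; labels are numbered in sorted order.
        df[col] = LabelEncoder().fit_transform(df[col])

print(df)
```

Because the labels are sorted before numbering, "female" becomes 0 and "male" becomes 1, and "C", "Q", "S" become 0, 1, 2.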
•X = Input features (all columns except 'survived')
•y = Output/label (just the 'survived' column)
•axis=1 means drop a column (axis=0 would drop a row)
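A sketch of the feature/label split, followed by a train/test split and model fit. The data is made up, and test_size=0.2, n_estimators=100, and random_state=42 are assumed settings, not values taken from the slides.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Made-up, already-encoded rows (sex: 0 = female, 1 = male).
df = pd.DataFrame({
    "survived": [0, 1, 1, 0, 1, 0, 0, 1, 1, 0],
    "sex": [1, 0, 0, 1, 0, 1, 1, 0, 0, 1],
    "age": [22, 38, 26, 35, 27, 54, 40, 14, 4, 20],
    "fare": [7.25, 71.28, 7.92, 8.05, 11.13, 51.86, 8.46, 30.07, 16.70, 8.05],
})

X = df.drop("survived", axis=1)  # input features: every column except the label
y = df["survived"]               # output/label column

# Hold out 20% of the rows for testing (assumed split size).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
```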
1️⃣ Precision
Out of all the people the model predicted as "survived", how many actually survived?
•Model said 10 people survived.
•But only 6 actually survived.
•Precision = 6 / 10 = 0.6 or 60% correct.
High precision = The model doesn’t make many false alarms.
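The same numbers can be checked with scikit-learn's precision_score. The label lists below are constructed only to match the example (10 predicted survivors, 6 of them correct):

```python
from sklearn.metrics import precision_score

# The model predicted "survived" (1) for 10 passengers...
y_pred = [1] * 10
# ...but only the first 6 of them actually survived.
y_true = [1] * 6 + [0] * 4

precision = precision_score(y_true, y_pred)
print(precision)  # 0.6
```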

2️⃣ Recall
Out of all the people who actually survived, how many did the model correctly predict?
•8 people actually survived.
•Model correctly found 6 of them.
•Recall = 6 / 8 = 0.75 or 75% correct.
High recall = The model doesn’t miss survivors.
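The recall example can be checked the same way with scikit-learn's recall_score. The label lists are constructed only to match the example (8 actual survivors, 6 of them found):

```python
from sklearn.metrics import recall_score

# 8 passengers actually survived (1)...
y_true = [1] * 8
# ...and the model correctly flagged 6 of them, missing 2.
y_pred = [1] * 6 + [0] * 2

recall = recall_score(y_true, y_pred)
print(recall)  # 0.75
```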

3️⃣ F1-Score ⚖️
A balanced score between precision and recall.
It’s the harmonic mean of the two: F1 = 2 × precision × recall / (precision + recall). Unlike a plain average, it is pulled toward the lower of the two values.
•If precision is 0.6 and recall is 0.75 → F1 = 0.9 / 1.35 ≈ 0.67
•F1-score is high only if both precision and recall are good
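A short sketch of the F1 arithmetic in plain Python. The helper name f1_from is hypothetical, and the second pair of values is an added illustration of an unbalanced case:

```python
def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Values from the example above.
print(round(f1_from(0.6, 0.75), 2))  # 0.67

# An unbalanced case: the plain average would be 0.55,
# but F1 stays low because recall is poor.
print(round(f1_from(1.0, 0.1), 2))   # 0.18
```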
📊 Final Conclusion from Feature Importance Graph
•✅ Top 3 important features:
• Age
• Sex
• Fare
These had the highest influence on survival predictions.
• Sex and Age:
• The model relied heavily on both features, consistent with women and
younger passengers having higher survival chances.
•💰 Fare:
• Passengers who paid more (likely in higher classes) had a better
chance of survival.
•📉 Less important features:
• Embarked, Parch, and SibSp had lower importance, meaning
they played a smaller role in the model’s decisions.
•🧠 Overall:
• The Random Forest model made accurate predictions by focusing
more on key survival factors.
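A ranking like the one behind the graph can be read from a fitted model's feature_importances_ attribute. The sketch below uses made-up data, so its ranking is not expected to match the results reported above:

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Made-up, already-encoded rows (sex: 0 = female, 1 = male).
X = pd.DataFrame({
    "sex": [1, 0, 0, 1, 0, 1, 1, 0, 0, 1],
    "age": [22, 38, 26, 35, 27, 54, 40, 14, 4, 20],
    "fare": [7.25, 71.28, 7.92, 8.05, 11.13, 51.86, 8.46, 30.07, 16.70, 8.05],
    "parch": [0, 0, 0, 0, 2, 0, 0, 0, 1, 0],
})
y = [0, 1, 1, 0, 1, 0, 0, 1, 1, 0]

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X, y)

# One score per feature; the scores sum to 1, and higher means more influence.
importances = pd.Series(model.feature_importances_, index=X.columns)
importances = importances.sort_values(ascending=False)
print(importances)
```

With matplotlib installed, importances.plot(kind="barh") draws the scores as a horizontal bar chart. The scores show how much each feature was used, not whether it raised or lowered the predicted chance of survival.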
