AAM 6th Prac

The document outlines the implementation of the Random Forest Algorithm using Python and scikit-learn, detailing steps from data preprocessing to model evaluation. It includes data loading, feature encoding, model fitting, prediction, and performance metrics such as accuracy and classification report. Additionally, it highlights the importance of using RandomForestClassifier for real-world applications and discusses optional feature scaling and visualization techniques.

6. Implement the Random Forest Algorithm using the following steps:

a. Data Preprocessing Step

b. Fitting the Random Forest Algorithm to the Training Set

c. Predicting the Test Set Result

d. Creating the Confusion Matrix

e. Visualizing the Training Set Result

f. Visualizing the Test Set Result

import pandas as pd

from sklearn.model_selection import train_test_split

from sklearn.ensemble import RandomForestClassifier

from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

import warnings

warnings.filterwarnings('ignore')

# URL for the dataset

url = "https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv"

titanic_data = pd.read_csv(url)

# Drop rows with missing 'Survived' values

titanic_data = titanic_data.dropna(subset=['Survived'])

# Features and target variable (copy to avoid SettingWithCopyWarning on later assignments)

X = titanic_data[['Pclass', 'Sex', 'Age', 'SibSp', 'Parch', 'Fare']].copy()

y = titanic_data['Survived']

# Encode 'Sex' column

X['Sex'] = X['Sex'].map({'female': 0, 'male': 1})

# Fill missing 'Age' values with the median (assign back rather than chained inplace fillna)

X['Age'] = X['Age'].fillna(X['Age'].median())


# Split data into training and testing sets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize RandomForestClassifier

rf_classifier = RandomForestClassifier(n_estimators=100, random_state=42)

# Fit the classifier to the training data

rf_classifier.fit(X_train, y_train)

# Make predictions

y_pred = rf_classifier.predict(X_test)

# Calculate accuracy, confusion matrix, and classification report

accuracy = accuracy_score(y_test, y_pred)

conf_matrix = confusion_matrix(y_test, y_pred)

classification_rep = classification_report(y_test, y_pred)

# Print the results

print(f"Accuracy: {accuracy:.2f}")

print("\nConfusion Matrix:\n", conf_matrix)

print("\nClassification Report:\n", classification_rep)

# Sample prediction

sample = X_test.iloc[0:1] # Keep as DataFrame to match model input format

prediction = rf_classifier.predict(sample)

# Retrieve and display the sample

sample_dict = sample.iloc[0].to_dict()

print(f"\nSample Passenger: {sample_dict}")

print(f"Predicted Survival: {'Survived' if prediction[0] == 1 else 'Did Not Survive'}")

Notes:
- Scikit-learn's RandomForestClassifier: the code uses the highly optimized and well-tested RandomForestClassifier from scikit-learn, which is strongly recommended over a manual implementation for any real-world use.
- Clearer data handling: the code explicitly separates the features (X) from the target variable (y).
- Feature scaling (optional): Random Forest often does not require feature scaling, because tree-based splits are insensitive to the scale of individual features. For some datasets it may still improve performance, so if you are unsure, experiment both ways (e.g., with StandardScaler).
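If you do want to try scaling, a minimal sketch could look like the following. The toy feature matrix here is an illustrative stand-in for the Titanic columns, not data from the script above:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy feature matrix standing in for two numeric columns (e.g. Age and Fare)
X_demo = np.array([[22.0, 7.25],
                   [38.0, 71.28],
                   [26.0, 7.92],
                   [35.0, 53.10]])

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X_demo)

# After scaling, each column has mean ~0 and unit variance
print(X_scaled.mean(axis=0))  # close to [0. 0.]
print(X_scaled.std(axis=0))   # close to [1. 1.]
```

In a real pipeline the scaler would be fit on X_train only and then applied to X_test, to avoid leaking test-set statistics into training.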
- Adjustable parameters: RandomForestClassifier has parameters you can tune, such as n_estimators, criterion, and max_depth. n_estimators is the number of trees in the forest; criterion is the function used to measure the quality of a split ("gini" or "entropy"); random_state ensures you get the same results each time you run the code.
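One common way to tune these parameters is a small grid search with cross-validation. The sketch below uses a synthetic dataset and an illustrative parameter grid (both are assumptions, not values from the script above):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in dataset with six features, like the Titanic example
X_demo, y_demo = make_classification(n_samples=200, n_features=6, random_state=42)

# Small illustrative grid over the parameters mentioned above
param_grid = {
    "n_estimators": [50, 100],
    "criterion": ["gini", "entropy"],
    "max_depth": [3, None],
}

search = GridSearchCV(RandomForestClassifier(random_state=42), param_grid, cv=3)
search.fit(X_demo, y_demo)

# Best combination found by 3-fold cross-validation
print(search.best_params_)
```

search.best_estimator_ can then be used directly for prediction, already refit on the full training data.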
- Confusion matrix: the code calculates and prints the confusion matrix, which is essential for evaluating classification model performance.
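To make the layout concrete, here is a tiny sketch with hand-picked labels (the toy arrays are illustrative, not output from the script above):

```python
from sklearn.metrics import confusion_matrix

# Toy ground truth and predictions (0 = did not survive, 1 = survived)
y_true = [0, 1, 1, 0, 1, 0]
y_pred = [0, 1, 0, 0, 1, 1]

cm = confusion_matrix(y_true, y_pred)
# Rows are true classes, columns are predicted classes:
# cm[0][0] = true negatives,  cm[0][1] = false positives
# cm[1][0] = false negatives, cm[1][1] = true positives
print(cm)
# → [[2 1]
#    [1 2]]
```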
- Accuracy: the code calculates and prints the accuracy score.
- Visualization: a decision-boundary plot built with meshgrid and contourf can be visually appealing and informative, but it is only practical when the model uses two features. For higher-dimensional data, such as the six features used here, visualization becomes much more complex.
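A minimal sketch of such a plot, assuming a two-feature synthetic dataset in place of the Titanic data (the dataset, grid step, and output filename are all illustrative assumptions):

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen, no display needed
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Two synthetic features so the decision boundary is plottable
X2, y2 = make_classification(n_samples=200, n_features=2, n_informative=2,
                             n_redundant=0, random_state=42)
clf = RandomForestClassifier(n_estimators=100, random_state=42).fit(X2, y2)

# Evaluate the model over a dense grid covering the feature space
x_min, x_max = X2[:, 0].min() - 1, X2[:, 0].max() + 1
y_min, y_max = X2[:, 1].min() - 1, X2[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.05),
                     np.arange(y_min, y_max, 0.05))
Z = clf.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)

# Shade regions by predicted class, then overlay the training points
plt.contourf(xx, yy, Z, alpha=0.3, cmap="coolwarm")
plt.scatter(X2[:, 0], X2[:, 1], c=y2, cmap="coolwarm", edgecolor="k", s=20)
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")
plt.title("Random Forest decision boundary (2 features)")
plt.savefig("rf_decision_boundary.png")
```

The same pattern works for the training and test sets separately (steps e and f) by scattering X_train or X_test over the fitted boundary.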
- Comments and explanations: the code has detailed comments explaining each step.
