AAM 6th Prac
import pandas as pd
import warnings
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix
warnings.filterwarnings('ignore')
url = "https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv"
titanic_data = pd.read_csv(url)
titanic_data = titanic_data.dropna(subset=['Survived'])
# Separate features (X) and target (y); this numeric feature subset is one simple choice
features = ['Pclass', 'Age', 'Fare']
X = titanic_data[features].fillna(titanic_data[features].median())
y = titanic_data['Survived']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Initialize and train RandomForestClassifier
rf_classifier = RandomForestClassifier(n_estimators=100, random_state=42)
rf_classifier.fit(X_train, y_train)
# Make predictions and evaluate
y_pred = rf_classifier.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")
print(confusion_matrix(y_test, y_pred))
# Sample prediction on the first test row
sample = X_test.iloc[[0]]
prediction = rf_classifier.predict(sample)
sample_dict = sample.iloc[0].to_dict()
print(f"Sample {sample_dict} -> predicted Survived = {prediction[0]}")
Explanation
Scikit-learn's RandomForestClassifier: This code uses the highly optimized and
well-tested RandomForestClassifier from scikit-learn, which is strongly recommended over
a manual implementation for any real-world use.
Clearer Data Handling: The code explicitly separates features (X) and the target variable
(y).
Feature Scaling (Optional): I've included the code for feature scaling (using
StandardScaler) but commented it out. Random Forest often doesn't require feature
scaling, because tree-based splits depend on the ordering of feature values rather than
their magnitudes. For some datasets, however, it may improve performance; if you're unsure,
experiment both ways.
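As a minimal sketch of what that optional scaling step looks like (using a toy numeric matrix in place of the Titanic features; the values are illustrative only):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy numeric matrices standing in for X_train / X_test
X_train = np.array([[1.0, 200.0], [2.0, 300.0], [3.0, 400.0]])
X_test = np.array([[2.0, 250.0]])

# Fit the scaler on the training data only, then apply it to both splits
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

print(X_train_scaled.mean(axis=0))  # each column now has mean ~0
print(X_train_scaled.std(axis=0))   # and unit variance
```

Fitting on the training split alone (and only transforming the test split) avoids leaking test-set statistics into training.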
Adjustable Parameters: The RandomForestClassifier has parameters you can tune
(like n_estimators, criterion, max_depth, etc.). n_estimators is the number of trees in
the forest. criterion is the function to measure the quality of a split ("gini" or "entropy").
random_state ensures you get the same results each time you run the code.
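A sketch of setting those parameters, using a synthetic dataset in place of the Titanic features (the specific values here are illustrative, not recommendations):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data: 200 samples, 4 features, 2 classes
X, y = make_classification(n_samples=200, n_features=4, random_state=42)

# n_estimators: number of trees; criterion: split-quality measure
# ("gini" or "entropy"); max_depth: cap on tree depth;
# random_state: makes results reproducible across runs
clf = RandomForestClassifier(
    n_estimators=200,
    criterion="entropy",
    max_depth=5,
    random_state=42,
)
clf.fit(X, y)
print(len(clf.estimators_))  # the fitted forest holds one tree per n_estimators
```

In practice these values are usually chosen by cross-validated search rather than by hand.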
Confusion Matrix: The code now calculates and prints the confusion matrix, which is
essential for evaluating classification model performance.
Accuracy: The code calculates and prints the accuracy score.
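For instance, on a small set of hypothetical labels and predictions, both metrics can be computed like this:

```python
from sklearn.metrics import confusion_matrix, accuracy_score

# Hypothetical true labels and model predictions
y_true = [0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0]

# Rows = actual class, columns = predicted class
cm = confusion_matrix(y_true, y_pred)
acc = accuracy_score(y_true, y_pred)  # fraction of correct predictions
print(cm)
print(acc)
```

The diagonal of the matrix counts correct predictions; the off-diagonal entries show which class is being confused for which.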
Visualization: The visualization code is improved and now uses meshgrid and contourf
to create a more visually appealing and informative decision boundary plot, and it lets you
customize the plot colors. Note that this kind of visualization is only practical when you
have two features (as in the example); for higher-dimensional data, visualization becomes much
more complex.
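A minimal sketch of such a two-feature decision-boundary plot, using synthetic data and a headless (Agg) backend so it runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; the figure is saved, not shown
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Two features so the boundary can be drawn in the plane
X, y = make_classification(n_samples=150, n_features=2, n_redundant=0,
                           n_informative=2, random_state=42)
clf = RandomForestClassifier(n_estimators=50, random_state=42).fit(X, y)

# Build a grid over feature space and predict the class at every grid point
xx, yy = np.meshgrid(np.linspace(X[:, 0].min() - 1, X[:, 0].max() + 1, 200),
                     np.linspace(X[:, 1].min() - 1, X[:, 1].max() + 1, 200))
Z = clf.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)

# Filled contours show the decision regions; the scatter overlays the data
plt.contourf(xx, yy, Z, alpha=0.3, cmap="coolwarm")
plt.scatter(X[:, 0], X[:, 1], c=y, cmap="coolwarm", edgecolor="k")
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")
plt.savefig("decision_boundary.png")
```

The colormap passed to contourf and scatter is where the plot colors can be customized.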
Comments and Explanations: The code has more detailed comments explaining each
step.