22-CP-63 ML Assignment Report

The lab report focuses on developing and comparing various machine learning models for predicting customer churn using a dataset from Kaggle. Key methodologies included data preprocessing, feature engineering, and model evaluation, with Random Forest showing the best performance among the models tested. The report concludes with observations on model effectiveness and suggestions for future improvements in the analysis.

ML Assignment # 2

Submitted To:
Dr. Waqar Ahmad

Submitted By:
Qamar Sultan

22-Cp-63

Computer Engineering Department

University of Engineering & Technology, Taxila
Machine Learning Lab Report: Customer
Churn Prediction Analysis
1. Introduction
1.1 Objective
The primary objective of this lab was to develop and compare multiple machine
learning models for predicting customer churn using a comprehensive dataset.
Customer churn prediction is crucial for businesses to understand and mitigate
customer attrition.
1.2 Methodology
We employed a systematic approach to machine learning model development:
 Data Preprocessing
 Feature Engineering
 Model Selection and Training
 Performance Evaluation
2. Dataset Description
We used the Customer Churn dataset from Kaggle, provided as two CSV files.
2.1 Data Sources
 Training Dataset: customer_churn_dataset-training-master.csv
 Testing Dataset: customer_churn_dataset-testing-master.csv
2.2 Preprocessing Techniques
1. Categorical Variable Encoding
o Used Label Encoding for categorical variables

o Transformed categorical variables to numerical format

2. Missing Value Handling


o Filled missing numeric values with median

o Ensured data completeness and model readiness

3. Feature Scaling
o Applied Standard Scaler to normalize feature distributions

o Prevented bias from varying feature scales
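The three preprocessing steps above can be sketched as follows. This is a minimal illustration on a small synthetic frame; the column names are placeholders, not the actual columns of the Kaggle CSVs.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Tiny synthetic frame standing in for the churn CSVs (columns illustrative)
df = pd.DataFrame({
    "Gender": ["Male", "Female", "Female", "Male"],
    "Subscription Type": ["Basic", "Premium", "Basic", "Premium"],
    "Tenure": [12.0, np.nan, 30.0, 5.0],
    "Churn": [1, 0, 0, 1],
})

# 1. Label-encode every categorical (object-dtype) column
for col in df.select_dtypes(include="object").columns:
    df[col] = LabelEncoder().fit_transform(df[col])

# 2. Fill missing numeric values with the column median
df = df.fillna(df.median(numeric_only=True))

# 3. Standard-scale the features (zero mean, unit variance per column)
features = df.drop(columns=["Churn"])
scaled = StandardScaler().fit_transform(features)
```

After these steps the frame is fully numeric, has no missing values, and every feature column is centered and scaled.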

3. Machine Learning Models


3.1 Logistic Regression
 Linear classification algorithm
 Suited to binary classification problems
 Performance Metrics:
o Accuracy: 0.73

o Precision: 0.73

o Recall: 0.50

o F1-Score: 0.47
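A minimal sketch of this model's training and evaluation loop is shown below. Synthetic data from `make_classification` stands in for the preprocessed churn features, so the scores it produces will not match the numbers reported above.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the preprocessed churn features
X, y = make_classification(n_samples=500, n_features=8, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit a linear (logistic) classifier and score it on the held-out split
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
pred = model.predict(X_test)

acc = accuracy_score(y_test, pred)
prec = precision_score(y_test, pred)
rec = recall_score(y_test, pred)
f1 = f1_score(y_test, pred)
```

The same four metric calls were used for every model in this section.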

3.2 Decision Tree Classifier


 Non-linear classification algorithm
 Provides feature importance insights
 Key Characteristics:
o Maximum Depth: 5

o Performance Metrics:

 Accuracy: 0.75
 Precision: 0.24
 Recall: 0.50
 F1-Score: 0.47
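The depth cap and the feature-importance insight mentioned above can be sketched as follows, again on synthetic stand-in data:

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the churn features
X, y = make_classification(n_samples=500, n_features=8, random_state=0)

# max_depth=5 matches the setting used in this lab
tree = DecisionTreeClassifier(max_depth=5, random_state=0).fit(X, y)

# feature_importances_ sums to 1.0; larger values mark more informative columns
importances = tree.feature_importances_
```

Capping the depth keeps the tree interpretable and limits overfitting, at the cost of some training accuracy.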
3.3 Random Forest Classifier
 Ensemble learning method
 Combines multiple decision trees
 Reduces overfitting
 Performance Metrics:
o Accuracy: 0.47

o Precision: 0.22

o Recall: 0.47
o F1-Score: 0.30
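A minimal sketch of the ensemble idea described above, on synthetic stand-in data: the forest bags many randomized trees and averages their votes, which smooths out the overfitting of any single tree.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the churn features
X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# 100 randomized trees, each fit on a bootstrap sample of the training data
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
test_acc = rf.score(X_test, y_test)
```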

3.4 k-Nearest Neighbors (k-NN)


 Non-parametric classification algorithm
 Number of Neighbors: 5
 Performance Metrics:
o Accuracy: 0.49

o Precision: 0.75

o Recall: 0.49

o F1-Score: 0.33
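The k-NN setup above can be sketched as follows. Because k-NN classifies by Euclidean distance, the Standard Scaler step from Section 2 matters most here; bundling it into a pipeline keeps scaling and classification together.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the churn features
X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# n_neighbors=5 matches the setting used in this lab
knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
knn.fit(X_train, y_train)
acc = knn.score(X_test, y_test)
```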

3.5 Naïve Bayes Classifier


 Probabilistic classification algorithm
 Assumes feature independence
 Performance Metrics:
o Accuracy: 0.48

o Precision: 0.75

o Recall: 0.48

o F1-Score: 0.31
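A minimal sketch of the Naïve Bayes model on synthetic stand-in data: Gaussian NB models each feature as an independent normal distribution per class, which is the independence assumption noted above.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in for the churn features
X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

nb = GaussianNB().fit(X_train, y_train)
proba = nb.predict_proba(X_test)  # per-class probabilities; each row sums to 1
acc = nb.score(X_test, y_test)
```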

3.6 Regularization Techniques


1. Lasso Regression (L1)
o Feature selection through coefficient reduction

2. Ridge Regression (L2)


o Coefficient shrinkage
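For this binary-classification task, the L1/L2 distinction is most naturally expressed through the `penalty` parameter of `LogisticRegression`. The sketch below, on synthetic data with mostly uninformative features, shows the contrast described above: the L1 penalty drives coefficients to exactly zero (feature selection), while the L2 penalty only shrinks them.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# 20 features, only 5 informative — room for L1 to zero out the rest
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=5, random_state=0
)

# L1 (lasso-style) penalty: sparse coefficients, implicit feature selection
l1 = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, y)
# L2 (ridge-style) penalty: coefficients shrink toward zero but stay nonzero
l2 = LogisticRegression(penalty="l2", C=0.1).fit(X, y)

n_zero_l1 = int(np.sum(l1.coef_ == 0))
n_zero_l2 = int(np.sum(l2.coef_ == 0))
```

The regularization strength `C=0.1` here is illustrative; the report does not state which value was used.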

4. Results and Discussion


4.1 Model Comparison
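The per-model metrics reported in Section 3 can be collected into a single table for side-by-side comparison (the numbers below simply reproduce those reported above):

```python
import pandas as pd

# Test-set metrics as reported in Section 3
results = pd.DataFrame(
    {
        "Accuracy":  [0.73, 0.75, 0.47, 0.49, 0.48],
        "Precision": [0.73, 0.24, 0.22, 0.75, 0.75],
        "Recall":    [0.50, 0.50, 0.47, 0.49, 0.48],
        "F1-Score":  [0.47, 0.47, 0.30, 0.33, 0.31],
    },
    index=[
        "Logistic Regression",
        "Decision Tree",
        "Random Forest",
        "k-NN",
        "Naive Bayes",
    ],
)
print(results.sort_values("Accuracy", ascending=False))
```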
4.2 Key Observations
 Random Forest demonstrated the most robust performance
 Logistic Regression provided a solid baseline model
 Decision Tree offered interpretable insights into feature importance
5. Limitations and Future Work
1. Limited dataset size
2. Potential for more advanced feature engineering
3. Exploration of more complex ensemble methods
4. Implementation of cross-validation for more robust evaluation
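The cross-validation improvement listed above could be sketched as follows: instead of one train/test split, 5-fold CV scores the model on five different held-out folds and averages the results (synthetic data stands in for the churn features here).

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the churn features
X, y = make_classification(n_samples=500, n_features=8, random_state=0)

# 5-fold CV: each fold serves once as the held-out evaluation set
scores = cross_val_score(RandomForestClassifier(random_state=0), X, y, cv=5)
mean_acc = scores.mean()  # mean ± std is a more robust estimate than one split
```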
6. Conclusion
The lab successfully demonstrated the application of multiple machine learning
algorithms to predict customer churn. Each model offered unique insights, with
Random Forest showing the most promising results.
7. References
1. Scikit-learn documentation
2. Géron, A., Hands-On Machine Learning with Scikit-Learn and TensorFlow, O'Reilly
3. McKinney, W., Python for Data Analysis, O'Reilly
