FML Micro Project

Uploaded by devraj.patel247
Copyright © All Rights Reserved

MICRO PROJECT REPORT

(Fundamentals of Machine Learning - 4341603)

Submitted By
Devraj Patel (236400316116)
Jenish Patel (236400316131)
Maharshi Patel (236400316135)

In partial fulfilment of the curriculum of the


4th Semester in

DIPLOMA ENGINEERING

in

Information Technology Department

R. C. Technical Institute, Ahmedabad

Gujarat Technological University, Ahmedabad


DECEMBER – 2025

INDEX
Introduction
Project Explanation
Code Explanation
Output
Evaluation
Conclusion

Introduction

The BigMart Sales Prediction project aims to develop a robust machine learning
model capable of accurately predicting the sales of products in various BigMart
outlets across different locations.

The provided dataset consists of product- and outlet-specific features, such as item
weight, visibility, outlet size, establishment year, and the type of items sold.

In this project, we explore multiple machine learning algorithms, primarily
focusing on XGBoost and LightGBM, to predict the sales of items.

The dataset underwent extensive preprocessing, which involved handling missing
data, encoding categorical variables, and performing feature engineering to create
new meaningful features. Hyperparameter tuning was also carried out to optimize
the models' performance.

Our models achieved outstanding results. On the validation set, the final XGBoost
model reached an R² score of 0.9970, explaining 99.7% of the variance in sales,
and an RMSE of 76.68, demonstrating high accuracy.

The LightGBM model also showed competitive results, highlighting the strength
of both algorithms in solving regression problems. The project demonstrates how
machine learning techniques can be leveraged to solve real-world business
challenges, offering insights into data-driven decision-making for sales
optimization.

Project Introduction

Objective:
 Predict the sales of various products across different BigMart outlets using
historical data and machine learning techniques.

Context:
 BigMart is a retail chain with multiple stores in different cities, selling a
wide range of consumer products.

Dataset:
 Sales data from 2013: The dataset includes historical sales information
from various outlets.
 1,559 products across 10 outlets: contains data for these products and their
respective sales.
 Includes product attributes (e.g., weight, type, MRP) and outlet features
(e.g., size, location, type).

Problem Type:
 Supervised Regression Problem
 Target variable: Item_Outlet_Sales (the sales value for products in different
outlets).

Approach:
Data Preprocessing and Cleaning:
 Handle missing values and inconsistent data formats.
 Feature engineering to create new meaningful features (e.g., visibility ratio,
outlet years).
 Encoding categorical variables such as item type and outlet location.
 Scaling numerical features using StandardScaler.

Models Used:
 XGBoost: For its robustness and ability to handle large datasets efficiently.
 LightGBM: A faster alternative to XGBoost for regression tasks.
 Random Forest Regressor: Used as an ensemble method to improve prediction
accuracy.
Goal:
 Build a predictive model with strong generalization performance to accurately
forecast product sales.
Results:
 Achieved an R² score of 0.9970 and an RMSE of 76.68 on the validation set,
indicating high accuracy.

The core goals of this project are to:
1. Clean and preprocess the data.
2. Perform exploratory data analysis (EDA).
3. Develop and optimize machine learning models to predict sales.

Code Explanation
Imports:
 Essential libraries like pandas, numpy, matplotlib, seaborn for data handling
and visualization.
 scikit-learn for preprocessing, model evaluation, and splitting.
 xgboost for advanced regression modeling.

Load and Combine Data:
 Reads Train.csv and Test.csv.
 Combines them using pd.concat() for uniform preprocessing.
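
A minimal sketch of the load-and-combine step. Small hypothetical frames stand in for Train.csv and Test.csv so the example runs on its own; the identifiers and values are made up. Recording the training length is what later allows the combined frame to be split back by original lengths.

```python
import pandas as pd

# Hypothetical toy rows standing in for Train.csv and Test.csv.
train = pd.DataFrame({
    "Item_Identifier": ["FD001", "DR001", "NC001"],
    "Outlet_Identifier": ["OUT_A", "OUT_B", "OUT_A"],
    "Item_Weight": [9.0, None, 12.5],
    "Item_Outlet_Sales": [1200.0, 450.0, 800.0],
})
test = pd.DataFrame({
    "Item_Identifier": ["FD001", "DR001"],
    "Outlet_Identifier": ["OUT_B", "OUT_A"],
    "Item_Weight": [9.0, 6.0],
})

# Remember the training length so the combined frame can be split back later.
n_train = len(train)

# Stack both frames so every cleaning/encoding step sees the same categories.
# Test rows have no Item_Outlet_Sales, so pandas fills that column with NaN.
data = pd.concat([train, test], ignore_index=True)

print(data.shape)                              # (5, 4)
print(data["Item_Outlet_Sales"].isna().sum())  # 2
```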

Data Cleaning:
 Standardizes inconsistent values in Item_Fat_Content (e.g., ‘low fat’, ‘LF’
→ ‘Low Fat’).
 Fills missing Item_Weight using average weights per Item_Identifier.
 Fills missing Outlet_Size with the most frequent category (mode).
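
The three cleaning rules can be sketched in pandas as follows. The toy rows are hypothetical, and the extra 'reg' → 'Regular' mapping is an assumption about further spelling variants; `groupby(...).transform("mean")` is one way to implement the per-item averaging.

```python
import pandas as pd

# Hypothetical toy data with the kinds of inconsistencies described above.
data = pd.DataFrame({
    "Item_Identifier": ["FD001", "FD001", "DR001", "DR001", "NC001"],
    "Item_Fat_Content": ["low fat", "LF", "Regular", "reg", "Low Fat"],
    "Item_Weight": [10.0, None, 6.0, None, 12.0],
    "Outlet_Size": ["Medium", None, "Small", "Medium", None],
})

# 1. Collapse spelling variants of fat content into canonical labels.
#    ('reg' -> 'Regular' is an assumed extra variant.)
data["Item_Fat_Content"] = data["Item_Fat_Content"].replace(
    {"low fat": "Low Fat", "LF": "Low Fat", "reg": "Regular"}
)

# 2. Fill a missing weight with the mean weight of the same Item_Identifier.
#    transform("mean") returns a Series aligned with the original rows.
item_mean_weight = data.groupby("Item_Identifier")["Item_Weight"].transform("mean")
data["Item_Weight"] = data["Item_Weight"].fillna(item_mean_weight)

# 3. Fill a missing Outlet_Size with the most frequent category (the mode).
data["Outlet_Size"] = data["Outlet_Size"].fillna(data["Outlet_Size"].mode()[0])

print(data)
```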

Feature Engineering:
 Creates Item_Visibility_MeanRatio (placeholder for visibility adjustment).
 Calculates Outlet_Years (2025 - year of establishment).
 Extracts item category prefixes to create Item_Type_Combined.
 Sets Item_Fat_Content to Non-Edible for non-consumable items.
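
A sketch of the feature-engineering step under stated assumptions. The report describes Item_Visibility_MeanRatio only as a placeholder, so defining it as an item's visibility divided by its mean visibility across outlets is an assumption; the FD/DR/NC prefix mapping used for Item_Type_Combined is likewise assumed. The data is hypothetical.

```python
import pandas as pd

# Hypothetical toy rows; column names follow the BigMart dataset.
data = pd.DataFrame({
    "Item_Identifier": ["FD001", "FD001", "DR001", "NC001"],
    "Item_Fat_Content": ["Low Fat", "Low Fat", "Regular", "Low Fat"],
    "Item_Visibility": [0.02, 0.04, 0.05, 0.10],
    "Outlet_Establishment_Year": [1999, 2009, 1985, 2002],
})

# Assumed definition: visibility in one outlet divided by the item's mean
# visibility across all outlets stocking it (values > 1 = above average).
mean_vis = data.groupby("Item_Identifier")["Item_Visibility"].transform("mean")
data["Item_Visibility_MeanRatio"] = data["Item_Visibility"] / mean_vis

# Outlet age relative to 2025, as stated in the report.
data["Outlet_Years"] = 2025 - data["Outlet_Establishment_Year"]

# Assumed prefix mapping on the first two characters of Item_Identifier.
prefix_map = {"FD": "Food", "DR": "Drinks", "NC": "Non-Consumable"}
data["Item_Type_Combined"] = data["Item_Identifier"].str[:2].map(prefix_map)

# Non-consumables have no meaningful fat content, so relabel them.
non_consumable = data["Item_Type_Combined"] == "Non-Consumable"
data.loc[non_consumable, "Item_Fat_Content"] = "Non-Edible"

print(data[["Item_Visibility_MeanRatio", "Outlet_Years",
            "Item_Type_Combined", "Item_Fat_Content"]])
```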

Encoding:
 Applies pd.get_dummies() to Item_Type_Combined and Item_Type.
 Uses LabelEncoder on categorical columns:
 Item_Fat_Content, Outlet_Location_Type, Outlet_Size, Outlet_Type,
Outlet_Identifier.
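
A small sketch of the encoding step on hypothetical data. For brevity it label-encodes only three of the listed columns and one-hot encodes only Item_Type_Combined; the other columns would be handled the same way.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy rows after cleaning and feature engineering.
data = pd.DataFrame({
    "Item_Type_Combined": ["Food", "Drinks", "Non-Consumable", "Food"],
    "Item_Fat_Content": ["Low Fat", "Regular", "Non-Edible", "Low Fat"],
    "Outlet_Location_Type": ["Tier 1", "Tier 3", "Tier 2", "Tier 1"],
    "Outlet_Size": ["Medium", "Small", "High", "Medium"],
})

# Integer codes for categorical columns. LabelEncoder assigns codes in sorted
# label order; fit_transform refits the encoder for each column.
le = LabelEncoder()
for col in ["Item_Fat_Content", "Outlet_Location_Type", "Outlet_Size"]:
    data[col] = le.fit_transform(data[col])

# One binary indicator column per category of Item_Type_Combined.
data = pd.get_dummies(data, columns=["Item_Type_Combined"])
print(data.columns.tolist())
```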

Feature Pruning:
 Drops columns not useful for modeling: Item_Identifier,
Outlet_Establishment_Year.

Data Splitting:
 Splits combined data back into train_data and test_data based on original
lengths.
 Separates features (X) and target (y) from training data.
 Splits training set into 85% training and 15% validation using
train_test_split.
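
The pruning and splitting steps might look like this on a hypothetical combined frame of 20 training rows and 5 test rows. The `random_state` value is an assumption added for reproducibility; the report does not state one.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical combined frame: the first n_train rows came from Train.csv and
# the remaining n_test rows from Test.csv, whose target is missing (NaN).
n_train, n_test = 20, 5
n_total = n_train + n_test
rng = np.random.default_rng(0)
data = pd.DataFrame({
    "Item_Identifier": [f"FD{i:03d}" for i in range(n_total)],
    "Outlet_Establishment_Year": rng.integers(1985, 2010, n_total),
    "Item_MRP": rng.uniform(30, 270, n_total),
})
data["Outlet_Years"] = 2025 - data["Outlet_Establishment_Year"]
data["Item_Outlet_Sales"] = np.concatenate(
    [rng.uniform(100, 5000, n_train), np.full(n_test, np.nan)]
)

# Feature pruning: drop the identifier and the raw year (Outlet_Years replaces it).
data = data.drop(columns=["Item_Identifier", "Outlet_Establishment_Year"])

# Split the combined frame back by the original lengths.
train_data = data.iloc[:n_train]
test_data = data.iloc[n_train:].drop(columns=["Item_Outlet_Sales"])

# Separate features (X) from the target (y).
X = train_data.drop(columns=["Item_Outlet_Sales"])
y = train_data["Item_Outlet_Sales"]

# 85% training / 15% validation.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.15, random_state=42
)
print(len(X_train), len(X_val))  # 17 3
```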

Imputation and Scaling:
 Applies SimpleImputer (mean strategy) to fill any remaining missing
numerical data.
 Scales features using StandardScaler for better model performance.
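
One leak-free way to chain imputation and scaling, sketched on hypothetical arrays: both transformers are fitted on the training split only and then reused on the validation data. The report does not say whether the project fitted them this way. Scaling mainly matters for the linear baselines; tree ensembles such as XGBoost split on feature thresholds and are largely insensitive to it.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler

# Hypothetical numeric feature matrices with a few gaps.
X_train = np.array([[1.0, 200.0], [np.nan, 220.0], [3.0, np.nan], [4.0, 260.0]])
X_val = np.array([[2.0, np.nan], [np.nan, 240.0]])

imputer = SimpleImputer(strategy="mean")
scaler = StandardScaler()

# Fit both steps on the training split only, then reuse the fitted statistics
# on the validation (and test) data so no information leaks from them.
X_train_prep = scaler.fit_transform(imputer.fit_transform(X_train))
X_val_prep = scaler.transform(imputer.transform(X_val))

print(imputer.statistics_)  # training-column means used for filling
print(X_train_prep.mean(axis=0).round(6), X_train_prep.std(axis=0).round(6))
```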

Model Tuning with GridSearchCV:
 Defines a hyperparameter grid for XGBRegressor.
 Performs 5-fold cross-validation to find the best parameters using
GridSearchCV.
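
A sketch of the grid search under two assumptions: the parameter grid is hypothetical because the report does not list the values searched, and scikit-learn's GradientBoostingRegressor stands in for XGBRegressor so the example runs without xgboost installed. Since xgboost.XGBRegressor implements the scikit-learn estimator interface and accepts `n_estimators`, `learning_rate` and `max_depth`, it can be passed to GridSearchCV in the same way.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data; the project uses the preprocessed BigMart features.
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# Stand-in estimator (swap in xgboost.XGBRegressor if it is installed).
model = GradientBoostingRegressor(random_state=0)

# Hypothetical grid: 2 x 2 x 2 = 8 parameter combinations.
param_grid = {
    "n_estimators": [50, 100],
    "learning_rate": [0.05, 0.1],
    "max_depth": [2, 3],
}

# 5-fold cross-validation for every combination (40 fits). The score is the
# negative RMSE because scikit-learn always maximises the scoring function.
search = GridSearchCV(model, param_grid, cv=5,
                      scoring="neg_root_mean_squared_error")
search.fit(X, y)

print(search.best_params_)
print(-search.best_score_)  # mean cross-validated RMSE of the best combination
```

With the default `refit=True`, `search.best_estimator_` is already retrained on all of `X` with the winning parameters, ready for validation-set predictions.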

Model Training and Evaluation:
 Trains XGBoost with the best parameters found by the grid search.
 Makes predictions on the validation set.
 Evaluates using:
o RMSE (Root Mean Squared Error)
o R² Score (Coefficient of Determination)
 Displays feature importances using xgb.plot_importance().
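
The two validation metrics can be computed with scikit-learn and cross-checked by hand, as in this sketch on four hypothetical validation targets and predictions.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Hypothetical validation targets and model predictions.
y_val = np.array([1000.0, 2000.0, 3000.0, 4000.0])
y_pred = np.array([1100.0, 1900.0, 3050.0, 3950.0])

# RMSE: square root of the mean squared error, in sales units.
rmse = np.sqrt(mean_squared_error(y_val, y_pred))

# R²: 1 - (residual sum of squares / total sum of squares around the mean).
r2 = r2_score(y_val, y_pred)

# The same quantities computed by hand, as a cross-check.
rmse_manual = np.sqrt(np.mean((y_val - y_pred) ** 2))
r2_manual = 1 - np.sum((y_val - y_pred) ** 2) / np.sum((y_val - y_val.mean()) ** 2)

print(round(rmse, 2), round(r2, 4))  # 79.06 0.995
```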

Final Prediction and Submission:
 Applies the best model to the test dataset.
 Reads SampleSubmission.csv, inserts predictions, and writes to a new CSV
file.
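
A sketch of the submission step. It writes a hypothetical SampleSubmission.csv to a temporary directory so it can run stand-alone; the three column names are an assumption about that file's layout, and the hypothetical `test_predictions` list stands in for the tuned model's output on the test features.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample-submission file with an assumed column layout.
workdir = tempfile.mkdtemp()
sample_path = os.path.join(workdir, "SampleSubmission.csv")
pd.DataFrame({
    "Item_Identifier": ["FD001", "DR001", "NC001"],
    "Outlet_Identifier": ["OUT_A", "OUT_B", "OUT_A"],
    "Item_Outlet_Sales": [1000, 1000, 1000],
}).to_csv(sample_path, index=False)

# Hypothetical model output: one prediction per test row, in test-set order.
test_predictions = [1523.4, 611.9, 874.0]

submission = pd.read_csv(sample_path)
# Row order must match the order of the test set the model predicted on.
assert len(submission) == len(test_predictions)
submission["Item_Outlet_Sales"] = test_predictions

out_path = os.path.join(workdir, "BigMart_Final_Submission.csv")
submission.to_csv(out_path, index=False)
print(pd.read_csv(out_path))
```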

Output
The final output of this project is a CSV submission file named
BigMart_Final_Submission.csv, which contains the predicted sales
(Item_Outlet_Sales) for each product in the test dataset.
The file mirrors the structure of the sample submission provided.
 Each row represents a unique product-outlet combination from the test set.
 The Item_Outlet_Sales column holds the predicted sales value for that specific
combination.
Given the R² score of 0.9970 and RMSE of 76.68 on the validation set, the predicted
outputs are expected to be reliable and closely aligned with real sales behavior,
making them valuable for actionable insights in a real-world retail setting.

Evaluation
To assess the effectiveness of the regression models, two primary performance
metrics were used:
 Root Mean Squared Error (RMSE):
Measures the typical magnitude of prediction errors in the same units as the
target (sales), penalizing large errors more heavily than small ones. A lower
RMSE indicates better predictive accuracy.
 R² Score (Coefficient of Determination):
Reflects how well the model explains the variance in the target variable. A
value closer to 1 signifies strong predictive power.
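
For reference, the standard definitions of the two metrics, where y_i is the actual sales value, ŷ_i the prediction, ȳ the mean of the actual values and n the number of validation rows. Because R² = 1 − MSE/σ_y², the two reported numbers together imply the spread of the validation targets, which puts an RMSE of 76.68 in context:

```latex
\begin{aligned}
\mathrm{RMSE} &= \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2} \\
R^2 &= 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}
     = 1 - \frac{\mathrm{RMSE}^2}{\sigma_y^2},
     \qquad \sigma_y^2 = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2 \\
\sigma_y &= \frac{\mathrm{RMSE}}{\sqrt{1 - R^2}} \approx \frac{76.68}{\sqrt{0.0030}} \approx 1400
\end{aligned}
```

A typical error of about 77 against a target standard deviation of roughly 1,400 is what places the R² so close to 1.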

Evaluation Results
Final Model – Optimized XGBoost (on Scaled Data):
 RMSE: 76.68
 R²: 0.9970

Insights
 The optimized XGBoost model achieved exceptional performance, with an
R² of 0.9970, indicating it explains 99.7% of the variance in the target
variable on the validation set.
 The low RMSE of 76.68 reflects small prediction errors on held-out data,
suggesting that the model generalizes well.
 This performance is a significant improvement over baseline models (e.g.,
Linear, Ridge, Random Forest), clearly demonstrating the value of:
o Proper feature engineering
o Data scaling and imputation
o Hyperparameter tuning with GridSearchCV
 This model is suitable for real-world deployment, and its predictions can be
used directly by BigMart’s analytics team for:
o Demand forecasting
o Inventory planning
o Promotion targeting
o Strategic business decisions

Conclusion
This project successfully demonstrated how machine learning techniques can be
leveraged to predict retail product sales with high accuracy. Using a robust
pipeline involving data preprocessing, feature engineering, and model
optimization, we built and fine-tuned several regression models to predict
Item_Outlet_Sales across various BigMart outlets.

Among all the models tested, the XGBoost Regressor delivered outstanding
performance with an RMSE of 76.68 and an R² score of 0.9970, indicating that
the model could explain nearly all the variance in the sales data.

This level of accuracy reflects the effectiveness of our data handling strategies
and the power of ensemble learning in capturing complex patterns within retail
data.

The results suggest that machine learning, when properly applied, can
significantly enhance business forecasting and decision-making. This model can
help BigMart optimize their inventory management, tailor marketing strategies,
and meet customer demand more effectively.

In summary, this project not only met but exceeded its performance goal of
achieving a positive R² score (that is, doing better than always predicting the
mean sales), establishing a solid foundation for further enhancements and
real-world deployment.

Thank You
