FML Micro Project
Submitted By
Devraj Patel (236400316116)
Jenish Patel (236400316131)
Maharshi Patel (236400316135)
DIPLOMA ENGINEERING
in
Abstract
The provided dataset consists of product- and outlet-specific features, such as item
weight, visibility, outlet size, establishment year, and the type of items sold.
Our models achieved strong results: the final XGBoost model explained 99.7% of
the variance in sales (an R² score of 0.9970) with an RMSE of 76.68,
demonstrating high accuracy.
The LightGBM model also showed competitive results, highlighting the strength
of both algorithms in solving regression problems. The project demonstrates how
machine learning techniques can be leveraged to solve real-world business
challenges, offering insights into data-driven decision-making for sales
optimization.
Project Introduction
Objective:
Predict the sales of various products across different BigMart outlets using
historical data and machine learning techniques.
Context:
BigMart is a retail chain with multiple stores in different cities, selling a
wide range of consumer products.
Dataset:
Sales data from 2013: The dataset includes historical sales information
from various outlets.
1,559 products across 10 outlets: Contains data for products and their respective sales.
Includes product attributes (e.g., weight, type, MRP) and outlet features
(e.g., size, location, type).
Problem Type:
Supervised Regression Problem
Target variable: Item_Outlet_Sales (the sales value for products in different
outlets).
Approach:
Data Preprocessing and Cleaning:
Handle missing values and inconsistent data formats.
Feature engineering to create new meaningful features (e.g., visibility ratio,
outlet years).
Encoding categorical variables such as item type and outlet location.
Scaling numerical features using StandardScaler.
Models Used:
XGBoost: For its robustness and ability to handle large datasets efficiently.
LightGBM: A faster alternative to XGBoost for regression tasks.
Random Forest Regressor: Used as an ensemble method to improve prediction accuracy.
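Since the report does not reproduce the model-setup code, the following is only a minimal sketch of how the three regressors could be instantiated and compared. The hyperparameter values shown are illustrative, and X_train, y_train, X_val, y_val are assumed names from the data-splitting step described later.

```python
from xgboost import XGBRegressor
from lightgbm import LGBMRegressor
from sklearn.ensemble import RandomForestRegressor

# Illustrative settings only; the project's final hyperparameters
# were obtained later via GridSearchCV.
models = {
    "XGBoost": XGBRegressor(n_estimators=500, learning_rate=0.05, random_state=42),
    "LightGBM": LGBMRegressor(n_estimators=500, learning_rate=0.05, random_state=42),
    "Random Forest": RandomForestRegressor(n_estimators=300, random_state=42),
}

for name, model in models.items():
    model.fit(X_train, y_train)             # assumed training split (see Data Splitting)
    print(name, model.score(X_val, y_val))  # R² on the assumed validation split
```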
Goal:
Build a predictive model with strong generalization performance to accurately forecast product sales.
Results:
Achieved an R² score of 0.9970 and an RMSE of 76.68, indicating high accuracy.
Code Explanation
Imports:
Essential libraries like pandas, numpy, matplotlib, seaborn for data handling
and visualization.
scikit-learn for preprocessing, model evaluation, and splitting.
xgboost for advanced regression modeling.
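Since the full code listing is not reproduced in this report, a representative import block might look like the following; the exact set of imports in the project may differ slightly.

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.impute import SimpleImputer
from sklearn.metrics import mean_squared_error, r2_score

from xgboost import XGBRegressor
```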
Data Cleaning:
Standardizes inconsistent values in Item_Fat_Content (e.g., ‘low fat’, ‘LF’
→ ‘Low Fat’).
Fills missing Item_Weight using average weights per Item_Identifier.
Fills missing Outlet_Size with the most frequent category (mode).
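A minimal sketch of these cleaning steps, assuming the train and test frames have been concatenated into a single DataFrame called data; the 'reg' → 'Regular' replacement is an extra assumption beyond the mappings listed above.

```python
# Harmonise inconsistent fat-content labels.
data["Item_Fat_Content"] = data["Item_Fat_Content"].replace(
    {"low fat": "Low Fat", "LF": "Low Fat", "reg": "Regular"}
)

# Fill missing weights with the mean weight of the same Item_Identifier.
data["Item_Weight"] = data.groupby("Item_Identifier")["Item_Weight"].transform(
    lambda s: s.fillna(s.mean())
)

# Fill missing outlet sizes with the most frequent category (mode).
data["Outlet_Size"] = data["Outlet_Size"].fillna(data["Outlet_Size"].mode()[0])
```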
Feature Engineering:
Creates Item_Visibility_MeanRatio (placeholder for visibility adjustment).
Calculates Outlet_Years (2025 - year of establishment).
Extracts item category prefixes to create Item_Type_Combined.
Updates fat content for non-consumable items as Non-Edible.
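One plausible implementation of these derived features is sketched below. The ratio-based definition of Item_Visibility_MeanRatio and the FD/DR/NC prefix mapping are assumptions about how the placeholder was intended to work.

```python
# Visibility of each item relative to the mean visibility of that item.
data["Item_Visibility_MeanRatio"] = data["Item_Visibility"] / data.groupby(
    "Item_Identifier"
)["Item_Visibility"].transform("mean")

# Years the outlet has been operating, counted from 2025.
data["Outlet_Years"] = 2025 - data["Outlet_Establishment_Year"]

# The first two characters of the identifier encode the broad category.
data["Item_Type_Combined"] = data["Item_Identifier"].str[:2].map(
    {"FD": "Food", "DR": "Drinks", "NC": "Non-Consumable"}
)

# Non-consumable items have no meaningful fat content.
data.loc[
    data["Item_Type_Combined"] == "Non-Consumable", "Item_Fat_Content"
] = "Non-Edible"
```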
Encoding:
Applies pd.get_dummies() to Item_Type_Combined and Item_Type.
Uses LabelEncoder on categorical columns:
Item_Fat_Content, Outlet_Location_Type, Outlet_Size, Outlet_Type,
Outlet_Identifier.
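A sketch of this encoding step, fitting one LabelEncoder per listed column (the project may have kept the encoders around for inverse transforms):

```python
from sklearn.preprocessing import LabelEncoder
import pandas as pd

# Label-encode the listed categorical columns.
label_cols = [
    "Item_Fat_Content", "Outlet_Location_Type", "Outlet_Size",
    "Outlet_Type", "Outlet_Identifier",
]
for col in label_cols:
    data[col] = LabelEncoder().fit_transform(data[col])

# One-hot encode the broader item-category columns.
data = pd.get_dummies(data, columns=["Item_Type_Combined", "Item_Type"])
```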
Feature Pruning:
Drops columns not useful for modeling: Item_Identifier,
Outlet_Establishment_Year.
Data Splitting:
Splits combined data back into train_data and test_data based on original
lengths.
Separates features (X) and target (y) from training data.
Splits training set into 85% training and 15% validation using
train_test_split.
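The pruning and splitting steps above might look roughly like this; train_raw stands for the original (pre-concatenation) training frame and is an assumed name.

```python
from sklearn.model_selection import train_test_split

# Drop columns that are no longer useful for modelling.
data = data.drop(columns=["Item_Identifier", "Outlet_Establishment_Year"])

# Split the combined frame back into train and test by the original lengths.
train_data = data.iloc[:len(train_raw)].copy()
test_data = data.iloc[len(train_raw):].drop(columns=["Item_Outlet_Sales"]).copy()

# Separate features and target, then hold out 15% for validation.
X = train_data.drop(columns=["Item_Outlet_Sales"])
y = train_data["Item_Outlet_Sales"]
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.15, random_state=42
)
```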
Imputation and Scaling:
Applies SimpleImputer (mean strategy) to fill any remaining missing
numerical data.
Scales features using StandardScaler for better model performance.
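A sketch of this step, fitting the imputer and scaler on the training split only and reusing them on the validation (and later test) features:

```python
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler

imputer = SimpleImputer(strategy="mean")   # fill remaining numeric gaps with column means
scaler = StandardScaler()                  # standardise to zero mean, unit variance

X_train_prepared = scaler.fit_transform(imputer.fit_transform(X_train))
X_val_prepared = scaler.transform(imputer.transform(X_val))
```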
Output
The final output of this project is a CSV submission file named
BigMart_Final_Submission.csv, which contains the predicted sales
(Item_Outlet_Sales) for each product in the test dataset.
The file mirrors the structure of the sample submission provided.
Each row represents a unique product-outlet combination from the test set.
The Item_Outlet_Sales column holds the predicted sales value for that specific
combination.
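A minimal sketch of producing this file; final_model and id_cols (the Item_Identifier and Outlet_Identifier columns kept aside before encoding and pruning) are assumed names.

```python
# Apply the same imputer/scaler fitted on the training data, then predict.
test_prepared = scaler.transform(imputer.transform(test_data))
predictions = final_model.predict(test_prepared)

# id_cols is assumed to hold the original identifier columns from the test set.
submission = id_cols.copy()
submission["Item_Outlet_Sales"] = predictions
submission.to_csv("BigMart_Final_Submission.csv", index=False)
```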
Given the high R² score of 0.9970 and low RMSE of 76.68, the predicted outputs
are highly reliable and closely aligned with real sales behavior, making them
valuable for actionable insights in a real-world retail setting.
Evaluation
To assess the effectiveness of the regression models, two primary performance
metrics were used:
Root Mean Squared Error (RMSE):
Measures the average magnitude of prediction errors. A lower RMSE
indicates better predictive accuracy.
R² Score (Coefficient of Determination):
Reflects how well the model explains the variance in the target variable. A
value closer to 1 signifies strong predictive power.
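With scikit-learn, both metrics can be computed on the validation split as follows; final_model, X_val_prepared, and y_val are the assumed names from the earlier sketches.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

val_predictions = final_model.predict(X_val_prepared)

rmse = np.sqrt(mean_squared_error(y_val, val_predictions))  # root mean squared error
r2 = r2_score(y_val, val_predictions)                       # coefficient of determination
print(f"RMSE: {rmse:.2f}, R2: {r2:.4f}")
```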
Evaluation Results
Final Model – Optimized XGBoost (on Scaled Data):
RMSE: 76.68
R²: 0.9970
Insights
The optimized XGBoost model achieved exceptional performance, with an
R² of 0.9970, indicating it can explain 99.7% of the variance in the target
variable.
The very low RMSE of 76.68 reflects minimal prediction error, showing
that the model generalizes well.
This performance is a significant improvement over baseline models (e.g.,
Linear, Ridge, Random Forest), clearly demonstrating the value of:
Proper feature engineering
Data scaling and imputation
Hyperparameter tuning with GridSearchCV (a sketch of this step appears at the end of this section)
This model is suitable for real-world deployment to aid BigMart in accurate
demand forecasting and strategic decision-making.
These predictions can be directly used by BigMart’s analytics team for:
Demand forecasting
Inventory planning
Promotion targeting
Strategic business decisions
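As referenced in the insights above, the hyperparameter tuning step could be sketched as follows; the parameter grid shown here is illustrative, not the project's actual search space.

```python
from sklearn.model_selection import GridSearchCV
from xgboost import XGBRegressor

# Illustrative grid only; the report does not list the exact values searched.
param_grid = {
    "n_estimators": [300, 500, 800],
    "max_depth": [4, 6, 8],
    "learning_rate": [0.03, 0.05, 0.1],
}

search = GridSearchCV(
    XGBRegressor(random_state=42),
    param_grid,
    scoring="neg_root_mean_squared_error",
    cv=5,
    n_jobs=-1,
)
search.fit(X_train_prepared, y_train)
print(search.best_params_, -search.best_score_)
```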
Conclusion
This project successfully demonstrated how machine learning techniques can be
leveraged to predict retail product sales with high accuracy. Using a robust
pipeline involving data preprocessing, feature engineering, and model
optimization, we built and fine-tuned several regression models to predict
Item_Outlet_Sales across various BigMart outlets.
Among all the models tested, the XGBoost Regressor delivered outstanding
performance with an RMSE of 76.68 and an R² score of 0.9970, indicating that
the model could explain nearly all the variance in the sales data.
This level of accuracy reflects the effectiveness of our data handling strategies
and the power of ensemble learning in capturing complex patterns within retail
data.
The results suggest that machine learning, when properly applied, can
significantly enhance business forecasting and decision-making. This model can
help BigMart optimize its inventory management, tailor marketing strategies,
and meet customer demand more effectively.
In summary, this project not only met but exceeded the performance goal of
achieving a positive or near-zero R² score, establishing a solid foundation for
further enhancements and real-world deployment.
Thank You