ML Final Report
A PROJECT REPORT
Submitted by
MUTHUMANI J D [RA22110030112002]
LOKESHWARAN R [RA22110030112018]
GOKUL S [RA2211003011996]
VIKKESH P [RA2211003012005]
Dr. Janani M
Assistant Professor
Department of Computing Technologies
In partial fulfillment of the requirements for the degree of
BACHELOR OF TECHNOLOGY
in
COMPUTER SCIENCE ENGINEERING
SCHOOL OF COMPUTING
COLLEGE OF ENGINEERING AND TECHNOLOGY
SRM INSTITUTE OF SCIENCE AND TECHNOLOGY
KATTANKULATHUR- 603 203
NOVEMBER 2024
SRM INSTITUTE OF SCIENCE AND TECHNOLOGY
knowledge the work reported herein does not form part of any other project
report or dissertation.
SIGNATURE
Janani M
Course Faculty
Assistant Professor
Department of Computing Technologies
SRM Institute of Science and Technology
Kattankulathur
SIGNATURE
Dr. Niranjana. G
Head of the Department
Professor
Department of Computing Technologies
SRM Institute of Science and Technology
Kattankulathur
ABSTRACT
The prediction of calories burnt has gained significant interest, particularly in
fitness, health monitoring, and medical fields. Accurately estimating calories
burnt helps individuals manage weight, optimize exercise routines, and maintain
overall health. Caloric expenditure prediction involves analyzing several
variables such as heart rate, activity type, duration, intensity, age, weight, and
gender. Advanced machine learning models, including regression techniques,
decision trees, and neural networks, offer promising accuracy in predicting
calories burnt by capturing complex relationships between these factors.
Wearable devices and fitness apps frequently incorporate these models,
providing real-time feedback and personalized insights to users. Additionally,
with the increasing availability of wearable sensor data, deep learning models
have emerged as powerful tools to further enhance prediction accuracy. Despite
advancements, challenges remain, such as handling personalized predictions,
accounting for metabolic variations, and ensuring data privacy. This study
focuses on developing an optimized calorie prediction model that leverages
physiological and contextual data to provide reliable calorie burn estimates,
catering to a wide range of user demographics and physical activity levels.
TABLE OF CONTENTS
ABSTRACT
LIST OF FIGURES
LIST OF TABLES
ABBREVIATIONS
1 INTRODUCTION
1.1 Accurate Caloric Expenditure Estimation
2 LITERATURE SURVEY
LIST OF FIGURES
Fig.No. Figure Name
3.1 Sample Dataset
3.2 Data Information
3.3 Age Vs Count
3.4 Height Vs Count
3.5 Weight Vs Count
3.6 Duration Vs Count
3.7 Heart_Rate Vs Count
3.8 Model Cross Validation Score
3.9 Correlation Matrix
3.10 Linear Regression
3.11 XGB regressor
5.1 Deployment 1
5.2 Deployment 2
5.3 Deployment 3
LIST OF TABLES
2.2 Comparative Study of Calories Burnt Prediction
4.1 Model Comparison
ABBREVIATIONS
SVR - Support Vector Regression
LR - Linear Regression
RR - Ridge Regression
RF - Random Forest
ML - Machine Learning
CV - Cross-Validation
DB - Database
CHAPTER 1
INTRODUCTION
Caloric expenditure prediction is a crucial component in the domains of
fitness, health monitoring, and lifestyle management, as it aids individuals and
healthcare providers in assessing energy output related to various activities.
This capability enables targeted improvements in exercise routines, weight
management plans, and even treatment protocols for metabolic conditions.
Estimating calories burnt requires a comprehensive understanding of
physiological and contextual factors, including an individual's basal metabolic
rate (BMR), activity intensity, heart rate, age, weight, gender, and even
environmental conditions. With the evolution of wearable technologies and the
proliferation of fitness tracking devices, there is a growing wealth of sensor
data available for analysis. This data, when combined with machine learning
algorithms, allows for sophisticated calorie burn prediction models that can
provide near real-time estimates of energy expenditure with significant
accuracy. Learning-based approaches, particularly with access to extensive datasets,
are increasingly applied to enhance prediction precision by automatically
recognizing patterns and dependencies in the data.
1.1 Accurate Caloric Expenditure Estimation
Accurate caloric expenditure estimation is essential for individuals aiming to
achieve and maintain a healthy lifestyle. Caloric expenditure, or the number of
calories burned, reflects the energy cost of physical activities, ranging from
simple daily movements to structured exercise routines. Understanding this
expenditure is vital for various purposes, such as weight management, athletic
performance optimization, and general health monitoring. For individuals
engaged in fitness and weight loss programs, knowing the calories burned
allows them to balance calorie intake with expenditure, leading to effective
weight management. Furthermore, accurate calorie tracking is essential for
athletes who want to optimize their energy balance to enhance performance
and recovery. Several physiological and contextual factors influence caloric
expenditure, making precise estimation a challenging task. Key determinants
include age, weight, height, gender, heart rate, and body composition, all of
which contribute to an individual's basal metabolic rate (BMR) — the energy
expended while at rest. The intensity, type, and duration of physical activity
play a significant role in determining calorie burn. Machine learning models
are particularly advantageous in refining calorie predictions by identifying
patterns across large datasets. Techniques like regression analysis, decision
trees, and neural networks have been implemented to map relationships
between inputs like heart rate, age, weight, and activity type to accurately
predict calorie burn. As wearable devices continue to evolve and machine
learning models become increasingly sophisticated, the future holds promise
for even more accurate and individualized calorie expenditure predictions,
ultimately empowering users to make informed health and lifestyle decisions.
1.2 Machine Learning in Calorie Prediction
Machine learning plays a crucial role in advancing calorie prediction accuracy
by processing complex, multifaceted data related to caloric expenditure.
Traditional methods, while useful, often fail to account for individual variability
in factors like age, weight, heart rate, activity intensity, and metabolic rate.
Machine learning models, such as regression techniques, decision trees, and
neural networks, can analyze large datasets and recognize patterns across
multiple variables simultaneously, enabling personalized and real-time calorie
predictions. By leveraging continuous data from wearable devices, machine
learning models can adapt to users' unique physiological and lifestyle factors,
providing tailored insights that are far more accurate than conventional
estimation methods. These capabilities make machine learning a powerful tool
in fitness and health management, offering users a deeper understanding of their
energy expenditure and helping them achieve their health goals effectively.
Unlike fixed-formula estimates, these data-driven approaches create
individualized predictions by integrating factors like age, weight, height,
activity type, heart rate, and other biometric data. This personalized and
adaptive approach to calorie prediction not only enhances accuracy but also
aligns closely with individual goals, such as weight management, athletic
performance, or general health improvement. As machine learning algorithms
continue to improve, they hold the potential to refine calorie prediction further,
making it more precise, context-aware, and ultimately beneficial for a wide range
of users.
1.3 Software Requirements Specification
1. Purpose and Scope: The SRS begins with a clear description of the project’s
purpose and scope, establishing the primary goals and defining the boundaries of the
system. The purpose section specifies why the software is being developed,
identifying the main objectives and the intended benefits.
CHAPTER 2
LITERATURE SURVEY
2.1 Models for Caloric Expenditure Estimation
The estimation of caloric expenditure, or the number of
calories burned during various activities, is critical for individuals seeking to
manage weight, optimize fitness routines, and monitor overall health.
Numerous models have been developed to predict caloric burn, each
employing different methodologies and relying on various input factors. This
section delves into several prominent models for caloric expenditure
estimation, including traditional predictive equations, heart rate monitoring
techniques, and advanced machine learning algorithms.
2. Physiological Models
2.2 Comparative study of Calories Burnt Prediction
Method/Model | Description | Strengths | Weaknesses
Predictive Equations | Traditional equations that estimate BMR and TDEE based on demographic factors. | Simple to use, requires minimal data, well-established. | Limited accuracy for individuals, not tailored to activity type.
Neural Networks | Deep learning models that learn complex patterns in large datasets to predict calories burned. | Highly accurate, capable of handling vast amounts of data and capturing nonlinear relationships. | Requires significant computational resources, data-intensive, can be a "black box" for interpretation.
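The predictive-equation approach in the comparison above can be illustrated with a short sketch. The report does not state which equations were considered, so the Mifflin-St Jeor equation is used here purely as one widely cited example, and the activity factor of 1.55 is an illustrative assumption rather than a value from this study.

```python
def bmr_mifflin_st_jeor(weight_kg: float, height_cm: float,
                        age_years: float, sex: str) -> float:
    """Basal metabolic rate in kcal/day using the Mifflin-St Jeor equation."""
    base = 10.0 * weight_kg + 6.25 * height_cm - 5.0 * age_years
    if sex.lower() == "male":
        return base + 5.0
    if sex.lower() == "female":
        return base - 161.0
    raise ValueError("sex must be 'male' or 'female'")


def tdee(bmr: float, activity_factor: float) -> float:
    """Total daily energy expenditure: BMR scaled by an activity multiplier."""
    return bmr * activity_factor


# Hypothetical person, used only to show the arithmetic.
example_bmr = bmr_mifflin_st_jeor(weight_kg=70, height_cm=175, age_years=30, sex="male")
example_tdee = tdee(example_bmr, activity_factor=1.55)  # 1.55 is an assumed multiplier
print(f"BMR  = {example_bmr:.2f} kcal/day")
print(f"TDEE = {example_tdee:.2f} kcal/day")
```

Because the equation depends only on weight, height, age, and sex, two people with the same demographics receive the same estimate regardless of what activity they perform, which is the weakness the comparison notes.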
CHAPTER 3
METHODOLOGY OF CALORIES BURNT PREDICTION
The methodology for predicting calories burnt typically
involves a multi-step approach that integrates data collection, model selection, and
validation. The first step in the process is data collection, which can be sourced
from various inputs, including user-provided demographic information (such as
age, weight, height, and gender), physiological measurements (like heart rate and
activity type), and data gathered from wearable devices that monitor real-time
activity levels. Once the data is collected, it undergoes preprocessing to clean and
normalize the dataset, ensuring consistency and accuracy for analysis. Traditional
approaches may utilize predictive equations to estimate Basal Metabolic Rate
(BMR) and Total Daily Energy Expenditure (TDEE), while advanced methods may
incorporate machine learning techniques, such as regression analysis, decision
trees, or neural networks, to capture complex relationships in the data.
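The cleaning and normalization step described above might look roughly like the following sketch. It uses only the Python standard library to stay self-contained; the records are synthetic, the column names are assumed from the features named in this report, and the 1/0 gender encoding is an arbitrary convention chosen for illustration.

```python
from statistics import mean, pstdev

# Synthetic records (not from the project dataset); one duplicate and one
# row with a missing value are included on purpose.
raw_rows = [
    {"Gender": "male", "Age": 34, "Height": 178.0, "Weight": 80.0, "Duration": 20.0, "Heart_Rate": 100.0},
    {"Gender": "female", "Age": 27, "Height": 165.0, "Weight": 58.0, "Duration": 12.0, "Heart_Rate": 92.0},
    {"Gender": "male", "Age": 34, "Height": 178.0, "Weight": 80.0, "Duration": 20.0, "Heart_Rate": 100.0},
    {"Gender": "female", "Age": 45, "Height": 160.0, "Weight": None, "Duration": 15.0, "Heart_Rate": 95.0},
    {"Gender": "male", "Age": 52, "Height": 172.0, "Weight": 76.0, "Duration": 25.0, "Heart_Rate": 108.0},
]


def clean(rows):
    """Drop rows with missing values and exact duplicates; return copies."""
    seen = set()
    cleaned = []
    for row in rows:
        if any(value is None for value in row.values()):
            continue
        key = tuple(sorted(row.items()))
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(dict(row))
    return cleaned


def encode_gender(rows):
    """Map the categorical Gender column to 1 (male) / 0 (female)."""
    for row in rows:
        row["Gender"] = 1 if row["Gender"].lower() == "male" else 0
    return rows


def standardize(rows, columns):
    """Rescale each listed column to zero mean and unit variance (z-score)."""
    for col in columns:
        values = [row[col] for row in rows]
        mu, sigma = mean(values), pstdev(values)
        for row in rows:
            row[col] = (row[col] - mu) / sigma if sigma > 0 else 0.0
    return rows


numeric_columns = ["Age", "Height", "Weight", "Duration", "Heart_Rate"]
prepared = standardize(encode_gender(clean(raw_rows)), numeric_columns)
print(len(prepared), "rows kept out of", len(raw_rows))
```

Tree-based models are generally insensitive to feature scaling, so the standardization step matters mainly for linear and neural models.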
Fig 3.1 Sample Dataset
Age Vs Count
The analysis of "Age vs. Count" in the context of calories burnt prediction
involves examining the distribution of individuals across various age groups and
their corresponding counts within the dataset. This analysis is crucial for
understanding demographic trends and how age influences physical activity
patterns and caloric expenditure. Typically, data visualization techniques such as
bar charts or histograms are employed to illustrate the number of participants in
each age category. It is commonly recognized that metabolic rates tend to decline
with age, which may affect the total caloric expenditure during physical activities.
As a result, individuals in older age brackets might burn fewer calories for the
same level of exertion compared to younger individuals. By investigating these
age-related patterns, researchers can better tailor caloric burn prediction models to
accommodate the varying metabolic rates and activity levels across different age
groups, ultimately leading to more accurate and personalized calorie predictions.
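A count-per-bin view like the one described above can be produced with a few lines of code. The sketch below uses synthetic ages and an assumed bin width of 10 years; the same helper would apply to the height, weight, duration, and heart-rate columns.

```python
from collections import Counter


def bin_counts(values, bin_width):
    """Count how many values fall into each bin of the given width."""
    counts = Counter((int(v) // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))


# Synthetic ages, for illustration only.
ages = [21, 24, 29, 33, 35, 38, 41, 47, 52, 58, 63, 71]
age_hist = bin_counts(ages, bin_width=10)

# Text rendering of the histogram: one row per age bracket.
for start, count in age_hist.items():
    print(f"{start:>3}-{start + 9:<3} {'#' * count} ({count})")
```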
Height Vs Count
The "Height vs. Count" analysis serves as an essential aspect of understanding the
demographics of the dataset in calories burnt prediction. By examining the
distribution of participants across various height categories, this analysis can reveal
patterns that may influence caloric expenditure. This analysis not only provides
insight into the physical diversity of the dataset but also highlights potential
relationships between height and caloric expenditure. Taller individuals typically
have a higher basal metabolic rate (BMR) due to greater body mass and surface
area, which can lead to increased caloric burn during physical activities.
Conversely, shorter individuals might exhibit lower caloric expenditure for the
same activity level. By integrating height-related insights, the prediction models
can become more precise, offering better personalized caloric expenditure
estimates tailored to individuals' unique physical characteristics.
Weight Vs Count
The "Weight vs. Count" analysis is a pivotal component in understanding
how body weight distribution impacts caloric expenditure within the dataset for
calories burnt prediction. This analysis typically employs visual representations,
such as histograms or bar charts, to illustrate the frequency of individuals across
various weight categories. Underrepresentation of certain groups,
such as individuals with lower body weight, may lead to less accurate predictions
for those demographics. In addition to frequency counts, the analysis may also
explore how weight correlates with other variables, such as height and age, to
understand the broader context of caloric expenditure.
Duration Vs Count
The length of time you engage in physical activity directly impacts the total
calories burned. For longer sessions, the body can tap into fat stores after initial
glycogen depletion, particularly during moderate-intensity activities. Activities like
running, cycling, or swimming burn more calories over extended durations,
making duration-based workouts ideal for endurance and steady calorie burning.
On the other hand, the count or frequency of workout sessions throughout the week
can also play a critical role. Shorter, high-intensity workouts done more frequently
can add up to a high calorie burn over time. The optimal balance between duration
and count varies based on individual goals. The duration of the workouts is spread
unevenly with high spikes in 6, 12, 18, 25 min ranges making it a good factor for
predicting calories.
Heart_Rate Vs Count
The "Heart Rate vs. Count" analysis plays a crucial role in understanding how
heart rate data is distributed among individuals and its significance in predicting
caloric expenditure. By utilizing visual tools such as histograms or line charts, this
analysis can effectively illustrate the frequency of heart rate measurements within
specific ranges during various activities. Typically, one might observe a normal
distribution where most individuals fall within a certain heart rate range, while
extremes may indicate either very low activity levels or very high-intensity efforts.
This analysis is particularly important as heart rate is a key physiological indicator
Fig 3.7 Heart_Rate Vs Count
fold cross-validation. Cross-validation can help identify any potential biases in the
model by revealing how performance varies across different segments of the
dataset. By exposing the model's strengths and weaknesses, cross-validation assists in refining
feature selection and model tuning, ultimately leading to better prediction
accuracy.
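As a rough sketch of the cross-validation procedure discussed above, the following code scores a plain least-squares model on each held-out fold. It is written with NumPy only so that it runs on its own; the data are synthetic, k = 5 is an assumed fold count (the text does not preserve the number used), and the ordinary least-squares fit is a stand-in for whichever models the project actually evaluated.

```python
import numpy as np


def k_fold_r2(X, y, k=5, seed=0):
    """Return the R^2 score on each of k held-out folds for a linear fit."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(len(y))       # shuffle before splitting
    folds = np.array_split(indices, k)
    scores = []
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        # Fit on the k-1 training folds (intercept column prepended).
        A_train = np.column_stack([np.ones(len(train_idx)), X[train_idx]])
        coef, *_ = np.linalg.lstsq(A_train, y[train_idx], rcond=None)
        # Score on the fold that was left out.
        A_test = np.column_stack([np.ones(len(test_idx)), X[test_idx]])
        pred = A_test @ coef
        ss_res = np.sum((y[test_idx] - pred) ** 2)
        ss_tot = np.sum((y[test_idx] - y[test_idx].mean()) ** 2)
        scores.append(1.0 - ss_res / ss_tot)
    return np.array(scores)


# Synthetic data: calories driven mostly by duration, slightly by heart rate.
rng = np.random.default_rng(1)
n = 200
duration = rng.uniform(1, 30, n)
heart_rate = rng.uniform(80, 120, n)
X = np.column_stack([duration, heart_rate])
y = 7.0 * duration + 0.5 * (heart_rate - 80) + rng.normal(0, 5, n)

scores = k_fold_r2(X, y, k=5)
print("R^2 per fold:", np.round(scores, 3))
print("mean:", round(float(scores.mean()), 3), "std:", round(float(scores.std()), 3))
```

A large spread between fold scores would suggest that performance depends on which segment of the data is held out, which is the kind of bias the paragraph above refers to.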
Correlation Matrix
A correlation matrix is a powerful tool used to assess the relationships between
multiple variables in a dataset, providing insights into how features interact with
one another. In the context of calories burnt prediction, a correlation matrix
allows researchers to examine the strength and direction of relationships
between various factors, such as demographic data, physiological metrics, and
activity levels. Each cell in the matrix represents the correlation coefficient
between two variables, ranging from -1 to +1. A value closer to +1
indicates a strong positive correlation, meaning that as one variable increases,
the other tends to increase as well, while a value closer to -1 indicates a strong
negative correlation. When two predictor features are strongly correlated with each
other, the situation can complicate the modeling
process by making it difficult to ascertain the individual contribution of
correlated features to the target variable, calories burnt.
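The correlation matrix described above can be computed directly from the feature columns. The sketch below uses NumPy on synthetic data in which heart rate is deliberately constructed to rise with duration; the column names follow the features named in this report, and the 0.8 cutoff for flagging correlated predictors is an assumed threshold, not one taken from the study.

```python
import numpy as np

feature_names = ["Duration", "Heart_Rate", "Age", "Calories"]

# Synthetic data, for illustration only.
rng = np.random.default_rng(42)
duration = rng.uniform(1, 30, 300)
heart_rate = 80 + 1.2 * duration + rng.normal(0, 4, 300)
age = rng.uniform(20, 70, 300)
calories = 6.0 * duration + 0.4 * heart_rate + rng.normal(0, 8, 300)

data = np.column_stack([duration, heart_rate, age, calories])
corr = np.corrcoef(data, rowvar=False)  # Pearson coefficients, columns as variables

# Print the matrix with row and column labels.
print(" " * 12 + "".join(f"{name:>12}" for name in feature_names))
for name, row in zip(feature_names, corr):
    print(f"{name:>12}" + "".join(f"{value:>12.2f}" for value in row))

# Flag predictor pairs (target column excluded) that are strongly correlated.
threshold = 0.8  # assumed cutoff
predictors = feature_names[:-1]
high_pairs = [
    (predictors[i], predictors[j])
    for i in range(len(predictors))
    for j in range(i + 1, len(predictors))
    if abs(corr[i, j]) > threshold
]
print("Strongly correlated predictors:", high_pairs)
```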
Linear Regression
Linear regression is a fundamental machine learning technique widely used in
the field of predictive modeling, including calories burnt prediction. This
method establishes a linear relationship between the dependent variable, which
in this case is the total calories burnt, and one or more independent variables,
such as heart rate, weight, duration of activity, and age. By fitting a linear
equation to the observed data, linear regression enables researchers to
understand how changes in predictor variables influence caloric expenditure. We
evaluated the model's performance using metrics such as Mean Squared Error
(MSE) and R-squared, which helped us assess how well the model explained the
variability in calories burnt. While linear regression serves as a useful
baseline model, we also recognized its limitations, particularly in capturing
complex non-linear relationships, which informed our exploration of more
advanced machine learning techniques in subsequent phases of the project.
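The fit-and-evaluate loop described above can be sketched as follows. This is a minimal illustration on synthetic data using NumPy's least-squares solver rather than the project's actual code; the coefficients used to generate the data and the 80/20 split are assumptions made only for the example.

```python
import numpy as np

# Synthetic data: calories as a linear function of three predictors plus noise.
rng = np.random.default_rng(7)
n = 300
duration = rng.uniform(1, 30, n)
heart_rate = rng.uniform(80, 120, n)
weight = rng.uniform(50, 100, n)
X = np.column_stack([duration, heart_rate, weight])
y = 6.5 * duration + 0.6 * heart_rate + 0.1 * weight + rng.normal(0, 5, n)

# Hold out the last 20% of rows for evaluation (assumed split).
split = int(0.8 * n)
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]


def fit_linear_regression(X, y):
    """Ordinary least squares; returns [intercept, coefficient per column]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict(coef, X):
    return coef[0] + X @ coef[1:]


coef = fit_linear_regression(X_train, y_train)
y_pred = predict(coef, X_test)

# Mean Squared Error: average squared residual on the held-out rows.
test_mse = float(np.mean((y_test - y_pred) ** 2))
# R-squared: share of the variance in calories explained by the model.
ss_res = float(np.sum((y_test - y_pred) ** 2))
ss_tot = float(np.sum((y_test - y_test.mean()) ** 2))
test_r2 = 1.0 - ss_res / ss_tot

print("coefficients:", np.round(coef, 3))
print(f"MSE = {test_mse:.2f}, R^2 = {test_r2:.3f}")
```

Each fitted coefficient reads as the change in predicted calories per unit change in that predictor with the others held fixed, which is the interpretability advantage of the linear baseline.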
XGBRegressor
CHAPTER 4
real-time monitoring through wearable technology, and addressing the nuances
of individual variability in metabolic rates.
4.2 Comparative Analysis
4. Overfitting Prevention: XGBRegressor exhibited enhanced robustness due to
its built-in regularization techniques, which helped prevent overfitting. This was
particularly evident in scenarios with diverse data points, where the model
maintained accuracy across various demographic segments.
Key Insights:
● The comparatively simple Random Forest model outperformed more
complex algorithms, highlighting the effectiveness of straightforward
approaches in certain scenarios.
● The performance gap between the best (0.90) and worst (0.79) models
was only 0.11 (11 percentage points), indicating a relatively narrow range
of accuracy across the evaluated algorithms.
● Traditional models, particularly Random Forest, demonstrated superior
performance compared to the other ensemble methods in our analysis,
suggesting that the nature of the dataset may have influenced the
effectiveness of these more complex algorithms.
import pandas as pd
from xgboost import XGBRegressor

# Load your model and data (make sure to adjust the file paths)
class CaloriePredictor:
    def __init__(self):
        workout_df = pd.read_csv(self.workout_csv_path)
        # Features: every column except the target and the user identifier
        X = workout_df.drop(columns=["Calories", "User_ID"])
        Y = workout_df[["Calories"]]
        # Train model
        self.XGBR_model = XGBRegressor().fit(X, Y)
self.XGBR_model.predict(input_data) return
return {
return {
else:
return {
"Breakfast": "Please select a valid goal.",
"Lunch": "",
"Dinner": ""
= CaloriePredictor()
** 2)
is: {bmi:.2f}\n"
{diet_recommendation['Dinner']}")
# Get the MET value for the selected activity
met = met_values.get(activity.lower())
if met is None:
    return "Activity not found. Please enter a valid activity."
inputs=inputs,
outputs=output,
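The MET lookup in the fragment above suggests an activity-based estimate. The surrounding function is not preserved in this report, so the following is only a sketch of how such a lookup is commonly used: it relies on the standard convention that 1 MET corresponds to about 1 kcal per kilogram of body weight per hour. The function name `calories_from_met` is hypothetical, and the MET values in the table are illustrative placeholders, not the values used in the project.

```python
# Illustrative MET values only; the project's actual table is not shown.
met_values = {
    "walking": 3.5,
    "cycling": 7.5,
    "running": 9.8,
}


def calories_from_met(activity: str, weight_kg: float, duration_min: float):
    """Estimate kcal burned as MET x body weight (kg) x duration (hours)."""
    # Get the MET value for the selected activity
    met = met_values.get(activity.lower())
    if met is None:
        return "Activity not found. Please enter a valid activity."
    return met * weight_kg * (duration_min / 60.0)


estimate = calories_from_met("Running", weight_kg=70, duration_min=30)
print(f"Estimated calories burned: {estimate:.1f} kcal")
print(calories_from_met("juggling", weight_kg=70, duration_min=30))
```

Returning a message string for unknown activities mirrors the original fragment; a caller that needs a numeric result would have to check the return type first.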
CHAPTER 5
Future Enhancement
The incorporation of real-time data from wearable devices could improve the
accuracy of predictions. By leveraging heart rate, movement patterns, and
metabolic data collected during activities, models could be fine-tuned for
individual users. Development of Hybrid Models: Future research could focus
on developing hybrid models that combine the strengths of various machine
learning techniques. By integrating ensemble methods or neural networks with
traditional algorithms, predictions could be enhanced further, providing more
accurate and reliable results. Expanding the feature set to include variables such
as age, gender, and activity type could provide a more holistic view of caloric
expenditure. These demographic factors can significantly influence metabolism
and should be considered in any predictive model. Future enhancements could
also focus on creating user-friendly applications that allow individuals to input
their data and receive real-time caloric burn estimates. This could include
mobile applications that utilize machine learning models to provide immediate
feedback during workouts.
DEPLOYMENT
Fig 5.3 Deployment