
Project 16: Calories Burnt Prediction

This notebook walks through a Python workflow for predicting the calories burnt during exercise. It loads the exercise and calorie records into Pandas DataFrames, combines and preprocesses them, visualizes the features, and trains an XGBoost regressor to predict Calories from them. On the test data the model reaches a mean absolute error of about 2.72.

Uploaded by raiv09827


Importing the Dependencies

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor
from sklearn import metrics

Data Collection & Processing

# loading the data from csv file to a Pandas DataFrame

calories = pd.read_csv('/content/calories.csv')

# print the first 5 rows of the dataframe

calories.head()

User_ID Calories
0 14733363 231.0
1 14861698 66.0
2 11179863 26.0
3 16180408 71.0
4 17771927 35.0

exercise_data = pd.read_csv('/content/exercise.csv')

exercise_data.head()

    User_ID  Gender  Age  Height  Weight  Duration  Heart_Rate  Body_Temp
0  14733363    male   68   190.0    94.0      29.0       105.0       40.8
1  14861698  female   20   166.0    60.0      14.0        94.0       40.3
2  11179863    male   69   179.0    79.0       5.0        88.0       38.7
3  16180408  female   34   179.0    71.0      13.0       100.0       40.5
4  17771927  female   27   154.0    58.0      10.0        81.0       39.8

Combining the two DataFrames

# place the Calories column next to the exercise columns (rows are aligned by index)
calories_data = pd.concat([exercise_data, calories['Calories']], axis=1)

calories_data.head()

    User_ID  Gender  Age  Height  ...  Duration  Heart_Rate  Body_Temp  Calories
0  14733363    male   68   190.0  ...      29.0       105.0       40.8     231.0
1  14861698  female   20   166.0  ...      14.0        94.0       40.3      66.0
2  11179863    male   69   179.0  ...       5.0        88.0       38.7      26.0
3  16180408  female   34   179.0  ...      13.0       100.0       40.5      71.0
4  17771927  female   27   154.0  ...      10.0        81.0       39.8      35.0

[5 rows x 9 columns]
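pd.concat(..., axis=1) lines rows up by index label, which here is simply the row position. The result is therefore only correct if both CSV files list the users in the same order (the User_IDs in the two head() outputs above do match). Joining on the User_ID key removes that assumption; in pandas this would be exercise_data.merge(calories, on='User_ID'). The stdlib sketch below shows the idea on a few hypothetical rows built from the sample output and deliberately reordered; the helper name join_on_user_id is illustrative only.

```python
# Hypothetical rows: two users from the sample output, listed in a different
# order in each "file" so that a purely positional combine would mismatch them.
exercise_rows = [
    {"User_ID": 14861698, "Gender": "female", "Duration": 14.0},
    {"User_ID": 14733363, "Gender": "male", "Duration": 29.0},
]
calories_rows = [
    {"User_ID": 14733363, "Calories": 231.0},
    {"User_ID": 14861698, "Calories": 66.0},
]


def join_on_user_id(left, right):
    """Inner-join two lists of dict rows on User_ID (a key-based merge)."""
    calories_by_id = {row["User_ID"]: row["Calories"] for row in right}
    combined = []
    for row in left:
        # rows whose User_ID has no calorie record are dropped (inner join)
        if row["User_ID"] in calories_by_id:
            combined.append({**row, "Calories": calories_by_id[row["User_ID"]]})
    return combined


combined = join_on_user_id(exercise_rows, calories_rows)
for row in combined:
    print(row)
```

Each exercise row picks up the Calories value recorded for the same User_ID, regardless of where that record sits in the other file.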

# checking the number of rows and columns

calories_data.shape

(15000, 9)

# getting some information about the data

calories_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 15000 entries, 0 to 14999
Data columns (total 9 columns):
 #   Column      Non-Null Count  Dtype
---  ------      --------------  -----
 0   User_ID     15000 non-null  int64
 1   Gender      15000 non-null  object
 2   Age         15000 non-null  int64
 3   Height      15000 non-null  float64
 4   Weight      15000 non-null  float64
 5   Duration    15000 non-null  float64
 6   Heart_Rate  15000 non-null  float64
 7   Body_Temp   15000 non-null  float64
 8   Calories    15000 non-null  float64
dtypes: float64(6), int64(2), object(1)
memory usage: 1.0+ MB

# checking for missing values

calories_data.isnull().sum()

User_ID 0
Gender 0
Age 0
Height 0
Weight 0
Duration 0
Heart_Rate 0
Body_Temp 0
Calories 0
dtype: int64

Data Analysis

# get some statistical measures about the data

calories_data.describe()

            User_ID           Age  ...     Body_Temp      Calories
count  1.500000e+04  15000.000000  ...  15000.000000  15000.000000
mean   1.497736e+07     42.789800  ...     40.025453     89.539533
std    2.872851e+06     16.980264  ...      0.779230     62.456978
min    1.000116e+07     20.000000  ...     37.100000      1.000000
25%    1.247419e+07     28.000000  ...     39.600000     35.000000
50%    1.499728e+07     39.000000  ...     40.200000     79.000000
75%    1.744928e+07     56.000000  ...     40.600000    138.000000
max    1.999965e+07     79.000000  ...     41.500000    314.000000

[8 rows x 8 columns]
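describe() reports, for each numeric column, the count, mean, sample standard deviation, minimum, the 25%/50%/75% quantiles and the maximum. As a rough sketch of what those rows mean, the stdlib function below (the name describe is hypothetical) computes the same statistics for the five Calories values from the sample rows; statistics.quantiles with method="inclusive" uses the same linear interpolation as pandas' default quantile setting. Five values say nothing about the full 15,000-row column; they only illustrate the arithmetic.

```python
import statistics

# Calories of the five sample rows shown earlier (illustrative only).
calories = [231.0, 66.0, 26.0, 71.0, 35.0]


def describe(values):
    """Summary statistics laid out like the rows of pandas' describe()."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),  # sample std (divides by n - 1)
        "min": min(values),
        "25%": q1,
        "50%": q2,
        "75%": q3,
        "max": max(values),
    }


for name, value in describe(calories).items():
    print(f"{name:>5}: {value:.2f}")
```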

Data Visualization

sns.set()

# plotting the gender column in count plot
# (x is passed as a keyword; from seaborn 0.12 only data may be positional)
sns.countplot(x='Gender', data=calories_data)

(Figure: count plot of the Gender column)
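A count plot draws one bar per category, with the bar height equal to the number of rows in that category; for the full data, calories_data['Gender'].value_counts() gives the same numbers as text. The minimal stdlib sketch below counts the Gender values of the five sample rows with collections.Counter.

```python
from collections import Counter

# Gender values of the five sample rows shown earlier (illustrative only).
genders = ["male", "female", "male", "female", "female"]

# One bar per category; the bar heights are these counts.
counts = Counter(genders)
for gender, n in counts.most_common():
    print(f"{gender}: {n}")
```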
# finding the distribution of "Age" column
# (histplot with kde=True replaces the deprecated distplot)
sns.histplot(calories_data['Age'], kde=True)

(Figure: distribution of the Age column)
# finding the distribution of "Height" column
sns.histplot(calories_data['Height'], kde=True)

(Figure: distribution of the Height column)

# finding the distribution of "Weight" column
sns.histplot(calories_data['Weight'], kde=True)

(Figure: distribution of the Weight column)

Finding the Correlation in the dataset

1. Positive Correlation
2. Negative Correlation
# numeric_only=True skips the text Gender column (pandas 2.0+ raises an error on it otherwise)
correlation = calories_data.corr(numeric_only=True)

# constructing a heatmap to understand the correlation
plt.figure(figsize=(10,10))
sns.heatmap(correlation, cbar=True, square=True, fmt='.1f', annot=True, annot_kws={'size':8}, cmap='Blues')

(Figure: correlation heatmap of the numeric columns)
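DataFrame.corr() computes the Pearson coefficient by default, r = sum((x - mean_x)(y - mean_y)) / sqrt(sum((x - mean_x)^2) * sum((y - mean_y)^2)). Values near +1 indicate a positive correlation and values near -1 a negative one. The stdlib sketch below implements that formula (pearson is a hypothetical helper name) and applies it to Duration and Calories from the five sample rows, which is far too few to characterize the full dataset.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Duration and Calories of the five sample rows shown earlier (illustrative only).
duration = [29.0, 14.0, 5.0, 13.0, 10.0]
calories = [231.0, 66.0, 26.0, 71.0, 35.0]
r = pearson(duration, calories)
print(f"r = {r:.3f}")
```

In these five rows, longer sessions go with more calories burnt, so r comes out close to +1.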
Converting the text data to numerical values

calories_data.replace({"Gender":{'male':0,'female':1}}, inplace=True)

calories_data.head()

    User_ID  Gender  Age  Height  ...  Duration  Heart_Rate  Body_Temp  Calories
0  14733363       0   68   190.0  ...      29.0       105.0       40.8     231.0
1  14861698       1   20   166.0  ...      14.0        94.0       40.3      66.0
2  11179863       0   69   179.0  ...       5.0        88.0       38.7      26.0
3  16180408       1   34   179.0  ...      13.0       100.0       40.5      71.0
4  17771927       1   27   154.0  ...      10.0        81.0       39.8      35.0

[5 rows x 9 columns]
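DataFrame.replace() leaves any value that is not in the mapping untouched, so an unexpected label (for example a differently capitalized "Male") would silently survive as text and only fail later, at training time. A stricter variant, sketched below in plain Python with the same male=0 / female=1 mapping, rejects unknown labels up front; GENDER_CODES and encode_gender are hypothetical names.

```python
# The same mapping the notebook passes to DataFrame.replace().
GENDER_CODES = {"male": 0, "female": 1}


def encode_gender(values):
    """Map gender strings to integer codes, failing loudly on unexpected labels."""
    unknown = sorted(set(values) - set(GENDER_CODES))
    if unknown:
        raise ValueError(f"unexpected gender labels: {unknown}")
    return [GENDER_CODES[v] for v in values]


print(encode_gender(["male", "female", "male", "female", "female"]))
# -> [0, 1, 0, 1, 1]
```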

Separating features and Target

X = calories_data.drop(columns=['User_ID', 'Calories'])
Y = calories_data['Calories']

print(X)

       Gender  Age  Height  Weight  Duration  Heart_Rate  Body_Temp
0           0   68   190.0    94.0      29.0       105.0       40.8
1           1   20   166.0    60.0      14.0        94.0       40.3
2           0   69   179.0    79.0       5.0        88.0       38.7
3           1   34   179.0    71.0      13.0       100.0       40.5
4           1   27   154.0    58.0      10.0        81.0       39.8
...       ...  ...     ...     ...       ...         ...        ...
14995       1   20   193.0    86.0      11.0        92.0       40.4
14996       1   27   165.0    65.0       6.0        85.0       39.2
14997       1   43   159.0    58.0      16.0        90.0       40.1
14998       0   78   193.0    97.0       2.0        84.0       38.3
14999       0   63   173.0    79.0      18.0        92.0       40.5

[15000 rows x 7 columns]

print(Y)

0 231.0
1 66.0
2 26.0
3 71.0
4 35.0
...
14995 45.0
14996 23.0
14997 75.0
14998 11.0
14999 98.0
Name: Calories, Length: 15000, dtype: float64

Splitting the data into training data and Test data

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=2)

print(X.shape, X_train.shape, X_test.shape)

(15000, 7) (12000, 7) (3000, 7)
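train_test_split shuffles the rows using the seed given by random_state and holds out the test_size fraction of them, which is where the 12000/3000 shapes come from. The sketch below reproduces the idea with Python's random.Random; it will not select the same rows as scikit-learn, which shuffles with its own NumPy-based generator. split_indices is a hypothetical helper, and it rounds the test count up, as scikit-learn does for a float test_size.

```python
import math
import random


def split_indices(n_rows, test_size=0.2, seed=2):
    """Shuffle row indices with a fixed seed and cut off a test fraction."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # same seed -> same split on every run
    n_test = math.ceil(n_rows * test_size)
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = split_indices(15000)
print(len(train_idx), len(test_idx))  # 12000 3000
```

Fixing the seed is what makes the reported error reproducible: rerunning the notebook evaluates the model on the same 3000 rows.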

Model Training

XGBoost Regressor

# loading the model

model = XGBRegressor()

# training the model with X_train

model.fit(X_train, Y_train)

[10:06:32] WARNING: /workspace/src/objective/regression_obj.cu:152: reg:linear is now deprecated in favor of reg:squarederror.

XGBRegressor(base_score=0.5, booster='gbtree', colsample_bylevel=1,
             colsample_bynode=1, colsample_bytree=1, gamma=0,
             importance_type='gain', learning_rate=0.1, max_delta_step=0,
             max_depth=3, min_child_weight=1, missing=None, n_estimators=100,
             n_jobs=1, nthread=None, objective='reg:linear', random_state=0,
             reg_alpha=0, reg_lambda=1, scale_pos_weight=1, seed=None,
             silent=None, subsample=1, verbosity=1)
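XGBRegressor builds an ensemble of regression trees: each new tree is fitted to what the current ensemble still gets wrong, and its output is scaled by learning_rate (0.1 here, with n_estimators=100 and max_depth=3, as the repr above shows). The toy sketch below implements plain gradient boosting for squared error with depth-1 trees (stumps) on a single feature. It is a simplification, not XGBoost's algorithm: XGBoost also uses second-order gradient information, regularization terms such as reg_lambda, and deeper trees over all features, and this sketch starts from the target mean instead of base_score. fit_stump and boost are hypothetical names.

```python
def fit_stump(xs, residuals):
    """Least-squares depth-1 tree (a single threshold split) on one feature."""
    thresholds = sorted(set(xs))[:-1]  # split between consecutive distinct values
    if not thresholds:
        raise ValueError("need at least two distinct feature values")
    best = None
    for t in thresholds:
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        left_mean, right_mean = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, t, left_mean, right_mean)
    _, t, left_mean, right_mean = best

    def stump(x):
        return left_mean if x <= t else right_mean

    return stump


def boost(xs, ys, n_estimators=100, learning_rate=0.1):
    """Gradient boosting for squared error: each stump fits the current residuals."""
    base = sum(ys) / len(ys)  # initial prediction: the target mean
    stumps = []

    def predict(x):
        return base + learning_rate * sum(s(x) for s in stumps)

    for _ in range(n_estimators):
        # For squared error, the negative gradient is simply y - prediction.
        residuals = [y - predict(x) for x, y in zip(xs, ys)]
        stumps.append(fit_stump(xs, residuals))
    return predict


# Duration -> Calories for the five sample rows shown earlier (illustrative only).
duration = [29.0, 14.0, 5.0, 13.0, 10.0]
calories = [231.0, 66.0, 26.0, 71.0, 35.0]

toy_model = boost(duration, calories)
baseline = sum(calories) / len(calories)
sse_baseline = sum((y - baseline) ** 2 for y in calories)
sse_boosted = sum((y - toy_model(x)) ** 2 for x, y in zip(duration, calories))
print(f"SSE of mean-only prediction: {sse_baseline:.1f}")
print(f"SSE after boosting:          {sse_boosted:.1f}")
```

The small learning rate makes each stump correct only part of the remaining error, which is why many rounds (n_estimators) are needed.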

Evaluation

Prediction on Test Data

test_data_prediction = model.predict(X_test)

print(test_data_prediction)

[129.06204  223.79721   39.181965 ... 145.59767   22.53474   92.29064 ]

Mean Absolute Error

mae = metrics.mean_absolute_error(Y_test, test_data_prediction)

print("Mean Absolute Error = ", mae)

Mean Absolute Error = 2.7159012502233186
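The mean absolute error is MAE = (1/n) * sum(|y_i - yhat_i|), expressed in the same units as the target. An MAE of about 2.7 therefore means the predictions are off by roughly 2.7 calories on average, which is small next to the Calories standard deviation of about 62.5 reported by describe(). A stdlib version of the formula, applied to made-up numbers rather than the notebook's data:

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and equal in length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical targets and predictions, not values from the notebook.
print(mean_absolute_error([100.0, 50.0, 20.0], [97.0, 52.0, 20.5]))
```

Here the absolute errors are 3, 2 and 0.5, so the MAE is 5.5 / 3.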
