A PROJECT REPORT

on

“BIKE SHARING RENTAL PREDICTION”

Submitted to
KIIT Deemed to be University

In Partial Fulfillment of the Requirement for the Award of

BACHELOR’S DEGREE IN
INFORMATION TECHNOLOGY

BY

Raunak Shrivastava 2006232


Amar Singh 2006257
Amarendra Dash 2006258
Arnab Kumar Baishya 2006260
Astik Bakshi 2006262
Aviral Trivedi 2006263

UNDER THE GUIDANCE OF


Dr. Jayanti Dansana

SCHOOL OF COMPUTER ENGINEERING


KALINGA INSTITUTE OF INDUSTRIAL TECHNOLOGY
BHUBANESWAR, ODISHA - 751024
May 2023
KIIT Deemed to be University
School of Computer Engineering
Bhubaneswar, ODISHA 751024

CERTIFICATE
This is to certify that the project entitled
“BIKE SHARING RENTAL PREDICTION”
submitted by

Raunak Shrivastava 2006232


Amar Singh 2006257
Amarendra Dash 2006258
Arnab Kumar Baishya 2006260
Astik Bakshi 2006262
Aviral Trivedi 2006263
is a record of bonafide work carried out by them, in partial fulfillment of the
requirement for the award of the Degree of Bachelor of Engineering (Computer Science
& Engineering OR Information Technology) at KIIT Deemed to be University,
Bhubaneswar. This work was done during the year 2022-2023, under our guidance.

Date: / /

(Dr. Jayanti Dansana)


Project Guide
Acknowledgements

We are profoundly grateful to Dr. Jayanti Dansana for expert
guidance and continuous encouragement throughout, which ensured that this project
reached its target, from its commencement to its completion.

Raunak Shrivastava (2006232)


Amar Singh (2006257)
Amarendra Dash (2006258)
Arnab Kumar Baishya (2006260)
Astik Bakshi (2006262)
Aviral Trivedi (2006263)
ABSTRACT

Bike sharing has become a popular mode of transportation in many
urban areas around the world. Bike rental companies need accurate
predictions of bike demand to optimize their operations and ensure
they have enough bikes available for their customers. In this report,
we present a bike rental prediction model that utilizes machine learning
techniques to forecast the number of bikes that will be rented in a
given time period.
Table of Contents
BIKE SHARING RENTAL PREDICTION

Introduction

Bike sharing has become a popular mode of transportation in many
urban areas around the world. Bike rental companies need accurate
predictions of bike demand to optimize their operations and ensure
they have enough bikes available for their customers. In this report,
we present a bike rental prediction model that utilizes machine learning
techniques to forecast the number of bikes that will be rented in a
given time period.

School of Computer Engineering, KIIT, BBSR 1



METHODOLOGY
The bike rental prediction model was developed using a supervised
machine learning approach. The dataset used for training and
testing the model was obtained from a bike rental company and
consisted of historical data on bike rentals, including features such
as weather conditions, day of the week, time of day, and holiday
status. The dataset was preprocessed to handle missing values,
encode categorical variables, and engineer additional features.
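
Two of these preprocessing steps, encoding a categorical column and adding lagged rental counts, can be illustrated with a minimal pure-Python sketch. The helper names `one_hot` and `lag` and the sample values are hypothetical and serve only to show what is computed; the code in this report performs the same steps with `pd.get_dummies` and `shift`.

```python
def one_hot(values, drop_first=False):
    """Return one 0/1 indicator list per distinct category (hypothetical helper)."""
    categories = sorted(set(values))
    if drop_first:
        # Dropping one level avoids perfectly collinear columns
        # when the model also has an intercept.
        categories = categories[1:]
    return {cat: [1 if v == cat else 0 for v in values] for cat in categories}


def lag(values, steps):
    """Shift a series forward by `steps`; the first `steps` rows have no history."""
    return [None] * steps + list(values[:len(values) - steps])


# Illustrative sample: season codes and hourly rental counts.
seasons = [1, 1, 2, 3, 4, 2]
counts = [16, 40, 32, 13, 1, 1]

season_dummies = one_hot(seasons, drop_first=True)
prev_hour = lag(counts, 1)
prev_two_hours = lag(counts, 2)

print(season_dummies)
print(prev_hour)
print(prev_two_hours)
```

Rows whose lagged value is `None` have no usable history, which is why such rows are dropped before training.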

Several machine learning algorithms were evaluated, including
linear regression, decision trees, random forests, and gradient
boosting machines. The algorithms were trained and tested using
a 70/30 train-test split. Model performance was evaluated using
metrics such as mean squared error (MSE), root mean squared
error (RMSE), and coefficient of determination (R^2).
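
As a worked illustration of the three metrics, the sketch below computes them by hand for a small set of made-up values. The function name `regression_metrics` and the numbers are assumptions for illustration; the report's own code relies on `mean_squared_error` and `r2_score` from scikit-learn.

```python
from math import sqrt


def regression_metrics(y_true, y_pred):
    """Return (MSE, RMSE, R^2) for two equal-length sequences of numbers."""
    n = len(y_true)
    # Sum of squared residuals between actual and predicted values.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    mean_true = sum(y_true) / n
    # Total variation of the actual values around their mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    mse = ss_res / n
    return mse, sqrt(mse), 1 - ss_res / ss_tot


# Made-up rental counts and predictions.
actual = [10, 20, 30, 40]
predicted = [12, 18, 33, 37]

mse, rmse, r2 = regression_metrics(actual, predicted)
print("MSE: %.2f  RMSE: %.2f  R^2: %.3f" % (mse, rmse, r2))
```

RMSE is expressed in the same unit as the target (bikes rented), which makes it easier to interpret than MSE, while R^2 reports the share of the variance in the target that the model explains.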



CODE
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
get_ipython().run_line_magic('matplotlib', 'inline')
import pickle
import warnings
warnings.filterwarnings('ignore')

import os

from sklearn.model_selection import train_test_split


from sklearn.metrics import mean_squared_error, r2_score
from math import sqrt

os.getcwd()

# load the hourly and daily rental records
bikes_hour_df_raws = pd.read_csv('C:\\Users\\KIIT\\PycharmProjects\\tandtlab\\hour.csv')
bike_day_df_raws = pd.read_csv('C:\\Users\\KIIT\\PycharmProjects\\tandtlab\\day.csv')

bikes_hour_df_raws.head()

bike_day_df_raws.head()

bikes_hour_df = bikes_hour_df_raws.drop(['casual' , 'registered'], axis=1)

bikes_hour_df.info()

bikes_hour_df['cnt'].describe()

fig, ax = plt.subplots(1)
ax.plot(sorted(bikes_hour_df['cnt']), color = 'blue', marker = '*', label='cnt')
ax.legend(loc= 'upper left')
ax.set_ylabel('Sorted Rental Counts', fontsize = 10)
fig.suptitle('Recorded Bike Rental Counts', fontsize = 10)

# scatter each numerical feature against the rental count, one figure per feature
for col in ['temp', 'atemp', 'hum', 'windspeed']:
    plt.scatter(bikes_hour_df[col], bikes_hour_df['cnt'])
    plt.suptitle('Numerical Feature: Cnt v/s ' + col)
    plt.xlabel(col)
    plt.ylabel('Count of all Bikes Rented')
    plt.show()

f, (ax1, ax2) = plt.subplots(nrows=1, ncols=2, figsize=(13, 6))

# total rentals per season
ax1 = bikes_hour_df[['season','cnt']].groupby(['season']).sum().reset_index().plot(
    kind='bar', legend=False, title="Counts of Bike Rentals by season",
    stacked=True, fontsize=12, ax=ax1)
ax1.set_xlabel("season", fontsize=12)
ax1.set_ylabel("Count", fontsize=12)
ax1.set_xticklabels(['spring','summer','fall','winter'])

# total rentals per weather situation
ax2 = bikes_hour_df[['weathersit','cnt']].groupby(['weathersit']).sum().reset_index().plot(
    kind='bar', legend=False, stacked=True,
    title="Counts of Bike Rentals by weathersit", fontsize=12, ax=ax2)

ax2.set_xlabel("weathersit", fontsize=12)
ax2.set_ylabel("Count", fontsize=12)
ax2.set_xticklabels(['1: Clear','2: Mist','3: Light Snow','4: Heavy Rain'])

f.tight_layout()

# total rentals per hour of the day
ax = bikes_hour_df[['hr','cnt']].groupby(['hr']).sum().reset_index().plot(
    kind='bar', figsize=(8, 6), legend=False,
    title="Total Bike Rentals by Hour", color='orange', fontsize=12)
ax.set_xlabel("Hour", fontsize=12)
ax.set_ylabel("Count", fontsize=12)
plt.show()

bikes_df_model_data = bikes_hour_df.copy()

outcome = 'cnt'

# build the feature list for each model - experiment by adding features to the
# exclusion list
feature = [feat for feat in list(bikes_df_model_data)
           if feat not in [outcome, 'instant', 'dteday']]

# split the data into train and test portions
X_train, X_test, y_train, y_test = train_test_split(
    bikes_df_model_data[feature],
    bikes_df_model_data[outcome],
    test_size=0.3, random_state=42)

from sklearn import linear_model

lr_model = linear_model.LinearRegression()

# train the model on the training set
lr_model.fit(X_train, y_train)

# make predictions using the test set
y_pred = lr_model.predict(X_test)

# root mean squared error
print('RMSE: %.2f' % sqrt(mean_squared_error(y_test, y_pred)))


feature = [feat for feat in list(bikes_df_model_data)
           if feat not in [outcome, 'instant', 'dteday']]

X_train, X_test, y_train, y_test = train_test_split(
    bikes_df_model_data[feature],
    bikes_df_model_data[outcome],
    test_size=0.3, random_state=42)

from sklearn.preprocessing import PolynomialFeatures

poly_feat = PolynomialFeatures(2)
# fit the expansion on the training data only, then reuse it for the test data
X_train = poly_feat.fit_transform(X_train)
X_test = poly_feat.transform(X_test)

from sklearn import linear_model

lr_model = linear_model.LinearRegression()

# train the model on the training set
lr_model.fit(X_train, y_train)

# make the predictions
y_pred = lr_model.predict(X_test)

# root mean squared error
print("Root Mean squared error with PolynomialFeatures set to 2 degrees: %.2f"
      % sqrt(mean_squared_error(y_test, y_pred)))

bikes_df_model_data = bikes_hour_df.copy()

outcome = 'cnt'

# build the feature list for each model - experiment by adding features to the
# exclusion list
feature = [feat for feat in list(bikes_df_model_data)
           if feat not in [outcome, 'instant', 'dteday']]

X_train, X_test, y_train, y_test = train_test_split(
    bikes_df_model_data[feature],
    bikes_df_model_data[outcome],
    test_size=0.3, random_state=42)

from sklearn.preprocessing import PolynomialFeatures

poly_feat = PolynomialFeatures(4)
# fit the expansion on the training data only, then reuse it for the test data
X_train = poly_feat.fit_transform(X_train)
X_test = poly_feat.transform(X_test)

from sklearn import linear_model

lr_model = linear_model.LinearRegression()

# train the model on the training set
lr_model.fit(X_train, y_train)

# make the predictions
y_pred = lr_model.predict(X_test)

# root mean squared error
print("Root Mean squared error with PolynomialFeatures set to 4 degrees: %.2f"
      % sqrt(mean_squared_error(y_test, y_pred)))

def prepare_data_for_model(raw_dataframe,
                           target_columns,
                           drop_first=False,
                           make_na_col=True):
    # create dummy (one-hot) columns for all categorical fields
    dataframe_dummy = pd.get_dummies(raw_dataframe,
                                     columns=target_columns,
                                     drop_first=drop_first,
                                     dummy_na=make_na_col)
    return dataframe_dummy

# make a copy for editing without affecting original


bike_df_model_ready = bikes_hour_df.copy()
bike_df_model_ready = bike_df_model_ready.sort_values('instant')

# dummify categorical columns
bike_df_model_ready = prepare_data_for_model(bike_df_model_ready,
                                             target_columns=['season', 'weekday', 'weathersit'],
                                             drop_first=True)
# drop any rows that contain missing values, since the models cannot use them
bike_df_model_ready = bike_df_model_ready.dropna()

outcome = 'cnt'
features = [feat for feat in list(bike_df_model_ready)
            if feat not in [outcome, 'instant', 'dteday']]

X_train, X_test, y_train, y_test = train_test_split(
    bike_df_model_ready[features],
    bike_df_model_ready[['cnt']],
    test_size=0.5,
    random_state=42)
from sklearn import linear_model
model_lr = linear_model.LinearRegression()

# train the model on training set


model_lr.fit(X_train, y_train)

# make predictions using the testing set


predictions = model_lr.predict(X_test)
# print coefficients as this is what our web application will use in the end
print('Coefficients: \n', model_lr.coef_)

# root mean squared error


print("Root Mean squared error: %.2f" % sqrt(mean_squared_error(y_test, predictions)))

bike_df_model_ready[['weathersit_2.0', 'weathersit_3.0', 'weathersit_4.0']].head()

# simple approach - make a copy for editing without affecting original


bike_df_model_ready = bikes_hour_df.copy()
bike_df_model_ready = bike_df_model_ready.sort_values('instant')

# dummify categorical columns
bike_df_model_ready = prepare_data_for_model(bike_df_model_ready,
                                             target_columns=['season', 'weekday', 'weathersit'])
list(bike_df_model_ready.head(1).values)

# drop any rows that contain missing values, since the models cannot use them
bike_df_model_ready = bike_df_model_ready.dropna()

outcome = 'cnt'
features = [feat for feat in list(bike_df_model_ready)
            if feat not in [outcome, 'instant', 'dteday']]

X_train, X_test, y_train, y_test = train_test_split(
    bike_df_model_ready[features],
    bike_df_model_ready[['cnt']],
    test_size=0.5,
    random_state=42)

from sklearn.ensemble import GradientBoostingRegressor


model_gbr = GradientBoostingRegressor()
model_gbr.fit(X_train, np.ravel(y_train))
predictions = model_gbr.predict(X_test)

# root mean squared error


print("Root Mean squared error: %.2f" % sqrt(mean_squared_error(y_test, predictions)))

# prior hours
bikes_hour_df_shift = bikes_hour_df[['dteday','hr','cnt']].groupby(['dteday','hr']).sum().reset_index()
# keep the sorted result; sort_values returns a new frame rather than sorting in place
bikes_hour_df_shift = bikes_hour_df_shift.sort_values(['dteday','hr'])
# shift the counts of the last two hours forward so each row can take into
# consideration how the last two hours went
bikes_hour_df_shift['sum_hr_shift_1'] = bikes_hour_df_shift.cnt.shift(+1)
bikes_hour_df_shift['sum_hr_shift_2'] = bikes_hour_df_shift.cnt.shift(+2)

bike_df_model_ready = pd.merge(bikes_hour_df,
                               bikes_hour_df_shift[['dteday', 'hr', 'sum_hr_shift_1', 'sum_hr_shift_2']],
                               how='inner', on=['dteday', 'hr'])

# drop NAs caused by our shifting fields around
bike_df_model_ready = bike_df_model_ready.dropna()

outcome = 'cnt'
# create a feature list for each modeling - experiment by adding features to
# the exclusion list
features = [feat for feat in list(bike_df_model_ready)
            if feat not in [outcome, 'instant', 'dteday', 'casual', 'registered']]

# split data into train and test portions and model
X_train, X_test, y_train, y_test = train_test_split(
    bike_df_model_ready[features],
    bike_df_model_ready[['cnt']],
    test_size=0.3, random_state=42)

from sklearn.ensemble import GradientBoostingRegressor


model_gbr = GradientBoostingRegressor()
model_gbr.fit(X_train, np.ravel(y_train))
predictions = model_gbr.predict(X_test)

# root mean squared error


print("Root Mean squared error: %.2f" % sqrt(mean_squared_error(y_test, predictions)))

np.mean(bikes_hour_df_shift['sum_hr_shift_1'])

# loop through each feature and calculate the R^2 score


features = ['hr', 'season', 'holiday', 'temp']
from sklearn import linear_model
from sklearn.metrics import r2_score

# split data into train and test portions and model


X_train, X_test, y_train, y_test = train_test_split(
    bike_df_model_ready[features],
    bike_df_model_ready[['cnt']],
    test_size=0.3, random_state=42)

for feat in features:
    # fit a single-feature linear model and score it on the test set
    model_lr = linear_model.LinearRegression()
    model_lr.fit(X_train[[feat]], y_train)
    predictions = model_lr.predict(X_test[[feat]])
    print('R^2 for %s is %f' % (feat, r2_score(y_test, predictions)))

# train the model on training set


model_lr.fit(X_train, y_train)

# make predictions using the testing set


predictions = model_lr.predict(X_test)

# root mean squared error


print("Root Mean squared error: %.2f" % sqrt(mean_squared_error(y_test, predictions)))
print('\n')
print('Intercept: %f' % model_lr.intercept_)

# features with coefficients


feature_coefficients = pd.DataFrame({'coefficients': model_lr.coef_[0],
                                     'features': X_train.columns.values})

feature_coefficients.sort_values('coefficients')

# set up constants for our coefficients


INTERCEPT = -121.029547
COEF_HOLIDAY = -23.426176 # day is holiday or not
COEF_HOUR = 8.631624 # hour (0 to 23)
COEF_SEASON_1 = 3.861149 # 1:spring
COEF_SEASON_2 = -1.624812 # 2:summer
COEF_SEASON_3 = -41.245562 # 3:fall
COEF_SEASON_4 = 39.009224 # 4:winter
COEF_TEMP = 426.900259 # norm temp in Celsius -8 to +39

np.mean(X_train['temp'])

# mean values
MEAN_HOLIDAY = 0.0275 # day is holiday or not
MEAN_HOUR = 11.6 # hour (0 to 23)

MEAN_SEASON_1 = 1 # 1:spring
MEAN_SEASON_2 = 0 # 2:summer
MEAN_SEASON_3 = 0 # 3:fall
MEAN_SEASON_4 = 0 # 4:winter
MEAN_TEMP = 0.4967 # norm temp in Celsius -8 to +39

# try predicting something - 9AM with all other features held constant
rental_counts = (INTERCEPT + (MEAN_HOLIDAY * COEF_HOLIDAY)
                 + (9 * COEF_HOUR)
                 + (MEAN_SEASON_1 * COEF_SEASON_1) + (MEAN_SEASON_2 * COEF_SEASON_2)
                 + (MEAN_SEASON_3 * COEF_SEASON_3) + (MEAN_SEASON_4 * COEF_SEASON_4)
                 + (MEAN_TEMP * COEF_TEMP))

print('Estimated bike rental count for selected parameters: %i' % int(rental_counts))
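
The hand-written estimate above is the linear regression formula: the intercept plus each coefficient multiplied by its input value. A minimal sketch of the same arithmetic as a reusable function follows, so that other hours can be tried without retyping the expression. The function names `predict_rentals` and `normalise_temp` are hypothetical, and the temperature scaling assumes, based on the "-8 to +39" comment above, that `temp` was min-max normalised over that Celsius range.

```python
# Coefficients copied from the constants listed in this report.
INTERCEPT = -121.029547
COEFFICIENTS = {
    'holiday': -23.426176,
    'hr': 8.631624,
    'season_1': 3.861149,
    'season_2': -1.624812,
    'season_3': -41.245562,
    'season_4': 39.009224,
    'temp': 426.900259,
}


def normalise_temp(celsius, t_min=-8.0, t_max=39.0):
    """Assumed min-max scaling of Celsius to the 0..1 range used by `temp`."""
    return (celsius - t_min) / (t_max - t_min)


def predict_rentals(inputs):
    """Intercept plus coefficient * value; features not supplied count as zero."""
    return INTERCEPT + sum(COEFFICIENTS[name] * value
                           for name, value in inputs.items())


# Same inputs as the 9 AM estimate: mean holiday rate, spring, mean temperature.
baseline = {'holiday': 0.0275, 'hr': 9, 'season_1': 1, 'temp': 0.4967}
estimate = predict_rentals(baseline)
print('Estimated rentals at 9 AM: %i' % int(estimate))

# Change only the hour to see the effect of the hour coefficient.
evening = dict(baseline, hr=18)
print('Estimated rentals at 6 PM: %i' % int(predict_rentals(evening)))
```

Because the model is linear in `hr`, each additional hour adds the same number of predicted rentals, so it cannot represent the separate morning and evening peaks visible in the hourly bar chart.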

RESULTS

After evaluating various machine learning algorithms, the random forest
algorithm was found to be the best-performing model, with the lowest MSE,
RMSE, and the highest R^2 score. The random forest model was able to
accurately predict bike rentals based on various features such as weather
conditions, day of the week, time of day, and holiday status.

The feature importance analysis revealed that weather conditions,
particularly temperature and humidity, were the most influential factors in
predicting bike rentals. Day of the week and time of day also played
significant roles, with higher rentals during weekends and peak commuting
hours. The model also captured the impact of holidays, with lower bike
rentals on holidays.
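
One common, model-agnostic way to rank features is permutation importance: shuffle one feature column and measure how much the prediction error grows. The sketch below illustrates the idea on synthetic data with a toy model standing in for a fitted regressor. It is an assumption about how such an analysis could be done, not the procedure used in this project, and the helper `permutation_importance`, the data, and `toy_model` are all hypothetical.

```python
import random


def mse(y_true, y_pred):
    """Mean squared error of two equal-length sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, rows, targets, feature, seed=0):
    """Increase in MSE after shuffling one feature column (hypothetical helper)."""
    baseline = mse(targets, [predict(r) for r in rows])
    shuffled_values = [r[feature] for r in rows]
    random.Random(seed).shuffle(shuffled_values)
    # Rebuild the rows with only the chosen feature permuted.
    shuffled_rows = [dict(r, **{feature: v}) for r, v in zip(rows, shuffled_values)]
    return mse(targets, [predict(r) for r in shuffled_rows]) - baseline


# Synthetic data: rentals depend on temperature, not on the 'noise' column.
rng = random.Random(42)
rows = [{'temp': rng.random(), 'noise': rng.random()} for _ in range(200)]
targets = [400 * r['temp'] + 20 for r in rows]


def toy_model(row):
    # Stand-in for a fitted model; it ignores the 'noise' feature entirely.
    return 400 * row['temp'] + 20


importances = {name: permutation_importance(toy_model, rows, targets, name)
               for name in ('temp', 'noise')}
print(importances)
```

A feature the model does not use gets an importance of zero, while a feature the predictions depend on produces a clear increase in error when shuffled.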

CONCLUSION

The developed bike rent prediction model utilizing the random forest
algorithm showed promising results in accurately forecasting bike rentals
based on various features. The model can be used by bike rental companies
to optimize their operations, ensure adequate bike availability, and improve
customer satisfaction. Further improvements can be made by incorporating
additional data, such as marketing promotions, special events, and bike
maintenance schedules, to enhance the model's accuracy and predictive
capabilities.

