Report Format Merged
on
Submitted to
KIIT Deemed to be University
BACHELOR’S DEGREE IN
INFORMATION TECHNOLOGY
BY
CERTIFICATE
This is to certify that the project entitled
“BIKE SHARING RENTAL PREDICTION”
submitted by
Date: / /
We are profoundly grateful to Dr. Jayanti Dansana of Affiliation for his expert
guidance and continuous encouragement throughout, which ensured that this project
reached its target from its commencement to its completion.
Introduction
METHODOLOGY
The bike rental prediction model was developed using a supervised
machine learning approach. The dataset used for training and
testing the model was obtained from a bike rental company and
consisted of historical data on bike rentals, including features such
as weather conditions, day of the week, time of day, and holiday
status. The dataset was preprocessed to handle missing values and
encode categorical variables, and additional features were engineered from it.
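The preprocessing steps described above can be sketched on a small, made-up frame. The column names mirror those used later in this report (`season`, `weathersit`, `temp`, `cnt`), but the values and the median-fill strategy are assumptions for illustration, not the steps actually run for this project.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical values), with one missing temperature
toy = pd.DataFrame({
    'season':     [1, 2, 3, 4],
    'weathersit': [1, 2, 1, 3],
    'temp':       [0.24, np.nan, 0.70, 0.30],
    'cnt':        [16, 40, 120, 32],
})

# 1. Missing values: fill numeric gaps with the column median (an assumed strategy)
toy['temp'] = toy['temp'].fillna(toy['temp'].median())

# 2. Categorical variables: one-hot encode the integer-coded categories
toy_ready = pd.get_dummies(toy, columns=['season', 'weathersit'])

print(toy_ready.columns.tolist())
```

Filling with the median is only one option; dropping incomplete rows, as the listing later does with `dropna()`, is another.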
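import os

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn import linear_model
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# confirm the working directory
os.getcwd()

# load the hourly and daily rental records
bikes_hour_df_raws = pd.read_csv('C:\\Users\\KIIT\\PycharmProjects\\tandtlab\\hour.csv')
bike_day_df_raws = pd.read_csv('C:\\Users\\KIIT\\PycharmProjects\\tandtlab\\day.csv')
bikes_hour_df_raws.head()
bike_day_df_raws.head()

# assumed step (the assignment is missing from this listing): 'casual' and
# 'registered' add up to 'cnt', so they are dropped to avoid leaking the target
bikes_hour_df = bikes_hour_df_raws.drop(['casual', 'registered'], axis=1)
bikes_hour_df.info()
bikes_hour_df['cnt'].describe()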
# sorted hourly rental counts
fig, ax = plt.subplots(1)
ax.plot(sorted(bikes_hour_df['cnt']), color='blue', marker='*', label='cnt')
ax.legend(loc='upper left')
ax.set_ylabel('Sorted Rental Counts', fontsize=10)
fig.suptitle('Recorded Bike Rental Counts', fontsize=10)
plt.show()

# scatter plot of each numerical feature against the rental count,
# one figure per feature
for feat in ['temp', 'atemp', 'hum', 'windspeed']:
    plt.scatter(bikes_hour_df[feat], bikes_hour_df['cnt'])
    plt.suptitle('Numerical Feature: Cnt v/s ' + feat)
    plt.xlabel(feat)
    plt.ylabel('Count of all Bikes Rented')
    plt.show()
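The visual impression from the scatter plots can be backed by a number. The sketch below computes the Pearson correlation of each numerical feature with the count on a toy frame; the values are hypothetical and chosen only to show the call, so they say nothing about the real dataset.

```python
import pandas as pd

# Toy data (hypothetical values): cnt rises with temp and falls with hum
toy = pd.DataFrame({
    'temp': [0.2, 0.4, 0.6, 0.8],
    'hum':  [0.9, 0.7, 0.6, 0.4],
    'cnt':  [20, 45, 70, 95],
})

# Correlation of every feature with the outcome, leaving out cnt itself
corr_with_cnt = toy.corr()['cnt'].drop('cnt')
print(corr_with_cnt)
```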
# side-by-side bar charts of total rentals by season and by weather situation
# (the figure creation is missing from this listing and is assumed here)
f, (ax1, ax2) = plt.subplots(nrows=1, ncols=2)

ax1 = bikes_hour_df[['season', 'cnt']].groupby(['season']).sum().reset_index().plot(
    kind='bar', legend=False, title="Counts of Bike Rentals by season",
    stacked=True, fontsize=12, ax=ax1)
ax1.set_xlabel("season", fontsize=12)
ax1.set_ylabel("Count", fontsize=12)
ax1.set_xticklabels(['spring', 'summer', 'fall', 'winter'])

ax2 = bikes_hour_df[['weathersit', 'cnt']].groupby(['weathersit']).sum().reset_index().plot(
    kind='bar', legend=False, stacked=True,
    title="Counts of Bike Rentals by weathersit", fontsize=12, ax=ax2)
ax2.set_xlabel("weathersit", fontsize=12)
ax2.set_ylabel("Count", fontsize=12)
ax2.set_xticklabels(['1: Clear', '2: Mist', '3: Light Snow', '4: Heavy Rain'])
f.tight_layout()
# total rentals for each hour of the day
ax = bikes_hour_df[['hr', 'cnt']].groupby(['hr']).sum().reset_index().plot(
    kind='bar', figsize=(8, 6), legend=False,
    title="Total Bike Rentals by Hour", color='orange', fontsize=12)
ax.set_xlabel("Hour", fontsize=12)
ax.set_ylabel("Count", fontsize=12)
plt.show()
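bikes_df_model_data = bikes_hour_df.copy()
outcome = 'cnt'
# make the feature list for each modeling experiment by adding features to the
# exclusion list
features = [feat for feat in list(bikes_df_model_data)
            if feat not in [outcome, 'instant', 'dteday']]

# hold out a test set (assumed step: the split is missing from this listing, and
# the 30% test size and the seed are illustrative)
X_train, X_test, y_train, y_test = train_test_split(
    bikes_df_model_data[features], bikes_df_model_data[outcome],
    test_size=0.3, random_state=42)

# train the linear regression model on the training set
lr_model = linear_model.LinearRegression()
lr_model.fit(X_train, y_train)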
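def prepare_data_for_model(raw_dataframe,
                           target_columns,
                           drop_first = False,
                           make_na_col = True):
    # body reconstructed from the comments and arguments in this listing (an
    # assumption): one-hot encode the target columns with pandas
    dataframe_dummy = pd.get_dummies(raw_dataframe, columns=target_columns,
                                     drop_first=drop_first,
                                     dummy_na=make_na_col)
    return dataframe_dummy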
# working copy of the hourly data (assumed step; the assignment is missing
# from this listing)
bike_df_model_ready = bikes_hour_df.copy()

# dummify categorical columns
bike_df_model_ready = prepare_data_for_model(bike_df_model_ready,
                                             target_columns = ['season',
                                                               'weekday',
                                                               'weathersit'],
                                             drop_first = True)
# remove the rows with NaN values from the dataframe, as most are in the
# outcome variable and we can't use them
bike_df_model_ready = bike_df_model_ready.dropna()
outcome = 'cnt'
features = [feat for feat in list(bike_df_model_ready)
            if feat not in [outcome, 'instant', 'dteday']]
bike_df_model_ready[['weathersit_2.0', 'weathersit_3.0', 'weathersit_4.0']].head()
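To make the effect of the `drop_first` argument concrete, the sketch below encodes a toy `weathersit` column with and without it. The data is hypothetical, and it calls `pd.get_dummies` directly on the assumption that `prepare_data_for_model` wraps that function.

```python
import pandas as pd

# Toy column (hypothetical values) with three weather categories
toy = pd.DataFrame({'weathersit': [1, 2, 3, 1]})

# One indicator column per category
full = pd.get_dummies(toy, columns=['weathersit'])

# First category dropped; it becomes the baseline the others are compared to
reduced = pd.get_dummies(toy, columns=['weathersit'], drop_first=True)

print(full.columns.tolist())
print(reduced.columns.tolist())
```

Dropping the first level avoids perfectly collinear indicator columns, which matters for a linear model with an intercept.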
# start again from the hourly data for the run without drop_first (assumed
# step; the assignment is missing from this listing)
bike_df_model_ready = bikes_hour_df.copy()

# dummify categorical columns
bike_df_model_ready = prepare_data_for_model(bike_df_model_ready,
                                             target_columns = ['season', 'weekday',
                                                               'weathersit'])
list(bike_df_model_ready.head(1).values)
# remove the rows with NaN values from the dataframe, as most are in the
# outcome variable and we can't use them
bike_df_model_ready = bike_df_model_ready.dropna()
outcome = 'cnt'
features = [feat for feat in list(bike_df_model_ready)
            if feat not in [outcome, 'instant', 'dteday']]
# prior hours
bikes_hour_df_shift = bikes_hour_df[['dteday', 'hr', 'cnt']].groupby(['dteday', 'hr']).sum().reset_index()
# keep the sorted result; sort_values does not sort in place
bikes_hour_df_shift = bikes_hour_df_shift.sort_values(['dteday', 'hr'])
# shift the counts of the last two hours forward so the new count can take into
# consideration how the last two hours went
bikes_hour_df_shift['sum_hr_shift_1'] = bikes_hour_df_shift.cnt.shift(+1)
bikes_hour_df_shift['sum_hr_shift_2'] = bikes_hour_df_shift.cnt.shift(+2)
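What the two shifted columns hold is easiest to see on a few rows. This is a minimal sketch on hypothetical counts, using the same column names as the listing above.

```python
import pandas as pd

# Toy hourly counts (hypothetical values), already sorted by time
toy = pd.DataFrame({'hr': [0, 1, 2, 3], 'cnt': [10, 20, 30, 40]})

# shift(1) moves each value one row down, so row i holds the count of row i-1
toy['sum_hr_shift_1'] = toy['cnt'].shift(1)
toy['sum_hr_shift_2'] = toy['cnt'].shift(2)

# The first rows have no earlier hour to look at and come out as NaN
print(toy)
```

Because the leading rows contain NaN, they have to be dropped or filled before the frame is passed to a model.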
bike_df_model_ready = pd.merge(bikes_hour_df,
                               bikes_hour_df_shift[['dteday', 'hr', 'sum_hr_shift_1', 'sum_hr_shift_2']],
                               how='inner', on=['dteday', 'hr'])
outcome = 'cnt'
# create a feature list for each modeling experiment by adding features to
# the exclusion list
features = [feat for feat in list(bike_df_model_ready)
            if feat not in [outcome, 'instant', 'dteday', 'casual', 'registered']]
np.mean(bikes_hour_df_shift['sum_hr_shift_1'])
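# hold out a test set from the lag-augmented data (assumed step: the split is
# missing from this listing; the test size and the seed are illustrative)
model_data = bike_df_model_ready.dropna()
X_train, X_test, y_train, y_test = train_test_split(
    model_data[features], model_data[outcome], test_size=0.3, random_state=42)
for feat in features:
    model_lr = linear_model.LinearRegression()
    model_lr.fit(X_train[[feat]], y_train)
    predictions = model_lr.predict(X_test[[feat]])
    print('R^2 for %s is %f' % (feat, r2_score(y_test, predictions)))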
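The score printed for each feature is the coefficient of determination, R^2 = 1 - SS_res / SS_tot. The sketch below computes it by hand and compares it with `r2_score`; the observed and predicted counts are hypothetical.

```python
import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([10.0, 20.0, 30.0, 40.0])   # hypothetical observed counts
y_pred = np.array([12.0, 18.0, 33.0, 37.0])   # hypothetical predictions

ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
r2_manual = 1.0 - ss_res / ss_tot

print(r2_manual, r2_score(y_true, y_pred))
```

A value near 1 means the single feature explains most of the variance in the counts. A value near 0, or a negative one, means it does no better than predicting the mean.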
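model_lr = linear_model.LinearRegression().fit(X_train, y_train)  # refit on all features (assumed step)
feature_coefficients = pd.DataFrame({'features': X_train.columns, 'coefficients': model_lr.coef_})
feature_coefficients.sort_values('coefficients')
np.mean(X_train['temp'])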
# mean values
# (INTERCEPT and the COEF_* constants are the fitted model's intercept and
# coefficients; their values are not given in this listing)
MEAN_HOLIDAY = 0.0275 # day is holiday or not
MEAN_HOUR = 11.6 # hour (0 to 23)
MEAN_SEASON_1 = 1 # 1:spring
MEAN_SEASON_2 = 0 # 2:summer
MEAN_SEASON_3 = 0 # 3:fall
MEAN_SEASON_4 = 0 # 4:winter
MEAN_TEMP = 0.4967 # norm temp in Celsius -8 to +39
# try predicting something - 9AM with all other features held constant
rental_counts = (INTERCEPT + (MEAN_HOLIDAY * COEF_HOLIDAY)
                 + (9 * COEF_HOUR)
                 + (MEAN_SEASON_1 * COEF_SEASON_1) + (MEAN_SEASON_2 * COEF_SEASON_2)
                 + (MEAN_SEASON_3 * COEF_SEASON_3) + (MEAN_SEASON_4 * COEF_SEASON_4)
                 + (MEAN_TEMP * COEF_TEMP))
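The hand calculation above is the linear model's own prediction rule: the intercept plus each feature value times its coefficient. The sketch below checks this on synthetic data; the two features, the generating formula and every variable name are assumptions for illustration, not the model fitted in this report.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data (hypothetical): cnt = 5 + 8*hr + 100*temp exactly
rng = np.random.default_rng(0)
X = np.column_stack([rng.integers(0, 24, size=50), rng.random(50)])
y = 5.0 + 8.0 * X[:, 0] + 100.0 * X[:, 1]

model = LinearRegression().fit(X, y)
intercept = model.intercept_
coef_hour, coef_temp = model.coef_

# Predict 9AM with temp held at its mean, first by hand, then with the model
mean_temp = X[:, 1].mean()
by_hand = intercept + 9 * coef_hour + mean_temp * coef_temp
by_model = model.predict(np.array([[9, mean_temp]]))[0]

print(by_hand, by_model)
```

Reading the intercept and coefficients from `intercept_` and `coef_` avoids copying rounded constants by hand.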
RESULTS
CONCLUSION
The bike rental prediction model developed with the random forest
algorithm showed promising results in forecasting bike rentals
from features such as weather, time of day, and holiday status. The model
can be used by bike rental companies to optimize their operations, ensure
adequate bike availability, and improve customer satisfaction. Further
improvements can be made by incorporating additional data, such as
marketing promotions, special events, and bike maintenance schedules, to
enhance the model's accuracy and predictive capabilities.