Yulu Case Study
About Yulu
Yulu is India’s leading micro-mobility service provider, which offers unique vehicles for the daily commute. Starting off as a mission to eliminate traffic congestion in India, Yulu provides the safest commute
solution through a user-friendly mobile app to enable shared, solo and sustainable commuting.
Yulu zones are located at all the appropriate locations (including metro stations, bus stands, office spaces, residential areas, corporate offices, etc) to make those first and last miles smooth, affordable, and
convenient!
Yulu has recently suffered considerable dips in its revenues. They have contracted a consulting company to understand the factors on which the demand for these shared electric cycles depends. Specifically,
they want to understand the factors affecting the demand for these shared electric cycles in the Indian market.
Business Problem
The company wants to know:
1. Which variables are significant in predicting the demand for shared electric cycles in the Indian market?
2. How well do those variables describe the demand for these electric cycles?
Attribute Information
datetime: date and hour of the record
season: 1: spring, 2: summer, 3: fall, 4: winter
holiday: whether the day is a holiday (1) or not (0)
workingday: 1 if the day is neither a weekend nor a holiday, otherwise 0
weather:
1: Clear, Few clouds, Partly cloudy
2: Mist + Cloudy, Mist + Broken clouds, Mist + Few clouds, Mist
3: Light Snow, Light Rain + Thunderstorm + Scattered clouds, Light Rain + Scattered clouds
4: Heavy Rain + Ice Pellets + Thunderstorm + Mist, Snow + Fog
temp: temperature in Celsius
atemp: "feels like" temperature in Celsius
humidity: relative humidity
windspeed: wind speed
casual: count of casual (unregistered) users
registered: count of registered users
count: count of total rental bikes, including both casual and registered
Problem Statement
1. Define Problem Statement and perform Exploratory Data Analysis.
2. Hypothesis Testing:
a. 2-Sample T-Test to check if Working Day has an effect on the number of electric cycles rented
b. ANOVA to check if the number of cycles rented is similar or different across 1. weather 2. season
c. Chi-square test to check if Weather is dependent on the season
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.stats import ttest_ind, ttest_1samp, ttest_rel, chi2_contingency, f_oneway, chisquare, levene, shapiro, boxcox
from statsmodels.graphics.gofplots import qqplot
%matplotlib inline
import warnings
warnings.filterwarnings('ignore')
In [144]: df = pd.read_csv("C:\\Users\\vidya\\Downloads\\bike_sharing.txt")
In [145]: df
Out[145]: datetime season holiday workingday weather temp atemp humidity windspeed casual registered count
... ... ... ... ... ... ... ... ... ... ... ... ...
In [146]: df.shape
Out[146]: (10886, 12)
In [147]: df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10886 entries, 0 to 10885
Data columns (total 12 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 datetime 10886 non-null object
1 season 10886 non-null int64
2 holiday 10886 non-null int64
3 workingday 10886 non-null int64
4 weather 10886 non-null int64
5 temp 10886 non-null float64
6 atemp 10886 non-null float64
7 humidity 10886 non-null int64
8 windspeed 10886 non-null float64
9 casual 10886 non-null int64
10 registered 10886 non-null int64
11 count 10886 non-null int64
dtypes: float64(3), int64(8), object(1)
memory usage: 1020.7+ KB
In [148]: df.isna().sum()
Out[148]:
datetime 0
season 0
holiday 0
workingday 0
weather 0
temp 0
atemp 0
humidity 0
windspeed 0
casual 0
registered 0
count 0
dtype: int64
In [149]: df[df.duplicated()]
Out[149]: datetime season holiday workingday weather temp atemp humidity windspeed casual registered count
(empty — the dataset contains no duplicate rows)
Datatype Validation
In [150]: df.dtypes
Out[150]:
datetime object
season int64
holiday int64
workingday int64
weather int64
temp float64
atemp float64
humidity int64
windspeed float64
casual int64
registered int64
count int64
dtype: object
The "datetime" column is of object dtype, not a datetime dtype, so it needs to be converted.
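The conversion cell itself is not visible in this export; a minimal sketch of the conversion using `pd.to_datetime`, on a small stand-in frame:

```python
import pandas as pd

# Small stand-in frame; in the notebook this is the bike_sharing dataset
df = pd.DataFrame({"datetime": ["2011-01-01 00:00:00", "2011-01-01 01:00:00"]})
print(df["datetime"].dtype)   # object

# Convert the string column to a proper datetime dtype
df["datetime"] = pd.to_datetime(df["datetime"])
print(df["datetime"].dtype)   # datetime64[ns]
```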
In [152]: df.describe()
Out[152]: season holiday workingday weather temp atemp humidity windspeed casual registered count
count 10886.000000 10886.000000 10886.000000 10886.000000 10886.00000 10886.000000 10886.000000 10886.000000 10886.000000 10886.000000 10886.000000
mean 2.506614 0.028569 0.680875 1.418427 20.23086 23.655084 61.886460 12.799395 36.021955 155.552177 191.574132
std 1.116174 0.166599 0.466159 0.633839 7.79159 8.474601 19.245033 8.164537 49.960477 151.039033 181.144454
min 1.000000 0.000000 0.000000 1.000000 0.82000 0.760000 0.000000 0.000000 0.000000 0.000000 1.000000
25% 2.000000 0.000000 0.000000 1.000000 13.94000 16.665000 47.000000 7.001500 4.000000 36.000000 42.000000
50% 3.000000 0.000000 1.000000 1.000000 20.50000 24.240000 62.000000 12.998000 17.000000 118.000000 145.000000
75% 4.000000 0.000000 1.000000 2.000000 26.24000 31.060000 77.000000 16.997900 49.000000 222.000000 284.000000
max 4.000000 1.000000 1.000000 4.000000 41.00000 45.455000 100.000000 56.996900 367.000000 886.000000 977.000000
The mean temperature is 20.23 °C and the median is 20.50 °C, so the temperature distribution is roughly symmetric.
About 68% of the records fall on working days, which makes sense as a lot of people use shared transport to commute on working days.
75% of the records have a temperature at or below 26.24 °C.
In [153]: df.nunique()
Out[153]:
datetime 10886
season 4
holiday 2
workingday 2
weather 4
temp 49
atemp 60
humidity 89
windspeed 28
casual 309
registered 731
count 822
dtype: int64
season, holiday, workingday and weather are categorical variables, so their dtypes are updated accordingly.
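The conversion cell is likewise not visible in the export; a sketch of the cast, assuming the standard `astype("category")` call, on a stand-in frame:

```python
import pandas as pd

# Stand-in frame with the four label-coded columns from the dataset
df = pd.DataFrame({"season": [1, 2, 3, 4], "holiday": [0, 0, 1, 0],
                   "workingday": [1, 1, 0, 1], "weather": [1, 2, 1, 3]})

# Cast each label-coded column to the pandas category dtype
for col in ["season", "holiday", "workingday", "weather"]:
    df[col] = df[col].astype("category")

print(df.dtypes)  # all four columns now show dtype 'category'
```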
In [155]: df.dtypes
Out[155]:
datetime datetime64[ns]
season category
holiday category
workingday category
weather category
temp float64
atemp float64
humidity int64
windspeed float64
casual int64
registered int64
count int64
dtype: object
Derived Column
In [156]: bins = [0, 40, 100, 200, 300, 500, 700, 900, 1000]
group = ['Low', 'Average', 'medium', 'H1', 'H2', 'H3', 'H4', 'Very high']
df['Rent_count'] = pd.cut(df['count'], bins, labels=group)  # Create new categorical column
In [157]: df
Out[157]: datetime season holiday workingday weather temp atemp humidity windspeed casual registered count Rent_count
... ... ... ... ... ... ... ... ... ... ... ... ... ...
In [158]: df.season.value_counts()
Out[158]:
4 2734
2 2733
3 2733
1 2686
Name: season, dtype: int64
In [159]: df.weather.value_counts()
Out[159]:
1 7192
2 2834
3 859
4 1
Name: weather, dtype: int64
In [160]: df.workingday.nunique()
Out[160]: 2
In [161]: df.humidity.nunique()
Out[161]: 89
There are 4 seasons in the Yulu dataset, and demand is almost equal across all of them.
Weather 1 (Clear, Few clouds, Partly cloudy) has far higher demand for shared electric cycles than the other weather conditions.
workingday has only two unique values: weekend/holiday and weekday.
Univariate Analysis
In [162]: col_category = ["season", "holiday", "workingday", "weather"]
fig, axis = plt.subplots(2, 2, figsize=(14, 10))
for index, col in enumerate(col_category):
    sns.countplot(x=df[col], ax=axis[index // 2, index % 2])
plt.show()
1. All seasons have almost the same count; the differences are negligible.
2. Holiday vs. working day is highly imbalanced, because far fewer people use the vehicles on holidays.
3. For weather, weather 1 (clear) has the maximum demand; demand decreases as the weather changes to mist and then light snow, and is almost negligible in heavy rain, since riding a bike in such weather is risky.
4. One more categorical variable was created to bin the rental counts into Low, Medium, High, etc.; it shows a log-normal-like shape, with Low occurring most often and the higher bins progressively less.
5. The data looks as expected: a roughly equal number of days in each season, more working days, and weather that is mostly Clear, Few clouds, Partly cloudy.
In [163]: col_numerical = ['temp', 'atemp', 'humidity', 'windspeed', 'casual', 'registered', 'count']
plt.figure(figsize=(12, 8))
sns.histplot(df[col_numerical[-1]], kde=True, color="r")
plt.show()
1. casual, registered and count look roughly log-normally distributed.
2. temp, atemp and humidity appear to follow a normal distribution.
1. Regardless of season, weather has a strong impact: demand is highest in clear weather, lower in mist, lower still in light snow, and lowest in heavy rain, as the plot above shows.
2. Demand for Yulu bikes is higher on working days, since they are used as transport to commute to offices.
3. On weekdays as well as holidays/weekends, demand for Yulu bikes is high when the weather is clear or has few clouds.
1. In the spring season the total count of rental bikes is higher than in other seasons.
2. Whenever there is rain, thunderstorm, snow or fog, fewer bikes were rented.
Bivariate Analysis
In [166]: col_numerical = ['temp', 'atemp', 'humidity', 'windspeed', 'casual', 'registered', 'count']
fig, axis = plt.subplots(nrows=3, ncols=2, figsize=(16, 14))
index = 0
for row in range(3):
    for col in range(2):
        sns.boxplot(x=df[col_numerical[index]], ax=axis[row, col], color="aqua")
        index += 1
plt.show()
sns.boxplot(x=df[col_numerical[-1]])
plt.show()
In [167]: sns.pairplot(df, kind='reg', hue="weather")
Out[167]: <seaborn.axisgrid.PairGrid at 0x29d29fd5310>
In [168]: plt.figure(figsize=(12,8))
sns.heatmap(df.corr(), annot=True, cmap="Blues", linewidth=.5)
Out[168]: <AxesSubplot:>
To verify the ANOVA assumptions we plot QQ plots of all the numerical attributes and can observe:
1. casual, registered and count look roughly log-normal; their points are not aligned to the red reference line.
2. temp, atemp and humidity appear close to normal; their points are aligned to the red reference line.
3. windspeed is right-skewed; its points are not aligned to the red reference line.
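The QQ-plot cell itself is not shown in the export. A sketch of what such plots look like, using scipy's `probplot` on synthetic stand-ins for the columns (the notebook imports statsmodels' `qqplot`, which produces equivalent plots):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend for scripted runs
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(42)
normal_like = rng.normal(20, 8, 1000)    # stand-in for temp/atemp/humidity
skewed_like = rng.lognormal(4, 1, 1000)  # stand-in for casual/registered/count

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
stats.probplot(normal_like, dist="norm", plot=axes[0])
axes[0].set_title("normal-like: points hug the reference line")
stats.probplot(skewed_like, dist="norm", plot=axes[1])
axes[1].set_title("skewed: points bend away from the line")
plt.show()
```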
2-Sample T-Test
Performing a 2-sample t-test on working-day and non-working-day counts.
Null hypothesis H0: the mean rental count on non-working days equals the mean rental count on working days.
Alternate hypothesis Ha: the mean rental count on non-working days does not equal the mean rental count on working days.
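The cells that build the two groups are not visible in the export; the names `df_workingday_count` and `df_non_workingday_count` below are taken from the later test cells, and the data here is a synthetic stand-in:

```python
import numpy as np
import pandas as pd
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
# Synthetic stand-in for the bike-sharing frame
df = pd.DataFrame({"workingday": rng.integers(0, 2, 500),
                   "count": rng.poisson(190, 500)})

# Split rental counts by working-day status, as the later cells assume
df_workingday_count = df.loc[df["workingday"] == 1, "count"]
df_non_workingday_count = df.loc[df["workingday"] == 0, "count"]

t_stat, p_value = ttest_ind(df_workingday_count, df_non_workingday_count)
print(f"t = {t_stat:.3f}, p = {p_value:.3f}")
```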
In [170]: df.loc[df['workingday']==1]['count'].plot(kind='kde')
Out[170]: <AxesSubplot:ylabel='Density'>
p_value = [0.22644804]
Since the p-value is greater than 0.05, we fail to reject the null hypothesis.
So we can say that working-day status has no effect on the rental counts.
Hypothesis Testing
1. 2-Sample T-Test to check if Working Day has an effect on the number of electric cycles rented.
2. ANOVA to check if the number of cycles rented is similar or different across different weather conditions and seasons.
3. Chi-square test to check if Weather is dependent on the season.
Out[173]: Ttest_indResult(statistic=109.95076974934595, pvalue=0.0)
Out[174]: 191.57413191254824
1. Working Day has an effect on the number of electric cycles rented
2. Number of cycles rented is similar or different in different seasons
3. Number of cycles rented is similar or different in different weather
4. Weather is dependent on season (a check between 2 categorical variables)
The first 3 statements involve one numerical variable (count) and one categorical variable (workingday, season or weather), so for these we use a t-test or ANOVA (numeric vs. categorical). The 4th involves two categorical variables, so we use the chisquare or chi2_contingency test.
Out[175]: 191.57413191254824
Out[176]: 193.01187263896384
Out[177]: 188.50662061024755
Using ANOVA
In [178]: #H0 = Working day does not have any effect on the number of cycles rented.
#HA = Working day has an effect on the number of cycles rented.
#Test statistic and p_value
#We consider alpha = 0.05, i.e. 95% confidence
alpha = 0.05
f_stat, p_value = f_oneway(df_workingday_count, df_non_workingday_count)
print(f"Test statistic = {f_stat} pvalue = {p_value}")
if (p_value < alpha):
    print("Reject Null Hypothesis")
else:
    print("Fail to reject Null Hypothesis")
Using ttest
In [179]: #H0 = Working day does not have any effect on the number of cycles rented.
#HA = Working day has a positive effect on the number of cycles rented, i.e. mu1 > mu2 (right-tailed).
#Test statistic and p_value
#We consider alpha = 0.01, i.e. 99% confidence
alpha = 0.01
t_stat, p_value = ttest_ind(df_workingday_count, df_non_workingday_count, alternative="greater")
print(f"Test statistic = {t_stat} pvalue = {p_value}")
if (p_value < alpha):
    print("Reject Null Hypothesis")
else:
    print("Fail to reject Null Hypothesis")
In [182]: #We take samples from each group to pass to shapiro, since the Shapiro-Wilk
#normality test works best with moderate sample sizes (roughly 50 to 200 values),
#so a subset of 100 values was drawn from each group.
Out[183]: 1.0147116860043298e-118
In all of the above 4 tests the p-value is almost 0 (on the order of 1e-6 or smaller), which is less than alpha, so we reject the null hypothesis that these samples come from a normal distribution.
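The sampling-then-Shapiro step described above can be sketched as follows, on synthetic right-skewed data standing in for one group's rental counts:

```python
import numpy as np
from scipy.stats import shapiro

rng = np.random.default_rng(1)
# Stand-in for one group's rental counts (right-skewed, like `count`)
counts = rng.lognormal(mean=4.5, sigma=1.0, size=3000)

# Shapiro-Wilk works best on moderate samples, so test a subset of 100 values
sample = rng.choice(counts, size=100, replace=False)
stat, p_value = shapiro(sample)
print(f"W = {stat:.3f}, p = {p_value:.2e}")
if p_value < 0.01:
    print("Reject normality for this group")
```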
In [188]: #H0 = Season does not have any effect on the number of cycles rented.
#HA = At least one season out of four (1: spring, 2: summer, 3: fall, 4: winter) has a different mean number of cycles rented.
#Test statistic and p_value
#We consider alpha = 0.01, i.e. 99% confidence
alpha = 0.01
f_stat, p_value = f_oneway(df_season1_spring, df_season2_summer, df_season3_fall, df_season4_winter)
print(f"Test statistic = {f_stat} pvalue = {p_value}")
if (p_value < alpha):
    print("Reject Null Hypothesis")
else:
    print("Fail to reject Null Hypothesis")
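The cells that build the four season groups are not shown in the export; the names in the `f_oneway` call above suggest splits like the following, sketched here on synthetic data:

```python
import numpy as np
import pandas as pd
from scipy.stats import f_oneway

rng = np.random.default_rng(2)
# Synthetic stand-in: season codes 1-4 with Poisson rental counts
df = pd.DataFrame({"season": rng.integers(1, 5, 800),
                   "count": rng.poisson(190, 800)})

# One Series of rental counts per season, as the ANOVA cell assumes
df_season1_spring = df.loc[df["season"] == 1, "count"]
df_season2_summer = df.loc[df["season"] == 2, "count"]
df_season3_fall = df.loc[df["season"] == 3, "count"]
df_season4_winter = df.loc[df["season"] == 4, "count"]

f_stat, p_value = f_oneway(df_season1_spring, df_season2_summer,
                           df_season3_fall, df_season4_winter)
print(f"F = {f_stat:.3f}, p = {p_value:.3f}")
```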
In [189]: #As we have 4 different weather conditions, a two-sample t-test will not work here; we need ANOVA.
Out[190]: 205.23679087875416
Out[191]: 178.95553987297106
Out[192]: 118.84633294528521
Out[193]: 164.0
Out[195]: 3.504937946833238e-35
Out[197]: ShapiroResult(statistic=0.8909225463867188, pvalue=0.0)
In [198]: shapiro(df_weather2_Mist)
Out[198]: ShapiroResult(statistic=0.8767688274383545, pvalue=9.781063280987223e-43)
In [199]: shapiro(df_weather3_LightSnow)
Out[199]: ShapiroResult(statistic=0.7674333453178406, pvalue=3.876134581802921e-33)
Using ANOVA
In [200]: #H0 = Weather does not have any effect on the number of cycles rented.
#HA = At least one weather condition out of four (1: clear, 2: mist, 3: light snow, 4: heavy rain) has a different mean number of cycles rented.
#Test statistic and p_value
#We consider alpha = 0.01, i.e. 99% confidence
alpha = 0.01
f_stat, p_value = f_oneway(df_weather1_clear, df_weather2_Mist, df_weather3_LightSnow, df_weather4_HeavyRain)
print(f"Test statistic = {f_stat} pvalue = {p_value}")
if (p_value < alpha):
    print("Reject Null Hypothesis")
else:
    print("Fail to reject Null Hypothesis")
In [201]: #H0 = Weather does not have any effect on the number of cycles rented.
#HA = At least one weather condition out of three (1: clear, 2: mist, 3: light snow) has a different mean number of cycles rented.
#Weather 4 is excluded here since it has only a single observation.
#We consider alpha = 0.01, i.e. 99% confidence
alpha = 0.01
f_stat, p_value = f_oneway(df_weather1_clear, df_weather2_Mist, df_weather3_LightSnow)
print(f"Test statistic = {f_stat} pvalue = {p_value}")
if (p_value < alpha):
    print("Reject Null Hypothesis")
else:
    print("Fail to reject Null Hypothesis")
As we can see, the p-value is extremely low, so we reject the null hypothesis: weather 4 has a negligible rent count while clear weather (and, to a lesser extent, mist and light snow) sees a good number of bikes rented. So weather does impact demand; the groups are not all similar.
Using the chi-square test
In [202]: val = pd.crosstab(index=df["weather"], columns=df["season"])
print(val)
chisquare(val)
season 1 2 3 4
weather
1 1759 1801 1930 1702
2 715 708 604 807
3 211 224 199 225
4 1 0 0 0
Out[202]: Power_divergenceResult(statistic=array([2749.33581534, 2821.39590194, 3310.63995609, 2531.07388442]), pvalue=array([0., 0., 0., 0.]))
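Note that `chisquare(val)` runs a separate goodness-of-fit test per season column; for testing independence of two categorical variables, `chi2_contingency` on the crosstab is the appropriate call. A sketch using the contingency table printed above:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Weather x season contingency table from the crosstab printed above
val = np.array([[1759, 1801, 1930, 1702],
                [715, 708, 604, 807],
                [211, 224, 199, 225],
                [1, 0, 0, 0]])

chi2, p_value, dof, expected = chi2_contingency(val)
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p_value:.3e}")
if p_value < 0.01:
    print("Reject H0: weather is dependent on season")
```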
We reject the null hypothesis that weather is independent of season at significance 0.01: the p-value comes out very low, so these two attributes are strongly dependent on each other.
Insights
A 2-sample t-test on working and non-working days with respect to count implies that the mean population counts of the two categories are the same.
An ANOVA test on different seasons with respect to count implies that the population count means under different seasons are not the same, i.e. there is a difference in the usage of Yulu bikes across seasons.
An ANOVA test on the different weather conditions (excluding 4) with respect to count implies that the population count means under different weather conditions are not the same, i.e. there is a difference in the usage of Yulu bikes across weather conditions.
A chi-square test on season and weather (categorical variables) implies that weather is dependent on season.
The maximum number of holidays is seen during the fall and winter seasons.
Counts are higher when the weather is clear with few clouds, as supported by the ANOVA hypothesis test.
Recommendations:
As casual users are very few, Yulu should focus on marketing strategies to bring in more customers, e.g. first-time-user discounts, friends-and-family discounts, referral bonuses, etc.
On non-working days, when the count is very low, Yulu can consider promotional activities such as city-exploration competitions or health campaigns.
In heavy rain, when the rent count is very low, Yulu could introduce a different vehicle type with shade or protection from the rain.