

Fisseha Berhane, PhD
Data Scientist

Prediction of daily bike rental ridership with neural network
In this project, you'll build your first neural network and use it to predict daily bike rental ridership. We've
provided some of the code, but left the implementation of the neural network up to you (for the most part).
After you've submitted this project, feel free to explore the data and the model more.

%matplotlib inline

%load_ext autoreload

%autoreload 2

%config InlineBackend.figure_format = 'retina'

import numpy as np

import pandas as pd

import matplotlib.pyplot as plt

Load and prepare the data


A critical step in working with neural networks is preparing the data correctly. Variables on different scales
make it difficult for the network to efficiently learn the correct weights. Below, we've written the code to load
and prepare the data. You'll learn more about this soon!

data_path = 'Bike-Sharing-Dataset/hour.csv'

rides = pd.read_csv(data_path)


rides.head()

   instant      dteday  season  yr  mnth  hr  holiday  weekday  workingday  weathersit  ...
0        1  2011-01-01       1   0     1   0        0        6           0           1  ...
1        2  2011-01-01       1   0     1   1        0        6           0           1  ...
2        3  2011-01-01       1   0     1   2        0        6           0           1  ...
3        4  2011-01-01       1   0     1   3        0        6           0           1  ...
4        5  2011-01-01       1   0     1   4        0        6           0           1  ...

Checking out the data


This dataset has the number of riders for each hour of each day from January 1 2011 to December 31 2012.
The number of riders is split between casual and registered, summed up in the cnt column. You can see the
first few rows of the data above.

Below is a plot showing the number of bike riders over the first 10 days or so in the data set. (Some days
don't have exactly 24 entries in the data set, so it's not exactly 10 days.) You can see the hourly rentals here.
This data is pretty complicated! The weekends have lower overall ridership and there are spikes when
people are biking to and from work during the week. Looking at the data above, we also have information
about temperature, humidity, and windspeed, all of which likely affect the number of riders. You'll be trying
to capture all this with your model.

rides[:24*10].plot(x='dteday', y='cnt')

<matplotlib.axes._subplots.AxesSubplot at 0x8a3f2b0>


Dummy variables
Here we have some categorical variables like season, weather, month. To include these in our model, we'll
need to make binary dummy variables. This is simple to do with Pandas thanks to get_dummies().

dummy_fields = ['season', 'weathersit', 'mnth', 'hr', 'weekday']

for each in dummy_fields:

dummies = pd.get_dummies(rides[each], prefix=each, drop_first=False)

rides = pd.concat([rides, dummies], axis=1)

fields_to_drop = ['instant', 'dteday', 'season', 'weathersit',

'weekday', 'atemp', 'mnth', 'workingday', 'hr']

data = rides.drop(fields_to_drop, axis=1)

data.head()

yr holiday temp hum windspeed casual registered cnt season_1 season_2 ...

0 0 0 0.24 0.81 0.0 3 13 16 1 0 ...

1 0 0 0.22 0.80 0.0 8 32 40 1 0 ...

2 0 0 0.22 0.80 0.0 5 27 32 1 0 ...

3 0 0 0.24 0.75 0.0 3 10 13 1 0 ...

4 0 0 0.24 0.75 0.0 0 1 1 1 0 ...

5 rows × 59 columns

Scaling target variables


To make training the network easier, we'll standardize each of the continuous variables. That is, we'll shift and
scale the variables such that they have zero mean and a standard deviation of 1.

The scaling factors are saved so we can go backwards when we use the network for predictions.
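For example, once the network later produces predictions on the scaled data, the stored mean and standard deviation let us map them back to actual rider counts. Here is a minimal sketch; scaled_predictions is a placeholder for whatever the network returns:

# Minimal sketch: convert scaled 'cnt' predictions back to rider counts using
# the mean and standard deviation stored in scaled_features (built in the cell below).
# scaled_predictions stands in for the network's output on scaled data.
mean, std = scaled_features['cnt']
predictions_in_riders = scaled_predictions * std + mean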

quant_features = ['casual', 'registered', 'cnt', 'temp', 'hum', 'windspeed']

# Store scalings in a dictionary so we can convert back later

scaled_features = {}
for each in quant_features:

mean, std = data[each].mean(), data[each].std()

scaled_features[each] = [mean, std]

data.loc[:, each] = (data[each] - mean)/std

Splitting the data into training, testing, and validation sets


We'll save the data for the last approximately 21 days to use as a test set after we've trained the network.
We'll use this set to make predictions and compare them with the actual number of riders.


# Save data for approximately the last 21 days

test_data = data[-21*24:]

# Now remove the test data from the data set

data = data[:-21*24]

# Separate the data into features and targets

target_fields = ['cnt', 'casual', 'registered']

features, targets = data.drop(target_fields, axis=1), data[target_fields]

test_features, test_targets = test_data.drop(target_fields, axis=1), test_data[target_fields]

We'll split the data into two sets, one for training and one for validating as the network is being trained. Since
this is time series data, we'll train on historical data, then try to predict on future data (the validation set).

# Hold out the last 60 days or so of the remaining data as a validation set

train_features, train_targets = features[:-60*24], targets[:-60*24]

val_features, val_targets = features[-60*24:], targets[-60*24:]


Time to build the network


Below you'll build your network. We've built out the structure. You'll implement both the forward pass and
backwards pass through the network. You'll also set the hyperparameters: the learning rate, the number of
hidden units, and the number of training passes.

The network has two layers, a hidden layer and an output layer. The hidden layer will use the sigmoid
function for activations. The output layer has only one node and is used for the regression: the output of the
node is the same as the input of the node, i.e. the activation function is f(x) = x. A function that takes
the input signal and generates an output signal, but takes into account the threshold, is called an activation
function. We work through each layer of our network, calculating the outputs for each neuron. All of the
outputs from one layer become inputs to the neurons on the next layer. This process is called forward
propagation.

We use the weights to propagate signals forward from the input to the output layers in a neural network. We
use the weights to also propagate error backwards from the output back into the network to update our
weights. This is called backpropagation.

Hint: You'll need the derivative of the output activation function (f(x) = x) for the
backpropagation implementation. If you aren't familiar with calculus, this function is equivalent
to the equation y = x. What is the slope of that equation? That is the derivative of f(x).
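As a quick illustration (not part of the project template), the two activation derivatives you'll rely on look like this; the identity output activation has a constant slope of 1:

# Illustration only: activation functions and their derivatives used in backprop.
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_prime(x):
    s = sigmoid(x)
    return s * (1 - s)      # derivative of the sigmoid, used for the hidden layer

def identity_prime(x):
    return 1.0              # f(x) = x has slope 1, so the output error term is just the error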

Below, you have these tasks:

1. Implement the sigmoid function to use as the activation function. Set self.activation_function
in __init__ to your sigmoid function.
2. Implement the forward pass in the train method.
3. Implement the backpropagation algorithm in the train method, including calculating the output
error.
4. Implement the forward pass in the run method.


class NeuralNetwork(object):

    def __init__(self, input_nodes, hidden_nodes, output_nodes, learning_rate):
        # Set number of nodes in input, hidden and output layers.
        self.input_nodes = input_nodes
        self.hidden_nodes = hidden_nodes
        self.output_nodes = output_nodes

        # Initialize weights
        self.weights_input_to_hidden = np.random.normal(0.0, self.input_nodes**-0.5,
                                                         (self.input_nodes, self.hidden_nodes))
        self.weights_hidden_to_output = np.random.normal(0.0, self.hidden_nodes**-0.5,
                                                         (self.hidden_nodes, self.output_nodes))
        self.lr = learning_rate

        #### Set self.activation_function to your implemented sigmoid function ####
        # Note: in Python, you can define a function with a lambda expression,
        # as shown below.
        self.activation_function = lambda x: 1 / (1 + np.exp(-x))

        ### If the lambda code above is not something you're familiar with,
        # you can uncomment the following three lines and put your
        # implementation there instead.
        #
        # def sigmoid(x):
        #     return 1 / (1 + np.exp(-x))
        # self.activation_function = sigmoid

    def train(self, features, targets):
        ''' Train the network on a batch of features and targets.

            Arguments
            ---------
            features: 2D array, each row is one data record, each column is a feature
            targets: 1D array of target values
        '''
        n_records = features.shape[0]
        delta_weights_i_h = np.zeros(self.weights_input_to_hidden.shape)
        delta_weights_h_o = np.zeros(self.weights_hidden_to_output.shape)

        for X, y in zip(features, targets):
            # Forward pass for one record (implemented in forward_pass_train below)
            final_outputs, hidden_outputs = self.forward_pass_train(X)
            # Backpropagation for one record, accumulating the weight steps
            delta_weights_i_h, delta_weights_h_o = self.backpropagation(final_outputs, hidden_outputs,
                                                                        X, y,
                                                                        delta_weights_i_h, delta_weights_h_o)
        self.update_weights(delta_weights_i_h, delta_weights_h_o, n_records)

    def forward_pass_train(self, X):
        ''' Implement the forward pass here

            Arguments
            ---------
            X: features batch
        '''
        ### Forward pass ###
        # Hidden layer
        hidden_inputs = np.matmul(X, self.weights_input_to_hidden)  # signals into hidden layer
        hidden_outputs = self.activation_function(hidden_inputs)    # signals from hidden layer

        # Output layer (identity activation, so the outputs equal the inputs)
        final_inputs = np.matmul(hidden_outputs, self.weights_hidden_to_output)
        final_outputs = final_inputs

        return final_outputs, hidden_outputs

    def backpropagation(self, final_outputs, hidden_outputs, X, y, delta_weights_i_h, delta_weights_h_o):
        ''' Implement backpropagation

            Arguments
            ---------
            final_outputs: output from forward pass
            y: target (i.e. label) batch
            delta_weights_i_h: change in weights from input to hidden layers
            delta_weights_h_o: change in weights from hidden to output layers
        '''
        ### Backward pass ###
        # Output layer error is the difference between desired target and actual output.
        error = y - final_outputs

        # Because the output activation is f(x) = x, its derivative is 1 and the
        # output error term equals the error itself.
        output_error_term = error

        # The hidden layer's contribution to the error
        hidden_error = np.dot(self.weights_hidden_to_output, output_error_term)
        hidden_error_term = hidden_error.T * (hidden_outputs * (1 - hidden_outputs))

        # Weight step (input to hidden)
        delta_weights_i_h += X.reshape(-1, 1) * hidden_error_term
        # Weight step (hidden to output)
        delta_weights_h_o += hidden_outputs.reshape(-1, 1) * output_error_term

        return delta_weights_i_h, delta_weights_h_o

    def update_weights(self, delta_weights_i_h, delta_weights_h_o, n_records):
        ''' Update weights on gradient descent step

            Arguments
            ---------
            delta_weights_i_h: change in weights from input to hidden layers
            delta_weights_h_o: change in weights from hidden to output layers
            n_records: number of records
        '''
        # Update hidden-to-output weights with gradient descent step
        self.weights_hidden_to_output += self.lr * delta_weights_h_o / n_records
        # Update input-to-hidden weights with gradient descent step
        self.weights_input_to_hidden += self.lr * delta_weights_i_h / n_records

    def run(self, features):
        ''' Run a forward pass through the network with input features

            Arguments
            ---------
            features: 1D array of feature values
        '''
        # Hidden layer
        hidden_inputs = np.matmul(features, self.weights_input_to_hidden)  # signals into hidden layer
        hidden_outputs = self.activation_function(hidden_inputs)           # signals from hidden layer

        # Output layer
        final_inputs = np.dot(hidden_outputs, self.weights_hidden_to_output)  # signals into final output layer
        final_outputs = final_inputs                                           # signals from final output layer

        return final_outputs

#########################################################

# Set your hyperparameters here

##########################################################

iterations = 10000

learning_rate = 0.5

hidden_nodes = 19

output_nodes = 1

from my_answers import NeuralNetwork

def MSE(y, Y):
    return np.mean((y-Y)**2)

Unit tests
Run these unit tests to check the correctness of your network implementation. This will help you be sure your
network was implemented correctly before you start trying to train it. These tests must all be successful to
pass the project.


import unittest

inputs = np.array([[0.5, -0.2, 0.1]])
targets = np.array([[0.4]])
test_w_i_h = np.array([[0.1, -0.2],
                       [0.4, 0.5],
                       [-0.3, 0.2]])
test_w_h_o = np.array([[0.3],
                       [-0.1]])

class TestMethods(unittest.TestCase):

    ##########
    # Unit tests for data loading
    ##########

    def test_data_path(self):
        # Test that file path to dataset has been unaltered
        self.assertTrue(data_path.lower() == 'bike-sharing-dataset/hour.csv')

    def test_data_loaded(self):
        # Test that data frame loaded
        self.assertTrue(isinstance(rides, pd.DataFrame))

    ##########
    # Unit tests for network functionality
    ##########

    def test_activation(self):
        network = NeuralNetwork(3, 2, 1, 0.5)
        # Test that the activation function is a sigmoid
        self.assertTrue(np.all(network.activation_function(0.5) == 1/(1+np.exp(-0.5))))

    def test_train(self):
        # Test that weights are updated correctly on training
        network = NeuralNetwork(3, 2, 1, 0.5)
        network.weights_input_to_hidden = test_w_i_h.copy()
        network.weights_hidden_to_output = test_w_h_o.copy()

        network.train(inputs, targets)
        self.assertTrue(np.allclose(network.weights_hidden_to_output,
                                    np.array([[ 0.37275328],
                                              [-0.03172939]])))
        self.assertTrue(np.allclose(network.weights_input_to_hidden,
                                    np.array([[ 0.10562014, -0.20185996],
                                              [ 0.39775194,  0.50074398],
                                              [-0.29887597,  0.19962801]])))

    def test_run(self):
        # Test correctness of run method
        network = NeuralNetwork(3, 2, 1, 0.5)
        network.weights_input_to_hidden = test_w_i_h.copy()
        network.weights_hidden_to_output = test_w_h_o.copy()

        self.assertTrue(np.allclose(network.run(inputs), 0.09998924))

suite = unittest.TestLoader().loadTestsFromModule(TestMethods())
unittest.TextTestRunner().run(suite)


.....

----------------------------------------------------------------------

Ran 5 tests in 0.005s

OK

<unittest.runner.TextTestResult run=5 errors=0 failures=0>

Training the network


Here you'll set the hyperparameters for the network. The strategy here is to find hyperparameters such that
the error on the training set is low, but you're not overfitting to the data. If you train the network too long or
have too many hidden nodes, it can become overly specific to the training set and will fail to generalize to the
validation set. That is, the loss on the validation set will start increasing as the training set loss drops.

You'll also be using a method known as Stochastic Gradient Descent (SGD) to train the network. The idea is
that for each training pass, you grab a random sample of the data instead of using the whole data set. You
use many more training passes than with normal gradient descent, but each pass is much faster. This ends
up training the network more efficiently. You'll learn more about SGD later.

Choose the number of iterations


This is the number of batches of samples from the training data we'll use to train the network. The more
iterations you use, the better the model will fit the data. However, this process can have sharply diminishing
returns and can waste computational resources if you use too many iterations. You want to find a number
here where the network has a low training loss, and the validation loss is at a minimum. The ideal number of
iterations would be a level that stops shortly after the validation loss is no longer decreasing.
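One way to read that stopping point off after training (a sketch, assuming the losses dictionary filled in by the training loop further below) is to find where the validation loss bottoms out:

# Sketch: find the iteration where the validation loss stopped improving.
best_iteration = int(np.argmin(losses['validation']))
print("Validation loss stopped improving around iteration", best_iteration)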

Choose the learning rate


This scales the size of weight updates. If this is too big, the weights tend to explode and the network fails to fit
the data. Normally a good choice to start at is 0.1; however, if you effectively divide the learning rate by
n_records, try starting out with a learning rate of 1. In either case, if the network has problems fitting the data,
try reducing the learning rate. Note that the lower the learning rate, the smaller the steps are in the weight
updates and the longer it takes for the neural network to converge.
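To see why a learning rate of 1 is reasonable here, note that update_weights divides each step by n_records, so the effective per-record step stays small. Rough arithmetic, using the 128-record batch size from the training loop below:

# weights += learning_rate * delta_weights / n_records  (see update_weights above)
# With learning_rate = 1.0 and 128-record batches, each record moves the weights
# by roughly 1/128, about 0.008, per training pass.
effective_step_per_record = 1.0 / 128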

Choose the number of hidden nodes


In a model where all the weights are optimized, the more hidden nodes you have, the more accurate the
predictions of the model will be. (A fully optimized model could have weights of zero, after all.) However, the
more hidden nodes you have, the harder it will be to optimize the weights of the model, and the more likely it
will be that suboptimal weights will lead to overfitting. With overfitting, the model will memorize the training
data instead of learning the true pattern, and won't generalize well to unseen data.

Try a few different numbers and see how it affects the performance. You can look at the losses dictionary for
a metric of the network performance. If the number of hidden units is too low, then the model won't have
enough space to learn and if it is too high there are too many options for the direction that the learning can
take. The trick here is to find the right balance in number of hidden units you choose. You'll generally find that
the best number of hidden nodes to use ends up being between the number of input and output nodes.
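A simple way to explore this is to train the network a few times with different hidden-layer sizes and compare validation losses. The sketch below assumes a hypothetical helper, train_and_validate, that wraps the training loop further below and returns the final validation MSE:

# Sketch: sweep a few hidden-layer sizes and compare final validation loss.
# train_and_validate is a hypothetical wrapper around the training loop below.
for n_hidden in [5, 10, 19, 30]:
    val_mse = train_and_validate(hidden_nodes=n_hidden,
                                 learning_rate=learning_rate,
                                 iterations=iterations)
    print(n_hidden, "hidden nodes -> final validation MSE:", val_mse)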


import sys

####################
### Set the hyperparameters in your my_answers.py file ###
####################
from my_answers import iterations, learning_rate, hidden_nodes, output_nodes

N_i = train_features.shape[1]
network = NeuralNetwork(N_i, hidden_nodes, output_nodes, learning_rate)

losses = {'train':[], 'validation':[]}
for ii in range(iterations):
    # Go through a random batch of 128 records from the training data set
    batch = np.random.choice(train_features.index, size=128)
    X, y = train_features.ix[batch].values, train_targets.ix[batch]['cnt']

    network.train(X, y)

    # Printing out the training progress
    train_loss = MSE(network.run(train_features).T, train_targets['cnt'].values)
    val_loss = MSE(network.run(val_features).T, val_targets['cnt'].values)
    sys.stdout.write("\rProgress: {:2.1f}".format(100 * ii/float(iterations)) \
                     + "% ... Training loss: " + str(train_loss)[:5] \
                     + " ... Validation loss: " + str(val_loss)[:5])
    sys.stdout.flush()

    losses['train'].append(train_loss)
    losses['validation'].append(val_loss)

Progress: 0.1% ... Training loss: 3.941 ... Validation loss: 5.640
C:\Users\fberhane\AppData\Local\Continuum\anaconda3\lib\site-packages\ipykernel\__main__.py:17: DeprecationWarning:
.ix is deprecated. Please use
.loc for label based indexing or
.iloc for positional indexing

See the documentation here:
http://pandas.pydata.org/pandas-docs/stable/indexing.html#ix-indexer-is-deprecated

Progress: 100.0% ... Training loss: 0.051 ... Validation loss: 0.151


plt.plot(losses['train'], label='Training loss')

plt.plot(losses['validation'], label='Validation loss')

plt.legend()

_ = plt.ylim()

Check out your predictions


Here, use the test data to view how well your network is modeling the data. If something is completely wrong
here, make sure each step in your network is implemented correctly.

fig, ax = plt.subplots(figsize=(8,4))

mean, std = scaled_features['cnt']

predictions = network.run(test_features).T*std + mean

ax.plot(predictions[0], label='Prediction')

ax.plot((test_targets['cnt']*std + mean).values, label='Data')

ax.set_xlim(right=len(predictions))

ax.legend()

dates = pd.to_datetime(rides.ix[test_data.index]['dteday'])

dates = dates.apply(lambda d: d.strftime('%b %d'))

ax.set_xticks(np.arange(len(dates))[12::24])

_ = ax.set_xticklabels(dates[12::24], rotation=45)
