Cabs availability Prediction using Deep Learning

Project Member

Ateeq Javed
3005-FBAS/BSSE-F15

Supervisor:
Dr. Jamal Abdul Nasir
(Assistant Professor)

Department of Computer Science and Software Engineering
Faculty of Basic and Applied Sciences
International Islamic University Islamabad
2019
“In the Name of Allah, the Most Beneficent, the Most Merciful”
Final Approval

It is certified that I have read the project report titled "Cabs availability Prediction using
Deep Learning" submitted by Ateeq Javed, Registration No. 3005-FBAS/BSSE/F15-B.
It is my judgement that this project is of sufficient standard to merit its acceptance by
the International Islamic University Islamabad for the Bachelor's Degree in Software
Engineering.

COMMITTEE:

Internal Examiner:
Mr.
Assistant Professor
Department of Computer Science and Software
Engineering, FBAS, International Islamic University,
Islamabad.

External Examiner:
Mr.
Assistant Professor
Department of Computer Science and Software
Engineering, FBAS, International Islamic University,
Islamabad.

Supervisor:

Dr. Jamal Abdul Nasir


Assistant Professor
Department of Computer Science and Software
Engineering, FBAS, International Islamic University,
Islamabad.
Cabs availability prediction using deep learning Dedication

Dedication

I dedicate this humble effort to my beloved parents and family for their endless
support, affection, trust and encouragement. With the completion of my final year
project, I dedicate this accomplishment to my loving parents, who have constantly
supported me and provided every possible facility so that I could complete my
studies and this project.


A Dissertation submitted to
DEPARTMENT OF COMPUTER SCIENCE

AND
SOFTWARE ENGINEERING
INTERNATIONAL ISLAMIC UNIVERSITY, ISLAMABAD
As partial fulfilment of the requirement of bachelor’s degree in
Computer Science


Declaration

I hereby declare that the development of this project and its report is based
entirely on my personal efforts and learning, accomplished under the support
and guidance of the assigned supervisor, Dr. Jamal Abdul Nasir. No part of the
work presented in this report has been submitted for any other degree at any
other university or educational institute. I further declare that this project,
all code, and the associated documents and reports are submitted as partial
requirements for the degree of Bachelor of Science in Software Engineering.

Ateeq javed

3005-FBAS/BSSEF15


Acknowledgment

All praise to Allah Almighty for His blessings in completing this project, and
blessings upon our last Prophet Muhammad (PBUH).

I would like to express my sincere gratitude to my supervisor, Dr. Jamal Abdul
Nasir, for his meaningful and continuous support and motivation in my project. I
am thankful for his brilliant supervision throughout the completion of the project.

I would also like to express my appreciation for the teaching methods of all my
teachers throughout my degree. I am thankful to those whose guidance was helpful
and to my classmates, who helped me when I needed them.

Finally, I would like to thank everybody who was important to the successful
realization of this project, and I apologize that I could not mention everyone
personally one by one.

Ateeq javed
3005-FBAS/BSSEF15


Project in Brief

Project Title:        Cabs availability Prediction Using Deep Learning
Member:               Ateeq javed
Supervised by:        Dr. Jamal Abdul Nasir, Assistant Professor, DCS & SE, IIUI
Date Started:         March 2019
Date Completed:       November 2019
Language/Technology:  Python, Deep Learning, RNN, anaconda, jupyter notebook, pandas, numpy
System Used:          Dell latitude E6230, 8 GB RAM, Core i75Processor, AMD Radeon HD 8500M
Operating System:     Ubuntu 18.04

Abstract

Millions of taxi rides take place in the Chinese city of Shenzhen. The
demand for taxi cabs is growing rapidly, especially in large cities. An
understanding of taxi supply and demand could increase the efficiency of the
city's taxi system. In Shenzhen, people use taxis far more frequently than in
other cities. Instead of being booked by phone a day ahead of time, Shenzhen
taxis pick up passengers on the street. The ability to predict taxi ridership
could give city planners and taxi dispatchers valuable insights into questions
such as how to position cabs where they are most needed, how many taxis to
dispatch, and how ridership varies over time. Our project focuses on predicting
the number of taxi pickups, predicting where traffic is slow, fast or congested,
and identifying the places where taxi demand is highest, using taxi Global
Positioning System (GPS) data.

Deep learning is an AI technique that teaches computers to do what comes
naturally to people: learn by example. Deep learning is a key technology behind
driverless cars, enabling them to recognize a stop sign or to distinguish a
pedestrian from a lamppost. It is the key to voice control in consumer devices
such as phones, tablets, TVs, and hands-free speakers. Deep learning has been
getting a lot of attention lately, and for good reason: it is achieving results
that were not possible before. In deep learning, models are trained using a large
set of labeled data and neural network architectures that contain many layers.

The aim of this project is to help passengers know where a taxi will come from or
where it will go. An RNN model is used to predict availability from a given data
set. This is done with Long Short-Term Memory (LSTM): we build an LSTM model and
train it on historical taxi data from Shenzhen so that it predicts the next
destination of a taxi.

Table of contents

Chapter 1 Introduction
  1. Introduction
    1.1 Deep Learning
    1.2 Need of project
    1.3 Scope of project
    1.4 Objectives
Chapter 2 Literature Review
  2. Literature Review
    2.1 Deep Learning
    2.2 Neural Network
      2.2.1 Layers of neural network
      2.2.2 Neuron
    2.3 Types of neural networks
    2.4 Artificial neural networks
      2.4.1 Feed Forward propagation (ANN)
      2.4.2 Recurrent neural networks (RNN)
      2.4.3 Long short term memory (LSTM)
Chapter 3 Methodology
  3. Methodology
    3.1 Recurrent Neural Network
    3.2 Steps of methodologies
    3.3 Block diagram
    3.4 Input
    3.5 Pre processing
    3.6 Normalization
    3.7 Building model
    3.8 Compiling model
    3.9 Training model
Chapter 4 Results & Discussions
  4. Results
    4.1 Results
    4.2 Results when train model
    4.3 Data
    4.4 Column description
    4.5 Data analysis
    4.6 Data pre processing
    4.7 Missing values
    4.8 Feature engineering
Chapter 5
    5.1 Use Case Diagram
    5.2 Use Cases
      5.1.2 Use Case Make Data set
      5.1.3 Use Case Train Model
    System sequence Diagram
      5.2.1 Make Data set
      5.2.2 System Sequence Diagram
Chapter 6 Conclusions
Chapter 7 References

List of tables and figures

List of Tables

Table 4.1: Cab Prediction Data Summary (Test)
Table 4.2: Cab Prediction Data Summary (Train)
Table 4.3: Cab Prediction Data Summary
Table 4.4: Cab Prediction Data Summary
Table 4.5: Number of Missing Value (Train)

List of Figures

Figure No. 2.1: Comparison between performance of old learning and deep learning algorithm
Figure No. 2.2: Layers in Neural Network
Figure No. 2.3: Neuron function
Figure No. 2.4: Linear Function
Figure No. 2.5: Step Function
Figure No. 2.6: Sigmoid Function
Figure No. 2.7: Tanh function
Figure No. 2.8: Relu Function
Figure No. 2.9: Single layer feed forward network
Figure No. 2.10: Multi layer feed forward network
Figure No. 2.11: Recurrent neural network LSTM
Figure No. 2.12: Artificial neural network LSTM
Figure No. 2.13: long-short term memory
Figure No. 2.14: Pooling
Figure No. 3.1: Block Diagram
Figure No. 3.2: Input image
Figure No. 4.1: Scatter matrix of continuous variable of a Cab Prediction Dataset
Figure No. 4.2: Column Data Type (Train & Test)
Figure No. 4.3: Log Distribution of Fare Amount

LIST OF ABBREVIATIONS

AI      Artificial Intelligence
ML      Machine Learning
DL      Deep Learning
RNN     Recurrent Neural Network
CNN     Convolutional Neural Network
DBN     Deep Belief Network
DNN     Deep Neural Network
RELU    Rectified Linear Unit
FFNN    Feed Forward Neural Network
LSTM    Long Short-Term Memory
NLP     Natural Language Processing

CHAPTER 1
Introduction

1. INTRODUCTION

Millions of taxi rides take place in the Chinese city of Shenzhen. The demand for taxi
cabs is growing rapidly, especially in large cities. An understanding of taxi supply and
demand could increase the efficiency of the city's taxi system. In Shenzhen, people use
taxis far more frequently than in other cities. Instead of being booked by phone a day
ahead of time, Shenzhen taxis pick up passengers on the street. The ability to predict
taxi ridership could give city planners and taxi dispatchers valuable insights into
questions such as how to position cabs where they are most needed, how many taxis to
dispatch, and how ridership varies over time. Our project focuses on predicting the
number of taxi pickups, predicting where traffic is slow, fast or congested, and
identifying the places where taxi demand is highest, using taxi Global Positioning
System (GPS) data.

There are various algorithms for predicting speed, but they are either too time consuming
to be used in real time, too demanding in terms of storage, or give poor results. The one we
use here is a deep learning architecture called the Recurrent Neural Network (RNN).
Deep learning is an Artificial Intelligence (AI) function that imitates the way the human
brain processes data and creates patterns for use in decision making.

Deep learning is a subcategory of machine learning in Artificial Intelligence. Its networks,
also known as deep neural networks, are capable of learning unsupervised from data that is
unstructured or unlabeled.
Neural networks are among the most popular machine learning algorithms these days because,
on many tasks, they can outperform other algorithms in speed and accuracy. Prediction on
large data sets used to be a considerable problem, but deep learning has evolved to such an
extent that we can now predict results on large data sets with a limited amount of resources.

In this particular project we have tried to predict the speed of moving vehicles using a CNN.
In doing so we faced certain challenges, such as rapid movement of objects and changes in
object orientation across frames, which make tracking difficult. Moreover, the illumination is
not the same in all frames, and sometimes the background interferes with the object, making it
difficult to detect. We therefore needed a more robust and efficient technique to address these
challenges. For that purpose we used the dense optical flow method, which is based on Gunnar
Farnebäck's algorithm. This algorithm computes the optical flow for all points in a frame.

Traffic flow and cab availability are widely researched because of their importance. This
project focuses on a problem faced by traffic authorities: it is sometimes difficult to
determine where a taxi will go and where passengers can find taxis on busy highways.

1.1 Deep Learning:


Deep learning is a subfield of machine learning concerned with algorithms, called
artificial neural networks, that are inspired by the structure and function of the brain.
Learning can be supervised, semi-supervised or unsupervised. Deep learning
architectures such as deep neural networks (DNN), deep belief networks (DBN), recurrent
neural networks (RNN) and convolutional neural networks (CNN) have been applied to
fields including computer vision, speech recognition, natural language processing, audio
recognition, machine translation, bioinformatics, drug design, medical image analysis,
material inspection and board games, where they have produced results comparable to,
and in some cases superior to, those of human experts. Deep learning is becoming a
mainstream technology for prediction in many fields. In this research I used a deep
learning technique to predict taxi availability. We also predict the speed of traffic in
particular places. Deep learning has improved the accuracy of prediction. Deep learning
algorithms are one promising avenue of research into the automated extraction of
complex data. They are quite beneficial when learning from large amounts of unlabeled
data. These algorithms are largely motivated by the field of artificial intelligence,
which has the general goal of emulating the ability of the human brain to think, analyze,
learn and make decisions, especially for more complex problems.
1.2 Need of project:

The problem is that taxi drivers do not know where the demand for taxis will be high,
and passengers do not know where more taxis will be available at lower rates. The system
developed here will be able to predict where the most passengers are. The objective of the
system is to predict where there will be a high density of pickups. Taxis will then be able
to move to the particular places where the demand for taxis is higher. The system will also
predict whether a taxi is occupied or not.
Increased vehicle speed has greatly increased the number of roadside accidents. This is a
very serious problem because it has greatly increased the number of casualties and
injuries. Because of the heavy loss of wealth and lives, the road traffic management
authorities have decided to take serious steps to avoid accidents, but these steps are not
enough, as the loss of property and lives keeps increasing. Roadside accidents and the
resulting losses of life and property have been rising in recent years. However, the
figures do not give the real picture because most accidents are not even reported.
An efficient system is therefore required that estimates the speed of vehicles and
identifies speeding vehicles, which would help to avoid accidents to a great extent.
1.3 Scope of project:
For a given location in Shenzhen, our goal is to predict the availability of taxis at that
location during a particular time interval. Some locations require more taxis at a particular
time than others owing to the presence of schools, hospitals, offices, etc. The prediction
results can be sent to taxi drivers through a smartphone app, and the drivers can then move
to the locations where the predicted number of pickups is high. From a business point of view,
the approach of this project is also relevant to other businesses, with applications such as
stock price prediction, department store customer prediction, and cesarean delivery prediction.
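
The idea of "availability at a given location and time interval" can be made concrete with a small sketch. It is only an illustration under stated assumptions, not the project's actual preprocessing: the record layout (taxi id, timestamp, longitude, latitude, occupied flag), the 0.01-degree grid and the 30-minute interval are all hypothetical choices.

```python
import math
from collections import defaultdict
from datetime import datetime

CELL_DEG = 0.01      # assumed grid cell size in degrees (hypothetical)
INTERVAL_MIN = 30    # assumed length of a time interval in minutes (hypothetical)


def grid_cell(lon, lat):
    """Map a GPS position to the integer index of its grid cell."""
    return (math.floor(lon / CELL_DEG), math.floor(lat / CELL_DEG))


def time_slot(timestamp):
    """Map a timestamp to the index of its interval within the day."""
    return (timestamp.hour * 60 + timestamp.minute) // INTERVAL_MIN


def available_taxis(records):
    """Count distinct unoccupied taxis per (grid cell, time slot)."""
    seen = defaultdict(set)
    for taxi_id, timestamp, lon, lat, occupied in records:
        if not occupied:
            seen[(grid_cell(lon, lat), time_slot(timestamp))].add(taxi_id)
    return {key: len(ids) for key, ids in seen.items()}


# Made-up GPS records: (taxi id, time, longitude, latitude, occupied)
records = [
    ("A", datetime(2019, 3, 1, 8, 5), 114.057, 22.543, False),
    ("A", datetime(2019, 3, 1, 8, 20), 114.058, 22.544, False),
    ("B", datetime(2019, 3, 1, 8, 10), 114.053, 22.547, False),
    ("C", datetime(2019, 3, 1, 8, 15), 114.055, 22.545, True),
    ("B", datetime(2019, 3, 1, 8, 40), 114.072, 22.551, False),
]
counts = available_taxis(records)
for key, value in sorted(counts.items()):
    print(key, value)
```

Counts of this kind, ordered by time slot, form the sort of time series that a sequence model could be trained on.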

1.4 Objectives:

The main objectives of this project are:

• Build a system that predicts the next destination of a taxi.
• Predict where the most passengers are. Taxi drivers do not know where the demand for
taxis will be high, and passengers do not know where more taxis will be available at
lower rates.


Chapter 2
Literature Review

2. Literature Review:

As this document focuses on the technique of deep learning, the literature review
covers this technique in detail.

2.1 Deep Learning:
Deep learning is a subfield of machine learning concerned with algorithms, called
artificial neural networks, that are inspired by the structure and function of the brain.
Learning can be supervised, semi-supervised or unsupervised. Deep learning
architectures such as deep neural networks (DNN), deep belief networks (DBN), recurrent
neural networks (RNN) and convolutional neural networks (CNN) have been applied to
fields including computer vision, speech recognition, natural language processing, audio
recognition, machine translation, bioinformatics, drug design, medical image analysis,
material inspection and board games, where they have produced results comparable to,
and in some cases superior to, those of human experts.

Deep learning has evolved largely alongside the digital era, which has brought
about an explosion of data in all forms and from every region of the world. This
data, known simply as big data, is drawn from sources such as social media, internet
search engines and e-commerce platforms, among others. It is easily available and
can be shared through applications such as cloud computing. Unstructured data is
normally so vast that it could take decades to comprehend it and extract the relevant
information. As this has been realized, the adoption of AI systems for automated
support has increased.

Deep learning algorithms are trained to learn progressively from data, so a
large amount of data is required for the machine to give accurate results. Sometimes
there is a large difference between the training data set and new, unseen data, so
the model must generalize well. Training on a large data set also requires adequate
processing power.

Figure 2.1 Comparison between performance of old learning and deep learning algorithms

Older learning algorithms work on structured data, whereas deep learning networks
do not necessarily need structured or labeled data, for example to classify images.
This is one reason why deep learning techniques perform better than older learning
techniques.
2.2 Neural Network:

Neural networks are a set of algorithms, modeled loosely on the human brain, that
are designed to recognize patterns. They interpret sensory data through a kind of
machine perception, labeling or clustering raw input. The patterns they recognize are
numerical, contained in vectors, into which all real-world data, be it images, sound,
text or time series, must be translated. A neural network is capable of performing
several tasks, such as:
• Classification of data
• Prediction of data
• Decision making
• Visualization
2.2.1 Layers of neural network:

A neural network consists of three types of interconnected layers:

I. Input Layer
II. Hidden Layer (there can be more than one hidden layer)
III. Output Layer


Figure 2.2 Layers in Neural Network

The neurons in the input layer send information to the hidden layer, and the hidden
layer sends data to the output layer.
2.2.2 Neuron:

The basic building block of a neural network is called a neuron, and its
functionality is similar to that of a human neuron. Just like neurons in the human
brain, these building blocks take input data and give an output. In mathematical
terms, a neuron in the machine learning world is a placeholder for a function: its
job is to take inputs and provide a result after applying the function to them.

Figure 2.3 Neuron function

A neuron has weighted inputs (synapses), an activation function and one output.
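
The description above can be sketched as a few lines of Python. This is a minimal illustration rather than the project's code: the bias term and the choice of the sigmoid as the activation are assumptions added for the example, and the weights are made up.

```python
import math


def sigmoid(z):
    """Logistic activation, used here only as an example activation."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron(inputs, weights, bias=0.0, activation=sigmoid):
    """Weighted sum of the inputs plus a bias, passed through an activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)


# Two inputs with made-up weights: z = 0.5 * 1.0 + (-0.25) * 2.0 = 0.0
output = neuron([1.0, 2.0], [0.5, -0.25])
print(output)
```

Passing a different function as `activation` changes the neuron's behavior without changing the weighted-sum step.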


2.2.2.1 Activation functions:

There are five major activation functions, which are as follows:

i. Linear Function
ii. Step Function
iii. Sigmoid Function
iv. Tanh Function
v. ReLU (Rectified linear unit) Function

Linear Function:

A function whose graph is a straight line is known as a linear function.

\[ f(x) = ax \]

Figure 2.4 Linear Function

Step Function:

When the value of x is less than zero the output is zero, and when the value of x is
greater than or equal to zero the output is one.
\[ f(x) = \begin{cases} 0, & x < 0 \\ 1, & x \ge 0 \end{cases} \]
A step function is not differentiable at zero, and its derivative is zero everywhere else.
Gradient descent therefore cannot make progress with it and fails to update the weights.
To overcome this problem, the sigmoid function was introduced.

Figure 2.5 Step Function
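
The following sketch illustrates why gradient descent stalls on the step function: away from zero its slope is exactly zero, so a gradient-based weight update would never change anything. The helper `numerical_slope` is a hypothetical name introduced for this illustration.

```python
def step(x):
    """Step activation: 0 for x < 0 and 1 for x >= 0."""
    return 1.0 if x >= 0 else 0.0


def numerical_slope(f, x, h=1e-6):
    """Central-difference estimate of the derivative of f at x."""
    return (f(x + h) - f(x - h)) / (2 * h)


# The slope is zero at every point that is not the jump itself.
slopes = [numerical_slope(step, x) for x in (-2.0, -0.5, 0.5, 2.0)]
print(slopes)
```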

Sigmoid Function:

A sigmoid function is defined mathematically as
\[ \sigma(z) = \frac{1}{1 + e^{-z}} \]
The value of the function tends to zero when z tends to negative infinity and tends to one
when z tends to positive infinity.

Figure 2.6 Sigmoid Function
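
As a small illustration of the limits described above, the sigmoid can be evaluated at large negative and positive arguments. The branch on the sign of z is an implementation detail added here to avoid floating-point overflow; it is not part of the definition. The derivative formula σ(z)(1 − σ(z)) is a standard property of the sigmoid.

```python
import math


def sigmoid(z):
    """Logistic sigmoid, written so that large |z| does not overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def sigmoid_grad(z):
    """Derivative of the sigmoid: sigma(z) * (1 - sigma(z))."""
    s = sigmoid(z)
    return s * (1.0 - s)


values = [sigmoid(z) for z in (-50.0, 0.0, 50.0)]
print(values)             # close to 0, exactly 0.5, close to 1
print(sigmoid_grad(0.0))  # the slope is largest at z = 0
```

Unlike the step function, the sigmoid has a non-zero slope everywhere, which is what lets gradient descent update the weights.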

Tanh Function:

The tanh function is a rescaled version of the sigmoid function, and its output ranges
over [−1, 1] instead of [0, 1] [1].
\[ f(z) = \tanh(z) = \frac{2}{1 + e^{-2z}} - 1 \]

Figure 2.7 Tanh Function
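
The statement that tanh is a rescaled sigmoid can be checked numerically: the formula above is the same as 2σ(2z) − 1. The sketch below compares that expression with the standard library's `math.tanh` at a few arbitrary test points.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def tanh_from_sigmoid(z):
    """tanh written as a rescaled sigmoid: 2 * sigmoid(2z) - 1."""
    return 2.0 * sigmoid(2.0 * z) - 1.0


points = (-3.0, -1.0, 0.0, 0.5, 2.0)
diffs = [abs(tanh_from_sigmoid(z) - math.tanh(z)) for z in points]
print(max(diffs))  # tiny floating-point difference only
```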


ReLU Function:

The Rectified Linear Unit (ReLU) function returns zero when z is less than zero, and for
any non-negative value of z it returns that value back.
\[ f(z) = \begin{cases} 0, & z < 0 \\ z, & z \ge 0 \end{cases} \]


Figure 2.8 ReLU Function

The activation function we used here is ReLU. The two major reasons for choosing ReLU over
the other activation functions are sparsity and a reduced likelihood of vanishing gradients.
These help a neural network to learn faster.
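
Both properties can be seen in a short sketch. The pre-activation values below are made up, and taking the gradient at exactly zero to be 0 is a common convention rather than something stated in this report.

```python
def relu(z):
    """ReLU: 0 for negative z, z otherwise."""
    return z if z > 0 else 0.0


def relu_grad(z):
    """Gradient of ReLU; the value at z == 0 is set to 0 by convention."""
    return 1.0 if z > 0 else 0.0


pre_activations = [-2.0, -0.5, 0.0, 1.5, 3.0]
outputs = [relu(z) for z in pre_activations]
grads = [relu_grad(z) for z in pre_activations]

# Sparsity: the share of neurons whose output is exactly zero.
sparsity = outputs.count(0.0) / len(outputs)
print(outputs, grads, sparsity)
```

For positive inputs the gradient is exactly 1, so it does not shrink as it is passed back through many layers, whereas the sigmoid's gradient is at most 0.25.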

2.3 Types of neural networks:

Neural networks are of various types, but the most commonly used types are the
following:
• Feed forward neural networks (FFNN)
• Recurrent neural networks (RNN)
• Long short term memory (LSTM)

2.3.1 Feed forward neural networks (FFNN):

The feed forward neural network was the first artificial neural network to be
created, and it is now the most commonly used neural network. These networks are called
feed forward because information flows through them in one direction only, without passing
through any loops. Based on the presence of intermediate hidden layers, feed forward neural
networks can be further classified as single-layer and multi-layer networks. The number of
layers to use depends on the complexity of the function that needs to be performed.

In a single-layer feed forward network, the inputs are fed directly to the outputs
through a series of weights, with no hidden layers.


Figure 2.9 Single layer feed forward network

In multi-layer feed forward network there are multiple hidden layers between
input and output layers. These multiple hidden layers allow for further information
processing.

Figure 2.10 Multi layer feed forward network


Feed forward networks are mostly applied in image recognition and in basic pattern
recognition.
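
The one-directional flow of a multi-layer feed forward network can be sketched as a loop over layers, where each layer's output becomes the next layer's input. The layer sizes, weights and activations below are made-up values chosen so the result is easy to check by hand; they are not taken from the project.

```python
def relu(z):
    return z if z > 0 else 0.0


def identity(z):
    return z


def dense(inputs, weights, biases, activation):
    """One fully connected layer: each row of weights feeds one neuron."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def feed_forward(inputs, layers):
    """Pass the inputs through the layers in order, with no loops back."""
    activations = inputs
    for weights, biases, activation in layers:
        activations = dense(activations, weights, biases, activation)
    return activations


layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0], relu),  # hidden layer 1, two neurons
    ([[1.0, 1.0]], [0.0], relu),                    # hidden layer 2, one neuron
    ([[2.0]], [1.0], identity),                     # output layer
]
result = feed_forward([2.0, 1.0], layers)
print(result)
```

Dropping the two hidden layers and connecting the inputs straight to the output layer would give the single-layer case.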

2.3.2 Recurrent neural networks (RNN):

A recurrent neural network, as the name suggests, saves the output of a layer and feeds
it back to the input to help predict the outcome of that layer. The first layer is formed as
in a feed forward network, by computing the weighted sum of the input features. Once this is
computed, the recurrent process starts: from one time step to the next, each neuron remembers
some of the information it had in the previous time step. In this way each neuron acts like a
memory cell, performing forward propagation while remembering what information it needs for
later use. If the prediction is wrong, the error is used during back propagation to make small
changes to the weights so that the network gradually works towards the right prediction.
Recurrent neural networks are difficult to train and have only a very short-term memory, which
limits their usefulness. To overcome this, a form of RNN called the long short-term memory
network (LSTM) is used. LSTMs extend the memory of RNNs so that they can perform tasks
involving long-term dependencies.
RNNs are mostly used in applications like language processing and speech recognition.
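
The idea of a neuron carrying information from one time step to the next can be illustrated with a plain recurrent step in NumPy. This is a minimal sketch under assumed sizes and random weights; `rnn_forward`, `w_x`, `w_h` and `b` are hypothetical names, and the tanh activation is a common textbook choice rather than something the report specifies.

```python
import numpy as np

def rnn_forward(inputs, w_x, w_h, b):
    # inputs has shape (time_steps, n_features)
    h = np.zeros(w_h.shape[0])             # hidden state starts at zero
    states = []
    for x_t in inputs:
        # the new state depends on the current input and on the previous state
        h = np.tanh(x_t @ w_x + h @ w_h + b)
        states.append(h)
    return np.stack(states)

rng = np.random.default_rng(1)
n_features, n_hidden, time_steps = 3, 5, 6   # hypothetical sizes
w_x = rng.normal(scale=0.5, size=(n_features, n_hidden))
w_h = rng.normal(scale=0.5, size=(n_hidden, n_hidden))
b = np.zeros(n_hidden)

sequence = rng.normal(size=(time_steps, n_features))
hidden_states = rnn_forward(sequence, w_x, w_h, b)
print(hidden_states.shape)                   # one hidden state per time step
```

Because the same `w_h` is multiplied in at every step, gradients flowing back through many steps can shrink or grow quickly, which is one way to see why plain RNNs are hard to train.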


Figure 2.11 Recurrent neural network

2.3.3 Long short term memory (LSTM):


Long short-term memory (LSTM) is a recurrent neural network architecture used in deep
learning. It stores information from earlier time steps in its memory and makes decisions and
predictions based on that earlier data. It is used for making predictions on time series data.
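
Before a time series can be given to an LSTM, it is usually cut into fixed-length windows of past values, each paired with the value that follows. The sketch below shows one common way to do this; the helper name `make_windows`, the window length and the sample numbers are assumptions for illustration only.

```python
import numpy as np

def make_windows(series, window):
    # Build inputs of shape (samples, window, 1) and next-step targets from a 1-D series
    series = np.asarray(series, dtype=float)
    x, y = [], []
    for start in range(len(series) - window):
        x.append(series[start:start + window])   # the past `window` values
        y.append(series[start + window])         # the value to predict
    return np.array(x)[..., np.newaxis], np.array(y)

counts = [3, 5, 4, 6, 8, 7, 9, 10]   # made-up values, e.g. available cabs per hour
x, y = make_windows(counts, window=3)
print(x.shape, y.shape)
```

The trailing axis of size 1 is the number of features per time step, matching the (samples, time steps, features) layout that recurrent layers in libraries such as Keras expect.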

Figure 2.12 LSTM


Figure 2.13 LSTM


2.3.4 Convolutional neural networks (CNN):

The convolutional neural network is a type of neural network that is used heavily in
the field of computer vision. It takes its name from the convolutional layers among its
hidden layers.
The hidden layers of a convolutional network consist of
• Convolutional layers
• Pooling layers
• Fully connected layers
• Normalization layers
So, in a convolutional neural network, convolution and pooling operations are applied in
the hidden layers in addition to the usual activation functions.

Figure 2.12 Convolutional neural network

Convolutional neural networks can be well understood once we have a clear picture of
the convolution and pooling operations.

2.3.4.1 Convolution:

The convolution operation requires two signals in the one-dimensional case and two
images in the two-dimensional case. In the two-dimensional case, one is the input image and
the other, called the kernel, is a filter applied to the input image. In simple words,
convolution multiplies the kernel element by element with a patch of the input image and sums
the result to produce one value of a modified output image.


So convolution is the dot product of the kernel with each patch of the input image. In
image processing, the kernel slides over the entire image and produces a new value for each
pixel position it covers.
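
The sliding dot product can be written out directly. The sketch below is an illustration with a made-up 4x4 image and 2x2 kernel; `convolve2d` is a hypothetical helper, and like most deep learning libraries it does not flip the kernel, so strictly speaking it computes a cross-correlation.

```python
import numpy as np

def convolve2d(image, kernel):
    # "Valid" convolution: the kernel only visits positions where it fits entirely
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(patch * kernel)   # element-wise product, then sum
    return out

image = np.arange(16, dtype=float).reshape(4, 4)   # values 0..15 laid out row by row
kernel = np.array([[1.0, 0.0],
                   [0.0, -1.0]])                   # top-left pixel minus bottom-right pixel
result = convolve2d(image, kernel)
print(result)
```

Every output value here is -5, because in this image each pixel is exactly 5 smaller than its lower-right diagonal neighbour. The output is 3x3 rather than 4x4 since no padding is used.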

Figure 2.13 Convolutional

2.3.4.2 Pooling:

Pooling is simply a sample-based discretization process. The main objective of the
pooling layer is to down-sample an input representation (image, hidden layer, output
matrix etc.) in order to reduce its dimensions. Two types of pooling are considered here:
• Max pooling
• Min pooling
As the names suggest, max pooling picks the maximum value from the selected region and
min pooling picks the minimum value from the selected region.
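
Both kinds of pooling can be sketched with one small NumPy function. The helper name `pool2d`, the 2x2 region size and the sample feature map are assumptions chosen for illustration.

```python
import numpy as np

def pool2d(x, size, mode="max"):
    # Split the map into non-overlapping size x size regions and reduce each one
    h, w = x.shape[0] // size, x.shape[1] // size
    blocks = x[:h * size, :w * size].reshape(h, size, w, size)
    reducer = np.max if mode == "max" else np.min
    return reducer(blocks, axis=(1, 3))

feature_map = np.array([[1, 3, 2, 0],
                        [4, 6, 5, 1],
                        [7, 2, 9, 8],
                        [0, 1, 3, 4]])

max_pooled = pool2d(feature_map, size=2, mode="max")   # [[6, 5], [7, 9]]
min_pooled = pool2d(feature_map, size=2, mode="min")   # [[1, 0], [0, 3]]
print(max_pooled)
print(min_pooled)
```

A 4x4 input becomes a 2x2 output, which is the reduction in dimensions that pooling is meant to achieve.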

Figure 2.14 Pooling


A convolutional neural network is simply a neural network whose hidden layers
contain convolution and pooling operations in addition to activation functions that
introduce non-linearity.


Chapter 3
Methodology

3 Methodology:

In this chapter I define what a recurrent neural network is and describe the
approach used to get the desired results, followed by a description of our design and
model. I also discuss the LSTM, how it works and its significance in our project. Keras,
the Python library used to build the LSTM models, is discussed in detail as well.

3.1 Recurrent neural networks

As described in Section 2.3.2, a recurrent neural network saves the output of a layer
and feeds it back to the input, so that from one time step to the next each neuron remembers
some of the information it had in the previous time step and acts like a memory cell. Errors
in the prediction are corrected in small steps during back propagation. Recurrent neural
networks are difficult to train and have only a very short-term memory, which limits their
usefulness. To overcome this, long short-term memory networks (LSTM) are used, which extend
the memory of RNNs to tasks involving long-term dependencies.

RNNs are mostly used in applications like language processing and speech recognition.

A convolutional neural network (CNN) is a specific type of artificial neural
network that uses perceptrons, a machine learning unit algorithm, for supervised learning, to
analyze data. CNNs apply to image processing, natural language processing and other kinds
of cognitive tasks.

CNNs are a powerful image processing, artificial intelligence (AI) technique that uses
deep learning to perform both generative and descriptive tasks, often using machine vision
that includes image and video recognition, along with recommender systems and natural
language processing (NLP).

A CNN uses a system much like a multilayer perceptron that has been designed for
reduced processing requirements. The layers of a CNN consist of an input layer, an output
layer and a hidden layer that includes multiple convolutional layers, fully connected layers
and normalization layers. The removal of limitations and the increase in efficiency for image
processing result in a system that is far more effective and simpler to train for image
processing and natural language processing.


Steps of methodology:

We determined cab availability predictions using the following steps:

• Converting raw data into data frames.
• Normalization and feature scaling.
• Training the model using an RNN.
• Predicting availability.

Figure 3.1 Block Diagram: Input → Preprocessing → Normalization → Building the model → Model Training → Assessment of Results

3.2 Input:

The first step is to obtain an effective data set to pass to the system as input.
The data set in our project is 1.9 GB of data on previous taxi rides. We use the taxi GPS
data set of the city of Shenzhen, China. Each record includes the taxi id, time, longitude,
latitude, occupancy status and speed. This data is downloaded from
https://www.cs.rutgers.edu/~dz220/data.html. This is 1.5 GB of data.


Figure 3.2 Input image


3.3 Pre-processing:

Data pre-processing must be done before further computations. This is done by
converting the CSV file into data frames. About 40 million data frames were formed from
this data set. These data frames are then saved in a CSV file along with longitude,
latitude, time and speed.
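
The conversion step can be sketched with pandas as follows. This is an assumed illustration, not the project's script: the column names, the two sample rows and the in-memory buffer standing in for the raw file are all hypothetical, and the real file may use a different layout.

```python
import io
import pandas as pd

# Two made-up rows in the assumed column order; the real file is far larger
raw = io.StringIO(
    "22223,21:09:38,114.138535,22.609266,0,19\n"
    "22223,21:10:08,114.137871,22.609900,1,24\n"
)
columns = ["taxi_id", "time", "longitude", "latitude", "occupancy_status", "speed"]
frame = pd.read_csv(raw, header=None, names=columns)

# Keep the fields that the text says are saved for the later steps
selected = frame[["longitude", "latitude", "time", "speed"]]
csv_text = selected.to_csv(index=False)   # pass a file path instead to write to disk
print(csv_text)
```

For a file of this size, reading in pieces with the `chunksize` argument of `pd.read_csv` is one way to avoid holding everything in memory at once.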

3.4 Normalization:

The data set requires further processing before it is sent to the recurrent neural
network. In normalization we rescale the data frame values to the range 0 to 1.
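
Rescaling to the range 0 to 1 is commonly done with min-max scaling, x' = (x - min) / (max - min), applied per column. The sketch below assumes that this is the kind of normalization meant; `min_max_scale` and the sample speeds are hypothetical.

```python
import numpy as np

def min_max_scale(values):
    # Scale each column so its smallest value maps to 0 and its largest to 1
    values = np.asarray(values, dtype=float)
    low, high = values.min(axis=0), values.max(axis=0)
    span = np.where(high > low, high - low, 1.0)   # avoid dividing by zero for constant columns
    return (values - low) / span

speeds = np.array([[0.0], [20.0], [35.0], [80.0]])   # made-up speed readings
scaled = min_max_scale(speeds)
print(scaled.ravel())                                # 0.0, 0.25, 0.4375, 1.0
```

The minimum and maximum should be computed on the training data only and then reused for any new data, so that all inputs share one scale.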

3.4.1 Vanishing gradient problem:

Solution:

• Exploding gradient: truncated back propagation, penalties, gradient clipping
• Vanishing gradient: weight initialization, echo state networks, long short-term memory networks
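
Gradient clipping, one of the remedies listed above, is simple enough to show directly. The sketch below clips by the overall norm of the gradient; `clip_by_norm` and the threshold of 5.0 are assumptions for illustration.

```python
import numpy as np

def clip_by_norm(gradient, max_norm):
    # Rescale the gradient if its length exceeds max_norm; the direction is kept
    norm = np.linalg.norm(gradient)
    if norm > max_norm:
        gradient = gradient * (max_norm / norm)
    return gradient

exploding = np.array([30.0, 40.0])        # norm is 50
clipped = clip_by_norm(exploding, max_norm=5.0)
small = clip_by_norm(np.array([0.3, 0.4]), max_norm=5.0)
print(clipped, small)                     # the first is shrunk to norm 5, the second is unchanged
```

Clipping limits the size of a single update without changing its direction, so one very large gradient cannot throw the weights far off.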


Figure 3.4 vanishing gradient problem


3.5 Building the model:

The next step is to build a Keras model. Keras offers two types of model:
1. Sequential Keras model
2. Functional Keras model
The sequential Keras model lets you build a model layer by layer, but it has certain
limitations: it does not allow you to build a model that shares layers or has multiple inputs
or outputs. The functional Keras model allows you to build more complex and advanced models,
because a layer can be connected not only to the previous or next layer but to any other
layer. The functional Keras model is also useful when you have multiple inputs.
In our project we have used the sequential Keras model because we don't have multiple inputs,
so the functional model is not required. The sequential model is the easiest to build in Keras
because it allows the model to be built layer by layer. We have used the 'add()' function to
add layers to our network. The layers used in our model are LSTM layers, dense layers, a
dropout layer and activation functions. The first layers used in our project are Conv2D
layers. A convolution layer contains a set of filters, and each filter is applied to a subset
of the input data at a time. The kernel size is the size of the filter matrix for our
convolution. These convolution layers deal with input images, which are simply
two-dimensional matrices. Activation is the activation function used with the convolutional
layer.

The activation function we are using in our project is ReLU, the Rectified Linear
Activation function. ReLU is a common default choice in neural networks because it is cheap to
compute and less prone to vanishing gradients than sigmoid or tanh activations. Next is the
dense layer, a fully connected layer in which each input node is connected to each output
node. The dropout layer, in contrast, sets the activations of some randomly chosen nodes to
zero during training to avoid overfitting.

Compiling the model:

The next step is compiling the model, which takes two arguments:

1. Optimizer
2. Loss function
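
Building and compiling a sequential model with `add()` might look like the sketch below. It assumes TensorFlow's bundled Keras is installed. The report does not state the layer sizes, dropout rate, input window, optimizer or loss function, so every value here (10 time steps, 4 features, 32 units, 0.2 dropout, Adam, mean squared error) is an assumption chosen only to make the example complete.

```python
from tensorflow.keras.layers import LSTM, Dense, Dropout, Input
from tensorflow.keras.models import Sequential

TIME_STEPS, N_FEATURES = 10, 4    # hypothetical input window: 10 steps of 4 features

model = Sequential()
model.add(Input(shape=(TIME_STEPS, N_FEATURES)))
model.add(LSTM(32, return_sequences=True))    # pass the full sequence on to the next LSTM
model.add(Dropout(0.2))                       # randomly zero 20% of activations while training
model.add(LSTM(32))                           # keep only the final hidden state
model.add(Dense(16, activation="relu"))
model.add(Dense(1))                           # single predicted value

# Compiling takes the optimizer and the loss function (both assumed here)
model.compile(optimizer="adam", loss="mean_squared_error")
model.summary()
```

Training would then be a call to `model.fit` with the desired `batch_size` and `epochs` on arrays shaped (samples, time steps, features).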

3.6 Training the model:

The next step after building the model is training it. The data frames were given to the
model in batches of 1000. A total of 500 epochs were performed, after which our model was
fully trained.


Chapter 4
Results & Discussions

4 Results:
In this chapter we discuss the results obtained after performing all the tasks and
procedures.

4.1 Results of the trained model:

Cab services have grown immensely in the past few years, and cab rides are becoming part of
everyday life. Many of us take cabs for travelling, yet the fare often varies in ways that seem strange. It
is well known that the price of a cab ride depends on the distance travelled. But does the price depend on
distance alone? No: apart from distance, many other factors affect the fare, such as the availability of
cabs, the time of booking and the day of travel. Cabs are always more expensive when few cabs are
available and during peak time, i.e. when most people are travelling. The fare also varies late at night and
early in the morning. Here I apply analytics to historical data to develop a model for cab fare prediction.
Basically, I design a system that predicts the fare amount for a cab ride in the city.

4.2 Data
The problem discussed above is a regression problem. The objective is to predict the cab fare based
on the co-ordinates and time of travel. We use two sets of data, train data and test data. The train data
consists of the fare amount, pick-up and drop-off co-ordinates and the passenger count. The test data, as
the name suggests, is used for testing the predicted fare: its columns are the same as in the train data,
except that the fare amount is missing and has to be predicted. Given below is a glimpse of the sample data
set that we are using to predict the cab fare.


Table 4.1: Cab Prediction Data Summary (Test)

Table 4.2: Cab Prediction Data Summary


Tables 4.1 and 4.2 above show a summary of the train and test sets.
Let us look at the names of all the columns. 'fare_amount' is the dependent variable for which
we have to develop a prediction model.
Input:
print(data_train.columns)
Output:
Index(['fare_amount', 'pickup_datetime', 'pickup_longitude', 'pickup_latitude', 'dropoff_longitude',
'dropoff_latitude', 'passenger_count'], dtype='object')
Input:
print(data_test.columns)
Output:
Index(['pickup_datetime', 'pickup_longitude', 'pickup_latitude', 'dropoff_longitude', 'dropoff_latitude',
'passenger_count'], dtype='object')
4.3 Column Description:
• pickup_datetime - timestamp value indicating when the cab ride started.
• pickup_longitude - float for the longitude coordinate of where the cab ride started.
• pickup_latitude - float for the latitude coordinate of where the cab ride started.
• dropoff_longitude - float for the longitude coordinate of where the cab ride ended.
• dropoff_latitude - float for the latitude coordinate of where the cab ride ended.
• passenger_count - an integer indicating the number of passengers in the cab ride.

Let us visualize all the continuous variables of the sample data set in Figure 4.1. In the next
section we will analyze the data in more detail.

Figure 4.1 Scatter matrix of continuous variable of a Cab Prediction Dataset (Train)

Python Code:

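# pd.scatter_matrix was removed in pandas 1.0; use pd.plotting.scatter_matrix (col lists the continuous columns)
pd.plotting.scatter_matrix(data.loc[:,col],figsize=(8,8),diagonal='kde')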


4.4 Data Analysis
Data analysis is a process of inspecting, cleaning, transforming and modelling data with the goal of
discovering useful information, suggesting conclusions and supporting decision-making. For any data
science project, data is like the fuel of an engine, so getting to know the data is the first and foremost task
and should be done carefully. We therefore start by getting to know our data and then move on to the
other stages, which are as follows:

1. Understanding the problem and the data
2. Data exploration / data cleaning
3. Feature engineering / feature selection
4. Model evaluation and selection
5. Model optimization
6. Interpretation of results and predictions

The first thing to do whenever we deal with any kind of data is to find out its shape, data types and
dimensions.

Input:
import pandas as pd

data_train = pd.read_csv("train_cab.CSV", sep=',')
print("Training Data Shape :", data_train.shape)
data_test = pd.read_csv("test.CSV", sep=',')
print("Test Data Shape :", data_test.shape)

Output :
Training Data Shape : (16067, 7)
Test Data Shape : (9914, 6)

Input:
print("\n Training Set Columns Type\n ",data_train.dtypes)
print("\n Test Set Columns Type\n ",data_test.dtypes)

Figure 4.2 : Column Data Type ( Train & Test)

Next, let us check descriptive statistics of train and test set, which is very important for knowing the
distribution of the data.

Input:
# Descriptive summary
data_train.describe()

Output:


Table 4.3: Cab Prediction Data Summary


Input:
# Descriptive summary
data_test.describe()

Output:
Table 4.4: Cab Prediction Data Summary

Distribution of Dependent Variable:

As fare_amount is not normally distributed, we take the log scale for all its values. The distribution of
fare_amount is shown below:
Code:
plt.figure(figsize=(8,5))
sns.kdeplot(np.log(data_train['fare_amount'].values)).set_title("Fare amount (log scale) Distribution")


Figure 4.3 : Log Distribution of Fare Amount

From the descriptive analysis it is clear that the data set contains many NaN and zero values, that
columns like passenger_count deviate strongly from the mean, and that the co-ordinate columns
(pickup_longitude, pickup_latitude, dropoff_longitude, dropoff_latitude) contain values that have no
meaning. These values should therefore be identified and treated. In the next section we will see what
steps should be taken to clean the data.

4.6 Data Pre-Processing


4.6.1 Data Pre Processing:
Data is growing day by day. Extracting knowledge from it is a real challenge; without that, it is
nothing but garbage. Data loaded into a database from various sources emerges as a messy data set. Such
data sets are of little use because extracting valuable information from them is very tough. Data
pre-processing is therefore the first and mandatory step for any data scientist before mining [2]. According
to Lohr, data scientists spend 50% - 80% of their valuable time and effort collecting and preparing
disorderly digital data before it can be explored for useful nuggets.
In the descriptive analysis we saw that some columns contain zero values, which play no role in
fare prediction. We will therefore identify such values and replace them with null values. These null
values will be treated further during missing value analysis.

Table 4.5: Number of Missing Value (Train)

4.6.2 Missing Value Analysis:


In the first step of data pre-processing we look into the data set to find the missing values in the
different variables. Missing values should be recognized and either replaced (by mean, median or kNN
imputation) or dropped, because a machine learning algorithm does not perform well in the presence of
missing values. The cab prediction data set contains many missing values, so let us identify them and treat
them accordingly.
In Table 4.5 we can see missing values in almost all columns except fare_amount and
pickup_datetime. We will replace them with the mean or median, depending on which is suitable for a
particular column.
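
The two steps described here, turning meaningless zeros into nulls and then imputing them, can be sketched with pandas. The small data frame and the choice of the median for both columns are assumptions for illustration; the report does not say which statistic was used for which column.

```python
import numpy as np
import pandas as pd

# Made-up sample with the kinds of problems described: zeros and NaN
data = pd.DataFrame({
    "fare_amount": [4.5, 16.9, 5.7, 7.7],
    "passenger_count": [1.0, 0.0, 2.0, np.nan],
    "pickup_latitude": [40.72, 0.0, 40.76, 40.73],
})

cols = ["passenger_count", "pickup_latitude"]
data[cols] = data[cols].replace(0, np.nan)            # zeros carry no meaning here
missing_before = data.isnull().sum()                  # missing values per column
data[cols] = data[cols].fillna(data[cols].median())   # impute with the column median
print(missing_before)
print(data)
```

The median is less sensitive than the mean to extreme values, which matters for a column like passenger_count that deviates strongly from its mean.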

4.7 Feature Engineering:


Feature engineering is the process of using domain knowledge of the data to create features that
make machine learning algorithms work. If feature engineering is done correctly, it increases the predictive
power of machine learning algorithms by creating features from raw data that help facilitate the machine
learning process. Feature engineering is the art of deriving new columns from the given columns of a data set.

The pick-up date provided in the data set is very important, but we cannot use a date directly
for model building. We extract the year, month, day, weekday and hour from it with the help of the
function extract_date. This function is applied to the train as well as the test data.
def extract_date(data):
    # Parse the pickup timestamp, then derive calendar features from it
    data['pickup_datetime'] = pd.to_datetime(data['pickup_datetime'], format="%Y-%m-%d %H:%M:%S.%f")
    data["year"] = pd.DatetimeIndex(data["pickup_datetime"]).year
    data["month"] = pd.DatetimeIndex(data["pickup_datetime"]).month
    # .dt.weekday_name was removed in pandas 1.0; .dt.day_name() is its replacement
    data["weekday"] = data["pickup_datetime"].dt.day_name()
    data["day"] = data["pickup_datetime"].dt.day
    data["hour"] = data["pickup_datetime"].dt.hour
    return data
Distance is another variable, derived from the pick-up and drop-off longitude and latitude.
# Calculate the haversine (great-circle) distance in km from the coordinates
def distance(data):
    pickup_latitude = np.radians(data["pickup_latitude"])
    pickup_longitude = np.radians(data["pickup_longitude"])
    dropoff_latitude = np.radians(data["dropoff_latitude"])
    dropoff_longitude = np.radians(data["dropoff_longitude"])
    dlon = dropoff_longitude - pickup_longitude
    dlat = dropoff_latitude - pickup_latitude
    a = (np.sin(dlat/2))**2 + np.cos(pickup_latitude)*np.cos(dropoff_latitude)*(np.sin(dlon/2))**2
    c = 2 * np.arctan2(np.sqrt(a), np.sqrt(1-a))
    R = 6373.0  # approximate radius of the Earth in km
    d = R * c
    data["Distance"] = d
    return data
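
The distance formula can be checked on single points with the standard library alone. The scalar helper `haversine_km` below is a hypothetical companion to the function above, using the same Earth radius of 6373 km; the coordinates are arbitrary test values.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2, radius_km=6373.0):
    # Great-circle distance between two points given in degrees
    lat1, lon1, lat2, lon2 = map(math.radians, (lat1, lon1, lat2, lon2))
    dlat, dlon = lat2 - lat1, lon2 - lon1
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    c = 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))
    return radius_km * c

same_point = haversine_km(40.0, -74.0, 40.0, -74.0)   # identical points give zero distance
one_degree = haversine_km(40.0, -74.0, 41.0, -74.0)   # one degree of latitude, about 111 km
print(same_point, round(one_degree, 2))
```

One degree of latitude should equal the radius times pi/180, roughly 111.23 km with this radius, which makes a convenient sanity check for the formula.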
4.8 Feature selection
It is not realistic to perform complex data analytics on a massive data set. Data reduction is a
technique for obtaining a smaller data set. Prior to any machine learning modelling we should measure the
significance of each independent variable in our data set. It can happen that many variables play no
important role in model building. Such variables should be recognized and dropped before any predictive
analysis. Let us look in our data set for insignificant independent variables.
As we can see, apart from distance no other variable is correlated with fare_amount (the dependent
variable), so we will visualize the data from different aspects and try to find the relation. We will drop
pickup_datetime before modelling.
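
One simple way to measure how strongly each independent variable relates to the fare is the Pearson correlation. The sketch below uses a tiny made-up sample constructed so that the fare rises exactly with distance and is unrelated to the passenger count; it illustrates the check only and does not reproduce the report's numbers.

```python
import pandas as pd

# Made-up sample: fare grows linearly with distance, passenger count is unrelated
sample = pd.DataFrame({
    "fare_amount":     [5.0, 7.5, 10.0, 12.5, 15.0],
    "Distance":        [1.0, 2.0, 3.0, 4.0, 5.0],
    "passenger_count": [1, 3, 1, 3, 1],
})

# Correlation of every other column with the dependent variable
correlation = sample.corr()["fare_amount"].drop("fare_amount")
ranked = correlation.abs().sort_values(ascending=False)
print(ranked)
```

Correlation only captures linear relationships, so a variable with a low value here (the hour of day, for example) may still be useful to a non-linear model.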


Table 4.6: Cab Prediction Data Summary

Chapter 5
Project Design

5 Project Design:

This chapter describes the design of our system. It explains the use cases and the
actors involved, and how the user will interact with our system. Project Design explains in
depth the code and the methodologies used in building the software.
Project Design includes:
1. Use case Diagram
2. Use Cases
3. System Sequence Diagram
4. Interaction Diagram

5.1 The Use Case Diagram:

The use case diagram shows the actions that the user will perform on our
system. Figure 5.1 shows how the user will perform tasks on our system and how he will
interact with it.

Figure 5.1 Use Case Diagram

5.1.1 Use Cases:

Use cases are detailed descriptions of how users will perform tasks on our system.
They also show how our system behaves on user actions and how it replies to user
requests. Each use case is written with the starting goal of the user and the required
ending.

5.1.2 Use Case Make Dataset:

Details of the use case Make Dataset, which converts the raw data into frames and saves the path in a CSV file.

ID no.             UseCase-01
Scope              Use data frames and speed values to make the dataset
Name               Make Dataset
Primary Actor      User
Goals              Let the user convert raw data and frames into a dataset.
Pre-Conditions     System is running
                   Valid raw data file and text file
Post Conditions    Raw data is converted to frames
Success Scenarios  All the dataset is converted and saved into CSV
Alternate          The dataset is not saved into the CSV file
                   The data and text file were invalid

5.1.3 Use Case Train Model:

Details of the use case Train Model, where we load the dataset and train our model.

ID no.             UseCase-02
Scope              Load dataset and train model
Name               Train Model
Primary Actor(s)   User
Goals              User can load the dataset from the machine and train the model.
Pre-Conditions     Dataset is converted and saved into CSV.
                   The data frames must exist on the path.
Post Conditions    The model weights are saved.
Success Scenarios  Model is fully trained.
Alternate          Model is not trained due to overfitting.
                   Model is trained but not giving good results

5.1.4 Use Case Predict availability:

Details of the use case Predict availability, where we detect cabs and predict their availability.

ID no.             UseCase-03
Scope              Detect cabs and get locations
Name               Predict availability
Primary Actor(s)   User
Goals              Detect cabs from longitude and latitude and get results.
Pre-Conditions     Model is fully trained
Success Scenarios  Cabs availability is predicted in real time
Alternate          System is too slow to run the model.

5.2 System Sequence Diagram:

The system sequence diagram shows, for each use case, the actions
the actors perform and the possible system responses.

5.2.1 Make Dataset:

Figure 5.2 shows the System Sequence Diagram of Use-case 01: Make Dataset,
where the User inputs a valid data file and text file, which are converted to frames, and the
path is saved in a CSV file.

5.2.2 Train Model:

Figure 5.3 shows the System Sequence Diagram of Use-case 02: Train Model, where
the User inputs the dataset and trains the model.

Figure 5.3 System sequence diagram

Chapter 6
Conclusions

6.1 CONCLUSION:

In this chapter I summarize the methods that were implemented and their
results.

In the first approach, the model was trained using LSTM and performed considerably
better. In this approach the data set was passed through LSTM layers instead of being
passed directly to the model. This prevented the model from overfitting and gave better
results.

In the second approach, the raw data was passed directly to the model. The model
overfitted continuously and did not improve even after several runs.

My project on cabs availability prediction opens an opportunity to perform more
experiments on where taxis will go.

References:

1. M. Müller, M. Reif, M. Pandit, W. Staiger, B. Martin, Vehicle speed prediction
for driver assistance systems, SAE Tech. Pap. (2004), 2004-01-0170.
2. Gonder, Route-based control of hybrid electric vehicles, SAE World
Congress, U.S.A. (2008).
3. Fischer, Philipp, Dosovitskiy, Alexey, Ilg, Eddy, Häusser, Philip,
Hazırbaş, Caner, Golkov, Vladimir, van der Smagt, Patrick, Cremers,
Daniel, and Brox, Thomas. FlowNet: Learning optical flow with
convolutional neural networks. In ICCV, 2015.
4. Arthur Emidio T. Ferreira, Ana Paula G. S. de Almeida and Flavio de Barros
Vidal, Autonomous Vehicle Steering Wheel Estimation from a Video.
5. Multichannel Convolutional Neural Networks. Landgrebe, D. A. (2003). Signal
theory methods in multispectral remote sensing. Hoboken: John Wiley and Sons.
6. A. Mahmoudabadi, Using artificial neural network to estimate
average speed of vehicles in rural roads, Int. Conf. Intell. Network
Comput. (2010) 25-30.
7. J. Park, D. Li, Y.L. Murphey, J. Kristinsson, R. McGee, M. Kuang, T.
Phillips, Real time vehicle speed prediction using a neural network traffic
model, in: The 2011 International Joint Conference on Neural Networks, San
Jose, 2011, pp. 2991-2996.
8. Benjamin Penchas, Tobin Bell, Marco Monteiro, A Deep Learning Approach to
Vehicle Speed Estimation.
9. E.I. Vlahogianni, M.G. Karlaftis, J.C. Golias, Optimized and meta-
optimized neural networks for short-term traffic flow prediction: a genetic
approach, Transp. Res. Part C Emerg. Technol. 13 (3) (2005) 211-234.
