
STOCK PRICE PREDICTOR

Authors:

Deepak, Chandigarh University

Rahul Dhiman, Chandigarh University


TABLE OF CONTENTS

1. DEFINITION
1.1 Project Overview
1.2 Problem Statement
1.3 Metrics
2. ANALYSIS
2.1 Data Exploration
2.2 Exploratory Visualization
2.3 Algorithms and Techniques
2.4 Benchmark Model
3. METHODOLOGY
3.1 Data Preprocessing
3.2 Implementation
3.3 Refinement
4. RESULT
4.1 Model Evaluation and Validation
4.2 Justification
5. CONCLUSION
5.1 Free-Form Visualization
5.2 Reflection
5.3 Improvement

References


Chapter 1.

DEFINITION

1.1 Project Overview


Investment firms, hedge funds and even individuals have
been using financial models to better understand market
behavior and make profitable investments and trades. A
wealth of information is available in the form of historical
stock prices and company performance data, suitable for
machine learning algorithms to process.

Can we actually predict stock prices with machine learning?


Investors make educated guesses by analyzing data. They
read the news and study the company history, industry
trends and the many other data points that go into making
a prediction. The prevailing theory is that stock prices are
totally random and unpredictable, but that raises the
question of why top firms like Morgan Stanley and Citigroup
hire quantitative analysts to build predictive models. We
have this idea of a trading floor filled with
adrenaline-infused men with loose ties running around
yelling something into a phone, but these days you are more
likely to see rows of machine learning experts quietly
sitting in front of computer screens. In fact, about 70% of
all orders on Wall Street are now placed by software; we are
now living in the age of the algorithm.

This project uses a deep learning model, the Long Short-Term
Memory (LSTM) neural network, to predict stock prices.
Recurrent neural networks (RNNs) come in handy for data
indexed by time, and recent research has shown that LSTM
networks are among the most popular and useful variants of
RNNs.

I will use Keras to build an LSTM that predicts stock prices
from historical closing price and trading volume, and
visualize both the predicted price values over time and the
optimal parameters for the model.

1.2 Problem Statement

The challenge of this project is to accurately predict the
future closing value of a given stock across a given period
of time in the future. For this project I will use a Long
Short-Term Memory network, usually just called an “LSTM”,
to predict the closing price of the S&P 500 [2] using
a dataset of past prices.

Goals

1. Explore stock prices.
2. Implement a basic model using linear regression.
3. Implement LSTM using keras library.
4. Compare the results and submit the report.

1.3 Metrics
For this project, performance is measured using the
Mean Squared Error (MSE) and Root Mean Squared Error
(RMSE), calculated from the difference between the predicted
and actual adjusted close prices of the target stock, and
the delta between the performance of the benchmark
model (Linear Regression) and our primary model (Deep
Learning).
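
As a minimal sketch of how these metrics can be computed: the
helper names mse and rmse and the sample values below are
illustrative assumptions, not taken from the project code or
data.

```python
import math


def mse(actual, predicted):
    """Mean of the squared differences between actual and predicted values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Square root of the MSE, in the same units as the values themselves."""
    return math.sqrt(mse(actual, predicted))


# Made-up normalized prices, only to exercise the functions.
actual = [0.50, 0.52, 0.55, 0.53]
benchmark_pred = [0.45, 0.50, 0.60, 0.58]  # stand-in for the benchmark model
primary_pred = [0.48, 0.53, 0.54, 0.56]    # stand-in for the primary model

benchmark_mse = mse(actual, benchmark_pred)
primary_mse = mse(actual, primary_pred)

# Delta between the two models: positive means the primary model is better.
delta = benchmark_mse - primary_mse

print(f"benchmark: {benchmark_mse:.6f} MSE ({rmse(actual, benchmark_pred):.6f} RMSE)")
print(f"primary:   {primary_mse:.6f} MSE ({rmse(actual, primary_pred):.6f} RMSE)")
print(f"delta:     {delta:.6f} MSE")
```

Because RMSE is in the same units as the target, it is usually
the easier of the two to read against a price scale.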

Chapter 2.

ANALYSIS

2.1 Data Exploration

The data used in this project is that of Alphabet Inc [3] from
January 1, 2005 to June 30, 2017. It is a series of data
points indexed in time order, i.e. a time series. My goal was
to predict the closing price for any given date after training.
For ease of reproducibility and reusability, all data was
pulled from the Google Finance Python API [4].

The prediction has to be made for the closing (adjusted
closing) price of the data. Since Google Finance already
adjusts the closing prices for us [5], we just need to make
a prediction for the “CLOSE” price.

The dataset is of the following form:

Date        Open     High     Low      Close    Volume
30-Jun-17   943.99   945.00   929.61   929.68   2287662
29-Jun-17   951.35   951.66   929.60   937.82   3206674
28-Jun-17   950.66   963.24   936.16   961.01   2745568

Table: The whole data can be found in ‘Google.csv’ in
the project root folder [6].

Note: I did not observe any abnormality in the dataset, i.e.,
no feature is empty and none contains incorrect values
such as negative values.

The mean, standard deviation, maximum and minimum of
the data were found to be as follows:

Feature   Open       High       Low         Close      Volume
Mean      382.5141   385.8720   378.7371    382.3502   4205707.8896
Std       213.4865   214.6022   212.08010   213.4359   3877483.0077
Max       1005.49    1008.61    1008.61     1004.28    41182889
Min       87.74      89.29      86.37       87.58      521141

We can infer from this dataset that the date, high and low
values are not important features of the data, as it does not
matter what the highest or lowest trading price of the stock
was on a particular day. What matters is the opening price
and the closing price of the stock. If at the end of the day
the closing price is higher than the opening price, we have
some profit; otherwise we see a loss. The volume of shares
is also important, as a rising market should see rising
volume: increasing price with decreasing volume shows a
lack of interest and is a warning of a potential reversal. A
price drop (or rise) on large volume is a stronger signal that
something in the stock has fundamentally changed.

Therefore I removed the Date, High and Low features from
the data set at the preprocessing step. The mean, standard
deviation, maximum and minimum of the preprocessed
data were found to be as follows:

         Mean      Std       Max   Min
Open     0.3212    0.23261   1.0   0.0
Close    0.3215    0.2328    1.0   0.0
Volume   0.09061   0.0953    1.0   0.0
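
The Max of 1.0 and Min of 0.0 in every row are what min-max
scaling produces. A minimal NumPy sketch of that scaling
follows; the function names are hypothetical, the project
itself uses MinMaxScaler from Scikit-Learn, and the three
sample rows are the Open, Close and Volume values from the
raw data table above.

```python
import numpy as np


def min_max_scale(values):
    """Scale each column to [0, 1] via (x - min) / (max - min)."""
    values = np.asarray(values, dtype=float)
    col_min = values.min(axis=0)
    col_max = values.max(axis=0)
    span = col_max - col_min
    # A constant column has zero span; divide by 1 so it maps to 0, not NaN.
    span = np.where(span == 0, 1.0, span)
    return (values - col_min) / span, col_min, col_max


def min_max_inverse(scaled, col_min, col_max):
    """Map scaled values back to the original units."""
    return np.asarray(scaled) * (col_max - col_min) + col_min


# Columns: Open, Close, Volume (three rows from the raw data table).
rows = np.array([
    [943.99, 929.68, 2287662],
    [951.35, 937.82, 3206674],
    [950.66, 961.01, 2745568],
])

scaled, col_min, col_max = min_max_scale(rows)
restored = min_max_inverse(scaled, col_min, col_max)
print(scaled)
```

Keeping col_min and col_max around matters later: predictions
made on the scaled Close column have to be mapped back with
the same values before they can be read as prices.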

2.2 Exploratory Visualization

To visualize the data I have used the matplotlib [7] library. I
have plotted the closing stock price against the number of
items (number of days) available.

[Figure: Closing price of the stock over time. X-axis: trading
days. Y-axis: closing price in USD.]

In this data we can see continuous growth in Alphabet Inc.
The major fall in prices between trading days 600 and 1000
might be because of the Global Financial Crisis of
2008-2009.

2.3 Algorithms and Techniques

The goal of this project was to study time-series data and
explore as many options as possible to accurately predict
the stock price. Through my research I came to know about
Recurrent Neural Nets (RNNs) [8], which are used specifically
for sequence and pattern learning. They are networks
with loops in them, which allow information to persist and
so give them the ability to remember earlier inputs. But
Recurrent Neural Nets suffer from the vanishing gradient
problem, which prevents them from learning from distant
past data as expected. This problem is addressed by Long
Short-Term Memory networks, usually referred to as LSTMs.
These are a special kind of RNN, capable of learning
long-term dependencies.
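
To make the vanishing gradient problem concrete, here is a toy
illustration. It assumes a scalar recurrent unit
h_t = tanh(w * h_(t-1)) chosen only for this example; it is
not the model used in this project. By the chain rule the
gradient of h_T with respect to h_0 is a product of one factor
per time step, and when every factor is below 1 the product
shrinks towards zero.

```python
import math


def gradient_through_time(weight, steps, h0=0.5):
    """Return d h_t / d h_0 after each step of h_t = tanh(weight * h_(t-1))."""
    h = h0
    grad = 1.0
    history = []
    for _ in range(steps):
        h = math.tanh(weight * h)
        # Chain rule: d tanh(w * x) / dx = w * (1 - tanh(w * x) ** 2)
        grad *= weight * (1.0 - h * h)
        history.append(grad)
    return history


grads = gradient_through_time(weight=0.9, steps=50)
for step in (1, 10, 25, 50):
    print(f"after {step:2d} steps: gradient = {grads[step - 1]:.6f}")
```

An input 50 steps back therefore has almost no influence on
the weight update, which is the limitation LSTM cells are
designed to work around.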

In addition to adjusting the architecture of the neural
network, the following full set of parameters can be tuned
to optimize the prediction model:

• Input Parameters
  • Preprocessing and Normalization (see the Data
    Preprocessing section)

• Neural Network Architecture
  • Number of Layers (how many layers of nodes in the
    model; used 3)
  • Number of Nodes (how many nodes per layer; tested
    1, 3, 8, 16, 32, 64, 100, 128)

• Training Parameters
  • Training / Test Split (how much of the dataset to train
    versus test the model on; 82.95% / 17.05% for the LSTM
    models and 68.53% / 31.47% for the benchmark, see
    Section 3.1)
  • Validation Sets (kept constant at 0.05% of training sets)
  • Batch Size (how many samples to include in a
    single training step; kept at 1 for the basic LSTM model
    and at 512 for the improved LSTM model)
  • Optimizer Function (which algorithm to use to
    minimize the error; used “Adam” throughout)
  • Epochs (how many passes to make over the training
    data; kept at 1 for the basic and at 20 for the improved
    LSTM)

2.4 Benchmark Model

For this project I have used a Linear Regression model as
the primary benchmark. One of my goals is to understand
the relative performance and implementation differences of
machine learning versus deep learning models. This Linear
Regressor was based on the examples presented in
Udacity’s Machine Learning for Trading course and was
used for error rate comparison (MSE and RMSE) on the
same dataset as the deep learning models.

The predicted results from my benchmark model are as
follows:

[Figure: Plot of the benchmark model. X-axis: trading days.
Y-axis: closing price in USD. Green line: adjusted close
price. Blue line: predicted close price.]

Train Score: 0.1852 MSE (0.4303 RMSE)
Test Score: 0.08133781 MSE (0.28519784 RMSE)

Chapter 3.

METHODOLOGY

3.1 Data Preprocessing

Acquiring and preprocessing the data for this project occurs
in the following sequence, much of which has been modularized
into the preprocess.py file for importing and use across all
notebooks:

• Request the data from the Google Finance Python API and
save it in the google.csv file in the format shown in
Section 2.1.

• Remove the unimportant features (date, high and low) from
the acquired data and reverse the order of the data so that
it runs from January 03, 2005 to June 30, 2017.

Item   Open     Close    Volume
0      98.80    101.46   15860692
1      100.77   97.35    13762396
2      96.82    96.85    8239545
3      97.72    94.37    10389803

• Normalize the data using the MinMaxScaler helper function
from Scikit-Learn.

Item   Open       Close      Volume
0      0.012051   0.015141   0.377248
1      0.014198   0.010658   0.325644
2      0.009894   0.010112   0.189820
3      0.010874   0.007407   0.242701

• Store the normalized data in the google_preprocessed.csv
file for future reusability.

• Split the dataset into training (68.53%) and test
(31.47%) datasets for the linear regression model. The split
has the following shape:

x_train (2155, 1)
y_train (2155, 1)
x_test (990, 1)
y_test (990, 1)

• Split the dataset into training (82.95%) and test
(17.05%) datasets for the LSTM model. The split has the
following shape:

x_train (2589, 50, 3)
y_train (2589,)
x_test (446, 50, 3)
y_test (446,)
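
The LSTM shapes above, (samples, 50, 3), suggest that each
sample is a window of 50 consecutive time steps over the
three features. The following is a sketch of how such windows
and a chronological split could be built. The function names,
the choice of the next step's Close as the target, and the
random stand-in data are all assumptions for illustration;
the project's own code in preprocess.py is not reproduced
here.

```python
import numpy as np


def make_windows(data, window=50, target_col=1):
    """Turn (n_rows, n_features) data into overlapping windows.

    Returns x with shape (n_rows - window, window, n_features) and y with
    shape (n_rows - window,), where y[i] is the target column of the row
    that immediately follows window i.
    """
    data = np.asarray(data, dtype=float)
    n_samples = len(data) - window
    if n_samples <= 0:
        raise ValueError("need more rows than the window length")
    x = np.stack([data[i:i + window] for i in range(n_samples)])
    y = data[window:, target_col]
    return x, y


def chronological_split(x, y, train_fraction):
    """Split without shuffling so the test set lies after the training set."""
    cut = int(len(x) * train_fraction)
    return x[:cut], y[:cut], x[cut:], y[cut:]


# Random stand-in for the normalized Open, Close, Volume columns.
rng = np.random.default_rng(0)
data = rng.random((300, 3))

x, y = make_windows(data, window=50, target_col=1)
x_train, y_train, x_test, y_test = chronological_split(x, y, 0.8295)
print(x_train.shape, y_train.shape, x_test.shape, y_test.shape)
```

Splitting by position rather than at random keeps future
prices out of the training set, which matters for any
time-series model.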

3.2 Implementation

Once the data has been downloaded and preprocessed, the
implementation process occurs consistently through all
three models as follows. I have thoroughly specified all the
steps to build, train and test each model and its predictions
in the notebook itself.

Some code implementation insight:

Benchmark model:

Step 1: Split into train and test sets.

Here I am calling a function defined in ‘stock_data.py’
which splits the data for the linear regression model.

[Screenshot: data-splitting function in ‘stock_data.py’]

Step 2: Build the model using the scikit-learn
linear_model [10] library.

Here I am calling a function defined in
‘LinearRegressionModel.py’ which builds the model for
the project.

[Screenshot: model-building function in
‘LinearRegressionModel.py’]

Step 3: Predict the prices for the given test datasets.

The prediction function is also defined in
‘LinearRegressionModel.py’.

[Screenshot: prediction function in
‘LinearRegressionModel.py’]

Step 4: Finally, calculate the test score and plot the results
of the benchmark model.
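
The four benchmark steps can be sketched as follows. This is
not the code from ‘stock_data.py’ or
‘LinearRegressionModel.py’. It assumes that the single input
feature is the trading-day index (consistent with the
(n, 1) shapes in Section 3.1, but not confirmed by the
report) and it uses synthetic data in place of the real
normalized Close column.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Synthetic stand-in for the normalized Close column: a trend plus noise.
rng = np.random.default_rng(0)
n_days = 200
close = np.linspace(0.0, 1.0, n_days) + rng.normal(0.0, 0.02, n_days)

# Assumption: the only feature is the trading-day index.
x = np.arange(n_days, dtype=float).reshape(-1, 1)
y = close.reshape(-1, 1)

# Step 1: chronological train/test split (no shuffling for a time series).
split = int(n_days * 0.6853)
x_train, x_test = x[:split], x[split:]
y_train, y_test = y[:split], y[split:]

# Step 2: build and fit the linear model.
model = LinearRegression().fit(x_train, y_train)

# Step 3: predict the prices for the test set.
y_pred = model.predict(x_test)

# Step 4: calculate the test score.
test_mse = mean_squared_error(y_test, y_pred)
test_rmse = float(np.sqrt(test_mse))
print(f"Test Score: {test_mse:.8f} MSE ({test_rmse:.8f} RMSE)")
```

A straight line can only follow the long-run trend, which is
why it serves here as a baseline rather than as a serious
predictor.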
Improved LSTM model:

Step 1: Split into train and test sets.

Note: The same set of training and testing data is used for
the improved LSTM as for the basic LSTM.

Step 2: Build an improved LSTM model.

Here I am calling a function defined in ‘lstm.py’ which
builds the improved LSTM model for the project.

[Screenshot: model-building function in ‘lstm.py’]

NOTE: The function uses the Keras Long Short-Term
Memory [11] library to implement the LSTM model.

For my improved LSTM model I have increased the batch_size
from 1 to 512 and the epochs from 1 to 20. Also, in the
function I have increased the number of nodes in the hidden
layer from 100 to 128 and have added a dropout of 0.2 to all
the layers.

Step 3: Train the model.

I have used a built-in library function to train the model.

Step 4: Predict the prices for the given test datasets.

I have used a built-in function to predict the outcomes of
the model.

Step 5: Finally, calculate the test score and plot the results
of the improved LSTM model.
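
A hedged sketch of what Steps 2 to 5 might look like in Keras,
using the settings stated in this report (128 hidden nodes,
dropout of 0.2, batch size 512, Adam optimizer, verbose = 2).
The exact layer stack in ‘lstm.py’ is not shown in the report,
so the arrangement below (two LSTM layers followed by one
Dense output layer) and the function name are assumptions.
Random arrays stand in for the real windows, and the demo
trains for one epoch only, where the report used 20.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers


def build_improved_lstm(window=50, n_features=3, hidden_units=128, dropout=0.2):
    """Assumed architecture: LSTM -> Dropout -> LSTM -> Dropout -> Dense(1)."""
    model = keras.Sequential([
        keras.Input(shape=(window, n_features)),
        layers.LSTM(hidden_units, return_sequences=True),
        layers.Dropout(dropout),
        layers.LSTM(hidden_units),
        layers.Dropout(dropout),
        layers.Dense(1, activation="linear"),
    ])
    model.compile(loss="mean_squared_error", optimizer="adam")
    return model


# Random placeholder data shaped like the LSTM inputs in Section 3.1.
rng = np.random.default_rng(0)
x_train = rng.random((64, 50, 3)).astype("float32")
y_train = rng.random(64).astype("float32")

# Step 2: build the model.
model = build_improved_lstm()

# Step 3: train (the report used epochs=20; 1 keeps this demo quick).
model.fit(x_train, y_train, batch_size=512, epochs=1, verbose=2)

# Step 4: predict, using the same batch size as in training.
predictions = model.predict(x_train, batch_size=512, verbose=0)

# Step 5: score the predictions with MSE and RMSE.
mse = float(np.mean((predictions.ravel() - y_train) ** 2))
print(f"Score: {mse:.8f} MSE ({mse ** 0.5:.8f} RMSE)")
```

The first LSTM layer needs return_sequences=True so that the
second one receives a full sequence rather than only the last
hidden state.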

3.3 Refinement

For this project I have worked on fine-tuning the
parameters of the LSTM to get better predictions. I
made the improvement by testing and analyzing
each parameter and then selecting the final
value for each of them.

To improve the LSTM I have done the following:

● Increased the number of hidden nodes from 100 to 128
● Added Dropout of 0.2 at each layer of the LSTM
● Increased batch size from 1 to 512
● Increased epochs from 1 to 20
● Added verbose = 2
● Made predictions with the batch size

This improved my mean squared error on the testing set
from 0.01153170 MSE to 0.00093063 MSE.

The difference in the predicted plots can be seen as follows:

Fig: Plot of Adjusted Close and Predicted Close Prices for
the basic LSTM model

Fig: Plot of Adjusted Close and Predicted Close
Prices for the improved LSTM model

Chapter 4.

RESULT

4.1 Model Evaluation and Validation

With each model I have refined and fine-tuned my
predictions and have reduced the mean squared error
significantly.

● For my first model, using the linear regression model:
  ○ Train Score: 0.1852 MSE (0.4303 RMSE)
  ○ Test Score: 0.08133781 MSE (0.28519784 RMSE)

Fig: Plot of Linear Regression Model

● For my second model, using the basic Long Short-Term
Memory model:
  ○ Train Score: 0.00089497 MSE (0.02991610 RMSE)
  ○ Test Score: 0.01153170 MSE (0.10738577 RMSE)

Fig: Plot of basic Long Short-Term Memory model

● For my third and final model, using the improved Long
Short-Term Memory model:
  ○ Train Score: 0.00032478 MSE (0.01802172 RMSE)
  ○ Test Score: 0.00093063 MSE (0.03050625 RMSE)

Fig: Plot of Improved Long Short-Term Memory Model

Robustness Check:

To check the robustness of my final model I used
unseen data, i.e., data of Alphabet Inc. from July 1,
2017 to July 20, 2017. Predicting the values of the
unseen data gave the following result, a higher error
than on the test set:

Test Score: 0.3897 MSE (0.6242 RMSE)

4.2 Justification

Comparing the benchmark model (Linear Regression)
to the final improved LSTM model, the test Mean Squared
Error drops from 0.08133781 MSE
(0.28519784 RMSE) [Linear Regression Model] to
0.00093063 MSE (0.03050625 RMSE) [Improved
LSTM]. This significant decrease in error clearly
shows that my final model has surpassed both the basic
LSTM and the benchmark model.


Also, the average delta price between the actual and
predicted Adjusted Close values was:

Delta Price: 0.000931 - RMSE * Adjusted Close Range

which is less than one cent :)


Chapter 5.

CONCLUSION

5.1 Free-Form Visualization

I have already discussed all the important features of
the datasets and their visualization in the sections
above. But to conclude my report I would
choose my final model visualization, which is an
improved version of the LSTM obtained by fine-tuning
parameters. I was very impressed on seeing how close I have
gotten to the actual data, with a mean squared error
of just 0.0009. It was an ‘Aha!’ moment for me, as I
had to poke around a lot (really A LOT!! :P). But it
was fun working on this project.

Fig: Plot of Improved Long Short-Term Memory Model

5.2 Reflection

To recap, the process undertaken in this project:

● Set Up Infrastructure
  ○ iPython Notebook
  ○ Incorporate required libraries (Keras, TensorFlow,
    Pandas, Matplotlib, Sklearn, Numpy)
  ○ Git project organization

● Prepare Dataset
  ○ Incorporate data of Alphabet Inc
  ○ Process the requested data into a Pandas DataFrame
  ○ Develop function for normalizing data
  ○ Split the dataset into training and test data
    (68.53% / 31.47% for Linear Regression, 82.95% / 17.05%
    for LSTM)

● Develop Benchmark Model
  ○ Set up basic Linear Regression model with Scikit-Learn
  ○ Calibrate parameters

● Develop Basic LSTM Model
  ○ Set up basic LSTM model with Keras utilizing
    parameters from Benchmark Model

● Improve LSTM Model
  ○ Develop, document, and compare results
    using additional labels for the LSTM model

● Document and Visualize Results
  ○ Plot Actual, Benchmark Predicted Values, and
    LSTM Predicted Values per time series
  ○ Analyze and describe results for the report

I started this project with the hope of learning a
completely new algorithm, i.e., Long Short-Term
Memory, and also of exploring real time-series data
sets. The final model really exceeded my
expectations and has worked remarkably well. I am
greatly satisfied with these results.

The major problem I faced during the
implementation of the project was exploring the
data. It was the toughest task: converting the data from
raw format to preprocessed data and then splitting it
into training and test data. All of these steps require
a great deal of patience and a very precise approach.
I also had to work around a lot to successfully use
the data for two models, i.e., Linear Regression and
Long Short-Term Memory, as they have
different input sizes. I read many research papers to
get this final model right and I think it was all worth it
:)

5.3 Improvement

Before starting my journey as a Machine Learning
Nanodegree graduate I had no prior experience in
Python. At the beginning of this course I had to google
how to do everything in Python. But now I
have not only made 7 projects in Python, I have
explored many libraries along the way and can use
them very comfortably. This is all because of the highly
interactive videos and forums provided by Udacity. I
am really happy and satisfied with taking up this
course.

And as there is scope for improvement in each
individual, so is the case with this project. This
project predicts closing prices with a very low
Mean Squared Error, yet there are still many things
lacking in this project. Two of the most important
are:

● There is no user interaction or interface provided
in this project. A UI could be provided where users can
check the value for future dates.

● The stocks used for this project are only those of
Alphabet Inc; we can surely add more S&P 500 stocks to the
list to make this project more comprehensive.

I would definitely like to add these improvements to
this project in future.


References:

[1] Long Short Term Memory networks
[2] S&P 500 companies
[3] Alphabet Inc
[4] Google Finance python api
[5] adjusts the closing prices for us
[6] Google.csv
[7] Matplotlib
[8] Recurrent Neural Network
[9] Long-Short Term Memory
[10] Linear Model
[11] Long Short Term Memory
