Portfolio Optimization Using Time Series Analysis
Group ID - 16
Shobhit Sachan Jatin Khandelwal Sourabh Ingale
170677 180325 180777
Introduction
Time series analysis comprises methods for analyzing time series data in order
to extract meaningful statistics and other characteristics of the data. Time series
forecasting is the use of a model to predict future values based on previously
observed values. Time series methods are widely applied to non-stationary data
such as economic indicators, weather records and stock prices. We demonstrate
the use of time series analysis to help an investor select the most profitable
trades in the stock market and manage a portfolio efficiently.
About the Data:
We have per-minute stock price data for 2017 for 111 large-cap companies listed
on the NSE, which means we are dealing with 111 different time series. The data
contains the columns: 'Dates', 'stockVWAP', 'futureVWAP', 'bidPrice',
'askPrice', 'total_value', 'total_size'.
Making the Portfolio:
Selecting the Top Performing Stocks:
We started with 111 stocks, split each stock's time series into train and test
data frames, and carried out all further steps on the train set.
Once the final data frames are ready, we compute the moving average of each
series in a newly formed array. A common rule of thumb in finance is to compute
a moving average over three times as many data points as the interval of
interest, which gives a more stable trend to compare the data against. For a
time interval of 2 hours (120 one-minute data points), the moving average
therefore uses 360 data points.
The first 360 data points of the stock volume-weighted average price (VWAP) go
into the array as they are. Each following entry is the moving average of the
360 data points preceding that position.
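The construction described above can be sketched as follows. This is a minimal illustration, not the code used in the project: the function name `trailing_moving_average` is hypothetical, and it assumes that "preceding" means the window excludes the current data point.

```python
import numpy as np

def trailing_moving_average(prices, window=360):
    """Mean of the `window` points preceding each position.

    The first `window` entries are copied from the input unchanged,
    as described in the text.
    """
    prices = np.asarray(prices, dtype=float)
    out = prices.copy()
    if len(prices) <= window:
        return out
    # csum[i] holds the sum of prices[:i], so any window sum is a
    # difference of two entries of csum.
    csum = np.concatenate(([0.0], np.cumsum(prices)))
    idx = np.arange(window, len(prices))
    out[window:] = (csum[idx] - csum[idx - window]) / window
    return out
```

With `window=3` and input `[1, 2, 3, 4, 5, 6]`, the first three entries are kept and the fourth entry becomes the mean of 1, 2 and 3.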
Next, we calculate the deviation of each data point's stock VWAP from the
moving average, taking the bid-ask spread and brokerage into consideration. We
take note of all instances where the deviation is positive (i.e. > 0): the
count of these instances is the number of trades, and the deviations themselves
give the total profit and the mean deviation.
We then compile these figures into a final data frame.
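A rough sketch of this step is given below. The report does not state the exact cost formula, so the cost model here (full bid-ask spread plus a proportional brokerage fee) is an assumption, as are the function name `trade_statistics` and the parameter `brokerage_rate`.

```python
import numpy as np

def trade_statistics(vwap, moving_avg, bid, ask, brokerage_rate=0.0005):
    """Count profitable deviations from the moving average.

    ASSUMPTION: cost per trade = (ask - bid) + brokerage_rate * vwap.
    The source only says that spread and brokerage are accounted for.
    """
    vwap, moving_avg, bid, ask = (
        np.asarray(a, dtype=float) for a in (vwap, moving_avg, bid, ask)
    )
    cost = (ask - bid) + brokerage_rate * vwap
    # Signed deviation of the price from its moving average, net of costs.
    deviation = (vwap - moving_avg) - cost
    profitable = deviation > 0
    num_trades = int(profitable.sum())
    total_profit = float(deviation[profitable].sum())
    mean_deviation = total_profit / num_trades if num_trades else 0.0
    return {
        "num_trades": num_trades,
        "total_profit": total_profit,
        "mean_deviation": mean_deviation,
    }
```

Calling this once per stock and stacking the returned dictionaries gives one row per stock, which matches the "final data frame" described above.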
There is no single fixed approach to selecting the top performing stocks. One
approach we used was to add a column for total profit (average profit per
trade × number of trades). We then selected the 19 stocks whose total profit,
average profit and number of trades each exceeded the 75th percentile of that
column, i.e. were greater than the lowest 75% of values.
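The percentile filter can be sketched as below. The function name `select_top_stocks` and the input layout (a dictionary of per-stock tuples) are illustrative assumptions, and the toy figures in the usage note are not results from the project.

```python
import numpy as np

def select_top_stocks(stats, q=75):
    """Keep symbols that beat the q-th percentile on every metric.

    stats maps symbol -> (total_profit, avg_profit, num_trades).
    """
    symbols = list(stats)
    table = np.array([stats[s] for s in symbols], dtype=float)
    # One threshold per column, computed across all stocks.
    thresholds = np.percentile(table, q, axis=0)
    keep = (table > thresholds).all(axis=1)
    return [s for s, k in zip(symbols, keep) if k]
```

For example, with five made-up stocks whose three metrics all increase from "A" to "E", only "E" lies strictly above the 75th percentile in every column, so only "E" is returned.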
Removing Correlated Stocks:
Many of the top performing companies, especially those in the same sector, can
have a high correlation, and our goal here is to create a diversified portfolio. To
do that, we eliminated stocks whose correlation with another selected stock was
greater than 0.8. A diversified portfolio helps reduce risk.
After this step, we are left with 9 stocks.
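One way to implement such a filter is a greedy pass over the ranked stocks, sketched below. The report does not say which stock of a correlated pair was dropped, or whether correlation was measured on prices or returns, so the rule used here (keep the earlier, higher-ranked stock) and the name `drop_correlated` are assumptions.

```python
import numpy as np

def drop_correlated(series_by_symbol, threshold=0.8):
    """Greedy filter: keep a symbol only if its correlation with every
    already-kept symbol is at most `threshold`.

    series_by_symbol maps symbol -> equal-length 1-D sequence, ordered
    from best-ranked to worst-ranked stock.
    """
    symbols = list(series_by_symbol)
    data = np.array([series_by_symbol[s] for s in symbols], dtype=float)
    corr = np.corrcoef(data)  # each row of `data` is one variable
    kept = []  # indices of the symbols retained so far
    for i in range(len(symbols)):
        if all(corr[i, j] <= threshold for j in kept):
            kept.append(i)
    return [symbols[i] for i in kept]
```

Because the pass is greedy, the result depends on the input order, so the stocks should be sorted by the chosen performance measure before calling it.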
Stocklist:
● RELCAPITAL
● TATASTEEL
● SUNTV
● RELINFRA
● BHARATFIN
● ESCORTS
● HINDPETRO
● INFY
● SRTRANSFIN
Visualising the Time Series:
The first step is to visualize the data to understand what type of model we
should use. We check for the overall trend in the data and look for any
seasonal patterns, since both are important for deciding which type of model to
use. We used predefined library functions to decompose the time series into
trend, seasonal and residual components.
Stationarize the data:
Figure: Escorts Ltd. stock data, 2017
Definition: $\{X_t\}$ is said to be a stationary process if, for every $n$,
every admissible $t_1, t_2, \ldots, t_n$ and any integer $k$, the joint
distribution of $\{X_{t_1}, X_{t_2}, \ldots, X_{t_n}\}$ is identical to the
joint distribution of $\{X_{t_1+k}, X_{t_2+k}, \ldots, X_{t_n+k}\}$.
In other words, a time series is stationary if its joint probability
distribution is invariant under translations in time. In particular, and of key
importance for traders, the mean and variance of the process do not change over
time, and neither follows a trend.
If a time series is stationary, its probability distribution does not change
over time, so its statistical properties stay constant and the series is easier
to model. Hence, checking the stationarity of the series is an important step.
There are two ways to check the stationarity of a time series. The first is by
looking at the data: a plot usually makes a changing mean or variance easy to
spot. For a more rigorous assessment there is the Dickey-Fuller test.
Dickey Fuller Test
In statistics, the Dickey-Fuller Test tests the null hypothesis that a unit root is
present in an autoregressive model. The alternative hypothesis is different
depending on which version of the test is used, but is usually stationarity or
trend-stationarity. Here, the null hypothesis H0 is that the series is
non-stationary, and the alternative hypothesis H1 is that the series is
stationary. The Dickey-Fuller test checks whether $\varphi = 1$ in this model
of the data:
$y_t = \alpha + \beta t + \varphi y_{t-1} + e_t$
which is rewritten as:
$\Delta y_t = y_t - y_{t-1} = \alpha + \beta t + \gamma y_{t-1} + e_t$
where $y_t$ is our data and $\gamma = \varphi - 1$.
It is written this way so that we can run a linear regression of $\Delta y_t$
on $t$ and $y_{t-1}$ and test whether $\gamma$ is different from 0. If
$\gamma = 0$, we have a random walk process. If not, and $-1 < 1 + \gamma < 1$,
we have a stationary process. To decide, we look at the p-value of the test: if
it is below a chosen significance level, often 0.05 or 0.01, we reject the null
hypothesis and conclude that the time series is stationary.
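The regression behind the test can be sketched with ordinary least squares, as below. This is an illustration of the basic (non-augmented) regression only, not the library routine used for the results in this report. The function name `dickey_fuller_regression` is hypothetical, and the critical value of about -3.41 is the approximate large-sample 5% value for the constant-plus-trend case. The statistic follows the Dickey-Fuller distribution rather than the usual t-distribution, which is why a special critical value is needed.

```python
import numpy as np

def dickey_fuller_regression(y):
    """Regress dy_t on (1, t, y_{t-1}); return gamma and its t-statistic."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)
    n = len(dy)
    # Design matrix columns: intercept (alpha), time trend (beta), lagged level (gamma)
    X = np.column_stack([np.ones(n), np.arange(1, n + 1), y[:-1]])
    coef, *_ = np.linalg.lstsq(X, dy, rcond=None)
    resid = dy - X @ coef
    sigma2 = resid @ resid / (n - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    gamma = coef[2]
    return float(gamma), float(gamma / np.sqrt(cov[2, 2]))

# Approximate large-sample 5% critical value (constant + trend case).
CRITICAL_5PCT = -3.41

# Synthetic data for illustration only (not the stock data).
rng = np.random.default_rng(0)
noise = rng.normal(size=2000)
random_walk = np.cumsum(noise)   # true gamma = 0 (unit root)
ar1 = np.zeros(2000)             # true gamma = -0.5 (stationary)
for t in range(1, 2000):
    ar1[t] = 0.5 * ar1[t - 1] + noise[t]

gamma_rw, t_rw = dickey_fuller_regression(random_walk)
gamma_ar, t_ar = dickey_fuller_regression(ar1)
reject_for_ar1 = t_ar < CRITICAL_5PCT  # reject H0 when the statistic is below the critical value
```

In practice a library implementation such as `adfuller` in `statsmodels.tsa.stattools` is the usual choice, since it also adds lagged difference terms and reports a p-value.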
If the series is non-stationary, we need to transform the data to make it
stationary. There are various transformations that can stationarize the data.
● Logarithmic - Converts multiplicative patterns to additive patterns and/or
linearizes exponential growth. Converts absolute changes to percentage changes.
Often stabilizes the variance of data with compound growth, regardless of
whether deflation is also used.
● First Difference - Converts “levels” to “changes”: $X_t - X_{t-1}$.
● Seasonal Difference - Converts “levels” to “seasonal changes”:
$X_t - X_{t-s}$, where $s$ is the seasonal period.
We apply these transformations successively until we obtain a stationary time
series according to the Dickey-Fuller test.
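The three transformations listed above are short enough to write out directly. The helper names below are illustrative, and `s` (the seasonal period, in number of observations) has to be chosen by the analyst.

```python
import numpy as np

def log_transform(x):
    """Natural log of a strictly positive series."""
    return np.log(np.asarray(x, dtype=float))

def first_difference(x):
    """X_t - X_{t-1}; the result is one element shorter than the input."""
    x = np.asarray(x, dtype=float)
    return x[1:] - x[:-1]

def seasonal_difference(x, s):
    """X_t - X_{t-s}; the result is s elements shorter than the input."""
    x = np.asarray(x, dtype=float)
    return x[s:] - x[:-s]

def log_difference(x):
    """First difference of the log series (approximate percentage change)."""
    return first_difference(log_transform(x))
```

For a price series that grows by 10% each step, `log_difference` returns a constant series equal to log(1.1), which shows how the log turns multiplicative growth into an additive change.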
Plot the ACF and PACF charts and find the optimal parameters
● Autocorrelation Function (ACF). The plot summarizes the correlation of an
observation with lag values. The x-axis shows the lag and the y-axis shows the
correlation coefficient between -1 and 1 for negative and positive correlation.
● Partial Autocorrelation Function (PACF). The plot summarizes the
correlation of an observation with lag values that is not accounted for by
prior lagged observations.
Some useful patterns you may observe on these plots are:
● The model is AR if the ACF trails off after a lag and has a hard cut-off in the
PACF after a lag. This lag is taken as the value for p.
● The model is MA if the PACF trails off after a lag and has a hard cut-off in the
ACF after the lag. This lag value is taken as the value for q.
● The model is a mix of AR and MA if both the ACF and PACF tail off.
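To make the two quantities concrete, the sketch below computes the sample ACF directly and obtains the PACF by solving the Yule-Walker equations at each lag. It is an illustrative implementation with hypothetical function names, not the plotting routine used for the charts in this report.

```python
import numpy as np

def acf(x, nlags):
    """Sample autocorrelation for lags 0..nlags."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = x @ x
    return np.array(
        [1.0] + [(x[k:] @ x[:-k]) / denom for k in range(1, nlags + 1)]
    )

def pacf(x, nlags):
    """Partial autocorrelation for lags 0..nlags via Yule-Walker.

    The PACF at lag k is the last coefficient of the AR(k) model
    implied by the first k autocorrelations.
    """
    r = acf(x, nlags)
    out = [1.0]
    for k in range(1, nlags + 1):
        # Toeplitz matrix with entries r[|i - j|]
        R = np.array([[r[abs(i - j)] for j in range(k)] for i in range(k)])
        phi = np.linalg.solve(R, r[1:k + 1])
        out.append(phi[-1])
    return np.array(out)
```

For a simulated AR(1) series, the ACF decays gradually while the PACF is close to zero beyond lag 1, which is the pattern the first rule above describes for p = 1.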
We plot the ACF and PACF of the original data.
As is evident from the ACF chart, the time series does not appear to have a
strong seasonal component to it.
Applying the Dickey-Fuller test to our data gives a p-value significantly
greater than 0.05, so we fail to reject the null hypothesis; the data is
non-stationary. To make the data stationary, we first apply a log difference in
order to eliminate first-order trends present in the data.
The differenced data shows no remaining trend component. We already know from
the ACF plots that the data does not have a seasonal component.
We confirm stationarity by applying the Dickey-Fuller test:
The p-value is small enough for us to reject the null hypothesis,
and we can consider that the time series is stationary.
ACF and PACF of the stationary data:
Results for all 9 stocks (stationarity of the original series -> selected model):
● RELCAPITAL -> Non-Stationary -> ARIMA(6,1,0)
● TATASTEEL -> Non-Stationary -> ARIMA(6,1,0)
● SUNTV -> Non-Stationary -> ARIMA(5,1,0)
● RELINFRA -> Non-Stationary -> ARIMA(1,1,1)
● BHARATFIN -> Non-Stationary -> ARIMA(3,1,0)
● ESCORTS -> Non-Stationary -> ARIMA(3,1,3)
● HINDPETRO -> Non-Stationary -> ARIMA(2,1,2)
● INFY -> Stationary -> ARIMA(1,1,0)
● SRTRANSFIN -> Non-Stationary -> ARIMA(5,1,0)
Model Building:
For the process shown above, the AIC and BIC values came out to be low, so this
is the model we selected; it matches the model we anticipated from the ACF and
PACF plots.
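The idea of comparing candidate orders by AIC and BIC can be illustrated with a simplified stand-in. The sketch below fits pure AR(p) models by least squares to an already differenced series and scores them with the Gaussian approximations AIC = n ln(RSS/n) + 2k and BIC = n ln(RSS/n) + k ln(n), where k counts the fitted coefficients. It is not a full ARIMA fit (there are no MA terms), and the function name `ar_information_criteria` is hypothetical.

```python
import numpy as np

def ar_information_criteria(z, max_p=6):
    """Return {p: (aic, bic)} for AR(p) fits, p = 0..max_p.

    All fits use the same observations (the first max_p points are
    held out as lags) so that the criteria are comparable.
    """
    z = np.asarray(z, dtype=float)
    n = len(z) - max_p
    target = z[max_p:]
    results = {}
    for p in range(max_p + 1):
        # Intercept plus the p lagged copies of the series
        cols = [np.ones(n)]
        cols += [z[max_p - lag:len(z) - lag] for lag in range(1, p + 1)]
        X = np.column_stack(cols)
        coef, *_ = np.linalg.lstsq(X, target, rcond=None)
        rss = float(np.sum((target - X @ coef) ** 2))
        k = p + 1
        aic = n * np.log(rss / n) + 2 * k
        bic = n * np.log(rss / n) + k * np.log(n)
        results[p] = (aic, bic)
    return results

def best_order(results, use_bic=True):
    """Order with the smallest BIC (or AIC if use_bic is False)."""
    return min(results, key=lambda p: results[p][1 if use_bic else 0])
```

Lower values are better for both criteria; BIC penalizes extra parameters more heavily than AIC, so it tends to choose the smaller model when the two disagree.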
Residual Analysis:
Model Predictions:
Markowitz Portfolio Optimization :-
Portfolio optimization assists in the selection of the most efficient portfolio
by analyzing the various possible portfolios of the given securities/stocks. It
is the process of seeking the maximum return for the minimum amount of risk.
This model of portfolio optimization is also called mean-variance optimization,
because it is based on the expected return (mean) and the variance (or standard
deviation) of the various portfolios.
To choose the best portfolio from a number of possible portfolios, each with
different return and risk, two separate decisions are to be made, detailed in
the sections below:
Methodology :-
A portfolio that gives the maximum return for a given risk, or the minimum risk
for a given return, is an efficient portfolio. Thus, portfolios are selected as follows:
(a) From the portfolios that have the same return, the investor will prefer the
portfolio with lower risk.
(b) From the portfolios that have the same risk level, an investor will prefer the
portfolio with a higher rate of return.
In a market for portfolios that consist of risky and risk-free securities, we
use a metric called the Sharpe ratio. The Sharpe ratio (also called the Sharpe
index) is commonly used to gauge the performance of an investment by adjusting
for its risk. The higher the ratio, the greater the investment return relative
to the amount of risk taken, and thus the better the investment. The ratio can
be used to evaluate a single stock or investment, or a portfolio. Our approach
here is to assign random weights to the stocks in the portfolio and pick the
weights that maximise the Sharpe ratio.
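A minimal sketch of this random-weight search is given below. For weights w, mean returns mu, covariance matrix Sigma and risk-free rate r_f, the Sharpe ratio is (w·mu - r_f) / sqrt(w' Sigma w). The function name `best_random_portfolio`, the number of sampled portfolios, the zero default risk-free rate and the long-only, fully invested constraint are all assumptions made for illustration.

```python
import numpy as np

def best_random_portfolio(returns, n_portfolios=5000, risk_free=0.0, seed=0):
    """Sample random long-only weights and keep the best Sharpe ratio.

    returns: array of shape (T, N) with per-period returns of N assets.
    """
    returns = np.asarray(returns, dtype=float)
    mu = returns.mean(axis=0)
    cov = np.cov(returns, rowvar=False)
    rng = np.random.default_rng(seed)
    # ASSUMPTION: non-negative weights that sum to 1 (no short selling)
    w = rng.random((n_portfolios, returns.shape[1]))
    w /= w.sum(axis=1, keepdims=True)
    port_ret = w @ mu
    # Row-wise quadratic form w_i' Sigma w_i gives each portfolio's variance
    port_vol = np.sqrt(np.einsum("ij,jk,ik->i", w, cov, w))
    sharpe = (port_ret - risk_free) / port_vol
    best = int(np.argmax(sharpe))
    return w[best], float(sharpe[best])
```

Random sampling only approximates the optimum; increasing `n_portfolios` tightens the approximation, and a constrained numerical optimizer is a natural alternative when a more precise answer is needed.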