Complete Time Series Analysis in Python 1673057003
Hi there!
Let’s take a look at how to work with time series in Python: what
methods and models we can use for prediction, what double and
triple exponential smoothing are, what to do if stationarity is not your
favorite game, how to build SARIMA and stay alive, and how to make
predictions using xgboost. All of this will be applied to a (harsh)
real-world example.
https://medium.com/open-machine-learning-course/open-machine-learning-course-topic-9-time-series-analysis-in-python-a270cb05e0b3 1/57
5/30/2019 Open Machine Learning Course. Topic 9. Part 1. Time series analysis in Python
Article outline
1. Introduction
— Basic definitions
— Quality metrics
3. Econometric approach
— Stationarity, unit root
— Getting rid of non-stationarity
— SARIMA intuition and model building
5. Assignment #9
Introduction
A short definition of a time series:
A time series is a series of data points indexed (or listed or graphed) in
time order.
Let’s import some libraries. First and foremost, we will need the statsmodels
library, which has tons of statistical modeling functions, including time
series. For R aficionados (who had to move to Python), statsmodels will
definitely look familiar, as it supports model definitions like ‘Wage ~
Age + Education’.
As an example let’s use some real mobile game data on hourly ads
watched by players and daily in-game currency spent:
metrics
Out: 116805.0
And now let’s create a simple anomaly detection system with the help
of the moving average. Unfortunately, in this particular series
everything is more or less normal, so we’ll intentionally make one of
the values abnormal in the dataframe ads_anomaly:
ads_anomaly = ads.copy()
ads_anomaly.iloc[-20] = ads_anomaly.iloc[-20] * 0.2  # scale one observation down to 20% of its value
plot_anomalies=True)
plot_anomalies=True)
Oh no! Here is the downside of our simple approach: it did not capture the
monthly seasonality in our data and marked almost all 30-day peaks as
anomalies. If you don’t want that many false alarms, it’s best
to consider more complex models.
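For reference, the moving-average check discussed above can be sketched in a few lines of plain Python. This is a simplified illustration, not the article's plotting helper: the function name `moving_average_anomalies`, the default `scale=1.96`, and the choice of band (scale times the standard deviation of the deviations from the moving average) are all assumptions made for the example.

```python
from statistics import mean, pstdev


def moving_average_anomalies(series, window, scale=1.96):
    """Return indices whose value falls outside the moving-average band."""
    indices = range(window - 1, len(series))
    # Trailing moving average: mean of the last `window` values up to and including t.
    rolling = {t: mean(series[t - window + 1:t + 1]) for t in indices}
    # Band half-width (assumption): scale * std of (value - moving average).
    residuals = [series[t] - rolling[t] for t in indices]
    half_width = scale * pstdev(residuals)
    return [t for t in indices if abs(series[t] - rolling[t]) > half_width]


# Toy series: a flat level with one value scaled down to 20%, as in ads_anomaly.
toy = [10.0] * 30
toy[20] = toy[20] * 0.2
print(moving_average_anomalies(toy, window=4))
```

Because the band is built around a short trailing average, any regular peak that the window cannot "see" coming (such as a monthly spike) lands outside it, which is exactly the false-alarm behaviour described above.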
Out: 98423.0
Exponential smoothing
And now let’s take a look at what happens if, instead of weighting the
last n values of the time series, we start weighting all available
observations while exponentially decreasing the weights as we move
further back in the historical data. The formula of simple
exponential smoothing will help us with that:
ŷ(t) = α*y(t) + (1−α)*ŷ(t−1)
Here the model value is a weighted average between the current true
value and the previous model value. The weight α is called the
smoothing factor. It defines how quickly we “forget” the last
available true observation. The smaller α is, the more influence the previous
model values have, and the smoother the series is.
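The recurrence described above, where each model value mixes the current observation (weight α) with the previous model value (weight 1−α), can be sketched as follows. The function name and the choice to start the recursion from the first observation are assumptions for this example.

```python
def exponential_smoothing(series, alpha):
    """Simple exponential smoothing; alpha is the smoothing factor in [0, 1]."""
    result = [series[0]]  # assumption: the first model value equals the first observation
    for value in series[1:]:
        # Weighted average of the current true value and the previous model value.
        result.append(alpha * value + (1 - alpha) * result[-1])
    return result


observations = [3, 10, 12, 13, 12, 10, 12]
for alpha in (0.9, 0.3):
    print(alpha, exponential_smoothing(observations, alpha))
```

With α close to 1 the smoothed series follows the observations almost exactly; with a small α it reacts slowly and looks much smoother.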
Below is the code for a triple exponential smoothing model, also known
by the last names of its creators, Charles Holt and his student Peter
Winters. Additionally, the Brutlag method is included in the model to
build confidence intervals:
class HoltWinters:

    """
    Holt-Winters model with the anomalies detection using

    # series - initial time series
    # slen - length of a season
    # alpha, beta, gamma - Holt-Winters model coefficients
    # n_preds - predictions horizon
    # scaling_factor - sets the width of the confidence in

    """


    def __init__(self, series, slen, alpha, beta, gamma, n_
        self.series = series
        self.slen = slen
        self.alpha = alpha
        self.beta = beta
        self.gamma = gamma
        self.n_preds = n_preds
        self.scaling_factor = scaling_factor


    def initial_trend(self):
        sum = 0.0
        for i in range(self.slen):
            sum += float(self.series[i+self.slen] - self.s
        return sum / self.slen

    def initial_seasonal_components(self):
        seasonals = {}
        season_averages = []
        n_seasons = int(len(self.series)/self.slen)
        # let's calculate season averages
        for j in range(n_seasons):
            season_averages.append(sum(self.series[self.sl
        # let's calculate initial values
        for i in range(self.slen):
            sum_of_vals_over_avg = 0.0
            for j in range(n_seasons):
                sum_of_vals_over_avg += self.series[self.s
            seasonals[i] = sum_of_vals_over_avg/n_seasons
        return seasonals


    def triple_exponential_smoothing(self):
        self.result = []
        self.Smooth = []
        self.Season = []
        self.Trend = []
        self.PredictedDeviation = []
        self.UpperBond = []
        self.LowerBond = []

        seasonals = self.initial_seasonal_components()

        for i in range(len(self.series)+self.n_preds):
            if i == 0: # components initialization
                smooth = self.series[0]
                trend = self.initial_trend()
                self.result.append(self.series[0])
                self.Smooth.append(smooth)
                self.Trend.append(trend)
                self.Season.append(seasonals[i%self.slen])

                self.PredictedDeviation.append(0)

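As a compact companion to the class above, here is a minimal standalone sketch of additive triple exponential smoothing. It uses the standard textbook Holt-Winters recurrences for level, trend and seasonal components and is an assumption about the general technique, not a reproduction of the class: the function names are hypothetical, and the Brutlag confidence intervals are left out.

```python
def initial_trend(series, slen):
    # Average per-step change between the first and the second season.
    return sum((series[i + slen] - series[i]) / slen for i in range(slen)) / slen


def initial_seasonals(series, slen):
    n_seasons = len(series) // slen
    season_avgs = [sum(series[slen * j:slen * (j + 1)]) / slen for j in range(n_seasons)]
    # Seasonal component i: mean deviation of position i from its season's average.
    return [
        sum(series[slen * j + i] - season_avgs[j] for j in range(n_seasons)) / n_seasons
        for i in range(slen)
    ]


def holt_winters_additive(series, slen, alpha, beta, gamma, n_preds):
    """Return fitted values followed by n_preds forecasts."""
    seasonals = initial_seasonals(series, slen)
    smooth, trend = series[0], initial_trend(series, slen)
    result = [series[0]]
    for i in range(1, len(series) + n_preds):
        if i >= len(series):  # forecasting beyond the observed data
            m = i - len(series) + 1
            result.append(smooth + m * trend + seasonals[i % slen])
            continue
        value, last_smooth = series[i], smooth
        smooth = alpha * (value - seasonals[i % slen]) + (1 - alpha) * (smooth + trend)
        trend = beta * (smooth - last_smooth) + (1 - beta) * trend
        seasonals[i % slen] = gamma * (value - smooth) + (1 - gamma) * seasonals[i % slen]
        result.append(smooth + trend + seasonals[i % slen])
    return result


toy = [1, 2, 3, 4] * 5  # purely seasonal toy series, season length 4
print(holt_winters_additive(toy, slen=4, alpha=0.5, beta=0.1, gamma=0.5, n_preds=4)[-4:])
```

The three coefficients play the same roles as in the class: alpha smooths the level, beta the trend, and gamma the seasonal components.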
%%time
from scipy.optimize import minimize
from sklearn.metrics import mean_squared_log_error

data = ads.Ads[:-20] # leave some data for testing

# initializing model parameters alpha, beta and gamma
x = [0, 0, 0]

# Minimizing the loss function
opt = minimize(timeseriesCVscore, x0=x,
               args=(data, mean_squared_log_error),
               method="TNC", bounds = ((0, 1), (0, 1), (0,
              )

# Take optimal values...
alpha_final, beta_final, gamma_final = opt.x
print(alpha_final, beta_final, gamma_final)
plt.figure(figsize=(25, 5))
plt.plot(model.PredictedDeviation)
plt.grid(True)
plt.axis('tight')
We’ll apply the same algorithm to the second series which, as we
know, has a trend and 30-day seasonality:
%%time
data = currency.GEMS_GEMS_SPENT[:-50]
slen = 30 # 30-day seasonality

x = [0, 0, 0]

opt = minimize(timeseriesCVscore, x0=x,
               args=(data, mean_absolute_percentage_error,
               method="TNC", bounds = ((0, 1), (0, 1), (0,
              )

alpha_final, beta_final, gamma_final = opt.x
print(alpha_final, beta_final, gamma_final)
Looks quite adequate: the model has caught both the upward trend and
the seasonal spikes, and overall it fits our values nicely.
Econometric approach
Stationarity
Before we start modeling, we should mention an important
property of time series: stationarity.
• The red graph below is not stationary because the mean increases
over time.
import numpy as np
import matplotlib.pyplot as plt

white_noise = np.random.normal(size=1000)
with plt.style.context('bmh'):
    plt.figure(figsize=(15, 5))
    plt.plot(white_noise)
On the first chart you can see the same stationary white noise you’ve
seen before. On the second one the value of ρ increased to 0.6; as a
result, wider cycles appeared on the chart, but overall it is still
stationary. The third chart deviates even more from the 0 mean but still
oscillates around it. Finally, a value of ρ equal to 1 gives us a random
walk process, a non-stationary time series.
This happens because, once ρ reaches the critical value of 1, the series
x(t)=ρ*x(t−1)+e(t) no longer returns to its mean value. If we subtract
x(t−1) from both sides, we get x(t)−x(t−1)=(ρ−1)*x(t−1)+e(t).
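The effect of ρ can be checked numerically. Below is a minimal sketch that simulates x(t)=ρ*x(t−1)+e(t) for several values of ρ with the same noise; the helper name `simulate_ar1`, the starting value x(0)=0 and the fixed random seed are assumptions made for this example.

```python
import random


def simulate_ar1(rho, noise):
    """Return x(1..n) for x(t) = rho * x(t-1) + e(t), starting from x(0) = 0."""
    x_prev = 0.0
    series = []
    for e in noise:
        x_prev = rho * x_prev + e
        series.append(x_prev)
    return series


random.seed(0)
noise = [random.gauss(0, 1) for _ in range(1000)]

for rho in (0.0, 0.6, 1.0):
    x = simulate_ar1(rho, noise)
    print(f"rho={rho}: largest |x(t)| = {max(abs(v) for v in x):.2f}")

# For rho = 1 the first difference x(t) - x(t-1) is just the noise e(t).
random_walk = simulate_ar1(1.0, noise)
first_diff = [b - a for a, b in zip(random_walk, random_walk[1:])]
```

For ρ=0 the simulated series is the noise itself, while for ρ=1 every shock is carried forward in full, which is why the series wanders away from its mean and only its first difference looks like white noise again.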
ARIMA-family Crash-Course
A few words about the model. Letter by letter we’ll build the full name:
SARIMA(p,d,q)(P,D,Q,s), the Seasonal Autoregressive Integrated Moving Average
model:
Adding this letter to the previous four gives us the ARIMA model, which knows
how to handle non-stationary data with the help of nonseasonal
differences. Awesome, one letter left!
After attaching the last letter, we find that instead of one additional
parameter we get three in a row, (P,D,Q):
• Q — same logic, but for the moving average model of the seasonal
component, use ACF plot
Now, knowing how to set initial parameters, let’s have a look at the
final plot once again and set the parameters:
tsplot(ads_diff[24+1:], lags=60)
• p is most probably 4, since it’s the last significant lag on the PACF, after
which most others are not significant.
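The `24+1` in the `ads_diff[24+1:]` slice above is consistent with two differencing steps: a seasonal difference with lag 24 followed by a first difference. That reading is an assumption here, since the definition of `ads_diff` is not shown. A plain-Python sketch of the idea, with a hypothetical `difference` helper:

```python
def difference(series, lag=1):
    """Return series[t] - series[t - lag] for every t where both values exist."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


# Toy hourly series: linear trend plus a repeating 24-hour pattern.
toy = [2 * t + (t % 24) for t in range(72)]

seasonal_diff = difference(toy, lag=24)      # removes the 24-hour pattern
stationary = difference(seasonal_diff, lag=1)  # removes what is left of the trend

print(len(toy), len(seasonal_diff), len(stationary))
print(set(stationary))
```

Each step shortens the series by its lag, so 24 + 1 observations are lost in total. With pandas, shift-based differencing keeps the length and leaves missing values at the start instead, which is why such a series is sliced before plotting.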
Now we want to test various models and see which one is better.
Out: 54
Well, it’s clear that the residuals are stationary and there are no apparent
autocorrelations. Let’s make predictions using our model.
Feature extraction
Alright, the model needs features, and all we have is a 1-dimensional time
series to work with. What features can we extract?
Window statistics:
• Window variance
• etc.
Target encoding
Let’s run through some of the methods and see what we can extract
from our ads series
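As a first illustration, lagged values and window statistics can be turned into a feature table with a few lines of plain Python. This is a minimal sketch under stated assumptions: the function name, the feature names and the toy numbers are hypothetical, and only values strictly before time t are used so the features never see the target.

```python
from statistics import mean, pvariance


def lag_and_window_features(series, lags, window):
    """Build one row of features per time step from past values only."""
    start = max(max(lags), window)  # first t for which every feature is defined
    rows = []
    for t in range(start, len(series)):
        past = series[t - window:t]  # the `window` values strictly before t
        row = {f"lag_{lag}": series[t - lag] for lag in lags}
        row["window_mean"] = mean(past)
        row["window_var"] = pvariance(past)
        row["target"] = series[t]
        rows.append(row)
    return rows


toy = [1, 2, 3, 4, 5, 6]
for row in lag_and_window_features(toy, lags=[1, 2], window=3):
    print(row)
```

With pandas the same features are usually built with `Series.shift` for lags and `Series.rolling` for window statistics; the loop above only spells out what those calls compute.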
deviation = np.sqrt(cv.std())

lower = prediction - (scale * deviation)
upper = prediction + (scale * deviation)

plt.plot(lower, "r--", label="upper bond / lower bo
plt.plot(upper, "r--", alpha=0.5)

if plot_anomalies:
    # np.nan is the spelling supported by current NumPy releases
    anomalies = np.array([np.nan]*len(y_test))
    anomalies[y_test<lower] = y_test[y_test<lower]
    anomalies[y_test>upper] = y_test[y_test>upper]
    plt.plot(anomalies, "o", markersize=10, label =

error = mean_absolute_percentage_error(prediction, y_te
plt.title("Mean absolute percentage error {0:.2f}%".for
plt.legend(loc="best")
Well, simple lags and linear regression gave us predictions that are not
that far from SARIMA in quality. There are lots of unnecessary
features, but we’ll do feature selection a bit later. Now let’s continue
engineering!
We’ll add the hour, the day of the week and a boolean for the
weekend to our dataset. To do so, we need to transform the current dataframe index into
datetime format and extract the hour and weekday from it.
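import pandas as pd

# pd.to_datetime replaces Index.to_datetime(), which newer pandas versions no longer provide
data.index = pd.to_datetime(data.index)
data["hour"] = data.index.hour
data["weekday"] = data.index.weekday
data["is_weekend"] = data.weekday.isin([5, 6]) * 1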
plt.figure(figsize=(16, 5))
plt.title("Encoded features")
data.hour.plot()
data.weekday.plot()
data.is_weekend.plot()
Blue spiky line — hour feature, green ladder — weekday, red bump — weekends!
The test error goes down a little bit and, judging by the coefficients plot, we
can say that weekday and is_weekend are rather useful features.
Target encoding
I’d like to add another variant of encoding categorical variables: by
mean value. If it’s undesirable to explode the dataset with tons of
dummy variables, which can lead to the loss of information about the
distance between categories, and if the categories can’t be used as real
values because of conflicts like “0 hours < 23 hours”, then it’s possible
to encode a variable with slightly more interpretable values. A natural
idea is to encode with the mean value of the target variable. In our
example, every day of the week and every hour of the day can be encoded
by the corresponding average number of ads watched during that day or
hour. It’s very important to make sure that the mean value is calculated
over the train set only (or over the current cross-validation fold only),
so that the model is not aware of the future.
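The train-only rule described above can be made concrete with a small sketch. The helper names, the toy numbers and the fallback to the overall train mean for categories unseen in training are assumptions made for this example.

```python
from collections import defaultdict


def fit_mean_encoding(categories, targets):
    """Map each category to the mean target observed for it in the train data."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for category, target in zip(categories, targets):
        totals[category] += target
        counts[category] += 1
    return {category: totals[category] / counts[category] for category in totals}


def apply_mean_encoding(categories, encoding, fallback):
    """Replace categories by their train means; unseen ones get the fallback."""
    return [encoding.get(category, fallback) for category in categories]


# Toy data: hour of day as the category, ads watched as the target.
train_hours = [0, 0, 1, 1]
train_ads = [10.0, 20.0, 30.0, 50.0]
test_hours = [1, 0, 2]  # hour 2 never appears in the train part

hour_means = fit_mean_encoding(train_hours, train_ads)  # fitted on train only
global_mean = sum(train_ads) / len(train_ads)
test_encoded = apply_mean_encoding(test_hours, hour_means, fallback=global_mean)
print(hour_means, test_encoded)
```

The test part is only ever transformed with means learned from the train part, so no information about future targets leaks into the features.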
current series state better. Or we can just drop it manually, since we're
sure here it makes things only worse.
The second model is Lasso regression. Here we add to the loss function not
the squares but the absolute values of the coefficients. As a result, during the
optimization process the coefficients of unimportant features may become
exactly zero, so Lasso regression allows for automated feature selection. This
regularization type is called L1.
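To see why an absolute-value penalty can zero a coefficient while a squared penalty only shrinks it, consider the textbook one-coefficient case. Let z be the unregularized least-squares estimate; the sketch below gives the minimizer of each penalized loss. The function names are hypothetical, and this special case is an illustration, not a description of any library's solver.

```python
def lasso_coefficient(z, lam):
    """Minimizer of 0.5*(w - z)**2 + lam*abs(w): soft thresholding."""
    shrunk = abs(z) - lam
    if shrunk <= 0:
        return 0.0  # small coefficients are set exactly to zero
    return shrunk if z > 0 else -shrunk


def ridge_coefficient(z, lam):
    """Minimizer of 0.5*(w - z)**2 + 0.5*lam*w**2: proportional shrinkage."""
    return z / (1.0 + lam)


for z in (2.0, 0.3, -0.1):
    print(z, lasso_coefficient(z, lam=0.5), ridge_coefficient(z, lam=0.5))
```

The L1 solution subtracts a fixed amount and stops at zero, so weak features drop out entirely; the squared penalty divides by a constant, so a nonzero coefficient stays nonzero however small it gets.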
First, let’s make sure we have features to drop and that the data truly has highly
correlated features:
plt.figure(figsize=(10, 8))
sns.heatmap(X_train.corr());
We can clearly see how the coefficients get closer and closer to zero
(though they never actually reach it) as their importance in the model
drops.
lasso = LassoCV(cv=tscv)
lasso.fit(X_train_scaled, y_train)

plotModelResults(lasso,
                 X_train=X_train_scaled,
                 X_test=X_test_scaled,
Boosting
Why not try XGBoost now?
Here is the winner! The smallest error on the test set among all the
models we’ve tried so far.
Yet this victory is deceiving, and it might not be the brightest idea to fit
xgboost as soon as you get your hands on time series data. Generally,
tree-based models handle trends in data poorly compared to linear
models, so you have to detrend your series first or use some tricks to
make the magic happen. Ideally, make the series stationary and then
use XGBoost. For example, you can forecast the trend separately with a
linear model and then add predictions from xgboost to get the final
forecast.
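The "trend separately, then add the rest" idea can be sketched without any boosting library. In the example below the trend is a least-squares line, and the residual model is a deliberately simple stand-in (the average residual per seasonal position) occupying the place where a boosted model such as XGBoost would be trained. All names and the toy series are assumptions made for the illustration.

```python
def fit_linear_trend(y):
    """Ordinary least squares fit of y on t = 0..n-1; returns (intercept, slope)."""
    n = len(y)
    t_mean = (n - 1) / 2
    y_mean = sum(y) / n
    slope = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y)) / sum(
        (t - t_mean) ** 2 for t in range(n)
    )
    return y_mean - slope * t_mean, slope


def forecast_with_detrending(y, season, horizon):
    """Forecast = extrapolated linear trend + prediction of the detrended part."""
    intercept, slope = fit_linear_trend(y)
    residuals = [v - (intercept + slope * t) for t, v in enumerate(y)]
    # Stand-in residual model (assumption): mean residual per seasonal position.
    phase_mean = [
        sum(residuals[p::season]) / len(residuals[p::season]) for p in range(season)
    ]
    n = len(y)
    return [intercept + slope * t + phase_mean[t % season] for t in range(n, n + horizon)]


pattern = [1, -1, -1, 1]
toy = [3 + 2 * t + pattern[t % 4] for t in range(40)]
print(forecast_with_detrending(toy, season=4, horizon=4))
```

Because the residual model only ever sees detrended values, it never has to extrapolate beyond the range it was trained on, which is the weakness of tree-based models that the paragraph above points out.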
Conclusion
We got acquainted with different time series analysis and prediction
methods and approaches. Unfortunately, or maybe luckily, there’s no
silver bullet for this kind of problem. Methods developed in the
1960s (and some even at the beginning of the XIX
century) are still popular, along with LSTM and RNN (not covered
in this article). Partially this is related to the fact that the prediction
task, like any other data-related task, is creative in so many aspects and
definitely requires research. In spite of the large number of formal
quality metrics and approaches to parameter estimation, it’s often
necessary to seek out and try something different for each time series. Last
but not least, the balance between quality and cost is important. As a
good example, the SARIMA model, mentioned here more than once, can
produce spectacular results after due tuning but might require many
hours of tambourine dancing (time series manipulation), while
a simple linear regression model can be built in 10 minutes and give
more or less comparable results.
Assignment #9
Full versions of assignments are announced each week in a new run of
the course (October 1, 2018). Meanwhile, you can practice with a
demo version: Kaggle Kernel, nbviewer.
Useful resources
• Online textbook of the advanced statistical forecasting course of
the Duke University — covers in details various smoothing