An Overview of Practical Time Series Forecasting Using Python
and SARIMAX
By
Aditya Kaushal
Copyright
Copyright © 2021 by Aditya Kaushal
For feedback, media contact, omissions or errors regarding this book, please
contact the author at: [email protected]
Table of Contents
Preface
Prerequisites
What’s in it for you?
Let’s get started
Objective
Methodology/Tools Deployed:
Dataset
Installing essential Libraries
Importing Raw Data
Visualising Imported Dataset
Data Pre-processing
Checking for NULL values
Re-sampling the data from 30-second to 1 hour Interval
Decomposing the Time Series
What is Additive and Multiplicative?
What is Grid Search and AIC?
Train and Test
Afterword
Preface
This is a short book that shows readers how to build a time series model
using mathematical models, Python, and concepts of statistics to predict
real-time air quality in a locally mapped area, using open-source data from
OpenAQ sensors that measure air-pollution metrics. OpenAQ currently
collects data in many different countries and primarily aggregates PM2.5,
PM10, ozone (O3), sulphur dioxide (SO2), and many additional metrics.
The main objective of this book is to teach readers how to build a Python
project that forecasts and monitors air pollution to track personal
exposure to PM 2.5.
Prerequisites
Most articles and online blogs only show you how to forecast over a very
short period of time. Further, they often leave out the intermediate steps
that connect the pieces of information and give you a broader understanding
of the whole concept. This book's intent is to deliver a concise and
accessible way to produce a forecast that is reliable and effective for
making good decisions based on the insights extracted.
By the end of the book, you will have a good understanding of the SARIMAX
algorithm and be able to make a good forecast of Particulate Matter 2.5
(PM 2.5), similar to what Scikit-learn regression algorithms provide.
Let’s get started
This book dives straight into the implementation, with code snippets and
all the visualizations. It is a requirement to be familiar with Python, its
scientific libraries (NumPy, pandas, Seaborn, Matplotlib), and other
miscellaneous libraries. Beyond that, it is good to have a solid
understanding of mathematical concepts like moving averages and other
statistical concepts. For any questions and comments, feel free to drop me a
message at: [email protected]
Objective
Methodology/Tools Deployed
Dataset
Before starting, we need historical and trustworthy data in hand to make
predictions. OpenAQ is an open-source non-profit organization empowering
communities around the globe to clean their air by sharing and using open
air-quality data. We can download the raw data in CSV format from their
website.
That's all we need to start, and we can begin by importing the data and
creating our model.
In this section we will import all the libraries and the raw data.
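A minimal sketch of the imports this project relies on (the exact list is an assumption; statsmodels supplies the SARIMAX model used later):

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    import seaborn as sns
    import statsmodels.api as sm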
All the above libraries help to preprocess the data and make it usable for
forecasting. The raw data contains lots of anomalies, such as empty rows,
NULL values, and string data, that a time series model cannot use.
Caveat #1: In this book a different dataset is being used, which cannot be
shared with the readers due to copyright and privacy reasons. It is
therefore recommended that readers download the data from the OpenAQ
platform and follow along with this book to create a forecasting model for
their own purposes.
The next step is to import the data using the below-mentioned snippet of
code. We have utilised the pandas library to convert the raw CSV datasheet
into a pandas data-frame to perform data pre-processing.
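A sketch of that import step, based on the description that follows (the variable name df is an assumption):

    # Read the raw CSV and parse the 'Datetime' column as datetime objects
    df = pd.read_csv('device41.csv', parse_dates=['Datetime'])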
The pandas read_csv() function is used to convert the CSV file into a
data-frame. 'device41.csv' is the name of the CSV file. The dataset contains
the PM 2.5 values corresponding to the datetime values, i.e., the datetime
needs to be converted into a proper format that can be deciphered by
Python's Matplotlib and statsmodels statistical functions. Hence we have
used the parameter parse_dates=['Datetime'].
Caveat #2: The data downloaded from the OpenAQ platform will have different
columns. It is an exercise for the reader to manipulate the data-frame into
a form suitable for forecasting. The reader should find a way to keep only
two columns, i.e., Datetime and PM 2.5 values.
Remove all the columns that you don't need in the dataset. This choice is
left to the reader's needs and purposes; it can be unique and does not
affect the overall objective of the project.
After importing the data, we can view the top five rows to verify that the
dataset is in our desired format.
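For example, assuming the data-frame variable df from the import step:

    # Inspect the first five rows of the data-frame
    print(df.head())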
Fig. PM 2.5 data-frame
Two patterns worth understanding before modelling are seasonality and trend.
e.g. Ice-cream sales are the best example to explain seasonality. Ice-cream
sales are usually higher in the summer seasons compared to the winter
seasons, so based upon actual ice-cream sales there is a predictable
seasonal fluctuation. Seasonality is often predictable.
e.g. Trends can often be seen in the stock market. If a group of people
predicts that a particular stock is going to be profitable, that belief can
spread like wildfire, leading to an increase in buying of that particular
stock. Based upon that speculation we would see a trend in the price of the
stock. Trends can be of many types, which is beyond the scope of this book.
Data Pre-processing
This step is usually the most important and the most often overlooked. Data
pre-processing determines the quality and reliability of your forecasts.
Make sure you spend enough time verifying that your data looks pristine and
reliable after pre-processing. This means the dataset should have no NULL
values, no string values feeding into the model, no huge variations in the
numbers, and a datetime index that is absolutely consistent in its spacing.
The data pre-processing for this dataset has to pass certain quality checks:
1. Consistency in Datetime
2. Variation in PM 2.5 values
The values should be consistent overall, and all outliers should be removed
from the dataset. There can be certain days where the PM 2.5 values were
very low or very high, which can happen for many reasons, but we have to
make sure we remove all such outliers from the dataset.
We also need to resample the data over a time period, i.e. 15 minutes, 30
minutes, 1 hour, 2 hours, or any specific interval we wish. This is because
we can then specify the exact interval for which the model should forecast
the PM 2.5 values. If we want a PM 2.5 forecast every hour, then we need to
resample the data hourly.
The set_index() method makes the Datetime column the index of the
data-frame. This is necessary because the model requires the index of the
dataset to be a datetime column.
The fillna() method is used to fill any null values that remain after the
resampling is done. The 'ffill' (forward fill) argument tells fillna() to
replace each null value with the last valid value that precedes it.
Fig. Dataset
We can save the newly resampled data into a new data-frame variable.
This would save all the resampled values inside the df_hrs variable.
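Putting these steps together, a sketch of the resampling code (the '1H' rule assumes the hourly interval chosen above):

    # Make the Datetime column the index, then resample to hourly means
    df = df.set_index('Datetime')
    df_hrs = df.resample('1H').mean()

    # Forward-fill any null values introduced by the resampling
    df_hrs = df_hrs.fillna(method='ffill')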
From the above image we can see that all the values have been resampled and
converted to an hourly basis. To inspect the data over a specific period, we
need to slice it.
You can zoom into the data using the slice operator. The datetime column is
the index column, so you have to slice on the index and then plot the sliced
range.
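A sketch of such a slice (the date range is hypothetical; pick one that lies inside your dataset):

    # Slice one week of hourly values by indexing on the datetime index
    df_hrs['2021-01-01':'2021-01-07'].plot(figsize=(12, 4))
    plt.show()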
We can decompose the data into its seasonal and trend patterns, as shown in
the sketch below. A time series can be an additive or a multiplicative
combination of its seasonal and trend components; both cases are explained
in the next section.
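A sketch of the decomposition using statsmodels (model='additive' assumes the series is additive):

    from statsmodels.tsa.seasonal import seasonal_decompose

    # Split the series into trend, seasonal, and residual components
    decomposition = seasonal_decompose(df_hrs, model='additive')
    decomposition.plot()
    plt.show()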
Fig. PM 2.5 peaks in the afternoon and is at its lowest in the morning. This
pattern can be identified as seasonality when forecasting air quality.
What is Additive and Multiplicative?
In an additive time series, the components add together to make the time
series. If you have an increasing trend, you still see roughly the same size
peaks and troughs throughout the time series. This is often seen in indexed
time series where the absolute value is growing but changes stay relative.
In a multiplicative time series, the components multiply together, so as the
trend increases the seasonal peaks and troughs grow proportionally larger.
What is Grid Search and AIC?
ARIMA is a model that can be fitted to time series data in order to better
understand the series or to predict its future points.
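A sketch of such a grid search with itertools and statsmodels (the parameter ranges and the seasonal period of 24, a daily cycle in hourly data, are assumptions):

    import itertools

    # Candidate (p, d, q) combinations, each value in the range [0, 2)
    p = d = q = range(0, 2)
    pdq = list(itertools.product(p, d, q))
    seasonal_pdq = [(x[0], x[1], x[2], 24) for x in pdq]

    # Fit a SARIMAX model for every combination and record its AIC
    for param in pdq:
        for param_seasonal in seasonal_pdq:
            try:
                mod = sm.tsa.statespace.SARIMAX(df_hrs,
                                                order=param,
                                                seasonal_order=param_seasonal,
                                                enforce_stationarity=False,
                                                enforce_invertibility=False)
                results = mod.fit()
                print('SARIMAX{}x{} - AIC: {}'.format(param, param_seasonal,
                                                      results.aic))
            except Exception:
                continue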
In the above snippet of code we are finding the right p, d, and q parameters
to correctly forecast the PM 2.5 values. These values are crucial and have
to be near ideal for reliable forecasts.
There are three distinct integers (p, d, q) that are used to parametrize
ARIMA models. Because of that, ARIMA models are denoted with the
notation ARIMA(p, d, q). Together these three parameters account for
seasonality, trend, and noise in datasets:
p is the auto-regressive part of the model. It allows us to incorporate the
effect of past values into our model. Intuitively, this would be similar to
stating that it is likely to be warm tomorrow if it has been warm the past
3 days.
d is the integrated part of the model. This includes terms in the model
that incorporate the amount of differencing (i.e. the number of past time
points to subtract from the current value) to apply to the time series.
Intuitively, this would be similar to stating that tomorrow is likely to be
the same temperature if the difference in temperature over the last three
days has been very small.
q is the moving-average part of the model. This allows us to set the error
of our model as a linear combination of the error values observed at
previous time points.
Fig. AIC Grid Search Values
The above method is also known as the grid-search method for finding the
right p, d, q values to give as input to the SARIMAX time series model.
We have to find the lowest AIC value; the corresponding p, d, q values give
the best forecast of the PM 2.5 values.
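A sketch of the fitting step (the orders shown are placeholders; substitute the combination that gave the lowest AIC in your grid search):

    # Fit SARIMAX using the best (p, d, q) and seasonal orders found above
    mod = sm.tsa.statespace.SARIMAX(df_hrs,
                                    order=(1, 1, 1),
                                    seasonal_order=(1, 1, 1, 24),
                                    enforce_stationarity=False,
                                    enforce_invertibility=False)
    results = mod.fit()

    # Print the coefficient table with z-scores, P values, and standard errors
    print(results.summary().tables[1])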
The above snippet of code fits the dataset to the SARIMAX model. As seen in
the first line, we have used the p, d, q values that we found with the
grid-search method.
Then results = mod.fit() is used to fit the model. Printing the summary
shows all the statistical variables, such as the z-scores, P values, and
standard errors.
The only step left is the verification/testing of the model we just created.
We have to split the data into a train and a test dataset. This will help us
verify that the results are reliable.
The training data is the dataset used to train the model. The model will be
trained on the patterns/fluctuations in the training dataset, whereas the
testing dataset is unseen data. The model has to predict/forecast values
based on the training data. If the forecasted data overlaps the testing data
values, then we can say that the forecast produces reliable and trustworthy
values.
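A sketch of that step (the split date '2021-06-01', the plot start date, and the 'PM2.5' column name are hypothetical; adjust them to your dataset):

    # Predict from the split date onward and keep the confidence intervals
    pred = results.get_prediction(start=pd.to_datetime('2021-06-01'),
                                  dynamic=False)
    pred_ci = pred.conf_int()

    # Plot observed values against the one-step-ahead predictions
    ax = df_hrs['2021-05-01':]['PM2.5'].plot(label='observed', figsize=(12, 4))
    pred.predicted_mean.plot(ax=ax, label='predicted', alpha=0.7)
    ax.fill_between(pred_ci.index, pred_ci.iloc[:, 0], pred_ci.iloc[:, 1],
                    alpha=0.2)
    ax.legend()
    plt.show()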
The above code is used to get the predicted values after we have created the
model. The .get_prediction() method returns the predicted values from the
datetime given in the start parameter onwards.
Fig. Forecasted and Observed values.
The above graph is clear evidence that the observed and predicted values
overlap, as discussed in the previous section. This means the forecasting
model is performing as it should.
The next step is to create a separate data-frame that helps compare the true
test data with the predicted values using the mean squared error.
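A sketch of that comparison (the 'PM2.5' column name and the split date carry over the earlier assumptions):

    # Align the predicted values with the true test values
    y_predicted = pred.predicted_mean
    y_truth = df_hrs['2021-06-01':]['PM2.5']

    # Mean squared error: average of the squared differences
    mse = ((y_predicted - y_truth) ** 2).mean()
    print('Mean squared error of the forecast: {:.2f}'.format(mse))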
The above snippet of code calculates the mean squared error.
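A sketch of the forecasting step using statsmodels' get_forecast (steps=7 yields the next 7 hourly values):

    # Forecast 7 steps beyond the end of the observed data
    pred_uc = results.get_forecast(steps=7)
    print(pred_uc.predicted_mean)
    print(pred_uc.conf_int())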
As we approach the end of the project, the above code forecasts/predicts the
next 7 values. The variable results contains the fitted model, as discussed
in the sections above.
The readers can even extend this project by creating a web app or mobile
application. The charts below can be plotted using various web plotting
libraries such as Plotly.
As I said earlier, with the help of this book you can learn how to forecast
over a significant horizon. But there are some caveats:
1. Air quality is subject to external factors which are uncontrollable and
natural, such as weather, wind speed, temperature, and pressure. You also
need to find out the correlation between these variables. But overall, the
forecast can give you a general sense of how the value would fluctuate.
2. The forecast reliability also depends upon the algorithm used.
You can alter the steps parameter in the get_forecast(steps=...) method to
any desired value. But be careful and study the properties of the
values/metrics you are forecasting. Sometimes the values are only good when
forecasted up to a certain number of steps.
Fig. Forecasted values
Afterword
This e-book is deliberately short; its purpose is to let readers finish it
in one sitting. I hope you have enjoyed the book. There are many other
methods out there, like Facebook Prophet, ARIMA, and even supervised
learning algorithms such as linear regression. All of these algorithms and
methods can also give decent results. Time series data is everywhere around
us. Once again, thank you for the purchase. I hope this book helped you
create the project and gain a better understanding of forecasting
algorithms.