Week 10 Resample Hourly Data

Time series

Uploaded by

arnablions
Copyright
© All Rights Reserved

22/04/2024, 03:17 week_10_resample_hourly_data

CMPINF 2120 - Week 10


Resampling Hourly Data to Monthly data
Previous examples worked with data collected monthly. However, you may come across
higher frequency data. This example works with hourly data and shows how to
summarize it by month. I highly recommend summarizing higher frequency data first
to get a general idea of the patterns before examining the patterns at the higher
frequencies. The summarization is performed via the .resample() method in this
example.
The data were originally downloaded from the U.S. Energy Information Administration
(EIA) and correspond to hourly nuclear electricity generation in the lower 48 United
States.
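Before working with the real file, here is a minimal sketch of what .resample() does to hourly data. The series below is synthetic (every hour has the value 1), so the monthly total is simply the number of hours in each month. The "MS" (month start) alias is used only because it is accepted by both older and newer Pandas versions; the variable names are made up for this illustration.

```python
import numpy as np
import pandas as pd

# Synthetic hourly readings covering two full months (not the EIA data).
hours = pd.date_range("2021-01-01 00:00", "2021-02-28 23:00", freq="h")
hourly = pd.Series(np.ones(len(hours)), index=hours, name="generation")

# Resampling needs two decisions: the new frequency and the summary statistic.
monthly_total = hourly.resample("MS").sum()   # one row per month, summed
monthly_mean = hourly.resample("MS").mean()   # one row per month, averaged

print(monthly_total)
```

Because January has 31 days and February 2021 has 28, the totals are the hour counts of those months, which makes it easy to confirm that every hourly row landed in the correct bin.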

Import Modules
In [1]: import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

import seaborn as sns

In [2]: import statsmodels.api as sm

Read data
The CSV file is assumed to exist in the same working directory as this notebook. Specify
the file name below.
In [3]: data_file = 'Net_generation_from_nuclear_for_United_States_Lower_48_(region)

Read the data from the CSV file, skipping the first 5 rows and providing our own
column names.
In [4]: nuclear = pd.read_csv( data_file, skiprows=5, names=['timestamp', 'generatio
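The effect of skiprows and names can be checked on a small in-memory file. The text below is a hypothetical stand-in for the EIA export: I am assuming five lines of descriptive preamble followed by unlabeled records, and the preamble wording is invented. Only the two data rows are taken from this notebook.

```python
import io
import pandas as pd

# Hypothetical stand-in for the downloaded file: five preamble lines,
# then "timestamp,value" records without a usable header row.
raw_text = """preamble line 1 (hypothetical)
preamble line 2 (hypothetical)
preamble line 3 (hypothetical)
preamble line 4 (hypothetical)
preamble line 5 (hypothetical)
03/19/2021 04H,85915
03/19/2021 03H,85091
"""

# skiprows=5 drops the preamble; names= supplies the column labels.
demo = pd.read_csv(io.StringIO(raw_text),
                   skiprows=5,
                   names=["timestamp", "generation"])

print(demo)
```

Supplying names means Pandas does not treat the first remaining line as a header, so no data row is lost.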

In [5]: nuclear.info()

file:///Users/arnabdeysarkar/Desktop/Spring 2024 UPitt GIT/cmpinf 2120/Class/Week 10/week_10_resample_hourly_data.html 1/21



<class 'pandas.core.frame.DataFrame'>
RangeIndex: 23808 entries, 0 to 23807
Data columns (total 2 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 timestamp 23808 non-null object
1 generation 23808 non-null int64
dtypes: int64(1), object(1)
memory usage: 372.1+ KB

The timestamp column looks different from other examples because it includes the
hour!
In [6]: nuclear

Out[6]: timestamp generation


0 03/19/2021 04H 85915
1 03/19/2021 03H 85091
2 03/19/2021 02H 85414
3 03/19/2021 01H 85628
4 03/19/2021 00H 85781
... ... ...
23803 07/1/2018 09H 81733
23804 07/1/2018 08H 81700
23805 07/1/2018 07H 75650
23806 07/1/2018 06H 75818
23807 07/1/2018 05H 58363
23808 rows × 2 columns

Reorganize data
We must convert the timestamp from a string to a date time object. The
pd.to_datetime() function is usually quite good at guessing the format.
In [6]: pd.to_datetime( nuclear.timestamp )


Out[6]: 0 2021-03-19 04:00:00


1 2021-03-19 03:00:00
2 2021-03-19 02:00:00
3 2021-03-19 01:00:00
4 2021-03-19 00:00:00
...
23803 2018-07-01 09:00:00
23804 2018-07-01 08:00:00
23805 2018-07-01 07:00:00
23806 2018-07-01 06:00:00
23807 2018-07-01 05:00:00
Name: timestamp, Length: 23808, dtype: datetime64[ns]
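If you would rather not rely on guessing, the format can be stated explicitly. This is a sketch, assuming every timestamp follows the month/day/year pattern followed by a two-digit hour and a literal H, as in the rows displayed above. The two strings are copied from the first and last parts of the displayed data.

```python
import pandas as pd

stamps = pd.Series(["03/19/2021 04H", "07/1/2018 09H"])

# %m/%d/%Y is the date, %H is the hour, and the trailing "H" is a literal
# character that appears in the raw text.
parsed = pd.to_datetime(stamps, format="%m/%d/%Y %HH")

print(parsed)
```

An explicit format fails loudly on rows that do not match the pattern, which can be preferable to a silent misinterpretation.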

You may get warnings when executing the conversion. Hourly data can sometimes
require information associated with the time zone. The utc argument is one such
argument; it controls how time zone behavior is handled. By default
utc=False . For this example, including utc=True does not change the parsed date and
hour, though older versions of Pandas may need utc=True to run properly. As shown
below, the date and hour are the same as the default result shown above. However, the
+00:00 offset is also displayed because the UTC option is specified.
In [7]: pd.to_datetime( nuclear.timestamp, utc=True )

Out[7]: 0 2021-03-19 04:00:00+00:00
1 2021-03-19 03:00:00+00:00
2 2021-03-19 02:00:00+00:00
3 2021-03-19 01:00:00+00:00
4 2021-03-19 00:00:00+00:00
...
23803 2018-07-01 09:00:00+00:00
23804 2018-07-01 08:00:00+00:00
23805 2018-07-01 07:00:00+00:00
23806 2018-07-01 06:00:00+00:00
23807 2018-07-01 05:00:00+00:00
Name: timestamp, Length: 23808, dtype: datetime64[ns, UTC]
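The difference between the two results is the time zone attached to the values, not the clock time. A small sketch with two hand-typed timestamps (not read from the file) shows how to check this.

```python
import pandas as pd

stamps = pd.Series(["2021-03-19 04:00", "2018-07-01 05:00"])

naive = pd.to_datetime(stamps)            # no time zone attached
aware = pd.to_datetime(stamps, utc=True)  # values labeled as UTC

print(naive.dt.tz)
print(aware.dt.tz)

# Dropping the UTC label recovers the same clock times as the naive result.
same_clock = (aware.dt.tz_localize(None) == naive).all()
print(same_clock)
```

Whether the raw EIA hours are truly recorded in UTC is not something this notebook verifies, so treat the UTC label as an assumption about the data.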

Let's assign the converted date time object to the date_dt column.
In [8]: nuclear['date_dt'] = pd.to_datetime( nuclear.timestamp, utc=True )

In [9]: nuclear


Out[9]: timestamp generation date_dt


0 03/19/2021 04H 85915 2021-03-19 04:00:00+00:00
1 03/19/2021 03H 85091 2021-03-19 03:00:00+00:00
2 03/19/2021 02H 85414 2021-03-19 02:00:00+00:00
3 03/19/2021 01H 85628 2021-03-19 01:00:00+00:00
4 03/19/2021 00H 85781 2021-03-19 00:00:00+00:00
... ... ... ...
23803 07/1/2018 09H 81733 2018-07-01 09:00:00+00:00
23804 07/1/2018 08H 81700 2018-07-01 08:00:00+00:00
23805 07/1/2018 07H 75650 2018-07-01 07:00:00+00:00
23806 07/1/2018 06H 75818 2018-07-01 06:00:00+00:00
23807 07/1/2018 05H 58363 2018-07-01 05:00:00+00:00
23808 rows × 3 columns
In [10]: nuclear.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 23808 entries, 0 to 23807
Data columns (total 3 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 timestamp 23808 non-null object
1 generation 23808 non-null int64
2 date_dt 23808 non-null datetime64[ns, UTC]
dtypes: datetime64[ns, UTC](1), int64(1), object(1)
memory usage: 558.1+ KB

There are more attributes of the datetime available to us now.


In [11]: nuclear['the_year'] = nuclear.date_dt.dt.year

In [12]: nuclear


Out[12]: timestamp generation date_dt the_year


0 03/19/2021 04H 85915 2021-03-19 04:00:00+00:00 2021
1 03/19/2021 03H 85091 2021-03-19 03:00:00+00:00 2021
2 03/19/2021 02H 85414 2021-03-19 02:00:00+00:00 2021
3 03/19/2021 01H 85628 2021-03-19 01:00:00+00:00 2021
4 03/19/2021 00H 85781 2021-03-19 00:00:00+00:00 2021
... ... ... ... ...
23803 07/1/2018 09H 81733 2018-07-01 09:00:00+00:00 2018
23804 07/1/2018 08H 81700 2018-07-01 08:00:00+00:00 2018
23805 07/1/2018 07H 75650 2018-07-01 07:00:00+00:00 2018
23806 07/1/2018 06H 75818 2018-07-01 06:00:00+00:00 2018
23807 07/1/2018 05H 58363 2018-07-01 05:00:00+00:00 2018
23808 rows × 4 columns
In [13]: nuclear['the_month'] = nuclear.date_dt.dt.month

In [14]: nuclear


Out[14]: timestamp generation date_dt the_year the_month


0 03/19/2021 04H 85915 2021-03-19 04:00:00+00:00 2021 3
1 03/19/2021 03H 85091 2021-03-19 03:00:00+00:00 2021 3
2 03/19/2021 02H 85414 2021-03-19 02:00:00+00:00 2021 3
3 03/19/2021 01H 85628 2021-03-19 01:00:00+00:00 2021 3
4 03/19/2021 00H 85781 2021-03-19 00:00:00+00:00 2021 3
... ... ... ... ... ...
23803 07/1/2018 09H 81733 2018-07-01 09:00:00+00:00 2018 7
23804 07/1/2018 08H 81700 2018-07-01 08:00:00+00:00 2018 7
23805 07/1/2018 07H 75650 2018-07-01 07:00:00+00:00 2018 7
23806 07/1/2018 06H 75818 2018-07-01 06:00:00+00:00 2018 7
23807 07/1/2018 05H 58363 2018-07-01 05:00:00+00:00 2018 7
23808 rows × 5 columns
In [15]: nuclear['the_day'] = nuclear.date_dt.dt.day

In [16]: nuclear


Out[16]: timestamp generation date_dt the_year the_month the_day


0 03/19/2021 04H 85915 2021-03-19 04:00:00+00:00 2021 3 19
1 03/19/2021 03H 85091 2021-03-19 03:00:00+00:00 2021 3 19
2 03/19/2021 02H 85414 2021-03-19 02:00:00+00:00 2021 3 19
3 03/19/2021 01H 85628 2021-03-19 01:00:00+00:00 2021 3 19
4 03/19/2021 00H 85781 2021-03-19 00:00:00+00:00 2021 3 19
... ... ... ... ... ... ...
23803 07/1/2018 09H 81733 2018-07-01 09:00:00+00:00 2018 7 1
23804 07/1/2018 08H 81700 2018-07-01 08:00:00+00:00 2018 7 1
23805 07/1/2018 07H 75650 2018-07-01 07:00:00+00:00 2018 7 1
23806 07/1/2018 06H 75818 2018-07-01 06:00:00+00:00 2018 7 1
23807 07/1/2018 05H 58363 2018-07-01 05:00:00+00:00 2018 7 1
23808 rows × 6 columns
But we can also extract the HOUR!!!!!
In [17]: nuclear['the_hour'] = nuclear.date_dt.dt.hour

In [18]: nuclear


Out[18]: timestamp generation date_dt the_year the_month the_day the_h


0 03/19/2021 04H 85915 2021-03-19 04:00:00+00:00 2021 3 19
1 03/19/2021 03H 85091 2021-03-19 03:00:00+00:00 2021 3 19
2 03/19/2021 02H 85414 2021-03-19 02:00:00+00:00 2021 3 19
3 03/19/2021 01H 85628 2021-03-19 01:00:00+00:00 2021 3 19
4 03/19/2021 00H 85781 2021-03-19 00:00:00+00:00 2021 3 19
... ... ... ... ... ... ...
23803 07/1/2018 09H 81733 2018-07-01 09:00:00+00:00 2018 7 1
23804 07/1/2018 08H 81700 2018-07-01 08:00:00+00:00 2018 7 1
23805 07/1/2018 07H 75650 2018-07-01 07:00:00+00:00 2018 7 1
23806 07/1/2018 06H 75818 2018-07-01 06:00:00+00:00 2018 7 1
23807 07/1/2018 05H 58363 2018-07-01 05:00:00+00:00 2018 7 1
23808 rows × 7 columns
We can also pull out the day of the week.
In [19]: nuclear['the_dow'] = nuclear.date_dt.dt.dayofweek
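The integer returned by .dt.dayofweek can be hard to read at a glance. In Pandas, Monday is 0 and Sunday is 6. The sketch below uses the first and last timestamps displayed in this notebook and adds .dt.day_name() for a readable label; the parts data frame and its dow_name column are names I made up for the illustration.

```python
import pandas as pd

stamps = pd.to_datetime(
    pd.Series(["2021-03-19 04:00", "2018-07-01 05:00"]), utc=True
)

parts = pd.DataFrame({
    "the_hour": stamps.dt.hour,          # 0 through 23
    "the_dow": stamps.dt.dayofweek,      # Monday=0, ..., Sunday=6
    "dow_name": stamps.dt.day_name(),    # readable label
})

print(parts)
```

The data therefore start on a Sunday (2018-07-01) and end on a Friday (2021-03-19), which matters later when the weekly bins are formed.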

In [20]: nuclear


Out[20]: timestamp generation date_dt the_year the_month the_day the_h


0 03/19/2021 04H 85915 2021-03-19 04:00:00+00:00 2021 3 19
1 03/19/2021 03H 85091 2021-03-19 03:00:00+00:00 2021 3 19
2 03/19/2021 02H 85414 2021-03-19 02:00:00+00:00 2021 3 19
3 03/19/2021 01H 85628 2021-03-19 01:00:00+00:00 2021 3 19
4 03/19/2021 00H 85781 2021-03-19 00:00:00+00:00 2021 3 19
... ... ... ... ... ... ...
23803 07/1/2018 09H 81733 2018-07-01 09:00:00+00:00 2018 7 1
23804 07/1/2018 08H 81700 2018-07-01 08:00:00+00:00 2018 7 1
23805 07/1/2018 07H 75650 2018-07-01 07:00:00+00:00 2018 7 1
23806 07/1/2018 06H 75818 2018-07-01 06:00:00+00:00 2018 7 1
23807 07/1/2018 05H 58363 2018-07-01 05:00:00+00:00 2018 7 1
23808 rows × 8 columns

Visualizations
Although our previous example focused on time series specific visuals, we can still
use standard plots to explore time series data!
We can explore the distribution of the value, generation , grouped by the newly
created date time attributes!
In [21]: sns.catplot(data = nuclear, x='the_year', y='generation', kind='box', aspect

plt.show()

C:\Users\XPS15\Anaconda3\envs\cmpinf2120_2024\lib\site-packages\seaborn\axis
grid.py:118: UserWarning: The figure layout has changed to tight
self._figure.tight_layout(*args, **kwargs)
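The quantities a box plot draws can also be computed directly, which is useful when you want the numbers behind the picture. This is a rough sketch on eight generation values copied from the rows displayed earlier (four from 2018, four from 2021), so it is not the summary of the full data set. The demo and box_stats names are made up here.

```python
import pandas as pd

# Eight rows copied from the displayed head and tail of the data.
demo = pd.DataFrame({
    "the_year": [2018, 2018, 2018, 2018, 2021, 2021, 2021, 2021],
    "generation": [58363, 75818, 75650, 81700, 85781, 85628, 85414, 85091],
})

# Quartiles per year: the box edges (q1, q3) and the center line (median).
box_stats = (
    demo.groupby("the_year")["generation"]
        .quantile([0.25, 0.5, 0.75])
        .unstack()
)
box_stats.columns = ["q1", "median", "q3"]
box_stats["iqr"] = box_stats["q3"] - box_stats["q1"]

print(box_stats)
```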


The violin plot shows the shape of the distribution in addition to the important summary
stats.
In [22]: sns.catplot(data = nuclear, x='the_year', y='generation', kind='violin', asp

plt.show()

C:\Users\XPS15\Anaconda3\envs\cmpinf2120_2024\lib\site-packages\seaborn\axis
grid.py:118: UserWarning: The figure layout has changed to tight
self._figure.tight_layout(*args, **kwargs)

Summarize the generation by month.


In [23]: sns.catplot(data = nuclear, x='the_month', y='generation', kind='box', aspec

plt.show()


C:\Users\XPS15\Anaconda3\envs\cmpinf2120_2024\lib\site-packages\seaborn\axis
grid.py:118: UserWarning: The figure layout has changed to tight
self._figure.tight_layout(*args, **kwargs)

We could also look at the month grouped by the year.


In [24]: sns.catplot(data = nuclear, x='the_month', y='generation', hue='the_year', k

plt.show()

C:\Users\XPS15\Anaconda3\envs\cmpinf2120_2024\lib\site-packages\seaborn\axis
grid.py:118: UserWarning: The figure layout has changed to tight
self._figure.tight_layout(*args, **kwargs)

Lastly, group by hour of the day.


In [25]: sns.catplot(data = nuclear, x='the_hour', y='generation', kind='box', aspect

plt.show()

C:\Users\XPS15\Anaconda3\envs\cmpinf2120_2024\lib\site-packages\seaborn\axis
grid.py:118: UserWarning: The figure layout has changed to tight
self._figure.tight_layout(*args, **kwargs)


In [26]: sns.catplot(data = nuclear, x='the_hour', y='generation', hue='the_year', ki

plt.show()

C:\Users\XPS15\Anaconda3\envs\cmpinf2120_2024\lib\site-packages\seaborn\axis
grid.py:118: UserWarning: The figure layout has changed to tight
self._figure.tight_layout(*args, **kwargs)

The first chart I would usually make is a TIME SERIES chart.


In [27]: sns.relplot(data = nuclear, x='date_dt', y='generation', kind='line', aspect

plt.show()

C:\Users\XPS15\Anaconda3\envs\cmpinf2120_2024\lib\site-packages\seaborn\axis
grid.py:118: UserWarning: The figure layout has changed to tight
self._figure.tight_layout(*args, **kwargs)

Time series grouping and resampling



Our previous examples worked with MONTHLY data. Let's convert our HOURLY or high
frequency data to the MONTHLY END sampling frequency. Before calling the
.resample() method, let's review grouping with .groupby().aggregate() .

In [28]: nuclear.groupby(['the_year', 'the_month']).\
    aggregate(num_rows = ('generation', 'size'),
              num_nonmissing = ('generation', 'count'),
              num_hours = ('the_hour', 'nunique'),
              num_days = ('the_day', 'nunique'),
              month_sum = ('generation', 'sum'),
              month_avg = ('generation', 'mean'),
              month_min = ('generation', 'min'),
              month_max = ('generation', 'max'),
              month_sd = ('generation', 'std')).\
    reset_index()


Out[28]: the_year the_month num_rows num_nonmissing num_hours num_days month


0 2018 7 739 739 24 31 5158
1 2018 8 744 744 24 31 491
2 2018 9 720 720 24 30 535
3 2018 10 744 744 24 31 615
4 2018 11 720 720 24 30 655
5 2018 12 744 744 24 31 746
6 2019 1 744 744 24 31 769
7 2019 2 672 672 24 28 675
8 2019 3 744 744 24 31 683
9 2019 4 720 720 24 30 6286
10 2019 5 744 744 24 31 698
11 2019 6 720 720 24 30 718
12 2019 7 744 744 24 31 754
13 2019 8 744 744 24 31 749
14 2019 9 720 720 24 30 690
15 2019 10 744 744 24 31 642
16 2019 11 720 720 24 30 662
17 2019 12 744 744 24 31 735
18 2020 1 744 744 24 31 713
19 2020 2 696 696 24 29 633
20 2020 3 744 744 24 31 629
21 2020 4 720 720 24 30 593
22 2020 5 744 744 24 31 6446
23 2020 6 720 720 24 30 674
24 2020 7 744 744 24 31 696
25 2020 8 744 744 24 31 692
26 2020 9 720 720 24 30 659
27 2020 10 744 744 24 31 5964
28 2020 11 720 720 24 30 6178
29 2020 12 744 744 24 31 7014
30 2021 1 744 744 24 31 7204
31 2021 2 672 672 24 28 632
32 2021 3 437 437 24 19 386
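The named aggregation pattern used above is easier to follow on a handful of rows. In the sketch below the five generation values are copied from the displayed rows, but their assignment to months is made up so that each group is small enough to check by hand.

```python
import pandas as pd

# Tiny illustrative frame; the month assignments are invented for the demo.
demo = pd.DataFrame({
    "the_year":   [2018, 2018, 2018, 2021, 2021],
    "the_month":  [7, 7, 8, 3, 3],
    "generation": [58363, 75818, 81700, 85915, 85091],
})

# Each keyword is a new column name; each tuple is (source column, function).
summary = (
    demo.groupby(["the_year", "the_month"])
        .aggregate(num_rows=("generation", "size"),
                   month_sum=("generation", "sum"),
                   month_avg=("generation", "mean"))
        .reset_index()
)

print(summary)
```

Counting the rows per group, as num_rows does, is what reveals the partial months in the full table: 739 rows in July 2018 and 437 rows in March 2021, compared with 744 for a complete 31 day month.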
Prepare the data for time series analysis.
In [29]: my_series = nuclear.generation.copy()

In [30]: my_series

Out[30]: 0 85915
1 85091
2 85414
3 85628
4 85781
...
23803 81733
23804 81700
23805 75650
23806 75818
23807 58363
Name: generation, Length: 23808, dtype: int64

In [31]: my_series.index

Out[31]: RangeIndex(start=0, stop=23808, step=1)

In [32]: my_series.index = nuclear.date_dt

In [33]: my_series.index

Out[33]: DatetimeIndex(['2021-03-19 04:00:00+00:00', '2021-03-19 03:00:00+00:00',
'2021-03-19 02:00:00+00:00', '2021-03-19 01:00:00+00:00',
'2021-03-19 00:00:00+00:00', '2021-03-18 23:00:00+00:00',
'2021-03-18 22:00:00+00:00', '2021-03-18 21:00:00+00:00',
'2021-03-18 20:00:00+00:00', '2021-03-18 19:00:00+00:00',
...
'2018-07-01 14:00:00+00:00', '2018-07-01 13:00:00+00:00',
'2018-07-01 12:00:00+00:00', '2018-07-01 11:00:00+00:00',
'2018-07-01 10:00:00+00:00', '2018-07-01 09:00:00+00:00',
'2018-07-01 08:00:00+00:00', '2018-07-01 07:00:00+00:00',
'2018-07-01 06:00:00+00:00', '2018-07-01 05:00:00+00:00'],
dtype='datetime64[ns, UTC]', name='date_dt', length=23808, freq=None)

RESAMPLE to the MONTHLY END summing ALL values within the MONTH!!!
In [34]: ready_series = my_series.copy().resample("M").sum()

In [35]: ready_series


Out[35]: date_dt
2018-07-31 00:00:00+00:00 51584844
2018-08-31 00:00:00+00:00 49120222
2018-09-30 00:00:00+00:00 53522993
2018-10-31 00:00:00+00:00 61530792
2018-11-30 00:00:00+00:00 65513045
2018-12-31 00:00:00+00:00 74614079
2019-01-31 00:00:00+00:00 76917712
2019-02-28 00:00:00+00:00 67581408
2019-03-31 00:00:00+00:00 68366739
2019-04-30 00:00:00+00:00 62864882
2019-05-31 00:00:00+00:00 69872088
2019-06-30 00:00:00+00:00 71835222
2019-07-31 00:00:00+00:00 75485578
2019-08-31 00:00:00+00:00 74914172
2019-09-30 00:00:00+00:00 69033350
2019-10-31 00:00:00+00:00 64232759
2019-11-30 00:00:00+00:00 66277431
2019-12-31 00:00:00+00:00 73554316
2020-01-31 00:00:00+00:00 71315906
2020-02-29 00:00:00+00:00 63381879
2020-03-31 00:00:00+00:00 62970581
2020-04-30 00:00:00+00:00 59340276
2020-05-31 00:00:00+00:00 64463536
2020-06-30 00:00:00+00:00 67451902
2020-07-31 00:00:00+00:00 69636785
2020-08-31 00:00:00+00:00 69271076
2020-09-30 00:00:00+00:00 65977276
2020-10-31 00:00:00+00:00 59644949
2020-11-30 00:00:00+00:00 61784698
2020-12-31 00:00:00+00:00 70140098
2021-01-31 00:00:00+00:00 72048088
2021-02-28 00:00:00+00:00 63280414
2021-03-31 00:00:00+00:00 38627244
Freq: M, Name: generation, dtype: int64

In [36]: ready_series.size

Out[36]: 33
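Two practical notes on the month-end resample, sketched on synthetic data where every hour has the value 1. First, Pandas 2.2 deprecated the "M" alias in favor of "ME", so the sketch tries "ME" and falls back to "M" on older versions. Second, the resampled totals should match a .groupby() on the calendar year and month, which is a quick sanity check. The variable names are made up for this illustration.

```python
import numpy as np
import pandas as pd

# Synthetic hourly data for two full months, labeled as UTC like my_series.
hours = pd.date_range("2021-01-01 00:00", "2021-02-28 23:00",
                      freq="h", tz="UTC")
hourly = pd.Series(np.ones(len(hours), dtype="int64"),
                   index=hours, name="generation")

# "ME" is the month-end alias in Pandas 2.2+; older versions only know "M".
try:
    monthly = hourly.resample("ME").sum()
except ValueError:
    monthly = hourly.resample("M").sum()

# Cross-check against grouping on the calendar year and month.
by_calendar = hourly.groupby([hourly.index.year, hourly.index.month]).sum()

print(monthly)
print(by_calendar)
```

Keep in mind that a summed month is only comparable to other months when it is complete. In the real data the first and last months are partial, which explains the low total for March 2021.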

The time series visualization for the resampled monthly frequency data.
In [37]: ready_series.plot( figsize=(12, 5) )

plt.show()


If we wanted WEEKLY data we would still need to identify the summary stat and the
sampling frequency. Here we use the total or SUMMED value per week.
In [38]: weekly_series = my_series.copy().resample('W').sum()

In [39]: weekly_series

Out[39]: date_dt
2018-07-01 00:00:00+00:00 1512691
2018-07-08 00:00:00+00:00 14569442
2018-07-15 00:00:00+00:00 10907296
2018-07-22 00:00:00+00:00 10438491
2018-07-29 00:00:00+00:00 10976691
...
2021-02-21 00:00:00+00:00 15795919
2021-02-28 00:00:00+00:00 15273776
2021-03-07 00:00:00+00:00 15024074
2021-03-14 00:00:00+00:00 14914081
2021-03-21 00:00:00+00:00 8689089
Freq: W-SUN, Name: generation, Length: 143, dtype: int64

In [40]: weekly_series.plot( figsize=(12, 6) )

plt.show()
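The weekly series begins and ends with unusually small totals. The 'W' frequency is shorthand for W-SUN: each bin is labeled by the Sunday that ends the week. The sketch below uses synthetic data (value 1 per hour) starting at 05:00 on Sunday 2018-07-01, the same starting hour as the real data, to show that the first bin holds only part of one day.

```python
import numpy as np
import pandas as pd

# Synthetic hourly data starting mid-day on a Sunday, like the real series.
hours = pd.date_range("2018-07-01 05:00", "2018-07-08 23:00", freq="h")
hourly = pd.Series(np.ones(len(hours), dtype="int64"), index=hours)

weekly = hourly.resample("W").sum()            # total per week ending Sunday
hours_per_week = hourly.resample("W").size()   # how many rows fed each bin

print(weekly)
print(hours_per_week)
```

The first bin contains the 19 remaining hours of that Sunday, while the second contains a full 168 hour week. Checking .size() alongside .sum() is a simple way to spot partial bins before interpreting dips at the edges of the plot.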


Time series specific visualizations


Autocorrelation plot with the monthly data
In [41]: fig, ax = plt.subplots(figsize=(12, 6))

sm.graphics.tsa.plot_acf( ready_series.values.squeeze(), lags=24, ax = ax)

plt.show()
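To see what the autocorrelation plot is drawing, here is a minimal sketch of the usual sample autocorrelation: the sum of products of deviations from the mean at lag k, divided by the sum of squared deviations. The sample_acf function is a hypothetical helper written for this illustration, not a statsmodels function, and the statsmodels implementation may differ in its options and defaults.

```python
import numpy as np

def sample_acf(values, max_lag):
    """Sample autocorrelation for lags 0 through max_lag."""
    x = np.asarray(values, dtype=float)
    dev = x - x.mean()               # deviations from the overall mean
    denom = np.sum(dev ** 2)         # lag-0 sum of squares
    return np.array([
        np.sum(dev[: len(x) - k] * dev[k:]) / denom
        for k in range(max_lag + 1)
    ])

acf_demo = sample_acf([1, 2, 3, 4, 5], max_lag=2)
print(acf_demo)
```

Each lag k uses only the pairs of points that are k steps apart. With 33 monthly values, a lag of 24 is estimated from just 9 pairs, so the bars at the largest lags should be read with caution.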

Weekly data autocorrelation.


In [42]: fig, ax = plt.subplots(figsize=(12, 6))

sm.graphics.tsa.plot_acf( weekly_series.values.squeeze(), lags=52, ax = ax)

plt.show()

The time series decomposition using the classic additive decomposition.


In [43]: my_decomposition = sm.tsa.seasonal_decompose( ready_series, model='additive'

In [44]: fig = my_decomposition.plot()
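The classical additive decomposition can be sketched by hand to show where the trend and seasonal pieces come from. This is a simplified illustration that assumes an even period (12 for monthly data) and at least two full periods of data. The additive_decompose function is a hypothetical helper, and statsmodels may handle details such as edge values differently. The demo series is synthetic: a straight line plus a repeating pattern that sums to zero.

```python
import numpy as np
import pandas as pd

def additive_decompose(series, period=12):
    """Classical additive decomposition sketch (even period assumed)."""
    x = series.to_numpy(dtype=float)
    n = len(x)
    half = period // 2

    # Centered moving average: half weight on the two end points.
    weights = np.r_[0.5, np.ones(period - 1), 0.5] / period
    trend = np.full(n, np.nan)
    trend[half:n - half] = np.convolve(x, weights, mode="valid")

    # Average the detrended values at each position within the period.
    detrended = x - trend
    position = np.arange(n) % period
    seasonal_means = np.array(
        [np.nanmean(detrended[position == p]) for p in range(period)]
    )
    seasonal_means -= seasonal_means.mean()   # force the pattern to sum to 0
    seasonal = seasonal_means[position]

    return pd.DataFrame({
        "observed": x,
        "trend": trend,
        "seasonal": seasonal,
        "resid": x - trend - seasonal,
        "seasonal_adjust": x - seasonal,
    }, index=series.index)

# Synthetic monthly series: linear trend plus a zero-sum seasonal pattern.
pattern = np.array([5, 3, 1, -1, -3, -5, -5, -3, -1, 1, 3, 5], dtype=float)
t = np.arange(36)
demo = pd.Series(100 + 2 * t + pattern[t % 12],
                 index=pd.date_range("2018-07-01", periods=36, freq="MS"))

parts = additive_decompose(demo, period=12)
print(parts.head(8))
```

The trend is undefined for the first and last six months because the centered average needs six neighbors on each side. Subtracting the seasonal column from the observed values gives the seasonally adjusted series.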


The seasonally adjusted behavior.


In [45]: df_decomp = pd.DataFrame({'observed': my_decomposition.observed,
'seasonal_adjust': my_decomposition.observed - my_
index=ready_series.index)

In [46]: sns.relplot(data = df_decomp, kind='line', aspect= 2.5 )

plt.show()

C:\Users\XPS15\Anaconda3\envs\cmpinf2120_2024\lib\site-packages\seaborn\axis
grid.py:118: UserWarning: The figure layout has changed to tight
self._figure.tight_layout(*args, **kwargs)


Summary
This report showed how to organize high frequency hourly data for monthly time series
exploration.