Applications of Higher-Order Hidden Markov Models in Exotic Commodity Markets


Western University

Scholarship@Western

Electronic Thesis and Dissertation Repository

2-8-2018 10:00 AM

Some applications of higher-order hidden Markov models in the exotic commodity markets
Heng Xiong
The University of Western Ontario

Supervisor
Mamon, Rogemar.
The University of Western Ontario

Graduate Program in Statistics and Actuarial Sciences


A thesis submitted in partial fulfillment of the requirements for the degree in Doctor of
Philosophy
© Heng Xiong 2018

Follow this and additional works at: https://ir.lib.uwo.ca/etd

Part of the Applied Statistics Commons, Finance and Financial Management Commons, Management
Sciences and Quantitative Methods Commons, Numerical Analysis and Scientific Computing Commons,
and the Statistical Models Commons

Recommended Citation
Xiong, Heng, "Some applications of higher-order hidden Markov models in the exotic commodity markets"
(2018). Electronic Thesis and Dissertation Repository. 5226.
https://ir.lib.uwo.ca/etd/5226

This Dissertation/Thesis is brought to you for free and open access by Scholarship@Western. It has been accepted
for inclusion in Electronic Thesis and Dissertation Repository by an authorized administrator of
Scholarship@Western. For more information, please contact [email protected].
Abstract

The liberalisation of regional and global commodity markets over the last several decades
resulted in certain commodity price behaviours that require new modelling and estimation
approaches. Such new approaches have important implications for the valuation and utilisation
of commodity derivatives. Derivatives are becoming increasingly crucial for market
participants in hedging their exposure to volatile price swings and in managing risks asso-
ciated with derivative trading. The modelling of commodity-based variables is an integral
part of risk management and optimal-investment strategies for commodity-linked portfo-
lios. The characteristics of commodity price evolution cannot be captured sufficiently by
one-state driven models even with the inclusion of multiple factors. This inspires the adop-
tion of regime-switching methods to rectify the one-state multi-factor modelling inadequa-
cies. In this research, we aim to employ higher-order hidden Markov models (HOHMMs)
in order to take advantage of the latent information in the observed process recorded in
the past. This hugely enhances and complements the regime-switching features of our ap-
proach in describing certain variables that virtually determine the value of some commodity
derivatives such as contracts dependent on temperature, electricity spot price, and fish-price
dynamics. Our use of the change-of-probability-measure technique facilitates
the derivation of recursive filtering algorithms. This then establishes a self-tuning
dynamic estimation procedure. Both the data-fitting and forecasting performances of vari-
ous model settings are investigated.

This research work emerged from four related projects detailed as follows. (i) We start
with an HMM to model the behaviour of daily average temperatures (DATs) geared to-
wards the analysis of weather derivatives. (ii) The model in (i) is extended naturally by
showcasing the capacity of an HOHMM-based approach to simultaneously describe the
DATs’ salient properties of mean reversion, seasonality, memory and stochasticity. (iii) An
HOHMM-driven jump process augments the HOHMM-based de-seasonalised temperature
process to capture price spikes, and the ensuing filtering algorithms under this modelling
framework are constructed to provide optimal parameter estimates. (iv) Finally, a multi-
dimensional HOHMM-modulated set up is built for futures price-curve dynamics pertinent
to financial product valuation and risk management in the aquaculture sector. We examine
the performance of this new modelling set up by considering goodness-of-fit and out-of-
sample forecasting metrics with a detailed numerical demonstration using a multivariate
data set compiled by the Fish Pool ASA.

This research offers a collection of more flexible stochastic modelling approaches for pric-
ing and risk analysis of certain commodity derivatives on weather, electricity and fish
prices. The novelty of our techniques is the powerful capability to automate the param-
eter estimation. Consequently, we contribute to the development of financial tools that aid
in selecting the appropriate and optimal model on the basis of some information criteria
and within current technological advancements in which a continuous flow of observed data
is now readily accessible in real time.

Keywords: regime-switching model, commodity-derivatives valuation, change of reference
probability measure, optimal parameter estimation, multivariate HOHMM filtering
method

Co-Authorship Statement

I hereby declare that this thesis incorporates materials that are direct results of my main
efforts during my doctoral study. All research outputs (jointly authored with Dr Rogemar
Mamon) led to two published papers and two manuscripts under review in refereed jour-
nals, and these are detailed below.

The content of chapter 2 appeared as a full paper in the journal Computational Manage-
ment Science; see reference [44] in chapter 4.

The results of chapter 3 were published in the Journal of Computational Science; see ref-
erence [37] in chapter 4.

Chapter 4 is based on a manuscript that is currently under review in the journal Applied
Energy.

The source of chapter 5 is a manuscript under consideration for publication in the Journal
of Economic Dynamics and Control.

An integrated-article format of this dissertation is adopted in accordance with Western’s
thesis guidelines. As the lead author of all self-contained chapters, I am responsible
for probing the feasibility of the proposed modelling frameworks, the implementation of
filtering algorithms, and the completion of the manuscripts. Dr Mamon guided me consci-
entiously on the research plan, proposed general modelling development and formulation,
interpretation involved in the empirical analysis of all aforementioned four projects, and in
addressing competently the referees’ concerns for successful journal publications.

I certify that this thesis is fully a product of my own work. It was conducted from Septem-
ber 2014 to present under the supervision of Dr Mamon at the University of Western On-
tario.

London, Ontario

To my Dad in Heaven
“I miss you, my first & forever hero.”

Acknowledgements

First and foremost, I would like to express my deepest gratitude to my supervisor Dr Ro-
gemar Mamon for his innumerably valuable guidance on my graduate study. This doctoral
research enjoys steady progress and is building momentum for potential impact due to his
constructive comments and excellent advice. More importantly, his infinite patience, thorough
knowledge and deep insights have carried me through some of the toughest times of my
academic life since day one at Western.

I would also like to sincerely thank Dr Matt Davison for his invaluable help and support. He
is always generous with his time to discuss some of my academic and professional issues.
I am extremely appreciative of his well-thought out suggestions and tenacious encourage-
ment throughout my graduate studies.

I gratefully acknowledge the useful feedback, precious time, and energy of my other thesis
committee members: Dr Marcos Escobar-Anel, Dr Lars Stentoft, and Dr Francois Watier.
Their helpful comments and suggestions certainly strengthen this dissertation at various
levels. Furthermore, many thanks are due to the outstanding faculty and staff of the De-
partment of Statistical and Actuarial Sciences (DSAS) for their support in various ways. I
also sincerely thank the financial support provided by the DSAS and the Ontario Graduate
Scholarship/Queen Elizabeth II Scholarship Program.

I take this chance to thank all my colleagues and friends in Canada. I am so blessed to
be in your great company, affording moral support in the accomplishment of this lifetime
milestone.

Last but not least, I would like to express my special thanks to my family, most especially
my parents and grandma, for their selfless love, unconditional support, and unfailing belief
in my ability to reach greater heights and possibilities. I will do my best to be a better man
and make you all proud.

Contents

Abstract i

Co-Authorship Statement iii

Dedication iv

Acknowledgements v

List of Figures x

List of Tables xiii

List of Appendices xv

1 Introduction 1
1.1 Research motivation and objectives . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Literature review . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2.1 Overview of HMMs . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2.2 Extension to HOHMMs . . . . . . . . . . . . . . . . . . . . . . . . 5
1.2.3 Evaluation with forward-backward algorithm . . . . . . . . . . . . 6
1.2.4 Decoding with Viterbi algorithm . . . . . . . . . . . . . . . . . . . 7
1.2.5 Training with expectation-maximisation algorithm . . . . . . . . . 8
1.2.6 Change of measure method in HOHMMs . . . . . . . . . . . . . . 11
1.3 Structure of the thesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.3.1 Putting a price tag on temperature . . . . . . . . . . . . . . . . . . 13
1.3.2 A self-updating model driven by a higher-order hidden Markov
chain for temperature dynamics . . . . . . . . . . . . . . . . . . . 13
1.3.3 A higher-order Markov chain-modulated model for electricity spot-
price dynamics . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

1.3.4 Modelling and forecasting futures-prices curves in the Fish Pool
market . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

2 Putting a price tag on temperature 18


2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.2 Model description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.2.1 Model for temperature derivatives . . . . . . . . . . . . . . . . . . 20
2.2.2 The HMM-modulated OU process . . . . . . . . . . . . . . . . . . 21
2.3 Recursive filtering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.3.1 Change of reference probability measure . . . . . . . . . . . . . . . 23
2.3.2 Calculation of recursive filters . . . . . . . . . . . . . . . . . . . . 24
2.4 Optimal parameter estimation . . . . . . . . . . . . . . . . . . . . . . . . 26
2.5 Numerical implementation . . . . . . . . . . . . . . . . . . . . . . . . . . 28
2.5.1 The deterministic component . . . . . . . . . . . . . . . . . . . . . 28
2.5.2 The stochastic component . . . . . . . . . . . . . . . . . . . . . . 29
2.5.2.1 Initial values for the parameter estimates . . . . . . . . . 31
2.5.2.2 Evolution of parameter estimates . . . . . . . . . . . . . 31
2.5.3 Analysis of regime and model selection . . . . . . . . . . . . . . . 32
2.5.3.1 Assessment of predicted temperature values . . . . . . . . 36
2.5.3.2 Likelihood-based model selection and error analysis . . . 39
2.6 Application to the pricing of temperature-dependent contracts . . . . . . . . 41
2.6.1 Temperature-based derivatives . . . . . . . . . . . . . . . . . . . . 42
2.6.2 Pricing of temperature-based index futures and options . . . . . . . 43
2.6.3 The price of risk and risk-neutral measure Q . . . . . . . . . . . . . 44
2.6.4 Pricing of a temperature-based contract . . . . . . . . . . . . . . . 48
2.6.5 Sensitivity analysis of temperature-based option price . . . . . . . . 50
2.7 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58

3 A self-updating model driven by a higher-order hidden Markov chain for temperature dynamics 63
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
3.2 Model description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
3.2.1 Model for temperature derivatives . . . . . . . . . . . . . . . . . . 67
3.2.2 Ornstein-Uhlenbeck (OU) process in the temperature model . . . . 68

3.2.2.1 HMM-modulated OU process . . . . . . . . . . . . . . . 68
3.2.2.2 HOHMM-modulated Ornstein-Uhlenbeck process . . . . 69
3.3 Recursive filtering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
3.3.1 Reference probability measure . . . . . . . . . . . . . . . . . . . . 70
3.3.2 Calculation of recursive filters . . . . . . . . . . . . . . . . . . . . 72
3.4 Optimal parameter estimation . . . . . . . . . . . . . . . . . . . . . . . . 75
3.5 Numerical implementation . . . . . . . . . . . . . . . . . . . . . . . . . . 77
3.5.1 Analysis of the deterministic component . . . . . . . . . . . . . . . 78
3.5.2 Analysis of the stochastic component . . . . . . . . . . . . . . . . 79
3.5.3 Validity of model evaluation . . . . . . . . . . . . . . . . . . . . . 80
3.5.4 Implementation aspects of the HOHMM-OU filtering . . . . . . . . 81
3.5.4.1 Initial values for the parameter estimation . . . . . . . . . 82
3.5.4.2 Evolution of parameter estimates and comparison under
HMM and HOHMM settings . . . . . . . . . . . . . . . 83
3.5.5 Model selection and other diagnostics . . . . . . . . . . . . . . . . 88
3.5.5.1 Assessment of predicted temperatures . . . . . . . . . . . 88
3.5.5.2 Error analysis and model selection . . . . . . . . . . . . . 91
3.6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96

4 A higher-order Markov chain-modulated model for electricity spot-price dynamics 101
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
4.2 Model formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
4.2.1 Model for electricity spot price . . . . . . . . . . . . . . . . . . . . 105
4.2.2 The fusion of HOHMM, OU and compound Poisson models . . . . 106
4.3 Recursive filtering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
4.3.1 Change of reference probability measure . . . . . . . . . . . . . . . 109
4.3.2 Calculation of recursive filters . . . . . . . . . . . . . . . . . . . . 110
4.4 Optimal parameter estimation . . . . . . . . . . . . . . . . . . . . . . . . 113
4.5 Numerical application . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
4.5.1 Analysis of the deterministic component . . . . . . . . . . . . . . . 117
4.5.2 Application of filtering algorithms for the HOHMM-OU process
with jumps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
4.5.2.1 Data processing with the filtering algorithm . . . . . . . . 119

4.5.2.2 Implementing the filtering procedure . . . . . . . . . . . 121
4.5.3 Discussion of model performance . . . . . . . . . . . . . . . . . . 127
4.5.3.1 Forecasting and error analysis . . . . . . . . . . . . . . . 127
4.5.3.2 Selection of suitable model setting . . . . . . . . . . . . . 132
4.5.3.3 The valuation of expected future spot at delivery . . . . . 135
4.6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138

5 Modelling and forecasting futures-prices curves in the Fish Pool market 144
5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144
5.2 Model setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148
5.3 Filtering and parameter estimation . . . . . . . . . . . . . . . . . . . . . . 152
5.4 Numerical application . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156
5.4.1 Fish pool exchange and data description . . . . . . . . . . . . . . . 156
5.4.2 Implementation of filters and estimation . . . . . . . . . . . . . . . 160
5.4.3 Model performance and selection . . . . . . . . . . . . . . . . . . 162
5.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172

6 Conclusion 180
6.1 Summary of research contributions . . . . . . . . . . . . . . . . . . . . . . 180
6.2 Further research directions . . . . . . . . . . . . . . . . . . . . . . . . . . 182
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183

Appendix A Derivations of the model’s optimal parameter estimates in chapter 2 185

Appendix B Proofs of Propositions in chapter 2 189

Appendix C Derivations of the model’s optimal parameter estimates in chapter 3 192

Appendix D Derivations of the model’s optimal parameter estimates in chapter 4 196

Appendix E Derivation of the model’s optimal parameter estimates in chapter 5 204

Appendix F Derivation of the d-step ahead forecasting formula in chapter 5 207

Curriculum Vitae 209

List of Figures

1.1 Reference measure optimal filter derivation for HOHMMs . . . . . . . . . 12

2.1 Fitted seasonal component and actual observations . . . . . . . . . . . . . . 30


2.2 Deseasonalised stochastic component $X_t$ . . . . . . . . . . . . . . . . . . . 30
2.3 Evolution of parameter estimates for θ, α, and ξ under a 2-state HMM-
based model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
2.4 Evolution of transition probability under a 2-state HMM-based model . . . 34
2.5 Evolution of parameter estimates for θ, α, and ξ under a 3-state HMM-
based model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
2.6 Evolution of transition probability under a 3-state HMM-based model . . . 36
2.7 One-step ahead forecasts under a 2-state HMM-based model . . . . . . . . 37
2.8 Comparison of the expected HDD and actual HDD in a 2-state model . . . 38
2.9 Comparison of one step ahead forecasts in 1-, 2-, and 3-state HMMs . . . . 38
2.10 Monthly call option price for varying strike HDD . . . . . . . . . . . . . . 51
2.11 HDD call option price for varying the market price of risk with initial state 1 51
2.12 HDD call option price for varying $\theta_1^*$ . . . . . . . . . . . . . . . . . . . . . 54
2.13 HDD call option price for varying $\theta_2^*$ . . . . . . . . . . . . . . . . . . . . . 55
2.14 HDD call option price for varying $\alpha_1$ . . . . . . . . . . . . . . . . . . . . . 55
2.15 HDD call option price for varying $\alpha_2$ . . . . . . . . . . . . . . . . . . . . . 56
2.16 HDD call option price for varying $\xi_1$ . . . . . . . . . . . . . . . . . . . . . 56
2.17 HDD call option price for varying $\xi_2$ . . . . . . . . . . . . . . . . . . . . . 57

3.1 Fitted seasonal component versus actual observations . . . . . . . . . . . . 79


3.2 Deseasonalised stochastic component $X_t$ . . . . . . . . . . . . . . . . . . . 80
3.3 Evolution of transition probability under a 2-state HOHMM-based model . 83
3.4 Evolution of parameter estimates for κ, ϑ, and ϱ under a 2-state HOHMM-
based model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
3.5 Evolution of parameter estimates for κ, ϑ, and ϱ under a 3-state HOHMM-
based model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
3.6 Evolution of parameter estimates for κ, ϑ, and ϱ under a 2-state HOHMM-
based model with 95% confidence level . . . . . . . . . . . . . . . . . . . 87
3.7 One-step ahead forecasts under a 3-state HOHMM-based model . . . . . . 89
3.8 Comparison of the expected HDD and actual HDD in a 3-state HOHMM-
based model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
3.9 Comparison of one-step-ahead forecasts in 1-, 2-, and 3-state HOHMM-
based models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
3.10 Evolution of AICs for the 1-, 2-, and 3-state HMM- and HOHMM-based
models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94

4.1 Fitted seasonal component and actual observations . . . . . . . . . . . . . . 119


4.2 Zoom-in view of fitted seasonal component versus actual observed prices
(Jan/01/2014 - Dec/31/2014) . . . . . . . . . . . . . . . . . . . . . . . . . 120
4.3 Deseasonalised stochastic component of DESP . . . . . . . . . . . . . . . 120
4.4 Evolution of parameter estimates for α, θ, and ξ under a 2-state HOHMM . 123
4.5 Evolution of parameter estimates for $\mu_\beta$ and $\sigma_\beta$ . . . . . . . . . . . . . . . 124
4.6 Evolution of parameter estimates for α, θ, and ξ under a 3-state HOHMM-
based model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
4.7 Evolution of parameter estimates for $\mu_\beta$ and $\sigma_\beta$ . . . . . . . . . . . . . . . 126
4.8 One-step ahead forecasts for $X_k$ and $S_k$ under a 3-state HOHMM . . . . . . 130
4.9 Comparison of one-step ahead forecasts in 1-, 2-, 3-state HMM- and HOHMM-
based models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
4.10 Evolution of BIC values under the 1-, 2-, 3-state HMM- and HOHMM-
based models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
4.11 Expected future spot price on delivery (EFSP) under a 2-state HOHMM . . 137

5.1 Evolution of transition probabilities under a 2-state HOHMM for futures
prices with expiry $T_h$ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162
5.2 Evolution of transition probabilities under a 3-state HOHMM for prices of
futures with expiry $T_h$ . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163
5.3 Evolution of parameter estimates for ξ, λ, ν, and σ under a 2-state HOHMM
with expiry on 30 Dec 2016. . . . . . . . . . . . . . . . . . . . . . . . . . 164
5.4 Evolution of parameter estimates for ξ, λ, ν, and σ under a 3-state HOHMM
with expiry on 30 Dec 2016. . . . . . . . . . . . . . . . . . . . . . . . . . 165
5.5 One-step ahead predictions for futures prices with expiries on 28 Oct 2016,
29 Nov 2016, and 29 Dec 2016 . . . . . . . . . . . . . . . . . . . . . . . . 173

5.6 AIC, AICc and BIC for the 1-, 2-, 3-state HMM/HOHMM-based model for
salmon futures prices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174
5.7 One-step ahead and out-of-sample predictions under the 3-state HOHMM
for futures prices with expiries on 31 Jan 2017, 28 Feb 2017, and 31 Mar
2017 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175

List of Tables

2.1 Descriptive statistics for daily average temperature (DAT) . . . . . . . . . . 29


2.2 Parameter estimates for the seasonality-component . . . . . . . . . . . . . 29
2.3 Interval of standard errors for parameter estimates under 1-, 2-, 3-state
HMM- and HOHMM-based models . . . . . . . . . . . . . . . . . . . . . 34
2.4 Error analysis of HMM-based models . . . . . . . . . . . . . . . . . . . . 39
2.5 Bonferroni-adjusted p-values of a t-test in terms of RMSEs under HMM
settings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
2.6 Number of parameters under HMM settings . . . . . . . . . . . . . . . . . 41
2.7 AIC-based selection analysis . . . . . . . . . . . . . . . . . . . . . . . . . 41
2.8 Specifications of typical CME temperature-based contracts . . . . . . . . . 42
2.9 Elements of a standardised temperature-based contract . . . . . . . . . . . 43
2.10 Optimal parameter estimates for the 2-state HMM model with and without
the market price of risk . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
2.11 HDD call option price for varying intensities pc . . . . . . . . . . . . . . . 54

3.1 Descriptive statistics for the DATs data set . . . . . . . . . . . . . . . . . . 78


3.2 Parameter estimates for the seasonality-component . . . . . . . . . . . . . 78
3.3 Parameter estimates for the seasonality-component . . . . . . . . . . . . . 81
3.4 Interval of standard errors for the parameter estimates under 1-, 2-, 3-state
HMM- and HOHMM-based models . . . . . . . . . . . . . . . . . . . . . 88
3.5 Results of error analysis on HMM-based models . . . . . . . . . . . . . . . 92
3.6 Results of error analysis on HOHMM-based models . . . . . . . . . . . . . 92
3.7 Bonferroni-corrected p-values for a paired t-test on RMSEs under HMM
and HOHMM settings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
3.8 Number of estimated parameters under the HMM and HOHMM settings . . 93
3.9 Results of error analysis for the HDD on HMM- and HOHMM-based models 94

4.1 Descriptive statistics for daily electricity spot price (DESP) . . . . . . . . . 118
4.2 Parameter estimates for the seasonal component . . . . . . . . . . . . . . . 118

4.3 Interval of standard errors for parameter estimates under 1-, 2-, 3-state
HMM- and HOHMM-based models . . . . . . . . . . . . . . . . . . . . . 128
4.4 Error analysis of HMM- and HOHMM-based models . . . . . . . . . . . . 132
4.5 Bonferroni-corrected p-values for the t-test performed on the RMSEs in-
volving the DSP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
4.6 Number of estimated parameters under HMM-OU with jumps and HOHMMs-
OU with jumps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
4.7 Comparison of selection criteria AIC . . . . . . . . . . . . . . . . . . . . . 134
4.8 Optimal parameter estimates for the 2-state HOHMM . . . . . . . . . . . . 136

5.1 The data periods of futures contracts (maturities of 1–6 months) covered by
the moving window in the filtering procedure . . . . . . . . . . . . . . . . 158
5.2 Futures contracts with maturity up to 6 months and expiration on 29 Jan 2016 . 158
5.3 Descriptive statistics for log-futures prices (maturities of 1 – 6 months) . . . 159
5.4 Error analysis of HMM- and HOHMM-based models under 1-, 2-, and 3-
state settings for salmon futures prices . . . . . . . . . . . . . . . . . . . . 167
5.5 Bonferroni-corrected p-values for the paired t-test performed on the RM-
SEs involving salmon futures prices . . . . . . . . . . . . . . . . . . . . . 168
5.6 Number of estimated parameters under HMM and HOHMM settings for
salmon futures prices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169
5.7 Out-of-sample error analysis of HMM- and HOHMM-based models under
1-, 2-, and 3-state settings for futures prices . . . . . . . . . . . . . . . . . 171

List of Appendices

Appendix A: Derivations of the model’s optimal parameter estimates in chapter 2 . . 184


Appendix B: Proofs of Propositions in chapter 2 . . . . . . . . . . . . . . . . . . . . 188
Appendix C: Derivations of the model’s optimal parameter estimates in chapter 3 . . 191
Appendix D: Derivations of the model’s optimal parameter estimates in chapter 4 . . 195
Appendix E: Derivation of the model’s optimal parameter estimates in chapter 5 . . . 203
Appendix F: Derivation of the expected value of d-step ahead forecasts in chapter 5 . 206

Chapter 1

Introduction

1.1 Research motivation and objectives


The most pressing concerns of contemporary society include climate change,
worldwide political uncertainty, resiliency of power resource generation, food sustainability,
and development of new or revised regulatory frameworks. Undoubtedly, these issues
have an impact on the financial, energy, agriculture, and aquaculture sectors. This research
work delves into the development of mathematical models that support the creation of
financial solutions, in the above-mentioned industries, to mitigate, hedge and transfer relevant
risks closely tied to potentially huge economic losses arising from such pressing societal
issues.

The financial solutions refer to exotic commodity-based derivatives whose values are
dependent on weather measurements, and prices in the electricity and fish markets. Taking
suitable positions, comprising the correct number of contracts, in derivatives trading is a
strategy for managing risk exposure to certain pertinent factors. Research studies in the
aforementioned markets are based on one-state models. However, such models might not
accurately capture the stylised behaviours of price evolution, especially during periods of
financial crises or when positive news affects market sentiments that drive prices suddenly.
These situations require models with more capabilities to account for abrupt fluctuations of
various statistics, and one simple but powerful way is the embedding of a regime-switching
approach enriched by a memory-capturing mechanism.

In particular, to meet the goals of mathematical modelling developments in the context
of the considerations described above, the following will be carried out: (i) Construct
an Ornstein-Uhlenbeck (OU) modelling framework modulated by hidden Markov models
(HMMs) and higher-order hidden Markov models (HOHMMs) to adequately delineate
important characteristics of market primitives. (ii) Add a compound-Poisson process into
the HOHMM-OU setting to pick up the spike dynamics in the observed time series. (iii)
Extend the HOHMM setting into a multi-dimensional framework tailored for multivariate
data series. (iv) Develop self-calibrating algorithms and dynamic estimation procedures for
the proposed models. (v) Derive recursive filters for quantities that are functions of both
HMM and HOHMM, and use such filters to obtain models’ optimal parameter estimates.
(vi) Assess and compare performances under both HMM and HOHMM settings using error
analyses and likelihood-based information criteria.

1.2 Literature review


In this section, we will first introduce the theory and assumptions of HMMs, and then
extend the concepts to HOHMMs. Besides expanding the HMM analytical framework, we
will review, at an intuitive level, four fundamental results for the HOHMM construction and
filtering relevant to the objectives of our research. They consist of (i) evaluation, (ii) decoding,
(iii) training, and (iv) the change-of-measure method. The first three are well-known
problems that are thoroughly addressed within HMMs; see [1], [19], and [3], whilst we will
examine them under the HOHMM setting by following [2]. A summary of the change of
probability measure technique will be given to reduce an HOHMM to an equivalent first-order
representation for our filtering processes.

1.2.1 Overview of HMMs


Since their introduction in the late 1960s, HMMs have been widely utilised in many
fields of engineering and physical sciences, such as speech recognition, data compression,
and molecular physics. The comprehensive mathematical structure of HMMs provides a
solid theoretical foundation and performs well in practice. The study of applying this tool
in the financial field was pioneered by Hamilton [8]. Since then, systematic methodologies
and extensive applications of HMMs have been developed to capture variables in finance
and economics.

The general mechanism of HMMs in finance can be concisely explained by signal theory,
which is well known in engineering. Outputs produced by any process, in both theory and
practice, are normally considered as signals. In finance, we can then treat collected data,
such as prices and indices of underlying assets, as observed signals. However, these
observations might contain noise or be distorted by various errors from their generating
procedures, so the estimation and measurement of the real sources might be corrupted. It is
then not possible to reveal the hidden state of the model through the observed symbols alone.
Fortunately, HMMs are capable of filtering out the noise and unravelling the distortion.
The hidden variables and information in our observations can then be efficiently estimated
and further investigated.

Following Cappé et al. [3], an HMM comprises a bivariate discrete-time process $\{X_t, O_t\}$ with $t \in \mathbb{Z}_0^+$. Here $\{X_t\}$ is a Markov chain embedded in an observed process $\{O_t\}$, which is a version of $\{X_t\}$ corrupted by noise. Even though the latent process $\{X_t\}$ is hidden, information about it can be inferred through $\{O_t\}$ under an HMM. As in Rabiner [16], a set of parameters $(N, n, \Pi, \Lambda, P)$ needs to be introduced in order to define an HMM completely.

• $N$, the number of hidden states in the model. Even though the states are latent, they often carry a physical meaning in practical applications. We denote the state of the model at time $t$ by $q_t$, and the individual states by

\[ S_t = \{s_1, s_2, \dots, s_N\}. \]

• $n$, the number of distinct observation signals per hidden state at time $t$. For instance, “rainy day” and “non-rainy day” can be treated as two distinct observed symbols. We denote the signal at time $t$ by $Y_t$. The individual signals are represented by

\[ Y_t = \{y_1, y_2, \dots, y_n\}. \]

• $\Pi = \{\pi_{ji}\}$ stands for the state transition probability distribution that governs transitions among states, where

\[ \pi_{ji} = P\left(q_{t+1} = s_i \mid q_t = s_j\right), \quad 1 \le i, j \le N, \]

with $\pi_{ji} \ge 0$ and $\sum_{i=1}^{N} \pi_{ji} = 1$.

• $\Lambda = \{\Lambda_j\}$ denotes the initial state distribution, where

\[ \Lambda_j = P\left(q_1 = s_j\right), \quad 1 \le j \le N. \]
4 Chapter 1. Introduction

• $P = \{p_i(m)\}$ denotes the probability distribution of observed symbols in hidden state $s_i$, where

\[ p_i(m) = P\left(O_t = y_m \mid q_t = s_i\right), \quad 1 \le i \le N, \ 1 \le m \le n, \]

with $p_i(m) \ge 0$ and $\sum_{m=1}^{n} p_i(m) = 1$.

Before using HMMs to generate observations, we state explicitly three important assumptions in the theory of HMMs [16]:

• Assumption 1: Markov assumption

\[ P\left(q_{t+1} = s_i \mid q_t = s_j, q_{t-1} = s_l, \dots, q_0 = s_h\right) = P\left(q_{t+1} = s_i \mid q_t = s_j\right) = \pi_{ji}. \]

We assume the Markov property holds. As shown in its definition, the current state $q_t$ is independent of all the states prior to time $t-1$, given $q_{t-1}$. The observed symbols satisfy an analogous condition with respect to their corresponding states.

• Assumption 2: Stationarity assumption

\[ P\left(q_{t+1} = s_i \mid q_t = s_j\right) = P\left(q_{k+1} = s_i \mid q_k = s_j\right), \quad \text{for } 1 \le i, j \le N \text{ and } \forall k, t \in \mathbb{Z}_0^+. \]

Intuitively, the distribution of the next state given the current state does not change over time.

• Assumption 3: Output independence assumption

\[ P\left(O \mid q_1, q_2, \dots, q_T, N, n, \Pi, \Lambda, P\right) = \prod_{t=1}^{T} P\left(O_t \mid q_t, N, n, \Pi, \Lambda, P\right). \]

Here $O = O_1, O_2, \dots, O_T$ is an observation sequence, in which $T$ denotes the total number of observations. Every observation $O_t$, $1 \le t \le T$, is one of the signals from $Y_t$. This assumption indicates that, given the hidden states, the current observation is statistically independent of the previous observations.

Once proper values are assigned to the set of parameters $(N, n, \Pi, \Lambda, P)$ under the aforementioned assumptions, the HMM can be used to produce an observation sequence $O$ through the following steps. We start by drawing an initial state $q_1 = s_i$ according to $\Lambda$ and then choose $O_t = y_m$ according to $p_i(m)$. The transition probability $\pi_{ji}$ is applied when moving from the current state $q_t = s_j$ to the next state $q_{t+1} = s_i$. Incrementing $t$ by 1 and repeating these steps until $t = T$ produces $O$.
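The generating procedure above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `sample_hmm`, the zero-based state and symbol labels, and the numerical values of `Lambda`, `Pi` and `P` are assumptions made for the example and are not taken from the thesis.

```python
import numpy as np

def sample_hmm(Lambda, Pi, P, T, seed=0):
    """Draw hidden states q_1..q_T and observations O_1..O_T from a discrete HMM.

    Lambda[j] : initial probability of state j
    Pi[j, i]  : probability of moving from state j to state i
    P[i, m]   : probability of emitting symbol m while in state i
    """
    rng = np.random.default_rng(seed)
    N, n = P.shape
    states = np.empty(T, dtype=int)
    obs = np.empty(T, dtype=int)
    states[0] = rng.choice(N, p=Lambda)            # initial state from Lambda
    obs[0] = rng.choice(n, p=P[states[0]])         # first symbol from p_i(m)
    for t in range(1, T):
        states[t] = rng.choice(N, p=Pi[states[t - 1]])  # transition via pi_ji
        obs[t] = rng.choice(n, p=P[states[t]])          # emission in new state
    return states, obs

# Hypothetical two-state, two-symbol model.
Lambda = np.array([0.6, 0.4])
Pi = np.array([[0.9, 0.1], [0.2, 0.8]])
P = np.array([[0.7, 0.3], [0.1, 0.9]])
states, obs = sample_hmm(Lambda, Pi, P, T=10)
print(states, obs)
```

Only the array `obs` would be available to an observer; `states` plays the role of the hidden path.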

1.2.2 Extension to HOHMMs


HMM-based approaches and strategies are in widespread use owing to their structural simplicity and straightforward computation. In view of the definitions in Section 1.2.1, an HMM formulation is generally capable of capturing first-order transitions among hidden states. However, this might be inadequate for some real-world applications. For instance, in speech recognition, one of the most extensive fields of HMM application, the first-order assumptions imply that only the present information affects what is about to be expressed next. This seems improper since long-term dependence might be necessary to retain a robust speech structure. As presented in Juang and Rabiner [9], HMMs are not sufficient at higher levels of a recognition system, including syntactic or semantic processing.

Several studies also find a long-memory property in many financial series, such as the long-term dependence in the volatility of S&P 500 returns [12] and in percentage changes of Treasury debt security yields [13]. For this sort of financial series, a growing body of evidence suggests that a regular HMM cannot effectively capture their stylised behaviours ([17] and [21]), thereby raising a need to expand the HMM literature. This has led some scholars to examine applications of HOHMMs in the financial field. For instance, Siu et al. proposed a higher-order Markov-switching model, with the drift and the volatility modulated by a discrete-time high-order Markov chain, for measuring the risk of a risky portfolio [18]. Xi et al. analysed asset allocation strategies under both HMM and HOHMM settings, and concluded that the HOHMM-based approach outperforms the HMM-based strategy for certain levels of transaction costs [20].

As an extension of a regular Markov model, a higher-order Markov model builds longer memory into a Markov process by specifying that the current state depends on several previous states, not only on the most recent one. Analogously, an HOHMM shares the features presented in Section 1.2.1, with two major differences. The state transition probability distribution becomes

\[ \pi_{i_{t-k+1},\dots,i_{t+1}} = P\left(q_{t+1} = s_{i_{t+1}} \mid q_t = s_{i_t}, \dots, q_{t-k+1} = s_{i_{t-k+1}}\right), \quad \text{for } 1 \le k \le t \le T, \tag{1.1} \]

and the distribution of the initial $k$ states is defined as

\[ \pi_{i_1,\dots,i_k} = P\left(q_1 = s_{i_1}, \dots, q_k = s_{i_k}\right). \tag{1.2} \]

This initial distribution is written as $\Lambda_{i_1,\dots,i_k}$ in the recursions of Sections 1.2.3 and 1.2.4.

Under equations (1.1) and (1.2), the number of parameters grows exponentially with the order $k$ of the HOHMM. This makes the three classic problems, originally posed for HMMs, more complicated to address; the corresponding results are elaborated in the following sections.
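To give a sense of this growth, the short sketch below counts the free parameters implied by (1.1) and (1.2). The function name `hohmm_parameter_counts` is hypothetical; the counts rest only on the fact that each probability distribution over $N$ states has $N-1$ free entries.

```python
def hohmm_parameter_counts(N, k):
    """Free parameters of a k-th order chain on N states.

    Each of the N**k possible histories carries a transition distribution over
    N states (N - 1 free entries); the initial distribution over k-tuples of
    states has N**k - 1 free entries.
    """
    transition = N ** k * (N - 1)
    initial = N ** k - 1
    return transition, initial

# Growth of the parameter count with the order k for a 3-state chain.
for k in range(1, 5):
    print(k, hohmm_parameter_counts(3, k))
```

For three states, moving from order 1 to order 4 raises the number of free transition probabilities from 6 to 162, before any emission parameters are counted.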

1.2.3 Evaluation with forward-backward algorithm

Given an observation sequence $O = O_1, \dots, O_T$ and a parameter set $S = (\Pi, \Lambda, P)$, we solve the evaluation problem under the HOHMM setting by finding $P(O \mid S)$. Direct enumeration over all possible state sequences is one option, but it is not feasible in practice because the computation is of order $N^T$. Therefore, we adopt the forward-backward algorithm [16] to compute the probability of the observed outputs efficiently. We first define a forward variable as follows:

\[ \alpha_t(i_{t-k+1}, \dots, i_t) = P\left(O_1, \dots, O_t, q_{t-k+1} = s_{i_{t-k+1}}, \dots, q_t = s_{i_t} \mid S\right), \quad \text{for } 1 \le k \le t \le T. \]

It is the joint probability, given $S$, that the first $t$ observed outcomes are $O_1, \dots, O_t$ and that the last $k$ hidden states of the HOHMM at time $t$ are $s_{i_{t-k+1}}, \dots, s_{i_t}$. The forward probability $\alpha_T(i_{T-k+1}, \dots, i_T)$ can then be calculated through the forward algorithm by the following recursion,

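\[ \begin{aligned}
\alpha_{t+1}(i_{t-k+2}, \dots, i_{t+1}) &= \sum_{i_{t-k+1}=1}^{N} \alpha_t(i_{t-k+1}, \dots, i_t)\, P\left(O_{t+1} \mid S, q_{t+1} = s_{i_{t+1}}\right) \cdot P\left(q_{t+1} = s_{i_{t+1}} \mid S, q_{t-k+1} = s_{i_{t-k+1}}, \dots, q_t = s_{i_t}\right) \\
&= \sum_{i_{t-k+1}=1}^{N} \alpha_t(i_{t-k+1}, \dots, i_t)\, \pi_{i_{t-k+1},\dots,i_{t+1}}\, p_{i_{t+1}}(O_{t+1}),
\end{aligned} \tag{1.3} \]

where $\alpha_k(i_1, \dots, i_k) = \Lambda_{i_1,\dots,i_k} \cdot \prod_{j=1}^{k} p_{i_j}(O_j)$, for $1 \le j \le k$. The required probability is then obtained as the sum of all forward variables, $P(O \mid S) = \sum_{i_{T-k+1},\dots,i_T=1}^{N} \alpha_T(i_{T-k+1}, \dots, i_T)$.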
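Recursion (1.3) can be sketched with NumPy arrays, storing $\alpha_t$ as an array with one axis per remembered state. The array layout, the function names and the randomly generated second-order model below are assumptions made for illustration; the brute-force enumeration of order $N^T$ is included only as a cross-check on a short sequence.

```python
import itertools
import numpy as np

def forward_likelihood(init, trans, emis, obs):
    """P(O | S) for a k-th order HOHMM via the forward recursion.

    init  : shape (N,)*k,     init[i1,...,ik]    = Lambda_{i1,...,ik}
    trans : shape (N,)*(k+1), trans[i1,...,ik,j] = pi_{i1,...,ik,j}
    emis  : shape (N, n),     emis[i, m]         = p_i(m)
    obs   : symbol indices, of length at least k
    """
    k, N = init.ndim, emis.shape[0]
    alpha = init.astype(float)
    for j in range(k):                      # alpha_k = Lambda * prod_j p_{i_j}(O_j)
        shape = [1] * k
        shape[j] = N
        alpha = alpha * emis[:, obs[j]].reshape(shape)
    for t in range(k, len(obs)):            # sum out the oldest remembered state
        alpha = (alpha[..., None] * trans).sum(axis=0) * emis[:, obs[t]]
    return float(alpha.sum())

def brute_force_likelihood(init, trans, emis, obs):
    """Same probability by summing over every state path (cost of order N**T)."""
    k, N, T = init.ndim, emis.shape[0], len(obs)
    total = 0.0
    for q in itertools.product(range(N), repeat=T):
        prob = init[q[:k]]
        for t in range(k, T):
            prob = prob * trans[q[t - k:t + 1]]
        for t in range(T):
            prob = prob * emis[q[t], obs[t]]
        total += prob
    return float(total)

# Hypothetical second-order model with 2 states and 3 symbols.
rng = np.random.default_rng(1)
N, n, k = 2, 3, 2
init = rng.random((N,) * k)
init /= init.sum()
trans = rng.random((N,) * (k + 1))
trans /= trans.sum(axis=-1, keepdims=True)
emis = rng.random((N, n))
emis /= emis.sum(axis=1, keepdims=True)
obs = [0, 2, 1, 1, 0]
lik_forward = forward_likelihood(init, trans, emis, obs)
lik_brute = brute_force_likelihood(init, trans, emis, obs)
print(lik_forward, lik_brute)
```

For long sequences the unscaled forward variables underflow, so a practical implementation would rescale $\alpha_t$ at each step or work with logarithms.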

Instead of computing variables forward in time, the backward algorithm proceeds in a similar way but backward in time. The backward variable is defined as

\[ \beta_t(i_1, \dots, i_k) = P\left(O_{t+k}, \dots, O_T \mid S, q_t = s_{i_1}, \dots, q_{t+k-1} = s_{i_k}\right), \quad \text{for } 1 \le t \le T-k. \]

As in the case of the forward algorithm, this conditional probability can be obtained through the recursive equation

\[ \begin{aligned}
\beta_t(i_1, \dots, i_k) &= \sum_{j=1}^{N} P\left(O_{t+k+1}, \dots, O_T \mid S, q_{t+1} = s_{i_2}, \dots, q_{t+k-1} = s_{i_k}, q_{t+k} = s_j\right) \cdot P\left(O_{t+k} \mid S, q_{t+k} = s_j\right) \cdot P\left(q_{t+k} = s_j \mid S, q_t = s_{i_1}, \dots, q_{t+k-1} = s_{i_k}\right) \\
&= \sum_{j=1}^{N} \beta_{t+1}(i_2, \dots, i_k, j)\, \pi_{i_1, i_2, \dots, i_k, j}\, p_j(O_{t+k}),
\end{aligned} \tag{1.4} \]

where $\beta_{T-t}(i_1, \dots, i_k) = 1$, for $0 \le t \le k-1$ and $1 \le i_1, \dots, i_k \le N$. This quantity will be required for the development of the Baum-Welch algorithm [1] to solve the learning problem in Section 1.2.5.
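A matching sketch of the backward recursion is given below, again with one array axis per remembered state. The function name and the array layout are assumptions; the identity $P(O \mid S) = \sum_{i_1,\dots,i_k} \alpha_k(i_1,\dots,i_k)\, \beta_1(i_1,\dots,i_k)$, which follows from the definitions of the forward and backward variables, is used to recover the likelihood.

```python
import numpy as np

def backward_likelihood(init, trans, emis, obs):
    """Backward recursion for a k-th order HOHMM; returns (P(O | S), beta_1).

    init  : shape (N,)*k,     initial distribution of the first k states
    trans : shape (N,)*(k+1), trans[i1,...,ik,j] = pi_{i1,...,ik,j}
    emis  : shape (N, n),     emis[i, m] = p_i(m)
    """
    k, N, T = init.ndim, emis.shape[0], len(obs)
    beta = np.ones((N,) * k)                 # terminal condition: beta = 1
    for s in range(T - 1, k - 1, -1):        # s is the 0-based index of O_{t+k}
        # beta[None, ...] aligns beta_{t+1}(i2,...,ik,j) with trans[i1,...,ik,j]
        beta = (trans * emis[:, obs[s]] * beta[None, ...]).sum(axis=-1)
    alpha_k = init.astype(float)             # alpha_k = Lambda * prod_j p_{i_j}(O_j)
    for j in range(k):
        shape = [1] * k
        shape[j] = N
        alpha_k = alpha_k * emis[:, obs[j]].reshape(shape)
    return float((alpha_k * beta).sum()), beta

# Hand-checkable first-order example (all numbers are illustrative).
init1 = np.array([0.6, 0.4])
trans1 = np.array([[0.9, 0.1], [0.2, 0.8]])
emis1 = np.array([[0.7, 0.3], [0.1, 0.9]])
lik1, beta1 = backward_likelihood(init1, trans1, emis1, [0, 1])

# Hypothetical second-order model with 2 states and 2 symbols.
rng = np.random.default_rng(3)
init2 = rng.random((2, 2))
init2 /= init2.sum()
trans2 = rng.random((2, 2, 2))
trans2 /= trans2.sum(axis=-1, keepdims=True)
emis2 = rng.random((2, 2))
emis2 /= emis2.sum(axis=1, keepdims=True)
lik2, beta2 = backward_likelihood(init2, trans2, emis2, [1, 0, 0, 1])
print(lik1, lik2)
```

In the first-order example, $\beta_1 = (0.36, 0.78)$ and $\alpha_1 = (0.42, 0.04)$, so the likelihood is $0.42 \times 0.36 + 0.04 \times 0.78 = 0.1824$.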

1.2.4 Decoding with Viterbi algorithm


Given a particular HOHMM with the parameter set $S$, we attempt to find the most probable sequence of underlying hidden states, $Q = q_1, q_2, \dots, q_T$, from an observation sequence $O = O_1, O_2, \dots, O_T$. This is equivalent to calculating $\operatorname{argmax}_Q P(Q \mid S, O)$. Since we have

\[ \operatorname{argmax}_Q P(Q \mid S, O) = \operatorname{argmax}_Q \frac{P(Q, O \mid S)}{P(O \mid S)}, \]

and $P(O \mid S)$ does not depend on $Q$, finding $\operatorname{argmax}_Q P(Q, O \mid S)$ resolves the decoding problem. The Viterbi algorithm [19], a technique based on dynamic programming for finding the best state sequence, is an efficient approach to decode the observation sequence $O$. To facilitate the calculation, we define the variable

\[ \theta_t(i_{t-k+1}, \dots, i_t) = \max_{q_1, \dots, q_{t-k}} P\left(q_1 = s_{i_1}, \dots, q_t = s_{i_t}, O_1, \dots, O_t \mid S\right), \]

where $1 \le k \le t \le T$. Here $\theta_t(i_{t-k+1}, \dots, i_t)$ is the maximum probability of the partial observation sequence up to time $t$ and a state sequence ending in $s_{i_{t-k+1}}, \dots, s_{i_t}$. The following recursion holds for $k \le t \le T-1$ and $1 \le i_{t+1} \le N$:

\[ \begin{aligned}
\theta_{t+1}(i_{t-k+2}, \dots, i_{t+1}) &= \max_{q_1, \dots, q_{t-k+1}} P\left(q_{t+1} = s_{i_{t+1}}, O_{t+1} \mid S, q_1 = s_{i_1}, \dots, q_t = s_{i_t}, O_1, \dots, O_t\right) \cdot P\left(q_1 = s_{i_1}, \dots, q_t = s_{i_t}, O_1, \dots, O_t \mid S\right) \\
&= \max_{1 \le i_{t-k+1} \le N} \theta_t(i_{t-k+1}, \dots, i_t) \cdot P\left(O_{t+1} \mid S, q_{t+1} = s_{i_{t+1}}\right) \cdot P\left(q_{t+1} = s_{i_{t+1}} \mid S, q_{t-k+1} = s_{i_{t-k+1}}, \dots, q_t = s_{i_t}\right) \\
&= p_{i_{t+1}}(O_{t+1}) \left( \max_{1 \le i_{t-k+1} \le N} \theta_t(i_{t-k+1}, \dots, i_t)\, \pi_{i_{t-k+1},\dots,i_{t+1}} \right),
\end{aligned} \]

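where, for $1 \le j \le k$ and $1 \le i_1, \dots, i_k \le N$, the initialisation is given by

\[ \begin{aligned}
\theta_k(i_1, \dots, i_k) &= P\left(q_1 = s_{i_1}, \dots, q_k = s_{i_k}, O_1, \dots, O_k \mid S\right) \\
&= P\left(q_1 = s_{i_1}, \dots, q_k = s_{i_k} \mid S\right) \cdot \prod_{j=1}^{k} P\left(O_j \mid S, q_j = s_{i_j}\right) \\
&= \Lambda_{i_1,\dots,i_k} \prod_{j=1}^{k} p_{i_j}(O_j).
\end{aligned} \]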

Eventually, we can find $\operatorname{argmax}_{1 \le i_{T-k+1}, \dots, i_T \le N} \theta_T(i_{T-k+1}, \dots, i_T)$ through the Viterbi iteration. To recover the entire most likely state sequence, we further define a back-tracking array that records the maximising argument of $\theta_t(i_{t-k+1}, \dots, i_t)$ for each $t$ and $i_{t-k+1}, \dots, i_t$ as

\[ \phi_{t+1}(i_{t-k+2}, \dots, i_{t+1}) = \operatorname*{argmax}_{1 \le i_{t-k+1} \le N} \theta_t(i_{t-k+1}, \dots, i_t) \cdot \pi_{i_{t-k+1},\dots,i_{t+1}}. \]

The last $k$ states of the optimal path are given by $\operatorname{argmax}_{1 \le i_{T-k+1}, \dots, i_T \le N} \theta_T(i_{T-k+1}, \dots, i_T)$, and the remaining $q_t$ can then be obtained recursively by back-tracking through $\phi_{t+1}$ for $t = T-1, \dots, k$, with the initialisation $\phi_k(i_1, \dots, i_k) = 0$.
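The Viterbi iteration and the back-tracking step can be sketched as follows. The function names, the zero-based state labels and the randomly generated second-order model are assumptions for illustration; the exhaustive maximisation over all paths is feasible only for a very short sequence and serves as a cross-check.

```python
import itertools
import numpy as np

def viterbi(init, trans, emis, obs):
    """Most probable state path of a k-th order HOHMM and its joint probability."""
    k, N, T = init.ndim, emis.shape[0], len(obs)
    theta = init.astype(float)
    for j in range(k):                       # initialisation theta_k
        shape = [1] * k
        shape[j] = N
        theta = theta * emis[:, obs[j]].reshape(shape)
    back = []                                # back[t - k] plays the role of phi_{t+1}
    for t in range(k, T):
        cand = theta[..., None] * trans      # theta_t * pi, axes (i_{t-k+1},...,i_{t+1})
        back.append(cand.argmax(axis=0))     # best oldest state for each newer tuple
        theta = cand.max(axis=0) * emis[:, obs[t]]
    path = [0] * T
    path[T - k:] = [int(i) for i in np.unravel_index(theta.argmax(), theta.shape)]
    for t in range(T - 1, k - 1, -1):        # back-track the earlier states
        path[t - k] = int(back[t - k][tuple(path[t - k + 1:t + 1])])
    return path, float(theta.max())

def joint_prob(init, trans, emis, obs, path):
    """P(Q, O | S) for one candidate state path."""
    k = init.ndim
    prob = init[tuple(path[:k])]
    for t in range(k, len(obs)):
        prob = prob * trans[tuple(path[t - k:t + 1])]
    for t, o in enumerate(obs):
        prob = prob * emis[path[t], o]
    return float(prob)

# Hypothetical second-order model with 2 states and 3 symbols.
rng = np.random.default_rng(7)
N, n, k = 2, 3, 2
init = rng.random((N,) * k)
init /= init.sum()
trans = rng.random((N,) * (k + 1))
trans /= trans.sum(axis=-1, keepdims=True)
emis = rng.random((N, n))
emis /= emis.sum(axis=1, keepdims=True)
obs = [2, 0, 1, 1, 0, 2]
path, best = viterbi(init, trans, emis, obs)
brute = max(joint_prob(init, trans, emis, obs, q)
            for q in itertools.product(range(N), repeat=len(obs)))
print(path, best, brute)
```

As with the forward variables, products of many probabilities underflow for long sequences, so sums of logarithms are normally used in practice.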

1.2.5 Training with expectation-maximisation algorithm


Both the evaluation and decoding problems associated with HOHMMs in Sections 1.2.3 and 1.2.4 are solved under the assumption that the set of parameters $S$ is known in advance. In practice, however, the parameters are typically not known beforehand and have to be estimated. This is the training problem, which adjusts the model parameters by maximising the probability of the observed sequence under the given HOHMM. Regular maximum likelihood criteria might not generate an optimal solution, especially when the state paths are unknown. $P(O \mid S)$, however, can be locally maximised under the HOHMM setting by the Baum-Welch algorithm [1], a special version of the expectation-maximisation (EM) algorithm [4]. It provides an efficient method of parameter estimation for the further application of HOHMMs. We first present the general EM algorithm, and then describe the results of the Baum-Welch algorithm under the HOHMM setting by following [2].

Generally, there are two main applications of the EM algorithm. The first occurs when the likelihood function is not analytically tractable but becomes tractable by assuming the existence of additional missing parameters. The second arises when data are incomplete or missing owing to limitations of the observation process. In our study of HOHMMs, the state sequence $Q$ is hidden and viewed as the missing data, whilst the process $O$ is observed. The EM algorithm is presented in the discrete-density case for the subsequent discussion of the Baum-Welch algorithm and to remain consistent with the previous discussion of evaluation and decoding. We take the complete-data likelihood function to be $L(S \mid Q, O) = P(Q, O \mid S)$, and have

\[ P(Q, O \mid S) = P(Q \mid O, S)\, P(O \mid S). \tag{1.5} \]

Let $\widehat{S}$ be the set of re-estimated parameters that generates an improved model compared to that of $S$. Considering equation (1.5) and the log-likelihood function $l(\widehat{S} \mid O) = \log P(O \mid \widehat{S}) = \log \sum_Q P(Q, O \mid \widehat{S})$, we get

\[ \log P(Q, O \mid \widehat{S}) = \log P(Q \mid O, \widehat{S}) + \log P(O \mid \widehat{S}). \tag{1.6} \]

Multiplying equation (1.6), and its counterpart for $S$, by $P(Q \mid O, S)$, summing over $Q$ and taking the difference, we can further transform it into

\[ \begin{aligned}
\log \frac{P(O \mid \widehat{S})}{P(O \mid S)} &= \sum_Q P(Q \mid O, S) \log P(O, Q \mid \widehat{S}) - \sum_Q P(Q \mid O, S) \log P(O, Q \mid S) \\
&\quad + \sum_Q P(Q \mid O, S) \log \frac{P(Q \mid O, S)}{P(Q \mid O, \widehat{S})}.
\end{aligned} \tag{1.7} \]

It is straightforward to show, by Jensen's inequality, that $\sum_Q P(Q \mid O, S) \log \frac{P(Q \mid O, S)}{P(Q \mid O, \widehat{S})} \ge 0$. Let $A(S, \widehat{S}) = \sum_Q P(Q \mid O, S) \log P(O, Q \mid \widehat{S})$. With the previous notation, we can rewrite equation (1.7) as

\[ l(\widehat{S} \mid O) - l(S \mid O) \ge A(S, \widehat{S}) - A(S, S), \tag{1.8} \]

where the strict inequality holds unless $P(Q \mid O, \widehat{S}) = P(Q \mid O, S)$ or $\widehat{S} = S$. The EM procedure is constructed iteratively as a sequence $\{\widehat{S}^{(k)}\}_{k \ge 1}$ starting from some initial value $\widehat{S}^{(0)}$. Every iteration contains two steps:

1. Expectation step: calculate $A(\widehat{S}^{(k)}, S)$ as a function of $S$.

2. Maximisation step: determine $\widehat{S}^{(k+1)} = \operatorname{argmax}_S A(\widehat{S}^{(k)}, S)$.

When $\widehat{S} \ne S$ in equation (1.8), an improved estimate $\widehat{S}$ is produced by the iteration, which is repeated until a stopping criterion is met, that is, until $\widehat{S} \approx S$. A local maximum of the likelihood is then reached.

Two auxiliary variables need to be defined first, in terms of the forward and backward variables, in order to present the Baum-Welch algorithm properly. We denote by $\varepsilon_t(i_1, \dots, i_{k+1})$ the probability of being in states $s_{i_1}, \dots, s_{i_{k+1}}$ at times $t, \dots, t+k$, respectively, given the model and the observation sequence, where

\[ \begin{aligned}
\varepsilon_t(i_1, \dots, i_{k+1}) &= P\left(q_t = s_{i_1}, \dots, q_{t+k} = s_{i_{k+1}} \mid O, S\right) \\
&= \frac{P\left(q_t = s_{i_1}, \dots, q_{t+k} = s_{i_{k+1}}, O_1, \dots, O_T \mid S\right)}{P(O \mid S)} \\
&= \frac{\alpha_{t+k-1}(i_1, \dots, i_k)\, \beta_{t+1}(i_2, \dots, i_{k+1})\, \pi_{i_1,\dots,i_{k+1}}\, p_{i_{k+1}}(O_{t+k})}{P(O \mid S)}.
\end{aligned} \tag{1.9} \]

The second variable $\xi_t(i_1, \dots, i_k)$ is the probability of being in states $s_{i_1}, \dots, s_{i_k}$ at times $t, \dots, t+k-1$, respectively, given the observation sequence and the model. It can be expressed in terms of $\varepsilon_t(i_1, \dots, i_{k+1})$ as

\[ \xi_t(i_1, \dots, i_k) = \sum_{i_{k+1}=1}^{N} \varepsilon_t(i_1, \dots, i_{k+1}). \tag{1.10} \]

By summing $\varepsilon_t(i_1, \dots, i_{k+1})$ over $t$, we get the expected number of occurrences of the state sequence $s_{i_1}, \dots, s_{i_{k+1}}$, that is, of transitions from $s_{i_1}, \dots, s_{i_k}$ to $s_{i_{k+1}}$. The expected number of transitions out of the state sequence $s_{i_1}, \dots, s_{i_k}$ is obtained likewise by summing $\xi_t(i_1, \dots, i_k)$ over $t$. The re-estimation formulas are then given by

\[ \widehat{\varepsilon_t(i_1)} = \widehat{\xi_t(i_1)}, \tag{1.11} \]

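\[ \widehat{\pi}_{i_1,\dots,i_{k+1}} = \frac{\sum_{t=1}^{T-k} \varepsilon_t(i_1, \dots, i_{k+1})}{\sum_{t=1}^{T-k} \sum_{i_{k+1}=1}^{N} \varepsilon_t(i_1, \dots, i_{k+1})}, \tag{1.12} \]

\[ \widehat{p}_j(y_k) = \frac{\sum_{t=1,\ \text{s.t. } O_t = y_k}^{T-k} \xi_t(j)}{\sum_{k=1}^{n} \sum_{t=1,\ \text{s.t. } O_t = y_k}^{T-k} \xi_t(j)}. \tag{1.13} \]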

The Baum-Welch training process can thereby be performed by assuming a starting set of parameters $S_0$. The forward and backward variables, $\alpha_t$ and $\beta_t$, are computed according to equations (1.3) and (1.4), respectively, followed by the auxiliary variables $\varepsilon_t$ and $\xi_t$ using equations (1.9) and (1.10). The parameters are re-estimated iteratively by equations (1.11) to (1.13) until the estimates converge. As mentioned before, the Baum-Welch algorithm is a special case of the EM algorithm, with the same goal of maximising $P(O \mid S)$ by adjusting $S$ given an HOHMM. Because it deals with hidden states and incomplete data, the EM algorithm is a powerful tool in the parameter estimation and filtering processes of HOHMMs.
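To keep the illustration short, the sketch below implements one Baum-Welch iteration only for the first-order case $k = 1$, using the textbook re-estimation formulas for a discrete HMM rather than the general HOHMM formulas above. In this special case the array `eps` corresponds to $\varepsilon_t(i_1, i_2)$ and `gamma` to $\xi_t(i_1)$. The function names, the simulated data and the starting values are assumptions made for the example.

```python
import numpy as np

def forward_backward(Lam, Pi, P, obs):
    """Unscaled forward and backward variables of a first-order discrete HMM."""
    T, N = len(obs), len(Lam)
    alpha = np.zeros((T, N))
    beta = np.ones((T, N))
    alpha[0] = Lam * P[:, obs[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ Pi) * P[:, obs[t]]
    for t in range(T - 2, -1, -1):
        beta[t] = Pi @ (P[:, obs[t + 1]] * beta[t + 1])
    return alpha, beta, alpha[-1].sum()

def baum_welch_step(Lam, Pi, P, obs):
    """One EM update; also returns the log-likelihood of the input parameters."""
    obs = np.asarray(obs)
    alpha, beta, like = forward_backward(Lam, Pi, P, obs)
    gamma = alpha * beta / like                   # gamma[t, i] = P(q_t = i | O, S)
    # eps[t, i, j] = P(q_t = i, q_{t+1} = j | O, S)
    eps = (alpha[:-1, :, None] * Pi[None]
           * (P[:, obs[1:]].T * beta[1:])[:, None, :]) / like
    new_Lam = gamma[0]
    new_Pi = eps.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    new_P = np.stack([gamma[obs == m].sum(axis=0) for m in range(P.shape[1])],
                     axis=1)
    new_P = new_P / gamma.sum(axis=0)[:, None]
    return new_Lam, new_Pi, new_P, float(np.log(like))

# Hypothetical data and starting values S_0 (2 states, 3 symbols).
rng = np.random.default_rng(0)
obs = rng.integers(0, 3, size=40)
Lam = np.array([0.5, 0.5])
Pi = np.array([[0.7, 0.3], [0.4, 0.6]])
P = np.array([[0.5, 0.3, 0.2], [0.2, 0.3, 0.5]])
loglikes = []
for _ in range(20):
    Lam, Pi, P, ll = baum_welch_step(Lam, Pi, P, obs)
    loglikes.append(ll)
print(loglikes[0], loglikes[-1])
```

The recorded log-likelihoods should be non-decreasing across iterations, in line with inequality (1.8). The unscaled recursions are adequate only for short sequences such as this one.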

1.2.6 Change of measure method in HOHMMs

To address filtering problems of HOHMMs in our research, the relevant processes and optimal filters can generally be obtained in two ways: the semi-martingale method and the change of probability measure approach. The former is direct but extremely complicated, whilst the latter is indirect but efficient in filtering applications. Zakai [22] pioneered the application of the change of measure technique in stochastic filtering. Elliott et al. [6] derived optimal filters with this method for HMMs based on Girsanov’s Theorem. Mamon et al. [14] developed closed-form solutions of the recursive filters for optimally estimating the parameters of a commodity price model via change of measure.

Since this approach has mostly been used for HMMs, one additional but critical step must be implemented for our HOHMM setting: a higher-order Markov chain is transformed into a first-order Markov chain, rather than deriving the change of probability measure for HOHMMs directly. Kriouile et al. [10] develop an equivalent model specific to converting a second-order discrete HMM into a regular HMM. A detailed example that converts a second-order two-state Markov chain into a regular 2-state one with efficient recursive algorithms is described in [15]. Based on this previous work on second-order HMMs, Du Preez extends the order-reducing algorithms to a general HOHMM setting, which enables any HOHMM to be transformed into its corresponding first-order HMM [5]. The essential idea is similar to the common method that converts a higher-order differential equation into a system of first-order differential equations. We follow their computing algorithms by introducing a mapping variable that transforms a higher-order Markov chain into a regular one. The change of probability measure method can then be used to derive our filtering algorithms.
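One common way to realise such a mapping is to treat each $k$-tuple of consecutive states as a single state of a first-order chain. The sketch below illustrates this idea only; it is not claimed to be the specific construction used in [5] or in the later chapters, and the function name and the numerical transition tensor are assumptions.

```python
import itertools
import numpy as np

def embed_first_order(trans):
    """Transition matrix of the first-order chain on k-tuples of states.

    trans has shape (N,)*(k+1) with trans[i1,...,ik,j] = pi_{i1,...,ik,j}.
    The tuple (i1,...,ik) can only move to (i2,...,ik,j), with probability
    pi_{i1,...,ik,j}; every other entry of the embedded matrix is zero.
    """
    k = trans.ndim - 1
    N = trans.shape[0]
    shape = (N,) * k
    A = np.zeros((N ** k, N ** k))
    for hist in itertools.product(range(N), repeat=k):
        row = np.ravel_multi_index(hist, shape)          # mapping variable
        for j in range(N):
            col = np.ravel_multi_index(hist[1:] + (j,), shape)
            A[row, col] = trans[hist + (j,)]
    return A

# Hypothetical second-order, two-state transition probabilities.
trans = np.array([[[0.9, 0.1], [0.6, 0.4]],
                  [[0.3, 0.7], [0.2, 0.8]]])
A = embed_first_order(trans)
print(A)
```

For a second-order chain on two states the embedded chain has four states, and each row of `A` has at most two non-zero entries because only the newest state is free to change.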

Equivalent to the real-world measure $P$, an ideal measure $\tilde{P}$ is defined as a reference probability from a discrete-time version of Girsanov’s Theorem [9]. Under $\tilde{P}$, the observations are independent and identically distributed random variables, and the Markov chain follows the same dynamics as under $P$. To find optimal filters, an easy-to-compute framework can then be carried out using Fubini’s Theorem, which allows the interchange of expectations and summations [11]. The results and calculations under the ideal measure $\tilde{P}$ can be traced back to the real-world measure $P$ by invoking a reverse measure change. The derivation of the reference probability optimal filters is illustrated in Figure 1.1.

Figure 1.1: Reference measure optimal filter derivation for HOHMMs

1.3 Structure of the thesis

This thesis consists of six chapters. An overview of HMMs and the underpinnings of HOHMMs is given in this chapter; see Section 1.2. The main contents focus on the results of four related projects. We develop HMM and HOHMM settings for the dynamics of daily average temperatures (DATs) for weather derivatives in Chapters 2 and 3, respectively. The modelling, filtering and estimation problems for electricity spot prices and salmon futures prices are addressed in Chapters 4 and 5, respectively. Some concluding remarks are given in Chapter 6. Synopses of the projects are given below.

1.3.1 Putting a price tag on temperature

A model for the evolution of DATs is put forward to support the analysis of weather derivatives. The goal is to capture simultaneously the mean-reversion, seasonality and stochasticity properties of the DAT process. An Ornstein-Uhlenbeck (OU) process modulated by an HMM is proposed to model both the mean-reversion and stochasticity of the deseasonalised component. The seasonality part is modelled by a combination of linear and sinusoidal functions. OU-HMM filtering algorithms are established for the evolution of the switching model parameters. Consequently, adaptive parameter estimates are obtained. The estimation technique is implemented numerically using a data set compiled by the National Climatic Data Center (NCDC). A sensitivity analysis of the option prices with respect to the model parameters is included.
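As a rough illustration of this model structure, the sketch below simulates a temperature path as a seasonal function plus an OU component whose parameters switch with a two-state Markov chain. Everything specific here is an assumption made for the example: the function name, the exact one-step OU discretisation, the particular seasonal form $a + bt + c\sin(2\pi t/365)$, the convention that the chain starts in state 0, and all numerical values. The precise specification is given in Chapter 2.

```python
import numpy as np

def simulate_dat(T, Pi, kappa, theta, sigma, season, x0=0.0, dt=1.0, seed=0):
    """Simulate T daily values of season(t) + X_t with regime-switching OU X_t.

    kappa[s], theta[s], sigma[s] are the mean-reversion speed, level and
    volatility in regime s; Pi is the regime transition matrix.
    """
    rng = np.random.default_rng(seed)
    N = Pi.shape[0]
    state, x = 0, x0                      # assumed start: regime 0, X_0 = x0
    out = np.empty(T)
    for t in range(T):
        state = rng.choice(N, p=Pi[state])
        decay = np.exp(-kappa[state] * dt)
        # Exact one-step OU transition with the regime held fixed over the step.
        sd = sigma[state] * np.sqrt((1.0 - decay ** 2) / (2.0 * kappa[state]))
        x = theta[state] + (x - theta[state]) * decay + sd * rng.standard_normal()
        out[t] = season(t) + x
    return out

def season(t):
    """Hypothetical linear trend plus annual sinusoid, in degrees Celsius."""
    return 10.0 + 0.001 * t + 8.0 * np.sin(2.0 * np.pi * t / 365.0)

Pi = np.array([[0.95, 0.05], [0.10, 0.90]])
kappa = np.array([0.3, 0.6])
theta = np.array([0.0, 1.5])
sigma = np.array([2.0, 4.0])
path = simulate_dat(365, Pi, kappa, theta, sigma, season)
print(path[:5])
```

Setting both volatilities and both levels to zero removes the stochastic component, so the simulated path then coincides with the seasonal function.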

1.3.2 A self-updating model driven by a higher-order hidden Markov chain for temperature dynamics

We develop a model for the evolution of DATs that could benefit the analysis of weather
derivatives in finance and economics as well as the modelling of time series data in meteo-
rology, hydrology and other branches of the sciences and engineering. Our focus is to cap-
ture the mean-reverting, seasonality, memory and stochastic properties of the temperature
movement and other time series exhibiting such properties. To model both mean-reversion
and stochasticity, a deseasonalised component is assumed to follow an OU process mod-
ulated by a higher-order hidden Markov chain, which takes into account short/long range
dependence in the data. The seasonality part is modelled through a combination of linear
and sinusoidal functions with appropriate coefficients and arguments. Furthermore, we put
forward a parameter estimation approach that establishes recursive HOHMM filtering algorithms customised for the regime-switching evolution of model parameters. These filters are characterised by quantities that are functions of the higher-order hidden Markov chain (HOHMC). Utilising the EM method in conjunction with the change of measure technique, optimal and self-updating parameter estimates are obtained. We illustrate the numerical implementation of our model and estimation technique using a 4-year Toronto DATs data set compiled by the NCDC. We perform pertinent
model selection and validation diagnostics to assess the performance of our methodology.
It is shown that a 2-state HOHMM-based model best captures the empirical character-
istics of the temperature data under examination on the basis of various error-based and
information-criterion metrics.

1.3.3 A higher-order Markov chain-modulated model for electricity spot-price dynamics

Over the last three decades, the electricity sector worldwide underwent massive deregula-
tion. Power market participants have encountered a growing number of challenges due to
competition and other pertinent factors. As electricity is a non-storable commodity, its price
is extremely sensitive to changes in supply and demand. The evolution of electricity prices
exhibits pronounced mean reversion and cyclical patterns, possesses extreme volatility and relatively frequent spikes, and displays a memory property. These
observed features necessitate the development of models aimed to simultaneously capture
such price characteristics for forecasting, risk management, and valuation of electricity-
driven derivatives. This study tackles the modelling and estimation problems under a new
paradigm that integrates the deterministic calendar seasons and stochastic factors govern-
ing electricity prices. The de-seasonalised component of our proposed model has both the
jump and mean-reverting properties to account for spikes and periodic cycles alternating
between lower price returns and compensating periods of higher price returns. The pa-
rameters of the de-seasonalised model components are also modulated by a higher-order
hidden Markov chain (HOHMC) in discrete time. This provides a mechanism to extract
latent information from historical data. The HOHMC’s state is interpreted as the “state of
the world” resulting from the interaction of various forces impacting the electricity market.
Filters are developed to generate optimal estimates of HOHMC-relevant quantities using
the observation process, and these provide online estimates of model parameters. Empiri-
cal demonstrations using the daily electricity spot prices, compiled by the Alberta Electric
System Operator (AESO), show that our HOHMM approach has considerable merits in
terms of price data fitting and forecasting metrics. Implications of our model for the pricing of an electricity forward contract are also examined.

1.3.4 Modelling and forecasting futures-prices curves in the Fish Pool market

This project aims to capture the evolution and salient features of fish-futures prices with a
flexible and dynamic approach via a higher-order hidden Markov model (HOHMM). The
parameters of a proposed futures-price model under a multivariate setting are governed by
a discrete-time HOHMM to account for the random switching of market or economic states
over time. The derived multi-dimensional filters, along with the application of the expectation-maximisation (EM) algorithm, give rise to a complete on-line estimation of the futures-price model parameters. Our numerical implementation analyses the Fish Pool’s salmon futures prices. The goodness of fit and forecasting performance of our approach are assessed using appropriate statistical metrics. Our empirical findings illustrate that the HOHMM-based approach possesses better fitting capacity and excellent short-run predictive ability for futures prices vis-à-vis certain modelling benchmarks.
References

[1] L. E. Baum, T. Petrie, G. Soules and N. Weiss. A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains. The Annals of Mathematical Statistics 41(1) (1970), 164–171.

[2] W. Ching. Markov Chains: Models, Algorithms and Applications (2nd ed.) (2013).

[3] O. Cappé, E. Moulines and T. Rydén. Inference in Hidden Markov Models (2006).

[4] A. P. Dempster, N. M. Laird and D. B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society. Series B (Methodological) 39(1) (1977), 1–33.

[5] J. A. Du Preez. Efficient training of high-order hidden markov models using first-order
representations. Computer Speech & Language, 12(1)(1998), 23–39.

[6] R. J. Elliott, L. Aggoun and J. B. Moore. Hidden Markov Models: Estimation and
Control (Corr. 2nd. printing. ed.) (1997).

[7] R. J. Elliott and H. Yang. Forward and backward equations for an adjoint process,
Festschrift for G. Kallianpur, Springer Verlag, Berlin, Heidelberg, New York. (1992).

[8] J. D. Hamilton. A new approach to the economic analysis of nonstationary time series
and the business cycle. Econometrica 57(2)(1989), 357–384.

[9] B. H. Juang and L. R. Rabiner. Spectral representations for speech recognition by neural networks-a tutorial (1992).

[10] A. Kriouile, J. Mari and J. Haon. Some improvements in speech recognition algo-
rithms based on HMM (1990).


[11] M. Loeve. Probability Theory, fourth edn, Springer Verlag, Berlin, Heidelberg, New
York. (1978).

[12] N. Lobato and N. E. Savin. Real and spurious long-memory properties of stock-
market data. Journal of Business & Economic Statistics 16(3) (1998), 261-268.

[13] J. McCarthy. Tests of long-range dependence in interest rates using wavelets. Quar-
terly Review of Economics and Finance 44(1) (2004), 261–268.

[14] R. Mamon, C. Erlwein, B. Gopaluni, Adaptive signal processing of asset price dy-
namics with predictability analysis, Information Sciences, 178 (1) (2008), 203–219.

[15] I. L. MacDonald and W. Zucchini. Hidden Markov and Other Models for Discrete-
Valued Time Series (1997).

[16] L. Rabiner, A tutorial on hidden Markov models and selected applications in speech
recognition, Proceedings of the IEEE, 77 (2) (1989), 257–286.

[17] T. Rydén, T. Teräsvirta and S. Åsbrink. Stylized facts of daily return series and the hidden Markov model. Journal of Applied Econometrics 5(1) (1998), 217–44.

[18] T. Siu, W. Ching, E. Fung, M. Ng, X. Li, A high-order markov-switching model for
risk measurement, Computers and Mathematics with Applications, 58 (1) (2009), 1–
10.

[19] A. Viterbi. Error bounds for convolutional codes and an asymptotically optimum de-
coding algorithm. IEEE Transactions on Information Theory 13(2)(1967), 260-269.

[20] X. Xi, R. Mamon and M. Davison. A higher-order hidden Markov chain-modulated model for asset allocation. Journal of Mathematical Modelling and Algorithms in Operations Research 13(1) (2014), 59–85.

[21] S. Yu, Z. Liu, M. S. Squillante, C. Xia and L. Zhang. A hidden semi-markov model
for web workload self-similarity (2002).

[22] M. Zakai. On the optimal filtering of diffusion processes. Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete 11(3) (1969), 230–243.
Chapter 2

Putting a price tag on temperature

2.1 Introduction
We are all familiar with car and home insurance policies. The cost and overall value of these
contracts are determined by the likelihood of occurrence of particular calamities, disasters
and accidents, amongst other risk factors. But, how does one insure against the vagaries of
weather? Specifically, what is the price of a contract whose value depends on how hot or
cold it will be on a given day in the future?

We shall consider a modelling approach that provides support for temperature-linked contracts and weather-related derivatives. It is estimated that about 39.1% of the U.S. gross domestic product is weather-sensitive [13], and over 90% of weather derivatives are temperature-based [14]. The first weather-derivative transaction, executed by Aquila Energy and embedded in a power contract, took place in 1997 [8], and in 1999 the first exchange-traded temperature derivative was launched on the Chicago Mercantile Exchange (CME). Since then, the volume of weather derivatives has grown rapidly in both the exchange and over-the-counter (OTC) markets.

Weather-driven futures and options traded on the CME are written on the following temperature indices: heating degree days (HDD), cooling degree days (CDD) and cumulative average temperature (CAT). These indices are neither tradeable nor storable. Hence, the usual no-arbitrage valuation methodology is not necessarily valid in pricing such contracts [10]. Assuming the existence of an appropriate pricing measure, our aim is to develop a model that accurately captures the salient features of temperature data, thereby aiding the accurate pricing of weather derivatives.
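For orientation, the three indices can be computed from a series of DATs as sketched below, using the conventional definitions: the HDD and CDD indices accumulate $\max(b - T_d, 0)$ and $\max(T_d - b, 0)$ over the contract period for a base temperature $b$, and CAT is the sum of the DATs. The base of 18 degrees Celsius, the function name and the sample temperatures are assumptions made for the example.

```python
import numpy as np

def degree_day_indices(dat, base=18.0):
    """Accumulated HDD, CDD and CAT over a period of daily average temperatures."""
    dat = np.asarray(dat, dtype=float)
    hdd = np.maximum(base - dat, 0.0).sum()   # heating demand on cold days
    cdd = np.maximum(dat - base, 0.0).sum()   # cooling demand on warm days
    cat = dat.sum()                           # cumulative average temperature
    return hdd, cdd, cat

# Five hypothetical daily average temperatures in degrees Celsius.
dat = [10.0, 15.5, 18.0, 21.0, 25.5]
hdd, cdd, cat = degree_day_indices(dat)
print(hdd, cdd, cat)
```

For these five days the heating contribution is $8 + 2.5 = 10.5$, the cooling contribution is $3 + 7.5 = 10.5$, and the CAT is $90$.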


In the literature, several studies have dealt with the valuation of weather derivatives. Historical burn analysis (HBA) is adopted by most investors given its ease of replication [12]. This is because HBA assumes that the distribution of the expected value of temperature-based derivatives simply follows the historical data; no stochastic model is constructed or fitted for the dynamics of the underlying variables. It is argued in [25], however, that HBA is bound to be biased and prone to large pricing errors. As a suitable alternative, time-series methods are proposed [25], and the marginal substitution value principle of mathematical economics could be employed for pricing [10]. Indifference pricing was also considered in [38], and a comparison of HBA, the Black-Scholes-Merton approximation, and an equilibrium Monte Carlo simulation was examined in [32].

The Ornstein-Uhlenbeck (OU) process is used to reproduce the dynamic behaviour of the DATs and hence to generate the temperature-based indices. This is aligned with the concept of adopting a continuous-time stochastic process to capture the dynamics of temperature [11]. Incorporating seasonalities [1] enriches the model. In [3], DAT variations are modelled with an OU process with innovations following the generalised hyperbolic Lévy process.

A one-state stochastic process cannot describe the behaviour of temperature with great pre-
cision and flexibility, especially when there are drastic changes in regimes attributed to
occasional climatic changes. Thus, the regime-switching OU model is considered in [14],
which seems to be the first paper embedding a regime-switching approach to capture the
evolution of the time series temperature data. However, no dynamic estimation, not even
a static one, was provided in [14] leaving a big gap in implementing a regime-switching
approach involving observed data until the publication of [37] that employs a higher-order
HMM. We note though that [37], extending the developments in [36], focuses only on estimation
and model fitting, and the pricing and risk measurement of pertinent contracts under
such a model setting remain unaddressed. In this chapter, we apply hidden Markov model
(HMM) filtering algorithms to provide optimal parameter model estimates for a successful
implementation of the valuation and risk management of weather futures and option prod-
ucts.

Our exposition is organised in the following way. Section 2.2 presents a self-contained formulation
of the temperature modelling using a discrete-time HMM modulating the model
parameters. In Section 2.3, we derive recursive filtering equations for various quantities
of HMM via the change of reference probability measure method. Our self-updating pa-
rameter estimation scheme is laid out in Section 2.4. Numerical work demonstrating the
applicability of our proposed model and estimation technique to a four-year Toronto temperature
data set is detailed in Section 2.5; in that section, the mechanics of the best-model selection
are also outlined through forecasting and penalised log-likelihood criteria. In Section 2.6,
the valuation of a temperature option is discussed and a sensitivity analysis of the option
price with respect to parameters of our proposed model is included. Section 2.7 gives some
concluding remarks.

2.2 Model description

Notation: All vectors and matrices will be denoted by bold English/Greek letters in lower-
case and bold capitalised English/Greek letters, respectively.

2.2.1 Model for temperature derivatives

As weather derivatives are written on HDD, CDD and CAT, we put forward a model to
capture these measurements. HDD and CDD quantify the respective demands of heating
and cooling of a particular location. Since temperature could be observed at any time of a
given day, given the time interval [τ1 , τ2 ] with τ1 < τ2 , we can regard it as an Itô process so
that the respective continuous-time functional forms of HDD, CDD and CAT are
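\[
\begin{aligned}
\mathrm{HDD} &= \int_{\tau_1}^{\tau_2} \max\left(T_{\text{base}} - T_s, 0\right) ds, \\
\mathrm{CDD} &= \int_{\tau_1}^{\tau_2} \max\left(T_s - T_{\text{base}}, 0\right) ds, \\
\text{and} \quad \mathrm{CAT} &= \int_{\tau_1}^{\tau_2} T_s \, ds,
\end{aligned}
\tag{2.1}
\]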

where $T_{\text{base}}$ is the base temperature, usually given as 65°F or 18°C in the market. In
practice, the daily average temperatures (DATs) are measured on a daily basis so that the
temperature-based indices must be considered in discrete time defined by

\[
\mathrm{HDD} = \sum_{t=\tau_1}^{\tau_2} \max\left(T_{\text{base}} - T_t, 0\right), \quad
\mathrm{CDD} = \sum_{t=\tau_1}^{\tau_2} \max\left(T_t - T_{\text{base}}, 0\right), \quad \text{and} \quad
\mathrm{CAT} = \sum_{t=\tau_1}^{\tau_2} T_t,
\tag{2.2}
\]

where $T_t$ is the DAT on day $t$ computed as $(T_{\max} + T_{\min})/2$. Clearly, CAT is simply the sum
of DATs over the contract period. Weather contracts typically mature in a month or season.
An HDD contract’s period spans the period covering October in one year to April of
the next year. A CDD contract’s period covers the warm season, i.e., starting April until
October of the same year. Overlapping months, October and April, in the CDD and HDD
contracts are termed as transition or shoulder months. Following the rationale in [5], the
needed indices as well as the dynamic representation of DATs will be obtained from T t ,
which we are going to model.
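
The discrete-time indices in (2.2) are simple to compute from a series of DATs. The following is a minimal sketch in Python (not the software used in this work); the function names and the three sample days are hypothetical and serve only as an illustration, with the base temperature set to 18°C.

```python
def daily_average(t_max, t_min):
    """DAT for one day: the average of the daily maximum and minimum."""
    return (t_max + t_min) / 2.0


def degree_day_indices(dats, base=18.0):
    """Return (HDD, CDD, CAT) over a contract period, following (2.2)."""
    hdd = sum(max(base - t, 0.0) for t in dats)
    cdd = sum(max(t - base, 0.0) for t in dats)
    cat = sum(dats)
    return hdd, cdd, cat


# Hypothetical (max, min) temperatures in Celsius for three days.
sample_days = [(10.0, 2.0), (25.0, 15.0), (20.0, 16.0)]
dats = [daily_average(hi, lo) for hi, lo in sample_days]
hdd, cdd, cat = degree_day_indices(dats)
```

Only days colder than the base contribute to HDD and only warmer days contribute to CDD, whereas every day contributes to CAT.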

Suppose (Ω, F , P) is an underlying probability space for the process T t given by,

T t = Xt + S t . (2.3)

Equation (2.3) is composed of $X_t$ that follows an OU process with HMM-driven parameters
and $S_t$ described by a deterministic function to capture the seasonal trends. Motivated by
[5], the seasonal component is

\[
S_t = at + b + \sum_{h=1}^{3} \left[ c_h \sin\left( \frac{2\pi}{365} d_h t \right) + e_h \cos\left( \frac{2\pi}{365} d_h t \right) \right],
\tag{2.4}
\]

where $d_1 = 1$, $d_2 = 2$, $d_3 = 4$ to account for the respective yearly, semi-annual and quarterly
patterns.

2.2.2 The HMM-modulated OU process


The OU process Xt in (2.3) satisfies the stochastic differential equation (SDE)

dXt = α(θ − Xt )dt + ξdBt , (2.5)

where $B_t$ is a standard Brownian motion under $P$. The solution, by Itô’s lemma, of (2.5) is

\[
X_t = X_s e^{-\alpha(t-s)} + \left(1 - e^{-\alpha(t-s)}\right)\theta + \xi e^{-\alpha t} \int_s^t e^{\alpha u} \, dB_u
\tag{2.6}
\]

for $s \le t$. The explicit representation of the temperature process $T_t$ is then given by

\[
T_t = S_t + X_s e^{-\alpha(t-s)} + \left(1 - e^{-\alpha(t-s)}\right)\theta + \xi e^{-\alpha t} \int_s^t e^{\alpha u} \, dB_u.
\tag{2.7}
\]

The discretised versions of (2.6) and (2.7), by approximating the distributions of their respective
stochastic-integral components, are given by

\[
X_{k+1} = e^{-\alpha \Delta t_{k+1}} X_k + \left(1 - e^{-\alpha \Delta t_{k+1}}\right)\theta + \xi \sqrt{\frac{1 - e^{-2\alpha \Delta t_{k+1}}}{2\alpha}} \, z_{k+1},
\tag{2.8}
\]

where $\Delta t_{k+1} = t_{k+1} - t_k$ and $\{z_{k+1}\}$ is a sequence of independent and identically distributed
(IID) standard normal random variables.
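
To illustrate the recursion in (2.8), a minimal simulation sketch in Python is given here. The function name, the seed and the parameter values in the example call are hypothetical, and a constant step size is assumed.

```python
import math
import random


def simulate_ou(x0, alpha, theta, xi, n_steps, dt=1.0, seed=0):
    """Simulate the discretised OU recursion (2.8) with a constant step dt."""
    rng = random.Random(seed)
    decay = math.exp(-alpha * dt)
    # Standard deviation of the stochastic-integral term over one step.
    step_sd = xi * math.sqrt((1.0 - math.exp(-2.0 * alpha * dt)) / (2.0 * alpha))
    path = [x0]
    for _ in range(n_steps):
        z = rng.gauss(0.0, 1.0)  # IID standard normal draw z_{k+1}
        path.append(decay * path[-1] + (1.0 - decay) * theta + step_sd * z)
    return path


# Hypothetical parameter values, for illustration only.
path = simulate_ou(x0=2.0, alpha=0.4, theta=0.0, xi=0.9, n_steps=100)
```

Setting `xi=0` removes the noise, in which case the path decays geometrically from `x0` towards the mean-reverting level `theta`.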

As indicated in Section 2.1, having constant parameters in the OU process might not be
adequate in describing the dynamic switches of regimes resulting from the interactions of
various market and economic factors. In the succeeding discussion, the parameters in (2.8)
will be governed by a homogeneous Markov chain yk with finite states in discrete time to
model the behaviour of Xt and hence, T t .

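To simplify the ensuing algebraic calculations, the state space is associated with the canonical
basis of $\mathbb{R}^N$, which is $\{\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_N\}$, $\mathbf{e}_i = (0, \ldots, 0, 1, 0, \ldots, 0)^\top$ with 1 in the $i$th
position. Here, $N$ is the total number of states and $\top$ denotes the transpose of a matrix.
The dynamics of $\mathbf{y}_k$ is $\mathbf{y}_{k+1} = \boldsymbol{\Pi} \mathbf{y}_k + \boldsymbol{\zeta}_{k+1}$, where $\boldsymbol{\Pi} = (\pi_{ji}) \in \mathbb{R}^{N \times N}$, $\boldsymbol{\zeta}_{k+1}$ is a martingale
increment, and $\pi_{ji} = P(\mathbf{y}_k = \mathbf{e}_j \mid \mathbf{y}_{k-1} = \mathbf{e}_i)$ with $\sum_{j=1}^{N} \pi_{ji} = 1$. Let $\mathbf{P} = (p_{ji}) \in \mathbb{R}^{N \times N}$ denote
the intensity matrix for the continuous-time Markov chain process at time $t$, where
$p_{ji} = \lim_{\Delta t \to 0} \pi_{ji}(t, t + \Delta t)/\Delta t$ for $j \ne i$ (cf. Cox and Miller [7]), with $p_{ji} \ge 0$ for $j \ne i$ and $\sum_{j=1}^{N} p_{ji} = 0$.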

Given that our HMM filtering methodology is under the discrete-time framework, it is nec-
essary to recall the connection of the rate matrix P to the transition probability matrix Π,
which is its discrete-time counterpart providing inputs to the recursive filtering equations
in Section 2.3 and eventually to the option pricing formulae in Section 2.6. From Grimmett
and Stirzaker [27] (see as well Siu, et al [33]), Π is the exponential matrix of P, i.e.,

\[
\boldsymbol{\Pi} = \exp(\mathbf{P}) = \sum_{k=0}^{\infty} \frac{\mathbf{P}^k}{k!}.
\tag{2.9}
\]

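A handy technique to convert an $N \times N$ matrix $\mathbf{P}$ into an $N \times N$ $\boldsymbol{\Pi}$ is to take advantage
of the diagonalisability of $\mathbf{P}$ inherent in our modelling formulation. If $\mathbf{P}$ has eigenvalues
$b_1, b_2, \ldots, b_N$ and corresponding eigenvectors $\mathbf{j}_1, \mathbf{j}_2, \ldots, \mathbf{j}_N$, then $\mathbf{P} = \mathbf{J}\mathbf{B}\mathbf{J}^{-1}$, where $\mathbf{B}$ is a
diagonal matrix with $b_1, b_2, \ldots, b_N$ as its diagonal elements, and $\mathbf{J} = (\mathbf{j}_1, \mathbf{j}_2, \ldots, \mathbf{j}_N)$ is an
invertible matrix. Furthermore,

\[
\begin{aligned}
\exp(\mathbf{P}) &= \exp\left(\mathbf{J}\mathbf{B}\mathbf{J}^{-1}\right) \\
&= \mathbf{I} + \mathbf{J}\mathbf{B}\mathbf{J}^{-1} + \mathbf{J}\frac{\mathbf{B}^2}{2!}\mathbf{J}^{-1} + \cdots \\
&= \mathbf{J}\left(\mathbf{I} + \mathbf{B} + \frac{\mathbf{B}^2}{2!} + \cdots\right)\mathbf{J}^{-1} \\
&= \mathbf{J}\exp(\mathbf{B})\mathbf{J}^{-1}.
\end{aligned}
\tag{2.10}
\]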

Both (2.9) and (2.10) can be evaluated numerically by almost all mathematical or statistical
software packages nowadays.
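
As an illustration of (2.9), the sketch below evaluates the matrix exponential by a truncated power series in plain Python. It deliberately uses the series rather than the eigen-decomposition route of (2.10), so that no linear-algebra library is needed; the helper names and the 2-state rate matrix are hypothetical.

```python
def mat_mul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_exp(p, n_terms=40):
    """Approximate exp(p) by the first n_terms terms of the series in (2.9)."""
    n = len(p)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]  # current term p^k / k!, starting at k = 0
    for k in range(1, n_terms):
        term = [[v / k for v in row] for row in mat_mul(term, p)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result


# Hypothetical 2-state rate matrix; entry [j][i] is the rate from state i to
# state j, so each column sums to zero.
rate_matrix = [[-0.2, 0.5],
               [0.2, -0.5]]
transition_matrix = mat_exp(rate_matrix)
```

With this column convention, each column of `transition_matrix` sums to one, in line with the constraint on the entries of the transition probability matrix.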

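We assume that the process $X_k$ in (2.8) has parameters that depend on $\mathbf{y}_k$, i.e.,

\[
X_{k+1} = e^{-\alpha(\mathbf{y}_k) \Delta t_{k+1}} X_k + \left(1 - e^{-\alpha(\mathbf{y}_k) \Delta t_{k+1}}\right)\theta(\mathbf{y}_k) + \xi(\mathbf{y}_k) \sqrt{\frac{1 - e^{-2\alpha(\mathbf{y}_k) \Delta t_{k+1}}}{2\alpha(\mathbf{y}_k)}} \, z_{k+1},
\tag{2.11}
\]

\[
\text{and} \quad T_{k+1} = S_{k+1} + e^{-\alpha(\mathbf{y}_k) \Delta t_{k+1}} X_k + \left(1 - e^{-\alpha(\mathbf{y}_k) \Delta t_{k+1}}\right)\theta(\mathbf{y}_k) + \xi(\mathbf{y}_k) \sqrt{\frac{1 - e^{-2\alpha(\mathbf{y}_k) \Delta t_{k+1}}}{2\alpha(\mathbf{y}_k)}} \, z_{k+1}.
\tag{2.12}
\]

With $\mathbf{y}_k$’s state space, $\alpha_k := \alpha(\mathbf{y}_k) = \langle \boldsymbol{\alpha}, \mathbf{y}_k \rangle$, $\theta_k := \theta(\mathbf{y}_k) = \langle \boldsymbol{\theta}, \mathbf{y}_k \rangle$, and $\xi_k := \xi(\mathbf{y}_k) = \langle \boldsymbol{\xi}, \mathbf{y}_k \rangle$,
where $\langle \cdot, \cdot \rangle$ is the inner product in $\mathbb{R}^N$. Clearly, $\mathbf{y}_k$ makes the models in (2.11) and (2.12)
regime-switching.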

2.3 Recursive filtering

2.3.1 Change of reference probability measure


We shall introduce an ideal probability measure P̃, where calculations of the recursive fil-
ters are manageable to carry out. This is because under P̃, the observations are independent,
identically distributed (IID) random variables. The real measure P can be recovered via the
change of measure technique. The state of the HMM will be estimated from the noisily observed
data set. Under the reference measure $\tilde{P}$, the dynamics of $\mathbf{y}_k$ remain unchanged but
are independent of the observed data series; see the underlying principle utilised in Elliott
et al. [19].

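A discrete-time version of Girsanov’s theorem enables us to back out $P$ from $\tilde{P}$ [19].
The real-world measure $P$ is recovered by considering the Radon-Nikodym derivative

\[
\left. \frac{dP}{d\tilde{P}} \right|_{\mathcal{F}_k} = \Psi_k = \prod_{l=1}^{k} \varphi_l, \quad k \ge 1,
\]

where

\[
\varphi_l = \frac{\phi\left( \xi(\mathbf{y}_{l-1})^{-1} \left( \dfrac{1 - e^{-2\alpha(\mathbf{y}_{l-1}) \Delta t_{l-1}}}{2\alpha(\mathbf{y}_{l-1})} \right)^{-\frac{1}{2}} \beta(\mathbf{y}_{l-1}) \right)}{\xi(\mathbf{y}_{l-1}) \sqrt{\dfrac{1 - e^{-2\alpha(\mathbf{y}_{l-1}) \Delta t_{l-1}}}{2\alpha(\mathbf{y}_{l-1})}} \; \phi(X_l)}.
\tag{2.13}
\]

In (2.13), $\phi$ is the probability density function of a standard normal random variable,
$\varphi_0 = 1$, $\{\varphi_l, l \in \mathbb{Z}^+\}$ is an $\mathcal{F}_l$-martingale under $P$, and
$\beta(\mathbf{y}_{l-1}) = X_l - e^{-\alpha(\mathbf{y}_{l-1}) \Delta t_{l-1}} X_{l-1} - \left(1 - e^{-\alpha(\mathbf{y}_{l-1}) \Delta t_{l-1}}\right)\theta(\mathbf{y}_{l-1})$.
With the aid of (2.13), the derivation of the recursive filters for $\mathbf{y}_k$ and
related quantities under $\tilde{P}$ is facilitated, and then the calculated results can be re-interpreted
under $P$ via a reverse change of measure.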

2.3.2 Calculation of recursive filters


Suppose $\mathcal{X}_k$ is the filtration generated by the observations of $X_k$, which ‘conceal’ the ‘true’
state of $\mathbf{y}_k$. We aim to obtain optimal estimates of quantities that are functions of $\mathbf{y}_k$ under $P$
through their adaptive filters, calculated using conditional expectations given $\mathcal{X}_k$ under $\tilde{P}$ and
Bayes’ theorem.

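So, for instance, to estimate a scalar quantity $U_k$ under $P$, Bayes’ theorem for conditional
expectation gives

\[
E[U_k \mid \mathcal{X}_k] = \frac{\tilde{E}[\Psi_k U_k \mid \mathcal{X}_k]}{\tilde{E}[\Psi_k \mid \mathcal{X}_k]},
\tag{2.14}
\]

where $E[\,\cdot\,]$ and $\tilde{E}[\,\cdot\,]$ are the conditional expectations under $P$ and $\tilde{P}$, respectively. As in
[18], we consider an $\mathcal{F}_l$-adapted $U_l$, having the form

\[
U_l = U_{l-1} + g_l + \langle \mathbf{h}_l, \mathbf{o}_l \rangle + r_l f(X_l),
\tag{2.15}
\]

where $g_l$, $\mathbf{h}_l$, and $r_l$ are $\mathcal{F}$-predictable; $f$ is a scalar-valued function; and $\mathbf{o}_l = \mathbf{y}_l - \boldsymbol{\Pi} \mathbf{y}_{l-1}$.

Write $\gamma_k(U_k) := \tilde{E}[\Psi_k U_k \mid \mathcal{X}_k]$. Therefore, equation (2.14) can be expressed as $\gamma_k(U_k)/\gamma_k(1)$.
Since $\gamma_k(U_k) = \gamma_k(U_k \langle \mathbf{1}, \mathbf{y}_k \rangle) = \langle \mathbf{1}, \gamma_k(U_k \mathbf{y}_k) \rangle$, the filter for any adapted process $U$ has the
representation

\[
E[U_k \mid \mathcal{X}_k] = \frac{\langle \mathbf{1}, \gamma_k(U_k \mathbf{y}_k) \rangle}{\langle \mathbf{1}, \gamma_k(\mathbf{y}_k) \rangle}.
\]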
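Invoking Theorem (5.3) of [15] and tailoring it to our modelling framework and notation,
the recursion for $\gamma_k(U_k \mathbf{y}_k)$ is

\[
\begin{aligned}
\gamma_k(U_k \mathbf{y}_k) = \sum_{i=1}^{N} \Lambda^i(X_k) \Big[ & \langle \mathbf{e}_i, \gamma_{k-1}(U_{k-1} \mathbf{y}_{k-1}) \rangle \boldsymbol{\Pi} \mathbf{e}_i + \langle \mathbf{e}_i, \gamma_{k-1}(g_k \mathbf{y}_{k-1}) \rangle \boldsymbol{\Pi} \mathbf{e}_i \\
& + \left( \operatorname{diag}(\boldsymbol{\Pi} \mathbf{e}_i) - \boldsymbol{\Pi} \mathbf{e}_i \otimes \boldsymbol{\Pi} \mathbf{e}_i \right) \gamma_{k-1}(\mathbf{h}_k \langle \mathbf{e}_i, \mathbf{y}_{k-1} \rangle) + \gamma_{k-1}(r_k \langle \mathbf{e}_i, \mathbf{y}_{k-1} \rangle) f(X_k) \boldsymbol{\Pi} \mathbf{e}_i \Big],
\end{aligned}
\tag{2.16}
\]

where $\otimes$ stands for the tensor product of vectors, $\operatorname{diag}(\cdot)$ is a diagonal matrix, and $\Lambda^i(X_k)$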
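is defined by

\[
\Lambda^i(X_l) = \exp\left( \frac{\left( e^{-\alpha_i \Delta t_{l-1}} X_{l-1} + \left(1 - e^{-\alpha_i \Delta t_{l-1}}\right)\theta_i \right) X_l}{\left( \xi_i \sqrt{\dfrac{1 - e^{-2\alpha_i \Delta t_{l-1}}}{2\alpha_i}} \right)^2} - \frac{\left( e^{-\alpha_i \Delta t_{l-1}} X_{l-1} + \left(1 - e^{-\alpha_i \Delta t_{l-1}}\right)\theta_i \right)^2}{2 \left( \xi_i \sqrt{\dfrac{1 - e^{-2\alpha_i \Delta t_{l-1}}}{2\alpha_i}} \right)^2} \right).
\tag{2.17}
\]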
For the estimation of model parameters in the next section, we require the recursive filters
of the following scalar quantities.

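(i) The number of jumps from state $\mathbf{e}_r$ to state $\mathbf{e}_s$ up to time $k$:

\[
J_k^{sr} = \sum_{l=1}^{k} \langle \mathbf{y}_{l-1}, \mathbf{e}_r \rangle \langle \mathbf{y}_l, \mathbf{e}_s \rangle = J_{k-1}^{sr} + \pi_{sr} \langle \mathbf{y}_{k-1}, \mathbf{e}_r \rangle + \langle \mathbf{y}_{k-1}, \mathbf{e}_r \rangle \langle \mathbf{o}_k, \mathbf{e}_s \rangle.
\tag{2.18}
\]

(ii) The number of occupations up to time $k$, i.e., the amount of time that $\mathbf{y}_k$ spent in state
$\mathbf{e}_r$:

\[
O_k^{r} = \sum_{l=1}^{k} \langle \mathbf{y}_{l-1}, \mathbf{e}_r \rangle = O_{k-1}^{r} + \langle \mathbf{y}_{k-1}, \mathbf{e}_r \rangle.
\tag{2.19}
\]

(iii) The auxiliary process dependent on $\mathbf{y}_k$ for the function $f$ up to time $k$ in state $\mathbf{e}_r$:

\[
T_k^{r}(f) = \sum_{l=1}^{k} f(X_l) \langle \mathbf{y}_{l-1}, \mathbf{e}_r \rangle = T_{k-1}^{r}(f) + f(X_k) \langle \mathbf{y}_{k-1}, \mathbf{e}_r \rangle.
\tag{2.20}
\]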

The estimator for the state $\mathbf{e}_i$ of $\mathbf{y}_k$ is readily obtained by taking $U_k = U_0 = 1$, $g_k = \mathbf{h}_k =
r_k = 0$ in equation (2.15). It has the simplified form

\[
\gamma_k(\mathbf{y}_k) = \sum_{i=1}^{N} \Lambda^i(X_k) \boldsymbol{\Pi} \mathbf{e}_i \langle \mathbf{e}_i, \gamma_{k-1}(\mathbf{y}_{k-1}) \rangle.
\tag{2.21}
\]
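
The structure of the state recursion (2.21) can be sketched in Python as follows. This is an illustration under stated assumptions and not the implementation used in this chapter: the weight $\Lambda^i(X_k)$ is replaced by the conditional normal density of $X_k$ given $X_{k-1}$ in regime $i$, the filter is normalised at every step so that it returns conditional probabilities directly, and all names and numerical inputs are hypothetical.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal random variable with the given mean and sd."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def filter_states(xs, delta, eta, rho, trans, prior):
    """Normalised filter for the hidden state, patterned after (2.21).

    trans[j][i] is the probability of moving from state i to state j
    (columns sum to one); delta, eta, rho hold the per-state coefficients
    of X_k = delta_i * X_{k-1} + eta_i + rho_i * z_k.
    """
    n = len(prior)
    probs = list(prior)
    history = [probs]
    for x_prev, x_curr in zip(xs[:-1], xs[1:]):
        # Weight each previous state by how well it explains the new observation
        # (assumption: Gaussian density used in place of Lambda^i).
        weighted = [normal_pdf(x_curr, delta[i] * x_prev + eta[i], rho[i]) * probs[i]
                    for i in range(n)]
        # Propagate through the transition matrix, column i being Pi e_i.
        unnormalised = [sum(trans[j][i] * weighted[i] for i in range(n))
                        for j in range(n)]
        total = sum(unnormalised)
        probs = [u / total for u in unnormalised]
        history.append(probs)
    return history


# Hypothetical inputs for a 2-state illustration.
xs = [0.0, 0.1, -0.2, 2.5, 2.4]
history = filter_states(xs, delta=[0.6, 0.6], eta=[0.0, 1.0], rho=[0.5, 0.5],
                        trans=[[0.9, 0.2], [0.1, 0.8]], prior=[0.5, 0.5])
```

Normalising at each step corresponds to dividing by $\langle \mathbf{1}, \gamma_k(\mathbf{y}_k) \rangle$ and keeps the recursion numerically stable over long series.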

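By virtue of (2.16) with $U_0 = 0$, $U_k = J_k^{sr}$, $g_k = \pi_{sr} \langle \mathbf{y}_{k-1}, \mathbf{e}_r \rangle$, $\mathbf{h}_k = \langle \mathbf{y}_{k-1}, \mathbf{e}_r \rangle \mathbf{e}_s$ and $r_k = 0$,
we get

\[
\gamma_k(J_k^{sr} \mathbf{y}_k) = \sum_{i=1}^{N} \Lambda^i(X_k) \langle \gamma_{k-1}(J_{k-1}^{sr} \mathbf{y}_{k-1}), \mathbf{e}_i \rangle \boldsymbol{\Pi} \mathbf{e}_i + \Lambda^r(X_k) \gamma_{k-1}(\langle \mathbf{y}_{k-1}, \mathbf{e}_r \rangle) \pi_{sr} \mathbf{e}_s.
\tag{2.22}
\]

Setting $U_0 = 0$, $U_k = O_k^{r}$, $g_k = \langle \mathbf{y}_{k-1}, \mathbf{e}_r \rangle$, and $\mathbf{h}_k = r_k = 0$ in (2.16) and using (2.14), we
obtain

\[
\gamma_k(O_k^{r} \mathbf{y}_k) = \sum_{i=1}^{N} \Lambda^i(X_k) \langle \gamma_{k-1}(O_{k-1}^{r} \mathbf{y}_{k-1}), \mathbf{e}_i \rangle \boldsymbol{\Pi} \mathbf{e}_i + \Lambda^r(X_k) \gamma_{k-1}(\langle \mathbf{y}_{k-1}, \mathbf{e}_r \rangle) \boldsymbol{\Pi} \mathbf{e}_r.
\tag{2.23}
\]

For the auxiliary process $T_k^{r}$ in (2.20), where $f$ has the form $f(X) = X$, $f(X) = X^2$, or $f(X) =
X_{l+1} X_l$, we utilise (2.16) with $U_k = T_k^{r}(f)$, $U_0 = g_k = \mathbf{h}_k = 0$, and $r_k = \langle \mathbf{y}_{k-1}, \mathbf{e}_r \rangle$ to have

\[
\gamma_k(T_k^{r}(f) \mathbf{y}_k) = \sum_{i=1}^{N} \Lambda^i(X_k) \langle \gamma_{k-1}(T_{k-1}^{r}(f) \mathbf{y}_{k-1}), \mathbf{e}_i \rangle \boldsymbol{\Pi} \mathbf{e}_i + \Lambda^r(X_k) \langle \gamma_{k-1}(\mathbf{y}_{k-1}), \mathbf{e}_r \rangle f(X_k) \boldsymbol{\Pi} \mathbf{e}_r.
\tag{2.24}
\]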

2.4 Optimal parameter estimation


We recall the Expectation-Maximisation (EM) algorithm [17] starting with a family of
probability measures {Pυ , υ ∈ Υ} on (Ω, F ), where Υ is the set of parameters. Let Pυ be
absolutely continuous with reference to a fixed probability measure P0 . Assume that the
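observations $\{x_1, x_2, \cdots, x_k\}$ of $X_k$ are available and $\mathcal{X}_k \subset \mathcal{F}_k$. The likelihood as a function
of information in $\mathcal{X}_k$ is

\[
L(\upsilon) = E_0\left[ \left. \frac{dP^{\upsilon}}{dP_0} \, \right| \, \mathcal{X}_k \right].
\tag{2.25}
\]

The goal is to find the estimate of $\upsilon$ that maximises $L(\upsilon)$, $\hat{\upsilon} \in \operatorname{argmax}_{\upsilon \in \Upsilon} L(\upsilon)$. Now, consider

\[
Q(\upsilon; \hat{\upsilon}_m) = E^{\hat{\upsilon}_m}\left[ \left. \log \frac{dP^{\upsilon}}{dP^{\hat{\upsilon}_m}} \, \right| \, \mathcal{X}_k \right].
\tag{2.26}
\]

The EM algorithm is implemented as follows. First, let $m = 0$ and choose the initial value
$\hat{\upsilon}_0$ for $\upsilon$. The first iteration is computed as $Q(\upsilon; \hat{\upsilon}_0) = E^{\hat{\upsilon}_0}\left[ \log \frac{dP^{\upsilon}}{dP^{\hat{\upsilon}_0}} \mid \mathcal{X}_k \right]$, and $\hat{\upsilon}_1$ is found to
make $Q(\hat{\upsilon}_1; \hat{\upsilon}_0) \ge Q(\upsilon; \hat{\upsilon}_0)$. Second, the E-step computes (2.26). Third, the M-step finds
$\hat{\upsilon}_{m+1}$ such that $Q(\hat{\upsilon}_{m+1}; \hat{\upsilon}_m) \ge Q(\upsilon; \hat{\upsilon}_m)$; $\hat{\upsilon}_{m+1} \in \operatorname{argmax}_{\upsilon \in \Upsilon} Q(\upsilon; \hat{\upsilon}_m)$. The E-step and M-step
are repeated until some stopping criterion is met, e.g., $|\hat{\upsilon}_{m+1} - \hat{\upsilon}_m| < \varepsilon$, where $\varepsilon$ is sufficiently
small.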

 
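Let $\delta(\mathbf{y}_k) = e^{-\alpha(\mathbf{y}_k) \Delta t_k}$ and $\eta(\mathbf{y}_k) = \left(1 - e^{-\alpha(\mathbf{y}_k) \Delta t_k}\right)\theta(\mathbf{y}_k)$. From (2.8), $X_{k+1}$ has a normal
distribution with

\[
\text{mean } \mu(\mathbf{y}_k) = \delta(\mathbf{y}_k) X_k + \eta(\mathbf{y}_k) \quad \text{and variance } \varrho^2(\mathbf{y}_k) = \xi^2(\mathbf{y}_k) \frac{1 - e^{-2\alpha(\mathbf{y}_k) \Delta t_k}}{2\alpha(\mathbf{y}_k)}.
\]

Hence, at state $i$,

\[
\alpha_i = -\frac{1}{\Delta t_k} \log \delta_i, \quad \theta_i = \frac{\eta_i}{1 - \delta_i} \quad \text{and} \quad \xi_i^2 = \frac{-\frac{2}{\Delta t_k} \varrho_i^2 \log \delta_i}{1 - \delta_i^2}.
\tag{2.27}
\]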
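
The mapping in (2.27) and its inverse can be written compactly. The sketch below, in Python with hypothetical function names, assumes $0 < \delta_i < 1$ and a constant step size, and checks the two maps against each other on arbitrary illustrative values.

```python
import math


def to_ou_parameters(delta, eta, rho, dt=1.0):
    """Recover (alpha, theta, xi) from (delta, eta, rho) via (2.27).

    Requires 0 < delta < 1 so that the logarithm and the ratios are defined.
    """
    alpha = -math.log(delta) / dt
    theta = eta / (1.0 - delta)
    xi = math.sqrt(-2.0 * rho ** 2 * math.log(delta) / (dt * (1.0 - delta ** 2)))
    return alpha, theta, xi


def to_discrete_parameters(alpha, theta, xi, dt=1.0):
    """Inverse map: (alpha, theta, xi) to the coefficients (delta, eta, rho)."""
    delta = math.exp(-alpha * dt)
    eta = (1.0 - delta) * theta
    rho = xi * math.sqrt((1.0 - math.exp(-2.0 * alpha * dt)) / (2.0 * alpha))
    return delta, eta, rho


# Round trip on hypothetical values, for illustration only.
alpha, theta, xi = to_ou_parameters(*to_discrete_parameters(0.4, -0.3, 0.9))
```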

The complete parameter estimation of (2.8) is attained by applying the EM algorithm and
using the adaptive filters (2.18), (2.19) and (2.20). We then have the next result.

Proposition 2.4.1 The EM estimates for model parameters are

\[
\hat{\delta}_i = \frac{\hat{T}_k^i(X_{k-1}, X_k) - \eta_i \hat{T}_k^i(X_{k-1})}{\hat{T}_k^i(X_{k-1}^2)},
\tag{2.28}
\]

\[
\hat{\eta}_i = \frac{\hat{T}_k^i(X_k) - \delta_i \hat{T}_k^i(X_{k-1})}{\hat{O}_k^i},
\tag{2.29}
\]

\[
\hat{\varrho}_i^2 = \frac{\hat{T}_k^i(X_k^2) + \delta_i^2 \hat{T}_k^i(X_{k-1}^2) + \eta_i^2 \hat{O}_k^i + 2\eta_i \delta_i \hat{T}_k^i(X_{k-1}) - 2\delta_i \hat{T}_k^i(X_{k-1}, X_k) - 2\eta_i \hat{T}_k^i(X_k)}{\hat{O}_k^i},
\tag{2.30}
\]

and

\[
\hat{\pi}_{ji} = \frac{\hat{J}_k^{ji}}{\hat{O}_k^i}.
\tag{2.31}
\]

Proof The proofs of (2.28)–(2.31) are outlined in Appendix A.
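
Since (2.28) involves $\eta_i$ and (2.29) involves $\delta_i$, the two estimates can be obtained together by solving a 2 × 2 linear system. The Python sketch below illustrates this for one regime. It is a simplification under stated assumptions: the filtered quantities $\hat{T}_k^i(\cdot)$ and $\hat{O}_k^i$ are replaced by sums weighted with hypothetical state probabilities, and the function name is not from this chapter.

```python
def m_step(xs, weights):
    """Solve (2.28)-(2.29) jointly and evaluate (2.30) for one regime.

    xs holds X_0, ..., X_K; weights[k-1] is an assumed probability that the
    chain occupied the regime at time k-1 (a stand-in for the filter output).
    """
    pairs = list(zip(xs[:-1], xs[1:], weights))
    occ = sum(w for _, _, w in pairs)                   # stand-in for O_k
    t_prev = sum(w * xp for xp, _, w in pairs)          # T_k(X_{k-1})
    t_curr = sum(w * xc for _, xc, w in pairs)          # T_k(X_k)
    t_prev2 = sum(w * xp * xp for xp, _, w in pairs)    # T_k(X_{k-1}^2)
    t_curr2 = sum(w * xc * xc for _, xc, w in pairs)    # T_k(X_k^2)
    t_cross = sum(w * xp * xc for xp, xc, w in pairs)   # T_k(X_{k-1}, X_k)
    # (2.28) and (2.29) rearranged as a linear system in (delta, eta).
    det = t_prev2 * occ - t_prev ** 2
    delta = (t_cross * occ - t_prev * t_curr) / det
    eta = (t_prev2 * t_curr - t_prev * t_cross) / det
    rho2 = (t_curr2 + delta ** 2 * t_prev2 + eta ** 2 * occ
            + 2.0 * eta * delta * t_prev - 2.0 * delta * t_cross
            - 2.0 * eta * t_curr) / occ
    return delta, eta, rho2


# Noise-free check: X_k = 0.5 * X_{k-1} + 1 started from X_0 = 0.
delta, eta, rho2 = m_step([0.0, 1.0, 1.5, 1.75, 1.875], [1.0, 1.0, 1.0, 1.0])
```

With all weights equal to one, the solution coincides with ordinary least squares for the regression of $X_k$ on $X_{k-1}$.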

Remark 1: Having calculated $\delta_i$, $\eta_i$ and $\varrho_i^2$ in conjunction with (2.27), our proposed model
in (2.8) is fully determined as $\{\hat{\alpha}_i, \hat{\xi}_i, \hat{\theta}_i\}$ is explicitly available. The establishment of the
recursive filters implies that parameter estimates are self-calibrating, i.e., they are updated
automatically every time there is new information.

Remark 2: The filtering and parameter estimation procedure, as laid out above, is distinct
from and more clear-cut than the one given in Erlwein and Mamon [19]. In this research
work, the EM estimates are provided directly for each parameter appearing in the mean of
the OU process; whereas in [19], the EM estimates rely first on the estimate of the entire
drift component.

2.5 Numerical implementation


We implement the results from Section 2.4 on the four-year Toronto DATs (01 Jan 2011 –
31 Dec 2014 with 1461 data points) compiled by the NCDC. The seasonal component S t
is removed from T t (see equation (2.3)) before applying the recursive filters to Xt . The data
set is then grouped into batches of equal sizes to perform the filtering.

2.5.1 The deterministic component


The deterministic component S t in equation (2.3) is fitted to the entire data set to obtain
the estimates of its coefficients and arguments. Noting that S t is a linear combination of
sinusoidal functions, we use the built-in function of linear regression in the R software to
accomplish the fitting process. This is complemented by a stepwise regression procedure
to identify the regressors relevant to $S_t$. The R function ‘step’ is then applied to
perform the selection of explanatory variables.

Mean    Std Deviation    Std Error    Min       Max     Skewness    Kurtosis
8.88    10.65            0.28         -19.25    31.5    -0.23       -0.92

Table 2.1: DATs’ descriptive statistics

Parameter    Estimate     95% confidence interval
a            -0.0018      (-0.00235, -0.00127)
b            10.2111      (9.75806, 10.66405)
c1           -5.2948      (-5.61554, -4.97416)
c2           -0.6385      (-0.95550, -0.32341)
c3           -0.4131      (-0.72794, -0.09821)
e1           -12.7240     (-13.03831, -12.40979)

Table 2.2: Parameter estimates for the seasonality component $S_t$

Table 2.1 shows the DATs’ descriptive statistics, which could guide in selecting initial val-
ues. Table 2.2 presents the parameter estimates of the deterministic S t with an adjusted
R-squared (coefficient of determination) of 0.835. This means that 83.5% of the variation in
the response $T_t$ is explained by the regressors in equation (2.4), excluding the cosine terms
with coefficients $e_2$ and $e_3$, which were eliminated from the model. The fitted
seasonality component S t is presented in Figure 2.1 together with the actual DATs. The
characteristics of the plotted S t are congruous with the temperature movement in both the
summer and winter seasons. With the given values of T t and S t , Xt is calculated and then
treated as the observation process for our filtering and parameter estimation implementa-
tion.
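
The deseasonalisation step can be sketched as follows, in Python rather than R. The coefficients are the estimates reported in Table 2.2, with $e_2$ and $e_3$ set to zero because those terms were eliminated. The time origin (day index 0 for the first observation) and the function names are assumptions made for illustration, as are the sample temperatures.

```python
import math

# Estimates from Table 2.2; e2 and e3 are zero because those terms were dropped.
A, B = -0.0018, 10.2111
C = (-5.2948, -0.6385, -0.4131)
E = (-12.7240, 0.0, 0.0)
D = (1, 2, 4)  # yearly, semi-annual and quarterly frequencies


def seasonal(t):
    """Seasonal component S_t of (2.4) at day index t (origin is an assumption)."""
    value = A * t + B
    for c_h, e_h, d_h in zip(C, E, D):
        angle = 2.0 * math.pi * d_h * t / 365.0
        value += c_h * math.sin(angle) + e_h * math.cos(angle)
    return value


def deseasonalise(dats, first_day=0):
    """Return X_t = T_t - S_t for a list of consecutive DATs."""
    return [temp - seasonal(first_day + k) for k, temp in enumerate(dats)]


# Hypothetical DATs for three consecutive days.
x_values = deseasonalise([-3.0, -1.5, 0.5])
```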

2.5.2 The stochastic component

The seasonal trend of the actual data is adequately captured even in the presence of Xt
and some noise. The deseasonalised stochastic component Xt = T t − S t from (2.3) is also
depicted in Figure 2.2. We process Xt in 73 groups with 20 data points in each processing
window, where $\Delta t = 1$ day. So, we are updating roughly every 3 weeks with the aid of the
recursive filtering equations. Other filtering window sizes were also tested and we find that
the size only slightly affects the results; similar outcomes are produced even with different
window sizes.

Figure 2.1: Fitted seasonal component and actual observations
(temperature in Celsius against time, 01/01/2011 to 01/11/2014; series shown: actual observations, fitted seasonal component)

Figure 2.2: Deseasonalised stochastic component $X_t$
(temperature in Celsius against time, 01/01/2011 to 01/11/2014)

2.5.2.1 Initial values for the parameter estimates


We assign $1/N$ for the initial transition probabilities. The other initial parameter values for
the filtering equations are based on the parameter estimates for the $X_t$ process under a single
state. We consider the single-state version of (2.8), which is

\[
X_{k+1} = \delta X_k + \eta + \varrho z_{k+1},
\tag{2.32}
\]

where

\[
\delta = e^{-\alpha \Delta t_k}, \quad \eta = \left(1 - e^{-\alpha \Delta t_k}\right)\theta, \quad \varrho^2 = \xi^2 \frac{1 - e^{-2\alpha \Delta t_k}}{2\alpha}.
\]


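Using equation (2.32), the likelihood function of $X_k$ is

\[
L(X_k; \delta, \eta, \varrho) = \prod_{k=1}^{m} \frac{1}{\sqrt{2\pi}\,\varrho} \exp\left( -\frac{(X_k - \eta - \delta X_{k-1})^2}{2\varrho^2} \right),
\tag{2.33}
\]

where $1 \le m \le 1460$ in our case. Equation (2.33) entails the solution to

\[
\operatorname{argmax} \, \log L(X_k; \delta, \eta, \varrho) = \sum_{k=1}^{m} \left[ \log\left( \frac{1}{\sqrt{2\pi}\,\varrho} \right) - \frac{(X_k - \eta - \delta X_{k-1})^2}{2\varrho^2} \right].
\tag{2.34}
\]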

We use, with appropriate adjustments in our problem formulation, the R function ‘optim’
to solve equation (2.34). The results are $\hat{\delta} = 0.6518$, $\hat{\eta} = -0.001624$ and $\hat{\varrho} = 0.4271$.
These are employed as benchmarks to select initial values for parameter estimates in the
modelling framework with more than 1 regime. They serve as a good guide in launching
the filtering recursions producing optimal values for various quantities leading to the EM
parameter estimates of the model. We must also ensure that $\varrho > 0$ to avoid bizarre outcomes
when equation (2.34) is applied.

2.5.2.2 Evolution of parameter estimates

To process the data, we apply recursive filtering equations (2.22)-(2.24), and then use the
results in 73 passes to feed into (2.28)-(2.31) to find the optimal parameter estimates. The
implementation was made on two-state and three-state HMCs, and Figure 2.3 displays the
movement, through each algorithm pass, of the optimal estimates for θ, α, and ξ under a
2-state set up. All three parameters exhibit convergence regardless of the choice of initial
values provided they do not substantially deviate from the values suggested in our one-state
ML estimation.

The choice of initial values could impact the speed of convergence, but eventually all pa-
rameter estimates approach certain unique values. The parameter estimates in the second
state are characterised by lower mean-reverting speed, but higher mean-reverting level and
volatility in comparison to those in state 1. Figure 2.4 depicts the transition probabilities
with stable evolutions. A similar pattern in the evolution of parameter estimates for θ, α,
and ξ under the 3-state model is shown in Figure 2.5. In this case, we see that the mean-
reverting level and volatility in state 1 are the highest but with the lowest speed. The prob-
ability transition estimates π ji are displayed in Figure 2.6. The behaviours of the parameter
evolutions in a two-state setting are similar. Stability in the evolution of estimates for θ, α, ξ
and Π is attained by the self-calibrating HMM algorithm after approximately 26 algorithm
passes in both the 2-state and 3-state HMM settings. To quantify the variance of the various
estimators, we calculate the Fisher information matrix

\[
I(v) = -E_v\left[ \frac{\partial^2}{\partial v^2} \log L(X; v) \right],
\]

which bounds the asymptotic variance of the MLEs, where $v$ is a vector of parameters. The MLE is
consistent and possesses an asymptotically normal sampling distribution [36]. Hence, we
utilise the limiting distribution of the MLE $\hat{v}$ to dynamically obtain the 95% confidence
interval. For a generic (scalar) MLE $\hat{v}_i \in \hat{v}$, this is given by $\hat{v}_i \pm 1.96 \, \dfrac{1}{\sqrt{I(\hat{v})_{ii}}}$.

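The entries of the Fisher information matrix are derived from the log-likelihood expressions
given in the Appendix A and the results are summarised below.

\[
I(\pi_{ji}) = \frac{\hat{J}_k^{ji}}{\pi_{ji}^2}, \quad I(\delta_i) = \frac{\hat{T}_k^i(X_{k-1}^2)}{\varrho_i^2}, \quad I(\eta_i) = \frac{\hat{O}_k^i}{\varrho_i^2}, \quad \text{and}
\]

\[
\begin{aligned}
I(\varrho_i) = -\frac{\hat{O}_k^i}{\varrho_i^2} + \frac{3}{\varrho_i^4} \Big[ & \hat{T}_k^i(X_k^2) + \hat{O}_k^i \eta_i^2 + \delta_i^2 \hat{T}_k^i(X_{k-1}^2) - 2\hat{T}_k^i(X_k) \eta_i \\
& - 2\hat{T}_k^i(X_{k-1}, X_k) \delta_i + 2\hat{T}_k^i(X_{k-1}) \eta_i \delta_i \Big].
\end{aligned}
\]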

The standard errors (SEs) of all parameter estimates for the 1-, 2- and 3-state models are
getting smaller as more algorithm passes are being run. The narrow ranges of the SEs
shown in Table 2.3 indicate that precise estimates are achieved by the proposed estimation
technique here (via HMM filtering and the EM algorithm).

2.5.3 Analysis of regime and model selection


In order to find out the best-fitting model that captures the dynamics of the process Xt , we
perform a one-step ahead forecasting under the HMM setting, i.e., comparing the predictions
of the stochastic component and DATs. This is supplemented with an AIC analysis
as a guide in choosing the optimal number of states.

Figure 2.3: Evolution of parameter estimates for θ, α, and ξ under a 2-state model
(three panels, one each for θ, α and ξ, plotted against algorithm steps)

Figure 2.4: Evolution of transition probability under a 2-state model
(transition probabilities π11, π21, π12, π22 plotted against algorithm steps)

              1-state model               2-state model                  3-state model
Parameter     Bound of SEs                Bound of SEs                   Bound of SEs
estimates     Lower            Upper      Lower            Upper         Lower            Upper
δ̂_i          3.62152×10^−90   0.84905    2.88315×10^−93   5.69605×10^−2   1.23351×10^−89   0.50619
η̂_i          3.13262×10^−90   0.81819    2.49571×10^−93   5.04276×10^−2   1.28888×10^−89   0.50535
ϱ̂_i          2.21510×10^−90   0.54026    1.08302×10^−93   2.36474×10^−2   7.17870×10^−90   0.34230
π̂_ji         4.79291×10^−90   1.08185    1.24810×10^−93   3.60257×10^−2   3.52893×10^−90   0.34041

Table 2.3: Interval of standard errors for parameter estimates under 1-, 2- and 3-state models

Figure 2.5: Evolution of parameter estimates for θ, α, and ξ under a 3-state model
(three panels, one each for θ, α and ξ, plotted against algorithm steps)

Figure 2.6: Evolution of transition probability under a 3-state model
(transition probabilities π11 to π33 plotted against algorithm steps)

2.5.3.1 Assessment of predicted temperature values

We evaluate the one-step ahead predictions for DATs as an additional diagnostic for model
fitting. The expected value of $X_{k+1}$ given the available information up to time $k$ provides
the forecast

\[
E(X_{k+1} \mid \mathcal{X}_k) = E\left( \delta(\mathbf{y}_k) X_k + \eta(\mathbf{y}_k) + \varrho(\mathbf{y}_k) z_{k+1} \mid \mathcal{X}_k \right) = \langle \boldsymbol{\delta}, \hat{\mathbf{y}}_k \rangle X_k + \langle \boldsymbol{\eta}, \hat{\mathbf{y}}_k \rangle.
\tag{2.35}
\]

Figure 2.7 illustrates the one-step ahead forecasts for the deseasonalised process $X_k$ vis-à-vis
DATs under a 2-state HMM set-up in accordance with (2.35).
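
Equation (2.35) amounts to weighting the regime-specific coefficients by the current state estimate. A minimal Python sketch is given below; the function names and the numerical inputs are hypothetical.

```python
def inner(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def one_step_forecast(x_k, delta, eta, state_probs):
    """One-step ahead forecast of X_{k+1} following (2.35).

    state_probs is the filtered estimate of the hidden state at time k.
    """
    return inner(delta, state_probs) * x_k + inner(eta, state_probs)


# Hypothetical 2-state inputs, for illustration only.
forecast = one_step_forecast(2.0, delta=[0.5, 0.7], eta=[0.0, 1.0],
                             state_probs=[0.25, 0.75])
```

Adding the seasonal component $S_{k+1}$ to this quantity gives the corresponding forecast of the DAT.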

A comparison of the observed seasonal HDD and the expected seasonal HDD for the en-
tire period (2011-2014) under the 2-state model is presented in Figure 2.8. Clearly, the
expected seasonal HDD obtained from the model follows closely the actual seasonal HDD.
Furthermore, Figure 2.9 gives a magnified view of the DAT forecasts over a 6-month period
(April to September 2011) under all three state settings. Visually, all forecasts are
quite close to the actual DATs and the temperatures’ trends and dynamics are captured well
by our proposed self-calibrating estimation approach.

Figure 2.7: One-step ahead forecasts under a 2-state model
(two panels of temperature in Celsius against time, 01/02/2011 to 11/02/2014; series shown: actual DATs and forecast of DATs, actual deseasonalized observations and forecast of deseasonalized observations)

Figure 2.8: Comparison of the expected HDD and actual HDD in a 2-state model
(three panels of cumulative HDD from October to April for 2011-2012, 2012-2013 and 2013-2014; series shown: actual HDD, predicted HDD)

Figure 2.9: Comparison of one step ahead forecasts in 1-, 2-, and 3-state HMMs
(temperature in Celsius against time, 04/01/2011 to 09/30/2011; series shown: actual DATs, forecast DATs in 1-state, 2-state and 3-state HMM)

               Deseasonalised X_k                    DATs T_k
HMM setting    1-state    2-state    3-state        1-state     2-state     3-state
MSE            0.42702    0.42700    0.42757        10.67538    10.67504    10.68929
RMSE           0.65346    0.65345    0.65389        3.26732     3.26727     3.26945
MAE            0.50467    0.50465    0.50584        2.52334     2.52324     2.52921
RAE            0.74435    0.74432    0.74608        0.27539     0.27538     0.27603

Table 2.4: Results of error analysis for $X_k$ and $T_k$ under different model settings

2.5.3.2 Likelihood-based model selection and error analysis

Visual inspection is not sufficient in deciding which model does the job well especially
when the actual values are very close to the predicted values under several competing mod-
els. We, therefore, provide quantitative approaches in gauging the ‘best-performing’ model
on the basis of information-criterion and minimised-error metrics. Following [19], we examine
the mean-squared error (MSE), root-mean-squared error (RMSE), mean-absolute
error (MAE), and relative-absolute error (RAE) in assessing the one-step ahead predictions
under the 1-, 2- and 3-state HMMs. Suppose that, for $k = 1, 2, \ldots, K$, $X_k$ is the actual value of the
process at time $k$, $\hat{X}_k$ is the predicted value at time $k$ given available data up to time $k - 1$, $\bar{X}$
is the mean of all $X_k$’s, and $K = 1460$ is the total number of predicted observations. Then
the error metrics are computed as

\[
\mathrm{MSE} = \frac{\sum_{k=1}^{K} \left( \hat{X}_k - X_k \right)^2}{K}, \quad
\mathrm{RMSE} = \sqrt{\frac{\sum_{k=1}^{K} \left( \hat{X}_k - X_k \right)^2}{K}},
\]

\[
\mathrm{MAE} = \frac{\sum_{k=1}^{K} | \hat{X}_k - X_k |}{K}, \quad \text{and} \quad
\mathrm{RAE} = \frac{\sum_{k=1}^{K} | \hat{X}_k - X_k |}{\sum_{k=1}^{K} | X_k - \bar{X} |}.
\]
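
The four error metrics translate directly into code. A minimal Python sketch with a hypothetical function name and made-up inputs follows.

```python
import math


def error_metrics(actual, predicted):
    """Return MSE, RMSE, MAE and RAE for one-step ahead predictions."""
    k = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]
    mean_actual = sum(actual) / k
    mse = sum(e * e for e in errors) / k
    abs_error_sum = sum(abs(e) for e in errors)
    return {
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "MAE": abs_error_sum / k,
        # RAE scales the absolute errors by those of a constant-mean predictor.
        "RAE": abs_error_sum / sum(abs(a - mean_actual) for a in actual),
    }


# Made-up values, for illustration only.
metrics = error_metrics([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 2.5, 5.0])
```

An RAE below one indicates that the predictions improve on always forecasting the sample mean.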

Table 2.4 displays the various error values concerning the one-step ahead predictions for
the model’s stochastic component Xk and observed DATs T k . Whilst the differences in er-
rors are generally small, the 2-state model outperforms the 1- and 3-state models under all
4 error measures. We also investigated the 4-state HMM but found no evidence of further
improvement, however small. The cost of increasing the number of regimes may outweigh the
benefit of model flexibility, as there is a corresponding increase in the number of parameters
to be estimated.

To evaluate the statistical significance for the mean difference of errors, a t-test involving
three pairs of model settings is conducted on the RMSEs generated using the bootstrapping
method. Firstly, a bootstrapped sample of the same size 1460 is generated by sampling
with replacement from the original squared errors for each model setting. Secondly, the
square root of the average squared errors for every bootstrapped sample is computed. After
running the two-step bootstrap procedure 10,000 times, the bootstrapped RMSEs are
then produced for the paired t-test.
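
The two-step bootstrap described above can be sketched in Python as follows. The function name, the seed and the sample inputs are hypothetical, and the paired t-test applied afterwards to the bootstrapped RMSEs is not shown.

```python
import math
import random


def bootstrap_rmse(squared_errors, n_boot=10000, seed=0):
    """Generate bootstrapped RMSEs from a list of squared prediction errors."""
    rng = random.Random(seed)
    n = len(squared_errors)
    rmses = []
    for _ in range(n_boot):
        # Step 1: resample with replacement, keeping the original sample size.
        sample = rng.choices(squared_errors, k=n)
        # Step 2: square root of the average squared error of the resample.
        rmses.append(math.sqrt(sum(sample) / n))
    return rmses


# Made-up squared errors and a small number of replications, for illustration.
rmses = bootstrap_rmse([0.25, 0.0, 0.25, 1.0], n_boot=200)
```

Running the function once per model setting, with the same number of replications, yields the RMSE samples that enter the pairwise comparison.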

We report in Table 2.5 the Bonferroni-corrected p-values accounting for the pairwise set-
tings. With a significance level of 5%, we see that the results for the 1- and 2-state settings
are statistically different and the same conclusion holds for the results involving 1- and 3-
state settings. The results for the 2- and 3-state settings though are not statistically different.
This implies two things: (1) there is a benefit in adding regime-switching features to the
model setting; and (2) a 2-state model is sufficient for the DATs series under examination.
These findings support the observations in Section 2.5.3.1.

t-test     1-state vs 2-state    1-state vs 3-state    2-state vs 3-state
p-value    0.01259               7.06352×10^−5         0.57193

Table 2.5: Bonferroni-adjusted p-values of a t-test applied on bootstrap-generated RMSEs

To balance the model’s goodness of fit and complexity, we also adopt the Akaike Infor-
mation Criterion (AIC) in choosing the ‘best-performing’ model. The AIC is calculated
as

AIC = −2 log L (X; v) + 2ω, (2.36)

where ω is the number of estimated parameters included in the model, and log L (X; v)
denotes the model’s loglikelihood evaluated using the data X and a set of parameters v. The
preferred model has the lowest AIC value. In terms of the observed Xk ’s and parameter set
v, the loglikelihood is given by
$$\log L(X_k; v) = \sum_{k=1}^{B} \left[ \log\left( \frac{1}{\sqrt{2\pi}\,\varrho(y_{k-1})} \right) - \frac{\left( X_k - \eta(y_{k-1}) - \delta(y_{k-1}) X_{k-1} \right)^2}{2\varrho^2(y_{k-1})} \right], \tag{2.37}$$

where B is the number of observations in each algorithm pass.

From equation (2.36), a model that has high loglikelihood is favoured and a complex model
(in the sense of having many parameters) is penalised. Looking at (2.11) and Π, we obtain
the number of parameters to be estimated for each model setting and this is shown in Table
2.6. These inputs are necessary to complete the calculation of (2.36).

| Setting | 1-state | 2-state | 3-state | N-state |
|---|---|---|---|---|
| No. of parameters | 3 | 8 | 15 | 3N + (N − 1)N |

Table 2.6: Number of parameters to be estimated under HMM
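As a minimal sketch of how the parameter counts in Table 2.6 enter equation (2.36), the following helper functions may be used. The function names and the loglikelihood values in the usage lines are hypothetical placeholders, not results from our data.

```python
def n_parameters(n_states):
    """Parameter count for an N-state setting: 3N + (N - 1)N (Table 2.6)."""
    return 3 * n_states + (n_states - 1) * n_states


def aic(log_likelihood, n_states):
    """AIC = -2 log L + 2*omega, as in equation (2.36)."""
    return -2.0 * log_likelihood + 2.0 * n_parameters(n_states)


print([n_parameters(n) for n in (1, 2, 3)])  # [3, 8, 15]

# Hypothetical loglikelihoods, for illustration only.
candidates = {1: -1500.0, 2: -1480.0, 3: -1478.0}
best = min(candidates, key=lambda n: aic(candidates[n], n))
print("preferred number of states:", best)
```

The preferred setting is simply the one with the smallest AIC, so a gain in loglikelihood must exceed the growth in the parameter count for a larger model to be selected.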

With the aid of equation (2.37) and Table 2.6, we calculated the AIC values in Table 2.7
for the HMM settings as well as the setting with no regime-switching. Table 2.7 shows that
both the 2- and 3-state HMM settings have lower AIC values than that of the 1-state model.
It indicates that there is merit in embedding the regime-switching feature into the model.
Also, the 2-state HMM model produces the lowest AIC value. So, this result together with
the error analysis leads us to conclude that the 2-state HMM setting is the most suitable for
our data set under examination.

| Model | Number of parameters | AIC |
|---|---|---|
| 1-state model | 3 | 2899.7982 |
| 2-state HMM | 8 | 2888.4031 |
| 3-state HMM | 15 | 2897.7614 |

Table 2.7: AIC analysis

2.6 Application to the pricing of temperature-dependent contracts
The parameter estimation and error analysis performed in the previous sections indicate
that a 2-state HMM framework performs best in capturing the dynamics of the DATs and the
HDD index. Therefore, we shall utilise the optimal estimates generated through the
filtering recursions in pricing weather contracts. In this section, mathematical
expressions for the prices of HDD futures and options are derived, taking the price of
risk into account. Furthermore, a numerical example on pricing a European call option via
simulation is presented. A sensitivity analysis, under a 2-state HMM setting, is also
implemented to examine the extent of each parameter's influence on the value of a
temperature-based derivative.

| Regions | Contract types | Contract periods | One contract size |
|---|---|---|---|
| North America (US & Canada) | Monthly HDD | Oct-Apr | $20 |
| North America (US & Canada) | Monthly CDD | Apr-Oct | $20 |
| North America (US & Canada) | Seasonal HDD | Nov-Mar & Dec-Feb | $20 |
| North America (US & Canada) | Seasonal CDD | May-Sept & Jul-Aug | $20 |
| Europe (Schengen Areas & UK) | Monthly CAT | Apr-Oct | €20 or £20 |
| Europe (Schengen Areas & UK) | Seasonal CAT | 2-7 consecutive months from Apr-Oct | €20 or £20 |

Table 2.8: Specifications of typical CME temperature-based contracts

2.6.1 Temperature-based derivatives


Although weather derivatives can be written on various underlying variables such as
temperature, air pressure and humidity, the most commonly used underlying is a
temperature-based index. Temperature-based futures and options are the main financial
products traded on the CME, covering many regions and countries around the globe. These
temperature index-linked products quantify weather and make it as tradeable as other
regular underlying assets such as stocks and commodities. They are currently geared to
average monthly and seasonal weather in several North American and European cities.

In the United States and Canada, the CME weather contracts are mainly written on the HDD
or CDD index with a monthly or seasonal duration, whilst weather derivatives in Europe are
normally based on the CAT index. Table 2.8 summarises the specifications of typical CME
temperature-based futures and options. These products written on accumulated indices
over a calendar duration are financially settled at the end of the index or contract period.
The most common option style so far is a European-type option in both the OTC market
and CME, which may only be exercised at its expiration date. A cap on the maximum
payout is normally set for OTC weather contracts, whereas there is currently no such limit
for those trading in the CME. In contrast to temperature-based option contracts, regular
options traded in the CME typically specify index futures as the underlying variables. A
standardised temperature-based contract normally contains five basic elements listed in
Table 2.9.

| Element | Description |
|---|---|
| Underlying variable | A temperature-based index or index futures |
| Contract period | Accumulated and consecutive calendar period |
| Contract (tick) size | A dollar amount attached to each HDD, CDD or CAT point |
| Land-based station | A specific station that reports observations including temperature for a particular area |
| Payoff function | A function that converts a tradeable temperature-based index into a cash flow in terms of the above four elements |

Table 2.9: Elements of a standardised temperature-based contract
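To illustrate how the elements in Table 2.9 combine, the following sketch accumulates an HDD index over a contract period and converts it into the cash flow of a call. The function names are hypothetical, the base temperature of 18 (degrees Celsius) is an assumed convention, and the optional cap mimics the payout limit typical of OTC contracts.

```python
def hdd_index(daily_avg_temps, t_base=18.0):
    """Accumulated HDD: sum of max(T_base - T, 0) over the contract period."""
    return sum(max(t_base - t, 0.0) for t in daily_avg_temps)


def call_payoff(index_value, strike, tick_size=20.0, cap=None):
    """Cash flow of a call on the index: tick_size * max(index - strike, 0)."""
    payoff = tick_size * max(index_value - strike, 0.0)
    return payoff if cap is None else min(payoff, cap)


# Hypothetical three-day record of daily average temperatures.
temps = [10.0, 20.0, 18.0]
print(hdd_index(temps))            # 8.0
print(call_payoff(600.0, 580.0))   # 400.0
```

Only days colder than the base temperature contribute to the index, which is why the second and third days in the toy record add nothing.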

2.6.2 Pricing of temperature-based index futures and options

In the literature, there are two mainstream approaches to modelling the underlying for
the pricing of temperature-driven contracts. The first approach is to employ a
continuous-time process to model the DATs. For instance, Bellini [3] proposed an
extended OU model for DATs, assuming a Lévy-type noise process, to price HDD/CDD
contracts. Benth and Šaltytė-Benth [5] computed futures and option prices using a
continuous-time autoregressive model for the temperature dynamics under a normality
assumption. The second approach is to adopt a discrete-time process to model DATs,
considering that temperature indices are defined as discrete sums of DATs rather than
integrals over a specific period. Moreno [30] demonstrated that it is not necessary to
estimate DATs continuously and that the goodness of fit using a discrete-time process is
satisfactory.

The first approach facilitates various computations for valuation via stochastic calculus and
the latter one is mostly used in the actuarial field under a discrete-time framework. The
pricing conducted in this chapter employs a continuous-time OU process but such a pro-
cess is discretised in order to estimate the model parameters. Once parameter estimates are
available, as in our case, any temperature-based derivative can be theoretically valued by
taking the expectation of its discounted payoff at contract’s maturity. We derive the pricing
formulae for temperature-based futures and options, and include a specific pricing example
via simulation.

The market for weather derivatives is relatively illiquid, and this complication is
exacerbated by the fact that temperature itself can be neither stored nor traded. A zero
risk premium cannot necessarily be assumed because weather contract prices are not
independent of the regime shifts in our model or of investors' risk aversion, especially
in such a somewhat illiquid market. Hence, the price of risk process λt of the underlying
variable ought to be considered in order to determine a fair price for a temperature-based
contract.

2.6.3 The price of risk and risk-neutral measure Q

In a complete market, there is a unique risk-neutral probability measure equivalent to the
real-world probability measure, where the price of any contingent claim is the expected
value of its payoff discounted at the risk-free rate. However, the weather-derivatives mar-
ket is incomplete because the underlying variable (temperature index) is not storable or
tradable. This hampers a straightforward adoption of the no-arbitrage arguments to price
weather-derivative instruments. Notwithstanding this difficulty, a risk-neutral valuation can
still be utilised in incomplete markets; see Xu et al. [38]. In an incomplete market, there
are several equivalent-martingale or risk-neutral measures to choose from. More specifi-
cally, there is a risk-neutral measure Q equivalent to the objective measure P for pricing
weather derivatives; see [31]. The rationale in selecting or constructing this Q is detailed
in the subsequent discussion.

We first note that there is also an issue regarding the risk-neutral valuation for illiquid and
incomplete markets such as the market for weather derivatives. This arises from the lack
of observed market prices that supposedly could determine the risk-neutral parameters and
the martingale measure Q. In any event, the price obtained under Q can be regarded as a
benchmark similar to the idea of pricing other exotic and structured products not yet avail-
able in the market. This “benchmarking principle” was espoused in Gao et al. [25] in which
the risk-neutral valuation was still employed to get no-arbitrage prices for a guaranteed an-
nuity option; this was done even though market information is insufficient to reflect certain
conditions concerning insurance-related contingent claims. Moreover, the risk-neutral
valuation was applied to price a product dependent on an underlying (i.e., mortality rate)
that is likewise neither tradeable nor storable.

Closely related to pricing under Q is the concept of the market price of risk λt at time t,
which links the P-dynamics to the Q-dynamics of the underlying variable. The quantity λt
could be assumed zero for the sake of simplicity and tractability. However, some recent
findings indicate that the λt associated with the temperature variable is significantly
different from 0; see Xu et al. [38], Benth et al. [4], and Elias et al. [14]. In this
chapter, we shall investigate the impact of λt on the pricing of weather contracts. This in
turn requires an explicit expression for Tt under Q. The formal justification for the
risk-neutral measure Q utilised for weather-derivative pricing in our set-up follows from
Weron's work [31], which hinges on λt being a real-valued, measurable and bounded process.

By Girsanov's theorem, there are probability measures Q and Qλ equivalent to P,
such that

$$B_t^{Q^\lambda} = B_t + \int_0^t \lambda_s \, ds = B_t + \lambda_t t, \tag{2.38}$$

where B^{Qλ} is a standard Brownian motion under Qλ and Bt is the P-Brownian motion driving
our deseasonalised process Xt in equation (2.5). Benth and Šaltytė-Benth [5] stated the
existence of a flexible class of risk-neutral probabilities (e.g., Qλ) that can be obtained
from the time-varying λt to fit the forward curves. As Qλ is a risk-adjusted or
risk-neutral probability measure (see page 151 of [31] and page 173 of [2]), Qλ coincides
with the Q needed for the purpose of our valuation.

Referring to the construction arguments in [31], we obtain the condition θ^Q = θ − λt ξ/α,
where θ^Q is defined under Q. So extending [31] to our framework, λt links the parameters of
our proposed model through

$$\theta^*(y_t) = \theta(y_t) - \frac{\lambda_t\,\xi(y_t)}{\alpha(y_t)}, \tag{2.39}$$

where α(yt), ξ(yt), and θ(yt) are given in equation (2.11). Here, θ∗(yt) is the new
parameter under Q that replaces θ(yt) in the OU process after taking into account λt, which
encapsulates both (i) the price of risk due to regime switching and (ii) the market price of
risk (due to the shocks from Bt).
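As a quick numerical check of (2.39), the sketch below applies the adjustment to the regime-wise estimates of θ, α and ξ reported in Table 2.10, with the constant λt = 6.68% adopted in the illustrative example of Section 2.6.4. The function name is hypothetical, and small discrepancies in the last decimal place are to be expected because the inputs are rounded.

```python
def theta_star(theta, alpha, xi, lam):
    """Risk-adjusted mean-reversion level from equation (2.39)."""
    return theta - lam * xi / alpha


# (theta, alpha, xi) per regime, rounded values taken from Table 2.10.
regimes = {1: (0.1367, 0.3361, 0.7301), 2: (-0.1839, 0.6604, 0.9301)}
lam = 0.0668

for regime, (theta, alpha, xi) in regimes.items():
    print(regime, f"{theta_star(theta, alpha, xi, lam):.4f}")
```

The adjustment lowers the mean-reversion level in both regimes, which makes colder paths, and hence larger HDD values, more likely under Q than under P.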

Remark 3: The introduction of Q essentially introduces a market price of risk parameter λt.
As such, we refer to λt as a form of ‘insurance premium’ for bearing the uncertainty risks
identified in (i) and (ii) above.

A rigorous construction of a regime-switching λt under a discrete-time OU process in the
context of bond pricing is given in [20]. Such a construction in our option-pricing
framework is a separate issue and not within the scope of this research. For illustrative
purposes, we assume that λt is a non-zero constant for all regimes in the HMM setting, but
we shall investigate the effect of its variation in the succeeding sensitivity analysis.
For weather-derivative valuation, we utilise

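$$dX_t = \alpha(y_t)\left(\theta^*(y_t) - X_t\right) dt + \xi(y_t)\, dB_t^Q \tag{2.40}$$

and

$$dT_t = dS_t + \alpha(y_t)\left(\theta^*(y_t) - X_t\right) dt + \xi(y_t)\, dB_t^Q. \tag{2.41}$$

Consequently, we have the explicit form

$$T_t = S_t + X_s e^{-\alpha(y_t)(t-s)} + \left(1 - e^{-\alpha(y_t)(t-s)}\right)\theta^*(y_t) + \xi(y_t)\, e^{-\alpha(y_t)t} \int_s^t e^{\alpha(y_u)u}\, dB_u^Q. \tag{2.42}$$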

As asserted in Geman and Leonardi [26], the hypothesis that accumulated degree days are
normally distributed is accepted by both researchers and practitioners when pricing
weather contracts on temperature indices. In Benth and Šaltytė-Benth [5], the DATs are
modelled by an OU process having a seasonal volatility, and the HDD/CDD-index pricing was
accomplished under the normality assumption. More recently, Alexandridis and Zapranis
[2] verified the suitability of such a normality assumption. Under the same assumption
together with the principles employed in [2] and [5], we derive pricing formulae for HDD
futures and options under the OU-HMM settings.

It is assumed that the contract period is [τ1, τ2], the contract size is 1 monetary unit
for each index point, and the risk-free interest rate r_f is constant with continuous
compounding. The HDD futures price F_H(t, τ1, τ2) is evaluated under two scenarios: (i) at
a time t before the contract period, i.e., 0 ≤ t ≤ τ1 < τ2, and (ii) at a time t within the
contract period, i.e., 0 ≤ τ1 ≤ t < τ2.

From Benth and Šaltytė-Benth's reasoning [5],

$$e^{-r_f(\tau_2 - t)}\, E^Q\left[ \int_{\tau_1}^{\tau_2} \max\left(T_{base} - T_v, 0\right) dv - F_H(t, \tau_1, \tau_2) \,\Big|\, \mathcal{F}_t \right] = 0;$$

and so the HDD futures price, as an F_t-adapted stochastic process, must be defined as

$$F_H(t, \tau_1, \tau_2) = E^Q\left[ \int_{\tau_1}^{\tau_2} \max\left(T_{base} - T_v, 0\right) dv \,\Big|\, \mathcal{F}_t \right]. \tag{2.43}$$

For the European HDD call option, we consider two types: a call dependent on the HDD
itself and a call dependent on the HDD futures price. The pricing results, whose proofs
are given in Appendix B, are presented in Propositions 2.6.1– 2.6.3 below under the model
described in (2.41).

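Proposition 2.6.1 The HDD futures price for the contract period 0 ≤ τ1 < τ2 is given by

$$F_H(t, \tau_1, \tau_2) = E^Q\left[H \mid \mathcal{F}_t\right] =
\begin{cases}
\displaystyle \int_{\tau_1}^{\tau_2} A(t,v)\, D\!\left(\frac{M(t,v,X_t)}{A(t,v)}\right) dv, & 0 \le t \le \tau_1 < \tau_2, \\[2ex]
\displaystyle \int_{\tau_1}^{t} \max\left(T_{base} - T_u, 0\right) du + \int_{t}^{\tau_2} A(t,v)\, D\!\left(\frac{M(t,v,X_t)}{A(t,v)}\right) dv, & 0 \le \tau_1 \le t < \tau_2,
\end{cases}$$

where

$$H = \int_{\tau_1}^{\tau_2} \max\left(T_{base} - T_v, 0\right) dv,$$

$$M(t,v,X_t) = T_{base} - S_v - X_t e^{-\alpha(y_t)(v-t)} - \left(1 - e^{-\alpha(y_t)(v-t)}\right)\theta^*(y_t),$$

$$A^2(t,v) = \xi^2(y_t)\, e^{-2\alpha(y_t)v} \int_t^v e^{2\alpha(y_t)u}\, du, \qquad D(x) = x\Phi(x) + \phi(x),$$

and Φ and φ are the cumulative distribution and probability density functions of a standard
normal distribution, respectively. Here M(t, v, Xt) and A²(t, v) are the conditional mean
and variance of T_base − T_v implied by the explicit form (2.42).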
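The first case of Proposition 2.6.1 can be approximated numerically. The sketch below is only illustrative and rests on several assumptions: the regime is held fixed over the contract period, the mean and standard deviation of T_base − T_v are those implied by the explicit form (2.42), time is measured in days, the seasonal component is supplied as a hypothetical function, and the integral is replaced by a trapezoidal rule. All function names are hypothetical.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def D(x):
    """D(x) = x * Phi(x) + phi(x) for the standard normal distribution."""
    return x * _STD_NORMAL.cdf(x) + _STD_NORMAL.pdf(x)


def expected_positive_part(m, a):
    """E[max(Y, 0)] for Y ~ N(m, a^2); reduces to max(m, 0) when a = 0."""
    return a * D(m / a) if a > 0.0 else max(m, 0.0)


def hdd_futures(t, tau1, tau2, x_t, theta_star, alpha, xi, seasonal,
                t_base=18.0, steps=200):
    """Trapezoidal approximation of the futures price for t <= tau1 < tau2."""
    h = (tau2 - tau1) / steps
    total = 0.0
    for i in range(steps + 1):
        v = tau1 + i * h
        decay = math.exp(-alpha * (v - t))
        # Conditional mean and standard deviation of T_base - T_v.
        m = t_base - seasonal(v) - x_t * decay - (1.0 - decay) * theta_star
        a = xi * math.sqrt((1.0 - decay ** 2) / (2.0 * alpha))
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * expected_positive_part(m, a)
    return total * h


# Hypothetical December set-up: constant seasonal mean of -1 degree.
price = hdd_futures(0.0, 0.0, 31.0, 0.0, -0.0084, 0.3361, 0.7301,
                    seasonal=lambda v: -1.0)
print(f"approximate HDD futures price: {price:.2f}")
```

When the mean of T_base − T_v is many standard deviations above zero, as in deep winter, D(x) is practically equal to x and the futures price is close to the accumulated mean shortfall below the base temperature.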

Proposition 2.6.2 The price of a European call option written on an HDD futures price at
time t is given by

$$C_{F_H}(t, K, \tau_T) = e^{-r(\tau_T - t)} \int_K^{F_m} \left(F_H - K\right) g\left(F_H\right) dF_H,$$

where

$$F_H = \int_{\tau_1}^{\tau_2} A(t,v)\, D\!\left(\frac{M(t,v,X_t)}{A(t,v)}\right) dv, \qquad F_m = \max F_H,$$

τT is the expiration date, t ≤ τT ≤ τ1 ≤ τ2, K is the strike price, r is the risk-free
interest rate, and g(x) is the probability density function of a random variable X.

Proposition 2.6.3 The price of a European call option written on HDD at time t is given
by

$$C_H(t, K, \tau_T) = e^{-r(\tau_T - t)} \int_K^{H_m} \left(H - K\right) g\left(H\right) dH,$$

where

$$H = \int_t^{\tau_T} \max\left(T_{base} - T_v, 0\right) dv, \qquad H_m = \max\left(H\right),$$

τT is the expiration date, t ≤ τT, K is the strike price, r is the risk-free interest rate,
and g(x) is the probability density function of a random variable X.

2.6.4 Pricing of a temperature-based contract


When pricing HDD options in Alaton et al. [1], the distribution of HDD is assumed Gaussian
and the probability of max(T_base − T_v, 0) = 0 is close to zero during the winter period.
Applying this assumption to Proposition 2.6.3 with Mc and Ac² for the mean and variance of
H, respectively, setting Hm to infinity, and employing the method used in deriving the
result in Proposition 2.6.1, we have

$$\begin{aligned}
C_H(t, K, \tau_T) &= e^{-r(\tau_T - t)} \int_K^{H_m} \left(H - K\right) g\left(H\right) dH \\
&= e^{-r(\tau_T - t)} \left[ \left(M_c - K\right) \Phi\!\left(\frac{M_c - K}{A_c}\right) + A_c\, \phi\!\left(\frac{M_c - K}{A_c}\right) \right] \\
&= e^{-r(\tau_T - t)} A_c\, D\!\left(\frac{M_c - K}{A_c}\right).
\end{aligned} \tag{2.44}$$
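Under the Gaussian assumption the call price in (2.44) reduces to a one-line computation. The sketch below evaluates it for hypothetical values of the mean and standard deviation of H; the function names and inputs are placeholders and are not estimates from our data.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def D(x):
    """D(x) = x * Phi(x) + phi(x) for the standard normal distribution."""
    return x * _STD_NORMAL.cdf(x) + _STD_NORMAL.pdf(x)


def gaussian_hdd_call(mean_h, sd_h, strike, r, time_to_expiry):
    """Call price e^{-r*tau} * A_c * D((M_c - K) / A_c) from (2.44)."""
    x = (mean_h - strike) / sd_h
    return math.exp(-r * time_to_expiry) * sd_h * D(x)


# Hypothetical one-month contract: mean 600 HDD, standard deviation 25 HDD.
price = gaussian_hdd_call(600.0, 25.0, 580.0, 0.01, 31.0 / 365.0)
print(f"call price per index point: {price:.2f}")
```

Multiplying the result by the contract size converts the price per index point into a monetary amount.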

The normal distribution provides a good fit to the HDD in several winter months for almost
all locations [28]. However, it is almost impossible to obtain a good fit with the normal
distribution in other months, and most especially in December, when cold temperatures give
rise to heavy tails in the empirical distribution of the HDD. Therefore, this assumption is
not quite realistic for Proposition 2.6.3 in pricing a one-month weather option contract.
Although the simplicity of the Gaussian distribution yields an analytic valuation formula,
such a distribution is likely to be adequate only in valuing a seasonal HDD option rather
than a one-month contract.

To compute the prices of futures and options written on HDD, we note the path-dependent
nature of the expected payoff, which can be calculated efficiently using the Monte-Carlo
(MC) simulation method. We perform the valuation under a 2-state HMM model and
include as well the price of risk process. In particular, we evaluate

$$F^{d}_{H^d}(t, \tau_1, \tau_2) = V\, E^Q\left[ H^d \mid \mathcal{F}_t \right], \tag{2.45}$$

$$C^{d}_{H^d}(t, K, \tau_T) = e^{-r(\tau_T - t)}\, V\, E^Q\left[ \max\left(H^d - K, 0\right) \mid \mathcal{F}_t \right], \tag{2.46}$$

where V is the contract size, and H^d is defined as $\sum_{\tau_1}^{\tau_2} \max(T_{base} - T, 0)$
in (2.45) and $\sum_{t}^{\tau_T} \max(T_{base} - T, 0)$ in (2.46), consistent with the
respective notations employed in Propositions 2.6.1 and 2.6.3.
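A Monte-Carlo evaluation of (2.46) can be sketched as below. This is an illustration rather than the implementation used for our results, and it relies on assumptions that are not part of the model specification above: a daily time step, an exact one-day discretisation of the OU dynamics within each regime, a hypothetical constant seasonal component, a hypothetical transition matrix, and a base temperature of 18. All function and variable names are hypothetical.

```python
import math
import random


def discretise(theta_star, alpha, xi):
    """One-day discretisation of the OU step within a regime (dt = 1)."""
    delta = math.exp(-alpha)
    eta = (1.0 - delta) * theta_star
    sd = xi * math.sqrt((1.0 - delta ** 2) / (2.0 * alpha))
    return eta, delta, sd


def simulate_hdd(regime_params, trans, seasonal, n_days, y0, x0, rng,
                 t_base=18.0):
    """One simulated path of the discrete index sum(max(T_base - T_k, 0))."""
    y, x, hdd = y0, x0, 0.0
    for k in range(1, n_days + 1):
        eta, delta, sd = regime_params[y]
        x = eta + delta * x + sd * rng.gauss(0.0, 1.0)
        hdd += max(t_base - (seasonal(k) + x), 0.0)
        y = 0 if rng.random() < trans[y][0] else 1   # draw the next regime
    return hdd


def mc_call_price(regime_params, trans, seasonal, n_days, strike, tick, r,
                  n_paths, y0=0, x0=0.0, seed=0):
    """Discounted MC estimate of (2.46) together with its standard error."""
    rng = random.Random(seed)
    disc = math.exp(-r * n_days / 365.0)
    payoffs = []
    for _ in range(n_paths):
        h = simulate_hdd(regime_params, trans, seasonal, n_days, y0, x0, rng)
        payoffs.append(disc * tick * max(h - strike, 0.0))
    mean = sum(payoffs) / n_paths
    var = sum((p - mean) ** 2 for p in payoffs) / (n_paths - 1)
    return mean, math.sqrt(var / n_paths)


# theta*, alpha and xi per regime (rounded values from Table 2.10).
params = [discretise(-0.0084, 0.3361, 0.7301),
          discretise(-0.2779, 0.6604, 0.9301)]
trans = [[0.9, 0.1], [0.1, 0.9]]      # hypothetical transition matrix
price, se = mc_call_price(params, trans, lambda k: -1.0, 31, 580.0, 20.0,
                          0.01, 2000)
print(f"price = {price:.2f}, standard error = {se:.2f}")
```

Because the seasonal component and the transition matrix here are placeholders, the printed price is not comparable with the values reported in this chapter; the sketch only shows the mechanics of simulating the regime, the deseasonalised process and the accumulated payoff.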

We examine closely the valuation of HDD call options through a sensitivity analysis. Also,
in order to make a valid comparison between the option prices generated under our model
and the results in Elias et al. [14], we make certain conditions uniform between the
contract in our study and that in [14]. Consider a European call with a contract period of
1 to 31 December 2012, and t = 0, V = $20, and K = $580 in equation (2.46). We set r = 1%,
which is the 1-year T-bill yield as published by the Bank of Canada in 2012, as a proxy for
the interest rate. Having optimal estimates for θ(yk), α(yk), and ξ(yk) from Section 2.5,
and granting that λt is known or assigned, a value for θ∗(yt) in accordance with (2.39) is
immediate. Moreover, T and H^d can be calculated based on equations (2.41) and (2.46),
respectively. The optimal estimates of the parameters needed to compute T and H^d are shown
in Table 2.10.

Unlike a regular underlying variable, temperature varies geographically over swaths of a
particular region, and this further adds to the difficulty of determining the appropriate level
of λt . The market price of risk associated with the temperature variable is deemed signifi-
cant in pricing weather futures and options. Risk premia, across a range of risk aversions,
for several cities in America were established in terms of the contemporaneous correlation
levels between aggregate dividends and temperature in [9]. Here, aggregate dividends refer
to the total consumption of investors so that temperature-related contingent claims can be
priced by considering the corresponding payoff and the utility function on consumption.
In particular, risk premia for temperature-dependent call options in Chicago and New York
were estimated to be 5.73% and 7.63% respectively.

Assuming that the risk premium solely depends on the market price of risk associated
with the underlying variable, the average of the two risk premium values for the two cities
(Chicago and New York) located close to Toronto, which is λt = 6.68%, is utilised in our
illustrative example.

| Parameter estimate | $\hat{\delta}$ | $\hat{\eta}$ | $\hat{\varrho}$ | $\hat{\theta}$ | $\hat{\alpha}$ | $\hat{\xi}$ | $\hat{\theta}^*$ |
|---|---|---|---|---|---|---|---|
| Regime 1 | 0.7146 | 0.0390 | 0.6229 | 0.1367 | 0.3361 | 0.7301 | -0.0084 |
| Regime 2 | 0.5166 | -0.0889 | 0.6929 | -0.1839 | 0.6604 | 0.9301 | -0.2779 |

Table 2.10: Optimal parameter estimates for the 2-state HMM model with and without the
market price of risk

In their pricing calculations, Elias et al. [14] assumed r = 5% in 2008, and used an expected
HDD in their payoff function. In our case, we simulate the underlying variable H^d using
the optimal estimates in Table 2.10 and perform 10,000 simulations to evaluate the expected
payoff in (2.46). Then, by embedding λt into a 2-state HMM-based model, we get the call
option price written on HDD as

$$C^{d}_{H^d}\!\left(0, 580, \tfrac{31}{365}\right) = e^{-0.01 \cdot \frac{31}{365}} \cdot 20 \cdot E^Q\left[ \max\left(H^d - 580, 0\right) \mid \mathcal{F}_0 \right] = \$861.33. \tag{2.47}$$
If λt is not considered in equation (2.47), the price is just $507.53. The SEs of the MC
simulations with and without λt are 8.9369 and 6.3323 respectively, and the corresponding
95% confidence intervals are [843.81, 878.85] and [495.12, 519.94]. In this case, 10,000
simulations are sufficient to obtain a desired level of accuracy at the significance level of
5%.
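The confidence intervals quoted above follow from the usual normal approximation, that is, the estimate plus or minus 1.96 standard errors. A short check, with a hypothetical helper name, is:

```python
def confidence_interval(estimate, std_error, z=1.96):
    """Two-sided normal-approximation interval: estimate -/+ z * SE."""
    half_width = z * std_error
    return estimate - half_width, estimate + half_width


# MC prices and standard errors with and without the price of risk.
for price, se in ((861.33, 8.9369), (507.53, 6.3323)):
    low, high = confidence_interval(price, se)
    print(f"[{low:.2f}, {high:.2f}]")
# prints [843.81, 878.85] and then [495.12, 519.94]
```

Since the standard error shrinks with the square root of the number of simulations, halving the width of these intervals would require roughly four times as many paths.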

Figure 2.10 exhibits the variation of the HDD option prices with and without the price of
risk process, over a range of strike HDD. As the strike price K increases from 540 to 620,
both option prices decline. They reach zero at K = 623.10 and K = 605.49 respectively.
Indeed, the price of risk process does make a substantial difference in pricing an HDD call
option for each strike HDD.

2.6.5 Sensitivity analysis of temperature-based option price


Sensitivity analysis entails the variation of one model parameter over a certain range, whilst
holding the rest of the parameters fixed, in order to see the impact on the option price.
Similar to the idea in Siu et al. [33], we shall do such an analysis under a 2-state HMM
model, which turned out to be the best-fitting model given our data set. Using the pricing
example highlighted in Section 2.6.4, we perform sensitivity analyses on the price of risk
process, the Markov chain's intensity matrix, and other model parameters.

Figure 2.11 shows the influence of λt on the HDD call option price by changing its value
from 0 to 20%. The option price rises with an increasing λt, which is consistent with the
results from Elias et al. [14], and the price of risk accounts for a significant portion of
the temperature-based derivative's price.

[Figure 2.10: One-month call option price with initial state 1 versus HDD strike. The HDD call option price is plotted against strike HDD (540 to 620) for the monthly option price with λt and without λt.]

[Figure 2.11: HDD call option price with initial state 1 versus market price of risk. The HDD call option price is plotted against the market price of risk λt (0.00 to 0.20).]

To ease the interpretation of our numerical results, we denote states 1 and 2 as the “hot”
and “cold” regimes, respectively, corresponding to the hot and cold fronts in the 2-state
HMM temperature model setting. The intensity matrix P is then given by

$$P = \begin{pmatrix} -p & p \\ p & -p \end{pmatrix},$$

where p, the intensity parameter, is the rate of switching from one state to the other. The
use of the matrix P facilitates probing the option-price sensitivity associated with the
jump probability of the Markov chain, as we only have one parameter to focus on, namely p.
For such a purpose, a convenient and practical representation of the matrix Π, defined in
(2.9), is

$$\Pi = \exp(P) = \frac{1}{2} \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix} + \frac{1}{2} e^{-2p} \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix}. \tag{2.48}$$
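The closed form in (2.48) can be checked against a direct series evaluation of the matrix exponential. The sketch below uses only plain Python lists; the function names are hypothetical and the truncated Taylor series is an assumption that is adequate for the small intensities considered here.

```python
import math


def transition_matrix(p):
    """Closed form (2.48) of exp(P) for the intensity matrix [[-p, p], [p, -p]]."""
    e = math.exp(-2.0 * p)
    stay, move = 0.5 * (1.0 + e), 0.5 * (1.0 - e)
    return [[stay, move], [move, stay]]


def expm_series(a, terms=30):
    """Matrix exponential of a 2x2 matrix via its truncated Taylor series."""
    result = [[1.0, 0.0], [0.0, 1.0]]
    term = [[1.0, 0.0], [0.0, 1.0]]
    for n in range(1, terms):
        term = [[sum(term[i][k] * a[k][j] for k in range(2)) / n
                 for j in range(2)] for i in range(2)]
        result = [[result[i][j] + term[i][j] for j in range(2)]
                  for i in range(2)]
    return result


for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    closed = transition_matrix(p)
    series = expm_series([[-p, p], [p, -p]])
    gap = max(abs(closed[i][j] - series[i][j])
              for i in range(2) for j in range(2))
    print(f"p = {p}: stay probability = {closed[0][0]:.4f}, gap = {gap:.1e}")
```

As p grows, the probability of staying in the current state falls from nearly 1 towards 1/2, which is the sense in which a higher intensity means faster switching between the hot and cold regimes.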

In our numerical demonstration, the values of p are set to 0.1, 0.3, 0.5, 0.7, and 0.9 and our
simulations consider starting states 1 and 2 at t = 0 separately. The tremendous impact of
the intensity parameter is shown in Table 2.11; a comparison with the pricing results for
the HDD call options with no-regime switching model is included. When the initial regime
is state 2 (cold), the intensity negatively affects the HDD call option, i.e., a higher intensity
implies a faster rate of switching into a “hot” state. This leads to a smaller HDD, and so
a lower option price based on equation (2.46). When we start with a “cold” regime and
during the month of December as considered in our example, HDD calculated from the
two-state HMM model ought to be generally higher than the ones generated from simula-
tion with initial state 1 (hot).

Put differently, when the initial regime is state 2 (cold), lower p's produce higher option
prices. This is because when p is low, the probability of remaining in the initial state
is high. Option prices produced by the no-regime-switching model are close to the ones
generated from the 2-state HMM model with p = 0.1 and initial state 2 in Table 2.11. This
is expected since a very small p indicates very slow switching from the “cold” state to the
“hot” state within the context of the December contract period.

We utilise an increment level of 0.2 for the sensitivity analyses involving θ∗, α, and ξ.
The optimal estimates listed in Table 2.10 are regarded as the benchmark values, to which
increments are added or from which they are subtracted. Figure 2.12 depicts the option
prices computed through varying values of θ1∗. It is apparent that a change in θ1∗ greatly
influences the option price. With initial state 1, as θ1∗ increases, the option price
dramatically decreases and with initial state 2, we have the opposite conclusion for the
option price. This is because when the mean level θ∗ is low (cold), HDD is high, giving a
high positive HDD payoff. Figure 2.13 depicts the impact of varying θ2∗ on the option
price. Even though the prices decline as θ2∗ increases, similar to the trend for θ1∗, the
prices under θ2∗ are a bit more spread out upwards compared to those under θ1∗. This is
because θ2∗ represents the mean level under a cold state. Once we increase θ2∗, even though
option prices decline, any subsequent switches to the hot state are not sufficient to
drastically perturb the price level under state 2.

Figures 2.14 and 2.15 display the respective plots of option prices obtained by varying the
values of α1 and α2. Although a slight increase in option price can be identified with a
corresponding increase in α1, no drastic fluctuations appear under either the “cold” or the
“hot” initial regime. The same can be observed in the option prices when α2 is varied in
Figure 2.15. However, the option prices are slightly higher under initial state 2 for both
α1 and α2.

The impact of ξ1 on the option price starting with the respective “hot” and “cold” states
is illustrated in Figure 2.16. The effect of varying ξ2 on the option price is presented
in Figure 2.17. The option price decreases with increasing ξ1 irrespective of whether the
initial state is 1 or 2, whilst the opposite is true for the option price when ξ2 increases.
Moreover, ξ1 seems to have a slightly greater effect on the variability of option prices
than ξ2 does given the same increment. An increasing volatility under state 1 (hot) leads
to a rising temperature level, which in turn yields a decreasing option price. The results
also suggest that the variation of ξ2 under initial state 2 contributes more to the
variation of the option prices when compared to those obtained from the same variation
levels of ξ2 under the initial state 1, as shown in Figures 2.16 and 2.17. A higher
volatility ξ2 under the initial “cold” state results in higher option prices, which is
reinforced by the fact that the sensitivity analysis is conducted for the cold month of
December in our illustrative case.

Remark 4: The standard errors of option prices calculated in (2.47) of subsection 2.6.4,
as well as of those displayed in Figure 2.11 and Table 2.11 of subsection 2.6.5 fall in the
range [$6.67, $9.61].

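| Model / initial state | p | Kc = 520 | Kc = 530 | Kc = 540 | Kc = 550 | Kc = 560 | Kc = 570 | Kc = 580 |
|---|---|---|---|---|---|---|---|---|
| 1-state model | | 1993.40 | 1798.78 | 1607.25 | 1420.29 | 1239.71 | 1066.88 | 903.97 |
| Initial state 1 (Hot state) | 0.1 | 1642.04 | 1442.20 | 1242.37 | 1042.54 | 842.71 | 642.88 | 443.05 |
| Initial state 1 (Hot state) | 0.3 | 1652.74 | 1452.91 | 1253.08 | 1053.25 | 853.42 | 653.59 | 453.76 |
| Initial state 1 (Hot state) | 0.5 | 1808.83 | 1609.00 | 1409.17 | 1209.34 | 1009.51 | 809.68 | 609.85 |
| Initial state 1 (Hot state) | 0.7 | 1918.75 | 1718.92 | 1519.09 | 1319.26 | 1119.43 | 919.60 | 719.77 |
| Initial state 1 (Hot state) | 0.9 | 1919.37 | 1719.54 | 1519.71 | 1319.88 | 1120.05 | 920.22 | 720.39 |
| Initial state 2 (Cold state) | 0.1 | 2045.34 | 1845.51 | 1645.68 | 1445.85 | 1246.02 | 1046.19 | 846.36 |
| Initial state 2 (Cold state) | 0.3 | 2037.09 | 1837.26 | 1637.43 | 1437.60 | 1237.77 | 1037.94 | 838.11 |
| Initial state 2 (Cold state) | 0.5 | 1925.11 | 1725.28 | 1525.45 | 1325.62 | 1125.79 | 925.95 | 726.12 |
| Initial state 2 (Cold state) | 0.7 | 1908.94 | 1709.16 | 1509.28 | 1309.45 | 1109.62 | 909.78 | 709.95 |
| Initial state 2 (Cold state) | 0.9 | 1863.26 | 1663.43 | 1463.60 | 1263.77 | 1063.94 | 864.11 | 664.28 |

Table 2.11: HDD call option price for varying intensities p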

[Figure 2.12: HDD call option price for varying θ1∗. Two panels (initial state 1; initial state 2) plot the HDD call option price against strike HDD (540 to 620) for θ1∗ = −0.4084, −0.2084, −0.0084, 0.1916, 0.3916.]

[Figure 2.13: HDD call option price for varying θ2∗. Two panels (initial state 1; initial state 2) plot the HDD call option price against strike HDD (540 to 620) for θ2∗ = −0.6779, −0.4779, −0.2779, −0.0779, 0.1221.]

[Figure 2.14: HDD call option price for varying α1. Two panels (initial state 1; initial state 2) plot the HDD call option price against strike HDD (540 to 620) for α1 = 0.1361, 0.3361, 0.5361, 0.7361, 0.9361.]

[Figure 2.15: HDD call option price for varying α2. Two panels (initial state 1; initial state 2) plot the HDD call option price against strike HDD (540 to 620) for α2 = 0.2604, 0.4604, 0.6604, 0.8604, 1.0604.]

[Figure 2.16: HDD call option price for varying ξ1. Two panels (initial state 1; initial state 2) plot the HDD call option price against strike HDD (540 to 620) for ξ1 = 0.3301, 0.5301, 0.7301, 0.9301, 1.1301.]

[Figure 2.17: HDD call option price for varying ξ2. Two panels (initial state 1; initial state 2) plot the HDD call option price against strike HDD (540 to 620) for ξ2 = 0.5301, 0.7301, 0.9301, 1.1301, 1.3301.]

2.7 Conclusion
We implemented the parameter estimation of a temperature model, comprising seasonal
and stochastic components with mean reversion, designed to accurately capture the
characteristics unique to the dynamics of the DATs. As the model parameters are modulated
by an HMM, the HMM-based filtering technique, via a change of reference probability
measure, along with the EM algorithm was employed to provide self-tuning parameters. We
tested the model and estimation method on a 4-year data set of Toronto DATs. The one-step
ahead forecasting errors and the AIC metrics were generated, and on their bases, the
2-state HMM is deemed to adequately model the 4-year Toronto DATs data set. Furthermore, we
derived expressions for the prices of HDD futures and option contracts by taking into
account the price of risk process. Sensitivity analyses, under a 2-state HMM setting, were
also performed by probing the behaviour of the option price as the intensity of the
transition matrix and other model parameters were varied. We found that the mean-reverting
level θ and volatility ξ have significant effects on pricing temperature-linked options.

Our work complements that in [14] and further elevates the use of regime-switching models
for temperature modelling to a level that supports an interactive platform, given a data set,
for the pricing and risk management of weather derivatives. A natural extension of this
work is to investigate dynamic risk management under our setting, using the filtered and
optimal parameter estimates obtained from our estimation approach. The results could
then be benchmarked against those from current estimation methodologies for weather-linked
contracts. A stress-testing comparison using the framework put forward here and other
modelling approaches could be pursued as well in the context of portfolio optimisation,
in the spirit of [24] for instance; of risk measurement, similar to the objectives of [25]
but for weather derivatives instead; and of modelling weather futures prices that constitute
a multivariate time series, resembling the analysis in [12].
References

[1] P. Alaton, B. Djehiche, D. Stillberger, On modelling and pricing weather derivatives,
Applied Mathematical Finance, 9 (1) (2002) 1–20.

[2] A. Alexandridis, A. Zapranis, Weather Derivatives: Modeling and Pricing Weather-Related
Risk, Springer-Verlag, New York (2015).

[3] F. Bellini, The weather derivatives market: modelling and pricing temperature, Ph.D.
Thesis, (2005) University of Lugano, Switzerland.

[4] F. Benth, W. Hardle and B. Cabrera, Pricing of Asian temperature risk, in Anonymous
Berlin, Heidelberg: Springer Berlin Heidelberg, (2011) 163–199.

[5] F. Benth, J. Šaltytė-Benth, Weather derivatives and stochastic modelling of temperature,
International Journal of Stochastic Analysis, (2011), 1–21. doi:10.1155/2011/576791

[6] F. Benth, J. Šaltytė-Benth, Stochastic modelling of temperature variations with a view
towards weather derivatives, Applied Mathematical Finance, 12 (1) (2005), 53–85.

[7] D. R. Cox, H. D. Miller, The Theory of Stochastic Processes. London: Methuen, (1965).

[8] S. Campbell, F. Diebold, Weather forecasting for weather derivatives, Journal of the
American Statistical Association, 100 (469) (2005) 6–16.

[9] M. Cao, J. Wei, Weather derivatives valuation and market price of weather risk, Journal
of Futures Markets, 24 (11) (2004) 1065–1089.

[10] W. Ching, T. Siu, L. Li, Pricing exotic options under a high-order Markovian regime
switching model, Journal of Applied Mathematics and Decision Sciences, (2007) 1–15,
doi:10.1155/2007/18014.

[11] G. Considine, Introduction to weather derivatives, CME Group Web,
http://web.math.pmf.unizg.hr/ amimica/pub/lapl.pdf, Accessed March 2016.

[12] P. Date, R. Mamon, A. Tenyakov, Filtering and forecasting commodity futures prices
under an HMM framework, Energy Economics, 40 (2013) 1001–1013.

[13] M. Davis, Pricing weather derivatives by marginal value, Quantitative Finance, 1 (3)
(2001) 305–308.

[14] B. Dischel, At least: a model for weather risk, Weather Risk Special Report, Energy
and Power Risk Management, March issue (1998) 30–32.

[15] G. Dorfleitner, M. Wimmer, The pricing of temperature futures at the Chicago
Mercantile Exchange, Journal of Banking and Finance, 34 (6) (2010) 1360–1370.

[16] J. Dutton, Opportunities and priorities in a new era for weather and climate services,
Bulletin of the American Meteorological Society, 83 (9) (2002) 1303–1311.

[17] R. Elias, M. Wahab, L. Fang, A comparison of regime-switching temperature modeling
approaches for applications in weather derivatives, European Journal of Operational
Research, 232 (3) (2014) 549–560.

[18] R. Elliott, Exact adaptive filters for Markov chains observed in Gaussian noise,
Automatica, 30 (1994) 1399–1408.

[19] R. Elliott, L. Aggoun, J. Moore, Hidden Markov Models: Estimation and Control,
Springer, New York (1995).

[20] R. Elliott, T. Siu, A. Badescu, Bond valuation under a discrete-time regime switching
term-structure model and its continuous-time extension, Managerial Finance, 37 (11)
(2011) 1025–1047.

[21] R. Elliott, V. Krishnamurthy, New finite-dimensional filters for parameter estimation
of discrete-time linear Gaussian models, IEEE Transactions on Automatic Control,
44 (5) (1999) 938–951.

[22] C. Erlwein, R. Mamon, An online estimation scheme for a Hull–White model with
HMM-driven parameters, Statistical Methods and Applications, 18 (1) (2009) 87–107.

[23] C. Erlwein, F. Benth, R. Mamon, HMM filtering and parameter estimation of an
electricity spot price model, Energy Economics, 32 (5) (2010) 1034–1043.

[24] C. Erlwein, R. Mamon, M. Davison, An examination of HMM-based investment
strategies for asset allocation, Applied Stochastic Models in Business and Industry,
27 (3) (2011) 204–221.

[25] H. Gao, R. Mamon, X. Liu, Risk measurement of a guaranteed annuity option under
a stochastic modelling framework, Mathematics and Computers in Simulation, 132
(2017) 100–119.

[26] H. Geman, M. Leonardi, Alternative approaches to weather derivatives pricing,
Managerial Finance, 31 (6) (2005) 46–72.

[27] G. Grimmett, D. Stirzaker, Probability and Random Processes, Oxford University
Press, Oxford (2001).

[28] S. Jewson, Introduction to weather derivative pricing, Journal of Alternative
Investments, 7 (2) (2004) 57–64.

[29] S. Jewson, A. Brix, C. Ziehmann, Weather Derivative Valuation: The Meteorological,
Statistical, Financial and Mathematical Foundations, Cambridge University Press,
Cambridge (2005).

[30] M. Moreno, Riding the temp, Weather Derivatives, FOW Special Supplement,
December (2000).

[31] J. Norris, Markov chains, Cambridge University Press, Cambridge & New York
(1997).

[32] T. Richards, M. Manfredo, D. Sanders, Pricing weather derivatives, American Journal
of Agricultural Economics, 86 (4) (2004) 1005–1017.

[33] T. Siu, C. Erlwein, R. Mamon, The pricing of credit default swaps under a
Markov-modulated Merton’s structural model, North American Actuarial Journal, 12 (1)
(2008) 19–46.

[34] A. van der Vaart, Asymptotic Statistics, Cambridge University Press, Cambridge and
New York (1998).

[35] R. Weron, Modeling and forecasting electricity loads and prices: A statistical
approach. Hoboken, NJ; Chichester, England; John Wiley and Sons. (2006)

[36] X. Xi, R. Mamon, Parameter estimation of an asset price model driven by a weak
hidden Markov chain, Economic Modelling, 28 (1) (2011), 36–46.

[37] H. Xiong, R. Mamon, A self-updating model driven by a higher-order hidden Markov
chain for temperature dynamics, Journal of Computational Science, 17 (2016) 47–61.

[38] W. Xu, M. Odening, O. Musshoff, Indifference pricing of weather derivatives,
American Journal of Agricultural Economics, 90 (4) (2008) 979–993.

Chapter 3

A self-updating model driven by a higher-order hidden Markov chain for
temperature dynamics

3.1 Introduction

Over $2.4 trillion in economic losses and nearly 2 million deaths globally have been
reported as a result of weather-related hazards since 1971; see [34]. Catastrophes, such as
floods, tornadoes and hurricanes, are well-publicised because of the heavy casualties and
enormous property losses that they cause. But, whilst these weather-driven disasters are
getting more attention, we note that even slight aberrations in weather conditions, although
non-catastrophic, can still be matters of consequence. They may not produce tremendous
immediate losses, yet their accompanying risks could materialise later with greater
frequency and have a more widespread impact on business and the economy. Weather risk is
well illustrated, for instance, by crop yields that may drop substantially due to prolonged
droughts or rainy seasons. Poor weather hugely affects planned construction schedules. Cold
winters increase the operational costs of energy consumers.

Dutton [13] concluded that weather and climate risks exist in nearly 33.3% of private
industry activities and that 39.1% of the U.S. gross domestic product is weather-sensitive.
In order to hedge the risks associated with such non-catastrophic weather conditions, the
so-called weather derivatives were created. Their values depend on underlying variables
governed by weather-related measurements, such as temperature, humidity, rainfall and
snowfall. As documented in Considine [8], the first weather-derivative transaction took
place in 1997; it was executed by Aquila Energy and embedded in a power contract. It is
noted in Cao and Wei [9] that the Chicago Mercantile Exchange (CME) launched the first
exchange-traded weather derivatives linked to temperature in 1999. Since then, the volume
of weather derivatives has grown rapidly in both the exchange and over-the-counter (OTC)
markets. Notional values of weather derivative contracts processed in the exchange market
are greater than those in the OTC market. Yet, interestingly, investors and other market
participants are more actively involved in the OTC market according to the survey reported
by PricewaterhouseCoopers (PwC); see [28].

In contrast to traditional financial derivatives, whose underlying assets are typically
bonds and stocks, weather derivatives depend on non-negotiable underlying indices that
have no price themselves. Such indices are introduced to quantify weather phenomena
but are also employed to form the basis of a weather derivative contract. As stated
by Elias et al. [14], over 90% of weather derivatives are temperature-based. Weather
futures and options traded in the CME are written on temperature heating degree day (HDD),
cooling degree day (CDD) and cumulative average temperature (CAT) indices. The most
distinctive feature of these temperature-based indices is that, unlike conventional
underlying financial assets, they are neither tradeable nor storable. Therefore, the
traditional no-arbitrage pricing approach, utilised in the Black-Scholes modelling
framework, is not applicable to pricing weather derivatives [10]. Without delving into the
challenges of derivative valuation concerning the choice of an appropriate pricing measure,
which are outside the scope of this research, our aim is to develop a model that can
accurately capture the salient features of the evolution of temperature data. Our proposed
model may then be useful in accurately pricing weather derivatives.

In the literature, several studies were conducted to address the pricing of weather deriva-
tives. It is asserted in Dorfleitner and Wimmer [12] that the historical burn analysis (HBA)
is normally adopted by most investors given its straightforwardness and ease of replication.
Nevertheless, Jewson and Brix [25] argued that HBA is bound to be biased and prone to
large pricing errors, and so, time-series methods are proposed in [25] instead. Davis [10]
dealt with weather derivative pricing by relying on the marginal substitution value principle
of mathematical economics.

More recent studies considered the Ornstein-Uhlenbeck (OU) process in reproducing the
dynamic behaviour of the temperature series. In comparison to simulating temperature
indices in the HBA method, modelling the daily average temperature (DAT) directly
generates more accurate outcomes since it contains more information under both regular and
extreme conditions. The estimated DAT model is then utilised to construct the corresponding
temperature indices. These indices are used, in turn, to price various weather derivatives
and to support their risk management. Dischel [11] first proposed the concept of adopting a
continuous-time stochastic process to capture the dynamics of temperature. Alaton et al. [1]
improved Dischel’s model by incorporating seasonalities in the mean with a sinusoidal
function. Benth and Šaltytė-Benth [3] studied DAT variations with an OU process, where the
noise follows a generalised hyperbolic Lévy process.

However, the application of a one-state stochastic process in almost all of the
above-mentioned papers may not describe the behaviour of temperature with great accuracy,
especially during drastic changes in regimes or the emergence of entirely different states,
as in the occurrence of occasional climatic changes. This consideration inspires the
extension of the typical OU mean-reverting process with the addition of a regime-switching
feature. It appears that the research of Elias et al. [14] is so far the only paper that
employs a regime-switching approach to model the stochastic behaviour of temperature. They
applied lattices to construct the corresponding models and concluded that the HDD and CDD
generated by a two-state process were closer to the observed data than those obtained by a
single stochastic process.

To advance several steps beyond the accomplishments in [14], the purpose of this work is
to develop self-calibrating higher-order hidden Markov model (HOHMM) filtering algorithms
that provide improved accuracy in capturing the behaviour and features of temperature,
which will be beneficial for better valuation and risk management of weather futures and
option products. An HOHMM of order k is an HMM in which the hidden chain depends on its
values up to lag k; in the literature, an HOHMM is also termed a weak HMM, as the
memoryless assumption is weakened or relaxed to account for information revealed in the
last k steps. The usual HMM has found many financial and economic applications, and
pioneering developments, with the illustration of a two-state regime-switching model, are
highlighted in Hamilton [22]. Elliott et al. [6] made significant advances in the
estimation methodology of HMMs via the change of measure technique for model
identification after processing batches of data. Since then, researchers have put forward
HMM-modulated models to address various problems in quantitative finance, insurance,
economics, epidemiology, and other branches of the sciences and engineering.
Erlwein and Mamon [18] is a precursor to our development of an HOHMM driving a
mean-reverting process. We note that Mamon et al. [26] devised a 2-state HMM in modelling
commodity prices and then analysed h-step ahead price predictions. Nonetheless, the
memoryless property in the regular Markov assumption is a real drawback, as models must
also capture long-range dependency in the observed economic and financial data. To this
end, Xi and Mamon [32], for instance, promoted the modelling of a risky asset’s log returns
with the use of an HOHMM for a more flexible framework and the possibility of better
forecasting performance. This promising approach is primarily applied in speech
recognition, although its application in finance has gradually gained momentum in recent
years. For example, Ching et al. [7] priced exotic options assuming the variable’s dynamics
are under the HOHMM setting. With the aid of a higher-order Markov model, Siu et al. [28]
investigated some aspects of risk management and examined the model’s influence on risk
measures through backtesting.

The departure of our contributions from the current state of the art in Markov-switching
temperature modelling is highlighted by the following accomplishments in the study: (i)
Our proposed modelling approach simultaneously captures four important stylised facts of
temperature evolution, which are the mean-reversion, seasonality, memory and stochas-
ticity. (ii) Although adopted from the literature in Markovian regime-switching models
for economic and financial variables, to the best of our knowledge, this research is the
first to put forward a dynamic estimation procedure that readily gives parameter updates
whenever a new set of data becomes available. (iii) The embedding of the HOHMM in
the OU-modelling framework is also new in an attempt to capture the memory property in
the temperature data series. (iv) Within the OU-modelling set-up, we develop the filtering
results relying on a transformation that converts a second-order HOHMM into a regular
HMM [26]; hence, our approach enables the estimation of an HOHMM of higher order to be
implemented via a reduction of the HOHMM order. (v) Finally, systematic and sufficient model
implementation details are laid out including various diagnostics and validation based on
techniques from statistical analysis and inference. Such details are beneficial for practition-
ers in building efficient computing platforms and interfaces with their business software as
well as to other scientists in constructing a modelling framework that examines the move-
ments and ability to control the dynamics of similar or related phenomena.

The remaining parts of this chapter are structured as follows. Section 3.2 presents the for-
mulation of temperature modelling with a discrete-time higher-order hidden Markov chain
governing the model parameters. In Section 3.3, the change of probability measure is ap-
plied to derive recursive filtering equations for quantities that are functions of HOHMM and
necessary to carry out an online parameter estimation scheme in Section 3.4. The numer-
ical implementation of our proposed model and estimation method is detailed in Section
3.5 involving a data set of daily temperatures collected at the Toronto Pearson Airport. The
determination of the most appropriate model setting is given in Section 3.5 by comparing
the respective forecasting performance and penalised-log likelihood of different competing
set ups. Section 3.6 concludes.

3.2 Model description

3.2.1 Model for temperature derivatives


The majority of futures and option contracts on weather derivatives in the OTC and
exchange-traded markets are written on HDD, CDD and CAT measurements. We, therefore,
concentrate on the modelling of these indices. In particular, HDD and CDD were introduced
to measure the demand for heating and cooling, respectively, at a certain location, and
CAT is simply the sum of DATs over the contract period. Their corresponding values over
the time interval $[\tau_1, \tau_2]$ with $0 \le \tau_1 < \tau_2$ are given by

$$\mathrm{HDD} = \sum_{t=\tau_1}^{\tau_2} \max\left(T_{\mathrm{base}} - T_t,\, 0\right),$$

$$\mathrm{CDD} = \sum_{t=\tau_1}^{\tau_2} \max\left(T_t - T_{\mathrm{base}},\, 0\right)
\quad\text{and}\quad \mathrm{CAT} = \sum_{t=\tau_1}^{\tau_2} T_t,$$

where $T_{\mathrm{base}}$ is the base temperature, usually given as 65°F or 18°C in the
market, and $T_t$ is the daily average temperature on day $t$, computed as
$(T_{\max} + T_{\min})/2$. The contracts written on these indices have a weekly, monthly or
seasonal duration. HDD contracts typically cover the cold months from October in one year
to April of the following year, whilst the period of the CDD contracts is the warm season
that runs from April to October of the same year. The overlapping months, October and
April, are called the transition or shoulder months.
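The index definitions above translate directly into a few lines of code. The sketch below is illustrative only: the function names `daily_average` and `degree_day_indices` and the sample readings are assumptions made for this example, and the base temperature is taken to be 18°C.

```python
def daily_average(t_max, t_min):
    """Daily average temperature (DAT): midpoint of the daily max and min."""
    return (t_max + t_min) / 2.0


def degree_day_indices(dats, t_base=18.0):
    """Return (HDD, CDD, CAT) accumulated over a sequence of DATs.

    dats   : daily average temperatures for days tau_1, ..., tau_2
    t_base : base temperature (18 degrees C assumed here)
    """
    hdd = sum(max(t_base - t, 0.0) for t in dats)
    cdd = sum(max(t - t_base, 0.0) for t in dats)
    cat = sum(dats)
    return hdd, cdd, cat


# Hypothetical three-day contract period (made-up readings in degrees C).
highs = [10.0, 22.0, 30.0]
lows = [2.0, 14.0, 20.0]
dats = [daily_average(hi, lo) for hi, lo in zip(highs, lows)]
hdd, cdd, cat = degree_day_indices(dats)
print(hdd, cdd, cat)  # prints: 12.0 7.0 49.0
```

Only the first day lies below the base temperature and only the third lies above it, so each of HDD and CDD receives a single non-zero contribution.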

Some authors of previous papers (e.g., Dorfleitner and Wimmer [12]) attempted to price
temperature derivatives using an index-modelling approach, that is, by modelling the
aforesaid indices directly. However, Benth et al. [2] argued that it is not appropriate to
model the indices directly to derive option prices because the dynamics of the futures
price cannot even be obtained. Thus, for the valuation of temperature derivatives, we focus
on modelling $T_t$, which yields the needed indices as well as the dynamic representation
of DATs.

Suppose $(\Omega, \mathcal{F}, P)$ is an underlying probability space that supports the
modelling of the observed process $T_t$, the DAT on day $t$. Following Benth et al. [2],

$$T_t = X_t + S_t. \tag{3.1}$$

Equation (3.1) has two components: $X_t$, which is assumed to follow an OU process with
HOHMM-governed parameters, and $S_t$, which is a deterministic function devised to pin
down the seasonal trends and the DATs’ mean reversion. As in Campbell and Diebold [5], the
seasonal component is given by

$$S_t = at + b + \sum_{h=1}^{3}\left[c_h \sin\!\left(\frac{2\pi}{365}\, d_h t\right)
      + e_h \cos\!\left(\frac{2\pi}{365}\, d_h t\right)\right], \tag{3.2}$$

where $d_1 = 1$, $d_2 = 2$, $d_3 = 4$ to cover the yearly, semi-annual and quarterly
patterns, respectively.
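As a small illustration of (3.2), the following sketch evaluates the seasonal component for given coefficients. The function name `seasonal_component` and all coefficient values are made up for the example; they are not estimates from the thesis.

```python
import math

HARMONICS = (1, 2, 4)  # d_1, d_2, d_3 in (3.2)


def seasonal_component(t, a, b, c, e):
    """Evaluate S_t = a*t + b + sum_h [c_h sin(w d_h t) + e_h cos(w d_h t)],
    with w = 2*pi/365, as in (3.2)."""
    omega = 2.0 * math.pi / 365.0
    value = a * t + b
    for c_h, e_h, d_h in zip(c, e, HARMONICS):
        value += c_h * math.sin(omega * d_h * t) + e_h * math.cos(omega * d_h * t)
    return value


# Made-up coefficients, for illustration only.
a, b = 0.0001, 9.0
c = [-4.0, 0.5, 0.1]
e = [-12.0, 0.3, 0.2]
start_of_year = seasonal_component(0, a, b, c, e)
mid_year = seasonal_component(182, a, b, c, e)
```

With the linear trend switched off (a = 0), the function is periodic with period 365 days, which is a quick sanity check on the harmonic terms.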

3.2.2 Ornstein-Uhlenbeck (OU) process in the temperature model


We consider an OU model for the Xt component of the temperature model in (3.1). In the
succeeding discussions, we denote all vectors by bold English/Greek letters in lowercase
and all matrices are represented by bold capitalised English/Greek letters.

3.2.2.1 HMM-modulated OU process

Consider the stochastic differential equation (SDE) for the OU process $X_t$ given by

$$dX_t = \alpha(\theta - X_t)\,dt + \xi\,dB_t, \tag{3.3}$$

where $B_t$ is a standard Brownian motion on some probability space. By Itô’s lemma and
for $s \le t$, the SDE in (3.3) has the solution

$$X_t = X_s e^{-\alpha(t-s)} + \left(1 - e^{-\alpha(t-s)}\right)\theta
      + \xi e^{-\alpha t}\int_s^t e^{\alpha u}\,dB_u. \tag{3.4}$$

By utilising the Euler approximation, the discretised version of (3.4) is

$$X_{k+1} = e^{-\alpha \Delta t_{k+1}} X_k + \left(1 - e^{-\alpha \Delta t_{k+1}}\right)\theta
          + \xi\sqrt{\frac{1 - e^{-2\alpha \Delta t_{k+1}}}{2\alpha}}\; z_{k+1}, \tag{3.5}$$

where $\Delta t_{k+1} = t_{k+1} - t_k$ and $\{z_{k+1}\}$ is a sequence of independent and
identically distributed (IID) standard normal random variables.
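A minimal simulation sketch of the recursion (3.5) is given below. The names `ou_step` and `simulate_ou`, the seed and the parameter values are all illustrative assumptions rather than anything prescribed in the text.

```python
import math
import random


def ou_step(x, alpha, theta, xi, dt, z):
    """One step of (3.5): move X_k to X_{k+1} given a standard normal draw z."""
    decay = math.exp(-alpha * dt)
    std = xi * math.sqrt((1.0 - math.exp(-2.0 * alpha * dt)) / (2.0 * alpha))
    return decay * x + (1.0 - decay) * theta + std * z


def simulate_ou(x0, alpha, theta, xi, dt, n_steps, seed=0):
    """Simulate X_0, ..., X_{n_steps} on an equally spaced time grid."""
    rng = random.Random(seed)
    path = [x0]
    for _ in range(n_steps):
        path.append(ou_step(path[-1], alpha, theta, xi, dt, rng.gauss(0.0, 1.0)))
    return path


# One year of daily steps with made-up parameter values.
path = simulate_ou(x0=5.0, alpha=0.3, theta=0.0, xi=2.0, dt=1.0, n_steps=365)
```

Setting z = 0 in `ou_step` isolates the deterministic pull towards θ, which is a convenient way to check the mean-reversion term on its own.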

We assume that we have a homogeneous Markov chain $y_k$ with finite states in discrete
time. Its state space is associated with the canonical basis $\{e_1, e_2, \ldots, e_N\}$,
where $e_i = (0, \ldots, 0, 1, 0, \ldots, 0)^\top \in \mathbb{R}^N$ with 1 in the $i$th
position, $N$ stands for the total number of states, and $\top$ denotes the transpose of a
matrix. In a typical OU-HMM setting (e.g., Date et al. [9]), $X_k$ with representation in
(3.5) has parameters dependent on the Markov chain $y_k$, i.e.,

$$X_{k+1} = e^{-\alpha(y_k)\Delta t_{k+1}} X_k
          + \left(1 - e^{-\alpha(y_k)\Delta t_{k+1}}\right)\theta(y_k)
          + \xi(y_k)\sqrt{\frac{1 - e^{-2\alpha(y_k)\Delta t_{k+1}}}{2\alpha(y_k)}}\; z_{k+1}. \tag{3.6}$$

In equation (3.6), the speed of mean reversion $\alpha$, the mean level of the process
$\theta$, and the volatility $\xi$ are all governed by $y_k$, making the model
regime-switching. Given the representation of the Markov chain’s state space,
$\alpha_k := \alpha(y_k) = \langle \alpha, y_k\rangle$,
$\theta_k := \theta(y_k) = \langle \theta, y_k\rangle$, and
$\xi_k := \xi(y_k) = \langle \xi, y_k\rangle$, where $\langle\cdot,\cdot\rangle$ is the
inner product in $\mathbb{R}^N$.
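The inner-product notation simply selects the parameter value of the current regime. The sketch below illustrates this for a first-order chain with unit-vector states; the helper names, the transition matrix `p` and the parameter vectors are hypothetical.

```python
import random


def unit_vector(i, n):
    """Canonical basis vector e_i of R^n (i is 1-based, as in the text)."""
    return [1.0 if j == i - 1 else 0.0 for j in range(n)]


def inner(u, v):
    """Inner product in R^n."""
    return sum(a * b for a, b in zip(u, v))


def next_state(y, p, rng):
    """Draw y_{k+1} given y_k, where p[j][i] = P(y_{k+1} = e_{j+1} | y_k = e_{i+1})."""
    n = len(y)
    probs = [inner(row, y) for row in p]  # the column of p selected by y
    u, cumulative = rng.random(), 0.0
    for j, prob in enumerate(probs):
        cumulative += prob
        if u < cumulative:
            return unit_vector(j + 1, n)
    return unit_vector(n, n)  # guard against rounding in the cumulative sum


# Hypothetical two-regime parameter vectors and transition matrix.
alpha = [0.2, 0.6]
theta = [-1.0, 3.0]
xi = [1.5, 2.5]
p = [[0.95, 0.10],
     [0.05, 0.90]]

rng = random.Random(42)
y = unit_vector(1, 2)
for _ in range(5):
    alpha_k, theta_k, xi_k = inner(alpha, y), inner(theta, y), inner(xi, y)
    y = next_state(y, p, rng)
```

The values `alpha_k`, `theta_k` and `xi_k` are the ones that would enter (3.6) at each step.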

3.2.2.2 HOHMM-modulated Ornstein-Uhlenbeck process

The second-order hidden Markov chain will be used as a prototype to develop the HOHMM
setting; the prototype’s simplicity will facilitate the discussion of concepts associated
with a generalised Markov chain of lag $k$ ($k = 1, 2, \ldots$). Under this setting, the
parameters $\alpha$, $\theta$ and $\xi$ are governed by a second-order Markov chain
$y^w_k$, which is defined on the stochastic basis
$(\Omega, \mathcal{F}, \{\mathcal{F}_k\}, P)$ with the canonical basis
$\{e_1, e_2, \ldots, e_N\}$. Write $\mathcal{F}_k := \mathcal{F}^w_k \vee \mathcal{F}^B_k$
for the global filtration, where $\mathcal{F}^w_k$ is the filtration generated by
$\sigma\{y^w_0, \ldots, y^w_k\}$ and $\mathcal{F}^B_k$ is the filtration generated by $B_t$.

A discrete-time second-order hidden Markov chain $y^w_k$ at current time $k$ depends on
the states that occurred at the two prior times $k-1$ and $k-2$. The key principle in the
filtering and estimation of an HOHMM is the use of a mapping that transforms an HOHMM into
a regular HMM. In our case, a mapping, say, $\zeta$ converts the second-order Markov chain
into the usual Markov chain. The transformation $\zeta$ is defined as

$$\zeta(e_b, e_c) = e_{bc}, \quad \text{for } 1 \le b, c \le N, \tag{3.7}$$

where $e_{bc}$ denotes a unit vector with 1 in its $((b-1)N + c)$th position. Then, the new
Markov chain satisfies the relation

$$\langle \zeta(y^w_{k+1}, y^w_k), e_{bc}\rangle
  = \langle y^w_{k+1}, e_b\rangle\,\langle y^w_k, e_c\rangle \tag{3.8}$$

and has the semi-martingale representation

$$\zeta(y^w_{k+1}, y^w_k) = \Pi\,\zeta(y^w_k, y^w_{k-1}) + \delta^w_{k+1}, \tag{3.9}$$

where $\{\delta^w_{k+1}\}_{k\ge 1}$ is a sequence of martingale increments, so that
$E^P[\delta^w_{k+1} \mid \mathcal{F}^w_k] = 0$, and $\Pi$ is the associated
$N^2 \times N^2$ probability transition matrix.
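The mapping ζ in (3.7) is the Kronecker product of two unit vectors, which makes relation (3.8) easy to verify numerically. The following sketch uses hypothetical helper names and integer-valued vectors for readability.

```python
def unit_vector(i, n):
    """Canonical basis vector e_i of R^n (i is 1-based, as in the text)."""
    return [1 if j == i - 1 else 0 for j in range(n)]


def zeta(y_new, y_old):
    """Map the pair (y_{k+1}, y_k) of unit vectors in R^N to a unit vector in R^(N^2).

    If y_new = e_b and y_old = e_c, the result has its single 1 in the
    (1-based) position (b - 1) * N + c, as in (3.7).
    """
    return [u * v for u in y_new for v in y_old]


def inner(u, v):
    return sum(a * b for a, b in zip(u, v))


# Check relation (3.8) for every pair of states with N = 3.
N = 3
relation_holds = all(
    inner(zeta(unit_vector(p, N), unit_vector(q, N)), zeta(unit_vector(b, N), unit_vector(c, N)))
    == inner(unit_vector(p, N), unit_vector(b, N)) * inner(unit_vector(q, N), unit_vector(c, N))
    for p in range(1, N + 1) for q in range(1, N + 1)
    for b in range(1, N + 1) for c in range(1, N + 1)
)
```

Here `zeta(e_b, e_c)` plays the role of e_bc, so the left-hand side of the check is the inner product in (3.8).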

3.3 Recursive filtering

3.3.1 Reference probability measure


Deriving recursive filters under the real-world probability $P$, where there is dependence
amongst the observations, presents some difficulty. We construct an ideal probability
measure $Q$ that makes computations manageable because, under such an ideal measure, the
observed $X_k$’s are IID. The calculations are then carried back to the measure $P$ via an
appropriate discretised version of the Girsanov density.

Suppose $y^w_k$ is a second-order hidden Markov chain that drives the model parameters of
$X_k$. Write

$$h_{dcb} := P\left(y^w_{k+1} = e_d \mid y^w_k = e_c,\; y^w_{k-1} = e_b\right),$$

where $k \ge 1$ and $d, c, b \in \{1, 2, \ldots, N\}$. As in Siu et al. [28], $H$ is an
associated $N \times N^2$ probability transition matrix given by

$$H = \begin{pmatrix}
h_{111} & h_{112} & \cdots & h_{11N} & \cdots & h_{1N1} & h_{1N2} & \cdots & h_{1NN} \\
h_{211} & h_{212} & \cdots & h_{21N} & \cdots & h_{2N1} & h_{2N2} & \cdots & h_{2NN} \\
\vdots  & \vdots  & \ddots & \vdots  & \cdots & \vdots  & \vdots  & \ddots & \vdots  \\
h_{N11} & h_{N12} & \cdots & h_{N1N} & \cdots & h_{NN1} & h_{NN2} & \cdots & h_{NNN}
\end{pmatrix}.$$


Since we constructed $\zeta$ to transform a second-order Markov chain into a regular Markov
chain, its corresponding $N^2 \times N^2$ matrix of transition probabilities $\Pi$ can be
defined in terms of $H$ as

$$\Pi = \begin{pmatrix}
h_{111} & \cdots & h_{11N} & 0 & \cdots & 0 & \cdots & 0 & \cdots & 0 \\
0 & \cdots & 0 & h_{121} & \cdots & h_{12N} & \cdots & 0 & \cdots & 0 \\
\vdots & & \vdots & \vdots & & \vdots & \ddots & \vdots & & \vdots \\
0 & \cdots & 0 & 0 & \cdots & 0 & \cdots & h_{1N1} & \cdots & h_{1NN} \\
\vdots & & \vdots & \vdots & & \vdots & & \vdots & & \vdots \\
h_{N11} & \cdots & h_{N1N} & 0 & \cdots & 0 & \cdots & 0 & \cdots & 0 \\
0 & \cdots & 0 & h_{N21} & \cdots & h_{N2N} & \cdots & 0 & \cdots & 0 \\
\vdots & & \vdots & \vdots & & \vdots & \ddots & \vdots & & \vdots \\
0 & \cdots & 0 & 0 & \cdots & 0 & \cdots & h_{NN1} & \cdots & h_{NNN}
\end{pmatrix},$$

where $\pi_{ji} = h_{dcb} = P(y^w_{k+1} = e_d \mid y^w_k = e_c,\, y^w_{k-1} = e_b)$ for
$j = (d-1)N + c$, $i = (c-1)N + b$, and $\pi_{ji} = 0$ otherwise.
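The index rule for the entries of Π can be sketched in code as follows. The function name `build_pi`, the nested-list layout `h[d][c][b]` (0-based) and the numerical values are assumptions made for illustration.

```python
def build_pi(h):
    """Assemble the N^2 x N^2 matrix Pi from h[d][c][b] = h_{dcb} (0-based indices).

    Entry pi_{ji} = h_{dcb} sits in row j = d*N + c and column i = c*N + b,
    the 0-based counterpart of j = (d-1)N + c and i = (c-1)N + b.
    """
    n = len(h)
    pi = [[0.0] * (n * n) for _ in range(n * n)]
    for d in range(n):
        for c in range(n):
            for b in range(n):
                pi[d * n + c][c * n + b] = h[d][c][b]
    return pi


# Hypothetical 2-state example: for each (c, b), h[d][c][b] sums to 1 over d.
h = [
    [[0.9, 0.7], [0.4, 0.2]],  # d = 0
    [[0.1, 0.3], [0.6, 0.8]],  # d = 1
]
pi = build_pi(h)
column_sums = [sum(pi[j][i] for j in range(4)) for i in range(4)]
```

Each column of Π collects the probabilities of moving out of one pair of states, so every column sum should equal 1, consistent with the representation (3.9).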

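Under the HOHMM framework, the model for $X_k$ in (3.6) is modified to

$$X_{k+1} = \kappa(y^w_k)\,X_k + \vartheta(y^w_k) + \varrho(y^w_k)\,z_{k+1}, \tag{3.10}$$

where

$$\kappa(y^w_k) = e^{-\alpha(y^w_k)\Delta t_{k+1}}, \qquad
  \vartheta(y^w_k) = \left(1 - e^{-\alpha(y^w_k)\Delta t_{k+1}}\right)\theta(y^w_k), \qquad
  \varrho(y^w_k) = \xi(y^w_k)\sqrt{\frac{1 - e^{-2\alpha(y^w_k)\Delta t_{k+1}}}{2\alpha(y^w_k)}}.$$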
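Because the filters operate on $(\kappa, \vartheta, \varrho)$, it can be useful to move between these and the original $(\alpha, \theta, \xi)$ for a given regime. The sketch below is a hedged illustration: the inversion formulas are obtained by solving the three definitions above and are not stated in the text, the function names are hypothetical, and a constant time step is assumed.

```python
import math


def to_discrete(alpha, theta, xi, dt=1.0):
    """Map (alpha, theta, xi) to (kappa, vartheta, varrho) as in (3.10)."""
    kappa = math.exp(-alpha * dt)
    vartheta = (1.0 - kappa) * theta
    varrho = xi * math.sqrt((1.0 - kappa ** 2) / (2.0 * alpha))
    return kappa, vartheta, varrho


def to_continuous(kappa, vartheta, varrho, dt=1.0):
    """Invert the map above; requires 0 < kappa < 1 (i.e. alpha > 0)."""
    alpha = -math.log(kappa) / dt
    theta = vartheta / (1.0 - kappa)
    xi = varrho * math.sqrt(2.0 * alpha / (1.0 - kappa ** 2))
    return alpha, theta, xi


# Round trip with made-up values for a single regime.
discrete = to_discrete(alpha=0.3, theta=2.0, xi=1.5)
recovered = to_continuous(*discrete)
```

The round trip should return the starting values up to floating-point error.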

As we do not observe the state of $y^w_k$, it needs to be estimated. To this end, we work,
as noted above, under an ideal reference probability measure $Q$ whose existence is
justified by Kolmogorov’s extension theorem. Again, under $Q$, $\{z_k\}_{k\ge 1}$ is a
sequence of IID $N(0, 1)$ random variables that is also independent of $y^w_k$.

Let $\mathcal{X}_k = \sigma(X_0, X_1, \ldots, X_k)$. We introduce the Girsanov density to
back out $P$ from $Q$, given an $\mathcal{X}_k$-adapted process $\Psi^w_k$ defined by

$$\Psi^w_k = \left.\frac{dP}{dQ}\right|_{\mathcal{X}_k} = \prod_{l=1}^{k} \phi^w_l,
  \quad k \ge 1, \quad \Psi^w_0 = 1, \tag{3.11}$$

and

$$\phi^w_l = \frac{\varphi\!\left(\varrho(y^w_{l-1})^{-1}
  \left[X_l - \vartheta(y^w_{l-1}) - \kappa(y^w_{l-1})\,X_{l-1}\right]\right)}
  {\varrho(y^w_{l-1})\,\varphi(X_l)}.$$

3.3.2 Calculation of recursive filters


Our goal is to obtain optimal parameter estimates under the real-world measure $P$ through
self-updating filters that are functions of $y^w_k$. The conditional expectation of any
$\mathcal{F}^w$-adapted process given $\mathcal{X}_k$ is then needed. As mentioned above,
calculations under the measure $Q$ can be performed with ease. Moreover, Bayes’ theorem for
conditional expectations provides a connection between values calculated under $P$ and
those calculated under $Q$.

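We establish adaptive filtering processes, under $P$, for estimators of quantities that are
functions of $\zeta(y^w_{k+1}, y^w_k)$ as per equations (3.8) and (3.9), but by doing the
analysis and computations under the measure $Q$. Write

$$s_k(cb) := P\left(y^w_k = e_c,\; y^w_{k-1} = e_b \mid \mathcal{X}_k\right)
           = E\left[\langle \zeta(y^w_k, y^w_{k-1}), e_{cb}\rangle \mid \mathcal{X}_k\right] \tag{3.12}$$

and $s_k = (s_k(11), \ldots, s_k(cb), \ldots, s_k(NN))^\top \in \mathbb{R}^{N^2}$. From
Bayes’ theorem for conditional expectations,

$$s_k = E\left[\zeta(y^w_k, y^w_{k-1}) \mid \mathcal{X}_k\right]
      = \frac{E^Q\left[\Psi^w_k\,\zeta(y^w_k, y^w_{k-1}) \mid \mathcal{X}_k\right]}
             {E^Q\left[\Psi^w_k \mid \mathcal{X}_k\right]}. \tag{3.13}$$

Write $v_k := E^Q\left[\Psi^w_k\,\zeta(y^w_k, y^w_{k-1}) \mid \mathcal{X}_k\right]$, and
note that

$$\sum_{c,b=1}^{N} \langle \zeta(y^w_k, y^w_{k-1}), e_{cb}\rangle
  = \langle \zeta(y^w_k, y^w_{k-1}), \mathbf{1}\rangle = 1, \tag{3.14}$$

where $\mathbf{1}$ is an $\mathbb{R}^{N^2}$-vector with 1 in all of its entries. Then, we
have

$$\begin{aligned}
\sum_{c,b=1}^{N} \langle v_k, e_{cb}\rangle
  &= \sum_{c,b=1}^{N} \left\langle E^Q\left[\Psi^w_k\,\zeta(y^w_k, y^w_{k-1}) \mid \mathcal{X}_k\right], e_{cb}\right\rangle \\
  &= E^Q\left[\Psi^w_k \sum_{c,b=1}^{N} \langle \zeta(y^w_k, y^w_{k-1}), e_{cb}\rangle \,\middle|\, \mathcal{X}_k\right] \\
  &= E^Q\left[\Psi^w_k \mid \mathcal{X}_k\right].
\end{aligned} \tag{3.15}$$

Along with equations (3.14) and (3.15), the conditional expectation of
$\zeta(y^w_k, y^w_{k-1})$ under the real-world probability measure $P$ can be written
explicitly as

$$s_k = \frac{v_k}{\sum_{c,b=1}^{N} \langle v_k, e_{cb}\rangle}
      = \frac{v_k}{\langle v_k, \mathbf{1}\rangle}. \tag{3.16}$$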
As noted in Xi and Mamon [32], a diagonal matrix is needed to put recursive processes
in place under an HOHMM. Let $D_k$ be an $N^2 \times N^2$ diagonal matrix to estimate
several functions of $\zeta(y^w_{k+1}, y^w_k)$, where

$$D_k = \begin{pmatrix}
d^1_k &        &       &        &       &        &       \\
      & \ddots &       &        &       &        &       \\
      &        & d^N_k &        &       &        &       \\
      &        &       & \ddots &       &        &       \\
      &        &       &        & d^1_k &        &       \\
      &        &       &        &       & \ddots &       \\
      &        &       &        &       &        & d^N_k
\end{pmatrix},$$

in which all off-diagonal entries are 0 and the block $d^1_k, \ldots, d^N_k$ is repeated
$N$ times along the diagonal, with diagonal elements

$$d^i_k = \frac{\varphi\!\left(\varrho_i^{-1}\left(X_k - \vartheta_i - \kappa_i X_{k-1}\right)\right)}
               {\varrho_i\,\varphi(X_k)}. \tag{3.17}$$
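To make (3.17) concrete, the sketch below computes the weights $d^1_k, \ldots, d^N_k$ for one pair of consecutive observations. It assumes that $\varphi$ is the standard normal density, in line with the reference measure under which the noise is $N(0, 1)$; the function names and numbers are illustrative.

```python
import math


def std_normal_pdf(x):
    """Density of N(0, 1); assumed here to be the function phi in (3.17)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def girsanov_weights(x_prev, x_curr, kappa, vartheta, varrho):
    """Return [d_k^1, ..., d_k^N] of (3.17) for one pair (X_{k-1}, X_k).

    The reference density phi(X_k) is common to all regimes, so it cancels
    once the filter is normalised. For very large |X_k| it underflows to 0,
    and a log-domain version would be needed.
    """
    ref = std_normal_pdf(x_curr)
    return [
        std_normal_pdf((x_curr - th - ka * x_prev) / rh) / (rh * ref)
        for ka, th, rh in zip(kappa, vartheta, varrho)
    ]


# Two hypothetical regimes: a calm one and a volatile one.
weights = girsanov_weights(
    x_prev=0.5, x_curr=0.7,
    kappa=[0.8, 0.5], vartheta=[0.1, 1.0], varrho=[0.5, 2.0],
)
```

A regime whose one-step prediction is close to the observed value, relative to its own scale, receives the larger weight.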
We define below various processes needed for the parameter estimation in the next section.

1. The number of jumps from state $(e_r, e_s)$ to state $e_t$ up to time $k$, $2 \le l \le k$
and $r, s, t = 1, \ldots, N$, is denoted by

$$A^{tsr}_k := \sum_{l=2}^{k} \langle y^w_{l-2}, e_r\rangle\,\langle y^w_{l-1}, e_s\rangle\,
               \langle y^w_l, e_t\rangle. \tag{3.18}$$

2. The number of occupations up to time $k$, which is the length of time that the Markov
chain $y^w$ spent in state $e_t$, $2 \le l \le k$ and $t = 1, \ldots, N$, is given by

$$B^t_k := \sum_{l=2}^{k} \langle y^w_{l-1}, e_t\rangle
         = B^t_{k-1} + \langle y^w_{k-1}, e_t\rangle. \tag{3.19}$$

3. The number of occupations up to time $k$ with the length of time that the Markov chain
$y^w$ spent in state $(e_t, e_s)$, $2 \le l \le k$ and $s, t = 1, \ldots, N$, is calculated as

$$B^{ts}_k := \sum_{l=2}^{k} \langle y^w_{l-1}, e_t\rangle\,\langle y^w_{l-2}, e_s\rangle. \tag{3.20}$$

4. The auxiliary process related to the Markov chain $y^w$ for the function $f$ up to time
$k$ in state $e_t$, $2 \le l \le k$, $t = 1, \ldots, N$, is computed as

$$C^t_k(f) := \sum_{l=2}^{k} f(X_l)\,\langle y^w_{l-1}, e_t\rangle
            = C^t_{k-1}(f) + f(X_k)\,\langle y^w_{k-1}, e_t\rangle, \tag{3.21}$$

where $f$ has the form $f(X) = X$, $f(X) = X^2$, or $f(X) = X_{l-1}X_l$.

5. The conditional expectation of $\zeta(y^w_{k+1}, y^w_k)$ in Equation (3.16) can be
written recursively in terms of the diagonal matrix $D_{k+1}$ as

$$s_{k+1} = \Pi D_{k+1} s_k. \tag{3.22}$$
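The recursion in item 5 can be sketched componentwise, which avoids building the large matrices explicitly. This is a minimal sketch under stated assumptions rather than the thesis implementation: the array layouts `v[c][b]` and `h[d][c][b]` are choices made for the example, the weight applied to the pair $(y^w_k, y^w_{k-1}) = (e_c, e_b)$ is taken to be $d^c_{k+1}$ because $X_{k+1}$ is driven by $y^w_k$ in (3.10), and the result is normalised as in (3.16). The placement of the weights should be checked against the vector ordering adopted for $D_{k+1}$.

```python
def filter_step(v, weights, h):
    """One step of the unnormalised filter, written componentwise.

    v[c][b]    : unnormalised weight of (y_k = e_c, y_{k-1} = e_b)
    weights[c] : d_{k+1}^c, the weight of regime c given (X_k, X_{k+1})
    h[d][c][b] : h_{dcb} = P(y_{k+1} = e_d | y_k = e_c, y_{k-1} = e_b)
    Returns (v_new, s_new), where v_new[d][c] is the updated unnormalised
    filter and s_new is its normalised version as in (3.16).
    """
    n = len(weights)
    v_new = [
        [sum(h[d][c][b] * weights[c] * v[c][b] for b in range(n)) for c in range(n)]
        for d in range(n)
    ]
    total = sum(sum(row) for row in v_new)
    s_new = [[x / total for x in row] for row in v_new]
    return v_new, s_new


# Hypothetical 2-state example with a uniform prior over pairs of states.
h = [
    [[0.9, 0.7], [0.4, 0.2]],  # d = 0
    [[0.1, 0.3], [0.6, 0.8]],  # d = 1
]
v0 = [[0.25, 0.25], [0.25, 0.25]]
v1, s1 = filter_step(v0, weights=[2.0, 0.5], h=h)
```

Carrying the unnormalised quantity forward and normalising only when an estimate is needed mirrors the role of $v_k$ and $s_k$ in (3.13) to (3.16).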

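Let $G_k$ be any $\mathcal{X}_k$-measurable generic process defined in equations
(3.18)-(3.21). From Bayes’ theorem for conditional expectations, the ‘best estimate’ for
$G_k$ is

$$\hat{G}_k = E^P\left[G_k \mid \mathcal{X}_k\right]
  = \frac{E^Q\left[\Psi^w_k G_k \mid \mathcal{X}_k\right]}{E^Q\left[\Psi^w_k \mid \mathcal{X}_k\right]}
  = \frac{E^Q\left[\Psi^w_k G_k \mid \mathcal{X}_k\right]}{\sum_{c,b=1}^{N} \langle v_k, e_{cb}\rangle}
  = \frac{E^Q\left[\Psi^w_k G_k \mid \mathcal{X}_k\right]}{\langle v_k, \mathbf{1}\rangle}. \tag{3.23}$$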

In order to get dynamic updates for $G_k$ every time an observed value $X_k$ arrives or a
batch of $X_k$’s becomes available, we provide recursions via the vector quantity
$G_k\,\zeta(y^w_k, y^w_{k-1})$ that will ultimately lead to an efficient updating of the
scalar quantity $\hat{G}_k$.
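To fix ideas about what the processes in (3.18)–(3.21) count, the sketch below evaluates them on a fully observed state path. In practice the states are hidden and only their filtered estimates are available, so this is useful mainly for checking an implementation on simulated data. The function name, the 0-based state labels and the sample path are assumptions for illustration.

```python
def path_statistics(states, xs, n, f=lambda x_prev, x_curr: x_curr):
    """Compute A, B^t, B^{ts} and C^t(f) of (3.18)-(3.21) at time k = len(states) - 1.

    states : 0-based regime labels of y_0, ..., y_k
    xs     : observations X_0, ..., X_k
    f      : function of (X_{l-1}, X_l); the default corresponds to f(X) = X
    """
    k = len(states) - 1
    A = [[[0] * n for _ in range(n)] for _ in range(n)]  # A[t][s][r]
    B = [0] * n                                          # B[t]
    B2 = [[0] * n for _ in range(n)]                     # B2[t][s]
    C = [0.0] * n                                        # C[t]
    for l in range(2, k + 1):
        r, s, t = states[l - 2], states[l - 1], states[l]
        A[t][s][r] += 1                 # jump (e_r, e_s) -> e_t, as in (3.18)
        B[s] += 1                       # y_{l-1} occupies state s, as in (3.19)
        B2[s][r] += 1                   # (y_{l-1}, y_{l-2}) = (e_s, e_r), as in (3.20)
        C[s] += f(xs[l - 1], xs[l])     # f(X_l) weighted by y_{l-1}, as in (3.21)
    return A, B, B2, C


# Made-up path with two regimes.
states = [0, 1, 1, 0, 1]
xs = [0.5, 1.0, -0.2, 0.3, 0.8]
A, B, B2, C = path_statistics(states, xs, n=2)
```

Summing $A^{tsr}_k$ over $t$ recovers $B^{sr}_k$, and summing $B^{ts}_k$ over $s$ recovers $B^t_k$, which gives two simple consistency checks.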

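Write

$$\gamma^w_k(G_k) := E^Q\left[\Psi^w_k G_k \mid \mathcal{X}_k\right]. \tag{3.24}$$

By equation (3.14), the scalar $\gamma^w_k(G_k)$ can be calculated using the vector
quantity $G_k\,\zeta(y^w_k, y^w_{k-1})$ by noticing

$$\begin{aligned}
\gamma^w_k(G_k) &= \gamma^w_k\left(G_k \langle \zeta(y^w_k, y^w_{k-1}), \mathbf{1}\rangle\right) \\
  &= \gamma^w_k\left(\langle G_k\,\zeta(y^w_k, y^w_{k-1}), \mathbf{1}\rangle\right) \\
  &= \left\langle \gamma^w_k\left(G_k\,\zeta(y^w_k, y^w_{k-1})\right), \mathbf{1}\right\rangle \\
  &= E^Q\left[\left\langle \Psi^w_k G_k\,\zeta(y^w_k, y^w_{k-1}), \mathbf{1}\right\rangle \mid \mathcal{X}_k\right].
\end{aligned} \tag{3.25}$$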

Therefore, equation (3.23) can be further calculated as

$$\hat{G}_k = \frac{E^Q\left[\Psi^w_k G_k \mid \mathcal{X}_k\right]}{\langle v_k, \mathbf{1}\rangle}
            = \frac{\gamma^w_k(G_k)}{\langle v_k, \mathbf{1}\rangle}, \tag{3.26}$$

which shows that $\hat{G}_k$ can be obtained purely from calculations under $Q$ and can be
updated dynamically if we have recursions for $\gamma^w_k(G_k)$. So, to implement (3.26)
for the quantities $A^{tsr}_k$, $B^t_k$, $B^{ts}_k$, and $C^t_k(f)$, we utilise the
semi-martingale representation (3.9) to derive their unnormalised recursive filtered
estimates.

Adopting Xi and Mamon [32], let $K_t$, $1 \le t \le N$, be an $N^2 \times N^2$ matrix with
$e_{it}$ on its $((i-1)N + t)$th column and 0 elsewhere. Then we have the following result.

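Proposition 1: The vector recursions involving the respective quantities in equations
(3.18)–(3.21) are

$$\gamma^w_{k+1}\left(A^{tsr}_{k+1}\,\zeta(y^w_{k+1}, y^w_k)\right)
  = \Pi D_{k+1}\,\gamma^w_k\left(A^{tsr}_k\,\zeta(y^w_k, y^w_{k-1})\right)
  + \langle s_k, e_{sr}\rangle\, d^t_{k+1}\, \langle \Pi e_{sr}, e_{ts}\rangle\, e_{ts}, \tag{3.27}$$

$$\gamma^w_{k+1}\left(B^t_{k+1}\,\zeta(y^w_{k+1}, y^w_k)\right)
  = \Pi D_{k+1}\,\gamma^w_k\left(B^t_k\,\zeta(y^w_k, y^w_{k-1})\right)
  + K_t\, d^t_{k+1}\, \Pi s_k, \tag{3.28}$$

$$\gamma^w_{k+1}\left(B^{ts}_{k+1}\,\zeta(y^w_{k+1}, y^w_k)\right)
  = \Pi D_{k+1}\,\gamma^w_k\left(B^{ts}_k\,\zeta(y^w_k, y^w_{k-1})\right)
  + \langle s_k, e_{ts}\rangle\, d^t_{k+1}\, \Pi e_{ts}, \tag{3.29}$$

$$\gamma^w_{k+1}\left(C^t_{k+1}(f)\,\zeta(y^w_{k+1}, y^w_k)\right)
  = \Pi D_{k+1}\,\gamma^w_k\left(C^t_k(f)\,\zeta(y^w_k, y^w_{k-1})\right)
  + f(X_{k+1})\, K_t\, d^t_{k+1}\, \Pi s_k. \tag{3.30}$$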

Proof The derivations are similar to those given in Mamon et al. [26].

3.4 Optimal parameter estimation


In this section, we present the optimal estimates for the parameters of our proposed temperature model using the maximum-likelihood approach. Since maximising the likelihood or log-likelihood functions is not straightforward for such functions with complicated structures like ours, we employ the EM algorithm, which is an effective iterative method for parameter estimation especially for an exponential family of models [29].

We shall show that our EM results give optimal estimates in terms of the filters in (3.22) and (3.27)–(3.30). Consider a family of probability measures $\{P^{\upsilon^w}, \upsilon^w \in \Upsilon^w\}$ on $(\Omega, \mathcal{F}^w)$. The algorithm is implemented by setting the initial $P^{\upsilon_0}$ and then changing from $P^{\upsilon_0}$ to $P^{\upsilon^w}$, reflecting the variation of the data to the updated parameters.

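We refer to $\Upsilon^w = \{\kappa_t, \vartheta_t, \varrho_t, h_{tsr},\ 1 \le t, s, r \le N\}$ as the set of parameters under our HOHMM setting. The corresponding maximum likelihood (ML) estimator is
\[
\widehat{\upsilon}^w \in \operatorname*{argmax}_{\upsilon^w \in \Upsilon^w} L(\upsilon^w) \quad\text{and}\quad L(\upsilon^w) = E^{\upsilon_0}\!\left[\frac{dP^{\upsilon^w}}{dP^{\upsilon_0}} \,\Big|\, \mathcal{X}_k\right].
\]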

The sequence of estimates generated by the EM algorithm yields monotonically non-decreasing log-likelihood values and converges to a local maximum [35].

A change of measure from $P^{\upsilon_0}$ to $P^{\upsilon^w}$ is performed to estimate the entries of matrix $H$, thereby giving $\Pi$ updated entries through the filtering algorithms of the relevant quantities. Recall that $y^w$ is an HOHMC with transition matrix $H = (h_{tsr})$ under $P^{\upsilon^w}$. Hence, a new probability measure $\widehat{P}^{\upsilon^w}$ must be constructed, under which $y^w$ is still an HOHMC but with transition matrix $\widehat{H} = (\widehat{h}_{tsr})$, where $\widehat{h}_{tsr} = \widehat{P}^{\upsilon^w}(y^w_{k+1} = e_t \mid y^w_k = e_s, y^w_{k-1} = e_r)$ with $\widehat{h}_{tsr} \ge 0$.

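From Elliott et al. [6], the appropriate density in conjunction with the EM algorithm for our successive estimation of the transition probabilities is
\[
\frac{dP^{\upsilon^w}}{dP^{\upsilon_0}}\bigg|_{\mathcal{X}_k} = \Lambda_k, \quad \Lambda_0 = 1 \quad\text{and}\quad
\Lambda_k = \prod_{l=2}^{k} \prod_{t,s,r=1}^{N} \left(\frac{\widehat{h}_{tsr}}{h_{tsr}}\right)^{\langle y^w_{l-2}, e_r\rangle \langle y^w_{l-1}, e_s\rangle \langle y^w_l, e_t\rangle}. \tag{3.31}
\]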

When $h_{tsr} = 0$, $\widehat{h}_{tsr} = 0$; in this case, we set $\widehat{h}_{tsr}/h_{tsr} = 1$. The resulting expression for $\widehat{h}_{tsr}$ and those for the rest of the parameters, computed as well via the EM algorithm, are given in the succeeding summary.

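Proposition 2: The EM estimates, at state $t$ based on a data series up to time $k$ ($k \ge 1$), for the parameters of the model in (3.10) are as follows:
\[
\widehat{\kappa}_t = \frac{\widehat{C}^t_k(X_{k-1}, X_k) - \vartheta_t\, \widehat{C}^t_k(X_{k-1})}{\widehat{C}^t_k(X^2_{k-1})}
= \frac{\gamma^w_k\big(C^t_k(X_{k-1}, X_k)\big) - \vartheta_t\, \gamma^w_k\big(C^t_k(X_{k-1})\big)}{\gamma^w_k\big(C^t_k(X^2_{k-1})\big)}, \tag{3.32}
\]
\[
\widehat{\vartheta}_t = \frac{\widehat{C}^t_k(X_k) - \kappa_t\, \widehat{C}^t_k(X_{k-1})}{\widehat{B}^t_k}
= \frac{\gamma^w_k\big(C^t_k(X_k)\big) - \kappa_t\, \gamma^w_k\big(C^t_k(X_{k-1})\big)}{\gamma^w_k\big(B^t_k\big)}, \tag{3.33}
\]
\[
\widehat{\varrho}^{\,2}_t = \frac{\widehat{C}^t_k(X^2_k) + \kappa_t^2\, \widehat{C}^t_k(X^2_{k-1}) + \vartheta_t^2\, \widehat{B}^t_k + 2\vartheta_t \kappa_t\, \widehat{C}^t_k(X_{k-1}) - 2\kappa_t\, \widehat{C}^t_k(X_{k-1}, X_k) - 2\vartheta_t\, \widehat{C}^t_k(X_k)}{\widehat{B}^t_k}, \tag{3.34}
\]
\[
\widehat{h}_{tsr} = \frac{\widehat{A}^{tsr}_k}{\widehat{B}^{sr}_k}, \quad \forall \text{ pairs } (t, s),\ t \ne s. \tag{3.35}
\]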

Proof The proofs of (3.32)-(3.35) can be found in the Appendix C.

Thus, when a temperature series is available at time $k$, we can automatically obtain updated estimates of the parameters $\kappa_t$, $\vartheta_t$, $\varrho_t$, and $h_{tsr}$ by running the filtering recursions of the Markov chain given in Proposition 1.
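The estimates in Proposition 2 are ratios of filtered sums, so once the filters are available the update itself is simple arithmetic. The following Python sketch illustrates (3.32)-(3.35) for a single regime; the function names and argument names are hypothetical and not taken from the original implementation, and the helper `single_regime_stats` assumes the chain always occupies the same state so that the filtered quantities reduce to plain sums.

```python
def em_update(c_lag_cur, c_lag, c_cur, c_lag_sq, c_cur_sq, b, kappa, theta):
    """Return updated (kappa, theta, rho_squared) for one regime.

    c_lag_cur : filtered C(X_{k-1} X_k)    c_lag    : filtered C(X_{k-1})
    c_cur     : filtered C(X_k)            c_lag_sq : filtered C(X_{k-1}^2)
    c_cur_sq  : filtered C(X_k^2)          b        : filtered occupation time B
    kappa, theta : current values of the parameters being re-estimated
    """
    kappa_new = (c_lag_cur - theta * c_lag) / c_lag_sq              # (3.32)
    theta_new = (c_cur - kappa * c_lag) / b                         # (3.33)
    rho_sq_new = (c_cur_sq + kappa ** 2 * c_lag_sq + theta ** 2 * b
                  + 2.0 * theta * kappa * c_lag
                  - 2.0 * kappa * c_lag_cur
                  - 2.0 * theta * c_cur) / b                        # (3.34)
    return kappa_new, theta_new, rho_sq_new


def transition_update(a_tsr, b_sr):
    """Ratio of filtered jump count to filtered occupation time, as in (3.35)."""
    return a_tsr / b_sr


def single_regime_stats(xs):
    """Illustrative helper: sums that replace the filters when N = 1."""
    lag, cur = xs[:-1], xs[1:]
    return (sum(p * c for p, c in zip(lag, cur)), sum(lag), sum(cur),
            sum(p * p for p in lag), sum(c * c for c in cur), float(len(lag)))
```

For a noise-free series generated by $X_k = \vartheta + \kappa X_{k-1}$, calling `em_update(*single_regime_stats(xs), kappa, theta)` with the true $\kappa$ and $\vartheta$ returns those same values and a variance estimate that is numerically zero, which is a quick sanity check of the algebra.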

3.5 Numerical implementation


We implement our recursive HOHMM filtering algorithms on the DAT series collected by the NCDC. The data were recorded from 01 Jan 2011 to 31 Dec 2014, comprising 1461 data points. Since the DAT data exhibit strong seasonality, we first remove the deterministic cyclical pattern $S_t$ from $T_t$ in equation (3.1), after which our filtering equations are implemented to obtain optimal parameter estimates of the OU component $X_t$. The data set is then divided into batches of equal size to implement the self-tuning process with recursive filtering equations under both HMM and HOHMM settings.
Mean    Sd Dev   SE     Min      Max    Skew    Kurtosis
8.88    10.65    0.28   -19.25   31.5   -0.23   -0.92

Table 3.1: Descriptive statistics for daily average temperature (DAT)

Parameter   Estimate    95% confidence interval
a           10.2111     (9.7581, 10.6641)
b           -0.0018     (-0.0024, -0.0013)
c1          -5.2949     (-5.6155, -4.9742)
c2          -0.6395     (-0.9555, -0.3234)
c3          -0.4131     (-0.7279, -0.0982)
e1          -12.7241    (-13.0383, -12.4098)

Table 3.2: Parameter estimates for the seasonality component $S_t$

3.5.1 Analysis of the deterministic component


We fit $S_t$ described in (3.1) to the entire data set. Since $S_t$ is a linear combination of sinusoidal functions and a linear term, it can be fitted with the built-in linear regression routine of the software R. In fact, a stepwise regression procedure was carried out to identify the best fitting model for $S_t$, using the R function 'step' to perform the selection of explanatory variables.
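Because the seasonal component is linear in its coefficients, the fit is an ordinary least-squares problem. The Python sketch below illustrates this on a deliberately simplified form, $S_t = a + bt + c_1\sin(\omega t) + e_1\cos(\omega t)$ with $\omega = 2\pi/365$; this form, the choice of $\omega$, and all function names are assumptions made for illustration, since equation (3.1) is not reproduced here and the original work used R's regression and 'step' functions rather than this code.

```python
import math


def design_row(t, omega=2.0 * math.pi / 365.0):
    """Regressors for the assumed form a + b*t + c1*sin(omega*t) + e1*cos(omega*t)."""
    return [1.0, float(t), math.sin(omega * t), math.cos(omega * t)]


def solve_linear(a, y):
    """Solve a square linear system by Gaussian elimination with partial pivoting."""
    n = len(y)
    m = [row[:] + [y[i]] for i, row in enumerate(a)]   # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_seasonal(temps):
    """Least-squares coefficients [a, b, c1, e1] via the normal equations."""
    rows = [design_row(t) for t in range(1, len(temps) + 1)]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, temps)) for i in range(p)]
    return solve_linear(xtx, xty)
```

The deseasonalised series would then be obtained by subtracting the fitted values from the observed temperatures. Solving the normal equations directly is adequate for a handful of regressors; a QR-based routine is the more robust choice for larger designs.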

Table 3.1 shows some descriptive statistics for the DAT, which guided our choice of initial values in the implementation process. The estimated parameter values from fitting our data to the specified $S_t$ are exhibited in Table 3.2. The adjusted coefficient of determination $R^2$ shows that 83.5% of the variation in the response $T_t$ is explained by the model with all the regressor variables in equation (3.2) except for $e_2$ and $e_3$. These two predictors are eliminated during the process of variable selection on the basis of the scoring criteria built into the function 'step', including the adjusted $R^2$ and Akaike information criterion (AIC).

The fitted seasonality component $S_t$ is plotted in Figure 3.1 together with the actual DAT, which displays high-seasonality property. The plot depicts four crests and four troughs occurring in summer and winter, respectively, of the 4-year period. The main characteristics of the two graphs neatly jibe with the temperature movements in the four seasons of each year. The remaining component $X_t$ is treated as our observation process which we shall use to test our proposed filtering and parameter estimation methods.

[Plot: temperature (Celsius) against time, 2011 to 2014; series: fitted seasonal component, actual observations]

Figure 3.1: Fitted seasonal component versus actual DAT series

3.5.2 Analysis of the stochastic component

The cyclical component of the model captures very well the underlying seasonal trend of
the actual data amidst the noisy values of Xt . The deseasonalised stochastic component
Xt = T t − S t is displayed in Figure 3.2.

We process the data set in 73 batches, and so there are 20 data points in each group with $\Delta t = 1$ day. This means that the parameters are updated roughly every 3 weeks. Other filtering-window sizes can certainly be adopted in the data processing. Our experiment shows that the choice of the window size has little effect on the numerical outcomes. A batch of 20 data points is adequate to capture the arrival of new temperature recordings and the factors that might influence changes in temperature, such as wind power, ocean currents, and other meteorological events.
[Plot: temperature (Celsius) against time, 2011 to 2014]

Figure 3.2: Deseasonalised stochastic component $X_t$

3.5.3 Validity of model evaluation


We are aware that a valid statistical analysis of a proposed model entails testing using a
data set that was not used in the model estimation. Otherwise, if the model is not tested
on a different data set from the same population, it is difficult to determine if any patterns
found are simply due to chance. Thus, showing the formulated model’s predictive power
necessitates waiting for new data to come in. This ensures that there is no hand-tailoring
of the predictive ability of the model to the data on hand as the upcoming weather data are
not yet available.

Our online filtering method is consistent with the above statistical principle because we
use the past and current data through Xt to obtain parameter updates that encapsulate the
information up to present time t. Such parameter updates are used to process a new batch
of accumulated information in order to obtain a succeeding new set of parameter updates.
Also, utilising the same parameter updates, we obtain our predicted values for the calcu-
lation of error analysis metrics. Hence, the data set, used in prediction and calculation of
log-likelihood measures for testing goodness of fit and assessing model complexity in sec-
tion 3.5.5, is different from and does not overlap with that used in model estimation.

Parameter [Min, Max]


a [10.0900, 10.5300]
b [-0.0027, -0.0017]
c1 [-5.3220, -4.9600]
c2 [-0.6500, -0.4489]
c3 [-0.5062, -0.3337]
e1 [-12.8700, -12.6900]

Table 3.3: Minimum and maximum parameter estimates for S t ’s coefficients covering
twelve 4-year moving windows as described in Section 3.5.3

Our method may appear not to agree with the above principle if $S_t$ is taken into account. This is because the $S_t$ function must be calculated first using the entire data set before the $X_t$ data series can be produced. We argue that there is no inconsistency here whenever the coefficients of the $S_t$ component remain almost constant as time evolves. Such is exactly the case for the coefficients of $S_t$ estimated from our daily temperature readings covering the 4-year period under study (01 Jan 2011 – 31 Dec 2014) and the coefficients estimated from the twelve prior 4-year moving windows going backwards (i.e., Data Window I: 01 Jan 2007 – 31 Dec 2010; Data Window II: 01 Dec 2006 – 30 Nov 2010; Data Window III: 01 Nov 2006 – 31 Oct 2010; etc.). Table 3.3 displays the [minimum, maximum] values of the 12 estimates for each coefficient. Clearly, the variation is small. Thus, we take the average of each set of 12 estimates as a proxy for a given coefficient value so that the use of values in Table 3.2 (as close approximations to the corresponding proxies) for $S_t$ to determine $X_t$ is justified in adherence to the above statistical principle.

3.5.4 Implementation aspects of the HOHMM-OU filtering


To detect the presence of memory, we estimate the order of differencing $d$ in the autoregressive fractionally integrated moving average framework [16]. For $0 < d < 0.5$, there is presence of finite long memory, and short memory for $d = 0$. Estimating $d$ could be done via the Geweke-Porter-Hudak estimator, approximate MLE, and the smoothed periodogram method with the R functions 'fdGPH', 'fdML', and 'fdSperio' respectively. We relied on 'fdSperio' proposed by Reisen [30] as it has no constraint on $d$ and applies to non-stationary series, but the same cannot be said for the 'fdGPH' and 'fdML' functions. Our data set yields a rough estimate of $d = 1/4$, implying it possesses some form of memory. We acknowledge a limitation of our modelling: a second-order lag is imposed, and our current HOHMM approach does not yet include a mechanism to provide an optimal lag estimate based on the data series. However, for the purpose of illustrating our filtering implementation, we verified that our temperature data series exhibits memory, validating the appropriateness of the HOHMM.
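To make the idea of estimating $d$ concrete, the Python sketch below implements the log-periodogram regression of Geweke and Porter-Hudak mentioned above. Note that this is not the smoothed-periodogram estimator ('fdSperio') that was actually relied upon; it is shown only because it is the simplest of the three to write down. The bandwidth choice $m = \lfloor n^{0.5} \rfloor$, the function names, and the helper `white_noise` are assumptions for illustration.

```python
import cmath
import math
import random


def gph_estimate(xs, power=0.5):
    """Log-periodogram (GPH) estimate of the fractional differencing order d.

    Regresses log I(lambda_j) on log(4 sin^2(lambda_j / 2)) over the first
    m = floor(n**power) Fourier frequencies; d is minus the fitted slope.
    """
    n = len(xs)
    m = int(n ** power)
    mean = sum(xs) / n
    centred = [x - mean for x in xs]
    regressors, log_pgram = [], []
    for j in range(1, m + 1):
        lam = 2.0 * math.pi * j / n
        dft = sum(x * cmath.exp(-1j * lam * t) for t, x in enumerate(centred))
        pgram = abs(dft) ** 2 / (2.0 * math.pi * n)
        regressors.append(math.log(4.0 * math.sin(lam / 2.0) ** 2))
        log_pgram.append(math.log(pgram))
    r_bar = sum(regressors) / m
    l_bar = sum(log_pgram) / m
    slope = (sum((r - r_bar) * (l - l_bar) for r, l in zip(regressors, log_pgram))
             / sum((r - r_bar) ** 2 for r in regressors))
    return -slope


def white_noise(n, seed=0):
    """Illustrative helper: reproducible IID standard normal draws."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]
```

For white noise the estimate should be close to zero, whereas a series obtained by cumulatively summing the same noise (a random walk) should give a markedly larger value. The estimator is noisy for short series, so a single value such as $d \approx 1/4$ is best read as a rough indication of memory rather than a precise measurement.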

Remark: Indeed, if for a given data set, a lag order of greater than 2 is formally nec-
essary (in a statistical-inference sense), then the transformation in (3.7) can be repeatedly
utilised until a 2nd-order HOHMM set up is obtained, and therefore, our current filtering
and estimation results could be applied in a straightforward manner. Despite the capac-
ity for sophistication of being able to include as many lags as needed, there is also the
practical consideration, especially from the perspective of implementation in the industry,
to balance between the benefit of having a flexible but complex model and the associated
formidable computational cost. Of course, given the power of supercomputers, we envision
that our optimal processing results in this chapter for large lags can be efficiently imple-
mented someday and the dreaded curse of dimensionality can be appreciably alleviated.
With the rapid continuing development in computing architectures, we hope to see that
the complicated part of re-coding and extension of filtering algorithms as the lag becomes
bigger could be facilitated with much ease.

3.5.4.1 Initial values for the parameter estimation

The filtering algorithms are implemented by first finding the initial estimates of parameters under the assumption that $X_t$ is a single-state process. A value of $1/N$ is given to each non-zero element of the matrix $\Pi$. We detail the procedure for setting initial values for $\kappa$, $\vartheta$, and $\varrho$. Given that $\{z_{k+1}\}$ in (3.10) are IID standard normal, the likelihood function of $X_{k+1}$ is
\[
L(X_{k+1}; \kappa, \vartheta, \varrho) = \prod_{k=1}^{m} \frac{1}{\sqrt{2\pi}\,\varrho} \exp\left(-\frac{(X_{k+1} - \vartheta - \kappa X_k)^2}{2\varrho^2}\right), \tag{3.36}
\]
where $1 \le m \le 1460$ in our case. Equivalently, our task is to seek the maximisers of the log-likelihood, i.e.,
\[
\operatorname*{argmax}_{\kappa, \vartheta, \varrho} \log L(X_{k+1}; \kappa, \vartheta, \varrho)
= \operatorname*{argmax}_{\kappa, \vartheta, \varrho} \sum_{k=1}^{m} \left[\log \frac{1}{\sqrt{2\pi}\,\varrho} - \frac{(X_{k+1} - \vartheta - \kappa X_k)^2}{2\varrho^2}\right]. \tag{3.37}
\]
[Plot: transition probabilities $h_{111}$, $h_{211}$, $h_{112}$, $h_{212}$, $h_{121}$, $h_{221}$, $h_{122}$, $h_{222}$ against algorithm steps]

Figure 3.3: Evolution of transition probabilities under a 2-state HOHMM-based model

We employ the R function 'optim' to solve the optimisation problem in (3.37), and get $\kappa = 0.6518$, $\vartheta = -0.001624$ and $\varrho = 0.4271$ acting as benchmarks in selecting initial values for the parameter estimation of frameworks with more than 1 regime.
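As an illustration of the single-regime initialisation, the Python sketch below evaluates the Gaussian log-likelihood in (3.37) and computes its maximiser. Instead of a numerical optimiser such as 'optim', it uses the closed-form solution: for this likelihood the maximising $\kappa$ and $\vartheta$ are the least-squares slope and intercept of $X_{k+1}$ on $X_k$, and the maximising $\varrho$ is the root mean squared residual. The function names are hypothetical.

```python
import math


def log_likelihood(xs, kappa, theta, rho):
    """Sum of Gaussian log-densities of X_{k+1} given X_k, as in (3.37)."""
    total = 0.0
    for prev, cur in zip(xs[:-1], xs[1:]):
        resid = cur - theta - kappa * prev
        total += (-math.log(math.sqrt(2.0 * math.pi) * rho)
                  - resid ** 2 / (2.0 * rho ** 2))
    return total


def closed_form_mle(xs):
    """Maximiser of the single-regime log-likelihood via least squares."""
    prev, cur = xs[:-1], xs[1:]
    m = len(prev)
    mean_p, mean_c = sum(prev) / m, sum(cur) / m
    sxx = sum((p - mean_p) ** 2 for p in prev)
    sxy = sum((p - mean_p) * (c - mean_c) for p, c in zip(prev, cur))
    kappa = sxy / sxx
    theta = mean_c - kappa * mean_p
    rho = math.sqrt(sum((c - theta - kappa * p) ** 2
                        for p, c in zip(prev, cur)) / m)
    return kappa, theta, rho
```

Evaluating `log_likelihood` at the returned values and at nearby perturbed values gives a simple check that the closed form is indeed the maximiser; a numerical optimiser started near these values should return the same point up to its tolerance.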

3.5.4.2 Evolution of parameter estimates and comparison under HMM and HOHMM
settings

Propositions 1 and 2 aid in getting numerical results from running the self-tuning filtering
algorithms (3.27)-(3.30) on batches of data. Filtered estimates are then fed correspondingly
into parameter estimate representations in (3.32)-(3.35). The filters process new informa-
tion, and in turn, optimally update the parameter estimates. One algorithm run constitutes
an algorithm pass, and we have 73 passes in total. In every algorithm pass beyond the initial
pass, the parameter estimates from the previous pass serve as initial parameter values for
the succeeding pass. The filtering algorithm is implemented on the process Xt driven by
two-state and three-state HOHMCs.

For a complete comparison of model settings, we also perform filtering implementations on two special cases: 2-state and 3-state HMM frameworks. Note that HMM is a special case of HOHMM when the lag order is 1. Although not plotted, similar dynamic characteristics for the evolution of parameter estimates under both the 2-state and 3-state HMM settings are obtained when compared to the results from the HOHMM settings. The dynamic movements of the estimates are depicted in Figures 3.3–3.5. These gradually converge to certain unique values after approximately 25 passes in both the 2-state and 3-state HOHMM set ups.

Akin to the reliability of parameter estimates is the quantification of their variability. Therefore, we examine the variance of the various estimators via the Fisher information
\[
I(v^w) = -E_{v^w}\!\left[\frac{\partial^2}{\partial (v^w)^2} \log L(X; v^w)\right],
\]
whose inverse bounds the asymptotic variance of the ML estimates. The MLE is consistent and has an asymptotically normal sampling distribution [36]. So, we utilise the limiting distribution of the ML estimator $\widehat{v}^w$ to obtain the 95% confidence interval for the estimated $v^w$ in the form $\widehat{v}^w \pm 1.96 \big/ \sqrt{I(\widehat{v}^w)}$. The Fisher information involved in each estimator is derived straightforwardly from the log-likelihood functions in the EM algorithm calculations detailed in Appendix C, and the final results of such derivations are listed below.
\[
I(h_{tsr}) = \frac{\widehat{A}^{tsr}_k}{h^2_{tsr}}, \tag{3.38}
\]
\[
I(\kappa_t) = \frac{\widehat{C}^t_k(X^2_{k-1})}{\varrho^2_t}, \tag{3.39}
\]
\[
I(\vartheta_t) = \frac{\widehat{B}^t_k}{\varrho^2_t}, \tag{3.40}
\]
\[
I(\varrho_t) = -\frac{\widehat{B}^t_k}{\varrho^2_t} + \frac{3}{\varrho^4_t}\Big[\widehat{C}^t_k(X^2_k) + \widehat{B}^t_k \vartheta^2_t + \kappa^2_t\, \widehat{C}^t_k(X^2_{k-1}) - 2\widehat{C}^t_k(X_k)\, \vartheta_t - 2\widehat{C}^t_k(X_{k-1}, X_k)\, \kappa_t + 2\widehat{C}^t_k(X_{k-1})\, \vartheta_t \kappa_t\Big]. \tag{3.41}
\]

The 95% confidence intervals for the parameters $\kappa$, $\vartheta$, and $\varrho$ in the 2-state HOHMM-based model are shown in Figure 3.6.
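The interval construction can be sketched in a few lines of Python. The functions below evaluate the Fisher information expressions (3.38)-(3.40) from filtered quantities that are assumed to be already available, and then form the interval as estimate $\pm\, 1.96/\sqrt{I}$; all names are hypothetical.

```python
import math


def fisher_info_h(a_tsr, h_tsr):
    """Fisher information for a transition probability, as in (3.38)."""
    return a_tsr / h_tsr ** 2


def fisher_info_kappa(c_lag_sq, rho):
    """Fisher information for kappa_t, as in (3.39)."""
    return c_lag_sq / rho ** 2


def fisher_info_theta(b, rho):
    """Fisher information for vartheta_t, as in (3.40)."""
    return b / rho ** 2


def confidence_interval(estimate, info, z=1.96):
    """Asymptotic interval: estimate +/- z / sqrt(Fisher information)."""
    half_width = z / math.sqrt(info)
    return estimate - half_width, estimate + half_width
```

For example, with a filtered sum of squares of 100 and $\varrho = 0.5$, the information for $\kappa$ is 400, so an estimate of 0.6 has the interval (0.502, 0.698). Since the filtered sums grow with the number of processed observations, the intervals narrow as more algorithm passes are completed.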

The 95% confidence intervals generated for estimated parameters throughout the 73 algorithm passes are extremely narrow and this is attributed to the declining standard errors. Standard errors of all parameter estimates for the 1-, 2- and 3-state models under both the HMM and HOHMM settings were examined and they all become smaller as the number of algorithm passes increases. The narrow ranges exhibited in Table 3.4 reflect that precise estimates are achieved by the EM algorithm with our filtering approach.
[Plot: three panels showing the estimates of κ (κ1, κ2), ϑ (ϑ1, ϑ2) and ϱ (ϱ1, ϱ2) against algorithm steps]

Figure 3.4: Evolution of parameter estimates for κ, ϑ, and ϱ under a 2-state HOHMM-based model

[Plot: three panels showing the estimates of κ (κ1, κ2, κ3), ϑ (ϑ1, ϑ2, ϑ3) and ϱ (ϱ1, ϱ2, ϱ3) against algorithm steps]

Figure 3.5: Evolution of parameter estimates for κ, ϑ, and ϱ under a 3-state HOHMM-based model

[Plot: three panels showing the estimates of κ, ϑ and ϱ for regimes 1 and 2, each with 95% confidence level, against algorithm steps]

Figure 3.6: Evolution of parameter estimates for κ, ϑ, and ϱ under a 2-state HOHMM-based model with 95% confidence level
HMM: bounds of SE
Parameter    1-state model                2-state model                   3-state model
estimates    Lower            Upper       Lower            Upper          Lower            Upper
δ̂_i          3.62152×10^-90   0.84905     2.88315×10^-93   5.69605×10^-2  1.23351×10^-89   0.50619
η̂_i          3.13262×10^-90   0.81819     2.49571×10^-93   5.04276×10^-2  1.28888×10^-89   0.50535
[…]_i        2.21510×10^-90   0.54026     1.08302×10^-93   2.36474×10^-2  7.17870×10^-90   0.34230
p̂_ji         4.79291×10^-90   1.08185     1.24810×10^-93   3.60257×10^-2  3.52893×10^-90   0.34041

HOHMM: bounds of SE
Parameter    1-state model                2-state model                   3-state model
estimates    Lower            Upper       Lower            Upper          Lower            Upper
κ̂_t          3.62152×10^-90   0.84905     1.21738×10^-96   6.33495×10^-2  1.53624×10^-90   7.84591×10^-1
ϑ̂_t          3.13262×10^-90   0.81819     8.94466×10^-97   5.67838×10^-2  1.33044×10^-90   7.57304×10^-1
ϱ̂_t          2.21510×10^-90   0.54026     6.32678×10^-97   3.59050×10^-2  9.40836×10^-91   4.96797×10^-1
ĥ_tsr        4.79291×10^-90   1.08185     4.74425×10^-97   7.03355×10^-2  6.79239×10^-91   8.85929×10^-1

Table 3.4: Interval of standard errors for the parameter estimates under 1-, 2-, 3-state HMM- and HOHMM-based models

3.5.5 Model selection and other diagnostics


One-step ahead forecasts from both the HMM and HOHMM settings are compared. We
also present a forecasting error analysis accompanied by an AIC analysis on the selection
of optimal number of HMM and HOHMM states.

3.5.5.1 Assessment of predicted temperatures

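To make predictions of the DAT values over a one-step ahead time interval, we evaluate the expected value of the observation process at time $k + 1$. Given $X_{k+1}$ in (3.10), we have
\[
\begin{aligned}
E[X_{k+1} \mid \mathcal{X}_k] &= E\big[\kappa(y^w_k)\, X_k + \vartheta(y^w_k) + \varrho(y^w_k)\, z_{k+1} \mid \mathcal{X}_k\big] \\
&= \langle \kappa, \widehat{y}^w_k\rangle\, X_k + \langle \vartheta, \widehat{y}^w_k\rangle.
\end{aligned} \tag{3.42}
\]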
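Equation (3.42) is a probability-weighted AR(1) forecast, which the following Python sketch makes explicit. It assumes the filtered state probabilities (the components of the estimated chain state) are already available from the filter; the function names and the separate seasonal argument are hypothetical.

```python
def one_step_forecast(x_k, state_probs, kappa, theta):
    """Forecast of X_{k+1} as <kappa, y_hat> * X_k + <theta, y_hat>, as in (3.42).

    state_probs : filtered probabilities of each regime at time k
    kappa, theta : per-regime parameter values, in the same order
    """
    kappa_bar = sum(p * k for p, k in zip(state_probs, kappa))
    theta_bar = sum(p * th for p, th in zip(state_probs, theta))
    return kappa_bar * x_k + theta_bar


def dat_forecast(x_forecast, seasonal_next):
    """Add the deterministic seasonal value back to obtain the DAT forecast."""
    return x_forecast + seasonal_next
```

With two regimes having $\kappa = (0.6, 0.4)$, $\vartheta = (0, -0.2)$, filtered probabilities $(0.75, 0.25)$ and $X_k = 2$, the weighted coefficients are 0.55 and $-0.05$, giving a forecast of 1.05. The noise term drops out because $z_{k+1}$ has zero mean.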

Figure 3.7 depicts the one-step ahead forecasts both for the deseasonalised data $X_k$ and DATs under the 3-state HOHMM-based model using (3.42). The very short-term predictions for $X_k$ and DATs are very close to the actual data series, and the same can also be said for $X_k$ and DATs under the HMM-based models. We also present a comparison between the observed seasonal HDD and the predicted seasonal HDD covering the entire period (2011-2014) under the 3-state HOHMM-based settings in Figure 3.8. A similar comparison was performed, though not shown, under the HMM-based settings. The seasonal HDD forecasts obtained from the proposed models follow closely the actual seasonal HDD. Furthermore, the magnified view of the predicted DATs under the HOHMM set ups is given in Figure 3.9 covering a 6-month period; DAT forecasts under the HMM setting are also generated but not shown here. All forecasts are relatively close to the actual DATs. The dynamics and trends of the temperature series are captured well by our filtering algorithms and self-calibrating parameter estimation method.

[Plot: two panels of temperature (Celsius) against time, 2011 to 2014; top panel: actual and forecast deseasonalised observations; bottom panel: actual DATs and forecast of DATs]

Figure 3.7: One-step ahead forecasts under a 3-state HOHMM-based model
[Plot: three panels of cumulative HDD from October to April for the seasons 2011-2012, 2012-2013 and 2013-2014; series: actual HDD, predicted HDD]

Figure 3.8: Comparison of the expected HDD and actual HDD in a 3-state HOHMM-based model

[Plot: temperature (Celsius) against time, 07/01/2014 to 12/30/2014; series: actual DATs and forecast DATs in the 1-state, 2-state and 3-state HOHMM]

Figure 3.9: Comparison of one-step-ahead forecasts in 1-, 2-, and 3-state HOHMM-based models

3.5.5.2 Error analysis and model selection

We perform an error analysis to quantify the goodness of fit of various HMM and HOHMM settings using the criteria put forward in Erlwein et al. [19], which are the mean square error (MSE), root mean square error (RMSE), mean absolute error (MAE) and relative absolute error (RAE). Suppose $X_k$ denotes the actual value at time $k$, $\widehat{X}_k$ stands for the one-step ahead prediction value at time $k$, $\bar{X}$ is the mean of the $X_k$'s, and $n = 1460$ is the total number of predicted values. Then
\[
\mathrm{MSE} = \frac{\sum_{k=1}^{n} (\widehat{X}_k - X_k)^2}{n}, \qquad
\mathrm{RMSE} = \sqrt{\frac{\sum_{k=1}^{n} (\widehat{X}_k - X_k)^2}{n}},
\]
\[
\mathrm{MAE} = \frac{\sum_{k=1}^{n} |\widehat{X}_k - X_k|}{n} \qquad\text{and}\qquad
\mathrm{RAE} = \frac{\sum_{k=1}^{n} |\widehat{X}_k - X_k|}{\sum_{k=1}^{n} |X_k - \bar{X}|}.
\]
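The four criteria above translate directly into code. The Python sketch below computes them for a pair of equal-length sequences; the function name and the dictionary return type are choices made for illustration.

```python
import math


def error_metrics(actual, predicted):
    """Return MSE, RMSE, MAE and RAE for one-step ahead predictions."""
    n = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]
    mean_actual = sum(actual) / n
    abs_error_sum = sum(abs(e) for e in errors)
    mse = sum(e * e for e in errors) / n
    return {
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "MAE": abs_error_sum / n,
        # RAE scales the total absolute error by that of the naive mean forecast
        "RAE": abs_error_sum / sum(abs(a - mean_actual) for a in actual),
    }
```

For instance, with actual values (1, 2, 3, 4) and predictions (1.5, 2, 2.5, 4), the errors are (0.5, 0, -0.5, 0), so MSE = 0.125, MAE = 0.25, and RAE = 1/4 because the absolute deviations of the actual values from their mean sum to 4. An RAE below 1 means the model beats the constant-mean forecast.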

Table 3.5 displays the error-analysis results involving Xk and DATs data under the HMM
setting whilst Table 3.6 contains the error-analysis results under the HOHMM setting. Al-
though the errors are generally small, the error metrics illustrate that the 2-state model
outperforms the 1- and 3-state models under both HMM and HOHMM frameworks. In
addition, the 1-, 2- and 3-state HOHMM-based models produce better forecasts than those
generated by their corresponding HMM settings as far as error measures are concerned.
The 4-state HMM-based and HOHMM-based settings were also examined, but no evi-
dence of even minimal improvement is achieved by making the model more complex such
as including 64 parameters in its probability transition matrix.

To determine if the error-mean differences are statistically significant in each pairwise setting, we perform a t-test using the bootstrap method. We generate RMSEs for all possible paired HOHMM settings, and then calculate, using the R function 'p.adjust', the adjusted p-values with Bonferroni's method to control the familywise error rate. For the 1-state HOHMM versus the 2-state HOHMM, the p-value is smaller than 0.01 so that we can conclude there is sufficient evidence of significant difference after adding one more regime into the one-state model. For the 2-state HOHMM vis-à-vis the 3-state HOHMM, the p-value is much greater than 0.05, meaning that we cannot reject the null hypothesis of no difference. The estimated p-values covering three pairs of model settings are shown in Table 3.7. On the basis of a 5% significance level, the 1- and 2-state, and 1- and 3-state models in both the HMM and HOHMM frameworks are statistically different, whilst the 2- and 3-state settings are not. This suggests that there is benefit to using a regime-switching model, and the 2- and 3-state HMM and HOHMM settings possess similar capability to capture the DAT dynamics. That is, a two-state model is sufficient for our data.

HMM setting   Deseasonalised X_t              DATs T_t
              1-state   2-state   3-state    1-state    2-state    3-state
MSE           0.42702   0.42700   0.42887    10.67538   10.67504   10.72172
RMSE          0.65346   0.65345   0.65488    3.26732    3.26727    3.27440
MAE           0.50467   0.50465   0.50728    2.52333    2.52324    2.53640
RAE           0.74435   0.74432   0.74821    0.27539    0.27538    0.27681

Table 3.5: Results of error analysis on HMM-based models

HOHMM setting Deseasonalised X_t              DATs T_t
              1-state   2-state   3-state    1-state    2-state    3-state
MSE           0.42702   0.42686   0.42698    10.6754    10.67142   10.67445
RMSE          0.65346   0.65334   0.65344    3.26732    3.26671    3.26718
MAE           0.50467   0.50448   0.50463    2.52334    2.52239    2.52315
RAE           0.74435   0.74407   0.74430    0.27539    0.27528    0.27537

Table 3.6: Results of error analysis on HOHMM-based models
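The Bonferroni adjustment used for the pairwise comparisons is simple enough to state in code: each raw p-value is multiplied by the number of comparisons and capped at 1, which is also what R's 'p.adjust' does for the Bonferroni method. The Python function below is an illustrative sketch and its name is hypothetical; the bootstrap t-test that produces the raw p-values is not reproduced here.

```python
def bonferroni_adjust(p_values):
    """Multiply each p-value by the number of tests, capping the result at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]
```

With three comparisons, raw p-values of 0.01, 0.5 and 0.004 become 0.03, 1.0 and 0.012. The cap explains why an adjusted p-value can be reported as exactly 1.00000.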

Setting    HMM                                          HOHMM
t-test     1-state vs   1-state vs      2-state vs     1-state vs   1-state vs   2-state vs
           2-state      3-state         3-state        2-state      3-state      3-state
p-value    0.01259      7.06352×10^-5   0.57193        0.00678      0.01495      1.00000

Table 3.7: Bonferroni-corrected p-values for a paired t-test applied to RMSEs

We complement our error analysis with a likelihood-based model selection analysis via the AIC. This estimates the Kullback-Leibler information under the ML paradigm, and is given by
\[
\mathrm{AIC} = -2 \log L(X; v) + 2g,
\]
where $g$ is the number of estimated parameters in the model, and $\log L(X; v)$ denotes the log-likelihood function of the model given the data $X$ and a set of parameters $v$. A model is deemed the best and is chosen if it is able to balance between fitness (maximise the log-likelihood function) and complexity (minimise the penalty from too many parameters), yielding the lowest AIC value. As a function of the observation $X_t$ and the parameter sets $v^*$ and $v^w$, the log-likelihood functions under the respective HMM and HOHMM settings are
\[
\log L(X_k; v^*) = \sum_{k=1}^{\beta} \left[\log\left(\frac{1}{\sqrt{2\pi}\,\sigma(y_k)}\right) - \frac{\big(X_{k+1} - \mu(y_k)\big)^2}{2\sigma^2(y_k)}\right], \tag{3.43}
\]
\[
\log L(X_k; v^w) = \sum_{k=1}^{\beta} \sum_{t=1}^{N} \langle y^w_k, e_t\rangle \left[\log\left(\frac{1}{\sqrt{2\pi}\,\varrho(y^w_k)}\right) - \frac{\big(X_{k+1} - \kappa(y^w_k)\, X_k - \vartheta(y^w_k)\big)^2}{2\varrho^2(y^w_k)}\right], \tag{3.44}
\]
where $\beta$ denotes the number of observations in each pass, and $N$ is the number of regimes. From equation (3.10) and the matrix of transition probabilities, we can determine the total number of parameters in each of the HMM and HOHMM settings and these are presented in Table 3.8.

Setting   1-state   2-state   3-state   N-state
HMM       3         8         15        3N + (N − 1)N
HOHMM     3         10        27        3N + (N − 1)N^2

Table 3.8: Number of estimated parameters under the HMM and HOHMM settings
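The parameter counts in Table 3.8 and the AIC formula can be checked with a short Python sketch. The function names are hypothetical; the counts follow the N-state column of the table, with $3N$ parameters for the OU component and the remaining terms for the free transition probabilities.

```python
def n_params_hmm(n_states):
    """3N model parameters plus (N - 1) * N free transition probabilities."""
    return 3 * n_states + (n_states - 1) * n_states


def n_params_hohmm(n_states):
    """3N model parameters plus (N - 1) * N**2 free second-order transition probabilities."""
    return 3 * n_states + (n_states - 1) * n_states ** 2


def aic(log_likelihood_value, n_params):
    """Akaike information criterion: -2 log L + 2g."""
    return -2.0 * log_likelihood_value + 2 * n_params
```

Evaluating the two counting functions at 1, 2 and 3 states reproduces the entries 3, 8, 15 and 3, 10, 27 of the table. The quadratic term in the HOHMM count is what drives the penalty up quickly as regimes are added.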

Employing equations (3.43) and (3.44), and Table 3.8, we compute the AIC values for
each model as we run through the algorithm passes. Figure 3.10 illustrates the evolution
of the computed AIC values for all candidate models. As the number of regimes grows
in a model, there is a substantial increase in the number of parameters especially coming
from the number of transition probabilities. Our results show that the 2-state HOHMM has
the smallest AIC value and a more stable pattern in the entire data period. Therefore the
2-state model is the best-fitting model for the dynamics of our data set, which agrees with
our error analysis.

Although it is typical in the statistical analysis of a proposed financial model to benchmark it against say, random walk, log-normal and ARCH-type models (see Tenyakov et al. [32] and Hardy [23], for example), these common benchmarks are incompatible with our modelling situation. More specifically, these models for benchmarking do not have the capability to capture the important property of mean-reversion for temperature data and thus, they are inordinately disadvantaged in comparison to mean-reverting models. Hence, the only meaningful benchmark not involving any Markov chain in our case is the one-state OU process, where there is no switching of regimes. From Figure 3.10, the 1-state model (i.e., no Markov chain) is performing poorly relative to the 2- and 3-state models. This is also supported by the HDD error-analysis metrics shown in Table 3.9.

[Plot: AIC against algorithm steps; series: 1-state HMM/HOHMM, 2-state HMM, 3-state HMM, 2-state HOHMM, 3-state HOHMM]

Figure 3.10: Evolution of AIC for 1-, 2-, and 3-state HMM- and HOHMM-based models

HDD     HMM setting                          HOHMM setting
        1-state     2-state     3-state     1-state     2-state     3-state
MSE     4725.2238   4724.6591   5754.0650   4725.2238   4705.0409   4724.9993
RMSE    68.7403     68.7362     75.8556     68.7403     68.5933     68.7386
MAE     51.4458     51.4237     57.3876     51.4458     51.3215     51.4243
RAE     0.0504      0.0504      0.0562      0.0504      0.0503      0.0504

Table 3.9: Results of error analysis for the HDD on HMM- and HOHMM-based models

3.6 Conclusion

The major contribution of this chapter is the development of a model flexible enough to describe a data set that exhibits seasonality, randomness, mean-reversion and memory. Such is the case for the dynamics of the DAT series, which is the most utilised underlying variable in the creation of weather-dependent derivatives. In particular, we put forward a model with a deterministic seasonality component and a mean-reversion governed by the OU process. The stochasticity is generated by the Brownian motion in the diffusion component. Embedding an HOHMM facilitates the switching of parameters amongst various conceivable regimes and also captures short- or long-range dependence.

Our estimation procedure for all the model parameters is achieved through the extension of the HMM-OU filtering techniques applied to a proposed transformed HOHMM. With the change of probability measure and EM algorithm, self-tuning HOHMM recursive filters were obtained that support online parameter estimation. Our results also encompass the special case of HOHMM with lag 1, which is the case for the usual HMM recursive filters. Our modelling methodology was tested on Toronto's DATs covering a 4-year period. Our post-modelling diagnostics reveal reliable one-step ahead forecasts under various settings. We must note though that the 2-state HOHMM provides the best framework for the data that we investigated. This is validated by the AIC analysis and the accompanying error analyses.

The current HMM-modulated models have been employed in many areas of finance and
economics, such as the modelling of commodity futures prices, developing asset allocation
strategies, valuing interest-rate products, and so on. HMM- and HOHMM-based methods
in weather derivatives were virtually nonexistent until the publication of the 2-state
regime-switching temperature model of Elias et al. [14]. This work further advances the
development of temperature modelling for weather derivatives in three respects: (i) provision
of dynamic estimation with efficient algorithms, (ii) generalisation of the usual HMM to
take advantage of information beyond lag one, and (iii) empirical results that reinforce the
choice of the optimal regime dimension. We believe that our proposed method offers an
effective alternative to available approaches in efficient modelling and forecasting of
temperature dynamics with easily interpretable results.

A natural direction to pursue further is to price weather derivatives by finding a risk-neutral
measure and linking this pricing measure to the optimal estimates produced from our
proposed filtering procedure. Whilst this is beyond the objectives of this study, we have laid
down some groundwork to motivate that kind of research exploration. It would also be
worth examining these HOHMM-based filtering algorithms and the parameter estimates in
relation to the modelling of other weather measurements such as wind speed, level of
rainfall, etc., either for the purpose of financial pricing or meteorological modelling and
forecasting. Finally, an apparent weakness of this research is the modeller’s prescription of
the HOHMM’s lag order before the filtering algorithms can be implemented. It is hoped that
this could be rectified in future research through the construction of statistical inference
techniques that estimate the correct lag order as implied by the data.

References

[1] P. Alaton, B. Djehiche, D. Stillberger, On modeling and pricing weather derivatives,
Applied Mathematical Finance, 9 (1) (2002) 1–20.

[2] F. Benth, J. Šaltytė-Benth, Weather derivatives and stochastic modelling of temperature,
International Journal of Stochastic Analysis, (2011) 1–21, doi:10.1155/2011/576791.

[3] F. Benth, J. Šaltytė-Benth, Stochastic modelling of temperature variations with a view
towards weather derivatives, Applied Mathematical Finance, 12 (1) (2005) 53–85.

[4] J. Beran, Statistics for Long-Memory Processes, Chapman and Hall, New York
(1994).

[5] S. Campbell, F. Diebold, Weather forecasting for weather derivatives, Journal of the
American Statistical Association, 100 (469) (2005) 6–16.

[6] M. Cao, J. Wei, Weather derivatives: a new class of financial instruments, University
of Toronto Web, http://www.rotman.utoronto.ca/ wei/research/JAI.pdf (2003), Accessed
March 2016.

[7] W. Ching, T. Siu, L. Li, Pricing exotic options under a high-order Markovian regime
switching model, Journal of Applied Mathematics and Decision Sciences, (2007)
1–15, doi:10.1155/2007/18014.

[8] G. Considine, Introduction to weather derivatives, CME Group Web,
http://web.math.pmf.unizg.hr/ amimica/pub/lapl.pdf, Accessed March 2016.

[9] P. Date, A. Tenyakov, R. Mamon, Filtering and forecasting commodity futures prices
under an HMM framework, Energy Economics, 40 (2013) 1001–1013.

[10] M. Davis, Pricing weather derivatives by marginal value, Quantitative Finance, 1 (3)
(2001) 305–308.


[11] B. Dischel, At least: a model for weather risk, Weather Risk Special Report, Energy
and Power Risk Management, March issue (1998) 3032.

[12] G. Dorfleitner, M. Wimmer, The pricing of temperature futures at the Chicago Mer-
cantile Exchange, Journal of Banking and Finance, 34(6) (2010) 1360–1370.

[13] J. Dutton, Opportunities and priorities in a new era for weather and climate services,
Bulletin of the American Meteorological Society, 83 (9) (2002) 1303–1311.

[14] R. Elias, M. Wahab, L. Fang, A comparison of regime-switching temperature
modeling approaches for applications in weather derivatives, European Journal of
Operational Research, 232 (3) (2014) 549–560.

[15] R. Elliott, Exact adaptive filters for Markov chains observed in Gaussian noise, Auto-
matica, 30 (1994) 1399–1408.

[16] R. Elliott, L. Aggoun, J. Moore, Hidden Markov Models: Estimation and Control,
Springer, New York (1995).

[17] R. Elliott, V. Krishnamurthy, New finite-dimensional filters for parameter estimation
of discrete-time linear Gaussian models, IEEE Transactions on Automatic Control,
44 (5) (1999) 938–951.

[18] C. Erlwein, R. Mamon, An online estimation scheme for a Hull–White model with
HMM-driven parameters, Statistical Methods and Applications, 18 (1) (2009) 87–107.

[19] C. Erlwein, F. Benth, R. Mamon, HMM filtering and parameter estimation of an elec-
tricity spot price model, Energy Economics, 32 (5) (2010) 1034–1043.

[20] D. Gujarati, Basic Econometrics (4th ed), McGraw-Hill, New York (2003).

[21] C. Granger, R. Joyeux, An introduction to long memory time series models and frac-
tional differencing, Journal of Time Series Analysis, 1 (1980) 49–64.

[22] J. Hamilton, Rational expectations econometric analysis of changes in regime: an
investigation of term structure of interest rates, Journal of Economic Dynamics and
Control, 12 (1988) 385–423.

[23] M. Hardy, A regime-switching model of long-term stock returns, North American
Actuarial Journal, 5 (2) (2001) 41–53.

[24] J. Hull, A. White, Pricing interest-rate-derivative securities, Review of Financial
Studies, 3 (4) (1990) 573–592.

[25] S. Jewson, A. Brix, C. Ziehmann, Weather Derivative Valuation: The Meteorological,
Statistical, Financial and Mathematical Foundations, Cambridge University Press,
Cambridge (2005).

[26] R. Mamon, R. Elliott, Hidden Markov Models in Finance: International Series in
Operations Research and Management Science, 104, Springer, New York (2007).

[27] R. Mamon, C. Erlwein, B. Gopaluni, Adaptive signal processing of asset price dy-
namics with predictability analysis, Information Sciences, 178 (1) (2008) 203–219.

[28] Price Waterhouse Coopers, 2011 Weather risk derivative survey, Weather Risk
Management Association,
http://library.constantcontact.com/download/get/file/1101687496358-
153/PwC+Survey+Final+Presentation+20110519PRESS.pdf
(2011), Accessed March 2016.

[29] L. Rabiner, A tutorial on hidden Markov models and selected applications in speech
recognition, Proceedings of the IEEE, 77 (2) (1989) 257–286.

[30] V. Reisen, Estimation of the fractional difference parameter in the ARIMA(p, d, q)
model using the smoothed periodogram, Journal of Time Series Analysis, 15 (1994)
335–350.

[31] T. Siu, W. Ching, E. Fung, M. Ng, X. Li, A high-order Markov-switching model for
risk measurement, Computers and Mathematics with Applications, 58 (1) (2009) 1–10.

[32] A. Tenyakov, R. Mamon, M. Davison, Modelling high-frequency FX rate dynamics:
A zero-delay multi-dimensional HMM-based approach, Knowledge-Based Systems,
101 (2016) 142–155.

[33] A. van der Vaart, Asymptotic Statistics, Cambridge University Press, Cambridge and
New York (1998).

[34] World Meteorological Organization (WMO) and the Centre for Research on the
Epidemiology of Disasters (CRED) of the Catholic University of Louvain (UCL), Atlas
of Mortality and Economic Losses from Weather, Climate and Water Extremes
1970-2012, WMO Press, Belgium (2014).

[35] C. Wu, On the convergence properties of the EM Algorithm, Annals of Statistics, 11
(1) (1983) 95–103.

[36] X. Xi, R. Mamon, Parameter estimation of an asset price model driven by a weak
hidden Markov chain, Economic Modelling, 28 (1) (2011) 36–46.

Chapter 4

A higher-order Markov chain-modulated model for electricity spot-price dynamics

4.1 Introduction

Electricity contributed greatly to technological revolution and hastened industrial progress.
As one of mankind’s pivotal discoveries, it is indispensable in our modern society. It has
changed, in a significant way, almost every aspect of our daily lives. It would be hard
to imagine a world without the electronics, machines, equipment and inventions that are
energised by electric power nowadays. Considering the advantages of economies of scale
and steady industrial productivity, regulation of electricity supply was centralised; but this
inhibited any competition at all. In terms of economic efficiency, microeconomic theory
maintains that introducing competition can motivate production innovation, increase supply
diversity, and offer the most benefits to consumers. Since privatisation and competition
were first introduced to electric power systems in Chile [28], electricity markets have
experienced restructuring and deregulation in many countries. Electricity is now a daily
necessity; given its prime importance, it is a commodity actively traded in the financial
markets. Consequently, we are witnessing increasing price uncertainties and risks in
electricity-driven investment portfolios. So, models capable of adequately capturing
electricity price dynamics for electricity-contract pricing and risk management are sought.
Existing research studies attempt to formulate electricity pricing models that mimic
model-pricing developments in the commodity market. However, due to the unique
characteristics of electricity spot prices, many financial models designed for regular
commodities cannot necessarily be adapted to the electricity markets.

Compared to common and tangible assets in the capital markets, electricity is non-storable;
when produced, it must be consumed almost instantly. This non-storability feature makes
its spot price highly sensitive to demand and supply in real time. As a result, electricity
has more dramatic price evolutions than those of other energy sources, such as crude oil
and natural gas. Since electricity cannot be stored, electricity prices are largely dependent
upon supply and demand, which exhibit pronounced seasonality patterns. Going beyond
general seasonality, a recurring multi-cyclical pattern of prices is evident in electricity
markets. Annual and quarterly patterns, for example, are attributed principally to the
variations in temperature and duration of daylight, especially during winter and summer.
Cyclical patterns also occur weekly, daily and even intra-daily, caused by demand that
varies in time.

Similar to other energy commodities, it is also critical to incorporate the mean-reverting
property in electricity prices, as argued in Schwartz [31]. Owing to demand fluctuations,
the price can deviate from the long-run mean. However, the production cost and supply
adjustments are capable of hauling it back to the mean level over time; hence the
mean-reverting behaviour of electricity prices. An unbalanced supply-and-demand situation,
which happens expectedly or unexpectedly from time to time, could bring about abrupt and
large jumps in prices. Capturing simultaneously the above-mentioned properties of the price
dynamics is a great challenge in electricity spot-price modelling in the framework of any
deregulated market. Yet, this difficulty must be dealt with in order to address the concerns
arising in the trading and hedging of electricity-dependent derivatives.

Approaches to modelling electricity prices can be divided roughly into two categories. The
first approach focuses on the modelling of the entire forward curve, with a view to the
valuation of associated derivatives and short-term prediction of futures prices; see Islyaev
and Date [19] and Fanelli et al. [12]. The second approach concentrates on model
formulation of spot prices, taking into account only the most relevant fundamental price
drivers observed in electricity markets; see Ziel et al. [47]. The emphasis of our research is
on the latter approach. Early research works mainly put forward stochastic models that only
capture seasonality and mean reversion in prices. The seasonal systematic patterns are
typically described by sinusoidal functions [23] and the Ornstein-Uhlenbeck (OU) process is
prevalently utilised to model the mean reversion as noted in Schwartz [31]. A two-factor
model in an OU-inspired setting was developed by Schwartz and Smith [32] in an effort
to explicitly portray seasonal patterns and short-term mean-reverting variations in prices at
the equilibrium level. However, most of this early literature did not take into account the
peculiarities of electricity prices or examine them thoroughly.

Motivated by the phenomenon of frequent but large price jumps observed in the electric-
ity markets, scholars began to develop modelling capabilities that can handle spikes in
electricity-price fluctuations. Deng [6] pioneered the use of jump-diffusion models with
a mean-reverting process to replicate the distinct characteristics of electricity-spot prices;
such models were fitted to data from the American markets. Benth et al. [1] added jumps
to a general exponential multi-factor mean-reverting model and performed calibration em-
ploying data from the Nord-Pool market. In Seifert and Uhrig-Homburg [34], different
models for the jump component in the electricity markets were explored and the effec-
tiveness of different jump specifications compared; such models were applied to the Euro-
pean Energy Exchange (EEX) market. Many research proponents advanced the utility of
Poisson-jump models, whilst others, in recent literature, argued that modelling the ‘jumpy’
attribute could be achieved better by introducing a regime-switching approach. Huisman
and Mahieu [18] proposed a regime-shifting jump process with a recovery state and
concluded that it performed better than a stochastic jump model, whilst a
two-regime-switching model with ‘abnormal’ and ‘normal’ states was shown in De Jong [7]
to be superior to a Poisson-jump model in capturing electricity-price dynamics. Wu et al. [37]
studied the Alberta electricity-spot market via a hidden Markov model with a multi-model identification
approach. Alberta electricity pool prices are divided into five classes, and the last three
classes with prices over $100/MWh are deemed high-pool-price regions. It was found that
price-forecasting performance could be improved substantially, especially in high-pool-
price regions, by incorporating a Markovian regime-switching mechanism.

In the context of regime-switching approaches, hidden Markov models (HMMs) have been
widely successful in many engineering, economic and financial applications; see Mamon
and Elliott [20, 25]. HMMs are beneficial for modelling processes illustrating regime-
switching dynamics via Markov chains with latent states. This approach was previously
applied in the examination of electricity-price behaviour. A non-stationary model based
on input-output HMM was developed to model time series of spot prices in the Spanish
electricity market [15]. As well, Yu and Sheblé [46] utilised HMMs to efficiently model
price movements in the electricity market and provide good forecasts adjudged from the
perspective of accuracy and dynamic information in the US market.

The assumption of first-order state transition dependency in an HMM renders this model
deficient in taking advantage of the information content of historical data. Thus, researchers
recently introduced higher-order hidden Markov models (HOHMMs), also called weak
hidden Markov models. An HOHMM is a doubly embedded stochastic process having
an observation series and an underlying unobserved Markov chain. The probability
distribution of the Markov chain’s state transition at present depends not only on the most
recent state but also on states at prior epochs. The primary aim in practice is to obtain the
best estimate of the Markov chain’s current or future state, which represents the ‘filtered’
or ‘predicted’ state of the market or economic system. This estimation is performed by
taking the conditional expectations of functions of the Markov chain and observed
electricity spot prices, which are regarded as an offshoot of the interaction of many latent
factors, such as market participants’ actions, production and consumption, weather
conditions, transmission network, etc. In our context, the HOHMM setting is designed to
capture the presence of memory in electricity market prices, which will provide additional
information in the estimation and forecasting of economic regimes and other model parameters.

An array of HOHMMs’ applications can be found in speech recognition and finance,
amongst other research areas. An HOHMM for piecewise linear processes was developed
by Lee and Jean [22] to approximate the behaviour of a real process in speech recogni-
tion. Xiong and Mamon [43] demonstrated that an HOHMM captures more accurately the
empirical characteristics of a data set on daily average temperature (DAT) when compared
to the usual HMM. With the aid of HOHMMs, Xi and Mamon [38] devised a more flexi-
ble framework and showed better performance in forecasting the risky asset’s log returns.
Other HOHMM-related works in financial modelling include [39], [40], [41], and [42].

To the best of our knowledge, this research is the first to build an HOHMM-modulated
electricity spot-price model. The regular HMM approach is extended in our approach
and the HOHMM’s prediction performance is examined. Our work can be viewed both as
an update and extension of Erlwein et al.’s model construction [7] that was based on an
exponential-OU process with a jump component under the HMM setting, albeit Erlwein et
al.’s empirical application covers the Nord-Pool market whilst ours investigates the Alberta
electricity market. The structure of the electricity industry in Alberta is unique in North
America and apparently different from that of Norway, as the former is developed from
a set of uniquely distinguishable circumstances. In particular, the Alberta Electric System
Operator (AESO) relies on wind farms; and to avoid power shortages when the wind drops
off or does not blow consistently, coal and natural gas plants have to take up the slack.

Modelling the behaviour of electricity spot prices is of utmost importance because they
serve as the underlying variable for the values of many traded electricity contracts and
they are key indicators in the strategic planning and investment-decision making of various
stakeholders in the electricity market. Our main research contribution is the creation of
an HOHMM-OU-Poisson framework that simultaneously delineates five salient properties
of electricity spot prices. The implementation portion of this work highlights model val-
idation and post-modelling diagnostics. Capitalising on Erlwein et al.’s use of HMM on
electricity spot price modelling [7], this research further highlights HOHMC-modulated
model parameters to address memory in time series data. Our recursive filters are also
expressed more compactly in matrix notation.

This chapter is organised as follows. In Section 4.2, we build an electricity spot price
model whose parameters are governed by an HOHMM in discrete time. Through a change
of reference probability technique, adaptive filters are presented in Section 4.3 for the
states of the HOHMM and related quantities of the observation process. We outline the
self-calibrating parameter estimation scheme in terms of the recursive filters via the EM
algorithm in Section 4.4. Numerical implementation, which includes assessment of the
model’s goodness of fit and prediction ability, is performed on a 4-year AESO data set in
Section 4.5. The implication of our proposed HOHMM-modulated electricity spot price
model for forward pricing is also illustrated. Section 4.6 provides some concluding remarks.

4.2 Model formulation

4.2.1 Model for electricity spot price


As noted above, studies in the earlier literature employ time series models to describe the
price process of electricity. An OU process with a predictable component is a germane
example, and this was adopted by Lucia and Schwartz [23] to replicate the unfolding of
the log-spot prices in the Nordic exchange. Yousef [45] applied mean-reverting diffusion
models to the Alberta electricity market; however, these models could not produce spikes in
the dynamics of electricity spot prices. Invoking Benth et al. [1], we suppose the electricity
spot price $S_t$ at time $t$, defined on some probability space $(\Omega, \mathcal{F}, P)$, is given by

\[ S_t = D_t e^{X_t}. \tag{4.1} \]

Equation (4.1) comprises two components to handle seasonality, and stochasticity coming
from normal perturbations and jump behaviours. The deterministic function Dt accounts
for regularities in the price evolutions and periodic trends. The stochastic part Xt is as-
sumed to be an OU process (to rationalise the tendency to return to a long-run mean) plus
a compound Poisson process that deals with excessive volatilities and spikes.

Trigonometric functions were applied by various authors to capture processes’ seasonality
in different complex systems. In the power market for instance, Lucia and Schwartz [23]
included sinusoidal and dummy variables for prices in the Nordic market. Linear trend and
cosine functions were introduced to model spot prices’ seasonality in the US electricity
market by Geman and Roncoroni [14]. As in Xiong and Mamon [43, 44], daily temperature
data series can be modelled by a combination of sinusoidal functions, with varying seasonal
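frequencies, and a linear trend. Following [43] and [1], $D_t$ has the specification
\[
D_t = a t + b + \sum_{h=1}^{3} \left[ c_h \sin\left(\frac{2\pi}{365} d_h t\right) + e_h \cos\left(\frac{2\pi}{365} d_h t\right)
+ g_h \sin\left(\frac{2\pi}{7} d_h t\right) + j_h \cos\left(\frac{2\pi}{7} d_h t\right) \right], \tag{4.2}
\]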
where $d_1 = 1$, $d_2 = 2$, and $d_3 = 4$ to cover periodic trends including the annual,
semi-annual, quarterly, monthly, weekly, and mid-week patterns. The constants $a$, $b$, $c_h$,
$e_h$, $g_h$, and $j_h$ are recovered from a data set of prices. Such a deterministic function
$D_t$ may be extended with more terms to capture other possible factors at play (e.g.,
effects of generation capacity and holidays). We argue, however, that since temperature is
a major driver for electricity demand and contributes substantially to seasonal variations,
equation (4.2) can sufficiently portray the seasonal characteristics of electricity spot prices.
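
To make the structure of equation (4.2) concrete, the following minimal sketch evaluates the seasonal function for a given day index. The function name `seasonal_component` and all coefficient values are hypothetical placeholders chosen for illustration; they are not estimates from the AESO data.

```python
import math

def seasonal_component(t, a, b, c, e, g, j, d=(1, 2, 4)):
    """Evaluate D_t of equation (4.2): linear trend plus yearly and weekly harmonics.

    c, e, g, j are length-3 sequences holding c_h, e_h, g_h, j_h for h = 1, 2, 3.
    """
    total = a * t + b
    for h in range(3):
        yearly_angle = 2.0 * math.pi * d[h] * t / 365.0
        weekly_angle = 2.0 * math.pi * d[h] * t / 7.0
        total += c[h] * math.sin(yearly_angle) + e[h] * math.cos(yearly_angle)
        total += g[h] * math.sin(weekly_angle) + j[h] * math.cos(weekly_angle)
    return total

# Hypothetical coefficients, for illustration only.
coeffs = dict(
    a=0.001, b=50.0,
    c=(2.0, 1.0, 0.5), e=(1.5, 0.8, 0.3),
    g=(0.7, 0.4, 0.2), j=(0.6, 0.3, 0.1),
)
d_at_zero = seasonal_component(0, **coeffs)
```

At $t = 0$ every sine term vanishes and every cosine term equals one, so the value reduces to $b$ plus the sums of the $e_h$ and $j_h$ coefficients, which gives a simple check of an implementation.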

4.2.2 The fusion of HOHMM, OU and compound Poisson models


In the ensuing exposition, we shall denote all vectors and matrices by bold small
English/Greek letters and bold capitalised English/Greek letters, respectively. The
deseasonalised component $X_t$ in (4.1) follows an OU model with an additive
compound-Poisson component; these provide the capability to model mean reversion, price
variations and spike dynamics. The process $X_t$ in equation (4.1) satisfies the stochastic
differential equation (SDE)

\[ dX_t = \alpha(\theta - X_t)\,dt + \xi\,dW_t + dJ_t, \tag{4.3} \]

where α, θ, and ξ stand for the speed of mean reversion, mean-reverting level, and volatility,
respectively. A Brownian motion {Wt } models the random price fluctuations under a stable
market condition, whilst {Jt } is designed to pick up the price spikes. A compound Poisson
process affords capacity to model jumps, as adduced in [11, 20] and similar to their settings,
we let $dJ_t = \beta\,dP_t$, where $P_t$ is a Poisson process with a constant intensity $\varsigma$ and a normally
distributed jump size $\beta \sim N(\mu_\beta, \sigma_\beta^2)$. For $s \le t$, the continuous-time solution of equation
(4.3), by Itô’s lemma, is
\[
X_t = X_s e^{-\alpha(t-s)} + \left(1 - e^{-\alpha(t-s)}\right)\theta + \xi e^{-\alpha t}\int_s^t e^{\alpha u}\,dW_u
+ \int_s^t e^{-\alpha(t-u)}\,dJ_u. \tag{4.4}
\]

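By approximating the distributions of the stochastic integrals in (4.4), the discrete-time
version of the solution is
\[
X_{k+1} = X_k e^{-\alpha \Delta t_{k+1}} + \left(1 - e^{-\alpha \Delta t_{k+1}}\right)\theta
+ \xi \sqrt{\frac{1 - e^{-2\alpha \Delta t_{k+1}}}{2\alpha}}\; z_{k+1}
+ \sum_{m=1}^{P_{\Delta t_{k+1}}} e^{-\alpha(\Delta t_{k+1} - \upsilon_m)} \beta_m, \tag{4.5}
\]
where $\Delta t_{k+1} = t_{k+1} - t_k$ for $k \in \mathbb{Z}^* := \mathbb{Z}^+ \cup \{0\}$, $\upsilon_m$ is the occurrence time of the $m$th jump,
and $\{z_{k+1}\}$ is a sequence of independent and identically distributed (IID) standard normal
random variables.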
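
The recursion in (4.5) can be simulated directly. The sketch below is one possible implementation for a single regime with constant parameters and equal time steps; the function name, the argument names and the use of exponential inter-arrival times to place the jumps within each step are our assumptions for illustration.

```python
import math
import random

def simulate_ou_jump_path(x0, alpha, theta, xi, intensity, mu_beta, sigma_beta,
                          n_steps, dt=1.0, seed=0):
    """Simulate X_0, ..., X_n from the discrete-time recursion (4.5)."""
    rng = random.Random(seed)
    decay = math.exp(-alpha * dt)
    noise_scale = xi * math.sqrt((1.0 - math.exp(-2.0 * alpha * dt)) / (2.0 * alpha))
    path = [x0]
    for _ in range(n_steps):
        # Compound Poisson part: jumps arriving at times v in (0, dt),
        # each damped by exp(-alpha * (dt - v)).
        jump_sum = 0.0
        if intensity > 0.0:
            arrival = rng.expovariate(intensity)
            while arrival < dt:
                jump_size = rng.gauss(mu_beta, sigma_beta)
                jump_sum += math.exp(-alpha * (dt - arrival)) * jump_size
                arrival += rng.expovariate(intensity)
        x_next = (path[-1] * decay + (1.0 - decay) * theta
                  + noise_scale * rng.gauss(0.0, 1.0) + jump_sum)
        path.append(x_next)
    return path

# Hypothetical parameter values, for illustration only.
example_path = simulate_ou_jump_path(
    x0=0.0, alpha=0.5, theta=1.0, xi=0.3,
    intensity=0.1, mu_beta=0.5, sigma_beta=0.2, n_steps=10,
)
```

Setting `xi` and `intensity` to zero removes both sources of randomness, in which case the path relaxes deterministically towards `theta`; this is a convenient sanity check.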

To create a regime-switching model for electricity spot prices, we begin with a homogeneous
Markov chain $y_k$ with a finite state space $\{e_1, e_2, \ldots, e_N\}$, where
$e_i = (0, \ldots, 0, 1, 0, \ldots, 0)^\top \in \mathbb{R}^N$ with 1 in the $i$th position, $\top$ is the matrix transpose
operator, and $N$ is the state-space dimension. Model parameters in equation (4.5) switch
randomly, in accordance with the dictates of $y_k$, amongst different electricity-market
regimes as time progresses. With the canonical basis as $y_k$’s state space, we have
$\alpha_k := \alpha(y_k) = \langle \alpha, y_k \rangle$, $\theta_k := \theta(y_k) = \langle \theta, y_k \rangle$,
$\xi_k := \xi(y_k) = \langle \xi, y_k \rangle$, and $\beta_k := \beta(y_k) = \langle \beta, y_k \rangle$, where $\langle \cdot, \cdot \rangle$ is the inner product in $\mathbb{R}^N$. The
stochastic process $X_k$ in equation (4.5) then can be written as
\[
X_{k+1} = X_k e^{-\alpha(y_k) \Delta t_{k+1}} + \left(1 - e^{-\alpha(y_k) \Delta t_{k+1}}\right)\theta(y_k)
+ \xi(y_k) \sqrt{\frac{1 - e^{-2\alpha(y_k) \Delta t_{k+1}}}{2\alpha(y_k)}}\; z_{k+1}
+ \sum_{m=1}^{P_{\Delta t_{k+1}}} e^{-\alpha(y_k)(\Delta t_{k+1} - \upsilon_m)} \beta_m(y_k). \tag{4.6}
\]

To completely characterise the parameter estimation and filtering under the HOHMM set-
ting, we concentrate on a second-order hidden Markov chain. Of course, in theory and
principle, the estimation and filtering for the generalised Markov chain with order greater
than 2 can be extended in a straightforward manner, notwithstanding the corresponding
computational challenge. Suppose $y^w_k$ is a second-order hidden Markov chain regulating
the evolution of $\alpha$, $\theta$, $\xi$, and $\beta$. We define $\mathcal{F}_k := \mathcal{F}^w_k \vee \mathcal{F}^z_k \vee \mathcal{F}^J_k$ as the global filtration,
where $\mathcal{F}^w_k$, $\mathcal{F}^z_k$, and $\mathcal{F}^J_k$ are filtrations generated by $\{y^w_k\}$, $\{W_t\}$ and $\{J_t\}$, respectively. Under
the probability space $(\Omega, \mathcal{F}, \{\mathcal{F}_k\}, P)$, the discrete-time hidden Markov chain $y^w_k$ at the
current step $k$ depends on the information revealed at two prior steps $k-1$ and $k-2$. Following
[28], write $T$ for the $\mathbb{R}^{N \times N^2}$ transition probability matrix, specified as
\[
T := \begin{pmatrix}
p_{111} & p_{112} & \cdots & p_{11N} & \cdots & p_{1N1} & p_{1N2} & \cdots & p_{1NN} \\
p_{211} & p_{212} & \cdots & p_{21N} & \cdots & p_{2N1} & p_{2N2} & \cdots & p_{2NN} \\
\vdots & \vdots & \ddots & \vdots & \cdots & \vdots & \vdots & \ddots & \vdots \\
p_{N11} & p_{N12} & \cdots & p_{N1N} & \cdots & p_{NN1} & p_{NN2} & \cdots & p_{NNN}
\end{pmatrix},
\]
where $p_{dcb} := P(y^w_{k+1} = e_d \mid y^w_k = e_c, y^w_{k-1} = e_b)$ with $k \ge 1$ and $d, c, b \in \{1, 2, \ldots, N\}$, which
stands for the probability that the Markov chain will be in state $d$ at time $k+1$, given that
it is in state $c$ at time $k$ and in state $b$ at time $k-1$.

A vital strategy in the estimation and filtering of HOHMMs is a mapping that converts an
HOHMM into an HMM, so that the usual HMM filtering methods can be conveniently
applied. For our second-order HMC, let $\varpi$ be a transformation defined by

\[ \varpi(e_b, e_c) = e_{bc}, \quad \text{for } 1 \le b, c \le N, \tag{4.7} \]

where $e_{bc}$ is a unit vector with 1 in its $((b-1)N + c)$th position. We obtain a new Markov
chain $\varpi(y^w_{k+1}, y^w_k)$, whose state space is the canonical basis of $\mathbb{R}^{N^2}$, and

\[ \langle \varpi(y^w_{k+1}, y^w_k), e_{bc} \rangle = \langle y^w_{k+1}, e_b \rangle \langle y^w_k, e_c \rangle. \tag{4.8} \]


The corresponding dynamics are

\[ \varpi(y^w_{k+1}, y^w_k) = A\,\varpi(y^w_k, y^w_{k-1}) + m^w_{k+1}, \tag{4.9} \]

where $\{m^w_{k+1}\}_{k \ge 1}$ is a sequence of martingale increments with $E[m^w_{k+1} \mid \mathcal{F}^w_k] = 0$ under the
real-world measure $P$. The matrix $A = (a_{ji})$ is the $\mathbb{R}^{N^2 \times N^2}$ probability transition matrix with
entries $a_{ji} := p_{dcb} = P(y^w_{k+1} = e_d \mid y^w_k = e_c, y^w_{k-1} = e_b)$ for $j = (d-1)N + c$, $i = (c-1)N + b$,
and $a_{ji} = 0$ otherwise.
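
The embedding of the second-order chain into a first-order chain on pairs of states can be sketched as follows. The helper names `pair_index` and `build_pair_transition` and the numerical probabilities are hypothetical; indices are 0-based in the code, whereas the text uses 1-based positions $(d-1)N + c$ and $(c-1)N + b$.

```python
def pair_index(first, second, n):
    """0-based position of the unit vector for the pair (first, second)."""
    return first * n + second

def build_pair_transition(p, n):
    """Build the N^2 x N^2 matrix A from second-order probabilities.

    p[d][c][b] = P(y_{k+1} = d | y_k = c, y_{k-1} = b), with 0-based states.
    """
    size = n * n
    matrix = [[0.0] * size for _ in range(size)]
    for d in range(n):
        for c in range(n):
            for b in range(n):
                j = pair_index(d, c, n)  # new pair (y_{k+1}, y_k)
                i = pair_index(c, b, n)  # old pair (y_k, y_{k-1})
                matrix[j][i] = p[d][c][b]
    return matrix

# Hypothetical two-state example: q[c][b] is the chance of moving to state 0.
q = [[0.9, 0.6], [0.3, 0.2]]
p_example = [
    [[q[c][b] for b in range(2)] for c in range(2)],
    [[1.0 - q[c][b] for b in range(2)] for c in range(2)],
]
A_example = build_pair_transition(p_example, 2)
column_sums = [sum(A_example[j][i] for j in range(4)) for i in range(4)]
```

Each column of the resulting matrix sums to one, and an entry is non-zero only when the shared middle state $y^w_k$ of the old and new pairs agrees, which is why most entries of $A$ vanish.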

4.3 Recursive filtering

4.3.1 Change of reference probability measure


Deriving recursive filters under the real-world probability $P$, with the realistic supposition
of dependent observations, could be computationally expensive and cumbersome. For this
reason, we shall perform the recursive filtering involving $y^w_k$ under an ideal reference
probability measure $\tilde{P}$. Under $\tilde{P}$, the observed $X_k$’s are IID and $y^w_k$ has the same dynamics
under both $P$ and $\tilde{P}$. We assume further that the jump component of the stochastic process
$X_t$ is unaffected by the change of reference probability measure, following Merton [29]. The
estimated processes’ dynamics, under measure $\tilde{P}$, can be recovered via a discrete-time
version of Girsanov’s theorem [9] following a reverse measure change.

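To aid the filtering computations in our HOHMM setting, we re-express $X_{k+1}$ in equation
(4.6) as

\[ X_{k+1} = \kappa(y^w_k) X_k + \vartheta(y^w_k) + \varrho(y^w_k) z_{k+1} + \tau(y^w_k), \tag{4.10} \]

where

\[ \kappa(y^w_k) = e^{-\alpha(y^w_k) \Delta t_{k+1}}, \tag{4.11} \]

\[ \vartheta(y^w_k) = \left(1 - e^{-\alpha(y^w_k) \Delta t_{k+1}}\right) \theta(y^w_k), \tag{4.12} \]

\[ \varrho(y^w_k) = \xi(y^w_k) \sqrt{\frac{1 - e^{-2\alpha(y^w_k) \Delta t_{k+1}}}{2\alpha(y^w_k)}}, \tag{4.13} \]

\[ \tau(y^w_k) = \sum_{h=1}^{P_{\Delta t_{k+1}}} e^{-\alpha(y^w_k)(\Delta t_{k+1} - \upsilon_h)} \beta_h(y^w_k). \tag{4.14} \]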

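Associated with the change of measure is an ideal reference measure $\tilde{P}$, whose construction
is justified by Kolmogorov’s extension theorem [3]. As noted earlier, the corresponding
dynamics of $J_k$ and $y^w_k$ are unaltered by the measure change. We may recover $P$ from $\tilde{P}$
given an $\mathcal{F}^w_k$-adapted process, for $k \ge 1$, through the Radon-Nikodym derivative

\[ \Psi^w_k = \left. \frac{dP}{d\tilde{P}} \right|_{\mathcal{F}^w_k} = \prod_{l=1}^{k} \phi^w_l \tag{4.15} \]

and
\[
\phi^w_l = \frac{\varphi\left( \left(\varrho^2(y^w_{l-1})\right)^{-1} \left[ X_l - \vartheta(y^w_{l-1}) - \kappa(y^w_{l-1}) X_{l-1} - \tau(y^w_{l-1}) \right] \right)}{\varrho^2(y^w_{l-1})\, \varphi(X_l)}
\]
\[
= \exp\left( \frac{\left( \vartheta(y^w_{l-1}) + \kappa(y^w_{l-1}) X_{l-1} + \tau(y^w_{l-1}) \right) X_l}{\varrho^2(y^w_{l-1})}
- \frac{\left( \vartheta(y^w_{l-1}) + \kappa(y^w_{l-1}) X_{l-1} + \tau(y^w_{l-1}) \right)^2}{2 \varrho^2(y^w_{l-1})} \right),
\]

where $\varphi$ is the probability density function of a standard normal random variable and $\Psi_0 =
1$, $\{\Psi_l, l \in \mathbb{Z}^+\}$ is an $\mathcal{F}^w_l$-adapted martingale under $P$.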
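
For a single time step, the exponential form of the measure-change factor can be evaluated directly. Writing $m$ for the sum of $\vartheta$, $\kappa X_{l-1}$ and $\tau$ in the relevant state, that exponential expression coincides with the ratio of an $N(m, \varrho^2)$ density to an $N(0, \varrho^2)$ density evaluated at $X_l$, which the sketch below checks numerically. The function names and the parameter values are hypothetical.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def measure_change_factor(x_now, x_prev, kappa, vartheta, varrho, tau):
    """Exponential form of the one-step factor, for one fixed regime."""
    m = vartheta + kappa * x_prev + tau
    return math.exp(m * x_now / varrho ** 2 - m * m / (2.0 * varrho ** 2))

# Hypothetical values, for illustration only.
x_prev, x_now = 0.8, 1.2
kappa, vartheta, varrho, tau = 0.9, 0.1, 0.5, 0.0

factor = measure_change_factor(x_now, x_prev, kappa, vartheta, varrho, tau)
m = vartheta + kappa * x_prev + tau
density_ratio = normal_pdf(x_now, m, varrho) / normal_pdf(x_now, 0.0, varrho)
```

The product of such factors over $l = 1, \ldots, k$ gives $\Psi^w_k$ in (4.15); in an implementation it is usually safer to accumulate the exponents and exponentiate once, to limit numerical overflow.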

4.3.2 Calculation of recursive filters


Under $\tilde{P}$, where the observation process is IID, the calculations and derivations needed for
the filtering and parameter estimation are manageable. Suppose $\mathcal{X}_k$ is the filtration
generated by $X_k$, which will be used to estimate the HOHMC $y^w_k$. Bayes’ theorem for
conditional expectation is a handy tool in getting optimal estimates of various quantities
under $P$, using the results from recursive filters under $\tilde{P}$.

In particular, let $g_k = \left( g_k(1), g_k(2), \ldots, g_k(cb), \ldots, g_k(NN) \right)^\top \in \mathbb{R}^{N^2}$, where
$g_k(cb) := P\left( y^w_k = e_c, y^w_{k-1} = e_b \mid \mathcal{X}_k \right) = E\left[ \langle \varpi(y^w_k, y^w_{k-1}), e_{cb} \rangle \mid \mathcal{X}_k \right]$. A filter for
$\varpi(y^w_k, y^w_{k-1})$ under $P$ is given by

\[ g_k := E\left[ \varpi(y^w_k, y^w_{k-1}) \mid \mathcal{X}_k \right] = \frac{E^{\tilde{P}}\left[ \Psi^w_k \varpi(y^w_k, y^w_{k-1}) \mid \mathcal{X}_k \right]}{E^{\tilde{P}}\left[ \Psi^w_k \mid \mathcal{X}_k \right]}. \tag{4.16} \]
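Write $\gamma_k := E^{\tilde{P}}\left[ \Psi^w_k \varpi(y^w_k, y^w_{k-1}) \mid \mathcal{X}_k \right]$. Since
$\sum_{c,b=1}^{N} \langle \varpi(y^w_k, y^w_{k-1}), e_{cb} \rangle = \langle \varpi(y^w_k, y^w_{k-1}), \mathbf{1} \rangle = 1$,
where $\mathbf{1}$ is an $\mathbb{R}^{N^2}$ vector with 1 in all of its entries, we have
\[
\sum_{c,b=1}^{N} \langle \gamma_k, e_{cb} \rangle
= \sum_{c,b=1}^{N} \left\langle E^{\tilde{P}}\left[ \Psi^w_k \varpi(y^w_k, y^w_{k-1}) \mid \mathcal{X}_k \right], e_{cb} \right\rangle
= E^{\tilde{P}}\left[ \Psi^w_k \sum_{c,b=1}^{N} \langle \varpi(y^w_k, y^w_{k-1}), e_{cb} \rangle \,\Big|\, \mathcal{X}_k \right]
= E^{\tilde{P}}\left[ \Psi^w_k \mid \mathcal{X}_k \right]. \tag{4.17}
\]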

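Therefore, the filter of $\varpi(y^w_k, y^w_{k-1})$ in equation (4.16) under $P$ has an explicit
representation given by

\[ g_k = \frac{\gamma_k}{\sum_{c,b=1}^{N} \langle \gamma_k, e_{cb} \rangle} = \frac{\gamma_k}{\langle \gamma_k, \mathbf{1} \rangle}. \tag{4.18} \]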

To derive recursive processes and estimate relevant quantities in terms of $\varpi(y^w_{k+1}, y^w_k)$, we
need to construct two $N^2 \times N^2$ matrices by following Xi and Mamon [38]. These are $K_t$,
with $e_{it}$ on its $((i-1)N + t)$th column and 0 elsewhere for $1 \le i, t \le N$, and a diagonal
matrix $H_k$, where

\[ H_k = \operatorname{diag}\left( h^1_k, \ldots, h^N_k, \; \ldots, \; h^1_k, \ldots, h^N_k \right) \]

with diagonal entries

\[ h^i_k = \frac{\varphi\left( \left(\varrho^2\right)^{-1} \left[ X_k - \vartheta - \kappa X_{k-1} - \tau \right] \right)}{\varrho^2\, \varphi(X_k)}. \tag{4.19} \]

Following Xiong and Mamon [43], we define certain scalar processes of interest involving
the HOHMC ywk . These are as follows:

(i)
\[
A^{tsr}_k = \sum_{l=2}^{k} \langle y^w_{l-2}, e_r \rangle \langle y^w_{l-1}, e_s \rangle \langle y^w_l, e_t \rangle, \tag{4.20}
\]
which refers to the number of jumps from state $(e_r, e_s)$ to state $e_t$ up to time $k$, where $2 \le l \le k$ and $r, s, t = 1, \ldots, N$;

(ii)
\[
B^{t}_k = \sum_{l=2}^{k} \langle y^w_{l-1}, e_t \rangle = B^{t}_{k-1} + \langle y^w_{k-1}, e_t \rangle, \tag{4.21}
\]
which gives the occupation time up to time $k$, i.e., the length of time that $y^w_k$ spent in state $e_t$, where $2 \le l \le k$ and $t = 1, \ldots, N$;

(iii)
\[
B^{ts}_k = \sum_{l=2}^{k} \langle y^w_{l-1}, e_t \rangle \langle y^w_{l-2}, e_s \rangle, \tag{4.22}
\]
which represents the occupation time up to time $k$, i.e., the length of time that $y^w_k$ spent in state $(e_t, e_s)$, where $2 \le l \le k$ and $s, t = 1, \ldots, N$;

(iv)
\[
C^{t}_k(f) = \sum_{l=2}^{k} f(X_l) \langle y^w_{l-1}, e_t \rangle = C^{t}_{k-1}(f) + f(X_k) \langle y^w_{k-1}, e_t \rangle, \tag{4.23}
\]
which is an auxiliary process related to $y^w_k$ for some function $f$ up to time $k$ in state $e_t$, where $2 \le l \le k$ and $t = 1, \ldots, N$. Here, $f$ takes the functional forms $f(X) = X$, $f(X) = X^2$, or $f(X) = X_{k-1} X_k$.
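
To make the definitions (4.20)-(4.23) concrete, the following minimal sketch evaluates the four quantities on a fully observed (e.g., simulated) state path. It is illustrative only: in the filtering problem the chain is hidden and only the conditional expectations of these quantities are available. The function name `counting_processes` and the 0-based state labels are assumptions made for this example.

```python
def counting_processes(states, obs, t, s, r, f=lambda x: x):
    """Evaluate A^{tsr}_k, B^t_k, B^{ts}_k and C^t_k(f) on a known path.

    states[l] is the (0-based) state index of the chain at time l and
    obs[l] is the observation X_l, for l = 0, ..., k.
    """
    k = len(states) - 1
    steps = range(2, k + 1)  # the sums in (4.20)-(4.23) run over l = 2, ..., k
    # (4.20): number of jumps from the pair (e_r, e_s) to e_t
    a_tsr = sum(1 for l in steps
                if states[l - 2] == r and states[l - 1] == s and states[l] == t)
    # (4.21): occupation time in state e_t
    b_t = sum(1 for l in steps if states[l - 1] == t)
    # (4.22): occupation time in the pair (e_t, e_s)
    b_ts = sum(1 for l in steps if states[l - 1] == t and states[l - 2] == s)
    # (4.23): sum of f(X_l) over the times at which the previous state was e_t
    c_t = sum(f(obs[l]) for l in steps if states[l - 1] == t)
    return a_tsr, b_t, b_ts, c_t


path = [0, 1, 1, 0, 1, 1]
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
print(counting_processes(path, x, t=1, s=1, r=0))  # (2, 3, 1, 10.0)
```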

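Consequently, the conditional expectation of $\varpi(y^w_{k+1}, y^w_k)$ in equation (4.18) can be written recursively as
\[
g_{k+1} = A H_{k+1} g_k. \tag{4.24}
\]
Suppose $U_k$ is any $\mathcal{X}_k$-measurable process, denoting any of the quantities in equations (4.20)-(4.23). Write $D^w_k[U_k] := E^{\tilde{P}}\left[\Psi^w_k U_k \mid \mathcal{X}_k\right]$ and $\hat{U}_k := E\left[U_k \mid \mathcal{X}_k\right]$. Here, $\hat{U}_k$ is regarded as the ‘best estimate’ of $U_k$. Similar to the steps in establishing equation (4.18), the conditional expectation of $U_k$ given $\mathcal{X}_k$ can be obtained using calculations that are entirely under $\tilde{P}$ by observing that

\[
\hat{U}_k = E\left[U_k \mid \mathcal{X}_k\right]
= \frac{E^{\tilde{P}}\left[\Psi^w_k U_k \mid \mathcal{X}_k\right]}{E^{\tilde{P}}\left[\Psi^w_k \mid \mathcal{X}_k\right]}
= \frac{D^w_k(U_k)}{\langle \gamma_k, \mathbf{1} \rangle}. \tag{4.25}
\]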

Evaluating the numerator of (4.25) (cf Mamon et al. [26]) for each quantity defined in equa-
tions (4.20)–(4.23) and taking advantage of the semi-martingale representation in (4.9), re-
cursive filtering equations are obtained that will provide self-calibrating estimates of the
HOHMM parameters. This will be elaborated further in Section 4.4.
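
As an illustration of how the recursion (4.24) and the normalisation in (4.18) fit together, the sketch below performs one filter update with a diagonal matrix $H_{k+1}$ stored as a vector. It is a toy version under stated assumptions: the matrix `A` here is an arbitrary column-stochastic placeholder of dimension 2, whereas in the text $A$ and $H_k$ are $N^2 \times N^2$; the function name `filter_step` is hypothetical.

```python
def filter_step(A, h, g):
    """One update g_{k+1} proportional to A H_{k+1} g_k, then normalised to sum to 1.

    A is a square matrix (list of rows), h holds the diagonal of H_{k+1},
    and g is the current filter vector.
    """
    n = len(g)
    hg = [h[j] * g[j] for j in range(n)]               # H_{k+1} g_k
    gamma = [sum(A[i][j] * hg[j] for j in range(n))    # A H_{k+1} g_k
             for i in range(n)]
    total = sum(gamma)                                 # <gamma, 1>
    return [value / total for value in gamma]


A = [[0.9, 0.2],
     [0.1, 0.8]]
print(filter_step(A, h=[1.0, 3.0], g=[0.5, 0.5]))  # [0.375, 0.625]
```

Normalising at every step keeps the vector on the probability simplex and avoids numerical underflow over long observation sequences.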

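Proposition 1: The filtering recursions for the respective quantities in equations (4.20)–(4.23) are

\begin{align}
D^w_{k+1}\left[A^{tsr}_{k+1} \varpi(y^w_{k+1}, y^w_k)\right] &= A H_{k+1} D^w_k\left[A^{tsr}_k \varpi(y^w_k, y^w_{k-1})\right] \notag\\
&\quad + \langle g_k, e_{sr} \rangle h^t_{k+1} \langle A e_{sr}, e_{ts} \rangle e_{ts}, \tag{4.26}\\
D^w_{k+1}\left[B^{t}_{k+1} \varpi(y^w_{k+1}, y^w_k)\right] &= A H_{k+1} D^w_k\left[B^{t}_k \varpi(y^w_k, y^w_{k-1})\right] \notag\\
&\quad + K_t h^t_{k+1} A g_k, \tag{4.27}\\
D^w_{k+1}\left[B^{ts}_{k+1} \varpi(y^w_{k+1}, y^w_k)\right] &= A H_{k+1} D^w_k\left[B^{ts}_k \varpi(y^w_k, y^w_{k-1})\right] \notag\\
&\quad + \langle g_k, e_{ts} \rangle h^t_{k+1} A e_{ts}, \tag{4.28}
\end{align}

and

\begin{align}
D^w_{k+1}\left[C^{t}_{k+1}(f) \varpi(y^w_{k+1}, y^w_k)\right] &= A H_{k+1} D^w_k\left[C^{t}_k(f) \varpi(y^w_k, y^w_{k-1})\right] \notag\\
&\quad + f(X_{k+1}) K_t h^t_{k+1} A g_k. \tag{4.29}
\end{align}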

Proof The proofs of (4.26)-(4.29) are similar to those given in Mamon et al. [26].

4.4 Optimal parameter estimation


In this section, we derive the optimal estimates for the parameters of the stochastic process
Xt through the maximum-likelihood approach. We employ the Expectation-Maximisation
(EM) algorithm to deal with the somewhat involved nature of our recursive filtering re-
lations, which aptly requires an efficient iterative approach. Refer to [7], [43], and [38],
amongst others, for further details of the EM algorithm.

A key step in the EM-algorithm implementation is to characterise the probability density function (pdf) of $X_t$, which is needed in the maximisation of the appropriate expected log-likelihood function given the information reflected in $X_t$. Model parameters are governed by $y^w_t$ but suppose that the HOHMM and all model parameters remain unchanged over the infinitesimal interval $[s, t]$. Then, the regular HOHMM-modulated OU process without jumps has a normal distribution with mean $X_s e^{-\alpha(t-s)} + \left(1 - e^{-\alpha(t-s)}\right)\theta$ and variance

\[
\frac{\xi^2 \left(1 - e^{-2\alpha(t-s)}\right)}{2\alpha}.
\]

The pdf of $X_t$ can be derived completely if the distribution of the jump component $J_t$ is also determined. As stated in Section 4.2.2, $J_t$ relies on $\beta$ and $P$. We recall that $\beta \sim N\left(\mu_\beta, \sigma^2_\beta\right)$ and $P$ has a constant jump intensity $\varsigma$. Utilising the integral approximation $\int_s^t e^{-\alpha(t-u)}\, dJ_u \approx e^{-\alpha(t-s)} (J_t - J_s)$ in Erlwein et al. [7], the density of the jump component is

\[
\Phi_{J_{t-s}}(x) = \sum_{m=0}^{\infty} e^{-\varsigma(t-s)} \frac{(\varsigma (t-s))^m}{m!}
\times \phi\left(x;\, \mu_\beta e^{-\alpha(t-s)} m,\; \sigma^2_\beta e^{-2\alpha(t-s)} m\right). \tag{4.30}
\]

By noting the stationarity of the compound Poisson process, equation (4.30) is also the density function of the increment $J_t - J_s$.

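In equation (4.4), $X_t$ is the sum of a regular OU process and a jump term. As in [17], and just focusing first on the non-switching setting, we can utilise the convolution of the OU and $J_t$’s densities to get the probability density of $X_t$ given $X_s$, which turns out to be an expectation of a normal density. That is,

\begin{align}
\Phi_{X_t \mid X_s}(x) &= \sum_{m=0}^{\infty} e^{-\varsigma(t-s)} \frac{(\varsigma (t-s))^m}{m!}
\times \phi\Bigg(x;\; X_s e^{-\alpha(t-s)} + \left(1 - e^{-\alpha(t-s)}\right)\theta + \mu_\beta m e^{-\alpha(t-s)}, \notag\\
&\qquad\qquad \frac{\xi^2\left(1 - e^{-2\alpha(t-s)}\right)}{2\alpha} + \sigma^2_\beta m e^{-2\alpha(t-s)}\Bigg) \notag\\
&= E^{P_{\Delta t}}\Bigg[\phi\Bigg(x;\; X_s e^{-\alpha(t-s)} + \left(1 - e^{-\alpha(t-s)}\right)\theta + \mu_\beta P_{\Delta t} e^{-\alpha(t-s)}, \notag\\
&\qquad\qquad \frac{\xi^2\left(1 - e^{-2\alpha(t-s)}\right)}{2\alpha} + \sigma^2_\beta P_{\Delta t} e^{-2\alpha(t-s)}\Bigg)\Bigg], \tag{4.31}
\end{align}

where $P_{\Delta t}$ denotes a Poisson counter.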
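
The Poisson-weighted normal mixture in (4.31) can be evaluated numerically by truncating the infinite sum. The sketch below is a minimal illustration under stated assumptions: it uses the standard OU conditional mean $X_s e^{-\alpha \Delta} + (1 - e^{-\alpha \Delta})\theta$ and variance $\xi^2 (1 - e^{-2\alpha \Delta}) / (2\alpha)$ with $\Delta = t - s$, and the function names, the argument names, and the truncation level `m_max` are hypothetical choices for this example.

```python
import math


def normal_pdf(x, mean, var):
    """Density of N(mean, var) evaluated at x."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def ou_jump_density(x, x_s, dt, alpha, theta, xi, mu_b, sig_b, rate, m_max=50):
    """Truncated Poisson mixture of normal densities for X_t given X_s = x_s."""
    decay = math.exp(-alpha * dt)
    ou_mean = x_s * decay + (1.0 - decay) * theta
    ou_var = xi ** 2 * (1.0 - decay ** 2) / (2.0 * alpha)
    total = 0.0
    weight = math.exp(-rate * dt)          # Poisson probability of m = 0 jumps
    for m in range(m_max + 1):
        mean = ou_mean + mu_b * m * decay
        var = ou_var + sig_b ** 2 * m * decay ** 2
        total += weight * normal_pdf(x, mean, var)
        weight *= rate * dt / (m + 1)      # Poisson probability of m + 1 jumps
    return total


# A crude Riemann sum confirms that the truncated mixture integrates to about 1.
step = 0.01
area = step * sum(
    ou_jump_density(-20.0 + i * step, 1.0, 1.0, 0.5, 0.0, 1.0, 0.5, 0.2, 0.3)
    for i in range(4001)
)
print(round(area, 3))
```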



Going back to the implementation of the EM algorithm under our HOHMM setting, let $P_{\Delta t}$ be $q$ and assume (for tractability) that $P_t$ is independent of other parameters in the model. From (4.10) and (4.31), the discrete-time process $X_{k+1}$, under the HOHMM framework, is set to follow a mixture of normal distributions with

\[
\mu_X(y^w_k) = \kappa(y^w_k) X_k + \vartheta(y^w_k) + \mu_\beta(y^w_k)\, q\, \kappa(y^w_k),
\]

and

\[
\sigma^2_X(y^w_k) = \varrho^2(y^w_k) + \sigma^2_\beta(y^w_k)\, q\, \kappa^2(y^w_k).
\]

We now compute the maximum likelihood estimates (MLEs) of model parameters via the EM algorithm. Define $\mathcal{E}^w = \{\kappa_t, \vartheta_t, \varrho_t, \mu_{\beta_t}, \sigma_{\beta_t}, p_{tsr},\ 1 \le t, s, r \le N\}$ as the set of HOHMM-based parameters. Starting with the set $\mathcal{E}^w_0$ of initial parameters, the recursive parameter updates yield an updated set $\hat{\mathcal{E}}^w$ of MLEs, where $\hat{\mathcal{E}}^w \in \operatorname{argmax}_{\mathcal{E}^w} L(\mathcal{E}^w)$ and
\[
L(\mathcal{E}^w) = E^{\mathcal{E}^w_0}\left[\frac{dP^{\mathcal{E}^w}}{dP^{\mathcal{E}^w_0}} \,\Big|\, \mathcal{X}_k\right].
\]
Following Xiong and Mamon’s arguments [43], the estimation of the matrix $T$ of transition probabilities can also be facilitated by a change of measure from $P^{\mathcal{E}^w_0}$ to $P^{\mathcal{E}^w}$, and entries are updated automatically through the filtering processes. It should be noted that $y^w_k$ is still an HOHMC under both $P^{\mathcal{E}^w}$ and $P^{\hat{\mathcal{E}}^w}$ but each has the corresponding transition matrices $T = (p_{tsr})$ and $\hat{T} = (\hat{p}_{tsr})$. To estimate the transition probabilities in succession, i.e., by substituting $T$ with $\hat{T}$, we utilise the likelihood function in combination with the EM algorithm and consider

\[
\Lambda^T_k = \frac{dP^{\mathcal{E}^w}}{dP^{\mathcal{E}^w_0}}\bigg|_{\mathcal{X}_k}
= \prod_{l=2}^{k} \prod_{t,s,r=1}^{N} \left(\frac{\hat{p}_{tsr}}{p_{tsr}}\right)^{\langle y^w_{l-2}, e_r \rangle \langle y^w_{l-1}, e_s \rangle \langle y^w_l, e_t \rangle}, \tag{4.32}
\]

where $\hat{p}_{tsr}/p_{tsr} = 1$ for $p_{tsr} = 0$ and $\hat{p}_{tsr} = 0$. The estimation of parameters given the observation process following equation (4.10) is accomplished by blending the EM algorithm and recursive filters in (4.26)-(4.29). The resulting outcomes for $\hat{p}_{tsr}$ and the remaining parameters are given as follows.

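Proposition 2: The EM estimates, at state $t$ given a series of observations $X_{k+1}$ for $k \ge 0$, are

\begin{align}
\hat{\kappa}_t &= \frac{\hat{C}^t_k(X_k, X_{k+1}) - \vartheta_t \hat{C}^t_k(X_k) + \mu_{\beta_t} q \hat{C}^t_k(X_{k+1}) - \vartheta_t q \mu_{\beta_t} \hat{B}^t_k}{\hat{C}^t_k(X_k^2) + 2\mu_{\beta_t} q \hat{C}^t_k(X_k) + (\mu_{\beta_t} q)^2 \hat{B}^t_k} \notag\\
&= \frac{D^w_k\left(C^t_k(X_k, X_{k+1})\right) - \vartheta_t D^w_k\left(C^t_k(X_k)\right) + \mu_{\beta_t} q D^w_k\left(C^t_k(X_{k+1})\right)}{D^w_k\left(C^t_k(X_k^2)\right) + 2\mu_{\beta_t} q D^w_k\left(C^t_k(X_k)\right) + (\mu_{\beta_t} q)^2 D^w_k\left(B^t_k\right)} \notag\\
&\quad - \frac{\vartheta_t q \mu_{\beta_t} D^w_k\left(B^t_k\right)}{D^w_k\left(C^t_k(X_k^2)\right) + 2\mu_{\beta_t} q D^w_k\left(C^t_k(X_k)\right) + (\mu_{\beta_t} q)^2 D^w_k\left(B^t_k\right)}, \tag{4.33}\\[6pt]
\hat{\vartheta}_t &= \frac{\hat{C}^t_k(X_{k+1}) - \kappa_t \hat{C}^t_k(X_k) - \kappa_t \mu_{\beta_t} q \hat{B}^t_k}{\hat{B}^t_k} \notag\\
&= \frac{D^w_k\left(C^t_k(X_{k+1})\right) - \kappa_t D^w_k\left(C^t_k(X_k)\right) - \kappa_t \mu_{\beta_t} q D^w_k\left(B^t_k\right)}{D^w_k\left(B^t_k\right)}, \tag{4.34}\\[6pt]
\hat{\varrho}^2_t &= \frac{\hat{C}^t_k(X_{k+1}^2) + \kappa_t^2 \hat{C}^t_k(X_k^2)}{\hat{B}^t_k}
+ \frac{\hat{B}^t_k\left(\vartheta_t^2 + (\kappa_t \mu_{\beta_t} q)^2 + 2\vartheta_t \kappa_t \mu_{\beta_t} q - \sigma^2_{\beta_t} \kappa_t^2 q\right)}{\hat{B}^t_k} \notag\\
&\quad + \frac{\hat{C}^t_k(X_k)\left(2\mu_{\beta_t} q \kappa_t^2 + 2\vartheta_t \kappa_t\right)}{\hat{B}^t_k}
- \frac{\left(2\vartheta_t + 2\kappa_t \mu_{\beta_t} q\right) \hat{C}^t_k(X_{k+1}) + 2\kappa_t \hat{C}^t_k(X_{k+1} X_k)}{\hat{B}^t_k}, \tag{4.35}\\[6pt]
\hat{\mu}_{\beta_t} &= \frac{\hat{C}^t_k(X_{k+1}) - \vartheta_t \hat{B}^t_k - \kappa_t \hat{C}^t_k(X_k)}{\kappa_t q \hat{B}^t_k}
= \frac{D^w_k\left(C^t_k(X_{k+1})\right) - \vartheta_t D^w_k\left(B^t_k\right) - \kappa_t D^w_k\left(C^t_k(X_k)\right)}{\kappa_t q D^w_k\left(B^t_k\right)}, \tag{4.36}\\[6pt]
\hat{\sigma}^2_{\beta_t} &= \frac{\hat{C}^t_k(X_{k+1}^2) + \kappa_t^2 \hat{C}^t_k(X_k^2)}{\hat{B}^t_k \kappa_t^2 q}
+ \frac{\hat{B}^t_k\left(\vartheta_t^2 + (\kappa_t \mu_{\beta_t} q)^2 + 2\vartheta_t \kappa_t \mu_{\beta_t} q - \varrho_t^2\right)}{\hat{B}^t_k \kappa_t^2 q} \notag\\
&\quad + \frac{\hat{C}^t_k(X_k)\left(2\mu_{\beta_t} q \kappa_t^2 + 2\vartheta_t \kappa_t\right)}{\hat{B}^t_k \kappa_t^2 q}
- \frac{\left(2\vartheta_t + 2\kappa_t \mu_{\beta_t} q\right) \hat{C}^t_k(X_{k+1}) + 2\kappa_t \hat{C}^t_k(X_{k+1} X_k)}{\hat{B}^t_k \kappa_t^2 q}, \tag{4.37}\\[6pt]
\hat{p}_{tsr} &= \frac{\hat{A}^{tsr}_k}{\hat{B}^{sr}_k} = \frac{D^w_k\left(A^{tsr}_k\right)}{D^w_k\left(B^{sr}_k\right)}, \quad \forall \text{ pairs } (t, s),\ t \ne s. \tag{4.38}
\end{align}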

Proof See Appendix D.
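
To illustrate how the closed-form updates are used once the filtered quantities are available, the sketch below codes the first forms of (4.33) and (4.34) for a single state. It is a simplified illustration, not the full algorithm: the sufficient statistics are passed in as plain numbers (in the filter they would be $\hat{C}^t_k(\cdot)$ and $\hat{B}^t_k$), and the function and argument names are hypothetical.

```python
def update_kappa(c_xx1, c_x, c_x1, c_x2, b, vartheta, mu_b, q=1.0):
    """Update of kappa as in the first line of (4.33).

    c_xx1 ~ C(X_k X_{k+1}), c_x ~ C(X_k), c_x1 ~ C(X_{k+1}),
    c_x2 ~ C(X_k^2), b ~ B (occupation time).
    """
    numerator = c_xx1 - vartheta * c_x + mu_b * q * c_x1 - vartheta * q * mu_b * b
    denominator = c_x2 + 2.0 * mu_b * q * c_x + (mu_b * q) ** 2 * b
    return numerator / denominator


def update_vartheta(c_x1, c_x, b, kappa, mu_b, q=1.0):
    """Update of vartheta as in the first line of (4.34)."""
    return (c_x1 - kappa * c_x - kappa * mu_b * q * b) / b


# Noise-free single-state check: X_{k+1} = vartheta + kappa * (X_k + mu_b * q).
kappa, vartheta, mu_b, q = 0.5, 0.2, 0.4, 1.0
x = [1.0]
for _ in range(5):
    x.append(vartheta + kappa * (x[-1] + mu_b * q))
pairs = list(zip(x[:-1], x[1:]))
b = float(len(pairs))
c_x = sum(a for a, _ in pairs)
c_x1 = sum(nxt for _, nxt in pairs)
c_x2 = sum(a * a for a, _ in pairs)
c_xx1 = sum(a * nxt for a, nxt in pairs)
print(update_kappa(c_xx1, c_x, c_x1, c_x2, b, vartheta, mu_b, q))  # close to 0.5
print(update_vartheta(c_x1, c_x, b, kappa, mu_b, q))               # close to 0.2
```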

For ease of numerical implementation, we assume $q = 1$ in Section 4.5. The filtering recursions in Proposition 1 yield new estimates for $\kappa_t$, $\vartheta_t$, $\varrho^2_t$, $\mu_{\beta_t}$, $\sigma^2_{\beta_t}$, and $p_{tsr}$, $1 \le t, s, r \le N$, whenever a new sequence of electricity spot prices becomes available up to time $k$.

The development of these recursive filters thereby implies that the parameter estimates under our HOHMM setting are self-calibrating. The results in Propositions 1 and 2 constitute further research progress relative to those given in Erlwein et al. [7], and Xiong and Mamon [43] in the following respects. Firstly, self-calibrating filtering algorithms for electricity spot prices in a regular HMM setting [7] are extended to a general HOHMM case. Secondly, we include a compound Poisson process to depict the spikes in the price dynamics; such an inclusion was not considered in the HOHMM setting of [43]. Thirdly, Erlwein et al. [7] went through the route of estimating first the entire drift component before being able to compute an estimate for the mean-reverting level $\kappa$. In our case, we directly dealt with the MLE of $\kappa$ by providing an explicit solution as a function of filtering recursions. The required sequence of computations in recovering the model parameters is clarified.

4.5 Numerical application


To evaluate the performance of our proposed model in conjunction with its associated fil-
tering algorithms in section 4.4, we implement them on the daily electricity spot prices
(DESP) recorded by the Alberta Electric System Operator (AESO). The AESO data set
contains 1461 observations covering a 4-year period from 01 Jan 2011 to 31 Dec 2014. In
the context of this data set, let us first consider the seasonality and stochastic components
as underscored in the DESP model in accordance with equation (4.1).

4.5.1 Analysis of the deterministic component


Alberta launched the first wholesale electricity market in Canada. It prescribes that if the
wholesale electrical energy generated in the province is not consumed on site, it must flow
through the Power Pool operated by the AESO [27]. The AESO, on behalf of Albertans,
runs a fair and openly competitive electricity market and offers a reliable and economic
operation of the Alberta Interconnected Electric System (AIES) [5].

From Figure 4.1, we observe prices displaying strong mean reversion, high periodicities,
and numerous spikes. Despite variations and jumps, DESP exhibit a long-run mean-
reverting level and cyclical patterns. For instance, there are more extreme values and
changes in temperature values during the winter and summer seasons, whilst off-peak pe-
riods exist in springs and autumns. The seasonality aspect mainly originates from market
supply and demand, which depend on electricity generation capacity, human activities, and
weather conditions. To execute the recursive filtering equations on the deseasonalised part
of the model, we perform data fitting on the component Dt using the statistical software R’s
built-in regression functions; this then removes the discernible seasonal pattern following
equation (4.1). A ‘step’ function is further applied in selecting the suitable number of ex-
planatory variables in terms of the adjusted-R2 and the Akaike information criterion (AIC).
The descriptive statistics, presented in Table 4.1, guide the parameters’ initialisation in the
implementation procedure. Table 4.2 summarises the estimated parameters for Dt .

Mean      Std Deviation   Std Error   Median
67.53     89.77           2.35        32.34

Min       Max             Skewness    Kurtosis
5.61      669.89          2.98        9.74

Table 4.1: Descriptive statistics for daily electricity spot price (DESP)

Parameter   Estimate    95% confidence interval
a           −0.0181     (−0.0290, −0.0071)
b           80.7214     (71.5387, 89.9041)
c1          −6.4621     (−12.9686, 0.0445)
c2          8.9067      (2.4942, 15.3191)
g2          −7.6311     (−14.0103, −1.2519)
e1          −6.2641     (−12.6405, 0.1123)
j1          −20.1426    (−26.5187, −13.7664)

Table 4.2: Parameter estimates for the seasonal component Dt

Figure 4.1 displays the plots of the fitted seasonality-component function and the actual DESP data. A zoom-in view of the data evolution from 01 Jan 2013 - 31 Dec 2014 is presented in Figure 4.2. Even though the seasonal component Dt has a notably crucial role for DESP modelling, the spot-price behaviour is still hugely influenced by stochasticity; refer to Figures 4.1 and 4.2.

Figure 4.1: Fitted seasonal component and observed spot prices (CAD/MWh against time; series: actual observations and seasonal component)

4.5.2 Application of filtering algorithms for the HOHMM-OU process with jumps

4.5.2.1 Data processing with the filtering algorithm

Electricity spot prices are gathered with daily frequency and so we assign $\Delta t = 1$. We process the observations in 73 batches with 20 data points in each batch. In this sense, the parameters are updated roughly every 3 weeks. Other filtering window sizes can be explored too; however, within the data set that we analyse, our experimentation produces similar outcomes, telling us that different window sizes have a negligible effect. We note that a 20-point data length for the HOHMM filtering procedure is sufficient and not numerically onerous in processing the continual flow of new information, including that resulting from abrupt changes in the DESP dynamics due to extreme weather, supply outages, and excess demand, amongst others. In general, practitioners in other fields applying online HOHMM-based filtering methods have the freedom to choose a data window size befitting their circumstances, such as computational resources and frequency of data generation.
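
The batching scheme described above can be sketched in a few lines. The helper name `make_batches` is hypothetical, and the placeholder series simply stands in for the 1460 deseasonalised observations used in the filtering passes.

```python
def make_batches(data, size=20):
    """Split a sequence into consecutive windows of `size` observations."""
    return [data[i:i + size] for i in range(0, len(data), size)]


observations = list(range(1460))      # placeholder for the deseasonalised series
windows = make_batches(observations)  # one window per algorithm pass
print(len(windows), len(windows[0]))  # 73 20
```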
Figure 4.2: Zoom-in view of the fitted seasonal component versus observed prices (Jan/01/2014 - Dec/31/2014; CAD/MWh against time; series: actual observations and seasonal component)

Figure 4.3: Deseasonalised stochastic component of DESP (CAD/MWh against time)

HOHMM filtering calls for the verification of memory presence, and determination of initial values in connection with parameter estimation. To detect memory in our data, we evaluate the fractional-differencing parameter $d$ in Granger and Joyeux’s autoregressive fractionally integrated moving average (ARFIMA) methodology [16]. When $0 < d < 0.5$, there is a finite long memory in the data; and $d = 0$ signifies short memory. In computing $d$, one may employ the Geweke-Porter-Hudak estimator, approximated MLE, and smoothed periodogram approach; estimated values for $d$ can be returned by the R functions ‘fdGPH’, ‘fdML’, and ‘fdSperio’, respectively. In our case, we employ ‘fdSperio’, proposed by Reisen [30], since, unlike the former two algorithms, it has no restriction and is applicable to a non-stationary process. Our de-seasonalised data set gives $\hat{d} = 0.20$.
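
For readers without access to R, the idea behind the Geweke-Porter-Hudak estimator mentioned above can be sketched as a log-periodogram regression. This is a bare-bones illustration and not a reimplementation of ‘fdGPH’ or ‘fdSperio’: the bandwidth choice $m = \lfloor n^{0.5} \rfloor$, the function name `gph_estimate`, and the white-noise test series are assumptions made for this example.

```python
import cmath
import math
import random


def gph_estimate(x, power=0.5):
    """Log-periodogram regression estimate of the fractional-differencing parameter d."""
    n = len(x)
    mean = sum(x) / n
    m = int(n ** power)                        # number of low frequencies used
    regressors, log_periodogram = [], []
    for j in range(1, m + 1):
        lam = 2.0 * math.pi * j / n            # Fourier frequency
        dft = sum((x[t] - mean) * cmath.exp(-1j * lam * t) for t in range(n))
        periodogram = abs(dft) ** 2 / (2.0 * math.pi * n)
        regressors.append(math.log(4.0 * math.sin(lam / 2.0) ** 2))
        log_periodogram.append(math.log(periodogram))
    r_bar = sum(regressors) / m
    p_bar = sum(log_periodogram) / m
    slope = (sum((r - r_bar) * (p - p_bar) for r, p in zip(regressors, log_periodogram))
             / sum((r - r_bar) ** 2 for r in regressors))
    return -slope                              # d is minus the regression slope


random.seed(0)
white_noise = [random.gauss(0.0, 1.0) for _ in range(1024)]
print(gph_estimate(white_noise))  # should be near 0 for a short-memory series
```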

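We wish to get the optimal estimates for the parameter set $\mathcal{E}^w = \{\kappa, \vartheta, \varrho, \mu_\beta, \sigma_\beta, p\}$; for emphasis, these parameters are driven by a discrete-time finite-state HOHMC. This goal is accomplished by initialising estimates and subsequently updating the parameter set. Benchmarks for starting values are attained by treating $X_t$ as a single-regime process. This implies that the transition probability matrix $T$ is the identity. From (4.10), the likelihood function of $X_{k+1}$ is

\[
L\left(X_{k+1}; \kappa, \vartheta, \varrho, \mu_\beta, \sigma_\beta\right)
= \prod_{k=1}^{m} \frac{\exp\left(-\dfrac{\left(X_{k+1} - \vartheta - \kappa X_k - \mu_\beta q \kappa\right)^2}{2\left(\varrho^2 + \sigma^2_\beta q \kappa^2\right)}\right)}{\sqrt{2\pi\left(\varrho^2 + \sigma^2_\beta q \kappa^2\right)}}, \tag{4.39}
\]

where $1 \le m \le 1460$ in our implementation. Maximising (4.39) is equivalent to minimising the negative of the log likelihood, i.e.

\[
\sum_{k=1}^{m} \left[\log \sqrt{2\pi\left(\varrho^2 + \sigma^2_\beta q \kappa^2\right)} + \frac{\left(X_{k+1} - \vartheta - \kappa X_k - \mu_\beta q \kappa\right)^2}{2\left(\varrho^2 + \sigma^2_\beta q \kappa^2\right)}\right]. \tag{4.40}
\]

The R function ‘optim’ is applied to equation (4.40), producing $\hat{\kappa} = 0.6394$, $\hat{\vartheta} = -0.1515$, $\hat{\varrho} = 0.6443$, $\hat{\mu}_\beta = -0.4945$, and $\hat{\sigma}_\beta = 0.0103$.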
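
The objective handed to the optimiser can be written down directly from (4.40). The sketch below is a minimal Python counterpart of that objective only; it does not reproduce the R ‘optim’ call, and the function name `neg_log_lik` and the argument order are assumptions for this example.

```python
import math


def neg_log_lik(x, kappa, vartheta, rho, mu_b, sig_b, q=1.0):
    """Negative log-likelihood of the single-regime model, as in (4.40)."""
    var = rho ** 2 + sig_b ** 2 * q * kappa ** 2      # one-step variance
    total = 0.0
    for k in range(len(x) - 1):
        resid = x[k + 1] - vartheta - kappa * x[k] - mu_b * q * kappa
        total += 0.5 * math.log(2.0 * math.pi * var) + resid ** 2 / (2.0 * var)
    return total


# Two observations, unit variance, residual equal to 1:
print(neg_log_lik([0.0, 1.0], kappa=0.5, vartheta=0.0, rho=1.0, mu_b=0.0, sig_b=0.0))
# equals 0.5 * log(2 * pi) + 0.5
```

Any general-purpose minimiser can then be applied to this function of the five parameters, started from values guided by the descriptive statistics.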

4.5.2.2 Implementing the filtering procedure

Propositions 1 and 2, in conjunction with the initial parameter estimates obtained in Section 4.5.2.1, are put into use for the estimation of the proposed model interfacing the HOHMM, OU and jump attributes. Our dynamic parameter estimation proceeds by first evaluating the filtering recursions (4.26)-(4.29) and then employing their numerical outcomes to provide optimal estimates through equations (4.33)-(4.38); the completed processing of one batch of data values constitutes one algorithm step or pass.

The final value in an algorithm step is used as the initial value for the filtering recursions to complete the succeeding algorithm step. Once the optimal estimates of the parameters in equation (4.10) are obtained through our self-calibrating process after a certain number of algorithm steps, we can back out, via equations (4.11)-(4.13), the optimal parameter estimates of the proposed model with the original specifications in equation (4.5).

The filtering algorithms are implemented on the process Xt under the 1-,2-, and 3-state set-
tings driven by both HOHMC and HMC. Estimated parameters evolve as shown in Figures
4.4-4.7. The parameter evolutions under the 2-state HOHMM (Figures 4.4-4.5) are differ-
ent from those under the 3-state HOHMM (Figures 4.6-4.7). Nonetheless, there is gradual
convergence to certain values after approximately 55 passes for each sequence of param-
eter estimates. Whilst the parameters’ initial values might affect the convergence speed,
as long as they do not deviate substantially from the benchmark values under the 1-state
setting in section 4.5.2.1, the evolution of estimates for each parameter will eventually ap-
proach a specific value. Although not plotted here, parameter estimates under the HMMs
exhibit similar realisations. It is worth mentioning that with HOHMM, the movements of
the parameter estimates under the 1- and 2-state HOHMM settings exhibit analogous dy-
namic patterns and converge to similar optimal values; whilst the evolution of parameter
estimates under 3-state HOHMM setting is distinct. It indicates that a 2-state HOHMM
might be adequate to capture the dynamics of DESP and reflect the economic and market
information. This is further supported by the analysis of model performance in subsection
4.5.3.

Figure 4.4: Evolution of parameter estimates for α, θ, and ξ under a 2-state HOHMM (estimates against algorithm steps)

Figure 4.5: Evolution of parameter estimates for µβ, and σβ under a 2-state HOHMM (estimates against algorithm steps)

Figure 4.6: Evolution of parameter estimates for α, θ, and ξ under a 3-state HOHMM (estimates against algorithm steps)

Figure 4.7: Evolution of parameter estimates for µβ, and σβ under a 3-state HOHMM (estimates against algorithm steps)

We quantify the variability of the parameter estimates by considering the variance of the estimators using the Fisher information $I(\mathcal{E}^w)$. This is given by

\[
I(\mathcal{E}^w) = -E^{\mathcal{E}^w}\left[\frac{\partial^2}{\partial (\mathcal{E}^w)^2} \log L(X; \mathcal{E}^w)\right].
\]

This provides a bound on the asymptotic variance of the MLEs. The ML estimator is consistent and has an asymptotically normal sampling distribution; see [36]. We utilise the limiting distribution of the MLE, $\hat{\mathcal{E}}^w$, to get the 95% confidence interval. For a generic (scalar) estimate $\hat{\varepsilon}^w$, this is $\hat{\varepsilon}^w \pm 1.96 \dfrac{1}{\sqrt{I(\hat{\varepsilon}^w)}}$. The derivation of $I(\varepsilon^w)$ makes use of the log-likelihood functions and is relegated to Appendix D. The results are summarised below.

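\begin{align*}
I(\hat{p}_{tsr}) &= \frac{\hat{A}^{tsr}_k}{p_{tsr}^2}, \qquad
I(\hat{\vartheta}_t) = \frac{\hat{B}^t_k}{\varrho_t^2}, \qquad
I(\hat{\mu}_{\beta_t}) = \frac{\kappa_t q \hat{B}^t_k}{\varrho_t^2},\\[4pt]
I(\hat{\kappa}_t) &= \frac{\hat{C}^t_k(X_k^2) + 2\mu_{\beta_t} q \hat{C}^t_k(X_k) + (\mu_{\beta_t} q)^2 \hat{B}^t_k}{\varrho_t^2},\\[4pt]
I(\hat{\varrho}_t) &= -\frac{\hat{B}^t_k}{\varrho_t^2} + \frac{3}{\varrho_t^4}\Big(\hat{C}^t_k(X_{k+1}^2) + \kappa_t^2 \hat{C}^t_k(X_k^2)
+ \hat{B}^t_k\left(\vartheta_t^2 + (\kappa_t \mu_{\beta_t} q)^2 + 2\vartheta_t \kappa_t \mu_{\beta_t} q - \sigma^2_{\beta_t} \kappa_t^2 q\right)\\
&\qquad + \hat{C}^t_k(X_k)\left(2\mu_{\beta_t} q \kappa_t^2 + 2\vartheta_t \kappa_t\right)
- \left(2\vartheta_t + 2\kappa_t \mu_{\beta_t} q\right) \hat{C}^t_k(X_{k+1}) - 2\kappa_t \hat{C}^t_k(X_{k+1} X_k)\Big),\\[4pt]
I(\hat{\sigma}_{\beta_t}) &= -\frac{1}{\varrho_t^2 \kappa_t^2 q} + \frac{3}{\varrho_t^4 \kappa_t^2 q}\Big(\hat{C}^t_k(X_{k+1}^2) + \kappa_t^2 \hat{C}^t_k(X_k^2)
+ \hat{B}^t_k\left(\vartheta_t^2 + (\kappa_t \mu_{\beta_t} q)^2 + 2\vartheta_t \kappa_t \mu_{\beta_t} q - \sigma^2_{\beta_t} \kappa_t^2 q\right)\\
&\qquad + \hat{C}^t_k(X_k)\left(2\mu_{\beta_t} q \kappa_t^2 + 2\vartheta_t \kappa_t\right)
- \left(2\vartheta_t + 2\kappa_t \mu_{\beta_t} q\right) \hat{C}^t_k(X_{k+1}) - 2\kappa_t \hat{C}^t_k(X_{k+1} X_k)\Big).
\end{align*}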
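
Once a Fisher-information value is available for a scalar parameter, the 95% interval described above follows in one line. The sketch below illustrates this step only; the function name `wald_interval` and the numerical inputs are made up for the example.

```python
import math


def wald_interval(estimate, fisher_info, z=1.96):
    """Interval estimate +/- z / sqrt(I) based on the asymptotic normality of the MLE."""
    half_width = z / math.sqrt(fisher_info)
    return estimate - half_width, estimate + half_width


print(wald_interval(0.5, 400.0))  # approximately (0.402, 0.598)
```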

Our recursive filtering is an adaptive approach with a mechanism for producing optimal estimates iteratively. As the estimates exhibit reasonably good convergence and stability properties, the SEs become smaller as we go further down the algorithm steps. Moreover, Table 4.3 evinces that we have robust parameter estimates as implied by the substantially narrow ranges of all SEs throughout the algorithm passes.

HMM
Parameter    1-state model             2-state model             3-state model
estimates    Bound of SE               Bound of SE               Bound of SE
             Lower            Upper    Lower            Upper    Lower             Upper
κ_i          1.0834 × 10^−6   0.3113   2.4645 × 10^−8   0.1574   4.45092 × 10^−7   0.3113
ϑ_i          8.8910 × 10^−7   0.2568   2.0532 × 10^−8   0.2127   3.2596 × 10^−7    0.2568
ϱ_i          9.4071 × 10^−8   0.0254   3.1251 × 10^−9   0.1377   4.1097 × 10^−8    0.0254
µ_βi         1.0480 × 10^−6   0.3162   2.2175 × 10^−9   0.4088   4.0512 × 10^−7    0.1253
σ_βi         6.7328 × 10^−8   0.0171   3.1251 × 10^−9   0.0346   2.6396 × 10^−8    0.0170
p_ji         6.3839 × 10^−7   0.2063   3.2075 × 10^−8   0.1785   6.3839 × 10^−7    1.6160

HOHMM
Parameter    1-state model             2-state model             3-state model
estimates    Bound of SE               Bound of SE               Bound of SE
             Lower            Upper    Lower            Upper    Lower             Upper
κ_t          1.0834 × 10^−6   0.3113   7.4602 × 10^−7   0.3827   2.2600 × 10^−6    0.3858
ϑ_t          8.8910 × 10^−7   0.2568   6.3506 × 10^−7   0.2766   1.6974 × 10^−6    0.2676
ϱ_t          9.4071 × 10^−8   0.0254   1.0576 × 10^−7   0.4424   1.3078 × 10^−7    0.0221
µ_βt         1.0480 × 10^−6   0.3162   8.3156 × 10^−8   0.3357   2.0243 × 10^−6    0.3265
σ_βt         6.7328 × 10^−8   0.0171   7.4834 × 10^−8   0.0243   9.1683 × 10^−8    0.0150
p_tsr        6.3839 × 10^−7   0.2063   3.7525 × 10^−9   1.2956   4.3980 × 10^−9    0.8859

Table 4.3: Interval of standard errors for parameter estimates under the 1-, 2-, 3-state HMMs and HOHMMs

4.5.3 Discussion of model performance

4.5.3.1 Forecasting and error analysis

We shall generate and assess the one-step ahead forecasts for $X_t$ and DESP under the HOHMM-with-jump settings. To do this, we first compute the expected value of the stochastic component in (4.10). This gives

\begin{align}
E\left[X_{k+1} \mid \mathcal{X}_k\right]
&= E\left[\kappa(y^w_k) X_k + \vartheta(y^w_k) + \varrho(y^w_k) z_{k+1} + \sum_{m=1}^{G_{\Delta t}} \beta_m(y^w_k)\, e^{-\alpha(y^w_k)(\Delta t_k - \tau_m)} \,\Big|\, \mathcal{X}_k\right] \notag\\
&= \langle \kappa, \hat{y}^w_k \rangle X_k + \langle \vartheta, \hat{y}^w_k \rangle + q \langle \kappa, \hat{y}^w_k \rangle \langle \mu_\beta, \hat{y}^w_k \rangle. \tag{4.41}
\end{align}

Equation (4.41) brings about one-step ahead forecasts under the non-regime-switching
model, and 2-state and 3-state settings driven by HOHMMs. Figure 4.8 presents the graphs
showing the movements of the one-step ahead predictions for Xk and DESP under the 3-
state HOHMM setting. A magnified view of the predicted DESP is depicted in Figure 4.9
covering a one-year period. The forecasts follow closely the actual dynamics of DESP dur-
ing ‘normal’ periods. Models with regime-switching features outperform the 1-state model
during periods with jump occurrences, and HOHMMs are more accurate than the HMMs
in predicting spikes in DESP.
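
The final expression of the forecast in equation (4.41) is a combination of inner products between the state-wise parameter vectors and the estimated state vector. A minimal sketch follows, assuming the estimated state is supplied as a probability vector `y_hat` over the regimes; the function names and the numbers in the usage line are hypothetical.

```python
def inner(u, v):
    """Euclidean inner product of two equally long vectors."""
    return sum(a * b for a, b in zip(u, v))


def one_step_forecast(x_k, y_hat, kappa, vartheta, mu_b, q=1.0):
    """One-step ahead forecast <kappa, y>X_k + <vartheta, y> + q<kappa, y><mu_b, y>."""
    k_bar = inner(kappa, y_hat)
    return k_bar * x_k + inner(vartheta, y_hat) + q * k_bar * inner(mu_b, y_hat)


# Two regimes with estimated state probabilities 0.25 and 0.75:
print(one_step_forecast(2.0, [0.25, 0.75], [0.4, 0.8], [0.0, 1.0], [1.0, 1.0]))
```

Adding the fitted seasonal component back to this value would give the corresponding forecast of the spot price itself.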

Visually, the DESP’s dynamics and spikes are captured very reasonably by our self-calibrating
algorithm and filtering processes under the regime-switching setting. But, more formally,
we wish to quantify the difference, in terms of accuracy and fitting performance, between
the HOHMMs and HMMs. To assess the goodness of fit, as a by product of the pre-
dictability performance, an error analysis following the criteria in Erlwein et al. [7] is
undertaken. We shall rely on the following error metrics: mean-squared error (MSE), root-mean-squared error (RMSE), mean absolute error (MAE), relative absolute error (RAE), and median absolute percent error (MdAPE). They are calculated as follows:

\begin{align*}
\text{MSE} &= \frac{\sum_{k=1}^{m} \left(\hat{X}_k - X_k\right)^2}{m}, &
\text{RMSE} &= \sqrt{\frac{\sum_{k=1}^{m} \left(\hat{X}_k - X_k\right)^2}{m}},\\
\text{MAE} &= \frac{\sum_{k=1}^{m} \left|\hat{X}_k - X_k\right|}{m}, &
\text{RAE} &= \frac{\sum_{k=1}^{m} \left|\hat{X}_k - X_k\right|}{\sum_{k=1}^{m} \left|\hat{X}_k - \bar{X}\right|},
\end{align*}

\[
\text{and} \quad \text{MdAPE} = \operatorname{Md}\left(\left|\frac{\hat{X}_k - X_k}{X_k}\right|\right),
\]

where $X_k$ denotes the actual value of the observation at time $k$; $\hat{X}_k$ stands for its corresponding prediction; $\bar{X}$ is the mean of all $X_k$’s; and $m = 1460$, which is the total number of predicted values.

Figure 4.8: One-step ahead forecasts for Xk and Sk under a 3-state HOHMM (panels: actual versus forecast deseasonalised observations; actual versus forecast daily spot price in CAD/MWh; both against time)

Figure 4.9: Comparison of one-step ahead forecasts in 1-, 2-, 3-state HMMs and HOHMMs (CAD/MWh against time, 01/01/2012 to 01/01/2013; series: actual daily spot price and the forecasts under the 1-, 2-, 3-state HOHMMs and the 2-, 3-state HMMs)
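
The five error metrics can be computed together in a short routine. The sketch below follows the expressions as printed in the text (in particular, the RAE denominator uses the predictions and the mean of the actual values, and MdAPE is returned as a fraction rather than a percentage); the function name `error_metrics` and the toy inputs are assumptions for this example.

```python
import math
from statistics import median


def error_metrics(actual, predicted):
    """Return MSE, RMSE, MAE, RAE and MdAPE for paired actual/predicted values."""
    m = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]
    mean_actual = sum(actual) / m
    mse = sum(e * e for e in errors) / m
    abs_sum = sum(abs(e) for e in errors)
    # RAE denominator as printed in the text: predictions against the actual mean
    rae = abs_sum / sum(abs(p - mean_actual) for p in predicted)
    mdape = median([abs(e / a) for a, e in zip(actual, errors)])
    return {"MSE": mse, "RMSE": math.sqrt(mse), "MAE": abs_sum / m,
            "RAE": rae, "MdAPE": mdape}


print(error_metrics([1.0, 2.0, 4.0], [1.5, 2.0, 3.0]))
```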

The results of our error analyses are presented in Table 4.4; they bespeak that a non-regime-
switching model is less adequate than a regime-switching model in describing the dynamics
of our data. Under the HMM setting, the 3-state framework outperforms the 1- and 2-state
frameworks; whilst under the HOHMM setting, the 2-state model is the best in predicting
the stochastic component and DESP. Furthermore, the HOHMM beats the regular HMM
with respect to the four metrics; this finding agrees with the output revealed in Figure 4.9.

The 2-state HOHMM has the best over-all prediction amongst all state settings. The 4-state
HMMs and HOHMMs were also tested but no significant improvement is achieved. Ad-
ditionally, the computational cost of using HMMs and HOHMMs with state dimensions
higher than 4 outweighs the benefits. Results of the t-test for the RMSEs’ mean differ-
ences under various modelling set ups are displayed in Table 4.5. The adjusted p-values
in our pairwise comparison were computed via the Bonferroni’s method using the R func-
tion ‘p.adjust’; this addresses the issue caused by familywise errors. The p-values for the
comparison of 2-state versus 3-state HOHMMs are large; so, we cannot reject the null
hypothesis of no RMSE-mean differences at a significance level of 5%. This agrees with
Figures 4.8 and 4.9, where regimes 2 and 3 behave similarly. Thus, introducing a third
state will generate minimal gain (if any). Error metrics for the 2-state and 3-state settings in Table 4.4 show very close results. However, comparison tests demonstrate that the 1-state and 2-state settings’ error metric values are statistically different, and the same can be said concerning those for the 1-state and 3-state settings. Additionally, Figure 4.9 and Table 4.4 lend support to such an outcome, which clearly suggests the merit of incorporating regime-switching features in the model for our data.

                  Deseasonalised component Xt     Electricity daily spot price St
HMM setting       1-state   2-state   3-state     1-state   2-state   3-state
MSE               0.4152    0.2870    0.2828      5454.2    2840.2    2769.0
RMSE              0.6443    0.5357    0.5318      73.852    53.293    52.621
MdAPE             46.712    42.440    40.724      37.534    32.332    31.675
MAE               0.4828    0.4062    0.4016      36.738    27.883    27.259
RAE               0.7317    0.6156    0.6086      0.6445    0.4892    0.4782

                  Deseasonalised component Xt     Electricity daily spot price St
HOHMM setting     1-state   2-state   3-state     1-state   2-state   3-state
MSE               0.4152    0.2542    0.2768      5454.2    1906.9    1996.6
RMSE              0.6443    0.5042    0.5261      73.852    43.668    44.683
MdAPE             46.712    39.156    40.289      37.534    31.104    36.238
MAE               0.4828    0.3825    0.3938      36.738    24.253    24.373
RAE               0.7317    0.5797    0.5967      0.6445    0.4255    0.4276

Table 4.4: Error analysis of the HMM- and HOHMM-based models for DESP

Setting   t-test                  p-value
HMM       1-state vs 2-state      9.2771 × 10^−12
HMM       1-state vs 3-state      2.8685 × 10^−15
HMM       2-state vs 3-state      0.7771
HOHMM     1-state vs 2-state      1.0775 × 10^−12
HOHMM     1-state vs 3-state      1.3966 × 10^−15
HOHMM     2-state vs 3-state      0.8502

Table 4.5: Bonferroni-corrected p-values for the t-test performed on the RMSEs involving the DESP
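
The Bonferroni adjustment applied to the pairwise comparisons multiplies each raw p-value by the number of comparisons and caps the result at 1, which is what the R call `p.adjust(p, method = "bonferroni")` returns. A minimal sketch with made-up raw p-values (the function name `bonferroni` is hypothetical):

```python
def bonferroni(p_values):
    """Bonferroni-adjusted p-values: min(1, n * p) for each of the n tests."""
    n = len(p_values)
    return [min(1.0, p * n) for p in p_values]


print(bonferroni([0.01, 0.04, 0.5]))  # approximately [0.03, 0.12, 1.0]
```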

4.5.3.2 Selection of suitable model setting

We use an information selection criterion that emphasises the trade-off between bias and variance for our HMM and HOHMM settings. Similar to Erlwein et al. [7], and Xi and Mamon [38], we consider the Akaike Information Criterion (AIC) to provide a measure that balances the relative goodness of fit and complexity of various model settings. The AIC is given by

\[
\text{AIC} = -2 \log L(X; \mathcal{E}^w) + 2l, \tag{4.42}
\]

where $l$ is the number of model parameters to be estimated, and $\log L(X; \mathcal{E}^w)$ is the log-likelihood function associated with the model. For the observation process $X_{k+1}$, the corresponding log-likelihood under the HOHMM setting is

\begin{align}
\log L\left(X_{k+1}; \kappa, \vartheta, \varrho, \mu_\beta, \sigma^2_\beta\right)
&= \sum_{k=1}^{B} \sum_{t=1}^{N} \langle y^w_k, e_t \rangle \times \Bigg(\log \left(2\pi\left(\varrho^2(y^w_k) + \sigma^2_\beta(y^w_k)\, q \kappa^2(y^w_k)\right)\right)^{-\frac{1}{2}} \notag\\
&\qquad - \frac{\left(X_{k+1} - \vartheta(y^w_k) - \kappa(y^w_k) X_k - \mu_\beta(y^w_k)\, q \kappa(y^w_k)\right)^2}{2\left(\varrho^2(y^w_k) + \sigma^2_\beta(y^w_k)\, q \kappa^2(y^w_k)\right)}\Bigg), \tag{4.43}
\end{align}
where B is the number of observed values and N is the number of states. The number of
parameters to be estimated depends upon N, and this is summarised in Table 4.6. As ex-
pected, there is a significant increase in the number of parameters to be estimated as the
number of states grows in a model.

Given the form of the AIC in equation (4.42), the selection principle hinges on minimising
the AIC. From Table 4.7, the 1-state model has the highest AIC values. Clearly, the two-
regime HOHMM-OU-jump setting outperforms all other settings given our data as model
inputs.

Model     1-state   2-state   3-state   N-state

HMM       5         12        21        N^2 + 4N
HOHMM     5         14        33        N^3 − N^2 + 5N

Table 4.6: Number of estimated parameters under HMM-OU-jump and HOHMM-OU-jump settings
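The closed-form parameter counts in Table 4.6 can be checked quickly. The short Python sketch below encodes the two N-state formulas and prints the counts for N = 1, 2, 3; the function names are hypothetical and chosen only for this illustration.

```python
def hmm_param_count(n_states):
    """Number of parameters under the HMM-OU-jump setting: N^2 + 4N."""
    return n_states ** 2 + 4 * n_states


def hohmm_param_count(n_states):
    """Number of parameters under the HOHMM-OU-jump setting: N^3 - N^2 + 5N."""
    return n_states ** 3 - n_states ** 2 + 5 * n_states


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, hmm_param_count(n), hohmm_param_count(n))
```

The cubic term in the HOHMM count reflects how quickly the parameter burden grows with the number of states.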

Model            Number of parameters   AIC

1-state model    5                      3005.51
2-state HMM      12                     2877.48
3-state HMM      21                     2821.99
2-state HOHMM    14                     2657.84
3-state HOHMM    33                     2897.54

Table 4.7: Comparison of selection criteria AIC
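To make the selection rule concrete, the following Python sketch ranks the settings of Table 4.7 by AIC and inverts equation (4.42) to recover the log-likelihood implied by each reported AIC. The dictionary and function names are assumptions introduced for this illustration; the numbers are those reported in the table.

```python
# (number of parameters, AIC) as reported in Table 4.7
TABLE_4_7 = {
    "1-state model": (5, 3005.51),
    "2-state HMM": (12, 2877.48),
    "3-state HMM": (21, 2821.99),
    "2-state HOHMM": (14, 2657.84),
    "3-state HOHMM": (33, 2897.54),
}


def implied_log_likelihood(n_params, aic_value):
    """Invert AIC = -2 log L + 2 l to obtain log L."""
    return n_params - aic_value / 2.0


# The preferred setting is the one with the smallest AIC.
best_model = min(TABLE_4_7.items(), key=lambda item: item[1][1])[0]

if __name__ == "__main__":
    print("preferred setting:", best_model)
    for name, (l, a) in TABLE_4_7.items():
        print(name, implied_log_likelihood(l, a))
```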

The AIC lacks the asymptotic-consistency property, as pointed out by Bozdogan
[4]. This calls into question the AIC’s optimality as the best-model criterion (cf. [33]) when
one is confronted by a collection of models with different dimensions and numbers of
parameters. The Bayesian information criterion (BIC) is an alternative metric, which takes
the data set’s sample size into account on top of the other inputs included in the AIC. As
comprehensively discussed in Kuha [21], if the two criteria agree on the best-model choice,
robustness in model selection is achieved. For this reason, we also adopt the BIC in
evaluating the set of candidate models to choose from.

The AICs in Table 4.7 come from a static estimation. We complement this with BIC values
generated by our dynamic parameter estimation method. The BIC is
computed as

BIC = −l log B + 2 log L (X; E) . (4.44)

By setting B as the number of observations in each algorithm pass when employing equa-
tion (4.43), a series of BIC values are obtained as the data set is processed in its entirety.
Given the model choice-metric form in equation (4.44), the underlying principle of selec-
tion is to maximise the BIC function. Figure 4.10 depicts the evolution of the calculated
BIC values for the 1-, 2-, and 3-state models.
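A brief Python sketch of the BIC in the sign convention of equation (4.44) follows. Under this convention larger values are preferred, and the result is the negative of the more common form −2 log L + l log B. The function names and the sample log-likelihood values are hypothetical; they only illustrate how one BIC value per algorithm pass might be produced.

```python
import math


def bic(log_lik, n_params, n_obs):
    """BIC as in equation (4.44): 2 log L - l log B (larger is better)."""
    return 2.0 * log_lik - n_params * math.log(n_obs)


def bic_per_pass(log_liks, n_params, n_obs_per_pass):
    """One BIC value for each algorithm pass, given each pass's log-likelihood."""
    return [bic(ll, n_params, n_obs_per_pass) for ll in log_liks]


if __name__ == "__main__":
    # Made-up log-likelihoods for three passes of a 14-parameter model,
    # each pass processing 20 observations.
    print(bic_per_pass([-25.0, -31.5, -28.2], 14, 20))
```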

For the 1-state model, we get higher BIC values for almost all of the periods spanned by
the algorithm passes in the whole data set. We note, nonetheless, that for periods where
jumps occur, the BIC values markedly drop under the 1-state model. The drastic decline of
the BIC during DESP’s spike occurrences suggests that this criterion is unable to sustain
robustness with one-state modelling. In contrast, the BIC values produced by the
regime-switching models have more stable patterns. Furthermore, the 2-state HOHMM even
produces BIC values higher than those from the 1-state HOHMM/HMM for algorithm steps
that encompass periods of jump events. This tells us that the 1-state model is not able to
capture the market well in situations where jumps could be present. On the other hand, the
2-state HOHMM can describe satisfactorily the dynamics of data with jump characteristics.
This is also supported by the results from our error and AIC-based analyses. It is therefore
worthwhile to embed regime-switching capabilities, and in the case of our data set, the
2-state HOHMM setting is regarded as the most appropriate choice.

[Figure 4.10 plots the BIC (vertical axis, roughly −160 to −40) against the algorithm steps (horizontal axis, 0 to 70) for the 1-state HMM/HOHMM, 2-state HMM, 3-state HMM, 2-state HOHMM and 3-state HOHMM.]

Figure 4.10: Evolution of BIC values under the 1-, 2-, 3-state HMM- and HOHMM-based
models

4.5.3.3 The valuation of expected future spot at delivery

As a consequence of electricity-market deregulation, power producers and wholesale buyers
turn to derivatives and recent financial-product innovations in an effort to manage their
risk exposure. Futures and forwards written on electricity prices are commonly used for
hedging, which entails the pricing of these contracts. Forward/futures price modelling can
be categorised into two strands of study in the current literature. The first strand is to
directly construct forward-price curves from market data (e.g., Fleten and Lemming [13]).
The second strand is to derive forward prices as the expected future spot prices at delivery
(EFSP) (e.g., Benth et al. [2]).

In this research, we first identify the best-fitting model for our DESP data. Then, optimal
estimates using our filtering-based calibration are fed into the chosen model for EFSP.

Regime    κ̂        ϑ̂         ϱ̂        μ̂β       σ̂β               α̂        θ̂         ξ̂

State 1   0.7169   -0.4722   0.1281   0.5312   1.6664 × 10^−4   0.3328   -1.6679   0.1499
State 2   0.6388   -0.5643   0.6473   0.6457   1.5808 × 10^−4   0.4482   -1.5622   0.7965

Table 4.8: Optimal parameter estimates for the 2-state HOHMM
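As a hedged consistency check on Table 4.8, the Python sketch below maps the continuous-time estimates (α, θ, ξ) to discrete-time coefficients using the standard exact discretisation of an OU process with a unit time step. These relations are an assumption made here for illustration and are not quoted from this section; the function and dictionary names are likewise hypothetical. With them, the reported (κ, ϑ, ϱ) appear to be reproduced to the four decimal places shown in the table.

```python
import math


def discretise_ou(alpha, theta, xi, dt=1.0):
    """Exact OU discretisation (assumed): X_{k+1} = vartheta + kappa X_k + varrho z."""
    kappa = math.exp(-alpha * dt)
    vartheta = (1.0 - kappa) * theta
    varrho = xi * math.sqrt((1.0 - kappa ** 2) / (2.0 * alpha))
    return kappa, vartheta, varrho


# Estimates as reported in Table 4.8
TABLE_4_8 = {
    "State 1": {"alpha": 0.3328, "theta": -1.6679, "xi": 0.1499,
                "kappa": 0.7169, "vartheta": -0.4722, "varrho": 0.1281},
    "State 2": {"alpha": 0.4482, "theta": -1.5622, "xi": 0.7965,
                "kappa": 0.6388, "vartheta": -0.5643, "varrho": 0.6473},
}

if __name__ == "__main__":
    for name, row in TABLE_4_8.items():
        print(name, discretise_ou(row["alpha"], row["theta"], row["xi"]))
```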

Let G(t, T ) denote the expected value of the spot price for delivery at future time T , and
assume that the current time is t. With EFSP as the underlying variable, the theoretical
forward price F(t, T ) of an electricity contract is computed as

F (t, T ) = λ (t, T ) + G (t, T ) , (4.45)

where λ (t, T ) is the price of the risk process. In our OU process with jumps, λ (t, T ) has
two contributing parts: (i) the price of risk due to switching of regimes and (ii) the premium
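due to the market price of risk. Under the real-world probability measure P, the EFSP as a
function of HOHMMs is calculated as

\[
\begin{aligned}
G(t,T) ={}& E\left[ S_T \mid X_t \right] \\
={}& D(T) \exp\left( X_t e^{-\alpha(y_t^w)(T-t)} + \left(1 - e^{-\alpha(y_t^w)(T-t)}\right)\theta(y_t^w)
+ \frac{\xi^2(y_t^w)\left(1 - e^{-2\alpha(y_t^w)(T-t)}\right)}{2\alpha(y_t^w)} \right) \\
& \times E\left[ \exp\left( \int_s^T e^{-\alpha(y_t^w)(T-u)}\, dJ_u \right) \,\Big|\, X_t \right] \\
={}& D(T) \exp\left( X_t e^{-\alpha(y_t^w)(T-t)} + \left(1 - e^{-\alpha(y_t^w)(T-t)}\right)\theta(y_t^w)
+ \frac{\xi^2(y_t^w)\left(1 - e^{-2\alpha(y_t^w)(T-t)}\right)}{2\alpha(y_t^w)} \right) \\
& \times \exp\left( \int_s^T \exp\left( \mu_\beta(y_t^w)\, e^{-\alpha(y_t^w)(T-u)} + \sigma_\beta^2(y_t^w)\, e^{-2\alpha(y_t^w)(T-t)} \right) \varsigma\, du - \varsigma (T-t) \right).
\end{aligned}
\tag{4.46}
\]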

Derivation details of (4.46) are similar to those given in Benth et al. [2]. The optimal pa-
rameter estimates in the evaluation of the EFSP in equation (4.46) are shown in Table 4.8.

Figure 4.11: Expected future spot price on delivery (EFSP) under a 2-state HOHMM

It is reasonable to assume a constant Markov chain when pricing EFSP over a short-term
delivery period. For a longer maturity of T > 30 days, EFSP has to be numerically com-
puted with a dynamic Markov chain involved, as recommended in Erlwein et al. [7].

The reasonableness of the “constant Markov-chain” assumption for contracts with a short
delivery horizon remains disputable. In our implementation, we simulate dynamic states
through an HOHMC to obtain estimated values of the EFSP for a general delivery period.
Figure 4.11 shows the simulated EFSP with time t mapping the last 30 days of our data set
for maturities T = 1, . . . , 30 days. Salient features of the data, such as the short-term
fluctuations, seasonal patterns, and spikes of electricity prices, are replicated by our
proposed model. Our pricing application is of practical significance, as forward contract
valuation is immediate once an estimate of λ (t, T ) is available. Determining the
appropriate λ (t, T ) is beyond the scope of this article. Nevertheless, we recognise its
critical impact on the pricing of electricity contracts. This warrants a separate
investigation that could utilise our current results and exposition as the necessary
groundwork.
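The idea of simulating dynamic states through an HOHMC can be sketched as follows. This Python snippet is only a rough illustration under strong assumptions: the second-order transition probabilities are invented, the jump component and the seasonal factor D(T) are omitted (D(T) is effectively set to 1), and the regime-wise coefficients (κ, ϑ, ϱ) are simply borrowed from Table 4.8. It is not the implementation behind Figure 4.11.

```python
import math
import random

# Hypothetical second-order transition probabilities: the probability that the
# next state is 0, given (previous state, current state). Not estimated values.
P_NEXT_IS_0 = {(0, 0): 0.95, (0, 1): 0.60, (1, 0): 0.80, (1, 1): 0.30}

# Regime-wise (kappa, vartheta, varrho), taken from Table 4.8.
PARAMS = [(0.7169, -0.4722, 0.1281), (0.6388, -0.5643, 0.6473)]


def simulate_hohmc(n_steps, prev, cur, rng, p_next_is_0=P_NEXT_IS_0):
    """Simulate a two-state, second-order Markov chain for n_steps steps."""
    path = []
    for _ in range(n_steps):
        nxt = 0 if rng.random() < p_next_is_0[(prev, cur)] else 1
        path.append(nxt)
        prev, cur = cur, nxt
    return path


def mc_expected_exp(x0, horizon, n_paths=2000, seed=0, params=PARAMS):
    """Monte Carlo estimate of E[exp(X_T)] for a regime-switching AR(1) without jumps."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_paths):
        x = x0
        for s in simulate_hohmc(horizon, 0, 0, rng):
            kappa, vartheta, varrho = params[s]
            x = vartheta + kappa * x + varrho * rng.gauss(0.0, 1.0)
        total += math.exp(x)
    return total / n_paths


if __name__ == "__main__":
    for horizon in (1, 10, 30):
        print(horizon, mc_expected_exp(-1.6, horizon))
```

Fixing the seed keeps the estimates reproducible; a fuller treatment would add the compound Poisson term and the deterministic seasonal component.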

4.6 Conclusion
A new framework, possessing regime-switching and data memory-capturing mechanisms,
is proposed for the modelling and forecasting of electricity spot prices. The spot price
is modelled as an exponential of an OU process, augmented by a compound Poisson
term, to deal with the mean-reverting, stochastic and spiked components. The exponential
part of the model is scaled by a suitable deterministic combination of trigonometric and
linear functions to account for the cycles and other types of attributes of certain price
determinants. The parameters in the exponential component are modulated by a higher-order
hidden Markov chain in discrete time, which drives the random switching between different
economic regimes.

The work of Erlwein et al. [7] appears to be the only one that applies online HMM filtering
algorithms to the analysis of electricity spot prices. An improvement of their approach was
accomplished in our model development. Empirical implementation based on our extended
HOHMM filters confirmed that the predominant features of seasonality, mean reversion and
extremely large spikes in electricity prices are all captured quite well. The various insights
from this study reinforce the findings of Erlwein et al. [7]. We further showed that HOHMMs
offer a much better fit than that achieved by the usual HMMs. The 2-state HOHMM outperforms
other model settings for our data in accordance with an information-criterion evaluation.

This study showcased two key research contributions: (i) new filters that support
self-updating parameter estimation of the HOHMM-OU-jump model, thereby enriching the
latest research of Xiong and Mamon [43] with the inclusion of spikes; and (ii) extension of
the model of Erlwein et al. [7] by incorporating HOHMMs for electricity price modelling
geared towards derivative valuation and risk measurement. A direct and natural direction
for future work is further empirical testing of the modelling framework and estimation
method put forward here in the analysis of various contracts (more sophisticated than
forwards) in the electricity market for investment and hedging. The HOHMM with lag order 2,
which we fixed, is the focus of this research; certainly, the implementation of the HOHMM
filtering with a higher lag order, and statistical inference for the estimation of the optimal
lag, are worthwhile future research endeavours.
References

[1] F. Benth, L. Ekeland, R. Hauge, B. Nielsen, A note on arbitrage-free pricing of forward
contracts in energy markets, Applied Mathematical Finance, 10(4) (2003) 325–336.

[2] F. Benth, A. Cartea, R. Kiesel, Pricing forward contracts in power markets by the cer-
tainty equivalence principle: Explaining the sign of the market risk premium, Journal
of Banking and Finance, 32(10) (2008) 2006–2021.

[3] R. Bhattacharya, E. Waymire, Kolmogorov’s extension theorem and Brownian motion,
In: A Basic Course in Probability Theory, (eds.: S. Axler, K. Ribet), Springer,
New York (2007) 129–140.

[4] H. Bozdogan, Model selection and Akaike’s information criterion: The general theory
and its analytical extensions, Psychometrika, 52(3) (1987) 345–370.

[5] CCNMatthews Newswire, AESO 10-year plan identifies potential $3.5 billion trans-
mission investment, Toronto, Marketwired L.P. (2007) 1.

[6] S. Deng, Pricing electricity derivatives under alternative stochastic spot price models,
Proceedings of the 33rd Annual Hawaii International Conference on System Sciences,
(2000) 10. doi:10.1109/HICSS.2000.926755.

[7] C. De Jong, The nature of power spikes: A regime-switching approach. Studies in
Nonlinear Dynamics and Econometrics 10(3) (2006). DOI:
https://doi.org/10.2202/1558-3708.1361.

[8] C. Erlwein, F. Benth, R. Mamon, HMM filtering and parameter estimation of an elec-
tricity spot price model. Energy Economics, 32(5) (2010) 1034–1043.

[9] R. Elliott, H. Yang, Forward and backward equations for an adjoint process, In:
Stochastic Processes, (eds.: S. Cambanis, J. Ghosh, R. Karandikar, P. Sen), Springer,
New York, NY, (1992) 61–69

[10] R. Elliott, L. Aggoun, J. Moore, Hidden Markov Models: Estimation and Control,
Springer, New York, 1995.

[11] A. Escribano, J. Pena and P. Villaplana, Modelling electricity prices: International
evidence. Oxford Bulletin of Economics and Statistics, 73(5) (2011) 622–650.

[12] V. Fanelli, L. Maddalena, S. Musti, Modelling electricity futures prices using seasonal
path-dependent volatility, Applied Energy, 173 (2016), 92–102.

[13] S. Fleten, J. Lemming, Constructing forward price curves in electricity markets, En-
ergy Economics, 25(5) (2003) 409–424.

[14] H. Geman, A. Roncoroni, Understanding the fine structure of electricity prices, Jour-
nal of Business, 79(3) (2006) 1225–1261.

[15] A. Gonzalez, A. Roque, J. García-González, Modeling and forecasting electricity
prices with input-output hidden Markov models. IEEE Transactions on Power Systems,
20(1) (2005) 13–24.

[16] C. Granger, R. Joyeux, An introduction to long memory time series models and frac-
tional differencing, Journal of Time Series Analysis, 1 (1980) 49–64.

[17] F. Hanson, J. Westman, Stochastic analysis of jump-diffusions for financial log-return
processes, In: Stochastic Theory and Control. Lecture Notes in Control and Information
Sciences, (eds.: B. Pasik-Duncan), Springer, Berlin, Heidelberg, 280 (2002) 169–83.

[18] R. Huisman, R. Mahieu, Regime jumps in electricity prices. Energy Economics, 25(5)
(2003) 425–434.

[19] S. Islyaev, P. Date, Electricity futures price models: Calibration and forecasting. Eu-
ropean Journal of Operational Research, 247(1) (2015) 144.

[20] M. Jabłonska, H. Nampala and T. Kauranne, The multiple-mean-reversion jump-diffusion
model for Nordic electricity spot prices. Journal of Energy Markets, 4(2)
(2011) 3–25.

[21] J. Kuha, AIC and BIC: Comparisons of Assumptions and Performance, Sociological
Methods and Research, 33(2)(2004),188–229.

[22] L. Lee, F. Jean, High-order hidden Markov model for piecewise linear processes and
applications to speech recognition, Journal of the Acoustical Society of America,
140(2) (2016) EL204–EL210.

[23] J. Lucia, E. Schwartz, Electricity prices and power derivatives: Evidence from the
Nordic power exchange. Review of Derivatives Research, 5(1) (2002) 5–50.

[24] R. Mamon, R. Elliott, Hidden Markov Models in Finance, International Series in
Operations Research and Management Science, 104, Springer, New York, 2007.

[25] R. Mamon, R. Elliott, Hidden Markov Models in Finance: Further Developments and
Applications, International Series in Operations Research and Management Science,
209, Springer, New York, 2014.

[26] R. Mamon, C. Erlwein, B. Gopaluni, Adaptive signal processing of asset price dy-
namics with predictability analysis, Information Sciences, 178 (1) (2008) 203–219.

[27] Market Surveillance Administrator, Alberta wholesale electricity market, 2010
http://albertamsa.ca/uploads/pdf/Reports/Reports/Alberta%20Wholesale%20Electricity%20Market%20Report%20092910.pdf
(accessed in June 2017).

[28] W. Megginson, J. Netter, From state to market: A survey of empirical studies on
privatization. Journal of Economic Literature, 39(2) (2001) 321–389.

[29] R. Merton, Option pricing when underlying stock returns are discontinuous. Journal
of Financial Economics, 3(1) (1976) 125–144.

[30] V. Reisen, Estimation of the fractional difference parameter in the ARIMA(p, d, q)
model using the smoothed periodogram, Journal of Time Series Analysis, 15 (1994)
335–350.

[31] E. Schwartz, The stochastic behavior of commodity prices: Implications for valuation
and hedging, Journal of Finance, 52(3)(1997) 923–973.

[32] E. Schwartz, J. Smith, Short-term variations and long-term dynamics in commodity
prices, Management Science, 46(7) (2000) 893–911.

[33] G. Schwarz, Estimating the dimension of a model, Annals of Statistics, 6(1978) 461–
464.

[34] J. Seifert, M. Uhrig-Homburg, Modelling jumps in electricity prices: Theory and
empirical evidence. Review of Derivatives Research, 10(1) (2007) 59–85.

[35] T. Siu, W. Ching, E. Fung, M. Ng, X. Li, A high-order markov-switching model for
risk measurement, Computers and Mathematics with Applications, 58 (1) (2009) 1–
10.

[36] A. van der Vaart, Asymptotic Statistics, Cambridge University Press, Cambridge and
New York, 1998.

[37] O. Wu, T. Liu, B. Huang, F. Forbes, Predicting electricity pool prices using hidden
Markov models, Ifac-Papersonline, 28(8) (2015), 343–348.

[38] X. Xi, R. Mamon, Parameter estimation of an asset price model driven by a weak
hidden markov chain, Economic Modelling, 28(1) (2011) 36–46.

[39] X. Xi, R. Mamon, Yield curve modelling using a multivariate higher-order HMM,
In: State-Space Models and Applications in Economics and Finance (eds.: Y. Zeng,
and Y. Wu), Springers Series in Statistics and Econometrics for Finance 1, (2013)
185–203.

[40] X. Xi, R. Mamon, Capturing the regime-switching and memory properties of interest
rates, Computational Economics, 44(3) (2014) 307–337.

[41] X. Xi, R. Mamon, Parameter estimation in a WHMM setting with independent and
volatility components, In: Hidden Markov Models in Finance: Volume II (Further
Developments and Applications) (eds.: R. Mamon, R. Elliott), Springer (2014) 227-
240.

[42] X. Xi, R. Mamon, M. Davison, A higher-order hidden Markov chain-modulated
model for asset allocation, Journal of Mathematical Modelling and Algorithms in
Operations Research, 13(1) (2014) 59–85.

[43] H. Xiong, R. Mamon, A self-updating model driven by a higher-order hidden Markov
chain for temperature dynamics, Journal of Computational Science, 17 (2016) 47–61.

[44] H. Xiong, R. Mamon, Putting a price tag on temperature. Computational Management
Science, in press (2017). DOI: 10.1007/s10287-017-0291-8

[45] L. Yousef, Derivation and Empirical Testing of Alternative Pricing Models in Al-
berta’s Electricity Market, ProQuest Dissertations Publishing, 2002.

[46] W. Yu, G. Sheblé, Modeling electricity markets with hidden markov model, Electric
Power Systems Research, 76(6)(2006) 445–451.

[47] F. Ziel, R. Steinert, S. Husmann, Efficient modeling and forecasting of electricity spot
prices, Energy Economics, 47(2015), 98–111.
Chapter 5

Modelling and forecasting futures-prices curves in the Fish Pool market

5.1 Introduction

Commodity futures markets provide a mechanism for investors to hedge a price risk or
possibly profit from price changes at the futures contract’s maturity. These markets, in
essence, offer social and economic benefits through the efficient allocation of resources and
by making some form of insurance accessible to businesses. The analysis of commodity futures
typically covers the energy and agriculture sectors with the intertwined objectives of
modelling price formation and risk management. The literature on futures contracts dealing
with the needs of the aquaculture and fisheries industries remains rather scant, owing to the
challenges in establishing efficient markets in these two sectors.

We note that frozen shrimp was the very first commodity from the seafood industry to be
traded in the Chicago Mercantile Exchange back in the 1960s. Unfortunately, it was a
thin market and only lasted for 3 years; see [19]. Three decades later, the Minneapolis
Grain Exchange introduced the first exchange-traded shrimp futures contract in 1994 to
help market participants hedge their risk exposure to shrimp prices. Alas, the market was
also terminated in 2000 due to low trading volumes and lack of interest, as per the account
of Quagrainie and Engle [24].

The aquaculture-based futures markets in North America were short-lived and ultimately
failed. In contrast, the Fish Pool ASA, located in Norway, has been successful
since its launch in 2005. Furthermore, it has a strong connection with the rapidly expanding
farmed salmon industry over the course of several decades. Primarily owned by Oslo
Børs ASA, the Fish Pool has become a global exchange serving as a venue to mitigate price
risk and an instrument to reckon with in sustaining the fish and seafood markets. It operates
as a regulated marketplace for trading financial derivatives with the salmon price as the
underlying variable. Fish Pool is a very active exchange in the trading of salmon derivatives.
As reported in [23], the volume of salmon traded in the futures exchange has increased
dramatically from approximately 3,600 tonnes in 2006 to 90,445 tonnes in 2016 with a
record cash value of 5.8 billion Norwegian Krone (NOK).

The Fish Pool plays a critical role in the pricing and management of fresh farmed salmon.
The creation and trading of salmon forward and futures contracts bring financial and
economic improvements in terms of increasing the producers’ profitability, enhancing market
efficiency, and facilitating risk hedging. It is therefore vital to model the dynamics of the
process that underlies the value of the futures contract. In this chapter, we put forward
a regime-switching model that is capable of accurately describing the joint dynamics of
salmon futures prices. Our modelling is geared towards the valuation of other pertinent
financial derivatives on salmon, hedging against volatile salmon price swings, and optimising
harvesting and investment strategies of seafood resources.

A number of models for salmon prices are deterministic simply to keep the
financial-modelling framework analytically tractable; see, for example, Cacho [3] and
Guttormsen [15]. In our case, we deal with the uncertainty of market prices using stochastic
processes. Although Solibakke [27] built a stochastic volatility model for the Fish Pool
market, that framework considered the time-varying dynamics of front-month contracts
only, rather than the entire term structure of the futures prices. Ewald [11] derived
explicit formulae for prices of fish futures and call options under the assumption that the
stock level of salmon follows a stochastic logistic growth. The assumption in [11] is somewhat
problematic because such governing dynamics in the pricing framework refer to wild
catches of fresh salmon from an open-access fishery source and not catches via fish farming.
According to the UN’s Food and Agriculture Organisation (FAO) [18], the farmed-fish
industry has surpassed the wild-fish sector for a few decades now in terms of production
volume. Moreover, as a widely traded commodity, farmed salmon, and not wild salmon, forms
the foundation of the Fish Pool’s market structure.

Certain studies in the literature feature the construction of a salmon futures pricing
framework starting with the spot-price evolution; this is then followed by estimating the
spot-price model using a Kalman-filtering method (e.g., Schwartz [26]). Ankamah-Yeboah et
al. [23] presented the price-formation development in the salmon/aquaculture futures market
via a risk-premium model, whereby spot prices lead to price discovery of futures contracts.
Asche et al. [1] studied the spot-forward relationship by treating futures prices with
maturities up to six months as unbiased estimators of spot prices. The popular two-factor
commodity model for spot prices, with a stochastic convenience yield explicitly included,
is widely applied to examine salmon futures prices and aquaculture-farming considerations;
see details in Ewald et al. [13] and [12]. Alternatively, research investigations could
also focus directly on futures prices when performing the analysis of the Fish Pool salmon
market.

Of particular interest about the Fish Pool is that the salmon’s spot prices and futures prices
are observed weekly and daily, respectively. Thus, spot prices missing on certain days
could be recovered from futures prices collected at a daily frequency. However, if the daily
futures-price modelling begins with a spot price description, generating spot prices with
daily frequency (from data with weekly frequency) is necessary, and this presents an added
difficulty as noted in [1]. Another striking characteristic of the Fish Pool is that on every
trading day, closing futures prices, corresponding to various maturities that vary from months
to years, are available. But then, as we move forward in time, the remaining time to maturity
of each contract decreases, and futures prices corresponding to the remaining time to
maturity are available as well. This complication, in the context of multivariate modelling,
creates a heavy computational challenge when capturing the term structure of futures prices
at the Fish Pool whilst ensuring that every futures price in the entire data set is processed,
without overlap (i.e., taken as input only once), in the estimation of model parameters.
This complication is certainly a gap in the current literature, which we
shall address in Section 5.4 of this chapter. The usual approach so far to circumvent this
issue in model estimation is simply to average actual maturities; see Ankamah-Yeboah et
al. [23], Ewald et al. [13] and Ewald et al. [12].

Our research is distinguished from the aforementioned works by the following contributions:
(i) Instead of using the classical two-factor model, we construct a Markovian
regime-switching model to capture the memory and stochasticity of salmon futures prices.
(ii) This chapter is the first to pin down the modelling of the term structure of salmon
futures prices without assuming approximate maturities of the contracts in the estimation
procedure. (iii) We directly investigate the dynamic behaviour of salmon futures prices,
thereby avoiding the use of weekly spot prices. (iv) As an alternative to the Kalman filtering
method, we put forward a self-calibrating estimation method based on a regime-switching
model modulated by a higher-order Markov chain, which is capable of modelling empirical
distributions of virtually all shapes.

Markov regime-switching models capture the dynamics of price evolution more accurately
and provide better forecasting performance as per the findings in Erlwein et al. [10], Date et
al. [5], and Xiong and Mamon [38]. In the context of a Markovian regime-switching
mechanism, hidden Markov models (HMMs) have been widely utilised, encompassing many
applications in engineering, and the natural and social sciences; for examples of applications
concentrating in the fields of economics, insurance and finance, see Mamon and Elliott
[20, 21]. HMMs drive the model’s regime-switching attribute via Markov chains with hidden
states; however, the assumption of first-order state-transition dependency in HMMs
does not take advantage of information from the historical past that may be useful.
This gives rise to the concept of a higher-order hidden Markov model (HOHMM), which
is a doubly stochastic process in that there is a stochastic model for the observation
process whose parameters are modulated by a higher-order hidden Markov chain
(HOHMC). An ultimate aim is to obtain the best estimate of the latent Markov chain’s
current or future state. In Xi and Mamon [32], it was shown that the HOHMM setting yields
better forecasting performance in modelling risky assets’ log returns. Xiong and Mamon
[37] found that HOHMM-based models could better capture the empirical characteristics
of Toronto’s daily average temperature compared to HMM-based models.

Our investigation reveals that Date et al. [5] is so far the only paper that utilised
HMM-filtering algorithms to model and forecast commodity futures prices. In [5], model
calibration uses heating-oil futures price data; results showed that Markov regime-switching
models outperformed a one-regime model. This work can be deemed an extension and
updated version of [5], but with advancements in various aspects of futures-price
modelling described as follows: (i) Model parameters in this chapter are modulated by an
HOHMM, which has a memory-capturing “configuration” that may enable the extraction of
possibly useful information from past data. (ii) HOHMM filtering algorithms, under a
multivariate setting, are devised for estimation. (iii) The multi-step ahead forecasts are
analysed to evaluate the prediction performance of all proposed models under both the HMM
and HOHMM settings. (iv) Finally, and most importantly, we design a moving-window
scheme for our data-filtering algorithm passes; this new data-processing scheme covers the
whole life cycle of all futures contracts under study, and ensures that no data point is missed
out or entered more than once into the filtering equations.

We start with the modelling of futures prices under a one-state set up, assuming that the
spot price process is lognormally distributed. A finite-state HOHMM is then embedded
into the framework, regulating the stochastic switching amongst regimes. Our best esti-
mate (in the sense of an expected value given a history of past and current information) of a
regime reflects a state or level generated by the interactions of competing market forces that
affect the futures prices. Filters for the HOHMMs are established by first transforming the
HOHMMs into HMMs, and then usual techniques are applied to obtain optimal parameter
estimates of the HOHMM-based models.
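The transformation of an HOHMM into an HMM can be illustrated for lag order 2. In the Python sketch below, a second-order chain on N states is re-expressed as a first-order chain on the N² pairs (current state, previous state). The indexing convention, the function name and the transition probabilities are assumptions chosen for this illustration and need not match the mapping used in the thesis.

```python
def embed_second_order(a):
    """Embed a second-order chain as a first-order chain on pairs.

    a[i][j][k] = P(y_{n+1} = i | y_n = j, y_{n-1} = k).
    The pair (current, previous) = (j, k) is given index j * N + k.
    Returns an N^2 x N^2 matrix pi with pi[next_pair][current_pair],
    so each column sums to one.
    """
    n = len(a)
    pi = [[0.0] * (n * n) for _ in range(n * n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                # Moving from pair (j, k) to pair (i, j); all other moves are impossible.
                pi[i * n + j][j * n + k] = a[i][j][k]
    return pi


# Hypothetical two-state, second-order transition probabilities.
A = [[[0.9, 0.7], [0.4, 0.2]],
     [[0.1, 0.3], [0.6, 0.8]]]

if __name__ == "__main__":
    for row in embed_second_order(A):
        print(row)
```

Once the chain is written on pairs, standard first-order filtering recursions can be applied to the enlarged state space.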

The remainder of this chapter is structured as follows. Section 5.2 presents the formulation
of the multi-dimensional model, the parameters of which are driven by a discrete-time
HOHMC. In Section 5.3, adaptive filters for the states of the HOHMC and relevant quantities
are derived through a change of reference probability technique. We then draw up
a self-calibrating parameter-estimation scheme; this takes into account the recursive filters
via the Expectation-Maximisation (EM) algorithm. Numerical application of our proposed
models is conducted in Section 5.4 on a data set of daily salmon futures prices collected from
the Fish Pool. Both the goodness of fit and the prediction performance of the proposed models
are assessed. Some concluding remarks are given in Section 5.5.

5.2 Model setup


Following Ross [25] and Schwartz [26], we begin with the assumption that the commodity
price evolves under a risk-neutral probability measure Q. Let Pt denote the spot price of
salmon at time t, and suppose Pt behaves in accordance with the stochastic differential
equation (SDE)

dPt = α(θ − log Pt )Pt dt + ξPt dWtQ , (5.1)

where α, θ and ξ are positive constants, and WtQ is a standard Brownian motion under Q.

Write Xt := log Pt . By Itô’s lemma, the log-spot price Xt is a mean-reverting stochastic
process with stochastic dynamics

dXt = α(µ − Xt )dt + ξdWtQ , (5.2)

where α measures the speed of mean reversion towards the long-run mean level of the log
price, $\mu = \theta - \frac{\xi^2}{2\alpha}$, and ξ denotes the volatility of the process. Employing the
Brownian-motion and Itô-isometry properties, the conditional distribution of X under Q at
time T , where T > t, is normal with respective mean and variance given by

\[
E^Q\left[ X_T \mid \mathcal{F}_t \right] = X_t e^{-\alpha(T-t)} + \left(1 - e^{-\alpha(T-t)}\right)\mu, \tag{5.3}
\]

and

\[
\mathrm{Var}^Q\left[ X_T \mid \mathcal{F}_t \right] = \frac{\xi^2\left(1 - e^{-2\alpha(T-t)}\right)}{2\alpha}, \tag{5.4}
\]

where $\mathcal{F}_t$ is the filtration generated by $W_t^Q$.

From Manoliu and Tompaidis [17], the price Ft at time t of a futures contract with maturity
T is the expected price of the underlying commodity at time T under Q. That is,

\[
F_t = E^Q\left[ P_T \mid \mathcal{F}_t \right] = E^Q\left[ e^{X_T} \mid \mathcal{F}_t \right].
\]

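Using the results from equations (5.3) and (5.4), the log-expected value of the spot price is

\[
\begin{aligned}
G_t &= \log F_t \\
&= E^Q\left[ X_T \mid \mathcal{F}_t \right] + \frac{1}{2} \mathrm{Var}^Q\left[ X_T \mid \mathcal{F}_t \right] \\
&= X_t e^{-\alpha(T-t)} + \left(1 - e^{-\alpha(T-t)}\right)\mu + \frac{\xi^2\left(1 - e^{-2\alpha(T-t)}\right)}{4\alpha}.
\end{aligned}
\tag{5.5}
\]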
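A small Python sketch of the one-state log-futures price in equation (5.5) is given below. It simply adds half the conditional variance to the conditional mean of the log-spot price, as in the lognormal expectation; the function and argument names are hypothetical.

```python
import math


def log_futures_price(x_t, alpha, mu, xi, tau):
    """Log-futures price G_t for time to maturity tau = T - t, following (5.3)-(5.5)."""
    decay = math.exp(-alpha * tau)
    mean = x_t * decay + (1.0 - decay) * mu                 # conditional mean, eq. (5.3)
    var = xi ** 2 * (1.0 - decay ** 2) / (2.0 * alpha)      # conditional variance, eq. (5.4)
    return mean + 0.5 * var                                 # eq. (5.5)


if __name__ == "__main__":
    # Made-up inputs: log-spot 4.0, mean-reversion speed 0.5, long-run level 4.2.
    for tau in (0.0, 0.5, 1.0, 5.0):
        print(tau, log_futures_price(4.0, 0.5, 4.2, 0.3, tau))
```

At zero time to maturity the function returns the log-spot price itself, and for long maturities it approaches μ + ξ²/(4α).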

As noted in Schwartz [26] and Weron [31], it is reasonable to assume that the dynamics of
salmon spot prices follow a mean-reverting stochastic process under an objective measure,
and to introduce the price of risk λt . We shall be working under the objective (or real-world
probability) measure P when implementing a filtering-based estimation using observed
market data in Sections 5.3 and 5.4. To facilitate the estimation, a change of measure
will be employed; this is independent from the concept of risk-neutral measure involved
in valuation. The construction of λt , connecting Q and P, in pricing temperature-based
150 Chapter 5. Modelling and forecasting salmon futures-prices curves

derivatives is elaborated in Xiong and Mamon [38]. Elliott et al. [8] proposed a modified
version of the Esscher transform that takes into account λt when implementing a discrete-
time regime-switching Gaussian model and its continuous-time version. Research findings
in [8] and [38] form the basis in our development of a regime-switching model for salmon
futures prices under the arbitrage-free assumption. In particular, the log-spot price Xt under
P evolves as

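\[
dX_t = \alpha(\mu^{P} - X_t)\,dt + \xi\,dW_t, \tag{5.6}
\]
where $W_t$ is a Wiener process under $P$, and $\mu^{P} = \mu + \frac{\lambda_t \xi}{\alpha}$. By Itô's lemma, along with
equation (5.5), the dynamics of $G_t$ under $P$ are
\[
dG_t = \left(\lambda_t \xi e^{-\alpha(T-t)} - \frac{1}{2}\xi^2 e^{-2\alpha(T-t)}\right)dt + \xi e^{-\alpha(T-t)}\,dW_t. \tag{5.7}
\]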
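The step from (5.5) and (5.6) to (5.7) can be spelled out as follows. This is only a sketch of the Itô calculation using the symbols already defined; since $G_t$ is linear in $X_t$, no second-order term in $X_t$ arises.

\[
\begin{aligned}
dG_t &= e^{-\alpha(T-t)}\,dX_t + \left(\alpha X_t e^{-\alpha(T-t)} - \alpha\mu e^{-\alpha(T-t)} - \frac{1}{2}\xi^2 e^{-2\alpha(T-t)}\right)dt \\
&= \left(\alpha(\mu^{P} - \mu)e^{-\alpha(T-t)} - \frac{1}{2}\xi^2 e^{-2\alpha(T-t)}\right)dt + \xi e^{-\alpha(T-t)}\,dW_t ,
\end{aligned}
\]

and substituting $\alpha(\mu^{P} - \mu) = \lambda_t\xi$ gives the drift and diffusion terms of the log-futures price dynamics.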

Salmon futures prices are assumed available at each time $t_g$, $g \in \mathbb{Z}^+$, with maturities
$T^1, T^2, \ldots, T^g$. Let $F^h_{t_h}$ be the price of a futures contract at time $t_h$ with maturity $T^h$ for
$h = 1, \ldots, g$. Using the Euler method and invoking the results in Date et al. [5], the
$g$-dimensional discretisation of equation (5.7) is
\[
\begin{aligned}
G^h_{k+1} = G^h_k &+ \frac{1}{\alpha^h}\left(1 - e^{-\alpha^h \Delta t^h_{k+1}}\right)\xi^h e^{-\alpha^h(T^h - t^h_{k+1})}
\left(\lambda^h - \frac{1}{4}\xi^h e^{-\alpha^h(T^h - t^h_{k+1})}\left(1 + e^{-\alpha^h \Delta t^h_{k+1}}\right)\right) \\
&+ \xi^h e^{-\alpha^h(T^h - t^h_{k+1})}\sqrt{\frac{1 - e^{-2\alpha^h \Delta t^h_{k+1}}}{2\alpha^h}}\; z^h_{k+1},
\end{aligned} \tag{5.8}
\]
where $\Delta t^h_{k+1} = t^h_{k+1} - t^h_k$ for $k \in \mathbb{Z}^+_0$, and $\{z^h_{k+1}\}$ are sequences of independent and identically
distributed (IID) $N(0, 1)$ random variables.
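One way to see where the factors $\frac{1}{4}$ and $1 + e^{-\alpha\Delta t}$ in the discretised drift come from is to integrate the coefficients of (5.7) over one step, holding $\lambda$, $\xi$ and $\alpha$ constant on $[t_k, t_{k+1}]$. The following is a sketch under that assumption, with the superscript $h$ suppressed and $\Delta = t_{k+1} - t_k$:

\[
\begin{aligned}
\int_{t_k}^{t_{k+1}} \lambda\xi e^{-\alpha(T-s)}\,ds &= \frac{\lambda\xi}{\alpha}\, e^{-\alpha(T-t_{k+1})}\left(1 - e^{-\alpha\Delta}\right), \\
\int_{t_k}^{t_{k+1}} \frac{1}{2}\xi^2 e^{-2\alpha(T-s)}\,ds &= \frac{\xi^2}{4\alpha}\, e^{-2\alpha(T-t_{k+1})}\left(1 - e^{-\alpha\Delta}\right)\left(1 + e^{-\alpha\Delta}\right), \\
\int_{t_k}^{t_{k+1}} \xi^2 e^{-2\alpha(T-s)}\,ds &= \xi^2 e^{-2\alpha(T-t_{k+1})}\,\frac{1 - e^{-2\alpha\Delta}}{2\alpha}.
\end{aligned}
\]

The first two lines combine into the drift term after factoring out $\frac{1}{\alpha}(1 - e^{-\alpha\Delta})\xi e^{-\alpha(T-t_{k+1})}$, and the square root of the third line is the coefficient of the standard normal innovation.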

Remark 1: In the succeeding discussion, all vectors are denoted by bold lowercase let-
ters and all matrices are represented by bold capitalised letters.

To incorporate the impact of changes in market conditions and economic regimes on futures
prices, we embed an HOHMM following the formulation in Xiong and Mamon [37];
moreover, the modelling framework is extended to a multi-dimensional set-up. Let $\mathbf{y}_k$ be a
discrete-time homogeneous Markov chain with a finite-state space in $\mathbb{R}^N$ under a probability
space $(\Omega, \mathcal{F}, P)$. We associate the state space with the canonical basis of $\mathbb{R}^N$, which is
5.2. Model setup 151

$\{\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_N\}$, where $\mathbf{e}_i := (0, \ldots, 0, 1, 0, \ldots, 0)^\top \in \mathbb{R}^N$ with 1 in the $i$th position; also, $\top$
stands for the transpose of a vector and $N$ is the state-space dimension. To equip the model
with a regime-switching capability, the $g$-dimensional process in equation (5.8) will have
parameters that are governed by $\mathbf{y}_k$ so that

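\[
\begin{aligned}
G^h_{k+1} = G^h_k &+ \frac{1}{\alpha^h(\mathbf{y}_k)}\left(1 - e^{-\alpha^h(\mathbf{y}_k)\Delta t^h_{k+1}}\right)\xi^h(\mathbf{y}_k)\, e^{-\alpha^h(\mathbf{y}_k)(T^h - t^h_{k+1})} \\
&\quad \times \left(\lambda^h(\mathbf{y}_k) - \frac{1}{4}\xi^h(\mathbf{y}_k)\, e^{-\alpha^h(\mathbf{y}_k)(T^h - t^h_{k+1})}\left(1 + e^{-\alpha^h(\mathbf{y}_k)\Delta t^h_{k+1}}\right)\right) \\
&+ \xi^h(\mathbf{y}_k)\, e^{-\alpha^h(\mathbf{y}_k)(T^h - t^h_{k+1})}\sqrt{\frac{1 - e^{-2\alpha^h(\mathbf{y}_k)\Delta t^h_{k+1}}}{2\alpha^h(\mathbf{y}_k)}}\; z^h_{k+1}.
\end{aligned} \tag{5.9}
\]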

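Taking advantage of the algebraic simplification due to $\mathbf{y}_k$'s state-space representation, we
have $\alpha^h(\mathbf{y}_k) = \langle \boldsymbol{\alpha}^h, \mathbf{y}_k \rangle$, $\lambda^h(\mathbf{y}_k) = \langle \boldsymbol{\lambda}^h, \mathbf{y}_k \rangle$, and $\xi^h(\mathbf{y}_k) = \langle \boldsymbol{\xi}^h, \mathbf{y}_k \rangle$ in equation (5.9), where
$\langle \cdot, \cdot \rangle$ is the inner product in $\mathbb{R}^N$. The IID random variables $\{z^h_{k+1}\}$ are independent of the
higher-order Markov chain $\mathbf{y}_k$.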

To facilitate the conceptual discussion of the filtering processes, we focus on a second-order
Markov chain $\mathbf{y}^w_k$ to construct and implement the multi-dimensional HOHMM filtering
method. Concentrating on a second-order $\mathbf{y}^w_k$ as a prototype breaks
down the complexity in discerning the sequence and structure of algorithms in implementing
a generalised HOHMC $\mathbf{y}_k$ with a lag order $k$, $k \in \mathbb{N}^+$. A second-order $\mathbf{y}^w_k$ is a Markov
chain whose state at the current step $k$ depends on information available at the two prior steps $k - 1$ and
$k - 2$. As in Siu et al. [28], we define an $\mathbb{R}^{N \times N^2}$ transition probability matrix $\mathbf{P}$, given as
\[
\mathbf{P} = \begin{pmatrix}
p_{111} & p_{112} & \cdots & p_{11N} & \cdots & p_{1N1} & p_{1N2} & \cdots & p_{1NN} \\
p_{211} & p_{212} & \cdots & p_{21N} & \cdots & p_{2N1} & p_{2N2} & \cdots & p_{2NN} \\
\vdots & \vdots & & \vdots & & \vdots & \vdots & & \vdots \\
p_{N11} & p_{N12} & \cdots & p_{N1N} & \cdots & p_{NN1} & p_{NN2} & \cdots & p_{NNN}
\end{pmatrix},
\]

where $p_{dcb} := P(\mathbf{y}^w_{k+1} = \mathbf{e}_d \mid \mathbf{y}^w_k = \mathbf{e}_c, \mathbf{y}^w_{k-1} = \mathbf{e}_b)$ with $k \ge 1$ and $d, c, b \in \{1, 2, \ldots, N\}$. The
quantity $p_{dcb}$ is interpreted as the probability that the Markov chain will be in state $d$ given
that currently it is in state $c$ and was in state $b$ immediately prior to being in the present
state. The observation process in equation (5.9) can be expressed as
\[
G^h_{k+1} = G^h_k + \nu^h(\mathbf{y}^w_k) + \sigma^h(\mathbf{y}^w_k)\, z^h_{k+1}, \tag{5.10}
\]



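where
\[
\begin{aligned}
\nu^h(\mathbf{y}^w_k) &= \frac{1}{\alpha^h(\mathbf{y}^w_k)}\left(1 - e^{-\alpha^h(\mathbf{y}^w_k)\Delta t^h_{k+1}}\right)\xi^h(\mathbf{y}^w_k)\, e^{-\alpha^h(\mathbf{y}^w_k)(T^h - t^h_{k+1})} \\
&\quad \times \left(\lambda^h(\mathbf{y}^w_k) - \frac{1}{4}\xi^h(\mathbf{y}^w_k)\, e^{-\alpha^h(\mathbf{y}^w_k)(T^h - t^h_{k+1})}\left(1 + e^{-\alpha^h(\mathbf{y}^w_k)\Delta t^h_{k+1}}\right)\right)
= \langle \boldsymbol{\nu}^h(k), \mathbf{y}^w_k \rangle,
\end{aligned} \tag{5.11}
\]
where $\boldsymbol{\nu}^h(k) = \left(\nu^h_1(k), \nu^h_2(k), \ldots, \nu^h_N(k)\right)^\top \in \mathbb{R}^N$, and
\[
\sigma^h(\mathbf{y}^w_k) = \xi^h(\mathbf{y}^w_k)\, e^{-\alpha^h(\mathbf{y}^w_k)(T^h - t^h_{k+1})}\sqrt{\frac{1 - e^{-2\alpha^h(\mathbf{y}^w_k)\Delta t^h_{k+1}}}{2\alpha^h(\mathbf{y}^w_k)}}
= \langle \boldsymbol{\sigma}^h(k), \mathbf{y}^w_k \rangle, \tag{5.12}
\]
\[
\text{where } \boldsymbol{\sigma}^h(k) = \left(\sigma^h_1(k), \sigma^h_2(k), \ldots, \sigma^h_N(k)\right)^\top \in \mathbb{R}^N. \tag{5.13}
\]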
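The regime vectors $\boldsymbol{\nu}^h(k)$ and $\boldsymbol{\sigma}^h(k)$ in (5.11)-(5.13) can be computed component-wise for a single maturity, as in the following minimal sketch. The function names, the argument names, and the choice of one trading day as $1/252$ of a year are assumptions made for illustration; only the formulas are taken from the text.

```python
import numpy as np

def regime_coefficients(alpha, lam, xi, tau_next, dt):
    """Per-regime drift nu and volatility sigma of the log-futures increment.

    alpha, lam, xi : length-N sequences, one entry per regime.
    tau_next       : time to maturity T^h - t^h_{k+1} (same time unit as dt).
    dt             : step size Delta t^h_{k+1}.
    """
    alpha, lam, xi = (np.asarray(a, dtype=float) for a in (alpha, lam, xi))
    decay_T = np.exp(-alpha * tau_next)    # e^{-alpha (T - t_{k+1})}
    decay_dt = np.exp(-alpha * dt)         # e^{-alpha dt}
    nu = (1.0 - decay_dt) / alpha * xi * decay_T * (
        lam - 0.25 * xi * decay_T * (1.0 + decay_dt))                               # (5.11)
    sigma = xi * decay_T * np.sqrt((1.0 - np.exp(-2.0 * alpha * dt)) / (2.0 * alpha))  # (5.12)
    return nu, sigma

def step(G_k, y_k, nu, sigma, rng):
    """One draw of G_{k+1} from (5.10); y_k is a canonical unit vector."""
    return G_k + nu @ y_k + (sigma @ y_k) * rng.standard_normal()

# Illustrative two-regime inputs (hypothetical values for the second regime).
nu, sigma = regime_coefficients([0.9864, 0.9864], [4.2257, 2.0], [0.1855, 0.30],
                                tau_next=0.5, dt=1 / 252)
```

The inner products $\langle \boldsymbol{\nu}^h(k), \mathbf{y}^w_k \rangle$ and $\langle \boldsymbol{\sigma}^h(k), \mathbf{y}^w_k \rangle$ then reduce to picking the entry of the active regime, which is what `nu @ y_k` and `sigma @ y_k` do.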

The stochastic basis $(\Omega, \mathcal{F}, \{\mathcal{F}_k\}, P)$ serves as the modelling background and supports the
stochastic processes considered in our framework. The global filtration $\mathcal{F}_k$ is defined as
$\mathcal{F}_k := \mathcal{F}^w_k \vee \mathcal{F}^z_k$, where $\mathcal{F}^w_k$ and $\mathcal{F}^z_k$ are filtrations generated by $\mathbf{y}^w_k$ and $W_t$ (a $P$-Wiener
process), respectively.

It is important to note that $\mathbf{y}^w_k$ is latent rather than directly observable from the salmon
futures market. In particular, the state of the HOHMM is hidden in the noisy observation
process $G^h_{k+1}$, which is evolving under $P$. Following the ideas common in papers [32]-[37],
a transformation that converts $\mathbf{y}^w_k$ into a regular Markov chain is employed, after which the
usual HMM-filtering estimations apply. Consider the mapping $\eta(\mathbf{e}_b, \mathbf{e}_c) := \mathbf{e}_{bc}$, for $1 \le
b, c \le N$, where $\mathbf{e}_{bc}$ is an $\mathbb{R}^{N^2}$ unit vector with 1 in its $((b - 1)N + c)$th position. The
semi-martingale representation of the new Markov chain $\eta(\mathbf{y}^w_{k+1}, \mathbf{y}^w_k)$ is then given by
\[
\eta(\mathbf{y}^w_{k+1}, \mathbf{y}^w_k) = \mathbf{B}\,\eta(\mathbf{y}^w_k, \mathbf{y}^w_{k-1}) + \boldsymbol{\epsilon}^w_{k+1}, \tag{5.14}
\]
where $\{\boldsymbol{\epsilon}^w_{k+1}\}_{k \ge 1}$ is a martingale increment, and $\mathbf{B}$ is an $\mathbb{R}^{N^2 \times N^2}$ probability transition matrix
with entries $b_{ji} := p_{dcb}$ for $j = (d - 1)N + c$, $i = (c - 1)N + b$, and $b_{ji} = 0$ otherwise.
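The index bookkeeping behind the matrix $\mathbf{B}$ is easy to get wrong, so a small sketch may help. It assumes the second-order transition probabilities are stored in a hypothetical array `p[d, c, b]` with zero-based indices, so that the one-based rule $j = (d-1)N + c$, $i = (c-1)N + b$ becomes `j = d*N + c`, `i = c*N + b`.

```python
import numpy as np

def build_B(p):
    """Embed second-order probabilities p[d, c, b] into the N^2 x N^2 matrix B.

    p[d, c, b] is read as P(y_{k+1} = e_d | y_k = e_c, y_{k-1} = e_b), zero-based.
    """
    N = p.shape[0]
    B = np.zeros((N * N, N * N))
    for d in range(N):
        for c in range(N):
            for b in range(N):
                j = d * N + c   # position of eta(e_d, e_c)
                i = c * N + b   # position of eta(e_c, e_b)
                B[j, i] = p[d, c, b]
    return B

# Illustrative 2-state example with randomly generated probabilities.
rng = np.random.default_rng(0)
p = rng.random((2, 2, 2))
p /= p.sum(axis=0, keepdims=True)   # probabilities over the next state d sum to one
B = build_B(p)
```

Each column of the resulting matrix sums to one, because column $i$ collects the probabilities $p_{dcb}$ over the next state $d$ for a fixed pair $(c, b)$.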

5.3 Filtering and parameter estimation


We rely on Elliott et al. [6] in adopting an approach that introduces an equivalent probability
measure $\tilde{P}$ under which the observation process is a sequence of IID random variables.
The ensuing calculations are facilitated by the IID assumption. So, filters are computed under
$\tilde{P}$, and then they are related back to $P$, with justification via Girsanov's theorem.

Remark 2: The reference measure $\tilde{P}$ for filtering is not the same as the equivalent risk-
neutral measure $Q$ used in pricing. The purpose of having a $\tilde{P}$ is to circumvent the direct
calculations under $P$, which necessitates hard semi-martingale computations.

Under our modelling framework, the suitable Radon-Nikodým derivative of $P$ with respect
to $\tilde{P}$ is constructed as
\[
\Psi_k := \left.\frac{dP}{d\tilde{P}}\right|_{\mathcal{F}_k} = \prod_{h=1}^{g}\prod_{l=1}^{k} \varphi^h_l, \quad k \ge 1, \quad \Psi_0 = 1,
\]
\[
\begin{aligned}
\varphi^h_l &= \frac{\phi\left(\sigma^h(\mathbf{y}^w_{l-1})^{-1}\left(G^h_l - G^h_{l-1} - \nu^h(\mathbf{y}^w_{l-1})\right)\right)}{\sigma^h(\mathbf{y}^w_{l-1})\,\phi\left(G^h_l\right)} \\
&= \frac{1}{\sigma^h(\mathbf{y}^w_{l-1})}\exp\left(\frac{1}{2}\left[\left(G^h_l\right)^2 - \sigma^h(\mathbf{y}^w_{l-1})^{-2}\left(G^h_l - G^h_{l-1} - \nu^h(\mathbf{y}^w_{l-1})\right)^2\right]\right),
\end{aligned}
\]
where $\phi$ is the probability density function of an $N(0, 1)$ random variable, and $\mathcal{F}_k$ is the
filtration generated by the observation process $\mathbf{G}_k$. Every parameter driving the multivariate
observation process in equation (5.10) is modulated by the same $\mathbf{y}^w_k$. Even though the
correlation of salmon futures prices is not explicitly modelled, it is implicitly encapsulated
in the underlying higher-order Markov chain regulating $\mathbf{G}_k$'s multivariate dynamics with
possibly dependent variates.

The general principle of obtaining optimal estimates of pertinent quantities under $P$ for a
multi-dimensional HOHMM setting is first to construct filters, which are conditional expectations
under the reference measure $\tilde{P}$ involving functions of $\eta(\mathbf{y}^w_k, \mathbf{y}^w_{k-1})$. Then, the filters
under $\tilde{P}$ will be used to recover the model's parameter estimates under $P$.

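Write $\mathbf{q}_k = (q_k(11), \ldots, q_k(cb), \ldots, q_k(NN))^\top \in \mathbb{R}^{N \times N}$, where
\[
q_k(cb) := P\left(\mathbf{y}^w_k = \mathbf{e}_c, \mathbf{y}^w_{k-1} = \mathbf{e}_b \mid \mathcal{F}_k\right) = \mathbb{E}\left[\langle \eta(\mathbf{y}^w_k, \mathbf{y}^w_{k-1}), \mathbf{e}_{cb} \rangle \mid \mathcal{F}_k\right]. \tag{5.15}
\]
By Bayes' theorem for conditional expectations, a filter of $\eta(\mathbf{y}^w_k, \mathbf{y}^w_{k-1})$ under $P$ can be
computed as
\[
\mathbf{q}_k = \mathbb{E}\left[\eta(\mathbf{y}^w_k, \mathbf{y}^w_{k-1}) \mid \mathcal{F}_k\right] = \frac{\mathbb{E}^{\tilde{P}}\left[\Psi_k\, \eta(\mathbf{y}^w_k, \mathbf{y}^w_{k-1}) \mid \mathcal{F}_k\right]}{\mathbb{E}^{\tilde{P}}\left[\Psi_k \mid \mathcal{F}_k\right]}. \tag{5.16}
\]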

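Let $\mathbf{s}_k := \mathbb{E}^{\tilde{P}}\left[\Psi_k\, \eta(\mathbf{y}^w_k, \mathbf{y}^w_{k-1}) \mid \mathcal{F}_k\right]$. Since $\sum_{c,b=1}^{N} \langle \eta(\mathbf{y}^w_k, \mathbf{y}^w_{k-1}), \mathbf{e}_{cb} \rangle = \langle \eta(\mathbf{y}^w_k, \mathbf{y}^w_{k-1}), \mathbf{1} \rangle = 1$, where
$\mathbf{1} := (1, \ldots, 1)^\top \in \mathbb{R}^{N^2}$, we have
\[
\begin{aligned}
\sum_{c,b=1}^{N} \langle \mathbf{s}_k, \mathbf{e}_{cb} \rangle
&= \sum_{c,b=1}^{N} \left\langle \mathbb{E}^{\tilde{P}}\left[\Psi_k\, \eta(\mathbf{y}^w_k, \mathbf{y}^w_{k-1}) \mid \mathcal{F}_k\right], \mathbf{e}_{cb} \right\rangle \\
&= \mathbb{E}^{\tilde{P}}\left[\Psi_k \sum_{c,b=1}^{N} \langle \eta(\mathbf{y}^w_k, \mathbf{y}^w_{k-1}), \mathbf{e}_{cb} \rangle \,\Big|\, \mathcal{F}_k\right] \\
&= \mathbb{E}^{\tilde{P}}\left[\Psi_k \mid \mathcal{F}_k\right].
\end{aligned} \tag{5.17}
\]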

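Plugging in the result from equation (5.17) into equation (5.16), together with the construction
of $\mathbf{s}_k$, yields
\[
\mathbf{q}_k = \frac{\mathbf{s}_k}{\sum_{c,b=1}^{N} \langle \mathbf{s}_k, \mathbf{e}_{cb} \rangle} = \frac{\mathbf{s}_k}{\langle \mathbf{s}_k, \mathbf{1} \rangle}. \tag{5.18}
\]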

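As with Xi and Mamon [32], we also define the following relevant quantities:
\[
A^{tsr}_k = \sum_{l=2}^{k} \langle \mathbf{y}^w_{l-2}, \mathbf{e}_r \rangle \langle \mathbf{y}^w_{l-1}, \mathbf{e}_s \rangle \langle \mathbf{y}^w_l, \mathbf{e}_t \rangle, \tag{5.19}
\]
\[
B^{t}_k = \sum_{l=2}^{k} \langle \mathbf{y}^w_{l-1}, \mathbf{e}_t \rangle = B^{t}_{k-1} + \langle \mathbf{y}^w_{k-1}, \mathbf{e}_t \rangle, \tag{5.20}
\]
\[
B^{ts}_k = \sum_{l=2}^{k} \langle \mathbf{y}^w_{l-1}, \mathbf{e}_t \rangle \langle \mathbf{y}^w_{l-2}, \mathbf{e}_s \rangle, \tag{5.21}
\]
\[
C^{t}_k\left(f(G^h_k)\right) = \sum_{l=2}^{k} f(G^h_l) \langle \mathbf{y}^w_{l-1}, \mathbf{e}_t \rangle = C^{t}_{k-1}\left(f(G^h_{k-1})\right) + f(G^h_k) \langle \mathbf{y}^w_{k-1}, \mathbf{e}_t \rangle, \tag{5.22}
\]
where $r, s, t = 1, \ldots, N$, $2 \le l \le k$, $1 \le h \le g$, and $f$ has the form $f(G^h) = G^h$
or $f(G^h) = (G^h)^2$. Equations (5.19), (5.20) and (5.21) refer, respectively, to the Markov
chain's number of jumps from state $(\mathbf{e}_r, \mathbf{e}_s)$ to $\mathbf{e}_t$, amount of time spent in state $\mathbf{e}_t$, and
occupation time in state $(\mathbf{e}_t, \mathbf{e}_s)$ up to time $k$. The auxiliary process $C^t_k(f)$ in equation (5.22)
is the level sum for the states $\mathbf{e}_t$.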
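To make the definitions (5.19)-(5.22) concrete, the sketch below evaluates them for a regime path that is assumed to be known. In the model the chain is hidden, so only filtered versions of these quantities are available in practice; the function name, the zero-based state labels, and the toy path are illustrative assumptions.

```python
import numpy as np

def path_statistics(states, G, N, f=lambda g: g):
    """Jump counts, occupation times and level sums for a known regime path.

    states[l] in {0, ..., N-1} is the regime of y_l for l = 0, ..., k;
    G[l] is the observation G_l for one maturity.
    """
    A = np.zeros((N, N, N))    # A[t, s, r]: jumps from (r, s) to t, as in (5.19)
    B_t = np.zeros(N)          # occupation time of the state at l-1, as in (5.20)
    B_ts = np.zeros((N, N))    # occupation time of (state at l-1, state at l-2), (5.21)
    C_t = np.zeros(N)          # level sum of f(G_l) by the state at l-1, (5.22)
    for l in range(2, len(states)):
        prev2, prev1, cur = states[l - 2], states[l - 1], states[l]
        A[cur, prev1, prev2] += 1
        B_t[prev1] += 1
        B_ts[prev1, prev2] += 1
        C_t[prev1] += f(G[l])
    return A, B_t, B_ts, C_t

# Toy two-regime path of length 5 with made-up observations.
A, B_t, B_ts, C_t = path_statistics([0, 1, 1, 0, 1], [0.0, 1.0, 2.0, 3.0, 4.0], N=2)
```

Summing `A` over its first index recovers `B_ts`, which mirrors the fact that every visit to a pair of consecutive states is followed by exactly one jump.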

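We define two $N^2 \times N^2$ matrices $\mathbf{C}$ and $\mathbf{D}$ (similar to Xiong and Mamon [37]) for deriving the
filters of $\eta(\mathbf{y}^w_{k+1}, \mathbf{y}^w_k)$. Recursive filtering relations for
$\zeta^w_{k+1}\left(A^{tsr}_{k+1}\eta(\mathbf{y}^w_{k+1}, \mathbf{y}^w_k)\right)$, $\zeta^w_{k+1}\left(B^{t}_{k+1}\eta(\mathbf{y}^w_{k+1}, \mathbf{y}^w_k)\right)$,
$\zeta^w_{k+1}\left(B^{ts}_{k+1}\eta(\mathbf{y}^w_{k+1}, \mathbf{y}^w_k)\right)$, and $\zeta^w_{k+1}\left(C^{t}_{k+1}(f)\eta(\mathbf{y}^w_{k+1}, \mathbf{y}^w_k)\right)$ are obtained with the aid of equations
(5.14) and (5.18).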

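Proposition 5.3.1 Let $\mathbf{C}_{k+1}$ be a diagonal matrix
\[
\mathbf{C}_{k+1} = \begin{pmatrix}
c^1_{k+1} & & & & & & \\
 & \ddots & & & & & \\
 & & c^N_{k+1} & & & & \\
 & & & \ddots & & & \\
 & & & & c^1_{k+1} & & \\
 & & & & & \ddots & \\
 & & & & & & c^N_{k+1}
\end{pmatrix},
\]
with zeros off the diagonal, where
\[
c^i_{k+1} = \prod_{h=1}^{g} \frac{1}{\sigma^h_i}\exp\left(\frac{1}{2}\left[\left(G^h_{k+1}\right)^2 - \left(\sigma^h_i\right)^{-2}\left(G^h_{k+1} - G^h_k - \nu^h_i\right)^2\right]\right),
\]
for $k \ge 0$ and $1 \le b, c \le N$. Then
\[
\mathbf{s}_{k+1} = \mathbf{B}\mathbf{C}_{k+1}\mathbf{s}_k, \tag{5.23}
\]
\[
\zeta^w_{k+1}\left(A^{tsr}_{k+1}\eta(\mathbf{y}^w_{k+1}, \mathbf{y}^w_k)\right) = \mathbf{B}\mathbf{C}_{k+1}\,\zeta^w_k\left(A^{tsr}_k\eta(\mathbf{y}^w_k, \mathbf{y}^w_{k-1})\right) + \langle \mathbf{s}_k, \mathbf{e}_{sr} \rangle\, c^t_{k+1} \langle \mathbf{B}\mathbf{e}_{sr}, \mathbf{e}_{ts} \rangle\, \mathbf{e}_{ts}, \tag{5.24}
\]
\[
\zeta^w_{k+1}\left(B^{t}_{k+1}\eta(\mathbf{y}^w_{k+1}, \mathbf{y}^w_k)\right) = \mathbf{B}\mathbf{C}_{k+1}\,\zeta^w_k\left(B^{t}_k\eta(\mathbf{y}^w_k, \mathbf{y}^w_{k-1})\right) + \mathbf{D}_t\, c^t_{k+1}\, \mathbf{B}\mathbf{s}_k, \tag{5.25}
\]
\[
\zeta^w_{k+1}\left(B^{ts}_{k+1}\eta(\mathbf{y}^w_{k+1}, \mathbf{y}^w_k)\right) = \mathbf{B}\mathbf{C}_{k+1}\,\zeta^w_k\left(B^{ts}_k\eta(\mathbf{y}^w_k, \mathbf{y}^w_{k-1})\right) + \langle \mathbf{s}_k, \mathbf{e}_{ts} \rangle\, c^t_{k+1}\, \mathbf{B}\mathbf{e}_{ts}, \tag{5.26}
\]
\[
\zeta^w_{k+1}\left(C^{t}_{k+1}(f^h)\eta(\mathbf{y}^w_{k+1}, \mathbf{y}^w_k)\right) = \mathbf{B}\mathbf{C}_{k+1}\,\zeta^w_k\left(C^{t}_k(f^h)\eta(\mathbf{y}^w_k, \mathbf{y}^w_{k-1})\right) + f(G^h_{k+1})\, \mathbf{D}_t\, c^t_{k+1}\, \mathbf{B}\mathbf{s}_k, \tag{5.27}
\]
where $\mathbf{D}_t$, $1 \le t \le N$, is an $N^2 \times N^2$ matrix with $\mathbf{e}_{it}$ on its $((i - 1)N + t)$th column and 0
elsewhere.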

Proof The derivations are analogous to the proofs provided in Xi and Mamon [32], or
Xiong and Mamon [37].
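A minimal numerical sketch of the state recursion (5.23) followed by the normalisation (5.18) is given below. It is not the authors' implementation: the function names are hypothetical, the diagonal of $\mathbf{C}_{k+1}$ is passed in as a plain length-$N^2$ vector so that the caller decides how the $N$ regime weights are laid out, and the weights are computed in logarithms purely for numerical convenience.

```python
import numpy as np

def state_weights(G_next, G_prev, nu, sigma):
    """Regime weights c^i_{k+1}, i = 1..N, from Proposition 5.3.1.

    G_next, G_prev : length-g sequences of observations G_{k+1} and G_k.
    nu, sigma      : (g, N) arrays of regime drifts and volatilities.
    """
    G_next = np.asarray(G_next, dtype=float)[:, None]
    G_prev = np.asarray(G_prev, dtype=float)[:, None]
    resid = (G_next - G_prev - nu) / sigma
    # Sum over the g maturities of log(1/sigma) + 0.5 * (G_{k+1}^2 - resid^2).
    log_c = np.sum(-np.log(sigma) + 0.5 * (G_next ** 2 - resid ** 2), axis=0)
    return np.exp(log_c)

def filter_step(B, c_diag, s):
    """Unnormalised update s_{k+1} = B C_{k+1} s_k and the normalised filter q_{k+1}."""
    s_next = B @ (c_diag * s)          # c_diag holds the diagonal of C_{k+1}
    q_next = s_next / s_next.sum()     # division by <s_{k+1}, 1>
    return s_next, q_next

# Toy 2-state example; B has the sparsity pattern of the embedded chain.
B = np.array([[0.9, 0.2, 0.0, 0.0],
              [0.0, 0.0, 0.7, 0.4],
              [0.1, 0.8, 0.0, 0.0],
              [0.0, 0.0, 0.3, 0.6]])
c = state_weights([4.01], [4.00], np.array([[0.001, 0.004]]), np.array([[0.01, 0.02]]))
c_diag = np.tile(c, 2)                 # block layout c^1..c^N repeated, as displayed above
s_next, q_next = filter_step(B, c_diag, np.full(4, 0.25))
```

Because the normalised filter is unaffected by rescaling, feeding `q_next` rather than `s_next` into the next step is one simple way to avoid overflow when the weights are large over many steps.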

Remark 3: Expressions for our recursive filters (5.23)-(5.27) in matrix notation are more
compact and ‘neater’ than those presented in Erlwein and Mamon [10], engendering a
more efficient evaluation using software with intensive matrix-manipulation capability. Ad-
ditionally, these results are extensions of the filtering equations derived by Date et al. [5]
for commodity futures prices under a regular HMM framework.

To estimate the optimal parameters, we adopt the Expectation-Maximisation (EM) algorithm
for the multi-dimensional HOHMM framework. The estimates of $\nu^h(\mathbf{y}^w_k)$, $\sigma^h(\mathbf{y}^w_k)$,
and the transition probability matrix $\mathbf{P}$ are functions of the recursive filters in Proposition 5.3.1.

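Proposition 5.3.2 For a $g$-dimensional observation process $G^h_{k+1}$ in equation (5.10), $k \ge 0$,
$1 \le h \le g$, the EM estimates of its parameters are given by
\[
\hat{p}_{tsr} = \frac{\hat{A}^{tsr}_{k+1}}{\hat{B}^{sr}_{k+1}} = \frac{\zeta^w_{k+1}\left(A^{tsr}_{k+1}\eta(\mathbf{y}^w_{k+1}, \mathbf{y}^w_k)\right)}{\zeta^w_{k+1}\left(B^{sr}_{k+1}\eta(\mathbf{y}^w_{k+1}, \mathbf{y}^w_k)\right)}, \quad \forall \text{ pairs } (t, s),\ t \ne s, \tag{5.28}
\]
\[
\hat{\nu}^h_t = \frac{\hat{C}^{t}_{k+1}}{\hat{B}^{t}_{k+1}} = \frac{\zeta^w_{k+1}\left(C^{t}_{k+1}(f^h)\eta(\mathbf{y}^w_{k+1}, \mathbf{y}^w_k)\right)}{\zeta^w_{k+1}\left(B^{t}_{k+1}\eta(\mathbf{y}^w_{k+1}, \mathbf{y}^w_k)\right)}, \tag{5.29}
\]
\[
\hat{\sigma}^h_t = \left(\frac{\zeta^w_{k+1}\left(C^{t}_{k+1}\left((f^h)^2\right)\eta(\mathbf{y}^w_{k+1}, \mathbf{y}^w_k)\right)}{\zeta^w_{k+1}\left(B^{t}_{k+1}\eta(\mathbf{y}^w_{k+1}, \mathbf{y}^w_k)\right)}
- \frac{2 f^h_t\, \zeta^w_{k+1}\left(C^{t}_{k+1}(f^h)\eta(\mathbf{y}^w_{k+1}, \mathbf{y}^w_k)\right)}{\zeta^w_{k+1}\left(B^{t}_{k+1}\eta(\mathbf{y}^w_{k+1}, \mathbf{y}^w_k)\right)}
+ \frac{\left(f^h_t\right)^2 \zeta^w_{k+1}\left(B^{t}_{k+1}\eta(\mathbf{y}^w_{k+1}, \mathbf{y}^w_k)\right)}{\zeta^w_{k+1}\left(B^{t}_{k+1}\eta(\mathbf{y}^w_{k+1}, \mathbf{y}^w_k)\right)}\right)^{1/2}. \tag{5.30}
\]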

Proof See Appendix E.

Once the quantities in Proposition 5.3.1 are all determined, model parameter estimates are
immediate from Proposition 5.3.2.

5.4 Numerical application

5.4.1 Fish pool exchange and data description


Our numerical implementation utilises data from the Fish Pool ASA exchange market. The
Fish Pool contracts, mainly composed of futures and options, are settled financially rather
than settled by physical delivery of salmon. All contracts traded at the Fish Pool ASA are
settled on a monthly basis with the Fish Pool Index (FPI) as a reference price reflecting
the actual spot price of fresh Atlantic salmon. FPI is a weighted average (cf. [14]) of three
indices (weights given in parentheses): Nasdaq Salmon Index (85%), Fish Pool European
Buyers Index (10%), and Statistics Norway customs statistics (5%). To avoid issues concerning
deliverable grades, the FPI makes use of the weighted average of the most traded
weight categories: 3-4 kg, 4-5 kg, 5-6 kg with contributions of 30%, 40%, and 30%,
respectively, to the averaging procedure.

Spot prices are accessible at the Fish Pool on a weekly basis for immediate delivery. On
the other hand, forward prices are available on a daily basis, reflecting the expectations
of market participants for future trading. The interest rate is assumed deterministic rather
than stochastic, so forward prices in the Fish Pool are equivalent to futures prices in our
numerical application. To the best of our knowledge, research studies in the literature on
pricing salmon futures transform daily futures prices into weekly or monthly frequency.
This is evident from the futures price modelling formulations of Ankamah-Yeboah
et al. [23], Asche et al. [1], and Ewald et al. [13], which contain weekly spot prices.

Remark 4: Our approach differs significantly from the current methodology on salmon
futures price modelling. Instead of averaging futures prices to generate a proxy for the
weekly spot prices, as is common practice in various papers, we directly use daily futures
price observations for modelling and forecasting the dynamics of futures prices in the near-
or medium-term horizons.

We customise a filtering method tailored to our proposed multivariate model; this is then
implemented on data concerning futures contracts with maturities up to 6 months. These
are short-term contracts that are more frequently traded than others in the salmon market;
see Ankamah-Yeboah et al. [23] and Ewald et al. [13]. We consider a data set of daily
log-return series of futures prices collected by the Fish Pool; this data set comprises 1515
data points. The 12 maturity dates are denoted by $T^h$, where $h = 1, \ldots, 12$, and they
are set to be the last business day in the months of January 2016, February 2016, ...,
December 2016. When the filtering procedure is carried out on the multivariate data, a
moving window passes over time $t_h$ until the entire trajectory of futures price curves is
exhaustively processed. Futures, with maturities from 1 to 6 months, are traded on any
business date between the starting $t_h$ and ending $t_h$ as shown in Table 5.1.

h-dimension   Starting t_h   Ending t_h    Maturity T^h
1             03Aug2015      28Jan2016     29Jan2016
2             01Sept2015     26Feb2016     29Feb2016
3             01Oct2015      30Mar2016     31Mar2016
4             02Nov2015      28Apr2016     29Apr2016
5             01Dec2015      30May2016     31May2016
6             01Jan2016      29Jun2016     30Jun2016
7             01Feb2016      28Jul2016     29Jul2016
8             01Mar2016      30Aug2016     31Aug2016
9             01Apr2016      29Sept2016    30Sept2016
10            02May2016      28Oct2016     31Oct2016
11            01Jun2016      29Nov2016     30Nov2016
12            01Jul2016      29Dec2016     30Dec2016

Table 5.1: Illustrating the data periods of futures contracts (maturities of 1–6 months) covered
by the moving window in the filtering procedure

To further elucidate the relevance of the information in Table 5.1, consider futures contracts
with expiry date $T^1$ (29 Jan 2016) and refer to Table 5.2.

Futures contract   Time to maturity      Trading date
6-month            $T^1 - t^1_1$         $t^1_1$: 03Aug2015
                   $T^1 - t^1_2$         $t^1_2$: 04Aug2015
                   ...                   ...
                   $T^1 - t^1_{21}$      $t^1_{21}$: 31Aug2015
5-month            $T^1 - t^1_{22}$      $t^1_{22}$: 01Sept2015
                   ...                   ...
                   $T^1 - t^1_{43}$      $t^1_{43}$: 30Sept2015
...
1-month            $T^1 - t^1_{108}$     $t^1_{108}$: 04Jan2016
                   ...                   ...
                   $T^1 - t^1_{126}$     $t^1_{126}$: 28Jan2016

Table 5.2: Futures contracts with maturity up to 6 months and expiration on 29 Jan 2016

If a market participant enters into a futures contract on any business day between $t^1_1$ (03 Aug
2015) and $t^1_{21}$ (31 Aug 2015), then the participant holds a 6-month futures contract with maturity
date $T^1$; in this case, the contract could have between 151 and 180 days until expiration.
A similar reasoning can be made when owning 5-month, ..., 1-month contracts until
the expiration $T^1$ is reached. Our algorithm, with a moving window going through time
points $t_h$, is applied in the same way to futures contracts with maturity dates $T^2, T^3, \ldots, T^{12}$. This proposed
data-reorganisation scheme ensures that for times to maturity $T^h - t^h_k$, $1 \le k \le 126$ ($k$
expressed as number of trading days), we have 12 data points corresponding to each $t_k$.
More importantly, this scheme will create a multivariate time series without missing or
causing a double entry of any data point.

Remark 5: A novelty of this work arises from realistically treating the remaining time to
maturity T h − tkh to be varying. This gives the benefit of using all available raw information
without any additional data transformation in the process of modelling and parameter es-
timation.

Table 5.3 presents the descriptive statistics of our data set. Low volatility as well as skew-
ness and kurtosis of low magnitude are observed for the log-futures price data.

Maturity T^h   Mean     Std Dev   Min      Max      Skew      Kurtosis
29Jan2016      3.8814   0.0868    3.7796   4.0993    0.8528   -0.2463
29Feb2016      3.8987   0.0844    3.7773   4.0535    0.1215   -1.2253
31Mar2016      3.9511   0.1012    3.8022   4.1431    0.2419   -1.0862
29Apr2016      3.9572   0.0978    3.8308   4.1026    0.2493   -1.5814
31May2016      3.9883   0.1090    3.8351   4.2047    0.1825   -1.2799
30Jun2016      3.9951   0.1227    3.8199   4.2413    0.4316   -0.9328
29Jul2016      4.0411   0.1498    3.8133   4.3470    0.5050   -0.7739
31Aug2016      4.0108   0.0727    3.8712   4.1109   -0.4151   -1.3396
30Sept2016     3.9949   0.0535    3.8918   4.0673   -0.7087   -0.5837
31Oct2016      4.0309   0.0503    3.9551   4.1636    1.2693    0.8055
30Nov2016      4.1498   0.0584    4.0518   4.287     0.6630   -0.3971
30Dec2016      4.2321   0.0705    4.1431   4.3307    0.0789   -1.7328

Table 5.3: Descriptive statistics for log-futures prices (maturities of 1 – 6 months)



5.4.2 Implementation of filters and estimation


As pointed out in Xiong and Mamon [37], the autoregressive fractionally integrated moving
average (ARFIMA) method could be utilised to verify presence of memory (in the sense
of current data value’s dependence on previous data points). For our data set, fractional-
differencing parameter u is assessed against these benchmarks: u = 0 implies the time
series data exhibits short memory, and 0 < u < 0.5 implies there is finite long memory in
the time series data. We use the R function ‘fdSperio’, which is suitable for non-stationary
processes, in examining the memory property of our data set. We obtained $\hat{u} = 0.0633$,
indicating short memory. This rationalises our use of the 2nd-order HOHMM set-up, which
is sufficient to tackle short memory.

Coming up with the appropriate initial parameter values in the implementation of our self-
calibrating estimation could make use of the least-squares method suggested in Erlwein and
Mamon [10] or the likelihood maximisation mentioned in Date et al. [5]. In our case,
we adopt the latter method. Considering $G_t$ as a one-state process, the maximiser of the
associated log-likelihood function, given a series of observations, is given by
\[
\operatorname*{argmax}_{\nu, \sigma}\, \log L(G_{k+1}; \nu, \sigma) = \operatorname*{argmax}_{\nu, \sigma} \sum_{k=1}^{m}\left(\log\frac{1}{\sqrt{2\pi}\,\sigma} - \frac{(G_{k+1} - G_k - \nu)^2}{2\sigma^2}\right), \tag{5.31}
\]
where $m = 120$ in this numerical application. For a $g$-dimensional setting, the log-likelihood
to be maximised must simultaneously take into account all futures prices spanning maturities
$T^h$, $1 \le h \le g$, where $g = 12$. The aim is to find maximisers $\nu^h$ and $\sigma^h$ for equation
(5.10). Given that $\{z^h_{k+1}\}$ is a sequence of IID $N(0, 1)$, we must solve the optimisation problem
\[
\operatorname*{argmax}_{\nu, \sigma}\, \log L(G_{k+1}; \nu, \sigma) = \operatorname*{argmax}_{\nu, \sigma} \sum_{h=1}^{g}\sum_{k=1}^{m}\left(\log\frac{1}{\sqrt{2\pi}\,\sigma^h} - \frac{\left(G^h_{k+1} - G^h_k - \nu^h\right)^2}{2\left(\sigma^h\right)^2}\right). \tag{5.32}
\]
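For the one-state Gaussian model, the maximisers of the increment log-likelihood in $\nu^h$ and $\sigma^h$ are available in closed form (the sample mean and the maximum-likelihood standard deviation of the increments). The sketch below illustrates this with simulated data. It is an assumption-laden illustration and not the thesis procedure, which works with $\alpha$, $\lambda$ and $\xi$ and relies on numerical optimisers in R; the function names and the simulated inputs are hypothetical.

```python
import numpy as np

def gaussian_increment_mle(G):
    """Closed-form maximisers of the one-state increment log-likelihood.

    G : (m + 1, g) array of log-futures prices, one column per maturity.
    """
    dG = np.diff(np.asarray(G, dtype=float), axis=0)
    nu_hat = dG.mean(axis=0)
    sigma_hat = dG.std(axis=0)    # ddof=0, i.e. the maximum-likelihood estimate
    return nu_hat, sigma_hat

def log_likelihood(G, nu, sigma):
    """Gaussian log-likelihood of the increments, summed over days and maturities."""
    dG = np.diff(np.asarray(G, dtype=float), axis=0)
    return np.sum(-0.5 * np.log(2.0 * np.pi) - np.log(sigma)
                  - (dG - nu) ** 2 / (2.0 * sigma ** 2))

# Simulated stand-in for m = 120 increments across g = 12 maturities.
rng = np.random.default_rng(1)
G = 4.0 + np.cumsum(0.001 + 0.01 * rng.standard_normal((121, 12)), axis=0)
nu_hat, sigma_hat = gaussian_increment_mle(G)
```

Such closed-form values can serve as a cross-check on whatever numerical optimiser is used for the full parameterisation.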

Invoking Date et al. [5], it is reasonable to assume that the speed of mean reversion is fixed
and independent of the Markov chain when modelling the log-futures prices of a commodity.
In fact, we validated that the estimates of $\alpha$ in our model remain relatively stable using
prices of the short-term futures. A fixed $\alpha$ is justified and provides some simplification
in the computation. We emphasise that both $\lambda$ and $\xi$ are still governed by $\mathbf{y}^w$, and their
estimated values are required to update $\nu(\mathbf{y}^w)$ and $\sigma(\mathbf{y}^w)$ in equation (5.10). We address
problem (5.32) using both R functions ‘nlm’ and ‘optim’, generating initial parameter estimates
$\hat{\alpha} = 0.9864$, $\hat{\lambda} = 4.2257$, and $\hat{\xi} = 0.1855$; these estimates are used as benchmarks in
selecting starting values for the multi-regime HOHMMs.

Each variate in our multivariate HOHMM has the same number of regimes and every variate's
switching is driven by the same HOHMC's transition matrix $\mathbf{P}$. The time increment is
$\Delta t = 1$ trading day in the processing of our data, which is divided into 24 batches. Each
batch, for one algorithm pass, has 5 vectors (i.e., 5 $t^h_k$ for all $T^h$, $h = 1, 2, \ldots, 12$), and it gives a
window size covering an entire week of trading. One week of multivariate data is deemed
sufficient to accumulate new information that could impact the futures market, such as
substantial changes in supply and demand, climatic conditions (e.g., sea-surface temperature,
water currents, etc.), economic and political events, amongst other market factors. Thus,
our algorithm formulation updates quantities that are functions of HOHMM every trading
week through the online filters (5.23)-(5.27).
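The weekly batching described above amounts to slicing the multivariate series into consecutive blocks of five trading days. A minimal sketch follows; the generator name and the placeholder array are illustrative assumptions, with 120 rows chosen to match the 24 batches of 5 vectors mentioned in the text.

```python
import numpy as np

def weekly_batches(G, batch_size=5):
    """Yield consecutive blocks of `batch_size` rows of G.

    Rows of G are trading days; columns are the g maturities.
    """
    G = np.asarray(G)
    for start in range(0, G.shape[0] - batch_size + 1, batch_size):
        yield G[start:start + batch_size]

# Placeholder data: 120 trading days across 12 maturities.
G = np.zeros((120, 12))
batches = list(weekly_batches(G))
```

Each element of `batches` would be passed through one algorithm pass, with the parameter estimates from one pass used as starting values for the next.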

The filtered estimates are then fed into equations (5.28)-(5.30) to get new parameter esti-
mates. The estimates from the most recent algorithm pass serve as initial parameter values
for the next algorithm pass via the recursive filtering equations. We experimented on various
combinations of different initial parameter values and other filtering windows. It is
found that, for the majority of the time during the filtering process, convergence of parameter
estimates could be achieved without any iteration failures. When this is true, window sizes
and starting values only impact the convergence speed, but they do not produce substantially
different results.

Figures 5.1 and 5.2 depict the movement of the entries for the transition probability matrix
P under the 2-state and 3-state settings, respectively. The jumps in the evolution of proba-
bilities reflect possible market-state changes and price fluctuations. Under both 2-state and
3-state settings, noticeable changes occur from the 11th to 15th algorithm pass; and after
approximately 19 algorithm passes, stability of transition probability evolution is attained.
Evidence of regime switching is also detected in the evolution of other parameter estimates
for the process Gt in equation (5.10).

Figure 5.1: Evolution of transition probabilities under a 2-state HOHMM for futures prices
with expiry $T^h$. [Plot of the transition probabilities $p_{111}, p_{211}, \ldots, p_{222}$ against algorithm steps.]

When estimates of $\nu$ and $\sigma$ are generated and with the value of $\alpha$ computed from solving
the aforementioned optimisation in (5.32), the evolution of the estimates for $\lambda$ and $\xi$ can
be obtained through equations (5.11) and (5.12). Figures 5.3 and 5.4 show the evolution of
parameters, under both 2- and 3-state HOHMM set-ups, using prices from contracts with
expiration on the last maturity date $T^{12}$. The small magnitude of volatilities $\xi$ and $\sigma$ is
consistent with the descriptive statistics of the data portrayed in Table 5.3. Figures 5.1 and
5.2 illustrate that the behaviour of parameter estimates is relatively stable after a period of
dramatic switches. Also, the general downward pattern of parameter dynamics is a shared
characteristic of Figures 5.3 and 5.4.

5.4.3 Model performance and selection

For a comprehensive evaluation of model performance, we also implement our filtering al-
gorithm on 2- and 3-state HMM frameworks, which are two special cases of an HOHMM
with lag order 1. It is a common post-diagnostic check to compare the forecasting perfor-
mance of a proposed model with other benchmarked models, such as the random walk and
ARCH-type models; see Hardy [16]. However, these typical benchmarks are incompatible
with the HMM and HOHMM frameworks. This is because we directly model multivariate
futures prices and incorporate memory property of the market into the setting, whereas ran-
dom walk and ARCH models deal with univariate spot prices. We also rely on Mamon et
al. [22] whereby it was found that ARCH and GARCH models are unable to beat the reg-
ular HMMs’ performance with respect to capturing short- and medium-term predictability
of a data series. To find the ‘best-performing’ model that captures the main features of the
salmon futures prices, we evaluate the prediction of $G_t$ under the 1-, 2-, and 3-state HMMs
and HOHMMs. The one-step ahead forecasts in Date et al. [5] are extended; additionally,
the $d$-step ahead predictions for all proposed models are assessed.

Figure 5.2: Evolution of transition probabilities under a 3-state HOHMM for prices of
futures with expiry $T^h$. [Three panels of transition probabilities against algorithm steps.]

Figure 5.3: Evolution of parameter estimates $\xi$, $\lambda$, $\nu$, and $\sigma$ under a 2-state HOHMM for
prices of futures with expiry 30 Dec 2016. [Four panels, one per parameter, against algorithm steps.]

Figure 5.4: Evolution of parameter estimates $\xi$, $\lambda$, $\nu$, and $\sigma$ under a 3-state HOHMM for
prices of futures with expiry 30 Dec 2016. [Four panels, one per parameter, against algorithm steps.]

Proposition 5.4.1 Given information up to time $k$, the ‘best’ estimate of the $d$-step ahead
forecast of the $g$-dimensional observation process $G^h_{k+1} = \log F^h_{k+1}$ is
\[
\mathbb{E}\left[F^h_{k+d} \mid \mathcal{F}_k\right] = F^h_k \prod_{l=1}^{d} \sum_{j,i=1}^{N} \langle \mathbf{B}^{l-1}\mathbf{q}_k, \mathbf{e}_{ji} \rangle \exp\left(\nu^h_j + \frac{\left(\sigma^h_j\right)^2}{2}\right). \tag{5.33}
\]

Proof See Appendix F.
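The forecast formula (5.33) can be evaluated by propagating the filtered distribution with $\mathbf{B}$ and, at each step, weighting the expected growth factor $\exp(\nu^h_j + (\sigma^h_j)^2/2)$ of each regime. The sketch below handles one maturity and assumes the indexing of Section 5.2, in which position $(j-1)N + i$ of the $N^2$-vector corresponds to $\mathbf{e}_{ji}$, so the growth factor of regime $j$ is repeated across $i$. The function name and the inputs are hypothetical.

```python
import numpy as np

def d_step_forecast(F_k, B, q_k, nu, sigma, d):
    """Evaluate the d-step ahead forecast of the futures price for one maturity."""
    nu = np.asarray(nu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    N = nu.size
    # Entry (j-1)N + i (one-based) carries the growth factor of regime j.
    growth = np.repeat(np.exp(nu + 0.5 * sigma ** 2), N)
    dist = np.asarray(q_k, dtype=float)
    forecast = float(F_k)
    for _ in range(d):
        forecast *= dist @ growth    # sum over j, i of <B^{l-1} q_k, e_ji> exp(...)
        dist = B @ dist              # advance to B^l q_k
    return forecast

# Illustrative 2-state inputs (made-up numbers).
B = np.array([[0.9, 0.2, 0.0, 0.0],
              [0.0, 0.0, 0.7, 0.4],
              [0.1, 0.8, 0.0, 0.0],
              [0.0, 0.0, 0.3, 0.6]])
q_k = np.array([0.4, 0.1, 0.2, 0.3])
f5 = d_step_forecast(55.0, B, q_k, nu=[0.001, 0.004], sigma=[0.01, 0.02], d=5)
```

With a single regime the product collapses to $F^h_k \exp\{d(\nu^h + (\sigma^h)^2/2)\}$, which is a convenient check.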

Figure 5.5 depicts the one-step ahead prediction, taking d = 1 in equation (5.33), for salmon
futures prices under the 3-state HOHMM setting with maturity dates of 28 Oct 2016, 29
Nov 2016 and 29 Dec 2016. We can observe that the one-step ahead forecasts are quite
close to the actual market data. Although not shown here, similar patterns can be found
for the one-step ahead forecasts under the HMM setting as well. The very-short-term
forecasts from our proposed models follow closely the actual prices at the Fish Pool. The
trends and dynamics of salmon futures prices, from visual inspection, are captured well by
our filtering algorithms and estimation procedure.

A formal way of quantifying forecasts' quality is through an error analysis, i.e., examining
the goodness of fit for all the proposed HMMs and HOHMMs. Following the criteria in
Erlwein et al. [7] and Date et al. [5, 4], we evaluate the root-mean-squared error (RMSE),
absolute-mean error (AME), relative-absolute error (RAE), and mean-absolute-percent error
(MAPE) of the proposed and competing multi-dimensional models. We extend further
the assessment of the error metrics to the case of $d$-step ahead forecasts.
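The four error metrics can be computed as in the sketch below. The formulas used are commonly seen textbook definitions and are an assumption here: the exact definitions and scaling adopted in the cited papers may differ (for example, in how RAE is normalised or whether MAPE is expressed in percent). The function name and the toy inputs are also illustrative.

```python
import numpy as np

def forecast_errors(actual, predicted):
    """RMSE, AME, RAE and MAPE under commonly used (assumed) definitions."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    err = actual - predicted
    return {
        "RMSE": float(np.sqrt(np.mean(err ** 2))),
        "AME": float(np.mean(np.abs(err))),
        # Absolute error relative to that of the naive sample-mean predictor.
        "RAE": float(np.sum(np.abs(err)) / np.sum(np.abs(actual - actual.mean()))),
        "MAPE": float(100.0 * np.mean(np.abs(err / actual))),
    }

# Toy example with made-up numbers.
metrics = forecast_errors([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0])
```

For $d$-step ahead forecasts the same function applies once `predicted` holds the forecasts made $d$ steps earlier for each observation in `actual`.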

The results of error analyses for $d = 1, 2, \ldots, 5$, i.e., spanning the entire next week's trading
days, are presented in Table 5.4. For all $d$, the 3-state model outperforms other HOHMM-
state settings. Except for the 1-step ahead forecasting case, where the 2-state HMM yields
better fit than those obtained from the 1- and 3-state HMMs, the HOHMM with 3 states is
adjudged better than any HMM set-up in terms of goodness of fit. Compared to the HMM
settings, the HOHMM settings produce smaller errors in one-step ahead predictions, whilst
no pronounced difference is detected as the forecasting horizons become longer. The 4-
state HMM and HOHMM settings, which were examined as well, effect only negligible
improvements. Therefore, there is no practical motivation to pursue an HOHMM with a
number of states greater than 3. In fact, it is also prohibitive to consider a large number of
states since the number of transition probability entries becomes unwieldy even with just
one regime addition; see Figures 5.1 and 5.2. Notwithstanding the small fitting errors seen
in general for the 1-, 2- and 3-state HMM and HOHMM settings, HMM and HOHMM
with a regime-switching feature clearly outperform the special case, 1-state model, across
all forecasting metrics. This indicates the indisputable benefit of incorporating regime-
switching and memory-capturing capabilities into models. Overall, the 3-state HOHMM is
the best modelling framework in accurately describing the dynamics of our salmon futures-
price data set.

d-step ahead   HMM setting     RMSE     MAPE     AME      RAE
1              1-state         0.5104   0.5195   0.2962   0.0498
               2-state         0.5036   0.4926   0.2859   0.0481
               3-state         0.5055   0.5051   0.2876   0.0484
2              1-state         0.8062   0.9130   0.5181   0.0878
               2-state         0.7852   0.8718   0.4941   0.0840
               3-state         0.7842   0.8654   0.4909   0.0834
3              1-state         1.0685   1.2607   0.7104   0.1213
               2-state         1.0388   1.2053   0.6791   0.1165
               3-state         1.0387   1.2021   0.6776   0.1162
4              1-state         1.3048   1.5741   0.8804   0.1514
               2-state         1.2655   1.5013   0.8397   0.1453
               3-state         1.2650   1.4903   0.8338   0.1443
5              1-state         1.5298   1.8622   1.0334   0.1788
               2-state         1.4782   1.7672   0.9812   0.1713
               3-state         1.4779   1.7537   0.9738   0.1701

d-step ahead   HOHMM setting   RMSE     MAPE     AME      RAE
1              1-state         0.5104   0.5195   0.2962   0.0498
               2-state         0.5029   0.4964   0.2824   0.0476
               3-state         0.5021   0.4926   0.2804   0.0472
2              1-state         0.8062   0.9130   0.5181   0.0878
               2-state         0.7851   0.8131   0.4614   0.0787
               3-state         0.7847   0.8114   0.4605   0.0785
3              1-state         1.0685   1.2607   0.7104   0.1213
               2-state         1.0389   1.1290   0.6372   0.1100
               3-state         1.0386   1.1276   0.6364   0.1099
4              1-state         1.3048   1.5741   0.8804   0.1514
               2-state         1.2653   1.4192   0.7959   0.1392
               3-state         1.2651   1.4188   0.7956   0.1391
5              1-state         1.5298   1.8622   1.0334   0.1788
               2-state         1.4778   1.7024   0.9480   0.1678
               3-state         1.4777   1.7024   0.9480   0.1678

Table 5.4: Error analysis of HMM- and HOHMM-based models under 1-, 2-, and 3-state
settings for salmon futures prices

Setting   t-test comparison     p-value
HMM       1-state vs 2-state    $4.3026 \times 10^{-7}$
HMM       1-state vs 3-state    $4.3225 \times 10^{-4}$
HMM       2-state vs 3-state    0.8564
HOHMM     1-state vs 2-state    $2.5657 \times 10^{-5}$
HOHMM     1-state vs 3-state    $1.6071 \times 10^{-8}$
HOHMM     2-state vs 3-state    1.0000

Table 5.5: Bonferroni-corrected p-values for the paired t-test performed on the RMSEs
involving salmon futures prices

The statistical significance of the error-mean differences in each pairwise setting of our
proposed models is evaluated by a t-test based on a 95% confidence level. The adjusted
p-values are computed with Bonferroni's approach to control the family-wise error rate.
Table 5.5 displays the estimated Bonferroni-corrected p-values for the three pairs of model
settings.

For the 2-state HMM versus 3-state HMM and the 2-state HOHMM versus 3-state HOHMM
comparisons, the p-values are larger than 0.05. Therefore, the null hypothesis of no significant
difference cannot be rejected. This tells us that the 2-state and 3-state HMM settings (and
likewise the 2-state and 3-state HOHMM settings) have similar capability in capturing the
dynamics of the observation process, although Table 5.4 shows that the 3-state setting is
slightly better than the 2-state setting. For the 1-state versus 2-state and 1-state versus
3-state comparisons under the HMM (and likewise under the HOHMM), the p-values
are appreciably smaller than 0.01. So, the RMSE differences between the HMM-based
switching and no-switching models (and between the HOHMM-based switching and
no-switching models) are statistically significant.
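
The Bonferroni adjustment described above can be sketched as follows. This is a minimal illustration, assuming the usual form of the correction (each raw p-value is multiplied by the number of comparisons in the family and capped at 1); the function name and the raw p-values are hypothetical and are not taken from the thesis's computations.

```python
def bonferroni_adjust(p_values):
    """Multiply each raw p-value by the number of comparisons and cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical raw p-values for three pairwise comparisons (illustrative only).
raw = [1.0e-7, 2.0e-4, 0.40]
adjusted = bonferroni_adjust(raw)

# Flag the comparisons whose adjusted p-value falls below the 5% level.
significant = [p < 0.05 for p in adjusted]
print(adjusted, significant)
```

The cap at 1 is one way an adjusted p-value of exactly 1.0000 can arise for a pair of settings that are practically indistinguishable.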
5.4. Numerical application 169

Model     1-state   2-state   3-state   N-state
HMM       2         6         12        N^2 + N
HOHMM     2         8         24        N^3 − N^2 + 2N

Table 5.6: Number of estimated parameters under HMM and HOHMM settings for salmon
futures prices
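
As a quick check of the counts listed in Table 5.6 for N = 1, 2, 3, the sketch below assumes that each setting has N drift levels and N volatility levels plus the free transition probabilities, i.e., N(N − 1) entries for a first-order chain and N^2 (N − 1) entries for a second-order chain. This decomposition and the function names are assumptions made for illustration; the HOHMM count simplifies to N^3 − N^2 + 2N.

```python
def hmm_param_count(n_states):
    # Assumed split: N drifts + N volatilities + N*(N-1) free transition probabilities.
    return 2 * n_states + n_states * (n_states - 1)


def hohmm_param_count(n_states):
    # Second-order chain: N*N pairs of previous states, each with N-1 free probabilities.
    return 2 * n_states + n_states ** 2 * (n_states - 1)


for n in (1, 2, 3):
    print(n, hmm_param_count(n), hohmm_param_count(n))
```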

The results of our error analyses will be reinforced by information-criterion assessments in
identifying the ‘best-performing’ model. Information-criterion assessment techniques aim to
provide a balance between a model’s goodness of fit and its complexity (i.e., increased number
of parameters). To put it simply, the information-criterion metrics provide a comprehensive
trade-off between bias and variance of our HMM and HOHMM settings. Here, we make
use of the Akaike information criterion (AIC), the small-sample-size corrected version of
AIC (AICc), and the Bayesian information criterion (BIC).

The AIC penalises an increase in the number of parameters. The AICc is the AIC with a
correction for a relatively small sample size, obtained by increasing the penalty for model
complexity. The BIC complements the AIC and AICc by selecting the ‘best’ model from a
set of candidate models with a different penalty for the addition of parameters. The AIC,
AICc and BIC are computed as

\[
\mathrm{AIC} = -2 \log L(\Theta) + 2l, \tag{5.34}
\]
\[
\mathrm{AICc} = -2 \log L(\Theta) + 2l + \frac{2l(l+1)}{m - l - 1}, \tag{5.35}
\]
\[
\mathrm{BIC} = -2 \log L(\Theta) + 2l \log m, \tag{5.36}
\]

where l is the number of model parameters to be estimated as summarised in Table 5.6. In
(5.35) and (5.36), m is the number of data points, and log L(Θ) is the log-likelihood function
associated with the model given by

\[
\log L(\Theta) = \sum_{h=1}^{g} \sum_{k=1}^{B} \sum_{t=1}^{N} \langle y^{w}_{k}, e_t \rangle
\left[ \log\!\left( \frac{1}{\sqrt{2\pi}\,\sigma_h(y^{w}_{k})} \right)
- \frac{\left( G_{k+1} - G_k - \nu(y^{w}_{k}) \right)^2}{2\,\sigma_h^2(y^{w}_{k})} \right] \tag{5.37}
\]

with B being the number of observations in each algorithm pass. The model that gives the
smallest information-criterion value obtained using equations (5.34)–(5.36) is preferred.
We compute the values of AIC, AICc and BIC for the 1-, 2- and 3-state HMMs and
HOHMMs after each algorithm step. The evolution of the calculated values of these
information-criterion metrics is plotted in Figure 5.6. We observe that the 1-state
model generates the largest values and the most volatile patterns throughout the entire data
period. Even though the 3-state HOHMM produces higher values than those from other
regime-switching models in the earlier part of the algorithm passes, it generally has stable
patterns and smaller AIC/AICc/BIC values thereafter. These findings reinforce the results of
the above-mentioned error analyses, selecting the 3-state HOHMM as the best model for
the data set studied in this chapter.
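
The three criteria can be evaluated as in the sketch below, which follows equations (5.34)–(5.36) exactly as written above; note that (5.36) uses the penalty 2l log m, whereas the conventional BIC penalty is l log m. The function name and the input values are hypothetical and serve only as an illustration.

```python
import math


def information_criteria(log_lik, n_params, n_obs):
    """Return (AIC, AICc, BIC) following equations (5.34)-(5.36) as written."""
    aic = -2.0 * log_lik + 2.0 * n_params
    # Small-sample correction; requires n_obs > n_params + 1.
    aicc = aic + 2.0 * n_params * (n_params + 1) / (n_obs - n_params - 1)
    # Penalty 2*l*log(m) as in (5.36); the conventional BIC uses l*log(m).
    bic = -2.0 * log_lik + 2.0 * n_params * math.log(n_obs)
    return aic, aicc, bic


# Hypothetical log-likelihood, parameter count and sample size (not thesis results).
aic, aicc, bic = information_criteria(log_lik=-100.0, n_params=2, n_obs=50)
print(aic, aicc, bic)
```

Under all three criteria, the candidate with the smallest value is preferred.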

Probing the models’ forecasting performance also entails running the model on a data
set covering the out-of-sample period, i.e., the data period not used in model estimation.
The empirical evidence from in-sample performance is commonly less reliable than
that generated from analysing forecasts in the out-of-sample period, which better reflects
the information available in ‘real time’; see White [30], and Stock and Watson [29]. As
the d-step ahead prediction is forward-looking and does not overlap with the period
encompassing the data used for the filtering and estimation of parameters, our approach is
consistent with out-of-sample forecasting.

This section on numerical demonstration culminates with an appraisal of all model settings
on their one-step ahead prediction ability using price data of futures with maturities after
30 Dec 2016. Specifically, this out-of-sample forecasting exercise focuses on futures with
maturities up to 6 months that expire on the last business day of Jan 2017, Feb 2017, . . .,
Jun 2017. Figure 5.7 exhibits the comparison between the actual and predicted futures
prices under the 3-state HOHMM. It shows that the movement of the one-step ahead
predictions resembles the actual data. Such a result is also in agreement with what Figure 5.5
reveals. The results of the out-of-sample error analysis under the 1-, 2- and 3-state HMMs
and HOHMMs are reported in Table 5.7. The 2- and 3-state HMMs and HOHMMs
outperform the 1-state model with respect to all forecasting metrics, whilst
the 3-state HOHMM has the smallest forecast errors. Summing up the pertinent
evidence from error analyses, information-criterion evaluations, and out-of-sample
predictions, there is merit in putting forward the regime-switching settings
for the modelling and forecasting of Fish Pool’s futures prices. The 3-state HOHMM turns
out to be the best-fitting model for the chosen sample data within the intent of our empirical
analysis.
5.5. Conclusion 171

              HMM setting                     HOHMM setting
        1-state   2-state   3-state     1-state   2-state   3-state
RMSE    0.4517    0.4369    0.4434      0.4517    0.4370    0.4367
MAPE    0.4745    0.4250    0.4256      0.4745    0.4250    0.4239
AME     0.3231    0.2889    0.2890      0.3231    0.2889    0.2881
RAE     0.0742    0.0665    0.0666      0.0742    0.0665    0.0663

Table 5.7: Out-of-sample error analysis of HMM- and HOHMM-based models under 1-,
2-, and 3-state settings for futures prices

5.5 Conclusion
In this work, we put forward a multi-dimensional HOHMM-based approach in examining
the evolution of salmon futures prices. It is assumed that the parameters of the arbitrage-
free log-futures prices model are driven by a second-order hidden Markov chain in discrete
time. The usual approach to solve for the MLEs is to utilise the classic Kalman filtering un-
der a univariate setting, which is common in the price discovery research of the aquaculture
literature. But in light of the correlated multivariate nature of the data for salmon futures
prices, we developed suitable and congruous self-calibrating filtering algorithms, given in
matrix representations, for optimal parameter estimation and short-term price prediction.

To illustrate the implementability and prediction performance of our approach, we provided
a numerical demonstration utilising fresh-farmed salmon futures prices publicly available
at the Fish Pool. The HOHMM settings were tested on an extensive data set of daily
futures contracts that are actively traded in the market and mature within 6 months. The
innovation of our implementation, in conjunction with our self-tuning estimation
framework, emanates from the design of a data-processing scheme that uses every available
piece of raw price information despite the complexity brought by the varying maturities of
futures contracts trading in the Fish Pool market. Parameter estimates were generated by
filtering recursions under the paradigm of shifting market regimes. We conducted post-model
diagnostics for our proposed HOHMM setting, and also benchmarked it against the regular
HMM setting. A comparative analysis of model-validation metrics was performed,
concentrating on error analysis, charting the evolution of dynamic information criteria, and
d-day ahead forecasts for d = 1, . . . , 5. It is shown that the inclusion of regime-switching and
memory-description features in the model brings about advantages in attaining better
goodness of fit and forecast accuracy for salmon futures prices. The 3-state HOHMM is the
most adequate setting for describing the essential attributes of the data set gathered for the
empirical analysis in this chapter.

This work contributes to the widening of the collection of available quantitative techniques,
enabling further the use of financial technologies in the management of price risk and
volatilities in the fisheries sector and aquaculture industry. Our results could be extended
in a number of directions, such as applying the proposed framework to develop dynamic
hedging strategies and explore optimisation of portfolios with direct exposure to prices of
fish and seafood.

[Three panels, one per expiry: actual prices and predicted prices; vertical axis: futures prices (NOK); horizontal axis: times, from May 2016 to Nov 2016.]

Figure 5.5: One-step ahead predictions for futures prices with expiries on 28 Oct 2016, 29
Nov 2016, and 29 Dec 2016.

[Three panels: AIC, AICc and BIC; horizontal axis: algorithm steps; series: 1-state HMM/HOHMM, 2-state HMM, 3-state HMM, 2-state HOHMM, 3-state HOHMM.]

Figure 5.6: AIC, AICc and BIC for the 1-, 2-, 3-state HMM/HOHMM-based model for
salmon futures prices.

[Three panels, one per expiry: actual prices and predicted prices; vertical axis: futures prices (NOK); horizontal axis: times, from Aug 2016 to Mar 2017.]

Figure 5.7: One-step ahead and out-of-sample predictions under the 3-state HOHMM for
futures prices with expiries on 31 Jan 2017, 28 Feb 2017, and 31 Mar 2017.
References

[1] F. Asche, B. Misund, A. Oglend, The spot-forward relationship in the Atlantic salmon
market, Aquaculture Economics and Management, 20(2)(2016), 222–234.

[2] I. Ankamah-Yeboah, M. Nielsen, R. Nielsen, Price formation of the salmon aquaculture
futures market, Aquaculture Economics and Management, (2017) 1–24. DOI:
10.1080/13657305.2016.1189014

[3] O. Cacho, Systems modelling and bioeconomic modelling in aquaculture, Aquaculture
Economics and Management, 1(1-2)(1997), 45–64.

[4] P. Date, L. Jalen, R. Mamon, A partially linearised sigma point filter for latent state
estimation in nonlinear time series models, Journal of Computational and Applied
Mathematics, 233(2010) 2675–2682.

[5] P. Date, R. Mamon, A. Tenyakov, Filtering and forecasting commodity futures prices
under an HMM framework, Energy Economics, 40(2013) 1001–1013.

[6] R. Elliott, L. Aggoun, J. Moore, Hidden Markov Models: Estimation and Control,
Springer, New York (1995).

[7] C. Erlwein, F. Benth, R. Mamon, HMM filtering and parameter estimation of an elec-
tricity spot price model, Energy Economics, 32 (5) (2010) 1034–1043.

[8] R. J. Elliott, T. Kuen Siu, A. Badescu, Bond valuation under a discrete-time regime-
switching term-structure model and its continuous-time extension, Managerial Fi-
nance, 37(11)(2011) 1025–1047.

[9] C. Erlwein, R. Mamon, M. Davison, An examination of HMM-based investment
strategies for asset allocation, Applied Stochastic Models in Business and Industry,
27(3)(2011), 204–221.


[10] C. Erlwein, R. Mamon, An online estimation scheme for a Hull-White model with
HMM-driven parameters, Statistical Methods and Applications, 18(1)(2009) 87–107.

[11] C. Ewald, Derivatives on nonstorable renewable resources: Fish futures and options,
not so fishy after all, Natural Resource Modeling, 26(2) (2013), 215–236.

[12] C. Ewald, R. Ouyang, T. K. Siu, On the market-consistent valuation of fish farms:
Using the real option approach and salmon futures, American Journal of Agricultural
Economics, 99(1)(2017), 207–224.

[13] C. Ewald, R. Ouyang, T. K. Siu, The market for salmon futures: An empirical anal-
ysis of the Fish Pool using the Schwartz multi-factor model. Quantitative Finance,
16(12)(2016), 1823–1842.

[14] Fish Pool Index. http://fishpool.eu/price-information/spot-prices/fish-pool-index/
(accessed Oct 2017).

[15] A. Guttormsen, Faustman in the sea: Optimal rotation in aquaculture. Marine Re-
source Economics, 23(4)(2008), 401–410.

[16] M. Hardy, A regime-switching model of long-term stock returns, North American
Actuarial Journal, 5(2)(2001), 41–53.

[17] M. Manoliu, S. Tompaidis, Energy futures prices: Term structure models with
Kalman filter estimation, Applied Mathematical Finance, 9(1)(2002) 21–43.

[18] Market competition between farmed and wild fish: A literature survey, Food and
Agriculture Organization of the United Nations. http://www.fao.org/3/a-i5700e.pdf
(accessed Oct 2017).

[19] J. Martínez-Garmendia, J. Anderson, Hedging performance of shrimp futures
contracts with multiple deliverable grades, Journal of Futures Markets, 19(8)(1999),
957–990.

[20] R. Mamon, R. Elliott, Hidden Markov Models in Finance, International Series in
Operations Research and Management Science, 104, Springer, New York, 2007.

[21] R. Mamon, R. Elliott, Hidden Markov Models in Finance: Further Developments and
Applications, International Series in Operations Research and Management Science,
104, Springer, New York, 2014.

[22] R. Mamon, C. Erlwein, R. Gopaluni, Adaptive signal processing of asset price dy-
namics with predictability analysis. Information Sciences, 178(2008), 203–219.

[23] Press release, Fish Pool turned over 90000 tons of salmon last year, 2017,
http://ilaks.no/fish-pool-snudde-over-90-000-tonn-laks-ifjor/ (accessed Oct 2017).

[24] K. Quagrainie, C. Engle, A latent class model for analysing preferences for catfish,
Aquaculture Economics and Management, 10(1)(2006), 1–14.

[25] S. Ross, Hedging long run commitments: Exercises in incomplete market pricing,
Economic Notes: Economic Review of Banca Monte dei Paschi di Siena, 26(2)(1997)
385–420.

[26] E. Schwartz, The stochastic behavior of commodity prices: Implications for
valuation and hedging, Journal of Finance, 52(3)(1997) 923–973.

[27] P. Solibakke, Scientific stochastic volatility models for the salmon forward market:
Forecasting (un-)conditional moments. Aquaculture Economics and Management,
16(3)(2012), 222–249.

[28] T. Siu, W. Ching, E. Fung, M. Ng, X. Li, A high-order Markov-switching model
for risk measurement, Computers and Mathematics with Applications, 58(1)(2009)
1–10.

[29] J. Stock, M. Watson, Introduction to Econometrics (3rd ed.). Boston:
Addison-Wesley (2011).

[30] H. White, A reality check for data snooping. Econometrica, 68(5)(2000), 1097–1126.

[31] R. Weron, Modeling and Forecasting Electricity Loads and Prices: A Statistical Ap-
proach. Hoboken, NJ; Chichester, England: John Wiley and Sons. (2006).

[32] X. Xi, R. Mamon, Parameter estimation of an asset price model driven by a weak
hidden Markov chain, Economic Modelling 28(1) (2011) 36–46.

[33] X. Xi, R. Mamon, Yield curve modelling using a multivariate higher-order HMM. In
Zeng, Y. and Wu, S. (eds), State-Space Models and Applications in Economics and
Finance. New York, Springer (2013) 185–203.

[34] X. Xi, R. Mamon, M. Davison, A higher-order hidden Markov chain-modulated
model for asset allocation, Journal of Mathematical Modelling and Algorithms in
Operations Research 13(1) (2014) 59–85.

[35] X. Xi, R. Mamon, Parameter estimation in a WHMM setting with independent and
volatility components. In Mamon, R. and Elliott, R. (eds), Hidden Markov Models
in Finance: Volume II (Further Developments and Applications). New York, Springer
(2014) 227–240.

[36] X. Xi, R. Mamon, Capturing the regime-switching and memory properties of interest
rates, Computational Economics, 44(3) (2014), 307–337.

[37] H. Xiong, R. Mamon, A self-updating model driven by a higher-order hidden Markov
chain for temperature dynamics, Journal of Computational Science, 17 (2016), 47–61.

[38] H. Xiong, R. Mamon, Putting a price tag on temperature, Computational Management
Science, (2017), https://doi.org/10.1007/s10287-017-0291-8
Chapter 6

Conclusion

6.1 Summary of research contributions


In this thesis, we developed further extensions of the regime-switching models governed
by HOHMCs. The corresponding filtering algorithms to support the new models’ dynamic
parameter estimation were fully constructed. The usual Markov assumption in
regime-switching approaches was relaxed in our HOHMM setting by allowing dependence
on data beyond the first-order time lag. In turn, such a relaxation of the Markov assumption
provides the capacity to exploit information from time-series data recorded in the past.
Various empirical implementations were put forward, and they were specifically designed to
capture some stylised behaviours of commodity prices and indices in support of derivative
valuation and risk management. Modelling set-ups were especially tailored for the dynamics
of variables that primarily affect the values of weather, electricity, and aquaculture-based
contracts.

The higher-order hidden Markov process modulates the parameters in discrete time to
capture the random shifts amongst different economic regimes resulting from the interaction
of various factors. We derived, via some ‘idealised’ reference probability measure, the
recursive filters for the state of the Markov chain and auxiliary quantities of the
observation process. EM parameter estimates are expressed in terms of the adaptive filters,
which yields a self-calibrating modelling methodology. The performance of the four proposed
models was assessed and benchmarked against present standard models in the literature
and current practice, comparing various statistical metrics that quantify the model’s
goodness of fit (i.e., minimised forecasting errors) and the balance between the model’s
maximised likelihood and inherent complexity.


This research work offers further developments and valuable insights in both the
theoretical and practical aspects of regime-switching models covering recent financial
innovations in the commodity markets. As a recapitulation, the accomplishments of this
research began with the modelling in Chapter 2 of the DATs under a discrete-time HMM; an
online parameter estimation scheme and the evaluation of a temperature-based option with
parameter sensitivity analysis were also presented. The formulation of a temperature model
with a discrete-time HOHMC governing the model parameters was introduced in Chapter 3.
Recursive filtering equations for various quantities as a function of the HOHMM were derived
with the aid of the probability-measure-change technique, and we underscored a numerical
implementation on a 4-year data set of Toronto’s DATs. In Chapter 4, we created a modelling
set-up combining OU and jump processes, both of which are modulated by an HOHMC, to
depict electricity spot-price movements. The model was applied to a deseasonalised
series of AESO-collected data. Chapter 5 advanced the use of the multi-dimensional
HOHMM in the modelling and very short-term prediction of futures prices in the
aquaculture industry. An empirical illustration was carried out on salmon futures prices in
the Fish Pool market for the purpose of model validation.

Overall, the significant research contributions in this thesis are the: (i) generalisation
of the regular HMM-OU framework to a modelling set-up that possesses the means
to extract information from the data observed beyond a unit time lag, hence incorporating
the flexibility to capture the data’s memory property; (ii) development of dynamic
model-calibration procedures, through HOHMM filtering, that address the need for financial
technologies for automation in the context of today’s artificial-intelligence-driven world; (iii)
valuation and risk measurement of temperature-linked derivatives under a regime-switching
approach; (iv) enrichment of the HOHMM-OU model with a compound Poisson process
to accurately chart electricity spot-price dynamics; and (v) design of multi-dimensional
HOHMM filters and predictors in the analysis of salmon futures prices, whereby the
dynamic estimation picks up and processes all available raw data points once only, and no
observed values are discarded or transformed into proxy values that would introduce further
approximation errors and uncertainty into the model results.

6.2 Further research directions


We recognise that, despite the extensions achieved in our proposed models, shortcomings
in our approaches remain that will open avenues for research investigations. We laid down
some of the groundwork to push further the theoretical and practical considerations
involved in HOHMM filtering techniques to bolster their potential benefit as new kinds of
data series unfold in the areas of finance, engineering, and social sciences. Several
offshoots and ramifications of this research work are itemised below.

• As indicated in this thesis, we utilised only an HOHMC of lag order 2, as a
paradigm that simplifies the elaboration of details in the development of the HOHMM
setting. This helps in the rough quantification of the computational complexity when
reducing the lag order from, say, k to k − 1, for k = 2, 3, . . .. Ideally, however, it is
desirable to come up with statistical inference techniques that will give the HOHMM’s
optimal lag order $\hat{k}$ suitable for a given data set.

• The construction of our HOHMM filtering algorithms is aligned with the discrete-
time nature of the observed time series. As an alternative, we have yet to see the
development of HOHMMs and their filters in continuous time. For example, recent
regime-switching models with continuous-time HMM filters were formulated in
[1]. Although this type of filter requires discretisation and integral approximations,
there are instances in which filter derivations are more straightforward because they are
standard continuous-time stochastic calculus computations. Thus, there is merit in
considering continuous-time filtering for HOHMMs, which is still an inchoate area in
stochastic modelling.

• Our HOHMM-based modelling set-ups have a Gaussian-noise term. Similar to
HMM-based set-ups as espoused in [2], filtering recursions under non-Gaussian noise
terms could be considered for an even greater flexibility of capturing a variety of
shapes of statistical distributions in the context of big data and new kinds of data
given recent financial-market innovations and regulatory developments.
Implementation of new filters under non-normal correlated multivariate models is also
anticipated to entail more advanced computing platforms to attain efficiency of self-tuning
parameter estimation.

• A natural direction in modelling the DATs under the HOHMM framework is the
pricing and dynamic hedging of other weather-linked products. For instance, the
modelling of weather futures prices can be pursued by multivariate HOHMM-based
filtering algorithms. The filter-based EM parameter estimates require a connection
with a suitable risk-neutral measure via the construction of a quantity that encapsulates
the risk premium (i.e., price and switching risks). Optimisation of portfolios
containing products linked to weather indices is another closely related problem that could
benefit from our HOHMM approach.

• With regard to the modelling of electricity spot prices, further applications to the pricing
of more complicated electricity derivatives other than just forward contracts could
be explored. Applications that consider both weather and electricity derivatives in
tandem will be of particular interest to practitioners. For instance, a dynamic
risk-management strategy involving electricity and temperature-based futures could be
supported by the HOHMM filtering algorithms in finding optimal hedge ratios.

• Climate change in recent times significantly disturbs salmon population dynamics
and leads to harvest uncertainty. For instance, sea surface temperature (SST), with
new peculiarities, is a key factor that affects the fish futures market; see L. Little et
al. [3]. A regime-switching SST model that includes spatial and temporal variables
could be proposed, and the HOHMM-filtering recursions for parameter estimation
could be derived. Furthermore, co-integration between SST and fish futures prices
could be examined to offer risk managers possible toolkits in dealing with the
problem of estimating the optimal fish-farm harvesting time, and in engineering
appropriate hedging strategies.
References

[1] S. Grimm, An interest-rate model with regime-switching mean-reversion level, PhD
thesis, (2016), https://www.genealogy.math.ndsu.nodak.edu/id.php?id=211815.

[2] L. Jalen, R. Mamon, Parameter estimation in a WHMM setting with independent and
volatility components. In Mamon, R. and Elliott, R. (eds), Hidden Markov Models
in Finance: Volume II (Further Developments and Applications). New York, Springer
(2014) 241–261.

[3] L. Little, A. Hobday, J. Parslow, C. Davies, R. Grafton, Funding climate adaptation
strategies with climate derivatives, Climate Risk Management, 8(C)(2015), 9–15.

Appendix A

Derivations of the model’s optimal parameter estimates in chapter 2

A.1 Optimal estimate for δ


Define a new measure Pbυ via

dPbυ

Yk
δ
= Ψ = ϕδl .
dPυ∗ Xk k

l=1

where
  2 
exp − 2 2 (yl−1 ) Xl − δ (yl−1 ) Xl−1 − η (yl−1 )
1 b
ϕδl =  .
exp − 2 2 (y1 l−1 ) (Xl − δ (yl−1 ) Xl−1 − η (yl−1 ))2
Therefore, the log likelihood for Ψ̃δk is
k
δ2 (yl−1 ) Xl−1
2
δ (yl−1 ) Xl−1 + 2η (yl−1 ) b
δ (yl−1 ) Xl−1
!
X − 2Xlb
log Ψδk
b
= − 2 (y )
+ R (δ (yl−1 ))
l=1
2 l−1
k
δ2i Xl−1 δi Xl−1 + 2ηib δi Xl−1
 n 2 !
X X b − 2Xlb
=  hyl−1 , ei i − + R(δi ) .
 
2
l=1 i=1
2i

Since $R(\delta_i)$ does not contain $\hat{\delta}_i$, this remainder has no effect on the result of the derivation.
From equations (2.19) and (2.20), and considering $\hat{U}_l = E(U_l \mid X_k)$, we have
N "  #
X 1 b2 i  2  b i
L(δi ) =
b E − 2 δi Tk Xk−1 − 2δi Tk (Xk−1 , Xk ) + 2ηi δi Tk (Xk−1 ) Xk + R (δi )
b i

i=1
2i
N
X 1 b2 i  2  b i 
= − δ T X − 2δi T (X k−1 , X k ) + 2ηi
bδi T i
(X k−1 ) + R (δi ) .
i=1
2i2 i k k−1 k k


We differentiate $L(\hat{\delta}_i)$ with respect to $\hat{\delta}_i$ and set the result to zero, giving

\[
\hat{\delta}_i = \frac{\hat{T}^i_k(X_{k-1}, X_k) - \eta_i \hat{T}^i_k(X_{k-1})}{\hat{T}^i_k(X^2_{k-1})}.
\]

A.2 Optimal estimate for η


dPbυ
∗ k
η
ϕηl ,
Y
υ∗
Define a new measure P through b
= Ψ =
dPυ∗ Xk k

 l=1
δ η
1 2
η
exp − 2 2 (yl−1 )
X l − (y l−1 ) X l−1 − b (yl−1 )
where ϕl =   leading to the log likelihood
exp − 2 2 (y1 l−1 ) (Xl − δ (yl−1 ) Xl−1 − η (yl−1 ))2

k
η2 (yl−1 ) − 2Xlb
η (yl−1 ) + 2b
η (yl−1 ) δ (yl−1 ) Xl−1
!
log Ψηk
X
= + R (η (yl−1 )) .
b

l=1
2 2 (yl−1 )

Invoking equations (2.19) and (2.20) and then taking the expectation of the log likelihood
involving $X_k$, we obtain $L(\hat{\eta}) = E\left[ \log \Psi^{\eta}_k \mid X_k \right]$ with

k
 N  
X hyl−1 , ei i  2 !
log Ψηk
h i X  
E | Xk = E  − 2
ηi − 2Xlb
b ηi + 2b
ηi δi Xl−1 + R (ηi ) Xk  .
l=1 i=1
2i

Differentiating $L(\hat{\eta})$ and setting the result to 0, we get

\[
\hat{\eta}_i = \frac{\hat{T}^i_k(X_k) - \delta_i \hat{T}^i_k(X_{k-1})}{\hat{O}^i_k}.
\]

A.3 Optimal estimate for  2


Construct a new measure Pbυ by setting

dPbυ

Yk
2
ϕl ,
2

υ ∗ = Ψk =
dP Xk l=1

 
(y l−1 )exp − 1
 2 (yl−1 )
(X l − δ (y l−1 ) Xl−1 − η (y l−1 ))2

where ϕl =
2b
 . The log likelihood of Ψk
2 2

(yl−1 )exp − 2 2 (y1 l−1 ) (Xl − δ (yl−1 ) Xl−1 − η (yl−1 ))2
b
is calculated as
k ! ! !
X 1 1
log Ψk
2
= log + − 2 (Xl − δ (yl−1 ) Xl−1 − η (yl−1 )) + R ((yl−1 )) ,
2

l=1
(yl−1 )
b  (yl−1 )
2b
 
where R  2 does not have b
. From equations (2.19)–(2.20), we have the expectation of the
 2 ), which is given by
log likelihood as a function of Xk , denoted by L(b

k
 N ! ! !
X  X 1 1
log Ψk
2
h i
| Xk = E hyl−1 , ei i log + − 2 (Xl − δ (yl−1 ) Xl−1 − η (yl−1 )) 
2

E
l=1 i=1
(yl−1 )
b  (yl−1 )
2b
+ R (i ) .

 2 ) with respect to b
Differentiating L(b  2 and equating the result to 0 yield
   
Tbki Xk2 + δ2i Tbki Xk2 + η2i B
bi + 2η2 δi Tbi (Xk−1 ) − 2δi Tbi (Xk−1 , Xk ) − 2ηi Tbi (Xk )
k i k k k
i2 =
b .
O
bt
k

A.4 Optimal estimate for π ji


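Define the Radon-Nikodym derivative of $\hat{P}^{\upsilon}$ with respect to $P^{\upsilon^*}$ as

\[
\left. \frac{d\hat{P}^{\upsilon}}{dP^{\upsilon^*}} \right|_{X_k} = \Psi^{\pi}_k = \prod_{l=1}^{k} \varphi^{\pi}_l,
\qquad \text{where} \qquad
\varphi^{\pi}_l = \prod_{j,i=1}^{n} \left( \frac{\hat{\pi}_{ji}}{\pi_{ji}} \right)^{\langle y_{l-1}, e_i \rangle \langle y_l, e_j \rangle}.
\]

Taking the expectation of the log likelihood in conjunction with equation (2.18), we have

\[
\begin{aligned}
E\left[ \log \Psi^{\pi}_k \mid X_k \right]
&= E\left[ \sum_{l=1}^{k} \sum_{j,i=1}^{n} \log \left( \frac{\hat{\pi}_{ji}}{\pi_{ji}} \right)^{\langle y_{l-1}, e_i \rangle \langle y_l, e_j \rangle} \,\middle|\, X_k \right] \\
&= E\left[ \sum_{l=1}^{k} \sum_{j,i=1}^{n} \left( \log \hat{\pi}_{ji} - \log \pi_{ji} \right) \langle y_{l-1}, e_i \rangle \langle y_l, e_j \rangle \,\middle|\, X_k \right] \\
&= \sum_{j,i=1}^{n} \log \hat{\pi}_{ji} \, \hat{J}^{ji}_k + R(\pi_{ji}),
\end{aligned}
\]

where $R(\pi_{ji})$ is free of $\hat{\pi}_{ji}$. Noting that $\sum_{j=1}^{n} \hat{\pi}_{ji} = 1$ and introducing a Lagrange multiplier $\varsigma$,
we maximise

\[
L(\hat{\pi}_{ji}, \varsigma) = \sum_{j,i=1}^{n} \log \hat{\pi}_{ji} \, \hat{J}^{ji}_k + \varsigma \left( \sum_{j=1}^{n} \hat{\pi}_{ji} - 1 \right) + R(\pi_{ji}).
\]

Differentiating $L(\hat{\pi}_{ji}, \varsigma)$ with respect to $\hat{\pi}_{ji}$ and $\varsigma$, and then setting each partial derivative
to 0, results in

\[
\frac{1}{\hat{\pi}_{ji}} \hat{J}^{ji}_k + \varsigma = 0.
\]

Since $\sum_{j=1}^{n} \hat{\pi}_{ji} = 1$ and $\sum_{j=1}^{n} J^{ji}_k = O^i_k$, we have

\[
\sum_{j=1}^{n} \hat{\pi}_{ji} = \frac{\sum_{j=1}^{n} \hat{J}^{ji}_k}{-\varsigma} = \frac{\hat{O}^i_k}{-\varsigma} = 1,
\]

which can be re-expressed as $\sum_{j=1}^{n} \hat{\pi}_{ji} = \sum_{j=1}^{n} \hat{J}^{ji}_k / \hat{O}^i_k$. Therefore, the optimal estimate for $\pi_{ji}$ is given by

\[
\hat{\pi}_{ji} = \frac{\hat{J}^{ji}_k}{\hat{O}^i_k}.
\]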
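
The ratio of jump counts to occupation times can be illustrated with the sketch below. It assumes, purely for illustration, that the state path is fully observed, so that the counts are exact; in the HMM setting of this appendix the corresponding quantities are filtered estimates given the observations. The function name and the sample path are hypothetical.

```python
def transition_estimates(path, n_states):
    """Estimate pi[j][i], the probability of moving from state i to state j,
    as (number of jumps i -> j) / (time spent in state i)."""
    jumps = [[0] * n_states for _ in range(n_states)]  # jumps[j][i] counts i -> j
    occupation = [0] * n_states                        # occupation[i] counts visits to i
    for prev, curr in zip(path[:-1], path[1:]):
        jumps[curr][prev] += 1
        occupation[prev] += 1
    return [
        [jumps[j][i] / occupation[i] if occupation[i] else 0.0 for i in range(n_states)]
        for j in range(n_states)
    ]


# Hypothetical two-state path (illustrative only).
path = [0, 0, 1, 0, 1, 1, 1, 0, 0, 1]
pi_hat = transition_estimates(path, 2)
print(pi_hat)
```

Each column of the estimated matrix sums to 1 whenever the corresponding state has been visited, which mirrors the constraint enforced by the Lagrange multiplier.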
Appendix B

Proofs of Propositions in chapter 2

B.1 Proof of Proposition 2.6.1

Proof The HDD futures price is calculated as the expected value of HDD over the contract
period given the current value at time t under the risk-neutral measure Q. Based on the
Fubini-Tonelli theorem, expectation and integration can be interchanged. We first
carry out the proof for the case 0 ≤ t ≤ τ1 < τ2 as follows:

 
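\[
\begin{aligned}
F_H(t, \tau_1, \tau_2) &= E^{Q}\left[ H \mid \mathcal{F}_t \right] \\
&= E^{Q}\left[ \int_{\tau_1}^{\tau_2} \max\left( T_{base} - T_v, 0 \right) dv \,\middle|\, \mathcal{F}_t \right] \\
&= \int_{\tau_1}^{\tau_2} E^{Q}\left[ \max\left( T_{base} - T_v, 0 \right) \mid \mathcal{F}_t \right] dv.
\end{aligned}
\]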

To solve for the futures price, it is necessary to compute the value of $E^{Q}\left[ \max\left( T_{base} - T, 0 \right) \mid \mathcal{F}_t \right]$.
We suppose $\alpha(y_t)$ is deterministic so that $\int_s^t e^{\alpha u}\, dB_u$ in equation (2.7) is a Wiener process (cf.
Benth and Šaltytė-Benth [2]). The integral $\int_s^t e^{\alpha(y_u) u}\, dB^{Q}_u$ in equation (2.42), when evaluated
using the optimal parameter estimates under the HMM settings, is then a Wiener process
that incorporates deterministic regime-switching with filtration extended to time t. Under
the normality assumption, $T_{base} - T_v$ is normally distributed with
$E^{Q}\left[ T_{base} - T_v \mid \mathcal{F}_t \right] = M(t, v, X_t)$ and $\mathrm{Var}\left[ T_{base} - T_v \mid \mathcal{F}_t \right] = A^2(t, v)$.
Write $M := M(t, v, X_t)$ and $A^2 := A^2(t, v)$. Let $T^* = \dfrac{T_{base} - T - M}{A}$, which has a standard
normal distribution. Then

\[
\begin{aligned}
E^{Q}\left[ \max\left( T_{base} - T, 0 \right) \mid \mathcal{F}_t \right]
&= \int_{-\infty}^{T_{base}} \left( T_{base} - T \right) \frac{1}{\sqrt{2\pi}\,A}\, e^{-\frac{\left( T - (T_{base} - M) \right)^2}{2A^2}}\, dT \\
&= \int_{-M/A}^{\infty} \left( A T^* + M \right) \frac{1}{\sqrt{2\pi}}\, e^{-\frac{(T^*)^2}{2}}\, dT^* \\
&= M \int_{-M/A}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{(T^*)^2}{2}}\, dT^*
 + A \int_{-M/A}^{\infty} T^* \frac{1}{\sqrt{2\pi}}\, e^{-\frac{(T^*)^2}{2}}\, dT^* \\
&= M \left( 1 - \int_{-\infty}^{-M/A} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{(T^*)^2}{2}}\, dT^* \right)
 + A \left[ -\frac{1}{\sqrt{2\pi}}\, e^{-\frac{(T^*)^2}{2}} \right]_{-M/A}^{\infty} \\
&= M \left( 1 - \Phi\left( -\frac{M}{A} \right) \right) + A \frac{1}{\sqrt{2\pi}}\, e^{-\frac{M^2}{2A^2}} \\
&= M\, \Phi\left( \frac{M}{A} \right) + A\, \phi\left( \frac{M}{A} \right).
\end{aligned}
\]

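Therefore, in terms of $M(t, v, X_t)$, $A^2(t, v)$, and $D(x) = x\Phi(x) + \phi(x)$, the HDD futures price for the period $0 \le t \le \tau_1 < \tau_2$ is given by
$$
\begin{aligned}
F_H(t, \tau_1, \tau_2)
&= \int_{\tau_1}^{\tau_2} \left[ M(t, v, X_t)\, \Phi\left(\frac{M(t, v, X_t)}{A(t, v)}\right) + A(t, v)\, \phi\left(\frac{M(t, v, X_t)}{A(t, v)}\right) \right] dv \\
&= \int_{\tau_1}^{\tau_2} A(t, v) \left[ \frac{M(t, v, X_t)}{A(t, v)}\, \Phi\left(\frac{M(t, v, X_t)}{A(t, v)}\right) + \phi\left(\frac{M(t, v, X_t)}{A(t, v)}\right) \right] dv \\
&= \int_{\tau_1}^{\tau_2} A(t, v)\, D\left(\frac{M(t, v, X_t)}{A(t, v)}\right) dv.
\end{aligned}
$$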
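The identity $E[\max(Y, 0)] = A\, D(M/A)$ for $Y \sim N(M, A^2)$ and the resulting futures-price integral can be checked numerically. The sketch below uses only the Python standard library. The function names, the midpoint quadrature, and the constant test inputs are assumptions made for illustration; the thesis obtains $M(t, v, X_t)$ and $A(t, v)$ from the fitted HMM, which is not reproduced here.

```python
import math


def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def Phi(x):
    """Standard normal distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def D(x):
    """D(x) = x * Phi(x) + phi(x), as defined in the proof."""
    return x * Phi(x) + phi(x)


def expected_positive_part(M, A):
    """Closed form of E[max(Y, 0)] for Y ~ N(M, A^2)."""
    return A * D(M / A)


def expected_positive_part_numeric(M, A, n=100_000, width=10.0):
    """Midpoint-rule evaluation of the same expectation, used as a check."""
    lo = M - width * A
    h = 2.0 * width * A / n
    total = 0.0
    for i in range(n):
        y = lo + (i + 0.5) * h
        total += max(y, 0.0) * phi((y - M) / A) / A * h
    return total


def hdd_futures(M_fn, A_fn, tau1, tau2, n=1000):
    """Midpoint rule for the integral of A(v) * D(M(v) / A(v)) over [tau1, tau2]."""
    h = (tau2 - tau1) / n
    total = 0.0
    for i in range(n):
        v = tau1 + (i + 0.5) * h
        total += expected_positive_part(M_fn(v), A_fn(v)) * h
    return total
```

With hypothetical constant inputs $M = 0$ and $A = 1$, each integrand value equals $\phi(0)$, so the integral over an interval of length 2 is $2\phi(0)$.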

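For the scenario $0 \le \tau_1 \le t < \tau_2$, based on the result from the first case and the fact that if $Z$ is $\mathcal{F}_t$-measurable then $E^Q(Z \mid \mathcal{F}_t) = Z$, we have
$$
\begin{aligned}
F_H(t, \tau_1, \tau_2)
&= E^Q\left[ \int_{\tau_1}^{\tau_2} \max\left(T_{base} - T_v, 0\right) dv \,\Big|\, \mathcal{F}_t \right] \\
&= E^Q\left[ \int_{\tau_1}^{t} \max\left(T_{base} - T_v, 0\right) dv + \int_{t}^{\tau_2} \max\left(T_{base} - T_v, 0\right) dv \,\Big|\, \mathcal{F}_t \right] \\
&= \int_{\tau_1}^{t} \max\left(T_{base} - T_u, 0\right) du + E^Q\left[ \int_{t}^{\tau_2} \max\left(T_{base} - T_v, 0\right) dv \,\Big|\, \mathcal{F}_t \right] \\
&= \int_{\tau_1}^{t} \max\left(T_{base} - T_u, 0\right) du + \int_{t}^{\tau_2} A(t, v)\, D\left(\frac{M(t, v, X_t)}{A(t, v)}\right) dv.
\end{aligned}
$$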

B.2 Proof of Proposition 2.6.2


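Proof Since $F_H$ is independent of $\mathcal{F}_t$, its conditional expectation given $\mathcal{F}_t$ is equal to $E^Q[F_H]$. Thus,
$$
\begin{aligned}
C_{F_H}(t, K, \tau_T)
&= e^{-r(T - t)}\, E^Q\left[ \max\left( \int_{\tau_1}^{\tau_2} A(t, v)\, D\left(\frac{M(t, v, X_t)}{A(t, v)}\right) dv - K,\ 0 \right) \Big|\, \mathcal{F}_t \right] \\
&= e^{-r(T - t)}\, E^Q\left[ \max\left(F_H - K, 0\right) \mid \mathcal{F}_t \right] \\
&= e^{-r(T - t)}\, E^Q\left[ \max\left(F_H - K, 0\right) \right] \\
&= e^{-r(T - t)} \int_{K}^{F_m} \left(F_H - K\right) g\left(F_H\right) dF_H.
\end{aligned}
$$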
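The last line of the proof is a one-dimensional integral that can be evaluated by quadrature once a density $g$ for $F_H$ is available. The sketch below is illustrative only: the function `call_price`, the midpoint rule, and the uniform density used in the usage example are assumptions introduced here and are not the density used in the thesis.

```python
import math


def call_price(K, F_max, density, r, time_to_maturity, n=10_000):
    """Discounted midpoint-rule value of the integral of (F - K) * g(F) over [K, F_max]."""
    h = (F_max - K) / n
    integral = 0.0
    for i in range(n):
        F = K + (i + 0.5) * h
        integral += (F - K) * density(F) * h
    return math.exp(-r * time_to_maturity) * integral


# Hypothetical example: F_H uniform on [0, 100], so g(F) = 1/100.
price = call_price(K=60.0, F_max=100.0, density=lambda F: 0.01, r=0.05, time_to_maturity=0.5)
```

For the uniform density the integral has the closed form $(F_m - K)^2 / (2 F_m)$, which gives a simple way to validate the quadrature before substituting a model-implied density.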

B.3 Proof of Proposition 2.6.3


Proof The result can be derived straightforwardly from the HDD formula by employing arguments similar to those in the proof of Proposition 2.6.2.
Appendix C

Derivations of the model’s optimal parameter estimates in chapter 3

C.1 Optimal estimate for κ


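Write $\kappa := (\kappa_1, \kappa_2, \cdots, \kappa_N)^\top \in \mathbb{R}^N$. To update the estimates $\widehat{\kappa} = (\widehat{\kappa}_1, \widehat{\kappa}_2, \cdots, \widehat{\kappa}_N)^\top \in \mathbb{R}^N$, define a new measure $\widehat{P}^{\upsilon^w}$ in accordance with equation (3.11) and via
$$
\left. \frac{d\widehat{P}^{\upsilon^w}}{dP^{\upsilon^w}} \right|_{\mathcal{X}_k} = \Psi^{w\kappa}_k = \prod_{l=1}^{k} \varphi^{w\kappa}_l,
\quad \text{where} \quad
\varphi^{w\kappa}_l = \frac{\exp\left( -\frac{1}{2\varrho^2(y^w_{l-1})} \left( X_l - \vartheta(y^w_{l-1}) - \widehat{\kappa}(y^w_{l-1}) X_{l-1} \right)^2 \right)}{\exp\left( -\frac{1}{2\varrho^2(y^w_{l-1})} \left( X_l - \vartheta(y^w_{l-1}) - \kappa(y^w_{l-1}) X_{l-1} \right)^2 \right)}.
$$
This means that
$$
\log \Psi^{w\kappa}_k = \sum_{l=1}^{k} \left[ -\frac{\widehat{\kappa}^2(y^w_{l-1}) X^2_{l-1} - 2 X_l \widehat{\kappa}(y^w_{l-1}) X_{l-1} + 2 \vartheta(y^w_{l-1}) \widehat{\kappa}(y^w_{l-1}) X_{l-1}}{2\varrho^2(y^w_{l-1})} + R\left(\kappa(y^w_{l-1})\right) \right],
$$
where $R(\kappa_t)$ is independent of $\widehat{\kappa}_t$. We consider the expectation of the log likelihood, i.e., $L(\widehat{\kappa}) = E\left[\log \Psi^{w\kappa}_k \mid \mathcal{X}_k\right]$. By virtue of (3.21),
$$
E\left[\log \Psi^{w\kappa}_k \mid \mathcal{X}_k\right] = E\left[ \sum_{l=1}^{k} \sum_{t=1}^{N} \left( -\frac{\langle y^w_{l-1}, e_t \rangle}{2\varrho_t^2} \left( \widehat{\kappa}_t^2 X^2_{l-1} - 2 X_l \widehat{\kappa}_t X_{l-1} + 2 \vartheta_t \widehat{\kappa}_t X_{l-1} \right) + R(\kappa_t) \right) \Big|\, \mathcal{X}_k \right].
$$
Differentiating $L(\widehat{\kappa})$ with respect to $\widehat{\kappa}$ and setting the resulting expression to 0, we get the optimal estimate of $\widehat{\kappa}$, given the observations $X_{k+1}$, as
$$
\widehat{\kappa}_t = \frac{\widehat{C}^t_k(X_{k-1}, X_k) - \vartheta_t\, \widehat{C}^t_k(X_{k-1})}{\widehat{C}^t_k\left(X^2_{k-1}\right)}.
$$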
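The estimate for $\widehat{\kappa}_t$ has the structure of a weighted least-squares slope. The sketch below illustrates this under a simplifying assumption: the list `weights` is a stand-in for the conditional expectations of the regime indicators $\langle y^w_{l-1}, e_t \rangle$ that the filters of the thesis would supply, and the function name `kappa_hat` is introduced here for illustration.

```python
def kappa_hat(X, weights, theta):
    """Weighted estimate of kappa_t for one regime.

    X        : observations X_0, ..., X_k
    weights  : weights[l - 1] is the regime weight attached to step l, l = 1..k
               (hypothetical stand-in for the filtered indicator of regime t)
    theta    : current value of vartheta_t
    """
    steps = list(enumerate(weights, start=1))
    c_cross = sum(w * X[l - 1] * X[l] for l, w in steps)   # role of C(X_{k-1}, X_k)
    c_lag = sum(w * X[l - 1] for l, w in steps)            # role of C(X_{k-1})
    c_lag_sq = sum(w * X[l - 1] ** 2 for l, w in steps)    # role of C(X_{k-1}^2)
    return (c_cross - theta * c_lag) / c_lag_sq


# Noise-free path X_l = 0.5 + 0.8 * X_{l-1}: the estimate recovers 0.8.
path = [1.0]
for _ in range(5):
    path.append(0.5 + 0.8 * path[-1])
estimate = kappa_hat(path, [1.0] * 5, theta=0.5)
```

On noise-free data the slope is recovered for any positive weights, which is a quick sanity check on the signs in the formula.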


C.2 Optimal estimate for ϑ


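Associated with the change from $\vartheta = (\vartheta_1, \vartheta_2, \cdots, \vartheta_N)^\top \in \mathbb{R}^N$ to $\widehat{\vartheta} = (\widehat{\vartheta}_1, \widehat{\vartheta}_2, \cdots, \widehat{\vartheta}_N)^\top \in \mathbb{R}^N$ is a new measure $\widehat{P}^{\upsilon^w}$, based on equation (3.11), via
$$
\left. \frac{d\widehat{P}^{\upsilon^w}}{dP^{\upsilon^w}} \right|_{\mathcal{X}_k} = \Psi^{w\vartheta}_k = \prod_{l=1}^{k} \varphi^{w\vartheta}_l,
\quad \text{where} \quad
\varphi^{w\vartheta}_l = \frac{\exp\left( -\frac{1}{2\varrho^2(y^w_{l-1})} \left( X_l - \widehat{\vartheta}(y^w_{l-1}) - \kappa(y^w_{l-1}) X_{l-1} \right)^2 \right)}{\exp\left( -\frac{1}{2\varrho^2(y^w_{l-1})} \left( X_l - \vartheta(y^w_{l-1}) - \kappa(y^w_{l-1}) X_{l-1} \right)^2 \right)}.
$$
Thus,
$$
\log \Psi^{w\vartheta}_k = \sum_{l=1}^{k} \left( -\frac{\widehat{\vartheta}^2(y^w_{l-1}) - 2 X_l \widehat{\vartheta}(y^w_{l-1}) + 2 \widehat{\vartheta}(y^w_{l-1}) \kappa(y^w_{l-1}) X_{l-1}}{2\varrho^2(y^w_{l-1})} + R\left(\vartheta(y^w_{l-1})\right) \right),
$$
where $R(\vartheta)$ is a remainder not containing $\widehat{\vartheta}_t$. Applying equations (3.19) and (3.21), the expectation of the log likelihood depending on $\mathcal{X}_k$ is $L(\widehat{\vartheta}) = E\left[\log \Psi^{w\vartheta}_k \mid \mathcal{X}_k\right]$, where
$$
E\left[\log \Psi^{w\vartheta}_k \mid \mathcal{X}_k\right] = E\left[ \sum_{l=1}^{k} \sum_{t=1}^{N} \left( -\frac{\langle y^w_{l-1}, e_t \rangle}{2\varrho_t^2} \left( \widehat{\vartheta}_t^2 - 2 X_l \widehat{\vartheta}_t + 2 \widehat{\vartheta}_t \kappa_t X_{l-1} \right) + R(\vartheta_t) \right) \Big|\, \mathcal{X}_k \right].
$$
We differentiate $L(\widehat{\vartheta})$ with respect to $\widehat{\vartheta}$ and equate the result to 0. The optimal estimate $\widehat{\vartheta}$ may be derived as
$$
\widehat{\vartheta}_t = \frac{\widehat{C}^t_k(X_k) - \kappa_t\, \widehat{C}^t_k(X_{k-1})}{\widehat{B}^t_k}.
$$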

C.3 Optimal estimate for $\varrho$


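Consider a transformation from $\varrho = (\varrho_1, \varrho_2, \cdots, \varrho_N)^\top \in \mathbb{R}^N$ to $\widehat{\varrho} = (\widehat{\varrho}_1, \widehat{\varrho}_2, \cdots, \widehat{\varrho}_N)^\top \in \mathbb{R}^N$ using the Radon-Nikodym derivative in equation (3.11), i.e.,
$$
\left. \frac{d\widehat{P}^{\upsilon^w}}{dP^{\upsilon^w}} \right|_{\mathcal{X}_k} = \Psi^{w\varrho}_k = \prod_{l=1}^{k} \varphi^{w\varrho}_l,
$$
where
$$
\varphi^{w\varrho}_l = \frac{\varrho(y^w_{l-1}) \exp\left( -\frac{1}{2\widehat{\varrho}^2(y^w_{l-1})} \left( X_l - \vartheta(y^w_{l-1}) - \kappa(y^w_{l-1}) X_{l-1} \right)^2 \right)}{\widehat{\varrho}(y^w_{l-1}) \exp\left( -\frac{1}{2\varrho^2(y^w_{l-1})} \left( X_l - \vartheta(y^w_{l-1}) - \kappa(y^w_{l-1}) X_{l-1} \right)^2 \right)}.
$$
So,
$$
\log \Psi^{w\varrho}_k = \sum_{l=1}^{k} \left[ \log\left( \frac{1}{\widehat{\varrho}(y^w_{l-1})} \right) + \left( -\frac{1}{2\widehat{\varrho}^2(y^w_{l-1})} \right) \left( X_l - \vartheta(y^w_{l-1}) - \kappa(y^w_{l-1}) X_{l-1} \right)^2 + R\left(\varrho(y^w_{l-1})\right) \right],
$$
where $R(\varrho_t)$ does not include $\widehat{\varrho}_t$. By equations (3.19) and (3.21),
$$
E\left[\log \Psi^{w\varrho}_k \mid \mathcal{X}_k\right] = E\left[ \sum_{l=1}^{k} \sum_{t=1}^{N} \langle y^w_{l-1}, e_t \rangle \left( \log\left( \frac{1}{\widehat{\varrho}(y^w_{l-1})} \right) + \left( -\frac{1}{2\widehat{\varrho}^2(y^w_{l-1})} \right) \left( X_l - \vartheta(y^w_{l-1}) - \kappa(y^w_{l-1}) X_{l-1} \right)^2 \right) \Big|\, \mathcal{X}_k \right] + R(\varrho_t).
$$
We differentiate $L(\widehat{\varrho})$ with respect to $\widehat{\varrho}$, ignoring the remainder, and equate the derivative to 0. The optimal estimate for parameter $\widehat{\varrho}$ is given by
$$
\widehat{\varrho}_t^{\,2} = \frac{\widehat{C}^t_k\left(X_k^2\right) + \widehat{B}^t_k \vartheta_t^2 + \kappa_t^2\, \widehat{C}^t_k\left(X^2_{k-1}\right) + 2\, \widehat{C}^t_k(X_{k-1})\, \vartheta_t \kappa_t}{\widehat{B}^t_k} - 2\, \frac{\widehat{C}^t_k(X_k)\, \vartheta_t + \widehat{C}^t_k(X_{k-1}, X_k)\, \kappa_t}{\widehat{B}^t_k}.
$$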
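The expanded expression for $\widehat{\varrho}_t^{\,2}$ is the weighted mean of the squared residuals $X_l - \vartheta_t - \kappa_t X_{l-1}$, written out term by term. The sketch below checks that equivalence numerically. The weights are again a hypothetical stand-in for the filtered regime indicators, and the function names are introduced for illustration only.

```python
def rho_sq_hat(X, weights, theta, kappa):
    """Expanded formula for the variance estimate, built from B- and C-type sums."""
    steps = list(enumerate(weights, start=1))
    B = sum(w for _, w in steps)
    C_x = sum(w * X[l] for l, w in steps)
    C_x_sq = sum(w * X[l] ** 2 for l, w in steps)
    C_lag = sum(w * X[l - 1] for l, w in steps)
    C_lag_sq = sum(w * X[l - 1] ** 2 for l, w in steps)
    C_cross = sum(w * X[l - 1] * X[l] for l, w in steps)
    first = (C_x_sq + B * theta ** 2 + kappa ** 2 * C_lag_sq + 2 * C_lag * theta * kappa) / B
    second = 2 * (C_x * theta + C_cross * kappa) / B
    return first - second


def weighted_mean_sq_residual(X, weights, theta, kappa):
    """Direct weighted mean of (X_l - theta - kappa * X_{l-1})^2."""
    steps = list(enumerate(weights, start=1))
    total = sum(w * (X[l] - theta - kappa * X[l - 1]) ** 2 for l, w in steps)
    return total / sum(w for _, w in steps)
```

Both functions return the same value up to rounding, which confirms the signs of the cross terms in the closed form.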

C.4 Optimal estimate for h


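Consider a new measure $\widehat{P}^{\upsilon^w}$ defined in equation (3.31) via $\left. \dfrac{d\widehat{P}^{\upsilon^w}}{dP^{\upsilon^w}} \right|_{\mathcal{X}_k} = A^h_k$, where
$$
A^h_k = \prod_{l=2}^{k} \prod_{t,s,r=1}^{N} \left( \frac{\widehat{h}_{tsr}}{h_{tsr}} \right)^{\langle y^w_{l-2}, e_r \rangle \langle y^w_{l-1}, e_s \rangle \langle y^w_l, e_t \rangle}.
$$
By equation (3.18), the expectation of the log-likelihood on $\mathcal{X}_k$ is
$$
E\left[\log A^h_k \mid \mathcal{X}_k\right] = E\left[ \sum_{l=2}^{k} \sum_{t,s,r=1}^{N} \log \left( \frac{\widehat{h}_{tsr}}{h_{tsr}} \right)^{\langle y^w_{l-2}, e_r \rangle \langle y^w_{l-1}, e_s \rangle \langle y^w_l, e_t \rangle} \Big|\, \mathcal{X}_k \right] = \sum_{t,s,r=1}^{N} \log \widehat{h}_{tsr}\, \widehat{A}^{tsr}_k + R(h_{tsr}),
$$
where $R(h_{tsr})$ is a remainder independent of $\widehat{h}_{tsr}$. With the constraint $\sum_{t=1}^{N} \widehat{h}_{tsr} = 1$, we introduce the Lagrange multiplier $\rho$ and obtain the function to maximise as
$$
L\left(\widehat{h}_{tsr}, \rho\right) = \sum_{t,s,r=1}^{N} \log \widehat{h}_{tsr}\, \widehat{A}^{tsr}_k + \rho \left( \sum_{t=1}^{N} \widehat{h}_{tsr} - 1 \right) + R(h_{tsr}).
$$
Differentiating $L\left(\widehat{h}_{tsr}, \rho\right)$ with respect to $\widehat{h}_{tsr}$ and $\rho$ and setting the derivatives to 0, we have
$$
\frac{1}{\widehat{h}_{tsr}}\, \widehat{A}^{tsr}_k + \rho = 0.
$$
Since $\sum_{t=1}^{N} \widehat{h}_{tsr} = 1$ and $\sum_{t=1}^{N} \widehat{A}^{tsr}_k = \widehat{B}^{sr}_k$, we get
$$
\sum_{t=1}^{N} \widehat{h}_{tsr} = \frac{\sum_{t=1}^{N} \widehat{A}^{tsr}_k}{-\rho} = \frac{\widehat{B}^{sr}_k}{-\rho} = 1,
$$
which can be rewritten as $\sum_{t=1}^{N} \widehat{h}_{tsr} = \sum_{t=1}^{N} \widehat{A}^{tsr}_k / \widehat{B}^{sr}_k$. This means that the optimal estimate $\widehat{h}_{tsr}$ is given by
$$
\widehat{h}_{tsr} = \frac{\widehat{A}^{tsr}_k}{\widehat{B}^{sr}_k}.
$$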
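For a second-order chain the same ratio applies to triples of states. The sketch below counts triples along a fully observed path, which is an assumption made purely for illustration: in the thesis the chain is hidden and $\widehat{A}^{tsr}_k$, $\widehat{B}^{sr}_k$ are filtered estimates. The function name `second_order_transitions` is hypothetical.

```python
from collections import Counter


def second_order_transitions(path, n_states):
    """Estimate h_{tsr} = A^{tsr} / B^{sr} from an observed state path.

    A^{tsr} counts the triples (y_{l-2}, y_{l-1}, y_l) = (r, s, t) and
    B^{sr} is the sum of A^{tsr} over t.  Keys of the result are (t, s, r).
    """
    triples = Counter(zip(path[:-2], path[1:-1], path[2:]))  # keyed by (r, s, t)
    estimates = {}
    for (r, s, t), a in triples.items():
        b = sum(triples[(r, s, u)] for u in range(n_states))
        estimates[(t, s, r)] = a / b
    return estimates


# Hypothetical two-state path.
h = second_order_transitions([0, 0, 1, 0, 0, 1, 0, 0, 0], n_states=2)
```

For each pair $(s, r)$ that occurs in the path, the estimates over $t$ sum to one, mirroring the constraint used in the derivation.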
Appendix D

Derivations of the model’s optimal parameter estimates in chapter 4

D.1 Optimal estimate for κ


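Define the Radon-Nikodym derivative of $\widehat{P}^{E^w}$ with respect to $P^{E^w}$ following equation (4.15) as $\left. \dfrac{d\widehat{P}^{E^w}}{dP^{E^w}} \right|_{\mathcal{X}_k} = \Psi^{w\kappa}_k = \prod_{l=1}^{k} \varphi^{w\kappa}_l$, where
$$
\varphi^{w\kappa}_l = \frac{\exp\left( -\frac{1}{2\varrho^2(y^w_{l-1})} \left( X_l - \vartheta(y^w_{l-1}) - \widehat{\kappa}(y^w_{l-1}) X_{l-1} - \mu_{\beta(y^w_{l-1})}\, q\, \widehat{\kappa}(y^w_{l-1}) \right)^2 \right)}{\exp\left( -\frac{1}{2\varrho^2(y^w_{l-1})} \left( X_l - \vartheta(y^w_{l-1}) - \kappa(y^w_{l-1}) X_{l-1} - \mu_{\beta(y^w_{l-1})}\, q\, \kappa(y^w_{l-1}) \right)^2 \right)}.
$$
Thus,
$$
\begin{aligned}
\log \Psi^{w\kappa}_k
&= \sum_{l=1}^{k} \Bigg[ -\frac{\widehat{\kappa}^2(y^w_{l-1}) \left( X^2_{l-1} + 2 \mu_{\beta(y^w_{l-1})}\, q\, X_{l-1} + \left( \mu_{\beta(y^w_{l-1})}\, q \right)^2 \right)}{2\varrho^2(y^w_{l-1})} \\
&\qquad\quad - \frac{2 \widehat{\kappa}(y^w_{l-1}) \left( X_{l-1} + \mu_{\beta(y^w_{l-1})}\, q \right) \left( \vartheta(y^w_{l-1}) - X_l \right)}{2\varrho^2(y^w_{l-1})} + R\left(\kappa(y^w_{l-1})\right) \Bigg] \\
&= \sum_{l=1}^{k} \sum_{t=1}^{N} \Bigg( -\frac{\langle y^w_{l-1}, e_t \rangle}{2\varrho_t^2} \Big( \widehat{\kappa}_t^2 \left( X^2_{l-1} + 2 \mu_{\beta_t} q X_{l-1} + \left( \mu_{\beta_t} q \right)^2 \right) \\
&\qquad\quad + 2 \widehat{\kappa}_t \left( X_{l-1} + \mu_{\beta_t} q \right) \left( \vartheta_t - X_l \right) \Big) + R(\kappa_t) \Bigg),
\end{aligned}
$$
where $R(\kappa_t)$ is a remainder free of $\widehat{\kappa}_t$. With equations (4.21) and (4.23), we consider the expectation of the log likelihood, i.e., $L(\widehat{\kappa}) = E\left[\log \Psi^{w\kappa}_k \mid \mathcal{X}_k\right]$, where
$$
\begin{aligned}
E\left[\log \Psi^{w\kappa}_k \mid \mathcal{X}_k\right] = E\Bigg[ \sum_{l=1}^{k} \sum_{t=1}^{N} \Bigg( & -\frac{\langle y^w_{l-1}, e_t \rangle}{2\varrho_t^2} \Big( \widehat{\kappa}_t^2 \left( X^2_{l-1} + 2 \mu_{\beta_t} q X_{l-1} + \left( \mu_{\beta_t} q \right)^2 \right) \\
& + 2 \widehat{\kappa}_t \left( X_{l-1} + \mu_{\beta_t} q \right) \left( \vartheta_t - X_l \right) \Big) + R(\kappa_t) \Bigg) \Big|\, \mathcal{X}_k \Bigg].
\end{aligned}
$$
By differentiating $L(\widehat{\kappa})$ with respect to $\widehat{\kappa}$ and setting the result to 0, we get the optimal estimate $\widehat{\kappa}$, which is
$$
\widehat{\kappa}_t = \frac{\widehat{C}^t_k(X_{k-1}, X_k) - \vartheta_t\, \widehat{C}^t_k(X_{k-1}) + \mu_{\beta_t} q\, \widehat{C}^t_k(X_k) - \vartheta_t\, q\, \mu_{\beta_t} \widehat{B}^t_k}{\widehat{C}^t_k\left(X^2_{k-1}\right) + 2 \mu_{\beta_t} q\, \widehat{C}^t_k(X_{k-1}) + \left( \mu_{\beta_t} q \right)^2 \widehat{B}^t_k}.
$$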
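With the jump-mean term included, $\widehat{\kappa}_t$ is the weighted regression slope of $X_l - \vartheta_t$ on the shifted regressor $X_{l-1} + \mu_{\beta_t} q$. The sketch below evaluates the closed form from the $B$- and $C$-type sums. As in the earlier illustration, the weights stand in for filtered regime indicators and the function name `kappa_hat_jump` is an assumption introduced here.

```python
def kappa_hat_jump(X, weights, theta, mu_beta, q):
    """Closed-form kappa estimate when the drift contains mu_beta * q * kappa."""
    steps = list(enumerate(weights, start=1))
    m = mu_beta * q
    B = sum(w for _, w in steps)
    C_x = sum(w * X[l] for l, w in steps)
    C_lag = sum(w * X[l - 1] for l, w in steps)
    C_lag_sq = sum(w * X[l - 1] ** 2 for l, w in steps)
    C_cross = sum(w * X[l - 1] * X[l] for l, w in steps)
    numerator = C_cross - theta * C_lag + m * C_x - theta * m * B
    denominator = C_lag_sq + 2 * m * C_lag + m * m * B
    return numerator / denominator


# Noise-free path X_l = 0.2 + 0.6 * (X_{l-1} + 0.5 * 0.3): the estimate recovers 0.6.
path = [1.0]
for _ in range(6):
    path.append(0.2 + 0.6 * (path[-1] + 0.5 * 0.3))
estimate = kappa_hat_jump(path, [1.0] * 6, theta=0.2, mu_beta=0.5, q=0.3)
```

Setting `mu_beta` or `q` to zero reduces the expression to the estimate without jumps.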
b t 2 b t b t

D.2 Optimal estimate for ϑ


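Define a new measure $\widehat{P}^{E^w}$ based on equation (4.15). Consider the construction $\left. \dfrac{d\widehat{P}^{E^w}}{dP^{E^w}} \right|_{\mathcal{X}_k} = \Psi^{w\vartheta}_k = \prod_{l=1}^{k} \varphi^{w\vartheta}_l$, where
$$
\varphi^{w\vartheta}_l = \frac{\exp\left( -\frac{1}{2\varrho^2(y^w_{l-1})} \left( X_l - \widehat{\vartheta}(y^w_{l-1}) - \kappa(y^w_{l-1}) X_{l-1} - \mu_{\beta(y^w_{l-1})}\, q\, \kappa(y^w_{l-1}) \right)^2 \right)}{\exp\left( -\frac{1}{2\varrho^2(y^w_{l-1})} \left( X_l - \vartheta(y^w_{l-1}) - \kappa(y^w_{l-1}) X_{l-1} - \mu_{\beta(y^w_{l-1})}\, q\, \kappa(y^w_{l-1}) \right)^2 \right)}.
$$
So,
$$
\begin{aligned}
\log \Psi^{w\vartheta}_k
&= \sum_{l=1}^{k} \left[ -\frac{\widehat{\vartheta}^2(y^w_{l-1}) - 2 X_l \widehat{\vartheta}(y^w_{l-1}) + 2 \kappa(y^w_{l-1}) \widehat{\vartheta}(y^w_{l-1}) \left( X_{l-1} + \mu_{\beta(y^w_{l-1})}\, q \right)}{2\varrho^2(y^w_{l-1})} + R\left(\vartheta(y^w_{l-1})\right) \right] \\
&= \sum_{l=1}^{k} \sum_{t=1}^{N} \left[ -\frac{\langle y^w_{l-1}, e_t \rangle}{2\varrho_t^2} \left( \widehat{\vartheta}_t^2 - 2 X_l \widehat{\vartheta}_t + 2 \kappa_t \widehat{\vartheta}_t \left( X_{l-1} + \mu_{\beta_t} q \right) \right) + R(\vartheta_t) \right],
\end{aligned}
$$
where $R(\vartheta_t)$ is a remainder that does not contain $\widehat{\vartheta}_t$.

On the basis of equations (4.21) and (4.23), the expectation of the log-likelihood depending on $\mathcal{X}_k$ is $L(\widehat{\vartheta}) = E\left[\log \Psi^{w\vartheta}_k \mid \mathcal{X}_k\right]$, where
$$
E\left[\log \Psi^{w\vartheta}_k \mid \mathcal{X}_k\right] = E\left[ \sum_{l=1}^{k} \sum_{t=1}^{N} \left( -\frac{\langle y^w_{l-1}, e_t \rangle}{2\varrho_t^2} \left( \widehat{\vartheta}_t^2 - 2 X_l \widehat{\vartheta}_t + 2 \kappa_t \widehat{\vartheta}_t \left( X_{l-1} + \mu_{\beta_t} q \right) \right) + R(\vartheta_t) \right) \Big|\, \mathcal{X}_k \right].
$$
Computing $\dfrac{\partial L(\widehat{\vartheta})}{\partial \widehat{\vartheta}}$ and equating the result to 0, we get the optimal estimate
$$
\widehat{\vartheta}_t = \frac{\widehat{C}^t_k(X_k) - \kappa_t\, \widehat{C}^t_k(X_{k-1}) - \kappa_t\, \mu_{\beta_t} q\, \widehat{B}^t_k(X_{k-1})}{\widehat{B}^t_k}.
$$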

D.3 Optimal estimate for $\varrho$

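Define a new measure $\widehat{P}^{E^w}$, using equation (4.15), by setting $\left. \dfrac{d\widehat{P}^{E^w}}{dP^{E^w}} \right|_{\mathcal{X}_k} = \Psi^{w\varrho}_k = \prod_{l=1}^{k} \varphi^{w\varrho}_l$, where
$$
\varphi^{w\varrho}_l = \frac{\varrho(y^w_{l-1}) \exp\left( -\dfrac{\left( X_l - \vartheta(y^w_{l-1}) - \kappa(y^w_{l-1}) X_{l-1} - \mu_{\beta(y^w_{l-1})}\, q\, \kappa(y^w_{l-1}) \right)^2}{2 \left( \widehat{\varrho}^2(y^w_{l-1}) + \sigma^2_{\beta(y^w_{l-1})}\, q\, \kappa^2(y^w_{l-1}) \right)} \right)}{\widehat{\varrho}(y^w_{l-1}) \exp\left( -\dfrac{\left( X_l - \vartheta(y^w_{l-1}) - \kappa(y^w_{l-1}) X_{l-1} - \mu_{\beta(y^w_{l-1})}\, q\, \kappa(y^w_{l-1}) \right)^2}{2 \left( \varrho^2(y^w_{l-1}) + \sigma^2_{\beta(y^w_{l-1})}\, q\, \kappa^2(y^w_{l-1}) \right)} \right)}.
$$
From this, we have
$$
\begin{aligned}
\log \Psi^{w\varrho}_k
&= \sum_{l=1}^{k} \left[ \log\left( \frac{1}{\widehat{\varrho}(y^w_{l-1})} \right) - \frac{\left( X_l - \vartheta(y^w_{l-1}) - \kappa(y^w_{l-1}) X_{l-1} - \mu_{\beta(y^w_{l-1})}\, q\, \kappa(y^w_{l-1}) \right)^2}{2 \left( \widehat{\varrho}^2(y^w_{l-1}) + \sigma^2_{\beta(y^w_{l-1})}\, q\, \kappa^2(y^w_{l-1}) \right)} + R\left(\varrho(y^w_{l-1})\right) \right] \\
&= \sum_{l=1}^{k} \left[ \sum_{t=1}^{N} \langle y^w_{l-1}, e_t \rangle \left( \log\left( \frac{1}{\widehat{\varrho}(y^w_{l-1})} \right) - \frac{\left( X_l - \vartheta(y^w_{l-1}) - \kappa(y^w_{l-1}) X_{l-1} - \mu_{\beta(y^w_{l-1})}\, q\, \kappa(y^w_{l-1}) \right)^2}{2 \left( \widehat{\varrho}^2(y^w_{l-1}) + \sigma^2_{\beta(y^w_{l-1})}\, q\, \kappa^2(y^w_{l-1}) \right)} \right) + R(\varrho_t) \right],
\end{aligned}
$$
where $R(\varrho_t)$ is independent of $\widehat{\varrho}_t$. As per (4.21) and (4.23), we have
$$
\begin{aligned}
E\left[\log \Psi^{w\varrho}_k \mid \mathcal{X}_k\right] = E\Bigg[ \sum_{l=1}^{k} \sum_{t=1}^{N} \langle y^w_{l-1}, e_t \rangle \Bigg( & \log\left( \frac{1}{\widehat{\varrho}(y^w_{l-1})} \right) - \frac{1}{2 \left( \widehat{\varrho}^2(y^w_{l-1}) + \sigma^2_{\beta(y^w_{l-1})}\, q\, \kappa^2(y^w_{l-1}) \right)} \\
& \times \left( X_l - \vartheta(y^w_{l-1}) - \kappa(y^w_{l-1}) X_{l-1} - \mu_{\beta(y^w_{l-1})}\, q\, \kappa(y^w_{l-1}) \right)^2 \Bigg) \Big|\, \mathcal{X}_k \Bigg] + R(\varrho_t).
\end{aligned}
$$
Equating to 0 the derivative (with respect to $\widehat{\varrho}$) of $L(\widehat{\varrho}) = E\left[\log \Psi^{w\varrho}_k \mid \mathcal{X}_k\right]$, our optimal estimate of $\widehat{\varrho}^{\,2}$ is
$$
\begin{aligned}
\widehat{\varrho}_t^{\,2} ={}& \frac{\widehat{C}^t_k\left(X_k^2\right) + \kappa_t^2\, \widehat{C}^t_k\left(X^2_{k-1}\right) + \widehat{B}^t_k \left( \vartheta_t^2 + \left( \kappa_t \mu_{\beta_t} q \right)^2 + 2 \vartheta_t \kappa_t \mu_{\beta_t} q - \sigma^2_{\beta_t} \kappa_t^2 q \right)}{\widehat{B}^t_k} \\
& + \frac{\widehat{C}^t_k(X_{k-1}) \left( 2 \mu_{\beta_t} q \kappa_t^2 + 2 \vartheta_t \kappa_t \right)}{\widehat{B}^t_k} \\
& - \frac{\left( 2 \vartheta_t + 2 \kappa_t \mu_{\beta_t} q \right) \widehat{C}^t_k(X_k) + 2 \kappa_t\, \widehat{C}^t_k(X_{k-1} X_k)}{\widehat{B}^t_k}.
\end{aligned}
$$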
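The three-line expression for $\widehat{\varrho}_t^{\,2}$ is the weighted mean squared residual of the jump-adjusted model minus the jump-variance contribution $\sigma^2_{\beta_t} \kappa_t^2 q$. The sketch below verifies this algebraically on arbitrary numbers. All inputs are hypothetical, the weights replace the filtered regime indicators, and the function names are introduced for illustration.

```python
def rho_sq_hat_jump(X, weights, theta, kappa, mu_beta, sigma_beta, q):
    """Expanded closed form built from B- and C-type sums."""
    steps = list(enumerate(weights, start=1))
    m = mu_beta * q
    B = sum(w for _, w in steps)
    C_x = sum(w * X[l] for l, w in steps)
    C_x_sq = sum(w * X[l] ** 2 for l, w in steps)
    C_lag = sum(w * X[l - 1] for l, w in steps)
    C_lag_sq = sum(w * X[l - 1] ** 2 for l, w in steps)
    C_cross = sum(w * X[l - 1] * X[l] for l, w in steps)
    line1 = (C_x_sq + kappa ** 2 * C_lag_sq
             + B * (theta ** 2 + (kappa * m) ** 2 + 2 * theta * kappa * m
                    - sigma_beta ** 2 * kappa ** 2 * q)) / B
    line2 = C_lag * (2 * m * kappa ** 2 + 2 * theta * kappa) / B
    line3 = ((2 * theta + 2 * kappa * m) * C_x + 2 * kappa * C_cross) / B
    return line1 + line2 - line3


def residual_form(X, weights, theta, kappa, mu_beta, sigma_beta, q):
    """Weighted mean squared residual minus the jump-variance term."""
    steps = list(enumerate(weights, start=1))
    m = mu_beta * q
    total = sum(w * (X[l] - theta - kappa * X[l - 1] - m * kappa) ** 2 for l, w in steps)
    return total / sum(w for _, w in steps) - sigma_beta ** 2 * kappa ** 2 * q
```

Because a quantity is subtracted from a mean squared residual, the result is not guaranteed to be positive for arbitrary inputs, so an implementation may want to guard against that case.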

D.4 Optimal estimate for µβ


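Starting with equation (4.15), define a new measure $\widehat{P}^{E^w}$ through $\left. \dfrac{d\widehat{P}^{E^w}}{dP^{E^w}} \right|_{\mathcal{X}_k} = \Psi^{w\mu_\beta}_k = \prod_{l=1}^{k} \varphi^{w\mu_\beta}_l$, where
$$
\varphi^{w\mu_\beta}_l = \frac{\exp\left( -\frac{1}{2\varrho^2(y^w_{l-1})} \left( X_l - \vartheta(y^w_{l-1}) - \kappa(y^w_{l-1}) X_{l-1} - \widehat{\mu}_{\beta(y^w_{l-1})}\, q\, \kappa(y^w_{l-1}) \right)^2 \right)}{\exp\left( -\frac{1}{2\varrho^2(y^w_{l-1})} \left( X_l - \vartheta(y^w_{l-1}) - \kappa(y^w_{l-1}) X_{l-1} - \mu_{\beta(y^w_{l-1})}\, q\, \kappa(y^w_{l-1}) \right)^2 \right)}.
$$
Thus,
$$
\begin{aligned}
\log \Psi^{w\mu_\beta}_k
&= \sum_{l=1}^{k} \left( -\frac{\widehat{\mu}_{\beta(y^w_{l-1})}\, q\, \kappa(y^w_{l-1}) - 2 X_l + 2 \left( \kappa(y^w_{l-1}) X_{l-1} + \vartheta(y^w_{l-1}) \right)}{2\varrho^2(y^w_{l-1})} \times \widehat{\mu}_{\beta(y^w_{l-1})}\, q\, \kappa(y^w_{l-1}) + R\left( \mu_{\beta(y^w_{l-1})} \right) \right) \\
&= \sum_{l=1}^{k} \sum_{t=1}^{N} \left( -\frac{\langle y^w_{l-1}, e_t \rangle}{2\varrho_t^2}\, \widehat{\mu}_{\beta(y^w_{l-1})}\, q\, \kappa(y^w_{l-1}) \left( \widehat{\mu}_{\beta(y^w_{l-1})}\, q\, \kappa(y^w_{l-1}) - 2 X_l + 2 \left( \kappa(y^w_{l-1}) X_{l-1} + \vartheta(y^w_{l-1}) \right) \right) \right) + R\left( \mu_{\beta_t} \right),
\end{aligned}
$$
where $R\left( \mu_{\beta_t} \right)$ does not contain $\widehat{\mu}_{\beta_t}$. Again, by (4.21) and (4.23),
$$
E\left[\log \Psi^{w\mu_\beta}_k \mid \mathcal{X}_k\right] = E\left[ \sum_{l=1}^{k} \sum_{t=1}^{N} \left( -\frac{\langle y^w_{l-1}, e_t \rangle}{2\varrho_t^2}\, \widehat{\mu}_{\beta(y^w_{l-1})}\, q\, \kappa(y^w_{l-1}) \left( \widehat{\mu}_{\beta(y^w_{l-1})}\, q\, \kappa(y^w_{l-1}) - 2 X_l + 2 \left( \kappa(y^w_{l-1}) X_{l-1} + \vartheta(y^w_{l-1}) \right) \right) \right) \Big|\, \mathcal{X}_k \right] + R\left( \mu_{\beta_t} \right).
$$
Following the same optimisation procedure as above,
$$
\widehat{\mu}_{\beta_t} = \frac{\widehat{C}^t_k(X_k) - \vartheta_t\, \widehat{B}^t_k(X_{k-1}) - \kappa_t\, \widehat{C}^t_k(X_{k-1})}{\kappa_t\, q\, \widehat{B}^t_k(X_{k-1})}.
$$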

D.5 Optimal estimate for σβ

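From equation (4.15), define a new measure $\widehat{P}^{E^w}$ by setting $\left. \dfrac{d\widehat{P}^{E^w}}{dP^{E^w}} \right|_{\mathcal{X}_k} = \Psi^{w\sigma_\beta}_k = \prod_{l=1}^{k} \varphi^{w\sigma_\beta}_l$, where
$$
\varphi^{w\sigma_\beta}_l = \frac{\sigma_{\beta(y^w_{l-1})} \exp\left( -\dfrac{\left( X_l - \vartheta(y^w_{l-1}) - \kappa(y^w_{l-1}) X_{l-1} - \mu_{\beta(y^w_{l-1})}\, q\, \kappa(y^w_{l-1}) \right)^2}{2 \left( \varrho^2(y^w_{l-1}) + \widehat{\sigma}^2_{\beta(y^w_{l-1})}\, q\, \kappa^2(y^w_{l-1}) \right)} \right)}{\widehat{\sigma}_{\beta(y^w_{l-1})} \exp\left( -\dfrac{\left( X_l - \vartheta(y^w_{l-1}) - \kappa(y^w_{l-1}) X_{l-1} - \mu_{\beta(y^w_{l-1})}\, q\, \kappa(y^w_{l-1}) \right)^2}{2 \left( \varrho^2(y^w_{l-1}) + \sigma^2_{\beta(y^w_{l-1})}\, q\, \kappa^2(y^w_{l-1}) \right)} \right)}.
$$
This yields the log likelihood
$$
\begin{aligned}
\log \Psi^{w\sigma_\beta}_k
&= \sum_{l=1}^{k} \left[ \log\left( \frac{1}{\widehat{\sigma}_{\beta(y^w_{l-1})}} \right) - \frac{\left( X_l - \vartheta(y^w_{l-1}) - \kappa(y^w_{l-1}) X_{l-1} - \mu_{\beta(y^w_{l-1})}\, q\, \kappa(y^w_{l-1}) \right)^2}{2 \left( \varrho^2(y^w_{l-1}) + \widehat{\sigma}^2_{\beta(y^w_{l-1})}\, q\, \kappa^2(y^w_{l-1}) \right)} + R\left( \sigma_{\beta(y^w_{l-1})} \right) \right] \\
&= \sum_{l=1}^{k} \sum_{t=1}^{N} \langle y^w_{l-1}, e_t \rangle \left( \log\left( \frac{1}{\widehat{\sigma}_{\beta(y^w_{l-1})}} \right) - \frac{\left( X_l - \vartheta(y^w_{l-1}) - \kappa(y^w_{l-1}) X_{l-1} - \mu_{\beta(y^w_{l-1})}\, q\, \kappa(y^w_{l-1}) \right)^2}{2 \left( \varrho^2(y^w_{l-1}) + \widehat{\sigma}^2_{\beta(y^w_{l-1})}\, q\, \kappa^2(y^w_{l-1}) \right)} \right) + R\left( \sigma_{\beta_t} \right),
\end{aligned}
$$
where $R\left( \sigma_{\beta_t} \right)$ does not depend on $\widehat{\sigma}_{\beta_t}$. Invoking equations (4.21) and (4.23),
$$
\begin{aligned}
E\left[\log \Psi^{w\sigma_\beta}_k \mid \mathcal{X}_k\right]
&= E\Bigg[ \sum_{l=1}^{k} \sum_{t=1}^{N} \langle y^w_{l-1}, e_t \rangle \Bigg( \log\left( \frac{1}{\widehat{\sigma}_{\beta(y^w_{l-1})}} \right) - \frac{1}{2 \left( \varrho^2(y^w_{l-1}) + \widehat{\sigma}^2_{\beta(y^w_{l-1})}\, q\, \kappa^2(y^w_{l-1}) \right)} \\
&\qquad\quad \times \left( X_l - \vartheta(y^w_{l-1}) - \kappa(y^w_{l-1}) X_{l-1} - \mu_{\beta(y^w_{l-1})}\, q\, \kappa(y^w_{l-1}) \right)^2 \Bigg) \Big|\, \mathcal{X}_k \Bigg] + R\left( \sigma_{\beta_t} \right) \\
&= E\Bigg[ \sum_{l=1}^{k} \sum_{t=1}^{N} \langle y^w_{l-1}, e_t \rangle \Bigg( \log\left( \frac{1}{\widehat{\sigma}_{\beta_t}} \right) - \frac{\left( X_l - \vartheta_t - \kappa_t X_{l-1} - \mu_{\beta_t} q \kappa_t \right)^2}{2 \left( \varrho_t^2 + \widehat{\sigma}^2_{\beta_t} q \kappa_t^2 \right)} \Bigg) \Big|\, \mathcal{X}_k \Bigg] + R\left( \sigma_{\beta_t} \right).
\end{aligned}
$$
The optimal estimate for $\widehat{\sigma}^2_\beta$ is
$$
\begin{aligned}
\widehat{\sigma}^2_{\beta_t} ={}& \frac{\widehat{C}^t_k\left(X_k^2\right) + \kappa_t^2\, \widehat{C}^t_k\left(X^2_{k-1}\right) + \widehat{B}^t_k \left( \vartheta_t^2 + \left( \kappa_t \mu_{\beta_t} q \right)^2 \right)}{\widehat{B}^t_k\, \kappa_t^2\, q} \\
& + \frac{\widehat{B}^t_k \left( 2 \vartheta_t \kappa_t \mu_{\beta_t} q - \varrho_t^2 \right) + \widehat{C}^t_k(X_{k-1}) \left( 2 \mu_{\beta_t} q \kappa_t^2 + 2 \vartheta_t \kappa_t \right)}{\widehat{B}^t_k\, \kappa_t^2\, q} \\
& - \frac{\left( 2 \vartheta_t + 2 \kappa_t \mu_{\beta_t} q \right) \widehat{C}^t_k(X_k) + 2 \kappa_t\, \widehat{C}^t_k(X_{k-1} X_k)}{\widehat{B}^t_k\, \kappa_t^2\, q}.
\end{aligned}
$$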

D.6 Optimal estimate for ptsr


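To define a new measure $\widehat{P}^{E^w}$ described in equation (4.32), consider the Radon-Nikodym derivative $\left. \dfrac{d\widehat{P}^{E^w}}{dP^{E^w}} \right|_{\mathcal{X}_k} = \Lambda^p_k$, where
$$
\Lambda^p_k = \prod_{l=2}^{k} \prod_{t,s,r=1}^{N} \left( \frac{\widehat{p}_{tsr}}{p_{tsr}} \right)^{\langle y^w_{l-2}, e_r \rangle \langle y^w_{l-1}, e_s \rangle \langle y^w_l, e_t \rangle}.
$$
With the aid of equation (4.20),
$$
\begin{aligned}
E\left[\log \Lambda^p_k \mid \mathcal{X}_k\right]
&= E\left[ \sum_{l=2}^{k} \sum_{t,s,r=1}^{N} \log \left( \frac{\widehat{p}_{tsr}}{p_{tsr}} \right)^{\langle y^w_{l-2}, e_r \rangle \langle y^w_{l-1}, e_s \rangle \langle y^w_l, e_t \rangle} \Big|\, \mathcal{X}_k \right] \\
&= E\left[ \sum_{l=2}^{k} \sum_{t,s,r=1}^{N} \left( \log \widehat{p}_{tsr} - \log p_{tsr} \right) \langle y^w_{l-2}, e_r \rangle \langle y^w_{l-1}, e_s \rangle \langle y^w_l, e_t \rangle \Big|\, \mathcal{X}_k \right] \\
&= E\left[ \sum_{t,s,r=1}^{N} \log \widehat{p}_{tsr}\, A^{tsr}_k \Big|\, \mathcal{X}_k \right] + R(p_{tsr}),
\end{aligned}
$$
where $R(p_{tsr})$ is a $\widehat{p}_{tsr}$-free expression. With the constraint $\sum_{t=1}^{N} \widehat{p}_{tsr} = 1$, we introduce the Lagrange multiplier $\rho$ and consider the function
$$
L\left(\widehat{p}_{tsr}, \rho\right) = \sum_{t,s,r=1}^{N} \log \widehat{p}_{tsr}\, \widehat{A}^{tsr}_k + \rho \left( \sum_{t=1}^{N} \widehat{p}_{tsr} - 1 \right) + R(p_{tsr}).
$$
Differentiating $L\left(\widehat{p}_{tsr}, \rho\right)$ with respect to $\widehat{p}_{tsr}$ and $\rho$, and then setting the resulting derivatives to 0, we have
$$
\frac{1}{\widehat{p}_{tsr}}\, \widehat{A}^{tsr}_k + \rho = 0.
$$
Given that $\sum_{t=1}^{N} \widehat{p}_{tsr} = 1$ and $\sum_{t=1}^{N} \widehat{A}^{tsr}_k = \widehat{B}^{sr}_k$, we have
$$
\sum_{t=1}^{N} \widehat{p}_{tsr} = \frac{\sum_{t=1}^{N} \widehat{A}^{tsr}_k}{-\rho} = \frac{\widehat{B}^{sr}_k}{-\rho} = 1,
$$
and consequently, $\sum_{t=1}^{N} \widehat{p}_{tsr} = \sum_{t=1}^{N} \widehat{A}^{tsr}_k / \widehat{B}^{sr}_k$. Hence,
$$
\widehat{p}_{tsr} = \frac{\widehat{A}^{tsr}_k}{\widehat{B}^{sr}_k}.
$$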
Appendix E

Derivation of the model’s optimal parameter estimates in chapter 5

E.1 Optimal estimate for νh


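Following the EM estimation given in Xi and Mamon [32], we define the Radon-Nikodym derivative of $\widehat{P}^{\Theta^w}$ with respect to $P^{\Theta^w}$ as $\left. \dfrac{d\widehat{P}^{\Theta^w}}{dP^{\Theta^w}} \right|_{\mathcal{F}_k} = \Psi^{w\nu^h}_k = \prod_{l=1}^{k} \varphi^{w\nu^h}_l$, where
$$
\varphi^{w\nu^h}_l = \frac{\exp\left( -\frac{1}{2\left(\sigma^h(y^w_{l-1})\right)^2} \left( G_l - \widehat{\nu}^h(y^w_{l-1}) - G_{l-1} \right)^2 \right)}{\exp\left( -\frac{1}{2\left(\sigma^h(y^w_{l-1})\right)^2} \left( G_l - \nu^h(y^w_{l-1}) - G_{l-1} \right)^2 \right)}.
$$
Thus,
$$
\begin{aligned}
\log \Psi^{w\nu^h}_k
&= \sum_{l=1}^{k} \left[ \frac{\widehat{\nu}^h(y^w_{l-1}) \left( 2 G_l - 2 G_{l-1} - \widehat{\nu}^h(y^w_{l-1}) \right)}{2 \left( \sigma^h(y^w_{l-1}) \right)^2} + R\left( \nu^h(y^w_{l-1}) \right) \right] \\
&= \sum_{l=1}^{k} \sum_{t=1}^{N} \left[ \frac{\langle y^w_{l-1}, e_t \rangle}{2 \left( \sigma^h_t \right)^2}\, \widehat{\nu}^h(y^w_{l-1}) \left( 2 G_l - 2 G_{l-1} - \widehat{\nu}^h(y^w_{l-1}) \right) + R\left( \nu^h_t \right) \right],
\end{aligned}
$$
where $R\left( \nu^h_t \right)$ is a $\widehat{\nu}^h_t$-free expression. With equations (5.20) and (5.22), we consider the expectation of the log likelihood, i.e., $L(\widehat{\nu}^h) = E\left[\log \Psi^{w\nu^h}_k \mid \mathcal{F}_k\right]$, where
$$
E\left[\log \Psi^{w\nu^h}_k \mid \mathcal{F}_k\right] = E\left[ \sum_{l=1}^{k} \sum_{t=1}^{N} \frac{\langle y^w_{l-1}, e_t \rangle}{2 \left( \sigma^h_t \right)^2}\, \widehat{\nu}^h(y^w_{l-1}) \left( 2 G_l - 2 G_{l-1} - \widehat{\nu}^h(y^w_{l-1}) \right) \Big|\, \mathcal{F}_k \right] + R\left( \nu^h_t \right).
$$
By differentiating $L(\widehat{\nu}^h)$ with respect to $\widehat{\nu}^h$ and setting the result to 0, we get
$$
\widehat{\nu}^h_t = \frac{\widehat{C}^t_k}{\widehat{B}^t_k} = \frac{\zeta^w_k\left( C^t_k(f^h)\, \eta(y^w_k, y^w_{k-1}) \right)}{\zeta^w_k\left( B^t_k\, \eta(y^w_k, y^w_{k-1}) \right)}.
$$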
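Setting the derivative of the weighted sum of $\widehat{\nu}(2G_l - 2G_{l-1} - \widehat{\nu})$ to zero gives a weighted mean of the log-price increments. The sketch below illustrates that reading. The weights are a hypothetical stand-in for the filtered regime indicators, and the function name `nu_hat` and the price series are assumptions made for the example; the thesis computes the ratio through the recursive filters $\zeta^w_k$.

```python
import math


def nu_hat(prices, weights):
    """Weighted mean of log-price increments G_l - G_{l-1}, with G = log F.

    prices  : futures prices F_0, ..., F_k
    weights : weights[l - 1] is the regime weight for the increment ending at step l
    """
    increments = [math.log(b / a) for a, b in zip(prices, prices[1:])]
    return sum(w * g for w, g in zip(weights, increments)) / sum(weights)


# Hypothetical prices with log-increments 0.1 and 0.2.
prices = [100.0, 100.0 * math.exp(0.1), 100.0 * math.exp(0.3)]
estimate = nu_hat(prices, [1.0, 3.0])
```

With equal weights the estimate is the plain average of the increments; unequal weights tilt it toward the steps most likely spent in the regime.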

E.2 Optimal estimate for σh


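Define a new measure $\widehat{P}^{\Theta^w}$ via $\left. \dfrac{d\widehat{P}^{\Theta^w}}{dP^{\Theta^w}} \right|_{\mathcal{F}_k} = \Psi^{w\sigma^h}_k = \prod_{l=1}^{k} \varphi^{w\sigma^h}_l$, where
$$
\varphi^{w\sigma^h}_l = \frac{\sigma^h(y^w_{l-1}) \exp\left( -\dfrac{\left( G_l - \nu^h(y^w_{l-1}) - G_{l-1} \right)^2}{2 \left( \widehat{\sigma}^h(y^w_{l-1}) \right)^2} \right)}{\widehat{\sigma}^h(y^w_{l-1}) \exp\left( -\dfrac{\left( G_l - \nu^h(y^w_{l-1}) - G_{l-1} \right)^2}{2 \left( \sigma^h(y^w_{l-1}) \right)^2} \right)}.
$$
Consequently,
$$
\begin{aligned}
\log \Psi^{w\sigma^h}_k
&= \sum_{l=1}^{k} \left[ \log\left( \frac{1}{\widehat{\sigma}^h(y^w_{l-1})} \right) - \frac{\left( G_l - \nu^h(y^w_{l-1}) - G_{l-1} \right)^2}{2 \left( \widehat{\sigma}^h(y^w_{l-1}) \right)^2} + R\left( \sigma^h(y^w_{l-1}) \right) \right] \\
&= \sum_{l=1}^{k} \sum_{t=1}^{N} \langle y^w_{l-1}, e_t \rangle \left( \log\left( \frac{1}{\widehat{\sigma}^h(y^w_{l-1})} \right) - \frac{\left( G_l - \nu^h(y^w_{l-1}) - G_{l-1} \right)^2}{2 \left( \widehat{\sigma}^h(y^w_{l-1}) \right)^2} \right) + R\left( \sigma^h_t \right),
\end{aligned}
$$
where $R\left( \sigma^h_t \right)$ is independent of $\widehat{\sigma}^h_t$. From equations (5.20) and (5.22), we have
$$
E\left[\log \Psi^{w\sigma^h}_k \mid \mathcal{F}_k\right] = E\left[ \sum_{l=1}^{k} \sum_{t=1}^{N} \langle y^w_{l-1}, e_t \rangle \left( \log\left( \frac{1}{\widehat{\sigma}^h(y^w_{l-1})} \right) - \frac{\left( G_l - \nu^h(y^w_{l-1}) - G_{l-1} \right)^2}{2 \left( \widehat{\sigma}^h(y^w_{l-1}) \right)^2} \right) \Big|\, \mathcal{F}_k \right] + R\left( \sigma^h_t \right).
$$
When the above expression is differentiated with respect to $\widehat{\sigma}^h$ and the resulting derivative is set to 0, the optimal estimate is
$$
\widehat{\sigma}^h_t = \left( \frac{\zeta^w_k\left( C^t_k\left( (f^h)^2 \right) \eta(y^w_k, y^w_{k-1}) \right) - 2 f^h_t\, \zeta^w_k\left( C^t_k(f^h)\, \eta(y^w_k, y^w_{k-1}) \right)}{\zeta^w_k\left( B^t_k\, \eta(y^w_k, y^w_{k-1}) \right)} + \frac{\left( f^h_t \right)^2 \zeta^w_k\left( B^t_k\, \eta(y^w_k, y^w_{k-1}) \right)}{\zeta^w_k\left( B^t_k\, \eta(y^w_k, y^w_{k-1}) \right)} \right)^{\frac{1}{2}}.
$$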

E.3 Optimal estimate for ptsr


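To define a new measure $\widehat{P}^{\Theta^w}$, consider the Radon-Nikodym derivative $\left. \dfrac{d\widehat{P}^{\Theta^w}}{dP^{\Theta^w}} \right|_{\mathcal{F}_k} = \Lambda^p_k$, where
$$
\Lambda^p_k = \prod_{l=2}^{k} \prod_{t,s,r=1}^{N} \left( \frac{\widehat{p}_{tsr}}{p_{tsr}} \right)^{\langle y^w_{l-2}, e_r \rangle \langle y^w_{l-1}, e_s \rangle \langle y^w_l, e_t \rangle}.
$$
By equation (5.19),
$$
\begin{aligned}
E\left[\log \Lambda^p_k \mid \mathcal{F}_k\right]
&= E\left[ \sum_{l=2}^{k} \sum_{t,s,r=1}^{N} \log \left( \frac{\widehat{p}_{tsr}}{p_{tsr}} \right)^{\langle y^w_{l-2}, e_r \rangle \langle y^w_{l-1}, e_s \rangle \langle y^w_l, e_t \rangle} \Big|\, \mathcal{F}_k \right] \\
&= E\left[ \sum_{l=2}^{k} \sum_{t,s,r=1}^{N} \left( \log \widehat{p}_{tsr} - \log p_{tsr} \right) \langle y^w_{l-2}, e_r \rangle \langle y^w_{l-1}, e_s \rangle \langle y^w_l, e_t \rangle \Big|\, \mathcal{F}_k \right] \\
&= E\left[ \sum_{t,s,r=1}^{N} \log \widehat{p}_{tsr}\, A^{tsr}_k \Big|\, \mathcal{F}_k \right] + R(p_{tsr}),
\end{aligned}
$$
where $R(p_{tsr})$ is a remainder free of $\widehat{p}_{tsr}$. The constraint $\sum_{t=1}^{N} \widehat{p}_{tsr} = 1$ is dealt with by introducing a Lagrange multiplier $\varrho$ and the function
$$
L\left(\widehat{p}_{tsr}, \varrho\right) = \sum_{t,s,r=1}^{N} \log \widehat{p}_{tsr}\, \widehat{A}^{tsr}_k + \varrho \left( \sum_{t=1}^{N} \widehat{p}_{tsr} - 1 \right) + R(p_{tsr}).
$$
Differentiating $L\left(\widehat{p}_{tsr}, \varrho\right)$ with respect to $\widehat{p}_{tsr}$ and $\varrho$, and then setting the resulting derivatives to 0, we obtain
$$
\frac{1}{\widehat{p}_{tsr}}\, \widehat{A}^{tsr}_k + \varrho = 0.
$$
Noting that $\sum_{t=1}^{N} \widehat{p}_{tsr} = 1$ and $\sum_{t=1}^{N} \widehat{A}^{tsr}_k = \widehat{B}^{sr}_k$, we then have
$$
\sum_{t=1}^{N} \widehat{p}_{tsr} = \frac{\sum_{t=1}^{N} \widehat{A}^{tsr}_k}{-\varrho} = \frac{\widehat{B}^{sr}_k}{-\varrho} = 1.
$$
Hence,
$$
\widehat{p}_{tsr} = \frac{\widehat{A}^{tsr}_k}{\widehat{B}^{sr}_k}.
$$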
Appendix F

Derivation of the d-step ahead forecasting formula in chapter 5

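From equation (5.14), the conditional expectation of the new Markov chain $\eta(y^w_{k+1}, y^w_k)$, given information up to time $k$, is
$$
E\left[ \eta(y^w_{k+1}, y^w_k) \mid \mathcal{F}_k \right] = \mathbf{B}\, \eta(y^w_k, y^w_{k-1}) = \mathbf{B} q_k. \tag{F.1}
$$
Since all non-zero entries in $\mathbf{B}$ are elements of $\mathbf{P}$, equation (F.1) gives $E\left[ y^w_{k+1} \mid \mathcal{F}_k \right] = \mathbf{P} q_k$. The construction of $\mathbf{P}$ along with equation (5.15) leads to
$$
E\left[ \eta(y^w_{k+d}, y^w_{k+d-1}) \mid \mathcal{F}_k \right] = \mathbf{B}^d q_k, \tag{F.2}
$$
and from equation (F.2),
$$
E\left[ y^w_{k+d} \mid \mathcal{F}_k \right] = \mathbf{P} q_{k+d-1} = \mathbf{P} \mathbf{B}^{d-1} q_k. \tag{F.3}
$$
Employing $G^h_{k+d} - G^h_k = \log \dfrac{F^h_{k+d}}{F^h_k}$, and equations (5.10) and (F.3),
$$
E\left[ \log \frac{F^h_{k+d}}{F^h_k} \,\Big|\, \mathcal{F}_k \right] = \left\langle \nu^h, E\left[ y^w_{k+d} \mid \mathcal{F}_k \right] \right\rangle = \left\langle \nu^h, \mathbf{P} \mathbf{B}^{d-1} q_k \right\rangle. \tag{F.4}
$$
As per equation (5.10), $\log \dfrac{F^h_{k+d}}{F^h_k}$ is characterised by a mixture of normal distributions having the density function $\sum_{j,i=1}^{N} \left\langle q_{k+d-1}, e_{ji} \right\rangle \phi\left( G; \nu^h_j, \sigma^h_j \right)$. The 1-step ahead forecast for $F^h_{k+1}$ is then
$$
E\left[ F^h_{k+1} \mid \mathcal{F}_k \right] = F^h_k \sum_{j,i=1}^{N} \left\langle q_k, e_{ji} \right\rangle \exp\left( \nu^h_j + \frac{\left( \sigma^h_j \right)^2}{2} \right). \tag{F.5}
$$
For $d = 2$, assume that $F^h_{k+1} = E\left[ F^h_{k+1} \mid \mathcal{F}_k \right]$. Thus,
$$
E\left[ F^h_{k+2} \mid \mathcal{F}_k \right] = F^h_k \prod_{l=1}^{2} \sum_{j,i=1}^{N} \left\langle \mathbf{B}^{l-1} q_k, e_{ji} \right\rangle \exp\left( \nu^h_j + \frac{\left( \sigma^h_j \right)^2}{2} \right). \tag{F.6}
$$
The d-step ahead forecast for $F^h_{k+d}$ can be obtained straightforwardly through
$$
E\left[ F^h_{k+d} \mid \mathcal{F}_k \right] = F^h_k \prod_{l=1}^{d} \sum_{j,i=1}^{N} \left\langle \mathbf{B}^{l-1} q_k, e_{ji} \right\rangle \exp\left( \nu^h_j + \frac{\left( \sigma^h_j \right)^2}{2} \right) \tag{F.7}
$$
by the principle of mathematical induction.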
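Formula (F.7) can be evaluated with repeated matrix-vector products. The sketch below is one possible implementation under stated assumptions: pair states $(j, i)$ are stored at index `j * n + i` with $j$ the current and $i$ the previous regime, the array `p[t][s][r]` holds the second-order transition probabilities, and the names `build_B` and `forecast` as well as all numerical inputs are hypothetical. It relies only on the standard library.

```python
import math


def build_B(p, n):
    """Assemble the n^2 x n^2 matrix acting on pair-state vectors.

    p[t][s][r] is the probability of moving to t given current state s and
    previous state r.  Pair (j, i) is stored at index j * n + i.
    """
    size = n * n
    B = [[0.0] * size for _ in range(size)]
    for t in range(n):
        for s in range(n):
            for r in range(n):
                B[t * n + s][s * n + r] = p[t][s][r]
    return B


def mat_vec(B, q):
    """Plain matrix-vector product."""
    return [sum(b * x for b, x in zip(row, q)) for row in B]


def forecast(F_k, q_k, B, nu, sigma, d):
    """d-step ahead forecast following the product form of (F.7)."""
    n = len(nu)
    growth = [math.exp(nu[j] + 0.5 * sigma[j] ** 2) for j in range(n)]
    q, value = list(q_k), F_k
    for _ in range(d):
        # inner sum over pair states with the current vector B^{l-1} q_k
        value *= sum(q[j * n + i] * growth[j] for j in range(n) for i in range(n))
        q = mat_vec(B, q)
    return value


# Hypothetical two-regime inputs.
p = [[[0.9, 0.6], [0.3, 0.2]], [[0.1, 0.4], [0.7, 0.8]]]
B = build_B(p, 2)
value = forecast(100.0, [0.25] * 4, B, nu=[0.01, -0.02], sigma=[0.1, 0.3], d=3)
```

When both regimes share the same drift and volatility, the forecast collapses to $F^h_k$ times a single growth factor raised to the power $d$, which gives a convenient consistency check.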



Curriculum Vitae

Name: Heng Xiong

Post-Secondary Education and Degrees:
Ph.D. in Statistics (Financial Modelling), 2014 - 2018
University of Western Ontario, London, ON

M.Sc. in Statistics (Financial Modelling), 2011 - 2014
University of Western Ontario, London, ON

M.Sc. in Engineering Business Management, 2005 - 2006
University of Warwick, Coventry, UK

B.Sc. in Electronic Science and Technology, 2001 - 2005
Huazhong University of Science and Technology, P.R. China

Honours and Awards:
Western Graduate Research Scholarship, 2011 - 2018
Ontario Graduate Scholarship, 2017 - 2018
Graduate Student Teaching Award, 2012 - 2013

Related Work Experience:
Research and Teaching Assistant, 2011 - 2018
University of Western Ontario

Publications:

[1] H. Xiong, R. Mamon, A self-updating model driven by a higher-order hidden Markov
chain for temperature dynamics, Journal of Computational Science, 17 (2016) 47–61.
[2] H. Xiong, R. Mamon, Putting a price tag on temperature, Computational Management
Science, in press (2017). DOI: 10.1007/s10287-017-0291-8.
[3] H. Xiong, R. Mamon, A higher-order Markov chain-modulated model for electricity
spot-price dynamics, submitted to Applied Energy, 2017.
[4] H. Xiong, R. Mamon, Modelling and forecasting futures-prices curves in the Fish Pool
market, submitted to Journal of Economic Dynamics and Control, 2017.
