Management mathematics

R.D. Hewins
MT2076
2014

Undergraduate study in
Economics, Management,
Finance and the Social Sciences

This subject guide is for a 200 course offered as part of the University of London
International Programmes in Economics, Management, Finance and the Social Sciences.
This is equivalent to Level 5 within the Framework for Higher Education Qualifications in
England, Wales and Northern Ireland (FHEQ).
For more information about the University of London International Programmes
undergraduate study in Economics, Management, Finance and the Social Sciences, see:
www.londoninternational.ac.uk
This guide was prepared for the University of London International Programmes by:
R.D. Hewins, MSc, DIC, ARGS, ANCM, Senior Teaching Fellow at Warwick Business School,
University of Warwick, and The Tanaka Business School, Imperial College London
With typesetting and proof-reading provided by:
James S. Abdey, BA (Hons), MSc, PGCertHE, PhD, Department of Statistics, London School of
Economics and Political Science.
This is one of a series of subject guides published by the University. We regret that due to
pressure of work the author is unable to enter into any correspondence relating to, or arising
from, the guide. If you have any comments on this subject guide, favourable or unfavourable,
please use the form at the back of this guide.

University of London International Programmes


Publications Office
Stewart House
32 Russell Square
London WC1B 5DN
United Kingdom
www.londoninternational.ac.uk

Published by: University of London


© University of London 2011
Reprinted with minor revisions in 2014

The University of London asserts copyright over all material in this subject guide except where
otherwise indicated. All rights reserved. No part of this work may be reproduced in any form,
or by any means, without permission in writing from the publisher. We make every effort to
respect copyright. If you think we have inadvertently used your copyright material, please let
us know.
Contents


Preface 1
0.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
0.2 Aims . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
0.3 Learning outcomes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
0.4 Syllabus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
0.5 Overview of topics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
0.6 Essential reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
0.7 Further reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
0.8 Online study resources . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
0.8.1 The VLE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
0.8.2 Making use of the Online Library . . . . . . . . . . . . . . . . . . 7
0.9 How to use the subject guide . . . . . . . . . . . . . . . . . . . . . . . . 7
0.10 Examination advice . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
0.11 Examination technique . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

1 Set theory 11
1.1 Aims of the chapter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.2 Learning outcomes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.3 Essential reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.4 Further reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.5 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.6 Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.7 Subsets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.8 The order of sets: finite and infinite sets . . . . . . . . . . . . . . . . . . 13
1.9 Union and intersection of sets . . . . . . . . . . . . . . . . . . . . . . . . 13
1.10 Differences and complements . . . . . . . . . . . . . . . . . . . . . . . . . 14
1.11 Venn diagrams . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
1.12 Logic analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
1.13 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
1.14 Solutions to activities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22


1.15 A reminder of your learning outcomes . . . . . . . . . . . . . . . . . . . . 24


1.16 Sample examination questions . . . . . . . . . . . . . . . . . . . . . . . . 24
1.17 Guidance on answering the Sample examination questions . . . . . . . . 26

2 Index numbers 31
2.1 Aims of the chapter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
2.2 Learning outcomes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
2.3 Essential reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
2.4 Further reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
2.5 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
2.6 The general approach and notation . . . . . . . . . . . . . . . . . . . . . 32
2.7 Simple index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
2.8 Simple aggregate index . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
2.9 The average price relative index . . . . . . . . . . . . . . . . . . . . . . . 34
2.10 Weighted price relative indices . . . . . . . . . . . . . . . . . . . . . . . . 34
2.10.1 Laspeyres’ (base period weighted) index . . . . . . . . . . . . . . 34
2.10.2 Paasche’s (current period weighted) index . . . . . . . . . . . . . 35
2.10.3 Advantages and disadvantages of Laspeyres’ versus Paasche’s indices 36
2.10.4 Other weights . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
2.11 More complex, ‘ideal’ indices . . . . . . . . . . . . . . . . . . . . . . . . . 36
2.12 Volume indices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
2.13 Index tests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
2.14 Chain-linked index numbers . . . . . . . . . . . . . . . . . . . . . . . . . 37
2.15 Changing a base and linking index series . . . . . . . . . . . . . . . . . . 38
2.16 ‘Deflating’ a series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
2.17 Further worked examples . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
2.18 The practical problems of selecting an appropriate index . . . . . . . . . 42
2.19 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
2.20 Solution to Activity 2.1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
2.21 A reminder of your learning outcomes . . . . . . . . . . . . . . . . . . . . 47
2.22 Sample examination questions . . . . . . . . . . . . . . . . . . . . . . . . 47
2.23 Guidance on answering the Sample examination questions . . . . . . . . 49

3 Trigonometric functions and imaginary numbers 53


3.1 Aims of the chapter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
3.2 Learning outcomes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53


3.3 Essential reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54


3.4 Further reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
3.5 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
3.6 Basic trigonometric definitions and graphs (a reminder) . . . . . . . . . . 54
3.7 Some rules involving trigonometric formulae . . . . . . . . . . . . . . . . 58
3.8 Derivatives and integrals of trigonometric expressions . . . . . . . . . . . 58
3.9 Trigonometric series as expansions . . . . . . . . . . . . . . . . . . . . . . 59
3.10 Other trigonometric functions: reciprocals and inverse functions . . . . . 59
3.11 Complex numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
3.12 Conjugates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
3.13 The Argand diagram . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
3.14 De Moivre’s theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
3.15 A link between exponential expansions, trigonometric functions and imaginary
numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
3.16 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
3.17 Solutions to activities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
3.18 A reminder of your learning outcomes . . . . . . . . . . . . . . . . . . . . 63
3.19 Sample examination questions . . . . . . . . . . . . . . . . . . . . . . . . 64
3.20 Guidance on answering the Sample examination questions . . . . . . . . 65

4 Difference equations 69
4.1 Aims of the chapter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
4.2 Learning outcomes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
4.3 Essential reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
4.4 Further reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
4.5 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
4.6 First order difference equations . . . . . . . . . . . . . . . . . . . . . . . 70
4.7 Behaviour of the solutions . . . . . . . . . . . . . . . . . . . . . . . . . . 71
4.8 Linear second order difference equations . . . . . . . . . . . . . . . . . . 72
4.9 The non-homogeneous second order difference equation . . . . . . . . . . 74
4.10 Coupled difference equations . . . . . . . . . . . . . . . . . . . . . . . . . 77
4.11 Graphing and describing solutions . . . . . . . . . . . . . . . . . . . . . . 77
4.12 Some applications of difference equations . . . . . . . . . . . . . . . . . . 77
4.13 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
4.14 Solutions to activities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
4.15 A reminder of your learning outcomes . . . . . . . . . . . . . . . . . . . . 80


4.16 Sample examination questions . . . . . . . . . . . . . . . . . . . . . . . . 80


4.17 Guidance on answering the Sample examination questions . . . . . . . . 81

5 Differential equations 85
5.1 Aims of the chapter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
5.2 Learning outcomes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
5.3 Essential reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
5.4 Further reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
5.5 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
5.6 First order and first degree differential equations . . . . . . . . . . . . . . 87
5.6.1 Case I: Variables separable . . . . . . . . . . . . . . . . . . . . . . 87
5.6.2 Case II: Homogeneous equations . . . . . . . . . . . . . . . . . . . 88
5.6.3 Case III: Linear equations . . . . . . . . . . . . . . . . . . . . . . 90
5.6.4 Case IV: Other cases . . . . . . . . . . . . . . . . . . . . . . . . . 91
5.7 Second order differential equations . . . . . . . . . . . . . . . . . . . . . 92
5.7.1 Determining the solution constants (e.g. A1 , A2 ) . . . . . . . . . . 93
5.8 Simultaneous differential equations . . . . . . . . . . . . . . . . . . . . . 94
5.9 The behaviour of differential equation solutions . . . . . . . . . . . . . . 96
5.10 Some applications of differential equations . . . . . . . . . . . . . . . . . 97
5.11 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
5.12 A reminder of your learning outcomes . . . . . . . . . . . . . . . . . . . . 97
5.13 Sample examination questions . . . . . . . . . . . . . . . . . . . . . . . . 97
5.14 Guidance on answering the Sample examination questions . . . . . . . . 100

6 Further applications of matrices 107


6.1 Aims of the chapter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
6.2 Learning outcomes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
6.3 Essential reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
6.4 Further reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
6.5 Introduction and review . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
6.6 Input-output economics . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
6.7 Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
6.8 Transition probabilities and Markov chains . . . . . . . . . . . . . . . . . 113
6.9 A note on determinants, eigen values and eigen vectors . . . . . . . . . . 114
6.9.1 Determinants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
6.9.2 Eigen values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115


6.9.3 Eigen vectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115


6.10 Matrices in linear programming . . . . . . . . . . . . . . . . . . . . . . . 116
6.11 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
6.12 A reminder of your learning outcomes . . . . . . . . . . . . . . . . . . . . 117
6.13 Sample examination questions . . . . . . . . . . . . . . . . . . . . . . . . 117
6.14 Guidance on answering the Sample examination questions . . . . . . . . 119

7 Markov chains and stochastic processes 123


7.1 Aims of the chapter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
7.2 Learning outcomes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
7.3 Essential reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
7.4 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
7.5 Some definitions of stochastic processes . . . . . . . . . . . . . . . . . . . 124
7.6 A simple random walk . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
7.6.1 The Gambler’s ruin problem . . . . . . . . . . . . . . . . . . . . . 125
7.6.2 The case of a single absorbing barrier . . . . . . . . . . . . . . . . 126
7.7 Markov chains . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
7.7.1 The Chapman-Kolmogorov equations . . . . . . . . . . . . . . . . 127
7.8 Markov processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
7.8.1 The Poisson process . . . . . . . . . . . . . . . . . . . . . . . . . 127
7.9 Queueing theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
7.10 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
7.11 A reminder of your learning outcomes . . . . . . . . . . . . . . . . . . . . 129
7.12 Sample examination questions . . . . . . . . . . . . . . . . . . . . . . . . 129
7.13 Guidance on answering the Sample examination questions . . . . . . . . 130

8 Stochastic modelling, multivariate models 133


8.1 Aims of the chapter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
8.2 Learning outcomes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
8.3 Essential reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
8.4 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
8.5 Principal component factor analysis . . . . . . . . . . . . . . . . . . . . . 137
8.6 Factor analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142
8.6.1 The factor model . . . . . . . . . . . . . . . . . . . . . . . . . . . 142
8.6.2 Estimation of factor loadings . . . . . . . . . . . . . . . . . . . . . 143
8.6.3 Factor rotation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144


8.6.4 Factor scores . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145


8.6.5 Other estimation methods . . . . . . . . . . . . . . . . . . . . . . 146
8.6.6 A summary of strategy for factor analysis . . . . . . . . . . . . . 146
8.7 Discriminant analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147
8.7.1 Fisher’s discriminant analysis . . . . . . . . . . . . . . . . . . . . 147
8.7.2 Measurement of performance . . . . . . . . . . . . . . . . . . . . . 149
8.7.3 Other points . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149
8.7.4 Stepwise variable selection criteria – Wilks’ lambda . . . . . . . . 150
8.8 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
8.9 A reminder of your learning outcomes . . . . . . . . . . . . . . . . . . . . 155
8.10 Sample examination questions . . . . . . . . . . . . . . . . . . . . . . . . 155
8.11 Guidance on answering the Sample examination questions . . . . . . . . 156

9 Forecasting 157
9.1 Aims of the chapter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
9.2 Learning outcomes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
9.3 Essential reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
9.4 A note on spreadsheets . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
9.5 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
9.6 Classification of forecasts . . . . . . . . . . . . . . . . . . . . . . . . . . . 159
9.6.1 Classification by leadtime . . . . . . . . . . . . . . . . . . . . . . 159
9.6.2 Classification by information used . . . . . . . . . . . . . . . . . . 160
9.7 The requirements of a forecasting exercise . . . . . . . . . . . . . . . . . 161
9.8 The structure of a time series . . . . . . . . . . . . . . . . . . . . . . . . 162
9.9 Decomposition models of time series . . . . . . . . . . . . . . . . . . . . 167
9.9.1 Model estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . 168
9.9.2 Forecasting procedure with EWMAs . . . . . . . . . . . . . . . . 168
9.10 Simple Box-Jenkins (ARIMA) methods . . . . . . . . . . . . . . . . . . . 170
9.10.1 The Box-Jenkins methodology . . . . . . . . . . . . . . . . . . . . 170
9.10.2 Autoregressive models . . . . . . . . . . . . . . . . . . . . . . . . 171
9.10.3 Moving average models . . . . . . . . . . . . . . . . . . . . . . . . 171
9.10.4 Autoregressive moving average models . . . . . . . . . . . . . . . 171
9.10.5 Building a model . . . . . . . . . . . . . . . . . . . . . . . . . . . 175
9.11 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175
9.12 A reminder of your learning outcomes . . . . . . . . . . . . . . . . . . . . 176
9.13 Sample examination questions . . . . . . . . . . . . . . . . . . . . . . . . 176


9.14 Guidance on answering the Sample examination questions . . . . . . . . 178

10 Econometrics, multiple regression and analysis of variance 183


10.1 Aims of the chapter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183
10.2 Learning outcomes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183
10.3 Essential reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 184
10.4 Further reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 184
10.5 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 184
10.6 Formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185
10.7 Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185
10.8 Validation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 186
10.8.1 Econometric criteria . . . . . . . . . . . . . . . . . . . . . . . . . 186
10.8.2 Auto-correlated errors – violation of assumption 4 . . . . . . . . . 187
10.8.3 Multi-collinearity – violation of assumption 6 . . . . . . . . . . . 188
10.8.4 Mis-specification – violation of assumption 8 . . . . . . . . . . . . 189
10.8.5 Error in variables – violation of assumption 7 . . . . . . . . . . . 189
10.9 A case study – Lydia Pinkham’s vegetable compound . . . . . . . . . . . 189
10.10 Analysis of variance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195
10.10.1 Tests of whether all multiple regression coefficients are zero . . . 196
10.10.2 Other uses of ANOVA tables . . . . . . . . . . . . . . . . . . . . 197
10.11 Forecasting using econometric models . . . . . . . . . . . . . . . . . . . 197
10.12 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 198
10.13 A reminder of your learning outcomes . . . . . . . . . . . . . . . . . . . 200
10.14 Sample examination questions . . . . . . . . . . . . . . . . . . . . . . . 200
10.15 Guidance on answering the Sample examination questions . . . . . . . . 201

11 Exploratory data analysis 203


11.1 Aims of the chapter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203
11.2 Learning outcomes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203
11.3 Essential reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203
11.4 Further reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203
11.5 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 204
11.6 A typical data set . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 204
11.7 Graphical methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 205
11.7.1 Box plots . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 206
11.7.2 Scatter plots . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 207


11.7.3 Three-dimensional scatter plots . . . . . . . . . . . . . . . . . . . 209


11.7.4 Other plots within software packages . . . . . . . . . . . . . . . . 209
11.8 Cluster analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209
11.8.1 Examples of clustering in marketing . . . . . . . . . . . . . . . . . 210
11.8.2 Types of clustering procedure . . . . . . . . . . . . . . . . . . . . 210
11.8.3 Hierarchical clustering . . . . . . . . . . . . . . . . . . . . . . . . 211
11.8.4 Distances between binary observations . . . . . . . . . . . . . . . 211
11.8.5 Hierarchical clustering methods . . . . . . . . . . . . . . . . . . . 212
11.8.6 Partitioning or non-hierarchical clustering . . . . . . . . . . . . . 218
11.9 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 220
11.10 A reminder of your learning outcomes . . . . . . . . . . . . . . . . . . . 221
11.11 Sample examination questions . . . . . . . . . . . . . . . . . . . . . . . 222
11.12 Guidance on answering the Sample examination questions . . . . . . . . 225

12 Summary 229

A Sample examination paper 231

B Sample examination paper – Examiners’ commentary 237

Preface

0.1 Introduction
People in business, economics and the social sciences are increasingly aware of the need
to be able to handle a range of mathematical tools. This course is designed to fill this
need by extending the 100 courses in Mathematics and Statistics into several even more
practical and powerful areas of mathematics. It is not just forecasting and index
numbers that have uses. Such things as differential equations and stochastic processes,
for example, do have direct, frequent and practical applications to everyday
management situations.
This course is intended to extend your mathematical ability and interests beyond the
knowledge acquired in earlier 100 courses. Throughout the mathematical and
quantitative courses of the degrees we attempt to emphasise the applications of
mathematics to management problems and decision-making. MT2076 Management
mathematics is no exception. However, you must always recognise the need to ‘walk
before you can run’ and hence new topics sometimes need to be covered in a relatively
detailed mathematical way before the topics’ uses can be emphasised by more
interesting and practical examples. It must be admitted that many good managers are
not very mathematically adept. However, they would be even more inquisitive, more
precise, more accurate in their statements, more selective in their use of data, more
critical of advice given to them, etc. if they had a better grasp of quantitative subjects.
Mathematics is an important tool which all good managers should appreciate.
Many of the topics within this course are extensions of the comparatively simple ideas
covered within your 100 courses in Mathematics and Statistics. Other topics are
fundamentally new. The course therefore both extends and reinforces existing
knowledge and introduces new areas of interest and applications of mathematics in the
ever-widening field of management.

0.2 Aims

To extend your mathematical and statistical ability and interests beyond the
knowledge acquired in your 100 courses in Mathematics and Statistics.

To introduce new areas of interest and applications of mathematics and statistics in
the ever-widening field of management.

To familiarise you with, and make you competent in, dynamic models and
multivariate (as well as univariate) data analysis.


0.3 Learning outcomes


By the end of this course and having completed the Essential reading and activities, you
should:

be able to demonstrate further mathematical and statistical knowledge

be able to apply mathematics at varying levels to aid decision-making

understand how to analyse complex multivariate data sets with the aim of
extracting the important message contained within the huge amount of data which
is often available

be able to construct appropriate models and interpret the results generated (this
will often be a case of understanding the output from a computerised model)

be able to demonstrate the wide applicability of mathematical models while, at the
same time, identifying their limitations and possible misuse

discuss the more technical/mathematical/theoretical side of management
(including finance and economics)

be able to read management/financial journals in the areas of (for example)
management science and operations research, forecasting, financial engineering and
market analysis, economics and econometrics, and so on, with a reasonably good
understanding of the basic mathematical techniques employed therein.

0.4 Syllabus
Logical use of set theory and Venn diagrams.

Index numbers.

Trigonometric functions. Imaginary numbers. (The prime requirement for both of
these topics is the modelling of cyclical dynamics via difference and differential
equations.)

Difference (first and second order) and differential equations (linear, first and
second order). Simultaneous second order equations.

Further applications of matrices: input/output, networks, Markov chains,
transition/product switching matrices.

Simple stochastic processes – including ‘Gambler’s ruin’, ‘Birth and Death’ and
queuing models. Analysis of queues to include expected waiting time and expected
queue length.

Statistical modelling. Analysis of multivariate models. Simple treatment of
construction and interpretation of factor analysis and discriminant analysis models.


Time series analysis. Forecasting techniques (including exponential smoothing,
moving averages, trend and seasonality, simple Box-Jenkins (ARIMA)).

Introduction to econometrics. Multiple regression (including using F tests). Simple
analysis of variance.

Principles of mathematical modelling.

Clustering techniques and appreciation of other models. Data reduction models.
Interpreting various types of scatter plots.
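Many of these syllabus topics reduce to short, concrete calculations. As an illustrative sketch (the figures below are made up, not taken from the guide), the Laspeyres base-period weighted price index listed under the index numbers topic is simply the cost of the base-period basket at current prices, relative to its cost at base prices, scaled to 100:

```python
# Laspeyres (base-period weighted) price index:
#   L = 100 * sum(p1 * q0) / sum(p0 * q0)
# p0, p1: base- and current-period prices; q0: base-period quantities.

def laspeyres(p0, p1, q0):
    """Base-weighted price index for the current period versus the base period."""
    current_cost = sum(p * q for p, q in zip(p1, q0))  # base basket at new prices
    base_cost = sum(p * q for p, q in zip(p0, q0))     # base basket at old prices
    return 100 * current_cost / base_cost

# Two hypothetical goods: prices rise from (2.0, 5.0) to (2.2, 6.0),
# with base-period quantities (10, 4).
p0 = [2.0, 5.0]
p1 = [2.2, 6.0]
q0 = [10, 4]
print(round(laspeyres(p0, p1, q0), 1))  # prints 115.0
```

Chapter 2 develops this alongside the Paasche (current-weighted) and ‘ideal’ indices.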

0.5 Overview of topics


The additional mathematical knowledge acquired in this course, on top of your existing
mathematical learning, will enable you to:

formulate and analyse managerial problems in a mathematical manner

use mathematical logic (via Venn diagrams) to evaluate the sense, or otherwise, of
given data

understand the widely used (and sometimes misused) concept of index numbers

apply difference and differential equations to model and solve dynamic
relationships (for example, in sales, advertising, marketing, pricing, market
analysis, financial markets, etc.)

comprehend the uses of stochastic processes and Markov chains in predicting
distributions for future outcomes

analyse and model a series of observations over time (a ‘time series’) and forecast
ahead (an obviously useful attribute of a good manager)

understand the principles of mathematical models; ask sensible questions about
assumptions, validity of results, etc.

understand how econometric models can be used for analysis of complex economic
relationships

use relatively complex and powerful data reduction techniques in analysing the
typically multidimensional observations available to a manager

analyse multivariate data with a variety of techniques.


The above lists some of the specific knowledge which the successful student will
acquire; the items summarise the main chapters of the guide. Only the topics
of Chapter 3 are not specifically mentioned above as they are really a ‘means to an end’,
i.e. a preliminary chapter on trigonometric functions and imaginary numbers which are
required (among other things) for some of the subsequent chapters on difference and
differential equation analysis. Nonetheless, there are occasionally examination questions
specifically on Chapter 3 material.
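As a small taste of the dynamic models mentioned above (a sketch with invented coefficients, not an example from the guide itself), a first-order linear difference equation y(t+1) = a*y(t) + b can be solved either by direct iteration or via its closed-form solution y(t) = a^t (y(0) - y*) + y*, where y* = b/(1 - a) is the equilibrium; the two should agree:

```python
# First-order linear difference equation y(t+1) = a*y(t) + b,
# solved by iteration and by the closed form (valid for a != 1).

def iterate(a, b, y0, t):
    """Compute y(t) by applying the recurrence t times."""
    y = y0
    for _ in range(t):
        y = a * y + b
    return y

def closed_form(a, b, y0, t):
    """Closed-form solution: y(t) = a**t * (y0 - y_star) + y_star."""
    y_star = b / (1 - a)  # equilibrium (fixed point) of the recurrence
    return a**t * (y0 - y_star) + y_star

# Example: a = 0.5, b = 10, y(0) = 2; since |a| < 1 the path
# converges monotonically towards the equilibrium y* = 20.
a, b, y0 = 0.5, 10.0, 2.0
for t in range(6):
    print(t, iterate(a, b, y0, t))  # y(5) = 19.4375
```

Chapter 4 treats first and second order cases in full, including the oscillatory behaviour that motivates the trigonometric material of Chapter 3.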


Although containing many worked examples and, obviously, covering the whole
syllabus, this subject guide is not intended as a textbook and should not be
treated as such. However, if you use it correctly, it will provide a good indication of
the levels required and typical areas of application. A full understanding and
appreciation comes with practice, however, and, to this end, various texts are
recommended for reading. Many of these texts also have worked examples and exercises
which you should systematically work through.

0.6 Essential reading


In order to use this subject guide fully you should acquire a copy of the following
essential textbooks, some of which tackle the majority of the course while others cover
only specific parts:

Brzezniak, Z. and T. Zastawniak Basic stochastic processes: a course through
exercises. (London: Springer, 1998) [ISBN 9783540761754].

Hanke, J.E. and D.W. Wichern Business forecasting. (Upper Saddle River, NJ:
Pearson Prentice Hall, 2009) ninth edition [ISBN 9780135009338].

Johnson, R.A. and D.W. Wichern Applied multivariate statistical analysis. (Upper
Saddle River, NJ: Pearson Prentice Hall, 2007) sixth edition [ISBN 9780135143506].

A useful introductory textbook which is used in this course is:

Dowling, E.T. Schaum’s easy outline of introduction to mathematical economics.
(New York: McGraw-Hill, 2006) third edition [ISBN 9780071455343].

Detailed reading references in this subject guide refer to the editions of the set
textbooks listed above. New editions of one or more of these textbooks may have been
published by the time you study this course. You can use a more recent edition of any
of the books; use the detailed chapter and section headings and the index to identify
relevant readings. Also check the virtual learning environment (VLE) regularly for
updated guidance on readings.

0.7 Further reading


Please note that as long as you read the Essential reading you are then free to read
around the subject area in any text, paper or online resource. You will need to support
your learning by reading as widely as possible and by thinking about how these
principles apply in the real world. To help you read extensively, you have free access to
the VLE and University of London Online Library (see below).
The following books are a selection of additional texts covering certain aspects of the
course. They vary considerably in level and coverage of material. You should establish
whether they are necessary for you bearing in mind your own mathematical knowledge
and abilities. You are strongly advised to check the nature of these books (via the
internet for example) before contemplating purchasing any of them.


In addition, many of you will have used (and perhaps acquired) some of the
mathematical/statistical texts for earlier 100 courses. Where appropriate, reference is
made to the relevant chapters of these books. They are indicated by an asterisk (*) in
the list below.

Aldenderfer, M.S. and R.K. Blashfield Cluster analysis. (Quantitative Applications
in the Social Sciences). (London: Sage, 1984) [ISBN 9780803923768].

Allen, R.G.D. Index numbers in theory and practice. (Basingstoke: Palgrave
Macmillan, 1982) [ISBN 9780333169162].

* Anthony, M. and N. Biggs Mathematics for economics and finance. (Cambridge:
Cambridge University Press, 2001) [ISBN 9780521559133].

Barnett, R.A. and M.R. Zeigler Essentials of college mathematics for business,
economics, life sciences and social sciences. (Upper Saddle River, NJ: Prentice
Hall, 1994) sixth edition [ISBN 9780023059315]. This book is now hard to find but
it is still useful if you can locate a second-hand copy or find one in a library.

* Booth, D.J. Foundation mathematics. (Upper Saddle River, NJ: Prentice Hall,
1998) third edition [ISBN 9780201342949].

Chatfield, C. The analysis of time series: an introduction. (London: Chapman and
Hall, 2003) sixth edition [ISBN 9781584883173].

Chatfield, C. Problem solving: a statistician’s guide. (London: Chapman and
Hall/CRC, 1995) second edition [ISBN 9780412606304].

Chatfield, C. and A. Collins Introduction to multivariate analysis. (London:
Chapman and Hall, 2000) [ISBN 9780412160400].

Everitt, B.S., S. Landau, M. Leese and D. Stahl Cluster analysis. (London: Hodder
Arnold, 2011) fifth edition [ISBN 9780470749913].

Granger, C.W.J. Forecasting in business and economics. (Oxford: Academic Press
Inc, 1989) second edition [ISBN 9780122951817].

Gujarati, D.N. Basic econometrics. (New York: McGraw-Hill, 2009) fifth edition
[ISBN 9780071276252].

Haeussler, E.F. Jr., R.S. Paul and R.J. Wood Introductory mathematical analysis
for business, economics and the life and social sciences. (Upper Saddle River, NJ:
Prentice Hall, 2010) twelfth edition [ISBN 9780321643728].

Holden, K. and A.W. Pearson Introductory mathematics for economics and business.
(Basingstoke: Palgrave Macmillan, 1992) second edition [ISBN 9780333576496].

Jacques, I. Mathematics for economics and business. (Upper Saddle River, NJ:
Prentice Hall, 2012) seventh edition [ISBN 9780273763567].

Levine, D.M., D. Stephan, T.C. Krehbiel and M.L. Berenson Statistics for
managers using Microsoft Excel. (Upper Saddle River, NJ: Pearson Prentice Hall,
2005) fourth edition. [ISBN 9780131440548].


* Newbold, P., W. Carlson and B. Thorne Statistics for business and economics.
(Upper Saddle River, NJ: Prentice Hall, 2012) seventh edition [ISBN
9780132745659].

* Ostaszewski, A. Mathematics in economics: models and methods. (Oxford:
Blackwell, 1993) [ISBN 9780631180562].

Owen, F. and R. Jones Statistics. (Upper Saddle River, NJ: Financial Times
Prentice Hall, 1994) fourth edition [ISBN 9780273603207].

Pindyck, R.S. and D.L. Rubinfield Econometric models and economic forecasts.
(New York: McGraw-Hill, 2007) fourth edition [ISBN 9780079132925].

0.8 Online study resources


In addition to the subject guide and the Essential reading, it is crucial that you take
advantage of the study resources that are available online for this course, including the
VLE and the Online Library.
You can access the VLE, the Online Library and your University of London email
account via the Student Portal at:
http://my.londoninternational.ac.uk
You should have received your login details for the Student Portal with your official
offer, which was emailed to the address that you gave on your application form. You
have probably already logged in to the Student Portal in order to register. Once you
have registered, you are automatically granted access to the VLE, the Online
Library and your fully functional University of London email account.
If you have forgotten these login details, please click on the ‘Forgotten your password’
link on the login page.

0.8.1 The VLE


The VLE, which complements this subject guide, has been designed to enhance your
learning experience, providing additional support and a sense of community. It forms an
important part of your study experience with the University of London and you should
access it regularly.
The VLE provides a range of resources for EMFSS courses:

Self-testing activities: Doing these allows you to test your own understanding of
subject material.

Electronic study materials: The printed materials that you receive from the
University of London are available to download, including updated reading lists
and references.

Past examination papers and Examiners’ commentaries: These provide advice on
how each examination question might best be answered.


A student discussion forum: This is an open space for you to discuss interests and
experiences, seek support from your peers, work collaboratively to solve problems
and discuss subject material.

Videos: There are recorded academic introductions to the subject, interviews and
debates and, for some courses, audio-visual tutorials and conclusions.

Recorded lectures: For some courses, where appropriate, the sessions from previous
years’ Study Weekends have been recorded and made available.

Study skills: Expert advice on preparing for examinations and developing your
digital literacy skills.

Feedback forms.
Some of these resources are available for certain courses only, but we are expanding our
provision all the time and you should check the VLE regularly for updates.

0.8.2 Making use of the Online Library


The Online Library contains a huge array of journal articles and other resources to help
you read widely and extensively.
To access the majority of resources via the Online Library you will either need to use
your University of London Student Portal login details, or you will be required to
register and use an Athens login:
http://tinyurl.com/ollathens
The easiest way to locate relevant content and journal articles in the Online Library is
to use the Summon search engine.
If you are having trouble finding an article listed in a reading list, try removing any
punctuation from the title, such as single quotation marks, question marks and colons.
For further advice, please see the online help pages:
www.external.shl.lon.ac.uk/summon/about.php

0.9 How to use the subject guide


The chapters of this subject guide follow a similar format and, unless indicated
otherwise, you should tackle the study of each topic in the following way:

1. Read the relevant chapter of the subject guide. You may be referred back to earlier
chapters if you need to refresh particular ideas.

2. Then do the reading from the essential textbooks.

3. Go through the worked examples and then tackle as many problems as possible
yourself. Remember that learning mathematics is best done by
attempting problems, not solely by reading.


In planning the workload associated with the course, you should appreciate that the
chapters of this subject guide are of different lengths and will therefore take a different
amount of time to cover. However, to help your time management the chapters and
topics of the course are converted below into approximate weeks of a typical 30-week
university course.

Chapter 1: 2 weeks
Chapter 2: 2 weeks
Chapter 3: 2 weeks
Chapter 4: 2 weeks
Chapter 5: 3 weeks
Chapter 6: 3 weeks
Chapter 7: 3 weeks
Chapter 8: 3 weeks
Chapter 9: 3 weeks
Chapter 10: 4 weeks
Chapter 11: 2 weeks
Chapter 12: 1 week
TOTAL 30 weeks

0.10 Examination advice


Important: the information and advice given here are based on the examination
structure used at the time this guide was written. Please note that subject guides may
be used for several years. Because of this we strongly advise you to always check both
the current Regulations for relevant information about the examination, and the VLE
where you should be advised of any forthcoming changes. You should also carefully
check the rubric/instructions on the paper you actually sit and follow those instructions.
The course is assessed by a three-hour unseen written examination. Candidates should
answer all eight questions. Questions will often consist of several parts – part marks
will be noted where appropriate on the examination paper. There will be a mixture of
problem-solving and comment-based questions.
A Sample examination paper is provided at the end of this subject guide.
Remember, it is important to check the VLE for:

up-to-date information on examination and assessment arrangements for this course

where available, past examination papers and Examiners’ commentaries for the
course which give advice on how each question might best be answered.

0.11 Examination technique


Examinations are nerve-racking occasions. If you are particularly prone to examination
nerves then it is a good idea to familiarise yourself with the examination situation by
setting yourself three-hour time slots to do questions in time-constrained examination


conditions. Eventually you can use past papers as mock examinations – in the meantime
create one for yourself from questions within a book. This is worthwhile doing!
Remember that even generous Examiners cannot award marks for blank pages! It is
surprising how many students fail to answer enough questions, fail to write comments
when required or fail to give sufficient explanation. All these failings are extremely
costly – make certain you avoid them.
In the Examiners’ marking scheme for quantitative subjects, marks are almost always
awarded for method as well as accuracy. Bear this in mind when tackling problems.
State clearly any assumptions you feel it is necessary to make.
Become very familiar with how to operate your calculator. A calculator may be used
when answering questions on this paper and it must comply in all respects with the
specification given in the Regulations and on your Admissions Notice. The make and
type of machine must be clearly stated on the front cover of the answer book.


Chapter 1
Set theory

1.1 Aims of the chapter


To introduce the alternative common notations for set theory.
To extend your knowledge of sets, beyond the basics which you will already have
encountered, with a greater emphasis on interpretation and logic.
To establish the usefulness of a diagrammatic approach to logic and data summary.

1.2 Learning outcomes


By the end of this chapter, and having completed the Essential reading and activities,
you should be able to:

understand the basic notation and nomenclature of sets especially: unions,
intersections, complements, null sets, subsets, finite and infinite sets, differences of
sets, the order of a set, the universal set
construct a Venn diagram (not always straightforward!) from given relational data
interpret Venn diagrams and set notation and explain their meaning in
non-mathematical (‘everyday’) English
use sets and Venn diagrams to analyse data (for example, to show inconsistencies
and to derive maximum and/or minimum orders of sets).

1.3 Essential reading


Unfortunately, none of the essential textbooks covers this topic and, to the knowledge of
the author, no text covers set theory in a similar manner to the way in which the topic
is covered within this chapter. There are several illustrative examples within this
chapter to show the particular application of sets to management situations, etc. In
addition, you are particularly urged to review past papers.

1.4 Further reading


Some reference to sets is made in Anthony and Biggs (Chapter 2); Barnett and Zeigler
(Appendix A1); Booth (Module 25).

1.5 Introduction
Set theory is a relatively new aspect of mathematics which is now taught at all levels of
education, from primary school upwards. The reason for this is its wide applicability in
denoting and enumerating events and its visual appeal in using Venn diagrams.
Although hidden in new nomenclature and notation, set theory is really only a
combination of logic and enumeration of events. One important use is in probability
theory but its elegance and efficiency in portraying logical associations demands every
student’s attention. Within this course we concentrate on using sets for logic analysis.
There are numerous examples of set theory questions employed for logic analysis within
past examination papers for MT2076 Management mathematics and you are
strongly advised to use some of them at some stage of your preparation.

1.6 Sets
A set is simply a collection of things or objects, of any kind. These objects are called
elements or members of the set. We refer to the set as an entity in its own right and
often denote it by A, B, C or D, etc.
If A is a set and x a member of the set, then we say x ∈ A, i.e. x ‘belongs to’ A. The
symbol ∉ denotes the negation of ∈, i.e. x ∉ A means ‘x does not belong to A’.
The elements of a set, and hence the set itself, are characterised by having one or more
properties that distinguish the elements of the set from those not in the set, for example
if C is the set of non-negative real numbers, then we might use the notation
C = {x | x is a real number and x ≥ 0}
i.e. the set of all x such that x is a real number and non-negative.

Example 1.1 If A is the set of all integers then 4 ∈ A but 6.7 ∉ A.

Since sets are determined by their elements we say that A = B if and only if they have
the same elements.
∅ represents the empty or null set, i.e. a set containing no elements.
The set containing everything is termed the universal set and is usually written as U .

1.7 Subsets
If A and B are two sets and all the elements of A also belong to B then it can be said
that:
A is contained in B
or A is a subset of B
or B contains A.
These expressions are all equivalent and may be symbolically written as A ⊆ B.

1.8 The order of sets: finite and infinite sets
A set is said to be finite if it contains only a finite number of elements; otherwise the
set is an infinite set. The number of elements in a set A is called the order of A and is
denoted by |A|, n(A) or nA.

Example 1.2 The set of all integers is an infinite set.

Example 1.3 The set of days in a week has order 7.

1.9 Union and intersection of sets


The union of two sets A and B is a set containing all the elements in either A or B (or
both)
i.e. A ∪ B = {x | x ∈ A or x ∈ B}.

The intersection of two sets A and B is a set containing all the elements that are both
in A and B
i.e. A ∩ B = {x | x ∈ A and x ∈ B}.

If sets A and B have no elements in common, i.e. A ∩ B = ∅, then A and B are termed
disjoint sets.
The above notation can be extended to the case of a family of sets (for example, Ai,
i = 1, 2, . . . , k). Thus the union of the family is

∪Ai = {x | x ∈ Ai for some i = 1, 2, . . . , k}. (1.1)

The intersection of the family is:

∩Ai = {x | x ∈ Ai for every i = 1, 2, . . . , k}. (1.2)

Example 1.4 If A = {1, 3, 5, 7} and B = {1, 2, 3, 4, 5}, then


A ∪ B = {1, 2, 3, 4, 5, 7} and A ∩ B = {1, 3, 5}.
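These definitions map directly onto Python's built-in set type, which makes it easy to check small examples for yourself. A minimal sketch using the sets of Example 1.4 (the variable names are purely illustrative):

```python
# The sets from Example 1.4
A = {1, 3, 5, 7}
B = {1, 2, 3, 4, 5}

union = A | B          # A ∪ B: elements in A or B (or both)
intersection = A & B   # A ∩ B: elements in both A and B

print(union == {1, 2, 3, 4, 5, 7})  # True
print(intersection == {1, 3, 5})    # True

# Membership and order
print(3 in A)          # True,  i.e. 3 ∈ A
print(6 in A)          # False, i.e. 6 ∉ A
print(len(A))          # 4, the order n(A)
```

The operators `|` and `&` correspond exactly to ∪ and ∩, and `len` gives the order of a finite set.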

Activity 1.1 If A = {a, b, c, d, e, f}, B = {a, e, g, h, j} and C = {b, c, f, g}, what are
the following subsets?

(a) A ∪ B

(b) B ∩ C

(c) A ∩ Bc

(d) A ∩ (B ∪ C).

1.10 Differences and complements
If A and B are sets then the difference set A − B is the set of all elements of A which
do not belong to B.
If B is a subset of A, then A − B is sometimes called the complement of B in A.
When A is the universal set one may simply refer to the complement of B to denote all
things not in B. The complement of a set A is denoted as Ac or Ā.
Note: De Morgan’s Theorems

(A ∩ B)c = Ac ∪ Bc
(A ∪ B)c = Ac ∩ Bc.

The above relationships are most easily confirmed by using a Venn diagram (see below)
to indicate that both sides of the above equations amount to the same areas of the
diagram.
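Besides the Venn diagram argument, De Morgan's theorems are easy to confirm computationally for any particular choice of sets. A short sketch, in which the universal set U and the subsets A and B are arbitrary choices made only for illustration:

```python
# An illustrative universal set and two subsets of it
U = set(range(10))
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}

def complement(S):
    """Complement of S within the universal set U (set difference U - S)."""
    return U - S

# (A ∩ B)c = Ac ∪ Bc
print(complement(A & B) == complement(A) | complement(B))  # True

# (A ∪ B)c = Ac ∩ Bc
print(complement(A | B) == complement(A) & complement(B))  # True
```

Such a check is not a proof, of course, but trying a few choices of A and B is a quick way to convince yourself the identities are plausible before confirming them on a Venn diagram.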

1.11 Venn diagrams


Often the relationships that exist between sets can best be shown using a Venn
diagram. To construct a Venn diagram we let a certain region, usually a rectangle,
represent the universal set. This rectangle is often implied by the constraints of the page
and only in those circumstances where its boundary is important is the rectangle drawn
(see the diagrams below for example). Individual sets are then represented by regions,
often circles, within this rectangle. One can then easily depict intersections, unions,
complements, etc. on the diagram. Figure 1.1 shows Venn diagram examples.

Figure 1.1: Venn diagram examples.

In Figure 1.1 the triple set Venn diagram is particularly useful and can be used to show,
for example, that
n(A ∪ B ∪ C) = n(A) + n(B) + n(C) − n(A ∩ B) − n(A ∩ C) − n(B ∩ C) + n(A ∩ B ∩ C).
However, beware of trying to solve all problems from an equation point of view
(involving perhaps unions, intersections and complements). Many problems are better
tackled from a logical argument. See Examples 1.5 and 1.6 below.
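When the counting-formula route is appropriate, it is mechanical enough to code directly. A sketch of the three-set inclusion-exclusion identity, checked against the survey figures that appear in Example 1.5 below:

```python
def order_of_union(nA, nB, nC, nAB, nAC, nBC, nABC):
    """n(A ∪ B ∪ C) by inclusion-exclusion:
    n(A) + n(B) + n(C) - n(A ∩ B) - n(A ∩ C) - n(B ∩ C) + n(A ∩ B ∩ C)."""
    return nA + nB + nC - nAB - nAC - nBC + nABC

# Figures from the magazine survey of Example 1.5
readers = order_of_union(84, 111, 73, 59, 32, 53, 20)
print(readers)        # 144 people read at least one magazine
print(200 - readers)  # 56 read none, agreeing with answer (c) of that example
```

Note that the pairwise figures fed in must be the inclusive ones (e.g. n(A ∩ B) counts those who read A and B whether or not they also read C), exactly as in the formula.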

Activity 1.2 Construct Venn diagrams involving A, B and C to show each of the
following subsets:

(a) A ∪ (B ∩ Cc)

(b) (A ∪ B ∪ C)c

(c) B ∪ Ac

(d) (A ∪ B) ∩ (B ∪ C).

Example 1.5 A publishing company has three main magazine publications A, B
and C. A market survey of the reading habits of 200 people revealed:

84 read magazine A

111 read magazine B

73 read magazine C

59 read A and B

53 read B and C

32 read A and C

20 read all three magazines.


How many of those people surveyed:

(a) Read just one of the magazines?

(b) Read just two of the magazines?

(c) Read none of the magazines?


This problem can be solved by putting the information into a Venn diagram, as
shown in Figure 1.2.
The number of elements in each region might be calculated in a number of ways.
Perhaps starting from the centre and working outwards is the best idea here. Since
20 people read A, B and C and 32 read A and C then 12 must read A and C but
not B, etc. Hence from the diagram we have the answers:

(a) 13 + 19 + 8 = 40

(b) 39 + 12 + 33 = 84

(c) 56.

Figure 1.2: Venn diagram for Example 1.5.
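The 'centre outwards' calculation used in this example can be mirrored in code; a sketch reproducing the region counts of Figure 1.2 (all figures are those given in the question):

```python
# Survey data from Example 1.5
total = 200
nA, nB, nC = 84, 111, 73
nAB, nBC, nAC = 59, 53, 32
nABC = 20

# Start from the triple intersection and work outwards
AB_only = nAB - nABC                       # read A and B but not C: 39
AC_only = nAC - nABC                       # read A and C but not B: 12
BC_only = nBC - nABC                       # read B and C but not A: 33
A_only = nA - AB_only - AC_only - nABC     # 13
B_only = nB - AB_only - BC_only - nABC     # 19
C_only = nC - AC_only - BC_only - nABC     # 8

just_one = A_only + B_only + C_only        # answer (a): 40
just_two = AB_only + AC_only + BC_only     # answer (b): 84
none = total - just_one - just_two - nABC  # answer (c): 56
print(just_one, just_two, none)            # 40 84 56
```

Each line corresponds to one region of the Venn diagram, so the script is simply the diagram's arithmetic written out in order.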

Activity 1.3 (Slightly more difficult)


An insurance company insures 20,000 businesses against the perils of fire, flood and
storm damage. During a 10-year period 99% of these businesses make no claim at all
against the insurance company. No business claims for more than one type of peril at
a time, but of those businesses that have made one or more claims during the stated
10-year period:

40% have claimed for fire damage

50% have claimed for flood damage

38% have claimed for storm damage

10% have claimed on different occasions for fire and storm damage

15% have claimed on different occasions for storm and flood damage

5% have claimed on different occasions for fire and flood damage.

(a) How many businesses have claimed for all three types of damages (fire, flood
and storm) on separate occasions?

(b) Assuming no business has claimed for the same type of damage more than once,
how many claims in total have been made?

Example 1.6 Of 30 employees in the marketing department of a large
multinational company, 24 are male, 20 have university degrees and 22 have had
experience in other companies.

(a) What is the fewest possible number of male degree holders with no experience
in other companies that might be in the department?

(b) What is the greatest possible number of female employees in the department
who do not have a university degree but have experience in other companies?
From the given information we know there are 6 females, 10 without university
degrees and 8 with no experience in another company. The answer to (a) occurs
when the females ‘use up’ as many ‘degrees’ and ‘no experiences’ as possible, i.e. 6
‘degrees’ and 6 ‘no experiences’. Furthermore, we know that if M = set of males, D
= set of degree holders, E = set with experience in another company then

n(M ∪ D) ≤ 30

and since
n(M ∪ D) = n(M ) + n(D) − n(M ∩ D)
we can say that
n(M ∩ D) ≥ 24 + 20 − 30 = 14.
Similarly, n(M ∩ E) ≥ 16 and n(D ∩ E) ≥ 12. To satisfy these conditions we might
try setting n(M ∩ D ∩ Ec) = 0.

(a) The Venn diagram in Figure 1.3 seems to satisfy all the conditions and hence it
is possible that there are no male degree holders with no experience in other
companies.

(b) Using a similar logic, it is possible that there are 6 females satisfying the
conditions (i.e. do not have a university degree, but have experience in other
companies) as indicated in Figure 1.4.

Figure 1.3: Venn diagram for Example 1.6.

Figure 1.4: Venn diagram for Example 1.6.

We have approached Example 1.6 by a sort of ‘trial and error’ method. See Example
1.7 below and Section 1.16 for examples where we determine possible orders of sets in a
more structured fashion.

Example 1.7 If n(X ∪ Y ∪ Z) = 25; n(X ∩ Y ∩ Z) = 5; n(X ∩ Y) = 8;
n(Y ∩ Z) = 9; n(X ∩ Yc ∩ Zc) = 2 and n(X) = n(Y) = n(Z), what is n(X)?
From the given information we can construct the following Venn diagram in Figure
1.5.
Letting x, y and z be the number of members in the unknown areas we have the
following equations:

(1): 25 = 2 + 3 + 5 + x + y + 4 + z ⇒ x + y + z = 11.

Furthermore since n(X) = n(Y ) = n(Z) we have

10 + x = 12 + y = 9 + x + z

which means that z = 1 and x = y + 2. Substituting in (1), we have

x + x − 2 + 1 = 11 ⇒ x = 6.

Hence n(X) = 2 + 3 + 5 + 6 = 16.

Figure 1.5: Venn diagram for Example 1.7.
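Rather than solving the simultaneous equations by hand, the unknown region orders can also be found by a brute-force search, which is a useful cross-check when practising. A sketch using the known region values of Figure 1.5 (2, 3, 5 and 4) and the constraints stated above:

```python
# Search for region orders x, y, z consistent with Example 1.7's constraints
solutions = []
for x in range(26):
    for y in range(26):
        for z in range(26):
            total_ok = 2 + 3 + 5 + x + y + 4 + z == 25     # n(X ∪ Y ∪ Z) = 25
            equal_orders = 10 + x == 12 + y == 9 + x + z   # n(X) = n(Y) = n(Z)
            if total_ok and equal_orders:
                solutions.append((x, y, z))

print(solutions)      # [(6, 4, 1)], so the solution is unique
x = solutions[0][0]
print(2 + 3 + 5 + x)  # n(X) = 16, agreeing with the algebraic solution
```

The search confirms both the answer and that no other non-negative region orders satisfy the given information.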

Activity 1.4 Of 20 management trainees in a large company, 16 are male, 15 are
graduates and 10 have had at least three years’ experience. Determine:

(a) the minimum number of males with at least three years’ experience

(b) the maximum number of female graduates who have had at least three years’
experience.

1.12 Logic analysis


Sets and Venn diagrams are particularly useful in depicting the interrelationships
between sets and also analysing whether given information makes sense or not. These
ideas are best illustrated by examples.

Example 1.8 A company studied the preferences of 10,000 of its customers for its
products A, B and C. They discovered that 5,010 liked product A, 3,470 liked
product B and 4,820 liked product C. All products were liked by 500 people,
products A and B (and perhaps C) were liked by 1,000 people, products A and C
(and perhaps B) were liked by 840 people and products B and C (and perhaps A)
were liked by 1,410 people.

(a) Draw a Venn diagram to illustrate the above information and show that there
must be an error in the data provided.

(b) If the erroneous data are for those people liking products B and C (and perhaps
A) determine:
i. its correct value if all 10,000 customers like at least one product
ii. upper and lower limits on its value if some customers like none of the
products.

Suggested solution:

(a) Construct the Venn diagram in Figure 1.6. It is often a good idea to start from
the triple intersection and work outwards.
Total customers = 3670 + 340 + 500 + 500 + 1560 + 910 + 3070 = 10550.
Hence, there must be an error in the data.

(b) Let n(B ∩ C) = x, then the Venn diagram becomes that shown in Figure 1.7.
i. If the total customers liking A, B and C = 10,000 then

10000 = 3670 + 500 + 500 + 340 + 2970 − x + x − 500 + 4480 − x = 11960 − x.

Hence x = 1,960.
ii. Viewing the Venn diagram above, each ‘area’ being non-negative requires
x ≤ 2970, x ≥ 500 and x ≤ 4480. Furthermore, as we have seen x must be
at least 1,960 or we have ‘too many customers’. Thus 1960 ≤ x ≤ 2970.

Figure 1.6: Venn diagram for Example 1.8.


Figure 1.7: Venn diagram for Example 1.8.
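The bounds obtained in part (b)(ii) can be double-checked by testing every candidate value of x against the two requirements (every region non-negative, and no more than 10,000 customers). A sketch:

```python
# Region orders from Figure 1.7 as functions of x = n(B ∩ C)
def regions(x):
    return [3670, 340, 500, 500, 2970 - x, x - 500, 4480 - x]

feasible = [x for x in range(0, 5001)
            if all(r >= 0 for r in regions(x))  # every region non-negative
            and sum(regions(x)) <= 10000]       # at most 10,000 customers in total

print(min(feasible), max(feasible))  # 1960 2970, matching part (b)(ii)
```

The list comprehension is just the logical argument of the text: the three non-negativity conditions bound x above, while the customer total (11,960 − x ≤ 10,000) bounds it below.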

Example 1.9 An airline keeps information about its passengers and has noted the
following facts about the services it supplied between London and New York during
a particular week:

(a) The airline only operated two different types of aircraft which they denote by A
and Z.

(b) Travellers on Z always have excess baggage, X, whereas travellers on A
sometimes do not.

(c) Smokers, S, always travel on A and always have excess baggage.

(d) There is no Executive Class, E, travel on Z.

(e) Businessmen, B, always travel Executive Class and never smoke.

(f) Passengers requesting champagne, C, are always businessmen and never have
excess baggage.
Interpret each of the above statements in set notation and hence construct a single
Venn diagram to illustrate the relationships between A, B, C, E, S, X and Z.
The following additional quantitative data are available for the week and route in
question:

(g) 40% of all travellers on the airline used type Z aircraft.

(h) Only 20% of all travelling businessmen did not request champagne.

(i) Businessmen make up 80% of all the executive class travellers.

(j) 150 smokers travelled with the airline. This represents 10% of all the travellers.

(k) There were 160 passengers who requested champagne.


What is the minimum and maximum number of non-smoking, non-executive class
travellers on A aircraft?

Suggested solution:

We have the following:

(a) A ∪ Z = U (the universal set)

(b) Z ⊆ X

(c) S ⊆ (A ∩ X)

(d) E ∩ Z = ∅ (the null or empty set)

(e) B ⊆ E ∩ Sc

(f) C ⊆ B ∩ Xc
[Note: there are alternative ways of depicting the above statements in set notation.]
The Venn diagram, taking into account all the above relationships, might look
something like the one shown in Figure 1.8.
n(S) = 150 = 10% implies that n(travellers) = 1,500.
n(C) = 160 and therefore n(C)/n(B) = 0.8 implies that n(B) = 200.
Hence n(E) = 250 and n(Z) = 600, n(A) = 900.
If S ∩ E = ∅, then n(Sc ∩ Ec ∩ A) = 900 − (250 + 150) = 500 (minimum).
If n(S ∩ E) = 50 (the most since B ∩ S = ∅) then n(Sc ∩ Ec ∩ A) = 550 (maximum).

Figure 1.8: Venn diagram for Example 1.9.

1.13 Summary
This chapter stands largely on its own as a topic. However, the concepts covered are
extremely useful for summarising and depicting interrelated information. The topic is
often thought of as being one of the easiest (and hence most popular from a candidate’s
point of view). However, the translation from English to mathematical
notation/diagrams (and vice versa) is a much underrated skill and needs thorough
practice. The remarks included at the end of Section 1.15 might help you to establish
the boundaries of the syllabus.

What you do not need to know:

Anything about the strict definitions of open sets, closed sets.


The extensions of set theory into group theory, rings, etc.
The definition of sets of rational, irrational, real numbers, etc.
The application of Venn diagrams and set theory in probability theory.

1.14 Solutions to activities


Activity 1.1

(a) {a, b, c, d, e, f, g, h, j}
(b) {g}
(c) {b, c, d, f }
(d) {a, b, c, e, f }.

Activity 1.2

(a) A ∪ (B ∩ C c ) is shown in ‘i)’.


(b) (A ∪ B ∪ C)c is shown in ‘ii)’.
(c) B ∪ Ac is shown in ‘iii)’.
(d) (A ∪ B) ∩ (B ∪ C) is shown in ‘iv)’.

Activity 1.3

(a) In total the number of businesses making claims = 20,000/100 = 200. However, the
Venn diagram below is in ‘percentage of businesses making claims’. We let x be the
number of businesses making claims for all three perils.

Thus 40 + 45 + 13 + x = 100, i.e. 98 + x = 100 and therefore x = 2%, i.e. 4
businesses.

(b) The total number of claims is

[27 + 32 + 15 + 2(3 + 13 + 8) + 3(2)] × 2 = [74 + 2(24) + 6] × 2 = 256.

Activity 1.4

(a) If we let the number who are male with at least three years’ experience be x then
we can construct the Venn diagram below for M (Males) and E (at least three
years’ experience) showing the order of the subsets (in terms of x).
Then, since every subset must have non-negative order, we must have

x ≥ 6, x ≤ 16, x ≥ 0, x ≤ 10.

Hence, in summary, 6 ≤ x ≤ 10. The minimum number of males with at least three
years’ experience is therefore six.

(b) Extending the above Venn diagram to include the set of graduates, G, we can
argue as follows: the number of females is four, each of whom could have had at
least three years’ experience and been graduates. Hence the maximum number of
female graduates who have had at least three years’ experience is four. It can be
checked by the following Venn diagram.
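Both parts of this activity can also be confirmed by brute force over the unknown order x, in the spirit of part (a)'s inequalities. A sketch for part (a):

```python
# Activity 1.4(a): x = number of male trainees with at least three years' experience
total, males, experienced = 20, 16, 10

feasible = [x for x in range(total + 1)
            if males - x >= 0                                  # males without experience
            and experienced - x >= 0                           # experienced females
            and (males - x) + x + (experienced - x) <= total]  # at most 20 trainees

print(min(feasible), max(feasible))  # 6 10: at least six males have the experience
```

The three conditions in the comprehension are exactly the inequalities derived above; the lower bound 6 answers part (a).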


1.15 A reminder of your learning outcomes


By the end of this chapter, and having completed the Essential reading and activities,
you should be able to:

understand the basic notation and nomenclature of sets especially: unions,
intersections, complements, null sets, subsets, finite and infinite sets, differences of
sets, the order of a set, the universal set

construct a Venn diagram (not always straightforward!) from given relational data

interpret Venn diagrams and set notation and explain their meaning in
non-mathematical (‘everyday’) English

use sets and Venn diagrams to analyse data (for example, to show inconsistencies
and to derive maximum and/or minimum orders of sets).

1.16 Sample examination questions


1. (Please note that this is only part of a full examination question.)
The 120 employees of Union City Manufacturing are classified according to whether
or not they are:
skilled, S, or unskilled
female, F , or male
employed on the production line, P , or not.

Group          Number of workforce    % of total salary bill
F              57                     43
P              70                     67
S              18                     20
F ∩ P          21                     26
F ∩ S          7                      8
S ∩ P          8                      4
F ∩ P ∩ S      X (unknown)            Y (unknown)

The table gives the number of employees which fall into each group identified, and
also the percentage of the total salary bill paid to each group.
(a) From this table calculate the number of people (as a function of X) in each of
the eight disjoint subsets which can be logically identified and produce an
appropriate Venn diagram. Similarly produce a fully annotated Venn diagram
for each group’s % of the total salary bill with subset orders as a function of Y .
(6 marks)
(b) Assuming that each subset of the above Venn diagrams has positive and
integer order, determine the smallest possible value for X and the largest
possible value for Y .
(4 marks)
(c) Assuming the values of X and Y determined in (b), which one of the eight
subsets has the lowest salary per person?
(3 marks)

2. A company undertakes a survey of its 120 adult employees and discovers that there
are:
10 unmarried men without degrees
50 married employees
60 employees with degrees
30 unmarried women without degrees
20 women with degrees
15 married women.
(a) Draw a Venn diagram (with W , D, M denoting ‘women’, ‘has degree’ and
‘married’, respectively) in order to determine the maximum and minimum
number of women who are married and have a degree.
(10 marks)
(b) On the assumption that the number of married women with degrees takes its
maximum value, construct a fully annotated Venn diagram (with W , D, M
denoting ‘women’, ‘has degree’ and ‘married’, respectively) to show the order
of each subset.
(4 marks)
(c) Making use of the diagram in (b) above, describe each of the following subsets
in words and state their order:
i. (W ∪ M)c
ii. (Wc ∩ Dc ∩ M)
iii. M ∩ (W ∪ Dc).
(6 marks)

3. An electronic subassembly consists of one of each of three components A, B and C


which are subject to faults. Tests have shown that failures of the subassemblies are

caused by faults in one, two or all three of the components A, B and C. Analysis of
10,000 subassemblies shows that 95% of the subassemblies are free from faults.
Within the remainder there were 350 faulty A components, 250 faulty B
components and 150 faulty C components. Of the subassemblies that failed, 220
were caused by failures in two components only, and (of these 220) 170 had faulty
A components.
(a) Draw a Venn diagram to illustrate the above situation.
(2 marks)
(b) Create an equation for total component breakdowns and hence determine how
many subassemblies tested had:
i. faults in all three components at the same time.
ii. no faulty B or C components?
(8 marks)
(c) For each separate subset of your Venn diagram determine the maximum and
minimum number of faulty assemblies within the subset consistent with all the
information given in the question.
(6 marks)
(d) If the subassembly repairs cost for A, B and C components are respectively
$5, $3 and $2, what are the maximum and minimum possible costs for
repairing all the faulty components of the subassemblies tested which had
faulty B components?
(4 marks)

1.17 Guidance on answering the Sample examination


questions

1. (a) We have the following Venn diagrams for workforce and % of total salary,
respectively.


(b) Since each subset must have positive and integer order then 1 ≤ X ≤ 6 and
1 ≤ Y ≤ 3. [Note the difference between ‘positive’ and ‘nonnegative’.] Hence
the minimum value of X is 1, and the maximum value of Y is 3.
(c) Using the above values for X and Y and evaluating the percentage of salaries
per person for each of the eight subsets we can construct the following table:
Subset Workforce % total salary % salary ÷ workforce
F ∩ P c ∩ Sc 30 12 0.40
P ∩ F c ∩ Sc 42 40 0.95
S ∩ F c ∩ P c 4 11 2.75
F ∩ P ∩ Sc 20 23 1.15
F ∩ S ∩ Pc 6 5 0.83
S ∩ P ∩ Fc 7 1 0.14
F ∩P ∩S 1 3 3.00
F c ∩ P c ∩ Sc 10 5 0.50
Hence the lowest salary per person (in bold) for subsets is S ∩ P ∩ F c (perhaps
strangely).
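The subset arithmetic above follows directly from inclusion-exclusion and can be checked with a short Python sketch. This is illustrative only; it assumes the workforce total of 120 and salary total of 100% implied by the solution table, with X = 1 and Y = 3 as found in part (b):

```python
# Recover the eight disjoint subset orders from the overlapping group totals
# via inclusion-exclusion, then compute % salary per person for each subset.
def disjoint_orders(F, P, S, FP, FS, SP, FPS, total):
    return {
        "F only": F - FP - FS + FPS,
        "P only": P - FP - SP + FPS,
        "S only": S - FS - SP + FPS,
        "F and P only": FP - FPS,
        "F and S only": FS - FPS,
        "S and P only": SP - FPS,
        "F and P and S": FPS,
        "none": total - (F + P + S - FP - FS - SP + FPS),
    }

workforce = disjoint_orders(57, 70, 18, 21, 7, 8, 1, 120)   # X = 1
salary = disjoint_orders(43, 67, 20, 26, 8, 4, 3, 100)      # Y = 3
per_person = {k: round(salary[k] / workforce[k], 2) for k in workforce}
print(min(per_person, key=per_person.get))  # 'S and P only', at 0.14
```

Running this reproduces every row of the table, including the perhaps surprising minimum at S ∩ P ∩ F c.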
2. (a) Drawing a Venn diagram of W , D and M and letting the number of married
women with degrees be x and the number of married men without degrees be
y gives the following Venn diagram.

Since the total number of employees is 120 then:


30 + (20 − x) + x + (15 − x) + y + (35 − y) + (5 + y) + 10 = 120,
i.e. 115 − x + y = 120 and hence y = 5 + x. We can therefore rewrite the orders
entirely in terms of x, as shown below.


Noting that each subset must have non-negative order will produce the result
that 0 ≤ x ≤ 15 and hence the minimum order required is 0 and the maximum
order is 15.
(b) & (c) Setting x = 15 gives the following Venn below and enables us to determine:
i. n(W ∪ M )c = 35
ii. n(W c ∩ Dc ∩ M ) = 20
iii. n(M ∩ (W ∪ Dc )) = 35.
This gives the following Venn diagram.

3. (a) Using the labels of areas (subsets) we obtain the following Venn diagram.

Areas 2 + 4 + 6 = 220 subassemblies.


Areas 2 + 4 = 170 subassemblies.
Hence area 6 = 50 subassemblies.

Letting n(A ∩ B ∩ C) = x, we can then generate the Venn diagram for
subassemblies as shown.

(b) Forming an equation for the total number of broken components we have:

1 × (areas 1 + 5 + 7) + 2 × (areas 2 + 4 + 6) + 3 × (area 3) = 350 + 250 + 150 = 750,

i.e. (280 − x) + 2(220) + 3x = 750. Hence 720 + 2x = 750 and therefore x = 15.

i. 15 subassemblies had all three components faulty.

ii. 9500 + (180 − x) = 9500 + 165 = 9665 subassemblies have no faulty B or C components.
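The component-count argument of part (b) can be restated in a few lines of Python; the sketch below just replays the algebra, using the area labels assumed in the diagram above:

```python
# singles + 2 * (two-fault assemblies) + 3 * (three-fault assemblies)
# must equal the total count of faulty components.
total_components = 350 + 250 + 150   # faulty A, B and C components
faulty_assemblies = 500              # 5% of the 10,000 tested were fault-free
doubles = 220                        # assemblies with exactly two faulty components
# (faulty_assemblies - doubles - x) + 2*doubles + 3*x = total_components
x = (total_components - faulty_assemblies - doubles) // 2
no_faulty_B_or_C = 9500 + (350 - 170 - x)   # fault-free plus A-only assemblies
print(x, no_faulty_B_or_C)  # 15 9665
```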

(c) n(A ∪ B ∪ C)c = 9500 (given).

Area 1: n(A ∩ B c ∩ C c ) = 165 (fixed).

Area 2: 85 ≤ n(A ∩ B ∩ C c ) ≤ 170.

Area 3: n(A ∩ B ∩ C) = 15 (fixed).

Area 4: 0 ≤ n(A ∩ B c ∩ C) ≤ 85.

Area 5: 0 ≤ n(Ac ∩ B c ∩ C) ≤ 85.

Area 6: n(Ac ∩ B ∩ C) = 50 (given).

Area 7: 15 ≤ n(Ac ∩ B ∩ C c ) ≤ 100.

The above may be obtained by looking at the extreme cases, shown below.


(d) Within B areas 3 and 6 are ‘fixed’. Hence the above diagrams depict the
‘worst’ and ‘best’ cost situations for the problem posed, i.e.

max cost = (15)3 + (50)5 + (15)10 + (170)8 = $1,805


min cost = (100)3 + (50)5 + (15)10 + (85)8 = $1,380.

Chapter 2
Index numbers

2.1 Aims of the chapter


To give a good (more than superficial!) understanding of the widely used
techniques of index numbers.

To establish what a wide range of alternative index construction methods are used
and what a wide range of applications they have.

To enable you to ask searching questions about the methodology used by


producers, users and quoters of index numbers.

2.2 Learning outcomes


By the end of this chapter, and having completed the Essential reading and activities,
you should be able to:

understand how index numbers are created and for what reason

work with all the following types of indices: price and quantity, simple, relative and
aggregate, fixed base and chain-based, Paasche and Laspeyres, ideal and non-ideal

create a deflated index

link together indices with different bases

fully interpret the message an index is telling you – this is an underrated skill

choose an appropriate index to summarise a given set of data

understand the advantages and disadvantages of the different index types

appreciate the difficulties involved in choosing the best index for a given situation.

2.3 Essential reading


For this topic, the subject guide is sufficiently detailed and therefore there is no need to
acquire books purely for the sake of index numbers.


2.4 Further reading

Of the texts listed in the introduction to this subject guide, both Jacques (Chapter 3)
and Owen and Jones (Chapter 5) have reasonable coverage of index numbers. However,
any other modern statistical text now tends to have a chapter on this increasingly used
topic. Those particularly interested in the topic could refer to Allen, R.G.D. Index
numbers in theory and practice. (Basingstoke: Palgrave Macmillan, 1982).

2.5 Introduction
In many ways this section of the subject stands apart from the rest. It is a
self-contained topic with little or no overlap with the other chapters in this subject
guide. However, index numbers are an increasingly used and much maligned
phenomenon of the present-day world. All managers should appreciate the uses and
abuses of index numbers. The topic is included in the MT2076 Management mathematics
syllabus both because of its growing importance for managers and because surprisingly
few statistical courses devote sufficient time to it.
Indices are now used to measure a worker’s or company’s performance, the activities
within financial markets, a country’s economic standing, etc. They are even used to
determine the wage levels for certain types of workers.
Although the arithmetic of indices is simple, you will need to exercise great care in
selecting the appropriate index to use and in performing the often tedious calculations
involved. In addition, it is important for you to be able to interpret what (if anything!)
a particular index value or series of index values is telling you.

2.6 The general approach and notation


‘An index’ is a statistical measure designed to show changes in a variable, or group of
related variables, with respect to time. It shows the relative change rather than the
absolute magnitude of change. There are many uses of index numbers (e.g. Financial
Times Shares Index, Dow Jones Shares Index, Retail Price Index (RPI), index of
industrial production, trade weighted depreciation figures, IQs, etc.). They have become
increasingly popular over recent years; however, they should be handled with care since
it is easy to misconstrue their meaning and to make unrealistic assumptions when
calculating them. Index numbers take various forms and vary from simple to complex.
Some of the more common versions are given within this chapter.
It might be useful to indicate the general notation here:

pit = price of ‘commodity’ i in period t


qit = quantity of ‘commodity’ i in period t.

The term ‘commodity’ is used here to indicate the object(s) under consideration – it
might be a car, a television, a basket of food, a week’s worth of labour, etc.


2.7 Simple index


This relates to one single commodity (and hence we may, if we wish, drop the suffix i
here for simplicity):

Simple price index = (p_t / p_0) × 100

where p_0 is the price of the commodity in the ‘base period’ (the period where the index
is set to be 100 and the ‘yardstick’ against which other values are measured).

Example 2.1 A simple index for labour costs per hour:

Year $ per hour ‘Price’ of labour ‘Price’ relative Index (Base 2000 = 100)
2000 5.0 1.00 100.0
2001 5.2 1.04 104.0
2002 5.5 1.10 110.0
2003 6.0 1.20 120.0
2004 6.2 1.24 124.0

Note: The Base (2000 above) is often chosen to be as ‘normal’ as possible (i.e. when
the price is not unduly high or low). The base period should be fairly up-to-date and
consequently is updated periodically.
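The arithmetic of Example 2.1 is easily reproduced; a minimal Python sketch using the data from the table above:

```python
# Simple price index: (p_t / p_0) * 100 with 2000 as the base period.
hourly_rate = {2000: 5.0, 2001: 5.2, 2002: 5.5, 2003: 6.0, 2004: 6.2}
base = hourly_rate[2000]
index = {year: round(100 * rate / base, 1) for year, rate in hourly_rate.items()}
print(index)  # {2000: 100.0, 2001: 104.0, 2002: 110.0, 2003: 120.0, 2004: 124.0}
```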

2.8 Simple aggregate index


This is used for a fixed group of ‘commodities’, say k in number. (The fixed quantities
may be artificial, i.e. not representing actual amounts purchased, etc.)

Simple aggregate price index for period t = (∑_{i=1}^{k} p_{it} / ∑_{i=1}^{k} p_{i0}) × 100.

Example 2.2 A product cost index:

‘Commodity’ input i Quantity 2002 Base ‘prices’ pi0 2008 ‘prices’ pit
Material A 1 kg 60 66
Material B 2 kg 30 48
Labour 3 hours 15 21
Overheads 3 hours 45 60
Total 150 195

Hence the simple aggregate price index for 2008 (Base 2002) is (195/150) × 100 = 130.0.


Note: One of the assumptions of the above index is that the quantities will remain
the same throughout our analysis. This is obviously false in many situations. This
difficulty can be tackled in various ways.

2.9 The average price relative index


In this index ‘commodities’ have equal importance. This has an advantage in that the
index is independent of the quantities; however, the disadvantage is that it takes no
account of the quantities!
Average price relative index for period t = (1/k) ∑_{i=1}^{k} (p_{it}/p_{i0}) × 100.

Example 2.3 Continuing with Example 2.2, the price relatives for the four inputs
are 1.1, 1.6, 1.4 and 1.33, respectively. The average price relative index is therefore

(1.1 + 1.6 + 1.4 + 1.33) × (100/4) = 135.8.

2.10 Weighted price relative indices


These use ‘weights’ w_i and form a weighted average of price relatives:

(∑_{i=1}^{k} w_i (p_{it}/p_{i0}) / ∑_{i=1}^{k} w_i) × 100.

The weight wi for commodity i is a measure of the importance of that commodity in the
overall index. It might literally be the weight of commodity i used or purchased, or the
number of units, total expenditure on that item in some period, etc.

2.10.1 Laspeyres’ (base period weighted) index


Here the relative weights for each item are calculated as the amount spent on each item
in the base year, i.e. pi0 qi0 .
Thus Laspeyres’ price index, named after Ernst Louis Étienne Laspeyres, for period
t is

(∑_{i=1}^{k} p_{i0} q_{i0} · (p_{it}/p_{i0}) / ∑_{i=1}^{k} p_{i0} q_{i0}) × 100 = (∑_{i=1}^{k} p_{it} q_{i0} / ∑_{i=1}^{k} p_{i0} q_{i0}) × 100.


Example 2.4 Continuing with Example 2.3, we might have:

‘Commodity’ input i ‘Weight’ wi = pi0 qi0 Price relative pit /pi0 wi × (pit /pi0 )
Material A 20 1.10 22.0
Material B 20 1.60 32.0
Labour 30 1.40 42.0
Overheads 50 1.33 66.6
Total 120 162.6

Hence Laspeyres’ price index for period t (2008) is

(162.6/120) × 100 = 135.5.
Note: If the quantities in the simple aggregate index correspond to the actual
amount used for those inputs in the base period then the simple aggregate and
Laspeyres’ indices give the same result.

2.10.2 Paasche’s (current period weighted) index

Here the relative weights for each item are calculated as the amount spent on each item
in the current year at base period prices (i.e. pi0 qit ).
Thus Paasche’s price index, named after Hermann Paasche, for period t is

(∑_{i=1}^{k} p_{i0} q_{it} · (p_{it}/p_{i0}) / ∑_{i=1}^{k} p_{i0} q_{it}) × 100 = (∑_{i=1}^{k} p_{it} q_{it} / ∑_{i=1}^{k} p_{i0} q_{it}) × 100.

Example 2.5 Obtaining and using the new weights wi = pi0 qit for the product cost
example:

‘Commodity’ input i ‘Weight’ wi = pi0 qit Price relative pit /pi0 wi × (pit /pi0 )
Material A 30 1.10 33.0
Material B 40 1.60 64.0
Labour 40 1.40 56.0
Overheads 60 1.33 80.0
Total 170 233.0

Hence Paasche’s index for period t is

(233/170) × 100 = 137.1.
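Both weighted indices share the same ‘weighted average of price relatives’ form and differ only in the weights used, which a short Python sketch makes explicit. The data are from Examples 2.4 and 2.5; the small differences from the hand calculations arise because the price relatives are kept exact here rather than rounded:

```python
def weighted_relative_index(weights, relatives):
    """Weighted average of price relatives, scaled so the base period is 100."""
    return 100 * sum(w * r for w, r in zip(weights, relatives)) / sum(weights)

relatives = [66 / 60, 48 / 30, 21 / 15, 60 / 45]   # p_t / p_0 for A, B, labour, overheads
laspeyres = weighted_relative_index([20, 20, 30, 50], relatives)  # base-period value weights
paasche = weighted_relative_index([30, 40, 40, 60], relatives)    # current-period value weights
print(round(laspeyres, 1), round(paasche, 1))  # 135.6 137.1
```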


2.10.3 Advantages and disadvantages of Laspeyres’ versus Paasche’s indices
The following table summarises the main advantages and disadvantages of these two
important classes of indices.

Laspeyres’ index

Advantages:
(a) Weights need calculating only once.
(b) As a consequence of (a), the calculation of the index is faster.

Disadvantages:
(a) Base weights quickly become irrelevant.
(b) The index tends to overstate price increases since the weights are not altered to
allow for movement from expensive items to cheaper ones.

Paasche’s index

Advantages:
(a) Weights are up-to-date and more relevant.

Disadvantages:
(a) Collection of data to calculate the latest weights may prove difficult.
(b) Price changes are under-estimated at times of rising prices.

2.10.4 Other weights


For both Laspeyres’ and Paasche’s indices several ‘weights’ are possible, for example,
use actual base or current period quantities (respectively) instead of outlays, i.e. use
weights qi0 and qit for the Laspeyres’ and Paasche’s price indices instead of pi0 qi0 and
pi0 qit . These weights will produce relative rather than aggregate indices.

2.11 More complex, ‘ideal’ indices


The over- and under-estimation of price changes when using Laspeyres’ and Paasche’s
indices has led to the concept of an ‘ideal’ index number, two of which are:

i. Irving Fisher index:

√[(∑_{i=1}^{k} p_{it} q_{i0} / ∑_{i=1}^{k} p_{i0} q_{i0}) × (∑_{i=1}^{k} p_{it} q_{it} / ∑_{i=1}^{k} p_{i0} q_{it})] × 100

i.e. the geometric mean of the Laspeyres and Paasche indices.

ii. Marshall-Edgeworth index:

(∑_{i=1}^{k} p_{it} (q_{it} + q_{i0})/2 / ∑_{i=1}^{k} p_{i0} (q_{it} + q_{i0})/2) × 100.
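Both ideal indices are straightforward to compute once prices and quantities are known. A minimal Python sketch, using hypothetical three-commodity data (the numbers below are illustrative, not from the guide's own workings):

```python
from math import sqrt

def fisher(p0, pt, q0, qt):
    """Fisher 'ideal' index: geometric mean of the Laspeyres and Paasche indices."""
    laspeyres = sum(p * q for p, q in zip(pt, q0)) / sum(p * q for p, q in zip(p0, q0))
    paasche = sum(p * q for p, q in zip(pt, qt)) / sum(p * q for p, q in zip(p0, qt))
    return 100 * sqrt(laspeyres * paasche)

def marshall_edgeworth(p0, pt, q0, qt):
    """Marshall-Edgeworth index: price change weighted by average quantities."""
    qbar = [(a + b) / 2 for a, b in zip(q0, qt)]
    return 100 * sum(p * q for p, q in zip(pt, qbar)) / sum(p * q for p, q in zip(p0, qbar))

# Hypothetical data: base and current prices, base and current quantities.
p0, pt = [6, 8, 1], [5, 10, 1.5]
q0, qt = [3, 1, 6], [6, 2, 3]
print(round(fisher(p0, pt, q0, qt), 1))              # 102.6
print(round(marshall_edgeworth(p0, pt, q0, qt), 1))  # 101.7
```

Note how the Fisher value (102.6) lies between the Laspeyres (106.25) and Paasche (99.1) values for these data, as the geometric mean must.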


2.12 Volume indices


Strictly speaking, we have concentrated so far only on a price index. For each
formulation of a price index the equivalent volume index may be expressed
mathematically by replacing quantities with prices and vice versa. For example, the RPI
is a price index whereas the Index of Industrial Production is a quantity or volume
index.
We therefore have, for example:

Laspeyres’ (aggregate) volume index = (∑_{i=1}^{k} p_{i0} q_{it} / ∑_{i=1}^{k} p_{i0} q_{i0}) × 100

Paasche’s (aggregate) volume index = (∑_{i=1}^{k} p_{it} q_{it} / ∑_{i=1}^{k} p_{it} q_{i0}) × 100.

2.13 Index tests


The following two tests are often put forward as a means of determining how good an
index is:

i. Time reversal test: the reversal of the time subscripts should produce the
reciprocal of the original index, i.e. if the index calculates a value of 200 for the
period t2 when using a base of t1 , then it should ideally also give a value of 50 for
the index in t1 when using a base of t2 .

ii. Factor reversal test: the product of the price index and the quantity index
should equal the index of total value, i.e.

(∑_{i=1}^{k} p_{it} q_{it} / ∑_{i=1}^{k} p_{i0} q_{i0}) × 100.

Of those covered in this subject guide only the Irving Fisher index satisfies both the
time reversal and factor reversal tests and is considered a truly ‘ideal’ index.

2.14 Chain-linked index numbers


It is often argued that the base period must be regularly updated. As a consequence we
have the chain-based index (very popular in the USA) which calculates the index
required using the previous period as a base.


Thus Laspeyres’ (aggregate) chain price index is

(∑_{i=1}^{k} p_{it} q_{i,t−1} / ∑_{i=1}^{k} p_{i,t−1} q_{i,t−1}) × 100,

whereas Paasche’s (aggregate) chain price index is

(∑_{i=1}^{k} p_{it} q_{it} / ∑_{i=1}^{k} p_{i,t−1} q_{it}) × 100.

Chain indices are particularly useful for period by period comparisons but, when
considering a longer time period, indices with a single base are easier to interpret.
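The relationship between chain links and a fixed-base series is worth seeing numerically. A single-commodity sketch, reusing the labour-cost data of Example 2.1: multiplying successive links recovers the fixed-base index.

```python
def chain_links(prices):
    """Period-on-period links: each period indexed on the previous one (base 100)."""
    return [100 * pt / prev for prev, pt in zip(prices, prices[1:])]

def rebase(links):
    """Multiply the links together to recover a fixed-base series."""
    series = [100.0]
    for link in links:
        series.append(round(series[-1] * link / 100, 1))
    return series

prices = [5.0, 5.2, 5.5, 6.0, 6.2]        # $ per hour, 2000-2004
links = chain_links(prices)
print([round(l, 1) for l in links])       # [104.0, 105.8, 109.1, 103.3]
print(rebase(links))                      # [100.0, 104.0, 110.0, 120.0, 124.0]
```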

2.15 Changing a base and linking index series


Bases are often changed to make them more relevant. As a consequence you are often
faced with the situation of having two or more indices apparently measuring the same
thing but using different base periods. To produce a single index with a common base
the most recent base period is usually chosen to be the base for the combined/linked
series. Index values based on older bases are then adjusted for the change in base (see
references for details). Although one is not mathematically justified in linking the series
with different bases when the weights for individual commodities also change, the loss of
precision is often small and the method acceptable.

2.16 ‘Deflating’ a series


With certain information, particularly when measured in monetary units, one might
wish to adjust the data to produce ‘real’ figures which account for inflation, for example:

real earnings = apparent (nominal) earnings ÷ (RPI/100).
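A deflation sketch with hypothetical earnings figures (division by RPI/100 leaves the base year's value unchanged):

```python
def deflate(nominal, rpi):
    """Convert nominal values to 'real' values by removing RPI inflation."""
    return [round(v / (r / 100), 2) for v, r in zip(nominal, rpi)]

earnings = [20_000, 21_000, 22_500]   # hypothetical apparent (nominal) earnings
rpi = [100, 105, 112]                 # hypothetical RPI, base year first
print(deflate(earnings, rpi))         # [20000.0, 20000.0, 20089.29]
```

Note that the second year's 5% pay rise exactly matches inflation, so real earnings are unchanged.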

Activity 2.1 The table below shows two types of indices calculated over the period
2002 to 2007. The indices are obtained from the total value of output (given in
billions of £) for a particular industrial sector in the UK, and the change in retail
prices (RPI).

Year 2002 2003 2004 2005 2006 2007


Index of output value 121.0 134.4 143.2 149.2 152.8 161.2
(Base = 1997)
RPI 100 105 110 115 122 128

(a) Calculate a new index (with base year 2002) of the value of production output
excluding the inflationary effects.


(b) In which year did the output value show the greatest annual percentage
increase?
2.17 Further worked examples

Example 2.6 The following data represent the prices per unit of three different
commodities in 2000 and 2005 and the total value of purchases in those years:

Purchases (i.e. pt qt ) Prices pt


Commodity 2000 2005 2000 2005
A 18 30 6 5
B 8 20 8 10
C 6 4.5 1 1.5

You are asked to construct price indices using (a) Paasche and (b) Laspeyres.
[First note that here, and occasionally subsequently, we have dropped the
commodity suffix ‘i’ and used a shorter notation for the summation over i too.] Since
the question refers to expenditure on the three commodities, the weights are, in
effect, value weights and hence must be multiplied by the price relatives, not just the
prices in the two years. We therefore have:

Commodity p0 (2000) pt (2005) w0 (2000) wt (2005) pt /p0 w0 · pt /p0 wt · pt /p0
A 6 5 18 30 0.833 15 25
B 8 10 8 20 1.25 10 25
C 1 1.5 6 4.5 1.50 9 6.75
Total — — 32 54.5 — 34 56.75

(a) Paasche’s index is

(∑ w_t (p_t /p_0) / ∑ w_t) × 100 = (56.75/54.5) × 100 = 104.1.

(b) Laspeyres’ index is

(∑ w_0 (p_t /p_0) / ∑ w_0) × 100 = (34/32) × 100 = 106.25.


Example 2.7 A supplier of office furniture wishes to know if sales in real terms
have increased in the 10-year period 1998–2008. Furthermore, he would like to know
if stock levels of his furniture were justified by the sales figures. The following data
refer to the stock holdings of the supplier’s four main furniture items at the end of
1998 and 2008:

Stock levels
1998 2008
Items Number q0 Value q0 p0 (£) Number qt Value qt pt (£)
Chairs 400 40,000 300 60,000
Cabinets 700 80,000 900 180,000
Desks 140 42,000 200 90,000
Lights 60 30,000 90 60,000

Total sales for 1998 and 2008 were £1,200,000 and £2,400,000, respectively.

(a) Construct a weighted index of the price increases, 2008 as against 1998, for the
four items of stock together.
(b) Calculate using the above index the percentage change of sales in real terms.

Suggested solution:

(a) First, determining the prices for 2008:

Item Number £value Price/unit


Chairs 300 60,000 200
Cabinets 900 180,000 200
Desks 200 90,000 450
Lights 90 60,000 666.67

And hence a Laspeyres’ price index can now be determined:

Item 1998 quantity q0 2008 prices p1 q0 p1 q0 p0 (as given)


Chairs 400 200 80,000 40,000
Cabinets 700 200 140,000 80,000
Desks 140 450 63,000 42,000
Lights 60 666.67 40,000 30,000
Total 323,000 192,000

Hence the index of price increases (1998 = 100) is

(323,000/192,000) × 100 = 168.2.

(b) 1998 sales = £1,200,000 and hence, using the index above, the 1998 sales adjusted
for price increases during the decade = £1,200,000 × 168.2/100 = £2,018,400.
Hence the real increase in sales = £(2,400,000 − 2,018,400) = £381,600, i.e. an
18.9% increase.
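The calculation in (a) can be checked by deriving the unit prices from the value and quantity columns directly. An illustrative Python sketch (working in exact unit prices gives 168.2; small differences from any rounded hand calculation are rounding only):

```python
q0 = [400, 700, 140, 60]                   # 1998 stock quantities
v0 = [40_000, 80_000, 42_000, 30_000]      # 1998 values, i.e. q0 * p0
qt = [300, 900, 200, 90]                   # 2008 stock quantities
vt = [60_000, 180_000, 90_000, 60_000]     # 2008 values, i.e. qt * pt
pt = [v / q for v, q in zip(vt, qt)]       # 2008 unit prices
laspeyres = 100 * sum(p * q for p, q in zip(pt, q0)) / sum(v0)
print(round(laspeyres, 1))  # 168.2
```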


Example 2.8 Every month a company purchases four items in the typical
quantities and at the prices shown below:

Price per unit
Commodity Units Weights March April May
W Kilos 120 45 46 48
X Kilos 50 60 61 62
Y Litres 60 80 70 66
Z Thousand 100 120 130 140

Using March as a base, find for April and May:

(a) The simple aggregate price index.

(b) The weighted aggregate price index.

(c) If in June of the same year commodities W and X are expected to increase by
one per cent per kilo and the price of commodity Z is expected to increase by 10
per cent per thousand, how much must the cost per litre of Y decrease in order
that the weighted aggregate price index for June remains the same as for May?

Suggested solution:

(a) Simple aggregate price index (March = 100). For April:

(∑ p_{i,Apr} / ∑ p_{i,Mar}) × 100 = ((46 + 61 + 70 + 130)/(45 + 60 + 80 + 120)) × 100 = (307/305) × 100 = 100.66.

For May:

((48 + 62 + 66 + 140)/305) × 100 = (316/305) × 100 = 103.61.

(b) A weighted aggregate price index is obtained by using the weights 120, 50, 60
and 100. Although it should be borne in mind that a number of other possible
weighted indices are possible, the following seems reasonable. For April:

(∑_i w_i (p_{i1}/p_{i0}) / ∑_i w_i) × 100 = ([120(46/45) + 50(61/60) + 60(70/80) + 100(130/120)]/330) × 100 = (334.33/330) × 100 = 101.31.

Similarly, the weighted aggregate price index for May is (345.83/330) × 100 = 104.79.

(c) If the index is to remain as before, then

∑_i w_i (p_{i1}/p_{i0})

must remain unchanged, i.e.

345.83 = 120(48.48/45) + 50(62.62/60) + 60(1 − y)(66/80) + 100(154/120)


(here 100y is the percentage decrease in the price of Y ).

Solving for y:

49.5(1 − y) = 345.83 − 129.28 − 52.18 − 128.33 = 36.04,

i.e. y = 0.272, so the May price of Y must fall by 27.2% for June.
[Note: A completely different set of results for (b) and (c) is possible if an
aggregate index of

(∑_i w_i p_{i1} / ∑_i w_i p_{i0}) × 100

is formed. In this case the answers become (b) 102.3 and 106.4 and (c) 19.5%.
This demonstrates the ability to create and use many apparently acceptable
indices.]
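Part (c) is a one-unknown linear equation, and a Python sketch makes the rearrangement explicit (prices and weights from the table in Example 2.8; y is the fractional fall in Y's price):

```python
weights = [120, 50, 60, 100]               # W, X, Y, Z
march = [45, 60, 80, 120]
may = [48, 62, 66, 140]
target = sum(w * p / b for w, p, b in zip(weights, may, march))  # May's weighted sum

# June prices: W and X up 1%, Z up 10%, Y unknown.
june_wxz = [48 * 1.01, 62 * 1.01, 140 * 1.10]
known = sum(w * p / b for w, p, b in zip([120, 50, 100], june_wxz, [45, 60, 120]))

# Solve 49.5 * (1 - y) = target - known for y.
y = 1 - (target - known) / (60 * 66 / 80)
print(round(100 * y, 1))  # 27.2 (% decrease needed in Y's price)
```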

2.18 The practical problems of selecting an appropriate index
In the above notes we have seen some of the very wide range of index numbers that
might be calculated. This is an indication of the complexity one is faced with when
trying to decide upon an appropriate index structure. There are certain key facts which
one must establish before the appropriate index can be chosen. They include:

Why do we require an index? Is it to represent changing prices, changing quantities
or changing expenditures?

Do we prefer a fixed base or an updated base methodology?

What are the costs and time delays of acquiring the data?

How many ‘commodities’ should we include? Which representative ‘commodities’
should we use?

How often should we collect the data?

What weights should we use? How often should they be updated?

Should we deflate the index?


On the next few pages are tables which indicate the sort of complexity involved once
one has decided to construct an index, in this case an index for share prices (e.g. on the
London Stock Exchange). Table 2.1 summarises three of the main indices that have
been used over many years to measure share prices in the stock exchange, namely the
FTSE 100 Index, The FT ‘All Share’ Index and the FT 250 Index. Even if one decides
to use 100 chosen shares, which ones do we choose? The list in 1989 (shown in Table
2.2) was a representative mixture of companies from various industrial/economic
sectors. For comparison, the current (January 2008) list of FTSE 100 companies is
shown in Table 2.3 and you should note how different the list has become. This is not


FTSE 100 FT All Share Index FT 250 Index


Base date 1984 1962 1985
Index Market free-float Market free-float Market free-float
construction capitalisation capitalisation capitalisation
weighted arithmetic weighted arithmetic weighted arithmetic
average average average
Number of 100 683 250
constituents
Market 80–85% of the Approximately 98% Approximately 14–15%
coverage entire UK equity of the entire UK of the entire UK
market equity market equity market
Function Long- and short-term Long-term equity Derivatives, Index
equity market market indicator. tracking funds,
indicator. Measure of Measure of electronic funds
portfolio performance. portfolio performance. transfers and
Creation of Index tracking performance
derivatives, index funds, electronic benchmark.
tracking funds, funds transfers.
electronic funds
transfers.
Calculation Effectively Effectively Whenever a change
frequency continuously (every continuously (every in the price of one
15 seconds 60 seconds of the stock occurs.
09:00–17:00 daily) 09:00–17:00 daily)
and end of day and end of day
Review of Every three months See if you can find See if you can find
constituents in March, June, out – merely for out – merely for
September, December. interest. interest.
Table 2.1: A comparison of the characteristics of some London share indices, as at January
2008.

surprising as the index membership is reviewed every three months. If interested, see
www.ftse.com/Indices/index.jsp for more details. The ‘weights’ that are used are
constantly changing to give more or less importance to certain shares. These tables
constitute one index measure at moments in time. Many of the shares chosen to be
within the index at one time have been replaced with new ones; the weights
(capitalisations) have changed also. Certain companies are no longer so important, some
have been taken over, some new companies have been created by privatisation, etc.
Some companies have moved in or out of the FTSE 100 on several occasions. The
problems are immense.


Builders/Construction Food Retailing Oil & Gas


Blue Circle Industries Argyll Group British Petroleum
BPB Industries ASDA MFI British Gas
English China Clays Sainsbury Burmah Oil
Pilkington Tesco Enterprise Oil
Redland Health & Household LASMO
RMC British Oxygen Shell
Tarmac Fisons Ultramar
Taylor Woodrow Glaxo Banks & Other
Electrical/Electronics ICI Financial
BICC Reckitt & Colman Abbey National
Carlton Communications Smith & Nephew Barclays Bank
GEC SmithKline Beecham Lloyds Bank
Lucas Wellcome Midland Bank
Racal Stores NatWest Bank
Brewers, Distillers and Boots Royal Bank of Scotland
Leisure Burton Standard Chartered
Allied Lyons GUS A TSB
Bass Kingfisher Insurance Companies
Grand Metropolitan Marks & Spencer Commercial Union
Guinness Sears General Accident
Ladbroke Telecommunications Guardian Royal Exchange
Scottish & Newcastle British Telecom Legal & General
Trusthouse Forte Cable & Wireless Prudential
Whitbread A STC Royal Insurance
Misc Holding Property Sun Alliance
Companies Hammerson A Mining Finance
BAA Land Securities Rio Tinto Zinc
BAT Industries MEPC Textiles
British Airways Food Manufacturing Courtaulds
BTR Assoc. British Foods Paper & Packaging
Cookson Cadbury Maxwell Communications
Granada Hillsdown Reed International
Lonrho RHM Group Engineering
Hanson Trust Unilever BET
Pearson United Biscuits British Steel
P&O Deferred British Aerospace
Polly Peck GKN
Rank Organisation Hawker Siddeley
Reuters B Rolls Royce
Rothmans Siebe
Thorn EMI
Trafalgar House
Table 2.2: The industrial/economic sector breakdown of the 100 share in the FTSE 100
in 1989.


Activity 2.2 (Mainly for interest) Compare Table 2.2 with Table 2.3 and note
how few companies were in the FTSE 100 in both 1989 and 2008. Note also the
changing type of company involved.

Activity 2.3 (Mainly for interest) Try to find out what the weightings are for
the 100 companies in the FTSE 100 – the constituents might have changed by the
time you read this subject guide (remember the company list is reviewed every three
months and often several changes occur).

2.19 Summary
What you should know

The subject of index numbers is wide-ranging due to the many alternative indices which
can be created from a data stream – you may come across some extra ones that are not
specifically mentioned within this subject guide. However, this chapter refers to all the
index types you are called upon to understand and use within this course.

What you do not need to know

There is no obligation for you to know how any particular ‘well-known’ index (for
example, the Financial Times 100 or the Dow Jones) has been created. However, it is
important to understand the difficulties in constructing indices that have such aims.

2.20 Solution to Activity 2.1


To produce an index with 2002 as a base we calculate (It /I2002 ) × 100. We then deflate
this series by dividing the series just obtained by the (RPI/100) for the corresponding
year. Since the RPI has a value of 100.0 in 2002 there will be no further adjustments
required:

(a) We have:

Year 2002 2003 2004 2005 2006 2007


Index of Output Value It 121.0 134.4 143.2 149.2 152.8 161.2
(Base = 1997)
RPI 100 105 110 115 122 128
Index of Output Value 100.0 111.07 118.35 123.31 126.28 133.22
(Base = 2002)
Deflated Index series for 100.00 105.79 107.59 107.22 103.51 104.08
Output Value (Base = 2002)
(b) The greatest annual percentage increase is 5.79% between 2002 and 2003.


3i Group G4S Reuters Group


AB Food GlaxoSmithKline Rexam
Admiral Group Hammerson Rio Tinto
Alliance & Leicester HBOS Rolls Royce
AMEC Home Retail Royal & Sun Alliance
Anglo American HSBC Holdings Royal Bank of Scotland
Antofagasta Icap Royal Dutch Shell
Astra Zeneca Imperial Tobacco Group SAB Miller
Aviva Intercontinental Hotels Sage Group
BAE Systems International Power Sainsbury
Barclays ITV Schroders
BG Group Johnson Matthey Scottish & Newcastle
BHP Billiton Kazakhmys Scottish & Southern Energy
BP Kelda Group Severn Trent
British Airways Kingfisher Shire
British American Tobacco Land Securities Group Smith & Nephew
British Energy Legal & General Smith Group
British Land Liberty International Standard Chartered
British Sky Broadcasting Lloyds TSB Group Standard Life
BT Group London Stock Ex. Group Taylor Wimpey
Cable & Wireless Lonmin Tesco
Cadbury Schweppes Man Group Thomas Cook Group
Cairn Energy Marks & Spencer TUI Travel
Capita Group Morrison (WM) Supermarkets Tullow Oil
Carnival Unilever
Carphone Warehouse National Grid United Utilities
Centrica Next Vedanta Resources
Compass Group Old Mutual Vodafone Group
Diageo Pearson Whitbread
Enterprise Inns Persimmon Wolseley
Experian Prudential WPP Group
FirstGroup Reckitt Benckiser Xstrata
Friends Provident Reed Elsevier Yell Group
Rentokil Initial
Resolution
Table 2.3: The January 2008 list of companies in the FTSE 100.


2.21 A reminder of your learning outcomes


By the end of this chapter, and having completed the Essential reading and activities,
you should be able to:

understand how index numbers are created and for what reason

work with all the following types of indices: price and quantity, simple, relative and
aggregate, fixed base and chain-based, Paasche and Laspeyres, ideal and non-ideal

create a deflated index

link together indices with different bases

fully interpret the message an index is telling you – this is an underrated skill

choose an appropriate index to summarise a given set of data

understand the advantages and disadvantages of the different index types

appreciate the difficulties involved in choosing the best index for a given situation.

2.22 Sample examination questions


1. The costs per kilogram of raw material X and Y have been registered from 2000 to
2006 and are reproduced below:

Year Cost per kg X Cost per kg Y


2000 7.00 4.00
2001 7.35 4.20
2002 7.98 4.70
2003 8.61 4.10
2004 9.10 5.10
2005 9.73 5.40
2006 10.43 5.60

The Multimix company has used X and Y in its product XANDY in the
proportions 40:60 by weight throughout the above period.
(a) Produce separate material price indices (Base 2000 = 100) for the raw
materials X and Y .
(4 marks)
(b) Construct a chain-based unlinked index for the raw material X and illustrate
its usefulness by determining the year in which the greatest percentage
increase in the price of X occurred. What is the size of this increase?
(4 marks)

2. Index numbers

(c) Construct an index series (Base 2000 = 100) for the total material cost of
XANDY. Comment upon this series.
(6 marks)
(d) Assuming that the costs of X and Y will continue to increase in the future at a
rate equal to their average rates of increase over the period 2000 to 2006, what
prediction would you give for the XANDY total material cost index in 2008?
(6 marks)

2. (Please note that this question is only part of a full examination question.)
The following table gives indices for share prices on a stock exchange using two
different index methods (the collected share index based in 1985 and the
illustrative share index based in 2005). Also given is an inflation index (Base 1975).

Year ‘Collected’ Index ‘Illustrative’ Index Inflation Index


(Base 1985 = 100) (Base 2005 = 100) (Base 1975 = 100)
2002 145.0 310.0
2003 158.1 315.6
2004 170.2 330.4
2005 188.2 100.0 358.0
2006 116.9 378.8
2007 146.0 383.4
2008 168.0 410.4

Using the above data you are asked to:


(a) Combine the two index series for share prices so that the resultant series has a
common base.
(4 marks)
(b) Produce a series of deflated share prices to indicate whether share prices have
gone up more or less than inflation. In which year was the highest percentage
increase in deflated share prices and what was the value of this increase?
(8 marks)

3. (a) An economic leading indicator is designed to move up or down before the
economy begins to move the same way. Suppose you want to construct a
leading economic indicator. Because of the time and work involved, you decide
to use only four time series. You select the following four series:
unemployment, stock prices, producer prices and exports. Here are the figures
for 1989 and 1991:
unemployment, stock prices, producer prices and exports. Here are the figures
for 1989 and 1991:
Economic Time Series 1989 1991
Unemployment rate (%) 5.3 6.8
Index of stock prices, Standard & Poors 265.88 362.26
(1942 = 100)
Producer Price Index (1984 = 100) 109.6 115.2
Exports ($1,000 millions) 529.9 622.8


Source: US Department of Labour, Monthly Labour Review, May 1991.


The weights you arbitrarily assign are: Unemployment Rate 20%, Stock Prices
40%, Producer Price Index 25% and Exports 15%.
Using 1989 as the base period, construct a leading economic indicator (index
value) for 1991. Interpret your leading indicator.
(6 marks)
(b) Prices of selected foods for 1977 and 1992 are given in the following table:
1977 1992
Item Price Amount Produced Price Amount Produced
Cabbage 6 2000 5 1500
Carrots 10 200 12 200
Peas 20 400 18 500
Lettuce 15 100 15 200
i. Using the Laspeyres’ formula, calculate a weighted index of price for 1992
(1977 = 100).
(4 marks)
ii. Using a Paasche formula, calculate a weighted index of price for 1992.
(4 marks)
iii. Interpret each of the two price indices above and discuss the
appropriateness of each.
(2 marks)
iv. Compute a value index for 1992 (1977 = 100). Interpret.
(4 marks)

2.23 Guidance on answering the Sample examination questions
1.
(a) & (b)

Year Cost per Cost per Index for X Index for Y Chained
kg X kg Y index for X
2000 7.00 4.00 100.0 100.0
2001 7.35 4.20 105.0 105.0 105.0
2002 7.98 4.70 114.0 117.5 108.57
2003 8.61 4.10 123.0 102.5 107.89
2004 9.10 5.10 130.0 127.5 105.69
2005 9.73 5.40 139.0 135.0 106.92
2006 10.43 5.60 149.0 140.0 107.19
Greatest increase in X per year is for 2002 when the rise was 8.57%.


(c)

Year XANDY per kg XANDY Price Index XANDY chained


2000 5.20 100.0
2001 5.46 105.0 105.0
2002 6.01 115.6 110.1
2003 5.90 113.5 98.2
2004 6.70 128.8 113.5
2005 7.13 137.2 106.4
2006 7.53 144.8 105.6
The price of XANDY is increasing other than in year 2003. The largest
increase is in 2004 when it increased by 13.5%.
(d) Average annual percentage increase in X is (149 − 100)/6 = 8.1667% and the
average annual percentage increase in Y is 6.6667%. Hence we would predict
that in 2008, X will cost (10.43) × (1.081667)² = 12.20 per kg and Y will cost
(5.6) × (1.066667)² = 6.37 per kg. Hence XANDY is predicted to cost
(12.20 × 0.4) + (6.37 × 0.6) = 8.70 per kg, i.e. a predicted index of
100 × 8.70/5.20 ≈ 167.3 (Base 2000 = 100).
[Alternatively (and probably more reasonably), using compound increases, the
annual percentage increase in X, say x, is such that (1 + x)⁶ = 149/100, i.e.
x = 1.49^(1/6) − 1 = 0.0687. Similarly we might determine the annual
(compound) increase in Y, say y, as y = 1.40^(1/6) − 1 = 1.0577 − 1 = 0.0577.
Hence in 2008, X will cost (10.43) × (1.0687)² = 11.91 per kg and Y will cost
(5.6) × (1.0577)² = 6.26 per kg. Hence XANDY is predicted to cost
(11.91 × 0.4) + (6.26 × 0.6) = 8.52 per kg, i.e. an index of approximately 163.8.]
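Only a basic calculator is allowed in the examination, but it can be reassuring to replay the part (d) arithmetic on a computer afterwards. The sketch below is a throwaway check, not part of the syllabus; the variable names are invented for illustration:

```python
# Check of the 2008 forecasts for question 1(d), under both growth conventions.
cost_x_2000, cost_x_2006 = 7.00, 10.43
cost_y_2000, cost_y_2006 = 4.00, 5.60

# Simple average annual increase: (index change over six years)/6.
gx = (cost_x_2006 / cost_x_2000 - 1) / 6        # 0.081667 for X
gy = (cost_y_2006 / cost_y_2000 - 1) / 6        # 0.066667 for Y
xandy_simple = (0.4 * cost_x_2006 * (1 + gx) ** 2
                + 0.6 * cost_y_2006 * (1 + gy) ** 2)   # ≈ 8.70 per kg

# Compound (geometric) annual increase: (1 + g)^6 = price ratio.
gx_c = (cost_x_2006 / cost_x_2000) ** (1 / 6) - 1      # ≈ 0.0687
gy_c = (cost_y_2006 / cost_y_2000) ** (1 / 6) - 1      # ≈ 0.0577
xandy_compound = (0.4 * cost_x_2006 * (1 + gx_c) ** 2
                  + 0.6 * cost_y_2006 * (1 + gy_c) ** 2)

print(round(xandy_simple, 2), round(xandy_compound, 2))  # 8.7 8.52
```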

2.
(a) & (b) To combine the two series we multiply the base 1985 values by the conversion
factor 100/188.2 (i.e. the one year of overlap gives a measure of the relative
values of the two indices). By convention we would pick the later of the two
bases for the combined index. Afterwards we would deflate the series by
multiplying by 100/(inflation index) for each year. Then (perhaps) form a new
index series for deflated share prices with 2002 = 100 (by multiplying by
100/24.855). We therefore get the following results:

Year Combined Index Inflation Index Deflated Share Price Deflated Index (Base 2002 = 100)
2002 77.05 310.0 24.855 100.0
2003 84.01 315.6 26.619 107.10
2004 90.44 330.4 27.373 110.13
2005 100.0 358.0 27.933 112.38
2006 116.9 378.8 30.861 124.16
2007 146.0 383.4 38.080 153.21
2008 168.0 410.4 40.936 164.70
The highest percentage increase in deflated share prices occurred in 2007 when
it rose by (153.21 − 124.16)/124.16 = 23.4%. A chain index might show this
even more clearly.
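The splicing-and-deflating recipe above is mechanical enough to script. The snippet below is an illustrative sketch (dictionary names are invented); working in full precision it reproduces the table to within rounding:

```python
# Splice the two share-price indices at their 2005 overlap, then deflate.
collected = {2002: 145.0, 2003: 158.1, 2004: 170.2, 2005: 188.2}     # base 1985
illustrative = {2005: 100.0, 2006: 116.9, 2007: 146.0, 2008: 168.0}  # base 2005
inflation = {2002: 310.0, 2003: 315.6, 2004: 330.4, 2005: 358.0,
             2006: 378.8, 2007: 383.4, 2008: 410.4}                  # base 1975

factor = illustrative[2005] / collected[2005]        # the 100/188.2 conversion
combined = {yr: v * factor for yr, v in collected.items()}
combined.update(illustrative)                        # base 2005 = 100 throughout

deflated = {yr: combined[yr] * 100 / inflation[yr] for yr in combined}
base = deflated[2002]
deflated_index = {yr: round(deflated[yr] * 100 / base, 2) for yr in sorted(deflated)}
print(deflated_index[2007])  # ≈ 153.2 (the table, rounding earlier, shows 153.21)
```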


3. (a) Indicator = Σwᵢ(p_it/p_i0) / Σwᵢ
= 0.2(6.8/5.3) + 0.4(362.26/265.88) + 0.25(115.2/109.6) + 0.15(622.8/529.9)
= 1.2404
i.e. an index of 124.04.
So the leading economic indicator has increased in value from 1 in 1989 to
1.2404 in 1991. Business activity increased 24% from 1989 to 1991.
Least impact is caused by Exports which rose by only 17.5% with weight of
15%.
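A weighted average of price relatives like this is a one-line computation. The sketch below is purely illustrative (the tuple layout is my own); carried in full precision it gives 124.07, agreeing with the guide's 124.04 to rounding:

```python
# Weighted index (base 1989 = 100) for the four-series leading indicator.
# Each entry is (value in 1989, value in 1991, weight); weights sum to 1.
series = {
    "unemployment": (5.3, 6.8, 0.20),
    "stock prices": (265.88, 362.26, 0.40),
    "producer prices": (109.6, 115.2, 0.25),
    "exports": (529.9, 622.8, 0.15),
}
# Weighted average of the relatives p_t / p_0.
indicator = sum(w * (pt / p0) for p0, pt, w in series.values())
print(round(100 * indicator, 2))  # ≈ 124.07, i.e. up about 24% on 1989
```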
(b) i. Laspeyres' price index = (Σp_it q_i0 / Σp_i0 q_i0) × 100
= [(5)(2000) + (12)(200) + (18)(400) + (15)(100)] / [(6)(2000) + (10)(200) + (20)(400) + (15)(100)] × 100 = 89.8

i.e. prices down by 10.2%.


ii. Paasche's price index = (Σp_it q_it / Σp_i0 q_it) × 100
= [(5)(1500) + (12)(200) + (18)(500) + (15)(200)] / [(6)(1500) + (10)(200) + (20)(500) + (15)(200)] × 100 = 91.3

i.e. price down by 8.7%.


iii. Appropriateness depends upon the cost of acquiring the latest quantity data,
the degree of changing tastes/substitution, the desire for accuracy etc.
iv. Value index = (Σp_it q_it / Σp_i0 q_i0) × 100 = 93.2,
i.e. value down 6.8% between 1977 and 1992.
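All three index formulae can be checked in a few lines. This is an illustrative sketch only, with the data laid out as (base price, base quantity, 1992 price, 1992 quantity) tuples of my own choosing:

```python
# Laspeyres, Paasche and value indices for the 1977/1992 food data.
data = [
    (6, 2000, 5, 1500),   # cabbage:  p0, q0, p1, q1
    (10, 200, 12, 200),   # carrots
    (20, 400, 18, 500),   # peas
    (15, 100, 15, 200),   # lettuce
]

# Laspeyres weights by base quantities, Paasche by current quantities.
laspeyres = 100 * sum(p1 * q0 for p0, q0, p1, q1 in data) / sum(p0 * q0 for p0, q0, p1, q1 in data)
paasche = 100 * sum(p1 * q1 for p0, q0, p1, q1 in data) / sum(p0 * q1 for p0, q0, p1, q1 in data)
value = 100 * sum(p1 * q1 for p0, q0, p1, q1 in data) / sum(p0 * q0 for p0, q0, p1, q1 in data)

print(laspeyres, paasche, value)  # ≈ 89.8, 91.25 and 93.2 respectively
```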


Chapter 3
Trigonometric functions and
imaginary numbers

3.1 Aims of the chapter

To indicate how trigonometric functions can be used to model dynamic (or static)
situations where cycles are present.

To explain how imaginary numbers can occur as the solution to certain quadratic
equations.

To establish a relationship between complex numbers, exponential and


trigonometric functions.

To provide a solid mathematical basis for some of the problems encountered when
solving difference or differential equations.

3.2 Learning outcomes


By the end of this chapter, and having completed the Essential reading and activities,
you should be able to:

evaluate the values of trigonometric functions

sketch graphs of the three main trigonometric functions and functions of them

differentiate and integrate functions involving trigonometric functions

use series expansions of trigonometric functions and exponentials

manipulate and use imaginary numbers

use De Moivre’s theorem to interchange between complex numbers and


trigonometric functions

represent complex numbers on an Argand diagram

understand the meaning and usefulness of complex conjugates.


3.3 Essential reading


Dowling, E.T. Schaum’s easy outline of introduction to mathematical economics.
(New York: McGraw-Hill, 2006). Chapters 20.4–20.7.

3.4 Further reading


Booth, D.J. Foundation mathematics. (Upper Saddle River, NJ: Prentice Hall,
1998). Modules 9, 10 and 13.

3.5 Introduction
Sines, cosines and tangents are functions which one learns at school, where they are
mainly taught as a means of solving geometric problems concerning triangles. Although
this is clearly an important application of such trigonometric functions, more important
for a manager and mathematical modeller is the use of such functions in dynamic
relationships (e.g. in describing economic cycles, competitive markets, etc.). These
applications occur because of the cyclical nature of these trigonometric functions. They
are particularly useful in solving certain difference and differential equations but before
embarking upon these important areas (Chapters 4 and 5) we must first learn (or
perhaps simply recall) the basics of trigonometric functions.
Related to this area is the topic of imaginary numbers. It seems strange that a whole
new number system involving the concept of an imaginary number i = √−1 is very
important for modelling and system investigations for economists and management
sciences. However, imaginary numbers are extremely useful in the field of mathematics
and, although it is not the intention of this course to turn you into mathematicians,
they are sufficiently important that their basic ideas and usefulness should be part of
this second/supplementary mathematics course.
Perhaps this chapter is more theoretical in nature than we would initially wish.
However, by means of suitable economic and management models we hope to
demonstrate their usefulness in due course. Furthermore, as already stated, this chapter
is a necessary prerequisite for certain aspects of Chapters 4 and 5 on difference and
differential equations.

3.6 Basic trigonometric definitions and graphs (a reminder)
Consider the following right-angled triangle ABC in Figure 3.1.
The sine (abbreviated sin) of the angle θ i.e. sin θ is defined as a/c.
The cosine (abbreviated cos) of the angle θ i.e. cos θ is defined as b/c and the tangent
(abbreviated tan) of the angle θ i.e. tan θ is defined as a/b.


Figure 3.1: A right-angled triangle to define simple trigonometric functions.

For any angle θ, sin θ is finite and takes values between −1 and +1 (inclusive of these
limiting values). A similar statement holds for cos θ. For tan θ, however, we can have
values anywhere between −∞ and +∞.
The graphs of these trigonometric functions are given in Figures 3.2, 3.3 and 3.4,
respectively.

Figure 3.2: Plot of the sin function.

The angles can be defined in terms of degrees (°) or radians; Figures 3.2 to 3.4 above
use π radians for the horizontal axes. A radian is defined as the angle subtended by an
arc of length 1 in a circle of radius 1. Thus an angle of x radians in a circle of radius r is
subtended by an arc of length rx (see Figure 3.5).
Recognising that the circumference of the circle is 2πr, where π = 3.1416
approximately, then 60◦ = π/3 radians, 90◦ = π/2 radians, 180◦ = π radians and
360◦ = 2π radians etc. Although it is possible to work with either degrees or radians
within this and many other courses involving trigonometric functions, many of the


Figure 3.3: Plot of the cos function.

Figure 3.4: Plot of the tan function.


Figure 3.5: Radians.

application area and texts tend to use radians. This is a practice which this subject
guide will normally follow (although you are perfectly free to use degrees if you prefer).
Values for sin, cos and tan of an angle will be found on all but the most basic
calculators. However, partly (but not entirely) as only a basic calculator is allowed
in the examination, certain values are worth remembering. For example:

Angle θ (radians) Angle (°) sin θ cos θ tan θ
0 0 0 1 0
π/6 30 1/2 √3/2 1/√3
π/4 45 1/√2 1/√2 1
π/3 60 √3/2 1/2 √3
π/2 90 1 0 ∞
π 180 0 −1 0
3π/2 270 −1 0 −∞
2π 360 0 1 0
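Outside the examination you can confirm any entry of the table with Python's standard library; note that `math` works in radians, so degree arguments must first go through `math.radians`. A quick illustrative check (not examinable):

```python
import math

# Spot-checks of the special-angle table; math.sin etc. expect radians.
values = {
    "sin(pi/6)": math.sin(math.pi / 6),        # 1/2
    "cos(pi/3)": math.cos(math.pi / 3),        # 1/2
    "tan(pi/4)": math.tan(math.pi / 4),        # 1
    "sin 30 deg": math.sin(math.radians(30)),  # 1/2 again, via degrees
    "cos(pi)": math.cos(math.pi),              # -1
}
for name, v in values.items():
    print(f"{name} = {v:.6f}")
```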

Activity 3.1 This activity should essentially be revisiting material you have
already covered in your 100 courses. State the values of each of the following
trigonometric functions without the use of a calculator (you may use surds, i.e.
square roots, where necessary):

(a) cos(5π/3), sin(−π/6), tan(7π/3), sin(11π/3), cos(3π/4), sin(7π/6), tan(7π/4)


where the angles are in radians.

(b) cos(135), tan(−45), sin(225), cos(−45), sin(300), tan(420) where the angles are
in degrees.


Activity 3.2 Produce a sketch diagram for each of the following trigonometric
functions:

(a) 3 cos(πt/3), 2 sin(2πt/3) where the angles are in radians.

(b) 3 sin(40t), 4 cos(30t − 45) where the angles are in degrees.


3.7 Some rules involving trigonometric formulae
There are numerous equalities and rules that can be derived from our definitions of sin,
cos and tan. The following are some of the more straightforward and useful:

cos(π/2 − θ) = sin θ
sin2 θ + cos2 θ = 1
cos(α + β) = cos α cos β − sin α sin β
sin(α + β) = cos α sin β + sin α cos β
cos(α − β) = cos α cos β + sin α sin β
sin(α − β) = sin α cos β − cos α sin β.
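If you ever doubt whether you have remembered one of these identities correctly, a numerical spot-check at random angles settles it instantly. This is a rough-and-ready sketch, not a proof:

```python
import math, random

# Numerical spot-check of the identities above at random angles.
random.seed(0)
max_err = 0.0
for _ in range(200):
    a, b = random.uniform(-10, 10), random.uniform(-10, 10)
    max_err = max(
        max_err,
        abs(math.sin(a) ** 2 + math.cos(a) ** 2 - 1),
        abs(math.cos(a + b) - (math.cos(a) * math.cos(b) - math.sin(a) * math.sin(b))),
        abs(math.sin(a + b) - (math.sin(a) * math.cos(b) + math.cos(a) * math.sin(b))),
        abs(math.cos(math.pi / 2 - a) - math.sin(a)),
    )
print(max_err)  # of the order of machine precision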

3.8 Derivatives and integrals of trigonometric expressions
Since the trigonometric functions have been introduced into this course for the sake of
modelling cyclical systems, it should not be surprising that their derivatives are equally
important since we are often required to find optimal values for certain functions. The
basic rules of differentiation apply and the following results can be derived from first
principles if necessary. Bear in mind that an integral can be regarded as the reverse of
differentiation (for example, if d(sin x)/dx = cos x then ∫ cos x dx = sin x).

Function f(x) Derivative df(x)/dx
sin x cos x
cos x −sin x
tan x (cos x)⁻², i.e. sec²x
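The table can be sanity-checked numerically with a central difference quotient, (f(x+h) − f(x−h))/2h. This is a quick illustrative test at one arbitrary point, not a derivation:

```python
import math

# Central-difference check of the derivative table above.
def num_deriv(f, x, h=1e-6):
    return (f(x + h) - f(x - h)) / (2 * h)

x = 0.7  # an arbitrary point where tan x is defined
err_sin = abs(num_deriv(math.sin, x) - math.cos(x))
err_cos = abs(num_deriv(math.cos, x) + math.sin(x))
err_tan = abs(num_deriv(math.tan, x) - math.cos(x) ** -2)  # sec^2 x
print(err_sin, err_cos, err_tan)  # all tiny
```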

Activity 3.3

(a) Determine the differential of each of the following functions.


i. sin(2x)
ii. x cos 2x
iii. 3 sin2 (4x)

(b) Determine the integral of each of the following functions.


i. cos(4x)
ii. tan(x/2)
iii. −3 sin 2x · cos4 2x.

3.9 Trigonometric series as expansions


For approximations and certain solution procedures it is often useful to expand the
trigonometric functions (particularly sin and cos) in a power series:
sin x = x − x³/3! + x⁵/5! − x⁷/7! + · · ·
cos x = 1 − x²/2! + x⁴/4! − x⁶/6! + · · · .
These expansions are always valid in the sense that the right-hand side always
converges to the left-hand side for every real value of x.
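Because the series converges for every real x, a truncated partial sum is already an excellent approximation for moderate x. A minimal sketch comparing a ten-term partial sum of the sine series with `math.sin` (the helper name is my own):

```python
import math

# Partial sums of the sine series versus the library function.
def sin_series(x, terms=10):
    return sum((-1) ** n * x ** (2 * n + 1) / math.factorial(2 * n + 1)
               for n in range(terms))

for x in (0.5, 1.0, 3.0):
    print(x, sin_series(x), math.sin(x))  # agree to many decimal places
```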

3.10 Other trigonometric functions: reciprocals and inverse functions
The following definitions will prove useful when reading certain texts:
sec x = (cos x)⁻¹
cosec x = (sin x)⁻¹
cot x = (tan x)⁻¹.
If sin x = y then x = sin⁻¹y = arcsin(y); if cos x = y then x = cos⁻¹y = arccos(y) and if
tan x = y then x = tan⁻¹y = arctan(y).

3.11 Complex numbers


When solving quadratic equations of the form ax² + bx + c = 0 we know that the two
solutions are

x = [−b − √(b² − 4ac)]/(2a)  and  x = [−b + √(b² − 4ac)]/(2a)

which are real numbers so long as b² − 4ac is non-negative. When b² − 4ac < 0, however,
we can still solve the quadratic if we introduce i = √−1. Thus we have

x = [−b − i√(4ac − b²)]/(2a)  and  x = [−b + i√(4ac − b²)]/(2a)
as the two ‘imaginary’ solutions.
We have thus created a new number system – this is where we might have a
combination of a real number and an imaginary number. This mixed number system is
called complex numbers. The complex number system consists of all expressions of
the form a + ib where a and b are real numbers and i = √−1, as defined earlier. a is
called the real part of the complex number and ib is called the imaginary part. Complex
numbers obey all the usual laws of algebra.
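Python's `cmath` module handles this number system directly: `cmath.sqrt` of a negative number returns a complex result rather than raising an error. A minimal sketch of the quadratic formula extended to complex roots (the function name is my own):

```python
import cmath

# Roots of a*x^2 + b*x + c = 0, allowing the case b^2 - 4ac < 0.
def quadratic_roots(a, b, c):
    disc = cmath.sqrt(b * b - 4 * a * c)   # complex square root
    return (-b - disc) / (2 * a), (-b + disc) / (2 * a)

r1, r2 = quadratic_roots(1, 2, 5)          # x^2 + 2x + 5 = 0
print(r1, r2)                              # (-1-2j) (-1+2j)
```

As the section on conjugates below notes, complex roots of a real quadratic always come in conjugate pairs, which the two printed values illustrate.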


3.12 Conjugates
The (complex) conjugate of the complex number z = a + ib is defined as z̄ = a − ib. We
see that if a certain complex number is the solution to a quadratic equation then the
conjugate complex number is the other solution.

3.13 The Argand diagram
Any single complex number z = a + ib can be represented as a point on a
two-dimensional (2D) graph where the axes are the real and imaginary parts of the
complex number and the coordinates of z are (a , b) (see Figure 3.6). Thus, using our
knowledge of trigonometry we can write a = r cos θ and b = r sin θ and hence

z = r(cos θ + i sin θ).


This is called the trigonometric form of z; r = √(a² + b²) is called the modulus of z
and is written |z|; the angle θ where −π < θ ≤ π is called the argument of z (arg z).

Figure 3.6: The Argand diagram.

3.14 De Moivre’s theorem


This states that
(cos θ + i sin θ)n = cos nθ + i sin nθ.
The theorem also holds for negative as well as positive values of n. Furthermore
(cos θ − i sin θ)n = cos nθ − i sin nθ.

Example 3.1 Suppose we wish to find the real and imaginary parts of zⁿ where
z = a + ib.
First we write z as
(a² + b²)^(1/2) (cos θ + i sin θ)
where θ = arctan(b/a).
Hence zⁿ = (a² + b²)^(n/2) (cos nθ + i sin nθ), i.e. the answer has a real part of
(a² + b²)^(n/2) cos nθ and an imaginary part of (a² + b²)^(n/2) sin nθ.
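Both De Moivre's theorem and the recipe of Example 3.1 are easy to verify numerically with `cmath` (`abs` gives the modulus and `cmath.phase` the argument). An illustrative check, with the angles and numbers chosen arbitrarily:

```python
import cmath, math

# De Moivre: (cos t + i sin t)^n = cos nt + i sin nt.
t, n = 0.4, 7
z = complex(math.cos(t), math.sin(t))
demoivre_err = abs(z ** n - complex(math.cos(n * t), math.sin(n * t)))

# Example 3.1: (a + ib)^m via the polar form r^m (cos m*theta + i sin m*theta).
a, b, m = 2.0, 3.0, 5
r, theta = abs(complex(a, b)), cmath.phase(complex(a, b))
direct = complex(a, b) ** m
via_polar = (r ** m) * complex(math.cos(m * theta), math.sin(m * theta))
print(demoivre_err, abs(direct - via_polar))  # both essentially zero
```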
3.15 A link between exponential expansions,
trigonometric functions and imaginary numbers
It can be shown that the exponential function exp x (or eˣ) can be expanded as

exp x = 1 + x + x²/2! + x³/3! + x⁴/4! + · · ·

for all real x. In a similar fashion, if z is a complex number,

exp z = 1 + z + z²/2! + z³/3! + z⁴/4! + · · ·

and when z = iu this becomes

e^(iu) = 1 + iu + (iu)²/2! + (iu)³/3! + (iu)⁴/4! + · · ·
= 1 + iu − u²/2! − iu³/3! + u⁴/4! + · · ·
= (1 − u²/2! + u⁴/4! − · · ·) + i(u − u³/3! + u⁵/5! − · · ·)
= cos u + i sin u.

Hence we can write a complex number z in the form re^(iθ) and, using De Moivre's
theorem, zⁿ = rⁿ(e^(iθ))ⁿ = rⁿe^(inθ).
[You might note, as an aside, that e^(iπ) = −1. Perhaps some of you will get the same
sense of amazement as the author always does when he sees such an equation relating
two irrational numbers e and π and the square root of minus one!]
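Euler's relation, including the famous special case, can be confirmed to machine precision with `cmath.exp`. A two-line illustrative check:

```python
import cmath, math

# Euler's relation e^(i*theta) = cos(theta) + i*sin(theta), numerically.
theta = 0.75
lhs = cmath.exp(1j * theta)
rhs = math.cos(theta) + 1j * math.sin(theta)
euler_pi = cmath.exp(1j * math.pi)
print(abs(lhs - rhs))   # ~0 (machine precision)
print(euler_pi)         # -1 up to a ~1e-16 imaginary rounding term
```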

3.16 Summary
This chapter has apparently been based on pure mathematics. However its importance
becomes more obvious when the knowledge acquired is used in practical situations. We
will return to trigonometric functions in Chapters 4 and 5.
The fairly extensive coverage of trigonometric functions in this chapter still leaves a lot
of material uncovered.


What you do not need to know

The detailed integration problems concerning sin⁻¹x, cos⁻¹x, tan⁻¹x, etc.

Hyperbolic functions i.e. sinh, cosh and tanh. If they are completely meaningless to
you then don't worry!

3.17 Solutions to activities


3.1 (a) cos(5π/3) = 1/2, sin(−π/6) = −1/2, tan(7π/3) = √3, sin(11π/3) = −√3/2,
cos(3π/4) = −1/√2, sin(7π/6) = −1/2, tan(7π/4) = −1.
(b) cos(135) = −1/√2, tan(−45) = −1, sin(225) = −1/√2, cos(−45) = 1/√2,
sin(300) = −√3/2, tan(420) = √3.

3.2 (a)


(b)

3.3 (a) i. 2 cos(2x).
ii. cos(2x) − 2x sin(2x).
iii. 24 sin(4x) cos(4x).
(b) i. (1/4) sin(4x).
ii. −2 ln | cos(x/2)|.
iii. (3/10) cos5 (2x).

3.18 A reminder of your learning outcomes


By the end of this chapter, and having completed the Essential reading and activities,
you should be able to:

evaluate the values of trigonometric functions

sketch graphs of the three main trigonometric functions and functions of them

differentiate and integrate functions involving trigonometric functions


use series expansions of trigonometric functions and exponentials

manipulate and use imaginary numbers

use De Moivre’s theorem to interchange between complex numbers and


trigonometric functions

represent complex numbers on an Argand diagram

understand the meaning and usefulness of complex conjugates.

3.19 Sample examination questions


(Please note that each of these sample questions is only part of a full
examination question.)

1. The rate of sales, dS/dt, of a product in a market with cyclical demand is modelled
by
dS/dt = 500(1 + sin(πt/10))
where t is measured in weeks.
Determine the total volume of sales of the new product within the first four weeks
using:
(a) direct integration
(6 marks)
(b) series expansion of the sin function up to and including terms in t⁵.
(6 marks)
[Note: You may assume that π = 3.1416.]

2. Find the real and imaginary parts of
(a) (2 + 3i)/(3 + 2i)
(b) 1/i⁵
(c) log_e[½(√3 + i)]
(d) (4 + 3i)e^(iπ/3).
(12 marks)

3. You are given the complex numbers z = 2 + 3i and w = 1 − 4i. Find the real and
imaginary parts of
(a) z − w
(b) zw
(c) z/w
(d) z⁷.
(10 marks)


4. (a) Expand e^(sin x) as a series up to terms in x⁴ and hence evaluate
∫₀^(π/3) e^(sin x) dx.
(7 marks)
[Note: You may assume that π = 3.1416.]
(b) Find the real and imaginary parts of
i. (4 − 3i)/(3 + 2i)
ii. log_e[(1/√2)(1 − i)]
and draw an Argand diagram for your answer to (a).
(7 marks)

3.20 Guidance on answering the Sample examination questions
1. (a) We have
S = ∫₀⁴ 500(1 + sin(πt/10)) dt
= 500[t − (10/π) cos(πt/10)]₀⁴
= 500[4 + (10/π)(1 − cos(4π/10))]
= 500[4 + 3.1831(1 − 0.3090)]
= 500[6.19947]
= 3099.73.
[Note: The answer can be left as a function of cos when only basic calculators
are permitted.]
(b) We have
sin(πt/10) = πt/10 − (1/3!)(πt/10)³ + (1/5!)(πt/10)⁵ − · · · .
Therefore,
S = 500 ∫₀⁴ [1 + πt/10 − π³t³/6000 + π⁵t⁵/12000000 − · · ·] dt
≈ 500[t + πt²/20 − π³t⁴/24000 + π⁵t⁶/72000000]₀⁴
≈ 500[4 + 2.5133 − 0.3307 + 0.0174]
≈ 500[6.2]
= 3100.
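Both routes can be confirmed on a computer; the sketch below evaluates the closed form of part (a) and, as an independent check, a crude midpoint-rule numerical integral of the rate function (the helper name is my own):

```python
import math

# Sales volume over the first four weeks: exact integral vs a Riemann sum.
def rate(t):
    return 500 * (1 + math.sin(math.pi * t / 10))

# Closed form from part (a): 500*[t - (10/pi)*cos(pi*t/10)] evaluated 0 to 4.
exact = 500 * (4 + (10 / math.pi) * (1 - math.cos(0.4 * math.pi)))

# Midpoint-rule numerical integration as an independent check.
n = 100000
h = 4 / n
numeric = sum(rate((i + 0.5) * h) for i in range(n)) * h

print(round(exact, 2))  # 3099.73, as in part (a)
```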


2. (a) We have
(2 + 3i)/(3 + 2i) = [(2 + 3i)/(3 + 2i)] · [(3 − 2i)/(3 − 2i)] = (6 + 9i − 4i + 6)/(9 − 4i²) = (12 + 5i)/13 = 12/13 + (5/13)i.

(b) We have
1/i⁵ = 1/(i⁴ · i) = 1/i = i/i² = −i.

(c) We have
log_e[½(√3 + i)] = log_e[sin(π/3) + i cos(π/3)],
or, more usefully,
log_e[cos(π/6) + i sin(π/6)] = log_e e^(iπ/6) = iπ/6.

(d) We have
(4 + 3i)e^(iπ/3) = (4 + 3i)[cos(π/3) + i sin(π/3)] = (4 + 3i)(1/2 + i√3/2) = (4 − 3√3)/2 + i(3 + 4√3)/2.
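All four answers can be replayed with Python's built-in complex type (`1j` is i) and `cmath`; an illustrative check, with the part names invented for labelling:

```python
import cmath, math

# The four answers of question 2, replayed with complex arithmetic.
part_a = (2 + 3j) / (3 + 2j)                     # 12/13 + (5/13)i
part_b = 1 / 1j ** 5                             # -i
part_c = cmath.log((math.sqrt(3) + 1j) / 2)      # i*pi/6
part_d = (4 + 3j) * cmath.exp(1j * math.pi / 3)  # (4-3*sqrt(3))/2 + i(3+4*sqrt(3))/2
print(part_a, part_b, part_c, part_d)
```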

3. (a) We have
z − w = (2 + 3i) − (1 − 4i) = 1 + 7i.
(b) We have
zw = (2 + 3i)(1 − 4i) = 2 − 5i − 12i² = 14 − 5i.
(c) We have
z/w = (2 + 3i)/(1 − 4i) = (2 + 3i)(1 + 4i)/[(1 − 4i)(1 + 4i)] = (2 + 11i − 12)/(1 + 16) = −10/17 + (11/17)i.
(d) We have
z⁷ = (2 + 3i)⁷ = 13^(7/2)(cos θ + i sin θ)⁷
where θ = tan⁻¹(3/2) ≈ 56.31°, so
z⁷ = 13^(7/2)(cos 7θ + i sin 7θ) [= 7921.4(0.8274 + 0.5616i) = 6554 + 4449i].
Note: You may omit the parts in square brackets ‘[ ]’ if trigonometric functions are
not permitted by the current calculator regulations.
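The guidance works with z = 2 + 3i and w = 1 − 4i; those manipulations, including the polar-form route to the seventh power, are easy to confirm:

```python
import cmath

# Question 3 as worked in the guidance: z = 2 + 3i, w = 1 - 4i.
z, w = 2 + 3j, 1 - 4j
print(z - w)    # (1+7j)
print(z * w)    # (14-5j)
print(z / w)    # -10/17 + (11/17)j, roughly -0.588 + 0.647j
r, theta = abs(z), cmath.phase(z)           # polar form of z
z7_polar = r ** 7 * cmath.exp(7j * theta)   # the De Moivre route
print(z ** 7)   # 6554 + 4449j, matching the guidance
```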

4. (a) We have

eˣ = 1 + x/1! + x²/2! + x³/3! + x⁴/4! + · · ·  and  sin x = x − x³/3! + x⁵/5! − x⁷/7! + · · ·

so
e^(sin x) = 1 + sin x/1! + (sin x)²/2! + (sin x)³/3! + (sin x)⁴/4! + · · ·
= 1 + (x − x³/6 + x⁵/120 − · · ·) + (1/2)(x² − x⁴/3 + · · ·)
+ (1/6)(x³ − x⁵/2 + · · ·) + (1/24)(x⁴ + · · ·)
= 1 + x + x²/2 − x⁴/8 + · · · .
Hence,
∫₀^(π/3) e^(sin x) dx ≈ [x + x²/2 + x³/6 − x⁵/40]₀^(π/3)
= 1.04720 + 0.54831 + 0.19140 − 0.03148
= 1.7554.
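It is instructive to compare the quartic approximation with a direct numerical integral of e^(sin x): on [0, π/3] the truncated series overestimates by only about 0.013, which shows both that the method works and that the error is the price of stopping at x⁴. An illustrative sketch:

```python
import math

# Quartic approximation 1 + x + x^2/2 - x^4/8 integrated over [0, pi/3],
# compared with a midpoint-rule integral of e^(sin x) itself.
u = math.pi / 3
series_integral = u + u ** 2 / 2 + u ** 3 / 6 - u ** 5 / 40  # ≈ 1.7554

n = 200000
h = u / n
numeric = sum(math.exp(math.sin((i + 0.5) * h)) for i in range(n)) * h

print(round(series_integral, 4))              # 1.7554
print(round(series_integral - numeric, 4))    # small truncation error (~0.013)
```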

(b) i. We have
(4 − 3i)/(3 + 2i) = (4 − 3i)(3 − 2i)/[(3 + 2i)(3 − 2i)] = (12 − 17i − 6)/(9 + 4) = 6/13 − (17/13)i.

ii. We have
log_e[(1/√2)(1 − i)] = log_e e^(iu)
where cos u = 1/√2, sin u = −1/√2, i.e. u = 7π/4. Hence
log_e[(1/√2)(1 − i)] = iu = (7π/4)i.


Chapter 4
Difference equations

4.1 Aims of the chapter


To introduce dynamic rather than static modelling.
To use examples to indicate how such mathematics are useful for modelling real-life
managerial/economic situations.
To establish the reason why you have just covered Chapter 3 in the subject guide.
To provide solution procedures for first and second order equations.
To enable you to understand the importance of discussing (describing) solution
series as well as graphing them.

4.2 Learning outcomes


By the end of this chapter, and having completed the Essential reading and activities,
you should be able to:

solve problems involving first and second order difference equations


understand the principles of how to handle coupled first-order difference equations
interpret and explain your solutions using graphical methods and appropriate
description terminology.

4.3 Essential reading


Dowling, E.T. Schaum’s easy outline of introduction to mathematical economics.
(New York: McGraw-Hill, 2006). Chapters 19 and 20.

4.4 Further reading


Anthony, M. and N. Biggs Mathematics for economics and finance. (Cambridge:
Cambridge University Press, 2001) Sections of Chapters 3 and 7 (for aspects
relating to first order difference equations) and Chapters 23 and 24.
Jacques, I. Mathematics for economics and business. (Upper Saddle River, NJ:
Prentice Hall, 2012) Chapter 7.


4.5 Introduction
In Chapter 5 we will see equations involving a variable, y say, and one or more of its
differentials, dy/dx, d²y/dx², etc. (sometimes denoted Dy, D²y, etc.). Such
equations are termed differential equations. Similarly an equation involving a function,
Y say, and one or more of its differences, ∆Y, ∆²Y, etc., is called a difference
equation. ∆Y is a notation for Y_{k+1} − Y_k, for example.
We will see that the solution procedures and terminology are very similar between
differential and difference equations.
A difference equation is said to be linear if it can be written in the form

f_0(k)Y_{k+n} + f_1(k)Y_{k+n−1} + · · · + f_n(k)Y_k = g(k) for k = 0, 1, 2, . . .

i.e. no product of Y_k and Y_{k+r} occurs for any r.


The difference equation is of nth order if both f0 and fn are non-zero. Thus:

Yk+2 + Yk+1 − 2Yk = 3k is a linear difference equation with constant coefficients of


order 2.

2k Yk+3 + 3k Yk+2 − Yk = 2 is linear with non-constant coefficients and of order 3.

Yk+2 Yk + (Yk )2 = 0 is non-linear and of order 2.


One use for first order difference equations will have been encountered in your 100
courses. Although not specifically covered within the first year course, it is a topic that
is often used to solve certain financial problems and we will find the familiar (and, in
general, highly important) geometric series formula occurring within certain solutions
for first order difference equations. The intention of this chapter is to show how first
order difference equations can be applied and solved and to extend the discussion into
second order difference equations. It is hoped that you will come to appreciate the
importance of difference equations and their solution to economic and management
problems in particular. The key feature of both difference and differential equations is
their ability to model dynamic rather than static situations.

4.6 First order difference equations


First order linear difference equations are used to analyse changes, typically with
respect to time, when what happens in one period depends upon what happens in the
previous period. Thus, for example, we might have Income Y (t) in period t and Income
Y (t − 1) in period t − 1 related by the equation

Y (t) = aY (t − 1) + b

for some constants a and b.


An alternative notation for the above, and the one used within this chapter of the
subject guide, is
Yk = aYk−1 + b.


It is easy to see (and you should convince yourself of this) that if a = 1 then the
sequence of numbers Yk is an arithmetic progression with common difference b and first
term Y0 . Therefore Yk = Y0 + bk.
If b = 0, you should also convince yourself that we have a geometric series with common
ratio a and first term Y0 . Hence Yk = ak Y0 .
If a ≠ 1 and b ≠ 0 then Y_2 = aY_1 + b = a(aY_0 + b) + b = a²Y_0 + b(1 + a) and hence
Y_3 = aY_2 + b = a³Y_0 + b(1 + a + a²) etc. and, in general,
Y_k = a^k Y_0 + b(1 + a + a² + · · · + a^{k−1}) = a^k Y_0 + b(1 − a^k)/(1 − a) = Y* + (Y_0 − Y*)a^k
where Y* = b/(1 − a) is known as the ‘constant’ or ‘time-independent’ solution.
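The closed form can be checked against the recurrence itself by simply iterating; the sketch below does so for the data of Example 4.1 (a = 1/3, b = 2, Y_0 = 2) and watches the series approach Y* = 3:

```python
# Iterating Y_k = a*Y_{k-1} + b and checking the closed form
# Y_k = Y* + (Y_0 - Y*)*a^k with Y* = b/(1 - a).
a, b, y0 = 1 / 3, 2, 2
y_star = b / (1 - a)          # time-independent solution, here 3

y = y0
for k in range(1, 11):
    y = a * y + b             # the recurrence itself
    closed = y_star + (y0 - y_star) * a ** k
    assert abs(y - closed) < 1e-12
print(round(y, 5))  # 2.99998: converging to Y* = 3 from below
```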

Example 4.1 Suppose

Y_k − (1/3)Y_{k−1} = 2,

with Y_0 = 2. The equation can (and should) be rewritten as

Y_k = (1/3)Y_{k−1} + 2,

i.e. a = 1/3 and b = 2.
The time-independent solution is therefore Y* = 2/(1 − (1/3)) = 3 and the solution
series is Y_k = 3 + (2 − 3)(1/3)^k = 3 − (1/3)^k, i.e. Y_k becomes asymptotic to 3 (from
below) as k goes towards infinity.

Example 4.2 Suppose


Yk+1 = −5Yk + 6
with Y0 = 0.
Here a = −5 and b = 6 and hence Y ∗ = 6/(1 + 5) = 1 and the solution series becomes
Yk = 1 − (−5)k i.e. Yk oscillates with increasing magnitude as k goes towards infinity.

4.7 Behaviour of the solutions


The important points to consider include whether the system is stable, whether it is
oscillating, divergent, convergent, etc. Such concepts are equally important when we
have the more complicated difference equations that follow in the remainder of Chapter
4. The behaviour of the general solution (or time path) for Yk depends upon the
coefficients a and b:
If a < −1 then Yk oscillates increasingly.
If a = −1 then Yk oscillates with fixed amplitude (magnitude).
If −1 < a < 0 then Yk oscillates with decreasing magnitude (‘damped oscillation’).
If a = 0 then we have the trivial solution that Yk = b for all k.
If 0 < a < 1 then Yk converges to Y ∗ .
If a = 1, b = 0 then Yk = Y0 for all k.
If a = 1, b > 0 then Yk diverges to infinity.


If a = 1, b < 0 then Yk diverges to minus infinity.


If a > 1 then Yk diverges to plus or minus infinity (depending upon whether Y0 is
positive or negative).

Activity 4.1 Solve the following equation and describe the solution series (note
that Y_0 = 5):

Y_k + (1/2)Y_{k−1} − 5 = 0.

4.8 Linear second order difference equations


This is our first extension of our knowledge concerning difference equations. A difference
equation is said to be linear if it can be written in the form:

f0 (k)Yk+n + f1 (k)Yk+n−1 + · · · + fn (k)Yk = g(k), k = 0, 1, 2, . . .

(i.e. there are no products of Yk and Yk+r , etc.).


The equation is of order n if both f0 (k) and fn (k) are non-zero.
We will concentrate on linear equations with constant coefficients, i.e. when all the
fi (k)s are constant.
A typical second-order difference equation is of the form:

Yk+2 + a1 Yk+1 + a2 Yk = rk , k = 0, 1, 2, . . .

where a1 , a2 are constants and a2 6= 0.


We refer to the equation Yk+2 + a1 Yk+1 + a2 Yk = 0 as the reduced equation
corresponding to the complete equation Yk+2 + a1 Yk+1 + a2 Yk = rk .
We find separately a solution to the reduced equation and particular solution(s) to the
complete equation, i.e./e.g. for the reduced equation:
suppose Y_k = m^k; then m^{k+2} + a_1 m^{k+1} + a_2 m^k = 0, i.e.

m² + a_1 m + a_2 = 0.

This is the auxiliary equation or characteristic equation. If its roots are real and
different, say m_1 and m_2, then the complementary function for the difference
equation is
Y_k = A m_1^k + B m_2^k, for A, B constants.
The particular solution is any solution that we are able to find for the original
complete equation. This can be obtained by guesswork, although there is a perfectly
logical procedure for finding such particular solutions.
The general solution is then the sum of the complementary function and the
particular solution.
Strictly we have three cases to consider when trying to determine the complementary
function:


i. where m1, m2 are real and unequal:

then Yk = A m1^k + B m2^k

ii. where m1, m2 are real and equal (to m, say):

then Yk = (A + Bk) m^k

iii. where m1 , m2 are complex numbers.


These roots are, of necessity, complex conjugates and equal to d + if and d − if , say,
where
d = −a1/2 and f = (1/2)√(4a2 − a1^2),

then
Yk = A m1^k + B m2^k

where A and B might now be constant complex numbers, say g + ih and g − ih,
respectively, and hence

Yk = (g + ih)(d + if)^k + (g − ih)(d − if)^k.

We use initial conditions, about Y0 and Y1 for example, to solve for the constants g and h (d and f are already determined by a1 and a2). The
solution is often converted into a trigonometric relationship using the mathematics of
Sections 3.13 and 3.14. The general process is as follows.
Since the roots of the quadratic auxiliary equation are complex conjugates we can write
them as
m1 = √a2 z1 and m2 = √a2 z2,

where

z1,2 = [−(a1/2) ± (i/2)√(4a2 − a1^2)] / √a2

are complex conjugates, i.e. the roots are

√a2 e^(iu) and √a2 e^(−iu)

where

u = arccos(−a1/(2√a2)).

Thus,

Yk = a2^(k/2) (A e^(iku) + B e^(−iku)) = a2^(k/2) (E cos ku + F sin ku),

where E = A + B and F = i(A − B) are two real constants. To simplify further, we can
rearrange this equation for Yk so that

Yk = C a2^(k/2) cos(ku − ε)

for constants C and ε.


If it is some weeks since you studied Chapter 3 of this subject guide then it might be
worthwhile for you to briefly remind yourself of complex numbers, complex conjugates,
real and imaginary parts of a complex number, etc.


Example 4.3
Suppose Yk+2 + Yk+1 − 6Yk = 0

then the auxiliary equation is m^2 + m − 6 = 0

i.e. (m + 3)(m − 2) = 0

The complementary function is then Yk = A(−3)^k + B(2)^k.

Example 4.4
Suppose Yk+2 + 8Yk+1 + 16Yk = 0

then the auxiliary equation is m^2 + 8m + 16 = 0

i.e. (m + 4)(m + 4) = 0

The complementary function is then Yk = (A + Bk)(−4)^k.

Example 4.5
Suppose Yk+2 + 0.5Yk+1 + 0.25Yk = 0

then the auxiliary equation is m^2 + 0.5m + 0.25 = 0


Solving this quadratic we find that

m1,2 = [−0.5 ± √(0.5^2 − 4(0.25))]/2 = −0.25 ± (i/2)√0.75

and hence we have the complementary function:

Yk = A (1/4)^(k/2) cos(ku − θ)

where A and θ are constants and u = arccos(−0.5) = 2π/3, i.e.

Yk = A (1/2)^k cos(2kπ/3 − θ).
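This closed form is easy to check numerically if you have Python to hand (purely illustrative and not part of the syllabus): seed the recurrence from the closed form with arbitrary choices of A and θ, iterate, and confirm the two sequences agree.

```python
import math

def closed_form(k, A=1.0, theta=0.3):
    # Y_k = A (1/2)^k cos(2k*pi/3 - theta); A and theta are arbitrary here
    return A * (0.5 ** k) * math.cos(2 * k * math.pi / 3 - theta)

# Iterate Y_{k+2} = -0.5 Y_{k+1} - 0.25 Y_k, seeded from the closed form
Y = [closed_form(0), closed_form(1)]
for _ in range(20):
    Y.append(-0.5 * Y[-1] - 0.25 * Y[-2])

max_err = max(abs(Y[k] - closed_form(k)) for k in range(len(Y)))
```

Any choice of A and θ should leave max_err at rounding-error level, which is a useful way to catch slips in hand-derived complementary functions.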

4.9 The non-homogeneous second order difference equation
As suggested above, when we have a difference equation of the form
Yk+2 + a1 Yk+1 + a2 Yk = rk , k = 0, 1, 2, . . .
we need first to derive the complementary function as in Section 4.8 above and then
find a particular solution to add to it.


The appropriate particular solution, Yk* say, depends upon rk. The general approach is
perhaps best demonstrated by examples.
If rk = a^k then we try a particular solution of the form Yk* = A a^k for some constant A,
substitute this into the complete equation and solve for A. If, however, a is one of the
roots m1, m2 of the auxiliary equation then we must try Yk* = A k a^k.
If a = m1 = m2, by some strange coincidence, then we try Yk* = A k^2 a^k, etc.
The complete list of trial values of Yk* to be tried for different values of rk is given in
the following table:

rk                          Trial Yk*
a^k                         A a^k
k^n                         A0 + A1 k + A2 k^2 + · · · + An k^n
k^n a^k                     a^k (A0 + A1 k + A2 k^2 + · · · + An k^n)
a^k sin bk or a^k cos bk    a^k (A1 sin bk + A2 cos bk)

If rk contains a term k^n a^k and a is a root of the auxiliary equation then in the trial
solution, Yk*, we must include the term k^(n+1) a^k if a is a single-fold root of the auxiliary
equation, and k^(n+2) a^k as well if a is a two-fold root. Terms k^(n+1) a^k cos bk and
k^(n+1) a^k sin bk must similarly be included if a e^(ib) is a root of the auxiliary equation. If
rk consists of more than one term, each of these terms should be treated separately.

Example 4.6 To solve Yt+2 − Yt+1 − 2Yt = 3 with Y0 = 2 and Y1 = 2, we have that
the auxiliary equation is m^2 − m − 2 = 0, i.e. (m − 2)(m + 1) = 0, therefore m1 = 2
and m2 = −1 and the complementary function is

Yt = A(2)^t + B(−1)^t.

For the particular solution we try Yt = C, a constant. Inserting this value into the
given equation we obtain
C − C − 2C = −2C = 3,
and hence C = −3/2.
Using the given ‘initial conditions’ we have

Y0 = 2 = A + B − 3/2

and
Y1 = 2 = 2A − B − 3/2.
Solving these two equations we find that A = 7/3 and B = 7/6. The complete
solution is therefore

Yt = (7/3)(2)^t + (7/6)(−1)^t − 3/2.
Note: Since (−1)^t oscillates forever and (2)^t grows ever larger, the solution series for
Yt diverges to infinity in an oscillating manner. The solution series is unstable.
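The closed form can be checked against the recurrence itself with a few lines of Python (an illustrative sketch only, not part of the guide's material):

```python
def y_closed(t):
    # Y_t = (7/3)2^t + (7/6)(-1)^t - 3/2
    return (7/3) * 2 ** t + (7/6) * (-1) ** t - 3/2

Y = [2, 2]                            # Y_0 = 2, Y_1 = 2
for t in range(20):
    Y.append(Y[-1] + 2 * Y[-2] + 3)   # Y_{t+2} = Y_{t+1} + 2 Y_t + 3

errs = [abs(Y[t] - y_closed(t)) for t in range(len(Y))]
```

Printing Y also makes the divergent, oscillating behaviour described in the note immediately visible.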


Example 4.7 To solve 4Yt+2 + 4Yt+1 + Yt = 2 + 3t with Y1 = 1, Y2 = 2, we have


that the auxiliary equation is 4m^2 + 4m + 1 = 0, i.e. (2m + 1)(2m + 1) = 0, therefore
m = −1/2 (twice).
Hence the complementary function is

Yt = (A + Bt)(−1/2)^t.

For a particular solution we try Yt = C + Dt, i.e.

4C + 4D(t + 2) + 4C + 4D(t + 1) + C + Dt = 2 + 3t.


Equating powers of t^1 we get 4D + 4D + D = 9D = 3, i.e. D = 1/3. Equating
powers of t^0 then 4C + 8D + 4C + 4D + C = 2, i.e. 9C + 12D = 2 and hence
C = −2/9. The complete solution then becomes

Yt = (A + Bt)(−1/2)^t − 2/9 + t/3.

Using Y1 = 1 we get

(A + B)(−1/2) − 2/9 + 1/3 = 1,

and using Y2 = 2 we get

(A + 2B)(−1/2)^2 − 2/9 + 2/3 = 2.

Solving these two equations we get A = −88/9 and B = 72/9 = 8. Thus the complete
solution is

Yt = (−88/9 + 8t)(−1/2)^t − 2/9 + t/3.
This solution series oscillates with decreasing magnitude but eventually comes to
behave like t/3, i.e. it grows linearly towards infinity.

Example 4.8 To solve 2Yk+2 − Yk+1 − Yk = cos kπ we have that the auxiliary
equation is 2m^2 − m − 1 = 0, i.e. (2m + 1)(m − 1) = 0 and hence the complementary
function is

Yk = A(−1/2)^k + B.
For a particular solution we try Yk∗ = A1 cos kπ + A2 sin kπ. Substitution into the
given equation yields

2[A1 cos(k + 2)π + A2 sin(k + 2)π] − [A1 cos(k + 1)π + A2 sin(k + 1)π] − [A1 cos kπ + A2 sin kπ] = cos kπ.

Now sin(k + 2)π = sin kπ, cos(k + 2)π = cos kπ, sin(k + 1)π = −sin kπ and
cos(k + 1)π = −cos kπ, so

2[A1 cos kπ + A2 sin kπ] − [−A1 cos kπ − A2 sin kπ] − [A1 cos kπ + A2 sin kπ] = cos kπ

and therefore 2A1 cos kπ + 2A2 sin kπ = cos kπ and hence A1 = 1/2 and A2 = 0.


The required general solution is therefore

Yk = A(−1/2)^k + B + (1/2) cos kπ.

We would then solve for A and B using any given ‘initial’ conditions.

Activity 4.2 Solve the following equation and describe the solution series:

Yk − 2Yk−1 − 15Yk−2 = 4k

with Y0 = 0 and Y1 = 4.

4.10 Coupled difference equations


We illustrate how certain types of coupled systems (simultaneous difference equations, if
you like) can be solved using second order equations, by means of the following example.

Example 4.9 Suppose that the sequence Yt and Xt are linked by the following
equations, which hold for all t > 0

Yt − Yt−1 = 6Xt−1 and Xt = Yt−1 + 2.

In addition, suppose we are informed that Y0 = 1 and X0 = 1/6. We can then show
that for t > 1,
Yt − Yt−1 = 6Xt−1 = 6(Yt−2 + 2)
i.e. Yt − Yt−1 − 6Yt−2 = 12.
We therefore now have a second order difference equation in Y which can be solved
in the usual fashion. Try it! You should get the solution that

Yt = −2 + 2(3^t) + (−2)^t and Xt = Yt−1 + 2 = 2(3^(t−1)) + (−2)^(t−1).
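You can confirm the quoted solution by iterating the coupled pair directly in Python (illustrative only):

```python
def y_closed(t):
    return -2 + 2 * 3 ** t + (-2) ** t

def x_closed(t):
    return 2 * 3 ** (t - 1) + (-2) ** (t - 1)

Y, X = [1.0], [1 / 6]                 # Y_0 = 1, X_0 = 1/6
for t in range(1, 15):
    Y.append(Y[-1] + 6 * X[-1])       # Y_t - Y_{t-1} = 6 X_{t-1}
    X.append(Y[-2] + 2)               # X_t = Y_{t-1} + 2  (Y[-2] is Y_{t-1})
```

Note that the closed form for Xt even reproduces X0 = 1/6, since 2(3^(−1)) + (−2)^(−1) = 2/3 − 1/2 = 1/6.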

4.11 Graphing and describing solutions


There is often a need to produce graphical representations of the solution series in order
to emphasise key facts like oscillation, divergence, convergence, etc. You might like to
refer to Section 5.9 where there is a comparable discussion on graphing differential
equation solutions.

4.12 Some applications of difference equations


The applications of difference equations are varied and potentially useful in most
dynamic systems. The following areas are particularly noteworthy for management
problems:

Industrial dynamics: An approach to model building of business and other
organisations which uses models developed in engineering describing the dynamic
behaviour of closed loop feedback control systems.

Queueing theory: There is a massive literature on this subject. Models of queueing
systems can be developed using differential-difference equations (see Section 7.9).

Learning models of consumer behaviour: Data from market surveys can be used to
develop mathematical models of purchasing behaviour. This area of application
overlaps with Markov Models (see Chapter 7).

Supply and demand models.

Advertising and sales models.

Financial.

Example 4.10
(Some financial applications of first order difference equations)
First order difference equations are very useful in the mathematics of finance (see,
for example, Chapter 4 in Anthony and Biggs).
Capital accrues under compound interest. Suppose a fixed annual interest rate
100r% is available to investors, and interest is compounded annually. If we invest P
then after t years we have an amount P(1 + r)^t. The same result can be derived very
simply via difference equations. If we let Yt be the capital at the end of year t, we
have Y0 = P and the recurrence (difference) equation

Yt+1 = (1 + r)Yt for t = 0, 1, 2, . . . .

This is a standard first order difference equation with a = (1 + r) and b = 0. The
solution is P(1 + r)^t, in accordance with what we previously obtained.
The above is encouraging, in the sense that we can see that the difference equation
approach gives the appropriate solution. However, it hardly appears worth the effort.
Consider now, however, a situation where the investor withdraws an amount I at the
end of each year for N years. What is then the balance of the account after t years?
This is less obvious, but the following difference equation will provide the solution:

Yt+1 = (1 + r)Yt − I for t = 0, 1, 2, . . . .

Now we have a first order difference equation with a = (1 + r) and b = −I. Being
well practised in solving such equations we should be able to produce the solution
series as:

Yt = I/r + (P − I/r)(1 + r)^t.
This formula enables us to answer a number of questions. First, we might want to
know how large the withdrawals I can be given an initial investment of P , if we


want to be able to withdraw I annually for N years. The condition that nothing is
left after N years is YN = 0, i.e.

I/r + (P − I/r)(1 + r)^N = 0

and rearranging gives

I = [r(1 + r)^N / ((1 + r)^N − 1)] P.
Alternatively, we might be interested in what principal P is required to provide an
annual income I for the next N years. A different rearrangement of the equation
will give the answer as:

P = (I/r) [1 − 1/(1 + r)^N].
Note: Do not attempt to remember these separate formulas but do know how to
derive them!
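Deriving the withdrawal formula is the important skill, but it is also easy to sanity-check by simulation; the figures for P, r and N below are arbitrary illustrative values, not from the guide:

```python
def annual_withdrawal(P, r, N):
    # I = [r(1+r)^N / ((1+r)^N - 1)] P: the level withdrawal that exhausts P in N years
    g = (1 + r) ** N
    return r * g / (g - 1) * P

P, r, N = 10_000.0, 0.05, 10
I = annual_withdrawal(P, r, N)

Y = P
for _ in range(N):
    Y = (1 + r) * Y - I     # Y_{t+1} = (1+r) Y_t - I

# After N withdrawals the balance Y should be (approximately) zero
```

If the simulation does not end near zero, the rearrangement has gone wrong somewhere.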

4.13 Summary
Difference and differential equations represent an extensive area of mathematics and
there are many types of equations and models which are beyond the scope of this
course. The subject guide tries to show the limitations of the material within the
syllabus. For difference equations you should, for example, note that you do not need to
know third or higher order equations. There is also no need to look at simultaneous
difference equations beyond those covered by Section 4.10.

4.14 Solutions to activities


4.1 Rearranging the equation we have Yk = −(1/2)Yk−1 + 5, i.e. the standard form of
first order difference equation with a = −(1/2) and b = 5.
The time-independent solution is therefore Y* = 5/(1 − (−1/2)) = 10/3 and the
solution series is Yk = 10/3 + (5 − 10/3)(−1/2)^k = (10/3) + (5/3)(−1/2)^k.
Yk will oscillate with diminishing magnitude until it becomes asymptotic to 10/3.
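Iterating the recurrence in Python makes this oscillatory convergence to 10/3 visible (an optional check, not part of the activity):

```python
Y = [5.0]                                  # Y_0 = 5
for k in range(1, 40):
    Y.append(-0.5 * Y[-1] + 5)             # Y_k = -(1/2) Y_{k-1} + 5

closed = [10/3 + (5/3) * (-0.5) ** k for k in range(40)]
gap = max(abs(a - b) for a, b in zip(Y, closed))
```

Successive terms alternate above and below 10/3 with shrinking amplitude, exactly as described.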

4.2 The auxiliary equation is m^2 − 2m − 15 = 0, i.e. (m − 5)(m + 3) = 0 and hence
m = 5 or −3. Hence the complementary function is Yk = A(5)^k + B(−3)^k. For a
particular solution we try Yk = C + Dk and hence

C + Dk − 2[C + D(k − 1)] − 15[C + D(k − 2)] = −16C + 32D − 16Dk = 4k.

Hence D = −1/4 and C = −1/2.

Thus combining the solutions we have Yk = A(5)^k + B(−3)^k − 1/2 − (1/4)k.
Y0 = 0 implies that A + B − 1/2 = 0, i.e. A + B = 1/2.
Y1 = 4 implies that 5A − 3B − 1/2 − 1/4 = 4, i.e. 5A − 3B = 19/4.


Solving gives A = 25/32 and B = −9/32. Hence

Yk = (25/32)(5)^k − (9/32)(−3)^k − 1/2 − (1/4)k.

The solution series diverges to infinity (the (5)^k term comes to dominate) and hence
is unstable.

4.15 A reminder of your learning outcomes


By the end of this chapter, and having completed the Essential reading and activities,
you should be able to:

solve problems involving first and second order difference equations


understand the principles of how to handle coupled first-order difference equations

interpret and explain your solutions using graphical methods and appropriate
descriptive terminology.

4.16 Sample examination questions


(Please note that each of these questions is only part of a full examination
question.)

1. You are given the following second order difference equation:


2Yt+2 + Yt = 6
where Y0 = 6 and Y1 = 7.
(a) Solve the above equation for Yt .
(7 marks)
(b) Discuss the behaviour of Yt as t increases.
(2 marks)
2. A closed economy without government activity is modelled by the following
equations:
Ct = 80 + 0.6Yt−1
It = Yt−1 − Yt−2
Yt = Ct + It
Y0 = 200 and Y1 = 220
where Ct is the consumption in period t, Yt is the national income in period t and
It is the investment in period t.
Comment upon the time path of national income and produce an illustrative sketch
graph of Yt against t.
(10 marks)


3. The following equations relate to Consumption Ct , Investment It , Income Yt and


Production Qt at time t:

Ct = (3/8) Yt−1
It = 20 + (1/8)(Qt−1 − Qt−2)
Qt = Yt
Yt = Ct + It.

(a) Derive a second order difference equation in Y.


(2 marks)
(b) Given Y0 = 33 and Y1 = 32.5 determine a general solution for Yt and comment
upon the behaviour of the solution as t changes.
(8 marks)
4. The backlog of orders Bt at time t in the processing department of a company has
become too large, and the manager has decided to increase available resources in a
steady manner so as to reduce the backlog. The average number A(t) of orders
arriving per week into the processing department is 100, and the current backlog is
1,000 orders. The new policy is to clear during a week 20% of the backlog from the
beginning of the week.
(a) Create a difference equation relating Bt to Bt−1 and hence solve the equation
for Bt in terms of t.
(8 marks)
(b) Determine how many weeks it would take to reduce the backlog to 600 orders
or less.
(2 marks)
(c) If we now suppose a more realistic representation for the arrival of new orders
is given by:
A(t) = 100 + 20(−1)^t

[i.e. between t = 0 and t = 1 we get 120 new orders and between t = 1 and
t = 2 we get 80 orders, etc.] show that the backlog relation now becomes:

Bt = (4600/9)(4/5)^t + 500 − (100/9)(−1)^t.
(6 marks)

4.17 Guidance on answering the Sample examination questions
1. (a) For 2Yt+2 + Yt = 6, we try Yt = m^t for a complementary function. Then

2m^2 + 1 = 0 ⇒ m = ±(1/√2)i ⇒ Yt = (1/√2)^t (A cos(πt/2) + B sin(πt/2)).


For a particular solution we try Yt = C, so 2C + C = 3C = 6 and hence C = 2.
Hence a combined solution is

Yt = (1/√2)^t (A cos(πt/2) + B sin(πt/2)) + 2.

Now, using the initial conditions:

Y0 = 6, i.e. A + 2 = 6 and hence A = 4.
Y1 = 7, i.e. B/√2 + 2 = 7 and hence B = 5√2.

Hence our solution is

Yt = (1/√2)^t (4 cos(πt/2) + 5√2 sin(πt/2)) + 2.

(b) Yt oscillates with decreasing magnitude and is convergent to Y = 2 as t tends
to infinity.
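This answer can be verified numerically with a short Python check (optional, and not something the examination requires):

```python
import math

def y_closed(t):
    s = (1 / math.sqrt(2)) ** t
    return s * (4 * math.cos(math.pi * t / 2)
                + 5 * math.sqrt(2) * math.sin(math.pi * t / 2)) + 2

Y = [y_closed(0), y_closed(1)]     # should be 6 and 7
for _ in range(20):
    Y.append((6 - Y[-2]) / 2)      # rearranged from 2 Y_{t+2} + Y_t = 6
```

The damped oscillation towards 2 is plain from the printed sequence.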

2. We have

Yt = Ct + It = 80 + 0.6Yt−1 + Yt−1 − Yt−2 = 1.6Yt−1 − Yt−2 + 80

i.e. Yt − 1.6Yt−1 + Yt−2 = 80.

Suppose Yt = m^t is a solution for the reduced equation, then m^2 − 1.6m + 1 = 0
and we find that m = 0.8 ± 0.6i. This then leads to a solution of the form
Yt = A r^t cos(θt − ε) where r = √(0.8^2 + 0.6^2) = 1, θ = cos⁻¹(0.8) [= 0.6435 radians
or 36.87°] and A and ε are constants to be determined by the initial conditions.


For a particular solution we try Yt = k and hence k − 1.6k + k = 0.4k = 80 which means
that k = 200.

Thus Yt = A cos(0.6435t − ε) + 200.

Substituting in the given initial conditions we find that Y0 = 200 = A cos(−ε) + 200
and Y1 = 220 = A cos(0.6435 − ε) + 200. Solving these equations gives ε = π/2 and
A = 100/3.

Our complete solution is therefore

Yt = (100/3) cos(0.6435t − π/2) + 200

or leave it in terms of θ = cos⁻¹(0.8).

The time path oscillates with constant magnitude between 200 ± (100/3).
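A numerical check of this constant-amplitude cycle (illustrative Python, with θ computed exactly rather than rounded to 0.6435):

```python
import math

theta = math.acos(0.8)                          # ≈ 0.6435 radians

def y_closed(t):
    return (100 / 3) * math.cos(theta * t - math.pi / 2) + 200

Y = [200.0, 220.0]
for _ in range(30):
    Y.append(1.6 * Y[-1] - Y[-2] + 80)          # Y_t = 1.6 Y_{t-1} - Y_{t-2} + 80
```

Plotting Y against t gives the illustrative sketch the question asks for: a sinusoid of fixed amplitude 100/3 about the level 200.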

3. (a) We have

Yt = Ct + It = (3/8)Yt−1 + 20 + (1/8)(Yt−1 − Yt−2) ⇒ Yt − (1/2)Yt−1 + (1/8)Yt−2 = 20.

(b) Let Yt = m^t, then the auxiliary equation is

m^2 − (1/2)m + (1/8) = 0 ⇒ 8m^2 − 4m + 1 = 0 ⇒ m = [4 ± √(16 − 32)]/16 = (1/4)(1 ± i).

So,

Yt = (1/8)^(t/2) (A cos(πt/4) + B sin(πt/4)).

The particular solution is of the form Yt = k and hence

k − (1/2)k + (1/8)k = (5/8)k = 20 ⇒ k = 32.

Hence

Yt = (1/8)^(t/2) (A cos(πt/4) + B sin(πt/4)) + 32.

And now using the initial conditions: Y0 = 33 → A = 1. Y1 = 32.5 →

32.5 = (1/8)^(1/2) (cos(π/4) + B sin(π/4)) + 32 ⇒ B = 1.

Hence

Yt = (1/8)^(t/2) (cos(πt/4) + sin(πt/4)) + 32

or

Yt = √2 (1/8)^(t/2) cos(πt/4 − π/4) + 32.

So Yt will oscillate and converge to 32.

4. (a) B0 = 1000 and Bt = (4/5)Bt−1 + 100.

Using the standard formula for the solution to a first order difference equation
we find that

Bt = 500(4/5)^t + 500.

(b) For Bt = 600 then 500(4/5)^t = 100, i.e. (4/5)^t = 1/5.

Taking logarithms then t ln(0.8) = ln(0.2), i.e.

t = (ln 0.2)/(ln 0.8) = (−1.6094)/(−0.2231) = 7.2, i.e. t = 8 weeks before the backlog
goes below 600.


(c) Now B0 = 1000 and Bt = (4/5)Bt−1 + 100 + 20(−1)^(t−1). Hence

B1 = (4/5)(1000) + [100 + 20(−1)^0]

B2 = (4/5)B1 + [100 + 20(−1)^1]
   = (4/5)^2 (1000) + (4/5)[100 + 20(−1)^0] + [100 + 20(−1)^1]

B3 = (4/5)^3 (1000) + (4/5)^2 [100 + 20(−1)^0] + (4/5)[100 + 20(−1)^1] + [100 + 20(−1)^2]
   = (4/5)^3 (1000) + 100[1 + (4/5) + (4/5)^2] + 20[(4/5)^2 − (4/5) + 1]
   = (4/5)^3 (1000) + 100 [1 − (4/5)^3]/[1 − (4/5)] + 20 [1 − (−4/5)^3]/[1 + (4/5)]
   = (4/5)^3 (1000) + 500[1 − (4/5)^3] + (100/9)[1 + (4/5)^3]
   = (4600/9)(4/5)^3 + 500 − (100/9)(−1)^3.

Similarly, or more formally using proof by induction, we can show that

Bt = (4600/9)(4/5)^t + 500 − (100/9)(−1)^t.

Chapter 5
Differential equations

5.1 Aims of the chapter


To introduce continuously dynamic rather than static modelling.

To use examples to indicate how such mathematics are useful for modelling real-life
managerial/economic situations.
To re-establish the reason why you have covered Chapter 3 of the subject guide.

To establish the considerable similarities (and some slight differences) between
difference and differential equation structures and solution procedures.

To provide solution procedures for first and second order equations.

To indicate methods of approach for solving simultaneous differential equations.

To enable you to understand the importance of discussing (describing) solution
series as well as graphing them.

5.2 Learning outcomes


By the end of this chapter, and having completed the Essential reading and activities,
you should be able to:

solve problems involving first and (constant coefficient) second order differential
equations

solve relatively simple simultaneous differential equations

use such mathematics for modelling real-life managerial/economic situations

interpret the solutions and graph them.

5.3 Essential reading


Dowling, E.T. Schaum’s easy outline of introduction to mathematical economics.
(New York: McGraw-Hill, 2006). Chapters 18 and 20.


5.4 Further reading


Jacques, I. Mathematics for economics and business. (Upper Saddle River, NJ:
Prentice Hall, 2012) Chapter 7.2.
There are lots of textbooks which cover this topic. However, the subject guide is
particularly extensive (with several examples and a thorough coverage of all the methods
required in this topic) and hence no further reading other than Jacques is suggested.

5.5 Introduction
Differential equations might be simply regarded as the continuous equivalent of the
difference equations of Chapter 4. As such we will find that many of the solution
procedures and methods are analogous in both types of equations. As with difference
equations, differential equations are concerned with modelling of dynamic relationships.
You have, in a sense, come across simple differential equations whenever you have
solved integration problems, for example solving dy/dx = x^n to get the solution
y = x^(n+1)/(n + 1) + constant is solving a differential equation. We will, of course, be
concerned with more difficult equations within this chapter.

Definitions

i. The order of the differential equation is that of the highest order derivative
present.

ii. The degree of the differential equation is that of the highest power of the highest
order of derivative present.
Thus the equation

(d^3y/dx^3)^2 + 2 (d^2y/dx^2)^4 + 3xy = 0

is a third order, second degree differential equation.

Consider the relation f(x, y) = 0, involving n arbitrary constants A, B, etc. By
successive differentiation of f(x, y) with respect to x there are n relationships involving
x, y and the first n derivatives of y with respect to x, as well as some of the arbitrary
constants. If we eliminate the constants from these equations we derive an ordinary
differential equation of the nth order.

Example 5.1 If y^2 = 4Ax, then

2y dy/dx = 4A = y^2/x

and hence

dy/dx = y/(2x).

This last equation represents a first order, first degree differential equation.


5.6 First order and first degree differential equations


These equations may be written in the general form

P dy/dx + Q = 0

where P and Q are (possibly) functions of x and y.
There are several types of first order, first degree equations which may be dealt with in
different ways.

5.6.1 Case I: Variables separable


Case I(a)

P and Q are functions of x only. Thus

P dy/dx + Q = 0

becomes

dy/dx = −Q/P = F(x),

say, and hence

y = ∫ F(x) dx + c
for some constant c.

Case I(b)

P and Q are both constants. Thus

dy/dx = −Q/P = k,

say, becomes

y = kx + c.

Case I(c)

P and Q are functions of y only. Thus

(P/Q) dy/dx = −1

becomes

F(y) dy/dx = −1

where F(y) = P/Q, and so

dx/dy = −F(y)

and hence

x = −∫ F(y) dy + c.


Case I(d)

P and Q are functions of x and y but the variables are separable, i.e. we have:

P/Q = f(y)/g(x),

say, and hence the equation becomes

f(y) dy/dx + g(x) = 0

and hence

∫ f(y) (dy/dx) dx + ∫ g(x) dx = c,

i.e.

∫ f(y) dy + ∫ g(x) dx = c.

5.6.2 Case II: Homogeneous equations


A homogeneous equation (or expression) in x and y of the nth degree is one in which
the sum of the powers of x and y in each term is n, for example:

x^3 y + x^2 y^2 − 3xy^3 + 6y^4

is a homogeneous expression of the fourth degree in x and y.


A first order, first degree homogeneous differential equation reduces down to:

dy/dx = −Q/P = F(y/x)     (5.1)

by dividing Q and P by x^n, where n is the degree of homogeneity.
We then let v = y/x, i.e. y = vx and hence

dy/dx = v + x dv/dx.

Therefore (5.1) becomes:

v + x dv/dx = F(v) ⇒ x dv/dx = F(v) − v = φ(v),

say, therefore

dv/φ(v) = (1/x) dx,

i.e.

∫ dv/φ(v) = loge x + loge c.
We then evaluate the integral, replace v by y/x and simplify.


Example 5.2 To solve

y^2 + (xy + x^2) dy/dx = 0,

we rewrite this as

dy/dx = −y^2/(xy + x^2) = −(y/x)^2/((y/x) + 1).     (5.2)

Let v = y/x, i.e. y = vx and

dy/dx = v + x dv/dx

and (5.2) becomes

v + x dv/dx = −v^2/(v + 1),

i.e.

x dv/dx = −(2v^2 + v)/(v + 1).

Therefore,

[(v + 1)/(2v^2 + v)] dv = −(1/x) dx.

Using partial fractions:

∫ [1/v − 1/(2v + 1)] dv = −∫ (1/x) dx.

Therefore,

loge v − (1/2) loge(2v + 1) = loge C − loge x

becomes

loge [v^2 x^2/(2v + 1)] = loge C^2 = loge C1

which becomes

y^2 x = C1(2y + x).

Example 5.3 An equation reducible to a homogeneous equation

dy/dx = (a1 x + b1 y + c1)/(a2 x + b2 y + c2)     (5.3)

If x = X + h and y = Y + k for a suitable choice of h and k then (5.3) can become

dY/dX = (a1 X + b1 Y)/(a2 X + b2 Y)

which is solvable using the method of homogeneous equations. For example, suppose:

dy/dx = (x − y + 2)/(x + y − 2)

then, if we let x = X and y = Y + 2:

dY/dX = (X − Y)/(X + Y) = (1 − Y/X)/(1 + Y/X).


Then, as usual, letting V = Y/X we get:

V + X dV/dX = (1 − V)/(1 + V)

which becomes

X dV/dX = (1 − 2V − V^2)/(1 + V),

i.e.

[2(1 + V)/(V^2 + 2V − 1)] dV = −(2/X) dX.

Integrating then gives

loge(V^2 + 2V − 1) = loge C − loge X^2

which becomes

loge [X^2 (V^2 + 2V − 1)] = loge C,

that is

X^2 (Y^2/X^2 + 2Y/X − 1) = C,

or

Y^2 + 2XY − X^2 = C,

i.e.

(y − 2)^2 + 2x(y − 2) − x^2 = C.

5.6.3 Case III: Linear equations


If the dependent variable y and the derivative dy/dx occur only to the first degree then the
differential equation is known as a linear differential equation of the first order, i.e.:

dy/dx + Py = Q     (5.4)

where P and Q are functions of x only.

Multiplying (5.4) by v, a function of x to be determined (known as the 'integrating
factor'), then we have

v dy/dx + P vy = vQ.

But, thinking about integration by parts:

v dy/dx = d(vy)/dx − y dv/dx

and hence in equation (5.4)

d(vy)/dx − y dv/dx + P vy = vQ

which becomes

d(vy)/dx + y (P v − dv/dx) = vQ     (5.5)


and choosing v such that P v − dv/dx = 0 proves very useful, i.e.

(1/v) dv/dx = P

becomes

loge v = ∫ P dx

and hence v = e^(∫P dx). Thus (5.5) becomes:

d(y e^(∫P dx))/dx = Q e^(∫P dx)

and integrating gives

y = e^(−∫P dx) [ ∫ Q e^(∫P dx) dx + c ].

Example 5.4 To solve

dy/dx + 2y = −x/2.

The integrating factor is

e^(∫2 dx) = e^(2x)

and hence

e^(2x) dy/dx + 2y e^(2x) = −(1/2) x e^(2x)

becomes

d(y e^(2x))/dx = −(1/2) x e^(2x)

which becomes

y e^(2x) = −(1/2) ∫ x e^(2x) dx
         = −(1/2) [ (1/2) x e^(2x) − (1/2) ∫ e^(2x) dx ] + c
         = (1/8) e^(2x) − (1/4) x e^(2x) + c,

and hence

y = 1/8 − x/4 + c e^(−2x).

5.6.4 Case IV: Other cases


Other first order, first degree differential equations can be solved using a variety of
techniques – often by substitution – to create an equation of a more acceptable form.
You need not worry too much about such equations for this course.


5.7 Second order differential equations


Second order differential equations have similar solution procedures to second order
difference equations (see Chapter 4). Thus we again encounter auxiliary equations,
complementary functions, particular and general solutions, etc.
The second order differential equation with constant coefficients is of the general form

p0 d^2y/dx^2 + p1 dy/dx + p2 y = f(x)

and we initially solve the general equation when f(x) = 0. If we try a solution of the
form y = A e^(mx) (where m is a constant) and substitute it into the above equation
then we obtain the auxiliary equation

p0 m^2 + p1 m + p2 = 0

which has roots m1 and m2, say.
Just as in Section 4.8 we have three cases to consider:

i. If m1, m2 are real and unequal, then

y = A e^(m1 x) + B e^(m2 x)

for constants A and B to be determined.

ii. If m1, m2 are real and equal (to m, say), then

y = (A + Bx) e^(mx).

iii. If m1, m2 are complex then

y = A e^(αx) cos(βx − ε)

where

α = −p1/(2p0) and β = √(4 p0 p2 − p1^2)/(2p0).
Let us now look at the situation when f(x) ≠ 0.
For this we need to find a particular solution (often called a particular integral (PI)
for the differential equation case) to add to the complementary function solution when
f (x) is set to zero. If you wish you can use the differential operator D methodology
described in some texts. However, it is quite sufficient to establish ad hoc rules for
deriving appropriate particular integrals for particular forms of f (x). The following
undetermined coefficients method can be used:
For the differential equation

p0 d^2y/dx^2 + p1 dy/dx + p2 y = f(x)

the particular solution can be written as A1 r1(x) + A2 r2(x) + A3 r3(x) + · · · + An rn(x)
where r1(x), r2(x), . . . , rn(x) are the terms of f(x) plus those arising from successive
differentiation of f(x), and A1, A2, . . . , An are constants to be determined by
substitution in the left-hand side of the original equation, differentiating and equating
the terms on both sides. Thus, for example:


If f(x) = x^4 + x^2 then the PI is of the form A1 x^4 + A2 x^3 + A3 x^2 + A4 x + A5.

If f(x) = e^(−2x) + e^(2x) then the PI is of the form A1 e^(−2x) + A2 e^(2x) since f(x) and its
derivatives can only contain terms in e^(−2x) and e^(2x).

If f(x) = sin ax, or cos ax, then the PI is of the form A1 sin ax + A2 cos ax since
f(x) and its derivatives can only contain terms in sin ax and cos ax.

If f(x) contains a term, e^(ax) say, which is also a term of the complementary function,
then we need to amend our PI as follows: if a is a k-fold root of the auxiliary equation
[i.e. the auxiliary equation has a factor (m − a)^k] and e^(ax) is a term in f(x)
then the appropriate PI is of the form A1 x^k e^(ax) + A2 x^(k−1) e^(ax) + · · · + A_{k+1} e^(ax).

If f(x) contains a term x^t e^(ax) which is also a term of the complementary function
corresponding to a k-fold root of the auxiliary equation then the appropriate PI is
of the form A1 x^(t+k) e^(ax) + A2 x^(t+k−1) e^(ax) + · · · + A_{t+k+1} e^(ax).

As with difference equations, when f(x) is made up of several terms (e.g.
x^2 + sin 2x + 4e^(3x)) then the PI must cater for each of these terms, e.g. the PI for this
f(x) (assuming no complications arising from the complementary function) will be of
the form y = A1 x^2 + A2 x + A3 + A4 sin 2x + A5 cos 2x + A6 e^(3x).

5.7.1 Determining the solution constants (e.g. A1 , A2 )


In order to determine values for the constants within the complementary functions and
particular integrals the following points should be noted:

For first order differential equations the constants can be determined by using
initial (or ‘end’) conditions often given in the form y = 0 when x = 0, dy/dx = 1
when x = 2 etc., for example.
For second order differential equations the constants within the PI are determined
directly by substituting the PI and its first and second derivatives into the given
equation and then equating coefficients on left- and right-hand sides of the
differential equation. Then, and only then, can the constants within the
complementary function be determined by evaluating the general solution
(complementary solution + particular integral) for ‘end’ conditions.

Example 5.5 To solve

d^2y/dx^2 − 4y = 2 − 4x^2,

the auxiliary equation is m^2 − 4 = 0 and hence m = 2 or m = −2. Thus the
complementary function is A1 e^(2x) + A2 e^(−2x). For the particular integral we try
y = A3 x^2 + A4 x + A5. Thus

dy/dx = 2A3 x + A4 and d^2y/dx^2 = 2A3,

and hence, inserting these functions into the given differential equation produces:

2A3 − 4A3 x^2 − 4A4 x − 4A5 = 2 − 4x^2.

Equating coefficients of x^2: −4A3 = −4 and hence A3 = 1.
Equating coefficients of x^1: −4A4 = 0 and hence A4 = 0.
Equating coefficients of x^0: 2A3 − 4A5 = 2 and hence A5 = 0.

Thus the general solution is: y = A1 e^(2x) + A2 e^(−2x) + x^2.

Now we make use of any 'initial or end' conditions. For example, suppose y = 0
when x = 0, and dy/dx = 2 when x = 0.
Then A1 + A2 = 0 and 2A1 − 2A2 = 2. Hence A1 = 1/2 and A2 = −1/2. Our
solution is therefore y = 0.5e^(2x) − 0.5e^(−2x) + x^2.

5.8 Simultaneous differential equations


Much of this course is concerned with multivariable data and multivariable models. It is
not only in Chapters 8 and 10 (whose titles suggest multivariate situations) that we
should appreciate that many real-life situations cannot be adequately analysed with
univariate models. This course therefore includes this section on simultaneous
differential equations, involving two functions of a single independent variable, and their
derivatives. We will limit our excursion into this area by concerning ourselves with just
a pair (rather than a more extensive set) of differential equations but it ought to be
clear that some of the ideas can be extended into more than two dependent variables
and more than two simultaneous equations.
For this course, it is not necessary to establish a general theory for solving simultaneous
differential equations. It will be sufficient here to give a few examples showing the
methods that are most widely used. Sometimes it is most appropriate to create a second
order differential equation to solve as in Section 5.7; on other occasions it is better to
make simplifying assumptions and substitutions.
Note: It should be emphasised that you need not consider approaches (and equation
types) beyond the scope established by the examples within this subject guide.
Furthermore, it will often be the case that the examination questions give clues as to an
appropriate solution procedure.

Example 5.6 (Where each of the given equations involves only one of the
dependent variables, and can therefore be treated separately.)
Suppose d²x/dt² = a and d²y/dt² = b. Then

x = at²/2 + A1 t + A2 and y = bt²/2 + B1 t + B2
where a and b are given constants and A1 , A2 , B1 and B2 are constants to be
determined using initial conditions. It is, of course, possible that one or both of the
two given equations are more complicated but for which the methods of Sections 5.6
and 5.7 can be applied.


Example 5.7 Suppose dx/dt = −ay and dy/dt = ax. Eliminating y, say, we have

d²x/dt² = −a dy/dt = −a²x

and this can be solved using the methods of Section 5.7 to produce solutions of the
form x = A cos(at + ε) and y = A sin(at + ε) for constants A and ε to be determined.
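A quick numerical check of this pair of solutions, assuming Python and illustrative values A = 2, a = 1.5 and phase constant ε = 0.3 (all chosen arbitrarily), confirms that they satisfy both given equations:

```python
import math

# hypothetical parameter values, purely for illustration
A, a, eps = 2.0, 1.5, 0.3

def x(t): return A * math.cos(a * t + eps)
def y(t): return A * math.sin(a * t + eps)

# exact derivatives of the trial solutions
def dx(t): return -a * A * math.sin(a * t + eps)
def dy(t): return a * A * math.cos(a * t + eps)

# confirm dx/dt = -a*y and dy/dt = a*x at several times
for t in [0.0, 0.7, 2.1]:
    assert abs(dx(t) + a * y(t)) < 1e-9
    assert abs(dy(t) - a * x(t)) < 1e-9
```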

Example 5.8 Suppose we have a more complicated pair of equations as follows:

dx/dt + a dy/dt + bx = c
dy/dt + e dx/dt + fy = g.
An approach using the differential operator, D say, is useful here. Let us also
consider the situation where c and g are zero and we can rewrite the equations as:

(D + b)x + aDy = 0
eDx + (D + f )y = 0.

Eliminating x or y by, for example, subtracting (D + b) times the second equation


from eD times the first gives:

(1 − ae)D²y + (b + f)Dy + fby = 0
and (1 − ae)D²x + (b + f)Dx + fbx = 0

i.e. two ‘straightforward’ second order differential equations which can each be
solved using the methods of Section 5.7.
If c and g are not zero, but are given constants, a particular solution is evidently
x = c/b and y = g/f . The constants of the solutions for x and y are often dependent
upon each other as a consequence of the given initial conditions.

Example 5.9 Suppose we have simultaneous equations involving second order
derivatives:

d²x/dt² + a d²y/dt² + bx + cy = e
d²y/dt² + f d²x/dt² + gy + hx = j.

Suppose, first of all, that e and j are zero and assume x = Ae^(mt), y = Be^(mt). Thus we
get

(m² + b)A + (am² + c)B = 0
(fm² + h)A + (m² + g)B = 0.

Eliminating A (or B) will give the following quadratic in m²:

(m² + b)(m² + g) − (am² + c)(fm² + h) = 0 ⇒ (1 − af)m⁴ + (b + g − ah − cf)m² + (bg − ch) = 0.


We can then solve this for two solutions of m², say m1² and m2². We then get plus or
minus the square root of each of these as a possible solution for m.
Finally, if appropriate conditions are met so that we have two positive roots for m²,
we see that we might have the solutions of the form:

x = A1 e^(m1 t) + A2 e^(−m1 t) + A3 e^(m2 t) + A4 e^(−m2 t)
and y = B1 e^(m1 t) + B2 e^(−m1 t) + B3 e^(m2 t) + B4 e^(−m2 t)

where the constants Ai and Bi (i = 1, . . . , 4) might be dependent upon each other
and can be determined given sufficient initial conditions.
If, on the other hand, we have two negative roots for m² (−p1² and −p2² say) we will
ultimately finish up with:

x = A1 cos p1 t + A2 sin p1 t + A3 cos p2 t + A4 sin p2 t
and y = B1 cos p1 t + B2 sin p1 t + B3 cos p2 t + B4 sin p2 t.

Finally, if we have one positive solution for m² and one negative one, −p² say, we
will ultimately finish up with a solution of the form:

x = A1 e^(mt) + A2 e^(−mt) + A3 cos pt + A4 sin pt
and y = B1 e^(mt) + B2 e^(−mt) + B3 cos pt + B4 sin pt.

An alternative approach to the above is to assume that y = µx which, when


substituted into the given simultaneous equations, will give us a pair of equations,
each being solvable using the methods of Section 5.7.

5.9 The behaviour of differential equation solutions

It is important to understand not only how to solve various types of differential


equations but also how to interpret their solutions. In particular, when differential (or
difference) equations are used to model managerial and economic situations then one
would wish to know whether the solution is stable, oscillating, diverging, converging,
cyclical, etc. You should familiarise yourself with these ideas and the appropriate
terminology to describe the solutions. In particular, one is often interested in the
long-run situation, i.e. what happens when t tends towards infinity? You should also
recall how to graph functions in order to produce a meaningful description of the
solution function to differential and difference equations.
It might also be desirable (and possible) to give a graphical representation of the
solutions for the simultaneous equations of Section 5.8. Thus, for example, the solutions
for x and y might be related such that they describe an ellipse or some bidirectional
oscillation over time.


5.10 Some applications of differential equations


The similarity of applications between difference and differential equations is so strong
that the reader is simply referred to Section 4.12. Simultaneous differential equations
are widely used in the physical sciences and many of these models are increasingly
finding a useful application in economic and financial modelling.

5.11 Summary
Following Chapters 4 and 5, and the numerous examples contained within them, you
should now have a good grasp of tackling a wide range of dynamic models. Once again,
however, it is necessary to note that for differential equations there is no need to go
beyond the bounds of the topics in this chapter.
What you do not need to know

Third or higher order equations (although the methods outlined above for second
order equations carry over very well for these more complex equations)

Partial differential equations (although their importance in finance markets is


noteworthy).

5.12 A reminder of your learning outcomes


By the end of this chapter, and having completed the Essential reading and activities,
you should be able to:

solve problems involving first and (constant coefficient) second order differential
equations

solve relatively simple simultaneous differential equations

use such mathematics for modelling real-life managerial/economic situations

interpret the solutions and graph them.

5.13 Sample examination questions

1. Suppose K(t) is the amount of capital at time t, k(t) is the excess of capital over
equilibrium amount Ke and I(t) is the rate of investment at time t. Furthermore,
suppose that a deficiency of capital below a certain equilibrium level Ke leads to an
increase in the rate of capital investment and a surplus of capital leads to a


decrease in the rate of capital investment, i.e.:

dk(t)/dt = I(t)
k(t) = K(t) − Ke
dI(t)/dt = −ak(t)
for some a > 0.
(a) Derive a second order differential equation in k(t) and solve it under the
assumption that I(0) = I0 and k(0) = k0 .
(10 marks)
(b) Show why your analysis above is contrary to the experience in many countries
that capital can grow indefinitely.
(2 marks)
(c) Suppose now that the above model is changed so that the rate of investment
consists of two parts:
i. an amount depending only on K(t), say rK(t) and
ii. an amount whose rate of change depends upon how much total capital
differs from equilibrium level, i.e. suppose

dI/dt = r dK/dt − a(K − Ke)

and furthermore that

Ke = K0 e^(bt).

Create a second order differential equation in K(t) and show that this model
will allow for capital to grow indefinitely.
(8 marks)

2. (Please note that this is only a part of a full examination question.)


Solve the following differential equation

d²w/dt² − 8 dw/dt + 16w = 64
where w = 7 and dw/dt = 11 when t = 0.
(7 marks)

3. In a highly competitive situation between two companies, the rate of elimination of


opposing company customers is proportional to the number of your own customers
ni (i = 1, 2) at a given moment of time. Each customer of company i eliminates ki
customers of the opposing company, and initial customers at time t = 0 are Ni
(i = 1, 2).


(a) Given that the following differential equations hold:

dn1/dt = −k2 n2
dn2/dt = −k1 n1

for k1, k2 > 0,
derive second order differential equations in n1 and n2 and find a solution for
n1 and n2 in terms of k1 , k2 , N1 and N2 .
(12 marks)
(b) Show that company 2 will be the first to lose all their customers if
k1 N12 > k2 N22 .
(8 marks)
4. In considering profit from a machine that is subject to breakdown, the following
equations occur:
dx/dt + λx − λy = c2 − λc1
dy/dt − µx + µy = −c3
where
x(t) is the net return to time t if the machine was running at time zero
y(t) is the net return if it was broken down at time zero
c1 is the setup cost that arises when a breakdown occurs
c2 is the gross profit per hour while the machine is running
c3 is the cost per hour of repair
λ is the rate at which breakdowns occur
1/µ is the average time of repair.
(a) Derive a second order differential equation for y and, using x(0) = y(0) = 0,
solve it to give y (and hence x) as a function of c1 , c2 , c3 , λ and µ.
(16 marks)
(b) If c1 = 10, c2 = 20, c3 = 25, λ = 0.1 and µ = 0.5, determine the average rate of
profit x/t or y/t for large t.
(4 marks)
5. (a) The output of a company depends on the capital, K, and time, t. The
production and savings functions, Q(t) and S(t), respectively, are given by
Q = αtK and S = Q − βt where α and β are positive constants.
You may assume that the rate of capital accumulation is equal to savings.
i. Show that dK/dt = t(αK − β).
(3 marks)
ii. If the initial capital is K0 , solve the above equation for K in terms of t, α,
β and K0 .
(4 marks)


iii. Draw sketch graphs for (and comment upon the behaviour of) Q(t) for the
three cases where K0 is equal to, or greater than or less than β/α.
(4 marks)
(b) Solve the following second order differential equation

d²z/dt² − 4 dz/dt + 4z = 8(t² + sin 2t)
where z = 6 and dz/dt = 2 when t = 0.
(9 marks)

6. (Please note that this is only a part of a full examination question.)


The price elasticity of demand for a particular product is given by

−(p/q)(dq/dp) = 4p²/(p² + 1).
If q = 4 when p = 1 determine the demand function qD as a function of p.
(10 marks)

5.14 Guidance on answering the Sample examination questions
1. (a) We have

d²k(t)/dt² = dI(t)/dt = −ak(t).

If k(t) = e^(mt) then the auxiliary equation is m² + a = 0 and hence m = i√a or
−i√a.
Hence the complementary function and general solution is

k(t) = c1 e^(i√a t) + c2 e^(−i√a t)
     = c1 (cos √a t + i sin √a t) + c2 (cos √a t − i sin √a t)
     = (c1 + c2) cos √a t + i(c1 − c2) sin √a t.

At t = 0, k0 = c1 + c2 and

dk(t)/dt = I(t) = −(c1 + c2)√a sin √a t + i(c1 − c2)√a cos √a t

so at t = 0, I0 = i(c1 − c2)√a.
Hence

k(t) = k0 cos √a t + (I0/√a) sin √a t = A sin(√a t + ε)

where

A = (I0²/a + k0²)^(1/2) and ε = tan⁻¹(k0 √a / I0)

i.e. k oscillates with period equal to 2π/√a and amplitude (I0²/a + k0²)^(1/2).

(b) Hence Ke − A ≤ K(t) ≤ Ke + A and so K(t) cannot grow indefinitely. This is
contrary to the indefinite growth of capital observed in most countries.
(c) We have

d²k(t)/dt² = dI/dt = r dK/dt − a(K − Ke) → d²k/dt² − r dK/dt + aK = aKe

and

d²k/dt² = d²K/dt² − b²K0 e^(bt).

Hence we have the following second order differential equation in K:

d²K/dt² − r dK/dt + aK = (a + b²)K0 e^(bt).

The auxiliary equation is m² − rm + a = 0 and hence

m = (r ± √(r² − 4a))/2

giving α and β, say. Then

K(t) = Be^(αt) + Ce^(βt).

For a particular solution we try K = De^(bt) and substituting this into the
equation we find that

D = (a + b²)K0 / (b² − rb + a).

Hence

K(t) = Be^(αt) + Ce^(βt) + De^(bt)

and, if b > 0 and for suitable constants, K(t) can grow exponentially.
2. For

d²w/dt² − 8 dw/dt + 16w = 64

we try w = Ae^(λt) and hence the auxiliary equation is:

λ² − 8λ + 16 = 0

i.e. (λ − 4)² = 0 and hence λ = 4 (twice).
Hence the complementary function is

w = Ae^(4t) + Bte^(4t).

For a particular solution we try w = k which means that 16k = 64 and hence
k = 4. Thus

w = Ae^(4t) + Bte^(4t) + 4.

Then using the initial conditions:


w = 7 when t = 0 means that 7 = A + 4 and hence A = 3.
dw/dt = 11 when t = 0 means that 11 = 4A + B and hence B = −1.
The complete solution is therefore

w = 3e^(4t) − te^(4t) + 4.
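This solution can be checked by computer (assuming Python; the derivatives coded below are the exact ones obtained by hand):

```python
import math

def w(t):   return 3 * math.exp(4 * t) - t * math.exp(4 * t) + 4
def dw(t):  return 11 * math.exp(4 * t) - 4 * t * math.exp(4 * t)   # exact first derivative
def d2w(t): return 40 * math.exp(4 * t) - 16 * t * math.exp(4 * t)  # exact second derivative

# initial conditions: w = 7 and dw/dt = 11 at t = 0
assert abs(w(0) - 7) < 1e-9 and abs(dw(0) - 11) < 1e-9

# the differential equation d2w/dt2 - 8 dw/dt + 16w = 64 holds
for t in [0.0, 0.3, 1.0]:
    assert abs(d2w(t) - 8 * dw(t) + 16 * w(t) - 64) < 1e-6
```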

3. (a) If we differentiate the given equations we find that

d²n1/dt² = −k2 dn2/dt = k1 k2 n1
d²n2/dt² = −k1 dn1/dt = k1 k2 n2.

Solving these in the usual fashion (Section 5.7) will give the solutions:

n1 = A1 e^(√(k1 k2) t) + B1 e^(−√(k1 k2) t) and n2 = A2 e^(√(k1 k2) t) + B2 e^(−√(k1 k2) t).

Now n1 = N1 and n2 = N2 when t = 0 and hence N1 = A1 + B1 and
N2 = A2 + B2.
We also know that dn1/dt = −k2 n2 which leads to:

A1 √(k1 k2) e^(√(k1 k2) t) − B1 √(k1 k2) e^(−√(k1 k2) t) = −k2 A2 e^(√(k1 k2) t) − k2 B2 e^(−√(k1 k2) t)

and hence equating coefficients gives

A1 √(k1 k2) = −k2 A2 and B1 √(k1 k2) = k2 B2.

Using the equations for N1 and N2 together with the relationships between the
constants A1, A2, B1 and B2 established above, we can establish that:

A1 = (1/2)(N1 − √(k2/k1) N2)
A2 = (1/2)(N2 − √(k1/k2) N1)
B1 = (1/2)(N1 + √(k2/k1) N2)
and B2 = (1/2)(N2 + √(k1/k2) N1)

which we can insert as constants in our solution for n1 and n2.


(b) We note that ni = 0 when

Ai e^(√(k1 k2) t) = −Bi e^(−√(k1 k2) t)

i.e. when

e^(2√(k1 k2) t) = −Bi/Ai.


So the smaller the value of −Bi/Ai, the less time it takes for that company's
customers to die out. Company 2 will be the first to lose its customers if
−B2/A2 < −B1/A1. Inserting the values for A1, B1, A2 and B2 and rearranging
will give the required condition that N1² k1 > N2² k2.
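Part (b) can also be illustrated by simulating the original pair of equations with Euler's method. The values k1 = 2, k2 = 1, N1 = N2 = 100 below are hypothetical, chosen so that k1 N1² > k2 N2², in which case company 2 should be the first to run out of customers:

```python
# Euler simulation of dn1/dt = -k2*n2, dn2/dt = -k1*n1
# with illustrative values satisfying k1*N1^2 > k2*N2^2
k1, k2 = 2.0, 1.0
n1, n2 = 100.0, 100.0   # N1 and N2
dt = 1e-4               # small time step

while n1 > 0 and n2 > 0:
    # simultaneous update of both customer counts
    n1, n2 = n1 - k2 * n2 * dt, n2 - k1 * n1 * dt

loser = 2 if n2 <= 0 else 1
assert loser == 2   # company 2 loses all its customers first, as predicted
```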

4. (a) Rewriting the given differential equations using the differential operator D, i.e.
d/dt, then

(D + λ)x − λy = α     (1)
and (D + µ)y − µx = β     (2)

where α = c2 − λc1 and β = −c3.
Adding µ times (1) to (2) times (D + λ) gives

(D² + (µ + λ)D)y = αµ + λβ,

i.e. a straightforward second order differential equation to solve as follows:
The auxiliary equation is m² + (µ + λ)m = 0 and hence m = 0 or −(µ + λ).
Hence the complementary function is y = A + Be^(−(µ+λ)t).
For the particular solution we try y = Ct. Deriving dy/dt and d²y/dt² and
inserting them into the differential equation gives:

C = (αµ + λβ)/(µ + λ).

Thus the general solution is

y = A + Be^(−(µ+λ)t) + ((αµ + λβ)/(µ + λ))t.

Now, from (2), we know that

x = ((D + µ)y − β)/µ
  = (1/µ)(dy/dt) + y − β/µ
  = (1/µ)[−(µ + λ)Be^(−(µ+λ)t) + (αµ + λβ)/(µ + λ)] + A + Be^(−(µ+λ)t) + ((αµ + λβ)/(µ + λ))t − β/µ
  = A + (α − β)/(µ + λ) − (λ/µ)Be^(−(µ+λ)t) + ((αµ + λβ)/(µ + λ))t.

Setting y = 0 and x = 0 when t = 0 allows us to solve for A and B to give:

A = (β − α)µ/(µ + λ)²   and   B = (α − β)µ/(µ + λ)².

(b) In the long run, i.e. as t tends to infinity, then

x/t = y/t = (αµ + λβ)/(µ + λ)

and hence, inserting the given values for c1, c2, c3, λ and µ gives α = 19,
β = −25, A = −61.11, B = 61.11 and x/t = y/t = 11.67.
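The arithmetic in part (b) can be confirmed numerically (assuming Python; this is not required in the examination):

```python
import math

c1, c2, c3, lam, mu = 10.0, 20.0, 25.0, 0.1, 0.5
alpha, beta = c2 - lam * c1, -c3              # alpha = 19, beta = -25
C = (alpha * mu + lam * beta) / (mu + lam)    # slope of the linear term
A = (beta - alpha) * mu / (mu + lam) ** 2
B = (alpha - beta) * mu / (mu + lam) ** 2

def y(t):
    # general solution derived in part (a)
    return A + B * math.exp(-(mu + lam) * t) + C * t

assert abs(C - 7.0 / 0.6) < 1e-12             # 11.666..., i.e. 11.67 to 2 d.p.
assert abs(A + 61.1111111) < 1e-4 and abs(B - 61.1111111) < 1e-4
assert abs(y(1000.0) / 1000.0 - C) < 0.1      # y/t approaches C for large t
```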


5. (a) i. ‘The rate of capital accumulation is equal to savings’ means that
dK/dt = S, i.e. dK/dt = Q − βt = αtK − βt = t(αK − β), as required.
ii. Rearranging the answer to (i.) we have

dK/(αK − β) = t dt.

Integrating both sides we find that

(1/α) ln(αK − β) = t²/2 + c ⇒ ln(αK − β) = αt²/2 + αc ⇒ αK = Ce^(αt²/2) + β

where c and C are constants.
Using the initial condition that K = K0 when t = 0 then αK0 = C + β,
i.e. C = αK0 − β. Hence

K = (K0 − β/α) e^(αt²/2) + β/α.

iii. If K0 = β/α then K = β/α and Q = αtK = βt. Hence we have a linear
relationship between Q and t.
If K0 > β/α then K = K′e^(αt²/2) + β/α, for K′ a positive constant, and
Q = βt + αK′te^(αt²/2). Hence we have exponential growth on top of a linear
relationship between Q and t.


If K0 < β/α then K = K″e^(αt²/2) + β/α, for K″ a negative constant, and
Q = βt + αK″te^(αt²/2).
Hence Q goes to zero at an increasing rate.

(b) Auxiliary equation is m² − 4m + 4 = 0, i.e. (m − 2)² = 0 and hence m = 2
(twice). Thus the complementary function is

z = Ae^(2t) + Bte^(2t).

For a particular solution we try z = Ct² + Dt + E + F sin 2t + G cos 2t.
This means that dz/dt = 2Ct + D + 2F cos 2t − 2G sin 2t and
d²z/dt² = 2C − 4F sin 2t − 4G cos 2t.
Inserting these into the given equation:

2C − 4F sin 2t − 4G cos 2t − 4(2Ct + D + 2F cos 2t − 2G sin 2t)
+ 4(Ct² + Dt + E + F sin 2t + G cos 2t)
= 8t² + 8 sin 2t.

Equating coefficients:

t²: 4C = 8, i.e. C = 2
t¹: −8C + 4D = 0, i.e. D = 4
t⁰: 2C − 4D + 4E = 0, i.e. E = 3
sin 2t: −4F + 8G + 4F = 8, i.e. G = 1
cos 2t: −4G − 8F + 4G = 0, i.e. F = 0.

Hence the general solution is

z = 2t² + 4t + 3 + cos 2t + Ae^(2t) + Bte^(2t).

Now, when t = 0, z = 6 and dz/dt = 2.
Hence 6 = 3 + 1 + A, i.e. A = 2, and 2 = 4 + 2A + B, i.e. B = −6. Therefore

z = 2t² + 4t + 3 + cos 2t + 2e^(2t) − 6te^(2t).
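Again, the solution is easy to verify by computer (assuming Python; the derivatives coded below are the exact ones):

```python
import math

def z(t):
    return 2 * t**2 + 4 * t + 3 + math.cos(2 * t) + 2 * math.exp(2 * t) - 6 * t * math.exp(2 * t)

def dz(t):   # exact first derivative
    return 4 * t + 4 - 2 * math.sin(2 * t) - 2 * math.exp(2 * t) - 12 * t * math.exp(2 * t)

def d2z(t):  # exact second derivative
    return 4 - 4 * math.cos(2 * t) - 16 * math.exp(2 * t) - 24 * t * math.exp(2 * t)

# initial conditions: z = 6 and dz/dt = 2 at t = 0
assert abs(z(0) - 6) < 1e-9 and abs(dz(0) - 2) < 1e-9

# the differential equation d2z/dt2 - 4 dz/dt + 4z = 8(t^2 + sin 2t) holds
for t in [0.0, 0.5, 1.0]:
    assert abs(d2z(t) - 4 * dz(t) + 4 * z(t) - 8 * (t**2 + math.sin(2 * t))) < 1e-6
```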


6. We have

−(p/q)(dq/dp) = 4p²/(p² + 1) → dq/q = −(4p/(p² + 1)) dp

and then integrating both sides we find that

ln q = −2 ln(p² + 1) + c = ln(k/(p² + 1)²)

for some constant k. So q = k/(p² + 1)² and using the initial condition we have
4 = k/(1 + 1)² and hence k = 16 and q = 16/(p² + 1)².
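A final computer check (assuming Python) that q = 16/(p² + 1)² reproduces the given elasticity:

```python
# demand function found above, and its exact derivative
def q(p):  return 16.0 / (p**2 + 1) ** 2
def dq(p): return -64.0 * p / (p**2 + 1) ** 3   # dq/dp by the chain rule

assert abs(q(1.0) - 4.0) < 1e-12                # initial condition: q = 4 at p = 1

# -(p/q)(dq/dp) should equal 4p^2/(p^2 + 1) at any p
for p in [0.5, 1.0, 2.0]:
    elasticity = -(p / q(p)) * dq(p)
    assert abs(elasticity - 4 * p**2 / (p**2 + 1)) < 1e-9
```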

Chapter 6
Further applications of matrices

6.1 Aims of the chapter


To recall the use and laws of matrix algebra as covered in earlier 100 courses.

To extend the matrix theory covered in earlier 100 courses to cover determinants,
eigen values and eigen vectors (these additional tools are required to understand
aspects of Chapter 8).

To demonstrate how the matrix method for solving simultaneous equations is


particularly useful for input-output models.
To show how matrices can be used to represent networks and to calculate
connectivities, etc.

To introduce stochastic models through transition probability matrices.

6.2 Learning outcomes


By the end of this chapter, and having completed the Essential reading and activities,
you should be able to:

solve input-output models using matrix inversion methods

draw networks from given matrices and construct matrices to represent given
networks

construct connectivity matrices and use them effectively

construct and use transition matrices, e.g. to determine long-run equilibrium state
probabilities

determine the determinant, eigen values and eigen vectors of given matrices.

6.3 Essential reading


Dowling, E.T. Schaum’s easy outline of introduction to mathematical economics.
(New York: McGraw-Hill, 2006). Chapters 12.7, 12.8, 14 and 15 (the latter two
mainly for background knowledge).


6.4 Further reading


Haeussler, E.F. Jr., R.S. Paul and R.J. Wood Introductory mathematical analysis
for business, economics and the life and social sciences. (Upper Saddle River, NJ:
Prentice Hall, 2010) Chapter 6 (as background).
Jacques, I. Mathematics for economics and business. (Upper Saddle River, NJ:
Prentice Hall, 2012) Chapter 7.
Johnson, R.A., and D.W. Wichern Applied multivariate statistical analysis. (Upper
Saddle River, NJ: Pearson Prentice Hall, 2007) Chapter 2.

6.5 Introduction and review


This chapter of the subject guide is intended to extend your existing knowledge about
matrices – particularly towards further application areas where matrices have proved
particularly useful. You will already have come across vectors and matrices in earlier
100 courses. To remind you of the main points, you should be familiar with the basic
concepts of vectors and matrices. In particular, you should be able to manipulate them
by means of addition, subtraction, scalar multiplication and dot products. You should
also be able to find the inverse of a matrix and use it in solving a set of linear equations.
(You can use whichever matrix inversion method you are happiest with.) What follows
in the remainder of these notes is a further explanation of the use of matrices in specific
areas, namely, input-output economics, networks, transition probabilities and Markov
chains (these topics are covered in more detail in Chapter 7) and in linear programming.

6.6 Input-output economics


Many economic models are based upon the relationships between outputs of production
and the necessary inputs required for this production. Although we will only be dealing
with relatively simple models, you should appreciate that the basic ideas outlined here
are used for more complex and realistic economic analysis.
If xi (i = 1, 2, . . . , n) is the total production of good i, aij (i, j = 1, 2, . . . , n) is the
quantity of the good (commodity) i that is used to produce a unit of good (commodity)
j, and bi (i = 1, 2, . . . , n) is the final demand for good i then

xi = Σj aij xj + bi,   i = 1, 2, . . . , n

i.e. x = A·x + b or (I − A)x = b where I is the n × n identity matrix, A is the
input-output matrix (or technology matrix), x is a column vector of productions and b
is a column vector of demands.
Thus

x = (I − A)⁻¹ b.
The above relationship can be represented as a type of flow diagram (also see Figure
6.3):

Figure 6.1: A three-product input-output diagram.

Example 6.1 The technology matrix for a three-industry input-output model is:
 
A = [ 0.5  0    0.2  ]
    [ 0.2  0.8  0.12 ]
    [ 1.0  0.4  0    ].

If the final (non-industry) demand for the output of these industries is b1 = 5,
b2 = 3 and b3 = 4, determine the equilibrium output levels for the three industries.
If x is the output vector then x = Ax + b, i.e. x = (I − A)⁻¹ b. Now
 
(I − A) = [  0.5   0    −0.2  ]
          [ −0.2   0.2  −0.12 ]
          [ −1.0  −0.4   1    ]

and inverting this, using any method, we get

(I − A)⁻¹ = [ 7.6   4   2 ]
            [ 16   15   5 ]
            [ 14   10   5 ]

and

(I − A)⁻¹ b = [ 7.6   4   2 ] [ 5 ]   [  58 ]
              [ 16   15   5 ] [ 3 ] = [ 145 ]
              [ 14   10   5 ] [ 4 ]   [ 120 ]
and therefore the necessary production amounts for the three commodities are 58,
145 and 120 units, respectively.
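The inversion and multiplication above can be reproduced on a computer with NumPy (assumed available; this is not part of the examination):

```python
import numpy as np

# technology matrix and final demand vector from Example 6.1
A = np.array([[0.5, 0.0, 0.2],
              [0.2, 0.8, 0.12],
              [1.0, 0.4, 0.0]])
b = np.array([5.0, 3.0, 4.0])

# solve (I - A)x = b directly, rather than forming the inverse explicitly
x = np.linalg.solve(np.eye(3) - A, b)
assert np.allclose(x, [58.0, 145.0, 120.0])
```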


6.7 Networks
An increasingly important area of management mathematics/operational research is the
use of graph theory and network theory in modelling realistic situations. Central to
these ideas is the use of matrices to represent interrelationships. Having depicted the
situation with matrices, a computer can then be used to solve specific problems, e.g.
transportation problems, transshipment problems, allocation problems, maximum flow
between two points, optimal company configuration for maximal communication,
minimum cost of depot locations, critical path (minimum completion time) for a
project, etc. We are clearly not going to be heavily involved in these areas within this
course. However, this chapter is intended to highlight the usefulness of matrices in
extracting the mathematical relationships from a given diagram or given situation. You
should appreciate how the resulting matrices can be manipulated.

Example 6.2
(Representation of a network as a matrix)
Suppose we have Figure 6.2 where nodes 1 to 5 are connected by arcs along which
something (money, water, electricity, goods, chemicals, information, etc.) flows. The
number alongside each directed arc might represent the flow, the arc capacity, the
cost per unit flow, etc.
We can then represent this diagram by the following matrix, where the (i, j)th
element corresponds to the ‘value’ on the arc connecting node i to j:
 
[ 0 0 2 0 0 ]
[ 6 0 0 0 0 ]
[ 3 4 0 5 0 ]
[ 0 0 0 0 4 ]
[ 3 0 1 2 0 ]

Figure 6.2: Diagram for Example 6.2.
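From the same figure one can also build the 0/1 connectivity matrix mentioned in the learning outcomes (ignoring the arc values); powers of this matrix then count multi-step routes between nodes. A sketch in Python with NumPy (assumed available):

```python
import numpy as np

# 0/1 connectivity matrix for Figure 6.2: entry (i, j) is 1 if there is a
# directed arc from node i+1 to node j+1, regardless of the value on it
C = np.array([[0, 0, 1, 0, 0],
              [1, 0, 0, 0, 0],
              [1, 1, 0, 1, 0],
              [0, 0, 0, 0, 1],
              [1, 0, 1, 1, 0]])

# the (i, j) entry of C @ C counts directed two-step routes from i to j
two_step = C @ C
assert two_step[0].tolist() == [1, 1, 0, 1, 0]  # node 1 reaches 1, 2 and 4 in two steps (via node 3)
```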


Figure 6.3: Diagram for Example 6.3.


Example 6.3
(Where conservation flow at a node creates a matrix equation)
Figure 6.3 depicts the complex intersection of five one-way streets (direction
indicated by the arrows) which have been analysed due to traffic congestion. The
values alongside each arrow indicate the observed flow of vehicles per hour flowing
along each street where a survey has been taken.
(a) Represent the relationship between unknown values xi, i = 1, 2, . . . , 5, using
matrix algebra.
(b) If you are informed that x5 = 400 determine maximum and minimum possible
values for each xi .
(c) If you are further informed that x2 + x4 = 1400 use matrix algebra to determine
the complete flow pattern for xi , i = 1, 2, . . . , 4.

(a) Conserving the traffic flow at each intersection we have:


1200 = x1 + x4 + x5
1200 = x1 + x2
800 = x2 + x3 − x5
800 = x3 + x4 .
Hence, in matrix form:

[ 1 0 0 1  1 ] [ x1 ]   [ 1200 ]
[ 1 1 0 0  0 ] [ x2 ]   [ 1200 ]
[ 0 1 1 0 −1 ] [ x3 ] = [  800 ]
[ 0 0 1 1  0 ] [ x4 ]   [  800 ]
               [ x5 ]


(b) So if x5 = 400,

[ 1 0 0 1 ] [ x1 ]   [  800 ]
[ 1 1 0 0 ] [ x2 ]   [ 1200 ]
[ 0 1 1 0 ] [ x3 ] = [ 1200 ]
[ 0 0 1 1 ] [ x4 ]   [  800 ]
Solving using matrix methods

[ 1 0 0  1 |  1  0 0 0 ]
[ 0 1 0 −1 | −1  1 0 0 ]
[ 0 0 1  1 |  1 −1 1 0 ]
[ 0 0 1  1 |  0  0 0 1 ]

gives

[ 1 0 0  1 |  1  0  0 0 ]
[ 0 1 0 −1 | −1  1  0 0 ]
[ 0 0 1  1 |  1 −1  1 0 ]
[ 0 0 0  0 | −1  1 −1 1 ].
At this point the last row of the matrix equation tells us we have not
got a unique solution. It tells us, however, that:

x1 + x4 = 800 (1)
x2 − x4 = 400 (2)
x3 + x4 = 800 (3)

(1) implies that x1 ≤ 800 and x4 ≤ 800.


(2) implies that x2 ≥ 400.
(3) implies that x3 ≤ 800 and x4 ≤ 800.
If x4 = 800 (its maximum) then x1 = 0 (its minimum), x2 = 1200 (its
maximum) and x3 = 0 (its minimum).
Conversely, if x4 = 0 (its minimum) then x1 = 800 (its maximum), x2 = 400 (its
minimum) and x3 = 800 (its maximum).
Hence

0 ≤ x1 ≤ 800
400 ≤ x2 ≤ 1200
0 ≤ x3 ≤ 800
0 ≤ x4 ≤ 800.

(c) If x2 + x4 = 1400, then continuing with matrix methods with this new
additional last line to find an inverse and a solution vector:

[ 1 0 0  1 |  1  0 0 0 ]
[ 0 1 0 −1 | −1  1 0 0 ]
[ 0 0 1  1 |  1 −1 1 0 ]
[ 0 0 1  1 |  1 −1 1 0 ]
[ 0 1 0  1 |  0  0 0 1 ]


becomes

[ 1 0 0  1 |  1  0 0 0 ]
[ 0 1 0 −1 | −1  1 0 0 ]
[ 0 0 1  1 |  1 −1 1 0 ]
[ 0 0 1  1 |  1 −1 1 0 ]
[ 0 0 0  2 |  1 −1 0 1 ]

which becomes

[ 1 0 0 0 |  1/2  1/2 0 −1/2 ]
[ 0 1 0 0 | −1/2  1/2 0  1/2 ]
[ 0 0 1 0 |  1/2 −1/2 1 −1/2 ]
[ 0 0 0 1 |  1/2 −1/2 0  1/2 ]

and so

    [  1/2  1/2 0 −1/2 ] [  800 ]   [ 300 ]
x = [ −1/2  1/2 0  1/2 ] [ 1200 ] = [ 900 ]
    [  1/2 −1/2 1 −1/2 ] [ 1200 ]   [ 300 ]
    [  1/2 −1/2 0  1/2 ] [ 1400 ]   [ 500 ]
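The final system for part (c) can also be solved directly with NumPy (assumed available), without carrying out the row reduction by hand:

```python
import numpy as np

# the four independent equations once x5 = 400 and x2 + x4 = 1400 are imposed:
# x1+x4 = 800, x1+x2 = 1200, x2+x3 = 1200, x2+x4 = 1400
M = np.array([[1, 0, 0, 1],
              [1, 1, 0, 0],
              [0, 1, 1, 0],
              [0, 1, 0, 1]], dtype=float)
rhs = np.array([800.0, 1200.0, 1200.0, 1400.0])

x = np.linalg.solve(M, rhs)
assert np.allclose(x, [300.0, 900.0, 300.0, 500.0])
```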

6.8 Transition probabilities and Markov chains


(See also Chapter 7.) In these applications matrices are used to depict the changes that
might occur between one stage and the next in a dynamic situation. Thus, for example,
the (i, j)th element of the matrix might represent the probability of going from
‘position’ i to ‘position’ j in each single step of the model. Such models are highly
useful for analysing evolving market shares.

Example 6.4 Suppose P represents the probabilities of changing between three


states A, B and C, and P is given by the matrix:
 
P = [ 0.6  0.3  0.1 ]
    [ 0.2  0.7  0.1 ]
    [ 0.1  0.3  0.6 ].

Suppose we start at stage 0 in state A, i.e. p(0) = (1 0 0); then p(1) = p(0)P,
p(2) = p(1)P, etc., where p(i) is the vector of state probabilities after the ith step.
For example, using the above P and p(0), then p(1) = (0.6 0.3 0.1) and

p(2) = (0.6 0.3 0.1) [ 0.6  0.3  0.1 ]
                     [ 0.2  0.7  0.1 ] = (0.43 0.42 0.15)   etc.
                     [ 0.1  0.3  0.6 ]

In order to find the equilibrium probabilities (the limiting case of p(t) as t gets larger
and larger) we can proceed as follows. Suppose the equilibrium probabilities are
(p1 p2 p3), then

(p1 p2 p3) = (p1 p2 p3) [ 0.6  0.3  0.1 ]
                        [ 0.2  0.7  0.1 ]
                        [ 0.1  0.3  0.6 ]


and, in addition we know that p1 + p2 + p3 = 1.


The matrix equation above represents three equations in three unknowns although
only (any) two of these equations are independent. Including the equation
p1 + p2 + p3 = 1 gives us three equations in three unknowns which you should be
capable of solving – if indeed a solution exists. Using whatever method you wish we
find that the solution vector is (0.3 0.5 0.2).
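The convergence to these equilibrium probabilities can be seen numerically by iterating p(i+1) = p(i)P; a sketch with NumPy (assumed available):

```python
import numpy as np

P = np.array([[0.6, 0.3, 0.1],
              [0.2, 0.7, 0.1],
              [0.1, 0.3, 0.6]])

p = np.array([1.0, 0.0, 0.0])                 # start in state A
assert np.allclose(p @ P, [0.6, 0.3, 0.1])    # p(1)
assert np.allclose(p @ P @ P, [0.43, 0.42, 0.15])  # p(2)

# repeated multiplication converges to the equilibrium probabilities
for _ in range(200):
    p = p @ P
assert np.allclose(p, [0.3, 0.5, 0.2])
```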

6.9 A note on determinants, eigen values and eigen vectors

In certain texts, for example Dowling, you will come across the concepts of eigen values,
characteristic roots, eigen vectors, etc. These can prove very useful in certain aspects of
matrix manipulation and analysis of matrix-based models. Although these concepts will
not be examined as a separate topic of the course, they will be useful in Chapter 8 when
we discuss factor analysis and discriminant analysis. Hence a summary of the main
aspects of eigen values and their associated eigen vectors is given here. Unfortunately
this does mean that, although throughout MT2076 Management mathematics it
has been our aim to avoid the necessity for determinants wherever possible, it is
appropriate to mention them here. Some uses of eigen values and eigen vectors will
become apparent in Chapter 8. Further details of the mechanics of calculating
determinants, eigen values and eigen vectors can be found in Johnson and Wichern,
Chapter 2.

6.9.1 Determinants

The determinant of a square k × k (i.e. k rows and k columns) matrix A = {aij} is
denoted by |A| and is calculated as |A| = a11 if k = 1 or, for k > 1, by expanding
along a row i:

|A| = Σj aij |Aij| (−1)^(i+j)

where Aij is the (k − 1) × (k − 1) matrix obtained by deleting the ith row and the jth
column of A. One often expands along the first row, i.e. i = 1, for the calculation.
Note that one can only determine determinants of square matrices.

 
Example 6.5 Suppose A is the 2 × 2 matrix given as

A = [ 4 3 ]
    [ 1 2 ]

then |A| = 4|2|(−1)² + 3|1|(−1)³ = 4(2) − 3(1) = 5.

Example 6.6 Suppose A is now a 3 × 3 matrix given as

A = [ 3 2 −1 ]
    [ 0 3  2 ]
    [ 7 1  0 ].

Then

|A| = 3 · det[ 3 2; 1 0 ] · (−1)² + 2 · det[ 0 2; 7 0 ] · (−1)³ + (−1) · det[ 0 3; 7 1 ] · (−1)⁴
    = 3(−2) − 2(−14) − 1(−21)
    = −6 + 28 + 21
    = 43.

6.9.2 Eigen values


Suppose A is a k × k square matrix and I is the k × k identity matrix. The scalars
λ1 , λ2 , . . . , λk which satisfy the polynomial |A − λI| = 0 are called the eigen values (or
characteristic roots) of the matrix A. The equation |A − λI| = 0 (as a function of the
λs) is called the characteristic equation.
Eigen values have many uses and properties. One of them is noteworthy for us:
A symmetric matrix is one in which the matrix and its transpose are identical, i.e.
aij = aji for all i, j. It is worth noting that the determinant of a square symmetric
matrix A can be written as the product of its eigen values.

 
Example 6.7 Suppose

A = [ 9 1 ]
    [ 1 9 ]

then the eigen values of A are the λs that satisfy

| 9 − λ    1    |
|   1    9 − λ  | = 0,

i.e. (9 − λ)(9 − λ) − 1 = 0, i.e.
81 − 18λ + λ² − 1 = 80 − 18λ + λ² = (8 − λ)(10 − λ) = 0. Hence the eigen values are
λ1 = 8 and λ2 = 10.

6.9.3 Eigen vectors


If A is a square k × k matrix and λ is an eigen value of A then a non-zero column vector
x satisfying Ax = λx is called an eigen vector (characteristic vector) of A associated
with the eigen value λ.

Example 6.8 Continuing with Example 6.7, the eigen vector corresponding to
λ1 = 8 is x = (x1, x2)′ such that

[ 9 1 ] [ x1 ]     [ x1 ]
[ 1 9 ] [ x2 ] = 8 [ x2 ]

i.e. 9x1 + x2 = 8x1 and x1 + 9x2 = 8x2.
So setting x2 = 1 (arbitrarily) gives x1 = −1 and x = (−1, 1)′ or, in its unit length
(normalised) form, (−1/√2, 1/√2)′.
The normalised eigen vector corresponding to the eigen value λ2 = 10 is
(1/√2, 1/√2)′.

A further noteworthy fact concerning eigen values and eigen vectors is the spectral
decomposition:
If A is a symmetric k × k matrix, then A can be decomposed using its eigen values
λ1, λ2, . . . , λk and corresponding eigen vectors x1, x2, . . . , xk as follows:

A = Σ_{i=1}^{k} λi xi xi′.

We will find (in Chapter 8) how this is useful when A is a covariance matrix.
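The eigen values, eigen vectors and spectral decomposition of the matrix in Examples 6.7 and 6.8 can all be checked with NumPy (assumed available; not part of the examination):

```python
import numpy as np

A = np.array([[9.0, 1.0],
              [1.0, 9.0]])

# eigh is NumPy's eigen-routine for symmetric matrices;
# it returns eigen values in ascending order with normalised eigen vectors
vals, vecs = np.linalg.eigh(A)
assert np.allclose(vals, [8.0, 10.0])

# the determinant of a symmetric matrix equals the product of its eigen values
assert np.isclose(np.prod(vals), np.linalg.det(A))

# spectral decomposition: A = sum_i lambda_i x_i x_i'
recon = sum(vals[i] * np.outer(vecs[:, i], vecs[:, i]) for i in range(2))
assert np.allclose(recon, A)
```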
6.10 Matrices in linear programming
Many relationships in economic and management models take the form of linear
equations or, more usually, linearly constrained functions. Before the advent of personal
computers, reasonably difficult linear programs were solved by hand using what is
known as the simplex algorithm. This involves creating equalities from any given
inequalities by adding in slack variables. The resulting set of equations is then solved
using the algorithm, which really amounts to matrix manipulation. You should
appreciate the usefulness of matrices in this area and, furthermore, recognise that
computer-based software often uses matrix arrays, etc. in a similar fashion.
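As a sketch only (the algorithm itself is not examinable here), the minimal tableau simplex below solves a small made-up problem, maximise 3x + 2y subject to x + y ≤ 4, x + 3y ≤ 6 and x, y ≥ 0, by adding slack variables and pivoting, which is exactly the matrix manipulation described above. It assumes b ≥ 0 and a bounded, non-degenerate problem:

```python
def simplex(c, A, b):
    """Minimal tableau simplex for: maximise c.x subject to A x <= b, x >= 0.
    Slack variables turn the inequalities into equalities; pivoting is row
    reduction on the tableau.  Illustrative only (no degeneracy handling)."""
    m, n = len(A), len(c)
    # tableau rows [A | I | b]; bottom row holds the reduced costs -c
    T = [A[i] + [1.0 if j == i else 0.0 for j in range(m)] + [b[i]] for i in range(m)]
    T.append([-ci for ci in c] + [0.0] * m + [0.0])
    basis = [n + i for i in range(m)]            # the slacks start in the basis
    while True:
        piv_col = min(range(n + m), key=lambda j: T[-1][j])
        if T[-1][piv_col] >= -1e-9:
            break                                # no negative reduced cost: optimal
        # minimum ratio test chooses the leaving variable (assumes boundedness)
        _, piv_row = min((T[i][-1] / T[i][piv_col], i)
                         for i in range(m) if T[i][piv_col] > 1e-9)
        basis[piv_row] = piv_col
        p = T[piv_row][piv_col]
        T[piv_row] = [v / p for v in T[piv_row]]
        for i in range(m + 1):                   # eliminate the pivot column
            if i != piv_row and abs(T[i][piv_col]) > 1e-12:
                f = T[i][piv_col]
                T[i] = [v - f * w for v, w in zip(T[i], T[piv_row])]
    x = [0.0] * n
    for i, bi in enumerate(basis):
        if bi < n:
            x[bi] = T[i][-1]
    return x, T[-1][-1]                          # optimal point and value

# maximise 3x + 2y  subject to  x + y <= 4,  x + 3y <= 6
x, z = simplex([3.0, 2.0], [[1.0, 1.0], [1.0, 3.0]], [4.0, 6.0])
```

For this example the optimum is at the vertex (4, 0) with value 12, which can be confirmed graphically.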

6.11 Summary
What you need to know

How to construct input-output models and how to solve them using matrix
methods.
The relationship between networks and matrices (in both directions); how to
construct matrix descriptions of various problems.
How to construct and solve transition probability problems; how to evaluate the
equilibrium probabilities.
A basic understanding of how matrices can be used to solve linear programs; the
power and limitations of such models.
The assumptions that are required for each of the above mathematical models.


What you do not need to know

How to use the specifically devised algorithms of transportation, assignment,
transhipment, critical path, etc.

How to solve linear programs (which can be done using matrix methods or
graphically).

6.12 A reminder of your learning outcomes


By the end of this chapter, and having completed the Essential reading and activities,
you should be able to:

solve input-output models using matrix inversion methods

draw networks from given matrices and construct matrices to represent given
networks
construct connectivity matrices and use them effectively

construct and use transition matrices, e.g. to determine long-run equilibrium state
probabilities

determine the determinant, eigen values and eigen vectors of given matrices.

6.13 Sample examination questions


(Note how some questions overlap with the topics in Section 7.9.)

1. An important stage of a production line consists of two machines working in


parallel. The production continues satisfactorily if one or both of these machines is
working. A machine breaks down in a given period with probability q. Assume that
machines only break down at the end of a period. When this occurs the parallel
machine takes over, if available, beginning at the next period. A broken machine
takes two periods to repair. Let Xt be a vector (U V ) where U represents the
number of machines operating at the end of the period t and V takes the value 1 if
a repair currently being undertaken requires 1 additional period to be completed,
and 0 otherwise. The state space consists of the vector X taking values (2 0), (1 0),
(0 1) and (1 1). Denote these four states as 0, 1, 2 and 3, respectively.
(a) Construct the transition matrix for the Markov chain {Xt }.
(7 marks)
(b) What are the equilibrium probabilities?
(9 marks)


(c) If it costs $10,000 per period when the production is inoperative (both
machines down) and zero otherwise, what is the expected average cost per
period when (i.) q = 0.1, and (ii.) q = 0.5?
(4 marks)

2. (a) Given the following matrix of technical coefficients for products X, Y and Z:

$$A = \begin{pmatrix} 0.1 & 0.1 & 0.2 \\ 0.1 & 0.1 & 0.1 \\ 0.1 & 0.3 & 0.1 \end{pmatrix}$$

(rows and columns in the order X, Y, Z).

Determine the changes in total output for the three products when the final
demand for X rises by 2,000 and the final demand for Z falls by 1,600 units
simultaneously.
(12 marks)
(b) The following matrix shows the number of different ‘one-stage’ (i.e. visiting no
cities en route) journeys between cities A, B, C and D.
$$A = \begin{pmatrix} 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{pmatrix}$$

(rows = city of departure, columns = city of arrival, in the order A, B, C, D).

i. Draw a network to illustrate this information.


(3 marks)
ii. Construct matrices to show the number of ‘two-stage’ and ‘three-stage’
journeys (i.e. passing respectively through one and two cities en route)
between cities A, B, C and D.
(5 marks)

3. (Please note that this is only part of a full examination question.)


Matrices M1 , M2 and M3 are transition matrices for Markov chains. For each
matrix the four states are A, B, C and D, respectively. Unfortunately the
information for M3 is partially missing and some probabilities are currently shown
as α, β or γ.
     
$$M_1 = \begin{pmatrix} 0.5 & 0.4 & 0.1 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0.3 & 0 & 0.7 \\ 0 & 0.4 & 0.6 & 0 \end{pmatrix}, \quad M_2 = \begin{pmatrix} 0.3 & 0 & 0.2 & 0.5 \\ 0 & 1 & 0 & 0 \\ 0.5 & 0 & 0.1 & 0.4 \\ 0.2 & 0 & 0.5 & 0.3 \end{pmatrix}, \quad M_3 = \begin{pmatrix} 0.1 & 0.7 & 0.1 & \gamma \\ 0 & \beta & 0 & 0 \\ 0.4 & 0.2 & 0.2 & \alpha \\ 0 & 0 & 0 & 1 \end{pmatrix}.$$

(a) Determine the values for α, β and γ.


(2 marks)


(b) Identify all absorbing states, and determine which (if any) of the Markov
chains are absorbing. [Note: A Markov chain is said to be absorbing if it has at
least one absorbing state and if it is possible to go from every state to an
absorbing state (not necessarily in one step)].
(6 marks)
(c) If, for M1 you are initially in the state D, use matrix methods to determine
how many ways there are of going to states A, B and C in three or less
transitions.
(6 marks)
(d) If you are initially in state D for transition matrix M2 , use matrix
multiplication to determine the probability of being in state D after two
transitions.
(4 marks)

6.14 Guidance on answering the Sample examination questions
1. (a) Transition matrix P is given by (rows = from, columns = to, for the states 0, 1, 2, 3):

$$P = \begin{pmatrix} 1-q & q & 0 & 0 \\ 0 & 0 & q & 1-q \\ 0 & 0 & 0 & 1 \\ 1-q & q & 0 & 0 \end{pmatrix}.$$

(b) Suppose equilibrium probabilities are π0 , π1 , π2 , π3 .


Then (π0 π1 π2 π3) · P = (π0 π1 π2 π3), i.e.

π0 (1 − q) + π3 (1 − q) = π0 (1)
π0 q + π3 q = π1 (2)
π1 q = π2 (3)
π1 (1 − q) + π2 = π3 (4)

substituting from (3) into (4) shows that π1 = π3 and then using this fact with
(2) shows that: π3 = (q/(1 − q))π0 .
We also know that π0 + π1 + π2 + π3 = 1 and hence substituting for everything
in terms of π0 and solving gives π0 = (1 − q)/(1 + q + q 2 ) and hence
π1 = q/(1 + q + q 2 ), π2 = q 2 /(1 + q + q 2 ) and π3 = q/(1 + q + q 2 ).
(c) Expected average cost is
10000q 2
π0 · 0 + π1 · 0 + π2 · 10000 + π3 · 0 = .
1 + q + q2
Hence if q = 0.1, average cost per period is 100/1.11 = $90.09 and, if q = 0.5,
average cost is 2500/1.75 = $1428.57.
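The equilibrium can also be checked numerically by raising the transition matrix to a high power, since each row of Pⁿ converges to the equilibrium distribution. A sketch in Python for q = 0.1 (the helper names are our own):

```python
def mat_mult(X, Y):
    """Multiply two square matrices held as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def equilibrium(P, steps=200):
    """Approximate the equilibrium probabilities as a row of P raised
    to a high power."""
    Q = P
    for _ in range(steps):
        Q = mat_mult(Q, P)
    return Q[0]

q = 0.1
P = [[1 - q, q, 0, 0],
     [0, 0, q, 1 - q],
     [0, 0, 0, 1],
     [1 - q, q, 0, 0]]
pi = equilibrium(P)
# pi agrees with pi_0 = (1-q)/(1+q+q^2), pi_1 = q/(1+q+q^2), etc.
cost = 10000 * pi[2]   # expected cost per period, approx. 90.09 when q = 0.1
```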


2. (a) If output changes are x, y and z then:

$$\begin{pmatrix} x \\ y \\ z \end{pmatrix} = (I - A)^{-1}\begin{pmatrix} 2000 \\ 0 \\ -1600 \end{pmatrix}.$$

Now I − A is

$$\begin{pmatrix} 0.9 & -0.1 & -0.2 \\ -0.1 & 0.9 & -0.1 \\ -0.1 & -0.3 & 0.9 \end{pmatrix}$$

and inverting (by any method) will eventually produce the inverse (I − A)⁻¹ as

$$\begin{pmatrix} 1.168 & 0.225 & 0.284 \\ 0.150 & 1.183 & 0.165 \\ 0.180 & 0.419 & 1.198 \end{pmatrix}.$$

Thus the required changes in total output are:

$$\begin{pmatrix} 1.168 & 0.225 & 0.284 \\ 0.150 & 1.183 & 0.165 \\ 0.180 & 0.419 & 1.198 \end{pmatrix}\begin{pmatrix} 2000 \\ 0 \\ -1600 \end{pmatrix} = \begin{pmatrix} 1882 \\ 36 \\ -1557 \end{pmatrix}$$

i.e. an increase in X of 1,882 units, an increase in Y of 36 units and a decrease
in Z of 1,557 units.
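Rather than inverting (I − A), one can also solve (I − A)x = d directly. A sketch using Gaussian elimination with partial pivoting (the helper name is our own):

```python
def solve(M, d):
    """Solve M x = d by Gaussian elimination with partial pivoting."""
    n = len(M)
    aug = [row[:] + [d[i]] for i, row in enumerate(M)]   # augmented matrix [M | d]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):                        # back-substitution
        x[i] = (aug[i][n] - sum(aug[i][j] * x[j] for j in range(i + 1, n))) / aug[i][i]
    return x

I_minus_A = [[0.9, -0.1, -0.2],
             [-0.1, 0.9, -0.1],
             [-0.1, -0.3, 0.9]]
changes = solve(I_minus_A, [2000, 0, -1600])
# changes is approximately [1880.24, 35.93, -1556.89]
```

Solving exactly gives roughly 1880.2, 35.9 and −1556.9; the small difference from the 1,882 quoted above comes from the inverse having been rounded to three decimal places before multiplying.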
(b) i. [Network diagram omitted. Reading the matrix row by row, the network has arcs A→A, A→B, B→C, C→A, C→D and D→C.]

ii. The ‘two-stage’ matrix is (states ordered A, B, C, D throughout):

$$A^2 = \begin{pmatrix} 1 & 1 & 1 & 0 \\ 1 & 0 & 0 & 1 \\ 1 & 1 & 1 & 0 \\ 1 & 0 & 0 & 1 \end{pmatrix}$$


and the ‘three-stage’ matrix is:

$$A^3 = \begin{pmatrix} 2 & 1 & 1 & 1 \\ 1 & 1 & 1 & 0 \\ 2 & 1 & 1 & 1 \\ 1 & 1 & 1 & 0 \end{pmatrix}.$$

3. (a) Since each row of these matrices must add up to 1 (it is obvious that we are
‘going from row to column’ here rather than vice versa) then α = 0.2, β = 1
and γ = 0.1.
(b) M1 has no absorbing state and hence is not an absorbing Markov chain.
M2 has one absorbing state (B) but one cannot reach it from all (indeed any)
other states. Hence it is not an absorbing Markov chain.
M3 has two absorbing states (B and D) and states A and C can reach both of
them (one would be enough!). Hence M3 is an absorbing Markov chain.
(c) The connectivity matrix, C1 say, is:

$$C_1 = \begin{pmatrix} 1 & 1 & 1 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 1 \\ 0 & 1 & 1 & 0 \end{pmatrix}.$$
For two transitions we have the following number of routes

$$C_2 = C_1^2 = \begin{pmatrix} 2 & 2 & 1 & 1 \\ 1 & 1 & 1 & 0 \\ 1 & 1 & 1 & 0 \\ 1 & 1 & 0 & 1 \end{pmatrix}$$

and for three transitions the number of routes is

$$C_3 = C_1^3 = \begin{pmatrix} 4 & 4 & 3 & 1 \\ 2 & 2 & 1 & 1 \\ 2 & 2 & 1 & 1 \\ 2 & 2 & 2 & 0 \end{pmatrix}$$
and the total number of ways of going between states in three or fewer
transitions is

$$C_1 + C_1^2 + C_1^3 = C_1 + C_2 + C_3 = \begin{pmatrix} 7 & 7 & 5 & 2 \\ 4 & 3 & 2 & 1 \\ 3 & 4 & 2 & 2 \\ 3 & 4 & 3 & 1 \end{pmatrix}.$$

It is the final row elements 3, 4 and 3 which we require.
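These route counts can be reproduced with a few lines of matrix arithmetic (a sketch in pure Python; the helper names are our own):

```python
def mat_mult(X, Y):
    """Multiply two square matrices held as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_add(X, Y):
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]

C1 = [[1, 1, 1, 0],
      [1, 0, 0, 0],
      [0, 1, 0, 1],
      [0, 1, 1, 0]]          # connectivity matrix of M1 (1 wherever p_ij > 0)
C2 = mat_mult(C1, C1)        # routes using exactly two transitions
C3 = mat_mult(C2, C1)        # routes using exactly three transitions
total = mat_add(mat_add(C1, C2), C3)
# total[3] (the row for state D) is [3, 4, 3, 1]
```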
(d) We have

$$\begin{pmatrix} 0 & 0 & 0 & 1 \end{pmatrix} M_2 = \begin{pmatrix} 0.2 & 0 & 0.5 & 0.3 \end{pmatrix}$$

and then

$$\begin{pmatrix} 0.2 & 0 & 0.5 & 0.3 \end{pmatrix} M_2 = \begin{pmatrix} 0.37 & 0 & 0.24 & 0.39 \end{pmatrix}.$$


Hence the required probability of being in state D after two transitions is 0.39.
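The same two-step calculation in code (a sketch; the row vector p is post-multiplied by M2 twice):

```python
def vec_mat(v, M):
    """Post-multiply a row vector by a matrix."""
    return [sum(vi * row[j] for vi, row in zip(v, M)) for j in range(len(M[0]))]

M2 = [[0.3, 0, 0.2, 0.5],
      [0, 1, 0, 0],
      [0.5, 0, 0.1, 0.4],
      [0.2, 0, 0.5, 0.3]]
p = [0, 0, 0, 1]             # start in state D with certainty
for _ in range(2):           # two transitions
    p = vec_mat(p, M2)
# p is (up to rounding) [0.37, 0, 0.24, 0.39]; the last entry is the answer
```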

Chapter 7
Markov chains and stochastic processes

7.1 Aims of the chapter


To introduce the terminology of stochastic processes.

To explain and establish the wide usefulness of the fundamental stochastic process
models of:
i. the ‘simple random walk’
ii. the ‘gambler’s ruin’
iii. the Poisson process
iv. the ‘birth and death’ process.
To establish the main terminology and notation for various queuing models.

7.2 Learning outcomes


By the end of this chapter, and having completed the Essential reading and activities,
you should be able to:

solve random walk problems

construct transition matrices and use them to determine equilibrium state


probabilities using Chapman-Kolmogorov equations

fully derive and use the Poisson distribution from the Poisson process assumptions

fully derive and use the general solution for the gambler’s ruin problem

fully derive and use the general solution for the birth and death model

establish how various types of queues can be solved using stochastic processes.

7.3 Essential reading


Brzezniak, Z. and T. Zastawniak Basic stochastic processes: a course through
exercises. (London: Springer, 1998).


7.4 Introduction
Stochastic processes deal with systems which develop in space or time according to
some probabilistic laws. They attempt to describe (and predict) the behaviour of the
system in some mathematical way. They have wide-ranging applications including:

(a) Risk theory (e.g. a mathematical analysis of the random fluctuations in the capital
of an insurance company).

(b) Models for social and labour mobility. Research in movements between social
groups, occupation groups, etc. in order to correlate such movements with other
factors affecting the composition of society. The aim is to have a verified model
that can be used in predicting the composition of the social groups in the future in
order to match this with forecasted requirements for occupational groups, etc.
Recent studies have developed the ideas of the dynamic model of social structure.

(c) Queueing models.

(d) Models for the diffusion of news, rumours and epidemics.

(e) Models for population growth; birth and death models, etc.

7.5 Some definitions of stochastic processes
Let X(t), Y (t) etc. denote the properties of the system at time t, e.g. the number of
people waiting in a queue at time t; the hours of sunshine on day t, etc.
The ‘state of the system’ at time t1 is the value of X(t1 ).
The ‘state space’ is the total sample space (set of all possible values) of X(t), sometimes
denoted as {X(t)}.
The ‘stochastic process’ comprises the random variables X(t), Y (t),. . . and their
probability laws.
There are several types of possible systems depending upon whether we have discrete or
continuous time, discrete or continuous random variables.
A realisation of the process is X(t) plotted against t.
[Note: For discrete time we usually write Xn (n = 0, 1, 2, . . .) rather than X(t).]

7.6 A simple random walk


This is an example of a stochastic process with discrete time and discrete state space. It
is represented by a particle moving along a line, one unit at a time.
For transition probabilities we let P (step in a positive direction) = p;
P (step in a negative direction) = q = 1 − p.
Suppose Xn is the position of the particle at time n.


Usually the process has added complications such as reflecting barriers or absorbing
barriers, e.g. ‘Brownian movement’, ‘Gambler’s ruin’.
Suppose the particle starts at j, unrestricted by barriers.
Let Zi be the jump at ith step, i.e. Zi = ±1, then P (Zi = 1) = p and P (Zi = −1) = q.
Hence the expected value of Zi , E(Zi ) = p − q.
Z1 , Z2 , . . . , Zn give a sequence of identically distributed independent random variables.
The position at time n is Xn = Xn−1 + Zn = j + Z1 + Z2 + · · · + Zn .
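A short simulation illustrates the drift E(Xn) = j + n(p − q) (a sketch; the numbers are chosen for illustration and the seed is fixed for reproducibility):

```python
import random

def random_walk(start, p, steps, rng):
    """Simulate an unrestricted simple random walk: each step is +1 with
    probability p and -1 with probability q = 1 - p."""
    x = start
    for _ in range(steps):
        x += 1 if rng.random() < p else -1
    return x

rng = random.Random(0)                 # fixed seed for reproducibility
j, p, n = 0, 0.6, 100
mean = sum(random_walk(j, p, n, rng) for _ in range(5000)) / 5000
# E(X_n) = j + n(p - q) = 100 * (0.6 - 0.4) = 20; the sample mean is close to 20
```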

7.6.1 The Gambler’s ruin problem


Suppose player A has capital $j and player B has capital $(a − j). At each play A wins
with probability p and is paid $1 by B while B wins with probability q and is paid $1 by
A. The game continues until one of the players is financially ruined.
The equivalent mathematical problem is a simple random walk with absorbing barriers
at 0 and a, with the particle starting at j.

Consider the ruin of B:


Let θj = P (B is ruined before A | start at j) = P (Reach a before 0 | start at j). We can
derive the following difference equation for θ:

$$\theta_j = p\theta_{j+1} + q\theta_{j-1} \quad \text{for } 0 < j < a$$

with θ0 = 0; θa = 1 as absorbing boundary conditions.


The above difference equation is second order and homogeneous. (See Section 4.8.)
When solving, the auxiliary equation is pm² − m + q = 0, i.e. (pm − q)(m − 1) = 0, i.e.
we have roots m1 = 1 and m2 = q/p.
For q ≠ p, the general solution has the form θj = A + B(q/p)^j and the boundary
conditions imply that:

$$0 = A + B \quad \text{and} \quad 1 = A + (q/p)^a B.$$

So,

$$A = \frac{1}{1 - (q/p)^a} \quad \text{and} \quad B = -A$$


and hence

$$\theta_j = \frac{1 - (q/p)^j}{1 - (q/p)^a}.$$
For q = p the solution has the form θj = (A + Bj).
Using the boundary conditions gives A = 0 and B = 1/a and hence θj = j/a.
In a similar fashion we can show that the probability that A is ruined is:

$$\frac{(q/p)^j - (q/p)^a}{1 - (q/p)^a} \text{ for } q \neq p, \quad \text{or} \quad \frac{a-j}{a} \text{ for } q = p.$$
Note that, in the long run, one of the players is ruined.
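A quick numerical check that the derived formula satisfies the difference equation and its boundary conditions (a sketch; p = 0.55 and a = 10 are arbitrary illustrative values):

```python
def ruin_prob_B(j, a, p):
    """P(B is ruined | A starts with capital j, total capital a), i.e. the
    formula derived above: theta_j = (1 - (q/p)^j) / (1 - (q/p)^a)."""
    q = 1 - p
    if p == q:
        return j / a
    return (1 - (q / p) ** j) / (1 - (q / p) ** a)

# the formula should satisfy theta_j = p*theta_{j+1} + q*theta_{j-1}
p, a = 0.55, 10
theta = [ruin_prob_B(j, a, p) for j in range(a + 1)]
checks = [abs(theta[j] - (p * theta[j + 1] + (1 - p) * theta[j - 1]))
          for j in range(1, a)]
# theta[0] = 0, theta[a] = 1 and every residual in checks is numerically zero
```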

7.6.2 The case of a single absorbing barrier


The solution to this is easily derived from the above case by letting a tend towards
infinity (so that B, in effect, has unlimited capital). Taking this limit in the formula for A's ruin gives:

A's ruin probability → (q/p)^j for q < p (positive drift)

and

A's ruin probability → 1 for q ≥ p (zero or negative drift).

7.7 Markov chains


Assuming a system can be in any of a finite or countably infinite number of states i
(i = 1, 2, 3, . . .) then, if the probability of transition from state i to state j depends on
(i, j) and not on the previous history of the process, the process is a Markov chain.
Thus for a Markov chain P (Xn+1 = j | Xn = i, Xn−1 , Xn−2 , . . .) = P (Xn+1 = j | Xn = i).
If these transition probabilities do not depend on time, then the process is termed
homogeneous.
For a homogeneous Markov chain we let pij = P (Xn+1 = j | Xn = i) for every n and the
matrix P = (pij ).
The n-step transition probability $p_{ij}^{(n)} = P(X_n = j \mid X_0 = i)$ is the probability of
reaching state j in n steps after being in state i initially.
Examples of Markov chains are simple random walks, branching processes, Ehrenfest
diffusion models, etc.
To review the simple random walk with absorbing barriers at 0 and a as a Markov chain
with a + 1 states, the transition probability matrix (rows = from, columns = to, for the
states 0, 1, 2, ..., a) is given by:

$$\begin{pmatrix} 1 & 0 & 0 & 0 & \cdots & 0 & 0 & 0 \\ q & 0 & p & 0 & \cdots & 0 & 0 & 0 \\ 0 & q & 0 & p & \cdots & 0 & 0 & 0 \\ \vdots & & & \ddots & \ddots & & & \vdots \\ 0 & 0 & 0 & 0 & \cdots & q & 0 & p \\ 0 & 0 & 0 & 0 & \cdots & 0 & 0 & 1 \end{pmatrix}$$


7.7.1 The Chapman-Kolmogorov equations


Suppose p(0) is the row vector of the probability distribution of initial states (states at
n = 0) and p(n) is the row vector of probabilities of states at time n, then for a general
homogeneous Markov chain with states at n = 1, 2, . . .:

(a) p(n) = p(0) · P^n

(b) If p(n) tends to a limiting distribution Π, say, then Π satisfies the equation
Π = Π · P (the equilibrium equation). (See Section 6.8.)
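Both equations can be illustrated on a small two-state homogeneous chain (a sketch; the transition probabilities are made up for the example):

```python
def step(p_row, P):
    """One application of p(n+1) = p(n) . P for a row vector of state probabilities."""
    return [sum(pi * row[j] for pi, row in zip(p_row, P)) for j in range(len(P[0]))]

P = [[0.9, 0.1],
     [0.4, 0.6]]
p = [1.0, 0.0]               # p(0): start in state 1 with certainty
for _ in range(100):         # iterating gives p(n) = p(0) . P^n
    p = step(p, P)
# p has converged to the limiting distribution Pi = (0.8, 0.2),
# which satisfies the equilibrium equation Pi = Pi . P
```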

7.8 Markov processes


These are continuous time processes with discrete state variables. Again ‘Markov’
means the process has no memory, i.e. the future only depends on the present, not on
the past. An example of a well known and well used Markov process is:

7.8.1 The Poisson process


Suppose P(one event in (t, t + δt)) = λδt + o(δt), where o(δt) means terms that are
negligibly small when divided by δt, and denote by pn(t) the probability of n events in t
units of time. Then we have the following:

pn (t + δt) = (1 − λδt)pn (t) + λδtpn−1 (t) + o(δt)

i.e. rearranging
pn (t + δt) − pn (t) o(δt)
= −λpn (t) + λpn−1 (t) + .
δt δt
Letting δt → 0 gives

dpn (t)
= −λpn (t) + λpn−1 (t) for n ≥ 1
dt

while
dp0 (t)
= −λp0 (t).
dt
Solving these equations gives
(λt)n −λt
pn (t) = e
n!
i.e. the Poisson distribution with parameter λt.
λ is the rate of the process and the probability of waiting more than T units of time is
given by $e^{-\lambda T}$ (the negative exponential distribution).
Birth processes, and birth and death processes, can be solved in a similar fashion to the
above (see Sample examination question 2 at the end of the chapter).
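The derivation can be checked numerically by integrating the differential equations directly and comparing with the closed form (a sketch using a simple Euler scheme; λ and t are illustrative values):

```python
import math

def poisson_via_ode(lam, t, n_max, steps=40000):
    """Euler-integrate dp_n/dt = -lam*p_n + lam*p_{n-1} (with
    dp_0/dt = -lam*p_0), starting from p_0(0) = 1."""
    dt = t / steps
    p = [1.0] + [0.0] * n_max          # at t = 0: no events yet
    for _ in range(steps):
        new = p[:]
        new[0] += dt * (-lam * p[0])
        for n in range(1, n_max + 1):
            new[n] += dt * (-lam * p[n] + lam * p[n - 1])
        p = new
    return p

lam, t = 2.0, 1.5
approx = poisson_via_ode(lam, t, 8)
exact = [(lam * t) ** n * math.exp(-lam * t) / math.factorial(n) for n in range(9)]
# approx[n] agrees with the Poisson probabilities to several decimal places
```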


7.9 Queueing theory


The general situation here is one where a ‘customer’ arrives at a service channel and
requires ‘service’. If the ‘server’ is free, service is given immediately; otherwise a queue
is formed. In order to solve the queueing problem we need to specify:

the input process, i.e. the laws governing arrivals

the queue discipline, i.e. the rules of queuing and the way in which a customer is
selected from the queue, e.g. FIFO (First in First Out) or LIFO (Last in First Out)

the service mechanism (i.e. the laws governing the service time e.g. a normal
distribution, a negative exponential distribution, etc.).

What one hopes to get out of the analysis includes:

the mean and distribution of queue length

the mean and distribution of queueing time

the mean and distribution of waiting time

the mean and distribution of the server's busy times (i.e. length of time of continuous
working).


The methods used to solve queuing problems include direct experimentation, analysis
and simulation.
A queue is often specified as, for example, M/G/1, where the parameters tell you the
arrival distribution/service time distribution/number of servers. Thus M/G/1 stands for
random arrivals, general service time, one server. Only M/M/s queues are Markov
processes.

7.10 Summary
What you need to know

The basic terminology of stochastic processes.

The wide applicability of stochastic processes.

How to solve random walk type problems with or without one or two absorbing
barriers.

How to construct transition matrices and solve them using Chapman-Kolmogorov
equations.

How to derive the Poisson distribution.

How to solve simple birth and death processes.


What you do not need to know

The more complicated systems which require probability generating functions,
z-transforms, Bessel functions, etc. to solve.

Renewal theory.

7.11 A reminder of your learning outcomes


By the end of this chapter, and having completed the Essential reading and activities,
you should be able to:

solve random walk problems


construct transition matrices and use them to determine equilibrium state
probabilities using Chapman-Kolmogorov equations
fully derive and use the Poisson distribution from the Poisson process assumptions
fully derive and use the general solution for the gambler’s ruin problem
fully derive and use the general solution for the birth and death model
establish how various types of queues can be solved using stochastic processes.
7.12 Sample examination questions
1. Consider the following random walk with two reflecting barriers. Suppose a particle
is initially in the state j and that the states 0 and a (a > 0) are reflecting barriers.
Let Xn be the position of the particle immediately after the nth jump. The particle
remains forever among the states 0, 1, 2, . . . , a. Upon reaching one of the barrier
states 0 or a the particle remains there until a jump of the appropriate direction
returns it to the neighbouring interior state i.e. to either state 1 or state a − 1.
Suppose finally that the particle, when lying between the barriers, jumps up one
state with probability p, jumps down one state with probability q or otherwise
stays where it is.
(a) Write out a transition matrix showing how Xn depends upon Xn−1 .
(4 marks)
(b) Why might you describe this situation as a random walk with reflecting
barriers?
(2 marks)
(c) Assuming the limiting equilibrium distribution of the state occupation
probabilities is given by π0, π1, π2, ..., πa show that for p ≠ q,

$$\pi_k = \left(\frac{1 - p/q}{1 - (p/q)^{a+1}}\right)\left(\frac{p}{q}\right)^k \quad \text{for } k = 1, 2, \ldots, a.$$

(12 marks)


(d) What does the equilibrium distribution become when p = q?


(2 marks)
2. Consider the problem of a queue of hotel guests waiting to check into a smallish
hotel with only one receptionist to serve them. Suppose the guests join the queue
according to a Poisson distribution with an average arrival rate of λ per minute
and the time for the receptionist to check in a guest follows a negative exponential
distribution with a rate of µ per minute. Assume further that steady state conditions apply and that
the receptionist can only check in one person at a time.
(a) Show that:
i. The probability, Pn, that there are exactly n people in the queuing system
is given by

$$P_n = (1 - \rho)\rho^n \quad \text{for } n = 0, 1, 2, 3, \ldots, \text{ where } \rho = \frac{\lambda}{\mu}.$$

(10 marks)
ii. The expected line length, L, (including anyone that the receptionist is
currently checking in) is:

$$L = \frac{\lambda}{\mu - \lambda}.$$
(3 marks)
(b) Evaluate the above functions Pn for n = 0, 1, 2, 3 and L if the mean
inter-arrival time for new guests is 15 minutes and the average time required
for the receptionist to check in a guest once they have reached the front of the
queue is 10 minutes.
(4 marks)
(c) Discuss whether the above model in (a) (with different λ and µ) would be
suitable to describe the situation of guests waiting for taxis to take them to
the airport.
(3 marks)

7.13 Guidance on answering the Sample examination


questions
1. (a) The transition matrix, T say, is given by (rows = from, columns = to, for the
states 0, 1, 2, ..., a):

$$T = \begin{pmatrix} 1-p & p & 0 & 0 & \cdots & 0 \\ q & r & p & 0 & \cdots & 0 \\ 0 & q & r & p & \cdots & 0 \\ \vdots & & \ddots & \ddots & \ddots & \vdots \\ 0 & \cdots & 0 & q & r & p \\ 0 & \cdots & 0 & 0 & q & 1-q \end{pmatrix}$$


where r = 1 − p − q.
(b) ‘Random walk’ since, other than at barriers, movement is independent of
previous movements.
‘Reflecting barriers’ because the particle cannot pass through them, nor is it
absorbed, but the barrier allows the particle to be reflected back into the 1 to
a − 1 states.
(c) Assume the equilibrium probability of state i is πi, i = 0, 1, 2, ..., a. Then

(π0 π1 π2 π3 · · πa) = (π0 π1 π2 π3 · · πa) · T

which gives:

π0 = (1 − p)π0 + qπ1
π1 = pπ0 + rπ1 + qπ2
π2 = pπ1 + rπ2 + qπ3
...
πa−1 = pπa−2 + rπa−1 + qπa
πa = pπa−1 + (1 − q)πa.

Working through these equations in turn, the first equation gives π1 = (p/q)π0.
When substituted into the next it gives π2 = (p/q)²π0 and so on. In general,

$$\pi_k = \left(\frac{p}{q}\right)^k \pi_0.$$
Now,

$$\pi_0 + \pi_1 + \cdots + \pi_a = 1$$

$$\Rightarrow \quad \pi_0\left(1 + \frac{p}{q} + \left(\frac{p}{q}\right)^2 + \cdots + \left(\frac{p}{q}\right)^a\right) = 1$$

$$\Rightarrow \quad \pi_0\left(\frac{1 - (p/q)^{a+1}}{1 - p/q}\right) = 1$$

$$\Rightarrow \quad \pi_0 = \frac{1 - p/q}{1 - (p/q)^{a+1}}$$

and hence, as required,

$$\pi_k = \frac{1 - p/q}{1 - (p/q)^{a+1}}\left(\frac{p}{q}\right)^k.$$

(d) When p = q then π0 = π1 = π2 = · · · = πa , and hence πi = 1/(a + 1).
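A direct numerical check that the stated πk satisfy the balance equations Π = Π · T (a sketch; p, q and a are arbitrary illustrative values with p + q < 1):

```python
def reflecting_walk_equilibrium(p, q, a):
    """Equilibrium probabilities pi_k = (p/q)^k * pi_0 for the random walk with
    reflecting barriers at 0 and a (p != q), as derived above."""
    r = p / q
    pi0 = (1 - r) / (1 - r ** (a + 1))
    return [pi0 * r ** k for k in range(a + 1)]

p, q, a = 0.3, 0.2, 5        # so the 'stay put' probability is r = 1 - p - q = 0.5
pi = reflecting_walk_equilibrium(p, q, a)
hold = 1 - p - q
# verify each balance equation pi = pi . T in turn
resid = [abs(pi[0] - ((1 - p) * pi[0] + q * pi[1]))]
for k in range(1, a):
    resid.append(abs(pi[k] - (p * pi[k - 1] + hold * pi[k] + q * pi[k + 1])))
resid.append(abs(pi[a] - (p * pi[a - 1] + (1 - q) * pi[a])))
# sum(pi) = 1 and every residual is numerically zero
```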

2. (a) i. We have

pn (t + ∆t) = pn (t)[1 − λ∆t − µ∆t] + pn−1 (t)λ∆t + pn+1 (t)µ∆t + o(∆t)

or, for n = 0,

p0 (t + ∆t) = p0 (t)[1 − λ∆t] + p1 (t)µ∆t + o(∆t).


So rearranging:

$$\frac{p_n(t + \Delta t) - p_n(t)}{\Delta t} = -(\lambda + \mu)p_n(t) + \lambda p_{n-1}(t) + \mu p_{n+1}(t)$$

$$\frac{p_0(t + \Delta t) - p_0(t)}{\Delta t} = -\lambda p_0(t) + \mu p_1(t).$$

Letting ∆t → 0 turns the left-hand sides into derivatives, and in steady state these
derivatives are zero. Hence (dropping the parameter t) we get:

$$p_1 = \frac{\lambda}{\mu} p_0 = \rho p_0$$

and, in general,

$$p_n = \rho^n p_0.$$
Then, since

$$\sum_{i=0}^{\infty} p_i = 1 \;\Rightarrow\; p_0(1 + \rho + \rho^2 + \cdots) = 1 \;\Rightarrow\; p_0 = \frac{1}{1/(1-\rho)} = 1 - \rho$$

so long as ρ < 1.
Hence $p_n = \rho^n(1 - \rho)$ as required.
ii. The expected queue line length is

$$L = \sum_{n=0}^{\infty} nP_n = \sum_n n(1-\rho)\rho^n = (1-\rho)\rho \sum_n \frac{d}{d\rho}(\rho^n)$$

$$= (1-\rho)\rho \, \frac{d}{d\rho}\left(\sum_n \rho^n\right) = (1-\rho)\rho \, \frac{d}{d\rho}\left(\frac{1}{1-\rho}\right)$$

$$= (1-\rho)\rho \, \frac{1}{(1-\rho)^2} = \frac{\rho}{1-\rho} = \frac{\lambda/\mu}{1 - \lambda/\mu} = \frac{\lambda}{\mu - \lambda}.$$

(b) If the mean inter-arrival time for guests is 15 minutes then λ = 1/15 and,
similarly, µ = 1/10. Hence ρ = λ/µ = 2/3. Substituting into the formulae
above we obtain p0 = 1/3 = 0.333, p1 = 2/9 = 0.222, p2 = 4/27 = 0.148 and
p3 = 8/81 = 0.099, while L = ρ/(1 − ρ) = 2 guests.
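These numbers are easy to reproduce (a sketch in Python):

```python
lam, mu = 1 / 15, 1 / 10       # arrival and service rates per minute
rho = lam / mu                 # traffic intensity, here 2/3

P = [(1 - rho) * rho ** n for n in range(4)]   # P_0 .. P_3
L = lam / (mu - lam)                           # expected number in the system
# P is approximately [0.333, 0.222, 0.148, 0.099] and L = 2 guests
```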
(c) The service rate would probably need to be remodelled since there may or may
not be a server (taxi) to serve the queue. There may also be a build-up of taxis
awaiting passengers.

Chapter 8
Stochastic modelling, multivariate models

8.1 Aims of the chapter


To establish the need and use of multivariate rather than univariate models for
certain problems.

To introduce the typical multivariate data used in multivariate models – such data
will also be widely used in Chapters 10 and 11.

To demonstrate the use of multivariate models by a detailed analysis of the


fundamental forms of factor analysis and discriminant analysis.

8.2 Learning outcomes


By the end of this chapter, and having completed the Essential reading and activities,
you should be able to interpret and use the output from a:
factor analysis program

discriminant analysis program.

8.3 Essential reading


Johnson, R.A. and D.W. Wichern Applied multivariate statistical analysis. (Upper
Saddle River, NJ: Pearson Prentice Hall, 2007) Chapters 9 and 11 especially.

8.4 Introduction
In any business application, large amounts of data are encountered. Examples include
the response to market research questionnaires, the share prices quoted in the Financial
Times or Wall Street Journal, and the map references of customer locations. In order to
make more effective decisions, managers need to understand the information contained
in the data. Most data sets can be envisaged in the form of a table, where the columns
are variables and the rows are observations. For example, the raw data from a market
research survey could be presented in tabular form as follows.


Question 1 ······ Question p


Respondent 1 X11 ······ X1p
······ ······ ······ ······
Respondent n Xn1 ······ Xnp

This represents the answers of n respondents to p questions and the answer of the ith
respondent to the jth question is Xij .
If a financial fund manager wanted to examine the performance of the companies within
their fund over a period of time, they could tabulate the necessary data in the following
way.

Company 1 ······ Company p


Time period 1 X11 ······ X1p
······ ······ ······ ······
Time period n Xn1 ······ Xnp

In this case, Xij represents the share price of the jth company at the end of the ith
time period.
The map references of the customers could be tabulated as follows.

Variable 1 ······ Variable p


Customer 1 X11 ······ X1p
······ ······ ······ ······
Customer n Xn1 ······ Xnp

Here, variable 1 would be longitude, variable 2 would be latitude and the remaining
variables would describe details about the customer's site such as size of storage facility.
In each of these examples, each observation (respondent, time period or customer) is
multivariate; that is, more than one variable (question, company or map reference) is
needed to describe it. The behaviour of these multivariate observations is typically
random or stochastic (the words are synonymous). For example, if we knew how one
respondent had answered 29 out of 30 questions, we would not be able to guarantee an
accurate prediction of the remaining answer. The knowledge of the other answers would
probably improve the prediction that we would otherwise have made, but there will be
a random component in the response that will prevent perfect prediction.
If we think of a univariate (i.e. single) random variable, its random behaviour is
characterised by a probability function, such as the Normal distribution (see Figure
8.1). In an exactly similar way, the behaviour of a multivariate random variable (such as
a set of answers to a market research questionnaire) is characterised by a multivariate
probability function. A bivariate distribution such as a bivariate Normal can be drawn
(see Figure 8.2). Here the two variables are called x1 and x2 , the probability function is
denoted by f (x1 , x2 ). Although the probability function cannot be drawn for more than
two variables the principle remains the same.
There are many statistical techniques designed to analyse multivariate data. Within this
subject we will only be able to look at a small selection, which has been chosen on the
grounds of greater managerial relevance.


Figure 8.1: A (Normal) probability function for a univariate random variable.

In order to make more informed decisions and to answer ‘what if’ questions, managers
may want to estimate a particular variable (such as sales per month). This is a
regression type problem as one variable (the dependent variable) is explained in terms
of the other variables (the independent variables).
Chapters 9 and 10 on forecasting and econometrics are concerned with regression type
problems, although formal regression estimation may not be used. The additional
structure imposed on the data in these chapters is that the data are time series, i.e. the
observations are taken sequentially over time.
In other situations, often with survey data, the concern is with the number of variables
used to describe the phenomenon observed. For example, a questionnaire investigating
shoppers’ preference for washing-up liquid may have 30 questions, but, intuitively, you
might suggest that there are far fewer underlying concepts describing this preference.
For example these concepts may be:

cheapness/dearness

effectiveness at cleaning

kindness to hands

attractiveness (colour/smell/packaging).
The problem here is one of data reduction: reducing the number of variables observed to
a set of more basic concepts or factors. Often the variables measured fall into groups of
similarly behaving variables. The methods described in Chapter 11 on exploratory data
analysis are designed to gain familiarity with the data. The methods discussed are
designed to detect inter-relationships between variables, to see how similarly pairs of


Figure 8.2: A bivariate Normal probability function.

variables behave. Emphasis is placed on graphical methods, especially in the initial
stages, as they offer a quick and easy way of looking for structure in the data (and as a
means of detecting rogue values due to mis-typing). A method that can be used to
detect groups of similarly behaving variables is cluster analysis (also covered in Chapter
11). Cluster analysis can also be used to group similar observations together.
Other important multivariate methods address specific problems.
Factor analysis constructs a reduced number, say p, of factors to replace the m observed
variables, where p is less than m. The emphasis in factor analysis is on the construction
of easily interpreted factors. This technique is more powerful than the use of cluster
analysis to group variables in many ways; one is that factor analysis allows the factors
to be quantified.
Classification methods such as discriminant analysis and logistic regression are designed
to put the multivariate observations into categories. These are regression type problems
in that the dependent variable is the category the observation belongs to. For example,
banks may collect data about customers to whom they have lent money (such as salary,
family size, time spent at current address, number of credit cards held, etc.); the
relevant categories of interest will be whether the borrower repaid the loan or not. In
this type of application, a database of existing borrowers would be used to calibrate the
model. This model would then be used to predict whether new potential borrowers were
good or poor risks.


8.5 Principal component factor analysis


When a questionnaire is constructed or when a set of data are collected, it is usually the
case that far more questions are asked, or far more variables measured, than one feels is
really necessary.
For example, a market researcher may be sent out to ask shoppers 30 questions about
their reasons for buying a particular washing-up liquid. This does not mean that the
decision process that a shopper uses to choose their washing-up liquid has 30 different
dimensions. There may only be two or three real considerations such as:

value for money

effectiveness

kindness to hands.
The questionnaire will be designed to shed light, from several different angles, on the
shoppers’ perceptions of these considerations.
The number of really different considerations can be called the dimensionality of the
data. A common objective is to identify the dimensionality of a particular set of data.
An analysis of the data collected ought to indicate how many dimensions adequately
describe the shopper’s decision process.
A common problem in the collection of multivariate data is that the variables measured
are correlated with each other. This problem is often encountered in regression analysis,
when there is multi-collinearity in the data (see Chapter 10).
If the data are plotted for two variables, the scatter diagram shows a cluster of points
falling into an elliptical envelope; in more dimensions this shape is called an ellipsoid.
The shape of the ellipsoid gives information about the strength of the relationship
between the variables. The more the ellipsoid departs from a sphere, the stronger the
relationship. The information contained in the shape of the ellipsoid can be described in
terms of the direction and magnitude of the axes of the figure. The attractions of these
axes are that they:

are at right angles to each other (orthogonality)

indicate the relative importance of axes; the longest axis has the largest variability
along its length.
Principal components analysis identifies these axes (principal components). The
analysis uses the correlation matrix between the variables as its input. Using eigen value
analysis (see Section 6.9) to decompose the correlation matrix, the associated eigen
vectors become the principal components. They are produced so that they have the
properties of orthogonality mentioned. In addition, they are produced in order of
variation explained. Each component is composed of a linear combination of the original
variables. In summary, the objective of principal components analysis is to transform
the variables measured to a set of uncorrelated linear combinations of these variables.
The data reduction we often wish to undertake might be thought of as being two steps.
The first step (principal components analysis) changes the m correlated variables to m
uncorrelated variables. The second step (factor analysis) aims to reduce the m
uncorrelated components into p (where p < m) uncorrelated factors.
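The first of these steps can be sketched with numpy. The correlation matrix below is invented for illustration (it is not the car data of Example 8.1); the eigen values of the correlation matrix give the components in order of variation explained:

```python
import numpy as np

# Illustrative correlation matrix for three variables (made-up values).
R = np.array([[1.00, 0.85, 0.80],
              [0.85, 1.00, 0.75],
              [0.80, 0.75, 1.00]])

# Eigen decomposition of the correlation matrix (eigh: R is symmetric).
eigenvalues, eigenvectors = np.linalg.eigh(R)

# Re-order the components in decreasing order of variation explained.
order = np.argsort(eigenvalues)[::-1]
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

# Proportion of variation explained by each component:
# the eigen value divided by the number of variables.
proportion = eigenvalues / R.shape[0]
```

The columns of `eigenvectors` are the principal components; by construction they are orthogonal, and the proportions sum to one.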

Example 8.1 Principal component analysis of some car data


These data are a selection of cars priced from £6,000 to £20,000.
If we consider the variables that might be used to describe the performance of a car,
such as engine size, power output, top speed, we would expect all these variables to
be correlated with each other. The laws of physics and thermodynamics provide the
linkage.
Figures 8.3 and 8.4 show different graphical presentations (multiple scatter diagram
and three-dimensional (3D) plot – see Chapter 11) of the data.

Figure 8.3: A multiple scatter diagram for brake horsepower, top speed and engine size
of UK cars.

Figure 8.4: 3D ellipsoid plot of UK car data.


The principal components analysis output from SPSS (Statistical Package for the Social
Sciences) is summarised below.
Correlation matrix:

Factor matrix:
Note: In order to describe the factors as functions of the variables, the correlation
matrix (at the top of the above tableau) output by SPSS must be inverted.

Final statistics:

The first principal component (factor 1), for example, is composed of a weighted sum of
the three variables:
Factor(1) = 0.444 bhp’ + 0.391 Max speed’ + 0.328 Engine size’
The variables are standardised, e.g.

bhp’ = (observed bhp − mean bhp) / (standard deviation of bhp).
This component explains 72.9 per cent of the variation in the data. The total variation
in the data is the sum of the eigen values of the correlation matrix. The fraction
represented by a particular component is obtained by dividing the associated eigen
value by the number of variables, three in this case.
Note the dramatic decrease in variation explained by these components, the third and
last only explaining an additional 3.1 per cent.
The principal components can be superimposed on the 3D plot. With an effort of
imagination it can be seen that the components are genuinely orthogonal.

Figure 8.5: Principal components plot.


It is noticed with many data sets that the last few principal components explain very
little variation and can be discarded without much penalty, and with the advantage of
explaining the data set with fewer components. This is the data reduction discussed
earlier. One could argue that the car data discussed could be described by just one
component, or perhaps by two. Thus the dimensionality of this subset of the car
variables is either one or two, certainly not three.

Underlying results

If S is a p × p covariance matrix, the eigen values are the solutions to

Det(S − λI) = 0.

This equation will have p roots λ1 , . . . , λp in order of decreasing size. The eigen vector
v1 is found by solving
Sv1 = λ1 v1 .
It can also be shown that the matrix can be decomposed in the following way:

S = λ1 v1 v1′ + · · · + λp vp vp′ .
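These results can be checked numerically. A small sketch, assuming a made-up 2 × 2 covariance matrix S:

```python
import numpy as np

# An assumed 2 x 2 covariance matrix, for illustration only.
S = np.array([[4.0, 1.2],
              [1.2, 2.0]])

# The roots of Det(S - lambda*I) = 0 and the associated eigen vectors.
lam, V = np.linalg.eigh(S)
lam, V = lam[::-1], V[:, ::-1]            # decreasing order of size

# Each eigen vector satisfies S v_1 = lambda_1 v_1 ...
residual = S @ V[:, 0] - lam[0] * V[:, 0]

# ... and the matrix decomposes as S = lambda_1 v_1 v_1' + lambda_2 v_2 v_2'.
S_rebuilt = sum(lam[i] * np.outer(V[:, i], V[:, i]) for i in range(2))
```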


Example 8.2 German companies share performance


A subset of German companies will be used here for principal components. The
mean and standard deviation of daily returns are given below.

Correlation matrix is:

Determinant of correlation matrix = 0.0069382.


Principal components analysis gives:

The correlations are all quite high, showing that there is a high degree of similarity
between the performance of the companies.
If the first two principal components are concentrated upon (these explain 85 per
cent of the total variation), the loadings on the components are as follows:


The first principal component is an average over all the companies with no
particular emphasis. The second principal component puts negative weights on the
chemical companies and positive weights on the car manufacturing companies.

8.6 Factor analysis


The development of this multivariate statistical procedure follows on from principal
component analysis. The convenience of simply using the important components and
discarding the less significant ones is appealing. In effect, factor analysis formalises this
approach and adds extra power to it, making interpretation easier.
The motivation for its development came from behavioural scientists anxious to
measure concepts such as intelligence and physical fitness. These concepts cannot be
measured directly; information can be gained by giving individuals various tests, the
results of which are dependent on these concepts. The results of intelligence tests will be
strongly correlated with each other, as will the results of physical fitness tests, but there
will not be a strong correlation between the results of the two types of tests. One of the
objectives of factor analysis is to group together highly correlated variables that give
information about the same concept.
Put another way, the objective is to represent a number of observable variables as a
smaller number of unobservable concepts (constructs or factors).
In the last example, the use of two components implies a hypothesis that the
performance of the German companies is a function of the market as a whole plus an
adjustment for the particular sector.
8.6.1 The factor model
The hypothesis underlying factor analysis is that the observed values of the
variables are linear combinations of the unobservable factors plus random variation
specific to the individual variable.

Variable(Xi′) = Loadingi1 Factor(1) + Loadingi2 Factor(2) + · · · + Loadingim Factor(m) + specific variation(i).

Xi′ is the variable Xi minus its mean. The factors are formally called common factors,
the coefficients are called loadings.
The factor model explains p (say) variables in terms of m factors, where p > m. This is
the data reduction: from p observable variables to m unobservable factors plus p
unobservable error terms.
In order to achieve this, various assumptions are imposed on the model:

The average value of each factor is zero: E(Factor) = 0.

The factors have variance of 1 and are uncorrelated with each other.

The average value of each error term is zero: E(specific variation) = 0.


The error terms are uncorrelated with each other.

The error terms and the factors are uncorrelated with each other.
These assumptions allow a non-unique solution to be found for the values of the factor
loadings. We can also express the variance of the variable X as follows:

Variance[Xi′] = (Loadingi1)² + (Loadingi2)² + · · · + (Loadingim)² + Variance[specific variation(i)].

The term (Loadingi1)² + (Loadingi2)² + · · · + (Loadingim)² represents the variation of
variable Xi explained by the common factors; this is called the communality of the
variable.
Variance[specific variation(i)] is called the specific variance of the variable.
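The variance split can be illustrated with made-up loadings for a single standardised variable (variance 1):

```python
# Hypothetical loadings of one standardised variable on two common factors.
loadings = [0.8, 0.4]

# Communality: the variation explained by the common factors.
communality = sum(l ** 2 for l in loadings)     # 0.64 + 0.16 = 0.80

# For a standardised variable (variance 1), the specific variance is the remainder.
specific_variance = 1.0 - communality           # 0.20
```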

8.6.2 Estimation of factor loadings


There are several methods offered by most packages. The most important are:

maximum likelihood estimation

principal components.
The most straightforward method is to take principal components and discard those
components explaining little variation. Let us look further at the German companies.
The following is an edited and annotated output from SPSS.

If the factor model is a good reflection of the structure of the data, then one would
expect the correlation matrix reconstructed from the factors to closely resemble the
original correlation matrix based on the original variables. The following is a
comparison between the two matrices provided by SPSS.
Reproduced correlation matrix:


The lower left triangle contains the reproduced correlation matrix; the diagonal,
reproduced communalities; and the upper right triangle residuals between the observed
correlations and the reproduced correlations.
For example the:

observed correlation between Daimler and BMW was 0.7861


value calculated from the factor model is 0.8526
error in estimating the correlation is (0.7861 − 0.8526) = −0.0665.
Although there are 4 (26.7 per cent) residuals (above diagonal) with absolute values >
0.05, overall the fit is good with no large errors.
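The reproduced correlations are computed from the loading matrix: if L holds the loadings, the factor model reproduces the off-diagonal correlations as LL′. A sketch with invented loadings and an invented observed correlation matrix (not the SPSS output above):

```python
import numpy as np

# Invented loadings: three variables on two factors.
L = np.array([[0.9,  0.2],
              [0.8,  0.3],
              [0.7, -0.5]])

# Reproduced correlations between distinct variables: L L'.
reproduced = L @ L.T

# Residuals against an assumed observed correlation matrix.
observed = np.array([[1.00, 0.80, 0.50],
                     [0.80, 1.00, 0.45],
                     [0.50, 0.45, 1.00]])
residuals = observed - reproduced
# Small off-diagonal residuals suggest the two-factor model fits these correlations well.
```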
Comparing the actual and factor-based correlation matrices is one way of judging
whether the number of factors is adequate. Another test offered by SPSS is the
Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy (based on partial correlations).
inter-correlation there is between variables, the more profitable factor analysis will be.
The KMO is classified as below.

KMO > 0.9 is marvellous


KMO > 0.8 is meritorious
KMO > 0.7 is middling
KMO > 0.6 is mediocre
KMO > 0.5 is miserable
KMO < 0.5 is unacceptable

Individual sampling adequacy for variables is shown in the anti-image correlation
matrix (negative partial correlations). Variables with low sampling adequacy may be
candidates for removal from the analysis.

8.6.3 Factor rotation


The factors chosen satisfy the requirements of principal components: the first component
explains the most variation, the second component explains the next most, and so on.
In order to understand the data, the analyst needs to examine the factor matrix so that
he can put intuitive labels on the factor by looking at the weights on each variable. In
the German companies example, the two factors were characterised as a market index
and a sector component. This was straightforward as there are only six variables. In
many cases, where there are more variables, the initial factors are more difficult to
interpret.
This is the point where factor analysis offers extra value, over principal components.
The crucial fact is that once the number of factors has been chosen, the actual
definition of the factors is not unique, and the factors can be rotated without any loss of
explanatory power.
There are several criteria for rotation. They all take into account that, for ease of
interpretation of factors, it is easier if variables have either large weights (when their
presence helps name the factor) or small weights (when they can be ignored).


There are several criteria for rotation:

Varimax: (the most common) minimises the number of variables with a high
weighting on each factor.

Quartimax: minimises the number of factors needed to explain a variable. The


resulting factors often include a ‘general’ factor with most variables represented.

Equamax: a combination of the above.


If varimax rotation is applied to the German company data, the following results are
obtained.
Rotated factor matrix:

Factor transformation matrix:

In effect, the Varimax algorithm rotates the axes until its objective function is
maximised. The objective function is the variance of the squared loadings on the
rotated factors. This leads to high positive or negative loadings for some variables and
negligible loadings on others on each factor. The concentration on some variables and
discounting of others facilitates the labelling of the factors.
In the example, the angle of rotation that achieves this maximum is −44.7◦. The factor
transformation matrix gives this information: the first entry is the cosine of the
angle of rotation, the third is the sine of the angle of rotation, i.e.
cos(−44.7◦ ) = 0.71034 and sin(−44.7◦ ) = −0.70386.
After rotation, the loading of the chemical companies is maximised on factor 1, the
loading of the car companies is maximised on factor 2.
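The rotation itself is just multiplication of the loading matrix by an orthogonal transformation matrix. A sketch using the angle reported above; the loadings here are invented, and only the transformation matrix reflects the text's −44.7°:

```python
import math

theta = math.radians(-44.7)
c, s = math.cos(theta), math.sin(theta)

# Factor transformation matrix: first entry cos(theta), third entry sin(theta).
T = [[c, -s],
     [s,  c]]

# Invented unrotated loadings for two variables on two factors.
L = [[0.70, 0.65],
     [0.72, -0.60]]

# Rotated loadings: the matrix product L T.
rotated = [[sum(L[i][k] * T[k][j] for k in range(2)) for j in range(2)]
           for i in range(2)]

# Rotation is orthogonal, so each variable's communality (explanatory
# power) is unchanged.
communality_before = [row[0] ** 2 + row[1] ** 2 for row in L]
communality_after = [row[0] ** 2 + row[1] ** 2 for row in rotated]
```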

8.6.4 Factor scores


In order to achieve the benefits of data reduction, we may want to look at the values of
factors for the original observations. In this example, these observations would be factor
values on a weekly basis. The scores tabulated are used as weights for the summation of
the standardised returns for the five companies.


Factor score coefficient matrix:

The scores computed could be used as an index for weekly returns in the chemical and
car manufacturing sectors.
The important distinction to remember is that the:

factor loadings are the coefficients on the factors which are summed to give the
variables

factor scores are the coefficients on the variable values which are summed to give
the ‘observed’ factor.
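The distinction can be made concrete. Under the regression method of obtaining score coefficients (one common choice; an assumption here, since the SPSS matrix itself is not reproduced above), the coefficients are W = R⁻¹L and the factor scores are the weighted sums ZW of the standardised observations:

```python
import numpy as np

# Invented correlation matrix R and loadings L: three variables, one factor.
R = np.array([[1.0, 0.6, 0.5],
              [0.6, 1.0, 0.4],
              [0.5, 0.4, 1.0]])
L = np.array([[0.8], [0.7], [0.6]])

# Factor score coefficients (regression method): W = R^{-1} L.
W = np.linalg.solve(R, L)

# Standardised observations, rows = cases (invented values).
Z = np.array([[ 1.0,  0.5, 0.2],
              [-0.8, -0.3, 0.1]])

# Factor scores: the score coefficients applied to the variable values and summed.
scores = Z @ W
```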

8.6.5 Other estimation methods


Maximum likelihood estimation of factors is the main alternative to principal
components. If it is assumed that the variables are multivariate normal random
variables, then maximum likelihood estimation can be used. Using this approach, the
factor loadings are chosen to maximise the likelihood (a product of probabilities) of the
observed data occurring, given the assumption of multivariate normality. Although a
more complex procedure than principal components, many analysts prefer it. The ideal
situation is when there is little difference between the two estimation methods.

8.6.6 A summary of strategy for factor analysis


There are a number of choices to be made in factor analysis which means that there is
no single set of results to be expected from one set of data. Results that give an
intuitively satisfactory interpretation of the factors are the most useful. The following
steps are recommended:

1. Principal component factor analysis: a useful first pass through the data. Perform a
varimax rotation. Calculate factor scores for the observations, plot them looking for
outlying values (this might indicate observations which exert an undue influence on
the covariance matrix and might be better removed).

2. Repeat with maximum likelihood estimation.

3. Compare different loadings and factor scores.

4. Repeat steps 1–3 for different numbers of factors. Does increasing the number of
factors add to ease of interpretability?

5. If the data set is large, split the data in half and carry out the factor analyses on
each half. This allows one to check the stability of the results.

8.7 Discriminant analysis


Probably the most visible use of a predictive classification technique is the credit
scoring approach (sometimes) used by banks and finance companies. A series of answers
to a questionnaire is used to decide the applicant’s eligibility for a loan.
The population considered can be divided into two or more groups, such as good or bad
risks. We have sample information available on a set of variables, such as questionnaire
answers, for each group in the population. Using this information, a decision rule (or set
of rules) is constructed to predict group membership. The initial data are used to derive
the rule. This rule is then used with new observations to predict their membership.
A selection of applications of discriminant analysis and classification methods is given
below:

The use of accounting variables for building discriminant functions has received much
attention in finance. Some more detailed examples will be discussed later.
Market researchers use questionnaires to identify ‘innovators’, i.e. the sort of person
they should target when promoting a new product.
Candidates for training courses often sit tests as they progress. The scores on these
tests can be used to discriminate between those who will successfully complete the
course and those who will not.

8.7.1 Fisher’s discriminant analysis

This is the pioneering and still most used form of discriminant analysis, developed in
the 1930s. The approach is to transform a set of multivariate observations into
univariate observations which are used to discriminate between groups.
Let us use the credit risk problem as an illustrative example.


The multivariate observation for individual i is: incomei , agei , (no. of credit cards)i ,
(size of family)i . Let us refer to this as X = (X1 , X2 , X3 , X4 ).
The two populations are: good credit risks and poor credit risks. Let us denote these
populations as A1 and A2 .
The multivariate observation is transformed to the univariate discriminant score by an
equation of the form:

Discriminant score = a1 incomei + a2 agei + a3 (no. of credit cards)i + a4 (size of family)i

i.e. Y = a1 X1 + a2 X2 + a3 X3 + a4 X4 = a′X.
The covariance matrix G (see below) is assumed to be the same for each population.
This assumption makes the solution straightforward, but is also restrictive and difficult
to justify for many applications.
The means of the discriminant scores for each population are:

µ1Y = a′µ1
µ2Y = a′µ2 .

The computation of the coefficients a1 , . . . , a4 is carried out in order to discriminate
between the two groups most efficiently.
The objective is to maximise the similarity within the groups (make the variance of the
discriminant score as small as possible within each group) and to maximise the distance
between the groups, i.e. we wish to

Maximise (Mean discriminant score for group 1 − Mean discriminant score for group 2) / (Variance of discriminant score within groups).

This objective is summarised by

Maximise [a′(µ1 − µ2)]² / (a′Ga).
The solution to this optimisation leads to the calculation of the coefficients a1 , a2 , a3
and a4 .
It can be shown that the solution is given by:

Y = a1 X1 + a2 X2 + a3 X3 + a4 X4 = a′X = (µ1 − µ2)′ G⁻¹ X.

This is called Fisher’s linear discriminant function.


The discrimination is performed by computing the discriminant score and comparing it
with a cut-off point m = 0.5(ȳ1 + ȳ2 ), the midpoint of the two group mean scores.
If the discriminant score is greater than the cut-off score then the applicant is
considered a good risk.
If the discriminant score is less than the cut-off score then the applicant is considered a
poor risk.
For a fresh applicant for credit we take their observations x1 , x2 , x3 , x4 and transform
them to y = a1 x1 + a2 x2 + a3 x3 + a4 x4 and


if y > m we allocate the applicant to A1 – a good credit risk.


if y < m we allocate the applicant to A2 – a poor credit risk.
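The whole rule can be sketched end-to-end. The sketch below uses two made-up variables rather than the four listed, with assumed group means and an assumed common covariance matrix G:

```python
import numpy as np

# Assumed group means and common covariance matrix G (invented numbers).
mu1 = np.array([3.0, 2.0])      # A1: good credit risks
mu2 = np.array([1.0, 1.0])      # A2: poor credit risks
G = np.array([[1.0, 0.3],
              [0.3, 1.0]])

# Fisher's linear discriminant coefficients: a = G^{-1}(mu1 - mu2).
a = np.linalg.solve(G, mu1 - mu2)

# Cut-off m: the midpoint of the mean discriminant scores of the two groups.
m = 0.5 * (a @ mu1 + a @ mu2)

# Allocate a fresh applicant: compute y = a'x and compare with m.
x_new = np.array([2.8, 1.9])
y = a @ x_new
group = "A1 (good risk)" if y > m else "A2 (poor risk)"
```

With these numbers the applicant's score lies above the cut-off, so they are allocated to A1.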
There are some difficulties with this sort of exercise:

a shortage of examples from one population (banks have a far greater proportion of
good risks as customers)
the data are biased in that the bank will have discarded many observations that it
considered bad risks before the study is carried out.

8.7.2 Measurement of performance


A useful way of judging the performance of any classification method is via the
‘confusion matrix’:

                    Classified GROUP 1    Classified GROUP 2
Actual GROUP 1      n1C                   n1M
Actual GROUP 2      n2M                   n2C

where

n1C number of GROUP 1 items correctly classified as GROUP 1 items


n1M number of GROUP 1 items misclassified as GROUP 2 items
n2C number of GROUP 2 items correctly classified as GROUP 2 items
n2M number of GROUP 2 items misclassified as GROUP 1 items.
The apparent error rate (APER) is then

APER = (n1M + n2M) / (n1 + n2)
i.e. the proportion of items misclassified in the data used for estimation (the
training set). The APER will always tend to underestimate the actual error rate
(AER), the rate on out-of-sample data, because it uses the same data that were used to
build the classification rule.
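With invented counts, the calculation is:

```python
# Invented confusion-matrix counts from a training set of 50 + 50 items.
n1, n2 = 50, 50           # group sizes
n1C, n1M = 45, 5          # group 1: correctly classified / misclassified
n2C, n2M = 40, 10         # group 2: correctly classified / misclassified

# Apparent error rate: proportion misclassified in the training set.
APER = (n1M + n2M) / (n1 + n2)      # (5 + 10) / 100 = 0.15
```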

8.7.3 Other points


One way of looking at the classification problem is in terms of conditional probability,
i.e. given this multivariate observation what is the probability that the item comes from
GROUP 1? This can be expressed using Bayes’ theorem in the following way:
P(Group 1 | reading) = P(reading | Group 1)P(Group 1) / [P(reading | Group 1)P(Group 1) + P(reading | Group 2)P(Group 2)].

The SPSS default for P (GROUP i) is 1/(number of groups), i.e. 0.5 when there are two
groups; if the analyst has a better estimate than this, he may incorporate this in the
analysis.
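With assumed likelihoods for the reading under each group, and the SPSS default equal priors, the posterior computation is:

```python
# Assumed likelihoods of the observed reading under each group's model.
p_reading_g1 = 0.09
p_reading_g2 = 0.03

# SPSS default priors: 1 / (number of groups), i.e. 0.5 each for two groups.
p_g1, p_g2 = 0.5, 0.5

# Bayes' theorem: posterior probability of Group 1 given the reading.
posterior_g1 = (p_reading_g1 * p_g1) / (p_reading_g1 * p_g1 + p_reading_g2 * p_g2)
# 0.045 / 0.060 = 0.75
```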


8.7.4 Stepwise variable selection criteria – Wilks’ lambda

This is a useful measure for all classification methods.

Wilks’ λ = (Within groups sum of squares) / (Total sum of squares)
         = [Σ(yi1 − ȳ1)² + Σ(yi2 − ȳ2)²] / Σ(yi − ȳ)².

When λ is close to 1, this implies that there is little difference between the means of the
discriminant scores in each sub-population. (Bad news!)
The closer λ is to 0, the greater the difference between the mean discriminant scores in
each sub-population. (Good news!)
A transformation of this statistic is an F -distributed random variable. A common stepwise procedure is to
keep choosing the variable that makes the greatest reduction in Wilks’ λ, until the
reduction is no longer significant. In this respect it is similar to stepwise multiple
regression.
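The calculation can be sketched on invented discriminant scores for two sub-populations:

```python
# Invented discriminant scores for two sub-populations.
group1 = [2.0, 2.5, 3.0, 2.5]
group2 = [0.5, 1.0, 1.5, 1.0]

def mean(xs):
    return sum(xs) / len(xs)

y1_bar, y2_bar = mean(group1), mean(group2)
y_bar = mean(group1 + group2)

# Within-groups sum of squares divided by the total sum of squares.
within = (sum((y - y1_bar) ** 2 for y in group1)
          + sum((y - y2_bar) ** 2 for y in group2))
total = sum((y - y_bar) ** 2 for y in group1 + group2)
wilks_lambda = within / total
# Close to 0 here: the group mean scores (2.5 and 1.0) are well separated.
```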

Example 8.3 An example of discriminant analysis using some bankruptcy
data
These data are taken from Johnson and Wichern and relate to a sample of US
companies (1968–72). The objective is to discriminate between bankrupt and
non-bankrupt companies. The variables are:

CF/TD – cash flow / total debt

NI/TA – net income / total assets

CA/CL – current assets / current liabilities

CA/N – current assets over net sales

INDICATOR – 1 = bankrupt, 2 = non-bankrupt


The data relate to about two years before the companies went bankrupt (or did not).
We can gain some insight by looking at crossplots. There is some differentiation
between the two groups. Some basic statistics also help to indicate differences
between the variables in either group. The following is typical of some of the output
from a statistical package:


Classification function coefficients (Fisher’s linear discriminant function)



(These are an interesting by-product of the analysis. For each case a score is worked
out for membership of group 1 and group 2. The case is allocated to the group where
it has the higher score.)

This is an ANOVA (see Chapter 10) of the discriminant score as the dependent
variable and the grouping variable as the classification.


The eigen value is (between group sum of squares)/(within group sum of squares).
The Canonical correlation is the Pearson correlation between the grouping variable
and the discriminant score.
Wilks’ lambda is (within group sum of squares)/(total sum of squares)
(Canonical correlation)² + Wilks’ lambda = 1.

(These are for use with standardised independent variables.)


Structure matrix: Pooled within-groups correlations between discriminating
variables and canonical discriminant functions
(Variables ordered by size of correlation within function.)

(A means of judging the contribution made by a variable to discrimination.)

(These are the coefficients used with the unstandardised independent variables to
construct the discriminant score.)


Test of Equality of Group Covariance Matrices Using Box’s M


The ranks and natural logarithms of determinants printed are those of the group
covariance matrices.

(The null hypothesis is that the covariance matrices in each group are equal. The
test is very sensitive to non-normality, so that rejection of H0 may mean that the
covariance matrices may not differ much but there is non-normality.)


(This display shows the distribution of discriminant scores in both groups, note the
mis-classified cases.)
Classification results for cases selected for use in the analysis

Percentage of ‘grouped’ cases correctly classified: 91.89 per cent (this is the
apparent success rate).


Classification results for cases not selected for use in the analysis

Percent of ‘grouped’ cases correctly classified: 66.67 per cent (this is the actual
success rate).

8.8 Summary
What you need to know

How to identify and describe the multivariate observations of a given experiment.

How to recognise broad problem types such as: explaining one variable in terms of
another; data grouping or data reduction.

How to interpret the output from factor analysis software.

How to interpret the output from discriminant analysis software.


What you do not need to know

Formulas for univariate or multivariate density functions.

8.9 A reminder of your learning outcomes


By the end of this chapter, and having completed the Essential reading and activities,
you should be able to interpret and use the output from a:

factor analysis program

discriminant analysis program.

8.10 Sample examination questions


1. (Part question)
Briefly explain the general principles and uses of each of the following multivariate
techniques:


(a) Discriminant analysis.


(b) Factor analysis.
(4 marks)

8.11 Guidance on answering the Sample examination questions
1. This is essentially bookwork. A possible answer might be:
(a) Discriminant analysis is a multivariate technique which tries to put the
observations into categories. It works somewhat like multiple regression but
the dependent variable is the category the observation belongs to. Hence one
forms a linear combination of the independent variables whose aim is to
produce discriminant scores (values of this linear combination) which are as
‘different’ as possible for the different categories. Discriminant analysis is used
in credit scoring, etc.
(b) Factor analysis is a data reduction technique. It constructs a reduced number
of factors, m say, to replace (represent) the p observed variables (m < p). The
factors should (ideally) be easily interpreted.

Chapter 9
Forecasting

9.1 Aims of the chapter


To introduce the terminology of forecasting.

To give the reader an understanding of the main forecasting methods widely used
for many applications.

To establish the usefulness (if it is not already obvious) of forecasting for
management.

9.2 Learning outcomes


By the end of this chapter, and having completed the Essential reading and activities,
you should be able to:

understand the basic terminology of forecasting

classify the forecasting models by leadtime and information used

decompose a model into trend, seasonality, cycles and random variation

construct simple models to forecast time series – these should include exponential
smoothing, moving averages, seasonal forecasts and simple Box-Jenkins methods

assess the usefulness of a forecasting model by determining and interpreting the
performance measures of root mean square error and mean absolute deviation.

9.3 Essential reading


Hanke, J.E. and D.W. Wichern Business forecasting. (Upper Saddle River, NJ:
Pearson Prentice Hall, 2009) Chapters 1, 3, 4, 5, 8 and 9.

9.4 A note on spreadsheets


There is obviously currently no requirement to be able to use spreadsheets in the
examination for this subject. However, you are strongly advised to familiarise yourselves
with the use of spreadsheets in general – it is such an important tool – and particularly
for certain aspects of MT2076 Management mathematics. One such aspect is in
forecasting. You will find it beneficial to write simple spreadsheet programs to use in
forecasting. This will not only enable you to quickly check whether you understand
given examples and solutions, but also to more easily understand the impact of
changing certain parameters, e.g. smoothing constants, etc. upon the resulting forecasts.
Although the examination system dictates that you should only use simple calculators
when producing forecasts within an examination, in reality one would obviously use the
power of a spreadsheet to do these calculations. Unless you appreciate how to construct
a spreadsheet to do the sort of forecasts discussed in this chapter you will not have
gained all you can from the knowledge contained herein. If you have progressed beyond
using the usual texts of the type ‘How to use spreadsheets’, ‘Excel made simple’, etc.,
you will find the following text very useful for using several parts of MT2076
Management mathematics:

Levine, D.M., D. Stephan, T.C. Krehbiel and M.L. Berenson Statistics for
managers using Microsoft Excel. (Upper Saddle River, NJ: Pearson Prentice Hall,
2005) fourth edition [ISBN 9780131440548].

9.5 Introduction

A forecast is an estimate of the value of a variable at a future point in time. In this
chapter we will concentrate on quantitative forecasting, the provision of numerical
estimates of the expected value of a variable and, equally importantly, an estimate of
the uncertainty associated with the estimate.
Forecasting is important because it is an aid to making decisions. Formal forecasting
permits decision makers to make more informed decisions and thus allows organisations
to perform more effectively. The type of decision involved ranges from inventory
management and production control, through capacity planning to economic
management.
Let us look at two common types of forecasting problem.

i. Forecasting the timing of an event. The timing of an event such as an upturn or
downturn in the economy, or period of peak demand for a consumer durable, is
often of crucial importance to decision makers.

ii. Forecasting a time series. A time series is a sequence of observations of a variable
recorded at equidistant points in time. Examples of practical interest are sales
figures recorded weekly or monthly, demand for electricity, the retail sales index.
The forecasting of time series is the main area of interest, since other problems such as
the timing of an event can be subsumed into it.
In order to gain an idea of the problems where forecasting makes a contribution let us
look at the set of examples shown in Table 9.1.


Table 9.1: Examples of forecasting problems.

9.6 Classification of forecasts


Two important characteristics are used here to classify forecasts: leadtime and the
information used. Classification of a forecast is a useful way of structuring the choice of
forecasting technique. It is not always a straightforward process, often needing a
thoughtful analysis of the forecast required.
9.6.1 Classification by leadtime

The distance in time between when a forecast is made and the future point to which it
refers is the leadtime of the forecast. For a forecast of UK unemployment in December
2010 made in December 2008, the leadtime is two years.
A convenient and generally accepted division of leadtimes is into short-, medium- and
long-term problems. A time series can be regarded as the output of a system and the
classification follows from this.
If it is believed that the underlying system is unlikely to undergo any significant
changes during the leadtime in question, then the problem is short term. In other
words, short-term forecasting techniques are based on the hypothesis that the relevant
future observations are generated by the same system that generated past observations.
Forecasting product sales for inventory control is a typical short-term forecasting
problem.
If one expects the system to change in one of a number of ways, then the forecasting
problem is over the medium term. An example of a medium-term forecasting problem is


the demand for a product that will be significantly further through its product life cycle
at the end of the leadtime than it is when the forecast is made.
Long-term forecasting problems occur when there is little or no information available
about the state of the system generating the time series at the end of the leadtime. In
these circumstances, there are little quantitative data available and the opinions of
those with relevant expertise are used to help formulate the forecast.
The length of the leadtime is not sufficient to classify a forecasting problem into short,
medium and long term as the underlying systems may differ drastically in their
susceptibility to change.
Table 9.2 shows some examples of different systems and the differences between the
length of leadtimes in the same class.

Table 9.2: Examples of classification by leadtime.

The entries in the table are realistic estimates of the relevant time periods for each
environment. Note that in some cases, all three classifications cannot be used, as the
demand for particular fashion items has no long term, while the demand for nuclear
power stations has no short term. Note also that the length of the leadtimes associated
with short-term forecasts varies greatly depending on the stability of the system
involved.

9.6.2 Classification by information used


i. Extrapolative models. Extrapolative models are univariate, that is, they are
based only on past observations of the time series.
If the variable being forecast is denoted as Xt , for example the sales of a product at
time t, then an extrapolative forecast with a leadtime j is denoted:
E(Xt+j | Xt , . . . , X0 ). (9.1)
This means that the extrapolative forecast is a conditional expectation of Xt+j ,
using only the information contained in Xt , . . . , X0 the history of the time series to
date.


Univariate forecasting methods form a major proportion of the forecasting carried
out in industry, and are especially useful for routine and repetitive tasks. The
forecasting process using extrapolative models is easy and inexpensive once the
initial analysis is complete. Extrapolative models generally produce forecasts that
compare favourably with more complex models and, because of this, they provide
useful benchmarks for performance comparisons.

ii. Causal models. Causal models are multivariate, the underlying hypothesis is that
the behaviour of the variable being forecast is influenced by one or more other
variables.
A simple example is forecasting the sales of a product using advertising
expenditure as an explanatory variable. In this case, it is believed that advertising
expenditure will affect sales in a direct way, perhaps after a time delay. Thus the
inclusion of extra information about advertising expenditure will lead to an
improvement over the univariate forecast. There is no theoretical limit to the
number of explanatory variables used; however, in practice the improvement in
forecasting gained by each new explanatory variable tends to decrease quickly.
The multivariate forecast is a conditional expectation using more information than
the univariate, thus a multivariate forecast for j periods ahead is:

E(Xt+j | Xt , . . . , X0 , Yt , . . . , Y0 , Zt , . . . , Z0 ) (9.2)

where Yt and Zt are explanatory variables.


The great benefit of multivariate models is that they can be used for planning purposes,
for the evaluation of alternative policies within the company. They can be used to
answer ‘what if’ questions, for example, ‘What happens to sales if we increase
advertising expenditure by 10 per cent?’
Extrapolative models tend to be used when urgency is required, such as in inventory
control, when there is no time to gather extra information. Using government economic
statistics is often a problem in that there is a long publication delay.

9.7 The requirements of a forecasting exercise


For a particular forecasting problem, the approach used can be considered appropriate
if costs are minimised. These costs include the penalties associated with errors in the
forecast, the costs associated with producing the forecast, which may include software
costs, staff costs and possibly consultancy fees.
When forecasting for inventory control of a large number of different products, one
would prefer a robust, reasonably accurate approach that requires little manual
intervention. Any savings from increased forecast accuracy would be dissipated by the
cost of extra manpower.
The trade-off between increased accuracy and the cost of delivering it is shown
schematically in Figure 9.1.


Figure 9.1: Cost-based choice of forecasting method.

9.8 The structure of a time series


A popular and common approach to time series analysis is to decompose a series into
components assignable to different effects. A typical decomposition is given below:

(a) trend
(b) seasonality
(c) cycles
(d) random variation.
Using a decomposition approach, each component is estimated and then these estimates
are re-combined to form a forecast. An advantage of the approach is that the
components are intuitively reasonable and can be easily explained.
In this section these components are discussed and the means of isolating them from a
time series will be explained. The first three components listed above are said to be
systematic components and random variation is the residual, sometimes called noise.

Trend

The trend of a time series is the systematic increase or decrease of the variable over
time. The trend line is a smooth line indicating the path of the series, ignoring other
components. When forecasting, the trend is usually taken to be a straight line, which is
extrapolated into the future. This idea is used when isolating the trend from an existing
time series. The
trend is found by taking a moving average of the data available, a process sometimes
called ‘smoothing’.
Trend at time t is

Tt = (1/M) Σ_{i=−(M−1)/2}^{(M−1)/2} Xt+i   (9.3)

where M is an odd number. As M increases the smoothing effect increases; as M
decreases the smoothing effect decreases.
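As an illustration, the smoothing in (9.3) can be carried out in a few lines of Python; the series below is invented purely for demonstration and is not taken from the guide:

```python
def moving_average(x, M):
    """M-point moving average trend, as in equation (9.3), with M odd.

    Returns a list the same length as x, with None where the averaging
    window would run off either end of the series.
    """
    if M % 2 == 0:
        raise ValueError("M must be odd")
    half = (M - 1) // 2
    trend = [None] * len(x)
    for t in range(half, len(x) - half):
        trend[t] = sum(x[t - half:t + half + 1]) / M
    return trend

# Illustrative data: a larger M gives a smoother trend line.
series = [10, 12, 9, 14, 11, 13, 10, 15, 12, 16]
print(moving_average(series, 3))
```

Re-running with M = 7 on the same series shows the stronger smoothing effect described above, at the cost of more undefined values at the ends.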


Figure 9.2: Moving averages of the UK food retailer share price.

Example 9.1 Using the historical weekly share price (over a two-year period) of a
company, a major UK food retailer, the effect of different choices of M , for example
7 and 25, is shown in Figure 9.2.
Note that the share price changes quite dramatically from week to week. For a
moving average, with M = 7, the changes in the share price are smoothed to quite a
large extent, and the use of M = 25 produces a very smooth curve.

Seasonality

The seasonality of a time series is the effect on the variable of the time of year that the
measurement is being made. The mechanism by which the season affects the variable
may be via the weather, for example, greater use of fuel for heating during the winter;
or it may be due to dates such as holidays or financial year ends.
Seasonal fluctuations are often very pronounced in weekly, monthly or quarterly time
series, and they often contribute a major proportion of the variability of the data.
Isolation of the seasonal pattern is thus of considerable importance in analysing and
forecasting time series.
The first stage is to identify the underlying trend; this represents where the series would
have been in the absence of seasonal and random fluctuation. This is done by a centred
moving average which represents an average for a year’s observations; for example a
centred moving average for quarterly data is:
Tt = (1/8) (Xt−2 + 2(Xt−1 + Xt + Xt+1 ) + Xt+2 ) .   (9.4)
A centred moving average is necessary when averaging over an even number of
observations, in order to make the timing of the average coincide with an observation,
rather than fall between them.
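For quarterly data, equation (9.4) can be sketched as follows; the index t is taken as 0-based and the figures are invented for illustration:

```python
def centred_ma(x, t):
    """Centred four-quarter moving average, equation (9.4):
    T_t = (X_{t-2} + 2(X_{t-1} + X_t + X_{t+1}) + X_{t+2}) / 8.
    Needs two observations on either side of t.
    """
    return (x[t - 2] + 2 * (x[t - 1] + x[t] + x[t + 1]) + x[t + 2]) / 8

quarters = [4, 8, 6, 2, 5, 9, 7, 3]   # two invented years of quarterly data
print([centred_ma(quarters, t) for t in range(2, len(quarters) - 2)])
# → [5.125, 5.375, 5.625, 5.875]
```

Note how the weights (1, 2, 2, 2, 1)/8 make the average coincide with an observation rather than fall between two quarters.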
The next stage is to identify the seasonal pattern. In order to do this, it is convenient to


change the notation slightly:

Xt becomes Xij

where Xij represents the jth period in year i. Thus t = iL + j, where there are L
periods per year, for example L = 12 or L = 52.
There are several ways of modelling seasonality. Here we shall concentrate on the
multiplicative representation given.

Xij = Tij (1 + Sj ) + εij   (9.5)

such that S1 + S2 + . . . + SL = 0 and εij is a random error term.

Example 9.2 Calculate a centred moving average for the registrations of new cars
in the United Kingdom (Years 1 to 7). Then calculate the multiplicative seasonal
factors and comment on the seasonal profile revealed.
The full data are given in Table 9.3.
Table 9.4 illustrates the calculations necessary.
The centred moving average is calculated using this version of (9.4):

Tt = (1/24) (Xt−6 + 2(Xt−5 + · · · + Xt+5 ) + Xt+6 ) .
The data and the centred moving average for these data are shown in Figure 9.3.
The raw seasonal fluctuations are found by dividing the series Xt by the trend Tt
and then subtracting 1. [Note: this latter stage of subtracting 1 is not always done,
nor is it really necessary, so long as when one later produces the forecasts one
remembers the definition of seasonality used when constructing the seasonal
forecasts from the trend forecasts.]
These values are observations of the seasonal factor, Sj , for the relevant months.
These values are averaged for each month to give Sj′ ; the average value of the Sj′ s for
j = 1, 2, . . . , 12 is found and subtracted from each Sj′ , giving the final estimate of the
seasonal factor Sj which satisfies the condition that S1 + S2 + · · · + S12 = 0. This
requirement ensures that a year of seasonal data has the same mean as a year of
de-seasonalised data. The seasonal factors are shown in Figure 9.4. Note that the
peak seasonal factor occurs in month 8 (August) when, in the United Kingdom, the
registration letter on a car’s number plate used to change. The lowest seasonal
demand is in December, as buyers postpone their decision to buy until the next year,
in order to gain a better second-hand price when they come to sell their car.
Note the main features of this seasonal pattern are dictated by arbitrary, legislative
decisions rather than an underlying weather-based effect, etc.
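The averaging-and-recentring procedure just described can be sketched as follows. The data here are artificial (a flat trend of 100 with a known quarterly pattern), not the car registration figures, and the helper assumes the trend list contains None wherever the centred moving average is unavailable:

```python
def seasonal_factors(x, trend, L):
    """Multiplicative seasonal factors S_j for L periods per year.

    Raw factors X_t / T_t - 1 are averaged for each period of the
    year, then recentred so that the S_j sum to zero (equivalently,
    the multiplicative factors 1 + S_j average to one).
    """
    raw = [[] for _ in range(L)]
    for t, (xt, tt) in enumerate(zip(x, trend)):
        if tt is not None:
            raw[t % L].append(xt / tt - 1)
    s = [sum(r) / len(r) for r in raw]
    mean_s = sum(s) / L
    return [sj - mean_s for sj in s]

# Two artificial years with a known pattern about a flat trend of 100:
x = [110, 90, 105, 95, 110, 90, 105, 95]
trend = [100] * len(x)
print(seasonal_factors(x, trend, 4))
```

On these data the recovered factors are (up to rounding) 0.10, −0.10, 0.05 and −0.05, which sum to zero as required.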


Table 9.3: Monthly new car registrations in the United Kingdom over a seven-year period.

Table 9.4: Processing the data.


Figure 9.3: New cars registered in the United Kingdom over seven years with a 12-month
centred moving average.

Figure 9.4: Seasonal factors for UK car registration data (multiplicative model).


9.9 Decomposition models of time series


The components of a time series have been discussed, in the context of the analysis of a
time series. In this section, we will extend the analysis to describe the series as a simple
mathematical model.
The simplest case is when a series exhibits only random variation.

Xt = Tt + εt   (9.6)

where
Tt = Tt−1 + bt

and
εt ∼ N (0, σ^2 ), bt ∼ N (0, σb^2 )

and
σ^2 ≫ σb^2 .

(9.6) says that the variable Xt is the sum of two random variables; Tt is the trend term
discussed earlier. If the expected period-to-period change is zero, as in (9.6), it is called
the level of the series; however, the level is subject to a small random change each
period, bt ; εt is the random error term (or noise term) which has a far larger variance
than the disturbance in level.
The objective of the forecaster is to make the best possible estimate of the level, Tt , as
this will be the best forecast at time t for the value of the variable to be forecast L
periods ahead, Xt+L .
An incremental trend term can be introduced to make the model more general. This
term, ct , represents the systematic change in level at time t; this term is subject to a
random disturbance term, dt . The model including trend is given here:

Xt = Tt + εt   (9.7)

where

Tt = Tt−1 + bt + ct
ct = ct−1 + dt

and
dt ∼ N (0, σd^2 )

and
σ^2 ≫ σd^2 .

Seasonality can be introduced in the model in a number of ways. The simplest method
is to de-seasonalise the series by dividing the observations by (1 + Sj ), and then
proceeding as in (9.7).


9.9.1 Model estimation


As we have seen, moving averages can be used to estimate past trends. However, an M
period moving average trend for time t only becomes available at time t + (M − 1)/2 (or
t + M/2 for centred moving averages). That is, the moving average estimates are always
out of date.
An approach that overcomes this difficulty to a large extent is the exponential weighted
moving average (EWMA):

T̂t = αXt + α(1 − α)Xt−1 + α(1 − α)^2 Xt−2 + · · ·   (9.8)

where α is a constant taking values between 0 and 1.


[N.B. T̂t is the forecast made at time t for the value at time t + 1.]
The form of the EWMA puts a lower weight on an observation as it becomes more
dated.
The equation shown in (9.8) can be simplified as follows:

T̂t = αXt + (1 − α)T̂t−1 . (9.9)

or
T̂t = T̂t−1 + α(Xt − T̂t−1 ) (9.10)
In words, the estimate of the level at time t is equal to the previous estimates made at
time t − 1 plus α times the previous error. For the model with a systematic trend,
equation (9.10) is adapted and there is a further updating equation for the systematic
trend term.

T̂t = T̂t−1 + ĉt−1 + α(Xt − (T̂t−1 + ĉt−1 )) (9.11)


ĉt = ĉt−1 + β((T̂t − T̂t−1 ) − ĉt−1 ). (9.12)

The value of β is also between 0 and 1. Note that the basic updating equation can be
characterised as:

new estimate = old estimate + updating factor × (outcome − old estimate).
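A single update step of equations (9.11) and (9.12) can be written directly from this 'new estimate = old estimate + updating factor × error' pattern; the numbers in the last line are invented purely to show the arithmetic:

```python
def holt_update(level, trend, x, alpha, beta):
    """One update of trend-corrected exponential smoothing.

    level and trend are the previous estimates T-hat_{t-1} and c-hat_{t-1};
    x is the new observation X_t.
    """
    forecast = level + trend                                  # one-step-ahead forecast
    new_level = forecast + alpha * (x - forecast)             # equation (9.11)
    new_trend = trend + beta * ((new_level - level) - trend)  # equation (9.12)
    return new_level, new_trend

print(holt_update(100.0, 5.0, 110.0, alpha=0.5, beta=0.1))  # → (107.5, 5.25)
```

The forecast of 105 misses the observation 110 by 5, so the level is pulled up by α × 5 = 2.5 and the trend estimate is nudged towards the latest observed change.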

9.9.2 Forecasting procedure with EWMAs


For a given time series, say T observations, a reasonable approach is to take two-thirds
of the data for model estimation, that is, finding the optimal values for the smoothing
constants α and β. There is no analytical solution for these values, so they are found
by a grid search method. The objective function is the mean squared error (MSE) of the
one-step-ahead forecast error, i.e. MSE is

(1/T) Σ_{t=1}^{T} (Xt+1 − E(Xt+1 | Xt , . . .))^2

where the k-step ahead forecast is:

E(Xt+k | Xt , . . .) = T̂t + kĉt .


Table 9.5: Details of the ‘level’ only exponential smoothing model on sales of a product.

The systematic trend ĉt is omitted for the random variation only forecast.
The starting points for the calculation of the mean squared error (MSE) are

T̂1 = X1 , ĉ1 = 0.

Example 9.3 Simple exponential smoothing to forecast the sales of a


product
Table 9.5 shows the sales data Xt in column 2 and the exponentially smoothed
forecast is given as Tt in column 3. The error squared term is also noted.
A level only model is used. The value of the smoothing constant that minimises the
mean squared error has been determined, using a grid search method, to be 0.298.


The first 40 observations were used for model estimation (i.e. the calculation of the
mean squared error). Using the starting procedure mentioned, the estimated trend
at time t = 2 is X1 = 82.37. Using equation (9.9) then gives

T̂3 = 0.298 × 73.06 + (1 − 0.298) × 82.37 = 79.60.

This procedure is then repeated for the remaining data.
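The recursion in (9.9) and the grid search for α can be sketched as follows. The final line reproduces the T̂3 = 79.60 figure computed above, using only the first two observations from Table 9.5 (the full table is not reproduced here), and the grid search helper is a straightforward, not especially efficient, implementation:

```python
def ses(x, alpha):
    """Simple exponential smoothing, equation (9.9), with T-hat_1 = X_1.

    Element i of the returned list is the estimate after seeing x[i],
    i.e. the one-step-ahead forecast of x[i + 1].
    """
    t = [x[0]]
    for xt in x[1:]:
        t.append(alpha * xt + (1 - alpha) * t[-1])
    return t

def best_alpha(x, step=0.001):
    """Grid search for the alpha minimising the one-step-ahead MSE."""
    def mse(a):
        f = ses(x, a)
        return sum((xt - ft) ** 2 for xt, ft in zip(x[1:], f[:-1])) / (len(x) - 1)
    grid = [round(step * i, 6) for i in range(1, int(1 / step))]
    return min(grid, key=mse)

# First two observations of Example 9.3 (X1 = 82.37, X2 = 73.06):
print(round(ses([82.37, 73.06], alpha=0.298)[-1], 2))  # → 79.6
```

On the full 40-observation estimation sample, a search of this kind is what produces the quoted optimal value of 0.298.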

9.10 Simple Box-Jenkins (ARIMA) methods


In this chapter we have already seen forecasts based upon moving averages and
seasonality. We will also come across forecasts for dependent variables based upon
several independent (explanatory) variables in Chapter 10. Autoregressive integrated
moving average (ARIMA) models are a class of linear models that are based upon the
patterns that might be evident in the history of the data. They can be used for
stationary (where the time series varies about a fixed level) and non-stationary (where
the time series has no natural constant mean level) time series. ARIMA models do not
require independent (explanatory) variables but use the information in the given time
series itself to forecast future values. George Box and Gwilym Jenkins were two pioneers
in using ARIMA models and hence this type of forecasting is often referred to as
Box-Jenkins forecasting.

9.10.1 The Box-Jenkins methodology


The method makes no initial assumption about the form of the appropriate forecasting
model, and uses an iterative approach to find a suitable model from a general class of
models. A model is regarded as fitting well if the residuals (errors) are acceptably small
and randomly distributed. An examination of the plot of the time series and the
9 autocorrelation (and partial autocorrelation) for various time lags is used to pick an
initial model. The autocorrelation and partial autocorrelation patterns from the time
series are compared with standard (theoretical) patterns and the matching ARIMA
model is formulated. Some theoretical autocorrelation and partial autocorrelation
patterns are shown in Figures 9.5, 9.6 and 9.7.
[Note: A partial autocorrelation at time lag k is the correlation between observations
Yt and Yt−k after adjusting for the effects of the intervening observations
Yt−1 , Yt−2 , . . . , Yt−k+1 .]
It is, of course, virtually impossible that there will be a perfect fit between any
theoretical model and the plots from the actual data – this is because of random
variation, etc. However, it is often possible to identify a possible model and its adequacy
is then assessed by looking at the residuals of the model. If necessary the model can
then be modified, and experience in using ARIMA models helps to build up useful
models in an efficient fashion. We consider the three main types of Box-Jenkins models
(i.e. autoregressive, moving average and autoregressive moving average) below. Before
doing so, however, it will be useful to recognise the notation used:

Yt is the response (dependent) variable (i.e. the time series to be forecast) at time t.


Yt−i , for various i = 1, 2, . . . , are the response variables at time lag i. They are used
like independent (explanatory) variables.

φi and ωi for various i = 1, 2, . . . , are coefficients to be determined.

εt−i is the error term in period t − i.

9.10.2 Autoregressive models


A pth order autoregressive model, AR(p), takes the form:

Yt = φ0 + φ1 Yt−1 + φ2 Yt−2 + · · · + φp Yt−p + εt .

Autoregressive models are suitable for stationary time series and the constant φ0 would
be related to the level of the series. If the data vary about zero then the constant φ0 is
not required.

9.10.3 Moving average models


The qth order moving average model, MA(q), is of the form

Yt = µ + εt − ω1 εt−1 − ω2 εt−2 − · · · − ωq εt−q

where µ is the constant mean of the process.


Note: The description of this model as being moving average is historical and has
nothing to do with moving averages as we have already covered.

9.10.4 Autoregressive moving average models


We can combine a model with autoregressive terms with a moving average model. The
resulting ARMA(p, q) model is a mixed autoregressive moving average model of the
form:

Yt = φ0 + φ1 Yt−1 + φ2 Yt−2 + · · · + φp Yt−p + εt − ω1 εt−1 − ω2 εt−2 − · · · − ωq εt−q .

Note: Table 9.6 might be useful to summarise the patterns associated with the various
autoregressive-moving average models.

Table 9.6: Autocorrelation and partial autocorrelation characteristics of the various


models.

In practice the models used rarely exceed p and/or q = 2.
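The sample autocorrelations used to match a series against patterns like those in Table 9.6 can be computed directly. The sketch below simulates an AR(1) process with φ1 = 0.8 (whose theoretical autocorrelation at lag k is 0.8^k) and checks that the sample coefficients die down geometrically, as expected for an autoregressive model:

```python
import random

def acf(x, max_lag):
    """Sample autocorrelation coefficients r_1, ..., r_max_lag."""
    n = len(x)
    mean = sum(x) / n
    c0 = sum((v - mean) ** 2 for v in x)
    return [sum((x[t] - mean) * (x[t - k] - mean) for t in range(k, n)) / c0
            for k in range(1, max_lag + 1)]

# Simulate Y_t = 0.8 Y_{t-1} + e_t with e_t ~ N(0, 1).
random.seed(42)
y, prev = [], 0.0
for _ in range(5000):
    prev = 0.8 * prev + random.gauss(0, 1)
    y.append(prev)

r = acf(y, 3)
# r[0] should be close to 0.8, r[1] to 0.64 and r[2] to 0.512:
# the "dies down" pattern characteristic of an AR model.
print([round(v, 2) for v in r])
```

Repeating the experiment with a simulated MA(1) series instead would show a single large coefficient at lag 1 followed by values near zero, i.e. the "cuts off" pattern.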


Figure 9.5: Autocorrelation and partial autocorrelation coefficients of AR[1] and AR[2]
models. The first four graphs are for AR[1] and the next four are for AR[2]. Variations
occur because of coefficient signs.


Figure 9.6: Autocorrelation and partial autocorrelation coefficients of MA[1] and MA[2]
models. The first four graphs are for MA[1] and the next four are for MA[2]. Variations
occur because of coefficient signs.


Figure 9.7: Autocorrelation and partial autocorrelation coefficients of mixed ARMA[1,1]


models. Variations occur because of coefficient signs.


9.10.5 Building a model


The Box-Jenkins forecasting approach is really an iterative method of trying to identify
a suitable model. The steps are as follows:

1. Model identification
The time series being forecast should be stationary, i.e. appearing to vary about a
fixed level. If the series is not stationary then it can often be converted to
stationarity by taking first differences. Diagrams similar to Figure 9.5 are then used
to suggest a model.
2. Model estimation
The parameters of the model are estimated so that they minimise the sum of
squares of the errors, i.e. the differences between the actual and fitted values.
3. Model checking
A model is adequate if the residuals (errors) cannot be used to improve the
forecasts, i.e. they contain no identifiable pattern (i.e. they are random).
An overall check of the model adequacy is provided by the Ljung-Box statistic.
This is defined to be

Q = n(n + 2) Σ_{k=1}^{m} rk^2 (e)/(n − k)

where
• rk (e) is the residual autocorrelation at lag k
• n is the number of residuals
• k is the time lag
• m is the number of time lags to be tested.
If the observed Q value is small (i.e. has a p-value > 0.05) then the model is
considered adequate. [If the data are random, then Q should follow a chi-squared
distribution on m degrees of freedom.]
4. Forecasting with the model
Consider re-assessing the model once some more data become available.
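The Ljung-Box statistic defined in step 3 can be computed as follows. The residuals here are simulated white noise, so Q should be unremarkable when compared with the relevant chi-squared critical value (for m = 10 lags and a 5 per cent test, about 18.31):

```python
import random

def ljung_box_q(residuals, m):
    """Ljung-Box statistic Q = n(n + 2) * sum_{k=1}^{m} r_k(e)^2 / (n - k)."""
    n = len(residuals)
    mean = sum(residuals) / n
    c0 = sum((e - mean) ** 2 for e in residuals)
    q = 0.0
    for k in range(1, m + 1):
        rk = sum((residuals[t] - mean) * (residuals[t - k] - mean)
                 for t in range(k, n)) / c0  # residual autocorrelation at lag k
        q += rk * rk / (n - k)
    return n * (n + 2) * q

random.seed(7)
noise = [random.gauss(0, 1) for _ in range(200)]
print(round(ljung_box_q(noise, 10), 2))  # compare with 18.31
```

Residuals with a strong leftover pattern (for example, a strictly alternating sequence) would produce a Q far above the critical value, signalling an inadequate model.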

9.11 Summary
What you need to know

How to classify forecasts according to the framework given.


How to calculate moving averages.
How to perform exponential smoothing.
How to perform seasonality-based forecasting.
How to analyse time series using Box-Jenkins (ARIMA) models.


What you do not need to know

How to derive optimal smoothing constants.

How to derive optimal coefficients for Box-Jenkins forecasts.

9.12 A reminder of your learning outcomes


By the end of this chapter, and having completed the Essential reading and activities,
you should be able to:

understand the basic terminology of forecasting

classify the forecasting models by leadtime and information used

decompose a model into trend, seasonality, cycles and random variation

construct simple models to forecast time series – these should include exponential
smoothing, moving averages, seasonal forecasts and simple Box-Jenkins methods

assess the usefulness of a forecasting model by determining and interpreting the


performance measures of root mean square error and mean absolute deviation.

9.13 Sample examination questions

1. (Please note that this is only part of a full examination question.)


The sales of vegetable burgers are shown in the table below.

Although two stock controllers agree that simple exponential smoothing is the best
forecasting method, they disagree about the most appropriate loss criterion for
choosing the smoothing constant. Albert prefers root mean square forecasting
error, but Bertram prefers mean absolute forecasting error. The values of the
smoothing constant that satisfy Albert and Bertram are 0.42 and 0.54, but due to
an administrative error, the information linking the smoothing constant to the loss
criterion has been mis-filed. Using the data above, say which smoothing constant
relates to which loss criterion.
(14 marks)


2. The table below shows the total export orders for a company during 2000–03:

(a) Briefly discuss how one might choose between a multiplicative and additive
seasonal forecasting model.

(4 marks)

(b) Using three-point moving averages, isolate the trend.

(4 marks)

(c) Estimate the seasonal variations using a multiplicative model, and thus
forecast the value of exports for the company during the three periods in 2004.

(5 marks)

(d) Repeat (b) but using an additive model.

(5 marks)
(e) If the exports for 2004 turn out to be Jan–Apr: 66; May–Aug: 72; Sep–Dec: 68
calculate the root mean square errors (RMSEs) for your 2004 forecasts
obtained from (c) and (d) above.

(2 marks)

3. Two forecasting equations are proposed for a set of sales data, Xt .

1. Xt = φXt−1 + at

2. Xt = µ + θat−1 + at

where at is an error term; φ is 1.05, µ is 115 and θ is 0.1. Ten observations are
given below:


(a) Evaluate the one period ahead forecasts provided by each equation. Which
forecasting equation gives the lowest mean absolute error?
(14 marks)
(b) Explain how you would forecast more than one period ahead with each
equation, write down an expression for an N period ahead forecast. Evaluate a
forecast made at time t = 10, for N = 5 periods ahead.
(6 marks)

9.14 Guidance on answering the Sample examination


questions
1. With α = 0.42 we get the following forecasts and errors, etc:


Hence RMSE = √(803.2/10) = 8.96 and mean absolute deviation (MAD) =
83.3/10 = 8.33.

However, with α = 0.54 we get:

Hence RMSE = √(830.2/10) = 9.11 and MAD = 81.4/10 = 8.14.

Hence it would appear that (of these two smoothing constants) 0.42 would be used
if the loss criterion is root mean squared deviation and 0.54 would be used if the
loss criterion was mean absolute deviation.

2. (a) See Section 9.8 (Seasonality). A diagram might help to see if the seasonality
factor increases (decreases) as the data rise (fall) or whether they are the same
irrespective of whether the data rise or fall. In the first case one is tempted to
use multiplicative factors, in the second case additive seasonality factors seem
more appropriate.

(b) The trend per period is (63.0 − 50.0)/9 = 1.4444.

[Note: In the table below we are using the three-month moving average as a
trend forecast (an interpretation of part (a)). An alternative, giving different
answers, would be to determine a trend through the whole data, i.e
(61 − 45)/11 = 1.455 per period (season) and then produce a trend forecast by
adding on 1.455 each season. This would actually be a more common approach
(but, strictly, is not what the question suggested).]


(c) For the multiplicative seasonality the average Jan–Apr seasonal factor is
(0.962 + 0.931 + 0.968)/3 = 0.954. Similarly the averages for May–Aug and
Sep–Dec are 1.106 and 0.941, respectively (note that there are four values to
be averaged for May–Aug). The trended forecasts for the first three periods of
2004 are 63.0 + 2(1.4444) = 65.889 for Jan–Apr, 63.0 + 3(1.4444) = 67.333 for
May–Aug and 63.0 + 4(1.4444) = 68.778 for Sep–Dec. Multiplying by the
seasonal factors will then give the seasonal forecasts as 65.889(0.954) = 62.86,
67.333(1.106) = 74.47 and 68.778(0.941) = 64.72, respectively.

(d) For the additive seasonality the average Jan–Apr seasonal factor is
(−2 − 4 − 2)/3 = −2.67. Similarly the averages for May–Aug and Sep–Dec are
6.00 and −3.33, respectively (note that there are four values to be averaged for
May–Aug). The trended forecasts for the first three periods of 2004 are as
before and adding the seasonal factors will then give the seasonal forecasts as
65.889 − 2.67 = 63.22, 67.333 + 6.00 = 73.33 and 68.778 − 3.33 = 65.45,
respectively.

(e) To calculate the RMSE we determine

RMSE = √( Σ(Actual − Estimate)^2 / n ) .

For the multiplicative model this gives

√( ((66 − 62.86)^2 + (72 − 74.47)^2 + (68 − 64.72)^2 ) / 3 ) = 2.984

and 2.309 for the additive seasonality model.
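Both RMSE figures can be checked with a short script, using the actuals and the forecasts computed in parts (c) and (d):

```python
import math

def rmse(actual, estimate):
    """Root mean square error over paired observations."""
    return math.sqrt(sum((a - e) ** 2 for a, e in zip(actual, estimate))
                     / len(actual))

actual = [66, 72, 68]
print(round(rmse(actual, [62.86, 74.47, 64.72]), 3))  # → 2.984 (multiplicative)
print(round(rmse(actual, [63.22, 73.33, 65.45]), 3))  # → 2.309 (additive)
```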


3. The forecasts (together with RMSE and mean absolute errors) using the two
models (equations) are as follows:

Hence the first model (Equation 1) gives the lowest mean absolute error.
For Equation (1) we would use E(Xt+N | Xt ) = φ^N Xt , which gives

X15 = (1.05)^5 (120.5) = 153.8.

For Equation (2) we would use E(Xt+N | Xt ) = µ which gives X15 = 115.
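These N-step-ahead rules can be checked numerically:

```python
def ar1_forecast(x_t, phi, n):
    """N-step-ahead forecast for equation (1): E(X_{t+N} | X_t) = phi^N * X_t."""
    return phi ** n * x_t

def ma1_forecast(mu, n):
    """For equation (2) the forecast reverts to the mean mu (exactly so for
    N >= 2, since future error terms have expectation zero)."""
    return mu

print(round(ar1_forecast(120.5, 1.05, 5), 1))  # → 153.8
print(ma1_forecast(115, 5))                    # → 115
```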

Chapter 10
Econometrics, multiple regression
and analysis of variance

10.1 Aims of the chapter

To outline the process by which econometric models are formulated and tested.

To give a clear explanation of why assumptions are necessary when model building
(using multiple regression as a specific case) and what happens when they are
invalid.

To explain the output from a multiple regression statistical package.

To introduce analysis of variance (ANOVA) tables and show how they can be used
as a general approach to data analysis and model testing.

To explain how hypothesis tests (t, F and Durbin Watson in particular) can be
used to validate a model.

10.2 Learning outcomes


By the end of this chapter, and having completed the Essential reading and activities,
you should be able to:
understand the assumptions made for constructing a multiple regression model and
appreciate the difficulties when they do not hold

interpret the output from a multiple regression package

perform significance tests (involving the t distribution) on the beta coefficients of a


multiple regression

perform F tests on the overall usefulness of a multiple regression model and on


simple ANOVA results

perform Durbin Watson tests for autocorrelation.


10.3 Essential reading

Hanke, J.E. and D.W. Wichern Business forecasting. (Upper Saddle River, NJ:
Pearson Prentice Hall, 2009) Chapter 7. There’s also a small section on ANOVA in
Chapter 6.

Johnson, R.A., and D.W. Wichern Applied multivariate statistical analysis. (New
York: Pearson Prentice Hall, 2007) Chapters 6.4–6.7, 7.

10.4 Further reading

Pindyck, R.S. and D.L. Rubinfeld Econometric models and economic forecasts.
(New York: McGraw-Hill, 2007).

10.5 Introduction
Econometrics is the application of statistical techniques to the formulation, estimation
and validation of economic relationships.
An economic model seeks to represent and explain economic behaviour; thus, the
structure of the equations developed must make good economic sense. If the model is
also statistically valid, then it can be used for prediction purposes. The causal models
discussed in Chapter 9 are econometric models designed to represent the economics at
the micro-level of the company.
Most of the effort in econometric modelling goes into the validation of the model as,
thanks to the ease of use of current software, the computation of the coefficients is
straightforward. Such software can also be used to produce an analysis of variance – a
concise and widely used approach to analysing models and testing the results. The topic
of ANOVA could be introduced in other chapters of this subject guide (such is its wide
applicability), but it seems most straightforward to introduce it as optional output from
a regression package. Its possible use in other areas will then be referred to. The
application of econometric methods consists of five stages:

formulation

estimation

validation

forecasting

implementation.
These stages will be discussed in turn.


10.6 Formulation
Formulation involves three main steps:
Choice. The first choice is of the variables to be included. For example, if the objective
is to model and forecast the sales of a non-essential product, then one might choose to
include a measure of promotional effort (e.g. advertising expenditure), a measure of
buying power (e.g. consumers’ real disposable income) and any other relevant variable
(such as temperature if demand is weather-related).
Preliminary specification. The direction of causality has to be considered. This will
be described by the expected sign and magnitude of the coefficients, for example:
Sales = α + β1 (Adv. Exp.) + β2 (Con Ret Disp Inc) − β3 (Temp.) (10.1)

Here we would expect that α, β1 , β2 , β3 > 0 and we would hope that β1 > 1. This means
that we would expect £1 spent on advertising to generate more than £1 worth of sales.
Terminology:

• The dependent variable is called the endogenous variable (sales).
• The independent variables are called exogenous variables (adv. exp., etc.).
Mathematical form of the equation. A crude categorisation of possible
mathematical forms is into:

• linear: as in (10.1) above
• log-linear:
  Sales = α + β1 log(Adv. Exp.) + β2 log(Con Ret Disp Inc) − β3 log(Temp.)
• log-log:
  log(Sales) = α + β1 log(Adv. Exp.) + β2 log(Con Ret Disp Inc) − β3 log(Temp.).

The choice of form depends on how one believes a variable affects the variable being
predicted. If the effect is simply additive, then the linear model is appropriate. If the
effect is proportional, then a log-linear or log-log model may be considered.
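The log-log form corresponds to a constant-elasticity relationship: taking logs turns a multiplicative model into a linear one that OLS can estimate. The short Python sketch below illustrates this numerically; the parameter values and variable names are made up for illustration and are not taken from any example in this chapter.

```python
import math

# Hypothetical constant-elasticity relationship: Sales = A * Adv**beta.
# Taking logs linearises it: log(Sales) = log(A) + beta * log(Adv),
# so OLS on the logged variables recovers the elasticity beta.
A, beta = 50.0, 0.8          # illustrative parameters, not from the text
adv = [100, 200, 400, 800]   # hypothetical advertising expenditures

sales = [A * x ** beta for x in adv]
log_sales = [math.log(s) for s in sales]
log_adv = [math.log(x) for x in adv]

# In log space the slope between any two points equals beta exactly.
slope = (log_sales[1] - log_sales[0]) / (log_adv[1] - log_adv[0])
print(round(slope, 6))  # 0.8
```

In the original (unlogged) space the same relationship is curved, which is why the additive linear form would fit it poorly.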

10.7 Estimation
Estimation involves collecting the data and deciding upon the appropriate estimation
technique.
Data collection. The first stage is the gathering of the data. This is a non-trivial
exercise and it probably will take much time and effort to collect and check the data.
Once they have been collected, subjecting the data set to exploratory data analysis will
help confirm the hypotheses about relationships between the variables (see Chapter 11).
Selection of estimation technique. The choice lies between various types of single
equation models using a least squared error objective function or multiple simultaneous
equations. The only method that falls within the scope of this course is straightforward
least squares estimation as described in the first year quantitative methods subject; this
is called OLS (Ordinary Least Squares) by econometricians.
The output from an OLS estimation procedure will give the following information:

• coefficient estimates
• the standard errors of the coefficient estimates
• the coefficient of determination R2.

There may be more information provided, depending on the software used; the above is
the minimum amount necessary to comment on the model.
R2 can be used to test whether any of the coefficients are significantly different from
zero; that is, the null hypothesis would be that the sales variable is unrelated to any of
the exogenous variables identified. The analyst would want to be able to reject this
hypothesis; otherwise the model is useless.
The individual coefficients are checked next. The null hypothesis that a coefficient is
zero is rejected if

|Coefficient/Standard error of coefficient| > zα/2

where α is the significance level of the test and zα/2 represents a tabulated normal value
(or a Student's t value if the regression uses fewer than 30 observations).
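To make the output list above concrete, here is a minimal OLS sketch in Python with NumPy. The data are synthetic and every name and number is illustrative (not from the case study later in the chapter); it reproduces the minimum output described: coefficient estimates, their standard errors, R², and the coefficient/standard-error ratios used in the tests.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 40
x1 = rng.uniform(0, 10, n)                    # e.g. advertising expenditure
x2 = rng.uniform(0, 5, n)                     # e.g. disposable income
y = 2.0 + 1.5 * x1 + 0.8 * x2 + rng.normal(0, 1.0, n)

X = np.column_stack([np.ones(n), x1, x2])     # design matrix with an intercept
k = X.shape[1] - 1                            # number of explanatory variables

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)     # OLS coefficient estimates
resid = y - X @ beta_hat
sse = resid @ resid
mse = sse / (n - k - 1)                       # residual variance estimate
se = np.sqrt(np.diag(mse * np.linalg.inv(X.T @ X)))  # standard errors of estimates
r_squared = 1 - sse / np.sum((y - y.mean()) ** 2)    # coefficient of determination
t_ratios = beta_hat / se                      # compare with t (or z) critical values
```

A regression package performs exactly this computation internally; the point of the sketch is only to show where each quoted figure comes from.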

10.8 Validation
For a model to be acceptable it must be checked to see if various assumptions have been
significantly violated.

10.8.1 Econometric criteria

The OLS estimation procedure makes a number of assumptions about the data and the
system generating them. Departures from these assumptions have various consequences
which undermine the usefulness of the model. Consider a model of this form:

Yt = α + β1 X1t + β2 X2t + ut .

These assumptions are:

For the residuals:

1. the residuals ut are normal random variables
2. E(ut) = 0: the average residual is zero
3. E(ut²) = constant: the variance of the residuals does not change with time or any
   other variable; this is called 'homoscedasticity'
4. E(ut ut−1) = 0: the residuals are independent over time; they are not correlated
   with themselves (not auto-correlated).

For the independent variables:

5. Xt is truly exogenous
6. the correlation between X1t and X2t is less than 1; that is, they are not collinear.

Further assumptions are that:

7. the independent variables are uncorrelated with the residuals
8. the model is 'true': there is no mis-specification of variable or form, no missing
   variables and no superfluous variables.
Violation of the assumptions about the residuals (1, 2, 3) can be tested by the
Jarque–Bera test, although this test is outside the scope of this course. If a model
breaches these assumptions the most likely explanation is that the form of the equation
is poor and other forms should be investigated.

10.8.2 Auto-correlated errors – violation of assumption 4

If the residuals are not independent, then the relationship between successive residuals
can be modelled:
Yt = α + β1 X1 + et
and the residual, et , could be represented as
et = ρet−1 + ut
where ut is a Normal random variable.
The cause of auto-correlated residuals may be one or more of the following:

• a common trend or cycle in the variables
• omission of an important explanatory variable
• mis-specification of the form of the equation
• use of smoothed or adjusted data.

The effects are that the coefficient estimates are unbiased, but the variance of the
residuals is underestimated, as are the standard errors of the coefficients.
The importance of autocorrelated residuals, apart from an incorrectly specified model,
is that the effects listed above are likely to lead to an overassessment of a model’s
statistical significance and consequently poor inferences. The most common test used
for the detection of autocorrelated errors is the Durbin Watson test. This test is
appropriate provided that there is no lagged endogenous variable used (i.e. salest−1 used
to explain salest ). The statistic is shown below:
d = Σ_{t=2}^{n} (et − et−1)² / Σ_{t=1}^{n} et².


Figure 10.1: Testing for auto-correlated errors using the Durbin Watson statistic.

The values of d are tabulated in most statistical tables; typically two values are given,
dU and dL . Their use is demonstrated in Figure 10.1.
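The statistic itself is easy to compute from a series of residuals. The function below is a direct transcription of the formula (a generic sketch, not tied to any particular package); the example residual series are made up to show the two extremes.

```python
def durbin_watson(residuals):
    """d = sum_{t=2}^{n} (e_t - e_{t-1})^2 / sum_{t=1}^{n} e_t^2."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

# Strong positive autocorrelation pushes d towards 0,
# strong negative autocorrelation pushes d towards 4:
print(durbin_watson([1.0, 1.0, 1.0, 1.0]))                 # 0.0
print(durbin_watson([1, -1, 1, -1, 1, -1, 1, -1, 1, -1]))  # 3.6
```

Residuals with no autocorrelation give a value near 2, which is why the acceptance region in Figure 10.1 is centred on 2.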

10.8.3 Multi-collinearity – violation of assumption 6

The exogenous variables are assumed to be independent of each other. This implies that
they should be uncorrelated with each other. In practice, it is impossible to ensure this.
Low levels of correlation between variables are acceptable, but when the variables are
highly correlated, they are close to being collinear. Some independent variables may be
functions of others. Reliability of the individual coefficients is severely reduced by
multicollinearity, but the overall goodness of fit of the whole model is not affected.
Effects:

• Indeterminate coefficient estimates of βi (but unbiased).
• The standard errors of these estimates are too large.
• The coefficient estimates are unstable and change if the number of variables or
  number of observations is altered.
The problem can be considered as having at least two variables contributing the same
information about the dependent variable. Thus one solution is to drop one variable.
However, in addition to the duplicated information the independent variable might
contain some unduplicated information.
Remedies:

• Do nothing (in spite of the effects, the predictive ability of the model is unaffected).
• Drop 'superfluous' variables.
• Increase the sample size (this reduces the severity of the effects mentioned).
• Restricted least squares: one coefficient is fixed to a particular value and not
  estimated. Another set of data, or prior knowledge, is used to fix this coefficient.
The likely presence of multicollinearity can be detected by looking at a correlation
matrix of the independent variables.
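As an illustration, the first ten rows of the advertising, lagged advertising and disposable income columns of Table 10.1 (later in this chapter) can be screened in exactly this way; the sketch below uses NumPy's corrcoef, though any statistical package produces the same matrix.

```python
import numpy as np

# Advertising, lagged advertising and disposable income, 1908-1917 (Table 10.1).
advert = [451, 529, 543, 525, 549, 525, 578, 609, 504, 752]
advlag = [608, 451, 529, 543, 525, 549, 525, 578, 609, 504]
dispinc = [29.5, 30.2, 30.5, 31.9, 33.9, 34.8, 35.8, 40.2, 47.8, 55.2]

corr = np.corrcoef([advert, advlag, dispinc])
print(np.round(corr, 2))
# Off-diagonal entries near +1 or -1 flag pairs of near-collinear regressors.
```

The matrix is symmetric with ones on the diagonal, so only the entries above (or below) the diagonal need inspecting.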

10.8.4 Mis-specification – violation of assumption 8

This simply means that the model is incorrect in that there is a discrepancy between it
and the underlying system, ‘the real world’. That is, either a variable has been omitted
when it should be included, or a variable that has been included should be omitted. An
omitted variable leads to biased estimates of the coefficients of the variables that have
been included. A superfluous variable can be detected if the estimate of βi is not
significantly different from 0. However, if discarding the variable changes other
regression coefficients, then the discarded variable cannot be genuinely superfluous.

10.8.5 Error in variables – violation of assumption 7

This can be caused by measurement errors or transcription errors, or incorrectly defined
variables.

10.9 A case study – Lydia Pinkham's vegetable compound
This product, manufactured in Massachusetts, USA, is a patent medicine which has
been continuously on sale since 1873. The data available here are from 1907 to 1959.
The company was a pioneer in the use of intensive advertising. It was their only
marketing expense and constituted between 40 and 80 per cent of revenue. No salesmen
were employed and no cash discounts were used after 1917. No competitors ever
penetrated the market; thus, the environment is remarkably stable throughout the
period for which we have data. These data have become a standard against which many
forecasting models have been assessed.
Variables used:

• Sales revenue – the endogenous (dependent) variable.
• Advertising expenditure – the most interesting exogenous variable; the lagged
  effects of advertising can be investigated.
• A measure of disposable income – the national USA income.
• Time – simply gives a time trend. It may represent increasing affluence or
  decreasing gullibility.

Plots of the sales data alongside advertising expenditure are shown in Figure 10.2. Sales
data are plotted alongside disposable income in Figure 10.3. The full data set used is
given in Table 10.1.

189
10. Econometrics, multiple regression and analysis of variance

In line with good forecasting practice, described in Chapter 9, the first 41 observations
out of the 52 available will be used for model estimation. The choice of the variables in
this study has already been made. Our preliminary specification of the model is that:

Salest = α + β1 Adv. Exp.t + β2 Adv. Exp.t−1 + β3 Disp. Incomet

with all the coefficients positive. We would probably expect β1 > β2 , since the effects of
advertising decrease with time. We will assume a linear model initially. OLS estimation
of the model gives the results shown in Table 10.2.

Date Sales Salelag1 Advert Advlag1 DispInc

1908 921 1016 451 608 29.5
1909 934 921 529 451 30.2
1910 976 934 543 529 30.5
1911 930 976 525 543 31.9
1912 1052 930 549 525 33.9
1913 1184 1052 525 549 34.8
1914 1089 1184 578 525 35.8
1915 1087 1089 609 578 40.2
1916 1154 1087 504 609 47.8
1917 1330 1154 752 504 55.2
1918 1980 1330 613 752 62.3
1919 2223 1980 862 613 63.3
1920 2203 2223 866 862 71.5
1921 2514 2203 1016 866 60.2
1922 2726 2514 1360 1016 60.3
1923 3185 2726 1482 1360 69.7
1924 3351 3185 1608 1482 71.4
1925 3438 3351 1800 1608 73
1926 2917 3438 1941 1800 77.4
1927 2359 2917 1229 1941 77.4
1928 2240 2359 1373 1229 77.5
1929 2196 2240 1611 1373 83.1
1930 2111 2196 1508 1611 74.4
1931 1806 2111 983 1508 63.8
1932 1644 1806 1046 983 48.7
1933 1814 1644 1453 1046 45.7
1934 1770 1814 1504 1453 52
1935 1518 1770 807 1504 58.3
1936 1103 1518 339 807 66.2
1937 1266 1103 562 339 71
1938 1473 1266 745 562 65.7
1939 1423 1473 749 745 70.4
1940 1767 1423 862 749 76.1
1941 2161 1767 1034 862 93
1942 2336 2161 1054 1034 117.5
1943 2602 2336 1164 1054 133.5

1944 2518 2602 1102 1164 146.8
1945 2637 2518 1145 1102 150.4
1946 2177 2637 1012 1145 160.6
1947 1920 2177 836 1012 170.1
1948 1910 1920 941 836 189.3
1949 1984 1910 981 941 189.7
1950 1787 1984 974 981 207.7
1951 1689 1787 768 974 227.5
1952 1866 1689 920 768 238.7
1953 1896 1866 964 920 252.5
1954 1684 1896 811 964 256.9
1955 1633 1684 789 811 274.4
1956 1657 1633 802 789 292.9
1957 1569 1657 770 802 308.8
1958 1390 1569 639 770 317.9
1959 1387 1390 744 639 337.3

Table 10.1: Lydia Pinkham’s vegetable compound data.


Table 10.2: Output from multiple regression analysis of Lydia Pinkham data.


Figure 10.2: Plots for Lydia Pinkham’s sales and advertising.

The F statistic quoted is a check for the overall regression. The null hypothesis is that
all the βs are zero. In this case the F statistic is distributed as an F random variable
with 3 and 37 degrees of freedom (F values are included in most statistical tables). Here
the observed value of F is 38.9 and there is a negligibly small probability of this
occurring if the null hypothesis is true. Thus at least one β is significantly different from
0. Inspection of the values of the Student’s t test for the coefficients shows that only β2
is not significantly different from 0.
Re-estimation of the equation without lagged advertising expenditure produces only
minor changes:


In order to investigate whether the passage of time has had any useful effect, a date
variable (simply the year of the observation) can be included.
As we can see, this makes only a minor improvement.


Figure 10.3: Plots of Lydia Pinkham’s sales and disposable income.

That is:
Salest = 27608 + 1.35Advt + 7.83Disp.Inc.t − 14.32Datet .
Let us call this model (1).
The coefficient of this date variable is negative, showing a downward tendency in sales.
The overall contribution that the disposable income variable and the date variable make
to the equation is to explain the long-term variations in sales about the constant. This
can be seen by looking at Figure 10.3, where disposable income follows a smooth
upward trend and the date term will contribute a negative trend (due to its coefficient).
Advertising is contributing to the explanation of short-term variation. A common
approach is to introduce a lagged endogenous variable, Salest−1 , which provides an
‘anchor’ for future variation in sales and removes the need for the long-term trend
variables of date and disposable income. In this case, there is also an opportunity to
re-introduce lagged advertising. The details of the fitting of this equation are given
below.

That is:
Salest = 182.8 + 0.97Salest−1 + 0.55Advt − 0.66Advt−1 .
Let us call this model (2).
A possible development would be to model change in sales (Salest − Salest−1 ) in terms
of changes in advertising expenditure, but this will be left as an exercise for the
interested reader.
The Durbin Watson statistic shows significant positive auto-correlation of errors for all
except the last model (the critical value for dL is about 1.3 at five per cent significance).
In the last model, the behaviour of the DW statistic is biased by the lagged endogenous
variable, so no comment can be made.
In order to judge the degree of multicollinearity, the correlation matrix of the variables
is reproduced below.


Note that sales are correlated with all the variables, at least at a 10 per cent
significance level. There are significant correlations between advertising and lagged
advertising, advertising and lagged sales, lagged advertising and lagged sales.
Disposable income and the date variable are also quite highly correlated.
It is clear that model (2) fits the data better than model (1) as can be seen from this
summary table below:

Note that the standard error is the root mean square error, where an error is (observed −
estimated), computed over the fitted region.
The forecasting performance of the two models will be examined in the next section.
Since disposable income will not be known exactly in advance, unlike advertising
expenditure, which is a variable controlled by the company, model (1) will be more
difficult to use for practical forecasting than model (2).

10.10 Analysis of variance


[This would be an extremely lengthy topic, with lots of background theory and notation,
if explained fully. Hence you are referred to a good statistical text (e.g. Newbold,
Carlson and Thorne; Johnson and Wichern; Levine et al.) for further explanation.]
You might have noticed in Table 10.2 that the multiple regression software has given an
analysis of variance (ANOVA) table for the Lydia Pinkham analysis. This is a
particularly simple ANOVA for the multiple regression model. Like all ANOVA tables it
comes from a breakdown of the total variability (sum of squared deviations about the
mean) of the data into component parts. This enables us to do certain hypothesis tests
very easily. Thus for multiple regression the total variability, SST, in the dependent
variable can be partitioned into the component explained by the regression, SSR, and
the component due to the unexplained error, SSE.
i.e. SST = SSR + SSE
where:

• the total sum of squares, SST, is given by SST = Σ_{i=1}^{n} (Yi − Ȳ)²
• the explained sum of squares, SSR, is SSR = Σ_{i=1}^{n} (Ŷi − Ȳ)²
• the unexplained sum of squares, SSE, is SSE = Σ_{i=1}^{n} (Yi − Ŷi)².

[Note: the common notation is as follows: Yi are the observed values of the dependent
variable, Ŷi are the corresponding values predicted by the regression model and Ȳ is
the mean of the observed values.]
Again referring back to Table 10.2 we have SSR = 14,775,145.2 , SSE = 4,682,178.4 and
hence (and sometimes also shown in the ANOVA) SST = 19,457,323.6.
The ‘Mean Square’ figures in the table come from dividing the appropriate sum of
squares by the degrees of freedom (DF). Thus MSR = 14,775,145.2/3 = 4,925,048.4 and
MSE = 4,682,178.4/37 = 126,545.4.
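The arithmetic above can be checked directly. The figures below are the sums of squares quoted from Table 10.2, with k = 3 explanatory variables and n = 41 observations used for fitting:

```python
# Verifying the ANOVA arithmetic quoted from Table 10.2.
SSR = 14_775_145.2            # explained sum of squares
SSE = 4_682_178.4             # unexplained (error) sum of squares
SST = SSR + SSE               # total sum of squares
k, n = 3, 41                  # explanatory variables, observations

MSR = SSR / k                 # mean square due to the regression
MSE = SSE / (n - k - 1)       # mean square error, 37 degrees of freedom
print(round(SST, 1), round(MSR, 1), round(MSE, 1))
# 19457323.6 4925048.4 126545.4
```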
Let us now look at doing a hypothesis test using the ‘Mean Square’ data.

10.10.1 Tests of whether all multiple regression coefficients are zero
Consider the multiple regression model

Yt = α + β1 X1t + β2 X2t + · · · + βk Xkt + εt.

Once a regression model is created a useful initial hypothesis to test (to see if the model
is at all worthwhile) is H0 : β1 = β2 = · · · = βk = 0. Accepting this hypothesis would
lead to the conclusion that none of the k independent variables are useful in explaining
the dependent variable and thus we need to produce a fresh model specification with a
new set of independent variables.
The variance of the regression model, se², can be estimated using

se² = Σ_{i=1}^{n} εi² / (n − k − 1) = SSE/(n − k − 1) = MSE.

If the null hypothesis H0 is true then MSR = SSR/k is also an estimate of this variance,
with k degrees of freedom. As a result of having two possible estimates of a variance (if
H0 is true), the ratio

F = (SSR/k) / (SSE/(n − k − 1)) = MSR/MSE
has an F distribution with k degrees of freedom in the numerator and n − k − 1 degrees
of freedom in the denominator. Hence we compare the computed value of F with the
critical values of an F table at the appropriate level of significance. If the computed
value lies outside of the critical limits we will reject the null hypothesis and conclude
that at least one of the X variables is useful.
The ANOVA table output from a spreadsheet will often save you the effort of looking
up critical values from an F table by giving you a significance value (equivalent to a
p-value for testing individual variable coefficients). Thus, if the significance value of the
observed F is less than 0.05, say, we conclude that the H0 hypothesis can be rejected.


Once again referring back to Table 10.2, the observed F value is MSR/MSE =
4925048.4/126545.4 = 38.92, which is highly significant i.e. lies in an extremely small
tail (approx 0.0000 probability) of the F distribution that one would expect it to belong
to if H0 is true. Hence H0 is rejected and the model does have some worth.

10.10.2 Other uses of ANOVA tables

The above example has demonstrated how ANOVA output can be easily used to
undertake certain key hypotheses tests. The general approach of getting separate
estimates of a variance where these estimates should be the same if certain hypotheses
hold (and an F test is possible) can be extended into many other areas e.g. testing to
see if the means of several populations are all equal. You should refer to statistical texts
for a fuller description. There is no need to worry about the full range of ANOVA uses
in all multivariate models.
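For instance, a one-way (single-factor) comparison of group means uses exactly the same decomposition: a between-groups sum of squares plays the role of SSR and a within-groups sum of squares the role of SSE. A minimal sketch, with made-up data for three groups:

```python
# Hypothetical observations for three groups whose means we wish to compare.
groups = [
    [23, 25, 21, 27],
    [30, 32, 29, 33],
    [22, 24, 26, 20],
]
all_obs = [x for g in groups for x in g]
n, c = len(all_obs), len(groups)
grand_mean = sum(all_obs) / n

# Between-groups sum of squares (explained) and within-groups (error).
ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)

F = (ss_between / (c - 1)) / (ss_within / (n - c))
print(round(F, 2))  # 13.68; compare with the critical value of F(c-1, n-c)
```

A large F means the groups differ more between themselves than random within-group variation would explain, so the hypothesis of equal means is rejected.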

10.11 Forecasting using econometric models

The emphasis in econometrics tends to be on fitting the model, in sample performance,
as opposed to forecasting or ‘out of sample’ performance. The same basic rules
mentioned for validating univariate models apply. It is advisable to save some data for
measuring forecasting performance. The errors in forecasting are caused by:

• random variation
• estimation error in coefficients
• prediction errors in exogenous variables.
A detailed analysis of errors can be made using a prediction/realisation diagram where
predicted changes are plotted against actual changes. This diagram shows how well the
econometric model predicts turning points and whether it tends to over- or
under-estimate (see Figure 10.4).

The forecasts of the Lydia Pinkham data
The forecasts for one period ahead by model (1) and model (2) are shown in Figure
10.5. The root mean square errors of the models are:

• Model (1): 1118.9
• Model (2): 108.2
Model (1) forecasts poorly outside the fitted region, whereas model (2) continues to
forecast as well out of sample as it does within sample. The presence of the lagged sales
value allows model (2) to keep track of the current location of sales, whereas model (1)
has to rely on the historical relation between sales and disposable income and date,
which appears to be unstable.
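The root mean square error figures quoted above come from a calculation of the following form (a generic sketch; the data in the example call are placeholders, not the actual forecast series):

```python
import math

def rmse(actual, predicted):
    """Root mean square error over the hold-out (forecast) period."""
    sq_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(sq_errors) / len(sq_errors))

# Placeholder example: two one-step-ahead forecasts against outcomes.
print(round(rmse([100, 110], [90, 120]), 2))  # 10.0
```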
Figure 10.4: A prediction realisation diagram.

A prediction-realisation diagram for the same forecasts is shown in Figure 10.6. This
shows that the predictions of model (1) are predominantly over-estimates and turning
point errors. Model (2) tends to underestimate the magnitude of change but makes few
turning point errors.

10.12 Summary
The emphasis in this chapter has been on the rigorous formulation of an econometric
model, the validation of the model and the forecasting performance of the model. The
details of estimation methods are outside the scope of this course.

What you need to know

• You should be familiar with the output from regression packages used to estimate
  econometric models and be able to write down the relevant model in the form of an
  equation.
• You should be able to critically examine the validation process and analyse
  forecasting performance.
• The stages of econometric modelling.
• How to formulate an equation in different functional forms.
• The main assumptions of OLS estimation.
• How to interpret models and estimation descriptions.
• How to analyse forecast performance.

What you do not need to know

• How to perform any estimation.


Figure 10.5: A comparison of the fitted models.


Figure 10.6: A comparison of predicted and realised changes.


10.13 A reminder of your learning outcomes

By the end of this chapter, and having completed the Essential reading and activities,
you should be able to:

• understand the assumptions made for constructing a multiple regression model and
  appreciate the difficulties when they do not hold
• interpret the output from a multiple regression package
• perform significance tests (involving the t distribution) on the beta coefficients of a
  multiple regression
• perform F tests on the overall usefulness of a multiple regression model and on
  simple ANOVA results
• perform Durbin Watson tests for autocorrelation.

10.14 Sample examination questions
1. (a) In order to determine the staffing requirements for an accident and emergency
facility at a hospital, the number of casualties needing attention needs to be
forecast. Give four examples of explanatory variables that might be usefully
incorporated into a multiple regression model for forecasting casualty numbers.
(8 marks)
(b) An analyst builds the following regression model for a company offering loans
over the telephone:
Y = α + β1 X1 + β2 X2 + β3 X3 .
Y is the number of telephone calls received on a particular day, X1 is the
number of television advertisements appearing that day, X2 is the number of
press advertisements appearing that day, X3 is a variable that is 1 during
weekdays and 0 at weekends. The estimated coefficients appear below.

R2 is 0.78, the variance of the error term is 112.2.

i. Explain the implications of the equation.
(6 marks)
ii. Should the loan company employ more staff at the weekend than during
the week?
(2 marks)


iii. If the cost of a television advertisement is W times the cost of a
newspaper advertisement, what value of W would make the advertisers
indifferent (in terms of calls generated) to the choice of medium?
(4 marks)

2. (Please note this is only part of a full examination question.)

A simple linear regression model applied to the forecasting of inwards air
passengers Xt as a function of time (t quarters) produces the following simple
linear regression equation based upon 16 observations:

Xt = 17612.5 + 107.4t + et

The observed Durbin Watson statistic is 2.08.

Perform a Durbin Watson test of auto-correlation on this model and comment
upon the result.
(6 marks)

3. (Please note this is only part of a full examination question.)

Consider an experiment with one factor of interest containing four groups (or
levels) and eight observations in each group. You have the following incomplete
ANOVA summary table:

(a) Fill in the missing results (?) in the above table.

(5 marks)
(b) Test the null hypothesis that all four groups have equal population means.
(5 marks)

10.15 Guidance on answering the Sample examination questions
1. (a) Some possibilities include:
• Temperature: perhaps divided into two i.e. days too hot for heat
exhaustion, too cold for hypothermia, slipping on ice, etc.
• Time of the year: school holidays – more children having accidents or less
activity due to people being on holiday.
• Special occasion dummy variable: football matches, etc.


• Long-term illnesses and non-urgent referrals from general practitioners: it
may be possible to adjust staffing requirements because of the backlog of
treatment, health checks, etc.
(b) i. With no advertising the calls would average 216 per week, each TV
advertisement generates on average 21.7 calls each, newspaper
advertisements generate 10.6 calls each. It appears that on average they
get 68 calls more on weekdays than at weekends.
22% (1 − 0.78) of the variance of calls is unexplained by the regression
and additional variables might be suggested.
Confidence intervals for predicted calls will be at least plus or minus
2√112.2 ≈ 21.2.
Do t tests to see if each variable is significant.
ii. It makes sense to employ fewer staff at weekends.
iii. The cost per newspaper advertisement is P say, the cost per call
generated is P/10.6. The cost per TV advertisement is W P , the cost per
call generated is then W P/21.7.
Thus for indifference between media we should have P/10.6 = W P/21.7,
i.e. W = 2.05.
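The indifference value can be checked with one line of arithmetic, using the calls-per-advertisement figures given in the answer:

```python
# Indifference between media: cost per call must be equal, so
# P / 10.6 = W * P / 21.7  =>  W = 21.7 / 10.6 (P cancels).
calls_per_tv, calls_per_press = 21.7, 10.6
W = calls_per_tv / calls_per_press
print(round(W, 2))  # 2.05
```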

2. For n = 16 observations the tabulated five per cent level critical values of the
Durbin Watson statistic are given as dL = 1.10 and dU = 1.37.
The formal hypotheses are H0 : No autocorrelation against H1 : autocorrelation.
The limits are then evaluated as dL = 1.10, dU = 1.37, 4 − dU = 2.63 and
4 − dL = 2.90. Since the observed value of d is 2.08 it lies within the acceptance
region and hence we conclude that there is no autocorrelation in the equation.

3. (a) The parameters are n = total number of observations = 4 × 8 = 32; c =
number of groups (levels) = 4. Hence we have the following ANOVA table:

(b) H0 : All four population means are the same i.e. µ1 = µ2 = µ3 = µ4 .

The decision rule at the five per cent level is to reject H0 if F > 2.95 (you
sometimes have to interpolate between the degrees of freedom in statistical
tables for critical F values). Since the observed F exceeds this value (i.e. 4 > 2.95),
we reject H0 and conclude that the means are not all the same.
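The degrees of freedom used in this answer follow directly from n and c; a quick check:

```python
# One-factor ANOVA degrees of freedom for question 3:
c, per_group = 4, 8          # groups and observations per group
n = c * per_group            # 32 observations in total
df_between = c - 1           # between-groups degrees of freedom
df_within = n - c            # within-groups degrees of freedom
df_total = n - 1             # total degrees of freedom
print(df_between, df_within, df_total)  # 3 28 31
```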

Chapter 11
Exploratory data analysis

11.1 Aims of the chapter

• To explain the main methods (pictorial as well as numerical) by which large
  amounts of data can be explored in order to establish which data to use.
• To show how cluster analysis can be used as a means of data reduction as well as
  for grouping objects according to some criteria.

11.2 Learning outcomes

By the end of this chapter, and having completed the Essential reading and activities,
you should be able to:

• construct and interpret 'box and whisker' diagrams
• produce and interpret bivariate scatter diagrams
• interpret multiple scatter plots
• perform non-hierarchical (K-means partitioning) clustering
• perform hierarchical clustering using single, complete and average linkage methods
• work with 'distance' or 'similarity' data
• construct dendrograms for hierarchical clustering.

11.3 Essential reading

• Johnson, R.A. and D.W. Wichern Applied multivariate statistical analysis. (Upper
  Saddle River, NJ: Pearson Prentice Hall, 2007) Chapter 12.

11.4 Further reading

• Aldenderfer, M.S. and R.K. Blashfield Cluster analysis. (Quantitative Applications
  in the Social Sciences) (London: Sage, 1984).
• Chatfield, C. and A. Collins Introduction to multivariate analysis. (London:
  Chapman and Hall, 2000).
• Everitt, B.S., S. Landau, M. Leese and D. Stahl Cluster analysis. (London: Hodder
  Arnold, 2011).

11.5 Introduction

When carrying out large surveys for market research, social research or scientific
research, it is always wise to carry out an initial simple analysis of the data to:

i. detect outlying observations
ii. check for typographical errors
iii. compare the ranges of measurement of the variables
iv. compare the dispersion of the variables
v. check for relationships between variables
vi. detect groupings of variables
vii. detect groupings of observations.

These analyses can be carried out without reference to a set of hypotheses since they
are exploratory and designed to find the messages the data may convey.
Many of these tasks, (i.) to (iv.), can be carried out by graphical means which will be
described in this chapter.
Checking for relationships between variables, (v.), can be carried out at different levels
of complexity. Graphical methods offer one approach that will be discussed here.
Regression methods, which will have been covered in an earlier 100 course, and factor
analysis, which is covered in Chapter 8, both offer different types of quantitative
information about relationships between variables.
Detection of groupings in variables or in observations, (vi.) and (vii.), can be explored
using cluster analysis methods. These will be discussed later in this chapter.

11.6 A typical data set


Throughout the chapter, the data to be analysed will be assumed to be a set of
multivariate observations. They can be presented in a tabular form.


11.7 Graphical methods

One of the great benefits modern software offers is ease of graphical presentation.
Spreadsheets offer a wide range of different 2 and 3D graphs and dedicated statistical
packages, such as SPSS and SAS, offer good quality specialist statistical presentations.
A small selection of these will be discussed.
Some data describing the technical characteristics and price of a set of cars for sale in
the UK will be used to demonstrate these methods.
The data are given below in tabular form.
The variables are in the following units: price in £, maximum speed in miles per hour,
seconds to reach 60 mph (96.6 kph), tour means miles per gallon (0.425 kilometres per
litre) achieved while touring, size-cc is the capacity of the engine, bhp is the maximum
power output, the length is in feet (0.305m) and the weight is in hundredweight (0.05 of
a ton, 50.8 kg).



11.7.1 Box plots

A box plot offers the same information as a histogram, but in a more condensed form.
It is designed to convey information about a single variable. It indicates both central
tendency and dispersion and gives information about observations a long way from the
mean or median of the data. The format of a typical box plot is given in Figure 11.1.
Box plots can be used to compare the distribution of variable values when the data are
divided into different categories. In Figure 11.2, the distribution of miles per gallon,
denoted as Tour, is shown by two box plots: one for more expensive cars and one for
‘cheap’ cars (less than £12,000).
The left hand box plot in Figure 11.2 for the more expensive cars shows a lower median
miles per gallon (about 34 mpg); cars 14 and 18 are extremely economical in
comparison with the other cars in this category. The cars under £12,000, shown on the
right, are more compactly distributed around a higher median of about 40 mpg. The
Citroen AX, car no. 2, is an extreme value (see Figure 11.1) in its class. The Rover
Montego Diesel, car no. 18, although even more economical, is classified as an outlier
rather than an extreme value because the more expensive cars have a greater
interquartile range (as represented by the box length).


Figure 11.1: A typical box plot.


Figure 11.2: Box plots for miles per gallon for ‘cheap’ and ‘expensive’ cars.

11.7.2 Scatter plots

Although a simple scatter plot is well known, new software makes it easy to show a
large data set in terms of a matrix of scatter plots.
These plots make any outlying observation very obvious. If one is found, it can then be
checked for accuracy. If it is a genuine observation, the decision whether to keep it
within the data set must be made. Observations a long way from the mean can exert
undue influence in techniques like regression and distort results.
The nature of relations between variables can be seen by the trends in the individual
plots. If the data lie in a long narrow ellipse, then the variables will have a strong
positive relationship (if the major axis of the ellipse has a positive gradient) or a strong
negative relationship (if the major axis of the ellipse has a negative gradient).
The car data are shown graphed in a multiple scatter plot in Figure 11.3. Inspection of
the individual plots shows the following example relationships:
A strong positive relationship between: bhp versus maximum speed.
A strong negative relationship between: seconds to 60 mph versus maximum speed.
No linear relationship between: price and touring miles per gallon.


Figure 11.3: A multiple scatter plot for the car data.

[Key: BHP = maximum engine power output; Length = length of car in feet (0.305
metres), Max-sp = top speed in miles per hour, price = price in £, sec60 = time to
reach 60 mph in seconds, size-cc = engine capacity in cc, tour = fuel consumption in
miles per gallon (0.425 kilometres per litre) when touring, weight-k = weight in
hundredweights (0.05 of a ton, 50.8 kg).]
The correlation matrix of the car data is shown below.


A correlation is a measure of the strength of a linear relationship; however, it is
vulnerable to being distorted by outlying values. Comparison with the scatter plots
allows the analyst to check that the correlations genuinely reflect the underlying
relationship.


Figure 11.4: 3D plot of the car data.

11.7.3 Three-dimensional scatter plots


These graphs show the data plotted on three axes. If the three variables are correlated
then the observations will lie within a 3D ellipsoid. Unfortunately, we are constrained to
look at a 2D image of the 3D graph, which means we lose the advantage of perspective.
To overcome this, some packages allow us to rotate the 3D plots around each axis. This
facility allows us to choose the views that offer most information. An example of this
rotation is shown in Figures 11.4 and 11.5. A 3D plot of the car data is shown for
different axis positions. It is particularly clear in Figure 11.5 that the data seem to fall
into two clusters. The most apparent difference seems to be caused by price.

11.7.4 Other plots within software packages


Certain software, for example spreadsheet packages, have a wide variety of alternative
standard graphs. These include such things as compound bar charts, bubble graphs, etc.
These can be useful when dealing with more dimensions than one can handle in the
more normal graphs. You will not be assessed upon such graphs but, nonetheless, might
find it generally beneficial (indeed perhaps even entertaining!) to investigate these
graphical output methods.

11.8 Cluster analysis


In many circumstances, large quantities of data are available for analysis but the raw
data are too overwhelming for an analyst to tackle the problems to be addressed. As we
have seen, there are several techniques available to give us some insight into the
message that the data contain. A common objective is to see if the data exhibit any
groupings or clusters.


Figure 11.5: A second 3D plot of the car data.

In cluster analysis the underlying objective is to divide the data up into a number of
clusters. Each observation or case will be allocated to a cluster, each cluster will contain
similar cases and different clusters will contain cases dissimilar to those in other clusters.
Cluster analysis can be thought of as a form of exploratory data analysis that gives
initial insights into the structure of the data. In addition, it is often an excellent means
of communicating information about a data set in an easily digestible way.

11.8.1 Examples of clustering in marketing


A product such as a car or a television programme will appeal to many different groups
in society. These groups will have different requirements and these will result in
products being targeted at particular niches of the market.
One of the examples we will pursue is the detection of clusters in a set of data sampled
from cars available in the UK market. The object of an exercise of this type is strategic
positioning in the market; by determining clusters, a manufacturer can judge the level
of competition in each cluster.
Clustering television programmes (using market research data describing viewers’
perceptions of the programmes) is useful from the point of view of scheduling, both
programmes and advertising, to know which programmes appeal to which groups of
people. Using cluster analysis on market research data, this sort of information can be
gained.
11.8.2 Types of clustering procedure
There are two main methods of clustering:

Hierarchical clustering – a tree diagram is produced showing the order in which
clusters are formed, going from one individual per cluster to all the individuals in
one cluster.

Partitioning – the set of individuals is divided into a stated number of clusters.


11.8.3 Hierarchical clustering


Before we start forming clusters, it is necessary to decide how we measure the distance
between individuals.
The most obvious means is Euclidean distance – the straight line distance between
two individuals whose positions can be expressed on an interval scale. If each individual
is described by measurements on n different variables then the distance between two
individuals, whose measurements are (a1 , a2 , . . . , an ) and (b1 , b2 , . . . , bn ) is:
\[
\sqrt{\sum_{i=1}^{n} (a_i - b_i)^2}.
\]

If there are only two variables, say latitude and longitude, describing the location of two
depots, then the distance between them is simply:
\[
\sqrt{(a_1 - b_1)^2 + (a_2 - b_2)^2}.
\]
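This calculation is easily checked numerically. The short Python sketch below is purely illustrative (and not examinable); the coordinates used are invented:

```python
import math

def euclidean(a, b):
    """Straight-line (Euclidean) distance between two points
    given as equal-length coordinate sequences."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

# Two hypothetical depots located by planar coordinates:
print(euclidean((0, 0), (3, 4)))  # → 5.0
```

The same function handles any number of variables, since the sum runs over all coordinate pairs.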

11.8.4 Distances between binary observations


A binary observation is 0 or 1; an individual item either possesses a quality or it does
not. As an example, take a set of secondary schools that are described by a set of six
binary variables.


Euclidean distances are not appropriate for binary observations; however, the
relationship between individuals can be summarised in a contingency table.


The measure of similarity (or distance) between two individuals is determined by the
number of qualities they have in common, the number neither has and the number only
one has. There are many possible formulas for measuring similarity; one of the most
obvious is:
\[
\frac{a+d}{a+b+c+d}
\]
where \(a\) is the number of qualities both individuals possess, \(d\) the number neither
possesses, and \(b\) and \(c\) the numbers possessed by one individual only.
Using this formula, the measure of similarity between School (2) and School (3) is 2/3.
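Since the original school table is not reproduced here, the calculation can be illustrated with two hypothetical binary vectors, chosen so that they agree on 4 of 6 variables and so reproduce the 2/3 similarity quoted above:

```python
def matching_similarity(x, y):
    """Simple matching coefficient (a + d)/(a + b + c + d): the proportion
    of binary variables on which the two individuals agree."""
    agreements = sum(1 for xi, yi in zip(x, y) if xi == yi)  # a + d
    return agreements / len(x)                               # a + b + c + d

# Hypothetical six binary variables for two schools (they agree on 4 of 6):
school2 = [1, 0, 1, 1, 0, 0]
school3 = [1, 0, 0, 1, 0, 1]
print(matching_similarity(school2, school3))  # ≈ 0.667, i.e. 2/3
```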
For all the schools these measures of similarity can be shown on a similarity matrix:

We can perform clustering using a measure of distance or a measure of similarity.


A measure of similarity that we are already familiar with is the correlation matrix; if we
wished to cluster the variables describing cars, the correlation matrix gives a ready-made
similarity matrix. (Note: we may want to take the positive value of correlations as a
strong negative correlation is as important as a strong positive correlation.)

11.8.5 Hierarchical clustering methods


The objective here is to form a tree diagram – a dendrogram – which shows how
clusters develop.
The basic algorithm starts with N clusters corresponding to each individual, then
groups them in a series of steps until they gather into a single cluster, the ‘root’ of the
tree.

1. Start with N clusters, each of 1 individual, and an N × N distance or similarity
matrix.
2. Search the distance or similarity matrix for the nearest or most similar pair of
clusters U and V (say).
3. Merge these clusters U and V to form a new cluster U V . The distance / similarity
matrix has to be updated. Rows and columns relating to U and to V are deleted. A
row and column for cluster U V are added.
4. Steps 2 and 3 are repeated (N − 1) times. At the end of this process all the
individuals (or variables) will be in one cluster.
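The four steps above can be sketched in pure Python for the single linkage case. This is a minimal illustration with invented one-dimensional data; statistical packages use far more efficient algorithms to do the same job:

```python
def single_linkage(dist):
    """Agglomerative clustering with single linkage on a symmetric distance
    matrix (list of lists). Returns the merge history as tuples
    (cluster_U, cluster_V, distance), following steps 1-4 above."""
    clusters = [frozenset([i]) for i in range(len(dist))]  # step 1
    merges = []
    while len(clusters) > 1:
        # Step 2: find the closest pair, using nearest-neighbour distance.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(dist[p][q] for p in clusters[i] for q in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        # Step 3: merge U and V into a new cluster UV.
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges  # step 4: after N - 1 merges, one cluster remains

# Four hypothetical points on a line: two tight pairs, far apart.
pts = [0, 1, 5, 6]
D = [[abs(p - q) for q in pts] for p in pts]
for u, v, d in single_linkage(D):
    print(sorted(u), sorted(v), d)  # the two pairs merge first, then join at 4
```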


Figure 11.6: Distances between clusters.

Step 3 is rather less specific than it at first appears. Once a cluster has more than
one element in it, how is distance/similarity measured?
There are two very simple ways of measuring the distance between two clusters:

1. Using nearest neighbours to define the distance is called single linkage. The
shortest distance between members of each cluster gives the distance between
clusters. (This is shown in Figure 11.6.)

2. Using furthest neighbours is called complete linkage. The distance between two
clusters is the maximum distance between any pair of members, one from each
cluster (see Figure 11.6).
Another obvious method of measuring distance/similarity between clusters is to take
the average distance between pairs of observations, one from each cluster; this is called
average linkage. (This is the arithmetic mean of the distances shown in Figure 11.6.)
There are several other means of measuring cluster difference. The choice of method
often does lead to different dendrograms.
The single linkage method is poor at distinguishing between poorly separated clusters.
However, it can delineate non-ellipsoidal clusters, such as U shapes or serpentine shapes.

Example 11.1 Perform a cluster analysis of the car data, noting that the different
scales of measurement of the variables mean that a common scale has to be used. In
this case, the values for each variable have been standardised to z values, to give a
mean of 0 and a standard deviation of 1 for each variable. This problem is a little
cumbersome to go through in detail, so we will look at the dendrograms given as
output by a statistical package.
A dendrogram using single linkage is produced first.


The second dendrogram is generated using average distance.


Comments: In order to decide on the appropriate grouping for the cars, the output
from the use of both linkages has to be examined. The final choice is subjective, but
one is looking for homogeneity within groups and recognisable differences between
groups. One can get an impression from the dendrograms, but drawing up a table
with the items labelled is also useful. The membership of the five clusters from single
and average linkage algorithms is shown in the following table. (The dendrograms


Table 11.1: Cluster membership for five clusters.

have had extra lines added to them to make the grouping into five clusters more
apparent.)
In each case two large clusters exist: cluster 2 is identical for both cases, consisting
of larger, faster, more expensive cars which could be classified as medium saloons.
The two cluster 1s have many cheaper and smaller cars in common, but the results
disagree on three cars: the Nissan is separate according to the single linkage
algorithm, and two Rovers are separate according to the average linkage. Both
algorithms agree that the Land Rover and the diesel-engined Rover 825 are
sufficiently different to be put in their own clusters.

Example 11.2 Perform a cluster analysis on the variables used in the car data. In
this example, we can use the correlation matrix of the variables as the measure of
similarity. However, as can be seen from earlier in the chapter, there are some
negative correlations. But since it is the magnitude of the correlation rather than its


sign that is important, the positive (i.e. absolute) values are taken in each case.
The similarity matrix of the variables for the car data is shown below.

The way in which the dendrogram is formed is summarised in an agglomeration
schedule.
Agglomeration schedule using single linkage

The way in which the schedule is formed will be shown in detail. Searching through
the similarity matrix, for step 1, shows that the most similar variables are size and
weight (0.928); these are formed into a cluster.
The adjusted similarity matrix becomes:



Note that, as single linkage is being used, the measure of similarity between the
other variables and the new cluster is the maximum similarity (correlation) between
the variables involved.
The next most similar pair of clusters is price/bhp.
The adjusted similarity matrix becomes:

The next most similar pair is seconds to 60 and maximum speed.


The adjusted similarity matrix becomes:

The next most similar are the maximum speed/acceleration cluster with the
price/bhp cluster. The adjusted matrix is:

We now cluster size/weight with price/bhp/maximum speed/acceleration. The
matrix becomes:

The penultimate clustering occurs between length and the
size/weight/price/bhp/maximum speed/acceleration cluster. The matrix becomes:


This concludes the clustering process, where all the items are now in one cluster.
The dendrogram below represents the clusters formed.
Note: The dendrogram which follows the above clustering analysis (and those above)
have a horizontal scale which has been produced by computer software. When
producing such diagrams by hand, e.g. in the examination, you should have the scale
matching the values where clustering occurs. For example, size and weight form a
cluster at 0.928 on the horizontal scale, price and bhp then form a cluster at 0.891
on the horizontal scale, etc.
Dendrogram using single linkage

11.8.6 Partitioning or non-hierarchical clustering

When there are many cases to be clustered, hierarchical clustering becomes unwieldy,
the dendrograms are too big to read easily and little insight is gained. A different
approach is more appropriate in this situation.
The K means method is designed to group the cases into a collection of K clusters,
where K is predetermined for each analysis. The analyst makes an informed guess
about the appropriate number of clusters.
Each case is identified by its coordinates (a multivariate observation) and, using the
following algorithm, cases are allocated to the K clusters.
The membership of the clusters is examined and judged on intuitive grounds. If the
clustering is too coarse, K is increased; if the clustering is too fine, K is reduced. The
decision is mainly subjective. If the membership of a cluster suggests an intuitively
satisfactory name which distinguishes the cluster from other clusters, then this is an
encouraging sign that the appropriate number of clusters has been chosen.


The algorithm is given below:

1. Either partition the items into K initial clusters, or specify K initial centroids
(seed points).

2. Go through the list of items, assigning each item to the cluster whose centroid
(mean) is closest. Re-calculate the centroid for the cluster receiving the item and
for the cluster losing the item.

3. Repeat step 2 until no further reassignments take place.
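The algorithm above can be sketched in a few lines of Python. Note that this illustrative version recomputes centroids after a full pass through the items (a common ‘batch’ variant) rather than after every single reassignment as in step 2, and the data are invented:

```python
def k_means(points, centroids, max_iter=100):
    """K means: assign each point to the nearest centroid, recompute
    centroids as cluster means, repeat until no reassignments occur.
    `centroids` supplies the K initial seed points (step 1)."""
    assign = [None] * len(points)
    for _ in range(max_iter):
        changed = False
        for idx, p in enumerate(points):
            # Step 2: nearest centroid by squared Euclidean distance.
            nearest = min(range(len(centroids)),
                          key=lambda k: sum((pi - ci) ** 2
                                            for pi, ci in zip(p, centroids[k])))
            if assign[idx] != nearest:
                assign[idx] = nearest
                changed = True
        # Recompute each centroid as the mean of its current members.
        for k in range(len(centroids)):
            members = [p for idx, p in enumerate(points) if assign[idx] == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
        if not changed:  # step 3: stop when assignments settle
            break
    return assign, centroids

# Two clearly separated hypothetical groups in the plane, K = 2:
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centres = k_means(pts, [[0.0, 0.0], [10.0, 10.0]])
print(labels)  # → [0, 0, 0, 1, 1, 1]
```

Choosing different seed points can lead to different final clusters, which is one reason the analyst typically tries several values of K and several starts.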

Example 11.3 Use K-means cluster analysis to divide the car data into four
clusters. The analysis produces the following output. The centres of each cluster are
defined by the mean value of each variable for the items in that cluster as shown
below.
Final cluster centres

The membership of each of these clusters is described in the table below; in addition
to the membership of the cluster, the distance of each car from its cluster centre is
given.



Figure 11.7: A cluster diagram for variables in car price data.

It is often useful to show cluster membership graphically. In order to do this, two
variables need to be chosen to provide axes that are important in describing the
observations. In Figure 11.7, the axes of price and bhp have been chosen. The
clusters are indicated by the different symbols.

11.9 Summary
Exploratory data analysis is a means of becoming familiar with the information
contained in a large data set. Graphical methods allow the analyst to check the data for
errors and to gain initial insights into interrelationships between variables.
Cluster analysis is a useful tool for finding homogeneous groups in either observations
or variables. As such, it is often a useful means of gaining insight into the data before
carrying out more sophisticated analyses such as factor analysis. Hierarchical clustering


using a correlation matrix will show how variables cluster and give some indication how
many factors will be needed.
There are several decisions that need to be made during a cluster analysis:

definition of the objective, i.e. to cluster cases or variables

selection of variables

standardisation of variables – this may be necessary (in hierarchical clustering) if
the variables are measured in different units. In the cars example, with variables
like engine size in cc and miles per gallon, the variables need to be standardised (by
dividing by standard deviation). This ensures that the variable with largest
magnitude does not dominate the clustering.
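For example, z-standardisation can be sketched as follows (the engine sizes used are hypothetical):

```python
import statistics

def standardise(values):
    """z-standardise a variable: subtract the mean and divide by the
    (sample) standard deviation, giving mean 0 and standard deviation 1."""
    m = statistics.mean(values)
    s = statistics.stdev(values)
    return [(v - m) / s for v in values]

engine_cc = [998, 1400, 1600, 2000, 2500]  # hypothetical engine sizes in cc
z = standardise(engine_cc)
print(statistics.mean(z), statistics.stdev(z))  # mean ≈ 0, s.d. ≈ 1
```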

What you need to know

How to perform simple graphics, box and scatter plots.

How to perform cluster analysis using hierarchical techniques.

What you do not need to know

How to produce 3D graphics.

The computational details of K means clustering.

11.10 A reminder of your learning outcomes


By the end of this chapter, and having completed the Essential reading and activities,
you should be able to:

construct and interpret ‘box and whisker’ diagrams

produce and interpret bivariate scatter diagrams


interpret multiple scatter plots

perform non-hierarchical (K-means partitioning) clustering

perform hierarchical clustering using single, complete and average linkage methods

work with ‘distance’ or ‘similarity’ data

construct dendrograms for hierarchical clustering.


11.11 Sample examination questions


(Please note that several of these questions are only part of a full
examination question.)

1. The similarities between interest rates across the world are summarised by the
following correlation matrix.

Draw a sketch dendrogram showing the hierarchical clustering of the countries
concerned. Use single linkage (nearest neighbour) as a measure of distance between
clusters.
(10 marks)

2. One of the independent variables to be used in a multiple regression which
attempts to explain the daily running cost of a maternity hospital is OCCUP, the
number of beds occupied. During the month of March 1999 OCCUP takes the
following values:
77, 64, 77, 81, 63, 79, 70, 72, 73, 81, 96, 73, 56, 68, 70, 64, 64, 73, 79, 61, 70, 68, 70,
72, 72, 66, 73, 69, 69, 68, 64.

(a) Draw a ‘box and whiskers’ diagram (‘box plot’) to illustrate these data.
(8 marks)
(b) Identify any outlier(s) and suggest what you should do about it/ them.
(3 marks)

3. (a) Explain why partitioning is sometimes preferred to hierarchical clustering.


(2 marks)
(b) The following table shows the cluster centres (means values) from a K-mean
cluster analysis when used to divide products into four clusters using variables
A through to E:


One of the products clustered has the following values for variables A to E:

Which cluster does this product belong to and what is its ‘distance from its
cluster centre’?

(7 marks)

4. (a) Compare and contrast the relative merits of K means cluster analysis and
hierarchical cluster analysis.

(4 marks)

(b) ‘Complete linkage’ is equivalent to which of the following criteria?

• furthest neighbour

• nearest neighbour

• centroid clustering

• median clustering

• maximum clustering.
(2 marks)

(c) The results of an employee satisfaction questionnaire have been collected. It is
felt that some of the questions generate very similar responses. The correlation
matrix of the answers to a section of questions is given below.

Perform a hierarchical cluster analysis on these data, use single linkage to draw
a dendrogram.

(12 marks)


(d) If you were asked to remove two questions, which two would lead to least loss
of information?
(2 marks)

5. (a) In multivariate data analysis what is a ‘multiple scatter plot’ and what are its
uses?
(5 marks)
(b) What is meant by the term ‘outlier’ and why are ‘outliers’ important?
(3 marks)
(c) The following table shows bivariate observations on 21 subjects:



You are asked to:


i. Draw a ‘box and whisker’ diagram (‘box plot’) for each variable to
identify any ‘outliers’ or ‘extremes’.
(8 marks)
ii. Produce a scatter diagram of X against Y and clearly mark the
observation(s) that you have identified as possible ‘outliers’ or ‘extremes’.
Do any other subjects seem to produce ‘outlying’ results?
(4 marks)

11.12 Guidance on answering the Sample examination questions
1. (Part outline answer)
The following outline answer is a summary of the various stages that you should
show. If we let the countries be numbered 1 (Belgium) to 10 (US) we find that
clusters occur in the following order and with the following measure (correlation
coefficient):

2. (a) Putting observations in order we get:



and we can determine that the median is 70, lower quartile is 66, upper
quartile is 73, interquartile range = 7. Hence upper outliers are > 83.5 and
lower outliers are < 55.5.
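These values can be checked numerically. The sketch below uses the (n + 1)/4 position convention for quartiles adopted in this guide, applied to the OCCUP data from the question:

```python
def quartile(sorted_vals, q):
    """q-th quartile (q = 1, 2, 3) using the (n + 1)/4 position convention,
    interpolating between ordered observations where the position is fractional."""
    pos = (len(sorted_vals) + 1) * q / 4
    lo = int(pos) - 1                 # 1-based position -> 0-based index
    frac = pos - int(pos)
    if frac == 0:
        return sorted_vals[lo]
    return sorted_vals[lo] + frac * (sorted_vals[lo + 1] - sorted_vals[lo])

occup = sorted([77, 64, 77, 81, 63, 79, 70, 72, 73, 81, 96, 73, 56, 68, 70,
                64, 64, 73, 79, 61, 70, 68, 70, 72, 72, 66, 73, 69, 69, 68, 64])
lq, med, uq = quartile(occup, 1), quartile(occup, 2), quartile(occup, 3)
iqr = uq - lq
print(lq, med, uq)                     # → 66 70 73
print(lq - 1.5 * iqr, uq + 1.5 * iqr)  # outlier fences → 55.5 83.5
```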


(b) Hence 96 is an outlier (actually an extreme). We could ignore it or investigate
if it should be 69, say. See if there were special circumstances e.g. neighbouring
hospital closed, use of beds for other emergencies, etc.
3. (a) When there are many cases to be clustered, hierarchical clustering techniques
can become unwieldy, the dendrograms are too big to read easily and little
insight is gained.
(b) Assuming Euclidean distance is the distance measure used: Distance from
cluster centre 1 is

[(12 − 6)2 + (15 − 15)2 + (80 − 30)2 + (0.1 − 0.6)2 + (64 − 50)2 ]1/2 = 52.27.

Similarly, one can determine the distance from cluster centre 2 to be 35.57,
from cluster centre 3 to be 30.23 and from cluster centre 4 to be 23.88.
Hence the cluster the product belongs to is cluster 4 with ‘distance’ 23.88.
Alternatively, if one chose to use absolute values as distance measure:
Distance from cluster centre 1 is

|12 − 6| + |15 − 15| + |80 − 30| + |0.1 − 0.6| + |64 − 50| = 70.5.

Similarly, the distance from cluster centres 2, 3 and 4 becomes 51.0, 35.8 and
42.4, respectively.
Hence the product is placed in cluster 3 with ‘distance’ 35.8.
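The two distance calculations above are easy to verify numerically. Only the coordinates of cluster centre 1 appear in the working, so the sketch below checks just that centre:

```python
def euclid(p, c):
    """Euclidean distance between a point and a cluster centre."""
    return sum((pi - ci) ** 2 for pi, ci in zip(p, c)) ** 0.5

def manhattan(p, c):
    """Absolute-value (city block) distance."""
    return sum(abs(pi - ci) for pi, ci in zip(p, c))

product = [12, 15, 80, 0.1, 64]
centre1 = [6, 15, 30, 0.6, 50]   # cluster centre 1, as in the working above
print(round(euclid(product, centre1), 2))     # → 52.27
print(round(manhattan(product, centre1), 2))  # → 70.5
```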
4. (a) K means can deal with many cases but needs a given value for K and thus
involves several attempts. Hierarchical methods give a dendrogram which is
informative but is only useful for relatively few observations. It can be used for
measuring the relationship between variables.
(b) Complete linkage is equivalent to ‘Furthest neighbour’.
(c) We start by clustering Q6 and Q7 (since they have the highest correlation)
and the amended tableau will become:

Note that there is no need for more than a triangular ‘matrix’ and there is no
need for the diagonal of zeros. Note also that the values involving Q6/Q7 are


the highest when given a choice e.g. the value for the cell Q5, Q6/Q7 is the
highest of 0.717 and 0.517 i.e. single linkage takes an optimistic view.
Continuing in this manner we will form the clusters as follows: Q5, Q6/Q7,
Q8, Q9, Q10 → Q5/Q6/Q7, Q8, Q9, Q10 → Q5/Q6/Q7, Q8, Q9/Q10 →
Q5/Q6/Q7/Q8, Q9/Q10 → Q5/Q6/Q7/Q8/Q9/Q10.
(d) We would remove the most correlated questions i.e. two from Questions 5, 6
and 7 (the first three clustered).

5. (a) A multiple scatter diagram is a matrix of scatter diagrams for a collection of
variables where every pair of variables is ‘scattered’ against each other.
It is used to identify correlations between variables, redundant variables,
outliers, nature of inter-relationships, etc.
(b) ‘Outliers’, in the strict statistical sense, are values that lie between 1.5 and 3
interquartile ranges below or above the lower and upper quartiles
(respectively) of a distribution. Less precisely, outliers are values that do not
follow the usual pattern of the other observations. In this respect outliers
might include ‘extremes’, etc.
When we see outliers we should check if they are genuine results or errors. If
errors, then either correct them or discard. If genuine, then decide whether to
leave them in the data (sometimes, even, concentrate on them) or to discard
them. They may have undue influence upon certain statistics e.g. mean,
variance, correlation.
(c) i. The X data are already ordered. Lower quartile is the (21+1)/4th = 5.5th
ordered observation i.e. 3. In a similar fashion the median is 4, the upper
quartile is 6. Hence the interquartile range is 6 − 3 = 3 and the extremes
and outliers are defined to be:
◦ Lower extreme = 3 − 3(3) = −6
◦ Lower outlier = 3 − 1.5(3) = −1.5
◦ Upper outlier = 6 + 1.5(3) = 10.5
◦ Upper extreme = 6 + 3(3) = 15.
Drawing the box plot will show that the only outlier is the value of 11.5
from subject 21.
The Y data must initially be ordered. Then following the same process as for
the X data we find the Y has a lower quartile of 2. In a similar fashion
the median is 3.5, the upper quartile is 4.5. Hence the interquartile range
is 4.5 − 2 = 2.5 and the extremes and outliers are defined to be:
◦ Lower extreme = 2 − 3(2.5) = −5.5
◦ Lower outlier = 2 − 1.5(2.5) = −1.75
◦ Upper outlier = 4.5 + 1.5(2.5) = 8.25
◦ Upper extreme = 4.5 + 3(2.5) = 12.
Drawing the box plot for Y will show that the only outlier is now the
value of 11 from subject 20.


ii. Remember to leave the data in their original unordered pairs. When
producing the scatter diagram of X against Y you will see the subjects 20
and 21 are indeed away from the rest of the pattern (scatter). In addition,
you should see that subject 18 looks to be an additional (bivariate) outlier.


Chapter 12
Summary

This course and its subject guide are illustrative of many areas of mathematics (and
statistics) which lie beyond the basic elements within the 100 courses but which,
nonetheless, are extremely useful within the management area. Although not every
manager may use such mathematics, the management field has abundant areas of
application for such higher mathematical knowledge. Rest assured that you will find a
use for what you learn within this course!
A few of the subjects within this course’s syllabus may seem a little abstract when
summarised within a set of notes. However, it is hoped that you will appreciate the full
range of applications by reading texts. For example, differential equations are vitally
useful in virtually all dynamically changing relationships (e.g. between sales and
advertising, between prices and inflation, etc.).
At this final stage of the course it is recommended that you try to create your own
application area – imagine, if you like, that you are the Examiner of the subject and are
required to set the mathematics within a realistic storyline. You should find a seemingly
endless variety of possible questions!
Go through each chapter in turn, revising the mathematics and then finding/deriving
possible applications. Past examination papers from the VLE will give you many
examples and application areas to think about.
Finally, obtain copies of management science and financial mathematics type journals
and try to read articles that appear interesting to you. You should now find that there
is only an occasional piece of mathematics within published papers that you cannot
handle and understand.





Appendix A
Sample examination paper

Important note: This Sample examination paper reflects the examination and
assessment arrangements for this course in the academic year 2014–2015. The format
and structure of the examination may have changed since the publication of this subject
guide. You can find the most recent examination papers on the VLE where all changes
to the format of the examination are posted.

Time allowed: 3 hours.


Candidates should answer all EIGHT questions. Candidates are strongly advised
to divide their time accordingly.
Graph paper is provided at the end of this question paper. If used, it must be detached
and fastened securely inside the answer book.*
A calculator may be used when answering questions on this paper and it must comply
in all respects with the specification given with your Admission Notice. The make and
type of machine must be clearly stated on the front cover of the answer book.

* Note graph paper is not included here.

1. In today’s energy conscious world there is a need to make certain homes are energy
efficient. The local government of a town has performed a survey of 1000 homes in
the town to establish whether they need more roof insulation (R), cavity wall
insulation (C) or improved windows (W ). The survey shows the following facts:
Some homes are totally efficient but 100 are inefficient in all three aspects.
540 homes suffer from R, 310 from C and 250 from W .
Only 15 homes suffer from W alone.
180 homes belong to the set R ∩ C ∩ W^c.
(a) Draw a Venn diagram to depict the above information and, where necessary,
show the order of subsets as a function of the number of homes, x, which suffer
from W and C but not R.
(5 marks)
(b) Determine the maximum and minimum value of x.
(3 marks)
(c) Remedying R, C and W individually costs £500, £400 and £800 per home
respectively. Whenever remedial actions of both W and C occur at the same
house there is a net saving of £50 because of material transportation savings.
If the local government decides to make all the 1000 homes efficient with
respect to roof insulations, cavity wall insulation and improved windows what
will be the maximum total cost?
(3 marks)

2. The table below shows the average cost of commuting into London over the period
2008 to 2012 and a salary (earnings) index (Base 1995 = 100):

2008 2009 2010 2011 2012


Average cost of commuting for the year (£) 8,000 8,800 9,500 10,200 10,800
Index of Salary Level (Base 1995 = 100) 180.0 190.0 205.5 210.0 220.5

(a) Using the salary index as a deflationary measure, calculate a deflated index
series of commuting costs (Base 2008 = 100).
(6 marks)
(b) What is the highest annual percentage increase in deflated commuting costs
and when did that occur?
(3 marks)

3. The technology matrix for a three-industry input-output process is given by

         X     Y     Z
    X [ 0.75  0     0.10 ]
A = Y [ 0.20  0.80  0.12 ]
    Z [ 0.50  0.20  0.50 ]

(a) Use matrix methods to determine the total output for the three products when
the final (non-industry) demands for X, Y and Z are 100, 300 and 100
respectively.
(10 marks)
(b) Draw a three-product input-output network diagram to depict the above
information.
(3 marks)

4. (a) Find the real and imaginary parts of (7 − 2i)e^{−2x}e^{3ix}.


(2 marks)
(b) Alvin is a currency investor, and is particularly interested in the performance
of the currency of Woozyland, the Groat, against that of the US $. He has a
strong belief that a major political scandal is about to break in Woozyland,
and has decided that if this happens then the ratio of Groats to Dollars, y, will
satisfy the following differential equation x days after the scandal breaks:

d²y/dx² + 4(dy/dx) + 13y = 104
Immediately after the scandal breaks (i.e. when x = 0) Alvin expects the
currency ratio to take the value y = 11, and to be decreasing at the rate 6 per
day, and it is at this point in time that he will use his Dollars to buy Groats.
Determine the complete solution of the differential equation, and deduce that
after a few days he can expect to make a Dollar gross profit of more than 37%
if he converts his Groats into Dollars.
(8 marks)
(c) Consider the following second order difference equation:

Y_{k+2} − 9Y_k = 90(3^k)

where Y_1 = 30 and Y_2 = 162.


Find the complete solution.
(8 marks)

5. (a) George is an accountant in BCD Limited, which runs a chain of cinemas. He


has data for the last 60 months showing the total number of seats sold per
month, and wishes to forecast the number of seats that will be sold next
month. He has decided to use simple exponential smoothing and has calculated
the root mean square error for five values of the smoothing constant α, as
shown in the following table. He then makes the claim ‘well, the best value of α
must be bigger than 0.4, perhaps 0.45 or 0.5 because the larger the value of α is
so the more accurate will be the forecast.’ Comment on this claim.
Value of α 0.2 0.25 0.3 0.35 0.4
RMS Error 11.6 10.4 9.1 8.6 8.3
(3 marks)

(b) ReadiChem Limited sells rare minerals and gems to jewellery manufacturing
companies. The table below shows the total number of kilos of topaz and beryl
that were sold in the twelve months of 2012.

Month Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
Topaz 5.2 6.0 4.8 5.5 6.4 5.3 6.0 6.8 5.5 6.1 6.8 5.6
Beryl 11.9 9.5 10.1 11.9 12.9 10.4 11.2 13.0 14.0 11.8 12.5 14.3
Their Business Manager, Mr Wally Where, is responsible for analysing these
data and for producing a forecast of the kilos of topaz sold in January 2013.
He only understands moving averages, not exponential smoothing nor any
other forecasting methodology, and has asked you to advise him. He suspects
there is a seasonality of two, three or four months in the data values. Working
first on the topaz data, and then on the beryl data, examine in each case the
month-on-month increase or decrease in kilos sold. In the usual notation, and
using only these net changes, advise Wally which of MA2, MA3 and MA4 is
most appropriate for:
i. topaz and
ii. beryl, and hence
iii. make a forecast of the number of kilos of topaz that will be sold in
January 2013.
iv. Estimate the current average month on month trend for sales of topaz.
(14 marks)

6. An independent car magazine performs 6 tests on 120 makes of car. An analysis of


the results using maximum likelihood factor analysis (and extracting two factors)
produces the following estimated factor loadings and communalities:

Estimated factor loadings


Test Factor 1 Factor 2 Communalities
Service Interval 0.740 −0.273 0.62
Comfort and Interior Finish 0.568 0.288 0.41
Engine Performance 0.724 −0.211 0.57
Dashboard Style 0.553 0.429 0.49
Reliability 0.595 −0.132 0.37
Colour Range 0.392 0.450 0.36

(a) Interpret these results.


(4 marks)
(b) Discuss whether one might wish to perform factor rotation upon the above
factors, and suggest possible alternative criteria for any such rotation.
(6 marks)

7. Illegal immigration into the UK is a big problem, and the Government are creating
two new reception centres to house individuals who are caught, in order to
investigate them before either returning them to their country of origin or granting
them legal stay in the UK. Each reception centre will have numerous staff, led by

three Senior Immigration Officers (SIOs). The list of applicants for these positions
has been narrowed down to just nine, named A to I inclusive for convenience, of
whom three will have to be discarded and the two teams of SIOs will be staffed by
the remaining six.
It is important that the three eventual SIOs at either base should have as much
breadth and depth of experience between them as possible to ensure that most, if
not all, difficulties and situations can be recognised and tackled. A similarity
matrix has been constructed showing, on a scale from 0 to 100, how each individual
agrees or not with each of the other eight applicants from the analysis of the
responses to 100 ‘either/or’ questions relating to experience. Thus, for example, A
and B had the same responses to 26 questions. There is one complication, however:
candidates B and H must not be assigned to the same team because there is a very
serious personal disagreement between them.

Similarity on a Scale from 0 to 100 A B C D E F G H I


A —
B 26 —
C 81 50 —
D 21 80 35 —
E 22 55 79 70 —
F 73 72 41 42 86 —
G 57 30 71 10 38 83 —
H 68 20 14 65 28 25 54 —
I 59 32 66 61 24 48 60 45 —
(a) First reduce the list of applicants from nine to six, by identifying the three
with the most similarity using single linkage.
(3 marks)
(b) From the remaining six candidates use complete linkage to choose the two
teams of three SIOs.
(5 marks)
(c) Construct a dendrogram to illustrate the decision processes of parts (a) and (b).
(3 marks)
(d) Comment on the outcome of your solution to part (b).
(2 marks)
8. In the least squares linear regression model the existence of auto-correlated errors
is a violation of one of the model’s assumptions.
(a) What is meant by auto-correlated errors and how can we test for their
presence?
(5 marks)
(b) What are the possible causes of auto-correlated errors and the effects upon a
model that suffers from them?
(4 marks)
Appendix B
Sample examination paper –
Examiners’ commentary

1. Reading for this question


A very standard question asking for a Venn diagram. There are examples of a
similar nature throughout Chapter 1 of the subject guide.
Approaching the question

(a) This happens to be a 3 set Venn – perhaps the most common but not the only
number of sets candidates might be called upon to handle. Note especially that
540 homes suffering from R says nothing about C and W . Similarly, treatment
is required for the orders of 310 and 250 given in the question. Starting from
the middle of the diagram outwards – a method that seems to work most of
the time – you should get the following Venn:

Remember to label the sets R, C and W and to give the order of all subsets
(as a function of x if necessary). Note where x belongs and also that the subset
(R ∪ C ∪ W)^c does not have to be null (empty) since there is nothing in the
question that implies all 1000 homes surveyed require at least one energy
efficiency improvement.
(b) Setting the order of each subset to be non-negative and summarising these
constraints we find that 0 ≤ x ≤ 30.
(c) Simply a case of summing up the costs associated with each type of
improvement – each obtained by multiplying the cost per house times the


number of houses so improved. One must then remember to subtract the net
transportation savings.
Total cost = 540(£500) + 310(£400) + 250(£800) − (100 + x)(£50)
           = £270,000 + £124,000 + £200,000 − £5,000 − £50x
           = £(589,000 − 50x)

which is maximised when x = 0. So maximum cost = £589,000.
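Although no programming is expected in the examination, the cost expression is easy to verify by brute force over the feasible range of x; a minimal Python sketch using the figures derived above:

```python
# Total cost of making all 1000 homes efficient, as a function of x
# (the number of homes needing W and C but not R). Figures from the
# solution above: 540 roofs at £500, 310 cavity walls at £400, 250
# windows at £800, less a £50 saving for each of the (100 + x) homes
# where W and C are remedied together.
def total_cost(x):
    return 540 * 500 + 310 * 400 + 250 * 800 - (100 + x) * 50

costs = {x: total_cost(x) for x in range(0, 31)}  # feasible range 0 <= x <= 30
max_x = max(costs, key=costs.get)
print(max_x, costs[max_x])  # maximum cost occurs at x = 0: 589000
```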

2. Reading for this question


An index number question which assesses three aspects of the topic: deflating costs
by dividing by an inflation measure; forming a fixed based series from a string of
values over a number of years; and interpreting the results in a sensible fashion.
Examples of a similar nature are found in the subject guide, Chapter 2, sections 2.7
and 2.16.
Approaching the question
(a) It does not matter which of (sometimes) several orders are used for this
process (e.g. you could either deflate the cost before attempting to form a fixed
base index or create a fixed base series before deflating). Two methods are
suggested below:

2008 2009 2010 2011 2012


Deflated cost 4,444.4 4,631.6 4,622.9 4,857.1 4,898.0
(commuting cost × 100/salary index)
Or, alternative approach, 8,000.0 8,336.9 8,320.9 8,742.6 8,816.3
(commuting cost × 180/salary index)
Either way, deflated index 100.0 104.2 104.0 109.3 110.2
(Base 2008 = 100)
obtained by taking a year’s value,
dividing by the 2008 value in the same
series and remembering to
multiply by 100
(b)
2008 2009 2010 2011 2012
Percentage yearly change in index — 4.21 −0.19 5.07 0.84
The highest annual (note the word) percentage increase of 5.07% occurs
between 2010 and 2011.
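The deflation and rebasing above can be reproduced in a few lines; a minimal Python sketch (the yearly percentage changes are computed from the unrounded deflated series, as in the table):

```python
# Deflated commuting-cost index (Base 2008 = 100), reproducing the
# table above. The salary index is used as the deflator.
cost = [8000, 8800, 9500, 10200, 10800]        # 2008..2012
salary = [180.0, 190.0, 205.5, 210.0, 220.5]   # Base 1995 = 100

deflated = [c * 100 / s for c, s in zip(cost, salary)]
index = [round(d * 100 / deflated[0], 1) for d in deflated]
pct_change = [round((deflated[i] / deflated[i - 1] - 1) * 100, 2)
              for i in range(1, len(deflated))]
print(index)       # [100.0, 104.2, 104.0, 109.3, 110.2]
print(pct_change)  # [4.21, -0.19, 5.07, 0.84]: largest rise is 2010 -> 2011
```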

3. Reading for this question


A standard application of matrices for input/output analysis – see Chapter 6,
section 6.6 of the subject guide.
Approaching the question
(a) Following the standard procedure, given the technology matrix A, we have to
determine I − A and then invert it. Post multiplying by the column vector of

demands we obtain the necessary production amounts. So production required
is (I − A)^{-1} · d, i.e.

    [ 15.2   4   4 ] [ 100 ]   [ 3120 ]
    [ 32    15  10 ] [ 300 ] = [ 8700 ]
    [ 28    10  10 ] [ 100 ]   [ 6800 ]

You can use any method you wish for inverting the matrix – the Examiners
have a personal preference for row operation methods.
Make certain with input-output questions that the matrix is ‘the right way
round’ (i.e. not transposed).
Given sufficient time (and inclination) it is often worthwhile checking whether
your answer for production would indeed lead to the necessary net output
when some of the production of each ‘commodity’ is required to produce other
commodities.
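The checking step suggested above can be carried out numerically; a minimal Python sketch that solves (I − A)x = d by Gaussian elimination and then confirms the balance x = Ax + d:

```python
# Input-output check: solve (I - A) x = d, then verify x = A x + d.
A = [[0.75, 0.00, 0.10],
     [0.20, 0.80, 0.12],
     [0.50, 0.20, 0.50]]
d = [100, 300, 100]

# Build I - A and solve by Gaussian elimination with partial pivoting.
M = [[(1 if i == j else 0) - A[i][j] for j in range(3)] for i in range(3)]
b = list(d)
for col in range(3):
    piv = max(range(col, 3), key=lambda r: abs(M[r][col]))
    M[col], M[piv] = M[piv], M[col]
    b[col], b[piv] = b[piv], b[col]
    for r in range(col + 1, 3):
        f = M[r][col] / M[col][col]
        for c in range(col, 3):
            M[r][c] -= f * M[col][c]
        b[r] -= f * b[col]
x = [0.0, 0.0, 0.0]
for r in (2, 1, 0):
    x[r] = (b[r] - sum(M[r][c] * x[c] for c in range(r + 1, 3))) / M[r][r]

print([round(v) for v in x])  # [3120, 8700, 6800]
# Balance check: total output = intermediate use + final demand.
for i in range(3):
    assert abs(sum(A[i][j] * x[j] for j in range(3)) + d[i] - x[i]) < 1e-6
```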
(b) This is just like Figure 6.1 in the subject guide. Remember to include the
actual flow along each arc and the direction of flow too.

4. Reading for this question


Part (a) of this question comes straight from the subject guide, Chapter 3, section
3.19. For part (b), refer to Chapter 5, section 5.7 of the subject guide. For part (c),
see Chapter 4, section 4.8 of the subject guide.
Approaching the question
(a)
(7 − 2i) × e^{−2x} × e^{3ix} = (7 − 2i) × e^{−2x} × (cos 3x + i sin 3x).
By expansion of the brackets and remembering that i² = −1 we find the real
and imaginary parts are e^{−2x}[7 cos 3x + 2 sin 3x] and e^{−2x}[−2 cos 3x + 7 sin 3x],
respectively.
(b) A standard second order differential equation question but do read the
description (‘story line’) of the situation in the question carefully. Following
the standard procedure:
The reduced equation is
m² + 4m + 13 = 0,


so m = −2 ± 3i, and the complementary function is


y = Ae^{−2x} cos 3x + Be^{−2x} sin 3x.

The particular integral is of the form y = C, say, so that 13C = 104 ⇒ C = 8.
The general solution is therefore

y = Ae^{−2x} cos 3x + Be^{−2x} sin 3x + 8.

Now using the initial conditions: when x = 0, y = 11, so A = 3, and


dy/dx = e^{−2x}[−2A cos 3x − 3A sin 3x − 2B sin 3x + 3B cos 3x],
which equals −2A + 3B = −6 + 3B when x = 0, and this is given to be a
decrease of 6 Groats per day, so B = 0.
Hence the complete solution is
y = 3e^{−2x} cos 3x + 8.

[Note one cannot assume that when x = 1, y = 5 since the differential (rate)
only holds at a specific point x = 0 and is continually changing as x changes
continuously.]
When x tends to infinity, y tends to 8 (Groats per Dollar), so for every Dollar
initially he can buy 11 Groats, and when he sells the 11 Groats this will buy
him 1.375 Dollars, giving him a gross profit of 37.5%: i.e. more than 37%.
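The solution and its interpretation can be checked numerically; a minimal Python sketch evaluating y(x) = 3e^{−2x} cos 3x + 8 against the initial value, the initial rate of decrease, and the long-run level:

```python
# Numerical check of y(x) = 3 e^(-2x) cos 3x + 8: initial value 11,
# initial rate of decrease 6 per day, long-run level 8 Groats per
# Dollar (hence a gross profit of 11/8 - 1 = 37.5%).
import math

def y(x):
    return 3 * math.exp(-2 * x) * math.cos(3 * x) + 8

print(y(0))                      # 11.0
h = 1e-6
print((y(h) - y(-h)) / (2 * h))  # approximately -6 (decreasing at 6 per day)
print(y(10))                     # approximately 8 after a few days
print(11 / y(10) - 1)            # approximately 0.375, i.e. 37.5% gross profit
```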
(c) Again very standard – this time for a second order difference equation.
The auxiliary equation is m² − 9 = 0, hence m = ±3, and so the
complementary function is

Y_k = A(3^k) + B(−3)^k.

For the particular solution (PS) the given right-hand side is a multiple of 3^k
(which already appears in the complementary function), so the form of the PS
will have to be

Y_k = Ck3^k.

Then Y_{k+2} = C(k + 2)3^{k+2} = 9C(k + 2) · 3^k, and

Y_{k+2} − 9Y_k = 3^k[9C(k + 2) − 9Ck] = 18C · 3^k = 90 · 3^k.

So C = 5 and the general solution is

Y_k = A(3^k) + B(−3)^k + 5k(3^k).

Using the given conditions:
When k = 1, Y_1 = 30, so 30 = 3A − 3B + 15, so that A − B = 5.
When k = 2, Y_2 = 162, so 162 = 9A + 9B + 90, so that A + B = 8.
Hence A = 6.5 and B = 1.5 and the complete solution is
Y_k = (6.5 + 5k)3^k + 1.5(−3)^k.
Surprisingly there is no request for a graph or a discussion/description in this
difference equation/differential equation question. This is rather unusual but
the question is already a long one under the present format of the examination
paper.
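As a sanity check, one can iterate the recurrence Y_{k+2} = 9Y_k + 90(3^k) directly from the given values Y_1 = 30 and Y_2 = 162 and compare with the closed form; a minimal Python sketch, deriving the constants A and B from the two initial conditions rather than taking them on trust:

```python
# Check the solution of Y_{k+2} - 9 Y_k = 90 (3^k), Y_1 = 30, Y_2 = 162.
# General solution: Y_k = A 3^k + B (-3)^k + 5 k 3^k.
# k = 1 gives 3A - 3B + 15 = 30; k = 2 gives 9A + 9B + 90 = 162.
A = ((30 - 15) / 3 + (162 - 90) / 9) / 2   # ((A - B) + (A + B)) / 2
B = ((162 - 90) / 9 - (30 - 15) / 3) / 2   # ((A + B) - (A - B)) / 2

def closed(k):
    return A * 3**k + B * (-3)**k + 5 * k * 3**k

# Iterate the recurrence and compare with the closed form.
Y = {1: 30, 2: 162}
for k in range(1, 9):
    Y[k + 2] = 9 * Y[k] + 90 * 3**k
assert all(abs(closed(k) - Y[k]) < 1e-6 for k in Y)
print(A, B)        # 6.5 1.5
print(Y[3], Y[4])  # 540 2268
```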

5. Reading for this question
Overall this is a straightforward forecasting question and the moving average
approach is probably the simplest technique, which is covered in Chapter 9 of the
subject guide. Sample examination question 2 is probably the closest example
to follow.
Approaching the question
A straightforward forecasting question but with several ‘traps’ and sources of
uncertainty because alternative methods are possible. Be bold – make your
assumptions clear and continue with your answer!
An outline answer is as follows:
(a) First, the best value of α (α∗, say) will be the one that, for example,
minimises the RMSE (although minimising the Mean Absolute Error might
well give a slightly different value). From the table the RMSE values are
decreasing, so it may indeed be the case that the RMSE values will continue to
decrease, although ‘perhaps 0.45 or 0.5’ has been plucked from the air – there
is nothing to suggest that α∗ cannot be larger than 0.5.
Also, because the five values in the table are not only decreasing but also
decreasing at a decreasing rate, and the last two values are quite close, it is
possible that the minimum occurs between 0.35 and 0.4, i.e. we have passed
the point of minimum RMS somewhere between α = 0.35 and 0.4.
A simple linear interpolation – although the relationship is strictly not linear
– suggests a value closer to 0.4, such as 0.38 or 0.39. Secondly, the bold claim
that ‘the larger the value of α is so the more accurate will be the forecast’ is
nonsense – but quite often quoted! If this statement were indeed true then we
would always set α∗ = 1.0!
Another approach to pick up a mark is to point out that RMS is not the only
measure of accuracy.
Other measures (e.g. Mean Absolute Deviation) may give a different message.
(b) The extended table with the calculated month-on-month changes is:

Month Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
Topaz 5.2 6.0 4.8 5.5 6.4 5.3 6.0 6.8 5.5 6.1 6.8 5.6
Change 0.8 −1.2 0.7 0.9 −1.1 0.7 0.8 −1.3 0.6 0.7 −1.2 —
Beryl 11.9 9.5 10.1 11.9 12.9 10.4 11.2 13.0 14.0 11.8 12.5 14.3
Change −2.4 0.6 1.8 1.0 −2.5 0.8 1.8 1.0 −2.2 0.7 1.8 —

i. Consider topaz:
In January, April, July and October the increase to the next month is 0.8,
0.9, 0.8 and 0.7 respectively; and in February, May, August and November
the decrease to the next month is 1.2, 1.1, 1.3 and 1.2.
In March, June and September the increase to the next month is 0.7, 0.7
and 0.6.


There is a clear three-months’ seasonality, so use MA3.


Always read the question! It specifically says ‘using only these net
changes’.
ii. Now consider beryl.
A similar analysis shows that the seasonality is four-months, so here Wally
should use MA4. The above remarks about wasted time in determining
forecasts based on the 3 moving average parameters applies here also.
iii. To make a forecast of topaz in January 2013 the simplest method is to use
MA3 and take the average of the last 3 months as the forecast for the
next month. The average of October, November and December is
(6.1 + 6.8 + 5.6)/3 = 6.17 kilos.
An alternative, slightly more complex approach is to take the average
change in March, June and September, i.e. (0.7 + 0.7 + 0.6)/3, = 0.67, and
then the forecast for January could be taken as the December value
+ 0.67 = 5.6 + 0.67 = 6.27 kilos. This is using an averaging process to
determine a trend.
Any sensible forecast (with evidence!) will do since no specific method is
requested.
iv. The best estimate of the trend, bearing in mind the 3 month seasonality,
is to examine the last 3 × 3 + 1 = 10 months: sales increased from March
to December by 5.6 − 4.8, so the current trend is (5.6 − 4.8)/9 = 0.089
kilos per month.
There are other acceptable approaches for determining the trend but the
above is the most logical and most straightforward approach.
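The arithmetic for topaz can be reproduced in a few lines; a minimal Python sketch of the net changes, the MA3 forecast and the trend estimate discussed above:

```python
# Topaz sales (kilos) for 2012 and the two estimates discussed above.
topaz = [5.2, 6.0, 4.8, 5.5, 6.4, 5.3, 6.0, 6.8, 5.5, 6.1, 6.8, 5.6]

changes = [round(b - a, 1) for a, b in zip(topaz, topaz[1:])]
print(changes)  # the +,+,- pattern repeats every three months -> use MA3

ma3_forecast = sum(topaz[-3:]) / 3   # (6.1 + 6.8 + 5.6) / 3
print(round(ma3_forecast, 2))        # 6.17 kilos for January 2013

# Trend over the last 3*3 + 1 = 10 months (March to December):
trend = (topaz[-1] - topaz[2]) / 9
print(round(trend, 3))               # 0.089 kilos per month
```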

6. Reading for this question


For general bookwork/theory, see Chapter 8, sections 8.5 and 8.6 in the subject
guide.
Approaching the question

(a) Candidates should briefly explain how the factors work and why they are
created.
Looking at the factor loadings, all the variables have positive (and reasonably
high) loadings on the first factor. This might suggest that this factor reflects
the overall ‘package’ or market appeal of the car (perhaps an indication of the
car’s cost?). This might be labelled a ‘general appeal’ factor. For the second
factor, half the loadings are positive and half of them are negative. It would
appear, within this factor, that cars that do well (above average) in the visual,
non-mechanical type of tests (Comfort and Interior Finish, Dashboard Style,
Colour Range) do poorly (below average) in the ‘mechanical/engineering/non
visual’ type tests (Service Interval, Engine Performance, Reliability). Perhaps
this factor can be classified as a ‘visual-non visual’ factor.
The ith communality is the portion of the variance in the ith variable
contributed by the m common factors.
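With two extracted factors the communality of each variable should equal the sum of its squared loadings, h² = l₁² + l₂², which is easily confirmed for the table above; a minimal Python sketch:

```python
# Check that each quoted communality equals the sum of squared
# loadings over the two extracted factors (to 2 d.p.).
loadings = {
    'Service Interval':            (0.740, -0.273, 0.62),
    'Comfort and Interior Finish': (0.568,  0.288, 0.41),
    'Engine Performance':          (0.724, -0.211, 0.57),
    'Dashboard Style':             (0.553,  0.429, 0.49),
    'Reliability':                 (0.595, -0.132, 0.37),
    'Colour Range':                (0.392,  0.450, 0.36),
}
for name, (l1, l2, h2) in loadings.items():
    assert abs(l1 ** 2 + l2 ** 2 - h2) < 0.005, name
print("all communalities consistent to 2 d.p.")
```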

(b) Factor rotation is sometimes appropriate to give a clearer interpretation of the
factors. In this question (as seen above) it has not been too difficult to come
up with suitable interpretations on the first two factors. This is partly because
there are only six variables being used. With more variables the interpretation
of the (initial) factors can become quite difficult (indeed, it can become the
hardest part of the whole factor analysis procedure). Although not obviously
necessary for this question, one can perform factor rotation to achieve more
easily interpretable results.
The crucial fact is that once the number of factors has been chosen, the actual
definition of the factors is not unique and the factors can be rotated without
any loss of explanatory power. There are several criteria for rotation which
each take into account that, for ease of interpretation of the factors, it is easier
if variables have either large weights (when their presence helps name/label
the factor) or small weights (when they can be ignored).
Three common factor rotation methods are:
• Varimax which minimises the number of variables with a high weighting
on each factor. It rotates the axes until its objective function (the variance
of the squared loadings on the rotated factors) is maximised.
• Quartimax which minimises the number of factors needed to explain a
variable. The resulting factors often include a ‘general’ factor with most
variables represented.
• Equamax which is a combination of the above.

7. Reading for this question


In essence the question is one testing the clustering technique with both single and
complete linkage used within the same requirement. In the subject guide this is
well covered in Chapter 11, Section 11.8.5 and in Sample examination questions 1
and 4 of section 11.11.
Approaching the question

(a) First pair E and F since the maximum similarity is 86:

Similarity on a Scale from 0 to 100 A B C D [E, F] G H I


A —
B 26 —
C 81 50 —
D 21 80 35 —
[E, F] 73 72 79 70 —
G 57 30 71 10 83 —
H 68 20 14 65 28 54 —
I 59 32 66 61 48 60 45 —

Then link G to [E,F] via the greatest remaining entry, 83. We thus have our
cluster E, F and G to eliminate.
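The single-linkage step can be illustrated in code: merging on the highest similarity, the similarity between the merged cluster and any other applicant is the maximum over its members. A minimal Python sketch:

```python
# Single-linkage step for part (a), using the similarity matrix above.
pairs = {  # upper triangle of the similarity matrix
    ('A','B'):26, ('A','C'):81, ('A','D'):21, ('A','E'):22, ('A','F'):73,
    ('A','G'):57, ('A','H'):68, ('A','I'):59, ('B','C'):50, ('B','D'):80,
    ('B','E'):55, ('B','F'):72, ('B','G'):30, ('B','H'):20, ('B','I'):32,
    ('C','D'):35, ('C','E'):79, ('C','F'):41, ('C','G'):71, ('C','H'):14,
    ('C','I'):66, ('D','E'):70, ('D','F'):42, ('D','G'):10, ('D','H'):65,
    ('D','I'):61, ('E','F'):86, ('E','G'):38, ('E','H'):28, ('E','I'):24,
    ('F','G'):83, ('F','H'):25, ('F','I'):48, ('G','H'):54, ('G','I'):60,
    ('H','I'):45,
}

def sim(x, y):  # similarity between two applicants
    return pairs.get((x, y)) or pairs[(y, x)]

first = max(pairs, key=pairs.get)
print(first, pairs[first])  # ('E', 'F') 86 -- first merge

# Single linkage: cluster-to-applicant similarity = max over members.
merged = {p: max(sim('E', p), sim('F', p)) for p in 'ABCDGHI'}
nxt = max(merged, key=merged.get)
print(nxt, merged[nxt])     # 'G' 83 -- so the cluster to eliminate is {E, F, G}
```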
(b) The remaining similarity matrix, with E, F and G removed, will be:


Similarity on a Scale from 0 to 100 A B C D H I


A —
B 26 —
C 81 50 —
D 21 80 35 —
H 68 20 14 65 —
I 59 32 66 61 45 —
Link H with C (because 14 is the least similarity) and update the tableau as
below:

Similarity on a Scale from 0 to 100 A B [C,H] D I


A —
B 26 —
[C,H] 68 20 —
D 21 80 35 —
I 59 32 45 61 —

Note that we do not add B to [C,H] – recognition of this alone gets credit.
To continue we have some alternative approaches, e.g.
• either the lowest similarity with [C,H], other than with B, is with D at 35,
so the first
team is [C,D,H] and the second team is the remaining three: [A,B,I]
• or the lowest entry is 21, so link D and A. So far the 1st team is [C,H] and
the 2nd is [A,D]. Now B must be linked with [A,D] so the two teams are
[A,B,D] and [C,H,I].
(c) Make certain your dendrogram – you might reasonably find it easier to
produce separate ones for parts (a) and (b) – has (i) a clear title; (ii) good axis
names and scales; and (iii) avoids crossing lines. Bearing in mind the great
variety of answers possible for parts (a) and (b) the dendrogram would be
awarded full marks so long as it is consistent with the candidate’s clustering
process and adheres to the good practice stated above.
(d) There is one mark for each of any two sensible comments. For example:
• In part (a), pairing G with [E,F] as rejects means that the least similar
pair, G and D, cannot be in one of the two teams.
• In part (b) the process means that the first team of three will have
relatively lower commonality than the second, so team 2 may be more
widely experienced, especially if the ‘either’ approach is used. However,
this is based on single ‘lowest’ values, so perhaps average linkage might be
more helpful here. Alternatively, since B and H are both in the final six,
start with, say, B, and find the two to join B, provided this does not
include H.

8. Reading for this question


This question comes straight from the subject guide. As well as the summary
solution below, see Chapter 10, section 10.8.2 in the subject guide.

Approaching the question
(a) If the residuals are not independent, then the relationship between successive
residuals can be modelled:

Y_t = α + β_1 X_1 + e_t

and the residual, e_t, could be represented as

e_t = ρe_{t−1} + u_t,

where u_t is a Normal random variable.


The most common test used for the detection of auto-correlated errors is the
Durbin Watson test. This test is appropriate provided that there is no lagged
endogenous variable used (e.g. sales_{t−1} used to explain sales_t).
The statistic is shown below:

d = [ Σ_{t=2}^{n} (e_t − e_{t−1})² ] / [ Σ_{t=1}^{n} e_t² ].

The values of d are tabulated in most statistical tables; typically two values
are given, dU and dL . Their use is demonstrated in the diagram below which
depicts testing for auto-correlated errors using the Durbin Watson statistic.
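The statistic itself is straightforward to compute; a minimal Python sketch with two illustrative residual series (the data are invented for illustration only — values of d near 2 suggest no autocorrelation, small values positive autocorrelation, values near 4 negative autocorrelation):

```python
# Durbin Watson statistic, as defined above:
# d = sum_{t=2..n} (e_t - e_{t-1})^2 / sum_{t=1..n} e_t^2.
def durbin_watson(e):
    num = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    den = sum(v ** 2 for v in e)
    return num / den

smooth = [1.0, 2.0, 3.0, 2.0, 1.0]   # slowly drifting residuals
jumpy = [1.0, -1.0, 1.0, -1.0, 1.0]  # alternating residuals
print(durbin_watson(smooth))  # 4/19, about 0.21: positive autocorrelation
print(durbin_watson(jumpy))   # 16/5 = 3.2: negative autocorrelation
```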

(b) The cause of auto-correlated residuals may be one or more of the following:
• Common trend or cycle in the variables
• Omission of important explanatory variable
• Mis-specification of the form of the equation
• Use of smoothed or adjusted data.
The effects are that the coefficient estimates are unbiased but the variance of
the residuals is underestimated and the standard errors of the coefficients are
underestimated.


The importance of autocorrelated residuals, apart from indicating an incorrectly
specified model, is that the effects listed above are likely to lead to an
over-assessment of a model’s statistical significance and consequently to poor
inferences.

Comment form
We welcome any comments you may have on the materials which are sent to you as part of your study
pack. Such feedback from students helps us in our effort to improve the materials produced for the
International Programmes.
If you have any comments about this guide, either general or specific (including corrections,
non-availability of Essential readings, etc.), please take the time to complete and return this form.

Title of this subject guide:

Name
Address

Email
Student number
For which qualification are you studying?

Comments

Please continue on additional sheets if necessary.

Date:

Please send your completed form (or a photocopy of it) to:


Publishing Manager, Publications Office, University of London International Programmes,
Stewart House, 32 Russell Square, London WC1B 5DN, UK.
