R.D. Hewins
MT2076
2014
Undergraduate study in
Economics, Management,
Finance and the Social Sciences
This subject guide is for a 200 course offered as part of the University of London
International Programmes in Economics, Management, Finance and the Social Sciences.
This is equivalent to Level 5 within the Framework for Higher Education Qualifications in
England, Wales and Northern Ireland (FHEQ).
For more information about the University of London International Programmes
undergraduate study in Economics, Management, Finance and the Social Sciences, see:
www.londoninternational.ac.uk
This guide was prepared for the University of London International Programmes by:
R.D. Hewins, MSc, DIC, ARGS, ANCM, Senior Teaching Fellow at Warwick Business School,
University of Warwick, and The Tanaka Business School, Imperial College London
With typesetting and proof-reading provided by:
James S. Abdey, BA (Hons), MSc, PGCertHE, PhD, Department of Statistics, London School of
Economics and Political Science.
This is one of a series of subject guides published by the University. We regret that due to
pressure of work the author is unable to enter into any correspondence relating to, or arising
from, the guide. If you have any comments on this subject guide, favourable or unfavourable,
please use the form at the back of this guide.
The University of London asserts copyright over all material in this subject guide except where
otherwise indicated. All rights reserved. No part of this work may be reproduced in any form,
or by any means, without permission in writing from the publisher. We make every effort to
respect copyright. If you think we have inadvertently used your copyright material, please let
us know.
Contents
Preface
0.1 Introduction
0.2 Aims
0.3 Learning outcomes
0.4 Syllabus
0.5 Overview of topics
0.6 Essential reading
0.7 Further reading
0.8 Online study resources
0.8.1 The VLE
0.8.2 Making use of the Online Library
0.9 How to use the subject guide
0.10 Examination advice
0.11 Examination technique

1 Set theory
1.1 Aims of the chapter
1.2 Learning outcomes
1.3 Essential reading
1.4 Further reading
1.5 Introduction
1.6 Sets
1.7 Subsets
1.8 The order of sets: finite and infinite sets
1.9 Union and intersection of sets
1.10 Differences and complements
1.11 Venn diagrams
1.12 Logic analysis
1.13 Summary
1.14 Solutions to activities
2 Index numbers
2.1 Aims of the chapter
2.2 Learning outcomes
2.3 Essential reading
2.4 Further reading
2.5 Introduction
2.6 The general approach and notation
2.7 Simple index
2.8 Simple aggregate index
2.9 The average price relative index
2.10 Weighted price relative indices
2.10.1 Laspeyres' (base period weighted) index
2.10.2 Paasche's (current period weighted) index
2.10.3 Advantages and disadvantages of Laspeyres' versus Paasche's indices
2.10.4 Other weights
2.11 More complex, 'ideal' indices
2.12 Volume indices
2.13 Index tests
2.14 Chain-linked index numbers
2.15 Changing a base and linking index series
2.16 'Deflating' a series
2.17 Further worked examples
2.18 The practical problems of selecting an appropriate index
2.19 Summary
2.20 Solution to Activity 2.1
2.21 A reminder of your learning outcomes
2.22 Sample examination questions
2.23 Guidance on answering the Sample examination questions
4 Difference equations
4.1 Aims of the chapter
4.2 Learning outcomes
4.3 Essential reading
4.4 Further reading
4.5 Introduction
4.6 First order difference equations
4.7 Behaviour of the solutions
4.8 Linear second order difference equations
4.9 The non-homogeneous second order difference equation
4.10 Coupled difference equations
4.11 Graphing and describing solutions
4.12 Some applications of difference equations
4.13 Summary
4.14 Solutions to activities
4.15 A reminder of your learning outcomes
5 Differential equations
5.1 Aims of the chapter
5.2 Learning outcomes
5.3 Essential reading
5.4 Further reading
5.5 Introduction
5.6 First order and first degree differential equations
5.6.1 Case I: Variables separable
5.6.2 Case II: Homogeneous equations
5.6.3 Case III: Linear equations
5.6.4 Case IV: Other cases
5.7 Second order differential equations
5.7.1 Determining the solution constants (e.g. A1, A2)
5.8 Simultaneous differential equations
5.9 The behaviour of differential equation solutions
5.10 Some applications of differential equations
5.11 Summary
5.12 A reminder of your learning outcomes
5.13 Sample examination questions
5.14 Guidance on answering the Sample examination questions
9 Forecasting
9.1 Aims of the chapter
9.2 Learning outcomes
9.3 Essential reading
9.4 A note on spreadsheets
9.5 Introduction
9.6 Classification of forecasts
9.6.1 Classification by leadtime
9.6.2 Classification by information used
9.7 The requirements of a forecasting exercise
9.8 The structure of a time series
9.9 Decomposition models of time series
9.9.1 Model estimation
9.9.2 Forecasting procedure with EWMAs
9.10 Simple Box-Jenkins (ARIMA) methods
9.10.1 The Box-Jenkins methodology
9.10.2 Autoregressive models
9.10.3 Moving average models
9.10.4 Autoregressive moving average models
9.10.5 Building a model
9.11 Summary
9.12 A reminder of your learning outcomes
9.13 Sample examination questions
12 Summary
Preface
0.1 Introduction
People in business, economics and the social sciences are increasingly aware of the need
to be able to handle a range of mathematical tools. This course is designed to fill this
need by extending the 100 courses in Mathematics and Statistics into several even more
practical and powerful areas of mathematics. It is not just forecasting and index
numbers that have uses. Such things as differential equations and stochastic processes,
for example, do have direct, frequent and practical applications to everyday
management situations.
This course is intended to extend your mathematical ability and interests beyond the
knowledge acquired in earlier 100 courses. Throughout the mathematical and
quantitative courses of the degrees we attempt to emphasise the applications of
mathematics for management problems and decision-making. MT2076 Management
mathematics is no exception. However, you must always recognise the need to ‘walk
before you can run’ and hence new topics sometimes need to be covered in a relatively
detailed mathematical way before the topics’ uses can be emphasised by more
interesting and practical examples. It must be admitted that many good managers are
not very mathematically adept. However, they would be even more inquisitive, more
precise, more accurate in their statements, more selective in their use of data, more
critical of advice given to them, etc. if they had a better grasp of quantitative subjects.
Mathematics is an important tool which all good managers should appreciate.
Many of the topics within this course are extensions of the comparatively simple ideas
covered within your 100 courses in Mathematics and Statistics. Other topics are
fundamentally new. The course therefore both extends and reinforces existing
knowledge and introduces new areas of interest and applications of mathematics in the
ever-widening field of management.
0.2 Aims
To extend your mathematical and statistical ability and interests beyond the
knowledge acquired in your 100 courses in Mathematics and Statistics.
To familiarise yourself with, and become competent in, dynamic models and
multivariate (as well as univariate) data analysis.
understand how to analyse complex multivariate data sets with the aim of
extracting the important message contained within the huge amount of data which
is often available
be able to construct appropriate models and interpret the results generated (this
will often be a case of understanding the output from a computerised model)
0.4 Syllabus
Logical use of set theory and Venn diagrams.
Index numbers.
Difference (first and second order) and differential equations (linear, first and
second order). Simultaneous second order equations.
Simple stochastic processes – including ‘Gambler’s ruin’, ‘Birth and Death’ and
queuing models. Analysis of queues to include expected waiting time and expected
queue length.
use mathematical logic (via Venn diagrams) to evaluate the sense, or otherwise, of
given data
understand the widely used (and sometimes misused) concept of index numbers
analyse and model a series of observations over time (a ‘time series’) and forecast
ahead (an obviously useful attribute of a good manager)
understand how econometric models can be used for analysis of complex economic
relationships
use relatively complex and powerful data reduction techniques in analysing the
typically multidimensional observations available to a manager
Although containing many worked examples and, obviously, covering the whole
syllabus, this subject guide is not intended as a textbook and should not be
treated as such. However, if you use it correctly, it will provide a good indication of
the levels required and typical areas of application. A full understanding and
appreciation comes with practice, however, and, to this end, various texts are
recommended for reading. Many of these texts also have worked examples and exercises
which you should systematically work through.
Hanke, J.E. and D.W. Wichern Business forecasting. (Upper Saddle River, NJ:
Pearson Prentice Hall, 2009) ninth edition [ISBN 9780135009338].
Johnson, R.A. and D.W. Wichern Applied multivariate statistical analysis. (Upper
Saddle River, NJ: Pearson Prentice Hall, 2007) sixth edition [ISBN 9780135143506].
A useful introductory textbook which is used in this course is:
0.7 Further reading
In addition, many of you will have used (and perhaps acquired) some of the
mathematical/statistical texts for earlier 100 courses. Where appropriate, reference is
made to the relevant chapters of these books. They are indicated by an asterisk (*) in
the list below.
Barnett, R.A. and M.R. Zeigler Essentials of college mathematics for business,
economics, life sciences and social sciences. (Upper Saddle River, NJ: Prentice
Hall, 1994) sixth edition [ISBN 9780023059315]. This book is now hard to find but
it is still useful if you can locate a second hand copy or find one in a library.
* Booth, D.J. Foundation mathematics. (Upper Saddle River, NJ: Prentice Hall,
1998) third edition [ISBN 9780201342949].
Everitt, B.S., S. Landau, M. Leese and D. Stahl Cluster analysis. (London: Hodder
Arnold, 2011) fifth edition [ISBN 9780470749913].
Gujarati, D.N. Basic econometrics. (New York: McGraw-Hill, 2009) fifth edition
[ISBN 9780071276252].
Haeussler, E.F. Jr., R.S. Paul and R.J. Wood Introductory mathematical analysis
for business, economics and the life and social sciences. (Upper Saddle River, NJ:
Prentice Hall, 2010) twelfth edition [ISBN 9780321643728].
Holden, K. and A.W. Pearson Introductory mathematics for economics and business.
(Basingstoke: Palgrave Macmillan, 1992) second edition [ISBN 9780333576496].
Jacques, I. Mathematics for economics and business. (Upper Saddle River, NJ:
Prentice Hall, 2012) seventh edition [ISBN 9780273763567].
Levine, D.M., D. Stephan, T.C. Krehbiel and M.L. Berenson Statistics for
managers using Microsoft Excel. (Upper Saddle River, NJ: Pearson Prentice Hall,
2005) fourth edition. [ISBN 9780131440548].
* Newbold, P., W. Carlson and B. Thorne Statistics for business and economics.
(Upper Saddle River, NJ: Prentice Hall, 2012) seventh edition [ISBN
9780132745659].
Owen, F. and R. Jones Statistics. (Upper Saddle River, NJ: Financial Times
Prentice Hall, 1994) fourth edition [ISBN 9780273603207].
Pindyck, R.S. and D.L. Rubinfeld Econometric models and economic forecasts.
(New York: McGraw-Hill, 2007) fourth edition [ISBN 9780079132925].
Self-testing activities: Doing these allows you to test your own understanding of
subject material.
Electronic study materials: The printed materials that you receive from the
University of London are available to download, including updated reading lists
and references.
A student discussion forum: This is an open space for you to discuss interests and
experiences, seek support from your peers, work collaboratively to solve problems
and discuss subject material.
Videos: There are recorded academic introductions to the subject, interviews and
debates and, for some courses, audio-visual tutorials and conclusions.
Recorded lectures: For some courses, where appropriate, the sessions from previous
years’ Study Weekends have been recorded and made available.
Study skills: Expert advice on preparing for examinations and developing your
digital literacy skills.
Feedback forms.
Some of these resources are available for certain courses only, but we are expanding our
provision all the time and you should check the VLE regularly for updates.
1. Read the relevant chapter of the subject guide. You may be referred back to earlier
chapters if a refreshment of ideas is required.
3. Go through the worked examples and then tackle as many problems as possible
yourself. Remember that learning mathematics is best done by
attempting problems, not solely by reading.
In planning the workload associated with the course, you should appreciate that the
chapters of this subject guide are of different lengths and will therefore take a different
amount of time to cover. However, to help your time management the chapters and
topics of the course are converted below into approximate weeks of a typical 30-week
university course.
Chapter 1: 2 weeks
Chapter 2: 2 weeks
Chapter 3: 2 weeks
Chapter 4: 2 weeks
Chapter 5: 3 weeks
Chapter 6: 3 weeks
Chapter 7: 3 weeks
Chapter 8: 3 weeks
Chapter 9: 3 weeks
Chapter 10: 4 weeks
Chapter 11: 2 weeks
Chapter 12: 1 week
TOTAL 30 weeks
where available, past examination papers and Examiners’ commentaries for the
course which give advice on how each question might best be answered.
conditions. Eventually you can use past papers as mock examinations – in the meantime
create one for yourself from questions within a book. This is worthwhile doing!
Remember that even generous Examiners cannot award marks for blank pages! It is
surprising how many students fail to answer enough questions, fail to write comments
when required or fail to give sufficient explanation. All these failings are extremely
noteworthy – make certain you avoid them.
In the Examiners’ marking scheme for quantitative subjects, marks are almost always
awarded for method as well as accuracy. Bear this in mind when tackling problems.
State clearly any assumptions you feel it is necessary to make.
Become very familiar with how to operate your calculator. A calculator may be used
when answering questions on this paper and it must comply in all respects with the
specification given in the Regulations and on your Admissions Notice. The make and
type of machine must be clearly stated on the front cover of the answer book.
Chapter 1
Set theory
1.5 Introduction
Set theory is a relatively new aspect of mathematics which is now taught at all levels of
education, from primary school upwards. The reason for this is its wide applicability in
denoting and enumerating events and its visual appeal in using Venn diagrams.
Although hidden in new nomenclature and notation, set theory is really only a
combination of logic and enumeration of events. One important use is in probability
theory but its elegance and efficiency in portraying logical associations demands every
student’s attention. Within this course we concentrate on using sets for logic analysis.
There are numerous examples of set theory questions employed for logic analysis within
past examination papers for MT2076 Management mathematics and you are
strongly advised to use some of them at some stage of your preparation.
1.6 Sets
A set is simply a collection of things or objects, of any kind. These objects are called
elements or members of the set. We refer to the set as an entity in its own right and
often denote it by A, B, C or D, etc.
If A is a set and x a member of the set, then we say x ∈ A, i.e. x ‘belongs to’ A. The
symbol ∉ denotes the negation of ∈, i.e. x ∉ A means ‘x does not belong to A’.
The elements of a set, and hence the set itself, are characterised by having one or more
properties that distinguish the elements of the set from those not in the set, for example
if C is the set of non-negative real numbers, then we might use the notation
C = {x | x is a real number and x ≥ 0}
i.e. the set of all x such that x is a real number and non-negative.
Since sets are determined by their elements we say that A = B if and only if they have
the same elements.
∅ represents the empty or null set, i.e. a set containing no elements.
The set containing everything is termed the universal set and is usually written as U .
1.7 Subsets
If A and B are two sets and all the elements of A also belong to B then it can be said
that:
A is contained in B
or A is a subset of B
or B contains A.
These expressions are all equivalent and may be symbolically written as A ⊆ B.
1.8 The order of sets: finite and infinite sets
A set is said to be finite if it contains only a finite number of elements; otherwise the
set is an infinite set. The number of elements in a set A is called the order of A and is
denoted by |A| or n(A) or nA .
The intersection of two sets A and B is a set containing all the elements that are both
in A and B
i.e. A ∩ B = {x | x ∈ A and x ∈ B}.
If sets A and B have no elements in common, i.e. A ∩ B = ∅, then A and B are termed
disjoint sets.
The above notation can be extended into the case of a family of sets (for example, Ai ,
i = 1, 2, . . . , k). Thus the union of the family is ⋃_{i=1}^{k} A_i, the set of all elements belonging to at least one of the A_i.
(a) A ∪ B
(b) B ∩ C
(c) A ∩ B c
(d) A ∩ (B ∪ C).
1.10 Differences and complements
If A and B are sets then the difference set A − B is the set of all elements of A which
do not belong to B.
If B is a subset of A, then A − B is sometimes called the complement of B in A.
When A is the universal set one may simply refer to the complement of B to denote all
things not in B. The complement of a set A is denoted as Ac or Ā.
Note: De Morgan’s Theorems
(A ∩ B)c = Ac ∪ B c
(A ∪ B)c = Ac ∩ B c .
The above relationships are most easily confirmed by using a Venn diagram (see below)
to indicate that both sides of the above equations amount to the same areas of the
diagram.
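As an illustrative aside (not part of the original guide), the two theorems are also easy to check computationally; the minimal sketch below uses Python's built-in set type with a small, arbitrarily chosen universal set.

# Minimal sketch: checking De Morgan's theorems on small example sets.
U = set(range(1, 11))        # an assumed universal set {1, ..., 10}
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}

def complement(S):
    return U - S             # the complement of S taken relative to U

print(complement(A & B) == complement(A) | complement(B))   # (A n B)^c = A^c u B^c -> True
print(complement(A | B) == complement(A) & complement(B))   # (A u B)^c = A^c n B^c -> True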
1.11 Venn diagrams
In Figure 1.1 the triple set Venn diagram is particularly useful and can be used to show,
for example, that
n(A ∪ B ∪ C) = n(A) + n(B) + n(C) − n(A ∩ B) − n(A ∩ C) − n(B ∩ C) + n(A ∩ B ∩ C).
However, beware of trying to solve all problems from an equation point of view
(involving perhaps unions, intersections and complements). Many problems are better
tackled from a logical argument. See Examples 1.5 and 1.6 below.
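As a hedged illustration (the three sets below are invented purely for the check), the counting formula can be verified directly for any particular choice of A, B and C:

# Checking n(A u B u C) = n(A) + n(B) + n(C) - n(A n B) - n(A n C) - n(B n C) + n(A n B n C)
A = set(range(0, 60))        # |A| = 60
B = set(range(40, 90))       # |B| = 50
C = set(range(75, 120))      # |C| = 45

lhs = len(A | B | C)
rhs = (len(A) + len(B) + len(C)
       - len(A & B) - len(A & C) - len(B & C)
       + len(A & B & C))
print(lhs, rhs)              # both sides equal 120 for these sets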
Activity 1.2 Construct Venn diagrams involving A, B and C to show each of the
following subsets:
(a) A ∪ (B ∩ C c )
(b) (A ∪ B ∪ C)c
(c) B ∪ Ac
(d) (A ∪ B) ∩ (B ∪ C).
84 read magazine A
73 read magazine C
59 read A and B
53 read B and C
32 read A and C
(a) 13 + 19 + 8 = 40
(b) 39 + 12 + 33 = 84
(c) 56.
10% have claimed on different occasions for fire and storm damage
15% have claimed on different occasions for storm and flood damage
(a) How many businesses have claimed for all three types of damages (fire, flood
and storm) on separate occasions?
(b) Assuming no business has claimed for the same type of damage more than once,
how many claims in total have been made?
(a) What is the fewest possible number of male degree holders with no experience
in other companies that might be in the department?
(b) What is the greatest possible number of female employees in the department
who do not have a university degree but have experience in other companies?
From the given information we know there are 6 females, 10 without university
degrees and 8 with no experience in another company. The answer to (a) occurs
when the females ‘use up’ as many ‘degrees’ and ‘no experiences’ as possible, i.e. 6
‘degrees’ and 6 ‘no experiences’. Furthermore, we know that if M = set of males, D
= set of degree holders, E = set with experience in another company then
n(M ∪ D) ≤ 30
and since
n(M ∪ D) = n(M ) + n(D) − n(M ∩ D)
we can say that
n(M ∩ D) ≥ 24 + 20 − 30 = 14.
Similarly, n(M ∩ E) ≥ 16 and n(D ∩ E) ≥ 12. To satisfy these conditions we might
try setting n(M ∩ D ∩ E c ) = 0.
(a) The Venn diagram in Figure 1.3 seems to satisfy all the conditions and hence it
is possible that there are no male degree holders with no experience in other
companies.
(b) Using a similar logic, it is possible that there are 6 females satisfying the
conditions (i.e. do not have a university degree, but have experience in other
companies) as indicated in Figure 1.4.
We have approached Example 1.6 in a ‘trial and error’ fashion. See Example
1.7 below and Section 1.16 for examples where we determine possible orders of sets in a
more structured way.
(1): 25 = 2 + 3 + 5 + x + y + 4 + z ⇒ x + y + z = 11.
10 + x = 12 + y = 9 + x + z
x + x − 2 + 1 = 11 ⇒ x = 6.
(a) the minimum number of males with at least three years’ experience
(b) the maximum number of female graduates who have had at least three years’
experience.
1.12 Logic analysis
Example 1.8 A company studied the preferences of 10,000 of its customers for its
products A, B and C. They discovered that 5,010 liked product A, 3,470 liked
product B and 4,820 liked product C. All products were liked by 500 people,
products A and B (and perhaps C) were liked by 1,000 people, products A and C
(and perhaps B) were liked by 840 people and products B and C (and perhaps A)
were liked by 1,410 people.
(a) Draw a Venn diagram to illustrate the above information and show that there
must be an error in the data provided.
(b) If the erroneous data are for those people liking products B and C (and perhaps
A) determine:
i. its correct value if all 10,000 customers like at least one product
ii. upper and lower limits on its value if some customers like none of the
products.
Suggested solution:
(a) Construct the Venn diagram in Figure 1.6. It is often a good idea to start from
the triple intersection and work outwards.
Total customers = 3670 + 340 + 500 + 500 + 1560 + 910 + 3070 = 10550.
Hence, there must be an error in the data.
(b) Let n(B ∩ C) = x, then the Venn diagram becomes that shown in Figure 1.7.
i. If the total number of customers liking A, B and/or C is 10,000 then
3670 + 500 + 340 + 500 + (2970 − x) + (4480 − x) + (x − 500) = 10,000, i.e. 11,960 − x = 10,000.
Hence x = 1,960.
ii. Viewing the Venn diagram above, each ‘area’ being non-negative requires
x ≤ 2970, x ≥ 500 and x ≤ 4480. Furthermore, as we have seen x must be
at least 1,960 or we have ‘too many customers’. Thus 1960 ≤ x ≤ 2970.
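The same bounds can be found mechanically. The sketch below (an editor's illustration, not part of the guide) encodes the seven disjoint regions of the Venn diagram as functions of x and searches for the feasible values:

# Brute-force check of the bounds in Example 1.8(b)ii.
def regions(x):
    return [3670,       # A only
            500,        # A and B only (not C)
            340,        # A and C only (not B)
            500,        # A and B and C
            2970 - x,   # B only
            4480 - x,   # C only
            x - 500]    # B and C only (not A)

feasible = [x for x in range(0, 5001)
            if all(r >= 0 for r in regions(x))   # every region non-negative
            and sum(regions(x)) <= 10000]        # no more than 10,000 customers
print(min(feasible), max(feasible))              # 1960 2970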
Example 1.9 An airline keeps information about its passengers and has noted the
following facts about the services it supplied between London and New York during
a particular week:
(a) The airline only operated two different types of aircraft which they denote by A
and Z.
(f) Passengers requesting champagne, C, are always businessmen and never have
excess baggage.
Interpret each of the above statements in set notation and hence construct a single
Venn diagram to illustrate the relationships between A, B, C, E, S, X and Z.
The following additional quantitative data are available for the week and route in
question:
(h) Only 20% of all travelling businessmen did not request champagne.
(j) 150 smokers travelled with the airline. This represents 10% of all the travellers.
Suggested solution:
(b) Z ⊆ X
(c) S ⊆ (A ∩ X)
(e) B ⊆ E ∩ S c
(f) C ⊆ B ∩ X c
[Note: there are alternative ways of depicting the above statements in set notation.]
The Venn diagram, taking into account all the above relationships, might look
something like the one shown in Figure 1.8.
n(S) = 150 = 10% implies that n(travellers) = 1,500.
n(C) = 160 and therefore n(C)/n(B) = 0.8 implies that n(B) = 200.
Hence n(E) = 250 and n(Z) = 600, n(A) = 900.
If S ∩ E = ∅, then n(S c ∩ E c ∩ A) = 900 − (250 + 150) = 500 (minimum).
If n(S ∩ E) = 50 (the most since B ∩ S = ∅) then n(S c ∩ E c ∩ A) = 550 (maximum).
1.13 Summary
This chapter stands largely on its own as a topic. However, the concepts covered are
extremely useful for summarising and depicting interrelated information. The topic is
often thought of as being one of the easiest (and hence most popular from a candidate’s
point of view). However, the translation from English to mathematical
notation/diagrams (and vice versa) is a much underrated skill and needs thorough
practice. The remarks included at the end of Section 1.15 might help you to establish
the boundaries of the syllabus.
What you do not need to know:
(a) {a, b, c, d, e, f, g, h, j}
(b) {g}
(c) {b, c, d, f }
(d) {a, b, c, e, f }.
Activity 1.2
Activity 1.3
(a) In total the number of businesses making claims = 20,000/100 = 200. However, the
Venn diagram below is in ‘percentage of businesses making claims’. We let x be the
number of businesses making claims for all three perils.
Activity 1.4
(a) If we let the number who are male with at least three years’ experience be x then
we can construct the Venn diagram below for M (Males) and E (at least three
years’ experience) showing the order of the subsets (in terms of x).
Then, since every subset must have non-negative order, we must have
x ≥ 6, x ≤ 16, x ≥ 0, x ≤ 10.
Hence, in summary, 6 ≤ x ≤ 10. The minimum number of males with at least three
years’ experience is therefore six.
(b) Extending the above Venn diagram to include the set of graduates, G, we can
argue as follows: the number of females is four, each of whom could have had at
least three years’ experience and been graduates. Hence the maximum number of
female graduates who have had at least three years’ experience is four. It can be
checked by the following Venn diagram.
construct a Venn diagram (not always straightforward!) from given relational data
interpret Venn diagrams and set notation and explain their meaning in
non-mathematical (‘everyday’) English
use sets and Venn diagrams to analyse data (for example, to show inconsistencies
and to derive maximum and/or minimum orders of sets).
1.16 Sample examination questions
The table gives the number of employees which fall into each group identified, and
also the percentage of the total salary bill paid to each group.
(a) From this table calculate the number of people (as a function of X) in each of
the eight disjoint subsets which can be logically identified and produce an
appropriate Venn diagram. Similarly produce a fully annotated Venn diagram
for each group’s % of the total salary bill with subset orders as a function of Y .
(6 marks)
(b) Assuming that each subset of the above Venn diagrams has positive and
integer order, determine the smallest possible value for X and the largest
possible value for Y .
(4 marks)
(c) Assuming the values of X and Y determined in (b), which one of the eight
subsets has the lowest salary per person?
(3 marks)
2. A company undertakes a survey of its 120 adult employees and discovers that there
are:
10 unmarried men without degrees
50 married employees
60 employees with degrees
30 unmarried women without degrees
20 women with degrees
15 married women.
(a) Draw a Venn diagram (with W , D, M denoting ‘women’, ‘has degree’ and
‘married’, respectively) in order to determine the maximum and minimum
number of women who are married and have a degree.
(10 marks)
(b) On the assumption that the number of married women with degrees takes its
maximum value, construct a fully annotated Venn diagram (with W , D, M
denoting ‘women’, ‘has degree’ and ‘married’, respectively) to show the order
of each subset.
(4 marks)
(c) Making use of the diagram in (b) above, describe each of the following subsets
in words and state their order:
i. (W ∪ M )c
ii. (W c ∩ Dc ∩ M )
iii. M ∩ (W ∪ Dc ).
(6 marks)
caused by faults in one, two or all three of the components A, B and C. Analysis of
10,000 subassemblies shows that 95% of the subassemblies are free from faults.
Within the remainder there were 350 faulty A components, 250 faulty B
components and 150 faulty C components. Of the subassemblies that failed, 220
were caused by failures in two components only, and (of these 220) 170 had faulty
A components.
(a) Draw a Venn diagram to illustrate the above situation.
(2 marks)
(b) Create an equation for total component breakdowns and hence determine how
many subassemblies tested had:
i. faults in all three components at the same time.
ii. no faulty B or C components?
(8 marks)
(c) For each separate subset of your Venn diagram determine the maximum and
minimum number of faulty assemblies within the subset consistent with all the
information given in the question.
(6 marks)
(d) If the subassembly repairs cost for A, B and C components are respectively
$5, $3 and $2, what are the maximum and minimum possible costs for
repairing all the faulty components of the subassemblies tested which had
faulty B components?
(4 marks)
1.17 Guidance on answering the Sample examination questions

1. (a) We have the following Venn diagrams for workforce and % of total salary,
respectively.
(b) Since each subset must have positive and integer order then 1 ≤ X ≤ 6 and
1 ≤ Y ≤ 3. [Note the difference between ‘positive’ and ‘nonnegative’.] Hence
the minimum value of X is 1, and the maximum value of Y is 3.
(c) Using the above values for X and Y and evaluating the percentage of salaries
per person for each of the eight subsets we can construct the following table:
Subset   Workforce   % total salary   % salary ÷ workforce
F ∩ P c ∩ Sc 30 12 0.40
P ∩ F c ∩ Sc 42 40 0.95
S ∩ F c ∩ P c 4 11 2.75
F ∩ P ∩ Sc 20 23 1.15
F ∩ S ∩ Pc 6 5 0.83
S ∩ P ∩ Fc 7 1 0.14
F ∩P ∩S 1 3 3.00
F c ∩ P c ∩ Sc 10 5 0.50
Hence the lowest salary per person (in bold) for subsets is S ∩ P ∩ F c (perhaps
strangely).
2. (a) Drawing a Venn diagram of W , D and M and letting the number of married
women with degrees be x and the number of married men without degrees be
y gives the following Venn diagram.
Noting that each subset must have non-negative order will produce the result
that 0 ≤ x ≤ 15 and hence the minimum order required is 0 and the maximum
order is 15.
(b) & (c) Setting x = 15 gives the following Venn below and enables us to determine:
i. n(W ∪ M )c = 35
ii. n(W c ∩ Dc ∩ M ) = 20
iii. n(M ∩ (W ∪ Dc )) = 35.
This gives the following Venn diagram.
3. (a) Using the labels of areas (subsets) we obtain the following Venn diagram.
Letting n(A ∩ B ∩ C) = x, we can then generate the Venn diagram for
subassemblies as shown.
i.e. 280 − x + 2(220) + 3x = 750. Hence 720 + 2x = 750 and therefore x = 15.
The above may be obtained by looking at the extreme cases, shown below.
(d) Within B areas 3 and 6 are ‘fixed’. Hence the above diagrams depict the
‘worst’ and ‘best’ cost situations for the problem posed, i.e.
Chapter 2
Index numbers
To establish the wide range of alternative index construction methods in use and
the wide range of applications they have.
understand how index numbers are created and for what reason
work with all the following types of indices: price and quantity, simple, relative and
aggregate, fixed base and chain-based, Paasche and Laspeyres, ideal and non-ideal
fully interpret the message an index is telling you – this is an underrated skill
appreciate the difficulties involved in choosing the best index for a given situation.
Of the texts listed in the introduction to this subject guide, both Jacques (Chapter 3)
and Owen and Jones (Chapter 5) have reasonable coverage of index numbers. However,
any other modern statistical text now tends to have a chapter on this increasingly used
topic. Those particularly interested in the topic could refer to Allen, R.G.D. Index
numbers in theory and practice. (Basingstoke: Palgrave Macmillan, 1982).
2.5 Introduction
In many ways this section of the subject stands apart from the rest. It is a
self-contained topic with little or no overlap with other chapters in this subject
guide. However, index numbers are an increasingly used and much maligned
phenomenon of the present-day world. All managers should appreciate the uses and
abuses of index numbers. It is included in the MT2076 Management mathematics
syllabus both because of its growing importance for managers and because surprisingly
few statistical courses devote sufficient time to the topic.
Indices are now used to measure a worker’s or company’s performance, the activities
within financial markets, a countrys economic standing, etc. They are even used to
determine the wage levels for certain types of workers.
Although the arithmetic of indices is simple, you will need to exercise great care in
selecting the appropriate index to use and in performing the often tedious calculations
involved. In addition, it is important for you to be able to interpret what (if anything!)
a particular index value or series of index values is telling you.
The term ‘commodity’ is used here to indicate the object(s) under consideration – it
might be a car, a television, a basket of food, a week’s worth of labour, etc.
2.7 Simple index
Year $ per hour ‘Price’ of labour ‘Price’ relative Index (Base 2000 = 100)
2000 5.0 1.00 100.0
2001 5.2 1.04 104.0
2002 5.5 1.10 110.0
2003 6.0 1.20 120.0
2004 6.2 1.24 124.0
Note: The Base (2000 above) is often chosen to be as ‘normal’ as possible (i.e. when
the price is not unduly high or low). The base period should be fairly up-to-date and
consequently is updated periodically.
‘Commodity’ input i Quantity 2002 Base ‘prices’ pi0 2008 ‘prices’ pit
Material A 1 kg 60 66
Material B 2 kg 30 48
Labour 3 hours 15 21
Overheads 3 hours 45 60
Total 150 195
Hence the simple aggregate price index for 2008 (Base 2002) is
195/150 × 100 = 130.0.
Note: One of the assumptions of the above index is that the quantities will remain
the same throughout our analysis. This is obviously false in many situations. This
difficulty can be tackled in various ways.
Example 2.3 Continuing with Example 2.2, the price relatives for the four inputs
are 1.1, 1.6, 1.4 and 1.33, respectively. The average price relative index is therefore
(1.1 + 1.6 + 1.4 + 1.33) × 100/4 ≈ 135.8.
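As an illustrative check (an editor's sketch, not part of the guide), both unweighted indices for the Example 2.2 data can be computed directly:

# Simple aggregate index and average price relative index for the Example 2.2 data.
p0 = [60, 30, 15, 45]    # 2002 base prices per unit of input
pt = [66, 48, 21, 60]    # 2008 prices per unit of input

simple_aggregate = 100 * sum(pt) / sum(p0)
relatives = [new / old for old, new in zip(p0, pt)]
average_relative = 100 * sum(relatives) / len(relatives)

print(round(simple_aggregate, 1))   # 130.0
print(round(average_relative, 1))   # 135.8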
The weight wi for commodity i is a measure of the importance of that commodity in the
overall index. It might literally be the weight of commodity i used or purchased, or the
number of units, total expenditure on that item in some period, etc.
‘Commodity’ input i ‘Weight’ wi = pi0 qi0 Price relative pit /pi0 wi × (pit /pi0 )
Material A 20 1.10 22.0
Material B 20 1.60 32.0
Labour 30 1.40 42.0
Overheads 50 1.33 66.6
Total 120 162.6
Here the relative weights for each item are calculated as the amount spent on each item
in the current year at base period prices (i.e. pi0 qit ).
Thus Paasche’s price index, named after Hermann Paasche, for period t is
$\frac{\sum_{i=1}^{k} p_{i0}q_{it}\,(p_{it}/p_{i0})}{\sum_{i=1}^{k} p_{i0}q_{it}} \times 100 = \frac{\sum_{i=1}^{k} p_{it}q_{it}}{\sum_{i=1}^{k} p_{i0}q_{it}} \times 100.$
Example 2.5 Obtaining and using the new weights wi = pi0 qit for the product cost
example:
‘Commodity’ input i ‘Weight’ wi = pi0 qit Price relative pit /pi0 wi × (pit /pi0 )
Material A 30 1.10 33.0
Material B 40 1.60 64.0
Labour 40 1.40 56.0
Overheads 60 1.33 80.0
Total 170 233.0
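A short sketch (assuming prices and quantities are supplied as parallel lists; the function names are the editor's, not the guide's) shows how compactly the two weighted indices, and Fisher's 'ideal' index mentioned later in this chapter, can be computed:

# Laspeyres' and Paasche's price indices from parallel lists of prices and quantities.
def laspeyres_price(p0, pt, q0):
    # base-period quantities as weights
    return 100 * sum(p * q for p, q in zip(pt, q0)) / sum(p * q for p, q in zip(p0, q0))

def paasche_price(p0, pt, qt):
    # current-period quantities as weights
    return 100 * sum(p * q for p, q in zip(pt, qt)) / sum(p * q for p, q in zip(p0, qt))

def fisher_price(p0, pt, q0, qt):
    # Fisher's 'ideal' index: the geometric mean of the two indices above
    return (laspeyres_price(p0, pt, q0) * paasche_price(p0, pt, qt)) ** 0.5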
2.12 Volume indices
Paasche’s (aggregate) volume index = $\frac{\sum_{i=1}^{k} p_{it}q_{it}}{\sum_{i=1}^{k} p_{it}q_{i0}} \times 100.$
i. Time reversal test: the reversal of the time subscripts should produce the
reciprocal of the original index, i.e. if the index calculates a value of 200 for the
period t2 when using a base of t1 , then it should ideally also give a value of 50 for
the index in t1 when using a base of t2 .
ii. Factor reversal test: the product of the price index and the quantity index
should equal the index of total value, i.e.
$\frac{\sum_{i=1}^{k} p_{it}q_{it}}{\sum_{i=1}^{k} p_{i0}q_{i0}} \times 100.$
Of those covered in this subject guide only the Irving Fisher index satisfies both the
time reversal and factor reversal tests and is considered a truly ‘ideal’ index.
Chain indices are particularly useful for period by period comparisons but, when
considering a longer time period, indices with a single base are easier to interpret.
Activity 2.1 The table below shows two types of indices calculated over the period
2002 to 2007. The indices are obtained from the total value of output (given in
billions of £) for a particular industrial sector in the UK, and the change in retail
prices (RPI).
(a) Calculate a new index (with base year 2002) of the value of production output
excluding the inflationary effects.
(b) In which year did the output value show the greatest annual percentage
increase?
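The deflation step itself is mechanical. The sketch below uses invented numbers, since the activity's own table is not reproduced here; each year's value index is divided by the price index for that year and rescaled to 100.

# 'Deflating' a value-of-output index by a retail price index (hypothetical data).
value_index = [100, 108, 115, 125, 131, 140]   # value of output, 2002 = 100
rpi         = [100, 103, 106, 110, 113, 117]   # retail prices, 2002 = 100

real_output = [100 * v / p for v, p in zip(value_index, rpi)]
print([round(x, 1) for x in real_output])      # output with inflationary effects removed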
2.17 Further worked examples
Example 2.6 The following data represent the prices per unit of three different
commodities in 2000 and 2005 and the total value of purchases in those years:
You are asked to construct price indices using (a) Paasche and (b) Laspeyres.
[First note that here, and occasionally subsequently, we have dropped the
commodity suffix ‘i’ and used a shorter notation for the summation over i too.] Since
the question refers to expenditure on the three commodities, the weights are, in
effect, value weights and hence must be multiplied by the price relatives, not just the
prices in the two years. We therefore have:
Example 2.7 A supplier of office furniture wishes to know if sales in real terms
have increased in the 10-year period 1998–2008. Furthermore he would like to know
if stock levels of his furniture were justified by the sales figures. The following data
refer to the stock holdings of the supplier’s four main furniture items at the end of
1998 and 2008:
Stock levels
1998 2008
Items Number q0 Value q0 p0 (£) Number qt Value qt pt (£)
Chairs 400 40,000 300 60,000
Cabinets 700 80,000 900 180,000
Desks 140 42,000 200 90,000
Lights 60 30,000 90 60,000
Total sales for 1998 and 2008 were £1,200,000 and £2,400,000, respectively.
(a) Construct a weighted index of the price increases, 2008 as against 1998, for the
four items of stock together.
(b) Calculate using the above index the percentage change of sales in real terms.
Suggested solution:
Example 2.8 Every month a company purchases four items in the typical
quantities and at the prices shown below:
Price per unit
Commodity Units Weights March April May
W Kilos 120 45 46 48
X Kilos 50 60 61 62
Y Litres 60 80 70 66
Z Thousand 100 120 130 140
(c) If in June of the same year commodities W and X are expected to increase by
one per cent per kilo and the price of commodity Z is expected to increase by 10
per cent per thousand, how much must the cost per litre of Y decrease in order
that the weighted aggregate price index for June remains the same as for May?
Suggested solution:
(b) A weighted aggregate price index is obtained by using the weights 120, 50, 60
and 100. Although it should be borne in mind that a number of other possible
weighted indices are possible, the following seems reasonable. For April,
$\frac{\sum_i w_i(p_{i1}/p_{i0})}{\sum_i w_i} \times 100 = \frac{120(46/45) + 50(61/60) + 60(70/80) + 100(130/120)}{330} \times 100 = \frac{334.333}{330} \times 100 = 101.31.$
Similarly, the weighted aggregate price index for May is 100 × 345.83/330 = 104.79.
i.e. y = 0.272. So there is a price decrease of 27.20% in the May price for Y .
[Note: A completely different set of results for (b) and (c) is possible if an
aggregate index of $\frac{\sum_i w_i p_{i1}}{\sum_i w_i p_{i0}} \times 100$
is formed. In this case the answers become (b) 102.3 and 106.4 and (c) 19.5%.
This demonstrates the ability to create and use many apparently acceptable
indices.]
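To make the note concrete, the sketch below (an editor's illustration, not part of the guide) reproduces both of the 'apparently acceptable' calculations for April and May:

# The two alternative weighted index calculations discussed in Example 2.8.
w  = [120, 50, 60, 100]     # weights
p0 = [45, 60, 80, 120]      # March (base) prices
p1 = [46, 61, 70, 130]      # April prices
p2 = [48, 62, 66, 140]      # May prices

def weighted_relative(p, base):      # sum of w*(p/p0) divided by sum of w
    return 100 * sum(wi * pi / bi for wi, pi, bi in zip(w, p, base)) / sum(w)

def weighted_aggregate(p, base):     # sum of w*p divided by sum of w*p0
    return 100 * sum(wi * pi for wi, pi in zip(w, p)) / sum(wi * bi for wi, bi in zip(w, base))

print(round(weighted_relative(p1, p0), 2), round(weighted_relative(p2, p0), 2))    # ~101.31 104.80
print(round(weighted_aggregate(p1, p0), 1), round(weighted_aggregate(p2, p0), 1))  # ~102.3 106.4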
What are the costs and time delays of acquiring the data?
surprising as the index membership is reviewed every three months. If interested, see
www.ftse.com/Indices/index.jsp for more details. The ‘weights’ that are used are
constantly changing to give more or less importance to certain shares. These tables
constitute one index measure at moments in time. Many of the shares chosen to be
within the index at one time have been replaced with new ones; the weights
(capitalisations) have changed also. Certain companies are no longer so important, some
have been taken over, some new companies have been created by privatisation, etc.
Some companies have moved in or out of the FTSE 100 on several occasions. The
problems are immense.
Activity 2.2 (Mainly for interest) Compare Table 2.2 with Table 2.3 and note
how few companies appear in the FTSE 100 in both 1989 and 2008. Note also the changing
type of company involved.
Activity 2.3 (Mainly for interest) Try to find out what the weightings are for
the 100 companies in the FTSE 100 – the constituents might have changed by the
time you read this subject guide (remember the company list is reviewed every three
months and often several changes occur).
2.19 Summary
What you should know
The subject of index numbers is wide-ranging due to the many alternative indices which
can be created from a data stream – you may come across some extra ones that are not
specifically mentioned within this subject guide. However, this chapter refers to all the
index types you are called upon to understand and use within this course.
There is no obligation for you to know how any particular ‘well-known’ index (for
example, the Financial Times 100 or the Dow Jones) has been created. However, it is
important to understand the difficulties in constructing indices that have such aims.
(a) We have:
2.21 A reminder of your learning outcomes
understand how index numbers are created and for what reason
work with all the following types of indices: price and quantity, simple, relative and
aggregate, fixed base and chain-based, Paasche and Laspeyres, ideal and non-ideal
fully interpret the message an index is telling you – this is an underrated skill
appreciate the difficulties involved in choosing the best index for a given situation.
The Multimix company has used X and Y in its product XANDY in the
proportions 40:60 by weight throughout the above period.
(a) Produce separate material price indices (Base 2000 = 100) for the raw
materials X and Y .
(4 marks)
(b) Construct a chain-based unlinked index for the raw material X and illustrate
its usefulness by determining the year in which the greatest percentage
increase in the price of X occurred. What is the size of this increase?
(4 marks)
(c) Construct an index series (Base 2000 = 100) for the total material cost of
XANDY. Comment upon this series.
(6 marks)
(d) Assuming that the costs of X and Y will continue to increase in the future at a
rate equal to their average rates of increase over the period 2000 to 2006, what
prediction would you give for the XANDY total material cost index in 2008?
(6 marks)
2.23 Guidance on answering the Sample examination questions
Year   Cost per kg X   Cost per kg Y   Index for X   Index for Y   Chained index for X
2000 7.00 4.00 100.0 100.0
2001 7.35 4.20 105.0 105.0 105.0
2002 7.98 4.70 114.0 117.5 108.57
2003 8.61 4.10 123.0 102.5 107.89
2004 9.10 5.10 130.0 127.5 105.69
2005 9.73 5.40 139.0 135.0 106.92
2006 10.43 5.60 149.0 140.0 107.19
Greatest increase in X per year is for 2002 when the rise was 8.57%.
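The fixed-base and chain-based columns above follow directly from the cost series; a minimal sketch (editor's illustration):

# Fixed-base (2000 = 100) and chain-based index for the cost of X.
cost_x = [7.00, 7.35, 7.98, 8.61, 9.10, 9.73, 10.43]    # 2000-2006, cost per kg

fixed_base = [100 * c / cost_x[0] for c in cost_x]
chained    = [100 * c1 / c0 for c0, c1 in zip(cost_x, cost_x[1:])]   # year-on-year

print([round(v, 1) for v in fixed_base])   # 100.0, 105.0, 114.0, 123.0, 130.0, 139.0, 149.0
print([round(v, 2) for v in chained])      # 105.0, 108.57, 107.89, 105.69, 106.92, 107.19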
(c)
2.
(a) & (b) To combine the two series we multiply the base 1985 values by the conversion
factor 100/188.2 (i.e. the one year of overlap gives a measure of the relative
values of the two indices). By convention we would pick the later of the two
bases for the combined index. Afterwards we would deflate the series by
multiplying by 100/(inflation index) for each year. Then (perhaps) form a new
index series for deflated share prices with 2002 = 100 (by multiplying by
100/24.855). We therefore get the following results:
3. (a) Indicator = $\frac{\sum w_i(p_{it}/p_{i0})}{\sum w_i} = 0.2\left(\frac{6.8}{5.3}\right) + 0.4\left(\frac{362.26}{265.88}\right) + 0.25\left(\frac{115.2}{109.6}\right) + 0.15\left(\frac{622.8}{529.9}\right) = 1.2404,$
i.e. an index of 124.04.
So the leading economic indicator has increased in value from 1 in 1989 to
1.2404 in 1991. Business activity increased 24% from 1989 to 1991.
Least impact is caused by Exports which rose by only 17.5% with weight of
15%.
(b) i. Laspeyres’ price index =
$\frac{\sum p_{it}q_{i0}}{\sum p_{i0}q_{i0}} \times 100 = \frac{(5)(2000) + (12)(200) + (18)(400) + (15)(100)}{(6)(2000) + (10)(200) + (20)(400) + (15)(100)} \times 100 = 89.8.$
Chapter 3
Trigonometric functions and imaginary numbers
To indicate how trigonometric functions can be used to model dynamic (or static)
situations where cycles are present.
To explain how imaginary numbers can occur as the solution to certain quadratic
equations.
To provide a solid mathematical basis for some of the problems encountered when
solving difference or differential equations.
sketch graphs of the three main trigonometric functions and functions of them
3.5 Introduction
Sines, cosines and tangents are functions which one learns at school, where they are
mainly taught as a means of solving geometric problems concerning triangles. Although
this is clearly an important application of such trigonometric functions, more important
for a manager and mathematical modeller is the use of such functions in dynamic
relationships (e.g. in describing economic cycles, competitive markets, etc.). These
applications occur because of the cyclical nature of these trigonometric functions. They
are particularly useful in solving certain difference and differential equations but before
embarking upon these important areas (Chapters 4 and 5) we must first learn (or
perhaps simply recall) the basics of trigonometric functions.
Related to this area is the topic of imaginary numbers. It seems strange that a whole
new number system involving the concept of an imaginary number i = √(−1) is very
important for modelling and system investigations for economists and management
scientists. However, imaginary numbers are extremely useful in the field of mathematics
and, although it is not the intention of this course to turn you into mathematicians,
they are sufficiently important that their basic ideas and usefulness should be part of
this second/supplementary mathematics course.
Perhaps this chapter is more theoretical in nature than we would initially wish.
However, by means of suitable economic and management models we hope to
demonstrate their usefulness in due course. Furthermore, as already stated, this chapter
is a necessary prerequisite for certain aspects of Chapters 4 and 5 on difference and
differential equations.
3.6 Basic trigonometric definitions and graphs (a reminder)
For any angle θ, sin θ is finite and takes values between −1 and +1 (inclusive of these
limiting values). A similar statement holds for cos θ. For tan θ, however, we can have
values anywhere between −∞ and +∞.
The graphs of these trigonometric functions are given in Figures 3.2, 3.3 and 3.4,
respectively.
The angles can be defined in terms of degrees (°) or radians; Figures 3.2 to 3.4 above
use π radians for the horizontal axes. A radian is defined as the angle subtended by an
arc of length 1 in a circle of radius 1. Thus an angle of x radians in a circle of radius r is
subtended by an arc of length rx (see Figure 3.5).
Recognising that the circumference of the circle is 2πr, where π = 3.1416
approximately, then 60◦ = π/3 radians, 90◦ = π/2 radians, 180◦ = π radians and
360◦ = 2π radians etc. Although it is possible to work with either degrees or radians
within this and many other courses involving trigonometric functions, many of the
application areas and texts tend to use radians. This is a practice which this subject
guide will normally follow (although you are perfectly free to use degrees if you prefer).
Values for sin, cos and tan of an angle will be found on all but the most basic
calculators. However, partly (but not entirely) because only a basic calculator is allowed
in the examination, certain values are worth remembering. For example:
Activity 3.1 This activity should essentially be revisiting material you have
already covered in your 100 courses. State the values of each of the following
trigonometric functions without the use of a calculator (you may use surds, i.e.
square roots, where necessary):
(b) cos(135), tan(−45), sin(225), cos(−45), sin(300), tan(420) where the angles are
in degrees.
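If you want to check your answers numerically (using software rather than the basic calculator allowed in the examination), the standard library gives the values directly; this is an editor's illustration, not part of the guide:

# Numerical check for Activity 3.1(b): angles given in degrees.
import math

for deg in [135, -45, 225, 300]:
    rad = math.radians(deg)                      # convert degrees to radians
    print(deg, round(math.cos(rad), 4), round(math.sin(rad), 4))

print(round(math.tan(math.radians(-45)), 4))     # -1.0
print(round(math.tan(math.radians(420)), 4))     # tan 420 deg = tan 60 deg = 1.7321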
Activity 3.2 Produce a sketch diagram for each of the following trigonometric
functions:
cos(π/2 − θ) = sin θ
sin²θ + cos²θ = 1
cos(α + β) = cos α cos β − sin α sin β
sin(α + β) = cos α sin β + sin α cos β
cos(α − β) = cos α cos β + sin α sin β
sin(α − β) = sin α cos β − cos α sin β.
Activity 3.3
i. cos(4x)
ii. tan(x/2)
iii. −3 sin 2x · cos⁴ 2x.
3.12 Conjugates
The (complex) conjugate of the complex number z = a + ib is defined as z̄ = a − ib. We
see that if a complex number is a solution to a quadratic equation with real coefficients then the
conjugate complex number is the other solution.
3.13 The Argand diagram
Any single complex number z = a + ib can be represented as a point on a
two-dimensional (2D) graph where the axes are the real and imaginary parts of the
complex number and the coordinates of z are (a, b) (see Figure 3.6). Thus, using our
knowledge of trigonometry we can write a = r cos θ and b = r sin θ, and hence z = r(cos θ + i sin θ), where r = (a² + b²)^{1/2}.
Example 3.1 Suppose we wish to find the real and imaginary parts of z n where
z = a + ib.
First we write z as
(a² + b²)^{0.5} (cos θ + i sin θ)
where θ = arctan(b/a).
Hence z^n = (a² + b²)^{n/2} (cos nθ + i sin nθ), i.e. the answer has a real part of
(a² + b²)^{n/2} cos nθ and an imaginary part of (a² + b²)^{n/2} sin nθ.
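A numerical sketch of Example 3.1 (editor's illustration, not part of the guide), using Python's cmath module to move between Cartesian and polar forms:

# De Moivre's theorem applied numerically: z^n via the polar form.
import cmath

a, b, n = 2.0, 3.0, 7
z = complex(a, b)

r, theta = cmath.polar(z)            # r = (a^2 + b^2)^0.5, theta = arctan(b/a)
zn = cmath.rect(r ** n, n * theta)   # r^n (cos n*theta + i sin n*theta)

print(zn)       # real part r^n cos(n*theta), imaginary part r^n sin(n*theta)
print(z ** n)   # direct computation agrees (up to rounding)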
3.15 A link between exponential expansions,
trigonometric functions and imaginary numbers
It can be shown that the exponential function exp x (or e^x) can be expanded as
$$\exp x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \frac{x^4}{4!} + \cdots$$
and that the same expansion holds for a complex argument z:
$$\exp z = 1 + z + \frac{z^2}{2!} + \frac{z^3}{3!} + \frac{z^4}{4!} + \cdots$$
Setting z = iθ and collecting the real and imaginary terms gives Euler's formula, e^{iθ} = cos θ + i sin θ. Hence we can write a complex number z in the form re^{iθ} and, using De Moivre's theorem, z^n = (re^{iθ})^n = r^n e^{inθ}.
[You might note, as an aside, that e^{iπ} = −1. Perhaps some of you will get the same sense of amazement as the author always does when he sees such an equation relating the two irrational numbers e and π and the square root of minus one!]
3.16 Summary
This chapter has apparently been based on pure mathematics. However its importance
becomes more obvious when the knowledge acquired is used in practical situations. We
will return to trigonometric functions in Chapters 4 and 5.
The fairly extensive coverage of trigonometric functions in this chapter still leaves a lot
of material uncovered.
Hyperbolic functions, i.e. sinh, cosh and tanh. If these are completely meaningless to you then don't worry!
sketch graphs of the three main trigonometric functions and functions of them
1. The rate of sales, dS/dt, of a product in a market with cyclical demand is modelled
by
$$\frac{dS}{dt} = 500\left(1 + \sin\frac{\pi t}{10}\right)$$
where t is measured in weeks.
Determine the total volume of sales of the new product within the first four weeks
using:
(a) direct integration
(6 marks)
(b) series expansion of the sin function up to and including terms in t5 .
(6 marks)
[Note: You may assume that π = 3.1416.]
(d) (4 + 3i)e^{iπ/3}.
(12 marks)
3. You are given the complex numbers z = 2 + 3i and w = 1 − 4i. Find the real and
imaginary parts of
(a) z − w
(b) zw
(c) z/w
(d) z⁷.
(10 marks)
(7 marks)
[Note: You may assume that π = 3.1416.]
(b) Find the real and imaginary parts of
i. (4 − 3i)/(3 + 2i)
ii. log_e[(1/√2)(1 − i)]
and draw an Argand diagram for your answer to (a).
(7 marks)
[Note: The answer can be left as a function of cos when only basic calculators
are permitted.]
3.20 Guidance on answering the Sample examination questions

1. (b) We have
$$\sin\frac{\pi t}{10} = \frac{\pi t}{10} - \frac{1}{3!}\left(\frac{\pi t}{10}\right)^3 + \frac{1}{5!}\left(\frac{\pi t}{10}\right)^5 - \cdots.$$
Therefore,
$$S = 500\int_0^4\left(1 + \frac{\pi t}{10} - \frac{\pi^3 t^3}{6000} + \frac{\pi^5 t^5}{12000000} - \cdots\right)dt \approx 500\left[t + \frac{\pi t^2}{20} - \frac{\pi^3 t^4}{24000} + \frac{\pi^5 t^6}{72000000}\right]_0^4$$
$$\approx 500[4 + 2.5133 - 0.3307 + 0.0174] \approx 500[6.2] = 3100.$$
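As an informal cross-check of this guidance (my own illustration, not part of the original solution), the following Python sketch compares the exact integral with the truncated series:

```python
import math

# Direct integration: S = 500 * [t - (10/pi) cos(pi t / 10)] evaluated from 0 to 4.
direct = 500 * (4 - (10/math.pi)*math.cos(0.4*math.pi) + (10/math.pi))

# Series expansion of sin up to and including the t^5 term, integrated term by term.
series = 500 * (4 + math.pi*4**2/20 - math.pi**3*4**4/24000 + math.pi**5*4**6/72000000)

print(round(direct, 1), round(series, 1))   # both approximately 3100
```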
2. (a) We have
$$\frac{2+3i}{3+2i} = \frac{(2+3i)}{(3+2i)}\cdot\frac{(3-2i)}{(3-2i)} = \frac{6+9i-4i+6}{9-4i^2} = \frac{12+5i}{13} = \frac{12}{13} + \frac{5}{13}i.$$
(b) We have
$$\frac{1}{i^5} = \frac{1}{i^2\,i^2\,i} = \frac{1}{(-1)(-1)i} = \frac{1}{i} = \frac{i}{i\cdot i} = \frac{i}{-1} = -i.$$
(c) We have
$$\log_e\left(\tfrac{1}{2}(\sqrt{3}+i)\right) = \log_e\left(\sin\tfrac{\pi}{3} + i\cos\tfrac{\pi}{3}\right),$$
or, more usefully,
$$\log_e\left(\cos\tfrac{\pi}{6} + i\sin\tfrac{\pi}{6}\right) = \log_e e^{i\pi/6} = \frac{i\pi}{6}.$$
(d) We have
$$(4+3i)e^{i\pi/3} = (4+3i)\left(\cos\tfrac{\pi}{3} + i\sin\tfrac{\pi}{3}\right) = (4+3i)\left(\tfrac{1}{2} + i\tfrac{\sqrt{3}}{2}\right) = \frac{4-3\sqrt{3}}{2} + i\,\frac{3+4\sqrt{3}}{2}.$$
3. (a) We have
z − w = (2 + 3i) − (1 − 4i) = (1 + 7i).
(b) We have
zw = (2 + 3i)(1 − 4i) = 2 − 5i − 12i2 = 14 − 5i.
(c) We have
$$\frac{z}{w} = \frac{2+3i}{1-4i} = \frac{(2+3i)(1+4i)}{(1-4i)(1+4i)} = \frac{2+8i+3i-12}{1+16} = \frac{-10+11i}{17} = -\frac{10}{17} + \frac{11}{17}i.$$
(d) We have
$$z^7 = (2+3i)^7 = (\sqrt{13})^7(\cos\theta + i\sin\theta)^7$$
where θ = tan⁻¹(3/2) [≈ 56.31°], so
$$z^7 = (\sqrt{13})^7(\cos 7\theta + i\sin 7\theta)\ [= 7921.4(0.8274 + 0.5616i) = 6554 + 4449i].$$
Note: You may omit the parts in square brackets ‘[ ]’ if trigonometric functions are
not permitted by the current calculator regulations.
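A quick way to verify answers of this kind is Python's built-in complex arithmetic; the sketch below (an illustration only, using the values as worked in the guidance above) checks parts (a)-(d):

```python
import cmath

z, w = 2 + 3j, 1 - 4j

print(z - w)    # (1+7j)
print(z * w)    # (14-5j)
print(z / w)    # approximately (-0.588+0.647j), i.e. (-10+11j)/17

# z^7 via de Moivre: write z in polar form r*e^{i*theta}, then z^7 = r^7 e^{i*7*theta}.
r, theta = cmath.polar(z)
via_de_moivre = cmath.rect(r**7, 7*theta)
print(z**7, via_de_moivre)   # both approximately 6554 + 4449j
```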
4. (a) We have
$$e^x = 1 + \frac{x}{1!} + \frac{x^2}{2!} + \frac{x^3}{3!} + \frac{x^4}{4!} + \cdots \qquad\text{and}\qquad \sin x = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \cdots$$
so
$$e^{\sin x} = 1 + \frac{\sin x}{1!} + \frac{(\sin x)^2}{2!} + \frac{(\sin x)^3}{3!} + \cdots$$
$$= 1 + \left(x - \frac{x^3}{6} + \frac{x^5}{120} - \cdots\right) + \frac{1}{2}\left(x^2 - \frac{x^4}{3} + \cdots\right) + \frac{1}{6}\left(x^3 - \frac{x^5}{2} + \cdots\right) + \frac{1}{24}\left(x^4 + \cdots\right) + \cdots$$
$$= 1 + x + \frac{x^2}{2} - \frac{x^4}{8} + \cdots.$$
Hence,
$$\int_0^{\pi/3} e^{\sin x}\,dx \approx \left[x + \frac{x^2}{2} + \frac{x^3}{6} - \frac{x^5}{40}\right]_0^{\pi/3} = 1.04720 + 0.54831 + 0.19140 - 0.03148 = 1.7554.$$
(b) i. We have
$$\frac{4-3i}{3+2i} = \frac{(4-3i)(3-2i)}{(3+2i)(3-2i)} = \frac{12-8i-9i-6}{13} = \frac{6-17i}{13} = \frac{6}{13} - \frac{17}{13}i.$$
ii. We have
$$\log_e\left(\tfrac{1}{\sqrt{2}}(1-i)\right) = \log_e e^{iu}$$
where cos u = 1/√2 and sin u = −1/√2, i.e. u = 7π/4. Hence
$$\log_e\left(\tfrac{1}{\sqrt{2}}(1-i)\right) = iu = \frac{7\pi}{4}i.$$
Chapter 4
Difference equations
4.5 Introduction
In Chapter 5 we will see equations involving a variable, y say, and one or more of its differentials, dy/dx, d²y/dx², etc. (sometimes denoted Dy, D²y, etc.). Such equations are termed differential equations. Similarly, an equation involving a function, Y say, and one or more of its differences, ∆Y, ∆²Y, etc., is called a difference equation. ∆Y is a notation for Y_{k+1} − Y_k, for example.
We will see that the solution procedures and terminology are very similar between
differential and difference equations.
A difference equation is said to be linear if it can be written in the form
$$f_0(k)Y_{k+n} + f_1(k)Y_{k+n-1} + \cdots + f_n(k)Y_k = g(k) \qquad\text{for } k = 0, 1, 2, \ldots$$
The simplest case is the first order equation with constant coefficients,
$$Y(t) = aY(t-1) + b.$$
It is easy to see (and you should convince yourself of this) that if a = 1 then the sequence of numbers Y_k is an arithmetic progression with common difference b and first term Y_0. Therefore Y_k = Y_0 + bk.
If b = 0, you should also convince yourself that we have a geometric progression with common ratio a and first term Y_0. Hence Y_k = a^k Y_0.
If a ≠ 1 and b ≠ 0 then Y_2 = aY_1 + b = a(aY_0 + b) + b = a²Y_0 + b(1 + a), Y_3 = aY_2 + b = a³Y_0 + b(1 + a + a²) etc. and, in general,
$$Y_k = a^kY_0 + b(1 + a + a^2 + \cdots + a^{k-1}) = a^kY_0 + \frac{b(1-a^k)}{1-a} = Y^* + (Y_0 - Y^*)a^k$$
where Y* = b/(1 − a) and is known as the 'constant' or 'time-independent' solution.
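The closed form can be checked by brute force. The following Python sketch (the values of a, b and Y_0 are my own illustrative choices) iterates the recurrence and compares it with the formula:

```python
# Iterate Y_k = a*Y_{k-1} + b and compare with the closed form
# Y_k = Y* + (Y_0 - Y*) a^k, where Y* = b/(1-a).
a, b, Y0, n = 0.5, 10.0, 5.0, 10

Y_star = b / (1 - a)
Y = Y0
for k in range(1, n + 1):
    Y = a * Y + b                             # the recurrence
    closed = Y_star + (Y0 - Y_star) * a**k    # the closed-form solution
    assert abs(Y - closed) < 1e-9
print("Iteration and closed form agree; Y* =", Y_star)
```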
Activity 4.1 Solve the following equation and describe the solution series (note
that Y0 = 5):
$$Y_k + \tfrac{1}{2}Y_{k-1} - 5 = 0.$$
Consider now the general second order linear difference equation with constant coefficients,
$$Y_{k+2} + a_1Y_{k+1} + a_2Y_k = r_k, \qquad k = 0, 1, 2, \ldots$$
Setting r_k = 0 and trying a solution of the form Y_k = m^k gives
$$m^2 + a_1m + a_2 = 0.$$
This is the auxiliary equation or characteristic equation. If its roots are real and
different, say m1 and m2 , then the complementary function for the difference
equation is
Yk = Amk1 + Bmk2 , for A, B constants.
The particular solution is any solution that we are able to find for the original
complete equation. This can be obtained by guesswork, although there is a perfectly
logical procedure for finding such particular solutions.
The general solution is then the sum of the complementary function and the
particular solution.
Strictly we have three cases to consider when trying to determine the complementary
function:
If the roots m_1 and m_2 are real and distinct, then (as above) Y_k = Am_1^k + Bm_2^k.
If the roots are real and equal, m_1 = m_2 = m say, then Y_k = (A + Bk)m^k.
If the roots are complex conjugates, d ± if say, then
$$Y_k = (g + ih)(d + if)^k + (g - ih)(d - if)^k.$$
We use initial conditions, about Y_0 and Y_1 for example, to solve for d, f, g and h. The solution is often converted into a trigonometric relationship using the mathematics of Sections 3.13 and 3.14. The general process is as follows.
Since the roots of the quadratic auxiliary equation are complex conjugates we can write them as
$$m_1 = \sqrt{a_2}\,z_1 \qquad\text{and}\qquad m_2 = \sqrt{a_2}\,z_2,$$
where
$$z_{1,2} = \frac{-(a_1/2) \pm (i/2)\sqrt{4a_2 - a_1^2}}{\sqrt{a_2}}$$
are complex conjugates of unit modulus, i.e. the roots are
$$\sqrt{a_2}\,e^{iu} \qquad\text{and}\qquad \sqrt{a_2}\,e^{-iu}$$
where
$$u = \arccos\left(-\frac{a_1}{2\sqrt{a_2}}\right).$$
Thus,
$$Y_k = a_2^{k/2}(Ae^{iku} + Be^{-iku}) = a_2^{k/2}(E\cos ku + F\sin ku),$$
where E = A + B and F = i(A − B) are two real constants. To simplify further, we can rearrange this equation for Y_k so that
$$Y_k = a_2^{k/2}\,G\cos(ku - \phi)$$
for constants G and φ.
Example 4.3
Suppose Y_{k+2} + Y_{k+1} − 6Y_k = 0, i.e. (m + 3)(m − 2) = 0, so m = −3 or m = 2 and the complementary function is Y_k = A(−3)^k + B(2)^k.

Example 4.4
Suppose Y_{k+2} + 8Y_{k+1} + 16Y_k = 0, i.e. (m + 4)(m + 4) = 0, so m = −4 (a repeated root) and Y_k = (A + Bk)(−4)^k.

Example 4.5
Suppose Y_{k+2} + 0.5Y_{k+1} + 0.25Y_k = 0. Then
$$m_{1,2} = \frac{-0.5 \pm \sqrt{0.5^2 - 4(0.25)}}{2} = -0.25 \pm \frac{i}{2}\sqrt{0.75}$$
and hence we have the complementary function:
$$Y_k = A\left(\frac{1}{4}\right)^{k/2}\cos(ku - \theta).$$
4.9 The non-homogeneous second order difference equation
The appropriate particular solution, Yk∗ say, depends upon rk . The general approach is
perhaps demonstrated by examples.
If rk = ak then we try a particular solution of the form Yk∗ = Aak for some constant A,
substitute this into the complete equation and solve for A. If, however, a is one of the
roots m1 , m2 of the auxiliary equation then we must try Yk∗ = Akak .
If a = m1 = m2 , by some strange coincidence, then we try Yk∗ = Ak 2 ak etc.
The complete list of trial forms of Y*_k for different types of r_k is given in the following table:

r_k | Trial Y*_k
a^k | A a^k
k^n | A_0 + A_1k + A_2k² + ··· + A_nk^n
k^n a^k | a^k(A_0 + A_1k + A_2k² + ··· + A_nk^n)
a^k sin bk or a^k cos bk | a^k(A_1 sin bk + A_2 cos bk)
If rk contains a term k n ak and a is a root of the auxiliary equation then in the trial
solution, Yk∗ , we must include the term k n+1 ak if a is a single-fold root of the auxiliary
equation, and k n+2 ak as well if a is a two-fold root. Terms k n+1 ak cos bk and
k n+1 ak sin bk must similarly be included if aeib is a solution of the auxiliary equation. If
the rk consists of more than one term, each of these terms should be treated separately.
Example 4.6 To solve Yt+2 − Yt+1 − 2Yt = 3 with Y0 = 2 and Y1 = 2, we have that
the auxiliary equation is m2 − m − 2 = 0, i.e. (m − 2)(m + 1) = 0, therefore m1 = 2
and m2 = −1 and the complementary function is
Yt = A(2)t + B(−1)t .
For the particular solution we try Yt = C, a constant. Inserting this value into the
given equation we obtain
C − C − 2C = 3,
and hence C = −3/2.
Using the given ‘initial conditions’ we have
Y0 = 2 = A + B − 3/2
and
Y1 = 2 = 2A − B − 3/2.
Solving these two equations we find that A = 7/3 and B = 7/6. The complete solution is therefore
$$Y_t = \frac{7}{3}(2)^t + \frac{7}{6}(-1)^t - \frac{3}{2}.$$
Note: Since (−1)^t oscillates forever and (2)^t grows ever larger, the solution series for Y_t diverges to infinity in an oscillating manner. The solution series is unstable.
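As an illustration only (not part of the guide), the solution to Example 4.6 can be verified exactly with rational arithmetic:

```python
from fractions import Fraction as F

# Check that Y_t = (7/3)*2^t + (7/6)*(-1)^t - 3/2 satisfies
# Y_{t+2} - Y_{t+1} - 2*Y_t = 3 with Y_0 = Y_1 = 2 (Example 4.6).
def Y(t):
    return F(7, 3) * 2**t + F(7, 6) * (-1)**t - F(3, 2)

assert Y(0) == 2 and Y(1) == 2
for t in range(20):
    assert Y(t + 2) - Y(t + 1) - 2 * Y(t) == 3
print("Solution verified for t = 0, ..., 19")
```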
Y_t = (A + Bt)(−1/2)^t.
Using Y_1 = 1 we get
$$(A + B)\left(-\frac{1}{2}\right) - \frac{2}{9} + \frac{1}{3} = 1,$$
and using Y_2 = 2 we get
$$(A + 2B)\left(-\frac{1}{2}\right)^2 - \frac{2}{9} + \frac{2}{3} = 2.$$
Solving these two equations we get A = −88/9 and B = 72/9 = 8. Thus the complete solution is
$$Y_t = \left(-\frac{88}{9} + 8t\right)\left(-\frac{1}{2}\right)^t - \frac{2}{9} + \frac{t}{3}.$$
This solution series oscillates with decreasing magnitude but eventually comes to behave like t/3, i.e. it grows linearly towards infinity.
Example 4.8 To solve 2Y_{k+2} − Y_{k+1} − Y_k = cos kπ we have that the auxiliary equation is 2m² − m − 1 = 0, i.e. (2m + 1)(m − 1) = 0 and hence the complementary function is
$$Y_k = A\left(-\frac{1}{2}\right)^k + B.$$
For a particular solution we try Y*_k = A_1 cos kπ + A_2 sin kπ. Substitution into the given equation yields
$$2[A_1\cos(k+2)\pi + A_2\sin(k+2)\pi] - [A_1\cos(k+1)\pi + A_2\sin(k+1)\pi] - [A_1\cos k\pi + A_2\sin k\pi] = \cos k\pi.$$
Now sin(k + 2)π = sin kπ, cos(k + 2)π = cos kπ, sin(k + 1)π = −sin kπ and cos(k + 1)π = −cos kπ, so
$$2[A_1\cos k\pi + A_2\sin k\pi] - [-A_1\cos k\pi - A_2\sin k\pi] - [A_1\cos k\pi + A_2\sin k\pi] = \cos k\pi$$
and therefore 2A_1 cos kπ + 2A_2 sin kπ = cos kπ and hence A_1 = 1/2 and A_2 = 0. The general solution is therefore Y_k = A(−1/2)^k + B + (1/2)cos kπ.
We would then solve for A and B using any given ‘initial’ conditions.
Activity 4.2 Solve the following equation and describe the solution series:
Yk − 2Yk−1 − 15Yk−2 = 4k
with Y0 = 0 and Y1 = 4.
4.10 Coupled difference equations
Example 4.9 Suppose that the sequences Y_t and X_t are linked by the following equations, which hold for all t > 0:
$$Y_t = Y_{t-1} + 6X_{t-1} \qquad\text{and}\qquad X_t = Y_{t-1} + 2.$$
In addition, suppose we are informed that Y_0 = 1 and X_0 = 1/6. We can then show that for t > 1,
$$Y_t - Y_{t-1} = 6X_{t-1} = 6(Y_{t-2} + 2)$$
i.e. Y_t − Y_{t-1} − 6Y_{t-2} = 12.
We therefore now have a second order difference equation in Y which can be solved in the usual fashion. Try it! You should get the solution
$$Y_t = 2(3)^t + (-2)^t - 2.$$
problems:
Learning models of consumer behaviour: Data from market surveys can be used to
develop mathematical models of purchasing behaviour. This area of application
overlaps with Markov Models (see Chapter 7).
Financial.
Example 4.10
(Some financial applications of first order difference equations)
First order difference equations are very useful in the mathematics of finance (see,
for example, Chapter 4 in Anthony and Biggs).
Capital accrues under compound interest. Suppose a fixed annual interest rate
100r% is available to investors, and interest is compounded annually. If we invest P
then after t years we have an amount P (1 + r)t . The same result can be derived very
simply via difference equations. If we let Yt be the capital at the end of year t, we
have Y_0 = P and the recurrence (difference) equation Y_t = (1 + r)Y_{t-1}.
Suppose now that, in addition, a fixed amount I is withdrawn at the end of each year, so that Y_t = (1 + r)Y_{t-1} − I.
Now we have a first order difference equation with a = (1 + r) and b = −I. Being well practised in solving such equations we should be able to produce the solution series as:
$$Y_t = \frac{I}{r} + \left(P - \frac{I}{r}\right)(1 + r)^t.$$
This formula enables us to answer a number of questions. First, we might want to
know how large the withdrawals I can be given an initial investment of P , if we
want to be able to withdraw I annually for N years. The condition that nothing is
left after N years is YN = 0, i.e.
$$\frac{I}{r} + \left(P - \frac{I}{r}\right)(1 + r)^N = 0.$$
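Solving this condition for I gives I = P r (1 + r)^N / ((1 + r)^N − 1). The sketch below (the figures for P, r and N are invented purely for illustration) computes this withdrawal and confirms that the fund is exhausted after N years:

```python
# Level withdrawal that exhausts an initial investment P after N years:
#   I = P * r * (1+r)^N / ((1+r)^N - 1), obtained by setting Y_N = 0.
P, r, N = 100_000, 0.05, 20          # assumed example figures

I = P * r * (1 + r)**N / ((1 + r)**N - 1)

# Check by running the recurrence Y_t = (1+r)Y_{t-1} - I forward N years.
Y = P
for _ in range(N):
    Y = (1 + r) * Y - I
print(round(I, 2), round(Y, 6))      # Y is (approximately) zero after N years
```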
4.13 Summary
Difference and differential equations represent an extensive area of mathematics and
there are many types of equations and models which are beyond the scope of this
course. The subject guide tries to show the limitations of the material within the
syllabus. For difference equations you should, for example, note that you do not need to
know third or higher order equations. There is also no need to look at simultaneous
difference equations beyond those covered by Section 4.10.
C + Dk − 2[C + D(k − 1)] − 15[C + D(k − 2)] = −16C + 32D − 16Dk = 4k.
2. We have
$$Y_t = C_t + I_t = 80 + 0.6Y_{t-1} + (Y_{t-1} - Y_{t-2}) = 1.6Y_{t-1} - Y_{t-2} + 80$$
i.e. Y_t − 1.6Y_{t-1} + Y_{t-2} = 80.
Suppose Y_t = m^t is a solution for the reduced equation, then m² − 1.6m + 1 = 0 and we find that m = 0.8 ± 0.6i. This then leads to a solution of the form Y_t = Ar^t cos(θt − ε) where r = √(0.8² + 0.6²) = 1 and θ = cos⁻¹ 0.8 [= 0.6435 radians or 36.87°].
3. (a) We have
$$Y_t = C_t + I_t = \frac{3}{8}Y_{t-1} + 20 + \frac{1}{8}(Y_{t-1} - Y_{t-2}) \;\Rightarrow\; Y_t - \frac{1}{2}Y_{t-1} + \frac{1}{8}Y_{t-2} = 20.$$
So,
$$Y_t = \left(\frac{1}{8}\right)^{t/2}\left(A\cos\frac{\pi t}{4} + B\sin\frac{\pi t}{4}\right).$$
For a particular solution try Y_t = k:
$$k - \frac{1}{2}k + \frac{1}{8}k = 20 \;\Rightarrow\; \frac{5}{8}k = 20 \;\Rightarrow\; k = 32.$$
Hence
$$Y_t = \left(\frac{1}{8}\right)^{t/2}\left(A\cos\frac{\pi t}{4} + B\sin\frac{\pi t}{4}\right) + 32.$$
Using the given initial conditions, A = 1 and
$$32.5 = \left(\frac{1}{8}\right)^{1/2}\left(\cos\frac{\pi}{4} + B\sin\frac{\pi}{4}\right) + 32 \;\Rightarrow\; B = 1.$$
Hence
$$Y_t = \left(\frac{1}{8}\right)^{t/2}\left(\cos\frac{\pi t}{4} + \sin\frac{\pi t}{4}\right) + 32$$
or
$$Y_t = \sqrt{2}\left(\frac{1}{8}\right)^{t/2}\cos\left(\frac{\pi t}{4} - \frac{\pi}{4}\right) + 32.$$
Using the standard formula for the solution to a first order difference equation
we find that
Bt = 500(4/5)t + 500.
Chapter 5
Differential equations
To use examples to indicate how such mathematics are useful for modelling real-life
managerial/economic situations.
To re-establish the reason why you have covered Chapter 3 of the subject guide.
solve problems involving first and (constant coefficient) second order differential
equations
5.5 Introduction
Differential equations might be simply regarded as the continuous equivalent of the
difference equations of Chapter 4. As such we will find that many of the solution
procedures and methods are analogous in both types of equations. As with difference
equations, differential equations are concerned with modelling of dynamic relationships.
You have, in a sense, come across simple differential equations whenever you have
solved integration problems, for example solving dy/dx = xn to get the solution
y = xn+1 /(n + 1) + constant is solving a differential equation. We will, of course, be
concerned with more difficult equations within this chapter.
Definitions
i. The order of the differential equation is that of the highest order derivative
present.
ii. The degree of the differential equation is that of the highest power of the highest
order of derivative present.
Thus the equation
$$\left(\frac{d^3y}{dx^3}\right)^2 + 2\left(\frac{d^2y}{dx^2}\right)^4 + 3xy = 0$$
is a third order, second degree differential equation.
For example, differentiating the family of curves y² = 4Ax gives
$$2y\frac{dy}{dx} = 4A = \frac{y^2}{x}$$
and hence
$$\frac{dy}{dx} = \frac{y}{2x}.$$
This last equation represents a first order, first degree differential equation.
Case I(b)
Case I(c)
Case I(d)
P and Q are functions of x and y but the variables are separable, i.e. we have:
$$\frac{P}{Q} = \frac{f(y)}{g(x)},$$
x3 y + x2 y 2 − 3xy 3 + 6y 4
$$\frac{dy}{dx} = v + x\frac{dv}{dx},$$
Therefore (5.1) becomes:
$$v + x\frac{dv}{dx} = F(v) \;\Rightarrow\; x\frac{dv}{dx} = F(v) - v = \phi(v),$$
say, therefore
$$\frac{dv}{\phi(v)} = \frac{1}{x}\,dx,$$
i.e.
$$\int\frac{dv}{\phi(v)} = \log_e x + \log_e c.$$
We then evaluate the integral, replace v by y/x and simplify.
Therefore,
$$\log_e v - \frac{1}{2}\log_e(2v + 1) = \log_e C - \log_e x$$
becomes
$$\log_e\frac{v^2x^2}{2v+1} = \log_e C^2 = \log_e C_1$$
which becomes
$$y^2x = C_1(2y + x).$$
$$V + X\frac{dV}{dX} = \frac{1-V}{1+V}$$
becomes
$$X\frac{dV}{dX} = \frac{1-2V-V^2}{1+V},$$
i.e.
$$\frac{2(1+V)}{V^2+2V-1}\cdot\frac{dV}{dX} = -\frac{2}{X}.$$
Integrating then gives
$$\log_e(V^2+2V-1) = -2\log_e X + \log_e C$$
which becomes
$$\log_e X^2(V^2+2V-1) = \log_e C$$
that is
$$X^2\left(\frac{Y^2}{X^2} + 2\frac{Y}{X} - 1\right) = C,$$
or
$$Y^2 + 2XY - X^2 = C,$$
i.e.
$$(y-2)^2 + 2x(y-2) - x^2 = C.$$
$$\frac{1}{v}\frac{dv}{dx} = P$$
becomes
$$\log_e v = \int P\,dx$$
so that
$$\frac{d}{dx}\left(y\,e^{\int P\,dx}\right) = Q\,e^{\int P\,dx}$$
and integrating gives
$$y = e^{-\int P\,dx}\left(\int Q\,e^{\int P\,dx}\,dx + c\right).$$
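As an illustration (not from the guide), the integrating-factor formula can be checked symbolically for a concrete choice of P and Q, here P = 1 and Q = x, against SymPy's own solver:

```python
import sympy as sp

x = sp.symbols('x')
y = sp.Function('y')

P, Q = sp.Integer(1), x                    # illustrative choice: dy/dx + y = x

# General solution from the integrating-factor formula derived above.
mu = sp.exp(sp.integrate(P, x))            # e^{integral of P dx}
C = sp.symbols('C')
formula = sp.exp(-sp.integrate(P, x)) * (sp.integrate(Q * mu, x) + C)

# SymPy's own solver for comparison.
sol = sp.dsolve(sp.Eq(y(x).diff(x) + P * y(x), Q), y(x))
print(sp.simplify(formula))                # C*exp(-x) + x - 1
print(sol)                                 # Eq(y(x), C1*exp(-x) + x - 1)
```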
5.7 Second order differential equations
For first order differential equations the constants can be determined by using
initial (or ‘end’) conditions often given in the form y = 0 when x = 0, dy/dx = 1
when x = 2 etc., for example.
For second order differential equations the constants within the PI are determined
directly by substituting the PI and its first and second derivatives into the given
equation and then equating coefficients on left- and right-hand sides of the
differential equation. Then, and only then, can the constants within the
complementary function be determined by evaluating the general solution
(complementary solution + particular integral) for ‘end’ conditions.
$$\frac{dy}{dx} = 2A_3x + A_4 \qquad\text{and}\qquad \frac{d^2y}{dx^2} = 2A_3,$$
and hence, inserting these functions into the given differential equation produces:
Example 5.6 (Where each of the given equations involves only one of the
dependent variables, and can therefore be treated separately.)
Suppose d2 x/dt2 = a and d2 y/dt2 = b. Then
$$x = \frac{at^2}{2} + A_1t + A_2 \qquad\text{and}\qquad y = \frac{bt^2}{2} + B_1t + B_2$$
where a and b are given constants and A1 , A2 , B1 and B2 are constants to be
determined using initial conditions. It is, of course, possible that one or both of the
two given equations are more complicated but for which the methods of Sections 5.6
and 5.7 can be applied.
5.8 Simultaneous differential equations
Example 5.7 Suppose dx/dt = −ay and dy/dt = ax. Eliminating y, say, we have
$$\frac{d^2x}{dt^2} = -a\frac{dy}{dt} = -a^2x$$
and this can be solved using the methods of Section 5.7 to produce solutions of the form x = A cos(at + ε) and y = A sin(at + ε) for constants A and ε to be determined.
(D + b)x + aDy = 0
eDx + (D + f )y = 0.
(1 − ae)D2 y + (b + f )Dy + f by = 0
and (1 − ae)D2 x + (b + f )Dx + f bx = 0
i.e. two ‘straightforward’ second order differential equations which can each be
solved using the methods of Section 5.7.
If c and g are not zero, but are given constants, a particular solution is evidently
x = c/b and y = g/f . The constants of the solutions for x and y are often dependent
upon each other as a consequence of the given initial conditions.
(m2 +b)(m2 +g)−(am2 +c)(f m2 +h) = 0 ⇒ (1−af )m4 +(b+g−ah−cf )m2 +(bg−ch) = 0.
We can then solve this for two solutions of m2 , say m21 and m22 . We then get plus or
minus the square root of each of these as a possible solution for m.
Finally, if appropriate conditions are met so that we have two positive roots for m2 ,
we see that we might have the solutions of the form:
If, instead, we have one positive solution for m² and one negative one, −p² say, we will ultimately finish up with a solution of the form:
5.11 Summary
Following Chapters 4 and 5, and the numerous examples contained within them, you
should now have a good grasp of tackling a wide range of dynamic models. Once again,
however, it is necessary to note that for differential equations there is no need to go
beyond the bounds of the topics in this chapter.
What you do not need to know
Third or higher order equations (although the methods outlined above for second
order equations carry over very well for these more complex equations)
solve problems involving first and (constant coefficient) second order differential
equations
1. Suppose K(t) is the amount of capital at time t, k(t) is the excess of capital over
equilibrium amount Ke and I(t) is the rate of investment at time t. Furthermore,
suppose that a deficiency of capital below a certain equilibrium level Ke leads to an
increase in the rate of capital investment and a surplus of capital leads to a decrease. Suppose that
$$\frac{dk(t)}{dt} = I(t), \qquad k(t) = K(t) - K_e, \qquad \frac{dI(t)}{dt} = -a\,k(t)$$
for some a > 0.
for some a > 0.
(a) Derive a second order differential equation in k(t) and solve it under the
assumption that I(0) = I0 and k(0) = k0 .
(10 marks)
(b) Show why your analysis above is contrary to the experience in many countries that capital can grow indefinitely.
(2 marks)
(c) Suppose now that the above model is changed so that the rate of investment
consists of two parts:
i. an amount depending only on K(t), say rK(t) and
ii. an amount whose rate of change depends upon how much total capital
differs from equilibrium level, i.e. suppose
$$\frac{dI}{dt} = r\frac{dK}{dt} - a(K - K_e)$$
and furthermore that
Ke = K0 ebt .
Create a second order differential equation in K(t) and show that this model
will allow for capital to grow indefinitely.
(8 marks)
$$\frac{d^2w}{dt^2} - 8\frac{dw}{dt} + 16w = 64$$
where w = 7 and dw/dt = 11 when t = 0.
(7 marks)
iii. Draw sketch graphs for (and comment upon the behaviour of) Q(t) for the
three cases where K0 is equal to, or greater than or less than β/α.
(4 marks)
(b) Solve the following second order differential equation
$$\frac{d^2z}{dt^2} - 4\frac{dz}{dt} + 4z = 8(t^2 + \sin 2t)$$
where z = 6 and dz/dt = 2 when t = 0.
(9 marks)
$$-\frac{p}{q}\frac{dq}{dp} = \frac{4p^2}{p^2+1}.$$
If q = 4 when p = 1 determine the demand function qD as a function of p.
(10 marks)
At t = 0, k_0 = c_1 + c_2 and
$$\frac{dk(t)}{dt} = I(t) = -(c_1+c_2)\sqrt{a}\,\sin\sqrt{a}\,t + i(c_1-c_2)\sqrt{a}\,\cos\sqrt{a}\,t$$
so at t = 0, I_0 = i(c_1 − c_2)√a. Hence
$$k(t) = k_0\cos\sqrt{a}\,t + \frac{I_0}{\sqrt{a}}\sin\sqrt{a}\,t = A\sin(\sqrt{a}\,t + \varepsilon)$$
where
$$A = \left(\frac{I_0^2}{a} + k_0^2\right)^{1/2} \qquad\text{and}\qquad \varepsilon = \tan^{-1}\left(\frac{k_0\sqrt{a}}{I_0}\right).$$
i.e. k oscillates with period equal to 2π/√a and amplitude
$$\left(\frac{I_0^2}{a} + k_0^2\right)^{1/2}.$$
$$\frac{d^2k(t)}{dt^2} = \frac{dI}{dt} = r\frac{dK}{dt} - a(K - K_e) \;\Rightarrow\; \frac{d^2k}{dt^2} - r\frac{dK}{dt} + aK = aK_e$$
and
$$\frac{d^2k}{dt^2} = \frac{d^2K}{dt^2} - b^2K_0e^{bt}.$$
Hence we have the following second order differential equation in K:
$$\frac{d^2K}{dt^2} - r\frac{dK}{dt} + aK = (a + b^2)K_0e^{bt}.$$
The auxiliary equation is m² − rm + a = 0 and hence
$$m = \frac{r \pm \sqrt{r^2 - 4a}}{2}$$
giving α and β, say. Then, for a particular solution we try K = De^{bt} and substituting this into the equation we find that
$$D = \frac{(a + b^2)K_0}{b^2 - rb + a}.$$
Hence
$$K(t) = Be^{\alpha t} + Ce^{\beta t} + De^{bt}$$
and, if b > 0 and for suitable constants, K(t) can grow exponentially.
2. For
$$\frac{d^2w}{dt^2} - 8\frac{dw}{dt} + 16w = 64$$
we try w = Ae^{λt} and hence the auxiliary equation is:
$$\lambda^2 - 8\lambda + 16 = 0,$$
i.e. (λ − 4)² = 0, so λ = 4 (a repeated root) and the complementary function is
$$w = Ae^{4t} + Bte^{4t}.$$
For a particular solution we try w = k which means that 16k = 64 and hence k = 4. Thus
$$w = Ae^{4t} + Bte^{4t} + 4.$$
Using w = 7 and dw/dt = 11 when t = 0 gives A + 4 = 7 and 4A + B = 11, i.e. A = 3 and B = −1, so
$$w = 3e^{4t} - te^{4t} + 4.$$
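A quick symbolic check of this answer (my own illustration, not part of the guide):

```python
import sympy as sp

t = sp.symbols('t')
w = 3*sp.exp(4*t) - t*sp.exp(4*t) + 4

# Check w'' - 8w' + 16w = 64 together with w(0) = 7 and w'(0) = 11.
lhs = sp.diff(w, t, 2) - 8*sp.diff(w, t) + 16*w
print(sp.simplify(lhs))                          # 64
print(w.subs(t, 0), sp.diff(w, t).subs(t, 0))    # 7, 11
```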
3. We have
$$\frac{d^2n_1}{dt^2} = -k_2\frac{dn_2}{dt} = k_1k_2n_1 \qquad\text{and}\qquad \frac{d^2n_2}{dt^2} = -k_1\frac{dn_1}{dt} = k_1k_2n_2.$$
Solving these in the usual fashion (Section 5.7) will give the solutions:
$$n_1 = A_1e^{\sqrt{k_1k_2}\,t} + B_1e^{-\sqrt{k_1k_2}\,t} \qquad\text{and}\qquad n_2 = A_2e^{\sqrt{k_1k_2}\,t} + B_2e^{-\sqrt{k_1k_2}\,t}.$$
Using the equations for N_1 and N_2 together with the relationships between the constants A_1, A_2, B_1 and B_2 established above, we can establish that:
$$A_1 = \frac{1}{2}\left(N_1 - \sqrt{\frac{k_2}{k_1}}\,N_2\right), \qquad A_2 = \frac{1}{2}\left(N_2 - \sqrt{\frac{k_1}{k_2}}\,N_1\right),$$
$$B_1 = \frac{1}{2}\left(N_1 + \sqrt{\frac{k_2}{k_1}}\,N_2\right), \qquad B_2 = \frac{1}{2}\left(N_2 + \sqrt{\frac{k_1}{k_2}}\,N_1\right).$$
A company's customers have all gone when n_i = 0, i.e. when
$$e^{2\sqrt{k_1k_2}\,t} = -\frac{B_i}{A_i}.$$
So the smaller the value of −Bi /Ai the less time to die out. Company 2 will be
the first to lose its customers if −B2 /A2 < −B1 /A1 . Inserting the values for
A1 , B1 , A2 and B2 and rearranging will give the required condition that
N12 k1 > N22 k2 .
4. (a) Rewriting the given differential equations using the differential operator D, i.e.
d/dt, then
(D + λ)x − λy = α (1)
and (D + µ)y − µx = β (2)
where α = c2 − λc1 and β = −c3 .
Adding µ times (1) to (2) times (D + λ) gives
(D2 + (µ + λ)D)y = αµ + λβ,
i.e. a straightforward second order differential equation to solve as follows:
The auxiliary equation is m² + (µ + λ)m = 0 and hence m = 0 or −(µ + λ). Hence the complementary function is y = A + Be^{−(µ+λ)t}.
For the particular solution we try y = Ct. Deriving dy/dt and d²y/dt², and inserting them into the differential equation, will give:
$$C = \frac{\alpha\mu + \lambda\beta}{\mu + \lambda}.$$
Thus the general solution is
$$y = A + Be^{-(\mu+\lambda)t} + \frac{\alpha\mu + \lambda\beta}{\mu + \lambda}\,t.$$
Now, from (2), we know that
$$x = \frac{(D + \mu)y - \beta}{\mu} = \frac{1}{\mu}\frac{dy}{dt} + y - \frac{\beta}{\mu}$$
$$= \frac{1}{\mu}\left[-(\mu+\lambda)Be^{-(\mu+\lambda)t} + \frac{\alpha\mu+\lambda\beta}{\mu+\lambda}\right] + A + Be^{-(\mu+\lambda)t} + \frac{\alpha\mu+\lambda\beta}{\mu+\lambda}\,t - \frac{\beta}{\mu}$$
$$= A + \frac{\alpha - \beta}{\mu + \lambda} - \frac{\lambda}{\mu}Be^{-(\mu+\lambda)t} + \frac{\alpha\mu+\lambda\beta}{\mu+\lambda}\,t.$$
Setting y = 0 and x = 0 when t = 0 allows us to solve for A and B to give:
$$A = \frac{(\beta - \alpha)\mu}{(\mu + \lambda)^2} \qquad\text{and}\qquad B = \frac{(\alpha - \beta)\mu}{(\mu + \lambda)^2}.$$
If K_0 > β/α then K = K′e^{αt²/2} + β/α, for K′ a positive constant, and Q = βt + αK′te^{αt²/2}. Hence we have exponential growth on top of a linear relationship between Q and t:
If K_0 < β/α then K = K″e^{αt²/2} + β/α, for K″ a negative constant, and Q = βt + αK″te^{αt²/2}. Hence Q goes to zero at an increasing rate:
z = Ae2t + Bte2t .
Equating coefficients:
t2 : 4C = 8, i.e. C = 2
t1 : −8C + 4D = 0, i.e. D = 4
t0 : 2C − 4D + 4E = 0, i.e. E = 3
sin 2t : −4F + 8G + 4F = 8, i.e. G = 1
cos 2t : −4G − 8F + 4G = 0, i.e. F = 0.
6. We have
$$-\frac{p}{q}\frac{dq}{dp} = \frac{4p^2}{p^2+1} \;\Rightarrow\; \frac{dq}{q} = -\frac{4p}{p^2+1}\,dp$$
and then integrating both sides we find that
$$\ln q = -2\ln(p^2 + 1) + c = \ln\frac{k}{(p^2+1)^2}$$
for some constant k. So q = k/(p2 + 1)2 and using the initial condition we have
4 = k/(1 + 1)2 and hence k = 16 and q = 16/(p2 + 1)2 .
Chapter 6
Further applications of matrices
To extend the matrix theory covered in earlier 100 courses to cover determinants,
eigen values and eigen vectors (these additional tools are required to understand
aspects of Chapter 8).
draw networks from given matrices and construct matrices to represent given
networks
construct and use transition matrices, e.g. to determine long-run equilibrium state
probabilities
determine the determinant, eigen values and eigen vectors of given matrices.
6.6 Input-output economics
Figure 6.1: A three-product input-output diagram.
Example 6.1 The technology matrix for a three-industry input-output model is:
$$A = \begin{pmatrix}0.5 & 0 & 0.2\\ 0.2 & 0.8 & 0.12\\ 1.0 & 0.4 & 0\end{pmatrix}$$
and
$$(I - A)^{-1}b = \begin{pmatrix}7.6 & 4 & 2\\ 16 & 15 & 5\\ 14 & 10 & 5\end{pmatrix}\begin{pmatrix}5\\ 3\\ 4\end{pmatrix} = \begin{pmatrix}58\\ 145\\ 120\end{pmatrix}$$
and therefore the necessary production amounts for the three commodities are 58,
145 and 120 units, respectively.
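The arithmetic of Example 6.1 can be reproduced with NumPy; in this sketch (an illustration only) the final demand vector (5, 3, 4) is taken from the product shown above:

```python
import numpy as np

# Technology matrix and final demand from Example 6.1.
A = np.array([[0.5, 0.0, 0.20],
              [0.2, 0.8, 0.12],
              [1.0, 0.4, 0.00]])
b = np.array([5.0, 3.0, 4.0])

x = np.linalg.solve(np.eye(3) - A, b)   # production levels x = (I - A)^{-1} b
print(np.round(x))                      # [ 58. 145. 120.]
```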
6.7 Networks
An increasingly important area of management mathematics/operational research is the
use of graph theory and network theory in modelling realistic situations. Central to
these ideas is the use of matrices to represent interrelationships. Having depicted the
situation with matrices, a computer can then be used to solve specific problems, e.g.
transportation problems, transshipment problems, allocation problems, maximum flow
between two points, optimal company configuration for maximal communication,
minimum cost of depot locations, critical path (minimum completion time) for a
project, etc. We are clearly not going to be heavily involved in these areas within this
course. However, this chapter is intended to highlight the usefulness of matrices in
extracting the mathematical relationships from a given diagram or given situation. You
should appreciate how the resulting matrices can be manipulated.
Example 6.2
(Representation of a network as a matrix)
Suppose we have Figure 6.2 where nodes 1 to 5 are connected by arcs along which
something (money, water, electricity, goods, chemicals, information, etc.) flows. The
number alongside each directed arc might represent the flow, the arc capacity, the
cost per unit flow, etc.
We can then represent this diagram by the following matrix, where the (i, j)th
element corresponds to the ‘value’ on the arc connecting node i to j:
0 0 2 0 0
6 0 0 0 0
3 4 0 5 0 .
0 0 0 0 4
3 0 1 2 0
(b) So if x5 = 400,
$$\begin{pmatrix}1&0&0&1\\1&1&0&0\\0&1&1&0\\0&0&1&1\end{pmatrix}\begin{pmatrix}x_1\\x_2\\x_3\\x_4\end{pmatrix}=\begin{pmatrix}800\\1200\\1200\\800\end{pmatrix}.$$
Solving using matrix methods,
$$\left(\begin{array}{cccc|cccc}1&0&0&1&1&0&0&0\\0&1&0&-1&-1&1&0&0\\0&0&1&1&1&-1&1&0\\0&0&1&1&0&0&0&1\end{array}\right)$$
gives
$$\left(\begin{array}{cccc|cccc}1&0&0&1&1&0&0&0\\0&1&0&-1&-1&1&0&0\\0&0&1&1&1&-1&1&0\\0&0&0&0&-1&1&-1&1\end{array}\right).$$
At this point the last row of the matrix equation tells us we have not got a unique solution. It tells us, however, that:
x1 + x4 = 800 (1)
x2 − x4 = 400 (2)
x3 + x4 = 800 (3)
with
0 ≤ x1 ≤ 800
400 ≤ x2 ≤ 1200
0 ≤ x3 ≤ 800
0 ≤ x4 ≤ 800.
(c) If x2 + x4 = 1400, then continuing with matrix methods with this new additional last line to find an inverse and a solution vector:
$$\left(\begin{array}{cccc|cccc}1&0&0&1&1&0&0&0\\0&1&0&-1&-1&1&0&0\\0&0&1&1&1&-1&1&0\\0&0&1&1&1&-1&1&0\\0&1&0&1&0&0&0&1\end{array}\right)$$
becomes
$$\left(\begin{array}{cccc|cccc}1&0&0&1&1&0&0&0\\0&1&0&-1&-1&1&0&0\\0&0&1&1&1&-1&1&0\\0&0&1&1&1&-1&1&0\\0&0&0&2&1&-1&0&1\end{array}\right)$$
which becomes
$$\left(\begin{array}{cccc|cccc}1&0&0&0&1/2&1/2&0&-1/2\\0&1&0&0&-1/2&1/2&0&1/2\\0&0&1&0&1/2&-1/2&1&-1/2\\0&0&0&1&1/2&-1/2&0&1/2\end{array}\right)$$
and so
$$x=\begin{pmatrix}1/2&1/2&0&-1/2\\-1/2&1/2&0&1/2\\1/2&-1/2&1&-1/2\\1/2&-1/2&0&1/2\end{pmatrix}\begin{pmatrix}800\\1200\\1200\\1400\end{pmatrix}=\begin{pmatrix}300\\900\\300\\500\end{pmatrix}.$$
In order to find the equilibrium probabilities (the limiting case of p(t) as t gets larger and larger) we can proceed as follows. Suppose the equilibrium probabilities are (p_1, p_2, p_3); then
$$(p_1, p_2, p_3) = (p_1, p_2, p_3)\begin{pmatrix}0.6 & 0.3 & 0.1\\ 0.2 & 0.7 & 0.1\\ 0.1 & 0.3 & 0.6\end{pmatrix}.$$
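One way to solve the equilibrium equation numerically (a sketch, not part of the guide) is to replace one of the redundant equations by the normalisation condition that the probabilities sum to one:

```python
import numpy as np

P = np.array([[0.6, 0.3, 0.1],
              [0.2, 0.7, 0.1],
              [0.1, 0.3, 0.6]])

# Solve p = pP together with p1 + p2 + p3 = 1 as a linear system.
M = np.vstack([(P.T - np.eye(3))[:-1], np.ones(3)])
rhs = np.array([0.0, 0.0, 1.0])
p = np.linalg.solve(M, rhs)
print(p)        # equilibrium probabilities, here (0.3, 0.5, 0.2)
print(p @ P)    # unchanged by a further transition
```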
In certain texts, for example Dowling, you will come across the concepts of eigen values,
characteristic roots, eigen vectors, etc. These can prove very useful in certain aspects of
matrix manipulation and analysis of matrix-based models. Although these concepts will
not be examined as a separate topic of the course, they will be useful in Chapter 8 when
we discuss factor analysis and discriminant analysis. Hence a summary of the main
aspects of eigen values and their associated eigen vectors is given here. Unfortunately
this does mean that, although throughout MT2076 Management mathematics it
has been our aim to avoid the necessity for determinants wherever possible, it is
appropriate to mention them here. Some uses of eigen values and eigen vectors will
become apparent in Chapter 8. Further details of the mechanics of calculating
determinants, eigen values and eigen vectors can be found in Johnson and Wichern,
Chapter 2.
6.9.1 Determinants
Example 6.5 Suppose A is the 2 × 2 matrix
$$A = \begin{pmatrix}4 & 3\\ 1 & 2\end{pmatrix},$$
then
$$|A| = 4|2|(-1)^2 + 3|1|(-1)^3 = 4(2) - 3(1) = 5.$$
Example 6.6 Suppose
$$A = \begin{pmatrix}3 & 2 & -1\\ 0 & 3 & 2\\ 7 & 1 & 0\end{pmatrix}.$$
Then, expanding along the first row,
$$|A| = 3(-1)^2\begin{vmatrix}3 & 2\\ 1 & 0\end{vmatrix} + 2(-1)^3\begin{vmatrix}0 & 2\\ 7 & 0\end{vmatrix} + (-1)(-1)^4\begin{vmatrix}0 & 3\\ 7 & 1\end{vmatrix}$$
$$= 3(-2) - 2(-14) - 1(-21) = -6 + 28 + 21 = 43.$$
Example 6.7 Suppose
$$A = \begin{pmatrix}9 & 1\\ 1 & 9\end{pmatrix};$$
then the eigen values of A are the λs that satisfy
$$\begin{vmatrix}9-\lambda & 1\\ 1 & 9-\lambda\end{vmatrix} = 0,$$
i.e. (9 − λ)(9 − λ) − 1 = 0, i.e. 81 − 18λ + λ² − 1 = 80 − 18λ + λ² = (8 − λ)(10 − λ) = 0. Hence the eigen values are λ₁ = 8 and λ₂ = 10.
Example 6.8 Continuing with Example 6.7, the eigen vector corresponding to λ₁ = 8 is
$$x = \begin{pmatrix}x_1\\ x_2\end{pmatrix}$$
such that
$$\begin{pmatrix}9 & 1\\ 1 & 9\end{pmatrix}\begin{pmatrix}x_1\\ x_2\end{pmatrix} = 8\begin{pmatrix}x_1\\ x_2\end{pmatrix},$$
i.e. 9x₁ + x₂ = 8x₁ and x₁ + 9x₂ = 8x₂. So setting x₂ = 1 (arbitrarily) gives x₁ = −1 and x = (−1, 1)′ or, in its unit length
(normalised) form
$$\begin{pmatrix}-1/\sqrt{2}\\ 1/\sqrt{2}\end{pmatrix}.$$
The normalised eigen vector corresponding to the eigen value λ₂ = 10 is
$$\begin{pmatrix}1/\sqrt{2}\\ 1/\sqrt{2}\end{pmatrix}.$$
A further noteworthy fact concerning eigen values and eigen vectors is the spectral
decomposition:
If A is a symmetric k × k matrix, then A can be decomposed using its eigen values
λ1 , λ2 , . . . , λk and corresponding eigen vectors x1 , x2 , . . . , xk as follows:
$$A = \sum_{i=1}^{k}\lambda_i x_i x_i'.$$
We will find (in Chapter 8) how this is useful when A is a covariance matrix.
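A short NumPy check of Examples 6.7 and 6.8 and of the spectral decomposition (an illustration only, not part of the guide):

```python
import numpy as np

A = np.array([[9.0, 1.0],
              [1.0, 9.0]])

eigvals, eigvecs = np.linalg.eigh(A)     # eigh: eigen decomposition for symmetric matrices
print(eigvals)                           # [ 8. 10.]
print(eigvecs)                           # columns are the normalised eigen vectors

# Spectral decomposition: A = sum_i lambda_i x_i x_i'
recon = sum(eigvals[i] * np.outer(eigvecs[:, i], eigvecs[:, i]) for i in range(2))
print(np.allclose(recon, A))             # True
```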
6.10 Matrices in linear programming
Many relationships in economic and management models take the form of linear
equations or, more usually, linearly constrained functions. Before the advent of personal
computers, etc. reasonably difficult linear programs were solved by hand using what is
known as the simplex algorithm. This involves creating equalities from possibly given
inequalities by adding in slack variables. The resulting set of equations is then solved
using the algorithm which really amounts to matrix manipulation. You should
appreciate the usefulness of matrices in the area and, furthermore, recognise that
computer-based software often uses matrix arrays, etc. in a similar fashion.
6.11 Summary
What you need to know
How to construct input-output models and how to solve them using matrix
methods.
The relationship between networks and matrices (in both directions); how to
construct matrix descriptions of various problems.
How to construct and solve transition probability problems; how to evaluate the
equilibrium probabilities.
A basic understanding of how matrices can be used to solve linear programs; the
power and limitations of such models.
The assumptions that are required for each of the above mathematical models.
How to solve linear programs (which can be done using matrix methods or
graphically).
draw networks from given matrices and construct matrices to represent given networks
construct connectivity matrices and use them effectively
construct and use transition matrices, e.g. to determine long-run equilibrium state
probabilities
determine the determinant, eigen values and eigen vectors of given matrices.
(c) If it costs $10,000 per period when the production is inoperative (both
machines down) and zero otherwise, what is the expected average cost per
period when (i.) q = 0.1, and (ii.) q = 0.5?
(4 marks)
2. (a) Given the following matrix of technical coefficients for products X, Y and Z:
(rows and columns ordered X, Y, Z)
$$A = \begin{pmatrix}0.1 & 0.1 & 0.2\\ 0.1 & 0.1 & 0.1\\ 0.1 & 0.3 & 0.1\end{pmatrix}.$$
Determine the changes in total output for the three products when the final
demand for X rises by 2,000 and the final demand for Z falls by 1,600 units
simultaneously.
(12 marks)
(b) The following matrix shows the number of different ‘one-stage’ (i.e. visiting no
cities en route) journeys between cities A, B, C and D.
(rows and columns ordered A, B, C, D)
$$A = \begin{pmatrix}1 & 1 & 0 & 0\\ 0 & 0 & 1 & 0\\ 1 & 0 & 0 & 1\\ 0 & 0 & 1 & 0\end{pmatrix}.$$
(b) Identify all absorbing states, and determine which (if any) of the Markov
chains are absorbing. [Note: A Markov chain is said to be absorbing if it has at
least one absorbing state and if it is possible to go from every state to an
absorbing state (not necessarily in one step)].
(6 marks)
(c) If, for M1 you are initially in the state D, use matrix methods to determine
how many ways there are of going to states A, B and C in three or less
transitions.
(6 marks)
(d) If you are initially in state D for transition matrix M2 , use matrix
multiplication to determine the probability of being in state D after two
transitions.
(4 marks)
(rows and columns ordered over the states 0, 1, 2, 3)
$$A = \begin{pmatrix}1-q & q & 0 & 0\\ 0 & 0 & q & 1-q\\ 0 & 0 & 0 & 1\\ 1-q & q & 0 & 0\end{pmatrix}.$$
π0 (1 − q) + π3 (1 − q) = π0 (1)
π0 q + π3 q = π1 (2)
π1 q = π2 (3)
π1 (1 − q) + π2 = π3 (4)
substituting from (3) into (4) shows that π1 = π3 and then using this fact with
(2) shows that: π3 = (q/(1 − q))π0 .
We also know that π0 + π1 + π2 + π3 = 1 and hence substituting for everything
in terms of π0 and solving gives π0 = (1 − q)/(1 + q + q 2 ) and hence
π1 = q/(1 + q + q 2 ), π2 = q 2 /(1 + q + q 2 ) and π3 = q/(1 + q + q 2 ).
(c) Expected average cost is
$$\pi_0\cdot 0 + \pi_1\cdot 0 + \pi_2\cdot 10000 + \pi_3\cdot 0 = \frac{10000q^2}{1+q+q^2}.$$
Hence if q = 0.1, average cost per period is 100/1.11 = $90.09 and, if q = 0.5,
average cost is 2500/1.75 = $1428.57.
Now I − A is
0.9 −0.1 −0.2
−0.1 0.9 −0.1
−0.1 −0.3 0.9
and inverting (by any method) will eventually produce the inverse (I − A)−1 as
1.168 0.225 0.284
0.150 1.183 0.165 .
0.180 0.419 1.198
A B C D
A 1 1 1 0
B 1 0 0 0
C 1 1 1 0
D 1 0 0 1
3. (a) Since each row of these matrices must add up to 1 (it is obvious that we are
‘going from row to column’ here rather than vice versa) then α = 0.2, β = 1
and γ = 0.1.
(b) M1 has no absorbing state and hence is not an absorbing Markov chain.
M2 has one absorbing state (B) but one cannot reach it from all (indeed any)
other states. Hence it is not an absorbing Markov chain.
M3 has two absorbing states (B and D) and states A and C can reach both of
them (one would be enough!). Hence M3 is an absorbing Markov chain.
(c) The connectivity matrix, C₁ say, is:
$$C_1 = \begin{pmatrix}1 & 1 & 1 & 0\\ 1 & 0 & 0 & 0\\ 0 & 1 & 0 & 1\\ 0 & 1 & 1 & 0\end{pmatrix}.$$
For two transitions we have the following number of routes
$$C_2 = C_1^2 = \begin{pmatrix}2 & 2 & 1 & 1\\ 1 & 1 & 1 & 0\\ 1 & 1 & 1 & 0\\ 1 & 1 & 0 & 1\end{pmatrix}$$
and for three transitions the number of routes is
$$C_3 = C_1^3 = \begin{pmatrix}4 & 4 & 3 & 1\\ 2 & 2 & 1 & 1\\ 2 & 2 & 1 & 1\\ 2 & 2 & 2 & 0\end{pmatrix}$$
and the total number of ways of going between states in three or fewer transitions is
$$C_1 + C_1^2 + C_1^3 = C_1 + C_2 + C_3 = \begin{pmatrix}7 & 7 & 5 & 2\\ 4 & 3 & 2 & 1\\ 3 & 4 & 2 & 2\\ 3 & 4 & 3 & 1\end{pmatrix}.$$
It is the final row elements 3, 4 and 3 which we require.
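Assuming the connectivity matrix C₁ as reconstructed above, the matrix powers can be checked with NumPy (an illustration only):

```python
import numpy as np

# Connectivity matrix C1 (states A, B, C, D) as given in the guidance above.
C1 = np.array([[1, 1, 1, 0],
               [1, 0, 0, 0],
               [0, 1, 0, 1],
               [0, 1, 1, 0]])

C2 = C1 @ C1          # numbers of two-transition routes
C3 = C2 @ C1          # numbers of three-transition routes
total = C1 + C2 + C3  # routes in three or fewer transitions
print(total)
print(total[3, :3])   # from D to A, B, C: [3 4 3]
```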
(d) We have
$$M_2 = \begin{pmatrix}0.3 & 0 & 0.2 & 0.5\\ 0 & 1 & 0 & 0\\ 0.5 & 0 & 0.1 & 0.4\\ 0.2 & 0 & 0.5 & 0.3\end{pmatrix}$$
so, starting from state D,
$$(0, 0, 0, 1)M_2 = (0.2, 0, 0.5, 0.3) \qquad\text{and}\qquad (0.2, 0, 0.5, 0.3)M_2 = (0.37, 0, 0.24, 0.39).$$
Hence the required probability of being in state D after two transitions is 0.39.
Chapter 7
Markov chains and stochastic
processes
To explain and establish the wide usefulness of the fundamental stochastic process
models of:
i. the ‘simple random walk’
ii. the ‘gambler’s ruin’
iii. the Poisson process
iv. the ‘birth and death’ process.
To establish the main terminology and notation for various queuing models.
fully derive and use the Poisson distribution from the Poisson process assumptions
fully derive and use the general solution for the gambler’s ruin problem
fully derive and use the general solution for the birth and death model
establish how various types of queues can be solved using stochastic processes.
7.4 Introduction
Stochastic processes deal with systems which develop in space or time according to
some probabilistic laws. They attempt to describe (and predict) the behaviour of the
system in some mathematical way. They have wide-ranging applications including:
(a) Risk theory (e.g. a mathematical analysis of the random fluctuations in the capital
of an insurance company).
(b) Models for social and labour mobility. Research in movements between social
groups, occupation groups, etc. in order to correlate such movements with other
factors affecting the composition of society. The aim is to have a verified model
that can be used in predicting the composition of the social groups in the future in
order to match this with forecasted requirements for occupational groups, etc.
Recent studies have developed the ideas of the dynamic model of social structure.
(e) Models for population growth; birth and death models, etc.
7.5 Some definitions of stochastic processes
Let X(t), Y (t) etc. denote the properties of the system at time t, e.g. the number of
people waiting in a queue at time t; the hours of sunshine on day t, etc.
The ‘state of the system’ at time t1 is the value of X(t1 ).
The ‘state space’ is the total sample space (set of all possible values) of X(t), sometimes
denoted as {X(t)}.
The ‘stochastic process’ comprises the random variables X(t), Y (t),. . . and their
probability laws.
There are several types of possible systems depending upon whether we have discrete or
continuous time, discrete or continuous random variables.
A realisation of the process is X(t) plotted against t.
[Note: For discrete time we usually write Xn (n = 0, 1, 2, . . .) rather than X(t).]
7.6 A simple random walk
Usually the process has added complications such as reflecting barriers or absorbing
barriers, e.g. ‘Brownian movement’, ‘Gambler’s ruin’.
Suppose the particle starts at j, unrestricted by barriers.
Let Zi be the jump at ith step, i.e. Zi = ±1, then P (Zi = 1) = p and P (Zi = −1) = q.
Hence the expected value of Zi , E(Zi ) = p − q.
Z1 , Z2 , . . . , Zn give a sequence of identically distributed independent random variables.
The position at time n is Xn = Xn−1 + Zn = j + Z1 + Z2 + · · · + Zn .
So,
$$A = \frac{1}{1 - (q/p)^a} \qquad\text{and}\qquad B = -A
$$
and hence
$$\theta_j = \frac{1 - (q/p)^j}{1 - (q/p)^a}.$$
For q = p the solution has the form θ_j = A + Bj. Using the boundary conditions gives A = 0 and B = 1/a and hence θ_j = j/a.
In a similar fashion we can show that the probability that A is ruined is:
$$\frac{(q/p)^j - (q/p)^a}{1 - (q/p)^a} \quad\text{for } q \neq p, \qquad\text{or}\qquad \frac{a-j}{a} \quad\text{for } q = p.$$
Note that, in the long run, one of the players is ruined.
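A simulation can be used to check the formula for θ_j; the sketch below (with arbitrary illustrative values of j, a and p, not taken from the guide) compares a Monte Carlo estimate with the closed form:

```python
import random

def prob_reach_a_before_0(j, a, p, trials=100_000):
    """Estimate theta_j: the probability the walk started at j reaches a before 0."""
    wins = 0
    for _ in range(trials):
        x = j
        while 0 < x < a:
            x += 1 if random.random() < p else -1
        wins += (x == a)
    return wins / trials

j, a, p = 3, 10, 0.55          # illustrative values
q = 1 - p
theory = (1 - (q/p)**j) / (1 - (q/p)**a)
print(round(prob_reach_a_before_0(j, a, p), 3), round(theory, 3))
```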
(b) If p(n) tends to a limiting distribution Π, say, then Π satisfies the equation
Π = Π · P (the equilibrium equation). (See Section 6.8.)
i.e. rearranging
$$\frac{p_n(t + \delta t) - p_n(t)}{\delta t} = -\lambda p_n(t) + \lambda p_{n-1}(t) + \frac{o(\delta t)}{\delta t}.$$
Letting δt → 0 gives
$$\frac{dp_n(t)}{dt} = -\lambda p_n(t) + \lambda p_{n-1}(t) \quad\text{for } n \geq 1$$
while
$$\frac{dp_0(t)}{dt} = -\lambda p_0(t).$$
Solving these equations gives
$$p_n(t) = \frac{(\lambda t)^n}{n!}e^{-\lambda t}$$
i.e. the Poisson distribution with parameter λt.
λ is the rate of the process and the probability of waiting more than T units of time is
given by e−λT (the negative exponential distribution).
Birth and birth and death processes can be solved in a similar fashion to the above (see
Sample examination question 2 at the end of the chapter).
the queue discipline, i.e. the rules of queuing and the way in which a customer is
selected from the queue, e.g. FIFO (First in First Out) or LIFO (Last in First Out)
the service mechanism (i.e. the laws governing the service time e.g. a normal
distribution, a negative exponential distribution, etc.)
mean and distribution of server’s busy times (i.e. length of time of continuous
working)
7.10 Summary
What you need to know
How to solve random walk type problems with or without one or two absorbing
barriers.
How to construct transition matrices and solve them using Chapman-Kolmogorov equations.
(12 marks)
7.13 Guidance on answering the Sample examination questions
where r = 1 − p − q.
(b) ‘Random walk’ since, other than at barriers, movement is independent of
previous movements.
‘Reflecting barriers’ because the particle cannot pass through them, nor is it
absorbed, but the barrier allows the particle to be reflected back into the 1 to
a − 1 states.
(c) Assume the equilibrium probability of state i is π_i, i = 0, 1, 2, . . . , a. Then
$$(\pi_0, \pi_1, \pi_2, \pi_3, \ldots, \pi_a) = (\pi_0, \pi_1, \pi_2, \pi_3, \ldots, \pi_a)\cdot T$$
which gives
$$\pi_0 = (1-p)\pi_0 + q\pi_1$$
$$\pi_1 = p\pi_0 + r\pi_1 + q\pi_2$$
$$\pi_2 = p\pi_1 + r\pi_2 + q\pi_3$$
$$\vdots$$
$$\pi_{a-1} = p\pi_{a-2} + r\pi_{a-1} + q\pi_a$$
$$\pi_a = p\pi_{a-1} + (1-q)\pi_a.$$
Working through these equations in turn we find that π₁ = (p/q)π₀ from the first equation. When substituted into the next it gives π₂ = (p/q)²π₀ and so on. In general,
$$\pi_k = \left(\frac{p}{q}\right)^k\pi_0.$$
Now,
$$\pi_0 + \pi_1 + \cdots + \pi_a = 1 \;\Rightarrow\; \pi_0\left(1 + \frac{p}{q} + \left(\frac{p}{q}\right)^2 + \cdots + \left(\frac{p}{q}\right)^a\right) = 1$$
$$\Rightarrow\; \pi_0\,\frac{1 - (p/q)^{a+1}}{1 - p/q} = 1 \;\Rightarrow\; \pi_0 = \frac{1 - p/q}{1 - (p/q)^{a+1}}.$$
2. (a) i. We have
or, for n = 0,
So rearranging:
$$\frac{p_n(t + \Delta t) - p_n(t)}{\Delta t} = -(\lambda + \mu)p_n(t) + \lambda p_{n-1}(t) + \mu p_{n+1}(t)$$
$$\frac{p_0(t + \Delta t) - p_0(t)}{\Delta t} = -\lambda p_0(t) + \mu p_1(t).$$
Hence in steady state, i.e. letting ∆t → 0 and noting that the left-hand sides of the above equations are differentials and will go to zero in steady state (and dropping the parameter t), we get:
$$p_1 = \frac{\lambda}{\mu}p_0 = \rho p_0$$
and, in general,
$$p_n = \rho^n p_0.$$
Then, since
$$\sum_{i=0}^{\infty} p_i = 1 \;\Rightarrow\; p_0(1 + \rho + \rho^2 + \cdots) = 1 \;\Rightarrow\; p_0 = \frac{1}{1/(1-\rho)} = 1 - \rho$$
so long as ρ < 1. Hence p_n = ρ^n(1 − ρ) as required.
ii. The expected queue length is
$$L = \sum_{n=0}^{\infty} np_n = \sum n(1-\rho)\rho^n = (1-\rho)\rho\sum\frac{d}{d\rho}(\rho^n) = (1-\rho)\rho\,\frac{d}{d\rho}\left(\sum\rho^n\right)$$
$$= (1-\rho)\rho\,\frac{d}{d\rho}\left(\frac{1}{1-\rho}\right) = (1-\rho)\rho\,\frac{1}{(1-\rho)^2} = \frac{\rho}{1-\rho} = \frac{\lambda/\mu}{1-\lambda/\mu} = \frac{\lambda}{\mu - \lambda}.$$
(b) If the mean inter-arrival time for guests is 15 minutes then λ = 1/15 and,
similarly, µ = 1/10. Hence ρ = λ/µ = 2/3. Substituting into the formulae
above we obtain p0 = 1/3 = 0.333, p1 = 2/9 = 0.222, p2 = 4/27 = 0.148 and
p3 = 8/81 = 0.099.
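These figures follow directly from p_n = ρ^n(1 − ρ); a minimal check (my own illustration, not part of the guide) is:

```python
# M/M/1 queue with mean inter-arrival time 15 minutes and mean service time
# 10 minutes, as in the example above.
lam, mu = 1/15, 1/10
rho = lam / mu                      # traffic intensity, 2/3

p = [(1 - rho) * rho**n for n in range(4)]
L = rho / (1 - rho)                 # expected queue length, lambda/(mu - lambda)

print([round(x, 3) for x in p])     # [0.333, 0.222, 0.148, 0.099]
print(round(L, 3))                  # 2.0
```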
(c) The service rate would probably need to be remodelled since there may or may
not be a server (taxi) to serve the queue. There may also be a build-up of taxis
awaiting passengers.
Chapter 8
Stochastic modelling, multivariate
models
To introduce the typical multivariate data used in multivariate models – such data will also be widely used in Chapters 10 and 11.
8.4 Introduction
In any business application, large amounts of data are encountered. Examples include
the response to market research questionnaires, the share prices quoted in the Financial
Times or Wall Street Journal, and the map references of customer locations. In order to
make more effective decisions, managers need to understand the information contained
in the data. Most data sets can be envisaged in the form of a table, where the columns
are variables and the rows are observations. For example, the raw data from a market
research survey could be presented in tabular form as follows.
This represents the answers of n respondents to p questions and the answer of the ith
respondent to the jth question is Xij .
If a financial fund manager wanted to examine the performance of the companies within
their fund over a period of time, they could tabulate the necessary data in the following
way.
In this case, Xij represents the share price of the jth company at the end of the ith
time period.
The map references of the customers could be tabulated as follows.
Here, variable 1 would be longitude, variable 2 would be latitude and the remaining variables would describe details about the customer's site, such as size of storage facility.
In each of these examples, each observation (respondent, time period or customer) is
multivariate; that is, more than one variable (question, company or map reference) is
needed to describe it. The behaviour of these multivariate observations is typically
random or stochastic (the words are synonymous). For example, if we knew how one
respondent had answered 29 out of 30 questions, we would not be able to guarantee an
accurate prediction of the remaining answer. The knowledge of the other answers would
probably improve the prediction that we would otherwise have made, but there will be
a random component in the response that will prevent perfect prediction.
If we think of a univariate (i.e. single) random variable, its random behaviour is
characterised by a probability function, such as the Normal distribution (see Figure
8.1). In an exactly similar way, the behaviour of a multivariate random variable (such as
a set of answers to a market research questionnaire) is characterised by a multivariate
probability function. A bivariate distribution such as a bivariate Normal can be drawn
(see Figure 8.2). Here the two variables are called x1 and x2 , the probability function is
denoted by f (x1 , x2 ). Although the probability function cannot be drawn for more than
two variables the principle remains the same.
There are many statistical techniques designed to analyse multivariate data. Within this
subject we will only be able to look at a small selection, which has been chosen on the
grounds of greater managerial relevance.
In order to make more informed decisions and to answer ‘what if’ questions, managers
may want to estimate a particular variable (such as sales per month). This is a
regression type problem as one variable (the dependent variable) is explained in terms
of the other variables (the independent variables).
Chapters 9 and 10 on forecasting and econometrics are concerned with regression type
problems, although formal regression estimation may not be used. The additional
structure imposed on the data in these chapters is that the data are time series, i.e. the
observations are taken sequentially over time.
In other situations, often with survey data, the concern is with the number of variables
used to describe the phenomenon observed. For example, a questionnaire investigating
shoppers’ preference for washing-up liquid may have 30 questions, but, intuitively, you
might suggest that there are far fewer underlying concepts describing this preference.
For example these concepts may be:
cheapness/dearness
effectiveness at cleaning
kindness to hands
attractiveness (colour/smell/packaging).
The problem here is one of data reduction: reducing the number of variables observed to
a set of more basic concepts or factors. Often the variables measured fall into groups of
similarly behaving variables. The methods described in Chapter 11 on exploratory data
analysis are designed to gain familiarity with the data. The methods discussed are
designed to detect inter-relationships between variables, to see how similarly pairs of
variables behave. Emphasis is placed on graphical methods, especially in the initial
stages, as they offer a quick and easy way of looking for structure in the data (and as a
means of detecting rogue values due to mis-typing). A method that can be used to
detect groups of similarly behaving variables is cluster analysis (also covered in Chapter
11). Cluster analysis can also be used to group similar observations together.
Other important multivariate methods address specific problems.
Factor analysis constructs a reduced number, say p, of factors to replace the m observed
variables, where p is less than m. The emphasis in factor analysis is on the construction
of easily interpreted factors. This technique is more powerful than the use of cluster
analysis to group variables in many ways; one is that factor analysis allows the factors
to be quantified.
Classification methods such as discriminant analysis and logistic regression are designed
to put the multivariate observations into categories. These are regression type problems
in that the dependent variable is the category the observation belongs to. For example,
banks may collect data about customers to whom they have lent money (such as salary,
family size, time spent at current address, number of credit cards held, etc.); the
relevant categories of interest will be whether the borrower repaid the loan or not. In
this type of application, a database of existing borrowers would be used to calibrate the
model. This model would then be used to predict whether new potential borrowers were
good or poor risks.
8.5 Principal component factor analysis
effectiveness
kindness to hands.
The questionnaire will be designed to shed some different lights on the shoppers’
perceptions of these considerations.
The number of really different considerations can be called the dimensionality of the
data. A common objective is to identify the dimensionality of a particular set of data.
An analysis of the data collected ought to indicate how many dimensions adequately
describe the shopper’s decision process.
A common problem in the collection of multivariate data is that the variables measured
are correlated with each other. This problem is often encountered in regression analysis,
when there is multi-collinearity in the data (see Chapter 10).
If the data are plotted for two variables, the scatter diagram shows a cluster of points
falling into an elliptical envelope; in more dimensions this shape is called an ellipsoid.
The shape of the ellipsoid gives information about the strength of the relationship
between the variables. The more the ellipsoid departs from a sphere, the stronger the
relationship. The information contained in the shape of the ellipsoid can be described in
terms of the direction and magnitude of the axes of the figure. The attractions of these
axes are that they:
indicate the relative importance of axes; the longest axis has the largest variability
along its length.
Principal components analysis identifies these axes (principal components). The
analysis uses the correlation matrix between the variables as its input. Using eigen value
analysis (see Section 6.9) to decompose the correlation matrix, the associated eigen
vectors become the principal components. They are produced so that they have the
properties of orthogonality mentioned. In addition, they are produced in order of
variation explained. Each component is composed of a linear combination of the original
variables. In summary, the objective of principal components analysis is to transform
the variables measured to a set of uncorrelated linear combinations of these variables.
The data reduction we often wish to undertake might be thought of as being two steps.
The first step (principal components analysis) changes the m correlated variables to m
uncorrelated variables. The second step (factor analysis) aims to reduce the m
uncorrelated components into p (where p < m) uncorrelated factors.
Figure 8.3: A multiple scatter diagram for brake horsepower, top speed and engine size
of UK cars.
The principal components analysis output from SPSS (Statistical Package for the Social
Sciences) is summarised below.
Correlation matrix:
Factor matrix:
Note: In order to describe the factors as functions of the variables, the correlation
matrix (in the top of the above tableau), output by SPSS must be inverted.
Final statistics:
The first principal component (factor 1), for example, is composed of a weighted sum of
the three variables:
Factor(1) = 0.444 bhp’ + 0.391 Max speed’ + 0.328 Engine size’
The variables are standardised, i.e. each variable is measured about its mean and divided by its standard deviation.
Underlying results
Det(S − λI) = 0.
This equation will have p roots λ1 , . . . , λp in order of decreasing size. The eigen vector
v1 is found by solving
Sv1 = λ1 v1 .
It can also be shown that the matrix can be decomposed in the following way:
$$S = \lambda_1 v_1 v_1' + \cdots + \lambda_p v_p v_p'.$$
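The following sketch (with randomly generated stand-in data, not the car or company data discussed in the text) shows how principal components can be obtained from the eigen decomposition of a correlation matrix:

```python
import numpy as np

# Illustrative data: 100 observations on 3 correlated variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
X[:, 1] += 0.8 * X[:, 0]            # induce some correlation

R = np.corrcoef(X, rowvar=False)    # correlation matrix (3 x 3)
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]   # components in order of variation explained
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

print(eigvals / eigvals.sum())      # proportion of total variation per component
print(eigvecs[:, 0])                # loadings of the first principal component
```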
The correlations are all quite high showing that there is a high degree of similarity
between the performance of the companies.
If the first two principal components are concentrated upon (these explain 85 per
cent of the total variation), the loadings on the components are as follows:
The first principal component is an average over all the companies with no
particular emphasis. The second principal component puts negative weights on the
chemical companies and positive weights on the car manufacturing companies.
X_i′ is the variable X_i minus its mean. The factors are formally called common factors,
the coefficients are called loadings.
The factor model is explaining p (say) variables in terms of m factors, where p > m.
This is the data reduction: from observable variables to m unobservable factors plus p
unobservable error terms.
In order to achieve this, various assumptions are imposed on the model:
The factors have variance of 1 and are uncorrelated with each other.
The error terms and the factors are uncorrelated with each other.
These assumptions allow a non-unique solution to be found for the values of the factor
loadings. We can also express the variance of the variable X as follows:
principal components.
The most straightforward method is to take principal components and discard those
components explaining little variation. Let us look further at the German companies.
The following is an edited and annotated output from SPSS.
If the factor model is a good reflection of the structure of the data, then one would
expect the correlation matrix reconstructed from the factors to closely resemble the
original correlation matrix based on the original variables. The following is a
comparison between the two matrices provided by SPSS.
Reproduced correlation matrix:
The lower left triangle contains the reproduced correlation matrix; the diagonal,
reproduced communalities; and the upper right triangle residuals between the observed
correlations and the reproduced correlations.
For example the:
Varimax: (the most common) minimises the number of variables with a high
weighting on each factor.
In effect, the Varimax algorithm rotates the axes until its objective function is
maximised. The objective function is the variance of the squared loadings on the
rotated factors. This leads to high positive or negative loadings for some variables and
negligible loadings on others on each factor. The concentration on some variables and
discounting of others facilitates the labelling of the factors.
In the example the angle of rotation that achieves this maximum is −44.7◦ . The factor
transformation matrix gives this information, the first argument is the cosine of the
angle of rotation, the third is the sine of the angle of rotation, i.e.
cos(−44.7◦ ) = 0.71034 and sin(−44.7◦ ) = −0.70386.
After rotation, the loading of the chemical companies is maximised on factor 1, the
loading of the car companies is maximised on factor 2.
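A minimal sketch of the idea, for the two-factor case, is a simple grid search over rotation angles for the one that maximises the variance of the squared loadings. The unrotated loadings L0 below are hypothetical, and this crude search is only illustrative of the objective function, not of the algorithm SPSS uses.

import numpy as np

def rotate(loadings, theta):
    # 2-D rotation of a (p x 2) loading matrix by angle theta (radians)
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, s], [-s, c]])
    return loadings @ R

def varimax_criterion(L):
    # Variance of the squared loadings, summed over the factors
    sq = L ** 2
    return np.sum(np.var(sq, axis=0))

# Hypothetical unrotated loadings for 6 variables on 2 factors
L0 = np.array([[0.70, 0.60], [0.80, 0.50], [0.75, 0.55],
               [0.60, -0.60], [0.65, -0.70], [0.55, -0.65]])

# Search over rotation angles for the one maximising the criterion
angles = np.radians(np.arange(-90, 90, 0.1))
best = max(angles, key=lambda a: varimax_criterion(rotate(L0, a)))
print(np.degrees(best))             # angle of rotation achieving the maximum
print(rotate(L0, best).round(3))    # rotated loadings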
The scores computed could be used as an index for weekly returns in the chemical and
car manufacturing sectors.
The important distinction to remember is that the:
factor loadings are the coefficients on the factors which are summed to give the
variables
factor scores are the coefficients on the variable values which are summed to give
the ‘observed’ factor.
Principal component factor analysis: a useful first pass through the data. Perform a
varimax rotation. Calculate factor scores for the observations, plot them looking for
outlying values (this might indicate observations which exert an undue influence on
the covariance matrix and might be better removed).
Repeat steps 1–3 for different numbers of factors. Does increasing the number of
factors add to ease of interpretability?
If the data set is large, split the data in half and carry out the factor analyses on
each half. This allows one to check the stability of the results.
The use of accounting variables for building discriminant functions has received much
attention in finance. Some more detailed examples will be discussed later.
Market researchers use questionnaires to identify ‘innovators’, i.e. the sort of person
they should target when promoting a new product.
Candidates for training courses often sit tests as they progress. The scores on these
tests can be used to discriminate between those who will successfully complete the
course and those who will not.
This is the pioneering and still most used form of discriminant analysis, developed in
the 1930s. The approach is to transform a set of multivariate observations into
univariate observations which are used to discriminate between groups.
Let us use the credit risk problem as an illustrative example.
The multivariate observation for individual i is: incomei , agei , (no. of credit cards)i ,
(size of family)i . Let us refer to this as X = (X1 , X2 , X3 , X4 ).
The two populations are: good credit risks and poor credit risks. Let us denote these
populations as A1 and A2 .
The multivariate observation is transformed to the univariate discriminant score by an
equation of the form:
i.e. Y = a1 X1 + a2 X2 + a3 X3 + a4 X4 = a′X.
The covariance matrix G (see below) is assumed to be the same for each population.
This assumption makes the solution straightforward, but is also restrictive and difficult
to justify for many applications.
The means of the discriminant scores for each population are:
µ1Y = a′µ1
µ2Y = a′µ2 .
Maximise [a′(µ1 − µ2)]² / (a′Ga).
The solution to this optimisation leads to the calculation of the coefficients a1 , a2 , a3
and a4 .
It can be shown that the solution is given by:
Y = a1 X1 + a2 X2 + a3 X3 + a4 X4 = a′X = (µ1 − µ2)′G⁻¹X.
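A minimal sketch of these calculations, using entirely hypothetical data for the two credit-risk groups and a pooled estimate of G, is given below.

import numpy as np

# Hypothetical samples from the two populations (rows = individuals,
# columns = income, age, no. of credit cards, family size)
A1 = np.array([[52, 40, 3, 4], [61, 45, 4, 3], [47, 38, 2, 5],
               [58, 50, 3, 2], [55, 42, 3, 3]], dtype=float)
A2 = np.array([[28, 29, 5, 4], [33, 24, 6, 5], [25, 31, 4, 6],
               [30, 27, 5, 3], [27, 26, 5, 5]], dtype=float)

mu1, mu2 = A1.mean(axis=0), A2.mean(axis=0)

# Pooled (common) covariance matrix G, as assumed in the text
n1, n2 = len(A1), len(A2)
G = ((n1 - 1) * np.cov(A1, rowvar=False) +
     (n2 - 1) * np.cov(A2, rowvar=False)) / (n1 + n2 - 2)

# Discriminant coefficients a = G^{-1}(mu1 - mu2)
a = np.linalg.solve(G, mu1 - mu2)

# Discriminant score Y = a'X for a new applicant (hypothetical values)
x_new = np.array([45, 35, 4, 4], dtype=float)
y_score = a @ x_new

# Classify by comparing with the midpoint of the two group mean scores
midpoint = 0.5 * (a @ mu1 + a @ mu2)
print("score:", y_score, "-> group", 1 if y_score > midpoint else 2)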
a shortage of examples from one population (banks have a far greater proportion of
good risks as customers)
the data are biased in that the bank will have discarded many observations that it
considered bad risks before the study is carried out.
where
The SPSS default for P (GROUP i) is 1/(number of groups), i.e. 0.5 when there are two
groups; if the analyst has a better estimate than this, he may incorporate this in the
analysis.
When λ is close to 1, this implies that there is little difference between the means of the
discriminant scores in each sub-population. (Bad news!)
The closer λ is to 0, the greater the difference between the mean discriminant scores in
each sub-population. (Good news!)
This statistic is an F -distributed random variable. A common stepwise procedure is to
keep choosing the variable that makes the greatest reduction in Wilks’ λ, until the
reduction is no longer significant. In this respect it is similar to stepwise multiple
regression.
Example 8.3 An example of discriminant analysis using some bankruptcy
data
These data are taken from Johnson and Wichern and relate to a sample of US
companies (1968–72). The objective is to discriminate between bankrupt and
non-bankrupt companies. The variables are:
(These are an interesting by-product of the analysis. For each case a score is worked
out for membership of group 1 and group 2. The case is allocated to the group where
it has the higher score.)
This is an ANOVA (see Chapter 10) of the discriminant score as the dependent variable.
(A means of judging the contribution made by a variable to discrimination.)
(These are the coefficients used with the unstandardised independent variables to
construct the discriminant score.)
(The null hypothesis is that the covariance matrices in each group are equal. The
test is very sensitive to non-normality, so that rejection of H0 may mean that the
covariance matrices may not differ much but there is non-normality.)
(This display shows the distribution of discriminant scores in both groups, note the
mis-classified cases.)
Classification results for cases selected for use in the analysis
Percentage of ‘grouped’ cases correctly classified: 91.89 per cent (this is the
apparent success rate).
Classification results for cases not selected for use in the analysis
Percent of ‘grouped’ cases correctly classified: 66.67 per cent (this is the actual
success rate).
8.8 Summary
What you need to know
How to recognise broad problem types such as: explaining one variable in terms of
another; data grouping or data reduction.
Chapter 9
Forecasting
To give the reader an understanding of the main forecasting methods widely used
for many applications.
construct simple models to forecast time series – these should include exponential
smoothing, moving averages, seasonal forecasts and simple Box-Jenkins methods
Levine, D.M., D. Stephan, T.C. Krehbiel and M.L. Berenson Statistics for
managers using Microsoft Excel. (Upper Saddle River, NJ: Pearson Prentice Hall,
2005) fourth edition [ISBN 9780131440548].
9.5 Introduction
The distance in time between when a forecast is made and the future point to which it
refers is the leadtime of the forecast. For a forecast of UK unemployment in December
2010 made in December 2008, the leadtime is two years.
A convenient and generally accepted division of leadtimes is into short-, medium- and
long-term problems. A time series can be regarded as the output of a system and the
classification follows from this.
If it is believed that the underlying system is unlikely to undergo any significant
changes during the leadtime in question, then the problem is short term. In other
words, short-term forecasting techniques are based on the hypothesis that the relevant
future observations are generated by the same system that generated past observations.
Forecasting product sales for inventory control is a typical short-term forecasting
problem.
If one expects the system to change in one of a number of ways, then the forecasting
problem is over the medium term. An example of a medium-term forecasting problem is
the demand for a product that will be significantly further through its product life cycle
at the end of the leadtime than it is when the forecast is made.
Long-term forecasting problems occur when there is little or no information available
about the state of the system generating the time series at the end of the leadtime. In
these circumstances, there are little quantitative data available and the opinions of
those with relevant expertise are used to help formulate the forecast.
The length of the leadtime is not sufficient to classify a forecasting problem into short,
medium and long term as the underlying systems may differ drastically in their
susceptibility to change.
Table 9.2 shows some examples of different systems and the differences between the
length of leadtimes in the same class.
The entries in the table are realistic estimates of the relevant time periods for each
environment. Note that in some cases, all three classifications cannot be used, as the
demand for particular fashion items has no long term, while the demand for nuclear
power stations has no short term. Note also that the length of the leadtimes associated
with short-term forecasts varies greatly depending on the stability of the system
involved.
ii. Causal models. Causal models are multivariate, the underlying hypothesis is that
the behaviour of the variable being forecast is influenced by one or more other
variables.
A simple example is forecasting the sales of a product using advertising
expenditure as an explanatory variable. In this case, it is believed that advertising
expenditure will affect sales in a direct way, perhaps after a time delay. Thus the
inclusion of extra information about advertising expenditure will lead to an
improvement over the univariate forecast. There is no theoretical limit to the
number of explanatory variables used; however, in practice the improvement in
forecasting gained by each new explanatory variable tends to decrease quickly.
The multivariate forecast is a conditional expectation using more information than
the univariate, thus a multivariate forecast for j periods ahead is:
E(Xt+j | Xt , . . . , X0 , Yt , . . . , Y0 , Zt , . . . , Z0 ) (9.2)
(a) trend
(b) seasonality
(c) cycles
(d) random variation.
Using a decomposition approach, each component is estimated and then these estimates
are re-combined to form a forecast. An advantage of the approach is that the
components are intuitively reasonable and can be easily explained.
In this section these components are discussed and the means of isolating them from a
time series will be explained. The first three components listed above are said to be
systematic components and random variation is the residual, sometimes called noise.
Trend
The trend of a time series is the systematic increase of the variable over time. The trend
line is a smooth line indicating the path of the series, ignoring other components. When
forecasting the trend is usually taken to be a straight line, which is extrapolated into
the future. This idea is used when isolating the trend from an existing time series. The
trend is found by taking a moving average of the data available, a process sometimes
called ‘smoothing’.
Trend at time t is
Tt = (1/M) Σ Xt+i , where the sum runs over i = −(M − 1)/2, . . . , (M − 1)/2. (9.3)
Example 9.1 Using the historical weekly share price (over a two-year period) of a
company, a major UK food retailer, the effect of different choices of M , for example
7 and 25, is shown in Figure 9.2.
Note that the share price changes quite dramatically from week to week. For a
moving average, with M = 7, the changes in the share price are smoothed to quite a
large extent, and the use of M = 25 produces a very smooth curve.
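A smoothing of this kind is easy to reproduce with standard tools. The following is a minimal sketch using pandas rolling means, with a hypothetical weekly price series standing in for the retailer's data (which are not reproduced here).

import numpy as np
import pandas as pd

# Hypothetical weekly share price series over two years (104 weeks)
rng = np.random.default_rng(0)
price = pd.Series(200 + np.cumsum(rng.normal(0, 3, 104)))

# Centred moving averages with odd windows M = 7 and M = 25, as in (9.3)
smooth_7 = price.rolling(window=7, center=True).mean()
smooth_25 = price.rolling(window=25, center=True).mean()

# The larger the window, the smoother the resulting curve
# (and the more values are lost at the ends of the series)
print(smooth_7.std(), smooth_25.std())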
Seasonality
The seasonality of a time series is the effect on the variable of the time of year that the
measurement is being made. The mechanism by which the season affects the variable
may be via the weather, for example, greater use of fuel for heating during the winter;
or it may be due to dates such as holidays or financial year ends.
Seasonal fluctuations are often very pronounced in weekly, monthly or quarterly time
series, and they often contribute a major proportion of the variability of the data.
Isolation of the seasonal pattern is thus of considerable importance in analysing and
forecasting time series.
The first stage is to identify the underlying trend; this represents where the series would
have been in the absence of seasonal and random fluctuation. This is done by a centred
moving average which represents an average for a year’s observations; for example a
centred moving average for quarterly data is:
Tt = (1/8) (Xt−2 + 2(Xt−1 + Xt + Xt+1) + Xt+2) . (9.4)
A centred moving average is necessary when averaging over an even number of
observations, in order to make the timing of the average coincide with an observation,
rather than fall between them.
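A direct implementation of (9.4) might look like the sketch below; the quarterly series x is hypothetical.

def centred_moving_average_quarterly(x):
    # T_t = (x[t-2] + 2*(x[t-1] + x[t] + x[t+1]) + x[t+2]) / 8, as in (9.4)
    trend = [None] * len(x)
    for t in range(2, len(x) - 2):
        trend[t] = (x[t - 2] + 2 * (x[t - 1] + x[t] + x[t + 1]) + x[t + 2]) / 8
    return trend

# Hypothetical quarterly observations (three years)
x = [45, 52, 60, 48, 47, 55, 63, 50, 49, 58, 66, 53]
print(centred_moving_average_quarterly(x))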
The next stage is to identify the seasonal pattern. In order to do this, it is convenient to
Xt becomes Xij
where Xij represents the jth period in year i. Thus t = iL + j, where there are L
periods per year, for example L = 12 or L = 52.
There are several ways of modelling seasonality. Here we shall concentrate on the
multiplicative representation given.
Example 9.2 Calculate a centred moving average for the registrations of new cars
in the United Kingdom (Years 1 to 7). Then calculate the multiplicative seasonal
factors and comment on the seasonal profile revealed.
The full data are given in Table 9.3.
Table 9.4 illustrates the calculations necessary.
The centred moving average is calculated using this version of (9.4):
Tt = (1/24) (Xt−6 + 2(Xt−5 + · · · + Xt+5) + Xt+6) .
The data and the centred moving average for these data are shown in Figure 9.3.
The raw seasonal fluctuations are found by dividing the series Xt by the trend Tt
and then subtracting 1. [Note: this latter stage of subtracting 1 is not always done
nor is it really necessary so long as when one later produces the forecasts one
remembers the definition of seasonality used when constructing the seasonal
forecasts from the trend forecasts.]
These values are observations of the seasonal factor, Sj , for the relevant months.
These values are averaged for each month to give Sj′ ; the average value of the Sj′ s for
j = 1, 2, . . . , 12 is found and subtracted from each Sj′ , giving the final estimate of the
seasonal factor Sj which satisfies the condition that S1 + S2 + · · · + S12 = 0. This
requirement ensures that a year of seasonal data has the same mean as a year of
de-seasonalised data. The seasonal factors are shown in Figure 9.4. Note that the
peak seasonal factor occurs in month 8 (August) when, in the United Kingdom, the
registration letter on a car’s number plate used to change. The lowest seasonal
demand is in December, as buyers postpone their decision to buy until the next year,
in order to gain a better second-hand price when they come to sell their car.
Note the main features of this seasonal pattern are dictated by arbitrary, legislative
decisions rather than an underlying weather-based effect, etc.
Table 9.3: Monthly new car registrations in the United Kingdom over a seven-year period.
Figure 9.3: New cars registered in the United Kingdom over seven years with a 12-month
centred moving average.
Figure 9.4: Seasonal factors for UK car registration data (multiplicative model).
Xt = Tt + εt (9.6)
where
Tt = Tt−1 + bt
and
εt ∼ N(0, σ²), bt ∼ N(0, σb²)
with σ² ≫ σb².
(9.6) says that the variable Xt is the sum of two random variables; Tt is the trend term
discussed earlier. If the expected period-to-period change is zero, as in (9.7), it is called
the level of the series; however, the level is subject to a small random change each
period, bt ; εt is the random error term (or noise term) which has a far larger variance
than the disturbance in level.
The objective of the forecaster is to make the best possible estimate of the level, Tt , as
this will be the best forecast at time t for the value of the variable to be forecast L
periods ahead, Xt+L .
An incremental trend term can be introduced to make the model more general. This
term, ct , represents the systematic change in level at time t; this term is subject to a
random disturbance term, dt . The model including trend is given here:
Xt = Tt + εt (9.7)
where
Tt = Tt−1 + bt + ct
ct = ct−1 + dt
and
dt ∼ N(0, σd²)
with σ² ≫ σd².
Seasonality can be introduced in the model in a number of ways. The simplest method
is to de-seasonalise the series by dividing the observations by (1 + Sj ), and then
proceeding as in (9.7).
or
T̂t = T̂t−1 + α(Xt − T̂t−1 ) (9.10)
In words, the estimate of the level at time t is equal to the previous estimate made at
time t − 1 plus α times the previous error. For the model with a systematic trend,
equation (9.10) is adapted and there is a further updating equation for the systematic
trend term.
The value of β is also between 0 and 1. Note that the basic updating equation can be
characterised as an error-correction mechanism: the new estimate equals the previous
estimate plus a fraction (α) of the most recent forecast error.
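A minimal sketch of the level-only updating rule (9.10) is given below; the sales figures and the value α = 0.2 are assumed purely for illustration.

def exponential_smoothing(x, alpha, level0=None):
    # Level-only model: T_hat[t] = T_hat[t-1] + alpha * (x[t] - T_hat[t-1]),
    # i.e. equation (9.10); the starting level is taken as the first observation
    level = x[0] if level0 is None else level0
    levels, errors = [level], []
    for obs in x[1:]:
        error = obs - level          # one-step-ahead forecast error
        level = level + alpha * error
        levels.append(level)
        errors.append(error)
    mse = sum(e * e for e in errors) / len(errors)
    return levels, mse

# Hypothetical weekly sales figures and an assumed smoothing constant of 0.2
sales = [82, 85, 79, 88, 91, 86, 84, 90, 95, 92]
levels, mse = exponential_smoothing(sales, alpha=0.2)
print(levels[-1], mse)   # latest level estimate = forecast for all future periods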
Table 9.5: Details of the ‘level’ only exponential smoothing model on sales of a product.
The systematic trend ĉt is omitted for the random variation only forecast.
The starting points for the calculation of the mean squared error (MSE) are
T̂1 = X1 , ĉ1 = 0.
The first 40 observations were used for model estimation (i.e. the calculation of the
mean squared error). Using the starting procedure mentioned, the estimated trend
at time t = 2 is X1 = 82.37. Using equation (9.9) then gives
Yt is the response (dependent) variable (i.e. the time series to be forecast) at time t.
Yt−i , for various i = 1, 2, . . . , are the response variables at timelag i. They are used
like independent (explanatory) variables.
Autoregressive models are suitable for stationary time series and the constant φ0 would
be related to the level of the series. If the data vary about zero then the constant φ0 is
not required.
Note: Table 9.6 might be useful to summarise the patterns associated with the various
autoregressive-moving average models.
Figure 9.5: Autocorrelation and partial autocorrelation coefficients of AR[1] and AR[2]
models. The first four graphs are for AR[1] and the next four are for AR[2]. Variations
occur because of coefficient signs.
Figure 9.6: Autocorrelation and partial autocorrelation coefficients of MA[1] and MA[2]
models. The first four graphs are for MA[1] and the next four are for MA[2]. Variations
occur because of coefficient signs.
1. Model identification
The time series being forecast should be stationary, i.e. appearing to vary about a
fixed level. If the series is not stationary then it can often be converted to
stationarity by taking first differences. Diagrams similar to Figure 9.5 are then used
to suggest a model.
2. Model estimation
The parameters of the model are estimated so that they minimise the sum of
squares of the errors, i.e. fitted values minus actual values.
3. Model checking
A model is adequate if the residuals (errors) cannot be used to improve the
forecasts, i.e. they contain no identifiable pattern (i.e. they are random).
An overall check of the model adequacy is provided by the Ljung-Box statistic.
This is defined to be
Q = n(n + 2) Σk=1,...,m rk²(e)/(n − k)
where
• rk (e) is the residual autocorrelation at lag k
• n is the number of residuals
• k is the time lag
• m is the number of time lags to be tested.
If the observed Q value is small (i.e. its p-value is greater than 0.05) then the model is
considered adequate. [If the data are random, then Q should follow a chi-squared
distribution on m degrees of freedom.]
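The Q statistic is straightforward to compute from a set of residuals. The sketch below uses hypothetical residuals and scipy's chi-squared distribution for the p-value.

import numpy as np
from scipy import stats

def ljung_box(residuals, m):
    # Q = n(n+2) * sum_{k=1}^{m} r_k^2(e) / (n - k)
    e = np.asarray(residuals, dtype=float) - np.mean(residuals)
    n = len(e)
    denom = np.sum(e ** 2)
    q = 0.0
    for k in range(1, m + 1):
        r_k = np.sum(e[k:] * e[:-k]) / denom   # residual autocorrelation at lag k
        q += r_k ** 2 / (n - k)
    q *= n * (n + 2)
    p_value = 1 - stats.chi2.cdf(q, df=m)      # compare with chi-squared on m d.f.
    return q, p_value

# Hypothetical residual series from a fitted model
rng = np.random.default_rng(1)
residuals = rng.normal(0, 1, 60)
q, p = ljung_box(residuals, m=12)
print(q, p)   # a large p-value suggests the residuals show no remaining pattern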
4. Forecasting with the model
Consider re-assessing the model once some more data become available.
9.11 Summary
What you need to know
construct simple models to forecast time series – these should include exponential
smoothing, moving averages, seasonal forecasts and simple Box-Jenkins methods
Although two stock controllers agree that simple exponential smoothing is the best
forecasting method, they disagree about the most appropriate loss criterion for
choosing the smoothing constant. Albert prefers root mean square forecasting
error, but Bertram prefers mean absolute forecasting error. The values of the
smoothing constant that satisfy Albert and Bertram are 0.42 and 0.54, but due to
an administrative error, the information linking the smoothing constant to the loss
criterion has been mis-filed. Using the data above, say which smoothing constant
relates to which loss criterion.
(14 marks)
2. The table below shows the total export orders for a company during 2000–03:
(a) Briefly discuss how one might choose between a multiplicative and additive
seasonal forecasting model.
(4 marks)
(4 marks)
(c) Estimate the seasonal variations using a multiplicative model, and thus
forecast the value of exports for the company during the three periods in 2004.
(5 marks)
(5 marks)
(e) If the exports for 2004 turn out to be Jan–Apr: 66; May–Aug: 72; Sep–Dec: 68
calculate the root mean square errors (RMSEs) for your 2004 forecasts
obtained from (c) and (d) above.
(2 marks)
1. Xt = φXt−1 + at
2. Xt = µ + θat−1 + at
where at is an error term; φ is 1.05, µ is 115 and θ is 0.1. Ten observations are
given below:
(a) Evaluate the one period ahead forecasts provided by each equation. Which
forecasting equation gives the lowest mean absolute error?
(14 marks)
(b) Explain how you would forecast more than one period ahead with each
equation, write down an expression for an N period ahead forecast. Evaluate a
forecast made at time t = 10, for N = 5 periods ahead.
(6 marks)
Hence RMSE = √(803.2/10) = 8.96 and the mean absolute deviation (MAD) =
83.3/10 = 8.33.
Hence RMSE = √(830.2/10) = 9.11 and MAD = 81.4/10 = 8.14.
Hence it would appear that (of these two smoothing constants) 0.42 would be used
if the loss criterion is root mean squared deviation and 0.54 would be used if the
loss criterion was mean absolute deviation.
2. (a) See Section 9.8 (Seasonality). A diagram might help to see if the seasonality
factor increases (decreases) as the data rise (fall) or whether they are the same
irrespective of whether the data rise or fall. In the first case one is tempted to
use multiplicative factors, in the second case additive seasonality factors seem
more appropriate.
[Note: In the table below we are using the three-month moving average as a
trend forecast (an interpretation of part (a)). An alternative, giving different
answers, would be to determine a trend through the whole data, i.e
(61 − 45)/11 = 1.455 per period (season) and then produce a trend forecast by
adding on 1.455 each season. This would actually be a more common approach
(but, strictly, is not what the question suggested).]
(c) For the multiplicative seasonality the average Jan–Apr seasonal factor is
(0.962 + 0.931 + 0.968)/3 = 0.954. Similarly the averages for May–Aug and
Sep–Dec are 1.106 and 0.941, respectively (note that there are four values to
be averaged for May–Aug). The trended forecasts for the first three periods of
2004 are 63.0 + 2(1.4444) = 65.889 for Jan–Apr, 63.0 + 3(1.4444) = 67.333 for
May–Aug and 63.0 + 4(1.4444) = 68.778 for Sep–Dec. Multiplying by the
seasonal factors will then give the seasonal forecasts as 65.889(0.954) = 62.86,
67.333(1.106) = 74.47 and 68.778(0.941) = 64.72, respectively.
(d) For the additive seasonality the average Jan–Apr seasonal factor is
(−2 − 4 − 2)/3 = −2.67. Similarly the averages for May–Aug and Sep–Dec are
6.00 and −3.33, respectively (note that there are four values to be averaged for
May–Aug). The trended forecasts for the first three periods of 2004 are as
before and adding the seasonal factors will then give the seasonal forecasts as
65.889 − 2.67 = 63.22, 67.333 + 6.00 = 73.33 and 68.778 − 3.33 = 65.45,
respectively.
3. The forecasts (together with RMSE and mean absolute errors) using the two
models (equations) are as follows:
Hence the first model (Equation 1) gives the lowest mean absolute error.
For Equation (1) we would use E(Xt+N | Xt ) = φN Xt which gives
For Equation (2) we would use E(Xt+N | Xt ) = µ which gives X15 = 115.
Chapter 10
Econometrics, multiple regression
and analysis of variance
To outline the process by which econometric models are formulated and tested.
To give a clear explanation of why assumptions are necessary when model building
(using multiple regression as a specific case) and what happens when they are
invalid.
To introduce analysis of variance (ANOVA) tables and show how they can be used
as a general approach to data analysis and model testing.
To explain how hypothesis tests (t, F and Durbin Watson in particular) can be
used to validate a model.
Hanke, J.E. and D.W. Wichern Business forecasting. (Upper Saddle River, NJ:
Pearson Prentice Hall, 2009) Chapter 7. There’s also a small section on ANOVA in
Chapter 6.
Johnson, R.A., and D.W. Wichern Applied multivariate statistical analysis. (New
York: Pearson Prentice Hall, 2007) Chapters 6.4–6.7, 7.
Pindyck, R.S. and D.L. Rubinfield Econometric models and economic forecasts.
(New York: McGraw-Hill, 2007).
10.5 Introduction
Econometrics is the application of statistical techniques to the formulation, estimation
and validation of economic relationships.
An economic model seeks to represent and explain economic behaviour; thus, the
structure of the equations developed must make good economic sense. If the model is
also statistically valid, then it can be used for prediction purposes. The causal models
discussed in Chapter 9 are econometric models designed to represent the economics at
the micro-level of the company.
Most of the effort in econometric modelling goes into the validation of the model as,
thanks to the ease of use of current software, the computation of the coefficients is
straightforward. Such software can also be used to produce an analysis of variance – a
concise and widely used approach to analysing models and testing the results. The topic
of ANOVA could be introduced in other chapters of this subject guide (such is its wide
applicability), but it seems most straightforward to introduce it as optional output from
a regression package. Its possible use in other areas will then be referred to. The
application of econometric methods consists of five stages:
formulation
estimation
validation
forecasting
implementation.
These stages will be discussed in turn.
10.6 Formulation
Formulation involves three main steps:
Choice. The first choice is of the variables to be included. For example, if the objective
is to model and forecast the sales of a non-essential product, then one might choose to
include a measure of promotional effort (e.g. advertising expenditure), a measure of
buying power (e.g. consumers’ real disposable income) and any other relevant variable
(such as temperature if demand is weather-related).
Preliminary specification. The direction of causality has to be considered. This will
be described by the expected sign and magnitude of the coefficients, for example:
Here we would expect that α, β1 , β2 , β3 > 0 and we would hope that β1 > 1. This means
that we would expect £1 spent on advertising to generate more than £1 worth of sales.
Terminology:
Sales = α + β1 log (Adv. Exp.) + β2 log (Con Ret Disp Inc) − β3 log (Temp.).
log-log:
log(Sales) = α + β1 log (Adv. Exp.) + β2 log (Con Ret Disp Inc) − β3 log (Temp.).
The choice of form depends on how one believes a variable affects the variable being
predicted. If the effect is simply additive, then the linear model is appropriate. If the
effect is proportional, then a log-linear or log-log model may be considered.
10.7 Estimation
Estimation involves collecting the data and deciding upon the appropriate estimation
technique.
Data collection. The first stage is the gathering of the data. This is a non-trivial
exercise and it probably will take much time and effort to collect and check the data.
Once they have been collected, subjecting the data set to exploratory data analysis will
help confirm the hypotheses about relationships between the variables (see Chapter 11).
Selection of estimation technique. The choice lies between various types of single
equation models using a least squared error objective function or multiple simultaneous
equations. The only method that falls within the scope of this course is straightforward
least squares estimation as described in the first year quantitative methods subject; this
is called OLS (Ordinary Least Squares) by econometricians.
The output from an OLS estimation procedure will give the following information:
coefficient estimates
where α is the significance of the test and z represents a tabulated normal value (or
Student’s t distribution if the regression uses fewer than 30 observations).
10.8 Validation
For a model to be acceptable it must be checked to see if various assumptions have been
significantly violated.
The OLS estimation procedure makes a number of assumptions about the data and the
system generating them. Departures from these assumptions have various consequences
which undermine the usefulness of the model. Consider a model of this form:
Yt = α + β1 X1t + β2 X2t + ut .
3. E(u2t ) = constant (the variance of the residual does not change with time or any
other variable); this is called ‘homoscedasticity’
4. E(ut ut−1 ) = 0 residuals are independent over time. They are not correlated with
themselves (not auto-correlated).
For the independent variables:
1. Xt is truly exogenous
2. the correlation between X1t and X2t is less than 1; that is, they are not collinear.
Further assumptions are that:
Figure 10.1: Testing for auto-correlated errors using the Durbin Watson statistic.
The values of d are tabulated in most statistical tables; typically two values are given,
dU and dL . Their use is demonstrated in Figure 10.1.
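The Durbin Watson statistic itself is simple to compute from a set of residuals. A minimal sketch, using hypothetical residuals, is shown below.

import numpy as np

def durbin_watson(residuals):
    # d = sum_{t=2}^{n} (e_t - e_{t-1})^2 / sum_{t=1}^{n} e_t^2
    e = np.asarray(residuals, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

# Hypothetical regression residuals
rng = np.random.default_rng(2)
e = rng.normal(0, 1, 40)
d = durbin_watson(e)

# d is roughly 2 for uncorrelated errors; values well below dL suggest positive
# autocorrelation and values above 4 - dL suggest negative autocorrelation
print(d)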
The coefficient estimates are unstable and change if the number of variables or
number of observations is altered.
The problem can be considered as having at least two variables contributing the same
information about the dependent variable. Thus one solution is to drop one variable.
However, in addition to the duplicated information the independent variable might
contain some unduplicated information.
Remedies:
Do nothing (in spite of the effects, the predictive ability of the model is unaffected).
Plots of the sales data alongside advertising expenditure are shown in Figure 10.2. Sales
data are plotted alongside disposable income in Figure 10.3. The full data set used is
given in Table 10.1.
In line with good forecasting practice, described in Chapter 9, the first 41 observations
out of the 52 available will be used for model estimation. The choice of the variables in
this study has already been made. Our preliminary specification of the model is that:
with all the coefficients positive. We would probably expect β1 > β2 , since the effects of
advertising decrease with time. We will assume a linear model initially. OLS estimation
of the model gives the results shown in Table 10.2.
Table 10.2: Output from multiple regression analysis of Lydia Pinkham data.
The F statistic quoted is a check for the overall regression. The null hypothesis is that
all the βs are zero. In this case the F statistic is distributed as an F random variable
with 3 and 37 degrees of freedom (F values are included in most statistical tables). Here
the observed value of F is 38.9 and there is a negligibly small probability of this
occurring if the null hypothesis is true. Thus at least one β is significantly different from
0. Inspection of the values of the Student’s t test for the coefficients shows that only β2
is not significantly different from 0.
Re-estimation of the equation without lagged advertising expenditure produces only
minor changes:
In order to investigate whether the passage of time has had any useful effect, a date
variable (simply the year of the observation) can be included.
As we can see this makes only a minor improvement.
That is:
Salest = 27608 + 1.35Advt + 7.83Disp.Inc.t − 14.32Datet .
Let us call this model (1).
The coefficient of this date variable is negative showing a downward tendency in sales.
The overall contribution that the disposable income variable and the date variable make
to the equation is to explain the longterm variations in sales about the constant. This
can be seen by looking at Figure 10.3, where disposable income follows a smooth
upward trend and the date term will contribute a negative trend (due to its coefficient).
Advertising is contributing to the explanation of short-term variation. A common
approach is to introduce a lagged endogenous variable, Salest−1 , which provides an
‘anchor’ for future variation in sales and removes the need for the long-term trend
variables of date and disposable income. In this case, there is also an opportunity to
re-introduce lagged advertising. The details of the fitting of this equation are given
below.
That is:
Salest = 182.8 + 0.97Salest−1 + 0.55Advt − 0.66Advt−1 .
Let us call this model (2).
A possible development would be to model change in sales (Salest − Salest−1 ) in terms
of changes in advertising expenditure, but this will be left as an exercise for the
interested reader.
The Durbin Watson statistic shows significant positive auto-correlation of errors for all
except the last model (the critical value for dL is about 1.3 at five per cent significance).
In the last model, the behaviour of the DW statistic is biased by the lagged endogenous
variable, so no comment can be made.
In order to judge the degree of multicollinearity, the correlation matrix of the variables
is reproduced below.
Note that sales are correlated with all the variables, at least at a 10 per cent
significance level. There are significant correlations between advertising and lagged
advertising, advertising and lagged sales, lagged advertising and lagged sales.
Disposable income and the date variable are also quite highly correlated.
It is clear that model (2) fits the data better than model (1) as can be seen from this
summary table below:
Note that standard error is the root mean square error, where an error is (observed −
estimated) over the fitted region.
The forecasting performance of the two models will be examined in the next section.
Since disposable income will not be known exactly in advance, unlike advertising
expenditure, which is a variable controlled by the company, model (1) will be more
difficult to use for practical forecasting than model (2).
[Note: the common notation is as follows: Yi are the observed values of the dependent
variable, Ŷi is the corresponding values predicted by the regression model and Y is the
mean of the observed values.]
Again referring back to Table 10.2 we have SSR = 14,775,145.2 , SSE = 4,682,178.4 and
hence (and sometimes also shown in the ANOVA) SST = 19,457,323.6.
The ‘Mean Square’ figures in the table come from dividing the appropriate sum of
squares by the degrees of freedom (DF). Thus MSR = 14,775,145.2/3 = 4,925,048.4 and
MSE = 4,682,178.4/37 = 126,545.4.
Let us now look at doing a hypothesis test using the ‘Mean Square’ data.
Once a regression model is created a useful initial hypothesis to test (to see if the model
is at all worthwhile) is H0 : β1 = β2 = · · · = βk = 0. Accepting this hypothesis would
lead to the conclusion that none of the k independent variables are useful in explaining
the dependent variable and thus we need to produce a fresh model specification with a
new set of independent variables.
The variance of the regression model, se², can be estimated using
se² = (Σi=1,...,n ei²)/(n − k − 1) = SSE/(n − k − 1) = MSE.
If the null hypothesis H0 is true then MSR = SSR/k is also an estimate of the error
variance, with k degrees of freedom. As a result of having two possible estimates of a
variance (if H0 is true) then the ratio
F = (SSR/k)/(SSE/(n − k − 1)) = MSR/MSE
has an F distribution with k degrees of freedom in the numerator and n − k − 1 degrees
of freedom in the denominator. Hence we compare the computed value of F with the
critical values of an F table at the appropriate level of significance. If the computed
value lies outside of the critical limits we will reject the null hypothesis and conclude
that at least one of the X variables is useful.
The ANOVA table output from a spreadsheet will often save you the effort of looking
up critical values from an F table by giving you a significance value (equivalent to a
p-value for testing individual variable coefficients). Thus, if the significance value of the
observed F is less than 0.05, say, we conclude that the H0 hypothesis can be rejected.
Once again referring back to Table 10.2, the observed F value is MSR/MSE =
4925048.4/126545.4 = 38.92, which is highly significant i.e. lies in an extremely small
tail (approx 0.0000 probability) of the F distribution that one would expect it to belong
to if H0 is true. Hence H0 is rejected and the model does have some worth.
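As an illustration, the tail probability quoted above can be checked with any statistical package; the sketch below uses scipy and the mean squares from Table 10.2.

from scipy import stats

# Mean squares and degrees of freedom from the worked example
msr, mse = 4925048.4, 126545.4
df1, df2 = 3, 37

f_obs = msr / mse                        # observed F statistic
p_value = stats.f.sf(f_obs, df1, df2)    # upper-tail probability
print(round(f_obs, 2), p_value)          # approx. 38.92 and a negligibly small p-value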
Forecast errors from an econometric model arise from several sources:
random variation
estimation error in coefficients
prediction errors in exogenous variables.
A detailed analysis of errors can be made using a prediction/realisation diagram where
predicted changes are plotted against actual changes. This diagram shows how well the
econometric model predicts turning points and whether it tends to over- or
under-estimate (see Figure 10.4).
point errors. Model (2) tends to underestimate the magnitude of change but makes few
turning point errors.
10.12 Summary
The emphasis in this chapter has been on the rigorous formulation of an econometric
model, the validation of the model and the forecasting performance of the model. The
details of estimation methods are outside the scope of this course.
You should be familiar with the output from regression packages used to estimate
econometric models and be able to write down the relevant model in the form of an
equation.
You should be able to critically examine the validation process and analyse
forecasting performance.
understand the assumptions made for constructing a multiple regression model and
appreciate the difficulties when they do not hold
Xt = 17612.5 + 107.4t + et
2. For n = 16 observations the tabulated five per cent level critical values of the
Durbin Watson statistic are given as dL = 1.10 and dU = 1.37.
The formal hypotheses are H0 : No autocorrelation against H1 : autocorrelation.
The limits are then evaluated as dL = 1.10, dU = 1.37, 4 − dU = 2.63 and
4 − dL = 2.90. Since the observed value of d is 2.08 it lies within the acceptance
region and hence we conclude that there is no autocorrelation in the equation.
Chapter 11
Exploratory data analysis
To show how cluster analysis can be used as a means of data reduction as well as
for grouping objects according to some criteria.
perform hierarchical clustering using single, complete and average linkage methods
11.5 Introduction
When carrying out large surveys for market research, social research or scientific
research, it is always wise to carry out an initial simple analysis of the data to:
One of the great benefits modern software offers is ease of graphical presentation.
Spreadsheets offer a wide range of different 2D and 3D graphs and dedicated statistical
packages, such as SPSS and SAS, offer good quality specialist statistical presentations.
A small selection of these will be discussed.
Some data describing the technical characteristics and price of a set of cars for sale in
the UK will be used to demonstrate these methods.
The data are given below in tabular form.
The variables are in the following units: price in £, maximum speed in miles per hour,
seconds to reach 60 mph (96.6 kph), tour means miles per gallon (0.425 kilometres per
litre) achieved while touring, size-cc is the capacity of the engine, bhp is the maximum
power output, the length is in feet (0.305m) and the weight is in hundredweight (0.05 of
a ton, 50.8 kg).
A box plot offers the same information as a histogram, but in a more condensed form.
It is designed to convey information about a single variable. It indicates both central
tendency and dispersion and gives information about observations a long way from the
mean or median of the data. The format of a typical box plot is given in Figure 11.1.
Box plots can be used to compare the distribution of variable values when the data are
divided into different categories. In Figure 11.2, two box plots show the distribution of
miles per gallon (denoted as Tour): one for 'cheap' cars (less than £12,000) and one for
the more expensive cars.
The left hand box plot in Figure 11.2 for the more expensive cars shows a lower median
miles per gallon (about 34 mpg); cars 14 and 18 are extremely economical in
comparison with the other cars in this category. The cars under £12,000, shown on the
right, are more compactly distributed around a higher median of about 40 mpg. The
Citroen AX, car no. 2, is an extreme value (see Figure 11.1) in its class. The Rover
Montego Diesel, car no. 18, although even more economical, is classified as an outlier
rather than an extreme value because the more expensive cars have a greater
interquartile range (as represented by the box length).
Figure 11.2: Box plots for miles per gallon for ‘cheap’ and ‘expensive’ cars.
Although a simple scatter plot is well known, new software makes it easy to show a
large data set in terms of a matrix of scatter plots.
These plots make any outlying observation very obvious. If one is found, it can then be
checked for accuracy. If it is a genuine observation, the decision whether to keep it
within the data set must be made. Observations a long way from the mean can exert
undue influence in techniques like regression and distort results.
The nature of relations between variables can be seen by the trends in the individual
plots. If the data lie in a long narrow ellipse, then the variables will have a strong
positive relationship (if the major axis of the ellipse has a positive gradient) or a strong
negative relationship (if the major axis of the ellipse has a negative gradient).
The car data are shown graphed in a multiple scatter plot in Figure 11.3. Inspection of
the individual plots shows the following example relationships:
A strong positive relationship between: bhp versus maximum speed.
A strong negative relationship between: seconds to 60 mph versus maximum speed.
No linear relationship between: price and touring miles per gallon.
[Key: BHP = maximum engine power output; Length = length of car in feet (0.305
metres), Max-sp = top speed in miles per hour, price = price in £, sec60 = time to
reach 60 mph in seconds, size-cc = engine capacity in cc, tour = fuel consumption in
miles per gallon (0.425 kilometres per litre) when touring, weight-k = weight in
hundredweights (0.05 of a ton, 50.8 kg).]
The correlation matrix of the car data is shown below.
In cluster analysis the underlying objective is to divide the data up into a number of
clusters. Each observation or case will be allocated to a cluster, each cluster will contain
similar cases and different clusters will contain cases dissimilar to those in other clusters.
Cluster analysis can be thought of as a form of exploratory data analysis that gives
initial insights into the structure of the data. In addition, it is often an excellent means
of communicating information about a data set in an easily digestible way.
If there are only two variables, say latitude and longitude, describing the location of two
depots, then the distance between them is simply the Euclidean distance, i.e. the square
root of the sum of the squared differences between the two sets of coordinates.
Euclidean distances are not appropriate for binary observations; however, the
relationship between individuals can be summarised in a contingency table.
The measure of similarity (or distance) between two individuals is determined by the
number of qualities they have in common, the number neither has and the number only
one has. There are many possible formulas for measuring similarity; one of the most
obvious is:
(a + d)/(a + b + c + d).
Using this formula, the measure of similarity between School (2) and School (3) is 2/3.
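A minimal sketch of this similarity measure is given below; the binary attribute values for the two schools are invented so as to reproduce the 2/3 in the example, since the original table is not shown here.

def simple_matching(x, y):
    # x and y are equal-length lists of 0/1 attributes;
    # a = both 1, d = both 0, b + c = attributes on which they differ
    a = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 1)
    d = sum(1 for xi, yi in zip(x, y) if xi == 0 and yi == 0)
    return (a + d) / len(x)

# Hypothetical binary profiles for two schools over three attributes
school_2 = [1, 0, 1]
school_3 = [1, 0, 0]
print(simple_matching(school_2, school_3))   # 2/3, as in the example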
For all the schools these measures of similarity can be shown on a similarity matrix:
Step 3 is rather less specific than it at first appears. Once a cluster has more than
one element in it, how is distance/similarity measured?
There are two very simple ways of measuring the distance between two clusters:
1. Using nearest neighbours to define the distance is called single linkage. The
shortest distance between members of each cluster gives the distance between
clusters. (This is shown in Figure 11.6.)
Example 11.1 Perform a cluster analysis of the car data, noting that the different
scales of measurement of the variables mean that a common scale has to be used. In
this case, the values for each variable have been standardised to z values, to give a
mean of 0 and a standard deviation of 1 for each variable. This problem is a little
cumbersome to go through in detail, so we will look at the dendrograms given as
output by a statistical package.
A dendrogram using single linkage is produced first.
Comments: In order to decide on the appropriate grouping for the cars, the output
from the use of both linkages has to be examined. The final choice is subjective, but
one is looking for homogeneity within groups and recognisable differences between
groups. One can get an impression from the dendrograms, but drawing up a table
with the items labelled is also useful. The membership of the five clusters from single
and average linkage algorithms are shown in the following table. (The dendrograms
have had extra lines added to them to make the grouping into five clusters more
apparent.)
In each case two large clusters exist: cluster 2 is identical for both cases, consisting
of larger, faster, more expensive cars which could be classified as medium saloons.
The two cluster 1s have many cheaper and smaller cars in common, but the results
disagree on three cars: the Nissan is separate according to the single linkage
algorithm, two Rovers are separate according to the average linkage. Both
algorithms agree that the Land Rover and the diesel engined Rover 825 are
sufficiently different to be put in their own clusters.
Example 11.2 Perform a cluster analysis on the variable used in the car data. In
this example, we can use the correlation matrix of the variable as the measure of
similarity. However, as can be seen from earlier in the chapter, there are some
negative correlations. But since it is the magnitude of the correlation rather than its
sign that is important, the positive (i.e. absolute) values are taken in each case.
The similarity matrix of the variables for the car data is shown below.
The way in which the schedule is formed will be shown in detail. Searching through
the similarity matrix, for step 1, shows that the most similar variables are size and
weight (0.928); these are formed into a cluster.
The adjusted similarity matrix becomes:
Note that, as single linkage is being used, the measure of similarity between the
other variables and the new cluster is the maximum similarity (correlation) between
the variables involved.
The next most similar pair of clusters is price/bhp.
The adjusted similarity matrix becomes:
The next most similar are the maximum speed/acceleration cluster with the
price/bhp cluster. The adjusted matrix is:
This concludes the clustering process, where all the items are now in one cluster.
The dendrogram below represents the clusters formed.
Note: The dendrogram which follows the above clustering analysis (and those above)
has a horizontal scale which has been produced by computer software. When
producing such diagrams by hand, e.g. in the examination, you should have the scale
matching the values where clustering occurs. For example, size and weight form a
cluster at 0.928 on the horizontal scale, price and bhp then form a cluster at 0.891
on the horizontal scale, etc.
Dendrogram using single linkage
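For larger problems this clustering schedule and dendrogram would normally be produced by software. The sketch below uses scipy's single-linkage routine on a small similarity matrix in which only the 0.928 and 0.891 values quoted above are taken from the example; the remaining entries are hypothetical.

import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram
from scipy.spatial.distance import squareform

# Hypothetical absolute-correlation (similarity) matrix for four variables
labels = ["size", "weight", "price", "bhp"]
sim = np.array([[1.000, 0.928, 0.600, 0.650],
                [0.928, 1.000, 0.620, 0.640],
                [0.600, 0.620, 1.000, 0.891],
                [0.650, 0.640, 0.891, 1.000]])

# Convert similarities to distances (1 - similarity) and condense the matrix
dist = squareform(1 - sim, checks=False)

# Single-linkage agglomerative clustering
Z = linkage(dist, method="single")
print(Z)   # the clustering schedule (merge heights are 1 - similarity)

# Extract the leaf ordering of the dendrogram without plotting it
info = dendrogram(Z, labels=labels, no_plot=True)
print(info["ivl"])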
When there are many cases to be clustered, hierarchical clustering becomes unwieldy,
the dendrograms are too big to read easily and little insight is gained. A different
approach is more appropriate in this situation.
The K means method is designed to group the cases into a collection of K clusters,
where K is predetermined for each analysis. The analyst makes an informed guess
about the appropriate number of clusters.
Each case is identified by its coordinates (a multivariate observation) and, using the
following algorithm, cases are allocated to the K clusters.
The membership of the clusters is examined and judged on intuitive grounds. If the
clustering is too coarse, K is increased: if the clustering is too fine, K is reduced. The
decision is mainly subjective. If the membership of a cluster suggests an intuitively
satisfactory name which distinguishes the cluster from other clusters, then this is an
encouraging sign that the appropriate number of clusters has been chosen.
1. Either partition the items into K initial clusters, or specify K initial centroids
(seed points).
2. Go through the list of items, assigning each item to the cluster whose centroid
(mean) is closest. Re-calculate the centroid for the cluster receiving the item and
for the cluster losing the item.
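A minimal sketch of this procedure, using scikit-learn's K-means implementation on a small hypothetical standardised data set, is given below.

import numpy as np
from sklearn.cluster import KMeans

# Hypothetical car data: rows = cars, columns = price, max speed, bhp, weight
X = np.array([[ 8000,  95,  60, 15],
              [ 9500, 100,  70, 16],
              [12000, 110,  90, 18],
              [21000, 130, 150, 22],
              [23000, 135, 160, 23],
              [35000, 150, 200, 25]], dtype=float)

# Standardise each variable to z values (mean 0, standard deviation 1)
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# K-means with K chosen in advance (here K = 3, an informed guess)
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(Z)

print(km.labels_)            # cluster membership of each car
print(km.cluster_centers_)   # final cluster centres (in standardised units)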
Example 11.3 Use K-means cluster analysis to divide the car data into four
clusters. The analysis produces the following output. The centres of each cluster are
defined by the mean value of each variable for the items in that cluster as shown
below.
Final cluster centres
The membership of each of these clusters is described in the table below; in addition
to the membership of the cluster, the distance of each car from its cluster centre is
given.
11.9 Summary
Exploratory data analysis is a means of becoming familiar with the information
contained in a large data set. Graphical methods allow the analyst to check the data for
errors and to gain initial insights into interrelationships between variables.
Cluster analysis is a useful tool for finding homogeneous groups in either observations
or variables. As such, it is often a useful means of gaining insight into the data before
carrying out more sophisticated analyses such as factor analysis. Hierarchical clustering
using a correlation matrix will show how variables cluster and give some indication how
many factors will be needed.
There are several decisions that need to be made during a cluster analysis:
selection of variables
perform hierarchical clustering using single, complete and average linkage methods
1. The similarities between interest rates across the world are summarised by the
following correlation matrix.
(a) Draw a ‘box and whiskers’ diagram (‘box plot’) to illustrate these data.
(8 marks)
(b) Identify any outlier(s) and suggest what you should do about it/ them.
(3 marks)
One of the products clustered has the following values for variables A to E:
Which cluster does this product belong to and what is its ‘distance from its
cluster centre’ ?
(7 marks)
4. (a) Compare and contrast the relative merits of K means cluster analysis and
hierarchical cluster analysis.
(4 marks)
• furthest neighbour
• nearest neighbour
• centroid clustering
• median clustering
• maximum clustering.
(2 marks)
Perform a hierarchical cluster analysis on these data, use single linkage to draw
a dendrogram.
(12 marks)
(d) If you were asked to remove two questions, which two would lead to least loss
of information?
(2 marks)
5. (a) In multivariate data analysis what is a ‘multiple scatter plot’ and what are its
uses?
(5 marks)
(b) What is meant by the term ‘outlier’ and why are ‘outliers’ important?
(3 marks)
(c) The following table shows bivariate observations on 21 subjects:
and we can determine that the median is 70, lower quartile is 66, upper
quartile is 73, interquartile range = 7. Hence upper outliers are > 83.5 and
lower outliers are < 55.5.
[(12 − 6)2 + (15 − 15)2 + (80 − 30)2 + (0.1 − 0.6)2 + (64 − 50)2 ]1/2 = 52.27.
Similarly, one can determine the distance from cluster centre 2 to be 35.57,
from cluster centre 3 to be 30.23 and from cluster centre 4 to be 23.88.
Hence the cluster the product belongs to is cluster 4 with ‘distance’ 23.88.
Alternatively, if one chose to use absolute values as distance measure:
Distance from cluster centre 1 is
|12 − 6| + |15 − 15| + |80 − 30| + |0.1 − 0.6| + |64 − 50| = 70.5.
Similarly, the distance from cluster centres 2, 3 and 4 becomes 51.0, 35.8 and
42.4, respectively.
Hence the product is placed in cluster 3 with ‘distance’ 35.8.
4. (a) K means can deal with many cases but needs a given value for K and thus
involves several attempts. Hierarchical methods give a dendrogram which is
informative but is only useful for relatively few observations. It can be used for
measuring the relationship between variables.
(b) Complete linkage is equivalent to ‘Furthest neighbour’.
(c) We start by clustering Q6 and Q7 (since they have the highest correlation)
and the amended tableau will become:
Note that there is no need for more than a triangular ‘matrix’ and there is no
need for the diagonal of zeros. Note also that the values involving Q6/Q7 are
the highest when given a choice e.g. the value for the cell Q5, Q6/Q7 is the
highest of 0.717 and 0.517 i.e. single linkage takes an optimistic view.
Continuing in this manner we will form the clusters as follows: Q5, Q6/Q7,
Q8, Q9, Q10 → Q5/Q6/Q7, Q8, Q9, Q10 → Q5/Q6/ Q7, Q8, Q9/Q10 →
Q5/Q6/Q7/Q8, Q9/Q10 → Q5/Q6/Q7/Q8/ Q9/Q10.
(d) We would remove the most correlated questions i.e. two from Questions 5, 6
and 7 (the first three clustered).
ii. Remember to leave the data in their original unordered pairs. When
producing the scatter diagram of X against Y you will see the subjects 20
and 21 are indeed away from the rest of the pattern (scatter). In addition,
you should see that subject 18 looks to be an additional (bivariate) outlier.
Chapter 12
Summary
This course and its subject guide are illustrative of many areas of mathematics (and
statistics) which lie beyond the basic elements within the 100 courses but which,
nonetheless, are extremely useful within the management area. Although not every
manager may use such mathematics, the management field has abundant areas of
application for such higher mathematical knowledge. Rest assured that you will find a
use for what you learn within this course!
A few of the subjects within this course’s syllabus may seem a little abstract when
summarised within a set of notes. However, it is hoped that you will appreciate the full
range of applications by reading texts. For example, differential equations are vitally
useful in virtually all dynamically changing relationships (e.g. between sales and
advertising, between prices and inflation, etc.).
At this final stage of the course it is recommended that you try to create your own
application area – imagine, if you like, that you are the Examiner of the subject and are
required to set the mathematics within a realistic storyline. You should find a seemingly
endless variety of possible questions!
Go through each chapter in turn, revising the mathematics and then finding/deriving
possible applications. Past examination papers from the VLE will give you many
examples and application areas to think about.
Finally, obtain copies of management science and financial mathematics type journals
and try to read articles that appear interesting to you. You should now find that there
is only an occasional piece of mathematics within published papers that you cannot
handle and understand.
Appendix A
Sample examination paper
Important note: This Sample examination paper reflects the examination and
assessment arrangements for this course in the academic year 2014–2015. The format
and structure of the examination may have changed since the publication of this subject
guide. You can find the most recent examination papers on the VLE where all changes
to the format of the examination are posted.
1. In today’s energy-conscious world there is a need to make certain that homes are
energy efficient. The local government of a town has performed a survey of 1000 homes in
the town to establish whether they need more roof insulation (R), cavity wall
insulation (C) or improved windows (W). The survey shows the following facts:
Some homes are totally efficient but 100 are inefficient in all three aspects.
540 homes suffer from R, 310 from C and 250 from W.
Only 15 homes suffer from W alone.
180 homes belong to the set R ∩ C ∩ W^c.
(a) Draw a Venn diagram to depict the above information and, where necessary,
show the order of subsets as a function of the number of homes, x, which suffer
from W and C but not R.
(5 marks)
(b) Determine the maximum and minimum value of x.
(3 marks)
(c) Remedying R, C and W individually costs £500, £400 and £800 per home
respectively. Whenever remedial actions for both W and C occur at the same
house there is a net saving of £50 because of material transportation savings.
If the local government decides to make all 1000 homes efficient with
respect to roof insulation, cavity wall insulation and improved windows, what
will be the maximum total cost?
(3 marks)
2. The table below shows the average cost of commuting into London over the period
2008 to 2012 and a salary (earnings) index (Base 1995 = 100):
(a) Using the salary index as a deflationary measure, calculate a deflated index
series of commuting costs (Base 2008 = 100).
(6 marks)
(b) What is the highest annual percentage increase in deflated commuting costs
and when did that occur?
(3 marks)
          X     Y     Z
A =   X   0.75  0.00  0.10
      Y   0.20  0.80  0.12
      Z   0.50  0.20  0.50
(a) Use matrix methods to determine the total output for the three products when
the final (non-industry) demands for X, Y and Z are 100, 300 and 100
respectively.
(10 marks)
(b) Draw a three-product input-output network diagram to depict the above
information.
(3 marks)
d^2y/dx^2 + 4 dy/dx + 13y = 104
Immediately after the scandal breaks (i.e. when x = 0) Alvin expects the
currency ratio to take the value y = 11, and to be decreasing at the rate 6 per
day, and it is at this point in time that he will use his Dollars to buy Groats.
Determine the complete solution of the differential equation, and deduce that
after a few days he can expect to make a Dollar gross profit of more than 37%
if he converts his Groats into Dollars.
(8 marks)
(c) Consider the following second order difference equation:
(b) ReadiChem Limited sells rare minerals and gems to jewellery manufacturing
companies. The table below shows the total number of kilos of topaz and beryl
that were sold in the twelve months of 2012.
Month Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
Topaz 5.2 6.0 4.8 5.5 6.4 5.3 6.0 6.8 5.5 6.1 6.8 5.6
Beryl 11.9 9.5 10.1 11.9 12.9 10.4 11.2 13.0 14.0 11.8 12.5 14.3
Their Business Manager, Mr Wally Where, is responsible for analysing these
data and for producing a forecast of the kilos of topaz sold in January 2013.
He only understands moving averages, not exponential smoothing or any other
forecasting methodology, and has asked you to advise him. He suspects
there is a seasonality of two, three or four months in the data values. Working
first on the topaz data, and then on the beryl data, examine in each case the
month-on-month increase or decrease in kilos sold. In the usual notation, and
using only these net changes, advise Wally which of MA2, MA3 and MA4 is
most appropriate for:
i. topaz and
ii. beryl, and hence
iii. make a forecast of the number of kilos of topaz that will be sold in
January 2013.
iv. Estimate the current average month on month trend for sales of topaz.
(14 marks)
7. Illegal immigration into the UK is a big problem, and the Government are creating
two new reception centres to house individuals who are caught, in order to
investigate them before either returning them to their country of origin or granting
them legal stay in the UK. Each reception centre will have numerous staff, led by
three Senior Immigration Officers (SIOs). The list of applicants for these positions
has been narrowed down to just nine, named A to I inclusive for convenience, of
whom three will have to be discarded and the two teams of SIOs will be staffed by
the remaining six.
It is important that the three eventual SIOs at either base should have as much
breadth and depth of experience between them as possible to ensure that most, if
not all, difficulties and situations can be recognised and tackled. A similarity
matrix has been constructed showing, on a scale from 0 to 100, how far each individual
agrees with each of the other eight applicants, based on an analysis of their
responses to 100 ‘either/or’ questions relating to experience. Thus, for example, A
and B had the same responses to 26 questions. There is one complication, however:
candidates B and H must not be assigned to the same team because there is a very
serious personal disagreement between them.
Appendix B
Sample examination paper –
Examiners’ commentary
(a) This happens to be a three-set Venn diagram – perhaps the most common, but not
the only, number of sets candidates might be called upon to handle. Note especially
that 540 homes suffering from R says nothing about C and W. Similar treatment
is required for the orders of 310 and 250 given in the question. Starting from
the middle of the diagram outwards – a method that seems to work most of
the time – you should get the following Venn diagram:
Remember to label the sets R, C and W and to give the order of all subsets
(as a function of x if necessary). Note where x belongs and also that the subset
(R ∪ C ∪ W)^c does not have to be null (empty) since there is nothing in the
question that implies all 1000 homes surveyed require at least one energy
efficiency improvement.
(b) Setting the order of each subset to be non-negative and summarising these
constraints we find that 0 ≤ x ≤ 30.
(c) Simply a case of summing up the costs associated with each type of
improvement – each obtained by multiplying the cost per house by the
number of houses so improved. One must then remember to subtract the net
transportation savings.
Total cost = 540(£500) + 310(£400) + 250(£800) − (100 + x)(£50)
           = £270,000 + £124,000 + £200,000 − £5,000 − £50x
           = £(589,000 − 50x).
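For revision purposes, the same arithmetic can be reproduced with a few lines of Python, treating x (the number of homes needing W and C but not R) as the only unknown:

# Sketch: total cost of making all the homes efficient, as a function of x.
cost = {"R": 500, "C": 400, "W": 800}      # cost per home for each remedy
homes = {"R": 540, "C": 310, "W": 250}     # homes needing each remedy
all_three = 100                            # homes inefficient in all three respects
saving_per_home = 50                       # saving where both W and C are remedied

def total_cost(x):
    homes_needing_W_and_C = all_three + x
    gross = sum(homes[k] * cost[k] for k in cost)
    return gross - homes_needing_W_and_C * saving_per_home

print(total_cost(0))    # 589000 -> the maximum total cost, £589,000
print(total_cost(30))   # 587500 -> the minimum, given 0 <= x <= 30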
demands we obtain the necessary production amounts. So the production required
is (I − A)^(-1) · d, i.e.
( 15.2   4    4 )   ( 100 )   ( 3120 )
(  32   15   10 ) · ( 300 ) = ( 8700 )
(  28   10   10 )   ( 100 )   ( 6800 )
You can use any method you wish for inverting the matrix – the Examiners
have a personal preference for row operation methods.
Make certain with input-output questions that the matrix is ‘the right way
round’ (i.e. not transposed).
Given sufficient time (and inclination) it is often worthwhile checking whether
your answer for production would indeed lead to the necessary net output
when some of the production of each ‘commodity’ is required to produce other
commodities.
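If you wish to verify the inversion and the multiplication away from the examination, a short numpy sketch reproduces both:

import numpy as np

A = np.array([[0.75, 0.00, 0.10],
              [0.20, 0.80, 0.12],
              [0.50, 0.20, 0.50]])   # technology matrix from the question
d = np.array([100, 300, 100])        # final (non-industry) demands for X, Y, Z

I_minus_A = np.eye(3) - A
print(np.round(np.linalg.inv(I_minus_A), 4))    # [[15.2 4 4] [32 15 10] [28 10 10]]
print(np.round(np.linalg.solve(I_minus_A, d)))  # [3120. 8700. 6800.]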
(b) This is just like Figure 6.1 in the subject guide. Remember to include the
actual flow along each arc and the direction of flow too.
[Note that one cannot assume that when x = 1, y = 11 − 6 = 5, since the given rate of
decrease only holds at the specific point x = 0 and is continually changing as x
changes continuously.]
When x tends to infinity, y tends to 8 (Groats per Dollar), so for every Dollar
initially he can buy 11 Groats, and when he sells the 11 Groats this will buy
him 1.375 Dollars, giving him a gross profit of 37.5%: i.e. more than 37%.
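As a check only (symbolic algebra packages are of course not available in the examination), a sympy sketch reproduces the solution and its long-run behaviour, using the equation and the conditions y(0) = 11, y'(0) = −6 stated in the question:

import sympy as sp

x = sp.symbols('x')
y = sp.Function('y')

# y'' + 4y' + 13y = 104 with y(0) = 11 and y'(0) = -6.
ode = sp.Eq(y(x).diff(x, 2) + 4 * y(x).diff(x) + 13 * y(x), 104)
sol = sp.dsolve(ode, y(x), ics={y(0): 11, y(x).diff(x).subs(x, 0): -6})

print(sol)                          # equivalent to y(x) = 8 + 3*exp(-2*x)*cos(3*x)
print(sol.rhs.subs(x, 5).evalf())   # approximately 8 Groats per Dollar after a few days
print(11 / 8)                       # 1.375 Dollars per Dollar, i.e. a 37.5% gross profit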
(c) Again very standard – this time for a second order difference equation.
The auxiliary equation is m^2 − 9 = 0 and hence m = ±3, so the
complementary function is
Y_k = A(3^k) + B(−3)^k.
For the particular solution (PS) the given right-hand side is a multiple of 3^k,
which already appears in the complementary function, so the form of the PS will
have to be
Y_k = Ck(3^k).
Then Y_{k+2} = C(k + 2)3^{k+2} = 9C(k + 2)(3^k), and
Y_{k+2} − 9Y_k = 3^k[9C(k + 2) − 9Ck] = 18C(3^k) = 90(3^k).
So C = 5 and the general solution is
Y_k = A(3^k) + B(−3)^k + 5k(3^k).
Using the given conditions:
When k = 1, Y_1 = 30, so 30 = 3A − 3B + 15, so that A − B = 5.
When k = 2, Y_2 = 162, so 162 = 9A + 9B + 90, so that A + B = 8.
Hence A = 6.5 and B = 1.5 and the complete solution is Y_k = (6.5 + 5k)3^k + 1.5(−3)^k.
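Taking the printed conditions Y_1 = 30 and Y_2 = 162 at face value, a few lines of Python (purely as a check, outside the examination) confirm that this closed form satisfies both conditions and the recurrence Y_{k+2} − 9Y_k = 90(3^k):

def Y(k):
    # closed-form solution derived above
    return (6.5 + 5 * k) * 3**k + 1.5 * (-3)**k

assert Y(1) == 30 and Y(2) == 162              # the given conditions
for k in range(1, 10):
    assert Y(k + 2) - 9 * Y(k) == 90 * 3**k    # the difference equation itself
print("closed form checks out")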
Surprisingly there is no request for a graph or a discussion/description in this
difference equation/differential equation question. This is rather unusual but
the question is already a long one under the present format of the examination
paper.
5. Reading for this question
Overall this is a straightforward forecasting question and the moving average
approach is probably the simplest technique; it is covered in Chapter 9 of the
subject guide. Sample examination question 2 is probably the closest example
to follow.
Approaching the question
A straightforward forecasting question but with several ‘traps’ and sources of
uncertainty because alternative methods are possible. Be bold – make your
assumptions clear and continue with your answer!
An outline answer is as follows:
(a) First, the best value of α (α∗, say) will be the one that, for example,
minimises the RMSE (although minimising the Mean Absolute Error might
well give a slightly different value). From the table the RMSE values are
decreasing, so it may indeed be the case that the RMSE values will continue to
decrease, although ‘perhaps 0.45 or 0.5’ has been plucked from the air – there
is nothing to suggest that α∗ cannot be larger than 0.5.
Also, because the five values in the table are not only decreasing but also
decreasing at a decreasing rate, and the last two values are quite close, it is
possible that the minimum occurs between 0.35 and 0.4, i.e. we have passed
the point of minimum RMSE somewhere between α = 0.35 and 0.4.
A simple linear interpolation – although strictly it is not a linear relationship
– suggests a value closer to 0.4, such as 0.38 or 0.39. Secondly, the bold claim
that ‘the larger the value of α is so the more accurate will be the forecast’ is
nonsense – but quite often quoted! If this statement were indeed true then we
would always set α∗ = 1.0!
Another approach to pick up a mark is to point out that RMSE is not the only
measure of accuracy.
Other measures (e.g. Mean Absolute Deviation) may give a different message.
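Although the examination expects the reasoning above rather than computation, a short Python sketch shows how α could be chosen by a grid search over the one-step-ahead RMSE. The series used here is just the topaz data from later in the paper, purely as an illustration, since the table referred to in part (a) is not reproduced:

import math

def rmse_for_alpha(series, alpha):
    # One common set-up: initialise the smoothed value at the first observation
    # and measure one-step-ahead forecast errors thereafter.
    smoothed = series[0]
    errors = []
    for actual in series[1:]:
        errors.append(actual - smoothed)
        smoothed = alpha * actual + (1 - alpha) * smoothed
    return math.sqrt(sum(e * e for e in errors) / len(errors))

series = [5.2, 6.0, 4.8, 5.5, 6.4, 5.3, 6.0, 6.8, 5.5, 6.1, 6.8, 5.6]  # illustrative only
for alpha in (0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5):
    print(alpha, round(rmse_for_alpha(series, alpha), 3))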
(b) The extended table with the calculated month-on-month changes is:
Month Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
Topaz 5.2 6.0 4.8 5.5 6.4 5.3 6.0 6.8 5.5 6.1 6.8 5.6
Change 0.8 −1.2 0.7 0.9 −1.1 0.7 0.8 −1.3 0.6 0.7 −1.2 —
Beryl 11.9 9.5 10.1 11.9 12.9 10.4 11.2 13.0 14.0 11.8 12.5 14.3
Change −2.4 0.6 1.8 1.0 −2.5 0.8 1.8 1.0 −2.2 0.7 1.8 —
i. Consider topaz:
In January, April, July and October the increase to the next month is 0.8,
0.9, 0.8 and 0.7 respectively; and in February, May, August and November
the decrease to the next month is 1.2, 1.1, 1.3 and 1.2.
In March, June and September the increase to the next month is 0.7, 0.7
and 0.6.
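If you want to check this pattern-spotting by computer (not required in the examination), one informal Python sketch groups the month-on-month changes by their position in a candidate cycle of length 2, 3 or 4 and measures how stable each group is; the smallest total spread points to the most plausible seasonal length (3 for topaz, 4 for beryl):

import statistics

topaz = [5.2, 6.0, 4.8, 5.5, 6.4, 5.3, 6.0, 6.8, 5.5, 6.1, 6.8, 5.6]
beryl = [11.9, 9.5, 10.1, 11.9, 12.9, 10.4, 11.2, 13.0, 14.0, 11.8, 12.5, 14.3]

def changes(series):
    return [round(b - a, 1) for a, b in zip(series, series[1:])]

def spread_by_cycle(deltas, period):
    # Collect the changes falling at each position of the cycle and add up
    # their standard deviations; a small total suggests that cycle length.
    groups = [deltas[i::period] for i in range(period)]
    return sum(statistics.pstdev(g) for g in groups)

for name, series in [("topaz", topaz), ("beryl", beryl)]:
    deltas = changes(series)
    for period in (2, 3, 4):
        print(name, period, round(spread_by_cycle(deltas, period), 2))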
(a) Candidates should briefly explain how the factors work and why they are
created.
Looking at the factor loadings, all the variables have positive (and reasonably
high) loadings on the first factor. This might suggest that this factor reflects
the overall ‘package’ or market appeal of the car (perhaps an indication of the
car’s cost?). This might be labelled a ‘general appeal’ factor. For the second
factor, half the loadings are positive and half of them are negative. It would
appear, within this factor, that cars that do well (above average) in the visual,
non-mechanical type of tests (Comfort and Interior Finish, Dashboard Style,
Colour Range) do poorly (below average) in the ‘mechanical/engineering/non-visual’
type tests (Service Interval, Engine Performance, Reliability). Perhaps
this factor can be classified as a ‘visual/non-visual’ factor.
The ith communality is the proportion of the variance of the ith variable that is
accounted for by the m common factors, i.e. the sum of the squared loadings of
variable i on those factors.
(b) Factor rotation is sometimes appropriate to give a clearer interpretation of the
factors. In this question (as seen above) it has not been too difficult to come
up with suitable interpretations on the first two factors. This is partly because
there are only six variables being used. With more variables the interpretation
of the (initial) factors can become quite difficult (indeed, it can become the
hardest part of the whole factor analysis procedure). Although not obviously
necessary for this question, one can perform factor rotation to achieve more
easily interpretable results.
The crucial fact is that once the number of factors has been chosen, the actual
definition of the factors is not unique and the factors can be rotated without
any loss of explanatory power. There are several criteria for rotation which
each take into account that, for ease of interpretation of the factors, it is easier
if variables have either large weights (when their presence helps name/label
the factor) or small weights (when they can be ignored).
Three common factor rotation methods are:
• Varimax which minimises the number of variables with a high weighting
on each factor. It rotates the axes until its objective function (the variance
of the squared loadings on the rotated factors) is maximised.
• Quartimax which minimises the number of factors needed to explain a
variable. The resulting factors often include a ‘general’ factor with most
variables represented.
• Equamax which is a combination of the above.
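For students with access to Python, the following sketch illustrates the idea of rotation using scikit-learn (version 0.24 or later, which accepts a rotation argument). The data matrix X here is made up purely for illustration, since the loadings table from the question is not reproduced:

import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 6))   # placeholder: 50 cars scored on 6 tests

unrotated = FactorAnalysis(n_components=2).fit(X)
rotated = FactorAnalysis(n_components=2, rotation="varimax").fit(X)

print(np.round(unrotated.components_.T, 2))  # loadings of the 6 variables on the 2 factors
print(np.round(rotated.components_.T, 2))    # the same solution after varimax rotation

# Communality of variable i = sum of its squared loadings on the retained factors.
print(np.round((rotated.components_.T ** 2).sum(axis=1), 2))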
Then link G to [E,F] via the greatest remaining entry, 83. We thus have our
cluster E, F and G to eliminate.
(b) The remaining similarity matrix, with E, F and G removed, will be:
Note that we do not add B to [C,H] – recognition of this alone gets credit.
To continue we have some alternative approaches, e.g.
• either: the lowest similarity with [C,H], other than B, is D at 35, so the first
team is [C,D,H] and the second team is the remaining three: [A,B,I]
• or: the lowest entry is 21, so link D and A. So far the first team is [C,H] and
the second is [A,D]. Now B must be linked with [A,D], so the two teams are
[A,B,D] and [C,H,I].
(c) Make certain your dendrogram – you might reasonably find it easier to
produce separate ones for parts (a) and (b) – has (i) a clear title; (ii) good axis
names and scales; and (iii) no crossing lines. Bearing in mind the great
variety of answers possible for parts (a) and (b), the dendrogram would be
awarded full marks so long as it is consistent with the candidate’s clustering
process and adheres to the good practice stated above.
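Outside the examination, scipy can produce such a dendrogram directly. In the sketch below the similarity matrix is a made-up illustration (only the A–B similarity of 26 comes from the question); similarities are converted to distances by subtracting from 100, and single linkage then merges the most similar pair first:

import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage
from scipy.spatial.distance import squareform

labels = ["A", "B", "C", "D"]                   # placeholder: the question uses A to I
similarity = np.array([[100, 26, 40, 21],
                       [ 26, 100, 55, 35],
                       [ 40,  55, 100, 60],
                       [ 21,  35,  60, 100]])   # made up, except the A-B value of 26
distance = 100 - similarity                     # high similarity -> small distance

Z = linkage(squareform(distance), method="single")   # single linkage = nearest neighbour
dendrogram(Z, labels=labels)
plt.title("Single-linkage clustering of candidates (illustrative data)")
plt.ylabel("Distance (100 - similarity)")
plt.show()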
(d) There is one mark for each of any two sensible comments. For example:
• In part (a), pairing G with [E,F] as rejects means that the least correlated
pair, G and D, cannot be in one of the two teams.
• In part (b) the process means that the first team of three will have
relatively lower commonality than the second, so team 2 may be more
widely experienced, especially if the ‘either’ approach is used. However,
this is based on single ‘lowest’ values, so perhaps average linkage might be
more helpful here. Alternatively, since B and H are both in the final six,
start with, say, B, and find the two to join B, provided this does not
include H.
Approaching the question
(a) If the residuals are not independent, then the relationship between successive
residuals can be modelled, for example as
Y_t = α + β_1 X_t + e_t,   e_t = ρe_{t−1} + u_t.
The Durbin–Watson statistic, d, is used to test the null hypothesis that ρ = 0.
The values of d are tabulated in most statistical tables; typically two values
are given, d_U and d_L. Their use is demonstrated in the diagram below, which
depicts testing for auto-correlated errors using the Durbin–Watson statistic.
(b) The cause of auto-correlated residuals may be one or more of the following:
• Common trend or cycle in the variables
• Omission of important explanatory variable
• Mis-specification of the form of the equation
• Use of smoothed or adjusted data.
The effects are that the coefficient estimates remain unbiased but the variance of
the residuals is underestimated and the standard errors of the coefficients are
underestimated.
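As a practical aside (not needed in the examination), the Durbin–Watson statistic is easy to obtain in Python once a regression has been fitted; values near 2 suggest no first-order autocorrelation, while values well below 2 suggest positive autocorrelation. The data below are simulated purely to illustrate the calls, with errors deliberately built to be positively autocorrelated:

import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(0)
X = rng.standard_normal(100)
e = np.cumsum(rng.standard_normal(100)) * 0.3   # deliberately autocorrelated errors
y = 2 + 0.5 * X + e                             # simulated data, for illustration only

model = sm.OLS(y, sm.add_constant(X)).fit()
print(durbin_watson(model.resid))   # typically well below 2 -> positive autocorrelation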
Comment form
We welcome any comments you may have on the materials which are sent to you as part of your study
pack. Such feedback from students helps us in our effort to improve the materials produced for the
International Programmes.
If you have any comments about this guide, either general or specific (including corrections,
non-availability of Essential readings, etc.), please take the time to complete and return this form.
Name
Address
Email
Student number
For which qualification are you studying?
Comments
Date: