Block 1
AN OVERVIEW
Structure
1.1 Introduction
Objectives
1.2 Model Classifications
1.3 Mathematical Modelling - What and Why?
1.4 Classifying Mathematical Models
1.5 Limitations of a Mathematical Model
1.6 Summary
1.7 Solutions/Answers
1.1 INTRODUCTION
Mathematics is a very effective tool in solving real world problems. The
critical step in the use of mathematics for solving real world problems is the
building of a suitable mathematical model. A mathematical model is a
conversion of a real world problem into an abstract mathematical problem
involving mathematical concepts such as constants, variables, functions,
equations, inequalities, etc. The process by which a real world problem is
represented and interpreted in terms of a mathematical model is called
mathematical modelling. Our main aim in this unit is to introduce you to the
basic concepts of mathematical modelling and discuss the process of
development of a mathematical model.
Objectives
Some models are replicas of the physical properties (relative shape, form, and
weight) of the object they represent. Some are physical models but do not have
the same physical appearance as the object of their representation. Other types
of models deal with symbols, expressions, mathematical equations and
inequalities. All these models can be classified into four main categories:
physical models, schematic models, verbal models and mathematical models.
Physical Models
Physical models are prototype models that look like the objects they represent.
They are more or less the exact replicas of the object being modelled. Scale
models of the Taj Mahal, airplanes, buses, ships, office complexes, shopping
centres, homes, etc. look exactly like their counterparts but on a much smaller
scale. The advantage in having a scaled model is that one can tell exactly what
the object under study looks like, in three dimensions, before making a major
investment. Also, some of these models can even perform as their counterparts
would and this allows you to conduct the study on the model to see how it
might perform under actual operating conditions. Scaled models of airplanes
can be tested in wind tunnels to determine aerodynamic properties and the
effects of air turbulence on their outer surfaces. Models of bridges and dams
can be subjected to multiple levels of stress from wind, heat, cold and other
factors to test their effects on endurance and safety. Scaled models that behave
in a manner similar to the real objects are less expensive to create and test than
their actual counterparts.
Schematic Models
Schematic models are more abstract than physical models. They do not look
like the physical reality they represent. Graphs and charts are schematic
models that provide pictorial representations of mathematical relationships.
A linear mathematical relationship between two variables may be indicated by
plotting a line on a graph. Pie charts, bar charts and histograms can all model
some real situations but do not bear any physical resemblance to them.
Diagrams, drawings, blueprints and a flow chart describing a computer
program are all examples of schematic models.
Verbal Models
Verbal models use words to represent some object, situation or problem that
exists, or could exist, in reality. This could range from a simple word presentation of
scenery described in a book to a complex business problem (described in words
and numbers). Verbal models provide all relevant and necessary information
to solve the problem, make recommendations and suggest alternatives. The
case studies which you must have studied from management text books are
examples of verbal models that expose you to the workings of a business
without having to visit the firm's actual premises. Often these verbal models
provide enough information to be converted into mathematical models.
Mathematical Models
Mathematical models are the most abstract of the four classifications. These
models do not look like their real-life counterparts at all. Mathematical models
are built using numbers, symbols, variables and empirical laws related together by
means of equations or inequalities. Mathematical models can take many forms
like statistical models, optimization models, algebraic/differential equations or
game theoretic models, etc. In this unit and units to follow, we shall
concentrate on mathematical models. We shall be discussing in detail the
process and need of mathematical modelling, types of mathematical models
and we shall formulate some models with the contexts taken from biology,
physics, economics, finance, medicine, etc. Before we discuss basic concepts
of mathematical modelling in this unit, you may try the following exercise.
E1) Give two examples each of the physical, schematic, verbal and
mathematical models.
The third step in modelling is to replace the real quantities and processes by
mathematical symbols, a set of variables and a set of equations/inequalities that
establish relationships between these variables. The values of the variables can
be practically anything: real or integer numbers, Boolean values or strings, for
example. The variables represent some properties of the system; for example,
we may measure system outputs, often in the form of signals, timing data or event
occurrence (yes/no). The actual model is the set of functions that describe the
relations between the different variables.
After the problem is formulated, the fourth step is the study of the resulting
mathematical system using appropriate mathematical tools and techniques.
This may involve a calculation, solving an equation, proving a theorem, etc.
The motivation is to produce new information about the problem being studied.
It is likely that new information can be obtained by using well-known
mathematical concepts and techniques. If not, we may need to develop new
techniques or adopt tested methods from other disciplines.
[Fig. 1: flow chart of the modelling process. Recoverable stages: Formulation, Mathematical Analysis, Evaluation, Stop.]
After a model is evaluated and found good, there are several uses of the model:
i) it helps in our better understanding of the real physical system, ii) it can
serve as a tool in the prediction of the future state of the system which is
currently unknown, iii) it can help in doing trial experiments by changing the
parameter values or in perturbing the system to produce a desirable condition,
i.e., many unknown parameter values can be estimated.
Let us now look at some simple mathematical models. We shall not go into the
details of their formulations here since you must be already familiar with them.
y and p represent constant specific birth rate of the prey and mortality rate of
the predator respectively.
The predator's specific growth rate $\alpha y$ depends on availability of the prey as
food source, whereas the prey death rate $\delta x$ depends on the number of
predators. These equations have oscillatory solutions but have the
unsatisfactory feature of forming a conservative system, and oscillations of any
magnitude are possible. More realistic versions of this model remove this
degeneracy. This model is discussed in detail in Unit 9, Block-3 of our
undergraduate course MTE-14 on 'mathematical modelling'.
***
Example 2: Model of a Particle in a Box
In physics, the particle in a box (also known as the infinite potential well or
the infinite square well) is a problem consisting of a single particle inside an
infinitely deep potential well, from which it cannot escape, and which loses no
energy when it collides with the walls of the box. In the one-dimensional case, the
problem can be described as a single point particle enclosed in a box inside
which it experiences no force, i.e., it is at zero potential energy. At the walls of
the box, the potential rises to infinity, forming an impenetrable wall.
In classical mechanics, the problem can be modelled using Newton's laws of
motion and the solution to the problem is trivial: the particle moves in a
straight line, always at the same speed, until it reflects from a wall. A
quantum-mechanical solution of the problem becomes very interesting and
reveals some decidedly quantum behaviour of the particle that agrees with
observations but contrasts sharply with the predictions of classical mechanics.
According to quantum theory, a particle has no definite position or velocity.
Rather, a probabilistic interpretation is given to the state of the particle in terms
of a time-independent wave function $\psi(x)$. The squared modulus of the wave function,
$|\psi|^2$, is a probability density.
E2) Give at least two examples of the formulas you are already familiar with
as the mathematical models of the real situations.
Example 4: What is the rocket thrust level and duration necessary to ensure
that the rocket reaches the desired altitude?
Example 5: What is the optimum number of check out counters that the
supermarket should have in order to maximize the expected profit?
E3) Give two situations from the field of management where mathematical
treatment of the problem is necessary to get the required solution.
If, on the other hand, D is not a constant but depends on C, say, for example, if
$D = D_0 \exp[\beta (C - C_0)]$, where $D_0$, $\beta$, $C_0$ are constants, then Eqn. (1) reduces to
Eqn. (3) is a non-linear PDE and hence the model obtained is a non-linear one.
You may notice that if the model tries to replicate more closely the real
situation (i.e., D is a function of the concentration in this case), the
corresponding mathematics involved is more challenging (solving a
non-linear PDE, in this case).
A static model does not account for the element of time and hence the
variables and relationships describing the system are time-independent.
Consider, for instance, the transportation problem generally associated with
industries:
Suppose there are m origins $O_i$, i = 1, 2, ..., m in an industry, where various
amounts of a commodity are produced or stored for transportation to n
destinations $D_j$, j = 1, 2, ..., n. It is then required to transport all units of the
commodity from all $O_i$, exhaustively, to all $D_j$, exactly satisfying all their
requirements in such a way that the total transportation cost becomes
minimum. The following assumptions are made:
i) The ith origin can supply exactly $a_i$ units of the commodity.
Minimize
$$\sum_{i=1}^{m} \sum_{j=1}^{n} c_{ij} x_{ij}$$
subject to
$$\sum_{j=1}^{n} x_{ij} = a_i, \quad i = 1, 2, \ldots, m$$
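The objective and the supply constraint of the transportation problem can be checked numerically for any candidate shipping plan. The following is a minimal sketch in Python; the cost matrix, the supplies and the plan are hypothetical numbers chosen only for illustration, and the function names `total_cost` and `meets_supply` are not from the text.

```python
# Hypothetical data: 2 origins and 3 destinations (not taken from the text).
supply = [30, 40]            # a_i: units available at origin i
cost = [[4, 6, 9],           # c_ij: cost of shipping one unit from origin i
        [5, 3, 7]]           #       to destination j
plan = [[20, 0, 10],         # x_ij: units shipped from origin i
        [0, 25, 15]]         #       to destination j


def total_cost(cost, plan):
    """Objective: the sum over all i and j of c_ij * x_ij."""
    return sum(c * x
               for c_row, x_row in zip(cost, plan)
               for c, x in zip(c_row, x_row))


def meets_supply(plan, supply):
    """Constraint: origin i ships exactly a_i units in total."""
    return all(sum(row) == a for row, a in zip(plan, supply))


print(meets_supply(plan, supply))
print(total_cost(cost, plan))
```

The sketch only evaluates a given plan; finding the plan with minimum cost requires an optimization method, which is outside the scope of this illustration.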
In contrast to the static system, in a dynamic system time plays a very important
role. The relationship between the variables describing the system changes
with time. If you look at the problem of rocket launch then it can be described
in terms of a closed system consisting of two objects: the rocket and the earth.
The variables describing the rocket are its position and velocity relative to
some fixed point on the earth, and the interaction between the two objects is
given by the theory of dynamics. In this description we may as well include
the influence of other planets in the solar system by treating the planets as
belonging to the environment of the system and the system being open. In
either case, the variables, i.e. position and velocity of the rocket, change
continuously with time. Hence the system is a dynamic system.
(Margin note: A system is open/closed if the objects in the system do/do not interact
with objects of the super-system which do not belong to the system.)
(9)
where x(t) (> 0) is the size of the population at time t, $x_0$ is the size of the
population at the initial time and r represents the net growth rate. For details
of this model refer to Unit 8, Block-3, MTE-14.
Table-1
E4) State the types of modelling you will use for the following problems.
Also give reasons in support of your answer.
a) Human motion (e.g. walking, lifting, jumping) involves dynamic
action at one or more joints in the body. If the joints function
normally they cause no discomfort; otherwise they cause severe pain
when in motion. One way of reducing the pain and discomfort is to
replace the defective joint with an artificial one. For the artificial
joint to be effective it must be capable of executing all the motions
of a normal joint. The problem is to build a model to describe the
functioning of a joint, say, hip joint, so that it is useful to
bioengineers when designing hip replacements.
b) Research and development (R&D) is an important activity in any
modern industrial organization. The R&D manager is faced with
the difficult problem of allocating limited resources (money,
manpower, space, etc.) between a number of competing projects.
The problem is to build a model to help the R&D manager to make
right decisions.
c) In a chemical process, the output quality and yield depend critically
on the levels assigned to relevant factors (temperature, pressure,
concentration, etc.) It is important to optimally select these levels
as well as to monitor and control the system to ensure that they stay
at the desired levels. The problem is to build a model to help
achieve this.
d) The formation of sand dunes and their encroachment into
de-forested lands near deserts has become a serious problem in
many parts of the world. It is needed to predict the spread of desert,
as well as to devise policies to control the spread. The problem is to
build a model to describe the movement of sand dunes.
e) Advertising is a means by which a manufacturer can promote the
product and improve sales and hence, the revenue generated.
However, advertising costs money and is worthwhile only if the
cost of advertising is less than the increase in the sale revenue. The
problem is to evaluate the effectiveness of different advertising
policies and the selection of the optimal policy. The problem is to
build a model to help solve this problem.
It is rare that we obtain an adequate mathematical model at the first attempt for
the problem under consideration. In general, an iterative procedure is needed
where improvements are progressively made until an adequate mathematical
model is obtained. The adequacy of a model is established by checking the
validity of the assumptions made in building the model and by the closeness of
the agreement between the behaviour of the model and the system under
consideration. For example, while developing a model to describe the motion
of a simple pendulum, if the time interval of study is sufficiently small, so that
the energy loss due to frictional drag is very small, then the assumption that the
frictional drag is negligible is valid. However, if the time interval of study is
large then the assumption of negligible frictional drag is not valid, since the
effect of drag on the pendulum motion is cumulative.
In a rocket launch model, the assumption that the thrust generated by the rocket
is an impulse is valid if the thrust lasts for a very small fraction of the total
flight time, something that is not known initially. Only after a first solution
can this be checked and, if need be, the model modified. This
emphasizes the iterative nature of modelling.
To end the unit we now give the summary of what we have covered in it.
1.6 SUMMARY
In this unit we have covered the following points:
E2) You may give examples of familiar mathematical models from your own
experience.
E3) A firm that assembles computers and computer equipment is to start the
production of two new types of computers. Each type will require
assembly time, inspection time and storage space. The amounts of each
of these resources that can be devoted to the production of the computers
are limited. The manager of the firm would like to determine the quantity of
each computer to be produced in order to maximize the profit generated
by their sale. For this kind of study a mathematical approach is
preferred.
2.1 INTRODUCTION
In Unit 1 we introduced you to the need of modelling a real life situation and
the role of mathematics in it. Various steps involved in modelling a real life
situation were discussed there. We classified the mathematical models into
various types and illustrated them through examples from population
dynamics, optimization problem, queueing system, etc. In this unit, we shall
proceed with the next step in modelling i.e., given a real world problem, how
do you convert it to model abstraction leading to a mathematical equation? We
shall draw your attention to the various modelling aspects which need to be
taken into account while formulating a mathematical model.
In addition to the above objectives, using the discussed model from the field of
finance, you should also be able to
i) explain the meaning and calculation of expected return and risk measures
for an individual security;
ii) explain the portfolio selection problem of investments;
iv) explain what the efficient frontier is and how important it is to investment
analysis.
[Figure: recoverable labels ANATOMY and MILK.]
Their conceptual views of the same system or object are rather different
since they are both heavily influenced by their own environment,
background and objectives. For example, a biologist would be interested
in the anatomy and physiology of a camel whereas a businessman's
concern would be the profit he can earn from it. The same is true when
we come to the mathematical modelling of any system or process.
Although the motivation for building the model is usually to find the
means to answering a particular question, the form of that question
influences the way in which we build the mathematical model.
The search for essentials of the problem is related to the main purpose
of the model. For instance, the assumption of no removal of infectives in
general, is highly restrictive in the context of a human population.
However, this might be a reasonable approximation to the early stages of
some upper respiratory infections where a long time may elapse before an
infective is removed from circulation. A mild cold infection in a
classroom may also be considered as an example of no removal in a
human population. On the other hand, the assumption of no removal is
fairly applicable to epidemics in insect and plant populations. In the case
of a common fungal disease of flies, the diseased or dead fly remains
attached to a leaf or a blade of grass, creating a situation roughly
analogous to no removal. The dead or diseased plants in a forest are
rarely removed, and if dead plants are infectious, the assumption of no
removal is fulfilled. If you are interested in knowing more about
modelling the spread of infectious diseases, you may refer to Unit 10,
Block-3 of MTE-14.
E1) State the type of modelling you will use for the following problems
giving reasons for your answers. List the essentials and non-essentials in
the problems.
i) The economic viability of an insurance company depends critically
on its ability to assess risks and decide on the premium charged to
cover risks. If the premiums are low, then payouts can exceed
revenue collected and the company can go bankrupt. On the other
hand, if they are high, the number of customers will go down, thus
affecting profitability. The problem is to develop a model to help
the insurance company decide the premium it should charge for
different risks to ensure economic viability and maximise its profits.
ii) A company manufacturing soft drinks is thinking of expanding its
plant capacity so as to meet future demand. The monthly sales for
the past 5 years are available. The problem is to develop a model to
obtain good estimates for future demand so as to help the company
make the right decisions.
E2) Give examples of at least two real life situations where mathematical
treatment of the problem is the only approach to find the solution of the
problem. Why do you think that there is no other scientific alternative for
the treatment of these problems? List the essentials and non-essentials in
these problems.
E3) Give at least two examples each of physical situations in which the
variables involved are i) continuous and stochastic; ii) piece-wise
continuous and stochastic; iii) fuzzy.
Securities
When you borrow some money or take a loan from a broker then you have to
leave some item of value as security with the broker or sign a piece of paper
promising repayment with interest. Failure to repay the loan (plus interest)
means that the broker can sell your item to recover the amount of the loan (plus
interest) and perhaps make a profit. The terms of agreement are recorded when
the deal is made. This piece of paper serving as evidence is called a security.
Similarly, you may have some spare money which you would like to lend to
earn some profit out of it. You then think of investing your money in
Government Bonds, saving certificates, shares, mutual funds, etc. These
investment strategies are called securities. Broadly speaking, a security
helps us to save our funds in the event of default (and that is why the name). It
may be a simple promissory note, a share of common stock, a bond, etc.
Return
Once the investment has been made, you are interested in knowing how good
your investment strategy was. For this, you need to calculate the rate of return
on the investment strategy. What is the rate of return? It gives a relation
between the initial input and final output of an investment and is calculated as
follows.
$$\text{Return} = \frac{\text{End of period value} - \text{Beginning of period value}}{\text{Beginning of period value}} \qquad (1)$$
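As a quick illustration of Eqn. (1), the rate of return can be computed directly from the two period values. This is a minimal sketch; the function name `rate_of_return` and the sample amounts are hypothetical and are not taken from the exercises of this unit.

```python
def rate_of_return(begin_value, end_value):
    """Rate of return as in Eqn. (1): (end - begin) / begin."""
    if begin_value <= 0:
        raise ValueError("beginning-of-period value must be positive")
    return (end_value - begin_value) / begin_value


# Hypothetical investment: Rs.1000 grows to Rs.1100 over the period.
r = rate_of_return(1000.0, 1100.0)
print(f"{r:.1%}")
```

Any income received during the period (dividends or interest, for example) would have to be added to the end-of-period value before applying the formula.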
Risk
Risk denotes the probability of specific eventualities which may have both
beneficial and adverse consequences. However, in general usage the
convention is to focus only on potential negative impact of the investment
strategy. Often, it is described as a situation which would lead to negative
consequences.
Uncertainty
$\sum_{i=1}^{n} w_i = 1$, is called a portfolio.
For example, $P = (0, \frac{1}{3}, \frac{2}{3})$ is a portfolio of three securities where no amount
is invested in the first security, one-third of the total funds is invested in the
second and two-thirds in the third security. By changing the values of $w_i$ in P,
subject to the condition that $\sum_{i=1}^{n} w_i = 1$, all the portfolios of the given n
securities, i.e., the feasible set, can be obtained. Formally, we give the following
definition.
Definition: The set of all the possible portfolios which can be constructed from
a given set of securities is called the feasible set or the opportunity set.
Given a portfolio $P = (w_1, w_2, \ldots, w_n)$ of n securities, our main purpose is
to see the effect of portfolio values $w_i$ on the terminal value of the return on
the portfolio P. But each $w_i$ is a certain proportion of the initial funds that are
invested in the ith security of the portfolio P. Thus, for quantifying the return and
risk of the portfolio P, we have to calculate the return and risk of its
constituting n securities. It is, therefore, important to select the proportions $w_i$
of the initial funds in such a way that the portfolio $P = (w_1, w_2, \ldots, w_n)$ is
optimally good according to our investment objectives. Such a portfolio,
which provides an investor the maximum level of satisfaction, is called an
Optimal Portfolio and the problem of selecting such a portfolio is referred to
as the portfolio selection problem. Thus, as a first step towards modelling we
state the formulated problem as follows:
Before we take up the problem you may try the following exercises.
E4) At the end of year 2005, Mohan decided to invest Rs.30,000 in a portfolio
of stocks and bonds. Rs.10,000 were put into common stocks and
Rs.20,000 into corporate bonds. At the end of 2006, Mohan's stock and
bond holdings were worth Rs.13,000 and Rs.16,000, respectively.
During 2006, Rs.500 in cash dividends was received on the stocks and
Rs.1000 in interest payments was received on the bonds. What was the
percentage return on
i) Mohan's stock portfolio during 2006?
ii) Mohan's bond portfolio during 2006?
iii) Mohan's total portfolio during 2006?
Let us now discuss Markowitz's approach to solving the portfolio selection
problem.
Thus, as a second step in the model building process, we can say that the
Markowitz model is based on the following assumptions:
Investors invest money for a particular length of time, called the holding
period.
Investors prefer higher returns to lower returns for a given level of risk.
The evaluation of portfolios is carried out in terms of returns and the risk
associated with the constituting securities, over a given holding period.
The obvious question which must be occurring in your mind is: how do we
quantify the expected return and risk of a security or a portfolio? Let us try to
do that as a next step in the modelling process.
To calculate the expected rate of return, you have to first enumerate all the
possible rates of return that an investment could have. For simplicity's sake,
let's imagine an investment with four possible rates of return,
-10%, -5%, 10% and 20%.
The first two rates of return indicate a loss while the last two indicate a gain.
The next step is to assign probabilities to each rate of return. How do you
assign these probabilities? This requires making some educated guesses based on
past performance of the investment itself, and the demonstrated performance of
similar investments. General market and economic factors should also be
taken into account while assigning these probabilities. Let the probabilities
0.1, 0.1, 0.5 and 0.3 be assigned respectively to the above four rates of
return. Then
$$E(R) = \sum_{i=1}^{n} (\text{Possible Return} \times \text{Probability}) \qquad (2)$$
$$= [(-0.10)(0.10) + (-0.05)(0.10) + (0.10)(0.50) + (0.20)(0.30)]$$
$$= 0.095.$$
The expected return is 0.095. This means that the investor can expect a return
of 9.5% on her investment.
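The calculation in Eqn. (2) can be reproduced with a few lines of Python. This is a minimal sketch using the four rates of return and probabilities of the example; the function name `expected_return` is an illustrative choice, not part of the text.

```python
# Possible rates of return and their assigned probabilities (from the example).
returns = [-0.10, -0.05, 0.10, 0.20]
probabilities = [0.10, 0.10, 0.50, 0.30]


def expected_return(returns, probabilities):
    """Probability-weighted average of the possible returns."""
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(r * p for r, p in zip(returns, probabilities))


e_r = expected_return(returns, probabilities)
print(round(e_r, 4))  # 0.095
```

The check on the sum of the probabilities guards against a distribution that has been entered incompletely.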
Thus, for any security, the expected return is the weighted average of all
possible outcomes, where each outcome is weighted by its respective
probability of occurrence. It is calculated as
$$E(R) = \sum_{j=1}^{n} R_j p_{ij}$$
where
E(R) = the expected return on a security,
$R_j$ = the jth possible return,
E6) Let the return distribution on a security A be given as follows:

Possible rate of return (in percent) ($R_j$):   -8     -6     9      10     12     8
Associated probabilities ($p_{ij}$):            0.04   0.06   0.2    0.3    0.25   0.15

Find the expected return of the security A.
where
$\sigma^2$ = the variance of returns,
E(R) = the expected return on a security,
$R_j$ = the jth possible return,
Let us consider the following example.
Solution: The calculations for the standard deviation are as shown in Table 1.
Table 1

Possible       Associated          R_j x p_ij     [R_j - E(R)]^2    [R_j - E(R)]^2 p_ij
Return R_j     Probability p_ij
0.01           0.2                 0.002          0.0049            0.00098
0.07           0.2                 0.014          0.0001            0.00002
0.08           0.3                 0.024          0.0               0.0
0.10           0.1                 0.010          0.0004            0.00004
0.15           0.2                 0.030          0.0049            0.00098
Total          1                   0.08 = E(R)                      0.00202 = $\sigma^2$
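The columns of Table 1 can be recomputed as a check. The sketch below uses the returns and probabilities listed in the table; the function name `mean_and_variance` is an illustrative choice.

```python
import math

# Possible returns and associated probabilities from Table 1.
returns = [0.01, 0.07, 0.08, 0.10, 0.15]
probabilities = [0.2, 0.2, 0.3, 0.1, 0.2]


def mean_and_variance(returns, probabilities):
    """Return (E(R), variance) of a discrete return distribution."""
    mean = sum(r * p for r, p in zip(returns, probabilities))
    variance = sum(p * (r - mean) ** 2 for r, p in zip(returns, probabilities))
    return mean, variance


mean, variance = mean_and_variance(returns, probabilities)
std_dev = math.sqrt(variance)  # the standard deviation measures the risk
print(round(mean, 4), round(variance, 5), round(std_dev, 4))
```

The variance agrees with the total of the last column of Table 1, and its square root gives a standard deviation of roughly 0.045.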
E7) Calculate the expected return and risk of a security given the following
information
Portfolio Risk
Portfolio risk is measured by the variance (or standard deviation) of the
portfolio's return, exactly as in the case of each individual security. Although
the expected return of a portfolio is a weighted average of the expected returns
of its securities, portfolio risk is not a weighted average of the risks of the individual
securities in a portfolio. Mathematically,
Portfolio risk is a unique characteristic and not simply the sum of individual
security risk. This is because a security may have a large risk if it is held by
itself but risk may be small when held in a portfolio of securities. But then the
question is how a portfolio of assets can reduce risk and how the risk is
measured? Let us analyze the portfolio risk.
Thus, $\sigma_P$ decreases as n increases, i.e., the risk of the portfolio will be reduced
as more securities are added to the portfolio. You do not have to decide
which security to add, as all of them have identical properties. The only
concern is how many securities are to be added. However, in the real world,
the assumption of statistically independent returns on stocks is unrealistic.
Most stocks are positively correlated with each other, that is, the movements in
their returns are related. For example, a rise in interest rates will adversely
affect most of the firms, because most of the firms borrow funds to finance part
of their operations. Therefore, there is a need for diversification, i.e., putting
small fractions of the total funds in as many securities as are found suitable
after their evaluation is done.
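The effect of adding independent securities can be seen numerically. The sketch below rests on assumptions that are stated here rather than taken from the text: all n securities have the same standard deviation, their returns are statistically independent (all covariances are zero), and the funds are split equally, so each weight is 1/n. Under these assumptions the portfolio variance is n(1/n)^2 sigma^2, i.e. the portfolio standard deviation is sigma divided by the square root of n. The value sigma = 0.20 is hypothetical.

```python
import math


def equal_weight_portfolio_std(sigma, n):
    """Standard deviation of an equally weighted portfolio of n independent
    securities, each with standard deviation sigma (assumed setting)."""
    weight = 1.0 / n
    variance = n * (weight ** 2) * (sigma ** 2)  # no covariance terms
    return math.sqrt(variance)


# Hypothetical security risk of 0.20; risk falls as n grows.
for n in (1, 4, 16, 100):
    print(n, round(equal_weight_portfolio_std(0.20, n), 4))
```

With positively correlated securities the covariance terms do not vanish, so the reduction in risk is smaller than this idealized sketch suggests.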
Diversification
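$$\sigma_{AB} = \sum_{j=1}^{n} [R_{A,j} - E(R_A)]\,[R_{B,j} - E(R_B)]\, p_j \qquad (8)$$
where
$\sigma_{AB}$ = the covariance between securities A and B,
$R_{A,j}$ = the jth possible return on security A,
$E(R_A)$ = the expected value of the return on security A,
$p_j$ = the probability associated with the jth outcome, and
n = the number of likely outcomes for a security for the period.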
Eqn. (8) gives the covariance as the expected value of the product of deviations
from the mean. Covariance can be positive, negative or zero.
Positive covariance indicates that the returns on the two securities tend to
move in the same direction at the same time. When one increases
(decreases), the other also tends to do the same.
Negative covariance indicates that the returns on the two securities move
inversely. When one increases (decreases), the other tends to decrease
(increase).
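Zero covariance indicates that the returns on two securities are
uncorrelated, i.e., they show no linear tendency to move together in the same
or opposite directions.
The covariance $\sigma_{AB}$, when divided by the product of the standard deviations of
securities A and B, gives the correlation coefficient $\rho_{AB} = \sigma_{AB} / (\sigma_A \sigma_B)$.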
Possible rates of return for security     Associated probabilities
X          Y                               p_Xj = p_Yj
0.19       0.18                            0.33
0.17       0.16                            0.25
0.11       0.11                            0.22
0.10       0.09                            0.20

(Margin note: the $p_{Xj}$ are the associated probabilities of the security X and the
$p_{Yj}$ are the associated probabilities of the security Y.)
You can check that
$$\sigma_X = \left(0.33 \times (0.0406)^2 + 0.25 \times (0.0206)^2 + 0.22 \times (-0.0394)^2 + 0.20 \times (-0.0494)^2\right)^{1/2} = (0.00148)^{1/2} \approx 0.0385$$
$$\sigma_Y = \left(0.33 \times (0.0384)^2 + 0.25 \times (0.0184)^2 + 0.22 \times (-0.0316)^2 + 0.20 \times (-0.0516)^2\right)^{1/2} = (0.001323)^{1/2} \approx 0.0364$$
then $\sigma_{XY}$ can be calculated using Formula (1) as
$$\sigma_{XY} = 0.33 \times 0.0406 \times 0.0384 + 0.25 \times 0.0206 \times 0.0184 + 0.22 \times (-0.0394) \times (-0.0316) + 0.20 \times (-0.0494) \times (-0.0516) \approx 0.00139.$$
Hence, by Formula (2), we have
$$\rho_{XY} = \frac{\sigma_{XY}}{\sigma_X \sigma_Y} = \frac{0.00139}{0.0385 \times 0.0364} \approx 0.99 < 1.$$
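The standard deviations, covariance and correlation coefficient of securities X and Y can be recomputed without rounding the intermediate values. The sketch below uses the returns and probabilities of the table for X and Y; the last return of Y is taken as 0.09, an assumption inferred from the deviation -0.0516 used in the worked calculation. The helper names `mean` and `covariance` are illustrative.

```python
import math

# Returns and common probabilities for securities X and Y.
# The last Y value (0.09) is an assumption inferred from the deviations.
x = [0.19, 0.17, 0.11, 0.10]
y = [0.18, 0.16, 0.11, 0.09]
p = [0.33, 0.25, 0.22, 0.20]


def mean(values, probs):
    """Probability-weighted mean of a discrete distribution."""
    return sum(v * q for v, q in zip(values, probs))


def covariance(a, b, probs):
    """Expected value of the product of deviations from the means."""
    mean_a, mean_b = mean(a, probs), mean(b, probs)
    return sum(q * (u - mean_a) * (v - mean_b) for u, v, q in zip(a, b, probs))


sigma_x = math.sqrt(covariance(x, x, p))  # variance is covariance with itself
sigma_y = math.sqrt(covariance(y, y, p))
sigma_xy = covariance(x, y, p)
rho_xy = sigma_xy / (sigma_x * sigma_y)
print(round(sigma_x, 4), round(sigma_y, 4), round(sigma_xy, 5), round(rho_xy, 3))
```

Carried at full precision, the covariance is about 0.00139 and the correlation coefficient about 0.995. A hand calculation that rounds the intermediate values heavily can differ from this in the second decimal place, so it is safer to round only the final result.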
E9) For the data given in E8), find the covariance $\sigma_{12}$ and correlation
coefficient $\rho_{12}$ between two securities 1 and 2.
Knowing the correlation and covariance that gives the comovement in security
returns, we are now ready to calculate portfolio risk. We start with the case of
two securities and then generalise it to n securities.
where
$\sigma_P$ = the standard deviation of the portfolio P,
$\sigma_1$ = the standard deviation of security 1,
$\sigma_2$ = the standard deviation of security 2,
$w_1$ = the portfolio weight of security 1,
$w_2$ = the portfolio weight of security 2, and
$\rho_{12}$ = the correlation coefficient between securities 1 and 2.
Portfolio risk is affected both by the correlation between assets and by the
percentage of funds invested in each asset. We shall now illustrate it through
an example.
The risk of the portfolio steadily decreases from 0.0447 to 0.0265 as the
correlation coefficient declines from +1.0 to -1.0. In the same way it can be
seen, by holding the correlation coefficient constant, that the size of the portfolio
weights assigned to each security has an effect on portfolio risk. You can
check it yourself in the above example by assigning different values to $w_1$ and
$w_2$ instead of assuming $w_1 = w_2 = 0.5$.
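The dependence of two-security portfolio risk on the correlation coefficient and on the weights can be explored with a short sketch. It assumes the standard two-security formula sigma_P = sqrt(w1^2 sigma1^2 + w2^2 sigma2^2 + 2 w1 w2 rho12 sigma1 sigma2), written with the symbols defined above. The standard deviations 0.06 and 0.03 are hypothetical and are not the data of the example in the text; the function name `two_security_std` is illustrative.

```python
import math


def two_security_std(w1, w2, sigma1, sigma2, rho12):
    """Portfolio standard deviation for two securities (assumed standard formula)."""
    variance = (w1 ** 2) * sigma1 ** 2 + (w2 ** 2) * sigma2 ** 2 \
        + 2.0 * w1 * w2 * rho12 * sigma1 * sigma2
    return math.sqrt(max(variance, 0.0))  # guard against tiny negative round-off


# Hypothetical security risks; equal weights as in the text's example.
sigma1, sigma2 = 0.06, 0.03
for rho in (1.0, 0.5, 0.0, -0.5, -1.0):
    print(rho, round(two_security_std(0.5, 0.5, sigma1, sigma2, rho), 4))

# Holding rho fixed and changing the weights also changes the risk.
print(round(two_security_std(0.2, 0.8, sigma1, sigma2, 0.0), 4))
```

With equal weights the risk in this sketch falls from 0.045 at a correlation of +1 to 0.015 at -1, the same qualitative pattern as in the example.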
The two-security case for calculating portfolio risk can be generalised to the
n-security case. Portfolio risk in the case of n securities can be calculated as
follows:
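$$\sigma_P^2 = \sum_{i=1}^{n} w_i^2 \sigma_i^2 + \sum_{i=1}^{n} \sum_{\substack{j=1 \\ j \neq i}}^{n} w_i w_j \sigma_{ij}$$
where
$\sigma_P^2$ = the variance of the return on the portfolio,
$\sigma_i^2$ = the variance of return for security i,
$\sigma_{ij}$ = the covariance between the returns for securities i and j, and
$w_i$ = the portfolio weights or percentage of investable funds invested
in security i.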
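The n-security calculation can be sketched as a sum of own-variance terms plus a double sum of covariance terms over all pairs with i different from j, using the quantities defined above. The weights and the covariance matrix below are hypothetical, chosen only to illustrate the computation; the diagonal entries of the matrix hold the variances, and the function name `portfolio_variance` is illustrative.

```python
def portfolio_variance(weights, cov):
    """Portfolio variance from weights w_i and a covariance matrix whose
    diagonal entries are the variances and off-diagonal entries the covariances."""
    n = len(weights)
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("portfolio weights must sum to 1")
    own = sum(weights[i] ** 2 * cov[i][i] for i in range(n))
    cross = sum(weights[i] * weights[j] * cov[i][j]
                for i in range(n) for j in range(n) if j != i)
    return own + cross


# Hypothetical three-security portfolio.
weights = [0.2, 0.3, 0.5]
cov = [[0.0400, 0.0060, 0.0020],
       [0.0060, 0.0225, 0.0045],
       [0.0020, 0.0045, 0.0100]]

variance = portfolio_variance(weights, cov)
print(round(variance, 6), round(variance ** 0.5, 4))
```

For n = 2 the same function reduces to the two-security case, since the covariance equals the correlation coefficient multiplied by the two standard deviations.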
You may now try the following exercises.
You have learnt to evaluate portfolios on the basis of their expected returns and
risk as measured by the standard deviation. The evaluation of the risk of a
portfolio involves the evaluation of the following three parameters:
i) Standard deviation of each security.
ii) Covariance between pairs of securities.
iii) Proportion of funds invested in each security.
Once the risk-return opportunities available to an investor are determined, it is
seen that a large number of possible portfolios exist when the percentage of the
investor's wealth to be invested in each security is varied.
Does the investor need to evaluate all these portfolios? The answer to this
question is no, because an investor needs to look at only a subset
of the available portfolios meeting the following two conditions.
1. Portfolios that offer maximum expected returns for varying levels of risk.
2. Portfolios that offer minimum risk for varying levels of expected returns.
The set of portfolios meeting these two conditions is known as the efficient set
or efficient frontier. As a next step in the modelling process, we now try to
get new information about the problem being studied. From a large number of
possible portfolios, we try to locate the efficient frontier. The efficient set can
be located from the feasible set, also known as the opportunity set. We shall
now see how this is done.
Efficient Frontier
Let us consider Fig. 2 illustrating the location of the feasible set.
If you look at Fig. 2, there is no portfolio offering less risk than that of
portfolio A. This is because if a vertical line is drawn through A, there will be
no point in the feasible set to the left of the line. Also, there is no portfolio
offering more risk than that of portfolio C. Thus, the set of portfolios offering
maximum expected return for varying levels of risk is the set of portfolios lying
on the 'northern' boundary of the feasible set between points A and C.
Let us now consider the second condition. The portfolio offering maximum
expected return is portfolio D, because if a horizontal line is drawn through D, there is no
point in the feasible set that lies above this line. Similarly, there is no portfolio
offering a lower expected return than portfolio B. Thus, the set of portfolios
offering minimum risk for varying levels of expected return is the set of
portfolios lying on the 'western' boundary of the feasible set between points B
and D. Now since both the conditions are to be fulfilled while identifying the
efficient set, only the portfolios lying on the northwest boundary between
points A and D need to be considered. Accordingly, these portfolios form the
efficient set, and the investor will have to find his or her optimal portfolio from
this set of efficient portfolios. All other portfolios are inefficient and can be
safely ignored.
Fig. 3: Indifference curves.
Each curved line in Fig. 3 indicates one indifference curve for the investor and
represents all combinations of portfolios that the investor would find equally
desirable. For example, the investor with the indifference curves as shown in
Fig. 3 would find portfolios A and B equally desirable, since they lie on the
same indifference curve $I_1$, even though they have different expected returns
and standard deviations. Portfolio B has a higher standard deviation than
portfolio A, so it is less desirable on that dimension, but on the other hand it is
preferred as it provides a higher expected return. Thus, all portfolios that lie on a
given indifference curve are equally desirable to the investor. Also,
indifference curves cannot intersect, since they represent different levels of
desirability.
As investors, we would always prefer portfolios with more return and
less risk, so our indifference curves will always show some inclination towards
the return line. That is, we choose portfolios in such a way that the
corresponding curve heads towards the northwest direction. In other words,
the farther an indifference curve is from the horizontal axis, the greater is the
utility. Let us now see how we use indifference curves in conjunction with
the efficient frontier to find which feasible portfolio is optimal.
E12) How many portfolios are on an efficient frontier? What is the Markowitz
efficient set?
The single index model divides a security's return into two components: $\alpha_j$, and
a market related part $\beta_j R_M$. Given these values, the error term is the
difference between the return on the jth security and the sum of the two components
of return, i.e.,
$$e_j = R_j - (\alpha_j + \beta_j R_M)$$
For example, let us assume the return for the market index for period t is
10%, $\alpha$ = 4% and $\beta$ = 1.5. Then the estimate for stock j is
$$R_j = 4\% + 1.5 R_M + e_j$$
or, $$R_j = 4\% + (1.5)(10\%) = 19\%$$
This shows that if the market index return is 10%, then the likely return for
your stock is 19%. Further, if the actual return on the jth stock for period t is
16%, the error term is 16% - 19% = -3%.
Note that $R_M$ and $e_j$ are random variables and the $\beta$ term, or beta, is
important as it measures the sensitivity of a stock to market movements. To
use our model, we need to know, for each stock we consider, the estimates of
beta. Previously available data may be used to estimate future beta, or one
could also give some subjective estimate of beta.
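The arithmetic of this example can be reproduced with a small sketch. The function names are illustrative only; the numbers are those quoted above (alpha = 4%, beta = 1.5, market return 10%, actual return 16%), written as decimals.

```python
def single_index_estimate(alpha, beta, market_return):
    """Return predicted by the single-index model, ignoring the error term."""
    return alpha + beta * market_return


def error_term(actual_return, alpha, beta, market_return):
    """Difference between the actual return and the model estimate."""
    return actual_return - single_index_estimate(alpha, beta, market_return)


estimate = single_index_estimate(alpha=0.04, beta=1.5, market_return=0.10)
error = error_term(actual_return=0.16, alpha=0.04, beta=1.5, market_return=0.10)

print(f"estimated return = {estimate:.2%}, error term = {error:.2%}")
```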
Model Formulation
The single-index model uses two simplifying assumptions. The first
assumption is that the random error term and
the market index are uncorrelated, meaning that the outcome of the market
index has no bearing on the outcome of the random error term. The second
assumption is that the random error terms of any two securities are
uncorrelated, meaning that the outcome of the random error term of the ith
security has no bearing on the outcome of the random error term of the jth (i ≠ j)
security. This can be expressed as $\mathrm{cov}(e_i, e_j) = 0$. In other words, the
returns of the two securities will be correlated (i.e., move together) only in their
common response to the return on the market. This means that stocks covary
together only because of their common relationship to the market index. If
either of these two assumptions is invalid, i.e., is not a good description of
reality, then the model will only be an approximation, and a model with more than one
index may be useful in such cases.
In the single-index model, all the covariance terms are accounted for by stocks
being related only through common reactions to the market index, that is,
covariance depends only on the market risk. Therefore, the covariance
between any two securities i and j can be written as
$$\sigma_{ij} = \beta_i \beta_j \sigma_M^2$$
and the variance of the return on security j as
$$\sigma_j^2 = \beta_j^2 \sigma_M^2 + \sigma_{ej}^2 \quad (16)$$
i.e., (market risk) + (company specific risk). For a portfolio, correspondingly,
[Portfolio variance] = [Portfolio market risk] + [Portfolio residual variance]
As in the case of Markowitz analysis, in order to determine the composition of
the tangency portfolio, the investor needs to estimate all the expected returns,
variances and covariances. In the case of the single-index model, this can be done
by estimating $\alpha_j$, $\beta_j$ and $\sigma_{ej}$ for each of the n risky securities. Also needed
are the values of $R_M$ and its variance $\sigma_M^2$. One way to estimate the
parameters is with time series regression. With these estimates, Eqns. (13), (16)
and (17) can be used to calculate the returns, variances and covariances for the
securities. Using these values, the curved efficient set of portfolios can be
derived, from which the tangency portfolio can be determined.
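The way the single-index estimates feed into variances and covariances can be sketched as follows. The sketch assumes the standard single-index results (covariance $\beta_i \beta_j \sigma_M^2$ between two different securities, and own variance $\beta_j^2 \sigma_M^2 + \sigma_{ej}^2$); all numerical inputs are hypothetical.

```python
def single_index_covariance(betas, resid_vars, market_var):
    """Covariance matrix implied by the single-index model."""
    n = len(betas)
    cov = [[betas[i] * betas[j] * market_var for j in range(n)]
           for i in range(n)]
    for i in range(n):
        cov[i][i] += resid_vars[i]  # company specific risk on the diagonal
    return cov


def single_index_portfolio_variance(weights, betas, resid_vars, market_var):
    """Portfolio market risk plus portfolio residual variance."""
    beta_p = sum(w * b for w, b in zip(weights, betas))
    resid_p = sum(w ** 2 * v for w, v in zip(weights, resid_vars))
    return beta_p ** 2 * market_var + resid_p


# Hypothetical estimates for three securities (assumed values).
BETAS = [1.5, 0.8, 1.1]
RESID_VARS = [0.0004, 0.0009, 0.0002]
MARKET_VAR = 0.0016
WEIGHTS = [0.5, 0.3, 0.2]

cov = single_index_covariance(BETAS, RESID_VARS, MARKET_VAR)
# Full double sum over the implied covariance matrix.
full = sum(WEIGHTS[i] * WEIGHTS[j] * cov[i][j]
           for i in range(3) for j in range(3))
short = single_index_portfolio_variance(WEIGHTS, BETAS, RESID_VARS, MARKET_VAR)
print(f"full double sum = {full:.6f}, decomposition = {short:.6f}")
```

Both routes give the same portfolio variance, but the decomposition needs only n betas, n residual variances and one market variance as inputs.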
Before we end this unit you may try the following exercises.
E15) How is the covariance between any two securities calculated with the
single-index model?
E16) How would you compare the Markowitz model with the Sharpe model?
We now end this unit by giving a summary of what we have covered in it.
2.4 SUMMARY
In this unit, we have covered the following points:
1. Once the essential characteristics of the real world problem are identified
its conversion into a mathematical description in terms of equations can
be done in different ways according to the objectives of the study as
illustrated in the case of motion of a simple pendulum.
2. Portfolio selection problem: How can an investor choose an optimal
portfolio from a feasible set of risky securities, i.e., choose a portfolio that
provides him/her the maximum level of satisfaction in terms of return and
risk?
3. Markowitz portfolio theory provides the way to select an optimal
portfolio based on using the full information about securities.
4. An efficient portfolio has the highest expected return for a given level of
risk or the lowest level of risk for a given level of expected return.
5. The Markowitz analysis determines the efficient set of portfolios, all of
which are equally desirable. The efficient set in an expected
return-standard deviation space is a curve which is upward concave.
6. The efficient frontier gives the investment possibilities that exist from a
given set of securities. Indifference curves express investor preferences.
7. The optimal portfolio for a risk-averse investor occurs at the point of
tangency between the investor's highest (northwest) indifference curve
and the efficient set of portfolios.
8. The Sharpe model relates returns on each security to the returns on a
common index.
9. The Sharpe model provides an alternative expression for portfolio
variance, which is easier to calculate in comparison to the case of the
Markowitz analysis.
E2) State the problem giving reasons for why there is no other treatment of
the problem possible.
ES) P, = (0, -1
3 3
-1,
2
P = (0,
1
-,3
4 4
-1, P = ,, ,
4 2 4
) Similarly, find the others.
Structure
3.1 Introduction
Objectives
3.2 Data Visualization
3.3 Simple Linear Regression Models
3.4 Multiple Linear Regression Models
3.5 Summary
3.6 Solutions/Answers
Most scientific disciplines are concerned with measuring items and collecting
data. One reason for this is the increasingly quantitative approach employed in
all the sciences, business and many other activities which directly affect our
lives. Volumes of data on various items like population, taxes, wealth, exports,
We shall start the unit by giving you in Sec. 3.2, the general idea of data
visualization i.e., visual representation of data. We shall discuss linear
regression models with one predictor in Sec. 3.3 and with two or more
predictors in Sec. 3.4. Second order polynomial models in one variable are
also introduced in this section. You are also required to do computer practical
exercises on this unit which are given at the end of the unit.
Objectives
After studying this unit, you should be able to
use quantitative and graphical techniques for data visualization;
distinguish between simple and multiple linear regression models;
write down simple or multiple linear regression models appropriate for a
given set of data;
use the method of least squares for estimating the parameters of a linear
regression model;
fit a line, curve, plane or surface, as appropriate, to a given set of data.
The origins of this field lie in the early days of computer graphics in the 1950s,
when the first graphs and figures were generated by computers. With the rapid
increase of computing power, larger and more complex numerical models were
developed, resulting in the generation of huge numerical data sets. Also, large
data sets were generated by data acquisition devices such as medical scanners
and microscopes, and data was collected in large databases containing text,
numerical information and multimedia information. Advanced computer
graphics techniques were needed to process and visualize these massive data
sets. Once the data are converted to a visual form, the trends and patterns are
often immediately apparent. Fig. 1 shows an example of a large data set that has
been converted to a colour-coded display. It shows the map of India with the
population classified and presented using different colours.
Data Analysis and
Fitting Models to Data
Fig. 1: Population map of India.
A new research area called Information Visualization was launched in the early
1990s to support analysis of abstract and heterogeneous data sets in many
application areas. Therefore, the phrase "Data Visualization" is gaining
acceptance to include both the scientific and information visualization fields.
The two main parts of data visualization are statistical graphics
and thematic maps.
Statistical Graphics
Statistical graphics, also known as graphical techniques, are information
graphics in the field of statistics used to visualize quantitative data. Statistics
and data analysis procedures can broadly be split into two parts: quantitative
techniques and graphical techniques. Quantitative techniques are the set of
statistical procedures that yield numeric or tabular output. Examples of
quantitative techniques include hypothesis testing, analysis of variance, point
estimation, confidence intervals, and least squares regression. These and
similar techniques are all valuable and are mainstream in terms of classical
analysis.
[Figure: bar chart with 'Marks of students' on the horizontal axis]
Graphical procedures are not just tools used in the context of exploratory data
analysis (EDA); such graphical
tools are the shortest path to gaining insight into a data set in terms of testing
assumptions, model selection and statistical model validation, estimator
selection, relationship identification, factor effect determination, and outlier
detection. In addition, good statistical graphics can provide a convincing
means of communicating to others the underlying message that is present in the
data. To sum up, we can say that graphical statistical methods have the
following four objectives:
Thematic Maps
A thematic map displays the spatial pattern of a theme or series of attributes. In
contrast to reference maps, which show many geographic features (forests,
roads, political boundaries), thematic maps emphasise the spatial variation of one
or a small number of geographic distributions. These distributions may be
physical phenomena such as climate or human characteristics such as
population density and health issues. These types of maps are sometimes
referred to as graphic essays that portray spatial variations and
interrelationships of geographical distributions. Location, of course, is also
important to provide a reference base of where selected phenomena are
occurring. While general reference maps show where something is in space,
thematic maps tell a story about that place. Fig. 3 gives a pie chart showing the
proportion of people from different states of India living in a certain town of
West Bengal.
El) Explain at least four statistical graphical techniques you are familiar with.
Illustrate the advantages of each of them through examples.
E2) Give at least two examples of the thematic maps with which you are
familiar. Also explain the purpose of using each of them.
Once the data related to any study is collected, the next step is to make use of
this data to draw meaningful conclusions, i.e., analyse the data about the
subject under study. As we have already mentioned, regression models, which
are statistical models, are useful in almost all areas (biological, physical and
social sciences, business, engineering, etc.) in both the planning stages of
research and the analysis of the resulting data. Care should be devoted to accurate
data collection because the conclusions from the analysis depend on the data.
Good data collection will result in better analysis and a more applicable model.
If the data used in a regression model are not representative of the system
studied, then conclusions drawn from the model are likely to be in error.
Fig. 4: Scatter plot of the time and distance data.
In a perfect world, where speed and distance could be measured without error,
all observations would lie exactly on this straight line. However, in reality it is
impossible to keep the speed exactly constant and to measure the precise
distance. Therefore, in a scatter plot of 'real' data, the points would deviate
from the theoretical straight line. A realistic scatter plot might look like Fig.
4(b). For this example, we need a model that will describe the linear
relationship between the two variables and, at the same time, take the variation
away from the line into account.
Formulation
If we let y represent the distance covered and x represent time, then the
equation of the straight line in Fig. 4(a) relating these two variables is
$$y = b_0 + b_1 x \quad (1)$$
where $b_0$ is the intercept and $b_1$ is the slope. Now the data points in Fig. 4(b)
do not fall exactly on a straight line, so Eqn. (1) should be modified to account
for this. Let the difference between the observed value of y and the straight
line $(b_0 + b_1 x)$ be an error e. That is, e is a device that accounts for the failure
of the model to fit the data exactly. The model then becomes
$$y = b_0 + b_1 x + e \quad (2)$$
Eqn. (3) represents a multiple linear regression model, which we shall discuss
in the next section. A model is called linear because it is linear in the
parameters $b_0, b_1, \ldots, b_k$, and not because y is a linear function of the x's. You
will see that many models in which y is related to the x's in a nonlinear
fashion are treated as linear regression models as long as the equation is linear
in the b's. Once model (2) is formulated, we would like to use it to obtain
information on the y's for specific x values. For that we have to estimate the
unknown parameters in the regression model by making the error minimum,
and this process is called fitting the model to the data. There are several
parameter estimation techniques available. Here we shall be using the method
of least squares for parameter estimation.
The parameters $b_0$ and $b_1$ are unknown and must be estimated using sample
data. Suppose we have n pairs of data, say $(y_1, x_1), (y_2, x_2), \ldots, (y_n, x_n)$.
Then the estimation of $b_0$ and $b_1$ is done as follows:
Estimation of $b_0$ and $b_1$
Let us use the method of least squares to estimate $b_0$ and $b_1$. That is, we will
estimate $b_0$ and $b_1$ so that the sum of the squares of the differences between the
observations $y_i$ and the straight line is a minimum. From Eqn. (2), we may
write
$$y_i = b_0 + b_1 x_i + e_i, \quad i = 1, 2, \ldots, n \quad (4)$$
Eqn. (2) may be viewed as a population regression model while Eqn. (4) is a
sample regression model, written in terms of the n pairs of data
$(y_i, x_i)$, i = 1, 2, ..., n. Thus, the least squares criterion is that the error
should be minimum.
The least squares estimators or the 'best' estimates of $b_0$ and $b_1$, say $\hat{b}_0$ and
$\hat{b}_1$, are the values that minimize the error and, therefore, must satisfy
Eqns. (8) and (9) are called the least squares normal equations. The solution
to the normal equations is obtained as
Using simple algebra you can write the denominator and numerator of
Eqn. (11) in a more compact notation as
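For reference, the standard least squares results for the straight-line model can be sketched as below. This is the usual textbook derivation written in this unit's symbols; the unit's own equation numbers are deliberately not attached, since the original displays are not reproduced here and their exact form is an assumption.

```latex
\begin{align*}
% Sum of squared errors for the sample model y_i = b_0 + b_1 x_i + e_i
S(b_0, b_1) &= \sum_{i=1}^{n} \left( y_i - b_0 - b_1 x_i \right)^2 \\
% Setting both partial derivatives to zero gives the normal equations
n\,\hat{b}_0 + \hat{b}_1 \sum_{i=1}^{n} x_i &= \sum_{i=1}^{n} y_i \\
\hat{b}_0 \sum_{i=1}^{n} x_i + \hat{b}_1 \sum_{i=1}^{n} x_i^2 &= \sum_{i=1}^{n} x_i y_i \\
% Solution of the normal equations
\hat{b}_0 &= \bar{y} - \hat{b}_1 \bar{x}, \qquad
\hat{b}_1 = \frac{S_{xy}}{S_{xx}} \\
% Compact notation for the denominator and numerator of the slope
S_{xx} &= \sum_{i=1}^{n} x_i^2 - \frac{\left( \sum_{i=1}^{n} x_i \right)^2}{n}
        = \sum_{i=1}^{n} (x_i - \bar{x})^2 \\
S_{xy} &= \sum_{i=1}^{n} x_i y_i - \frac{\left( \sum_{i=1}^{n} x_i \right)\left( \sum_{i=1}^{n} y_i \right)}{n}
        = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})
\end{align*}
```

Multiplying the numerator and denominator of the slope by n gives the form in terms of raw sums that is used in Example 2.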
Estimation of $\sigma^2$
Thus, $SS_E = S_{yy} - \hat{b}_1 S_{xy}$.
The quantity $MS_E$ is called the error mean square or the residual mean
square. The square root of $\hat{\sigma}^2$ is called the standard error of regression, and
has the same units as the response variable y. Because $\hat{\sigma}^2$ depends on the
residual sum of squares, any violation of the assumptions on the model errors
or any misspecification of the model form may damage the usefulness of $\hat{\sigma}^2$ as
an estimate of $\sigma^2$. We now illustrate the estimation of the model parameters
through examples.
Example 2 (Demand for homes): Find a linear demand equation that best fits
the following data, and use it to predict annual sales of homes priced at
Rs. 14,00,000.
Solution: Calculations are shown in Table 1.
Table 1
x      y      xy       x²
28     20     560      784
Sums:  Σx = 154   Σy = 528   Σxy = 10,728   Σx² = 3500
Substituting these values in the formulas given by Eqns. (10) and (11), we get
$$\text{slope} = \hat{b}_1 = \frac{n(\sum xy) - (\sum x)(\sum y)}{n(\sum x^2) - (\sum x)^2} = \frac{7(10{,}728) - (154)(528)}{7(3500) - 154^2} \approx -7.929$$
$$\text{intercept} = \hat{b}_0 = \frac{\sum y - \hat{b}_1 (\sum x)}{n} = \frac{528 - (-7.929)(154)}{7} \approx 249.9.$$
Thus, the fitted line is y = 249.9 - 7.929x.
We can now use this equation to predict the annual sales of homes priced at
Rs. 14,00,000. Remembering that x is the price in lakhs of rupees, we set
x = 14 and solve for y, getting y = 139. Thus, our model predicts that
approximately 139 homes priced at Rs. 14,00,000 will be sold.
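The arithmetic of Example 2 can be checked directly from the column sums of Table 1. The sketch below uses only those sums (n = 7, Σx = 154, Σy = 528, Σxy = 10,728, Σx² = 3500); the function name is illustrative.

```python
def line_from_sums(n, sum_x, sum_y, sum_xy, sum_x2):
    """Least squares slope and intercept computed from the raw sums."""
    slope = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    intercept = (sum_y - slope * sum_x) / n
    return slope, intercept


slope, intercept = line_from_sums(n=7, sum_x=154, sum_y=528,
                                  sum_xy=10728, sum_x2=3500)

# Predicted annual sales for homes priced at 14 lakh rupees (x = 14).
predicted_sales = intercept + slope * 14

print(f"slope = {slope:.3f}, intercept = {intercept:.1f}, "
      f"prediction at x = 14: {predicted_sales:.0f}")
```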
***
Example 3: Consider the data shown in Table 2
Table 2
Use a best fit line to estimate the value of y for x = 6 and 8 . Also obtain the
estimate of the error variance of the best fit.
Solution: Calculations of the least squares line are given in Table 3.
Table 3
Therefore, $\hat{b}_1 = \frac{89}{40} = 2.225$
and $\hat{b}_0 = 6 - 2.225(4) = 6 - 8.9 = -2.9$.
Coefficient of Determination
that is, 99.34 percent of the variability in the data is accounted for by the
regression model.
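The quantities used in this section can be computed together in one short routine. The sketch below assumes the standard definitions $MS_E = SS_E/(n-2)$ and $R^2 = 1 - SS_E/S_{yy}$, which are not spelled out in the text above, and it runs on hypothetical data rather than on the data of Table 2.

```python
def fit_line(xs, ys):
    """Simple linear regression by least squares, with basic diagnostics."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    b1 = s_xy / s_xx              # slope
    b0 = y_bar - b1 * x_bar       # intercept
    ss_e = s_yy - b1 * s_xy       # residual sum of squares
    return {
        "b0": b0,
        "b1": b1,
        "ss_e": ss_e,
        "ms_e": ss_e / (n - 2),   # assumed: estimate of the error variance
        "r2": 1 - ss_e / s_yy,    # assumed: coefficient of determination
    }


# Hypothetical data (not the data of the examples in this unit).
XS = [1, 2, 3, 4, 5]
YS = [2.1, 3.9, 6.2, 7.8, 10.1]

fit = fit_line(XS, YS)
print({name: round(value, 4) for name, value in fit.items()})
```

A value of `r2` close to 1 means that most of the variability in y is accounted for by the fitted line, which is how the 99.34 percent figure above is to be read.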
Limitations
1. Regression models are intended as interpolation equations over the
range of the regressor variable(s) used to fit the model. They may not be
valid for extrapolation outside of this range. For example, see Fig. 5.
Fig. 5: The risk of extrapolation in regression.
Suppose that data on y and x were collected in the interval $x_1 \le x \le x_2$.
Over this interval, the linear regression equation shown in Fig. 5 is a
good approximation of the true relationship. However, if this equation is
used to predict values of y for values of the regressor variable in the
region $x_2 \le x \le x_3$, then the model is useless over this range of x because of
equation error.
2. The position of the x-values plays an important role in the least squares
fit. While determining the height of the line, all points have equal
weight, whereas the slope of the line is strongly influenced by the
remote values of x. For example, consider the data in Fig. 6. The slope
in the least squares fit depends heavily on either or both of the points A
and B. The remaining data would give a very different estimate of the
slope if A and B were deleted.
Fig. 6: Two extreme observations.
3. Even when regression techniques indicate a strong relationship between two
variables, this does not imply that the variables are related in any causal
sense. Our expectations of discovering cause and effect relationships
from regression should be modest. For example, if you look at the data
given in E6), you will see that the linear trend between x and y does not
establish cause and effect between homework and test results.
Use a best fit line to estimate the additional amount of oil that can be
economically recovered.
E6) Students in a statistics class claimed that doing the homework had not
helped prepare them for the midterm exam. The exam score y and
homework score x for the 18 students in the class were as follows:
Fit a simple linear regression model to the data and interpret the result.
Calculate R for the data.
Fitting Exponential Curves
So far, we have seen how to fit a straight line to a set of bivariate data, i.e., data
giving the relationship between two variables. A straight line is the simplest model
for a set of bivariate data, and is not appropriate if the scatter plot of the data
shows curvature. One class of models that accounts for curvature is the class
of exponential curves. Some examples of exponential curves are
$$y = b_0 x^{b_1} \quad (24)$$
$$y = b_0 e^{b_1 x} \quad (25)$$
$$e^y = b_0 x^{b_1} \quad (26)$$
Models (24)-(26) have two important features. They are monotone (either
increasing or decreasing), and they are easy to fit. Monotonicity is important
because often, in practice, a monotone relationship exists between two
variables. For example, if x is the level of traffic in a communications
network and y is the number of packets lost, it only makes sense for y to
increase as x increases. Models (24)-(26) are easy to fit using the technique
for fitting lines. This is because, when we transform them to a different scale,
they are actually lines. Which, if any, of these models is appropriate for a
particular set of data is best determined by drawing scatter plots of the
transformed data and seeing which transformation makes the data look the most
linear. Let us see how these transformations are done.
The Model $y = b_0 x^{b_1}$
Taking logarithms on both sides of Eqn. (24) and then adding an error term, we
obtain
$$\ln(y) = \ln(b_0) + b_1 \ln(x) + e$$
or, $$y^* = b_0^* + b_1 x^* + e \quad (27)$$
where $y^* = \ln(y)$, $b_0^* = \ln(b_0)$ and $x^* = \ln(x)$.
You may notice that the model (27) is a line in the variables y* and x*. In
order to estimate the parameters, we can simply calculate ln(y) and ln(x) for
all the data points, and then use least squares to fit a straight line. We now
illustrate the method through an example.
Table 5
Use a best fit line to estimate the value of y when x = 3.36. Also obtain the
residual for the fitted line.
Solution: A graph of these data in Fig. 7 shows that a line is not the
appropriate model and suggests that model (24) might be reasonable.
Taking logarithms of all the data points, we obtain the values given in
Table 6.
Table 6
Fig. 8: A scatter plot of the log transformed data in Example 4.
$$\sum (x^* - \bar{x}^*)^2 = 4.138$$
$$\sum (x^* - \bar{x}^*)(y^* - \bar{y}^*) = 9.315.$$
This gives,
Residuals for this fitted line could be computed using Eqn. (14). For example,
Similarly, the transformations for the models (25) and (26) can also be
obtained.
For the model $y = b_0 e^{b_1 x}$, taking logarithms on both sides of Eqn. (25) and
adding an error term, we obtain
$$\ln(y) = \ln(b_0) + b_1 x + e$$
or,
$$y^* = b_0^* + b_1 x + e$$
where $y^* = \ln(y)$ and $b_0^* = \ln(b_0)$.
Thus, we can estimate the parameters of model (25) by calculating ln(y) for
each data point and then using least squares with ln(y) as the dependent
variable and x as the independent variable.
For the model $e^y = b_0 x^{b_1}$, once again taking logarithms on both sides of
Eqn. (26) and adding an error term, we get
$$y = \ln(b_0) + b_1 \ln(x) + e$$
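The transformation idea for the model $y = b_0 e^{b_1 x}$ can be sketched in a few lines: take ln(y), fit a straight line against x, and transform the intercept back. The data below are hypothetical and noise-free, generated from $y = 2 e^{0.5x}$, so that the recovered parameters can be compared with known values.

```python
import math


def fit_exponential(xs, ys):
    """Fit y = b0 * exp(b1 * x) by least squares on the pairs (x, ln y)."""
    logs = [math.log(y) for y in ys]      # requires every y to be positive
    n = len(xs)
    x_bar = sum(xs) / n
    l_bar = sum(logs) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xl = sum((x - x_bar) * (l - l_bar) for x, l in zip(xs, logs))
    b1 = s_xl / s_xx                      # slope of the transformed line
    b0 = math.exp(l_bar - b1 * x_bar)     # back-transformed intercept
    return b0, b1


# Hypothetical, noise-free data generated from y = 2 * exp(0.5 * x).
XS = [0.0, 1.0, 2.0, 3.0, 4.0]
YS = [2.0 * math.exp(0.5 * x) for x in XS]

b0_hat, b1_hat = fit_exponential(XS, YS)
print(f"b0 = {b0_hat:.4f}, b1 = {b1_hat:.4f}")
```

The same routine handles the other two models after changing which variable is transformed: ln(x) and ln(y) for model (24), and ln(x) with y left as it is for model (26).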
To have a better understanding of models (25) and (26), you can solve the
following exercises.
E7) Find a linear regression equation that best fits the data given in Table 7.
Table 7
E8) Find a linear regression equation that best fits the data given in Table 8.
Table 8
Just as we can fit a line or a curve to two-dimensional data, we can also fit a
plane or curved surface to three-dimensional data and a hyper-plane or
hyper-surface to four- or higher-dimensional data.
We shall now discuss models that are suitable to fit three and higher
dimensional data.
The pro
,
, ,f flitmg models for le
with an exam
of the sand
surfaces i wh?
t ~ o ninvolving M
fines. CC' US s f d
Formulation
Suppose we want to develop a model for estimating the effective life of a
cutting tool based on two variables, namely the cutting speed and the depth of
cut. A regression model that might describe this relationship is
$$y = b_0 + b_1 x_1 + b_2 x_2 + e \quad (30)$$
which is the equation for a plane in three dimensions, where y denotes the
effective tool life, $x_1$ denotes the cutting speed, and $x_2$ denotes the depth of
cut. $b_0$, $b_1$, $b_2$ are the unknown parameters of the model and e is the random
error. Eqn. (30) is a multiple linear regression model with two predictor
variables. The term linear is used because Eqn. (30) is a linear function of the
unknown parameters $b_0$, $b_1$ and $b_2$.
Suppose that n > k observations are available, and let $y_i$ denote the ith
observed response and $x_{ij}$ denote the ith observation or level of regressor $x_j$.
We assume that the error term e has mean zero and variance $\sigma^2$, and that
the errors are uncorrelated.
Let us use the method of least squares for estimating the regression coefficients
in model (31).
and
Eqns. (38) are the least squares normal equations. To solve the normal
equations, multiply both sides of Eqns. (38) by the inverse of $x'x$. Thus, the
least squares estimator of b is
$$\hat{b} = (x'x)^{-1} x'y \quad (39)$$
provided $(x'x)^{-1}$ exists. The $(x'x)^{-1}$ matrix will always exist if the
regressors are linearly independent. Using the value of $\hat{b}$ given in Eqn. (39),
the fitted value of y can be written as
$$\hat{y} = x\hat{b} = x (x'x)^{-1} x'y = Hy.$$
The n × n matrix $H = x (x'x)^{-1} x'$ is called the 'hat' matrix. The residual
error vector can be written as
$$e = y - \hat{y} \quad (41)$$
Example 5: Find a linear regression equation that best fit the data given in
Table 9.
Table 9
Estimation of $\sigma^2$
$$SS_E = e'e. \quad (43)$$
Substituting $e = y - x\hat{b}$ in Eqn. (43), we get
$$SS_E = (y - x\hat{b})'(y - x\hat{b}) = y'y - 2\hat{b}'x'y + \hat{b}'x'x\hat{b}. \quad (44)$$
Since $x'x\hat{b} = x'y$, Eqn. (44) reduces to
$$SS_E = y'y - \hat{b}'x'y.$$
Example 6: Estimate the error variance $\sigma^2$ for the multiple regression model
fit to the data in Example 5.
Solution: For the data in Example 5, we have
$$SS_E = y'y - \hat{b}'x'y$$
Just as in the case of simple linear regression, model adequacy can be
evaluated in the case of multiple regression models also.
Limitations
All the limitations mentioned in Sec. 3.3 apply to multiple linear regression
models also.
Vacuum setting (x):  18   18   20   20   22   22   24   24   26   26
Particle size (y):   4.0  4.2  5.6  6.1  6.5  6.8  5.4  5.6  3.3  3.6
Fig. 9: A scatter plot of the data in Example 7.
to the given data. Using $x_1 = x$ and $x_2 = x^2$, we can transform the above
model to
$$y = b_0 + b_1 x_1 + b_2 x_2 + e.$$
We now use the method used in Example 5 to estimate the regression
coefficients.
$x'y = (51.10, \; 1117.60, \; 24774.40)'$
Thus, to maximize the particle size in the product the best setting for the
vacuum is at 21.82.
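The calculation in Example 7 can be retraced with a small sketch that forms the normal equations $(x'x)\hat{b} = x'y$ for the transformed model and solves them by Gaussian elimination. The data are the ten observations listed for Example 7; the helper names `solve` and `least_squares` are illustrative, and the elimination routine is a generic stand-in for inverting $x'x$ by hand.

```python
def solve(a, rhs):
    """Solve a small linear system by Gaussian elimination with pivoting."""
    n = len(rhs)
    m = [row[:] + [r] for row, r in zip(a, rhs)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    sol = [0.0] * n
    for r in range(n - 1, -1, -1):                 # back substitution
        tail = sum(m[r][c] * sol[c] for c in range(r + 1, n))
        sol[r] = (m[r][n] - tail) / m[r][r]
    return sol


def least_squares(design, y):
    """Form x'x and x'y, then solve the normal equations (x'x) b = x'y."""
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)]
           for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty), xty


# Data of Example 7: vacuum setting x and particle size y.
X = [18, 18, 20, 20, 22, 22, 24, 24, 26, 26]
Y = [4.0, 4.2, 5.6, 6.1, 6.5, 6.8, 5.4, 5.6, 3.3, 3.6]

# Transformed regressors x1 = x and x2 = x^2, plus a column of ones.
design = [[1.0, x, x * x] for x in X]
(b0, b1, b2), xty = least_squares(design, Y)

# The fitted parabola opens downwards (b2 < 0), so its vertex is a maximum.
best_setting = -b1 / (2 * b2)
print(f"b0 = {b0:.4f}, b1 = {b1:.4f}, b2 = {b2:.4f}, "
      f"best setting = {best_setting:.2f}")
```

With unrounded coefficients the vertex comes out at about 21.76. The value 21.82 quoted above is what results when the coefficients are first rounded to two decimals (7.42 and -0.17), so the difference is a rounding effect rather than a different model.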
***
You may now try the following exercises.
E9) The yearly fluctuations in the groundwater table are believed to be
dependent on the annual rainfall and the volume of water pumped out
from the basin. The data collected on these variables for a period of 10
years is given in Table 10.
Table 10
E12) The sale price of a holiday cottage depends on the age and livable area of
the cottage. Find a linear regression model that best fits the data given in
Table 13. Also find the residual and the residual mean square for the
data.
Table 13
We now end this unit by giving a summary of what we have covered in it.
3.5 SUMMARY
In this unit, we have covered the following points.
4. Regression models are linear if they are linear in the parameters involved
in the model.
El) You can discuss scatter plot, histogram, bar diagram, box plot, block plot,
etc. You can illustrate them by considering a type of data for which each
of them is most suited. For example, in case of continuous data
histogram is preferred whereas, discrete data is commonly represented by
a bar diagram.
b) Do as in a) above.
E7) Draw a scatter plot of the data and see that a line is not an appropriate
model. Transform the data by taking log of y's and obtain
Check that the scatter plot of the above data looks linear.
The best fit linear regression line is
$$\hat{y}^* = 0.73 + 4.209x$$
or, $$\hat{y} = e^{0.73 + 4.209x} = 2.075\, e^{4.209x}$$
The scatter plot of the above data indicates that linear model is
appropriate for the data. The best fit regression line is
$$\hat{y} = -4.1 + 4.62\, x^* = -4.1 + 4.62 \ln x.$$
E11) Draw the scatter plot of the given data and check that the model to be
fitted is
$$y = b_0 + b_1 x + b_2 x^2$$
Put $x_1 = x$ and $x_2 = x^2$ in the above model and proceed exactly as in
Example 7 to obtain $\hat{y} = -6.6959 + 11.7703x - 0.635x^2$.
PRACTICAL EXERCISES
Sessions 1 and 2
1. An electric utility is interested in developing a model relating peak hour
demand (y) to total energy usage during the month (x) . Data for 53
residential customers for the month of August, 2005 are shown in
Table 1.
Table 1
3. When gasoline is pumped into the tank of a car, vapours are vented into
the atmosphere. An experiment was conducted to determine whether y,
the amount of vapour, can be predicted using the following four variables
based on initial conditions of the tank and the dispensed gasoline:
$x_1$ = tank temperature
$x_2$ = gasoline temperature
$x_3$ = vapour pressure in tank
$x_4$ = vapour pressure of gasoline
The data are given in Table 3.
Table 3
a) Fit a linear regression model to the data using the least squares
estimates.