IEM Outline Lecture Notes Autumn 2016
INTRODUCTION TO UNIT
(i) Collection and description of information, or data - descriptive statistics. We will normally be dealing with a subset of a larger collection or set of data. The subset is called a sample, the larger set a population.
(ii) Using sample data to make inferences about a population - statistical inference.
1.2 Why Study Statistics?
(i) Probability Distributions:
Probability theory can be considered part of descriptive statistics. Here we will be concerned with making probability statements about a given population.
(ii) Sampling Theory and Sampling Distributions (the basis of inductive statistics):
Here we will be concerned with making probability statements about characteristics of samples, given assumptions about the population from which the sample was drawn.
(iii) Point and Interval Estimation:
Point Estimation - Here we will be concerned with producing a particular estimate (a number), based on sample data, of a characteristic of a population.
Interval Estimation - Here we will not give a single estimate of a population characteristic, but rather a range in which we are confident (to some degree) that the true value of the population characteristic lies.
(iv) Hypothesis Testing:
Under this heading we will look at ways of testing hypotheses about characteristics of populations, based on sample data.
(v) Regression Analysis:
In this case we will be concerned with estimating linear relationships between different variables, i.e. linear equations. We will go on to examine statistical tests associated with estimated regression equations.
DESCRIPTIVE STATISTICS
(ii) Population/Sample:
A statistical population is the set of measurements or
observations of a characteristic of interest for all
elementary units in a frame.
A population may comprise a finite or infinite number of
elements (observations), depending on the context.
A statistical sample is a subset of a population.
(iii) Parameters/Statistics:
For our purposes, the numerical characteristics which describe a population are called parameters of the population.
The numerical values calculated from sample data are
called sample statistics. These sample statistics can be
thought of as describing or characterizing the sample.
(iv) Qualitative and Quantitative Variables:
Populations may be quantitative or qualitative. Data from
quantitative populations is called quantitative or interval
data.
Data from qualitative populations is called
qualitative, nominal or categorical data.
Data from a quantitative population can be expressed
numerically in a meaningful way. The variable (or
characteristic) associated with a quantitative population is
called a quantitative variable.
For the previous example we have:

Class (i)       Frequency   Cum. Rel. Freq.
0 to under 2        30           0.30
2 to under 4        55           0.85
4 to under 6        13           0.98
6+ children          2           1.00

A further example, with 100 observations grouped into four classes (class boundaries inferred from the histograms below):

Class                Frequ.   Rel. Freq.   Cum. Freq.
0.5 to under 2.5       10        0.1           10
2.5 to under 4.5       30        0.3           40
4.5 to under 6.5       50        0.5           90
6.5 to under 8.5       10        0.1          100

[Frequency histogram: bars of heights 10, 30, 50 and 10 over the four classes, with boundaries at 0.5, 2.5, 4.5, 6.5 and 8.5 on the horizontal axis (Variable Value).]

[Cumulative frequency histogram: bar heights 10, 40, 90 and 100 over the same classes.]
A distribution is positively skewed (skewed to the right) if it has the following shape.

[Figure: A Distribution that is Skewed to the Right - relative frequency plotted against Variable Value, with a long tail to the right.]
2.5 Bivariate Frequency Distributions
Often it is of interest to classify observations of elementary
units according to two variables (characteristics). This
allows one to gauge the relationship between the two
variables.
Example:
Consider the final results of 50 students in a particular subject. Each student's final grade and gender are recorded, allowing the derivation of the following bivariate frequency distribution.
                          Grade
Gender         HD   Dist.   Credit   Pass   Fail   Row Total
Male            5     4       10       6      2        27
Female          2     3       11       2      5        23
Column Total    7     7       21       8      7        50
Marginal frequencies, represented by the row and column
totals, each refer to one variable only.
We can express the information in a bivariate frequency
distribution as a relative frequency distribution by
dividing each entry in the distribution by the total number
of observations.
Example:
For the previous example, the bivariate relative frequency
distribution is given by
                           Grade
Gender         HD     Dist.   Credit   Pass   Fail   Row Total
Male           0.10   0.08     0.20    0.12   0.04     0.54
Female         0.04   0.06     0.22    0.04   0.10     0.46
Col. Total     0.14   0.14     0.42    0.16   0.14     1.00
The row and column totals in the above table are called
the marginal relative frequencies.
3. SUMMATION NOTATION AND DESCRIPTIVE MEASURES

3.1 Summation Notation
Suppose we have n numbers labelled x_i, i = 1,..., n. Their sum is written

Σ_{i=1}^{n} x_i = x_1 + x_2 + ... + x_n

Useful rules of summation (where c is a constant):

(i) Σ_{i=1}^{n} c a_i = c Σ_{i=1}^{n} a_i

(ii) Σ_{i=1}^{n} (a_i + c) = Σ_{i=1}^{n} a_i + nc

(iii) Σ_{i=1}^{n} (a_i + c)^2 = Σ_{i=1}^{n} a_i^2 + 2c Σ_{i=1}^{n} a_i + nc^2

(iv) Σ_{i=1}^{n} (a_i - c)^2 = Σ_{i=1}^{n} a_i^2 - 2c Σ_{i=1}^{n} a_i + nc^2

Example:
Consider four labelled numbers with a_1 = 1, a_2 = 3, a_3 = 2 (and a_4 as given). Evaluate Σ_{i=1}^{4} (a_i - 1)^2.
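These rules can be checked numerically; a minimal Python sketch (the particular values of a and c are hypothetical choices for illustration):

```python
# Numerical check of summation rules (i)-(iv) above.
a = [1, 3, 2, 5]   # hypothetical a_i values
c = 2.0            # hypothetical constant
n = len(a)

lhs_i = sum(c * ai for ai in a)
rhs_i = c * sum(a)

lhs_ii = sum(ai + c for ai in a)
rhs_ii = sum(a) + n * c

lhs_iii = sum((ai + c) ** 2 for ai in a)
rhs_iii = sum(ai ** 2 for ai in a) + 2 * c * sum(a) + n * c ** 2

lhs_iv = sum((ai - c) ** 2 for ai in a)
rhs_iv = sum(ai ** 2 for ai in a) - 2 * c * sum(a) + n * c ** 2
```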
3.2 Measures of Central Tendency
For each measure considered there are population and
sample versions. We will suppose here there are N values
in the population and n values in a sample.
Note that at this stage we are only concerned with
quantitative variables, and we assume the population
contains a finite number of values.
Definition (Mean of a Finite Quantitative Population)
If x1 , x 2 , x 3 , ......., x N represents a finite population of N
quantitative data points, then the mean of this population
is given by
Population mean: μ = (x_1 + x_2 + ... + x_N) / N = (1/N) Σ_{i=1}^{N} x_i

Sample mean: x̄ = (x_1 + x_2 + ... + x_n) / n = (1/n) Σ_{i=1}^{n} x_i
Definition (Mode of a Set of Data)
The mode is the data value that occurs most frequently in
a set of data (population or sample).
Definition (The Median of a Set of Data)
If quantitative data is arranged in ascending or
descending order, the middle value of data is called the
median. If there is an even number of data points, the
median is typically taken to be the arithmetic average of
the two middle values.
Example:
Consider the following set of data (n = 24), which we can assume to be a sample from a population:

1  3  5  10  1  1  1  2  5  2  1  4  4  7  5  2  12  6  8  6  4  6  9  30

Find the sample mean, median and mode (for the median, first arrange the data in ascending order).
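The three measures for this sample can be checked with Python's statistics module (a sketch; the data list is transcribed from the example above):

```python
# Mean, median and mode of the 24-observation sample above.
from statistics import mean, median, mode

data = [1, 3, 5, 10, 1, 1, 1, 2, 5, 2, 1, 4,
        4, 7, 5, 2, 12, 6, 8, 6, 4, 6, 9, 30]

x_bar = mean(data)   # sum of the 24 values divided by 24
med = median(data)   # average of the 12th and 13th ordered values
mo = mode(data)      # most frequently occurring value
```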
Comparison of the Mean, Median and Mode
The mean takes account of all observation values; therefore it can be affected by extreme values or outliers, i.e. values which differ greatly from the majority of values.
The median and mode are unaffected by extremely high or
low values.
The mode may not represent a central value in the
distribution, as in the above example, but it may be useful,
for example, for qualitative data.
If the frequency (or relative frequency) distribution is
perfectly symmetric and unimodal, the mean, median and
mode will coincide.
[Figure: Symmetric Distribution - a symmetric, unimodal relative frequency distribution against Variable Value; the Mean, Median and Mode coincide at the centre.]
[Figure: Distribution that is Skewed to the Right - relative frequency against Variable Value; here Mode < Median < Mean.]

[Figure: Distribution that is Skewed to the Left - relative frequency against Variable Value; here Mean < Median < Mode.]
MAIN POINTS
A statistical population is a set of measurements or
characteristics of elementary units of interest.
Once a population is defined, a sample is a subset from
the population.
Parameters are numerical characteristics of a population.
Sample statistics are numerical characteristics of a
sample.
A frequency or relative frequency distribution describes
how data is distributed over different classes or categories.
A histogram shows graphically a frequency, relative
frequency or cumulative frequency distribution (the areas
of the contiguous rectangles are proportional to the
frequencies or relative frequencies).
The mean is affected by extreme values; the median and
the mode are not affected by extreme values.
The population mean is denoted μ; the sample mean is denoted x̄.
200052 INTRODUCTION TO ECONOMIC METHODS
LECTURE - WEEK 2
Required Reading:
Ref. File 1: Section 1.1
Ref. File 3: Sections 3.5(a)-(d), 3.5(f)
Ref. File 4: Introduction and Sections 4.1, 4.2
3.3 Measures of Dispersion
Definition (Mean Absolute Deviation)
(i) If x 1 , x 2 , x 3 , ....., x N represents a finite quantitative
population, then the population mean absolute
deviation is given by
Population MAD = (1/N) Σ_{i=1}^{N} |x_i - μ|

(ii) The sample mean absolute deviation is given by

Sample MAD = (1/n) Σ_{i=1}^{n} |x_i - x̄|
Definition (Variance of a Finite Quantitative Population)
If x 1 , x 2 , x 3 , ....., x N represent a finite population of N
quantitative data points, then the variance of this
population is given by
Population variance: σ^2 = (1/N) Σ_{i=1}^{N} (x_i - μ)^2

Sample variance: s^2 = Σ_{i=1}^{n} (x_i - x̄)^2 / (n - 1)

A computationally convenient form of the sample variance is

s^2 = (Σ_{i=1}^{n} x_i^2 - n x̄^2) / (n - 1)

Definition (Standard Deviation)
The standard deviation is the positive square root of the variance:

Population standard deviation: σ = √[(1/N) Σ_{i=1}^{N} (x_i - μ)^2]

Sample standard deviation: s = √[Σ_{i=1}^{n} (x_i - x̄)^2 / (n - 1)]
An advantage of the standard deviation over the variance
is that it is expressed in the original units of measure.
Example:
Calculate s 2 and s for the previous 24 number example.
(36.3315, 6.0276)
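A sketch of this calculation, using the shortcut formula s^2 = (Σx^2 - n x̄^2)/(n - 1) on the same 24 observations:

```python
# Sample variance and standard deviation for the 24-number example.
import math

data = [1, 3, 5, 10, 1, 1, 1, 2, 5, 2, 1, 4,
        4, 7, 5, 2, 12, 6, 8, 6, 4, 6, 9, 30]
n = len(data)
x_bar = sum(data) / n

# Shortcut formula: s^2 = (sum of x_i^2 - n * xbar^2) / (n - 1)
s2 = (sum(x ** 2 for x in data) - n * x_bar ** 2) / (n - 1)
s = math.sqrt(s2)
```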
A related measure is the sample coefficient of variation, CV = s / x̄ (often expressed as a percentage), which measures the standard deviation relative to the mean and so allows the variability of data sets measured in different units to be compared.
Example:
Suppose we wish to compare the variability of the weights
of a given sample of people with the variability of their
daily calorie intake. We are told
sample mean of weights = 68kg
sample standard deviation of weights = 5kg
sample mean of daily calorie intake = 1200 calories
sample standard deviation of daily calorie intake = 300
calories
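Because the two variables are measured in different units, their standard deviations cannot be compared directly; one standard approach is to compare coefficients of variation, CV = s / x̄. A quick sketch:

```python
# Comparing relative variability via the coefficient of variation.
cv_weight = 5 / 68        # weights: s = 5 kg, mean = 68 kg
cv_calories = 300 / 1200  # calorie intake: s = 300, mean = 1200

# Daily calorie intake is relatively more variable than weight.
more_variable = cv_calories > cv_weight
```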
The Empirical Rule
For a bell-shaped distribution of sample or population
data, it will be approximately true that
68% of the data points will lie within 1 standard
deviation of the mean.
95% of the data points will lie within 2 standard
deviations of the mean.
99.7% of the data points will lie within 3 standard
deviations of the mean.
4. PROBABILITY

4.1 Sets and Set Notation
Venn diagrams are often a convenient way of portraying
sets and the relationship between them. An example is the
following diagram.
[Venn diagram: a rectangle representing s containing two non-overlapping regions a and b; (a ∩ b) = ∅, i.e. a and b are disjoint/mutually exclusive.]

Example:
Suppose we have the set s = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}.
Define a = {1, 3, 5, 7, 9}, b = {1, 3}, c = {4, 7}.
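Basic set operations on these sets can be illustrated with Python's built-in set type (a sketch; the sets are those defined above):

```python
# Set operations on the example sets above.
s = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
a = {1, 3, 5, 7, 9}
b = {1, 3}
c = {4, 7}

a_union_c = a | c        # union of a and c
a_intersect_b = a & b    # intersection of a and b
complement_a = s - a     # complement of a relative to s
b_subset_a = b <= a      # b is a subset of a
b_c_disjoint = b.isdisjoint(c)  # b and c are mutually exclusive
```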
4.2 Terminology Related to Statistical Experiments
An experiment, in a statistical sense, is an act or process
that leads to an outcome which cannot be predicted with
certainty.
Definition (Simple Events and Events)
A simple event of an experiment is an outcome that cannot
be decomposed into simpler outcomes. An event is a
collection or set of one or more simple events. An event is
said to have occurred if a simple event included in the
event occurs.
Definition (Sample Space of a Statistical Experiment)
The sample space of an experiment, which will be denoted
s, is the set of all possible simple events. It can be
described as the event consisting of all simple events.
Definition (Discrete Sample Space)
A discrete sample space consists of either a finite number of simple events or a countably infinite number of simple events.
Definition (Continuous Sample Space)
A continuous sample space consists of simple events that
represent all the points in an interval on the real number
line. The interval could be of finite or infinite width.
4.3 Basic Concepts of Probability
(a) Probabilities of Events as Relative Frequencies
Definition (Probability of an Event)
If f E is the frequency with which event E occurs in n
repetitions (trials) of an experiment under identical
conditions/rules, P (E) is defined as
P(E) = lim_{n→∞} (f_E / n)
(b) Definition of a Probability Distribution
Definition (Probability Distribution)
A probability model or probability distribution for an
experiment takes the form of either a list of probabilities
of simple events or some other representation of the
relative frequency distribution of the underlying
population associated with the experiment.
(c) Axioms of Probability
Suppose an experiment has a sample space s. Any
assignment of probabilities to events in s (subsets of s)
must satisfy the following axioms:
1. For any event E in s, 0 ≤ P(E) ≤ 1.
2. P(s) = 1.
3. The probability of an event that is the union of a
collection of mutually exclusive events is given by
the sum of the probabilities of these mutually
exclusive events. (The additive property of
probability)
(d) Assigning Probabilities to Simple Events in Discrete Sample Spaces

There are three broad approaches to assigning probabilities to events.

[Venn diagrams: events A and B within a sample space s, illustrating the intersection (A ∩ B) and the union (A ∪ B).]

The first approach above can be formalised as the performance of the following steps:
1. Define the experiment and list the simple events.
2. Assign a probability to each simple event (each between 0 and 1, summing to 1 over the sample space).
3. Express the event of interest as a set of simple events.
4. Sum the probabilities of the simple events making up the event of interest.
Example:
Suppose that for s = {1, 2, 3, 4, 5, 6, 7, 8}:

P(1) = P(4) = P(5) = 0.1 and P(3) = P(6) = P(8) = 0.08, with P(2) = P(7).

Since the probabilities must sum to 1, P(2) = P(7) = 0.23. Define A = {1, 3, 5, 6} and B = {2, 3, 4, 5, 8}. Then, for example, P(A) = 0.1 + 0.08 + 0.1 + 0.08 = 0.36.
MAIN POINTS
For a finite population,
N
)2
(x i
2
variance
i 1
For a sample,
n
(x i
variance
s2
x) 2
i 1
n 1
In statistical experiments:
Simple events cannot be decomposed into simpler
outcomes.
The sample space is the set of all simple events.
Events are collections or sets of one or more simple events.
An event occurs if any of its included simple events
occur.
All statistical experiments can be thought of as sampling
from a statistical population.
Probabilities must obey certain axioms.
200052 INTRODUCTION TO ECONOMIC METHODS
LECTURE - WEEK 3
Required Reading:
Ref. File 4: Sections 4.3, 4.4, 4.6
4. PROBABILITY (continued)
Example:
Suppose we have the following data on all 1950 first year
students at a particular university.
Age in               Work Status
Years          Not Working   Part-Time   Full-Time   Row Total
Under 25           1200          200         250        1650
25 - 34             100           75         100         275
35 or over           10            5          10          25
Column Total       1310          280         360        1950

Using relative frequencies as probabilities, calculate P(C ∩ A), P(C ∩ F), P(C) and P(C ∩ E) for events A, C, E, F defined on this table.
Joint Relative Frequency Distribution (Bivariate Distribution)

Age in               Work Status
Years          Not Working   Part-Time   Full-Time   Row Total
Under 25        1200/1950     200/1950    250/1950   1650/1950
25 - 34          100/1950      75/1950    100/1950    275/1950
35 or over        10/1950       5/1950     10/1950     25/1950
Column Total    1310/1950     280/1950    360/1950      1.00
4.5 Useful Counting Techniques
(a) The Multiplicative Rule
Theorem (Multiplicative Rule of Counting)
Suppose two sets of elements, a and b, consist of n_A and n_B distinct elements, respectively; n_A and n_B need not be equal. Then it is possible to form n_A × n_B distinct pairs of elements consisting of one element from set a and one element from set b, without regard to order within a pair.
Example:
If a take-away food store sells 10 different food items and 5 different types of drink, 10 × 5 = 50 distinct food/drink pairs are possible.
The multiplicative rule can be extended naturally. Thus n_1 × n_2 × ... × n_k different sets of k elements are possible if one selects an element from each of k groups consisting of n_1, n_2, ..., n_k distinct elements, respectively.
Example:
Suppose we select 5 people at random. What is the
probability that they were born on different days of the
week, assuming an individual has an equal probability of
being born on any of the seven days of the week?
(Approx. 0.1499)
A simple event here is an ordered sequence of 5 elements,
the first representing the day of the week the first person
was born on, the second the day the second person was
born on, and so forth.
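Under the multiplicative rule there are 7 × 6 × 5 × 4 × 3 favourable ordered sequences out of 7^5 equally likely ones. A quick check:

```python
# Probability that 5 randomly chosen people were born on 5 different
# days of the week.
import math

favourable = math.perm(7, 5)  # 7 * 6 * 5 * 4 * 3 = ordered sequences of
                              # 5 distinct days
total = 7 ** 5                # all equally likely birth-day sequences
p_different_days = favourable / total
```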
(b) Permutations
Definition (Permutations)
A permutation is an ordered sequence of elements.
Definition (Factorial Notation)
If N is a non-negative integer, we define:
N! N( N 1)(N
And
0! 1
2).......(3)(2)(1)
(N-factorial)
46
Theorem (Number of Permutations)
The total number of possible distinct permutations
(ordered sequences) of R elements selected (without replacement) from N distinct elements, denoted NPR, is given by

NPR = N! / (N - R)!
Example:
Consider the numbers 1, 2, 3, 4. How many permutations
of these four numbers taken 2 at a time can be found?
(12)
(c) Combinations
Definition (Combinations)
A set of R elements selected from a set of N distinct
elements without regard to order is called a combination.
Theorem (Number of Combinations)
The total number of possible combinations of R elements
selected from a set of N distinct elements, denoted NCR, is given by

NCR = N! / (R!(N - R)!)
Example:
In how many ways can a committee of 4 people be chosen
from a group of 7 people? (35)
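Both counts can be verified with the standard library (a sketch):

```python
# Counting checks for the two examples above.
import math

n_permutations = math.perm(4, 2)  # ordered pairs from {1, 2, 3, 4}
n_committees = math.comb(7, 4)    # unordered committees of 4 from 7
```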
More generally, if N elements consist of k distinguishable types, with N_i elements of type i and Σ_{i=1}^{k} N_i = N, the number of distinct arrangements of the N elements in a row is

N! / (N_1! N_2! ... N_k!)
Example:
Say we have 3 black flags and 2 red flags. How many
distinct ways are there of arranging these flags in a row?
(10)
Example:
Suppose there are 6 applicants for 2 similar jobs. As the
personnel manager is too lazy he simply selects 2 of the
applicants at random and gives them each a job. What is
the probability that he selects one of the 2 best applicants,
and 1 of the four worst applicants? (8/15)
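The two counting answers above can be checked as follows (a sketch):

```python
# The flag arrangement and job selection examples above.
import math
from fractions import Fraction

# 5 flags, 3 black and 2 red: 5!/(3! 2!) distinct rows.
n_flag_rows = math.factorial(5) // (math.factorial(3) * math.factorial(2))

# Choose 1 of the 2 best and 1 of the 4 worst applicants, out of all
# C(6, 2) equally likely pairs.
p_one_best_one_worst = Fraction(math.comb(2, 1) * math.comb(4, 1),
                                math.comb(6, 2))
```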
4.6 Conditional Probability
Definition (Conditional Probability)
The probability of event A occurring given that event B
occurs, or the conditional probability of A given B (has
occurred) is denoted P ( A | B ) . Provided P (B ) 0 , this
conditional probability is defined to be
P(A | B) = P(A ∩ B) / P(B)
Example:
Suppose that a survey of women aged 20-30 years suggests
the following joint probability table relating to marital
status and desire to become pregnant within the next 12
months.
                          Desire
Marital status   Pregnancy   No pregnancy   Total
Married            0.08          0.47        0.55
Unmarried          0.02          0.43        0.45
Total              0.10          0.90        1.00
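Conditional probabilities follow directly from the definition; for instance, the probability a woman desires pregnancy given that she is married (a sketch):

```python
# Conditional probability from the joint probability table above:
# P(pregnancy | married) = P(pregnancy and married) / P(married).
p_married = 0.55
p_pregnancy_and_married = 0.08

p_pregnancy_given_married = p_pregnancy_and_married / p_married
```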
Rearranging the definition of conditional probability gives the multiplicative law of probability:

P(A ∩ B) = P(A) P(B | A) = P(B) P(A | B)
Example:
Define events A and B in the following way:
A: A student achieves a mark of over 65% in a first year
statistics exam
B: A student goes on to complete her bachelor's degree.
Suppose past experience indicates P(A) = 0.7 and P(B | A) = 0.88. Then

P(A ∩ B) = P(A) P(B | A) = 0.7 × 0.88 = 0.616

Definition (Independent Events)
Events A and B are said to be statistically independent if

P(A ∩ B) = P(A) P(B)
Alternative Definition (Independent and Dependent
Events)
Events A and B are said to be statistically independent if

P(A | B) = P(A), or equivalently P(B | A) = P(B)
Example:
Consider the single die tossing experiment again and
define the following events:
A: an odd number of dots results
B: a number of dots greater than 2 results
Are A and B independent?
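The check can be carried out by enumerating the sample space; a sketch using exact fractions:

```python
# Independence check for the fair-die events A (odd number of dots)
# and B (more than 2 dots).
from fractions import Fraction

outcomes = {1, 2, 3, 4, 5, 6}
A = {1, 3, 5}
B = {3, 4, 5, 6}

def p(event):
    # Equally likely outcomes, so P(event) = |event| / |outcomes|.
    return Fraction(len(event), len(outcomes))

p_a, p_b = p(A), p(B)
p_a_and_b = p(A & B)

independent = (p_a_and_b == p_a * p_b)
```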
4.8 More Useful Probability Rules
(a) The Additive Law of Probability
Theorem (Additive Law of Probability)
For two events A and B defined on a sample space s,

P(A ∪ B) = P(A) + P(B) - P(A ∩ B)

Example:
Again suppose that for s = {1, 2, 3, 4, 5, 6, 7, 8}:

P(1) = P(4) = P(5) = 0.1 and P(3) = P(6) = P(8) = 0.08, with P(2) = P(7) = 0.23,

and A = {1, 3, 5, 6}, B = {2, 3, 4, 5, 8}. Use the additive law to find P(A ∪ B).
(b) The Complementation Rule
Theorem 4.7 (Complementation Rule)
Suppose an event E and its complement Ē are defined on some sample space s. Then

P(E) = 1 - P(Ē)

(The Law of Total Probability)
If E_1, E_2, ..., E_k are mutually exclusive events whose union is the sample space, then for any event A

P(A) = P(E_1 ∩ A) + P(E_2 ∩ A) + ... + P(E_k ∩ A) = Σ_{j=1}^{k} P(E_j) P(A | E_j)
MAIN POINTS
In some statistical experiments the number of basic
outcomes in the sample space or event of interest can be
enumerated by using the multiplicative rule,
permutation or combination formulae, depending on how
a basic outcome can be represented most appropriately.
P(A | B) means the probability event A occurs given that event B has occurred. The conditional probability definition is

P(A | B) = P(A ∩ B) / P(B)

Rearranging gives the multiplicative law: P(A ∩ B) = P(B) P(A | B).
200052 INTRODUCTION TO ECONOMIC METHODS
LECTURE - WEEK 4
Required Reading:
Ref. File 4: Sections 4.7 to 4.9
Ref. File 5: Introduction and Sections 5.1 to 5.4
4. PROBABILITY (continued)
Example:
Suppose that in a given street 50 residents voted in the last
election. Of these, 15 voted for party A, 30 voted for
party B and 5 voted for neither party A nor B.
Suppose that one evening a candidate for the next election
visits the residents of the street to introduce herself. What
is the probability that the first two eligible voters she
meets voted for party A at the last election? (3/35)
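This is sampling without replacement, so the probability for the second voter is conditional on the first. A sketch:

```python
# Probability the first two voters met both voted for party A
# (15 of the 50 voters voted for A).
from fractions import Fraction

p_first = Fraction(15, 50)             # first voter voted A
p_second_given_first = Fraction(14, 49)  # second voted A given first did
p_both = p_first * p_second_given_first
```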
Example:
Consider the experiment of successively drawing 2 cards
from a deck of 52 playing cards. Define the following
events:
A 1 : ace on first draw
A 2 : ace on second draw
Note: If we simultaneously select a sample of n elements,
we are effectively sampling without replacement.
4.10 Probability Trees
Tree diagrams can be a useful aid in calculating the
probabilities of intersections of events (i.e. joint
probabilities).
Example:
Greasy Mo's take-away food store offers special $10 meal
deals consisting of a small pizza or a kebab, together with
a can of soft drink, a milkshake or a cup of fruit juice.
Past experience has shown that 60% of meal deal buyers
choose a pizza (P), 40% choose kebabs (K), 75% choose a
soft drink (S), 20% choose a milkshake (M) and 5%
choose fruit juice (J). Assume the events P and K are
independent of the events S, M and J. What is the
probability that a meal deal customer (chosen at random)
will choose a pizza and fruit juice? (0.03)
The tree diagram for this example can be drawn as below.
[Tree diagram: first-stage branches P (0.6) and K (0.4); from each, second-stage branches S (0.75), M (0.2) and J (0.05).]

P(P ∩ S) = 0.6(0.75) = 0.45
P(P ∩ M) = 0.6(0.2) = 0.12
P(P ∩ J) = 0.6(0.05) = 0.03
P(K ∩ S) = 0.4(0.75) = 0.30
P(K ∩ M) = 0.4(0.2) = 0.08
P(K ∩ J) = 0.4(0.05) = 0.02
5. RANDOM VARIABLES AND PROBABILITY DISTRIBUTIONS
Alternative Definition (Random Variable)
A random variable X is a real valued function for which
the domain is the sample space of a statistical experiment.
In most statistical experiments of interest, outcomes give
rise to quantitative data that can be considered values of
the random variable being studied.
Definition (Discrete Random Variable)
A discrete random variable can only assume a finite or countably infinite number of values.
Definition (Continuous Random Variable)
A continuous random variable can assume any value in an
interval (finite or infinite).
Example:
Consider again the experiment of tossing a fair die once
and noting the number of dots on the upward facing side
(X).
P(X = x) = 1/6,  x = 1, 2, 3, 4, 5, 6
Definition (Expected Value of a Discrete Random
Variable)
The expected value of a discrete random variable X is
defined as
E(X) = Σ_{all x} x P(X = x)
Theorem (Expected Value of a Function of a Discrete
Random Variable)
Suppose a function g ( X ) of a discrete random variable X.
The expected value of this function, if it exists, is given by
E[g(X)] = Σ_{all x} g(x) P(X = x)

Also, for functions g_1, ..., g_k,

E[g_1(X) + ... + g_k(X)] = E[g_1(X)] + ... + E[g_k(X)]
5.3 The Variance of a Random Variable
To gauge the dispersion of a random variable X about its
expected value or mean we can calculate the expected
value of its squared distance ( X E ( X ))2 from the mean.
This is called the variance of the random variable X,
denoted Var ( X ) .
Definition (Variance of a Random Variable)
The variance of any random variable X (discrete or
continuous) is given by
Var(X) = E[(X - E(X))^2]

The standard deviation of X is the positive square root of the variance:

SD(X) = √Var(X) = √E[(X - E(X))^2]
An alternative way of writing (and calculating) Var ( X ) is
Var(X) = E(X^2) - [E(X)]^2 = Σ_{all x} x^2 P(X = x) - [E(X)]^2   (if X is discrete)
Example:
Suppose a lottery offers 3 prizes: $1,000, $2,000 and
$3,000. 10,000 tickets are sold and each ticket has an
equal chance of winning a prize. Calculate the variance
and standard deviation of the random variable X
representing the value of the prize won by a ticket.
(1399.64, 37.4118)
x        P(X = x)      x P(X = x)   x^2           x^2 P(X = x)
0        9997/10000       0         0                  0
1,000    1/10000          0.1       1,000,000        100
2,000    1/10000          0.2       4,000,000        400
3,000    1/10000          0.3       9,000,000        900
Total                     0.6                       1400
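The calculation in the table can be reproduced directly (a sketch):

```python
# Variance and standard deviation of the lottery prize X.
import math

dist = {0: 9997 / 10000, 1000: 1 / 10000,
        2000: 1 / 10000, 3000: 1 / 10000}

e_x = sum(x * p for x, p in dist.items())        # E(X)
e_x2 = sum(x ** 2 * p for x, p in dist.items())  # E(X^2)
var_x = e_x2 - e_x ** 2                          # E(X^2) - [E(X)]^2
sd_x = math.sqrt(var_x)
```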
For constants a and b, Var(a + bX) = b^2 Var(X).
5.4 The Binomial Distribution
The binomial distribution is a discrete probability
distribution based on n repetitions of an experiment
whose outcomes are represented by a Bernoulli random
variable.
(a) Bernoulli Experiments
A Bernoulli experiment (or trial) is such that only 2
outcomes are possible. These outcomes can be denoted
success (S) and failure (F), with probabilities p and
(1 p ) , respectively.
A Bernoulli random variable Y is usually defined so that it
takes the value 1 if the outcome of a Bernoulli experiment
is a success, and the value 0 if the outcome is a failure.
Thus
P(Y = 1) = p
P(Y = 0) = 1 - p

E(Y) = p
Var(Y) = p(1 - p)
(b) Binomial Experiments
Definition (Binomial Experiment)
A binomial experiment fulfils the following requirements:
(i) The experiment consists of n identical trials.
(ii) Each trial results in one of two outcomes, success (S) or failure (F).
(iii) The probability of success, p, is the same on every trial.
(iv) The trials are independent.
(v) The random variable of interest, X, is the number of successes observed in the n trials.

The binomial probability distribution is then

P(X = x) = nCx p^x (1 - p)^(n - x),  for x = 0, 1, 2, 3, ..., n
Example:
A company that supplies reverse-cycle air conditioning
units has found from experience that 70% of the units it
installs require servicing within the first 6 weeks of
operation. In a given week the firm installs 10 air
conditioning units. Calculate the probability that, within 6
weeks
5 of the units require servicing (0.1029 approx.)
none of the units require servicing (0 approx.)
all of the units require servicing (0.0282 approx.)
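These probabilities follow directly from the binomial formula; a minimal sketch:

```python
# Binomial probabilities for n = 10 installed units, p = 0.7.
import math

def binom_pmf(x, n, p):
    # P(X = x) = C(n, x) * p**x * (1 - p)**(n - x)
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

p_five = binom_pmf(5, 10, 0.7)   # exactly 5 units need servicing
p_none = binom_pmf(0, 10, 0.7)   # no units need servicing
p_all = binom_pmf(10, 10, 0.7)   # all 10 units need servicing
```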
(c) Cumulative Binomial Probabilities
(Extract of Appendix 3)

CUMULATIVE BINOMIAL PROBABILITIES: P(X ≤ x | p, n)

                                         p
 n   x   0.05    0.10    0.15    0.20    0.25    0.30    0.35    0.40   ....   0.70

 1   0  0.9500  0.9000  0.8500  0.8000  0.7500  0.7000  0.6500  0.6000  ....  0.3000
     1  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  ....  1.0000

 2   0  0.9025  0.8100  0.7225  0.6400  0.5625  0.4900  0.4225  0.3600  ....  0.0900
     1  0.9975  0.9900  0.9775  0.9600  0.9375  0.9100  0.8775  0.8400  ....  0.5100
     2  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  ....  1.0000

 3   0  0.8574  0.7290  0.6141  0.5120  0.4219  0.3430  0.2746  0.2160  ....  0.0270
     1  0.9928  0.9720  0.9393  0.8960  0.8438  0.7840  0.7183  0.6480  ....  0.2160
     2  0.9999  0.9990  0.9966  0.9920  0.9844  0.9730  0.9571  0.9360  ....  0.6570
     3  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  ....  1.0000

10   0  0.5987  0.3487  0.1969  0.1074  0.0563  0.0282  0.0135  0.0060  ....  0.0000
     1  0.9139  0.7361  0.5443  0.3758  0.2440  0.1493  0.0860  0.0464  ....  0.0001
     2  0.9885  0.9298  0.8202  0.6778  0.5256  0.3828  0.2616  0.1673  ....  0.0016
     3  0.9990  0.9872  0.9500  0.8791  0.7759  0.6496  0.5138  0.3823  ....  0.0106
     4  0.9999  0.9984  0.9901  0.9672  0.9219  0.8497  0.7515  0.6331  ....  0.0473
     5  1.0000  0.9999  0.9986  0.9936  0.9803  0.9527  0.9051  0.8338  ....  0.1503
     6  1.0000  1.0000  0.9999  0.9991  0.9965  0.9894  0.9740  0.9452  ....  0.3504
     7  1.0000  1.0000  1.0000  0.9999  0.9996  0.9984  0.9952  0.9877  ....  0.6172
     8  1.0000  1.0000  1.0000  1.0000  1.0000  0.9999  0.9995  0.9983  ....  0.8507
     9  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  0.9999  ....  0.9718
    10  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  ....  1.0000
Example:
Referring to previous air conditioning unit example,
calculate the probability that within 6 weeks of installation
less than 8 of the air conditioners require servicing.
(0.6172 approx.)
4 or more of the air conditioners require servicing.
(0.9894 approx.)
Example:
Referring to the previous air conditioning unit example, use
the cumulative binomial tables to calculate the probability
that within 6 weeks of installation
5 units require servicing (0.103)
10 units require servicing (0.0282)
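The table lookups above amount to differencing cumulative probabilities, P(X = x) = P(X ≤ x) - P(X ≤ x - 1). A sketch that computes the cumulative probabilities directly rather than reading them from Appendix 3:

```python
# Cumulative binomial probabilities for n = 10, p = 0.7.
import math

def binom_cdf(x, n, p):
    # P(X <= x) = sum of binomial pmf values for k = 0, ..., x
    return sum(math.comb(n, k) * p ** k * (1 - p) ** (n - k)
               for k in range(x + 1))

p_less_than_8 = binom_cdf(7, 10, 0.7)       # P(X < 8) = P(X <= 7)
p_four_or_more = 1 - binom_cdf(3, 10, 0.7)  # P(X >= 4)
p_exactly_5 = binom_cdf(5, 10, 0.7) - binom_cdf(4, 10, 0.7)
p_exactly_10 = 1 - binom_cdf(9, 10, 0.7)
```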
Each combination of n and p gives a particular
binomial distribution. We say n and p are the
parameters of the binomial distribution.
If p = 0.5 the binomial distribution is symmetric.

Example:
Suppose n = 5 and p = 0.5.

[Probability histogram: P(X = 0) = P(X = 5) = 0.0313, P(X = 1) = P(X = 4) = 0.1563, P(X = 2) = P(X = 3) = 0.3125.]
MAIN POINTS
If we sample without replacement from a finite
population, the outcome on any draw will depend on the
outcomes of all previous draws.
Sampling with replacement from a finite population is
equivalent to sampling from an infinite population.
Tree diagrams can facilitate the calculation of joint
probabilities (i.e. the probabilities of intersections of
events).
A probability distribution can be interpreted as a model
for the relative frequency distribution of some real
statistical population. In any given situation, the model
may or may not represent the relative frequency
distribution exactly.
It is convenient to associate the outcomes of a statistical
experiment with values of a random variable (e.g. X). We
can then think in terms of the probability distribution of
the random variable.
The mean (expected value) and variance of a discrete
random variable are given by
E(X) = Σ_{all x} x P(X = x)

Var(X) = E[(X - E(X))^2] = Σ_{all x} (x - E(X))^2 P(X = x)
For a binomial random variable X based on n trials with success probability p,

P(X = x) = nCx p^x (1 - p)^(n - x),  x = 0, 1, ..., n
200052 INTRODUCTION TO ECONOMIC METHODS
LECTURE - WEEK 5
Required Reading:
Ref. File 6: Introduction and Sections 6.1 to 6.4
6. CONTINUOUS PROBABILITY DISTRIBUTIONS
6.1 Introduction
From now on we shall be mainly concerned with studying
the distributions of continuous random variables. As we
have noted, a continuous random variable can assume any
value in a given interval.
The probability distribution for a continuous random
variable X will have a smooth curve or line as its graphical
representation. The heights of the points on this curve will
be given by a function of x, denoted f ( x ) , which is
variously called the probability density function, the
probability distribution or simply the density function of
the random variable X.
[Figure: a density curve y = f(x); the shaded area between a and b equals P(a ≤ X ≤ b).]

The total area under a density function is 1:

∫ f(x) dx = 1   (integrated over all possible values of x)
6.2 The Uniform Distribution
If a random variable X can take on any value in a given
finite interval a ≤ x ≤ b and the probability of the variable
taking a value in a given finite sub-interval is the same as
the probability the variable takes a value in any other
finite sub-interval of the same width, we say the variable X
is uniformly distributed. We have the following formal
definition.
Definition (Uniform Random Variable)
A continuous random variable X is said to be uniformly
distributed over the finite interval a ≤ X ≤ b if and only if
its density function is given by
f(x) = 1/(b - a),  if a ≤ x ≤ b
f(x) = 0,          if x < a or x > b

For such a variable,

P(c ≤ X ≤ d) = (d - c)/(b - a)   for a ≤ c ≤ d ≤ b
[Figure: uniform density with constant height 1/(b - a) over the interval [a, b]; total area = 1.]

E(X) = (a + b)/2
Var(X) = (b - a)^2/12
Example:
The amount of petrol sold daily by a service station (say X)
is known to be uniformly distributed between 4,000 and
6,000 litres inclusive. What is the probability of sales on
any one day being between 5,500 and 6,000 litres? (0.25)
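A sketch of the uniform-distribution calculations for this example:

```python
# Daily petrol sales X uniformly distributed on [4000, 6000] litres.
a, b = 4000.0, 6000.0

p_sales = (6000 - 5500) / (b - a)  # P(5500 <= X <= 6000)
mean_sales = (a + b) / 2           # E(X)
var_sales = (b - a) ** 2 / 12      # Var(X)
```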
6.3 The Normal (Gaussian) Distribution
The normal distribution represents a family of bell-shaped distributions that are distinguished according to their mean and variance.
Definition (Normally Distributed Random Variable)
A random variable X is normally distributed if and only if
it has a density function of the following form:
f(x) = (1 / (σ √(2π))) e^(-(x - μ)^2 / (2σ^2)),  -∞ < x < ∞

where:
μ and σ^2 are parameters of the distribution of X. They represent E(X) and Var(X), respectively.
e is the irrational number that serves as the base for natural logarithms (e ≈ 2.7182...).
π is the irrational number representing the ratio of the circumference of a circle to its diameter (π ≈ 3.1415...).

A normal distribution with mean μ and variance σ^2 is usually denoted N(μ, σ^2).
[Figure: bell-shaped normal density curve y = f(x), centred at the mean μ.]
The standard normal distribution has a mean of 0 and a
variance (and standard deviation) of 1. A standard
normal variable is often denoted Z . Thus
Z ~ N (0, 1)
X ~ N(μ, σ^2) if and only if Z = (X - μ)/σ ~ N(0, 1).
Also note that a linear function of a normal variable is also
normally distributed.
(Extract of Appendix 5)

AREAS UNDER THE STANDARD NORMAL DISTRIBUTION
The table below gives areas under the standard normal distribution between 0 and z.

  z     0      1      2      3      4      5      6      7      8      9
 0.0  .0000  .0040  .0080  .0120  .0160  .0199  .0239  .0279  .0319  .0359
 0.1  .0398  .0438  .0478  .0517  .0557  .0596  .0636  .0675  .0714  .0754
 0.2  .0793  .0832  .0871  .0910  .0948  .0987  .1026  .1064  .1103  .1141
 0.3  .1179  .1217  .1255  .1293  .1331  .1368  .1406  .1443  .1480  .1517
 0.4  .1554  .1591  .1628  .1664  .1700  .1736  .1772  .1808  .1844  .1879
 0.5  .1915  .1950  .1985  .2019  .2054  .2088  .2123  .2157  .2190  .2224
 0.6  .2258  .2291  .2324  .2357  .2389  .2422  .2454  .2486  .2518  .2549
 0.7  .2580  .2612  .2642  .2673  .2704  .2734  .2764  .2794  .2823  .2852
 0.8  .2881  .2910  .2939  .2967  .2996  .3023  .3051  .3078  .3106  .3133
 0.9  .3159  .3186  .3212  .3238  .3264  .3289  .3315  .3340  .3365  .3389
 1.0  .3413  .3438  .3461  .3485  .3508  .3531  .3554  .3577  .3599  .3621
 1.1  .3643  .3665  .3686  .3708  .3729  .3749  .3770  .3790  .3810  .3830
 1.2  .3849  .3869  .3888  .3907  .3925  .3944  .3962  .3980  .3997  .4015
 1.3  .4032  .4049  .4066  .4082  .4099  .4115  .4131  .4147  .4162  .4177
 1.4  .4192  .4207  .4222  .4236  .4251  .4265  .4279  .4292  .4306  .4319
 1.5  .4332  .4345  .4357  .4370  .4382  .4394  .4406  .4418  .4429  .4441
 1.6  .4452  .4463  .4474  .4484  .4495  .4505  .4515  .4525  .4535  .4545
 1.7  .4554  .4564  .4573  .4582  .4591  .4599  .4608  .4616  .4625  .4633
 1.8  .4641  .4649  .4656  .4664  .4671  .4678  .4686  .4693  .4699  .4706
 1.9  .4713  .4719  .4726  .4732  .4738  .4744  .4750  .4756  .4761  .4767
 ....
 3.8  .4999  .4999  .4999  .4999  .4999  .4999  .4999  .4999  .4999  .4999
 3.9  .5000  .5000  .5000  .5000  .5000  .5000  .5000  .5000  .5000  .5000

Example:
If Z ~ N(0, 1) determine the following probabilities:
P(Z < 0)  (0.5)
P(Z > 0.5)  (0.3085)
P(-0.1 < Z < 0.9)  (0.3557)
P(Z > 1.64)  (0.0505)
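Where no table is to hand, areas under the standard normal curve can be computed from the error function, Φ(z) = ½(1 + erf(z/√2)); results may differ from four-figure tables in the last decimal place. A sketch reproducing the example above:

```python
# Standard normal probabilities via the error function.
import math

def phi(z):
    # Standard normal CDF: P(Z <= z) = 0.5 * (1 + erf(z / sqrt(2))).
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

p1 = phi(0)                # P(Z < 0)
p2 = 1 - phi(0.5)          # P(Z > 0.5)
p3 = phi(0.9) - phi(-0.1)  # P(-0.1 < Z < 0.9)
p4 = 1 - phi(1.64)         # P(Z > 1.64)
```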
Example:
If X ~ N(12, 4), calculate P(X < 6.26), P(7 < X < 13) and P(X > 15.5). (0.0021, 0.6853, 0.0401)
Example:
From several years records, a fish market manager has
determined that the weight of deep sea bream sold in the
market ( X ) is approximately normally distributed with a
mean of 420 grams and a standard deviation of 80 grams.
Assuming this distribution will remain unchanged in the
future, calculate the expected proportions of deep sea
bream sold over the next year weighing
(a) Between 300 and 400 grams. (0.3345)
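Part (a) can be reproduced by standardizing and using the standard normal CDF (a sketch; Φ is computed from the error function rather than read from tables):

```python
# P(300 < X < 400) for bream weights X ~ N(420, 80^2).
import math

def phi(z):
    # Standard normal CDF via the error function.
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

mu, sigma = 420.0, 80.0
z_low = (300 - mu) / sigma   # standardized lower limit
z_high = (400 - mu) / sigma  # standardized upper limit
p_between = phi(z_high) - phi(z_low)
```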
6.4 The Normal Approximation to the Binomial
The normal distribution can be used to approximate
binomial probabilities if
np ≥ 5 and n(1 - p) ≥ 5.

The approximating normal distribution has mean μ = np and standard deviation σ = √(np(1 - p)).
Example:
It is known that 60% of cars registered in a given town use
unleaded petrol. A random sample of 200 cars is selected.
Determine the probability that, of the cars in the sample:
130 use unleaded petrol. (0.021)
more than 130 use unleaded petrol. (0.0643)
less than 130 use unleaded petrol. (0.9147)
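A sketch of the approximation with the continuity correction (not part of the original notes; small differences from the answers above come from table rounding):

```python
from math import erf, sqrt

def phi(z):
    return 0.5 * (1 + erf(z / sqrt(2)))

n, p = 200, 0.6
mu, sd = n * p, sqrt(n * p * (1 - p))     # 120 and about 6.93
# continuity correction: "exactly 130" becomes the interval (129.5, 130.5)
exactly = phi((130.5 - mu) / sd) - phi((129.5 - mu) / sd)
more = 1 - phi((130.5 - mu) / sd)
less = phi((129.5 - mu) / sd)
# close to the table-based answers 0.021, 0.0643 and 0.9147
print(round(exactly, 4), round(more, 4), round(less, 4))
```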
MAIN POINTS
The graphical representation of a continuous random variable is the graph of its density function. This is the counterpart of the probability histogram for a discrete random variable.
The probability that a continuous random variable takes on a value in some range is given by an area under the density function.
The uniform distribution has a constant density function.
If X is normally distributed with mean μ and variance σ², we can write this information as X ~ N(μ, σ²).
200052 INTRODUCTION TO ECONOMIC METHODS
LECTURE - WEEK 6
Required Reading:
Ref. File 7: Introduction and Sections 7.1 to 7.4
7.
INTRODUCTION TO ESTIMATION
Definition (Sample Statistic)
Suppose the random variables X 1 , X 2 ,....., X n are
associated with a sample of size n from a statistical
population. Then any function of (or formula containing)
X 1 , X 2 ,....., X n that does not depend on any unknown
parameter is called a sample statistic.
Definition (Estimator/Estimate of a Population Parameter)
Suppose the random variables X 1 , X 2 ,....., X n are
associated with a sample of size n from a statistical
population.
Then a sample statistic involving
X 1 , X 2 ,....., X n that is used to estimate a parameter of the
population or associated probability distribution is called
an estimator of the parameter, and a realization of the
sample statistic (an actual number) is called an estimate of
the parameter.
Definition (Sample Mean and Variance of a Random Variable)
Suppose the random variables X1, X2, ..., Xn represent a random sample of size n of the random variable X. The sample mean and variance of X are then defined as, respectively,

Sample Mean of X:  X̄ = Σ_{i=1}^{n} X_i / n

Sample Variance of X:  S² = Σ_{i=1}^{n} (X_i − X̄)² / (n − 1)
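The two definitions translate directly into code. A small Python sketch with made-up data (not part of the original notes):

```python
# Sample mean and sample variance (note the n - 1 divisor),
# computed exactly as in the definitions above; the data are invented.
data = [4.0, 7.0, 6.0, 3.0, 5.0]
n = len(data)
xbar = sum(data) / n
s2 = sum((x - xbar) ** 2 for x in data) / (n - 1)
print(xbar, s2)   # -> 5.0 2.5
```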
Properties of Estimators
Definition (Unbiased Estimator)
Consider an estimator θ̂ of some population parameter θ. θ̂ is an unbiased estimator of θ if E(θ̂) = θ. If E(θ̂) ≠ θ, θ̂ is said to be a biased estimator of θ, with the value of the bias given by B = E(θ̂) − θ.
(θ is the lower case version of the Greek letter theta)
Definition (Relative Efficiency of an Estimator)
If θ̂1 and θ̂2 are both unbiased estimators of a population parameter θ with unequal variances, θ̂1 is said to be relatively more efficient than θ̂2 if Var(θ̂1) < Var(θ̂2).
7.2 The Sampling Distribution of the Sample Mean
Example:
Suppose we know that in a large city 20% of households
possess no car, 60% possess one car and 20% possess two
cars. If we let X be the number of cars in a household we
can write the probability distribution of X as
P(X = x) = 1/5 for x = 0;  3/5 for x = 1;  1/5 for x = 2

Now consider random samples of size 2, (X1, X2). The possible samples, their sample means and their probabilities are:

x1   x2   x̄     P((X1 = x1) ∩ (X2 = x2))
0    0    0      1/5 × 1/5 = 1/25
0    1    0.5    1/5 × 3/5 = 3/25
0    2    1      1/5 × 1/5 = 1/25
1    0    0.5    3/5 × 1/5 = 3/25
1    1    1      3/5 × 3/5 = 9/25
1    2    1.5    3/5 × 1/5 = 3/25
2    0    1      1/5 × 1/5 = 1/25
2    1    1.5    1/5 × 3/5 = 3/25
2    2    2      1/5 × 1/5 = 1/25
The sampling distribution of the sample mean X̄ is therefore:

x̄           0      0.5    1      1.5    2
P(X̄ = x̄)    1/25   6/25   11/25  6/25   1/25
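The sampling distribution above can be reproduced by brute-force enumeration. A Python sketch using exact fractions (not part of the original notes):

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# pmf of X, the number of cars in a household
pmf = {0: Fraction(1, 5), 1: Fraction(3, 5), 2: Fraction(1, 5)}

# enumerate all samples (X1, X2) and accumulate P(Xbar = xbar)
dist = defaultdict(Fraction)
for x1, x2 in product(pmf, repeat=2):
    dist[(x1 + x2) / 2] += pmf[x1] * pmf[x2]

for xbar in sorted(dist):
    print(xbar, dist[xbar])   # 0.0 1/25, 0.5 6/25, 1.0 11/25, 1.5 6/25, 2.0 1/25
```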
Note: Var(X̄) decreases as n increases and approaches zero in the limit. This, together with the fact that X̄ is unbiased, ensures that X̄ is a consistent estimator of μ.
Note: The standard deviation of an estimator is often
called the standard error of the estimator, although often
this term is used for an estimate of the standard deviation
of an estimator.
Example:
A particular type of light bulb has a mean life of 6,000
hours and a standard deviation of bulb life of 400 hours.
What percentage of random samples made up of 100
observations of bulb lives will yield mean bulb lives
between 5,950 and 6,050 hours? (78.88%)
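As a numeric check of this calculation (the standard error of the mean is 400/√100 = 40; not part of the original notes):

```python
from math import erf, sqrt

def phi(z):
    return 0.5 * (1 + erf(z / sqrt(2)))

mu, sigma, n = 6000, 400, 100
se = sigma / sqrt(n)                       # standard error of the sample mean = 40
p = phi((6050 - mu) / se) - phi((5950 - mu) / se)
print(round(100 * p, 2))                   # about 78.87 (the slides' table value is 78.88)
```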
8.
INTERVAL ESTIMATION
The value z_{α/2} is defined such that P(Z ≥ z_{α/2}) = α/2. By symmetry P(Z ≤ −z_{α/2}) = α/2, so z_{α/2} is such that

P(−z_{α/2} ≤ Z ≤ z_{α/2}) = 1 − α

Given a random sample of size n from a normal population with known variance σ²,

X̄ ~ N(μ, σ²/n)  so that  Z = (X̄ − μ)/(σ/√n) ~ N(0, 1)

Hence

P(−z_{α/2} ≤ (X̄ − μ)/(σ/√n) ≤ z_{α/2}) = 1 − α

P(X̄ − z_{α/2} σ/√n ≤ μ ≤ X̄ + z_{α/2} σ/√n) = 1 − α

Thus the (1 − α)100% confidence interval for μ based on a realized sample mean x̄ is given by

(x̄ − z_{α/2} σ/√n, x̄ + z_{α/2} σ/√n)
Example:
Suppose it is known from past experience that the
duration of phone calls (X) made by telephone subscribers
in a given city is approximately normally distributed with
a standard deviation of 10 minutes. A sample of 25 calls is
metered and the mean duration of these calls is found to
be 7.5 minutes. Construct a confidence interval for the
mean duration of calls (in minutes) based on this sample,
using a confidence level of 0.90. (4.21, 10.79)
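A sketch of the interval calculation (not part of the original notes; the critical value 1.645 is read from the standard normal table, as in the slides):

```python
from math import sqrt

xbar, sigma, n = 7.5, 10, 25
z = 1.645                                 # z_{0.05} for a 90% confidence level
half = z * sigma / sqrt(n)
print(round(xbar - half, 2), round(xbar + half, 2))   # -> 4.21 10.79
```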
If σ² is unknown, it can be replaced by s² in the above interval provided the sample is large (n ≥ 30).
8.3 Properties of Confidence Intervals
The width of a confidence interval for the population mean, where we are justified in using the normal distribution, is given by

(x̄ + z_{α/2} σ/√n) − (x̄ − z_{α/2} σ/√n) = 2 z_{α/2} σ/√n

(or 2 z_{α/2} s/√n when σ is replaced by s for n ≥ 30)
Example:
A clothing shop located in a busy shopping arcade is
interested in estimating the mean age of people who
frequent the arcade. The shop intends to use this
information in determining the appropriate range of
clothing it should stock in order to maximize sales. A
sample of people is to be selected at random in the arcade
and questioned by the shop manager about their age.
What should the sample size be if the shop manager
wishes to use a calculated x to estimate the average age of
people who frequent the arcade to within 1.5 years, with
95% confidence, assuming the population standard
deviation is approximately 7.5? (97)
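The sample-size calculation solves 2 z_{α/2} σ/√n ≤ 2D for n, i.e. n ≥ (z_{α/2} σ/D)². A sketch (not part of the original notes):

```python
from math import ceil

sigma, D = 7.5, 1.5
z = 1.96                                  # z_{0.025} for 95% confidence
n = ceil((z * sigma / D) ** 2)            # round up to the next whole observation
print(n)                                   # -> 97
```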
MAIN POINTS
A random sample of a random variable is such that the
random variables representing the sample are
independently and identically distributed.
An estimator of a population parameter is a formula
containing the random variables representing sample
values.
The probability distribution of an estimator is called a
sampling distribution.
An unbiased estimator of a parameter has a mean equal
to the parameter value.
A consistent estimator has a probability distribution that
becomes more concentrated around the true parameter
value as n tends to infinity.
For a random sample of a random variable X with mean μ and variance σ²: E(X̄) = μ and Var(X̄) = σ²/n.
200052 INTRODUCTION TO ECONOMIC METHODS
LECTURE - WEEK 9
Required Reading:
Ref. File 7: Sections 7.6 and 7.7
Ref. File 8: Introduction and Sections 8.1 to 8.5
9.
HYPOTHESIS TESTING
The basic process of statistical hypothesis testing can be
represented as follows:
Claim (hypothesis) about a parameter (plus knowledge of any other parameters needed to specify the distribution of the population) → sample evidence → decision on the claim. (Remainder of flow diagram omitted.)
Definition (Simple and Composite Hypotheses)
A hypothesis that specifies a single value of a parameter
and permits us to specify uniquely the distribution of the
population being sampled from is called a simple
hypothesis. All other hypotheses concerning a population
parameter are called composite hypotheses.
Typically the alternative hypothesis is a composite
hypothesis that specifies a range of alternative values of
the parameter of interest, unless the parameter in question
can only take a finite number of values.
If the null hypothesis specifies a range of values of the
parameter considered, our hypothesis testing procedure
will use the limit value of the specified range of values (see
reference file for explanation).
Example:
It is claimed that the mean weight of flour in boxes of a
particular brand of flour is 500 grams. Suppose we wish
to test this claim.
The alternative hypothesis can be specified in several
ways, depending on the type of non-random variation that
is of interest.
A two-sided alternative hypothesis is of the form: parameter ≠ the value specified under the null hypothesis.
(iii) Type I and Type II Errors
For any specification of H 0 and H 1 , two errors can occur
in testing H 0 .
Definition (Type I and Type II Errors)
A type I error occurs if the null hypothesis H 0 is rejected
when it is in fact true. A type II error occurs if the null
hypothesis H 0 is not rejected when it is in fact false.
The value (values for a two-tail test) of the test statistic
that partition(s) the sampling distribution of the estimator
into non-rejection and rejection regions is (are) called the
critical value (or values) of the test. The choice of α will determine this (these) value(s).

Consider testing

H0: μ = μ0  against  H1: μ > μ0

at the α level of significance. Under H0 (normal population, σ² known),

X̄ ~ N(μ0, σ²/n)  exactly

and

Z = (X̄ − μ0)/(σ/√n) ~ N(0, 1)  (exactly)
Say x̄_l is the critical value that cuts off an area of α in the upper tail of the distribution of X̄ under H0. That is,

P(X̄ ≥ x̄_l) = α,  where  x̄_l = μ0 + z_α σ/√n

and z_α cuts off an upper-tail area of α under the standard normal density. The decision rule is then:

reject H0: μ = μ0 if z ≥ z_α, or equivalently
reject H0: μ = μ0 if x̄ ≥ x̄_l
Example:
A cereal manufacturer claims that the mean fat content of
its cereal packets is 2.2 grams. Assume the fat content per
packet is approximately normally distributed with a
standard deviation of 0.6 grams. A consumer organisation
suspects that the mean fat content per packet is higher
than the manufacturer's claim. It tests a random sample of 25 packets of cereal and finds a sample mean fat content of 2.4 grams. On the basis of this information, test the manufacturer's claim against a suitable alternative at the α = 0.05 level of significance.
Step 1:
Label the random variable of interest and formulate the
null and alternative hypotheses.
Step 2:
Identify the appropriate sampling distribution of the test
statistic under the null hypothesis H 0 .
Step 3:
Find the critical value(s) of the test statistic.
Step 4:
State the decision rule.
Step 5:
Calculate the test statistic based on a realized sample.
Step 6:
Compare the realized value of the test statistic and the
critical value(s) to reach a conclusion.
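Steps 5 and 6 for this example can be sketched numerically (not part of the original notes; the critical value 1.645 is z_{0.05} from the normal table):

```python
from math import sqrt

mu0, sigma, n, xbar = 2.2, 0.6, 25, 2.4
z = (xbar - mu0) / (sigma / sqrt(n))      # realized test statistic
z_crit = 1.645                            # upper-tail critical value at alpha = 0.05
print(round(z, 3), z > z_crit)            # -> 1.667 True: reject H0
```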
Analogously, to test H0: μ = μ0 against H1: μ < μ0, reject H0 if z ≤ −z_α (or x̄ ≤ x̄_s, where x̄_s = μ0 − z_α σ/√n).

For a two-tail test of H0: μ = μ0 against H1: μ ≠ μ0, reject H0 if z ≤ −z_{α/2} or z ≥ z_{α/2} (alternatively: x̄ ≤ x̄_s or x̄ ≥ x̄_l, with x̄_s and x̄_l computed using z_{α/2}).
Example:
Suppose that in the previous example the consumer organisation is as concerned about too low an average fat content as too high an average fat content. Retaining all the other details of the previous example, test the manufacturer's claim against H1: μ ≠ 2.2 (α = 0.05).
Step 1:
Label the random variable of interest and formulate the
null and alternative hypotheses.
Step 2:
Identify the appropriate sampling distribution of the test
statistic under the null hypothesis H 0 .
Step 3:
Find the critical value(s) of the test statistic.
Step 4:
State the decision rule.
Step 5:
Calculate the test statistic based on a realized sample.
Step 6:
Compare the realized value of the test statistic and the
critical value(s) to reach a conclusion.
9.3 Hypothesis Tests of the Mean When the Population is Non-Normal, σ² Known or Unknown, n ≥ 30
Again, suppose the null hypothesis is H0: μ = μ0. By the central limit theorem,

X̄ ~ N(μ0, σ²/n)  approximately

Z = (X̄ − μ0)/(σ/√n) ~ N(0, 1)  approximately

with σ replaced by s in the test statistic if σ² is unknown.
Example:
Suppose a context similar to that of the previous cereal packet example, except that:
the population is non-normal with unknown variance
the sample size is 49
the sample standard deviation is 0.8
Again perform a two-tail test of the manufacturer's claim at the α = 0.05 level of significance, assuming x̄ = 2.4 as before.
Step 1:
Label the random variable of interest and formulate the
null and alternative hypotheses.
Step 2:
Identify the appropriate sampling distribution of the test
statistic under the null hypothesis H 0 .
Step 3:
Find the critical value(s) of the test statistic.
Step 4:
State the decision rule.
Step 5:
Calculate the test statistic based on a realized sample.
Step 6:
Compare the realized value of the test statistic and the
critical value(s) to reach a conclusion.
If we have a small sample (n < 30) from a normal population and σ² is unknown, we cannot consider

(X̄ − μ)/(S/√n)

as being approximately N(0, 1). In fact the true distribution of (X̄ − μ)/(S/√n) in this situation is that of the T distribution (or Student's T distribution), and (X̄ − μ)/(S/√n) is called the T score.
For large n the T distribution approaches the standard
normal distribution. It is bell-shaped and symmetric
about 0, but has fatter tails than the normal distribution.
T = (X̄ − μ)/(S/√n) is said to be T distributed with (n − 1) degrees of freedom.
(Extract of Appendix 6)
CRITICAL VALUES OF THE T DISTRIBUTION
The table below gives critical values of T for given upper-tail probability levels.

Degrees of
Freedom, ν   t.10    t.05    t.025    t.01     t.005
1            3.078   6.314   12.706   31.821   63.657
2            1.886   2.920   4.303    6.965    9.925
3            1.638   2.353   3.182    4.541    5.841
4            1.533   2.132   2.776    3.747    4.604
5            1.476   2.015   2.571    3.365    4.032
6            1.440   1.943   2.447    3.143    3.707
7            1.415   1.895   2.365    2.998    3.499
8            1.397   1.860   2.306    2.896    3.355
9            1.383   1.833   2.262    2.821    3.250
10           1.372   1.812   2.228    2.764    3.169
11           1.363   1.796   2.201    2.718    3.106
12           1.356   1.782   2.179    2.681    3.055
13           1.350   1.771   2.160    2.650    3.012
40           1.303   1.684   2.021    2.423    2.704
60           1.296   1.671   2.000    2.390    2.660
120          1.290   1.661   1.984    2.358    2.626
∞            1.282   1.645   1.960    2.326    2.576
Example:
Using the T table find t_{0.05,10} and t_{0.025,40}. (1.812, 2.021)
Armed with the T distribution we can construct a confidence interval for μ using a small sample from a normal population with an unknown variance. Similar algebra and reasoning to that used previously gives the following (1 − α)100% confidence interval in these circumstances:

(x̄ − t_{α/2,n−1} s/√n, x̄ + t_{α/2,n−1} s/√n)
Example:
A large company employing 500 salespersons takes a random sample of size 25 of these salespersons' expense account amounts for a particular month.
mean is found to be $210 with a sample standard deviation
of $30. Construct a 95% confidence interval for the mean
expense account amount for the month in question,
assuming the expense account amounts are approximately
normally distributed. ($197.62, $222.38)
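A sketch of the interval calculation (not part of the original notes; t_{0.025,24} ≈ 2.064 comes from a fuller T table than the extract above, which omits 24 degrees of freedom):

```python
from math import sqrt

xbar, s, n = 210, 30, 25
t_crit = 2.064                            # t_{0.025,24}, assumed from a fuller T table
half = t_crit * s / sqrt(n)
print(round(xbar - half, 2), round(xbar + half, 2))   # -> 197.62 222.38
```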
MAIN POINTS
A hypothesis test involves testing some claim about a
population parameter. We will be led to reject the claim if
the sample result obtained (i.e. value of the test statistic) is
highly unlikely assuming the claim is true.
In the context of a hypothesis test:
Type I error: rejecting the null hypothesis (claim) when it
is true.
Type II error: not rejecting the null hypothesis when it is
false.
The rejection region for a hypothesis test is determined
by the significance level , and the distribution of the test
statistic assuming the null hypothesis is true.
If X is normally distributed with unknown variance, T = (X̄ − μ)/(S/√n) is a T distributed random variable with (n − 1) degrees of freedom (even if n is small).
200052 INTRODUCTION TO ECONOMIC METHODS
LECTURE - WEEK 10
Required Reading:
Ref. File 5: Section 5.5
Ref. File 7: Section 7.8
Ref. File 8: Sections 8.6, 8.7, 8.9
Ref. File 9: Sections 9.1 and 9.2
9.5 Hypothesis Tests about the Mean when the Population is Normal, σ² is Unknown and n < 30 (or larger)
Again assuming H0: μ = μ0 is true, we have

T = (X̄ − μ0)/(S/√n) ~ T_{n−1}

Defining the critical values t_{α,n−1} and t_{α/2,n−1} such that P(T ≥ t_{α,n−1}) = α and P(T ≥ t_{α/2,n−1}) = α/2, the decision rules are:

Reject H0 if t ≥ t_{α,n−1} (or x̄ ≥ μ0 + t_{α,n−1} s/√n) for an upper-tail test (H1: μ > μ0)

Reject H0 if t ≤ −t_{α,n−1} (or x̄ ≤ μ0 − t_{α,n−1} s/√n) for a lower-tail test (H1: μ < μ0)

Reject H0 if t ≥ t_{α/2,n−1} or t ≤ −t_{α/2,n−1} (or x̄ ≥ μ0 + t_{α/2,n−1} s/√n or x̄ ≤ μ0 − t_{α/2,n−1} s/√n) for a two-tail test (H1: μ ≠ μ0)
Example:
A manufacturer of radial tyres claims the mean tread life
of its tyres is at least 60,000 kilometres. The tyre tread life
is known to be approximately normally distributed. To
test the manufacturer's claim, 16 of the tyres are selected at random and tested. The sample yields a mean tread life of 56,000 kilometres and a sample standard deviation of 5,250 kilometres. Perform the test of the manufacturer's claim against a lower tail alternative, assuming α = 0.05.
Step 1:
Label the random variable of interest and formulate the
null and alternative hypotheses.
Step 2:
Identify the appropriate sampling distribution of the test
statistic under the null hypothesis H 0 .
Step 3:
Find the critical value(s) of the test statistic.
Step 4:
State the decision rule.
Step 5:
Calculate the test statistic based on a realized sample.
Step 6:
Compare the realized value of the test statistic and the
critical value(s) to reach a conclusion.
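The computation for Steps 5 and 6 can be sketched as follows (not part of the original notes; t_{0.05,15} ≈ 1.753 is taken from a fuller T table than the extract above, which omits 15 degrees of freedom):

```python
from math import sqrt

mu0, xbar, s, n = 60000, 56000, 5250, 16
t_stat = (xbar - mu0) / (s / sqrt(n))
t_crit = -1.753                           # -t_{0.05,15}, lower-tail critical value
print(round(t_stat, 3), t_stat < t_crit)  # -> -3.048 True: reject H0
```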
For example, suppose the null hypothesis is H0: μ = μ0 and that x̄ is obtained from the sample. Then, assuming the null hypothesis is true:
For an upper-tail test, p-value = P(X̄ ≥ x̄)
For a lower-tail test, p-value = P(X̄ ≤ x̄)
For a two-tail test, p-value = 2P(X̄ ≥ x̄) if x̄ is to the right of μ0, and p-value = 2P(X̄ ≤ x̄) if x̄ is to the left of μ0.
Example:
Recall the previous example where a cereal manufacturer
claimed the mean fat content of cereal packets was 2.2
grams. The fat content per packet was approximately
normally distributed with a standard deviation of 0.6
grams. A consumer organization suspected that the mean
fat content per packet was higher than the manufacturer's claim. It tested a random sample of 25 packets of cereal and found a sample mean fat content of 2.4 grams. Calculate the p-value for an upper tail test of the manufacturer's claim. (0.0475)
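A numeric sketch of the p-value (not part of the original notes; the slides round z to 1.67 before using the table, giving 0.0475):

```python
from math import erf, sqrt

def phi(z):
    return 0.5 * (1 + erf(z / sqrt(2)))

mu0, sigma, n, xbar = 2.2, 0.6, 25, 2.4
z = (xbar - mu0) / (sigma / sqrt(n))       # about 1.67
print(round(1 - phi(z), 4))                # about 0.0478 without table rounding
```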
10. INFERENCE ABOUT A POPULATION
PROPORTION
10.1 The Sample Proportion of Successes
If we divide a binomial random variable X by the number of trials n, we obtain the proportion of successes (in n trials) or the sample proportion, normally denoted p̂. p̂ can be considered an estimator of the binomial distribution parameter p, otherwise called the population proportion.
For given n and p, the probability distribution of the random variable p̂ has the same shape as the probability distribution of the binomial random variable X. We have

P(p̂ = x/n) = P(X = x)

E(p̂) = p

Var(p̂) = p(1 − p)/n
Example:
Suppose 2% of integrated circuits produced by a
particular process are defective. A manufacturer of
radios purchases 20 untested circuits. What is the
probability that at most 5% of these circuits prove to be
defective? (0.9401)
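Since n here is small, the answer comes from the exact binomial distribution ("at most 5% of 20" means at most 1 defective). A sketch (not part of the original notes):

```python
from math import comb

n, p = 20, 0.02
# P(X <= 1) = P(X = 0) + P(X = 1), exact binomial probabilities
prob = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(2))
print(round(prob, 4))                      # -> 0.9401
```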
We know already that

E(p̂) = p  and  Var(p̂) = p(1 − p)/n

If n is large enough for the normal approximation to the binomial to apply,

p̂ ~ N(p, p(1 − p)/n)  approximately
Example:
It is known that 10% of televisions produced by a given
manufacturer have a minor defect. A retailer buys 100
televisions from the manufacturer.
What is the
probability that more than 15% of these televisions have
minor defects? (0.0475)
10.3 Large Sample Confidence Intervals for Proportions
We have, if np ≥ 5 and n(1 − p) ≥ 5:

Z = (p̂ − p)/√(p(1 − p)/n) ~ N(0, 1)  approximately

so that

P(−z_{α/2} ≤ (p̂ − p)/√(p(1 − p)/n) ≤ z_{α/2}) ≈ 1 − α

or after rearrangement

P(p̂ − z_{α/2}√(p(1 − p)/n) ≤ p ≤ p̂ + z_{α/2}√(p(1 − p)/n)) ≈ 1 − α

Replacing p(1 − p) by its estimate p̂(1 − p̂) gives the approximate (1 − α)100% confidence interval

(p̂ − z_{α/2}√(p̂(1 − p̂)/n), p̂ + z_{α/2}√(p̂(1 − p̂)/n))
Example:
Suppose a new product is to be launched on the market.
To gain an idea of the potential market size, 1,000 people
selected at random are shown the product and asked if
they would be willing to buy it. The proportion of these
people who would be willing to buy is found to be 0.065
(i.e. 65 people). Calculate a 90% confidence interval for
the proportion of all people who would buy the product.
(0.052, 0.078)
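A sketch of the interval calculation (not part of the original notes; 1.645 is z_{0.05} for 90% confidence):

```python
from math import sqrt

n, phat = 1000, 0.065
z = 1.645                                  # z_{0.05} for 90% confidence
half = z * sqrt(phat * (1 - phat) / n)
print(round(phat - half, 3), round(phat + half, 3))   # -> 0.052 0.078
```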
To estimate p to within D with (1 − α)100% confidence, we need n such that

z_{α/2} √(p(1 − p)/n) ≤ D

Unfortunately this depends on p, which is unknown. However p(1 − p) reaches a maximum of 0.25 when p = 0.5. Hence choosing n such that

z_{α/2} √((0.5)(0.5)/n) ≤ D,  i.e.  n ≥ (0.5 z_{α/2}/D)²

guarantees the required precision whatever the value of p.
If np0 ≥ 5 and n(1 − p0) ≥ 5, then under the null hypothesis H0: p = p0:

p̂ ~ N(p0, p0(1 − p0)/n)  approximately

Z = (p̂ − p0)/√(p0(1 − p0)/n) ~ N(0, 1)  approximately

Given α, we can then proceed to formulate one and two-tail tests for p in a similar fashion to that used for μ.
Example:
A public transport company claims that no more than 5%
of its customers are dissatisfied with its new electronic
ticketing system. 200 customers of the company are
selected at random and asked whether they are dissatisfied
with the ticketing system. Of these, 13 state they are
dissatisfied. Perform an upper tail hypothesis test of the company's claim using the α = 0.05 significance level.
Step 1:
Label the random variable of interest and formulate the
null and alternative hypotheses.
Step 2:
Identify the appropriate sampling distribution of the test
statistic under the null hypothesis H 0 .
Step 3:
Find the critical value(s) of the test statistic.
Step 4:
State the decision rule.
Step 5:
Calculate the test statistic based on a realized sample.
Step 6:
Compare the realized value of the test statistic and the
critical value(s) to reach a conclusion.
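The computation for Steps 5 and 6 can be sketched as follows (not part of the original notes):

```python
from math import sqrt

n, x, p0 = 200, 13, 0.05
phat = x / n                               # 0.065
z = (phat - p0) / sqrt(p0 * (1 - p0) / n)  # standard error uses p0 under H0
z_crit = 1.645                             # upper-tail critical value at alpha = 0.05
print(round(z, 3), z > z_crit)             # -> 0.973 False: do not reject H0
```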
11. THE CHI-SQUARED DISTRIBUTION AND INFERENCE CONCERNING A POPULATION VARIANCE
11.1 The Chi-Square Distribution
The sum of squares of ν (nu) independent standard normal random variables is said to be Chi-square distributed with ν degrees of freedom, written χ²_ν.
(Extract of Appendix 7)
CRITICAL VALUES OF THE CHI-SQUARE DISTRIBUTION
The table below gives critical values of χ² for given upper-tail probability levels.

Degrees of
Freedom  χ².995     χ².99      χ².975     χ².95      χ².90      χ².10     χ².05     χ².025    χ².01     χ².005
1        0.0000393  0.0001571  0.0009821  0.0039321  0.0157908  2.70554   3.84146   5.02389   6.63490   7.87944
2        0.0100251  0.0201007  0.0506356  0.102587   0.210720   4.60517   5.99147   7.37776   9.21034   10.5966
3        0.0717212  0.114832   0.215795   0.351846   0.584375   6.25139   7.81473   9.34840   11.3449   12.8381
4        0.206990   0.297110   0.484419   0.710721   1.063623   7.77944   9.48773   11.1433   13.2767   14.8602
5        0.411740   0.554300   0.831211   1.145476   1.61031    9.23635   11.0705   12.8325   15.0863   16.7496
6        0.675727   0.872085   1.237347   1.63539    2.20413    10.6446   12.5916   14.4494   16.8119   18.5476
7        0.989265   1.239043   1.68987    2.16735    2.83311    12.0170   14.0671   16.0128   18.4753   20.2777
8        1.344419   1.646482   2.17973    2.73264    3.48954    13.3616   15.5073   17.5346   20.0902   21.9550
9        1.734926   2.087912   2.70039    3.32511    4.16816    14.6837   16.9190   19.0228   21.6660   23.5893
10       2.15585    2.55821    3.24697    3.94030    4.86518    15.9871   18.3070   20.4831   23.2093   25.1882
11       2.60321    3.05347    3.81575    4.57481    5.57779    17.2750   19.6751   21.9200   24.7250   26.7569
12       3.07382    3.57056    4.40379    5.22603    6.30380    18.5494   21.0261   23.3367   26.2170   28.2995
13       3.56503    4.10691    5.00874    5.89186    7.04150    19.8119   22.3621   24.7356   27.6883   29.8194
14       4.07468    4.66043    5.62872    6.57063    7.78953    21.0642   23.6848   26.1190   29.1413   31.3193
15       4.60094    5.22935    6.26214    7.26094    8.54675    22.3072   24.9958   27.4884   30.5779   32.8013
30       13.7867    14.9535    16.7908    18.4926    20.5992    40.2560   43.7729   46.9792   50.8922   53.6720
40       20.7065    22.1643    24.4331    26.5093    29.0505    51.8050   55.7585   59.3417   63.6907   66.7659
50       27.9907    29.7067    32.3574    34.7642    37.6886    63.1671   67.5048   71.4202   76.1539   79.4900
60       35.5346    37.4848    40.4817    43.1879    46.4589    74.3970   79.0819   83.2976   88.3794   91.9517
70       43.2752    45.4418    48.7576    51.7393    55.3290    85.5271   90.5312   95.0231   100.425   104.215
80       51.1720    53.5400    57.1532    60.3915    64.2778    96.5782   101.879   106.629   112.329   116.321
90       59.1963    61.7541    65.6466    69.1260    73.2912    107.565   113.145   118.136   124.116   128.299
100      67.3276    70.0648    74.2219    77.9295    82.3581    118.498   124.342   129.561   135.807   140.169
Example:
Using the table, find χ²_{0.01,15}, χ²_{0.975,15} and χ²_{0.025,10}. (30.5779, 6.26214, 20.4831)
11.2 Confidence Intervals for a Population Variance
Given a random sample of a random variable X of size n, we know

(i) S² = Σ_{i=1}^{n} (X_i − X̄)²/(n − 1) is an unbiased estimator of σ²

(ii) if X is normally distributed,

(n − 1)S²/σ² = Σ_{i=1}^{n} (X_i − X̄)²/σ² ~ χ²_{n−1}
We define χ²_{1−α/2,n−1} and χ²_{α/2,n−1} as the values of the χ²_{n−1} distribution cutting off lower and upper tail areas of α/2 respectively. Replacing the χ²_{n−1} random variable by (n − 1)S²/σ² gives

P(χ²_{1−α/2,n−1} ≤ (n − 1)S²/σ² ≤ χ²_{α/2,n−1}) = 1 − α

By rearrangement we obtain

P((n − 1)S²/χ²_{α/2,n−1} ≤ σ² ≤ (n − 1)S²/χ²_{1−α/2,n−1}) = 1 − α

Then a (1 − α)100% confidence interval for σ² based on a realized sample variance s² is

((n − 1)s²/χ²_{α/2,n−1}, (n − 1)s²/χ²_{1−α/2,n−1})
Example:
A particular automotive engineering firm manufactures a
specific machined part used in motor vehicles. A random
sample of 15 of the parts from the firm yielded a diameter
sample variance of 0.0015 (diameter is measured in
centimetres). Assuming the diameter of the part is
approximately normally distributed, use this information
to calculate a 95% confidence interval for the variance of the part's diameter. (0.0008, 0.0037)
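A sketch of the calculation, with the χ² critical values taken from the table above for 14 degrees of freedom (not part of the original notes):

```python
n, s2 = 15, 0.0015
chi_upper, chi_lower = 26.1190, 5.62872    # chi^2_{0.025,14} and chi^2_{0.975,14}
lo = (n - 1) * s2 / chi_upper              # divide by the upper critical value
hi = (n - 1) * s2 / chi_lower              # divide by the lower critical value
print(round(lo, 4), round(hi, 4))          # -> 0.0008 0.0037
```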
11.3 Hypothesis Tests for a Population Variance
Suppose we wish to test the hypothesis

H0: σ² = σ0²  against  H1: σ² ≠ σ0²

where σ0² is the hypothesized value of σ². Under H0, (n − 1)S²/σ0² ~ χ²_{n−1}, so for a two-tail test at significance level α the decision rule is:

Reject H0 if (n − 1)s²/σ0² ≤ χ²_{1−α/2,n−1} or (n − 1)s²/σ0² ≥ χ²_{α/2,n−1}

For testing H0: σ² = σ0² against H1: σ² > σ0², the decision rule is

Reject H0 if (n − 1)s²/σ0² ≥ χ²_{α,n−1}

For testing H0: σ² = σ0² against H1: σ² < σ0², the decision rule is

Reject H0 if (n − 1)s²/σ0² ≤ χ²_{1−α,n−1}
Example:
Suppose the firm manufacturing the machined engine part considered in the previous example claims its parts have a diameter variance σ² of no more than 0.0013 (again diameter is measured in centimetres). Again suppose a random sample of 15 of the parts from the manufacturer yielded a diameter sample variance of 0.0015. Given this information, perform a test of H0: σ² = 0.0013 against H1: σ² > 0.0013 at the 5% level of significance. Again assume the diameter of the part is approximately normally distributed.
Step 1:
Label the random variable of interest and formulate the
null and alternative hypotheses.
Step 2:
Identify the appropriate sampling distribution of the test
statistic under the null hypothesis H 0 .
Step 3:
Find the critical value(s) of the test statistic.
Step 4:
State the decision rule.
Step 5:
Calculate the test statistic based on a realized sample.
Step 6:
Compare the realized value of the test statistic and the
critical value(s) to reach a conclusion.
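The computation for Steps 5 and 6 can be sketched as follows (not part of the original notes; the critical value comes from the table above):

```python
n, s2, sigma2_0 = 15, 0.0015, 0.0013
stat = (n - 1) * s2 / sigma2_0             # realized chi-square statistic
crit = 23.6848                             # chi^2_{0.05,14} from the table
print(round(stat, 2), stat > crit)         # -> 16.15 False: do not reject H0
```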
MAIN POINTS
The p-value of a test is the probability (under the null
hypothesis) of obtaining a value of the test statistic equal
to or more extreme (in the direction(s) of rejection) than
the value obtained.
When testing hypotheses about (or forming confidence intervals for) μ:
1. If the population is normal, σ² known: use the normal distribution.
2. If the population is normal, σ² unknown: use the T distribution.
3. If the population is not normal, σ² known, n large: use the normal distribution.
4. If the population is normal or not normal, σ² unknown, n large: use the normal distribution and replace σ by s.
The sample proportion of successes p̂ is an unbiased estimator of the population proportion (binomial parameter p).
The normal distribution can be used to approximate probabilities relating to p̂ in the same circumstances that the normal distribution is used to approximate the binomial distribution.
The sample proportion of successes can be used to form confidence intervals for p, and to test hypotheses about p.
200052 INTRODUCTION TO ECONOMIC METHODS
LECTURE - WEEK 11
Required Reading:
Ref. File 10: Sections 10.1, 10.3, 10.4(a), 10.4(c)
Ref. File 11: Introduction and Sections 11.1(a) and 11.1(b)
12. INTRODUCTION TO CORRELATION ANALYSIS
IN THE CONTEXT OF CROSS-SECTIONAL
DATA
Definition (Deterministic and Stochastic Relationships)
Given two jointly distributed and statistically dependent
random variables X and Y, the random variable Y is
deterministically related to X if Y can be expressed as an
exact function of X only. Otherwise Y is stochastically
related to X.
Our concern now is to assess any linear relationship
between two stochastically related jointly distributed
random variables.
Definition (Sample Covariance between Two Random
Variables)
Suppose the n pairs of random variables
( X 1 ,Y1 ),..., ( X n , Yn ) represent a random sample of two
jointly distributed random variables X and Y. Then the
sample covariance (a random variable) between X and Y
based on these observations is defined as
S_{X,Y} = Σ_{i=1}^{n} (X_i − X̄)(Y_i − Ȳ)/(n − 1)
Definition (Population Covariance)
The population covariance of two jointly distributed random variables X and Y with means E(X) = μ_X and E(Y) = μ_Y is defined as

Cov(X, Y) = σ_{X,Y} = E[(X − E(X))(Y − E(Y))] = E[(X − μ_X)(Y − μ_Y)] = E(XY) − E(X)E(Y)

Definition (Population Correlation Coefficient)
The population correlation coefficient between X and Y is defined as

ρ_{X,Y} = Cov(X, Y)/(σ_X σ_Y)

where σ_X = SD(X) and σ_Y = SD(Y).
ρ_{X,Y} measures the closeness of population data points to a reference straight line (the best linear predictor, cf. reference file Section 10.4(b) (optional reading)).

Theorem (Properties of the Population Correlation Coefficient)
With respect to two jointly distributed random variables, say X and Y, both with finite non-zero variances:
(i) −1 ≤ ρ_{X,Y} ≤ 1
(ii) ρ_{X,X} = σ²_X/σ²_X = 1, i.e. any random variable is perfectly positively correlated with itself.
Definition (Sample Correlation Coefficient between Two
Random Variables)
Suppose the n pairs of random variables
( X 1 ,Y1 ),..., ( X n , Yn ) represent a random sample of the two
jointly distributed random variables X and Y. Then the
sample correlation coefficient R X ,Y (a random variable)
between X and Y based on these observations is defined as
R_{X,Y} = [1/(n − 1)] Σ_{i=1}^{n} [(X_i − X̄)/S_X][(Y_i − Ȳ)/S_Y]
        = [Σ_{i=1}^{n} (X_i − X̄)(Y_i − Ȳ)/(n − 1)]/(S_X S_Y)
        = S_{X,Y}/(S_X S_Y)

where, as before:

S_X = √[Σ_{i=1}^{n} (X_i − X̄)²/(n − 1)]  and  S_Y = √[Σ_{i=1}^{n} (Y_i − Ȳ)²/(n − 1)]
Theorem (Distribution of a Sample Statistic to Test
whether the Correlation Coefficient between Bivariate
Normally Distributed Variables is Zero)
Suppose that we have a random sample (X1, Y1), ..., (Xn, Yn) of bivariate normally distributed random variables X and Y. Then under the null hypothesis ρ_{X,Y} = 0 we have the distribution result

T = R_{X,Y} √(n − 2) / √(1 − R²_{X,Y}) ~ T_{n−2}   (i.e. under H0: ρ_{X,Y} = 0)
Step 2:
Identify the appropriate sampling distribution of the test
statistic under the null hypothesis H 0 .
Step 3:
Find the critical value(s) of the test statistic.
Step 4:
State the decision rule.
Step 5:
Calculate the test statistic based on a realized sample.
Step 6:
Compare the realized value of the test statistic and the
critical value(s) to reach a conclusion.
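These steps can be sketched numerically. In the sketch below (not part of the original notes) the sample correlation r = 0.55 and sample size n = 20 are hypothetical, and t_{0.025,18} ≈ 2.101 is taken from a fuller T table:

```python
from math import sqrt

# hypothetical sample correlation and sample size, for illustration only
r, n = 0.55, 20
t_stat = r * sqrt(n - 2) / sqrt(1 - r**2)
t_crit = 2.101                             # t_{0.025,18}, two-tail test at alpha = 0.05
print(round(t_stat, 3), abs(t_stat) > t_crit)   # -> 2.794 True: reject rho = 0
```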
13. INTRODUCTION TO REGRESSION ANALYSIS
IN THE CONTEXT OF CROSS-SECTIONAL
DATA
13.1 Introduction
An equation which specifies an exact relationship between variables is called a deterministic model. A stochastic (probabilistic) model instead takes the form

Y = g(X) + U

where:
U is the random error or random disturbance, i.e. a random variable with a probability distribution.
E (Y | X ) g ( X ) is the deterministic component of
the model, which specifies how the mean of Y is
related to X. E (Y | X ) is called the conditional
expectation function (CEF) of Y given X.
E (Y | X ) g ( X ) implies that the expected value of
the random disturbance in the model will be zero for
all values of X.
13.2 Assumptions of the Neoclassical (Stochastic
Regressor) Simple Linear Regression (NSR) Model
(NSR-1) (Stochastic Regressor Assumption)
X and Y are jointly distributed random variables for
which a sample consisting of n paired observations
{( X 1 ,Y1 ), ( X 2 ,Y2 ),...., ( X n ,Yn )} is to be obtained.
(NSR-2) (Sample Variation in the Regressor Assumption)
There is variation in the values taken by X in the
observed sample.
(NSR-3) (Linear CEF Assumption)
E(Yi | X1, X2, ..., Xn) = E(Yi | Xi) = β1 + β2 Xi   {i = 1, ..., n}
where β1 and β2 are constants. The random disturbance for observation i is then
Ui = Yi − E(Yi | X1, X2, ..., Xn)   {i = 1, ..., n}
(NSR-4) (Homoscedasticity or Constant Conditional Variance Assumption)
Var(Ui | X1, X2, ..., Xn) = σ²   {i = 1, ..., n}
where σ² is finite.
(NSR-5) (Non-Correlated Random Disturbances Assumption)
Cov(Ui, Uj | X1, X2, ..., Xn) = 0 for all i ≠ j   {i, j = 1, ..., n}
13.3 The Least Squares Estimators of the NSR Model
In the context of the NSR model we wish to estimate the
assumed linear conditional expectation function (CEF)
(also called the population regression line)
E(Y | X) = β1 + β2 X

The fitted regression line is written

Ỹ = β̃1 + β̃2 X

and the residual for observation i is

Ũi = Yi − Ỹi = Yi − (β̃1 + β̃2 Xi)
The least squares estimators are the optimal values of β̃1 and β̃2 chosen (using calculus) such that the sum of squared residuals, or SSR,

SSR = Σ_{i=1}^{n} (Yi − Ỹi)² = Σ_{i=1}^{n} (Yi − β̃1 − β̃2 Xi)²

is minimized. The resulting estimators are

β̃1 = Ȳ − β̃2 X̄

β̃2 = (Σ_{i=1}^{n} Xi Yi − n Ȳ X̄) / (Σ_{i=1}^{n} Xi² − n X̄²)
    = Σ_{i=1}^{n} (Xi − X̄)(Yi − Ȳ) / Σ_{i=1}^{n} (Xi − X̄)²
    = [Σ_{i=1}^{n} (Xi − X̄)(Yi − Ȳ)/(n − 1)] / [Σ_{i=1}^{n} (Xi − X̄)²/(n − 1)]
    = S_{X,Y}/S_X²

with fitted values Ỹi = β̃1 + β̃2 Xi and residuals Ũi = Yi − Ỹi = Yi − (β̃1 + β̃2 Xi).
A scatter diagram depicts all the realized observations
( x i , y i ) in a sample. Normally the dependent variable is
measured along the vertical axis.
The following example of a scatter diagram (figure omitted) shows, for a realized observation (x_i, y_i), the relationship between y_i, the fitted value ỹ_i on the estimated line Ỹ = β̃1 + β̃2 X, and the realized residual ũ_i = y_i − ỹ_i, together with the population CEF E(Y | X).
Example:
Consider a retailer with stores in a number of different
localities. Suppose the retailer's marketing manager
believes there is a stochastic linear relationship between
the amount her firm spends each month on local
advertising (X) and monthly sales (Y) of a store in a
particular locality. Fit a least squares regression line
given the following local advertising and sales data (in
thousands of dollars) for 12 randomly chosen stores of the
firm in a particular month.
Store    Advertising (X)   Sales (Y)   x_i²     x_i y_i
1        5.0               250         25.00    1250.0
2        4.5               260         20.25    1170.0
3        7.0               280         49.00    1960.0
4        7.6               282         57.76    2143.2
5        5.0               265         25.00    1325.0
6        7.4               266         54.76    1968.4
7        9.0               280         81.00    2520.0
8        6.5               268         42.25    1742.0
9        6.2               265         38.44    1643.0
10       4.6               258         21.16    1186.8
11       5.8               263         33.64    1525.4
12       10.0              295         100.00   2950.0
Totals   78.6              3232        548.26   21383.8
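The least squares coefficients for these data follow directly from the formulas β̃2 = S_{X,Y}/S_X² and β̃1 = Ȳ − β̃2 X̄. A Python sketch (not part of the original notes):

```python
# Least squares fit for the advertising/sales data
x = [5.0, 4.5, 7.0, 7.6, 5.0, 7.4, 9.0, 6.5, 6.2, 4.6, 5.8, 10.0]
y = [250, 260, 280, 282, 265, 266, 280, 268, 265, 258, 263, 295]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
b2 = sum(xi * yi for xi, yi in zip(x, y)) - n * xbar * ybar
b2 /= sum(xi ** 2 for xi in x) - n * xbar ** 2
b1 = ybar - b2 * xbar
print(round(b1, 2), round(b2, 2))    # -> 227.36 6.41
```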
The least squares estimator of the disturbance standard deviation σ in the simple two variable neoclassical linear regression model is called the standard error of the regression and is given by

S_U = √(SSR/(n − 2))

(its square, S_U² = SSR/(n − 2), is an unbiased estimator of σ²), where as before

SSR = Σ_{i=1}^{n} Ũi² = Σ_{i=1}^{n} (Yi − Ỹi)²
x_i     y_i    ỹ_i        y_i − ỹ_i    (y_i − ỹ_i)²
5.0     250    259.4017   −9.4017      88.39196
4.5     260    256.1980    3.80205     14.45558
7.0     280    272.2167    7.7833      60.57976
7.6     282    276.0612    5.9388      35.26935
5.0     265    259.4017    5.5983      31.34096
7.4     266    274.7797   −8.7797      77.08313
9.0     280    285.0317   −5.0317      25.31800
6.5     268    269.0130   −1.01295      1.026068
6.2     265    267.0907   −2.0907       4.371026
4.6     258    256.8387    1.1613       1.348618
5.8     263    264.5277   −1.5277       2.333867
10.0    295    291.4392    3.5608      12.67930
                           Sum (SSR):  354.1976
MAIN POINTS
The correlation coefficient measures the degree of linear relationship between two random variables.
Hypotheses about the population correlation coefficient can be tested using the sample correlation coefficient, assuming the two variables are bivariate normally distributed.
Regression analysis involves estimation of a stochastic model relating two or more variables.
The neoclassical simple linear regression (NSR) model (bivariate or stochastic regressor model) consists of a set of assumptions about a bivariate statistical population and sampling conditions that forms a basis for estimating an assumed linear CEF.
Least squares estimation involves using estimators of coefficients that minimize the sum of squared residuals (SSR), i.e. Σᵢ₌₁ⁿ (Yᵢ − Ŷᵢ)².
The standard error of the regression in the NSR model, S_U = √(SSR/(n − 2)), is the square root of the unbiased least squares estimator of the disturbance variance σ².
200052 INTRODUCTION TO ECONOMIC METHODS
LECTURE - WEEK 12
Required Reading:
Ref. File 2: Introduction and Sections 2.1(a), 2.2(a), 2.2(b)
Ref. File 11: Sections 11.3, 11.4(a), 11.6(a), 11.6(e), 11.6(f),
11.6(g)
13.4 The Explanatory Power of a Regression Equation
A measure of how superior the estimated regression line is to Ȳ in predicting the realized values of the dependent variable is the coefficient of determination, denoted R².

R² is defined by decomposing the total variation of Y in the sample observations into explained and unexplained variations, where

Total variation = SST = Σᵢ₌₁ⁿ (Yᵢ − Ȳ)²

Explained variation = SSE = Σᵢ₌₁ⁿ (Ŷᵢ − Ȳ)²

Unexplained variation = SSR = Σᵢ₌₁ⁿ (Yᵢ − Ŷᵢ)² = Σᵢ₌₁ⁿ Ûᵢ²
[Figure: for an observation (xᵢ, yᵢ), the deviation (yᵢ − ȳ) is decomposed into the explained part (ŷᵢ − ȳ), measured from ȳ to the fitted line, and the unexplained part (yᵢ − ŷᵢ).]

It can be shown that

SST = SSR + SSE

R² is defined as

R² = SSE/SST = 1 − SSR/SST

We will have

0 ≤ R² ≤ 1
If R² = 1, all the observations lie on a non-horizontal straight line. R² will equal 0 if β̂₂ = 0, i.e. if the sample regression line is horizontal.
Example:
Calculate R² for the sales/advertising example. (0.79 approx.)

We already know SSR = 354.1976 and ȳ = 269.3333.

yᵢ      (yᵢ − ȳ)²
250     373.7764
260     87.1105
280     113.7785
282     160.4453
265     18.7774
266     11.1109
280     113.7785
268     1.7777
265     18.7775
258     128.4437
263     40.1107
295     658.7795
Total   1726.6666

R² = 1 − SSR/SST = 1 − 354.1976/1726.6666 ≈ 0.79
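Continuing the sketch, R² follows directly from SSR and SST (plain Python, illustration only):

```python
# Coefficient of determination for the sales/advertising example.
y = [250, 260, 280, 282, 265, 266, 280, 268, 265, 258, 263, 295]
ssr = 354.1976                 # sum of squared residuals (from earlier)
ybar = sum(y) / len(y)         # ≈ 269.3333

sst = sum((yi - ybar) ** 2 for yi in y)   # total variation ≈ 1726.67
r2 = 1 - ssr / sst                        # ≈ 0.79

print(f"SST = {sst:.4f}, R^2 = {r2:.4f}")
```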
13.5 Inference in the Neoclassical Simple Linear Regression Model

(a) Operational Distribution Results Related to β̂₁ and β̂₂

Theorem (Conditional Variances of the OLS Estimators of the NSR Model Intercept and Slope Parameters)

Under assumptions (NSR-1) to (NSR-5) of the neoclassical simple regression model the least squares estimators β̂₁ and β̂₂ have the following conditional variances:

Var(β̂₁ | X₁,…,Xₙ) = σ²_{β̂₁|x} = σ² Σᵢ₌₁ⁿ Xᵢ² / [n Σᵢ₌₁ⁿ (Xᵢ − X̄)²]

Var(β̂₂ | X₁,…,Xₙ) = σ²_{β̂₂|x} = σ² / Σᵢ₌₁ⁿ (Xᵢ − X̄)²
Replacing σ² with its estimator S_U² gives the estimated conditional variances:

S²_{β̂₁|x} = S_U² Σᵢ₌₁ⁿ Xᵢ² / [n Σᵢ₌₁ⁿ (Xᵢ − X̄)²]

S²_{β̂₂|x} = S_U² / Σᵢ₌₁ⁿ (Xᵢ − X̄)²
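For the sales/advertising data these estimated standard errors can be computed as follows (a sketch; values carried over from the earlier example):

```python
# Estimated standard errors of the OLS coefficients for the example data.
x = [5.0, 4.5, 7.0, 7.6, 5.0, 7.4, 9.0, 6.5, 6.2, 4.6, 5.8, 10.0]
n = len(x)
xbar = sum(x) / n
sxx = sum((xi - xbar) ** 2 for xi in x)   # Σ(x_i − x̄)² ≈ 33.43
s_u2 = 35.41976                           # s_U² = SSR/(n−2) from earlier

var_b2 = s_u2 / sxx                       # estimated Var(β̂₂ | x)
se_b2 = var_b2 ** 0.5                     # ≈ 1.029

var_b1 = s_u2 * sum(xi ** 2 for xi in x) / (n * sxx)  # estimated Var(β̂₁ | x)
se_b1 = var_b1 ** 0.5                     # ≈ 6.958

print(f"se(b1) = {se_b1:.4f}, se(b2) = {se_b2:.4f}")
```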
Theorem 11.12
Suppose assumptions (NSR-1) to (NSR-5) of the neoclassical simple linear regression model are satisfied, and in addition the random disturbances Uᵢ (i = 1,…,n) are multivariate normally distributed conditional on X₁, X₂,…,Xₙ. Then

(β̂₁ − β₁)/S_{β̂₁|x} ~ T_{n−2}

and

(β̂₂ − β₂)/S_{β̂₂|x} ~ T_{n−2}
(b) Confidence Intervals for β₂

From Theorem 11.12,

(β̂₂ − β₂)/S_{β̂₂|x} ~ T_{n−2}

Thus

P(β̂₂ − t_{α/2,n−2} S_{β̂₂|x} ≤ β₂ ≤ β̂₂ + t_{α/2,n−2} S_{β̂₂|x}) = 1 − α

For realized values β̂₂ and s_{β̂₂|x} from an estimated regression, the (1 − α) confidence interval for β₂ is

(β̂₂ − t_{α/2,n−2} s_{β̂₂|x} , β̂₂ + t_{α/2,n−2} s_{β̂₂|x})
(c) Testing Hypotheses About β₂

To test H₀: β₂ = β₂⁰ we use the fact that, under H₀,

(β̂₂ − β₂⁰)/S_{β̂₂|x} ~ T_{n−2}

For a two-tail test at significance level α, H₀ is rejected if the realized value t of the test statistic satisfies

t < −t_{α/2,n−2} or t > t_{α/2,n−2}
Example:
Reconsider the sales/advertising regression example with the following results:

Ŷᵢ = 227.3642 + 6.4075Xᵢ

Σᵢ₌₁¹² (xᵢ − x̄)² = 33.43,  n = 12,  s_U² = 35.41976

Perform a two-tail test of H₀: β₂ = 0 with α = 0.05.
Test H₀: β₂ ≤ 0 against H₁: β₂ > 0 with α = 0.05.
Construct a 95% confidence interval for β₂. (4.1142, 8.7008)
First Test
Step 1:
Label the random variable of interest and formulate the
null and alternative hypotheses.
Step 2:
Identify the appropriate sampling distribution of the test
statistic under the null hypothesis H 0 .
Step 3:
Find the critical value(s) of the test statistic.
Step 4:
State the decision rule.
Step 5:
Calculate the test statistic based on a realized sample.
Step 6:
Compare the realized value of the test statistic and the
critical value(s) to reach a conclusion.
Second Test
Step 1:
Label the random variable of interest and formulate the
null and alternative hypotheses.
Step 2:
Identify the appropriate sampling distribution of the test
statistic under the null hypothesis H 0 .
Step 3:
Find the critical value(s) of the test statistic.
Step 4:
State the decision rule.
Step 5:
Calculate the test statistic based on a realized sample.
Step 6:
Compare the realized value of the test statistic and the
critical value(s) to reach a conclusion.
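The two tests and the confidence interval above can be worked through numerically. A sketch (the critical values t₀.₀₂₅,₁₀ = 2.228 and t₀.₀₅,₁₀ = 1.812 are taken from t tables):

```python
# t tests and 95% CI for the slope in the sales/advertising example.
b2 = 6.4075                      # slope estimate
sxx = 33.43                      # Σ(x_i − x̄)²
s_u2 = 35.41976                  # s_U²
se_b2 = (s_u2 / sxx) ** 0.5      # estimated standard error of β̂₂ ≈ 1.029

t_stat = (b2 - 0) / se_b2        # realized test statistic ≈ 6.22

# Critical values from t tables, 10 degrees of freedom
t_two_tail = 2.228               # t_{0.025,10}
t_one_tail = 1.812               # t_{0.05,10}

reject_two_tail = abs(t_stat) > t_two_tail   # first test
reject_one_tail = t_stat > t_one_tail        # second test

ci = (b2 - t_two_tail * se_b2, b2 + t_two_tail * se_b2)  # ≈ (4.114, 8.701)

print(t_stat, reject_two_tail, reject_one_tail)
print(f"95% CI: ({ci[0]:.4f}, {ci[1]:.4f})")
```

In both tests H₀ is rejected at the 5% level, and the interval matches the (4.1142, 8.7008) reported above.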
Regression results are often reported with the estimated
standard errors or T statistics given in brackets under the
corresponding estimated coefficients.
Note: The classical simple linear regression (CSR) model involving a non-stochastic (fixed) regressor is discussed in reference file Section 11.4. It is useful to compare this model with the NSR model, although the CSR model is not examinable in this unit. Although the interpretation of the two models differs, the basic forms of the estimators and inferential techniques are similar for the two models.
14. INTRODUCTION TO DIFFERENTIAL
CALCULUS
14.1 Limits

Terminology:
x → c⁻ denotes x approaching c from the left (x < c); x → c⁺ denotes x approaching c from the right (x > c).

We say the limit of f(x) as x → c exists and equals l, written lim_{x→c} f(x) = l, if and only if the following conditions are satisfied:
(i) l is a finite number
(ii) f(x) → l as x → c from the left (left limit exists)
(iii) f(x) → l as x → c from the right (right limit exists)
An infinite limit implies the limit does not exist.

14.2 Continuity
A function f(x) is continuous at x = c if and only if the following conditions are satisfied:
(i) lim_{x→c} f(x) exists
(ii) f(x) is defined at x = c (c is in the domain of the function)
(iii) lim_{x→c} f(x) = f(c)
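As an illustrative sketch (the function and point are my own example, not from the notes), the conditions can be checked numerically for f(x) = x² at c = 2:

```python
# Numeric illustration of continuity: f(x) = x**2 at c = 2.
# (Example function chosen for illustration; not from the notes.)
def f(x):
    return x ** 2

c = 2.0
left = [f(c - h) for h in (0.1, 0.01, 0.001)]    # approach c from the left
right = [f(c + h) for h in (0.1, 0.01, 0.001)]   # approach c from the right

# Both one-sided sequences approach f(c) = 4, consistent with
# lim_{x→c} f(x) = f(c): f is continuous at c.
print(left, right, f(c))
```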
14.3 Differentiation
(a) Basic Definitions
Definition (Slope of a Linear Function)
The slope or rate of change of a linear function f(x) = a + bx is the change in the value of the function per 1-unit change in x (the argument of the function), measured from any starting point. It is given by b.
Now consider any function y = f(x) and a point x = c in the domain of f(x).

If x changes from c to (c + Δx), we can express the average rate of change of f(x) between x = c and x = (c + Δx) as

Δy/Δx = [f(c + Δx) − f(c)]/Δx
If the limit of Δy/Δx as Δx → 0 exists, it is called the derivative of the function at x = c, denoted y′ (at x = c), dy/dx (at x = c), or f′(c). In general,

dy/dx = f′(x) = lim_{Δx→0} [f(x + Δx) − f(x)]/Δx

[Figure: graph of y = f(x) showing the points (c, f(c)) and (c + Δx, f(c + Δx)) and the tangent at x = c; as Δx → 0 the slope of the secant approaches the slope of the tangent, which is f′(c).]
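The limit definition suggests a direct numeric approximation: for small Δx the difference quotient should be close to the derivative. A sketch (example function mine, not from the notes):

```python
# Numeric approximation of the derivative from its limit definition.
# Example function: f(x) = x**3, so f'(x) = 3x².
def f(x):
    return x ** 3

def diff_quotient(f, x, dx):
    # Average rate of change of f over [x, x + dx]
    return (f(x + dx) - f(x)) / dx

# As dx shrinks, the quotient approaches f'(2) = 3 * 2**2 = 12.
for dx in (0.1, 0.001, 1e-6):
    print(dx, diff_quotient(f, 2.0, dx))
```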
Note:
A differentiable function is continuous but the converse
need not be true.
(b) Differentiation Rules

Theorem (Basic Rules of Differentiation)

(i) d(k)/dx = 0 (where k is a constant)

(ii) d(xⁿ)/dx = nxⁿ⁻¹ (where n is any real number, for all values of x for which both xⁿ and xⁿ⁻¹ are defined)

(iii) d[kf(x)]/dx = k · df(x)/dx

(iv) d[f(x) ± g(x)]/dx = df(x)/dx ± dg(x)/dx

(v) d[f(x)g(x)]/dx = f(x) · dg(x)/dx + g(x) · df(x)/dx (product rule)
(vi) d[f(x)/g(x)]/dx = [g(x) · df(x)/dx − f(x) · dg(x)/dx] / [g(x)]² for g(x) ≠ 0 (quotient rule)

(vii) d(eˣ)/dx = eˣ

(viii) d(ln x)/dx = 1/x (where ln x = logₑ x)

(ix) dy/dx = (dy/dg)(dg/dx) (chain rule, or function of a function rule)

(x) dy/dx = 1/(dx/dy) (inverse function rule, provided dx/dy ≠ 0)
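The product and chain rules can be spot-checked against a numeric difference quotient; a small sketch (functions chosen for illustration, not from the notes):

```python
# Numeric spot-check of the product rule and the chain rule.
def num_deriv(f, x, dx=1e-6):
    return (f(x + dx) - f(x)) / dx

x0 = 1.5

# Product rule: d[x² · x³]/dx = x²·(3x²) + (2x)·x³ = 5x⁴
h = lambda x: (x ** 2) * (x ** 3)
product_rule_value = 5 * x0 ** 4

# Chain rule: d[(2x + 1)³]/dx = 3(2x + 1)² · 2
g = lambda x: (2 * x + 1) ** 3
chain_rule_value = 3 * (2 * x0 + 1) ** 2 * 2

print(abs(num_deriv(h, x0) - product_rule_value) < 1e-2)   # True
print(abs(num_deriv(g, x0) - chain_rule_value) < 1e-2)     # True
```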
MAIN POINTS
The coefficient of determination ( R 2 ) measures the
proportion of total variation in the dependent variable
(in the sample) that is explained by variation in the
independent variable in the sample
In the context of simple linear regression, the absolute value of the sample correlation coefficient equals the square root of the coefficient of determination
Assumptions about the distribution of the disturbances
allow us to perform statistical inference in the linear
regression model
If the random disturbances in the NSR model are multivariate normally distributed, exact distribution results concerning the least squares estimators of the model's coefficients can be derived.
As the variance of the disturbances and hence the actual
variances of the least squares estimators are not known,
we are led to use the T distribution in performing
inference about the coefficients in the NSR model.
Continuity of a function at a point is a stronger condition
to satisfy than the limit of the function existing at the
point.
The derivative of a function y = f(x) is defined as

dy/dx = f′(x) = lim_{Δx→0} [f(x + Δx) − f(x)]/Δx
200052 INTRODUCTION TO ECONOMIC METHODS
LECTURE - WEEK 13
Required Reading:
Ref. File 2: Sections 2.3(a), 2.3(b), 2.3(d), 2.4, 2.5(a),
2.5(b), 2.7
14. INTRODUCTION TO DIFFERENTIAL
CALCULUS CONTINUED
14.4 Some Marginal Concepts in Economics
(a) Total and Marginal Revenue
Preliminary - Demand Functions
In its simplest form, the demand function expresses
quantity demanded (q) as a function of price (p). That
is
q = f(p)
Suppose we have an inverse demand function p = g(q). We know

Total revenue = TR(q) = pq = g(q)q

Marginal revenue: MR(q) = dTR(q)/dq

Example:
Calculate MR if TR = 1100q − 3q²/2.

(b) Total and Marginal Cost

Marginal cost: MC(q) = dTC(q)/dq
Example:
Calculate MC if TC = q³/10 + 200q + 500.

[Figure: demand curve D with points (q₀, p₀) and (q₁, p₁), illustrating the proportional changes Δq/q₀ = (q₁ − q₀)/q₀ and Δp/p₀ = (p₁ − p₀)/p₀.]
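Assuming the example functions read TR(q) = 1100q − 3q²/2 and TC(q) = q³/10 + 200q + 500 (my reading of the garbled originals), the marginal functions are MR(q) = 1100 − 3q and MC(q) = 3q²/10 + 200; a numeric check:

```python
# Marginal revenue and marginal cost as derivatives of TR and TC.
# Assumed functional forms (reconstructed, may differ from the notes):
#   TR(q) = 1100q - 3q²/2       =>  MR(q) = 1100 - 3q
#   TC(q) = q³/10 + 200q + 500  =>  MC(q) = 3q²/10 + 200
def TR(q):
    return 1100 * q - 1.5 * q ** 2

def TC(q):
    return q ** 3 / 10 + 200 * q + 500

def marginal(f, q, dq=1e-6):
    # Numeric derivative: extra revenue/cost per extra unit of q
    return (f(q + dq) - f(q)) / dq

print(marginal(TR, 10))   # ≈ 1070 = 1100 - 3(10)
print(marginal(TC, 10))   # ≈ 230 = 3(10)²/10 + 200
```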
Definition (Arc Elasticity of Demand with Respect to Price)
Consider a demand function q = f(p) for a particular good, where q is quantity demanded and p is the price per unit of the good, and two different points on the function given by (q₀, p₀) and (q₁, p₁). Let Δq = (q₁ − q₀) and Δp = (p₁ − p₀). Then the arc elasticity of demand with respect to price between the two given prices is given by

E_arc = [Δq / ((q₀ + q₁)/2)] / [Δp / ((p₀ + p₁)/2)]

The corresponding point elasticity of demand with respect to price is

E = (dq/dp)(p/q) = [1/(dp/dq)](p/q)
(E gives the approximate percentage change in quantity demanded that a 1% increase in price leads to.)

Example:
Calculate the elasticity of demand with respect to price if

q = 2500 − 8p − 2p²
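Taking the example demand function to be q = 2500 − 8p − 2p² (the signs are my reading of the garbled original), dq/dp = −8 − 4p and the point elasticity can be evaluated at any price, e.g. p = 10:

```python
# Point elasticity of demand E = (dq/dp)(p/q).
# Assumed demand function (reconstructed): q = 2500 - 8p - 2p².
def q(p):
    return 2500 - 8 * p - 2 * p ** 2

def dq_dp(p):
    return -8 - 4 * p      # derivative of the assumed demand function

p0 = 10
E = dq_dp(p0) * p0 / q(p0)   # = (-48)(10)/2220 ≈ -0.216

print(f"E at p = {p0}: {E:.4f}")   # |E| < 1: demand inelastic here
```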
14.5 Higher Order Derivatives

Definition (Second Derivative of a Function)
Suppose the (first) derivative of a single variable function y = f(x) is differentiable. Then the derivative of the first derivative is called the second order derivative or simply the second derivative.

The second derivative of f(x) is denoted

f″(x), y″, d²y/dx² or d²f/dx²

Provided they can be calculated, the kth order derivatives of the function are denoted

f⁽ᵏ⁾(x), y⁽ᵏ⁾, dᵏy/dxᵏ or dᵏf/dxᵏ

Example:
y = f(x) = 2x³ − 5x⁴
14.6 Maxima and Minima of Functions - Definitions
Definitions (Extreme Points of a Single-Variable Function)
Given a function y f ( x ) and a point x a in the domain
of f ( x ) :
f(x) has a global or absolute maximum value f(a) at x = a if and only if f(a) ≥ f(x) for all x in the domain of f(x).

f(x) has a global or absolute minimum value f(a) at x = a if and only if f(a) ≤ f(x) for all x in the domain of f(x).

f(x) has a local or relative maximum value f(a) at x = a if and only if f(a) ≥ f(x) for all x in some interval, however small, around x = a and in the domain of the function.

f(x) has a local or relative minimum value f(a) at x = a if and only if f(a) ≤ f(x) for all x in some interval, however small, around x = a and in the domain of the function.
14.7 Determination of Local Extreme Points
(a) The First Derivative Test
An extreme point of a function may occur at a point in the
domain of the function either where the first derivative
equals zero, or where the first derivative does not exist.
Points in the domain of the function where either of these
cases occurs are called critical points of the function.
Points where the first derivative is zero are also commonly
called stationary points.
The first derivative test finds local extreme points of a
function by determining the intervals on which the
function is increasing or decreasing.
The First Derivative Test
Step 1: Determine the points where f′(x) = 0 or f′(x) is not defined.
Step 2: For the intervals delineated by the points found in Step 1, determine whether the function is increasing (f′(x) > 0) or decreasing (f′(x) < 0).
Step 3: For each point determined in Step 1 at which the function is continuous, note what happens to the sign of f′(x) as x increases through the point.
If the sign of f′(x) changes from positive to negative, the point is a local maximum.
If the sign of f′(x) changes from negative to positive, the point is a local minimum.
If f′(x) does not change sign, the point is neither a local maximum nor a local minimum.
Note: If a function is constant in value between two
points on its graph, any points between these two points
will represent both local maximum and local minimum
points. The first derivative test as stated above is not
applicable to finding such local extreme points. Usually
the existence of intervals over which the value of the
function is unchanged can be determined by simple
inspection of the function considered, or by checking
whether the first derivative of the function equals zero
over any particular interval(s). (Reference File 2, p. 43)
Example:
y = f(x) = 45x − x²/2
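Reading the example as f(x) = 45x − x²/2 (reconstructed from the garbled original), f′(x) = 45 − x is positive for x < 45 and negative for x > 45, so x = 45 gives a local (indeed global) maximum of f(45) = 1012.5. A quick check:

```python
# First derivative test for f(x) = 45x - x²/2 (assumed form of the example).
def f(x):
    return 45 * x - x ** 2 / 2

def fprime(x):
    return 45 - x            # f'(x); zero at the stationary point x = 45

x_star = 45

# f' changes sign from + to - through x = 45: local maximum.
print(fprime(44) > 0, fprime(46) < 0)   # True True
print(f(x_star))                        # 1012.5
```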
(b) The Second Derivative Test
The Second Derivative Test
Suppose that for a twice differentiable function y = f(x) we have f′(a) = 0; x = a is a stationary point of the function.
If f″(a) < 0, the function attains a local maximum value of f(a) at x = a.
If f″(a) > 0, the function attains a local minimum value of f(a) at x = a.
If f″(a) = 0, we must check whether f′(x) changes sign as x increases through the point x = a (as per the first derivative test).

Note: As was the case with the first derivative test, the second derivative test as stated above is not applicable to determining local extreme points occurring on intervals over which the value of a function is unchanged. (Reference File 2, p. 46)

Inflection Points
Inflection points occur where f″(x) = 0 and f″(x) changes sign. At such a point we may or may not have a stationary point (f′(x) = 0).
Example:
y = f(x) = x³
Example of Simple Optimization:
Following the worst cyclone in 50 years, a banana
plantation owner in Northern Queensland has to replant
most of his holding. From research studies conducted in
the region the plantation owner knows that, on average, if
1,760 banana trees are planted per hectare, the annual
yield per tree will be 2 cartons of bananas; in addition, for
every additional tree planted per hectare (over 1,760), the
annual yield per tree falls by a thousandth of a carton.
How many trees should the plantation owner plant per
hectare to maximize the yield per hectare? What is the
maximum yield per hectare?
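With n trees planted per hectare (n ≥ 1760), yield per tree is 2 − 0.001(n − 1760) cartons, so yield per hectare is Y(n) = n[2 − 0.001(n − 1760)] = 3.76n − 0.001n². Setting Y′(n) = 3.76 − 0.002n = 0 gives n = 1880, and Y″(n) = −0.002 < 0 confirms a maximum. A sketch of the check:

```python
# Banana plantation: choose trees per hectare n to maximize yield.
def yield_per_hectare(n):
    # yield per tree falls by 0.001 carton per extra tree over 1760
    return n * (2 - 0.001 * (n - 1760))

n_star = 3.76 / 0.002     # stationary point of Y(n) = 3.76n - 0.001n²

print(n_star)                       # 1880 trees per hectare
print(yield_per_hectare(n_star))    # ≈ 3534.4 cartons per hectare
```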
MAIN POINTS
The calculus definitions of marginal revenue, marginal
cost and point elasticity of demand with respect to price
are given by, respectively
Marginal revenue: MR(q) = dTR(q)/dq

Marginal cost: MC(q) = dTC(q)/dq

Point elasticity of demand: E = (dq/dp)(p/q)
The determination of local extreme points can be carried
out using the first derivative test.
The second derivative test can be used to determine
whether a stationary point of a twice differentiable
function is an extreme point of the function.