ECS3706 study unit 17_reduced size file
Statistical principles
ECONOMETRICS IN ACTION
The Department of Economics at New York University (NYU) has evolved into
one of the world’s leading centres for research and teaching in economics.
Professor C Flinn of NYU teaches the Econometrics I course. Here are some
of his comments on his course objectives:
STUDY OBJECTIVES
Econometrics makes extensive use of statistical concepts. Some examples:
• We assume that the data used in regression analysis are a random sample drawn
from the population. What exactly is the meaning of “random sample” and of
“population”?
• What are the implications of using sample estimates? The concept of the sampling
distribution of a sample estimator is a fundamental concept you must understand
well.
• Related to the sampling distribution are concepts like unbiased estimators and
minimum variance. What do these mean?
This module requires you to be familiar with statistical concepts. This chapter deals
with the basic statistical concepts required in this regard. This could be particularly
helpful to students who have not previously completed statistics courses. Students
who have previously completed statistics courses may find this chapter a convenient
means to brush up their statistics, and may even learn some new things!
Yes, this study unit is examination material. Within each of the sections below, we
clearly indicate what you must understand.
STUDY UNIT 17: Statistical principles
We expect you, in the case of discrete random variables, to understand the meaning of
• a random variable (X) and the probability distribution of X which is denoted by
its probability density function P(X)
• the mean (or expected value) of random variable (X)
• the variance and the standard deviation of random variable (X)
TASK 17.1.1
Consider a normal die with numbers 1 to 6 on its sides. Let X measure the outcome
of a throw of the die.
(a) Explain how the concept of a discrete random variable (X) may be applied
to the throw of the die.
(b) Derive the probability density function P(X). Explain whether P(X) is normally
distributed.
(c) Derive E(X), the expected value of X and explain its practical meaning.
5 ANSWERS
(a) The variable (X) can assume six possible outcomes when the die is thrown.
The range of possible outcomes of X is (1, 2, 3, 4, 5, 6). Because these are
a countable number of possible values, X is a discrete variable. Because X
assumes values by random chance, X is also a random variable. Thus X is
a discrete random variable.
(b) The probability P(X) is the probability of obtaining each of these X-values.
Because each number 1 to 6 has an equal chance of occurring, P(X) = 1/6
for all X. Note that ΣP(X) = 1.
Variable X is not normally but uniformly distributed. In the case of the uniform
distribution, P(X) is constant for all values of X. In the case of the normal
distribution, the chart of P(X) versus X is bell shaped. Loosely speaking, this
means that the probability P(X) of realising numbers in the middle range of
X is higher than that of the tail ends.
(c) E(X) = ΣX.P(X) = (1 + 2 + 3 + 4 + 5 + 6)/6
= 21/6
= 3.5
The meaning of the expected value is the average value of a large number of
throws. Because each throw can yield numbers 1 to 6, where the probability of each
number is 1/6, we can expect the average of a large number of throws to be 3.5.
The variance of X is σ² = Σ(X – μ)².P(X) = 2.9167.
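The calculation in answer (c) can be checked with a short Python sketch. It is an illustration only and not part of the prescribed material; the function names `expected_value` and `variance` are our own. Exact fractions are used so that no rounding occurs before the final step.

```python
from fractions import Fraction

# Fair six-sided die: each outcome 1..6 has probability 1/6.
p = {x: Fraction(1, 6) for x in range(1, 7)}

def expected_value(dist):
    """Weighted average: the sum of X * P(X) over all outcomes."""
    return sum(x * px for x, px in dist.items())

def variance(dist):
    """Probability-weighted squared deviation from the mean."""
    mu = expected_value(dist)
    return sum((x - mu) ** 2 * px for x, px in dist.items())

total_probability = sum(p.values())
mean_x = expected_value(p)
var_x = variance(p)

print("Sum of P(X):", total_probability)
print("E(X):", mean_x, "=", float(mean_x))
print("Var(X):", var_x, "=", round(float(var_x), 4))
```

The mean of 3.5 is not itself a possible outcome of a single throw; it describes the long-run average over many throws.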
TASK 17.1.2
The outcomes of all possible throw 1 and throw 2 values are displayed in the
table below, where Y = throw 1 + throw 2.

                        Outcome of throw 2
Outcome of throw 1    1    2    3    4    5    6
        1             2    3    4    5    6    7
        2             3    4    5    6    7    8
        3             4    5    6    7    8    9
        4             5    6    7    8    9   10
        5             6    7    8    9   10   11
        6             7    8    9   10   11   12
(a) List all possible values of Y as well as their frequency (how many times each
occurs). Which value of Y occurs most?
(b) Determine and draw P(Y), the probability density function of Y. Is Y normally
distributed?
(c) Derive μ, the mean value of Y, σ², the variance of Y, and σ, the standard
deviation of Y. Explain the meaning of σ.
6 ANSWERS
(a) See the table below for the 11 possible Y-values, which fall between 2 and
12. The Y-value of 7 occurs most often (it is called the mode).
(b) P(Y) is proportional to the frequency of Y: P(Y) = (frequency of Y)/36.
ΣP(Y) = 1, that is, the area under the P(Y) curve is 1, which of course also
applies to the continuous variable case. A useful property of P(Y) is that the
probability of obtaining a range of values, say 5 to 9, is simply the sum of
their P(Y) values, which is (4 + 5 + 6 + 5 + 4)/36 = 24/36. The probability
density function is displayed below.
Y is a discrete variable and its probability distribution P(Y) is not bell shaped.
(c) μ = ΣY.P(Y) = 7
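The answers to task 17.1.2 can be reproduced by listing all 36 equally likely outcomes. The Python sketch below is an illustration only (the variable names are our own). It counts the frequency of each Y, converts the frequencies to P(Y), and then derives μ, σ² and σ from their definitions. σ measures the typical distance between an individual Y-value and the mean μ.

```python
from collections import Counter
from fractions import Fraction
from itertools import product
from math import sqrt

# All 36 equally likely (throw 1, throw 2) pairs and the frequency of each sum Y.
freq = Counter(a + b for a, b in product(range(1, 7), repeat=2))
p_y = {y: Fraction(n, 36) for y, n in sorted(freq.items())}

mode_y = max(freq, key=freq.get)                       # Y-value that occurs most
mu = sum(y * p for y, p in p_y.items())                # mean of Y
var = sum((y - mu) ** 2 * p for y, p in p_y.items())   # variance of Y
sigma = sqrt(float(var))                               # standard deviation of Y
p_5_to_9 = sum(p_y[y] for y in range(5, 10))           # P(5 <= Y <= 9)

for y, p in p_y.items():
    print(f"Y = {y:2d}  frequency = {freq[y]}  P(Y) = {float(p):.4f}")
print("mode:", mode_y, " mean:", mu, " variance:", var, " sd:", round(sigma, 4))
print("P(5 <= Y <= 9) =", p_5_to_9)
```

Python prints fractions in lowest terms, so 24/36 appears as 2/3.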
TASK 17.1.3
Explain why there is a need for continuous variables. How do we interpret P(X) for
continuous variables? When is a continuous variable normally distributed? What
is a standardised variable?
7 ANSWER
In real life the outcomes of random variables are often not countable numbers. Often
the values of random variables are real numbers which may include decimal
fractions. For example, a continuous random variable, u, may assume the value
of -4.7636 (rounded to 4 decimals). In regression analysis, the error term values
are typically real numbers which fluctuate around an average value of 0.
Continuous random variables, say variable X, allow for such real numbers.
Continuous random variables often occur over an interval, say from -20.8 to +30.2.
It is even possible that we do not specify their minimum or maximum X-values!
For example, it is possible that –∞ < X < +∞, where ∞ indicates infinity, as in the
case of the normal distribution.
But how do we deal with their probability density functions P(X)? The P(X)
curve is defined such that the total area under the curve = 1. We cannot speak
of the probability of obtaining exactly, say, X = 7. For a continuous variable,
the probability of any single exact value is zero. We instead deal with the
probability across a range of X-values, for example 4 ≤ X ≤ 7, which is the area
under the P(X) curve over that range.
The value of X = 21 occurs most frequently. The sum of the probabilities of obtaining
values in both tail ends (that is, relatively large deviations from the average), say X
≤ 12 plus X ≥ 30, is relatively small.
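The idea that probabilities of continuous variables are areas over a range can be illustrated with a short Python sketch. It assumes, purely for illustration, a normal distribution with mean 21 and standard deviation 5. The mean matches the most frequent value mentioned above, but the standard deviation is our own assumption and does not come from the study guide. The function names are also our own.

```python
from math import erf, sqrt

MU = 21.0     # mean, taken from the example above
SIGMA = 5.0   # ASSUMED standard deviation, for illustration only

def standardise(x, mu=MU, sigma=SIGMA):
    """Standardised value z = (x - mu) / sigma."""
    return (x - mu) / sigma

def normal_cdf(x, mu=MU, sigma=SIGMA):
    """P(X <= x) for a normally distributed X."""
    return 0.5 * (1.0 + erf(standardise(x, mu, sigma) / sqrt(2.0)))

def prob_between(a, b):
    """P(a <= X <= b): the area under the density between a and b."""
    return normal_cdf(b) - normal_cdf(a)

p_point = prob_between(7, 7)                        # a single exact value
p_range = prob_between(4, 7)                        # a range of values
p_tails = normal_cdf(12) + (1.0 - normal_cdf(30))   # both tail ends

print("P(X = 7)             =", p_point)
print("P(4 <= X <= 7)       =", round(p_range, 4))
print("P(X <= 12 or X >= 30) =", round(p_tails, 4))
```

Under these assumed parameters, both 12 and 30 lie 1.8 standard deviations from the mean, so the two tails have equal probability.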
TASK 17.1.4
(a) Explain what is meant by the expected value of a random discrete variable
(X). Its P(X) and X.P(X) are provided in table 17.1.4.
(b) What is the meaning of bias in the case of the sampling distribution of an
estimator (X̄) used to measure an unknown population parameter μ?
(c) Explain how an expected value is derived in the case of a random continuous
variable Z of which P(Z) is known.
Table 17.1.4
X P(X) X.P(X)
3 1/216 0.0139
4 3/216 0.0556
5 6/216 0.1389
6 10/216 0.2778
7 15/216 0.4861
8 21/216 0.7778
9 25/216 1.0417
10 27/216 1.2500
11 27/216 1.3750
12 25/216 1.3889
13 21/216 1.2639
14 15/216 0.9722
15 10/216 0.6944
16 6/216 0.4444
17 3/216 0.2361
18 1/216 0.0833
8 ANSWERS
(a) The expected value of a random discrete variable (X), E(X), is its
probability-weighted average: E(X) = ΣX.P(X) = 10.5.
(b) If E(X̄) = μ = 10.5, then the estimator is unbiased. If, say, E(X̄) = 12, while
μ = 10.5, then the estimator is biased. Bias occurs when the estimator tends to
overestimate or underestimate the true value.
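The P(X) values in table 17.1.4 are those of the sum of three dice (216 equally likely outcomes), so the table and E(X) = 10.5 can be reproduced by enumeration. The Python sketch below is an illustration only. Its second part addresses question (c): for a continuous variable Z, the sum ΣZ.P(Z) becomes the integral of z.P(z) over all z, which is approximated here numerically. The normal density with mean 100 and standard deviation 15 is an assumed example borrowed from task 17.1.5, and the function names are our own.

```python
from collections import Counter
from fractions import Fraction
from itertools import product
from math import exp, pi, sqrt

# Part 1: discrete case. X is the sum of three dice.
counts = Counter(sum(t) for t in product(range(1, 7), repeat=3))
table = [(x, n, Fraction(x * n, 216)) for x, n in sorted(counts.items())]
e_x = sum(x_px for _, _, x_px in table)

for x, n, x_px in table:
    print(f"{x:2d}  {n:2d}/216  {float(x_px):.4f}")
print("E(X) =", float(e_x))

# Part 2: continuous case. E(Z) is the integral of z * P(z) dz.
def normal_pdf(z, mu=100.0, sigma=15.0):
    """Density of an ASSUMED normal variable (mean 100, sd 15)."""
    return exp(-0.5 * ((z - mu) / sigma) ** 2) / (sigma * sqrt(2.0 * pi))

def expected_value_continuous(pdf, lo, hi, steps=20_000):
    """Midpoint-rule approximation of the integral of z * pdf(z) over [lo, hi]."""
    width = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        mid = lo + (i + 0.5) * width
        total += mid * pdf(mid) * width
    return total

# Integrate over mean +/- 8 standard deviations; the area outside is negligible.
e_z = expected_value_continuous(normal_pdf, -20.0, 220.0)
print("E(Z) is approximately", round(e_z, 4))
```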
TASK 17.1.5
Psychologists tell us that the intelligence quotient (IQ) of the population is normally
distributed with average μ = 100 and the standard deviation σ = 15.
(a) Compile a table which indicates which proportion of the population has an IQ
exceeding (or equal to) 100, 110, 120, 130, 140 and 145, respectively. In the
process, also indicate the standardised Z-values. Look up the probabilities
in table B-7 (the normal distribution). Also indicate how many persons of a
population of 10 000 persons fall within each group.
(b) Explain why IQ is normally distributed within a population. Refer to the central
limit theorem.
9 ANSWERS
(a)
IQ (X) (greater or equal to) | z = (X – μ)/σ | Probability that Z > z | Number of persons in population of 10 000
(b) The central limit theorem states that if Z is a standardised sum of N
independent and identically distributed random variables, then the probability
distribution of Z approaches the normal distribution as N becomes large. See
page 552. IQ is normally distributed because it reflects the cumulative outcome
of a large number of hereditary and environmental factors. See page 554.
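The columns of the table in answer (a) can be computed directly. The Python sketch below is an illustration only and uses the complementary error function from the standard library instead of a printed normal table. Small differences from table B-7 may therefore arise, because a printed table requires z to be rounded to two decimals. The names `upper_tail` and `rows` are our own.

```python
from math import erfc, sqrt

MU, SIGMA, POPULATION = 100.0, 15.0, 10_000

def upper_tail(z):
    """P(Z > z) for a standard normal variable Z."""
    return 0.5 * erfc(z / sqrt(2.0))

rows = []
for iq in (100, 110, 120, 130, 140, 145):
    z = (iq - MU) / SIGMA                  # standardised value
    p = upper_tail(z)                      # proportion with IQ >= iq
    rows.append((iq, z, p, round(p * POPULATION)))

print("IQ >=      z   P(Z > z)  persons")
for iq, z, p, persons in rows:
    print(f"{iq:5d}  {z:5.2f}  {p:9.4f}  {persons:7d}")
```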
17.2 SAMPLING
This section deals with topics on selection bias, survivor bias, nonresponse bias and
the power of random selection.
TASK 17.2.1
Explain:
10 ANSWERS
Sampling is the process of selecting only some units (for example people or
organisations) from a total population of interest. For example, we can select a
sample of, say, 50 students from the population of 250 000 Unisa students. The
beauty of sampling is that the characteristics of the sample quite often
accurately reflect those of the population. Statistical inference refers to the
process of estimating population parameters (mean, total, ratios, et cetera) from
the sample estimates, and of providing suitable measures of their accuracy.
Given that we use sampling, we can expect that the sample estimates of parameters
will fluctuate around their true population values. This is called sampling error.
Parameters refer to statistical measures such as the mean or standard deviation.
In econometrics our interest lies mainly with the coefficients of a regression
equation – which may also be called parameters.
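Sampling error can be made concrete with a small Python sketch. It is an illustration only. The population of 250 000 values is hypothetical (normally distributed marks with a mean of 60 and a standard deviation of 12 are our own assumptions), and only the sample size of 50 comes from the example above.

```python
import random

rng = random.Random(3706)  # fixed seed so the run is repeatable

# HYPOTHETICAL population: one mark for each of 250 000 students.
population = [rng.gauss(60.0, 12.0) for _ in range(250_000)]
population_mean = sum(population) / len(population)

# Random sample of 50 students, drawn without replacement.
sample = rng.sample(population, 50)
sample_mean = sum(sample) / len(sample)

# Sampling error: how far this sample estimate falls from the true value.
sampling_error = sample_mean - population_mean

print("Population mean:", round(population_mean, 2))
print("Sample mean:    ", round(sample_mean, 2))
print("Sampling error: ", round(sampling_error, 2))
```

Changing the seed draws a different sample and therefore gives a different sampling error, which is exactly the fluctuation described above.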
17.3 ESTIMATION
This section deals with sampling distributions, the mean of the sampling distribution,
the standard deviation of the sampling distribution, the t-distribution, confidence
intervals and sampling from finite populations.
• the meaning of a sampling distribution, and its expected value and standard
deviation
• the meaning of systematic error (or bias)
• the meaning of the t-distribution
Please refer back to the statement made by Kennedy at the beginning of this study
unit. We sometimes have different estimators which have different sampling
distributions. For example, we will come across the econometric problem of serial
correlation (chapter 9), which affects the accuracy of estimates. In this case we
then have the choice of two estimators: normal OLS and the method of GLS. The
choice of the better estimator then rests upon the characteristics of its sampling
distribution.
1 The sample in Studenmund only includes real estate transactions of the past four weeks.
TASK 17.3.1
This task addresses the sampling distribution of a sample estimator. In this case,
the sample estimator is X̄, that is, the average of a sample of X-values drawn
from a population of X-values. The question is how well X̄ will match the true
population average.
In study unit 4 we will again deal with the sampling distribution. In that case, our
interest lies with the sampling distribution of β̂, where β̂ is a sample estimate of a
coefficient of a regression equation. In both cases, however, the principle of a
sampling distribution is the same.
11 ANSWERS
The easiest way to explain the meaning of a sampling distribution is to use a
simulation approach. The following steps outline this approach:
(1) The first step is to define precisely what characteristic of the population we
wish to measure. Assume that we wish to determine the population average
(or mean) of variable X of the population.
(2) In this case we need to determine whether the sample average (X̄), based
on a random sample, is a good estimator of the population average (μ).
The goal of the procedure is to determine how well the sample estimator
X̄ performs.
(3) We create a known population by simply generating, say, 50 000 random
values of X.
(4) We then sample repeatedly from this population by random selection of,
say, samples of 20 observations each.
(5) We calculate the sample mean (X̄) of each sample. We record these
estimates in a histogram.
(6) The distribution of these estimates defines the sampling distribution of X̄.
Because we know the true mean (μ), we can determine how much the sample
estimates of the mean (X̄) deviate from μ.
Your lecturer has applied these steps in practice. First (step 3) observations (X)
for the population were generated which conform to the normal distribution with
the average μ = 100 and a standard deviation of X of σ = 15. This is easily done
by using a PC and MS Excel.
Then a large number of random samples (each of sample size 20) were selected
(step 4). The sample mean (X̄) of each sample was derived and recorded in a
histogram (step 5). The histogram summarises the frequency of the different
values of X̄ obtained from all samples.
With respect to the histogram, the Y-axis measures the relative occurrence of the
values of X̄. The number on the X-axis represents the upper bound, for example,
100.5 represents values of X̄ falling between 99.5 and 100.5.
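Steps (3) to (6) can also be sketched in Python instead of MS Excel. The sketch below is an illustration only and is not the lecturer's own spreadsheet. The number of samples (5 000), the seed and the helper name `upper_bound` are our own assumptions, while the population size, μ, σ and the sample size come from the text above.

```python
import random
from collections import Counter
from math import ceil
from statistics import fmean

rng = random.Random(17)  # fixed seed so the run is repeatable
MU, SIGMA = 100.0, 15.0
POP_SIZE, SAMPLE_SIZE = 50_000, 20
NUM_SAMPLES = 5_000      # ASSUMED: the text only says "a large number"

# Step 3: create a known population.
population = [rng.gauss(MU, SIGMA) for _ in range(POP_SIZE)]
true_mean = fmean(population)

# Steps 4 and 5: draw repeated random samples and record each sample mean.
sample_means = [fmean(rng.sample(population, SAMPLE_SIZE))
                for _ in range(NUM_SAMPLES)]

def upper_bound(value):
    """Label of the histogram bin: 100.5 covers values above 99.5 up to 100.5."""
    return ceil(value - 0.5) + 0.5

# Step 6: the relative frequencies form the simulated sampling distribution.
histogram = Counter(upper_bound(m) for m in sample_means)
relative_freq = {b: n / NUM_SAMPLES for b, n in sorted(histogram.items())}

for bound, share in relative_freq.items():
    print(f"{bound:6.1f}  {share:.4f}  {'*' * round(share * 200)}")
print("True mean:", round(true_mean, 3))
print("Average of sample means:", round(fmean(sample_means), 3))
```

The printed bars give a rough text version of the histogram, with one line per bin.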
Which conclusions can we make based on this sampling distribution?
(1) The first, possibly unexpected, fact is that the outcomes of random sampling
produce a well-behaved distribution! The sample averages appear to cluster
around the true value being estimated and the distribution is symmetric. The
expected value (weighted average) of the sample means of all samples is
equal (or at least very close) to the true value of µ = 100. This implies that the
estimator X̄ is unbiased. Bias in the estimator occurs when the expected
value of X̄ is not equal to µ.
(2) Deviations (X̄ – µ) do occur, which are both positive and negative. However,
in most cases, these deviations are relatively small. Large deviations do
occur, but the probability of this is relatively low.
Z = (X̄ – µ)/(σ/√N), where σ/√N is the standard error of X̄.
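The standardisation of X̄ can be checked by simulation. The Python sketch below is an illustration only. It draws 10 000 samples (an assumed number) of size N = 20 directly from a normal distribution with μ = 100 and σ = 15, converts each sample mean to a Z-value using the standard error σ/√N, and then checks that the Z-values have a mean near 0 and a standard deviation near 1. The function name `z_value` is our own.

```python
import random
from math import sqrt
from statistics import fmean, pstdev

rng = random.Random(2024)  # fixed seed so the run is repeatable
MU, SIGMA, N = 100.0, 15.0, 20
NUM_SAMPLES = 10_000       # ASSUMED number of repeated samples

standard_error = SIGMA / sqrt(N)   # standard error of the sample mean

def z_value(sample_mean):
    """Standardised sample mean: Z = (sample mean - mu) / (sigma / sqrt(N))."""
    return (sample_mean - MU) / standard_error

sample_means = [fmean([rng.gauss(MU, SIGMA) for _ in range(N)])
                for _ in range(NUM_SAMPLES)]
z_values = [z_value(m) for m in sample_means]

# Share of Z-values within -1.96 and +1.96 (about 95% for a normal variable).
share_within = sum(abs(z) <= 1.96 for z in z_values) / NUM_SAMPLES

print("Standard error:", round(standard_error, 4))
print("SD of sample means:", round(pstdev(sample_means), 4))
print("Mean of Z:", round(fmean(z_values), 4))
print("SD of Z:", round(pstdev(z_values), 4))
print("Share with |Z| <= 1.96:", round(share_within, 4))
```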
TASK 17.3.2
Explain the meaning of the t-distribution (with respect to sample estimator X̄) and
explain which sources of sampling variation it accounts for.
This task provides some background regarding the t-distribution which will again
appear in the next study unit.
12 ANSWER
In task 17.3.1 (4), reference was made to the standardised form of X̄ – µ, that is
Z = (X̄ – µ)/(σ/√N)     equation A.
Because we only have sample data, the sample will provide values for X̄ and
N. Both μ and σ are, however, unknown. The first (μ) is not really a problem due
to the nature of hypothesis testing. In the next study unit, you will learn that we
simply replace μ with a fixed value, that is, a value of which the “compatibility”
with X̄ is tested. The second, σ = SE(X), remains unknown and must be estimated
from the sample by s:
s = √[Σ(Xi – X̄)²/(N – 1)]     equation B.
If we replace σ within equation A with its estimate s in equation B, then
t = (X̄ – µ)/(s/√N)     equation C.
Although Z is normally distributed, t follows the t-distribution. The
t-distribution copes with two sources of variation, that is, X̄ and s, both of
which vary from sample to sample.
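Equations B and C can be applied to a single sample as follows. The Python sketch below is an illustration only. The eight sample values and the hypothesised mean of 100 are hypothetical, and the function name `t_statistic` is our own.

```python
from math import sqrt

def t_statistic(sample, mu_0):
    """Return (sample mean, s, t) using equations B and C.

    mu_0 is the fixed value that replaces the unknown population mean.
    """
    n = len(sample)
    x_bar = sum(sample) / n
    # Equation B: sample standard deviation with N - 1 in the denominator.
    s = sqrt(sum((x - x_bar) ** 2 for x in sample) / (n - 1))
    # Equation C: sigma in equation A is replaced by its estimate s.
    t = (x_bar - mu_0) / (s / sqrt(n))
    return x_bar, s, t

# HYPOTHETICAL sample of N = 8 observations.
sample = [98.0, 105.0, 110.0, 93.0, 102.0, 107.0, 99.0, 96.0]
x_bar, s, t = t_statistic(sample, mu_0=100.0)

print("Sample mean:", x_bar)
print("s:", round(s, 4))
print("t:", round(t, 4))
```

A different sample from the same population would give a different X̄ and a different s, which is why t varies more than Z from sample to sample.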
In the next study unit the t-value is also used to test the coefficients of a regression
equation for statistical significance. You only have one sample, and this sample
provides only one estimate each of β̂ and SE(β̂). The t-value is derived as
t = (β̂ – β0)/SE(β̂)
where β0 is the H0 value of the coefficient being tested and β̂ is the sample
estimate of coefficient β.
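The t-value of a regression coefficient can be illustrated with a regression of Y on a single X. The Python sketch below is an illustration only. The six data points are hypothetical, the function name `slope_t_value` is our own, and the standard textbook OLS formulas for the slope and its standard error in a one-variable regression are used.

```python
from math import sqrt

def slope_t_value(x, y, beta_0=0.0):
    """Return (slope estimate, its standard error, t-value) for Y = a + b.X + e.

    beta_0 is the value of the slope under the null hypothesis H0.
    """
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = s_xy / s_xx                        # OLS estimate of the slope
    intercept = y_bar - slope * x_bar          # OLS estimate of the intercept
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s2 = sum(e ** 2 for e in residuals) / (n - 2)   # residual variance
    se_slope = sqrt(s2 / s_xx)                 # standard error of the slope
    return slope, se_slope, (slope - beta_0) / se_slope

# HYPOTHETICAL data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]

slope, se_slope, t_zero = slope_t_value(x, y, beta_0=0.0)
_, _, t_two = slope_t_value(x, y, beta_0=2.0)

print("Slope estimate:", round(slope, 4))
print("SE of slope:", round(se_slope, 4))
print("t for H0: slope = 0:", round(t_zero, 2))
print("t for H0: slope = 2:", round(t_two, 2))
```

With these hypothetical data, the t-value is large when the H0 value is 0 and small when the H0 value is 2, because the estimated slope lies close to 2.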