ECS3706 study unit 17_reduced size file

The document outlines the objectives and content of an Econometrics I course at NYU, emphasizing the importance of understanding statistical principles for econometric analysis. It covers fundamental concepts such as probability distributions, random variables, and the significance of statistical estimators. The study unit includes practical tasks to reinforce learning and prepare students for examination material related to these statistical concepts.

Uploaded by

Jabulani Pilime

STUDY UNIT 17

Statistical principles

ECONOMETRICS IN ACTION

The Department of Economics at New York University (NYU) has evolved into
one of the world’s leading centres for research and teaching in economics.
Professor C Flinn of NYU teaches the Econometrics I course. Here are some
of his comments on his course objectives:

• We will begin by reviewing probability and sampling theory. To be a competent
econometrician, one needs to have a solid understanding of basic statistical
theory, some familiarity with data, and a good knowledge of economic theory.
• From my perspective, econometrics is essentially the application of
standard statistical tools to the analysis of conditional relationships between
random variables. What distinguishes econometrics from statistics is the
econometrician’s objective to infer something about behaviour from empirical
relationships between variables.
• In this course, we will attempt to prepare the student for this kind of research
enterprise by carefully covering most or all of the statistical theory [albeit at
a basic level] they will need to do competent applied econometric analysis.

The message is clear. You cannot fully understand econometrics without a
solid grounding in statistics.

STUDY OBJECTIVES
Econometrics makes extensive use of statistical concepts. Some examples:
• We assume that the data used in regression analysis are a random sample drawn
from the population. What exactly is the meaning of “random sample” and of
“population”?
• What are the implications of using sample estimates? The concept of the sampling
distribution of a sample estimator is a fundamental concept you must understand
well.
• Related to the sampling distribution are concepts like unbiased estimators and
minimum variance. What do these mean?

This module requires you to be familiar with statistical concepts. This chapter deals
with the basic statistical concepts required in this regard. This could be particularly
helpful to students who have not previously completed statistics courses. Students
who have previously completed statistics courses may find this chapter a convenient
means to brush up their statistics, and may even learn some new things!

Yes, this study unit is examination material. Within each of the sections below, we
clearly indicate what you must understand.


The approach of this chapter is different to that of other chapters.

• We first provide the headers of sections as discussed in the textbook.
• We then tell you exactly what you are required to know. Remember, the focus is
on understanding the meaning of statistical concepts.
• There may be examination questions on the material you are required to know.
We may, for example, ask you to derive a standard deviation (given some simple
data), to explain its meaning, to explain what a sampling distribution is, or to explain
the meaning of expected value.
• The major part of this study unit consists of a number of tasks which are practical
applications of all the major statistical concepts. The tasks are meant to be
learning exercises. They may assist you in better understanding statistical concepts.
Definitely work through them!
• Although some aspects may be explained in a different way than in the textbook,
the textbook remains your prime source.

17.1 PROBABILITY DISTRIBUTIONS


This section covers topics on probability, mean, variance and standard deviation,
continuous random variables, standardised variables and the normal distribution.

We expect you, in the case of discrete random variables, to understand the meaning of
• a random variable (X) and the probability distribution of X which is denoted by
its probability density function P(X)
• the mean (or expected value) of random variable (X)
• the variance and the standard deviation of random variable (X)

In the case of continuous random variables, you must understand


• why continuous variables arise
• the meaning of the probability distribution (the probability density curve)
• the meaning of mean, variance and standard deviation
• the meaning of standardised variables

In the case of the normal distribution, you must


• understand its meaning and how the central limit theorem can give rise to a
normal distribution
• be able to apply the normal distribution in practice

Tasks 17.1.1–17.1.5 deal with the following major statistical concepts:


• probability density function (uniform) of a discrete random variable and its
expected value
• mean, variance and standard deviation of a discrete random variable which is
not uniformly distributed
• continuous random variables and their probability density functions; standardised
random variables
• expected value and bias
• the normal distribution
• the central limit theorem

TASK 17.1.1
Consider a normal die with numbers 1 to 6 on its sides. Let X measure the outcome
of a throw of the die.

(a) Explain how the concept of a discrete random variable (X) may be applied
to the throw of the die.

(b) Derive the probability density function P(X). Explain whether P(X) is normally
distributed.

(c) Derive E(X), the expected value of X and explain its practical meaning.

(d) Derive the variance of X and the standard deviation of X.

5 ANSWERS
(a) The variable (X) can assume six possible outcomes when the die is thrown.
The range of possible outcomes of X is (1, 2, 3, 4, 5, 6). Because these are
a countable number of possible values, X is a discrete variable. Because X
assumes values by random chance, X is also a random variable. Thus X is
a discrete random variable.

(b) The probability P(X) is the probability of obtaining each of these X-values.
Because each number 1 to 6 has an equal chance of occurring, P(X) = 1/6
for all X. Note that ΣP(X) = 1.

Variable X is not normally but uniformly distributed. In the case of the uniform
distribution, P(X) is constant for all values of X. In the case of the normal
distribution, the chart of P(X) versus X is bell shaped. Loosely speaking, this
means that the probability P(X) of realising numbers in the middle range of
X is higher than that of the tail ends.


(c) The expected value of X is derived as E(X) = ∑ X.P(X):

E(X) = 1.(1/6) + 2.(1/6) + 3.(1/6) + 4.(1/6) + 5.(1/6) + 6.(1/6)
     = (1 + 2 + 3 + 4 + 5 + 6).(1/6)
     = 21/6
     = 3.5

The meaning of the expected value is the average value of a large number of
throws. Because each throw can yield numbers 1 to 6, where the probability of each
number is 1/6, we can expect the average of a large number of throws to be 3.5.

(d) The variance of X is ∑ (X – μ)².P(X) where μ is the expected value of
X (μ = 3.5).

X      P(X)    X – μ    (X – μ)²    (X – μ)².P(X)
1      1/6     -2.5     6.25        1.0417
2      1/6     -1.5     2.25        0.3750
3      1/6     -0.5     0.25        0.0417
4      1/6      0.5     0.25        0.0417
5      1/6      1.5     2.25        0.3750
6      1/6      2.5     6.25        1.0417
Sum                                 2.9167

The variance of X is 2.9167. The standard deviation of X is √2.9167 = 1.7078.
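The calculations in (c) and (d) can be checked with a few lines of Python. This is only a sketch for verification purposes; the variable names are our own and are not part of the textbook.

```python
from fractions import Fraction
from math import sqrt

outcomes = range(1, 7)   # the six faces of the die
p = Fraction(1, 6)       # uniform distribution: P(X) = 1/6 for every face

# E(X) = sum of X.P(X)
mean = sum(x * p for x in outcomes)

# Var(X) = sum of (X - mean)^2 . P(X)
variance = sum((x - mean) ** 2 * p for x in outcomes)

# The standard deviation is the square root of the variance
std_dev = sqrt(variance)

print(float(mean), round(float(variance), 4), round(std_dev, 4))
```

Exact fractions are used so that no rounding occurs until the final step.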

TASK 17.1.2

This example deals with a nonuniform probability density function, in contrast
to task 17.1.1, which deals with a uniform one.
Consider a normal die with numbers 1 to 6 on its sides. Let Y
measure the sum of two throws of the die. For example, if two
throws realise a 4 and a 2, then Y = 6.

The outcomes of all possible throw 1 and throw 2 values are displayed in the
table below.

The possible outcomes of Y range from a minimum of Y = 2 (1 + 1) to a maximum
of Y = 12 (6 + 6). Each of the 6 x 6 possible outcomes has an equal probability
of occurring, that is, 1/36. Note that there are more “7” outcomes for Y than, for
example, 5s, simply because more combinations of throws have the sum of 7.

Y = throw 1 + throw 2

                        Outcome of throw 2
Outcome of throw 1      1     2     3     4     5     6
1                       2     3     4     5     6     7
2                       3     4     5     6     7     8
3                       4     5     6     7     8     9
4                       5     6     7     8     9    10
5                       6     7     8     9    10    11
6                       7     8     9    10    11    12

(a) List all possible values of Y as well as their frequency (how many times each
occurs). Which value of Y occurs most?

(b) Determine and draw P(Y), the probability density function of Y. Is Y normally
distributed?

(c) Derive μ, the mean value of Y, σ², the variance of Y, and σ, the standard
deviation of Y. Explain the meaning of σ.

6 ANSWERS

(a) See the table below for the 11 possible Y values, which fall between 2 and
12. The Y-value of 7 occurs most (it is called the mode).

Y      Frequency (F)   P(Y) (F/36)   Y.P(Y)       Y – μ    (Y – μ)²    (Y – μ)².P(Y)
2      1               0.0278        0.0556       -5       25          0.6944
3      2               0.0556        0.1667       -4       16          0.8889
4      3               0.0833        0.3333       -3       9           0.7500
5      4               0.1111        0.5556       -2       4           0.4444
6      5               0.1389        0.8333       -1       1           0.1389
7      6               0.1667        1.1667        0       0           0.0000
8      5               0.1389        1.1111        1       1           0.1389
9      4               0.1111        1.0000        2       4           0.4444
10     3               0.0833        0.8333        3       9           0.7500
11     2               0.0556        0.6111        4       16          0.8889
12     1               0.0278        0.3333        5       25          0.6944
Sum    36              1.0000        μ = 7.0000    0       110         5.8333

Frequency: the number of times the specific Y-value occurs.

(b) P(Y) is proportional to the frequency of Y: P(Y) = (frequency of Y)/36. ΣP(Y) = 1,
that is, the area under the P(Y) curve is 1, which of course also applies to
the continuous variable case. The nice thing about P(Y)s is that if you want
to derive the probability of getting numbers, say, 5 to 9, you simply add their
P(Y)s, which is (4 + 5 + 6 + 5 + 4)/36 = 24/36. The probability density function
is displayed below.

Is Y normally distributed? Well, not quite! To be normally distributed, Y must
be a continuous variable and its probability distribution P(Y) must be bell
shaped.

Y is a discrete variable and its probability distribution P(Y) is not bell shaped.

(c) μ = ΣY.P(Y) = 7

σ² = Σ(Yᵢ – μ)².P(Yᵢ) = 5.8333

σ = √σ² = 2.4152. σ is a measure of the dispersion or variation of Y. In the
case of a normal distribution, about 2/3 of its values fall within the range μ –
σ to μ + σ. Applied to the case of Y, μ – σ = 4.6 and μ + σ = 9.4. Well, Y is a
discrete variable and not normally distributed either, but let’s approximate it
by the range 5-9. The probability of Y falling within this range is indeed 2/3
(sum its P(Y) values: 1/36(4 + 5 + 6 + 5 + 4) = 24/36 = 2/3).
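The frequency table and the values of μ, σ² and σ can be reproduced by enumerating all 36 combinations of two throws. The following Python sketch is one possible way of doing so; the names `freq`, `prob` and `p_5_to_9` are our own.

```python
from collections import Counter
from fractions import Fraction
from itertools import product
from math import sqrt

# Count how often each sum Y occurs among the 36 equally likely
# (throw 1, throw 2) combinations.
freq = Counter(a + b for a, b in product(range(1, 7), repeat=2))

# P(Y) = frequency of Y / 36
prob = {y: Fraction(f, 36) for y, f in freq.items()}

mode = max(freq, key=freq.get)                          # most frequent Y
mu = sum(y * p for y, p in prob.items())                # mean
var = sum((y - mu) ** 2 * p for y, p in prob.items())   # variance
sigma = sqrt(var)                                       # standard deviation

# Probability that Y falls in the range 5 to 9
p_5_to_9 = sum(prob[y] for y in range(5, 10))

for y in sorted(prob):
    print(y, freq[y], round(float(prob[y]), 4))
print(mu, round(float(var), 4), round(sigma, 4), p_5_to_9)
```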

TASK 17.1.3
Explain why there is a need for continuous variables. How do we interpret P(X) for
continuous variables? When is a continuous variable normally distributed? What
is a standardised variable?

7 ANSWER
In real life the outcomes of random variables are often not countable numbers. Often
the values of random variables are real numbers which may include decimal
fractions. For example, a continuous random variable, u, may assume the value
of -4.7636 (rounded to 4 decimals). In regression analysis, the error term values
are typically real numbers which fluctuate around an average value of 0.

Continuous random variables, say variable X, allow for real numbers. Continuous
random variables often occur over an interval, say from -20.8 to +30.2. It is
even possible that we do not specify their minimum or maximum X-values!
For example, it is possible that –∞ < X < +∞ where ∞ indicates infinity, as in the
case of the normal distribution.

But how do we deal with their probability density functions P(X)? The P(X)
curve is defined such that the total area under the curve = 1. We cannot speak
of the probability of obtaining a, say, X = 7 value, because the probability of
obtaining exactly X = 7 is zero (there is no area under the curve at a single point).
We instead deal with the probability across a range of X-values, for example 4 ≤ X ≤ 7.

An example of a discrete distribution that is approximately normally distributed
is provided in the accompanying chart. This refers to the case where X is the sum
of six throws of the die. In this case, the minimum value of X = 6 (6 x 1) and the
maximum value of X = 36 (6 x 6).

The value of X = 21 occurs most frequently. The sum of the probabilities of obtaining
values in both tail ends (that is, relatively large deviations from the average), say X
≤ 12 plus X ≥ 30, is relatively small.
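The distribution of the sum of six throws can be built up one throw at a time. The Python sketch below is an illustration only; the function name `dice_sum_distribution` is hypothetical. It confirms the range of X, the most frequent value and the small probability in the tails.

```python
from fractions import Fraction

def dice_sum_distribution(n_dice):
    """Return {sum: probability} for the sum of n_dice fair dice."""
    dist = {0: Fraction(1)}          # before any throw the sum is 0
    for _ in range(n_dice):
        nxt = {}
        for total, p in dist.items():
            for face in range(1, 7):
                # each face adds to the running total with probability 1/6
                nxt[total + face] = nxt.get(total + face, Fraction(0)) + p / 6
        dist = nxt
    return dist

dist = dice_sum_distribution(6)
mode = max(dist, key=dist.get)       # value of X that occurs most
tails = sum(p for x, p in dist.items() if x <= 12 or x >= 30)

print(min(dist), max(dist), mode, float(tails))
```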


We use standardised Z-values to look up probabilities of the normal distribution,
in which case Z = (X – μ)/σ. The probability of, say, -1 ≤ Z ≤ 1 is represented by the
area under the P(Z) curve from Z = -1 to +1. See the accompanying chart. According
to table B7, the area under the curve for Z ≥ +1 is 0.1587; similarly, the area for
Z ≤ -1 is 0.1587. Thus the area below the curve for -1 ≤ Z ≤ 1 is 1 – 2(0.1587) = 0.6826.
Consequently, the probability that a continuous normally distributed random variable will
fall within the range μ – σ to μ + σ is 68.26%.
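If no table is at hand, the tail areas can be computed directly. The sketch below uses the complementary error function from Python's standard library; the helper name `upper_tail` is our own. Because the table value 0.1587 is rounded, the computed area between -1 and +1 may differ from 0.6826 in the fourth decimal.

```python
from math import erfc, sqrt

def upper_tail(z):
    """P(Z >= z) for a standard normal variable Z."""
    return 0.5 * erfc(z / sqrt(2))

tail = upper_tail(1.0)             # area to the right of Z = +1
within_one_sigma = 1 - 2 * tail    # area between Z = -1 and Z = +1

print(round(tail, 4), round(within_one_sigma, 4))
```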

TASK 17.1.4
(a) Explain what is meant by the expected value of a random discrete variable
(X). Its P(X) and X.P(X) are provided in table 17.1.4.

(b) What is the meaning of bias in the case of a sample distribution (X) used to
measure an unknown population parameter μ?

(c) Explain how an expected value is derived in the case of a random continuous
variable Z of which P(Z) is known.

Table 17.1.4

X      P(X)      X.P(X)
3      1/216     0.0139
4      3/216     0.0556
5      6/216     0.1389
6      10/216    0.2778
7      15/216    0.4861
8      21/216    0.7778
9      25/216    1.0417
10     27/216    1.2500
11     27/216    1.3750
12     25/216    1.3889
13     21/216    1.2639
14     15/216    0.9722
15     10/216    0.6944
16     6/216     0.4444
17     3/216     0.2361
18     1/216     0.0833
Sum    1.000     10.5000

8 ANSWERS
The expected value of a random discrete variable (X), E(X) is its weighted average:

∑X.P(X) = 10.5.

Assume that variable X is the sample estimate of a population parameter μ. Also
assume the sample estimates vary from 3 to 18 and that their P(X) is as in table
17.1.4.

If E(X) = μ = 10.5, then the estimator is unbiased. If, say, E(X) = 12, while μ = 10.5,
then the estimator is biased. Bias occurs when the estimator tends to overestimate
or underestimate the true value.

In the case of a random continuous variable Z:

E(Z) = ∫Z.P(Z).dZ where ∫P(Z).dZ = 1.
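The three answers can be illustrated numerically. In the Python sketch below the probabilities are copied from table 17.1.4. The helper names `bias`, `normal_pdf` and `integrate` are our own, and the use of the standard normal density for P(Z) is an assumption made purely for illustration.

```python
from fractions import Fraction
from math import exp, pi, sqrt

# Discrete case: numerators of P(X) in table 17.1.4, for X = 3, ..., 18
numerators = [1, 3, 6, 10, 15, 21, 25, 27, 27, 25, 21, 15, 10, 6, 3, 1]
prob = {x: Fraction(n, 216) for x, n in zip(range(3, 19), numerators)}
expected_x = sum(x * p for x, p in prob.items())   # E(X) = sum of X.P(X)

def bias(expected_value, true_value):
    """Bias = expected value of the estimator minus the true parameter."""
    return expected_value - true_value

# Continuous case: approximate the integrals with the midpoint rule
def normal_pdf(z):
    return exp(-z * z / 2) / sqrt(2 * pi)

def integrate(f, lo, hi, steps=16_000):
    width = (hi - lo) / steps
    return sum(f(lo + (i + 0.5) * width) for i in range(steps)) * width

area = integrate(normal_pdf, -8, 8)                        # close to 1
expected_z = integrate(lambda z: z * normal_pdf(z), -8, 8) # close to 0

print(expected_x, bias(expected_x, Fraction(21, 2)), bias(12, 10.5))
```

A bias of zero means the estimator is unbiased; a bias of 1.5 corresponds to the example where E(X) = 12 while μ = 10.5.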

TASK 17.1.5
Psychologists tell us that the intelligence quotient (IQ) of the population is normally
distributed with average μ = 100 and the standard deviation σ = 15.

(a) Compile a table which indicates which proportion of the population has an IQ
exceeding (or equal to) 100, 110, 120, 130, 140 and 145, respectively. In the
process, also indicate the standardised Z-values. Look up the probabilities
in table B-7 (the normal distribution). Also indicate how many persons of a
population of 10 000 persons fall within each group.

(b) Explain why IQ is normally distributed within a population. Refer to the central
limit theorem.


9 ANSWERS
(a)

IQ (X)                                 Probability      Number of persons in
(greater or equal to)   z = (X – μ)/σ   that Z > z       population of 10 000
100                     0.00            0.5000           5 000
110                     0.67            0.2514           2 514
120                     1.33            0.0918             918
130                     2.00            0.0228             228
140                     2.67            0.0038              38
145                     3.00            0.0013              13
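The table can also be compiled by computer instead of looking up table B-7. The Python sketch below is one way of doing this; it rounds z to two decimals first, as one would when using a printed table, and the helper name `upper_tail` is our own.

```python
from math import erfc, sqrt

MU, SIGMA, POPULATION = 100, 15, 10_000

def upper_tail(z):
    """P(Z > z) for a standard normal variable Z."""
    return 0.5 * erfc(z / sqrt(2))

rows = []
for iq in (100, 110, 120, 130, 140, 145):
    z = round((iq - MU) / SIGMA, 2)   # standardised value, two decimals
    p = upper_tail(z)                 # proportion with IQ of at least iq
    rows.append((iq, z, round(p, 4), round(p * POPULATION)))

for row in rows:
    print(*row)
```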

(b) The central limit theorem states that if Z is a standardised sum of N independent
and identically distributed random variables, then the probability distribution
of Z approaches the normal distribution as N increases. See page 552. IQ is normally
distributed because it reflects the cumulative outcome of a large number of
hereditary and environmental factors. See page 554.
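The central limit theorem can be illustrated by simulation. In the Python sketch below, each Z is the standardised sum of N = 30 throws of a die; the values of N, the number of repetitions and the random seed are arbitrary choices of ours. If the theorem holds, roughly 68% of the simulated Z-values should fall between -1 and +1, although the match is only approximate because the sum of dice is discrete.

```python
import random
from math import sqrt

random.seed(0)                  # fixed seed so the run can be repeated
N, REPS = 30, 20_000            # throws per sum, number of simulated sums
MU_X, VAR_X = 3.5, 35 / 12      # mean and variance of a single throw

def standardised_sum():
    """Sum N throws, subtract the mean of the sum, divide by its std dev."""
    total = sum(random.randint(1, 6) for _ in range(N))
    return (total - N * MU_X) / sqrt(N * VAR_X)

z_values = [standardised_sum() for _ in range(REPS)]
mean_z = sum(z_values) / REPS
share_within_one = sum(-1 <= z <= 1 for z in z_values) / REPS

print(round(mean_z, 3), round(share_within_one, 3))
```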

17.2 SAMPLING
This section deals with topics on selection bias, survivor bias, nonresponse bias and
the power of random selection.

Studenmund provides a good overview of some sample selection methods and of
sampling error. Our interest, however, does not lie with the different sample selection
methods. Within econometrics, sampling is important because we use the concept
of the sampling distribution. Thus, you need to focus only on the following aspects:
• the difference between the population and the sample
• the meaning of sampling error
• the meaning of statistical inference

TASK 17.2.1
Explain:

• What is sampling in general?
• Why do sampling concepts arise in econometrics?

10 ANSWERS
Sampling is the process of selecting only some units (for example, people or
organisations) from a total population of interest. For example, we can select a sample
of, say, 50 students from the population of 250 000 Unisa students. The beauty
of sampling is that the characteristics of the sample quite often accurately reflect
those of the population. Statistical inference refers to the process of estimating
population parameters (mean, total, ratios, et cetera) from the sample estimates,
and of providing suitable measures of their accuracy.

In econometrics, we use sample data for estimation. An example is the house
price regression of chapter 1 where the sample of 43 houses¹ is a subset of all
houses sold in Southern California during a given time period. In econometrics,
we make the distinction between the population regression function (PRF) and
the sample regression function (SRF). The PRF refers to the true but unknown
regression equation. The PRF is a theoretical construct. It is not something we
would normally estimate because often not all population values are known, or the
population is impractical to measure. In contrast, the SRF is a practical concept.
The SRF is based on data which we observe. In practice, we estimate the SRF.

Given that we use sampling, we can expect that the sample estimates of parameters
will fluctuate around their true population parameters. This is called sampling error.
Parameters refer to statistical measures such as the mean or standard deviation.
In econometrics our interest lies mainly with the coefficients of a regression
equation, which may also be called parameters.

17.3 ESTIMATION
This section deals with sampling distributions, the mean of the sampling distribution,
the standard deviation of the sampling distribution, the t-distribution, confidence
intervals and sampling from finite populations.

We expect you to understand

• the meaning of a sampling distribution, and its expected value and standard
deviation
• the meaning of systematic error (or bias)
• the meaning of the t-distribution

The importance of the sampling distribution

Please refer back to the statement made by Kennedy at the beginning of this study
unit. We sometimes have different estimators which have different sampling
distributions. For example, we will come across the econometric problem of serial
correlation (chapter 9) which affects the accuracy of estimates. In this case we then have
the choice of two estimators, normal OLS, and the method of GLS. The choice of
the better estimator then rests upon the characteristics of its sampling distribution.

To determine the best estimator, we ask four questions:

• Is the estimator unbiased?
• What is the size of its standard error?
• Are the estimates of its standard error unbiased?
• What impact does an increased sample size have on these characteristics?

¹ The sample in Studenmund only includes real estate transactions of the past four weeks.


TASK 17.3.1
This task addresses the sampling distribution of a sample estimator. In this case,
the sample estimator is X̄, that is, the average of a sample of X-values drawn
from a population of X-values. The question is how well X̄ will match the true
population average.

In study unit 4 we will again deal with the sampling distribution. In that case, our
interest lies with the sampling distribution of β̂ where β̂ is a sample estimate of a
coefficient of a regression equation. In both cases, however, the principle of a
sampling distribution is similar.

Explain the meaning of

• the sampling distribution of X̄
• the expected value of X̄, as well as bias
• the standard deviation (also called standard error) of X̄

11 ANSWERS
The easiest way to explain the meaning of a sampling distribution is to use a
simulation approach. The following steps outline this approach:

(1) The first step is to define precisely what characteristic of the population we
wish to measure. Assume that we wish to determine the population average
(or mean) of variable X of the population.
(2) In this case we need to determine whether the sample average (X̄), based
on a random sample, is a good estimator of the population average (μ).
The goal of the procedure is to determine how well the sample estimator
X̄ performs.
(3) We create a known population by simply generating, say, 50 000 random
values of X.
(4) We then sample repeatedly from this population by random selection of,
say, samples of 20 observations each.
(5) We calculate the sample mean of each sample (X̄). We record these
estimates into a histogram.
(6) The distribution of these estimates defines the sampling distribution of X̄.
Because we know the true mean (μ), we can determine how much the sample
estimates of the mean (X̄) deviate from μ.

Your lecturer has applied these steps in practice. First (step 3) observations (X)
for the population were generated which conform to the normal distribution with
the average μ = 100 and a standard deviation of X of σ = 15. This is easily done
by using a PC and MS Excel.

Then a large number of random samples (each of sample size 20) were selected
(step 4). The sample mean of each sample (X̄) was derived and recorded into a
histogram (step 5). The histogram summarises the frequency of the different values
of X̄ obtained from all samples.

With respect to the histogram, the Y-axis measures the relative occurrence of the
values of X̄. The number on the X-axis represents the upper bound, for example,
100.5 represents values of X̄ falling between 99.5 and 100.5.
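The simulation was done in MS Excel, but the same steps can be sketched in Python. The code below is not the lecturer's spreadsheet; the number of samples (5 000) and the random seed are assumptions of ours, since the text states only that "a large number" of samples were drawn.

```python
import random
from math import sqrt
from statistics import mean, pstdev

random.seed(1)                      # fixed seed so the run can be repeated
MU, SIGMA = 100, 15

# Step 3: create a known population of 50 000 normally distributed X-values
population = [random.gauss(MU, SIGMA) for _ in range(50_000)]

# Steps 4 and 5: draw many random samples of size 20 and record each mean
N, N_SAMPLES = 20, 5_000
sample_means = [mean(random.sample(population, N)) for _ in range(N_SAMPLES)]

# Step 6: summarise the sampling distribution of the sample mean
mean_of_means = mean(sample_means)  # should be close to MU (unbiasedness)
sd_of_means = pstdev(sample_means)  # should be close to SIGMA / sqrt(N)
theoretical_se = SIGMA / sqrt(N)

print(round(mean_of_means, 2), round(sd_of_means, 3), round(theoretical_se, 3))
```

The list `sample_means` can be passed to any charting tool to draw the histogram described above.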

Which conclusions can we make based on this sampling distribution?

(1) The first, possibly unexpected, fact is that the outcomes of random sampling
produce a well-behaved distribution! The sample averages appear to cluster
around the true value being estimated and the distribution is symmetric. The
expected value (weighted average) of the sample means of all samples is
equal (or at least very close) to the true value of µ = 100. This implies that the
estimator X̄ is unbiased. Bias in the estimator occurs when the expected
value of X̄ is not equal to µ.
(2) Deviations (X̄ – µ) do occur, which are both positive and negative. However,
in most cases, these deviations are relatively small. Large deviations do
occur, but the probability of this is relatively low.

(3) To further judge the accuracy of a sample value X̄ we need information
regarding the “width” of the sampling distribution of X̄. For example, in the
histogram above almost all observations of X̄ fall within the range μ – 10 to
μ + 10. The SE(X̄) is such a measure (but different from the value of 10)
where SE = standard error (also called standard deviation).
Statistical theory tells us that the SE(X̄) may be derived as follows:

SE(X̄) = SE(X)/√N = σ/√20 = 15/√20 = 3.354 where N is the sample size.

The SE(X̄) measures the “width” of the sampling distribution of X̄. Of
course, the “narrower” the sampling distribution of X̄ is, the more accurate
its estimates are.

(4) The distribution of deviations X̄ – µ conforms to the normal distribution.
This allows us to make probability statements regarding the extent of deviations
X̄ – µ. It is convenient, however, to write this in its standardised form:

Z = (X̄ – μ)/(σ/√N) where σ/√N is the standard error of X̄,

σ is the SE(X) and N is the sample size.


The advantage of this form is that Z is normally distributed with an average
of 0 and a standard error of 1. Because tables of the normal distribution are
published in this form it is then easy to compare values of Z (obtained from
the sample) to that of z (the values published in the table).

The distribution of random sampling estimates of X̄ is highly predictable. We
may assume that a single random sample estimate will conform to this behaviour.

TASK 17.3.2
Explain the meaning of the t-distribution (with respect to sample estimator X̄) and
explain which sources of sampling variation it accounts for.

This task provides some background regarding the t-distribution which will again
appear in the next study unit.

12 ANSWER
In task 17.3.1 (4), reference was made to the standardised form of X̄ – µ, that is

Z = (X̄ – μ)/(σ/√N)        equation A.

Because we only have sample data, the sample will provide values for X̄ and
N. Both μ and σ are, however, unknown. The first (μ) is not really a problem due
to the nature of hypothesis testing. In the next study unit, you will learn that we
simply replace μ with a fixed value, that is a value of which its “compatibility” with
X̄ is tested. The second, σ = SE(X), remains unknown.

Because the sample consists of N values of X, σ may in fact be estimated. An
estimator of σ is s (the square root of the unbiased estimator s² of σ²), where

s = √( Σ(Xᵢ – X̄)²/(N – 1) )        equation B.
N -1
If we replace σ within equation A with its estimate s in equation B, then

t = (X̄ – μ)/(s/√N)        equation C.

Although Z is normally distributed, t follows the t-distribution (with N – 1 degrees
of freedom). The t-distribution copes with two sources of variation, that is, X̄ and s,
which of course vary from sample to sample.
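Equations B and C are easy to apply to a single sample. In the Python sketch below the eight observations and the hypothesised mean of 100 are made-up values, used only to show the mechanics; they do not come from the textbook.

```python
from math import sqrt
from statistics import mean, stdev

sample = [98, 112, 105, 91, 120, 101, 87, 109]   # hypothetical X-values
mu_0 = 100                                       # hypothesised value of mu

n = len(sample)
x_bar = mean(sample)              # sample mean
s = stdev(sample)                 # equation B: divides by N - 1
t = (x_bar - mu_0) / (s / sqrt(n))   # equation C
degrees_of_freedom = n - 1

print(x_bar, round(s, 4), round(t, 4), degrees_of_freedom)
```

Both `x_bar` and `s` would change if another sample were drawn, which is why the t-distribution rather than the normal distribution applies.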

In the next study unit the t-value is also used to test the coefficients of a regression
equation for statistical significance. You only have one sample, and this sample
provides only one estimate each of β̂ and SE(β̂). It is derived as

t = (β̂ – β₀)/SE(β̂)

where β₀ is the H₀ value of the coefficient being tested and β̂ is the sample
estimate of coefficient β.
