Econ Review Stat W2 Jan2023

The document covers fundamental statistical concepts including probability, random variables, and various probability distributions such as normal, chi-square, t, and F distributions. It discusses descriptive statistics, measures of central tendency, spread, and association, as well as statistical inference techniques like estimation and hypothesis testing. Key properties of estimators and the concepts of confidence intervals and errors in hypothesis testing are also outlined.

Part 1: BASIC STATISTICAL CONCEPTS (continued)
0. Descriptive Statistics

1. Probability

2. Random variables

3. Some important theoretical probability distributions

• Normal distribution, Student’s t-distribution
• Chi-square distribution, F distribution

4. Statistical Inference: Estimation

• Point Estimation
• Interval Estimation (Confidence Interval)

5. Statistical Inference: Hypothesis Testing


0. DESCRIPTIVE STATISTICS

‘Introductory Econometrics for Finance’ © Chris Brooks 2019 3


Measures of central tendency

• The average value of a series is its measure of location or measure of
central tendency, capturing its ‘typical’ behaviour
• There are three broad methods to calculate the average value of a series:
the mean, median and mode
• The mean is the very familiar sum of all N observations divided by N
• More strictly, this is known as the arithmetic mean
• The mode is the most frequently occurring value in a set of observations
• The median is the middle value in a series when the observations are
arranged in ascending order
• Each of the three methods of calculating an average has advantages and
disadvantages
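As a quick illustration, the three averages can be computed with Python’s standard library. The data series here is made up for illustration; note the deliberate outlier at the end:

```python
from statistics import mean, median, mode

# A hypothetical series of 11 observations (note the outlier, 30)
data = [2, 3, 3, 4, 5, 5, 5, 6, 7, 8, 30]

print(mean(data))    # arithmetic mean: sum of the N observations divided by N
print(median(data))  # middle value once the observations are sorted
print(mode(data))    # most frequently occurring value
```

The outlier drags the mean (about 7.09) well above the median (5), illustrating one of the trade-offs between the methods: the mean uses all the data but is sensitive to extreme values.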



The geometric mean

• The geometric mean involves calculating the Nth root of the product of
the N observations, more relevant for growth
• So the geometric mean of six numbers in a series would be obtained by
multiplying them together and taking the sixth root
• In finance, when the numbers in the series can be negative or 0 (like
returns), we can use a slightly different method to calculate the geometric
mean
• Here we add one to each data point, then multiply together, take the Nth
root and then subtract one at the end

• where r1, r2 etc. are the data points that we wish to take the geometric
mean of
• The geometric mean will always be smaller than the arithmetic mean
unless all of the data points are the same.
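The add-one method described above can be sketched as a short function. The return series is hypothetical:

```python
def geometric_mean_return(returns):
    """Add one to each return, multiply together, take the Nth root, subtract one."""
    product = 1.0
    for r in returns:
        product *= 1.0 + r
    return product ** (1.0 / len(returns)) - 1.0

# Hypothetical periodic returns, including a negative one
returns = [0.10, -0.05, 0.02]
g = geometric_mean_return(returns)
a = sum(returns) / len(returns)  # arithmetic mean, for comparison
print(g, a)
```

As stated above, the geometric mean comes out below the arithmetic mean whenever the returns are not all identical.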
Measures of spread

• The spread of a series about its mean value can be measured using the
variance or standard deviation (which is the square root of the variance)
• This quantity is an important measure of risk in finance
• The standard deviation scales with the data, whereas the variance scales
with the square of the data. So, for example, if the units of the data
points are US dollars, the standard deviation will also be measured in
dollars whereas the variance will be in dollars squared
• Other measures of spread include the range (the difference between the
largest and smallest of the data points) and the interquartile range (the
difference between the third and first quartile points in the series)
• The coefficient of variation divides the standard deviation by the sample
mean to obtain a unit-free measure of spread that can be compared across
series with different scales.



Higher moments

• The higher moments of a data sample give further indications of its
features and shape.
• Skewness is the standardised third moment of a distribution and indicates
the extent to which it is asymmetric
• Kurtosis is the standardised fourth moment and measures whether a series
is ‘fat’ or ‘thin’ tailed
• Skewness can be positive or negative while kurtosis can only be positive
• The formulae for skewness and kurtosis calculate the quantities from the
sample data in the same way that the variance is calculated
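The standardised third and fourth moments can be computed directly from their definitions. This sketch uses the population standard deviation; textbook sample formulas apply small bias corrections that are omitted here, and the data series are made up:

```python
from statistics import mean, pstdev

def skewness(x):
    """Standardised third moment of the sample."""
    m, s, n = mean(x), pstdev(x), len(x)
    return sum((v - m) ** 3 for v in x) / (n * s ** 3)

def kurtosis(x):
    """Standardised fourth moment; equals 3 for a normal distribution."""
    m, s, n = mean(x), pstdev(x), len(x)
    return sum((v - m) ** 4 for v in x) / (n * s ** 4)

symmetric = [1, 2, 3, 4, 5]          # symmetric, so skewness is zero
right_skewed = [1, 1, 2, 2, 3, 10]   # long right tail, positive skewness
print(skewness(symmetric), skewness(right_skewed), kurtosis(symmetric))
```

The symmetric series has zero skewness, the long right tail produces positive skewness, and kurtosis is always positive, consistent with the bullets above.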



Plot of a skewed series versus a normal distribution



Plot of a leptokurtic series versus a normal
distribution



Measures of association

• Covariance is a linear measure of association between two series.
• It is simple to calculate but scales with the standard deviations of the
two series
• The correlation is another measure of association that is calculated by
dividing the covariance between two series by the product of their standard
deviations
• Correlations are unit-free and must lie in the interval [−1, +1]
• The correlation calculated in this way is more specifically known as
Pearson’s correlation measure between continuous variables.
• An alternative measure is known as Spearman’s rank correlation measure,
involving ordinal variables.
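The definitions above can be sketched directly, using population moments throughout. The two series are made up for illustration and are exactly linearly related, so Pearson’s correlation comes out at its upper bound:

```python
from statistics import mean, pstdev

def covariance(x, y):
    """Average product of deviations from the respective means."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)

def pearson(x, y):
    """Covariance divided by the product of the standard deviations."""
    return covariance(x, y) / (pstdev(x) * pstdev(y))

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]   # exact positive linear relation with x
print(covariance(x, y), pearson(x, y))
```

Flipping the sign of one series flips the correlation to −1, while the covariance changes with the scale of the inputs; this is why the unit-free correlation is usually preferred for comparison.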



Some algebra useful for working with means,
variances and covariances

Means
•The mean of a random variable y is also known as its expected value, written
E(y).
•The expected value of a constant is the constant, e.g. E(c) = c
•The expected value of a constant multiplied by a random variable is equal to
the constant multiplied by the expected value of the variable: E(cy)=c E(y). It
can also be stated that E(cy+d)= cE(y)+d, where d is also a constant.
•For two independent random variables, y1 and y2, E(y1y2) = E(y1) E(y2)



Some algebra useful for working with means,
variances and covariances 2

Variances
•The variance of a random variable y is usually written var(y).
•The variance of a random variable y is given by var(y) = E[(y − E(y))²].
The variance of a constant is zero: var(c) = 0
•For c and d constants, var(cy + d) = c2var(y)
•For two independent random variables, y1 and y2, var(cy1 + dy2) = c2var(y1) +
d2var(y2)
Covariances
•The covariance between two random variables, y1 and y2 may be expressed as
cov(y1, y2)
•cov(y1, y2) = E[(y1 − E(y1))(y2 − E(y2))]
•For two independent random variables, y1 and y2, cov(y1, y2) = 0
•For four constants, c, d, e, and f, cov(c+dy1, e+fy2)=dfcov(y1, y2).
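The variance identities above hold exactly for any finite data series, so they can be checked numerically rather than just asserted. The series and constants here are arbitrary choices for illustration:

```python
from statistics import pvariance

y = [3.0, 7.0, 1.0, 9.0, 5.0]   # any data series
c, d = 2.0, 10.0                # arbitrary constants

shifted_scaled = [c * v + d for v in y]
# var(cy + d) = c^2 var(y): the constant shift d does not change the spread
print(pvariance(shifted_scaled), c ** 2 * pvariance(y))
```

The shift by d moves every point by the same amount and so leaves the deviations from the mean, and hence the variance, untouched; only the scaling by c matters, entering as c².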

1. Normal Distribution

Normal Probability Density Function:


f(x) = (1 / (σ√(2π))) · exp[ −½ ((x − μ)/σ)² ],  for −∞ < x < ∞

[Figure: normal density with μ = 0, σ = 1; bell-shaped curve centred at zero]

The Standard Normal Distribution


Z = (X − μ) / σ

The standard normal random variable, Z, is the normal random variable with
mean μ = 0 and standard deviation σ = 1: Z ~ N(0, 1).

[Figure: standard normal density f(z), symmetric about μ = 0]
Properties of normal distribution

◼ Symmetric about the mean


◼ P(a<x<b)= area of the region between the density function,
horizontal axis and vertical lines x=a, x=b
◼ Sum of independent normal R.V. is normally distributed

◼ Central Limit Theorem: the sample mean is approximately normally
distributed as the sample size grows large
◼ Skewness=0, kurtosis=3
The Central Limit Theorem
Let S_n = X_1 + ... + X_n be the sum of n independent random variables with
the same distribution. Then for large n (n > 30), the distribution of S_n is
approximately normal with mean

E(S_n) = nμ  and  SD(S_n) = σ√n,

where μ = E(X_i) and σ = SD(X_i)
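The theorem can be checked by simulation. Here each X_i is uniform on [0, 1], so μ = 0.5 and σ = √(1/12); the sample sizes and seed are arbitrary choices:

```python
import random
from statistics import mean, stdev

random.seed(42)

# Each X_i is uniform on [0, 1]: mu = 0.5, sigma = sqrt(1/12)
n, trials = 50, 2000
mu, sigma = 0.5, (1 / 12) ** 0.5

sums = [sum(random.random() for _ in range(n)) for _ in range(trials)]

print(mean(sums), n * mu)              # close to E(S_n) = n*mu = 25
print(stdev(sums), sigma * n ** 0.5)   # close to SD(S_n) = sigma*sqrt(n)
```

Even though each uniform draw is far from normal, the distribution of the 2000 simulated sums is close to normal with the mean and standard deviation the theorem predicts.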

2. The Chi-Square (χ²) Distribution


Z_1² + Z_2² + ... + Z_k² ~ χ²_k

✓The chi-square distribution is the probability distribution of the sum of
k independent, squared standard normal random variables.

✓The mean of the chi-square distribution is equal to the degrees of freedom
parameter, E[χ²_k] = k. The variance of a chi-square is equal to twice the
number of degrees of freedom, Var[χ²_k] = 2k.
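Both moments can be verified by simulating sums of squared standard normals. The degrees of freedom, number of trials and seed are arbitrary illustration choices:

```python
import random
from statistics import mean, variance

random.seed(0)

k, trials = 5, 4000
# The sum of k squared standard normals follows a chi-square with k df
draws = [sum(random.gauss(0, 1) ** 2 for _ in range(k)) for _ in range(trials)]

print(mean(draws))       # near E[chi-square] = k = 5
print(variance(draws))   # near Var[chi-square] = 2k = 10
print(min(draws))        # never negative: bounded by zero on the left
```

The simulated draws also exhibit the right skew described on the next slide: most mass sits just below the mean, with a long tail to the right.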

⚫ The chi-square random variable cannot be negative, so it is bounded by
zero on the left.
⚫ The chi-square distribution is skewed to the right.
⚫ The chi-square distribution approaches a normal distribution as the
degrees of freedom increase.
⚫ The sum of independent chi-square RVs is also a chi-square RV.

[Figure: chi-square densities for df = 10, 30, 50; the curves become more
symmetric as df increases]
3. The t Distribution (Student’s)

t = Z / √(χ²_k / k) ~ t_k

• The t distribution is a family of bell-shaped and symmetric distributions.
• The expected value of the t distribution is 0.
• For df > 2, the variance of the t distribution is df/(df − 2).
• The t distribution is flatter and has fatter tails than the standard
normal.
• The t distribution approaches a standard normal as the number of degrees
of freedom increases.

[Figure: standard normal density compared with t densities for df = 10 and
df = 20; the t curves have fatter tails]
4. The F Distribution

The F distribution is the distribution of the ratio of two chi-square random
variables that are independent of each other, each of which is divided by
its own degrees of freedom.

An F random variable with k1 and k2 degrees of freedom:

F(k1, k2) = (χ²_k1 / k1) / (χ²_k2 / k2)
[Figure: F densities for (5, 6), (10, 15) and (25, 30) degrees of freedom]

• The F random variable cannot be negative, so it is bounded by zero on the
left.
• The F distribution is skewed to the right.
• The F distribution is identified by the number of degrees of freedom in
the numerator, k1, and the number of degrees of freedom in the denominator,
k2.
5. Estimators and Properties

An estimator of a population parameter is a sample statistic used to
estimate the parameter. The most commonly used estimators are:
Population Parameter            Sample Statistic
Mean (μ)                        Sample mean (x̄)
Variance (σ²)                   Sample variance (s²)
Standard deviation (σ)          Sample standard deviation (s)
Proportion (p)                  Sample proportion (p̂)

• Desirable properties of estimators include:


✓Unbiasedness
✓Efficiency
✓Consistency
Unbiased and Biased Estimators

An unbiased estimator is on target on average. A biased estimator is off
target on average; the gap between its average value and the true parameter
is the bias.

[Figure: sampling distributions of an unbiased and a biased estimator]
Efficiency

An estimator is efficient if it has a relatively small variance (and
standard deviation).

An efficient estimator is, on average, closer to the parameter being
estimated. An inefficient estimator is, on average, farther from the
parameter being estimated.
Consistency

An estimator is said to be consistent if its probability of being close


to the parameter it estimates increases as the sample size increases.
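Consistency of the sample mean can be illustrated by simulation: estimate the probability that the sample mean lands within 0.5 of the true mean for a small and a large sample. The parameter values, window and seed are arbitrary illustration choices:

```python
import random
from statistics import mean

random.seed(7)
mu, sigma = 5.0, 2.0   # hypothetical population parameters

def sample_mean(n):
    return mean(random.gauss(mu, sigma) for _ in range(n))

# Fraction of simulated sample means falling within 0.5 of mu
trials = 500
close_small = sum(abs(sample_mean(10) - mu) < 0.5 for _ in range(trials)) / trials
close_large = sum(abs(sample_mean(1000) - mu) < 0.5 for _ in range(trials)) / trials
print(close_small, close_large)
```

With n = 1000 essentially every sample mean lands inside the window, while with n = 10 a sizeable fraction falls outside: the probability of being close to the parameter rises with the sample size, which is exactly the consistency property.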

[Figure: sampling distributions of a consistent estimator for n = 10 and
n = 100; the distribution concentrates around the parameter as n grows]
6. Confidence Interval (Interval Estimate)

Two types of estimates:

• Point Estimate
✓A single-valued estimate.
✓Conveys little information about the actual value of the population
parameter or about the accuracy of the estimate.

• Interval Estimate (or Confidence Interval)


✓An interval or range of values believed to include the
unknown population parameter.
✓Associated with the interval is a measure of the
confidence we have that the interval does indeed contain
the parameter of interest.
Confidence Interval for the mean μ by sample mean

According to the Central Limit Theorem, X̄ ~ N(μ, σ²/n), so a 95% confidence
interval for μ is x̄ ± 1.96 σ/√n.

[Figure: standard normal density with the central 95% interval between
z = −1.96 and z = +1.96 shaded]
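A minimal sketch of the interval estimate, where the sample values are made up and the sample standard deviation s stands in for σ (a common large-sample simplification); 1.96 is the standard normal 95% critical value:

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical sample of 20 observations
sample = [12.1, 11.8, 12.5, 12.0, 11.6, 12.3, 12.2, 11.9, 12.4, 12.0,
          11.7, 12.6, 12.1, 11.8, 12.2, 12.0, 12.3, 11.9, 12.1, 12.4]
n = len(sample)
x_bar, s = mean(sample), stdev(sample)

z = 1.96                        # 95% critical value from the standard normal
half_width = z * s / sqrt(n)
lower, upper = x_bar - half_width, x_bar + half_width
print(lower, upper)
```

Unlike the single-valued point estimate x_bar, the interval (lower, upper) carries a statement of confidence: under repeated sampling, about 95% of intervals built this way would contain the unknown μ.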
7. Statistical Hypothesis Testing

• A null hypothesis, denoted by H0, is an assertion about one or more
population parameters. This is the assertion we hold to be true until we
have sufficient statistical evidence to conclude otherwise.

✓H0: μ = 100
• The alternative hypothesis, denoted by H1, is the assertion of
all situations not covered by the null hypothesis.

✓H1: μ ≠ 100
The Null Hypothesis, H0

• The null hypothesis:


✓Often represents an existing belief.
✓Is maintained to be true, until a test leads to its rejection
in favor of the alternative hypothesis.
✓Is accepted as true, or rejected as false on the basis of a
consideration of a test statistic.
Concepts of Hypothesis Testing

• A test statistic is a sample statistic computed from sample data. The
value of the test statistic is used in determining whether or not we may
reject the null hypothesis.
• The decision rule of a statistical hypothesis test is a rule
that specifies the conditions under which the null hypothesis
may be rejected.

Consider H0:  = 100. We may have a decision rule that says: “Reject
H0 if the sample mean is less than 95 or more than 105.”

In a courtroom we may say: “The accused is innocent until proven guilty
beyond a reasonable doubt.”
Decision Making

• There are two possible states of nature:


✓H0 is true
✓H0 is false
• There are two possible decisions:
✓Fail to reject H0 as true
✓Reject H0 as false
Type I and Type II Errors

A contingency table illustrates the four possible outcomes of a statistical
hypothesis test. A Type I error is rejecting a true null hypothesis; its
probability is the significance level α. A Type II error is failing to
reject a false null hypothesis; its probability is denoted β.
1-Tailed and 2-Tailed Tests

The tails of a statistical test are determined by the need for an action. If
action is to be taken if a parameter is greater than some value a, then the
alternative hypothesis is that the parameter is greater than a, and the test
is a right-tailed test.
    H0: μ ≤ 50
    H1: μ > 50

If action is to be taken if a parameter is less than some value a, then the
alternative hypothesis is that the parameter is less than a, and the test is
a left-tailed test.
    H0: μ ≥ 50
    H1: μ < 50

If action is to be taken if a parameter is either greater than or less than
some value a, then the alternative hypothesis is that the parameter is not
equal to a, and the test is a two-tailed test.
    H0: μ = 50
    H1: μ ≠ 50

Rejection Region

• The rejection region of a statistical hypothesis test is the range of
numbers that will lead us to reject the null hypothesis in case the test
statistic falls within this range.
• The rejection region, also called the critical region, is defined by the
critical points.
• The rejection region is defined so that, before the sampling takes place,
our test statistic will have a probability α of falling within the rejection
region if the null hypothesis is true.
Five-Step Procedure for
Hypothesis Testing: 1st approach

• Step 1: State the null hypothesis H0.

• Step 2: State the alternative hypothesis H1.

• Step 3: Compute the test statistic (T.S) value.

• Step 4: Determine the rejection region for a


given level of significance α.

• Step 5: Conclusion.
Five-Step Procedure for
Hypothesis Testing: 2nd approach

• Using a confidence interval (for two-tailed tests)



Testing Population Means

• Cases in which the test statistic is Z (when using σ)

The formula for calculating Z is:

z = (x̄ − μ) / (σ / √n)
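The five-step procedure with a Z statistic can be sketched end to end. The hypothesised mean, population σ and sample results below are made-up illustration values, and the normal CDF is built from the standard library’s error function:

```python
from math import erf, sqrt

def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + erf(x / sqrt(2)))

# Steps 1-2: H0: mu = 100 against H1: mu != 100 (two-tailed test)
mu0, sigma = 100.0, 15.0    # hypothesised mean and known population sigma
x_bar, n = 106.0, 36        # observed sample mean and sample size
alpha = 0.05

# Step 3: compute the test statistic
z = (x_bar - mu0) / (sigma / sqrt(n))

# Step 4: rejection region at alpha = 0.05 for a two-tailed test is |z| > 1.96
reject = abs(z) > 1.96

# p-value route (3rd approach): tail area beyond the observed statistic
p_value = 2 * (1 - normal_cdf(abs(z)))
print(z, reject, p_value)
```

Here z = 2.4 falls in the rejection region and the two-tailed p-value is below α, so both the rejection-region rule and the p-value rule lead to the same Step 5 conclusion: reject H0.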

Testing Population Means

• Cases in which the test statistic is t (when using s)

The formula for calculating t is:

t = (x̄ − μ) / (s / √n)

The p-Value

The p-value is the smallest level of significance, α, at which the null
hypothesis may be rejected using the obtained value of the test statistic.

Equivalently, the p-value is the area of the rejection region that results
when the critical value is set equal to the observed test statistic.

RULE: When the p-value is less than α, reject H0.


Five-Step procedure for Hypothesis
Testing: 3rd approach using p-value

• Step 1: State the null hypothesis H0.

• Step 2: State the alternative hypothesis H1.

• Step 3: Compute the test statistic (T.S) value.

• Step 4: Get the p-value corresponding to TS


(with software).

• Step 5: Compare the p-value and the level of


significance α.

• If p-value<α we reject H0.


• Else we don’t reject H0.
