Sampling and Estimation

Statistical Analysis in Finance

Session 3:
Sampling and Estimation

Dr. Nemanja Radić

www.cranfield.ac.uk/som

Content:
Sessions 1 & 2: Probability and Probability Distributions
SESSION 3: SAMPLING AND ESTIMATION
Session 4: Hypothesis Testing
Session 5: Problem Solving
Sessions 6 & 7: Regression Analysis
Session 8: Regression Models with Dummy Variables
Sessions 9 & 10: Problem Solving and Exam Revision

Reading:
Statistical Techniques in Business and Economics (17/E) by Douglas A. Lind, William G. Marchal and Samuel A. Wathen, 2017. McGraw-Hill. Chapters 8 and 9.

Intended Learning Outcomes

• Understand simple random sampling, sampling distribution and sampling error.

• Understand the Central Limit Theorem and its importance.

• Be familiar with techniques of point estimation.

• Be able to estimate confidence intervals for a variety of data.

Sampling

• Population: consists of all members of a specified group.

• The population parameter is unknown.

• Sample: a subset of the population.

• A sample statistic is calculated from the sample and used to make inferences about the population.

Most Commonly Used Probability Sampling Methods

• Simple Random Sampling:
  • A sample selected so that each item or person in the population has the same chance of being included.

• Systematic Random Sampling:
  • The items or individuals of the population are arranged in some order. A random starting point is selected and then every kth member of the population is selected for the sample.

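The two methods above can be sketched in a few lines of Python. This is only an illustration: the function names, the population of 100 numbered items and the choice k = 10 are assumptions made for the example, not part of the slides.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n items so that every item has the same chance of inclusion."""
    rng = random.Random(seed)
    return rng.sample(population, n)

def systematic_sample(population, k, seed=None):
    """Pick a random starting point among the first k items, then take every kth item."""
    rng = random.Random(seed)
    start = rng.randrange(k)
    return population[start::k]

# Hypothetical population: items numbered 1 to 100, already arranged in order.
population = list(range(1, 101))
srs = simple_random_sample(population, 10, seed=1)
sys_sample = systematic_sample(population, 10, seed=1)
print(srs)
print(sys_sample)
```

With 100 items and k = 10, the systematic sample always contains 10 items spaced exactly 10 apart; only the starting point is random.
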
Most Commonly Used Probability Sampling Methods (cont’d)

• Stratified Random Sampling:
  • A population is first divided into subgroups, called strata, and a sample is selected from each stratum. Useful when a population can be clearly divided into groups based on some characteristic.

• Cluster Sampling:
  • A population is divided into clusters using naturally occurring geographic or other boundaries. Then, clusters are randomly selected and a sample is collected by randomly selecting from each selected cluster.
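
A minimal Python sketch of stratified and cluster sampling follows. The group names, group sizes and function names are hypothetical; the same dictionary is used both as a set of strata and as a set of clusters purely to keep the example short.

```python
import random

def stratified_sample(strata, n_per_stratum, seed=None):
    """Select n_per_stratum members from EVERY stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

def cluster_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Randomly select some clusters, then sample only within the selected ones."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: rng.sample(clusters[name], n_per_cluster) for name in chosen}

# Hypothetical groups of 20 companies each.
groups = {
    "banks": [f"bank_{i}" for i in range(20)],
    "insurers": [f"insurer_{i}" for i in range(20)],
    "retailers": [f"retailer_{i}" for i in range(20)],
}
strat = stratified_sample(groups, 3, seed=42)
clust = cluster_sample(groups, 2, 3, seed=42)
print(strat)
print(clust)
```

The stratified sample contains members of all three groups, while the cluster sample contains members of only the two selected groups.
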

Stratified versus Cluster Sampling

• Stratified Sampling
  • Sample consists of elements from each group.
  • Preferred when the objective is to increase precision.

• Cluster Sampling
  • Sample consists of elements from the selected groups.
  • Preferred when the objective is to reduce costs.

Selecting Samples in Finance

• Investment analysts commonly work with both time-series and cross-sectional data.

• There is no economic basis for deciding how long a time series should be.
  • We may need to combine data from two different periods, such as fixed and floating exchange rate regimes.
  • As a consequence, we would not be sampling from a population described by a single set of parameters.

• Whenever we sample cross-sectionally, certain assumptions must be met if we wish to summarize the data in a meaningful way.
  • For example, we might choose to summarize company-level data by industry.

Parameter versus Statistics

• A population is described by parameters.
  • A parameter is a constant, whose value may be unknown.
  • There is only one population.
• A sample is described by statistics.
  • A statistic is a random variable whose value depends on the chosen random sample.
  • Statistics are used to make inferences about the population parameters.
  • We can draw multiple random samples of size n.

Sampling Error

The sampling error is the difference between a sample statistic and its corresponding population parameter.

Examples:
$\bar{X} - \mu$
$s - \sigma$
$s^2 - \sigma^2$
$p - \pi$

Sampling Distribution of the Sample Mean

• The sampling distribution of the sample mean is a probability distribution consisting of all possible sample means of a given sample size selected from a population.

• It is not to be confused with the sample distribution, i.e. the distribution of values in a sample (notice the -ing ending).

• To get the sampling distribution of a sample mean, we first need to select all possible samples of the same size from the population, calculate the mean of each sample, and then construct the distribution of all the means we calculated.

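The construction described above can be sketched in Python by enumerating every possible sample. The seven population values below are hypothetical, chosen only for illustration, and samples are drawn without replacement.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

# Hypothetical population of seven values (not taken from the slides).
population = [14, 15, 16, 16, 17, 18, 19]
mu = Fraction(sum(population), len(population))

# Every possible sample of size 2 (without replacement) and its mean.
sample_means = [Fraction(sum(s), 2) for s in combinations(population, 2)]

# The sampling distribution: each possible sample mean and how often it occurs.
distribution = Counter(sample_means)
mean_of_means = sum(sample_means) / len(sample_means)

# Sampling error of each sample: sample mean minus population mean.
sampling_errors = [m - mu for m in sample_means]

for value, count in sorted(distribution.items()):
    print(float(value), count)
print("population mean:", float(mu), "mean of sample means:", float(mean_of_means))
```

Exact fractions are used so that the comparison between the population mean and the mean of the sample means is not blurred by rounding. Individual sampling errors are positive or negative, but they cancel out across all possible samples.
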
Sampling Distribution of the Sample Means – Example 1

A firm has seven production employees (considered the population). The hourly earnings of each employee are given in the table below.

1. What is the population mean?
2. What is the sampling distribution of the sample mean for samples of size 2?
3. What is the mean of the sampling distribution?
4. What observations can be made about the population and the sampling distribution?


Central Limit Theorem

If all samples of a particular size are selected from any population, the sampling distribution of the sample mean is approximately a normal distribution. This approximation improves with larger samples.

• If the population follows a normal probability distribution, then for any sample size the sampling distribution of the sample mean will also be normal.

• If the population distribution is symmetrical (but not normal), the normal shape of the distribution of the sample mean emerges with samples as small as 10.

• If the population distribution is skewed or has thick tails, it may require samples of 30 or more to observe the normality feature.

• The mean of the sampling distribution, $\mu_{\bar{X}}$, is equal to μ and its variance is equal to σ²/n.

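The last bullet can be checked exactly for a small case. The sketch below assumes a hypothetical population (the six faces of a fair die) and independent draws with replacement; it verifies only the mean and variance of the sampling distribution, not its normal shape.

```python
from fractions import Fraction
from itertools import product

# Hypothetical population: the six faces of a fair die.
population = [1, 2, 3, 4, 5, 6]
N = len(population)
mu = Fraction(sum(population), N)
sigma2 = sum((Fraction(x) - mu) ** 2 for x in population) / N

def sampling_mean_and_variance(n):
    """Mean and variance of the sample mean over all samples of size n (with replacement)."""
    means = [Fraction(sum(s), n) for s in product(population, repeat=n)]
    m = sum(means) / len(means)
    v = sum((x - m) ** 2 for x in means) / len(means)
    return m, v

for n in (1, 2, 3, 4):
    m, v = sampling_mean_and_variance(n)
    print(n, m, v, sigma2 / n)
```

For every n, the second column equals μ and the third column equals σ²/n.
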
Sampling Methods and the Central Limit Theorem


Point Estimate

• A point estimate is a single value (point) derived from a sample and used to estimate a population value.

$\bar{X} \rightarrow \mu$
$s \rightarrow \sigma$
$s^2 \rightarrow \sigma^2$
$p \rightarrow \pi$
Confidence Interval (C.I.)

• A C.I. estimate is a range of values constructed from sample data so that the population parameter is likely to occur within that range at a specified probability.
  • The specified probability is called the degree of confidence, symbolised as 1 – α.
  • α denotes the probability of error, also known as the level of significance. This is the allowed probability that the estimation procedure will generate an interval that does not contain the true parameter.

• If we let α = 5%, we are 100(1 – α)% = 95% confident that a single 95% C.I. contains the population mean.
  • We are justified in making this statement because we know that 95% of all possible C.I.s constructed in the same manner will contain the population mean.

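The "95% of all possible C.I.s" statement can be illustrated with a small simulation. The sketch below assumes a hypothetical normal population (μ = 100, σ = 15), samples of size 25 and a known σ; all of these values are illustrative assumptions.

```python
import random
from statistics import NormalDist, mean

def coverage(mu=100.0, sigma=15.0, n=25, alpha=0.05, trials=2000, seed=0):
    """Fraction of simulated confidence intervals that contain the true mean."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - alpha / 2)      # reliability factor
    half_width = z * sigma / n ** 0.5            # z times the standard error
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = mean(sample)
        if xbar - half_width <= mu <= xbar + half_width:
            hits += 1
    return hits / trials

rate = coverage()
print(rate)
```

The printed rate should be close to 0.95, although any single simulated interval either contains μ or does not.
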

Construction of C.I.

• A 100(1 – α)% confidence interval for a parameter has the following structure:

• Point estimate ± Reliability factor × Standard error

• Point estimate = the value of the sample statistic.

• Reliability factor = a number based on the assumed distribution of the point estimate and the degree of confidence (1 – α) for the C.I.

• Standard error = the standard error of the sample statistic providing the point estimate (for the sample mean, the standard deviation of the sample means).
Factors affecting confidence interval estimates

The width of a confidence interval is determined by:

1. The sample size, n.

2. The variability in the population, σ, usually estimated by s.

3. The desired level of confidence.

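A short sketch of how each of the three factors changes the width, assuming the z-based interval with known σ. The function name and the input values are hypothetical.

```python
from statistics import NormalDist

def ci_width(sigma, n, confidence):
    """Full width of a z-based confidence interval for a mean: 2 * z * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return 2 * z * sigma / n ** 0.5

base = ci_width(sigma=10, n=25, confidence=0.95)
larger_n = ci_width(sigma=10, n=100, confidence=0.95)     # four times the sample size
more_var = ci_width(sigma=20, n=25, confidence=0.95)      # twice the variability
higher_conf = ci_width(sigma=10, n=25, confidence=0.99)   # higher confidence level

print(base, larger_n, more_var, higher_conf)
```

Quadrupling n halves the width, doubling σ doubles it, and raising the confidence level widens the interval.
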

Confidence Intervals for a Mean – σ Known

• A 100(1 – α)% confidence interval for the population mean μ, when we are sampling from a normal distribution with known variance σ², is given by

$$\bar{X} \pm z \frac{\sigma}{\sqrt{n}}$$

• We use the following reliability factors when we construct C.I.s based on the standard normal distribution:
  • 90%, α = 0.10, z = 1.65.
  • 95%, α = 0.05, z = 1.96.
  • 99%, α = 0.01, z = 2.58.

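A minimal Python sketch of the known-σ interval, using the standard library to look up the reliability factor. The sample values (mean 50, σ = 8, n = 64) are hypothetical. The computed 90% factor is 1.645, which the table above rounds to 1.65.

```python
from statistics import NormalDist

def z_value(confidence):
    """Two-sided reliability factor from the standard normal distribution."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

def z_interval(xbar, sigma, n, confidence=0.95):
    """Confidence interval for a mean when sigma is known: xbar ± z * sigma / sqrt(n)."""
    margin = z_value(confidence) * sigma / n ** 0.5
    return xbar - margin, xbar + margin

for conf in (0.90, 0.95, 0.99):
    print(conf, round(z_value(conf), 3))

# Hypothetical sample: mean 50, known sigma 8, n = 64.
low, high = z_interval(50.0, 8.0, 64, 0.95)
print(round(low, 2), round(high, 2))
```
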
C.I. for a Mean – σ Unknown

• If we are sampling from a population with unknown variance, then a 100(1 – α)% C.I. for the population mean μ is given by:

$$\bar{X} \pm t \frac{s}{\sqrt{n}}$$

where the number of degrees of freedom (df) for t is n – 1 and n is the sample size.


The t-distribution

• It is, like the z distribution, a continuous distribution, defined by a single parameter known as the degrees of freedom, df.
• It is, like the z distribution, bell-shaped and symmetrical.
• There is not one t distribution, but rather a family of t distributions. All t distributions have a mean of 0, but their standard deviations differ according to the sample size, n.
• The t distribution is more spread out and flatter at the center than the standard normal distribution. As the sample size increases, however, the t distribution approaches the standard normal distribution.

Comparing the z and t Distributions when n is small, 95% Confidence Level

The t distribution is flatter and has a greater spread than the z distribution, so the value of t for a given level of confidence is larger in magnitude.
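
The comparison can be made concrete with a few reliability factors at the 95% confidence level. The t values below are copied from a standard t table (two-sided, 0.025 in each tail) and hard-coded, since Python's standard library does not provide the t distribution; the selection of degrees of freedom is arbitrary.

```python
from statistics import NormalDist

# Two-sided 95% critical values from a standard t table, keyed by degrees of freedom.
T_975 = {5: 2.571, 10: 2.228, 30: 2.042, 120: 1.980}

z = NormalDist().inv_cdf(0.975)  # about 1.96

# How much larger each t value is than the corresponding z value.
excess = {df: t - z for df, t in T_975.items()}

for df in sorted(T_975):
    print(df, T_975[df], round(excess[df], 3))
```

Every t value exceeds z, and the gap shrinks as the degrees of freedom increase.
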

Confidence Interval for the Mean – Example 3

A tyre manufacturer wishes to investigate the tread life of its tyres. A sample of 10 tyres driven 50,000 miles revealed a sample mean of 0.32 inch of tread remaining with a standard deviation of 0.09 inch. Construct a 95 percent confidence interval for the population mean. Would it be reasonable for the manufacturer to conclude that after 50,000 miles the population mean amount of tread remaining is 0.30 inches?

Given in the problem:
n = 10
$\bar{x} = 0.32$
s = 0.09

Compute the C.I. using the t-distribution (since σ is unknown):

$$\bar{X} \pm t_{\alpha,\,n-1} \frac{s}{\sqrt{n}}$$

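The arithmetic for Example 3 can be sketched as follows. The reliability factor 2.262 is the standard t-table value for a two-sided 95% interval with 9 degrees of freedom; it is hard-coded here because Python's standard library has no t distribution.

```python
from math import sqrt

# Values given in the problem.
n, xbar, s = 10, 0.32, 0.09

# t-table value for a 95% two-sided interval with df = n - 1 = 9.
t_crit = 2.262

standard_error = s / sqrt(n)
margin = t_crit * standard_error
low, high = xbar - margin, xbar + margin

# Is the hypothesised population mean of 0.30 inside the interval?
contains_030 = low <= 0.30 <= high

print(round(low, 3), round(high, 3), contains_030)
```

The interval runs from about 0.256 to 0.384 inch. Since 0.30 lies inside it, a population mean of 0.30 inch is a reasonable conclusion.
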
C.I. for a Proportion (π)

To develop a confidence interval for a proportion, we need to meet the following assumptions.

1. The binomial conditions, discussed last week, have been met. Briefly, these conditions are:
   a. The sample data is the result of counts.
   b. There are only two possible outcomes.
   c. The probability of a success remains the same from one trial to the next.
   d. The trials are independent. This means the outcome on one trial does not affect the outcome on another.
2. The values np and n(1 – p) should both be greater than or equal to 5. This condition allows us to invoke the central limit theorem and employ the standard normal distribution, that is, z, to construct a confidence interval.


C.I. for a Proportion

• A 100(1 – α)% confidence interval for the population proportion π is given by

$$p \pm z \sqrt{\frac{p(1 - p)}{n}}$$

where $p = \frac{X}{n}$ is the sample proportion.

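A minimal sketch of the interval for a proportion, including the np ≥ 5 and n(1 – p) ≥ 5 check. The counts (120 successes in 400 trials) and the function name are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def proportion_interval(x, n, confidence=0.95):
    """Confidence interval for a proportion: p ± z * sqrt(p(1 - p) / n)."""
    p = x / n
    # Condition for using the normal approximation.
    if n * p < 5 or n * (1 - p) < 5:
        raise ValueError("np and n(1 - p) must both be at least 5")
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p * (1 - p) / n)
    return p - margin, p + margin

# Hypothetical sample: 120 successes in 400 trials, so p = 0.30.
low, high = proportion_interval(120, 400, 0.95)
print(round(low, 4), round(high, 4))
```
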
Selecting an appropriate sample size

There are 3 factors that determine the size of a sample, none of which has any direct relationship to the size of the population.

• The level of confidence desired.

• The margin of error the researcher will tolerate.

• The variation in the population being studied.


Sample size for estimating the population mean

$$E = z \frac{\sigma}{\sqrt{n}}$$

$$n = \left( \frac{z \sigma}{E} \right)^2$$

where:
n is the size of the sample.
z is the standard normal value corresponding to the desired level of confidence.
σ is the population standard deviation.
E is the maximum allowable error.

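A short sketch of the sample-size formula for a mean. The inputs (σ = 20, E = 5) are hypothetical, and the result is rounded up because a sample size must be a whole number.

```python
from math import ceil

def sample_size_mean(z, sigma, E):
    """Smallest n satisfying n >= (z * sigma / E) ** 2."""
    return ceil((z * sigma / E) ** 2)

# Hypothetical inputs: sigma = 20, maximum allowable error E = 5.
n_95 = sample_size_mean(z=1.96, sigma=20, E=5)   # 95% confidence
n_99 = sample_size_mean(z=2.58, sigma=20, E=5)   # 99% confidence
print(n_95, n_99)
```

A higher confidence level requires a larger sample for the same margin of error.
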
Sample size for estimating a population proportion

$$E = z \sqrt{\frac{\pi(1 - \pi)}{n}}$$

$$n = \pi(1 - \pi) \left( \frac{z}{E} \right)^2$$

where:
n is the size of the sample.
z is the standard normal value corresponding to the desired level of confidence.
π is the population proportion.
E is the maximum allowable error.

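A short sketch of the sample-size formula for a proportion, again rounding up to a whole number. The inputs are hypothetical. The value π = 0.5 is used in the first call because π(1 – π) is largest there, so it gives the most conservative sample size when no estimate of π is available.

```python
from math import ceil

def sample_size_proportion(pi, z, E):
    """Smallest n satisfying n >= pi * (1 - pi) * (z / E) ** 2."""
    return ceil(pi * (1 - pi) * (z / E) ** 2)

# Hypothetical inputs at the 95% confidence level (z = 1.96).
n_conservative = sample_size_proportion(pi=0.5, z=1.96, E=0.05)
n_with_estimate = sample_size_proportion(pi=0.3, z=1.96, E=0.03)
print(n_conservative, n_with_estimate)
```
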
