
MIT2 854F10 Stats

1. The document provides an overview of statistical inference concepts including probability distributions, sampling, estimation, and hypothesis testing. 2. Key distributions discussed include the binomial, Poisson, normal, t, chi-square, and F distributions and how they relate to sampling and estimating population parameters. 3. Examples are given of how to construct confidence intervals for estimating means and variances using these distributions depending on whether the population variance is known or unknown.

Uploaded by

Tesfaye Tefera
Copyright
© Attribution Non-Commercial (BY-NC)

Statistical Inference

Lecturer: Prof. Duane S. Boning

Agenda
1. Review: Probability Distributions & Random Variables
2. Sampling: Key distributions arising in sampling
   Chi-square, t, and F distributions
3. Estimation: Reasoning about the population based on a sample
4. Some basic confidence intervals
   Estimate of mean with variance known
   Estimate of mean with variance not known
   Estimate of variance
5. Hypothesis tests

Discrete Distribution: Bernoulli


Bernoulli trial: an experiment with two outcomes

Probability mass function (pmf): $f(x) = p$ for $x = 1$ and $f(x) = 1 - p$ for $x = 0$

Discrete Distribution: Binomial


Repeated random Bernoulli trials

n is the number of trials
p is the probability of success on any one trial
x is the number of successes in n trials
Binomial pmf: $f(x) = \binom{n}{x} p^x (1-p)^{n-x}$, $x = 0, 1, \ldots, n$
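A minimal sketch of the binomial pmf, $f(x) = \binom{n}{x} p^x (1-p)^{n-x}$, using only the Python standard library. The values n = 30 and p = 0.5 are hypothetical and chosen only for illustration; they are not taken from the slides.

```python
from math import comb


def binomial_pmf(x: int, n: int, p: float) -> float:
    """Probability of exactly x successes in n independent Bernoulli(p) trials."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


# Hypothetical illustration values (not from the slides).
n, p = 30, 0.5
pmf = [binomial_pmf(x, n, p) for x in range(n + 1)]

total = sum(pmf)                               # probabilities sum to 1
mean = sum(x * f for x, f in enumerate(pmf))   # equals n * p
```

With n = 1 the function reduces to the Bernoulli pmf, so `binomial_pmf(1, 1, p)` returns `p`.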

[Figure: Binomial Distribution; probability vs. number of "successes"]

Discrete Distribution: Poisson

Mean: $\lambda$
Variance: $\lambda$
Example applications:
# misprints on page(s) of a book
# transistors which fail on first day of operation

Poisson (with $\lambda = np$) is a good approximation to Binomial when n is large and p is small (< 0.1)
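The approximation can be checked numerically. The sketch below compares the two pmfs for a hypothetical case (n = 1000, p = 0.005, so $\lambda = np = 5$); these numbers are assumptions made for illustration, not values from the lecture.

```python
from math import comb, exp, factorial


def binomial_pmf(x: int, n: int, p: float) -> float:
    """Binomial probability of x successes in n trials."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def poisson_pmf(x: int, lam: float) -> float:
    """Poisson probability of x events when the mean (and variance) is lam."""
    return exp(-lam) * lam**x / factorial(x)


# Hypothetical case: large n, small p.
n, p = 1000, 0.005
lam = n * p

# Largest pointwise gap between the two pmfs over x = 0..30.
max_abs_diff = max(
    abs(binomial_pmf(x, n, p) - poisson_pmf(x, lam)) for x in range(31)
)
```

For this case the gap is small relative to the peak probability; repeating the comparison with a larger p (say 0.3) shows the approximation degrading.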

[Figure: Poisson distributions for $\lambda$ = 5, 20, and 30; probability vs. events per unit]

Continuous Distributions
Uniform Distribution
Normal Distribution
Unit (Standard) Normal Distribution

Continuous Distribution: Uniform


probability density function (pdf), also sometimes called a probability distribution function

cumulative distribution function (cdf), also sometimes called a cumulative density function

Standard Questions You Should Be Able To Answer (For a Known cdf or pdf)

Probability x less than or equal to some value

Probability x sits within some range
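Both questions reduce to evaluating the cdf: $P(x \le a) = F(a)$ and $P(a \le x \le b) = F(b) - F(a)$. A minimal sketch for the uniform distribution follows; the interval [0, 10] and the query points are hypothetical.

```python
def uniform_cdf(x: float, a: float, b: float) -> float:
    """cdf F(x) of a uniform distribution on the interval [a, b]."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)


# Hypothetical interval, chosen only for illustration.
a, b = 0.0, 10.0

# Probability x is less than or equal to some value: F(2.5)
p_le = uniform_cdf(2.5, a, b)

# Probability x sits within some range: F(7) - F(4)
p_range = uniform_cdf(7.0, a, b) - uniform_cdf(4.0, a, b)
```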


Continuous Distribution: Normal (Gaussian)


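pdf: $f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}$

cdf: $F(x) = \int_{-\infty}^{x} f(t)\,dt$

[Figure: normal pdf and cdf; the cdf takes the values 0.00135, 0.0227, 0.16, 0.5, 0.84, 0.977, and 0.99865 at $\mu - 3\sigma$, $\mu - 2\sigma$, $\mu - \sigma$, $\mu$, $\mu + \sigma$, $\mu + 2\sigma$, and $\mu + 3\sigma$]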
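The cdf values quoted on the normal distribution slide (0.00135 up to 0.99865) can be reproduced with `statistics.NormalDist` from the Python standard library. This is a small check, assuming the quoted values correspond to 3, 2, and 1 standard deviations below the mean, the mean itself, and 1, 2, and 3 standard deviations above it.

```python
from statistics import NormalDist

unit_normal = NormalDist(mu=0.0, sigma=1.0)

# cdf evaluated at k standard deviations from the mean, k = -3..3
cdf_values = {k: unit_normal.cdf(k) for k in range(-3, 4)}

# Rounded values as they appear on the slide, keyed the same way.
slide_values = {-3: 0.00135, -2: 0.0227, -1: 0.16, 0: 0.5,
                1: 0.84, 2: 0.977, 3: 0.99865}

# Largest disagreement between the computed and the quoted values.
max_gap = max(abs(cdf_values[k] - slide_values[k]) for k in slide_values)
```

Because any normal variable can be standardized, the same numbers hold for every choice of mean and standard deviation.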

Continuous Distribution: Unit Normal


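Normalization: $z = \frac{x - \mu}{\sigma}$
Mean: 0
Variance: 1
pdf: $\phi(z) = \frac{1}{\sqrt{2\pi}} e^{-z^2/2}$
cdf: $\Phi(z) = \int_{-\infty}^{z} \phi(t)\,dt$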

Using the Unit Normal pdf and cdf


We often want to talk about "percentage points" of the distribution: the portion in the tails

[Figure: unit normal distribution with the 0.1, 0.5, and 0.9 cumulative levels marked]
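Percentage points are values of the inverse cdf. The sketch below defines a hypothetical helper `z_upper` that returns the z value leaving a given probability in the upper tail of the unit normal; the helper name is an assumption made for illustration.

```python
from statistics import NormalDist


def z_upper(alpha_tail: float) -> float:
    """z value with upper-tail probability alpha_tail under the unit normal."""
    return NormalDist().inv_cdf(1.0 - alpha_tail)


z_10 = z_upper(0.10)     # 10% in the upper tail, about 1.28
z_025 = z_upper(0.025)   # 2.5% in each tail for a two-sided 95% interval, about 1.96
```

By symmetry, the lower-tail point for the same probability is the negative of the upper-tail point.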

Philosophy

The field of statistics is about reasoning in the face of uncertainty, based on evidence from observed data
Beliefs:
Distribution or model form
Distribution/model parameters

Evidence:
Finite set of observations or data drawn from a population

Models:
Seek to explain data

Moments of the Population vs. Sample Statistics


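Population vs. Sample:
Mean: population $\mu = E[x]$; sample $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
Variance: population $\sigma^2 = E[(x-\mu)^2]$; sample $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$
Standard Deviation: population $\sigma$; sample $s$
Covariance: population $\sigma_{xy} = E[(x-\mu_x)(y-\mu_y)]$; sample $s_{xy} = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})$
Correlation Coefficient: population $\rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y}$; sample $r_{xy} = \frac{s_{xy}}{s_x s_y}$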

Sampling and Estimation


Sampling: act of making observations from populations
Random sampling: when each observation is identically and independently distributed (IID)
Statistic: a function of sample data; a value that can be computed from data (contains no unknowns)
average, median, standard deviation

SticiGui: Statistics Tools for Internet and Classroom Instruction with a Graphical User Interface
http://stat-www.berkeley.edu/~stark/SticiGui

Sampling Demo


Population vs. Sampling Distribution


[Figure: population probability density function compared with the sampling distribution of the sample mean (a statistic) for n = 1, 2, 10, and 20]

Sampling and Estimation, cont.


Sampling
Random sampling
Statistic
A statistic is a random variable, which itself has a sampling distribution
I.e., if we take multiple random samples, the value for the statistic will be different for each set of samples, but will be governed by the same sampling distribution

If we know the appropriate sampling distribution, we can reason about the population based on the observed value of a statistic
E.g. we calculate a sample mean from a random sample; in what range do we think the actual (population) mean really sits?
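A small simulation makes the idea of a sampling distribution concrete: draw many random samples, compute the sample mean of each, and look at how the means spread. The population parameters (mean 0, standard deviation 1), the number of repetitions, and the seed are all hypothetical choices for this sketch.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

MU, SIGMA = 0.0, 1.0   # hypothetical population parameters
N_SAMPLES = 2000       # number of repeated random samples per sample size


def sample_mean(n: int) -> float:
    """Mean of one random sample of n IID normal observations."""
    return sum(random.gauss(MU, SIGMA) for _ in range(n)) / n


# Spread of the sample mean for several sample sizes.
spread = {}
for n in (1, 2, 10, 20):
    means = [sample_mean(n) for _ in range(N_SAMPLES)]
    spread[n] = statistics.pstdev(means)   # close to SIGMA / sqrt(n)
```

The spread shrinks roughly as $\sigma/\sqrt{n}$, which is why a sample mean from a larger sample pins down the population mean more tightly.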


Sampling and Estimation: An Example


Suppose we know that the thickness of a part is normally distributed with std. dev. of 10: $x \sim N(\mu, \sigma^2)$ with $\sigma = 10$
We sample n = 50 random parts and compute the mean part thickness $\bar{x}$
First question: What is the distribution of $\bar{x}$? For IID normal observations, $\bar{x} \sim N(\mu, \sigma^2/n)$

Second question: can we use knowledge of the distribution of $\bar{x}$ to reason about the actual (population) mean $\mu$ given the observed (sample) mean?

Estimation and Confidence Intervals


Point Estimation:
Find best values for parameters of a distribution
Should be
Unbiased: expected value of estimate should be true value
Minimum variance: should be estimator with smallest variance

Interval Estimation:
Give bounds that contain actual value with a given probability
Must know sampling distribution!

Confidence Interval Demo


Confidence Intervals: Variance Known


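We know $\sigma$, e.g. from historical data
Estimate mean in some interval to $(1-\alpha)100\%$ confidence:
$\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$

Remember the unit normal percentage points
Apply to the sampling distribution for the sample mean

[Figure: unit normal distribution with the 0.1, 0.5, and 0.9 cumulative levels marked]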

Example, Cont'd
Second question: can we use knowledge of distribution to reason about the actual (population) mean given observed (sample) mean?

n = 50
95% confidence interval, $\alpha = 0.05$: ~95% of the distribution lies within $\pm 2\sigma$ of the mean (for the sample mean, within $\pm 2\sigma/\sqrt{n}$)
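A sketch of the known-variance interval, $\bar{x} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$, for the parts example. The values sigma = 10 and n = 50 come from the example; the observed sample mean did not survive in this text, so `xbar=100.0` below is a placeholder assumption, and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def mean_ci_known_sigma(xbar: float, sigma: float, n: int, alpha: float = 0.05):
    """Two-sided (1 - alpha) confidence interval for the mean, sigma known."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)   # unit normal percentage point
    half_width = z * sigma / sqrt(n)              # z times the standard error
    return xbar - half_width, xbar + half_width


# sigma and n are from the example; xbar is a placeholder, not the slide's value.
ci_low, ci_high = mean_ci_known_sigma(xbar=100.0, sigma=10.0, n=50)
half_width = (ci_high - ci_low) / 2.0   # about 2.77 thickness units
```

The half-width depends only on sigma, n, and alpha, so it is the same whatever sample mean is observed.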


Reasoning & Sampling Distributions


Example shows that we need to know our sampling distribution in order to reason about the sample and population parameters
Other important sampling distributions:
Student-t: Use instead of normal distribution when we don't know actual variation or $\sigma$
Chi-square: Use when we are asking about variances
F: Use when we are asking about ratios of variances

Sampling: The Chi-Square Distribution

Typical use: find distribution of variance when mean is known
Ex: if $x_i \sim N(\mu, \sigma^2)$ for $i = 1, \ldots, n$ and $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$, then $\frac{(n-1)s^2}{\sigma^2} \sim \chi^2_{n-1}$

So if we calculate $s^2$, we can use knowledge of chi-square distribution to put bounds on where we believe the actual (population) variance sits

[Figure: chi-square probability density]

Sampling: The Student-t Distribution

Typical use: Find distribution of average when $\sigma$ is NOT known
For $k \to \infty$, $t_k \to N(0,1)$
Consider $x_i \sim N(\mu, \sigma^2)$. Then $\frac{\bar{x} - \mu}{s/\sqrt{n}} \sim t_{n-1}$

This is just the normalized distance from mean (normalized to our estimate of the sample variance)

Back to our Example


Suppose we do not know either the variance or the mean in our parts population: $x \sim N(\mu, \sigma^2)$

We take our sample of size n = 50, and calculate the sample mean $\bar{x}$ and sample variance $s^2$

Best estimate of population mean and variance (std. dev.)?

If we had to pick a range where $\mu$ would be 95% of the time?

Have to use the appropriate sampling distribution:
In this case the t-distribution (rather than normal distribution)

Confidence Intervals: Variance Unknown


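Case where we don't know variance a priori
Now we have to estimate not only the mean based on our data, but also estimate the variance
Our estimate of the mean to some interval with $(1-\alpha)100\%$ confidence becomes
$\bar{x} - t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}} \le \mu \le \bar{x} + t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}}$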

Note that the t distribution is slightly wider than the normal distribution, so that our confidence interval on the true mean is not as tight as when we know the variance.
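A sketch of the unknown-variance interval, $\bar{x} \pm t_{\alpha/2,\,n-1}\, s/\sqrt{n}$. The Python standard library has no t distribution, so the t percentage point is approximated here with a Cornish-Fisher style series expansion around the normal percentage point; that is a swapped-in approximation, not the lecture's method, and a statistics package or a t table would normally be used instead. The sample variance 102.3 and n = 50 are taken from the example later in the deck; `xbar=100.0` is a placeholder assumption.

```python
from math import sqrt
from statistics import NormalDist


def t_quantile_approx(p: float, dof: int) -> float:
    """Approximate t quantile via a two-term series expansion in 1/dof.

    An approximation that is adequate for moderate to large dof; not exact.
    """
    z = NormalDist().inv_cdf(p)
    g1 = (z**3 + z) / 4.0
    g2 = (5 * z**5 + 16 * z**3 + 3 * z) / 96.0
    return z + g1 / dof + g2 / dof**2


def mean_ci_unknown_sigma(xbar: float, s: float, n: int, alpha: float = 0.05):
    """Two-sided (1 - alpha) confidence interval for the mean, sigma estimated by s."""
    t = t_quantile_approx(1.0 - alpha / 2.0, n - 1)
    half_width = t * s / sqrt(n)
    return xbar - half_width, xbar + half_width


t_49 = t_quantile_approx(0.975, 49)     # about 2.01, versus about 1.96 for the normal
z_975 = NormalDist().inv_cdf(0.975)

# s**2 = 102.3 and n = 50 are from the example; xbar is a placeholder.
ci_low, ci_high = mean_ci_unknown_sigma(xbar=100.0, s=sqrt(102.3), n=50)
```

The t point exceeds the normal point for every finite number of degrees of freedom, which is the "slightly wider" interval the slide describes.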

Example, Cont'd
Third question: can we use knowledge of distribution to reason about the actual (population) mean given observed (sample) mean even though we weren't told $\sigma$?

[Figure: 95% confidence interval for n = 50; the t distribution is slightly wider than the Gaussian distribution]

Once More to Our Example


Fourth question: how about a confidence interval on our estimate of the variance of the thickness of our parts, based on our 50 observations?


Confidence Intervals: Estimate of Variance

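The appropriate sampling distribution is the Chi-square
Because $\chi^2$ is asymmetric, c.i. bounds are not symmetric:
$\frac{(n-1)s^2}{\chi^2_{\alpha/2,\,n-1}} \le \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{1-\alpha/2,\,n-1}}$
where $\chi^2_{p,\,n-1}$ denotes the point with upper-tail probability $p$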

Example, Cont'd
Fourth question: for our example (where we observed $s_T^2 = 102.3$) with n = 50 samples, what is the 95% confidence interval for the population variance?
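A sketch of the variance interval for this example, using $(n-1)s^2/\sigma^2 \sim \chi^2_{n-1}$. Because the Python standard library has no chi-square distribution, the percentage points are approximated with the Wilson-Hilferty formula; that is a swapped-in approximation, and exact table or library values would differ slightly. The function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def chi2_quantile_approx(p: float, dof: int) -> float:
    """Wilson-Hilferty approximation to the chi-square quantile (cdf level p)."""
    z = NormalDist().inv_cdf(p)
    c = 2.0 / (9.0 * dof)
    return dof * (1.0 - c + z * sqrt(c)) ** 3


def variance_ci(s2: float, n: int, alpha: float = 0.05):
    """Two-sided (1 - alpha) confidence interval for the population variance."""
    dof = n - 1
    q_high = chi2_quantile_approx(1.0 - alpha / 2.0, dof)  # upper percentage point
    q_low = chi2_quantile_approx(alpha / 2.0, dof)         # lower percentage point
    # Dividing by the larger quantile gives the lower bound, and vice versa.
    return dof * s2 / q_high, dof * s2 / q_low


# s2 = 102.3 and n = 50 are the values stated in the example.
var_low, var_high = variance_ci(s2=102.3, n=50)   # roughly 71 to 159 under this approximation
```

The interval extends further above 102.3 than below it, reflecting the asymmetry of the chi-square distribution.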


Sampling: The F Distribution

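Typical use: compare the spread of two populations
Example:
$x \sim N(\mu_x, \sigma_x^2)$, from which we sample $x_1, x_2, \ldots, x_n$
$y \sim N(\mu_y, \sigma_y^2)$, from which we sample $y_1, y_2, \ldots, y_m$
Then $\frac{s_x^2/\sigma_x^2}{s_y^2/\sigma_y^2} \sim F_{n-1,\,m-1}$ or, equivalently, $\frac{\sigma_y^2\, s_x^2}{\sigma_x^2\, s_y^2} \sim F_{n-1,\,m-1}$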

Concept of the F Distribution


Assume we have a normally distributed population
We generate two different random samples from the population
In each case, we calculate a sample variance $s_i^2$
What range will the ratio of these two variances take? ⇒ F distribution
Purely by chance (due to sampling) we get a range of ratios even though drawing from same population
Example:

Assume x ~ N(0,1)
Take samples of size n = 20
Calculate $s_1^2$ and $s_2^2$ and take ratio

95% confidence interval on ratio

Large range in ratio!
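The size of that range can be estimated by simulation rather than from an F table. The sketch below repeats the experiment described above (two samples of size n = 20 from N(0,1), ratio of sample variances) many times and reads off the central 95% of the simulated ratios. The number of trials and the seed are arbitrary assumptions.

```python
import random

random.seed(1)  # fixed seed so the sketch is repeatable


def sample_variance(xs):
    """Unbiased sample variance with the n - 1 divisor."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def variance_ratio(n: int) -> float:
    """Ratio s1^2 / s2^2 for two independent samples of size n from N(0, 1)."""
    s1_sq = sample_variance([random.gauss(0.0, 1.0) for _ in range(n)])
    s2_sq = sample_variance([random.gauss(0.0, 1.0) for _ in range(n)])
    return s1_sq / s2_sq


N_TRIALS = 10000  # number of simulated ratios (an arbitrary choice)
ratios = sorted(variance_ratio(20) for _ in range(N_TRIALS))

# Empirical 2.5% and 97.5% points of the simulated ratios.
lower = ratios[int(0.025 * N_TRIALS)]
upper = ratios[int(0.975 * N_TRIALS)]
```

Even though both samples come from the same population, the ratio routinely falls well below 1/2 or above 2, which is the "large range" noted on the slide.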


Hypothesis Testing
A statistical hypothesis is a statement about the parameters of a probability distribution
H0 is the null hypothesis
E.g. $H_0: \mu = \mu_0$ would indicate that the machine is working correctly

H1 is the alternative hypothesis
E.g. $H_1: \mu \ne \mu_0$ indicates an undesirable change (mean shift) in the machine operation (perhaps a worn tool)

In general, we formulate our hypothesis, generate a random sample, compute a statistic, and then seek to reject H0 or fail to reject (accept) H0 based on probabilities associated with the statistic and level of confidence we select
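The procedure can be sketched for the simplest case: a two-sided z-test on the mean with known standard deviation. The known-sigma assumption, the function name, and all numbers below (target mean 100, observed mean 103.5, sigma 10, n 50) are hypothetical and chosen only to illustrate the steps.

```python
from math import sqrt
from statistics import NormalDist


def z_test_two_sided(xbar: float, mu0: float, sigma: float, n: int,
                     alpha: float = 0.05):
    """Test H0: mu = mu0 against H1: mu != mu0, assuming sigma is known."""
    z = (xbar - mu0) / (sigma / sqrt(n))                 # test statistic
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))     # two-sided tail area
    reject_h0 = p_value < alpha
    return z, p_value, reject_h0


# Hypothetical numbers, not from the slides.
z, p_value, reject_h0 = z_test_two_sided(xbar=103.5, mu0=100.0, sigma=10.0, n=50)
```

When sigma is not known, the same steps apply with s in place of sigma and the t distribution in place of the unit normal.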


Which Population is Sample x From?


Two error probabilities in decision:
Type I error: false alarm ($\alpha$)
Type II error: miss ($\beta$)
Power of test (correct alarm): $1 - \beta$
Consider H0 the normal condition
Consider H1 an alarm condition

Control charts are hypothesis tests:

Set decision point (and sample size) based on acceptable $\alpha$, $\beta$ risks

Is my process in control or has a significant change occurred?
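The trade-off between the false-alarm and miss probabilities can be sketched for a two-sided test on a sample mean with known sigma. Given a false-alarm probability alpha and a mean shift delta, the function below returns the power (the probability of a correct alarm). The known-sigma setting, the function name, and the numbers are assumptions made for illustration.

```python
from math import sqrt
from statistics import NormalDist


def power_two_sided(delta: float, sigma: float, n: int, alpha: float = 0.05) -> float:
    """Power of a two-sided z-test on the mean when the true shift is delta."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1.0 - alpha / 2.0)   # decision point set by the Type I risk
    shift = delta * sqrt(n) / sigma          # mean shift in standard-error units
    # Type II risk: probability the statistic stays inside the decision limits.
    beta = nd.cdf(z_crit - shift) - nd.cdf(-z_crit - shift)
    return 1.0 - beta


# Hypothetical numbers: shift of 5 units, sigma = 10.
power_n50 = power_two_sided(delta=5.0, sigma=10.0, n=50)
power_n10 = power_two_sided(delta=5.0, sigma=10.0, n=10)
no_shift = power_two_sided(delta=0.0, sigma=10.0, n=50)   # equals alpha
```

With no shift the alarm rate equals alpha, and for a fixed alpha a larger sample size raises the power, which is why the decision point and sample size are chosen together.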



Summary
1. Review: Probability Distributions & Random Variables
2. Sampling: Key distributions arising in sampling
   Chi-square, t, and F distributions
3. Estimation: Reasoning about the population based on a sample
4. Some basic confidence intervals
   Estimate of mean with variance known
   Estimate of mean with variance not known
   Estimate of variance
5. Hypothesis tests

Next Time:
1. Are effects (one or more variables) significant? ⇒ ANOVA (Analysis of Variance)
2. How do we model the effect of some variable(s)? ⇒ Regression modeling

MIT OpenCourseWare http://ocw.mit.edu

2.854 / 2.853 Introduction to Manufacturing Systems


Fall 2010

For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.
