MIT2 854F10 Stats
Agenda
1. Review: Probability Distributions & Random Variables
2. Sampling: Key distributions arising in sampling
   - Chi-square, t, and F distributions
3. Estimation: Reasoning about the population based on a sample
4. Some basic confidence intervals
   - Estimate of mean with variance known
   - Estimate of mean with variance not known
   - Estimate of variance
5. Hypothesis tests
Binomial Distribution
n is the number of trials
p is the probability of success on any one trial
x is the number of successes in n trials
[Figure: binomial distributions; x-axis: Number of "Successes", y-axis: Probability]
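A minimal Python sketch of the binomial probabilities defined above, assuming the standard pmf $P(x) = \binom{n}{x} p^x (1-p)^{n-x}$. The helper name `binomial_pmf` and the parameters n = 30, p = 0.5 are hypothetical choices for illustration, not values read off the slide.

```python
from math import comb

def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) for X ~ Binomial(n, p): C(n, x) * p^x * (1 - p)^(n - x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# Hypothetical parameters, chosen only for illustration.
n, p = 30, 0.5
probs = [binomial_pmf(x, n, p) for x in range(n + 1)]

# The probabilities sum to 1; the mean should be n*p and the variance n*p*(1-p).
mean = sum(x * q for x, q in enumerate(probs))
var = sum((x - mean) ** 2 * q for x, q in enumerate(probs))
print(f"sum = {sum(probs):.6f}, mean = {mean:.3f}, variance = {var:.3f}")
```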
Poisson Distributions
The Poisson distribution is a good approximation to the binomial when n is large and p is small (< 0.1).
[Figure: Poisson distributions for c = 5, c = 20, and c = 30; x-axis: Events per unit, y-axis: Probability]
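To illustrate the approximation claim, this sketch compares the Poisson pmf with mean c against a binomial with the same mean, c = np. The case n = 500, p = 0.01 (so c = 5) is a hypothetical example, and the helper names are illustrative.

```python
from math import comb, exp, factorial

def poisson_pmf(x, c):
    """P(X = x) for a Poisson distribution with mean c events per unit."""
    return exp(-c) * c**x / factorial(x)

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# Hypothetical case: n large and p small, matched on the mean c = n * p.
n, p = 500, 0.01
c = n * p

# Largest pointwise gap between the two pmfs over x = 0..29.
max_err = max(abs(binomial_pmf(x, n, p) - poisson_pmf(x, c)) for x in range(30))
print(f"c = {c}, largest pointwise difference = {max_err:.5f}")
```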
Continuous Distributions
Uniform Distribution
Normal Distribution
Unit (Standard) Normal Distribution
Note: the probability density function (pdf) is also sometimes called a probability distribution function, and the cumulative distribution function (cdf) is also sometimes called a cumulative density function.
Standard Questions You Should Be Able To Answer (For a Known cdf or pdf)
[Figure: unit normal cdf, with cumulative probabilities 0.00135, 0.0227, 0.16, and 0.5 marked]
Mean
Variance
[Figure: pdf and cdf]
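Python's standard-library `statistics.NormalDist` (Python 3.8+) can answer these standard questions for a normal distribution: `cdf` gives $P(X \le a)$, a difference of `cdf` values gives the probability of a range, and `inv_cdf` answers the inverse question. The $N(10, 2^2)$ case below is a hypothetical example.

```python
from statistics import NormalDist

z = NormalDist(mu=0.0, sigma=1.0)  # unit (standard) normal

# Q1: P(Z <= a), read straight from the cdf (compare with the marked values).
for a in (-3, -2, -1, 0):
    print(f"P(Z <= {a:+d}) = {z.cdf(a):.5f}")

# Q2: probability of a range, as a difference of cdf values.
p_within_1 = z.cdf(1) - z.cdf(-1)

# Q3: inverse question, the value a with P(Z <= a) = 0.90.
a_90 = z.inv_cdf(0.90)

# Same questions for a hypothetical non-standard normal, X ~ N(10, 2^2):
# standardizing (12 - 10) / 2 = 1 gives the same answer as z.cdf(1).
x = NormalDist(mu=10.0, sigma=2.0)
p_x_below_12 = x.cdf(12)
print(f"P(|Z| <= 1) = {p_within_1:.4f}, z(0.90) = {a_90:.4f}, P(X <= 12) = {p_x_below_12:.4f}")
```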
Philosophy
The field of statistics is about reasoning in the face of uncertainty, based on evidence from observed data.
Beliefs:
- Distribution or model form
- Distribution/model parameters
Evidence:
- Finite set of observations or data drawn from a population
Models:
- Seek to explain data
Sample
Sampling Demo
SticiGui: Statistics Tools for Internet and Classroom Instruction with a Graphical User Interface
http://stat-www.berkeley.edu/~stark/SticiGui
[Figure: distribution of the sample mean (a statistic) for n = 1, 2, 10, and 20]
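A small simulation in the spirit of the sampling demo. It assumes a hypothetical uniform population on [0, 1), whose variance is $\sigma^2 = 1/12$, and shows that the sample mean stays centred on the population mean (0.5) while its variance shrinks like $\sigma^2/n$. The helper `sample_means` and the replication count are illustrative choices.

```python
import random
import statistics

random.seed(1)  # fixed seed so reruns give the same numbers

# Hypothetical population: uniform on [0, 1), variance sigma^2 = 1/12.
POP_VAR = 1 / 12

def sample_means(n, reps=20000):
    """Draw `reps` independent samples of size n; return each sample's mean."""
    return [statistics.fmean(random.random() for _ in range(n)) for _ in range(reps)]

for n in (1, 2, 10, 20):
    means = sample_means(n)
    print(f"n={n:2d}  mean of xbar={statistics.fmean(means):.4f}  "
          f"var of xbar={statistics.pvariance(means):.5f}  sigma^2/n={POP_VAR / n:.5f}")
```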
If we know the appropriate sampling distribution, we can reason about the population based on the observed value of a statistic.
E.g., we calculate a sample mean from a random sample; in what range do we think the actual (population) mean really sits?
Second question: can we use knowledge of the distribution to reason about the actual (population) mean given the observed (sample) mean?
Interval Estimation:
Give bounds that contain the actual value with a given probability.
Must know the sampling distribution!
Remember the unit normal percentage points.
Apply them to the sampling distribution for the sample mean.
[Figure: unit normal percentage points; marked values 0.1, 0.5, and 0.9]
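A sketch of looking up unit normal percentage points with `NormalDist.inv_cdf`. The helper `z_upper` is hypothetical; it returns $z_\alpha$ such that $P(Z > z_\alpha) = \alpha$, the convention used for two-sided intervals with $z_{\alpha/2}$.

```python
from statistics import NormalDist

unit = NormalDist()  # N(0, 1)

def z_upper(alpha):
    """Upper percentage point z_alpha, i.e. P(Z > z_alpha) = alpha."""
    return unit.inv_cdf(1 - alpha)

for alpha in (0.10, 0.05, 0.025, 0.005):
    print(f"z_{alpha} = {z_upper(alpha):.4f}")
```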
Example, Cont'd
Second question: can we use knowledge of the distribution to reason about the actual (population) mean given the observed (sample) mean?
n = 50
95% confidence interval, $\alpha = 0.05$: ~95% of the distribution lies within ±2 standard deviations of the mean (more precisely ±1.96).
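When $\sigma$ is known, the sample mean has sampling distribution $N(\mu, \sigma^2/n)$, so the percentage points give the interval $\bar{x} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$. The sketch below assumes hypothetical values $\bar{x} = 100$ and $\sigma = 10$ together with the example's n = 50, since the example's own data are not reproduced in this text; `mean_ci_known_sigma` is an illustrative name.

```python
from math import sqrt
from statistics import NormalDist

def mean_ci_known_sigma(xbar, sigma, n, alpha=0.05):
    """Two-sided (1 - alpha) confidence interval for the population mean
    when sigma is known: xbar -/+ z_{alpha/2} * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    half_width = z * sigma / sqrt(n)
    return xbar - half_width, xbar + half_width

# Hypothetical summary values; n = 50 matches the example.
lo, hi = mean_ci_known_sigma(xbar=100.0, sigma=10.0, n=50)
print(f"95% CI for mu: ({lo:.3f}, {hi:.3f})")
```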
So if we calculate $s^2$, we can use our knowledge of the chi-square distribution to put bounds on where we believe the actual (population) variance sits.
[Figure: chi-square pdf]
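A simulation sketch of the fact behind this bound: for samples of size n from a normal population, $(n-1)s^2/\sigma^2$ follows a chi-square distribution with $k = n - 1$ degrees of freedom, which has mean $k$ and variance $2k$. The population parameters and replication count below are hypothetical.

```python
import random
import statistics

random.seed(2)  # fixed seed for reproducibility

# Hypothetical setup: repeated samples of size n from N(mu, sigma^2).
n, mu, sigma = 10, 5.0, 2.0
reps = 10000

chi2_draws = []
for _ in range(reps):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    s2 = statistics.variance(sample)  # sample variance, n - 1 divisor
    chi2_draws.append((n - 1) * s2 / sigma**2)

k = n - 1  # degrees of freedom
print(f"mean = {statistics.fmean(chi2_draws):.2f} (chi-square mean k = {k})")
print(f"variance = {statistics.variance(chi2_draws):.2f} (chi-square variance 2k = {2 * k})")
```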
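Typical use: find the distribution of the average when $\sigma$ is NOT known.
Consider $x_i \sim N(\mu, \sigma^2)$. Then
$$\frac{\bar{x} - \mu}{s/\sqrt{n}} \sim t_{n-1}$$
This is just the normalized distance from the mean (normalized using our sample estimate $s$ of the standard deviation).
For $k \to \infty$, $t_k \to N(0,1)$.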
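A minimal sketch of computing that normalized distance from raw data, with the sample standard deviation $s$ standing in for the unknown $\sigma$. The data values, the reference mean $\mu_0 = 10$, and the helper name `t_statistic` are all hypothetical.

```python
from math import sqrt
import statistics

def t_statistic(sample, mu0):
    """t = (xbar - mu0) / (s / sqrt(n)); follows t_{n-1} when the true mean is mu0."""
    n = len(sample)
    xbar = statistics.fmean(sample)
    s = statistics.stdev(sample)  # sample standard deviation, n - 1 divisor
    return (xbar - mu0) / (s / sqrt(n))

# Hypothetical data, for illustration only.
data = [9.8, 10.4, 10.1, 9.6, 10.3, 10.0, 10.5, 9.9]
t = t_statistic(data, mu0=10.0)
print(f"t = {t:.4f} with {len(data) - 1} degrees of freedom")
```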
We have to use the appropriate sampling distribution: in this case, the t distribution (rather than the normal distribution).
Note that the t distribution is slightly wider than the normal distribution, so that our confidence interval on the true mean is not as tight as when we know the variance.
Example, Cont'd
Third question: can we use knowledge of the distribution to reason about the actual (population) mean given the observed (sample) mean, even though we weren't told $\sigma$?
n = 50
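Python's standard library has no t distribution, so this sketch computes the t cdf by Simpson's rule on the density and inverts it by bisection, a numerical stand-in for a t table. The helper names and the summary values $\bar{x} = 100$, $s = 10$ are hypothetical; n = 50 matches the example. It also shows that $t_{0.025,49} \approx 2.01$ exceeds $z_{0.025} \approx 1.96$, which is why the interval is wider than in the known-variance case.

```python
from math import exp, lgamma, pi, sqrt
from statistics import NormalDist

def t_pdf(x, k):
    """Density of Student's t distribution with k degrees of freedom."""
    log_c = lgamma((k + 1) / 2) - lgamma(k / 2)  # log of the Gamma-function ratio
    return exp(log_c) / sqrt(k * pi) * (1 + x * x / k) ** (-(k + 1) / 2)

def t_cdf(x, k, steps=2000):
    """P(T <= x) for x >= 0: 0.5 plus Simpson's rule for the area on [0, x]."""
    h = x / steps
    total = t_pdf(0.0, k) + t_pdf(x, k)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, k)
    return 0.5 + total * h / 3

def t_upper(alpha, k):
    """Upper percentage point t_{alpha,k}, i.e. P(T > t) = alpha, by bisection."""
    lo, hi = 0.0, 20.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, k) < 1 - alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def mean_ci_unknown_sigma(xbar, s, n, alpha=0.05):
    """(1 - alpha) interval xbar -/+ t_{alpha/2, n-1} * s / sqrt(n)."""
    t_crit = t_upper(alpha / 2, n - 1)
    half_width = t_crit * s / sqrt(n)
    return xbar - half_width, xbar + half_width, t_crit

# Hypothetical summary statistics; n = 50 matches the example.
lo, hi, t_crit = mean_ci_unknown_sigma(xbar=100.0, s=10.0, n=50)
z_crit = NormalDist().inv_cdf(0.975)
print(f"t_crit = {t_crit:.4f} vs z_crit = {z_crit:.4f}; interval = ({lo:.2f}, {hi:.2f})")
```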
The appropriate sampling distribution (for $(n-1)s^2/\sigma^2$) is the chi-square. Because $\chi^2$ is asymmetric, the confidence interval bounds are not symmetric.
Example, Cont'd
Fourth question: for our example (where we observed $s_T^2 = 102.3$) with n = 50 samples, what is the 95% confidence interval for the population variance?
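A sketch of answering the fourth question with the example's own numbers ($s_T^2 = 102.3$, n = 50), using $\sigma^2 \in \left[(n-1)s^2/\chi^2_{\alpha/2,\,n-1},\ (n-1)s^2/\chi^2_{1-\alpha/2,\,n-1}\right]$, where $\chi^2_{\alpha/2,\,n-1}$ denotes the upper $\alpha/2$ point. Because the standard library has no chi-square quantile function, this swaps in the Wilson-Hilferty approximation in place of a chi-square table; a table or a statistics package would give slightly different digits. The helper names are hypothetical.

```python
from statistics import NormalDist

def chi2_quantile_wh(p, k):
    """Approximate q with P(chi2_k <= q) = p, via the Wilson-Hilferty
    approximation (used here as a stand-in for a chi-square table)."""
    z = NormalDist().inv_cdf(p)
    a = 2 / (9 * k)
    return k * (1 - a + z * a ** 0.5) ** 3

def variance_ci(s2, n, alpha=0.05):
    """(1 - alpha) interval for sigma^2 from (n - 1) s^2 / sigma^2 ~ chi2_{n-1}."""
    k = n - 1
    lower = k * s2 / chi2_quantile_wh(1 - alpha / 2, k)  # divide by the upper quantile
    upper = k * s2 / chi2_quantile_wh(alpha / 2, k)      # divide by the lower quantile
    return lower, upper

# Values from the example: s_T^2 = 102.3 observed with n = 50 samples.
lo, hi = variance_ci(102.3, 50)
print(f"approximate 95% CI for sigma^2: ({lo:.1f}, {hi:.1f})")
```

The interval is noticeably lopsided around 102.3, which is the asymmetry the chi-square shape produces.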
Assume x ~ N(0,1).
Take samples of size n = 20, calculate $s_1^2$ and $s_2^2$, and take their ratio (which follows an F distribution with 19 and 19 degrees of freedom).
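A simulation sketch of that demo: draw pairs of samples of size n = 20 from N(0,1), take the ratio of their sample variances, and compare the average ratio with the F mean $d_2/(d_2 - 2)$, here $19/17 \approx 1.118$. The seed and replication count are arbitrary choices.

```python
import random
import statistics

random.seed(3)  # fixed seed for reproducibility

n, reps = 20, 10000
ratios = []
for _ in range(reps):
    s1_sq = statistics.variance([random.gauss(0, 1) for _ in range(n)])
    s2_sq = statistics.variance([random.gauss(0, 1) for _ in range(n)])
    ratios.append(s1_sq / s2_sq)

d1 = d2 = n - 1  # numerator and denominator degrees of freedom
# Theoretical mean of F(d1, d2) is d2 / (d2 - 2), valid for d2 > 2.
print(f"simulated mean = {statistics.fmean(ratios):.3f}, F mean = {d2 / (d2 - 2):.3f}")
```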
Hypothesis Testing
A statistical hypothesis is a statement about the parameters of a probability distribution.
H0 is the null hypothesis.
E.g., H0 might be a statement about the parameters that would indicate that the machine is working correctly.
In general, we formulate our hypothesis, generate a random sample, compute a statistic, and then seek to reject H0 or fail to reject (accept) H0 based on probabilities associated with the statistic and the level of confidence we select.
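As an illustration of these steps, here is a sketch of a two-sided z test. It assumes a null hypothesis of the form H0: $\mu = \mu_0$ with $\sigma$ known; the slide's own example hypothesis is not reproduced here, so the target mean, $\sigma$, and sample mean below are hypothetical, as is the helper name `z_test`.

```python
from math import sqrt
from statistics import NormalDist

def z_test(xbar, mu0, sigma, n, alpha=0.05):
    """Two-sided test of H0: mu = mu0 vs H1: mu != mu0 when sigma is known.
    Returns the statistic z0 and True if H0 is rejected at level alpha."""
    z0 = (xbar - mu0) / (sigma / sqrt(n))
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    return z0, abs(z0) > z_crit

# Hypothetical machine: target mean 100, known sigma 10, sample of n = 50.
z0, reject = z_test(xbar=103.0, mu0=100.0, sigma=10.0, n=50)
print(f"z0 = {z0:.3f}; reject H0: {reject}")
```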
Summary
1. Review: Probability Distributions & Random Variables
2. Sampling: Key distributions arising in sampling
   - Chi-square, t, and F distributions
3. Estimation: Reasoning about the population based on a sample
4. Some basic confidence intervals
   - Estimate of mean with variance known
   - Estimate of mean with variance not known
   - Estimate of variance
5. Hypothesis tests

Next Time:
1. Are effects (one or more variables) significant? ⇒ ANOVA (Analysis of Variance)
2. How do we model the effect of some variable(s)? ⇒ Regression modeling
For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.