08 Test of Significance
We toss a coin 10 times and get 7 tails. Is this sufficient evidence to conclude that the
coin is biased?
The null hypothesis, H0 , states that "nothing extraordinary is going on". So in this
case
H0: P(T) = 1/2
The alternative hypothesis, HA , states that there is a different chance process that
generates the data. Here we can take
HA: P(T) ≠ 1/2
Hypothesis testing proceeds by collecting data and evaluating whether the data are
compatible with H0 or not (in which case one rejects H0 ).
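To make the coin example concrete, here is a minimal sketch (in Python, not part of the original slides) that computes the exact two-sided binomial probability of a result as lopsided as 7 tails in 10 tosses, assuming H0: P(T) = 1/2:

```python
from math import comb

def binom_pvalue_two_sided(k, n, p=0.5):
    """Exact two-sided p-value: total probability of all outcomes
    at least as far from the expected count n*p as the observed k."""
    probs = [comb(n, i) * p**i * (1 - p)**(n - i) for i in range(n + 1)]
    dist = abs(k - n * p)
    return sum(pr for i, pr in enumerate(probs) if abs(i - n * p) >= dist)

p_value = binom_pvalue_two_sided(7, 10)   # 7 tails in 10 tosses
print(round(p_value, 4))                  # 0.3438
```

An exact p-value of about 34% shows that 7 tails out of 10 is quite compatible with a fair coin, so H0 is not rejected.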
The logic behind testing hypotheses
A different example: A company develops a new drug to lower blood pressure. It tests
it with an experiment involving 1,000 patients.
In this case "nothing extraordinary going on" means that the drug has no effect. So
H0: no change in blood pressure        HA: blood pressure drops
Note that in this case the company would like to reject H0 !
So the logic of testing is typically indirect: One assumes that nothing extraordinary is
happening and then hopes to reject this assumption H0 .
Setting up a test statistic
A test statistic measures how far away the data are from what we would expect if H0
were true.
The most common test statistic is the z-statistic:
z = (observed − expected)/SE
The z-statistic is converted into a p-value: the chance, computed assuming H0 is true, of getting a z at least as extreme as the one observed. The smaller the p-value, the stronger the evidence against H0. Often the criterion for rejecting H0 is a p-value smaller than 5%. Then the result is called statistically significant.
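As a sketch of how the z-statistic and its p-value fit together (the helper name z_test is illustrative; the numbers are the ones from the coin example):

```python
from statistics import NormalDist

def z_test(observed, expected, se):
    """z-statistic and two-sided p-value under the normal approximation."""
    z = (observed - expected) / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p

# Coin example: 7 tails observed, 5 expected,
# SE = sqrt(10 * 0.5 * 0.5) for the sum of 10 tosses.
z, p = z_test(7, 5, (10 * 0.25) ** 0.5)
print(round(z, 2), round(p, 3))
```

The normal approximation gives a p-value of about 21%, close to the exact binomial answer.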
p-values measure the evidence against H0
Note that the p-value does not give the probability that H0 is true: H0 is either true or not, so there are no chances involved. Rather, the p-value gives the probability of seeing a statistic as extreme as, or more extreme than, the observed one, assuming H0 is true.
Distinguishing Coke and Pepsi by taste
It has been said that it is difficult to distinguish Coke and Pepsi by taste alone, without
the visual cue of the bottle or can.
In an experiment that I did in a class at Stanford, 10 cups were filled at random with
either Coke or Pepsi. A student volunteer tasted each of the 10 cups and correctly
named the contents of seven. Is this sufficient evidence to conclude that the student can
tell apart Coke and Pepsi?
"Nothing extraordinary is going on" means that the student does not have any special
ability to tell them apart and is just guessing.
To write this down formally we introduce 0/1 labels since we are counting correct
answers: 1 = correct answer, 0 = wrong answer
H0: P(0) = P(1) = 1/2        HA: P(1) > 1/2
This is a one-sided test: the alternative hypothesis we are interested in puts P(1) on one side of 1/2.
Distinguishing Coke and Pepsi by taste
Since we are looking at the sum of ten 0/1 labels, the z-statistic is the same that we had for coin-tossing:

z = (7 − 5)/√(10 × 1/2 × 1/2) ≈ 1.26,

which gives a one-sided p-value of about 10.2%.
Since 10.2% is not smaller than 5%, we don't reject H0: we are not convinced that the student can distinguish Coke and Pepsi.
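The calculation on this slide can be reproduced with a few lines of Python (the normal approximation gives a one-sided p-value of about 10%, in line with the 10.2% quoted above):

```python
from statistics import NormalDist

n, correct = 10, 7
expected = n * 0.5                       # 5 correct expected under H0
se = (n * 0.5 * 0.5) ** 0.5              # SE of the sum of ten 0/1 labels
z = (correct - expected) / se
p_one_sided = 1 - NormalDist().cdf(z)    # one-sided: HA is P(1) > 1/2
print(round(z, 2), round(p_one_sided, 3))
```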
The t-test
The p-value of the t-statistic is computed from Student's t-distribution with n − 1 degrees of freedom. Its fatter tails account for the additional uncertainty introduced by estimating σ by

s = √( Σᵢ(xᵢ − x̄)²/(n − 1) ).
Using the t-test in place of the z-test is only necessary for small samples: n ≤ 20 (say).
In that case it is also better to replace the confidence interval x̄ ± z SE by

x̄ ± t_{n−1} SE
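A small sketch of the difference, using the standard table values 1.960 (the 97.5% point of the normal curve) and 2.776 (the 97.5% point of the t-distribution with 4 degrees of freedom); the sample mean and SE here are made-up numbers:

```python
# Hypothetical sample mean and SE for a sample of n = 5.
xbar, se = 10.0, 0.5
z_crit, t_crit = 1.960, 2.776
ci_z = (xbar - z_crit * se, xbar + z_crit * se)   # xbar +/- z * SE
ci_t = (xbar - t_crit * se, xbar + t_crit * se)   # xbar +/- t_{n-1} * SE
print(ci_z, ci_t)   # the t-interval is wider
```

The wider t-interval reflects the extra uncertainty from estimating σ by s.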
More on testing
Statistically significant does not mean that the effect size is important:
Suppose the sample average shows a lead concentration that is only slightly above the health standard of 15 ppb: say the sample average is 15.05 ppb.
That may not be of practical concern, even though the test may be highly significant: statistical significance convinces us that there is an effect, but it doesn't say how big the effect is.
Reason: A large sample size n makes SE = σ/√n small, so even a small exceedance over the limit by (say) 0.05 ppb may give a statistically significant result.
Therefore it is helpful to complement a test with a confidence interval: In the above case a 95% confidence interval for µ might be [15.02 ppb, 15.08 ppb].
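The lead example can be reproduced under assumed values σ = 0.5 ppb and n = 1000, which are not given on the slide but are chosen so that the resulting 95% interval matches the one quoted above:

```python
from statistics import NormalDist

# Assumed values (not from the slide): sigma = 0.5 ppb, n = 1000.
xbar, sigma, n, limit = 15.05, 0.5, 1000, 15.0
se = sigma / n ** 0.5                      # large n makes SE small
z = (xbar - limit) / se
p = 1 - NormalDist().cdf(z)                # one-sided test of H0: mu = 15
ci = (xbar - 1.96 * se, xbar + 1.96 * se)
print(round(z, 2), round(p, 4), tuple(round(c, 2) for c in ci))
```

The test is highly significant (p below 0.1%), yet the interval shows the exceedance is tiny in practical terms.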
More on testing
We can improve the estimate of SE(p̂2 − p̂1) somewhat by using the fact that p1 = p2 under H0. Since there is a common proportion, we can estimate it by pooling the samples: 0.55 × 1000 = 550 voters approve in the first sample, 870 in the second, so in total there are 1420 approvals out of 2500. So the pooled estimate of p1 = p2 is 1420/2500 = 56.8%.
So we estimate SE(p̂2 − p̂1) by

√( 0.568(1 − 0.568)/1000 + 0.568(1 − 0.568)/1500 ) = 0.02022,

which essentially gives the same answer in this case.
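The pooled computation, using the approval counts from the text:

```python
x1, n1 = 550, 1000     # approvals in the first sample
x2, n2 = 870, 1500     # approvals in the second sample
p_pool = (x1 + x2) / (n1 + n2)                      # 1420/2500 = 0.568
se = (p_pool * (1 - p_pool) * (1 / n1 + 1 / n2)) ** 0.5
print(p_pool, round(se, 5))                         # 0.568 0.02022
```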
The two-sample z-test
The two-sample z-test is applicable in the same way to the difference of two sample
means in order to test for equality of two population means.
If the two samples are independent, then again

SE(x̄2 − x̄1) = √( (SE(x̄1))² + (SE(x̄2))² ),

and SE(x̄1) = σ1/√n1 is estimated by s1/√n1.
If the sample sizes n1, n2 are not large, then the p-value needs to be computed from the t-distribution.
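A sketch of the two-sample z-statistic for means, with made-up samples (for samples this small, the p-value would in practice come from the t-distribution, as noted above):

```python
from statistics import mean, stdev

def se_diff_means(x, y):
    """Estimated SE of (mean(y) - mean(x)) for two independent samples."""
    return (stdev(x) ** 2 / len(x) + stdev(y) ** 2 / len(y)) ** 0.5

# Hypothetical measurements:
a = [5.1, 4.9, 5.3, 5.0, 5.2]
b = [5.6, 5.4, 5.9, 5.5]
z = (mean(b) - mean(a)) / se_diff_means(a, b)
print(round(z, 2))
```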
The pooled standard deviation
If one has reason to assume that σ1 = σ2 (or if this has been checked), then one may use the pooled estimate of σ1 = σ2 given by

s²_pooled = ( (n1 − 1)s1² + (n2 − 1)s2² ) / (n1 + n2 − 2).

However, the advantages of using s²_pooled are small, and the analysis rests on the assumption that σ1 = σ2. For these reasons the pooled t-test is usually avoided.
All of the above two-sample tests require that the two samples are independent. They
are also applicable in special situations where the samples are dependent, e.g. to
compare the treatment effect when subjects are randomized into treatment and control
groups.
The paired-difference test
SE(d̄) = σ_d/√n. Estimate σ_d by s_d = 0.55. Then

t = (1.4 − 0)/(0.55/√5) = 5.69
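The t-statistic above can be checked directly:

```python
# Numbers from the text: mean difference 1.4, s_d = 0.55, n = 5 pairs.
d_bar, s_d, n = 1.4, 0.55, 5
se = s_d / n ** 0.5
t = (d_bar - 0) / se
print(round(t, 2))   # 5.69, referred to the t-distribution with n-1 = 4 df
```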
The p-value of this sign test is less significant than that of the paired t-test. This is
because the latter uses more information, namely the size of the differences. On the
other hand, the sign test has the virtue of easy interpretation due to the analogy to coin
tossing.
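A sign test simply counts the positive differences and refers the count to the coin-tossing null distribution. As a hypothetical illustration (the slide does not give the individual differences), suppose all 5 differences were positive:

```python
from math import comb

def sign_test_pvalue(n_pos, n):
    """One-sided sign test: probability of at least n_pos positive
    differences out of n when signs are 50/50 under H0."""
    return sum(comb(n, k) for k in range(n_pos, n + 1)) / 2 ** n

print(sign_test_pvalue(5, 5))   # 0.03125
```

Even this most extreme outcome gives a p-value of about 3.1%, larger (less significant) than the paired t-test's p-value, because the sign test ignores the sizes of the differences.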