
08-Data Science-S25-Comparing Two Samples

Uploaded by

mohussein529

CMPS 360: Data Science Fundamentals
Spring 2025

Session 8: Comparing Two Samples
Ch. 12

Many slides were created by John DeNero and Ani Adhikari
Comparing Two Samples
● Compare values of sampled individuals in Group A with
values of sampled individuals in Group B.

● Question: Do the two sets of values come from the
same underlying distribution?

● Answering this question by performing a statistical test
is called A/B testing.
A/B Testing
The Groups and the Question
● Random sample of mothers of newborns. Compare:
○ (A) Birth weights of babies of mothers who didn’t
smoke
○ (B) Birth weights of babies of mothers who smoked
during pregnancy

● Question: Could the difference be due to chance alone?

(Demo)
Hypotheses
● Null:
○ In the population, the distributions of the birth
weights of the babies in the two groups are the
same. (They are different in the sample just due to
chance.)
● Alternative:
○ In the population, the babies of the mothers who
smoked weigh less, on average, than the babies of
the non-smokers.
Test Statistic
● Group A: non-smokers
● Group B: smokers

● Statistic: Difference between average weights:
Group B average - Group A average

● Negative values of this statistic favor the alternative
(Demo)
Simulating Under the Null
● If the null is true, all rearrangements of labels are
equally likely
● Plan:
○ Shuffle all group labels
○ Assign each shuffled label to a birth weight
○ Find the difference between the averages of the two
shuffled groups
○ Repeat
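The plan above can be sketched in plain Python. This is a minimal illustration and not the course demo: the ten weights and labels are hypothetical toy data, the function names are made up for this example, and the standard library stands in for the table methods used in class.

```python
import random
from statistics import mean


def difference_of_means(values, labels, group_a, group_b):
    """Return (average of group_b values) - (average of group_a values)."""
    a = [v for v, lab in zip(values, labels) if lab == group_a]
    b = [v for v, lab in zip(values, labels) if lab == group_b]
    return mean(b) - mean(a)


def permutation_test(values, labels, group_a, group_b, repetitions=5000, seed=0):
    """Simulate the statistic under the null by shuffling the group labels."""
    rng = random.Random(seed)
    observed = difference_of_means(values, labels, group_a, group_b)
    shuffled = list(labels)
    simulated = []
    for _ in range(repetitions):
        rng.shuffle(shuffled)  # every rearrangement of labels is equally likely
        simulated.append(difference_of_means(values, shuffled, group_a, group_b))
    # Negative values favor the alternative, so count the left tail.
    p_value = sum(s <= observed for s in simulated) / repetitions
    return observed, p_value


# Hypothetical toy data (birth weights in ounces), for illustration only.
weights = [120, 113, 128, 108, 117, 125, 110, 131, 105, 122]
labels = ["non-smoker", "non-smoker", "non-smoker", "smoker", "smoker",
          "non-smoker", "smoker", "non-smoker", "smoker", "non-smoker"]

observed, p_value = permutation_test(weights, labels, "non-smoker", "smoker")
print("observed difference:", observed)
print("empirical p-value:", p_value)
```

The labels are shuffled while the weights stay in place, which matches the step "assign each shuffled label to a birth weight".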
The Data

(Figure: a row of birth weights, ... 120 oz, 113 oz, 128 oz, 108 oz, 117 oz ..., each labeled Smoker or Non-smoker)
Shuffling Labels Under the Null

(Figure: the birth weights ... 120 oz, 113 oz, 128 oz, 108 oz, 117 oz ..., now carrying the shuffled labels Smoker, Non-smoker, Non-smoker, Smoker, Smoker)
Shuffling Rows: Random Permutation
● tbl.sample(n)
○ Table of n rows picked randomly with replacement
● tbl.sample()
○ Table with same number of rows as original tbl,
picked randomly with replacement
● tbl.sample(n, with_replacement = False)
○ Table of n rows picked randomly without replacement
● tbl.sample(with_replacement = False)
○ All rows of tbl, in random order
(Demo)
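For readers without the table library at hand, the four sampling modes can be mimicked with Python's standard `random` module. This is only a rough analogue under the assumption that a table is represented as a list of rows; the `rows` list is hypothetical and the calls below are not the `tbl.sample` implementation.

```python
import random

rng = random.Random(0)

# Hypothetical "table": each element stands for one row.
rows = ["row 0", "row 1", "row 2", "row 3", "row 4"]
n = 3

# Like tbl.sample(n): n rows picked randomly with replacement.
with_replacement_n = rng.choices(rows, k=n)

# Like tbl.sample(): as many rows as the original, with replacement.
with_replacement_all = rng.choices(rows, k=len(rows))

# Like tbl.sample(n, with_replacement=False): n distinct rows.
without_replacement_n = rng.sample(rows, k=n)

# Like tbl.sample(with_replacement=False): all rows in random order,
# i.e. a random permutation of the table.
permutation = rng.sample(rows, k=len(rows))

print(with_replacement_n)
print(with_replacement_all)
print(without_replacement_n)
print(permutation)
```

Only the last mode is a shuffle: every row appears exactly once, which is what a permutation test needs.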
How We’ve Tested Thus Far
Hypothesis Testing Review
● 1 Sample: One Category (e.g. percent of flowers that are purple)
○ Test Statistic: observed_proportion, abs(observed_proportion - null_proportion)
○ How to Simulate: sample_proportions(n, null_dist)

● 1 Sample: More Than 2 Categories (e.g. ethnicity distribution of jury panel)
○ Test Statistic: tvd(observed_dist, null_dist)
○ How to Simulate: sample_proportions(n, null_dist)

● 1 Sample: Numerical Data (e.g. scores in a lab section)
○ Test Statistic: observed_mean, abs(observed_mean - null_mean)
○ How to Simulate: population_data.sample(n, with_replacement=False)

● 2 Samples: Underlying Values (e.g. birth weights of smokers vs. non-smokers)
○ Test Statistic: group_a_mean - group_b_mean, group_b_mean - group_a_mean, abs(group_a_mean - group_b_mean)
○ How to Simulate: observed_data.sample(with_replacement=False)
Smoking Caused Lower Birth Weight?
Importance of Random Assignment
We’ve concluded that, in the population, babies whose
mothers smoke weigh less at birth, on average, than babies
whose mothers do not.
● Is lower birth weight caused by maternal smoking?
● Can’t Tell:
○ Moms aren’t randomly assigned whether to smoke
○ Other factors contribute to their decision to smoke (e.g.
income, geography, diet)
Causality
Randomized Controlled Experiment
● Sample A: control group
● Sample B: treatment group
● If the treatment and control groups are selected at
random, then you can make causal conclusions.
● Any difference in outcomes between the two groups
could be due to
○ chance
○ the treatment
(Demo)
Before the Randomization
● In the population there is one imaginary ticket for each
of the 31 participants in the experiment.
● Each participant’s ticket looks like this:

Left half: Potential Outcome (outcome if assigned to treatment group)
Right half: Potential Outcome (outcome if assigned to control group)
The Data
16 randomly picked tickets show:

Outcome if assigned to
control group

The remaining 15 tickets show:

Outcome if assigned to
treatment group
The Hypotheses
● Null:
○ In the population, the distribution of all potential
control scores is the same as the distribution of all
potential treatment scores.
○ tl;dr the treatment has no effect
● Alternative:
○ In the population, more of the potential treatment
scores are 1 (pain improves) than the potential
control scores. (Demo)
Random Assignment & Shuffling
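Data Generation → Sample Data → Hypothesis Testing → Conclusions
● Data Generation: Observational Sample, or Randomized Control Experiment
● Sample Data: Our Two-Sample Numerical Data
● Hypothesis Testing: Difference of Means Permutation Test
(Shuffle Labels to Simulate from Null)
● Conclusions: Association (from an Observational Sample), or
Causation (from a Randomized Control Experiment)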
P-Values and Error Probabilities
Discussion Question
There are 2000 students in a course. Each has a coin to test:
Null: The coin is fair
Alternative: The coin is unfair
● based on 1,000 tosses of a coin,
● the statistic = | number of heads - 500 |,
● and the 5% cutoff for the P-value.

Suppose all 2000 coins are fair. About how many students
will conclude that their coins are unfair?
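One way to check an answer is to simulate the whole class. The sketch below assumes 2,000 students who each toss a fair coin 1,000 times, use the statistic | number of heads - 500 |, and reject at the 5% cutoff. The function names are made up for this example, and the P-value is computed exactly from binomial counts rather than by a second simulation.

```python
import random
from functools import lru_cache
from math import comb

TOSSES = 1000
STUDENTS = 2000
CUTOFF = 0.05


@lru_cache(maxsize=None)
def p_value_for_distance(distance):
    """Exact chance, for a fair coin, that |heads - 500| is at least `distance`."""
    half = TOSSES // 2
    extreme = sum(comb(TOSSES, h) for h in range(TOSSES + 1)
                  if abs(h - half) >= distance)
    return extreme / 2 ** TOSSES


def toss_fair_coin(rng):
    """Number of heads in TOSSES fair tosses (each random bit is one toss)."""
    return bin(rng.getrandbits(TOSSES)).count("1")


rng = random.Random(0)
rejections = 0
for _ in range(STUDENTS):
    heads = toss_fair_coin(rng)
    statistic = abs(heads - TOSSES // 2)
    if p_value_for_distance(statistic) <= CUTOFF:
        rejections += 1  # this student wrongly concludes the coin is unfair

print("students who reject the null:", rejections, "out of", STUDENTS)
```

Since every coin is fair, each rejection is an error. The count should land near 5% of 2,000, that is, roughly 100 students, and usually a little below because the number of heads is discrete.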
Statistic Simulated Under the Null

About 5% of the area is to the right of the gold line
Can the Conclusion be Wrong?
Yes.
                              Null is true    Alternative is true
Test favors the null               ✅                  ❌
Test favors the alternative        ❌                  ✅
An Error Probability
● The cutoff for the P-value is an error probability.

● If:
○ your cutoff is 5%
○ and the null hypothesis happens to be true

● then there is about a 5% chance that your test will
reject the null hypothesis.
P-value cutoff vs P-value
● P-value cutoff
○ Does not depend on observed data or simulation
○ Decide on it before seeing the results
○ Conventional values are 5% and 1%
○ Probability that the test rejects the null hypothesis when
the null is actually true
● P-value
○ Depends on the observed data and simulation
○ Probability under the null hypothesis that the test statistic
is the observed value or further towards the alternative
More on Hypothesis Tests
Discussion Question
● Manufacturers of Super Soda run a taste test.
● 91 out of 200 tasters prefer Super Soda over its rival.
Question: Do fewer people prefer Super Soda than its rival, or is this
just chance?
Null hypothesis:
Alternative hypothesis:
Test statistic (a way to summarize the whole sample as a single
number):
p-value: Start at the observed statistic and look which way?
Discussion Question
● Manufacturers of Super Soda run a taste test.
● 91 out of 200 tasters prefer Super Soda over its rival.
Question: Do fewer people prefer Super Soda than its rival, or is this
just chance?
Null hypothesis: Half the people in the population prefer Super Soda.
Alternative hypothesis: Fewer people in the population prefer Super
Soda than its rival.
Test statistic (a way to summarize the whole sample as a single
number): The number of people in the sample who prefer Super Soda
p-value: Start at the observed statistic and look which way? left
(Demo)
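The Super Soda test can be sketched as follows, assuming each of the 200 tasters independently prefers Super Soda with chance 1/2 under the null. The function names are illustrative, and the exact binomial calculation is included only as a cross-check on the simulation.

```python
import random
from math import comb

TASTERS = 200
OBSERVED = 91  # tasters in the sample who prefer Super Soda


def simulate_count(rng, n=TASTERS):
    """Number of tasters preferring Super Soda in one sample under the null."""
    return bin(rng.getrandbits(n)).count("1")  # each random bit is one taster


def empirical_p_value(repetitions=10000, seed=0):
    """Proportion of simulated counts at or to the left of the observed count."""
    rng = random.Random(seed)
    at_or_below = sum(simulate_count(rng) <= OBSERVED for _ in range(repetitions))
    return at_or_below / repetitions


def exact_p_value():
    """Exact P(count <= OBSERVED) when the count is Binomial(200, 1/2)."""
    return sum(comb(TASTERS, k) for k in range(OBSERVED + 1)) / 2 ** TASTERS


simulated = empirical_p_value()
exact = exact_p_value()
print("empirical p-value:", simulated)
print("exact p-value:", exact)
```

The tail is taken to the left because small counts favor the alternative. Under these assumptions the P-value comes out above the conventional 5% cutoff, so 91 out of 200 is consistent with chance variation.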
Hypothesis Test Concerns
The outcome of a hypothesis test can be affected by:
● The hypotheses you investigate:
How do you define your null distribution?
● The test statistic you choose:
How do you measure a difference between samples?
● The empirical distribution of the statistic under the null:
How many times do you simulate under the null distribution?
● The data you collected:
Did you happen to collect a sample that is similar to the population?
● The truth:
If the alternative hypothesis is true, how extreme is the difference?

(Demo)
Changing number of simulations
Difference from the null
Hypothesis Test Effects
Number of simulations: Make it as large as is practical so
that the empirical distribution of the test statistic closely
approximates its distribution under the null. No new data
needs to be collected.
Number of observations: A larger sample will lead you to
reject the null more reliably if the alternative is in fact true.
Difference from the null: If the null hypothesis is false, but
the truth is similar to the null hypothesis, then even a large
sample may not provide enough evidence to reject the null.
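The effect of the number of observations can be illustrated with a small simulation. The sketch assumes a taste-test setting with null proportion 1/2, a hypothetical true proportion of 0.45, a left-tailed test at the 5% cutoff, and made-up function names. It estimates how often the null is rejected for samples of 200 and of 2,000.

```python
import random
from math import comb


def left_tail_cutoff(n, level=0.05):
    """Largest count c with P(X <= c) <= level when X is Binomial(n, 1/2)."""
    total = 2 ** n
    cumulative = 0
    cutoff = -1
    for k in range(n + 1):
        cumulative += comb(n, k)
        if cumulative / total > level:
            break
        cutoff = k
    return cutoff


def rejection_rate(n, true_p, repetitions=500, seed=0):
    """Fraction of simulated samples of size n in which the null is rejected."""
    rng = random.Random(seed)
    cutoff = left_tail_cutoff(n)
    rejections = 0
    for _ in range(repetitions):
        # Draw one sample from the hypothetical truth, not from the null.
        count = sum(rng.random() < true_p for _ in range(n))
        if count <= cutoff:
            rejections += 1
    return rejections / repetitions


small_sample_rate = rejection_rate(200, true_p=0.45)
large_sample_rate = rejection_rate(2000, true_p=0.45)
print("rejection rate with 200 observations:", small_sample_rate)
print("rejection rate with 2000 observations:", large_sample_rate)
```

With the truth this close to the null, the smaller sample rejects only some of the time, while the larger sample rejects far more reliably. Moving `true_p` closer to 0.5 lowers both rates.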
