Hypothesis Testing Chapter 1
z-scores
HYPOTHESIS TESTING IN R
Richie Cotton
Data Evangelist at DataCamp
A/B testing
Electronic Arts (EA) is a video game company.
Retail webpage A/B test
Control Treatment
A/B test results
The treatment group (no ad) got 43.4% more purchases than the control group (with ad).
The intuition that "showing an ad would increase sales" was completely wrong.
Stack Overflow Developer Survey 2020
library(dplyr)
glimpse(stack_overflow)
Rows: 2,261
Columns: 8
$ respondent <dbl> 36, 47, 69, 125, 147, 152, 166, 170, 187, 196, 221,…
$ age_first_code_cut <chr> "adult", "child", "child", "adult", "adult", "adult…
$ converted_comp <dbl> 77556, 74970, 594539, 2000000, 37816, 121980, 48644…
$ job_sat <fct> Slightly satisfied, Very satisfied, Very satisfied,…
$ purple_link <chr> "Hello, old friend", "Hello, old friend", "Hello, o…
$ age_cat <chr> "At least 30", "At least 30", "Under 30", "At least…
$ age <dbl> 34, 53, 25, 41, 28, 30, 28, 26, 43, 23, 24, 35, 37,…
$ hobbyist <chr> "Yes", "Yes", "Yes", "Yes", "No", "Yes", "Yes", "Ye…
Hypothesizing about the mean
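A hypothesis: the mean annual compensation (converted_comp) of the population of data scientists is 110000.
The point estimate from the sample, mean_comp_samp, is:
119574.7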
Generating a bootstrap distribution
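# Step 3. Repeat steps 1 & 2 many times
so_boot_distn <- replicate(
  n = 5000,
  expr = {
    # Step 1. Resample
    stack_overflow %>%
      slice_sample(prop = 1, replace = TRUE) %>%
      # Step 2. Calculate the point estimate
      summarize(mean_compensation = mean(converted_comp)) %>%
      pull(mean_compensation)
  }
)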
Visualizing the bootstrap distribution
library(ggplot2)
tibble(resample_mean = so_boot_distn) %>%
  ggplot(aes(resample_mean)) +
  geom_histogram(binwidth = 1000)
Standard error
std_error <- sd(so_boot_distn)
5511.674
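The resample, summarize, repeat recipe and the resulting standard error can be sketched in base R on made-up data. The `comp` vector below is a hypothetical stand-in for `stack_overflow$converted_comp`, and the other variable names are illustrative, so the numbers will not match the slide output.

```r
set.seed(123)  # make the resampling reproducible

# Hypothetical stand-in for stack_overflow$converted_comp (not the survey data)
comp <- rlnorm(500, meanlog = 11, sdlog = 0.8)

# Steps 1-3: resample with replacement, take the mean, repeat many times
boot_distn <- replicate(
  n = 2000,
  expr = mean(sample(comp, size = length(comp), replace = TRUE))
)

# The standard deviation of the bootstrap distribution estimates the standard error
std_error_boot <- sd(boot_distn)

# For a mean, this should land close to the textbook formula sd / sqrt(n)
std_error_formula <- sd(comp) / sqrt(length(comp))

print(c(bootstrap = std_error_boot, formula = std_error_formula))
```

The two estimates should agree to within a few percent, which is a quick sanity check that the bootstrap loop is resampling whole observations with replacement.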
z-scores
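$$\text{standardized value} = \frac{\text{value} - \text{mean}}{\text{standard deviation}}$$

$$z = \frac{\text{sample stat} - \text{hypoth. param. value}}{\text{standard error}}$$

mean_comp_samp
119574.7
mean_comp_hyp <- 110000
(mean_comp_samp - mean_comp_hyp) / std_error
1.737171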
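As a check on the arithmetic, the z-score can be recomputed from the rounded values printed on the slides. The name `z_score` is an illustrative choice, and because the inputs are rounded the result only matches 1.737171 to a few decimal places.

```r
mean_comp_samp <- 119574.7  # sample mean, as printed on the slide
mean_comp_hyp  <- 110000    # hypothesized population mean
std_error      <- 5511.674  # bootstrap standard error, as printed on the slide

# z = (sample stat - hypothesized parameter value) / standard error
z_score <- (mean_comp_samp - mean_comp_hyp) / std_error
print(z_score)  # roughly 1.737
```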
Testing the hypothesis
Is 1.737171 a high or low number?
This is the goal of the course!
Standard normal (z) distribution
Standard normal distribution: the normal distribution with mean zero, standard deviation 1.
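A few base R calls give a feel for this distribution. `dnorm()` and `pnorm()` default to mean 0 and standard deviation 1, so no extra arguments are needed; the variable names are illustrative.

```r
# The density is symmetric about the mean of zero
symmetric <- isTRUE(all.equal(dnorm(-1), dnorm(1)))

# Half of the probability lies below zero
half_below_zero <- pnorm(0)

# Probability within one and two standard deviations of the mean
within_1_sd <- pnorm(1) - pnorm(-1)  # about 0.68
within_2_sd <- pnorm(2) - pnorm(-2)  # about 0.95

# Share of the distribution below the z-score from the compensation example
below_z <- pnorm(1.737171)           # about 0.96

print(c(within_1_sd, within_2_sd, below_z))
```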
Let's practice!
p-values
Criminal trials
Two possible true states.
1. Defendant committed the crime.
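2. Defendant did not commit the crime.
Two possible verdicts.
1. Guilty.
2. Not guilty.
Initially the defendant is assumed to be not guilty.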
If the evidence is "beyond a reasonable doubt" that the defendant committed the crime,
then a "guilty" verdict is given, else a "not guilty" verdict is given.
Age of first programming experience
age_first_code_cut classifies when each Stack Overflow user first started programming:
1. "adult" means they started at 14 or older
2. "child" means they started before 14
Does our sample provide evidence that a greater proportion of data scientists started programming as children?
Definitions
A hypothesis is a statement about an unknown population parameter.
The null hypothesis (H0)¹ is the existing idea, assumed to be true until the evidence suggests otherwise.
The alternative hypothesis (HA) is the new "challenger" idea of the researcher.
H0: The proportion of data scientists starting programming as children is the same as that of software developers (35%).
HA: The proportion of data scientists starting programming as children is greater than 35%.
¹ "Naught" is British English for "zero". For historical reasons, "H-naught" is the international convention for pronouncing the null hypothesis.
Criminal trials:
Two possible true states.
1. Defendant committed the crime.
2. Defendant did not commit the crime.
Initially the defendant is assumed to be not guilty.
If the evidence is "beyond a reasonable doubt" that the defendant committed the crime, then a "guilty" verdict is given, else a "not guilty" verdict is given.

Hypothesis testing:
In reality, either HA or H0 is true (but not both).
The test ends in either a "reject H0" verdict or a "fail to reject H0" verdict.
If the evidence from the sample is "significant" that HA is true, choose that hypothesis, else choose H0.
Significance level is "beyond a reasonable doubt" for hypothesis testing.
One-tailed and two-tailed tests
Hypothesis tests determine whether the sample statistics lie in the tails of the null distribution.

Test                              Tails
alternative different from null   two-tailed
alternative greater than null     right-tailed
alternative less than null        left-tailed

HA: The proportion of data scientists starting programming as children is greater than 35%.
Because the alternative is "greater than", this is a right-tailed test.
p-values
The larger the p-value, the stronger the support for H0.
The smaller the p-value, the stronger the evidence against H0.
Small p-values mean the statistic is in the tail of the null distribution (the distribution of the statistic if the null hypothesis were true).
The "p" in p-value stands for probability.
Defining p-values
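A p-value is the probability of observing a test statistic as extreme as, or more extreme than, the one observed in the sample, assuming the null hypothesis is true.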
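Which tail counts as "more extreme" depends on the alternative hypothesis. The sketch below, which assumes the test statistic is a z-score with a standard normal null distribution, shows the three tail areas with `pnorm()`. The value of `z` is reused from the compensation example purely for illustration, and the `p_*` names are not from the course.

```r
z <- 1.737171  # example z-score, reused for illustration

# Right-tailed: alternative greater than null
p_right <- pnorm(z, lower.tail = FALSE)

# Left-tailed: alternative less than null
p_left <- pnorm(z)

# Two-tailed: alternative different from null (both tails beyond |z|)
p_two <- 2 * pnorm(abs(z), lower.tail = FALSE)

print(c(right = p_right, left = p_left, two_tailed = p_two))
```

For a positive z-score the right-tailed p-value is small and the left-tailed one is large, so picking the tail that matches the alternative hypothesis matters.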
Calculating the z-score
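prop_child_samp <- stack_overflow %>%
  summarize(point_estimate = mean(age_first_code_cut == "child")) %>%
  pull(point_estimate)
0.388
The hypothesized proportion is 0.35. Dividing the difference between the sample and hypothesized proportions by the standard error gives the z-score:
3.956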
Calculating the p-value
pnorm() is the normal CDF. For a right-tailed test, the p-value is the area above the z-score, so use lower.tail = FALSE.
3.818e-05
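The calculation can be reproduced from the rounded z-score shown earlier. This is a sketch using `z_score` and `p_value` as illustrative names; because 3.956 is rounded, the result agrees with 3.818e-05 only to about two significant figures.

```r
z_score <- 3.956  # rounded z-score for the proportion of child starters

# Right-tailed test: area of the standard normal distribution above z_score
p_value <- pnorm(z_score, lower.tail = FALSE)
print(p_value)  # on the order of 3.8e-05
```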
Let's practice!
Statistical significance
p-value recap
p-values quantify evidence for the null hypothesis.
Large p-value → fail to reject null hypothesis.
Small p-value → reject null hypothesis.
Significance level
The significance level of a hypothesis test (α) is the threshold point for "beyond a reasonable doubt".
If p ≤ α, reject H0; otherwise, fail to reject H0.
Calculating the p-value
alpha <- 0.05
p_value
3.818e-05
p_value <= alpha
Since 3.818e-05 is less than 0.05, the comparison is TRUE and we reject H0 in favor of HA.
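The decision rule can be written out explicitly. This is a minimal sketch that hard-codes the p-value printed on the slide; `reject_null` and `decision` are illustrative names.

```r
alpha   <- 0.05       # significance level
p_value <- 3.818e-05  # p-value printed on the slide

# Reject H0 when the p-value is at or below the significance level
reject_null <- p_value <= alpha

if (reject_null) {
  decision <- "reject H0"
} else {
  decision <- "fail to reject H0"
}
print(decision)
```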
Confidence intervals
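For a significance level of 0.05, it's common to choose a confidence level of 1 - 0.05 = 0.95. The resulting 95% confidence interval for the proportion:
# A tibble: 1 x 2
  lower upper
  <dbl> <dbl>
1 0.369 0.407
The hypothesized value of 0.35 lies outside this interval, which is consistent with rejecting H0.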
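One common way to get such an interval is the quantile method applied to a bootstrap distribution. The sketch below uses base R and made-up data: `first_code` is a hypothetical stand-in for `stack_overflow$age_first_code_cut`, so the bounds will differ from the tibble above.

```r
set.seed(42)  # make the resampling reproducible

# Hypothetical stand-in for stack_overflow$age_first_code_cut (not the survey data)
first_code <- sample(c("child", "adult"), size = 500, replace = TRUE,
                     prob = c(0.4, 0.6))

# Bootstrap distribution of the proportion of "child" starters
boot_props <- replicate(
  n = 2000,
  expr = mean(sample(first_code, replace = TRUE) == "child")
)

# Quantile method: cut off alpha / 2 in each tail
alpha <- 0.05
conf_int <- quantile(boot_props, probs = c(alpha / 2, 1 - alpha / 2))
print(conf_int)
```

With `alpha <- 0.05` the interval runs from the 2.5% to the 97.5% quantile of the bootstrap distribution, which is where the 0.95 confidence level comes from.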
Types of errors
                     Truly didn't commit crime   Truly committed crime
Verdict not guilty   correct                     they got away with it
Verdict guilty       wrongful conviction         correct

            actual H0        actual HA
chosen H0   correct          false negative
chosen HA   false positive   correct

False positives are Type I errors; false negatives are Type II errors.
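The link between the significance level and Type I errors can be illustrated with a small simulation. The setup is an assumption made for illustration, not part of the course data: right-tailed z-tests on normal samples with a known standard deviation of 1, generated with H0 actually true.

```r
set.seed(2020)  # make the simulation reproducible

alpha  <- 0.05  # significance level
n_sims <- 5000  # number of simulated studies
n      <- 30    # observations per study

# Right-tailed z-tests of H0: mean = 0, with data generated under H0
p_values <- replicate(n_sims, {
  x <- rnorm(n, mean = 0, sd = 1)
  z <- mean(x) / (1 / sqrt(n))  # known sd of 1, so the standard error is 1 / sqrt(n)
  pnorm(z, lower.tail = FALSE)
})

# H0 is true in every simulated study, so each rejection is a false positive
type_1_rate <- mean(p_values <= alpha)
print(type_1_rate)  # should be close to alpha
```

Under these assumptions the false positive rate sits near α, which is why choosing α amounts to choosing how often a true null hypothesis gets rejected.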
Possible errors in our example
If p ≤ α, we reject H0:
A false positive (Type I) error could have occurred: we thought that data scientists started coding as children at a higher rate when in reality they did not.
If p > α, we fail to reject H0:
A false negative (Type II) error could have occurred: we thought that data scientists coded as children at the same rate as software engineers when in reality they coded as children at a higher rate.
Let's practice!