Hypothesis Testing Chapter 1

Hypothesis tests and z-scores
HYPOTHESIS TESTING IN R

Richie Cotton
Data Evangelist at DataCamp

A/B testing
Electronic Arts (EA) is a video game company.

In 2013, they released SimCity 5.

Their goal was to increase pre-orders of the game.

They used A/B testing to test different advertising scenarios.

This involves splitting users into control and treatment groups.

1 Image credit: "Electronic Arts" by majaX1 CC BY-NC-SA 2.0
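Splitting users into control and treatment groups is usually done at random. Here is a minimal base-R sketch that assumes a hypothetical list of 1,000 user IDs (not EA's data) and assigns exactly half of them to each group:

```r
set.seed(42)

# Hypothetical users; EA's real user data is not available here
n_users <- 1000
users <- data.frame(user_id = seq_len(n_users))

# Shuffle a balanced vector of labels so each group gets half the users
users$group <- sample(rep(c("control", "treatment"), length.out = n_users))

table(users$group)
```

Randomizing the assignment means that any systematic difference between the groups can be attributed to the treatment rather than to who happened to see it.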

Retail webpage A/B test
[Figure: webpage panels labeled Control and Treatment]

A/B test results
The treatment group (no ad) got 43.4% more purchases than the control group (with ad).
The intuition that "showing an ad would increase sales" was completely wrong.

Was this result statistically significant or just by chance?

You need EA's data to determine this.

You would use techniques from Sampling in R, plus those from this course, to do so.

Stack Overflow Developer Survey 2020
library(dplyr)
glimpse(stack_overflow)

Rows: 2,261
Columns: 8
$ respondent <dbl> 36, 47, 69, 125, 147, 152, 166, 170, 187, 196, 221,…
$ age_first_code_cut <chr> "adult", "child", "child", "adult", "adult", "adult…
$ converted_comp <dbl> 77556, 74970, 594539, 2000000, 37816, 121980, 48644…
$ job_sat <fct> Slightly satisfied, Very satisfied, Very satisfied,…
$ purple_link <chr> "Hello, old friend", "Hello, old friend", "Hello, o…
$ age_cat <chr> "At least 30", "At least 30", "Under 30", "At least…
$ age <dbl> 34, 53, 25, 41, 28, 30, 28, 26, 43, 23, 24, 35, 37,…
$ hobbyist <chr> "Yes", "Yes", "Yes", "Yes", "No", "Yes", "Yes", "Ye…

Hypothesizing about the mean
A hypothesis:

The mean annual compensation of the population of data scientists is $110,000.

The point estimate (sample statistic):

mean_comp_samp <- mean(stack_overflow$converted_comp)

# Equivalently, with dplyr
mean_comp_samp <- stack_overflow %>%
  summarize(mean_compensation = mean(converted_comp)) %>%
  pull(mean_compensation)

119574.7

Generating a bootstrap distribution
# Step 3. Repeat steps 1 & 2 many times
so_boot_distn <- replicate(
  n = 5000,
  expr = {
    # Step 1. Resample
    stack_overflow %>%
      slice_sample(prop = 1, replace = TRUE) %>%
      # Step 2. Calculate point estimate
      summarize(mean_compensation = mean(converted_comp)) %>%
      pull(mean_compensation)
  }
)

1 Bootstrap distributions are taught in Chapter 4 of Sampling in R
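The same three steps can be sketched in base R without dplyr. Since the Stack Overflow data is not bundled here, this assumes a hypothetical, right-skewed compensation vector `comp` drawn from a log-normal distribution; the numbers it produces are illustrative only.

```r
set.seed(2020)

# Hypothetical compensation data (NOT the survey): 2,261 right-skewed values
comp <- rlnorm(2261, meanlog = log(80000), sdlog = 0.8)

# Step 3 wraps steps 1 and 2: repeat 5,000 times
boot_means <- replicate(5000, {
  resample <- sample(comp, replace = TRUE)  # Step 1. Resample with replacement
  mean(resample)                            # Step 2. Calculate point estimate
})

# The standard deviation of the bootstrap distribution is the standard error
std_error_boot <- sd(boot_means)
std_error_boot
```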

Visualizing the bootstrap distribution
library(ggplot2)

tibble(resample_mean = so_boot_distn) %>%
  ggplot(aes(resample_mean)) +
  geom_histogram(binwidth = 1000)

Standard error
The standard error is the standard deviation of the bootstrap distribution.

std_error <- sd(so_boot_distn)

5511.674

z-scores

$$\text{standardized value} = \frac{\text{value} - \text{mean}}{\text{standard deviation}}$$

$$z = \frac{\text{sample stat} - \text{hypoth. param. value}}{\text{standard error}}$$

$$z = \frac{\$119{,}574.7 - \$110{,}000}{\$5{,}511.67} = 1.737$$

mean_comp_samp

119574.7

mean_comp_hyp <- 110000

std_error

5511.674

z_score <- (mean_comp_samp - mean_comp_hyp) / std_error

1.737171
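The z-score formula can be wrapped in a small function. `calc_z_score()` is a hypothetical helper name, not part of the course code; the inputs below are the rounded values reported on this slide.

```r
# Hypothetical helper: number of standard errors between the sample
# statistic and the hypothesized parameter value
calc_z_score <- function(sample_stat, hypothesized_value, std_error) {
  (sample_stat - hypothesized_value) / std_error
}

# Mean compensation example, using the rounded values from the slide
z_comp <- calc_z_score(sample_stat = 119574.7,
                       hypothesized_value = 110000,
                       std_error = 5511.674)
round(z_comp, 3)
```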

Testing the hypothesis
Is 1.737171 a high or low number?
This is the goal of the course!

Hypothesis testing use case:

Determine whether sample statistics are close to or far away from expected (or "hypothesized") values.

Standard normal (z) distribution
Standard normal distribution: the normal distribution with mean 0 and standard deviation 1.

library(ggplot2)

tibble(x = seq(-4, 4, 0.01)) %>%
  ggplot(aes(x)) +
  stat_function(fun = dnorm) +
  ylab("PDF(x)")

Let's practice!

p-values

Richie Cotton
Data Evangelist at DataCamp

Criminal trials
Two possible true states:
1. Defendant committed the crime.
2. Defendant did not commit the crime.

Two possible verdicts:
1. Guilty.
2. Not guilty.

Initially the defendant is assumed to be not guilty.

If the evidence is "beyond a reasonable doubt" that the defendant committed the crime, then a "guilty" verdict is given, else a "not guilty" verdict is given.

Age of first programming experience
age_first_code_cut classifies when a Stack Overflow user first started programming:
1. "adult" means they started at 14 or older.
2. "child" means they started before 14.

Previous research suggests that 35% of software developers started programming as children.

Does our sample provide evidence that a greater proportion of data scientists started programming as children?

Definitions
A hypothesis is a statement about an unknown population parameter.

A hypothesis test is a test of two competing hypotheses.

The null hypothesis (H0) is the existing "champion" idea.

The alternative hypothesis (HA) is the new "challenger" idea of the researcher.

For our problem:

H0: The proportion of data scientists starting programming as children is the same as that of software developers (35%).

HA: The proportion of data scientists starting programming as children is greater than 35%.

1 "Naught" is British English for "zero". For historical reasons, "H-naught" is the international convention for pronouncing the null hypothesis.

Criminal trials vs. hypothesis testing

Trial: two possible true states (the defendant committed the crime, or did not).
Test: in reality, either HA or H0 is true (but not both).

Trial: two possible verdicts (guilty, or not guilty).
Test: the test ends in either a "reject H0" or a "fail to reject H0" verdict.

Trial: initially the defendant is assumed to be not guilty.
Test: initially the null hypothesis, H0, is assumed to be true.

Trial: if the evidence is "beyond a reasonable doubt" that the defendant committed the crime, a "guilty" verdict is given; otherwise a "not guilty" verdict is given.
Test: if the evidence from the sample is "significant" that HA is true, choose that hypothesis; otherwise choose H0.

The significance level is "beyond a reasonable doubt" for hypothesis testing.

One-tailed and two-tailed tests
Hypothesis tests determine whether the sample statistics lie in the tails of the null distribution.

Test                               Tails
alternative different from null    two-tailed
alternative greater than null      right-tailed
alternative less than null         left-tailed

HA: The proportion of data scientists starting programming as children is greater than 35%.

Our alternative hypothesis uses "greater than," so we need a right-tailed test.

p-values
The larger the p-value, the weaker the evidence against H0.
The smaller the p-value, the stronger the evidence against H0.

Small p-values mean the statistic is in the tail of the null distribution (the distribution of the statistic if the null hypothesis were true).
The "p" in p-value stands for probability.

For p-values, "small" means "close to zero".

Defining p-values
A p-value is the probability of observing a test statistic as extreme as, or more extreme than, what was observed in our original sample, assuming the null hypothesis is true.
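The definition can be checked by simulation: generate the statistic many times assuming H0 is true, then count how often it is at least as extreme as the observed value. This sketch swaps in a binomial simulation under H0 (a different technique from the z-score approach in these slides). It assumes n = 2261 from the glimpse() output and the rounded sample proportion 0.388; its estimate is small, but it is not expected to equal the slides' p-value.

```r
set.seed(2020)

n <- 2261               # sample size, from the glimpse() output
prop_child_hyp <- 0.35  # proportion under H0
prop_child_obs <- 0.388 # observed sample proportion (rounded)

# Null distribution: sample proportions simulated assuming H0 is true
null_props <- rbinom(100000, size = n, prob = prop_child_hyp) / n

# Right-tailed: "as extreme or more extreme" means "at least as large"
p_value_sim <- mean(null_props >= prop_child_obs)
p_value_sim
```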

Calculating the z-score
prop_child_samp <- stack_overflow %>%
summarize(point_estimate = mean(age_first_code_cut == "child")) %>%
pull(point_estimate)

0.388

prop_child_hyp <- 0.35

std_error <- 0.0096028

z_score <- (prop_child_samp - prop_child_hyp) / std_error

3.956

Calculating the p-value
pnorm() is the normal cumulative distribution function (CDF).

Left-tailed test → use the default, lower.tail = TRUE.

Right-tailed test → set lower.tail = FALSE.

p_value <- pnorm(z_score, lower.tail = FALSE)

3.818e-05
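The slide covers left- and right-tailed tests; for a two-tailed test, a common approach is to double the tail area beyond |z|. Below is a sketch of a helper that covers all three rows of the tails table. `p_value_from_z()` and its `alternative` argument are hypothetical names, chosen to echo the `alternative` argument of R's own test functions such as `prop.test()`.

```r
# Hypothetical helper: convert a z-score into a p-value for each kind of test
p_value_from_z <- function(z, alternative = c("two.sided", "greater", "less")) {
  alternative <- match.arg(alternative)
  switch(alternative,
    greater   = pnorm(z, lower.tail = FALSE),         # right-tailed
    less      = pnorm(z, lower.tail = TRUE),          # left-tailed
    two.sided = 2 * pnorm(abs(z), lower.tail = FALSE) # both tails
  )
}

p_value_from_z(3.956, alternative = "greater")
```

Because this passes the rounded z-score 3.956, the result is close to, but not necessarily identical with, 3.818e-05.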

Let's practice!

Statistical significance

Richie Cotton
Data Evangelist at DataCamp

p-value recap
p-values quantify evidence against the null hypothesis.
Large p-value → fail to reject the null hypothesis.

Small p-value → reject the null hypothesis.

Where is the cutoff point?

Significance level
The significance level of a hypothesis test (α) is the threshold point for "beyond a reasonable doubt".

Common values of α are 0.1, 0.05, and 0.01.

If p ≤ α, reject H0; otherwise, fail to reject H0.


α should be set prior to conducting the hypothesis test.
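The decision rule is a single comparison. This sketch applies it to the slides' p-value (3.818e-05) and to two made-up p-values, across the three common α values; `reject_null()` is a hypothetical helper name.

```r
# Hypothetical helper: TRUE means "reject H0", FALSE means "fail to reject H0"
reject_null <- function(p_value, alpha) {
  p_value <= alpha
}

p_values <- c(3.818e-05, 0.04, 0.2)  # first from the slides; others made up
alphas <- c(0.1, 0.05, 0.01)

# One row per p-value, one column per significance level
decisions <- outer(p_values, alphas, reject_null)
dimnames(decisions) <- list(p_value = as.character(p_values),
                            alpha = as.character(alphas))
decisions
```

The row for 0.04 shows why α must be fixed before the test: the same evidence leads to a different decision at α = 0.01 than at α = 0.05.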

Calculating the p-value
alpha <- 0.05

prop_child_samp <- stack_overflow %>%
  summarize(
    point_estimate = mean(age_first_code_cut == "child")
  ) %>%
  pull(point_estimate)
prop_child_hyp <- 0.35
std_error <- 0.0096028
z_score <- (prop_child_samp - prop_child_hyp) / std_error
p_value <- pnorm(z_score, lower.tail = FALSE)

3.818e-05

p_value <= alpha

TRUE

p_value is less than or equal to alpha, so reject H0 and accept HA.

The proportion of data scientists starting programming as children is greater than 35%.

Confidence intervals
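For a significance level of 0.05, it's common to choose a confidence level of 1 - 0.05 = 0.95, that is, a 95% confidence interval.

# first_code_boot_distn: bootstrap distribution of the sample proportion,
# with one resampled proportion per row in first_code_child_rate (not shown here)
conf_int <- first_code_boot_distn %>%
  summarize(
    lower = quantile(first_code_child_rate, 0.025),
    upper = quantile(first_code_child_rate, 0.975)
  )

# A tibble: 1 x 2
  lower upper
  <dbl> <dbl>
1 0.369 0.407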
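Here is a base-R sketch of the same percentile interval. Because the survey rows are not included here, it assumes a hypothetical 0/1 vector with 877 "child" respondents out of 2,261 (matching the rounded proportion 0.388), so its bounds should not be expected to match the slide's output exactly.

```r
set.seed(2020)

# Hypothetical data: 1 = started coding as a child, 0 = as an adult
is_child <- c(rep(1, 877), rep(0, 2261 - 877))

# Bootstrap distribution of the sample proportion
boot_props <- replicate(5000, mean(sample(is_child, replace = TRUE)))

# 95% percentile interval: the middle 95% of the bootstrap distribution
conf_int <- quantile(boot_props, probs = c(0.025, 0.975))
conf_int

# Does the hypothesized value fall inside the interval?
prop_child_hyp <- 0.35
prop_child_hyp >= conf_int[[1]] && prop_child_hyp <= conf_int[[2]]
```

With these assumed data, 0.35 falls below the lower bound, which is consistent with the small p-value from the z-score approach.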

Types of errors
                     Truly didn't commit crime   Truly committed crime
Verdict not guilty   correct                     they got away with it
Verdict guilty       wrongful conviction         correct

             actual H0        actual HA
chosen H0    correct          false negative
chosen HA    false positive   correct

False positives are Type I errors; false negatives are Type II errors.
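One way to see the link between α and Type I errors is to simulate many samples in which H0 is actually true, then count how often the test rejects it. This sketch assumes samples of n = 2261 with a true proportion of 0.35. To keep it short, it uses the formula-based standard error sqrt(p(1 - p) / n) under H0 instead of a bootstrap.

```r
set.seed(123)

n <- 2261         # sample size
prop_hyp <- 0.35  # H0 is true: the real proportion is 35%
alpha <- 0.05
n_sims <- 10000

# Sample proportions from many samples drawn with H0 true
sample_props <- rbinom(n_sims, size = n, prob = prop_hyp) / n

# z-scores using the standard error of a proportion under H0
std_error_null <- sqrt(prop_hyp * (1 - prop_hyp) / n)
z_scores <- (sample_props - prop_hyp) / std_error_null

# Right-tailed p-values; every rejection here is a false positive
p_values <- pnorm(z_scores, lower.tail = FALSE)
type_1_rate <- mean(p_values <= alpha)
type_1_rate
```

The rejection rate should land near α = 0.05. When H0 is true, the significance level is approximately the Type I error rate you accept in advance.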

Possible errors in our example
If p ≤ α, we reject H0:

A false positive (Type I) error could have occurred: we thought that data scientists started coding as children at a higher rate when in reality they did not.

If p > α, we fail to reject H0:

A false negative (Type II) error could have occurred: we thought that data scientists coded
as children at the same rate as software engineers when in reality they coded as children at
a higher rate.

Let's practice!