2nd Half Notes
Lesson Plan
▶ Understanding the test of independence
▶ Using contingency tables to compute the chi-square statistic
▶ Find the estimated P-value of the sample
What is Chi-Square?
▶ Chi-square: χ²
Nominal or Ordinal
Test of independence
▶ Chi-square distribution: it is a right-skewed distribution, and the area under the curve is 1
The Chi-Square distribution
As the degrees of freedom (d.f.) increase, the chi-square distribution becomes closer to a normal distribution in shape
Conditions for Chi Square
▶ To perform a chi-square test for association, check that the following conditions are met:
• Random: The data come from a random sample from the population of interest, from independent random samples from the populations of interest, or from groups in a randomized experiment. (Randomized experiment: the way people are put into control or experimental groups is random.)
• Large Counts: All expected counts are at least 5.
Chi-Square Procedures
▶ State H0 and H1 (null and alternative hypotheses)
Contingency table layout: Var 2 runs across the columns; r = number of rows, c = number of columns, with a column total under each column
Practice: Contingency Table
Observed frequencies
Chi-Square notation
Expected Frequencies
Expected frequency = (row total × column total) / grand total
e.g. (500 × 150)/1000 = 75
Practice: Expected values
Statistically significant difference: a difference between the observed and expected values, judged using probability theory
Test of Independence
Test Statistic, where O = observed frequency and E = expected frequency:
$\chi^2 = \sum \frac{(O - E)^2}{E} = 19.6$
Observed (expected) frequencies
Flavour             Column 1    Column 2    Row total
Coconut flavour     33 (36)     57 (54)     90
Cherry flavour      30 (20)     20 (30)     50
Orange flavour       5 (16)     35 (24)     40
Traditional cola    12 (8)       8 (12)     20
Column total        80          120         200
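The expected-frequency rule and the chi-square statistic can be checked on the soda-flavour table with a short Python sketch. The observed counts are read from the table; "column 1" and "column 2" are placeholders because the school levels are not named, and the critical value 7.815 is the standard chi-square table value for d.f. = 3, assuming a significance level of α = 0.05.

```python
# Observed counts from the soda-flavour table: [column 1, column 2] for each row.
observed = {
    "Coconut": [33, 57],
    "Cherry": [30, 20],
    "Orange": [5, 35],
    "Traditional cola": [12, 8],
}

def expected_counts(table):
    """Expected count for each cell: (row total * column total) / grand total."""
    rows = list(table.values())
    row_totals = [sum(row) for row in rows]
    col_totals = [sum(col) for col in zip(*rows)]
    grand_total = sum(row_totals)
    return {
        name: [rt * ct / grand_total for ct in col_totals]
        for name, rt in zip(table, row_totals)
    }

def chi_square(table):
    """Chi-square statistic: sum of (O - E)^2 / E over every cell."""
    expected = expected_counts(table)
    return sum(
        (o - e) ** 2 / e
        for name in table
        for o, e in zip(table[name], expected[name])
    )

expected = expected_counts(observed)
stat = chi_square(observed)
r = len(observed)                        # number of rows
c = len(next(iter(observed.values())))   # number of columns
df = (r - 1) * (c - 1)                   # d.f. = (r - 1)(c - 1)
critical = 7.815                         # table value for d.f. = 3, assuming alpha = 0.05

print(expected["Coconut"])               # [36.0, 54.0]
print(round(stat, 2), df)                # 24.69 3
print("Reject H0" if stat > critical else "Fail to reject H0")   # Reject H0
```

Under these assumptions the statistic is about 24.69 on 3 d.f., which exceeds 7.815, so the hypothesis that school level and flavour preference are independent would be rejected at α = 0.05.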
Practice
H0: School level and soda flavour preferences are independent
H1: School level and soda flavour preferences are not independent
Cherry flavour 20 30 50
Orange flavour 16 24 40
Traditional cola 8 12 20
Hypotheses: we use ρ (rho) instead of r because ρ is for the population and r is for the sample
FORMULA: CALCULATING r
$$r = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{n\sum x^2 - \left(\sum x\right)^2}\,\sqrt{n\sum y^2 - \left(\sum y\right)^2}}$$
n = number of pairs of sample data
Σx denotes the sum of all x-values.
Σx² indicates that each x-value should be squared and then those squares added.
(Σx)² indicates that the x-values should be added and then the total squared.
Σxy indicates that each x-value should first be multiplied by its corresponding y-value. After obtaining all such products, find their sum.
r = linear correlation coefficient for sample data.
ρ = linear correlation coefficient for population data.
When calculating Pearson's r, go to 3 decimal places.
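The computational formula for r translates directly into code. This is a minimal Python sketch; the function name `pearson_r` and the paired data are hypothetical, for illustration only.

```python
import math

def pearson_r(x, y):
    """Sample linear correlation coefficient r, using the computational formula."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of values")
    n = len(x)                                   # number of pairs
    sum_x, sum_y = sum(x), sum(y)                # Σx, Σy
    sum_xy = sum(a * b for a, b in zip(x, y))    # Σxy: multiply pairs, then add
    sum_x2 = sum(a * a for a in x)               # Σx²: square, then add
    sum_y2 = sum(b * b for b in y)               # Σy²
    numerator = n * sum_xy - sum_x * sum_y
    denominator = (math.sqrt(n * sum_x2 - sum_x ** 2)
                   * math.sqrt(n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Hypothetical paired data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = round(pearson_r(x, y), 3)   # round to 3 decimal places
print(r)                        # 0.775
```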
HYPOTHESIS TESTING
If the absolute value of the computed value of r, denoted |r|,
exceeds the value in Table (critical value), conclude that
there is a linear correlation. (Reject H0)
|r| calculated > r critical (table value): Reject H0
• What is a hypothesis?
We don't say we accept the null hypothesis; we say we fail to reject it
• What is hypothesis testing
• A statistical method that helps us make decisions or inferences about
a population parameter (like the mean) based on sample data.
• Inferential statistics
• By the end of the test, we either reject or fail to reject the null hypothesis
• What is the difference between confidence intervals and hypothesis
testing?
HYPOTHESIS TESTING
• H0 (null hypothesis): no effect or no difference
• Ha or H1 (alternative hypothesis): things are in fact different (if the evidence supports it, we reject H0)
o Greater than (>)
o Less than (<)
o Different (≠, the not-equal-to symbol): relates to a two-tailed test
STATING YOUR HYPOTHESES
$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$ or $t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$
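The two one-sample test statistics can be sketched in Python as below. The function names and all input values are hypothetical; the z version assumes σ is known, while the t version uses the sample standard deviation s.

```python
import math

def z_statistic(x_bar, mu, sigma, n):
    """z = (x̄ - μ) / (σ / √n), used when the population σ is known."""
    return (x_bar - mu) / (sigma / math.sqrt(n))

def t_statistic(x_bar, mu, s, n):
    """t = (x̄ - μ) / (s / √n), used when only the sample s is available."""
    return (x_bar - mu) / (s / math.sqrt(n))

# Hypothetical values for illustration.
z = z_statistic(x_bar=52, mu=50, sigma=8, n=16)   # 2 / (8/4)
t = t_statistic(x_bar=52, mu=50, s=4, n=16)       # 2 / (4/4)
print(z, t)   # 1.0 2.0
```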
Review: When do we use z versus t?
CONCLUDING A STATISTICAL TEST
Steps:
1. Calculate your test statistic (e.g., the t-test value).
2. Determine the degrees of freedom you are using.
3. Determine whether it is a one-tailed or two-tailed test.
4. Determine the significance level you are using for comparison.
5. Compare the test statistic with the critical value.
6. Draw your conclusion based on the rejection zone principle.
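Steps 5 and 6 (compare, then conclude) can be sketched as a small Python function. The name `conclude` and the `tail` parameter are hypothetical; the sketch assumes the critical value is entered as the positive table value.

```python
def conclude(test_value, critical_value, tail="two"):
    """Apply the rejection-zone rule; critical_value is the positive table value."""
    if tail == "two":
        reject = abs(test_value) > critical_value    # beyond either ±critical value
    elif tail == "right":
        reject = test_value > critical_value
    elif tail == "left":
        reject = test_value < -critical_value
    else:
        raise ValueError("tail must be 'two', 'right', or 'left'")
    return "Reject H0" if reject else "Fail to reject H0"

# The z practice problem: z = 1.25 against z-critical = ±1.96.
print(conclude(1.25, 1.96))   # Fail to reject H0
```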
LET'S PRACTICE
Z-calculated = 1.25
Z-Critical = +/- 1.96
We fail to reject H0
LET'S PRACTICE
t-critical = 1.833
We reject our H0
DRAWING CONCLUSIONS
                H0 True                       H0 False
Reject H0       Type I error (alpha error)    Correct conclusion!
You are testing the null hypothesis that you are not allergic to peanuts. What is a Type I error?
Steps
• State the null and alternate hypotheses (use symbols to denote the hypotheses)
Known Population Standard Deviation
4/1/2025 3
Remember!
This is Estimation
Central limit theorem is guiding these principles
Important Notation
σ = population standard deviation
μ = population mean
x̄ = sample mean (point estimate)
n = number of sample values (sample size)
E = margin of error (something we can calculate)
z_c = critical value (found on the z table)
Estimating a Parameter
• Estimating a population parameter using a single number is called a point estimate (a single number that represents the population).
x̄ is the point estimate for μ.
A limitation of a point estimate:
- It is not representative of the entire population
$\bar{x} - E < \mu < \bar{x} + E$
Confidence Interval
• A confidence interval gives a set of plausible values for a
parameter based on sample data
o Margin of error
Practice!
Point estimate: x̄
Margin of error when σ is known: $E = z_c \cdot \frac{\sigma}{\sqrt{n}}$
where σ/√n is the standard error and z_c is the critical value for the chosen confidence level
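A minimal Python sketch of a confidence interval for μ when σ is known, x̄ ± z_c · σ/√n. It uses the standard library's `statistics.NormalDist` to look up z_c instead of a z table; the function name and the sample values (x̄ = 50, σ = 10, n = 25, 95% confidence) are hypothetical.

```python
import math
from statistics import NormalDist

def z_interval(x_bar, sigma, n, confidence):
    """Confidence interval for μ when σ is known: x̄ ± z_c * σ/√n."""
    z_c = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # critical value
    standard_error = sigma / math.sqrt(n)
    margin = z_c * standard_error                          # E
    return x_bar - margin, x_bar + margin

# Hypothetical sample.
low, high = z_interval(50, 10, 25, 0.95)
print(round(low, 2), round(high, 2))   # 46.08 53.92
```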
Confidence Level
• A way to measure the reliability of an estimate: P(−z_c < z < z_c) = c
• Critical value z_c: the number such that the area between −z_c and z_c equals c
Practice!
• Find the z.99 such that P(-z0.99 < z < z0.99) = .99
(1-.99)/2 = .005
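The same lookup can be sketched in Python: put 0.005 in each tail, so z_0.99 is the value with 0.995 of the area to its left. This uses the standard library's `statistics.NormalDist` in place of the z table; printed tables often round the result to 2.58 (or 2.575).

```python
from statistics import NormalDist

confidence = 0.99
tail_area = (1 - confidence) / 2            # (1 - .99)/2 = .005 in each tail
z_c = NormalDist().inv_cdf(1 - tail_area)   # area to the left of z_c is .995
print(round(z_c, 3))                        # 2.576
```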
n = number of trials
p = probability of success on a single trial
q = 1 − p = probability of failure on a single trial
Because we are working with a binomial distribution, each trial is either a success or a failure; that is why q = 1 − p.
Estimating Proportions
Point estimate for p and q
• https://youtu.be/Ea4_eX--mIY?si=xJisoJ_PsPu9uNzU
Student t Distribution
• Developed by William Sealy Gosset
• Degrees of freedom
d.f. = n − 1
Practice
• Find the Critical value for a .95 confidence level for a t distribution
with a sample size of n= 10
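Python's standard library has no Student t distribution, so this sketch assumes a numeric approach: it integrates the t density with Simpson's rule and bisects for the value with 0.975 of the area to its left (0.025 in each tail for 95% confidence), using d.f. = n − 1 = 9. The function names are hypothetical; in practice a t table or `scipy.stats.t.ppf(0.975, 9)` gives the same value.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x) for x >= 0: 0.5 plus the area from 0 to x (Simpson's rule)."""
    h = x / steps
    total = t_pdf(0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_critical(confidence, n):
    """Two-tailed critical value t_c with d.f. = n - 1, found by bisection."""
    df = n - 1
    target = 1 - (1 - confidence) / 2   # area to the left of t_c
    lo, hi = 0.0, 20.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

t_c = t_critical(0.95, 10)   # d.f. = 9
print(round(t_c, 3))         # 2.262
```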
Estimation of μ
Margin of Error
Confidence Interval
Estimation Summary
Examine the problem statement:
σ is known: use z          σ is unknown: use t
• "We are c% confident that the interval from ______ to _____ captures the true value
of the population mean."
3/18/2025 3
When discussing whether something is going to happen or not
- e.g., sports, player stats
Probability scale: 0 = impossibility of occurrence, 0.5 = 50/50 chance, 1 = certainty of occurrence
Definitions
• Statistical Experiment: Any random activity that results in definite outcomes
In Class Example:
Probability of rolling a two
Probability Assignment
Equally likely outcomes:
P(A) = (Number of outcomes favorable to event A) / (Total number of outcomes)
Relative frequency: P(A) ≈ f/n, where f = frequency and n = sample size
Non-numerical (non-scientific) approach
Practice
Note: Use the proper notation.
Question:
A bag contains 5 red marbles, 3 blue marbles, and 2 green marbles.
If one marble is drawn at random from the bag, what is the
probability that:
• The marble drawn is red?
P(R) = 5/10 = 0.5
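The equally-likely-outcomes rule can be sketched with Python's `fractions.Fraction`, which keeps the answer as an exact ratio. The function name is hypothetical, and the die example assumes a fair six-sided die.

```python
from fractions import Fraction

def classical_probability(favorable, total):
    """P(A) = favorable outcomes / total outcomes, for equally likely outcomes."""
    return Fraction(favorable, total)

bag = {"red": 5, "blue": 3, "green": 2}
p_red = classical_probability(bag["red"], sum(bag.values()))
p_two = classical_probability(1, 6)   # rolling a two on a fair six-sided die
print(p_red, float(p_red))            # 1/2 0.5
print(p_two)                          # 1/6
```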
3/18/2025 8
Statistics and Probability
• Rare Event Rule:
• If, under a given assumption, the probability of the observed event is extremely small, we conclude that the assumption is probably not correct. Example: flipping a coin 20 times and landing on heads every time.
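The coin example can be quantified with a one-line calculation, assuming a fair coin and independent flips: the chance of 20 heads in a row is (1/2)^20, about one in a million, which is why we would doubt the fair-coin assumption.

```python
from fractions import Fraction

# Chance of 20 heads in 20 independent flips of a fair coin.
p_all_heads = Fraction(1, 2) ** 20
print(p_all_heads)          # 1/1048576
print(float(p_all_heads))   # 9.5367431640625e-07
```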
n = number of flips       200     500     1000    2000    5000
f = number of heads       104     259     495     1006    2498
Relative frequency f/n    0.52    0.518   0.495   0.503   0.4996
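The relative frequencies in the coin table are just f/n, and recomputing them shows the values settling near 0.5 as n grows. A short Python check using the table's counts:

```python
flips = [200, 500, 1000, 2000, 5000]   # n = number of flips
heads = [104, 259, 495, 1006, 2498]    # f = number of heads
relative = [f / n for f, n in zip(heads, flips)]
print(relative)   # [0.52, 0.518, 0.495, 0.503, 0.4996]
```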
Random Variables
Continuous Variables: The Normal Distribution
Inflection points always occur at μ − σ and μ + σ
$z = \frac{x - \mu}{\sigma}$: x minus the population mean, divided by the population standard deviation
Practice!
• The table below shows your score in your psychology class.
Class         Your grade    μ     σ
Psychology    68            65    6
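The psychology practice problem reduces to the z-score formula. This Python sketch (function name hypothetical) also uses `statistics.NormalDist` in place of the z table to find the proportion of scores below yours.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """z = (x - μ) / σ: how many standard deviations x lies from the mean."""
    return (x - mu) / sigma

z = z_score(68, 65, 6)
area_below = NormalDist().cdf(z)   # proportion of scores below yours
print(z, round(area_below, 4))     # 0.5 0.6915
```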
• For any z-score less than −3.49, use 0.000 to approximate the area to the left
• P(z > c) = P(z ≥ c). "Less than" is the area below the z-score
Area to the right of Z1 = 1 − area to the left of Z1
Z1 = 0.94
Area above = 1 − 0.8264 = 0.1736 ≈ 0.174
Probability to the right of Z1 is 0.174
Using the standard normal distribution table:
Area between Z1 and Z2 = area to the left of Z2 − area to the left of Z1
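The table rules for areas to the right of Z1 and between Z1 and Z2 can be checked with `statistics.NormalDist`, which computes the same left-tail areas the table lists. Z1 = 0.94 is from the notes; Z2 = 1.50 is a hypothetical value for illustration.

```python
from statistics import NormalDist

std_normal = NormalDist()          # mean 0, standard deviation 1

z1 = 0.94
left_of_z1 = std_normal.cdf(z1)    # table value ≈ 0.8264
right_of_z1 = 1 - left_of_z1       # area to the right of Z1
print(round(left_of_z1, 4), round(right_of_z1, 4))   # 0.8264 0.1736

z2 = 1.50                          # hypothetical second z-score
between = std_normal.cdf(z2) - left_of_z1            # area between Z1 and Z2
print(round(between, 4))           # 0.1068
```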
• Given the z-score above, and knowing that the population mean is 32 with a standard deviation of 2.5, find x
$2.60 = \frac{x - 32}{2.5} \Rightarrow x = 32 + 2.60(2.5) = 38.5$
Inverse Normal Distribution
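Solving z = (x − μ)/σ for x gives x = μ + zσ. This Python sketch does that directly, then gets the same x through the inverse normal (`NormalDist.inv_cdf`) as a cross-check, using μ = 32, σ = 2.5 and z = 2.60 from the example.

```python
from statistics import NormalDist

mu, sigma = 32, 2.5
z = 2.60
x = mu + z * sigma           # solve z = (x - μ) / σ for x
print(round(x, 2))           # 38.5

# Same x via the inverse normal: the value with the same left-tail area as z.
area_left = NormalDist().cdf(z)
x_inverse = NormalDist(mu, sigma).inv_cdf(area_left)
print(round(x_inverse, 2))   # 38.5
```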
Practice!
Question:
A social science researcher is studying the number of hours students
spend on social media per week. The researcher finds that the average
(mean) time spent on social media is 15 hours with a standard deviation
of 4 hours.
• What is the z-score for a student who spends 20 hours per week on
social media?
• What is the probability a student spends more than 20 hours per
week on social media?
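Both parts of the social media question can be checked in Python, with `statistics.NormalDist` standing in for the z table (mean 15 hours, standard deviation 4 hours, x = 20 hours).

```python
from statistics import NormalDist

mu, sigma, x = 15, 4, 20
z = (x - mu) / sigma                        # z-score for 20 hours
p_more = 1 - NormalDist(mu, sigma).cdf(x)   # area to the right of x
print(z, round(p_more, 4))                  # 1.25 0.1056
```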
What is a Sampling Distribution
• The sampling distribution of the statistic is the distribution of all
values of that statistic when all possible samples of the same
size are taken from the population.
o We use a set of samples because there is variability from sample to sample
Standard Error: the standard deviation of the sampling distribution
Central Limit Theorem
• For a population of any distribution, the sampling distribution of the sample mean approaches a normal distribution as the sample size increases.
• Bigger sample size = more of a normal shape
Symbol for standard error: σ_x̄ = σ/√n
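The Central Limit Theorem can be illustrated by simulation. This sketch assumes a right-skewed population (exponential with mean 1 and standard deviation 1), draws 2000 samples of size n = 30 with a fixed seed, and compares the spread of the sample means with σ/√n. The function names are hypothetical.

```python
import random
import statistics

def exponential_draw(rng):
    """One value from a right-skewed population: exponential, mean 1, SD 1."""
    return rng.expovariate(1.0)

def sample_means(n, reps, seed=0):
    """Means of `reps` random samples, each of size n."""
    rng = random.Random(seed)
    return [statistics.fmean(exponential_draw(rng) for _ in range(n))
            for _ in range(reps)]

n = 30
means = sample_means(n, reps=2000)
se_observed = statistics.stdev(means)   # spread of the sample means
se_theory = 1.0 / n ** 0.5              # σ / √n with σ = 1
print(round(statistics.fmean(means), 3), round(se_observed, 3), round(se_theory, 3))
```

A histogram of `means` would look roughly bell-shaped even though the population itself is skewed.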
Central Limit Theorem