
2nd Half Notes

The document provides a comprehensive lesson plan on Chi-Square tests and correlation research, detailing the concepts, procedures, and hypothesis testing involved in analyzing relationships between categorical and continuous variables. It outlines the conditions for performing Chi-Square tests, the calculation of test statistics, and the interpretation of P-values. Additionally, it covers the fundamentals of correlation, including the linear correlation coefficient, scatterplots, and common errors in correlation research.

Uploaded by

tx4n775hkx
Copyright
© All Rights Reserved

Chi-Square

Lesson Plan
▶ Understanding the test of independence
▶ Using contingency tables to compute the chi-square statistic
▶ Finding the estimated P-value for the sample
What is Chi-Square?
▶ Chi-square: χ²
▶ Used to test the relationship between categorical (nominal or ordinal) variables, to determine whether they are related or not (test of independence).
▶ We are moving away from the normal distribution and now using the chi-square distribution (a right-skewed distribution). The area under the curve is 1.
The Chi-Square distribution
As d.f. increase, the chi-square distribution starts to take on more of a normal distribution shape.
Conditions for Chi Square
▶To perform a chi-square test for association, check that the
following conditions are met:
• Random: The data come from a random sample from the
population of interest, from independent random samples from
the populations of interest, or from groups in a randomized
experiment. (In a randomized experiment, people are randomly assigned to control or experimental groups.)
• Large Counts: All expected counts are at least 5.
Chi-Square Procedures
▶State H0 and H1 (the null and alternative hypotheses)
▶Create an observed and expected frequency table
▶Calculate your test statistic (χ²)
▶Compare your P-value to α
▶Draw your conclusion
Hypothesis testing
Test of Independence
H0: The row and column variables are independent (there is no relationship)
H1: The row and column variables are dependent (there is a relationship)
Chi-Square notation
O = observed frequency: our actual count
- Observed frequency table: either we are given the data to plug in, or we are given the table
E = expected frequency: what our count should be, based on probability theory
- Expected frequency table: we will need to calculate it
r = number of rows
c = number of columns
Each expected frequency uses the row and column totals of its cell:
$$E = \frac{(\text{row total})(\text{column total})}{\text{grand total}}$$
Practice: Contingency Table
(Observed frequencies table)
Expected Frequencies
Example: (500 × 150)/1000 = 75
Practice: Expected values
Statistical significance: the test measures the difference between the observed and expected values (the expected values are based on probability theory).
Test of Independence: Test Statistic
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
D.f. = (rows − 1)(columns − 1)
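The following is a minimal sketch of the two formulas above. It computes each expected frequency from the row and column totals and then sums (O − E)²/E over every cell. It reuses the observed counts from the soda-flavour practice in these notes. The function names are illustrative only and do not come from any library.

```python
def expected_counts(observed):
    """Expected frequency for each cell: (row total x column total) / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_square_statistic(observed):
    """Return (chi-square statistic, d.f.) for a test of independence."""
    expected = expected_counts(observed)
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(observed) - 1) * (len(observed[0]) - 1)   # (rows - 1)(columns - 1)
    return chi2, df


# Observed counts from the soda-flavour practice (rows: coconut, cherry, orange,
# traditional cola; columns: high school, elementary school)
soda = [[33, 57], [30, 20], [5, 35], [12, 8]]

print(expected_counts(soda))   # [[36.0, 54.0], [20.0, 30.0], [16.0, 24.0], [8.0, 12.0]]
chi2, df = chi_square_statistic(soda)
print(round(chi2, 4), df)      # 24.6875 3
```

The unrounded statistic is 24.6875. The hand calculation in the practice gets 24.68 because it adds the cell contributions after rounding each one to two decimals.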


Practice: Test Statistics
χ² = 19.6
D.f. = (rows − 1)(columns − 1)
So… (2 − 1)(2 − 1) = 1
Significance level: .05
The Chi-Square Test Statistic
Once you have your d.f., use the chi-square distribution table and a right-tail test to find the estimated P-value.
P-value
If P-value ≤ α, then we reject H0.
If P-value > α, then we do not reject H0.
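A chi-square table only brackets the P-value. As a cross-check, the sketch below computes the right-tail area directly for whole-number d.f. and then applies the decision rule above. It relies on standard closed forms for the chi-square tail: erfc(√(x/2)) for d.f. = 1, e^(−x/2) for d.f. = 2, and a recurrence that steps the d.f. up by 2. This is an alternative to the table method used in class, not a replacement for it. The helper names are hypothetical.

```python
import math


def chi_square_p_value(x, df):
    """Right-tail area P(X >= x) for a chi-square distribution with integer df >= 1.

    Starts from Q(x; 1) = erfc(sqrt(x/2)) or Q(x; 2) = exp(-x/2), then applies
    Q(x; k + 2) = Q(x; k) + (x/2)**(k/2) * exp(-x/2) / Gamma(k/2 + 1).
    """
    if x <= 0:
        return 1.0
    if df % 2 == 1:
        q, k = math.erfc(math.sqrt(x / 2)), 1
    else:
        q, k = math.exp(-x / 2), 2
    while k < df:
        q += (x / 2) ** (k / 2) * math.exp(-x / 2) / math.gamma(k / 2 + 1)
        k += 2
    return q


def decide(p_value, alpha):
    """Decision rule: reject H0 when P-value <= alpha."""
    return "reject H0" if p_value <= alpha else "fail to reject H0"


p = chi_square_p_value(19.6, 1)        # gender / pet ownership practice, d.f. = 1
print(f"{p:.3f}", decide(p, 0.05))     # 0.000 reject H0
```

As a sanity check, `chi_square_p_value(3.84, 1)` comes out at about .05. That matches the table value 3.84 quoted for α = .05 and d.f. = 1.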
Writing a Conclusion
▶If the p-value is larger than α:
▶“Because the p-value is > α= 0.05; we fail to reject Ho. We do not
have convincing evidence that variable _______ and variable
________ are associated."

▶If the p-value is smaller than α:


▶“Because the p-value is < α= 0.05; we reject Ho. We do have
convincing evidence that variable _______ and variable ________
are associated."

Never “accept” H0 or conclude that H0 is “true”.


Practice
▶Sample data χ² = 19.60 → P-value ≈ .000
▶Significance level: 0.05 → χ² table = 3.84
Since the P-value for the calculated χ² (.000) is less than α (.05), we reject H0.
We can therefore conclude that gender and pet ownership are dependent at the 5% level of significance.
Practice!
The below table looks at the school level a student is in
and their flavour preferences in soda. Given the
information, test whether the variables are
independent at a 1% significance level
Soda flavour        High School   Elementary School   Total
Coconut flavour          33               57            90
Cherry flavour           30               20            50
Orange flavour            5               35            40
Traditional cola         12                8            20
Column total             80              120           200
Practice
H0: School level and soda flavour preferences are independent
H1: School level and soda flavour preferences are not independent
Expected frequencies:
Soda flavour        High School   Elementary School   Total
Coconut flavour          36               54            90
Cherry flavour           20               30            50
Orange flavour           16               24            40
Traditional cola          8               12            20
Column total             80              120           200


Practice
Cell   O    E    O − E   (O − E)²   (O − E)²/E
1      33   36    −3        9          .25
2      57   54     3        9          .17
3      30   20    10      100         5
4      20   30   −10      100         3.33
5       5   16   −11      121         7.56
6      35   24    11      121         5.04
7      12    8     4       16         2
8       8   12    −4       16         1.33
χ² = 24.68
d.f. = (4 − 1)(2 − 1) = 3
χ² table (α = .01, d.f. = 3) = 11.34
P-value ≈ .000
Since .000 < .01, we reject the null hypothesis.
Practice
Conclusion
At the 1% level of significance, we conclude that school level and soda flavour preferences are dependent.
“Because the p-value (≈ .000) is < α = 0.01, we reject H0. We do have convincing evidence that the variables soda flavour and school level are associated."
CORRELATION
RESEARCH
LESSON PLAN
▪ Understanding what correlation is
▪ Visualizing correlation using scatterplots
▪ Linear Correlation Coefficient
▪ Hypothesis testing in correlation research
▪ Common correlation errors
WHAT IS CORRELATION RESEARCH?
▪ Looking at the relationship between two or more variables.
▪ Looking for the STRENGTH and DIRECTION of the relationship that exists.
▪ The closer the points are to a straight line, the stronger the relationship.
ρ = correlation for population paired data
r = correlation for sample paired data


LINEAR CORRELATION COEFFICIENT
▪ Denoted by r (r has no units.)
▪ –1 ≤ r ≤ 1
▪ r > 0 indicates a positive relationship
between x and y; r < 0 indicates a
negative relationship.
▪ r = 0 indicates no linear relationship.
▪ Switching the explanatory variable
and response variable does not
change r.
▪ Changing the units of the variables
does not change r.
SCATTERPLOT DIAGRAMS
▪ Graph that plots individual points against the horizontal and vertical axes
▪ Shows the strength and direction of the relationship between two
variables
HYPOTHESIS TESTING
Steps:
1. State your hypotheses (Null and Alternate)
2. Determine the significance level α
3. Calculate test statistic
4. Determine the p-value
5. Make decision
HYPOTHESIS
Hypotheses: we use ρ (rho) instead of r because ρ is for the population and r is for the sample. ρ being equal to zero means there is no linear relationship.
H0: ρ = 0 (there is no linear correlation)
H1: ρ ≠ 0 (there is a linear correlation)
Once r has been computed, use the critical value of the Pearson correlation coefficient r to draw your conclusion.
FORMULA: CALCULATING r (follow BEDMAS)
$$r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{n(\sum x^2) - (\sum x)^2}\,\sqrt{n(\sum y^2) - (\sum y)^2}}$$
n = number of pairs of sample data
Σx denotes the sum of all x-values.
Σx² indicates that each x-value should be squared and then those squares added.
(Σx)² indicates that the x-values should be added and then the total squared.
Σxy indicates that each x-value should first be multiplied by its corresponding y-value. After obtaining all such products, find their sum.
r = linear correlation coefficient for sample data.
ρ = linear correlation coefficient for population data.
When calculating Pearson's r, go to 3 decimal places.
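To make the formula concrete, the sketch below evaluates each sum exactly as defined above. It uses the employee/cost practice data from these notes. `pearson_r` is an illustrative name, not a library function.

```python
import math


def pearson_r(x, y):
    """Sample linear correlation coefficient r, using the computational formula."""
    n = len(x)                                   # number of pairs
    sum_x, sum_y = sum(x), sum(y)                # Σx, Σy
    sum_xy = sum(a * b for a, b in zip(x, y))    # Σxy: multiply each pair, then add
    sum_x2 = sum(a * a for a in x)               # Σx²: square each value, then add
    sum_y2 = sum(b * b for b in y)               # Σy²
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(n * sum_x2 - sum_x ** 2) * math.sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator


# Practice data: employees enrolled (x) and administrative cost per claim (y)
x = [3, 7, 15, 35, 75]
y = [40, 35, 30, 25, 18]
print(round(pearson_r(x, y), 3))   # -0.945
```

You can also use this to check two properties listed earlier. Swapping the arguments, as in `pearson_r(y, x)`, or rescaling x (for example, multiplying every value by 60) leaves r unchanged.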
HYPOTHESIS TESTING
If the absolute value of the computed value of r, denoted |r|, exceeds the value in the table (critical value), conclude that there is a linear correlation (reject H0).
|r| calculated > r table: Reject H0
If |r| ≤ critical value, then we fail to reject H0 and conclude that there is not sufficient evidence to support the claim of a linear correlation.
|r| calculated ≤ r table: Fail to reject H0

HYPOTHESIS TESTING
• The tables provide the
critical value of r for a
given significance level
(α) using degrees of
freedom calculated as
d.f = n-2
• n is the number of data pairs available
PRACTICE
▪ Below is information regarding the number of employees enrolled in
a health insurance plan (x) and the administrative cost per claim (y)
x: 3 7 15 35 75
y: 40 35 30 25 18
1. Draw a scatterplot (scatter diagram) for the above situation
2. Calculate the r for the above situation
3. Use a α= .05. Perform a Hypothesis Test and Interpret your results:
is there a correlation or not?
COMMON CORRELATION ERRORS
▪ Mistaking correlation for causation
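▪ Averages: correlations computed from averages suppress individual variation and can inflate r
▪ Linearity: r measures only linear relationships; there may be a nonlinear relationship even when r is close to 0
(Note: the calculated r will be given.)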


HYPOTHESIS TESTING
LESSON PLAN

• Why do we use hypothesis testing in research?


• Understanding the difference between null and alternative hypothesis & writing it correctly
• Types of tests
• Test statistics
• P-value
• Types of Error
• Testing the Mean
• Interpreting the results
HYPOTHESIS TESTING
• What is a hypothesis?
• What is hypothesis testing?
• A statistical method that helps us make decisions or inferences about a population parameter (like the mean) based on sample data.
• Inferential statistics
• By the end of the test, we either reject or fail to reject the null hypothesis. We don't say we accept the null hypothesis; we say we failed to reject it.
• What is the difference between confidence intervals and hypothesis testing?
HYPOTHESIS TESTING
Steps in Hypothesis Testing:
1. State your hypotheses H0 and H1
2. Determine the significance level you are using
o One tail vs. two tail tests
3. Draw a sample
4. Calculate your test statistic (based on z or Student's t!) and your P-value
5. Draw a conclusion by comparing your P-value to your significance level
Note: P-hacking is a statistical technique used to manipulate data to achieve a desired probability (P-value).
CLAIM

A factory says the average battery life is 10 hours.


We want to test if :
a. the mean is different from 10.
b. The mean is greater than 10
c. The mean is less than 10
STATING YOUR HYPOTHESIS
Setting up your hypothesis:
1. Identify the claim to be tested
2. Give the symbolic form that must be true if the original claim is false
3. Create the alternate hypothesis by using the symbols >, < or ≠
STATING YOUR HYPOTHESES
Claim: A factory says the average battery life is 10 hours.
NULL HYPOTHESIS
• H0
• No effect or no difference
ALTERNATE HYPOTHESIS
• Ha or H1
• Supported when we reject H0
• Things are in fact different:
o Greater than
o Less than
o Different (the not-equal-to symbol ≠), which relates to a two-tail test
STATING YOUR HYPOTHESES
LEFT TAIL
• You're testing if the population mean is less than a specific value.
• H₁: μ < μ₀
RIGHT TAIL
• You're testing if the population mean is greater than a specific value.
• H₁: μ > μ₀
TWO-TAIL
• You're testing if the population mean is different (either higher or lower) from a given value.
• H₁: μ ≠ μ₀
TYPES OF TEST
Based on your hypothesis, you will then need to determine if you are looking at a one tail or two tail test!
• Right-tailed test: H1: μ > k
• Left-tailed test: H1: μ < k
• Two-tailed test: H1: μ ≠ k
CLAIM

A factory says the average battery life is 10 hours.


We want to test if :
a. the mean is different from 10.
b. The mean is greater than 10
c. The mean is less than 10
H0: μ = 10
a. H1: μ ≠ 10
b. H1: μ > 10
c. H1: μ < 10
P-VALUE
(Figure: the pink area is the rejection zone!)
• The P-value is the probability of chance: the probability, assuming H0 is true, of getting a test statistic at least as extreme as the one observed.
• A low P-value means that your results are unlikely to be due to chance alone: “If p is low, the null must go.”
• Significance level: the threshold the P-value is compared against
TEST STATISTICS
$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \quad \text{or} \quad t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$
Review: When do we use z versus t?
CONCLUDING A STATISTICAL TEST
• Significance level: the threshold the P-value is compared against
• Remember! We always test the null hypothesis and compare your p-value to your α (the alpha symbol)
• Conclusions will therefore be one of the following:
1. Reject the null hypothesis → when P-value ≤ α, OR when z calculated falls beyond z critical (in the rejection zone). The data is statistically significant!
2. Fail to reject the null hypothesis → when P-value > α, OR when z calculated does not fall beyond z critical.
Z critical is the cutoff point that corresponds to α (e.g. 0.01 or 0.05).
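The P-value route above can be sketched with the standard library's `statistics.NormalDist`, which takes the place of the z-table. The helper names `z_test_p_value` and `conclude` are hypothetical. The usage line takes its numbers from the battery example later in these notes (z = 1.25, two-tailed, α = .05).

```python
from statistics import NormalDist


def z_test_p_value(z, tail="two"):
    """P-value for a z test statistic; tail is 'left', 'right' or 'two'."""
    std_normal = NormalDist()                  # mean 0, standard deviation 1
    if tail == "left":
        return std_normal.cdf(z)               # area below z
    if tail == "right":
        return 1 - std_normal.cdf(z)           # area above z
    return 2 * (1 - std_normal.cdf(abs(z)))    # area in both tails


def conclude(p_value, alpha=0.05):
    """Reject H0 when P-value <= alpha."""
    return "Reject H0" if p_value <= alpha else "Fail to reject H0"


p = z_test_p_value(1.25, tail="two")
print(round(p, 4), conclude(p))   # 0.2113 Fail to reject H0
```

This is the same decision as comparing z to ±1.96: 1.25 is not in the rejection zone, and its P-value of about .21 is larger than .05.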
USING THE Z-TABLE FOR HYPOTHESIS TESTING
Let's use a 5% level of significance
ONE TAIL TEST
• 5% represents .05
o z = 1.645
TWO TAIL TEST
• 5% is first divided by two
o .05/2 = .025
o z = ±1.96
Find the corresponding values for a 1% level of significance:
• One tail: z = 2.33 (positive for a right-tail test, negative for a left-tail test)
• Two tail: z = ±2.575
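Critical values can also be read from the inverse of the standard normal CDF. This is a sketch assuming Python's `statistics.NormalDist.inv_cdf` in place of the printed table. `critical_z` is a hypothetical helper that returns the positive cutoff; for a left-tail test, use its negative.

```python
from statistics import NormalDist


def critical_z(alpha, tails=2):
    """Positive critical z: all of alpha in one tail (tails=1) or split across two tails (tails=2)."""
    return NormalDist().inv_cdf(1 - alpha / tails)


for alpha in (0.05, 0.01):
    print(f"alpha = {alpha}: one tail z = {critical_z(alpha, 1):.3f}, "
          f"two tail z = ±{critical_z(alpha, 2):.3f}")
```

This prints 1.645 / 1.960 for α = .05 and 2.326 / 2.576 for α = .01. The values 2.33 and 2.575 above are the usual rounded table readings of the same cutoffs.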
USING THE Z-TABLE FOR
HYPOTHESIS TESTING

• Compare your z-critical to your calculated z-score.
• If your z-score is in the rejection zone, you reject H0. If not, you fail to reject it.
USING THE T-TABLE FOR HYPOTHESIS
TESTING

Steps:
1. Calculate your t-test value.
2. Determine the degrees of freedom
you are using
3. Determine if it is a one tail or two
tail test
4. Determine the significance level you
are using for comparison
5. Compare the two
6. Draw your conclusion based on the
rejection zone principle
LET'S PRACTICE

A factory says the average battery life is 10 hours.


We want to test if the mean is different from 10 at
the 5% level of significance. We calculate our z=1.25

Draw a conclusion based on this information


H0: μ = 10 hours
H1: μ ≠ 10 hours
z calculated = 1.25
z critical = ±1.96
Since −1.96 < 1.25 < 1.96, the z-score is not in the rejection zone, so we fail to reject H0.
LET'S PRACTICE

A factory says the average battery life is 10 hours.


We want to test if the mean is greater than 10 at
the 5% level of significance. We calculate our t=3.38
with d.f. = 9.

Draw a conclusion based on this information


H0: μ = 10
H1: μ > 10, therefore we use a one tail test
t critical (α = .05, d.f. = 9) = 1.833
t calculated (3.38) > t critical (1.833)
We reject our H0
DRAWING CONCLUSIONS

• If the p-value is larger than α:


• “Because the p-value of ____ > α= 0.05; we fail to reject Ho. We do not have
convincing evidence that the population mean of ___________ is
larger/smaller/different than _____."

• If the p-value is smaller than α:


• “Because the p-value of ____ < α= 0.05; we reject Ho in favour of Ha. We do have
convincing evidence that the population mean of ___________ is
larger/smaller/different than _____.”
TYPES OF ERRORS
Drawing the wrong conclusion when comparing our P-value to our significance level.
Mistakes can be made because we use sample-based information to reach our conclusion:
- The sample data can be misleading
TYPE I ERROR
• A false positive, also called an alpha error
• Type I error: rejecting the null hypothesis when the null hypothesis is true (the significance level is your probability of a Type I error!)
TYPE II ERROR
• Accepting a false null hypothesis, also called a beta error
• Type II error: failing to reject the null hypothesis when the null hypothesis is false
TYPE I AND TYPE II ERRORS
                      H0 True                       H0 False
Reject H0             Type I error = alpha error    Correct conclusion!
Fail to reject H0     Correct conclusion!           Type II error = beta error
Power of a test = 1 − β
Power lets us better understand the beta error, and therefore its trade-off with the alpha error:
As the alpha error increases → the beta error decreases
As the alpha error decreases → the beta error increases
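The claim that the significance level is the Type I error rate can be checked by simulation. The sketch below draws many samples from a population where H0 really is true and counts how often a two-tailed z test rejects it anyway. The population values (μ0 = 10, σ = 2, n = 30) are hypothetical choices for illustration, loosely modelled on the battery example. The result is approximate and depends on the random seed.

```python
import random
from statistics import NormalDist, fmean


def type_i_error_rate(trials=2000, n=30, mu0=10.0, sigma=2.0, alpha=0.05, seed=1):
    """Fraction of samples, drawn with H0 true, in which a two-tailed z test rejects H0."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]   # H0 is true by construction
        z = (fmean(sample) - mu0) / (sigma / n ** 0.5)
        if abs(z) > z_crit:
            rejections += 1                                  # each rejection is a Type I error
    return rejections / trials


print(type_i_error_rate())             # close to 0.05
print(type_i_error_rate(alpha=0.01))   # smaller alpha -> fewer Type I errors
```

Lowering α cuts Type I errors. However, as the slide notes, it makes a Type II error more likely when H0 is actually false.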
EXAMPLE 1

You are testing the null hypothesis that you are not allergic
to peanuts. What is a type I error?

A. You are allergic to peanuts.
B. You are allergic to peanuts, but the test result is that you are not allergic to peanuts.
C. The test result says that you are allergic to peanuts, but in reality, you are not allergic to peanuts.
D. You are not allergic to peanuts when you really are.
Answer: C. The test result rejects the null hypothesis (it says you are allergic), but in reality the null hypothesis is true (you are not allergic).
BE CAREFUL!
• Never conclude a hypothesis test with only the statement “reject the null hypothesis” or “fail to reject the null hypothesis.” Always make sense of the conclusion with a statement that uses simple nontechnical wording that addresses the original claim.
SUMMARY
Steps
• State the null and alternate hypotheses (use symbols to denote the hypotheses)
• Is this a right tail, left tail or two tailed test?
• Are you using a t-test or z-test?
• Compare your z critical to z calculated OR t critical to t calculated
• Draw your conclusion
PRACTICE!

Research on blood plasma revealed that the pH of plasma for a healthy adult is μ = 7.4. A random sample of 31 patients in a hospital revealed a mean plasma pH of x̄ = 8.1 and s = 1.9. Use a 5% level of significance to test the claim that the mean is different from 7.4.
PRACTICE!

• Environment Canada has been studying Miller Creek regarding their


ammonia nitrogen concentration levels. For years the levels were 2.3
mg/L. However, due to development in the city, there are concerns
that the level has changed. Let x be a random variable representing
the ammonia nitrogen concentration levels. Based on past research,
researchers have indicated that the data is normally distributed with a
σ = .3. Below is a random sample of eight water tests. Use α = .01.
2.1 2.5 2.2 2.8 3.0 2.2 2.4 2.9
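The sketch below computes the test statistics for both practice problems using the z and t formulas from the test statistics slide. The plasma pH problem gives s but not σ, so it is treated as a t statistic here. With n = 31, some courses use z instead; the value of the statistic is the same, and only the critical value changes. The Miller Creek problem gives σ, so it uses z. The function names are illustrative. Comparing each value with its critical value is left as in the practice.

```python
import math
from statistics import fmean


def z_statistic(x_bar, mu0, sigma, n):
    """z = (x_bar - mu0) / (sigma / sqrt(n)), used when sigma is known."""
    return (x_bar - mu0) / (sigma / math.sqrt(n))


def t_statistic(x_bar, mu0, s, n):
    """t = (x_bar - mu0) / (s / sqrt(n)), with d.f. = n - 1, used when sigma is unknown."""
    return (x_bar - mu0) / (s / math.sqrt(n))


# Plasma pH: mu = 7.4, x_bar = 8.1, s = 1.9, n = 31 (d.f. = 30)
t = t_statistic(8.1, 7.4, 1.9, 31)

# Miller Creek: mu = 2.3 mg/L, sigma = .3, x_bar from the eight water tests
tests = [2.1, 2.5, 2.2, 2.8, 3.0, 2.2, 2.4, 2.9]
z = z_statistic(fmean(tests), 2.3, 0.3, len(tests))

print(round(t, 2), round(z, 2))   # 2.05 2.0
```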
Estimation of
Parameters
Lesson Plan
• Understanding Confidence Intervals

• Using Critical Values and Confidence levels

• Estimating μ using large samples or when σ is known

• Estimating μ using small samples or when σ is not known:


Student t-distribution
• Estimating proportions
Summary of Content covered
(Flowchart: is the population standard deviation known?)
4/1/2025 3
Remember!
• We use inferential statistics to:
1. Estimate values of population parameters (this is estimation)
2. Test hypotheses made about the population parameters
The Central Limit Theorem guides these principles.
Assumptions about the Random Variable
Estimation of the Mean (σ is known)

• Simple random sample (n) drawn from a population x

• If x is a normal distribution, then we can use any sample size

• If the shape of the x distribution is unknown, our sample size must be n ≥ 30
(See page 13 of the workbook.)

Important Notation
σ = population standard deviation
μ = population mean
x̄ = sample mean (point estimate)
n = number of sample values (sample size)
E = margin of error (something we can calculate)
zc = critical value (found on the z table)
Estimating a Parameter
• Estimating the population parameter using a single number is the point estimate (a single number that represents the population parameter)
o x̄ is the point estimate for μ
o A limitation of a point estimate: a single number is not representative of the entire population
• Estimating for a range of numbers is the confidence interval
x̄ − E < μ < x̄ + E
Confidence Interval
• A confidence interval gives a set of plausible values for a parameter based on sample data
• The level C of a confidence interval gives the probability that the interval produced by the method employed successfully captures the true value of the parameter.
• To build a confidence interval you need:
o Point estimate = sample mean (the appropriate point estimate for the population mean is the sample mean)
o Margin of error
Practice!
Are the following point estimates or confidence intervals?
• The sample mean weight of thirty 10-year-old girls is 90 pounds → point estimate
• The average starting salary for graduating students from a business program is between $35,000 and $45,000 → confidence interval
x̄ − E < μ < x̄ + E
Building a Confidence Interval
Margin of Error: the margin of error allows us to understand how far away our sample mean may be from the population mean.
• Magnitude of the difference between the sample point estimate (x̄) and the true population parameter value:
|x̄ − μ| OR |μ − x̄|
We take the absolute value because a distance cannot be negative.
Margin of Error when σ is known
$$E = z_c \frac{\sigma}{\sqrt{n}}$$
zc = critical value; σ/√n = standard error
Tolerance for error = E (the maximal margin of error) when σ is known
Confidence Level
• Confidence level (c): a way to measure the reliability of an estimate
P(−zc < z < zc) = c
• Critical value (zc): the number such that the area between −zc and zc equals c
Practice!

• Find the z.99 such that P(-z0.99 < z < z0.99) = .99

(1-.99)/2 = .005

P(-2.58 < z < 2.58) ≈ .99


Confidence Level
Confidence Interval: Writing your confidence interval
Step 1: Estimating μ
Confidence interval for μ:
x̄ − E < μ < x̄ + E, where E = zc σ/√n
Confidence Interval : Writing your
confidence interval
Step 2:
"We are c% confident that the interval from ______ to _____
captures the true value of the population mean."

How is this interpreted?


• If we were to select 100 random samples from a
population and construct a C% confidence interval using
each sample, about C% of the intervals would capture the
true value of the population mean.
Practice!
Andrew jogs 5 km every Saturday morning. He
knows that his jogs have a standard deviation (σ)
of 2.40 minutes. A random sample of his last 90
jogs provided you with a mean run time of 22.50
minutes. Find a .99 confidence interval for μ.
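Here is a minimal sketch of the σ-known interval x̄ − E < μ < x̄ + E with E = zc σ/√n, applied to Andrew's jogs. It uses zc = 2.58, the table value for c = .99 given earlier in these notes. `z_interval` is a hypothetical helper name.

```python
import math


def z_interval(x_bar, sigma, n, z_c):
    """Confidence interval for mu when sigma is known: E = z_c * sigma / sqrt(n)."""
    margin = z_c * sigma / math.sqrt(n)
    return x_bar - margin, x_bar + margin


# Andrew's jogs: x_bar = 22.50 min, sigma = 2.40 min, n = 90, z_c = 2.58 for c = .99
low, high = z_interval(22.50, 2.40, 90, 2.58)
print(f"{low:.2f} < mu < {high:.2f}")   # 21.85 < mu < 23.15
```

Read as: "We are 99% confident that the interval from 21.85 to 23.15 minutes captures the true mean run time."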
Estimating Proportions
Estimating p in a binomial distribution (a discrete distribution)
n = number of trials
p = probability of success on a single trial
q = 1 − p = probability of failure on a single trial
Because we are in a binomial distribution, each trial is either a success or a failure; that is why q = 1 − p.
Estimating Proportions
Point estimate for p and q
n = number of trials, r = number of successes
p̂ = r/n
q̂ = 1 − p̂
Estimating Proportions
Confidence Interval
p̂ − E < p < p̂ + E
$$E \approx z_c\sqrt{\frac{\hat{p}\hat{q}}{n}} = z_c\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
E = maximum margin of error
c = confidence level (95% or 99%)
zc = critical value for the confidence level (1.96 or 2.58)
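The notes do not include a worked proportion example, so the sketch below uses made-up numbers purely for illustration: 54 successes in 120 trials at 95% confidence. It follows the formulas above step by step. `proportion_interval` is a hypothetical helper name.

```python
import math


def proportion_interval(r, n, z_c):
    """p_hat = r/n, q_hat = 1 - p_hat, E = z_c * sqrt(p_hat * q_hat / n)."""
    p_hat = r / n
    q_hat = 1 - p_hat
    margin = z_c * math.sqrt(p_hat * q_hat / n)
    return p_hat, margin, (p_hat - margin, p_hat + margin)


# Hypothetical data: 54 successes in 120 trials, z_c = 1.96 for c = .95
p_hat, margin, (low, high) = proportion_interval(54, 120, 1.96)
print(f"p_hat = {p_hat:.3f}, E = {margin:.3f}, {low:.3f} < p < {high:.3f}")
# p_hat = 0.450, E = 0.089, 0.361 < p < 0.539
```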
Question!

Do we always have access to our population standard deviation?
No, because it is hard to always get that much data.
Watch and Learn
• Who developed the Student t-test?
A Guinness beer brewer: William Gosset
• Why was the t-test developed?
Guinness wanted to mass-produce beer and needed to make sure of its consistency.
• Why weren't small samples used?
They were not considered representative of the larger population.
• When do we use a Student t-test?
When the sample is considered to be small, to interpret experimental results when we don't have population information.
• https://youtu.be/Ea4_eX--mIY?si=xJisoJ_PsPu9uNzU
Student t Distribution
• Developed by William Sealy Gosset

• Estimating for μ when σ is unknown

• Used with a small sample size, n


Differs from the normal bell curve: the t distribution is not as high and is wider. d.f. = degrees of freedom
Properties of a student t
distribution
• Distribution is symmetrical around the mean

• Distribution is based around the degrees of freedom

• Distribution is bell-shaped with a thicker tail

• As degrees of freedom increase, curve approaches a normal


distribution
• Area under the curve = 1
d.f. = n − 1
(how many values are free to vary)
Student t Distribution
• Estimation for μ
$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$
tc = critical t value separating an area of (1 − c)/2 in the right tail of the t distribution
If the d.f. you need is not in the table, go to the next smallest d.f.
• Degrees of freedom
d.f. = n − 1
Practice
• Find the Critical value for a .95 confidence level for a t distribution
with a sample size of n= 10
Estimation of μ
Margin of error:
$$E = t_c \frac{s}{\sqrt{n}}$$
Confidence interval:
x̄ − E < μ < x̄ + E
Estimation Summary
Examine the problem statement:
• σ is known → use the normal distribution margin of error
• σ is unknown → use the Student's t distribution margin of error
Practice!
• A company produces diamonds and produced 37 in a
trial run. The distribution is symmetrical and mound
shaped. The mean weight for the 37 diamonds produced
is 6.75 carats and the sample standard deviation is .33
carat. Find a 95% confidence interval for the mean weight
of all diamonds produced.
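A sketch of the σ-unknown interval for the diamond practice follows. The standard library has no t-distribution quantile, so tc is passed in as a table lookup. The value 2.028 is the usual two-tailed .95 entry for d.f. = 36. It is an assumed lookup, not given in these notes; a table without a d.f. = 36 row gives 2.030 for d.f. = 35 under the "next smallest d.f." rule, which does not change the rounded answer. `t_interval` is a hypothetical helper name.

```python
import math


def t_interval(x_bar, s, n, t_c):
    """Confidence interval for mu when sigma is unknown: E = t_c * s / sqrt(n), d.f. = n - 1."""
    margin = t_c * s / math.sqrt(n)
    return x_bar - margin, x_bar + margin


# Diamonds: n = 37, x_bar = 6.75 carats, s = .33 carat, c = .95, d.f. = 36
low, high = t_interval(6.75, 0.33, 37, 2.028)   # t_c assumed from a t table
print(f"{low:.2f} < mu < {high:.2f}")           # 6.64 < mu < 6.86
```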
Writing out the Confidence interval &
Interpretation
• Same as before!

• "We are c% confident that the interval from ______ to _____ captures the true value
of the population mean."

• How is this interpreted?

• If we were to select 100 random samples from a population and construct a C% confidence interval using each sample, about C% of the intervals would capture the true value of the population mean.
Factors that affect the margin of error E

In general, we prefer an estimate with a small margin of error. The


margin of error gets smaller when:

• The confidence level decreases. To obtain a smaller margin of


error from the same data, you must be willing to accept a smaller
capture rate.
• The sample size 𝒏 increases. In general, increasing the sample
size n reduces the margin of error for any fixed confidence level.
Probability,
Normal Curves
and Sampling
Distribution
Lesson Plan
• Probability and statistics

• Understanding the difference between discrete and continuous


variables
• Calculating the probability of simple events
• Understanding the law of large numbers

• Understanding the characteristics of a normal curve


• Understanding the elements of the empirical rule
• Developing an understanding for standard score

• Understanding the components of Central Limit Theorem


Inferential statistics and Probability
• Inferential statistics: using data from a sample to draw conclusions about the population. We use a sample instead of an entire population because studying the whole population is extremely difficult, costs too much, or the population is too large.
• Inferential statistics always involve a degree of uncertainty. How likely is it that our sample is providing reliable information about a specific characteristic of the population? That's why probability matters.

3/18/2025 3
Probability is used when discussing whether something is going to happen or not (e.g., sports, player stats).
What is probability?
• A numerical measure between 0 and 1
• Describes the likelihood (chance) of events
• Round probabilities to 3 decimal places
0 = impossibility of occurrence; .5 = 50/50 chance; 1 = certainty of occurrence
Definitions
• Statistical Experiment: Any random activity that results in definite outcomes

• Sample space: the set of all possible outcomes of that experiment

• Events: collection of one or more outcomes of a statistical experiment
Example: Coin toss
What are the possible outcomes? Either heads or tails
What is the sample space? {Heads, Tails}


Notation
Simple events
• Events can be named with capital letters: A, B, C, …

• P indicates the probability of an event occurring


o So… P(A) is the probability of ‘A’ occurring

• What's the probability of an event NOT occurring?
Ac = complement of event A: the event that A does not occur
1 − P(A) = P(Ac)
• Notation:
- ∩ → and
- ∪ → or
P(A) + P(Ac) = 1 (complement rule!)

In Class Example:
Probability of rolling a two
Probability Assignment
• Equally likely outcomes:
P(A) = (number of outcomes favorable to event A) / (total number of outcomes)
• Relative frequency:
Probability of event = relative frequency = f/n (f = frequency, n = sample size)
• Intuition:
A non-numerical, non-scientific approach
Practice (Note: use the proper notation)
Question:
A bag contains 5 red marbles, 3 blue marbles, and 2 green marbles. If one marble is drawn at random from the bag, what is the probability that:
• The marble drawn is red?
P(R) = 5/10 = 0.5
• The marble drawn is blue or green?
P(B ∪ G) = (3 + 2)/10 = 0.5
• The marble drawn is not red?
P(Rc) = 1 − 0.5 = 0.5
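The marble answers can be checked with exact fractions using Python's `fractions` module. Blue and green are mutually exclusive, so their counts simply add. The variable names are illustrative.

```python
from fractions import Fraction

bag = {"red": 5, "blue": 3, "green": 2}
total = sum(bag.values())

p_red = Fraction(bag["red"], total)                             # P(R)
p_blue_or_green = Fraction(bag["blue"] + bag["green"], total)   # P(B ∪ G), mutually exclusive events
p_not_red = 1 - p_red                                           # P(Rc) = 1 - P(R), complement rule

print(p_red, p_blue_or_green, p_not_red)   # 1/2 1/2 1/2
```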

Statistics and Probability
• Rare Event Rule:
• If, under a given assumption, the probability of the observed event is very small, we conclude that the assumption is probably not correct (e.g., flipping a coin 20 times and landing on heads every time)
• Used in inferential statistics: we take information from sample statistics and use it to draw conclusions
• Law of large numbers:
• As sample size increases, the relative frequency will get closer to the theoretical probability. Let's test this out!
• Mutually exclusive:
• Events cannot both occur at the same time
Law of Large Numbers
• Toss a coin repeatedly. Record the number of heads you had. What did you notice? As the number of coin flips goes up, the relative frequency gets closer to the theoretical probability (0.5).
n = number of flips          200    500    1000   2000   5000
f = number of heads          104    259    495    1006   2498
Relative frequency (f/n)     0.52   0.518  0.495  0.503  0.4996
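The coin-toss experiment can also be simulated. This sketch uses Python's `random` module with a fixed seed, so the exact frequencies depend on the seed and will not match the table above. The trend is the point: the relative frequency of heads settles near 0.5 as the number of flips grows.

```python
import random


def relative_frequency_of_heads(flips, seed=0):
    """Simulate fair coin flips and return f/n, the relative frequency of heads."""
    rng = random.Random(seed)
    heads = sum(rng.random() < 0.5 for _ in range(flips))
    return heads / flips


for n in (200, 500, 1000, 2000, 5000, 100_000):
    print(n, relative_frequency_of_heads(n))
```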
Random Variables
Two types of random variables
• Discrete random variables: e.g., number of children in a family (you can't have half a child)
• Continuous random variables: e.g., height of a person
Probability distributions
• Assignment of probability to each distinct value for discrete variables, and to intervals of values for a continuous random variable.
Features of a discrete random variable: the sum of all assignments must be equal to 1.
Features of a continuous random variable: the probability of any exact value is equal to zero.
Normal distribution curve
- Bell curve
- Mean, median, mode are in the middle of the graph
- Area under the curve equals 1
Graphical display of discrete variables: no gap between the bars; roughly symmetrical
Why is the normal curve perfectly centred?
- The mean, median and mode coincide (there is no difference between the measures of central tendency)
Continuous Variables: The Normal Distribution
- Inflection points always occur at 1 standard deviation away from the mean
- The normal curve never crosses the x-axis; it only approaches it
Criteria for a normal curve
- Mode, median, mean are in the center of the curve
- It is symmetrical
- It does not cross the x-axis
Comparing curves A and B:
- Both curves are symmetrical, and they both have the same mean.
- The difference is their variability (the spread of the data): A has less variability; B has more variability.
- The areas under curves A and B are the same because both are equal to 1 (they are both normal distribution curves).
Practice! Normal Distribution or not?
• No, because it crosses the x-axis.
• No, because the mean is not in the middle, and it is not symmetrical.
• No, because the curve is not bell-shaped and it is multimodal.
The Standard Normal
Distribution

• The standard normal


distribution is the
normal distribution with
mean 0 and standard
deviation 1.

Take the normal curve and standardize it by making the mean equal to zero, so that every interval of 1 is one standard deviation away.
Estimating probability under the normal curve
(The empirical rule is specific to normal distribution curves.)
Empirical Rule
• Approximately 68% of the curve lies within 1 standard deviation on each side of the mean (the area from −1 to +1)
• 95% lies within 2 standard deviations on either side of the mean (the area from −2 to +2)
• 99.7% lies within 3 standard deviations on either side of the mean (the area from −3 to +3)
Using z-scores and tables
• A z-score indicates the number of standard deviations from the mean
• Positive score: the measurement is above the mean
• Negative score: the measurement is below the mean
The Empirical Rule
Practice!
• You are conducting research on the scores students receive on
their English exit exam. The data is normally distributed with a
mean of 67 and a standard deviation of 8.
1. Draw the normal distribution curve, indicating the mean and the
standard deviations on the curve.
2. Between what grades does 68% of your data fall between?
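Question 2 is the mean plus or minus one standard deviation. The sketch below computes it with Python's `statistics.NormalDist`, using the exam parameters from the question, and then checks that the area between those grades is close to the 68% given by the empirical rule.

```python
from statistics import NormalDist

exam = NormalDist(mu=67, sigma=8)

# Empirical rule: about 68% of the data lies within 1 standard deviation of the mean
low, high = exam.mean - exam.stdev, exam.mean + exam.stdev
share = exam.cdf(high) - exam.cdf(low)

print(low, high)          # 59.0 75.0
print(round(share, 4))    # 0.6827
```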
Question!
How can you find the probability of data that lies 2.4 standard deviations away from the mean? Does the empirical rule still work?
No, because 2.4 is not a whole number of standard deviations.
Use the z-score!
Z-score
• To make a comparison, we need the same standardized measurement.
• Z-score: how many standard deviations away from the mean you are. A z-score is in relation to the normal distribution.
• Positive z-score: above the mean
• Negative z-score: below the mean
$$z = \frac{x - \mu}{\sigma}$$
(x minus the population mean, divided by the population standard deviation)
Practice!
• The table below represents your score in your psychology class.
              Your grade    μ     σ
Psychology        68        65    6
What is your z-score?
z = (x − μ)/σ = (68 − 65)/6 = 0.50
Using the standard normal distribution table
• When calculating the z-score, round to 2 decimal places
• For anything less than −3.49, use 0.000 to approximate the area
• For anything above 3.49, use 1.000 to approximate the area
• P(z < b) = P(z ≤ b): "less than" is the area below the z-score
• P(z > c) = P(z ≥ c): "greater than" is the area above the z-score
How do we find the probability? Use the standard normal distribution table!
Using the standard normal
distribution table
Probability of .1587
Using the standard normal distribution table
Area to the right of Z1 = 1 − (area to the left of Z1)
Z1 = 0.94; the probability to the left of Z1 is 0.8264
Area above: 1 − 0.8264 = 0.1736 ≈ 0.174
The probability to the right of Z1 is 0.174
Using the standard normal distribution table
Area between Z1 and Z2 = (area to the left of Z2) − (area to the left of Z1)
i.e., the probability for Z2 minus the probability for Z1

What is the area in blue on the curve?
Practice!

See the next slide for details!


What is the area in blue on the curve?
Practice!

P(−1.78 < z < 0.35) = P(z < 0.35) − P(z < −1.78)
= .6368 − .0375 = .5993
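The table lookups on these slides can be reproduced with `statistics.NormalDist().cdf`, which returns the area to the left of a z-score. This is a sketch for checking answers, not a replacement for the table method used on tests.

```python
from statistics import NormalDist

std_normal = NormalDist()   # mean 0, standard deviation 1

left_of_094 = std_normal.cdf(0.94)                        # area to the left of Z1 = 0.94
right_of_094 = 1 - std_normal.cdf(0.94)                   # area to the right of Z1
between = std_normal.cdf(0.35) - std_normal.cdf(-1.78)    # P(-1.78 < z < 0.35)

print(round(left_of_094, 4), round(right_of_094, 4), round(between, 4))   # 0.8264 0.1736 0.5993
```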


Inverse Normal Distribution

• Working backwards from a probability to find the x-value.
• Look up the z-score that corresponds to the probability of concern.
OR
• Look up the z-score for 1 − (the area of concern).
Inverse Normal Distribution
• What is the Z-score that corresponds to a probability of .9953?
• ANS: 2.60

• Given the Z-score above, and knowing that the population mean is
32 with a standard deviation of 2.5, find x

2.60 = (x − 32)/2.5, so x = 32 + 2.60(2.5) = 38.5
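Working backwards can be sketched with `NormalDist.inv_cdf`, which returns the value with a given area to its left. Calling it on a normal distribution with mean 32 and standard deviation 2.5 gives x directly. That is the same as x = μ + zσ.

```python
from statistics import NormalDist

z = NormalDist().inv_cdf(0.9953)                   # z-score with area .9953 to its left
x = NormalDist(mu=32, sigma=2.5).inv_cdf(0.9953)   # same point on the original scale: x = mu + z * sigma

print(round(z, 2), round(x, 1))   # 2.6 38.5
```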
Inverse Normal Distribution
Practice!
Question:
A social science researcher is studying the number of hours students
spend on social media per week. The researcher finds that the average
(mean) time spent on social media is 15 hours with a standard deviation
of 4 hours.
• What is the z-score for a student who spends 20 hours per week on
social media?
• What is the probability a student spends more than 20 hours per
week on social media?
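The sketch below answers both parts of this question. It assumes the hours are normally distributed, which the question implies but does not state. The z-score comes from z = (x − μ)/σ, and the "more than" probability is the area to the right of z.

```python
from statistics import NormalDist

mu, sigma = 15, 4   # hours per week, from the question
x = 20

z = (x - mu) / sigma                   # z-score for 20 hours
p_more = 1 - NormalDist().cdf(z)       # area to the right of z (assumes normality)

print(z, round(p_more, 3))             # 1.25 0.106
```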

What is a Sampling Distribution
• The sampling distribution of the statistic is the distribution of all
values of that statistic when all possible samples of the same
size are taken from the population.
o We use a set of samples because there is variability from sample to sample.
• A sampling distribution provides information about how a statistic (for example, the mean) varies from one sample to another.
• Why do we use sampling distributions?
• To estimate the population parameters
• To evaluate the reliability of our inferences
Sampling Distribution
Finding z for the sampling distribution of the mean:
$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$
Standard error (the standard deviation of the sampling distribution):
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$
The bigger the sample size, the smaller the standard error, because the sample is more representative of the population.
Central Limit Theorem
• For a population of any distribution, the sampling distribution of the sample mean approaches a normal distribution as the sample size increases.
• Bigger sample size = more of a normal shape
(The symbol for standard error is σx̄.)
Central Limit Theorem
What does the Central Limit Theorem tell us?
• If n ≥ 30, our sampling distribution should be approximately normal
• If the population distribution is symmetrical, the sampling distribution is approximately normal for n ≥ 15
• If the population is normally distributed, the sampling distribution will be normal regardless of sample size.
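The simulation below is a small stand-in for the online sampling-distribution demo. It repeatedly samples from a right-skewed exponential population with mean 1 and standard deviation 1, chosen here only for illustration, and records each sample mean. The spread of those means should track the standard error σ/√n. Results vary slightly with the random seed.

```python
import random
from statistics import fmean, pstdev


def sample_means(n, num_samples=5000, seed=42):
    """Means of repeated size-n samples from a skewed exponential population (mean 1, sd 1)."""
    rng = random.Random(seed)
    return [fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(num_samples)]


for n in (2, 30):
    means = sample_means(n)
    # mean of the sample means, their spread, and the theoretical standard error 1/sqrt(n)
    print(n, round(fmean(means), 3), round(pstdev(means), 3), round(1 / n ** 0.5, 3))
```

Plotting a histogram of `sample_means(30)` would show a far more bell-shaped result than the skewed population it came from.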
LET'S PRACTICE
http://onlinestatbook.com/stat_sim/sampling_dist/index.html
Central Limit Theorem: Video
• https://www.youtube.com/watch?v=jvoxEYmQHNM

• How is Central Limit Theorem depicted in this video?
