Chi Square Distribution

Chi-Square (Χ²) Distributions

A chi-square (Χ²) distribution is a continuous probability distribution that is used in many hypothesis tests. The shape of a chi-square distribution is determined by the parameter k. The graph below shows examples of chi-square distributions with different values of k.

What is a chi-square distribution?
Chi-square (Χ²) distributions are a family of continuous probability distributions. They’re widely used in hypothesis tests, including the chi-square goodness of fit test and the chi-square test of independence.

The shape of a chi-square distribution is determined by the parameter k, which represents the degrees of freedom.

Very few real-world observations follow a chi-square distribution. The main purpose of chi-square distributions is hypothesis testing, not describing real-world distributions.

In contrast, most other widely used distributions, like normal distributions or Poisson
distributions, can describe useful things such as newborns’ birth weights or disease cases
per year, respectively.
Relationship to the standard normal distribution
Chi-square distributions are useful for hypothesis testing because of their close relationship to
the standard normal distribution. The standard normal distribution, which is a normal
distribution with a mean of zero and a variance of one, is central to many important statistical
tests and theories.

Imagine taking a random sample of a standard normal distribution (Z). If you squared all the
values in the sample, you would have the chi-square distribution with k = 1.
Χ²₁ = (Z)²

Now imagine taking samples from two standard normal distributions (Z₁ and Z₂). If each time you sampled a pair of values, you squared them and added them together, you would have the chi-square distribution with k = 2.

Χ²₂ = (Z₁)² + (Z₂)²

More generally, if you sample from k independent standard normal distributions and then
square and sum the values, you’ll produce a chi-square distribution with k degrees of
freedom.
Χ²ₖ = (Z₁)² + (Z₂)² + … + (Zₖ)²
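This construction is easy to check by simulation. The sketch below (plain Python; the sample size and seed are arbitrary choices, not from the original text) sums k squared standard normal draws and confirms that the resulting sample mean is close to k, the mean of a chi-square distribution with k degrees of freedom:

```python
import random

def chi_square_sample(k, n=100_000, seed=42):
    """Draw n chi-square(k) values by summing k squared standard normal draws."""
    rng = random.Random(seed)
    return [sum(rng.gauss(0, 1) ** 2 for _ in range(k)) for _ in range(n)]

sample = chi_square_sample(k=3)
print(round(sum(sample) / len(sample), 1))  # close to k = 3
```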
Chi-square test statistics (formula)
Chi-square tests are hypothesis tests with test statistics that follow a chi-square
distribution under the null hypothesis. Pearson’s chi-square test was the first chi-square
test to be discovered and is the most widely used.

Pearson’s chi-square test statistic is:

Χ² = Σ (O − E)² / E

Where:
• Χ² is the chi-square test statistic
• Σ is the summation operator
• O is the observed frequency
• E is the expected frequency
The shape of chi-square distributions
We can see how the shape of a chi-square distribution changes as the degrees of freedom (k)
increase by looking at graphs of the chi-square probability density function. A probability density
function is a function that describes a continuous probability distribution.

When k is one or two

When k is one or two, the chi-square distribution is a curve shaped like a backwards “J.” The curve starts out high and then drops off, meaning that there is a high probability that Χ² is close to zero.
When k is greater than two

When k is greater than two, the chi-square distribution is hump-shaped. The curve starts out low, increases, and then decreases again. There is low probability that Χ² is very close to or very far from zero. The most probable value of Χ² is k − 2.

When k is only a bit greater than two, the distribution is much longer on the right side of its peak than its left (i.e., it is strongly right-skewed).

As k increases, the distribution looks more and more similar to a normal distribution. In fact, when k is 90 or greater, a normal distribution is a good approximation of the chi-square distribution.
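The quality of this normal approximation can be checked numerically. This sketch (assumption: 20,000 draws with a fixed seed are enough for a rough check; these numbers are not from the original text) measures the fraction of chi-square(90) values falling within one standard deviation of the mean, which for a normal distribution would be about 68%:

```python
import math
import random

k, n = 90, 20_000
rng = random.Random(1)
# Each chi-square(90) draw is a sum of 90 squared standard normal draws.
xs = [sum(rng.gauss(0, 1) ** 2 for _ in range(k)) for _ in range(n)]

# Fraction of draws within one standard deviation (sqrt(2k)) of the mean (k).
within = sum(abs(x - k) <= math.sqrt(2 * k) for x in xs) / n
print(round(within, 2))  # close to 0.68, as for a normal distribution
```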
Properties of chi-square distributions
Chi-square distributions start at zero and continue to infinity. The chi-square distribution
starts at zero because it describes the sum of squared random variables, and a squared
number can’t be negative.

The mean (μ) of the chi-square distribution is its degrees of freedom, k. Because the chi-
square distribution is right-skewed, the mean is greater than the median and mode. The
variance of the chi-square distribution is 2k.
Property Value
Continuous or discrete Continuous
Mean k
Mode k − 2 (when k > 2)
Variance 2k
Standard deviation √(2k)
Range 0 to ∞
Symmetry Asymmetrical (right-skewed), but increasingly symmetrical as k increases
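These properties can be verified by simulation. The sketch below (sample size and seed are arbitrary choices) estimates the mean and variance for k = 10 and compares them to the table’s values of k and 2k:

```python
import random

k, n = 10, 200_000
rng = random.Random(0)
# Simulate chi-square(10) draws as sums of 10 squared standard normals.
xs = [sum(rng.gauss(0, 1) ** 2 for _ in range(k)) for _ in range(n)]

mean = sum(xs) / n                          # expected: k = 10
var = sum((x - mean) ** 2 for x in xs) / n  # expected: 2k = 20
print(round(mean, 1), round(var, 1))
```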
Example applications of chi-square distributions
The chi-square distribution makes an appearance in many statistical tests and
theories. The following are a few of the most common applications of the chi-square
distribution.

Pearson’s chi-square test

One of the most common applications of chi-square distributions is Pearson’s chi-square tests. Pearson’s chi-square tests are statistical tests for categorical data. They’re used to determine whether your data are significantly different from what you expected. There are two types of Pearson’s chi-square tests:

• Chi-square goodness of fit test
• Chi-square test of independence


Chi-Square Goodness of Fit Test

A chi-square (Χ²) goodness of fit test is a type of Pearson’s chi-square test. You can use it
to test whether the observed distribution of a categorical variable differs from your
expectations.

Example: Chi-square goodness of fit test
You’re hired by a dog food company to help them test three new dog food flavors.

You recruit a random sample of 75 dogs and offer each dog a choice between the three flavors by placing
bowls in front of them. You expect that the flavors will be equally popular among the dogs, with about 25 dogs
choosing each flavor.

Once you have your experimental results, you plan to use a chi-square goodness of fit test to figure out
whether the distribution of the dogs’ flavor choices is significantly different from your expectations.
What is the chi-square goodness of fit test?
A chi-square (Χ²) goodness of fit test is a goodness of fit test for a categorical variable.
Goodness of fit is a measure of how well a statistical model fits a set of observations.

• When goodness of fit is high, the values expected based on the model are close to the observed values.
• When goodness of fit is low, the values expected based on the model are far from the observed values.

The statistical models that are analyzed by chi-square goodness of fit tests
are distributions. They can be any distribution, from as simple as equal probability for
all groups, to as complex as a probability distribution with many parameters.
Hypothesis testing

The chi-square goodness of fit test is a hypothesis test. It allows you to draw conclusions about the distribution of a population based on a sample. Using the chi-square goodness of fit test, you can test whether the goodness of fit is “good enough” to conclude that the population follows the distribution.
With the chi-square goodness of fit test, you can ask questions such as: Was this sample
drawn from a population that has…

• Equal proportions of male and female turtles?
• Equal proportions of red, blue, yellow, green, and purple jelly beans?
• 90% right-handed and 10% left-handed people?
• Offspring with an equal probability of inheriting all possible genotypic combinations (i.e., unlinked genes)?
• A Poisson distribution of floods per year?
• A normal distribution of bread prices?
Example: Observed and expected frequencies
After weeks of hard work, your dog food experiment is complete and you compile your data in a
table:
Observed and expected frequencies of dogs’ flavor choices
Flavor Observed Expected
Garlic Blast 22 25
Blueberry Delight 30 25
Minty Munch 23 25

To help visualize the differences between your observed and expected frequencies, you also create a bar graph:
The president of the dog food company looks at your graph and
declares that they should eliminate the Garlic Blast and Minty
Munch flavors to focus on Blueberry Delight. “Not so fast!” you tell
him.

You explain that your observations were a bit different from what
you expected, but the differences aren’t dramatic. They could be
the result of a real flavor preference or they could be due to
chance.

To put it another way: You have a sample of 75 dogs, but what you
really want to understand is the population of all dogs. Was this
sample drawn from a population of dogs that choose the three
flavors equally often?
Step 1: Create a table
Create a table with the observed and expected frequencies in two
columns.
Example: Step 1
Flavor Observed Expected
Garlic Blast 22 25
Blueberry Delight 30 25
Minty Munch 23 25

Step 2: Calculate O − E
Add a new column called “O − E”. Subtract the expected frequency from the observed frequency in each row.
Example: Step 2
Flavor Observed Expected O−E
Garlic Blast 22 25 22 − 25 = −3
Blueberry Delight 30 25 5
Minty Munch 23 25 −2
Step 3: Calculate (O − E)²
Add a new column called “(O − E)²”. Square the values in the previous column.
Example: Step 3
Flavor Observed Expected O − E (O − E)²
Garlic Blast 22 25 −3 (−3)² = 9
Blueberry Delight 30 25 5 25
Minty Munch 23 25 −2 4

Step 4: Calculate (O − E)² / E

Add a final column called “(O − E)² / E”. Divide the previous column by the expected frequencies.
Example: Step 4
Flavor Observed Expected O − E (O − E)² (O − E)² / E
Garlic Blast 22 25 −3 9 9/25 = 0.36
Blueberry Delight 30 25 5 25 1
Minty Munch 23 25 −2 4 0.16
Step 5: Calculate Χ²
Add up the values of the previous column. This is the chi-square test statistic (Χ²).

Example: Step 5
Flavor Observed Expected O − E (O − E)² (O − E)² / E
Garlic Blast 22 25 −3 9 9/25 = 0.36
Blueberry Delight 30 25 5 25 1
Minty Munch 23 25 −2 4 0.16

Χ² = 0.36 + 1 + 0.16 = 1.52
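Steps 1–5 above can be reproduced in a few lines of Python (the flavor names and counts are taken from the example):

```python
# Step 1: observed counts per flavor; expected is 75 dogs / 3 flavors = 25.
observed = {"Garlic Blast": 22, "Blueberry Delight": 30, "Minty Munch": 23}
expected = sum(observed.values()) / len(observed)

# Steps 2-5: subtract, square, divide by E, and sum.
chi_square = sum((o - expected) ** 2 / expected for o in observed.values())
print(round(chi_square, 2))  # 1.52
```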


Example 2:
Is gender independent of education level? A random sample of 395 people were surveyed and each
person was asked to report the highest education level they obtained. The data that resulted from
the survey is summarized in the following table:
High School Bachelors Masters Ph.d. Total
Female 60 54 46 41 201
Male 40 44 53 57 194
Total 100 98 99 98 395

Question: Are gender and education level dependent at 5% level of significance? In other words, given the data
collected above, is there a relationship between the gender of an individual and the level of education that they
have obtained?
The expected count in each cell is given by the formula:

E = (row total × column total) / sample size

Here's the table of expected counts:

High School Bachelors Masters Ph.D. Total
Female 50.886 49.868 50.377 49.868 201
Male 49.114 48.132 48.623 48.132 194
Total 100 98 99 98 395

So, working this out:

χ² = (60 − 50.886)²/50.886 + ⋯ + (57 − 48.132)²/48.132 = 8.006

The degrees of freedom are (rows − 1)(columns − 1) = (2 − 1)(4 − 1) = 3. The critical value of χ² with 3 degrees of freedom at the 5% significance level is 7.815. Since 8.006 > 7.815, we reject the null hypothesis and conclude that education level and gender are not independent at the 5% level of significance.
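The full calculation for this example can be sketched in Python (the counts are from the survey table above):

```python
# Observed counts: rows are Female, Male; columns are High School,
# Bachelors, Masters, Ph.D.
observed = [
    [60, 54, 46, 41],
    [40, 44, 53, 57],
]
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)  # 395

# Sum (O - E)^2 / E over every cell, with E = row total * column total / n.
chi_square = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / n
        chi_square += (o - e) ** 2 / e

df = (len(observed) - 1) * (len(observed[0]) - 1)
print(round(chi_square, 3), df)  # 8.006 with 3 degrees of freedom
```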
