Chapter 15
Chi-Square Tests
Chi-Square Test for Independence
Chi-Square Tests for Goodness-of-Fit
Uniform Goodness-of-Fit Test
Poisson Goodness-of-Fit Test
Normal Chi-Square Goodness-of-Fit Test
ECDF Tests (Optional)
McGraw-Hill/Irwin Copyright 2009 by The McGraw-Hill Companies, Inc. All rights reserved.
Table 15.1
Figure 15.1
Figure 15.2
Figure 15.3
Step 5: Make the Decision
Reject H0 if the test statistic exceeds the right-tail critical value of χ², or if the p-value < α.
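As a sketch of this decision rule (not part of the original slides), the test statistic, critical value, and p-value can be obtained in Python with SciPy; the 2×2 contingency table below is hypothetical:

```python
import numpy as np
from scipy import stats

alpha = 0.05
observed = np.array([[30, 20],
                     [15, 35]])          # hypothetical 2x2 contingency table

# chi2_contingency applies Yates' continuity correction for 2x2 tables
chi2_calc, p_value, dof, expected = stats.chi2_contingency(observed)
chi2_crit = stats.chi2.ppf(1 - alpha, dof)   # right-tail critical value

# Reject H0 when the test statistic exceeds the critical value,
# or equivalently when the p-value falls below alpha.
reject = chi2_calc > chi2_crit
print(chi2_calc, chi2_crit, p_value, reject)
```

The two conditions are equivalent because both are computed from the same statistic and chi-square distribution.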
The chi-square test is unreliable if the expected frequencies are too small. Rules of thumb: Cochran's Rule requires that ejk > 5 for all cells, though up to 20% of the cells may have ejk < 5. Most agree that a chi-square test is infeasible if ejk < 1 in any cell. If this happens, try combining adjacent rows or columns to enlarge the expected frequencies.
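A minimal sketch (not from the slides) of checking these rules of thumb on a hypothetical table, using the usual expected-frequency formula ejk = (row total)(column total)/n:

```python
import numpy as np

observed = np.array([[12, 8, 5],
                     [9, 6, 4]])                 # hypothetical counts
row_tot = observed.sum(axis=1, keepdims=True)
col_tot = observed.sum(axis=0, keepdims=True)
n = observed.sum()
expected = row_tot @ col_tot / n                 # e_jk = (row)(col)/n

small = (expected < 5).mean()    # proportion of cells with e_jk < 5
feasible = expected.min() >= 1   # test infeasible if any e_jk < 1
print(expected.round(2), small, feasible)
```

If `small` exceeded 0.20 or `feasible` were False, the rules of thumb suggest combining adjacent rows or columns before testing.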
For example, the variables Infant Deaths per 1,000 and Doctors per 100,000 can each be coded into various categories:
Figure 15.6
A multinomial distribution is defined by any k probabilities p1, p2, …, pk that sum to unity. For example, consider the following official proportions of M&M colors.
H1: At least one of the pj differs from the hypothesized value.
No parameters are estimated (m = 0) and there are c = 6 classes, so the degrees of freedom are ν = c − m − 1 = 6 − 0 − 1 = 5.
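A sketch of this multinomial goodness-of-fit test in Python; the color proportions and sample counts below are illustrative placeholders, not the official M&M figures:

```python
from scipy import stats

p_hyp = [0.13, 0.13, 0.14, 0.16, 0.20, 0.24]   # hypothesized p_j (sum to 1)
f_obs = [20, 25, 30, 28, 40, 57]               # hypothetical sample counts
n = sum(f_obs)
e_exp = [n * p for p in p_hyp]                 # expected frequencies n*p_j

# m = 0 parameters estimated, c = 6 classes -> df = c - m - 1 = 5
chi2_calc, p_value = stats.chisquare(f_obs, f_exp=e_exp)
print(round(chi2_calc, 4), round(p_value, 4))
```

With `ddof` left at its default of 0, `scipy.stats.chisquare` uses df = c − 1 = 5, which matches ν here because no parameters were estimated.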
The blank may contain the name of any theoretical distribution (e.g., uniform, Poisson, normal).
χ²calc = Σ (fj − ej)² / ej   (summed over j = 1, …, c)
where
fj = the observed frequency of observations in class j
ej = the expected frequency in class j if H0 were true
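The statistic can be computed directly from this definition; the frequencies below are hypothetical, chosen only to illustrate the arithmetic:

```python
f = [18, 22, 30, 30]   # observed frequencies f_j (hypothetical)
e = [25, 25, 25, 25]   # expected frequencies e_j under H0

# Sum of squared deviations, each scaled by its expected frequency
chi2_calc = sum((fj - ej) ** 2 / ej for fj, ej in zip(f, e))
print(chi2_calc)   # (49 + 9 + 25 + 25) / 25 = 4.32
```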
Mixtures: A Problem
Mixtures occur when more than one data-generating process is superimposed on another.
The test can be performed on data that are already tabulated into groups. Calculate the expected frequency ej for each cell. The degrees of freedom are ν = c − 1 since there are no parameters for the uniform distribution. Obtain the critical value χ²α from Appendix E for the desired level of significance α. The p-value can be obtained from Excel. Reject H0 if the p-value < α.
First form c bins of equal width and create a frequency distribution. Calculate the observed frequency fj for each bin. Define ej = n/c. Perform the chi-square calculations. The degrees of freedom are ν = c − 1 since there are no parameters for the uniform distribution. Obtain the critical value from Appendix E for a given significance level α and make the decision.
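The binning procedure above can be sketched in Python (the sample is simulated, not from the text):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
x = rng.uniform(0, 1, size=300)            # simulated sample

c = 6                                      # number of equal-width bins
f_obs, _ = np.histogram(x, bins=c, range=(0, 1))   # observed f_j
e_exp = np.full(c, x.size / c)             # e_j = n / c for every bin

# df = c - 1 = 5 (no parameters estimated for the uniform)
chi2_calc, p_value = stats.chisquare(f_obs, e_exp)
print(f_obs, round(chi2_calc, 3), round(p_value, 3))
```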
λ̂ = (Σ xj fj) / n
Using this estimated mean, calculate the Poisson probabilities either by using the Poisson formula P(x) = λ^x e^(−λ) / x! or Excel.
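A sketch of this estimation step on hypothetical tabulated counts; in practice the top class would be open-ended (e.g., x ≥ 4) so the expected frequencies sum to n:

```python
from math import exp, factorial

x_vals = [0, 1, 2, 3, 4]          # number of events x_j
f_obs  = [30, 36, 20, 10, 4]      # hypothetical observed frequencies
n = sum(f_obs)

# lambda-hat = sum(x_j * f_j) / n, the sample mean from grouped data
lam = sum(x * f for x, f in zip(x_vals, f_obs)) / n

def pois(x, lam):
    """Poisson probability P(x) = lambda^x * e^(-lambda) / x!"""
    return lam ** x * exp(-lam) / factorial(x)

e_exp = [n * pois(x, lam) for x in x_vals]   # expected frequencies
print(lam, [round(e, 2) for e in e_exp])
```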
Count the sample observations fj within intervals of the form x̄ ± ks and compare them with the expected frequencies ej based on the normal distribution.
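This interval-counting idea can be sketched as follows (simulated data, not from the text); the expected counts come from the standard normal probabilities for ±k standard deviations:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
x = rng.normal(loc=50, scale=10, size=500)   # simulated sample
xbar, s = x.mean(), x.std(ddof=1)

for k in (1, 2, 3):
    f_obs = np.sum(np.abs(x - xbar) <= k * s)        # observed count in xbar +/- k*s
    p_k = stats.norm.cdf(k) - stats.norm.cdf(-k)     # P(-k < Z < k)
    e_exp = x.size * p_k                             # expected count
    print(k, int(f_obs), round(e_exp, 1))
```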
Table 15.16
The fitted normal histogram gives visual clues as to the likely outcome of the GOF test. Histograms reveal outliers or other non-normality issues, but further tests are needed since histogram appearance varies from sample to sample.
Figure 15.15
Table 15.19
ECDF Tests
Kolmogorov-Smirnov and Lilliefors Tests
There are many alternatives to the chi-square test based on the Empirical Cumulative Distribution Function (ECDF). The Kolmogorov-Smirnov (K-S) test statistic D is the largest absolute difference between the actual and expected cumulative relative frequency of the n data values: D = Max |Fa − Fe|. The K-S test is not recommended for grouped data.
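A sketch of a K-S test in Python (simulated data, not from the text), with the hypothesized normal distribution fully specified in advance rather than estimated from the sample:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.normal(loc=0, scale=1, size=200)   # simulated sample

# D = max |F_actual - F_expected| against a fully specified N(0, 1);
# if the parameters were estimated from x, a Lilliefors test is needed instead.
D, p_value = stats.kstest(x, "norm", args=(0, 1))
print(round(D, 4), round(p_value, 4))
```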
Fa is the actual cumulative frequency at observation i. Fe is the expected cumulative frequency at observation i under the assumption that the data came from the hypothesized distribution. The K-S test assumes that no parameters are estimated. If parameters are estimated, use a Lilliefors test. Both of these tests are done by computer.
Figure 15.20
Figure 15.21
ECDF Tests
Anderson-Darling Tests
The Anderson-Darling (A-D) test is widely used for non-normality because of its power. The A-D test is based on a probability plot. When the data fit the hypothesized distribution closely, the probability plot will be close to a straight line. The A-D test statistic measures the overall distance between the actual and the hypothesized distributions, using a weighted squared distance.
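An A-D normality test can be sketched with SciPy (simulated data; the slides use MINITAB, but the statistic is analogous):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
x = rng.normal(size=100)                 # simulated sample

result = stats.anderson(x, dist="norm")
# Reject normality at a given significance level if the A-D statistic
# exceeds that level's critical value.
print(round(result.statistic, 3),
      result.critical_values,            # critical values at 15%, 10%, 5%, 2.5%, 1%
      result.significance_level)
```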
Anderson-Darling Tests with MINITAB
Figure 15.22
End of Chapter 15