
Notes

The document contains class notes and assignments for a statistics course covering chi-square goodness of fit tests, two-way tables, and modeling. It includes examples of calculating chi-square statistics, expected counts for two-way tables under the null hypothesis, and assignments such as worksheets, exercises, and a lab due. Key topics covered are chi-square distribution properties, calculating chi-square by

Uploaded by

Avery Hall
Copyright © All Rights Reserved

Thursday, March 14, 2019: Section 13.1: Test for Goodness of Fit
• Read p. 727 – 737
• Exercises: *Extra Problem, 13.2, 13.3, 13.4, 13.13 (Ignore directions on book exercises. Use tables and χ²cdf to find values for χ² and p. Show work.)
• *Extra Problem for 13.1: A school's principal wants to know if students spend about the same amount of time on homework each night of the week. She asks a random sample of 50 students to keep track of their homework time for a week. The following table displays the average amount of time (in minutes) students reported per night:

Night:          Sun   Mon   Tues   Wed   Thurs   Fri   Sat
Average Time:   130   108   115    104   99      37    62

Explain carefully why it would not be appropriate to perform a chi-square test for goodness of fit using these data.
Friday, March 15, 2019: Section 13.1: Test for Goodness of Fit
• Exercises 13.10, 13.31
• 13.2 Intro worksheet
• Question: What is the critical value needed to reject the null hypothesis at the 0.05 level for a four-category goodness-of-fit (GOF) test?
What is the maximum critical value to fail to reject the null hypothesis of a 10-category GOF test at the 0.05 level?
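The critical-value question can also be checked numerically. The minimal, standard-library Python sketch below is an illustration only; it is not the χ² table or calculator method used in class. `chi2_sf` applies the standard closed-form upper-tail sums for a chi-square distribution with integer df. The hypothetical helper `chi2_critical` bisects for the value whose upper-tail area equals α. The sketch assumes that a GOF test with k categories has df = k − 1.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail area P(chi-square_df >= x) for a positive integer df."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2/pi) e^(-x/2) * sum_r x^((2r-1)/2) / (1*3*...*(2r-1))
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)  # r = 1 term
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)  # ratio between consecutive terms
    return total


def chi2_critical(alpha: float, df: int) -> float:
    """Hypothetical helper: x with P(chi-square_df >= x) = alpha, by bisection."""
    lo, hi = 0.0, 1000.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_sf(mid, df) > alpha:
            lo = mid  # tail area still too large: the cutoff is further right
        else:
            hi = mid
    return (lo + hi) / 2.0


for categories in (4, 10):
    df = categories - 1
    print(f"{categories} categories: df = {df}, critical value = {chi2_critical(0.05, df):.3f}")
```

Under these assumptions, the four-category test (df = 3) rejects H0 at the 0.05 level when χ² exceeds about 7.815. The ten-category test (df = 9) fails to reject whenever χ² is below about 16.919. Both values match the usual χ² table entries.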
• Goodness of Fit Lab (Due Wednesday)
Monday, March 18, 2019: Section 13.2: Inference for Two-Way Tables and Section 14.1: Inference About the Model
• Read p. 744 – 766
• 13.2 WS I
Tuesday, March 19, 2019: Section 13.2: Inference for Two-Way Tables and Section 14.1: Inference About the Model
• Read p. 781 – 806
• Section 14.1 Intro Worksheet
Wednesday, March 20, 2019: Sections 14.1: Inference About the Model and 14.2: Predictions and Conditions
• Read p. 781 – 806
• Lab is due
• 14.1 WS I
Thursday, March 21, 2019: Sections 14.1: Inference About the Model and 14.2: Predictions and Conditions
• 14.2 WS II
Friday, March 22, 2019: Sections 14.1: Inference About the Model and 14.2: Predictions and Conditions
• FRQ 1,2 MC1-6
Monday, March 25, 2019: Review
• MC 7-12, FRQ 3
• FRQ Review
Tuesday, March 26, 2019: Wednesday, March 27, 2019: Chapters 13 and 14 Test
AP Statistics Section 13.1 Notes (Day 1)
• Sometimes we want to examine the ________________ of a single variable in a population.
• The ________________ allows us to determine whether a hypothesized distribution seems valid.
I. Definition: Chi-Square Statistic
The chi-square statistic is a measure of how far the observed counts are from the expected counts. The formula for the statistic is
$$\chi^2 = \sum \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}}$$
where the sum is over all possible outcomes.

• NOT Normal: right-skewed
• ONLY non-negative values (Why?)
• When the expected counts are at least ________, the sampling distribution of the chi-square statistic is modeled well by a chi-square distribution with degrees of freedom equal to ________________.
• The mean of a particular chi-square distribution is equal to its degrees of freedom.
• For df > 2, the mode (peak) of the chi-square density curve is at df – 2.
II. Calculating χ² by Definition
Examples:
1) Do seagulls have a preference for where they land? To answer this question, biologists conducted a study in an enclosed outdoor space with a piece of shore whose area was made up of 56% sand, 29% mud, and 15% rocks. The biologists chose a random sample of seagulls. Each seagull was released into the outdoor space on its own and observed until it landed somewhere on the piece of shore. In all, 128 seagulls landed on the sand, 61 landed in the mud, and 11 landed on the rocks. Calculate the χ² statistic and associated probability for these data (don't forget to list df).
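Example 1 can be checked with a short Python sketch that follows the definition directly. The expected count is n × the hypothesized proportion, and the statistic is the sum of (Observed − Expected)²/Expected. The sketch assumes H0 is that landings follow the 56% / 29% / 15% area split. For df = 2 the upper-tail probability has the exact form e^(−χ²/2), so no table or library is needed. This closed form is a standard identity, substituted here for the calculator's χ²cdf.

```python
import math

# Observed landings and hypothesized area proportions (H0) from Example 1
observed = {"sand": 128, "mud": 61, "rocks": 11}
proportions = {"sand": 0.56, "mud": 0.29, "rocks": 0.15}

n = sum(observed.values())
expected = {k: n * p for k, p in proportions.items()}

# Chi-square statistic: sum of (observed - expected)^2 / expected
chi2 = sum((observed[k] - expected[k]) ** 2 / expected[k] for k in observed)
df = len(observed) - 1  # 3 categories

# For df = 2 the chi-square upper tail is exactly e^(-x/2)
p_value = math.exp(-chi2 / 2)

print({k: round(v, 2) for k, v in expected.items()})
print(f"chi2 = {chi2:.3f}, df = {df}, P = {p_value:.5f}")
```

With n = 200 the expected counts are 112, 58, and 30. This gives χ² ≈ 14.47 with df = 2 and P ≈ 0.0007.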
2) It seems as if people should have birthdays evenly distributed among the four quarters of the year. A random sample of people's birthdates is given below. Calculate the χ² statistic and associated probability for these data (don't forget to list df).
Birthday: Jan-Mar Apr-Jun Jul-Sep Oct-Dec
Number of People: 32 20 16 12
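Here is a similar sketch for Example 2. It assumes H0 is that each quarter is equally likely, so each expected count is 80/4 = 20. With four categories df = 3. For df = 3 the upper tail has the closed form erfc(√(χ²/2)) + √(2χ²/π)·e^(−χ²/2), computed with Python's `math.erfc`.

```python
import math

observed = [32, 20, 16, 12]   # Jan-Mar, Apr-Jun, Jul-Sep, Oct-Dec
n = sum(observed)             # 80 people
expected = [n / 4] * 4        # H0: quarters equally likely

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1

# Exact upper tail for df = 3
p_value = math.erfc(math.sqrt(chi2 / 2)) + math.sqrt(2 * chi2 / math.pi) * math.exp(-chi2 / 2)
print(f"chi2 = {chi2:.2f}, df = {df}, P = {p_value:.4f}")
```

This gives χ² = 11.2 with df = 3 and P ≈ 0.011.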

3) Biologists wish to mate pairs of fruit flies having genetic makeup RrCc, indicating that each has one dominant gene (R) and one recessive gene (r) for eye color, along with one dominant (C) and one recessive (c) gene for wing type. Each offspring will receive one gene for each of the two traits from each parent. The following table, known as a Punnett square, shows the possible combinations of genes received by the offspring:

       RC          Rc          rC          rc
RC     RRCC (x)    RRCc (x)    RrCC (x)    RrCc (x)
Rc     RRCc (x)    RRcc (y)    RrCc (x)    Rrcc (y)
rC     RrCC (x)    RrCc (x)    rrCC (z)    rrCc (z)
rc     RrCc (x)    Rrcc (y)    rrCc (z)    rrcc (w)

Any offspring receiving an R gene will have red eyes, and any offspring receiving a C gene will have straight wings. Thus, based on this Punnett square, biologists predict a ratio of 9:3:3:1:
• 9 red-eyed, straight-winged offspring (x)
• 3 red-eyed, curly-winged offspring (y)
• 3 white-eyed, straight-winged offspring (z)
• 1 white-eyed, curly-winged offspring (w)
To test their hypothesis about the distribution of offspring, the biologists mate a random sample of pairs of fruit flies. Of 180 offspring, 99 had red eyes and straight wings, 42 had red eyes and curly wings, 29 had white eyes and straight wings, and 10 had white eyes and curly wings. Calculate the χ² statistic and associated probability for these data (don't forget to list df).
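For Example 3 the expected counts come from the 9:3:3:1 ratio: expected = 180 × 9/16, 180 × 3/16, and so on. This minimal sketch uses the closed-form df = 3 upper tail erfc(√(χ²/2)) + √(2χ²/π)·e^(−χ²/2). The labels x, y, z, w follow the Punnett square.

```python
import math

observed = [99, 42, 29, 10]   # x, y, z, w offspring
ratio = [9, 3, 3, 1]          # Punnett-square prediction (H0)
n = sum(observed)             # 180 offspring
expected = [n * r / sum(ratio) for r in ratio]

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1

# Exact upper tail for df = 3
p_value = math.erfc(math.sqrt(chi2 / 2)) + math.sqrt(2 * chi2 / math.pi) * math.exp(-chi2 / 2)
print(f"expected = {expected}")
print(f"chi2 = {chi2:.3f}, df = {df}, P = {p_value:.3f}")
```

The expected counts are 101.25, 33.75, 33.75, and 11.25. This gives χ² ≈ 2.87 with df = 3 and P ≈ 0.41.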
Chi-Square Goodness of Fit Test
Conditions: Random sample; n ≤ (1/10)N; all expected counts ≥ 5.
$$\chi^2 = \sum \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}}$$

Example 3: Accident records at a large engineering company are listed below:


Shift Morning Afternoon Night
Number of injuries 897 905 967
Is there sufficient evidence to say that the numbers of accidents during the three shifts are not the same?
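Here is a sketch for the accident data. It assumes H0 is that injuries are equally likely on each of the three shifts, so each expected count is 2769/3 = 923. The code also checks the Large Counts condition. The random-sample and 10% conditions cannot be checked from the counts alone.

```python
import math

observed = {"Morning": 897, "Afternoon": 905, "Night": 967}
n = sum(observed.values())        # 2769 injuries
expected = n / len(observed)      # H0: each shift equally likely

# Large Counts condition: every expected count is at least 5
assert expected >= 5

chi2 = sum((o - expected) ** 2 / expected for o in observed.values())
df = len(observed) - 1
p_value = math.exp(-chi2 / 2)     # exact upper tail when df = 2
print(f"chi2 = {chi2:.3f}, df = {df}, P = {p_value:.3f}")
```

This gives χ² ≈ 3.18 with df = 2 and P ≈ 0.20. These counts would therefore not be convincing evidence against equal injury rates at the usual significance levels.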
Section 13.2 Intro WS
Market researchers suspect that background music may affect the mood and buying behavior of customers. One study in a European restaurant compared three randomly assigned treatments: no music, French accordion music, and Italian string music. Under each condition, the researchers recorded the number of customers who ordered French, Italian, and other entrees. Here is a table that summarizes the data.

                         Type of Music
Entrée Ordered    None   French   Italian   Total
French             30      39       30        99
Italian            11       1       19        31
Other              43      35       35       113
Total              84      75       84       243

The null hypothesis in the restaurant experiment is that there is no difference in the distribution of entrees ordered when no music, French accordion music, or Italian string music is played.
I. Rationale behind Expected Counts for Two-Way Tables
• If the specific type of music that's playing has no effect on entrée orders, the proportion of French entrees ordered under each music condition should be 99/243.
• Since there were 84 total entrees ordered when no music was played, the expected count of French entrees would be 84 • 99/243 = 34.22.
• Since there were 75 total entrees ordered when French music was played, the expected count of French entrees would be 75 • 99/243 = 30.56.
• Since there were 84 total entrees ordered when Italian music was played, the expected count of French entrees would be 84 • 99/243 = 34.22.

Problem: If the specific type of music that's playing has no effect on entrée orders, the proportion of Italian entrees ordered under each music condition should be ________.
Problem: Find the three expected counts for this proportion, that is, the expected counts of Italian entrees for no music, French music, and Italian music.__________________________________________
II. Check your work with the Expected Counts Table (Note: I did the last row for you.)

                         Type of Music
Entrée Ordered    None    French   Italian   Total
French            34.22   30.56    34.22      99
Italian           10.72    9.57    10.72      31
Other             39.06   34.88    39.06     113
Total             84      75       84        243

Note: When H0 is true, the expected count in any cell of a two-way table is
$$\text{expected count} = \frac{(\text{row total})(\text{column total})}{\text{table total}}$$
Look at the computations and convince yourself this formula is true.
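The expected-count formula can be checked cell by cell. The minimal Python sketch below rebuilds the expected counts for the restaurant study from the observed counts, using (row total)(column total)/(table total). Rows are entrées and columns are music conditions.

```python
# Observed counts: rows = entrée (French, Italian, Other),
# columns = music (None, French, Italian)
observed = [
    [30, 39, 30],
    [11, 1, 19],
    [43, 35, 35],
]

row_totals = [sum(row) for row in observed]        # 99, 31, 113
col_totals = [sum(col) for col in zip(*observed)]  # 84, 75, 84
table_total = sum(row_totals)                      # 243

# expected count = (row total)(column total) / (table total)
expected = [[r * c / table_total for c in col_totals] for r in row_totals]

for label, row in zip(["French", "Italian", "Other"], expected):
    print(f"{label:8}", "  ".join(f"{e:6.2f}" for e in row))
```

The printed rows should match the expected counts table to two decimal places.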

III. Chi-Square Test Statistic and Associated Probability

Recall $\chi^2 = \sum \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}}$ and df = (# of rows – 1)(# of columns – 1).

For this problem:
$$\chi^2 = \frac{(30 - 34.22)^2}{34.22} + \frac{(39 - 30.56)^2}{30.56} + \cdots + \frac{(35 - 39.06)^2}{39.06}$$

χ² = 18.28, df = 4, P(χ² ≥ 18.28) = 0.0011

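Here is the full chi-square calculation for the restaurant study as a minimal Python sketch. The helper name `chi_square_two_way` is hypothetical and does not come from the course. For df = 4 the upper-tail probability has the closed form e^(−χ²/2)(1 + χ²/2), used here in place of χ²cdf.

```python
import math


def chi_square_two_way(observed):
    """Hypothetical helper: (chi-square statistic, df, expected counts) for a two-way table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    # Expected count = (row total)(column total) / (table total)
    expected = [[r * c / total for c in col_totals] for r in row_totals]
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df, expected


# Rows: French, Italian, Other entrées; columns: no music, French music, Italian music
orders = [[30, 39, 30], [11, 1, 19], [43, 35, 35]]

chi2, df, expected = chi_square_two_way(orders)
assert min(min(row) for row in expected) >= 5   # Large Counts condition
# For df = 4 the upper tail is exactly e^(-x/2) * (1 + x/2)
p_value = math.exp(-chi2 / 2) * (1 + chi2 / 2)
print(f"chi2 = {chi2:.2f}, df = {df}, P = {p_value:.4f}")
```

It reproduces χ² ≈ 18.28 with df = 4 and P ≈ 0.0011.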
IV. Problem
The Pennsylvania State University has its main campus in the town of State College and more than
20 smaller “commonwealth campuses” around the state. The Penn State Division of Student
Affairs polled separate random samples of undergraduates from the main campus and
commonwealth campuses about their use of online social networking. Facebook was the most popular site. Here is a comparison of Facebook use by undergraduates who have a Facebook account at the main campus and at the commonwealth campuses.
Use Facebook Main Campus Commonwealth Total
Several times a month or less 55 76 131
At least once a week 215 157 372
At least once a day 640 394 1034
Total Facebook users 910 627 1537
(1) Calculate the table of expected counts.

(2) Calculate the chi-square test statistic (show work as in the example above) and the associated probability. Use the formula for degrees of freedom for two-way tables above.
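One way to check answers for the Penn State problem is the same expected-count and χ² computation in Python. The Facebook table is entered with rows = frequency of use and columns = campus. This is a sketch: `chi_square_two_way` is a hypothetical helper name, and P uses the exact df = 2 tail e^(−χ²/2).

```python
import math


def chi_square_two_way(observed):
    """Hypothetical helper: (chi-square statistic, df, expected counts) for a two-way table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    # Expected count = (row total)(column total) / (table total)
    expected = [[r * c / total for c in col_totals] for r in row_totals]
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df, expected


# Rows: several times a month or less, at least once a week, at least once a day
# Columns: main campus, commonwealth campuses
facebook = [[55, 76], [215, 157], [640, 394]]

chi2, df, expected = chi_square_two_way(facebook)
assert min(min(row) for row in expected) >= 5   # Large Counts condition
p_value = math.exp(-chi2 / 2)                   # exact upper tail for df = 2

for row in expected:
    print("  ".join(f"{e:7.2f}" for e in row))
print(f"chi2 = {chi2:.3f}, df = {df}, P = {p_value:.5f}")
```

Under these assumptions the sketch gives χ² ≈ 19.49 with df = 2 and P < 0.0001.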

Section 13.2 Notes

I. Chi-Square Test for HOMOGENEITY: *****USES DATA FROM TWO OR MORE SAMPLES

Conditions: (1) SRS (or groups in a randomized experiment); (2) $n_1 \le \frac{1}{10}N_1$, $n_2 \le \frac{1}{10}N_2$; (3) All expected counts ≥ 5
H0: There is no difference in the distribution of a categorical variable for several populations or treatments.
Ha: There is a difference in the distribution of a categorical variable for several populations or treatments.
$$\chi^2 = \sum \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}} \qquad \text{df} = (\text{rows} - 1)(\text{columns} - 1)$$

Example:
Random digit telephone surveys used to exclude cell phone numbers. If the opinions of people who have only cell phones differ from those of people who have landline service, the poll results may not represent the entire adult population. The Pew Research Center interviewed separate random samples of cell-only and landline telephone users. Here's what the Pew survey found about how these people describe their political party affiliation:

                                 Cell-only Sample   Landline Sample
Democrat or lean Democratic             49                 47
Refuse to lean either way               15                 27
Republican or lean Republican           32                 30
Total                                   96                104

Do these data provide convincing evidence at the α = 0.05 level that the distribution of party affiliation differs in the cell-only and landline user populations? (Separate paper)
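Here is a sketch of the calculation for the Pew example: a chi-square test for homogeneity with rows = party affiliation and columns = sample. The helper `chi_square_two_way` is hypothetical, and P uses the exact df = 2 tail e^(−χ²/2). The random-sample and 10% conditions still have to be argued in words.

```python
import math


def chi_square_two_way(observed):
    """Hypothetical helper: (chi-square statistic, df, expected counts) for a two-way table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    # Expected count = (row total)(column total) / (table total)
    expected = [[r * c / total for c in col_totals] for r in row_totals]
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df, expected


# Rows: Democrat, refuse to lean, Republican; columns: cell-only, landline
party = [[49, 47], [15, 27], [32, 30]]

chi2, df, expected = chi_square_two_way(party)
assert min(min(row) for row in expected) >= 5   # Large Counts condition
p_value = math.exp(-chi2 / 2)                   # exact upper tail for df = 2
print(f"chi2 = {chi2:.3f}, df = {df}, P = {p_value:.3f}, reject at 0.05: {p_value < 0.05}")
```

This gives χ² ≈ 3.22 with df = 2 and P ≈ 0.20 > 0.05. These data would therefore not be convincing evidence that the distribution of party affiliation differs between the two populations.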

II. Chi-Square Test for INDEPENDENCE: *****USES DATA FROM ONE SAMPLE

Conditions: (1) SRS (or groups in a randomized experiment); (2) $n \le \frac{1}{10}N$; (3) Large Counts: All expected counts ≥ 5
H0: There is no association between two categorical variables in the population of interest.
Ha: There is an association between two categorical variables in the population of interest.
$$\chi^2 = \sum \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}} \qquad \text{df} = (\text{rows} - 1)(\text{columns} - 1)$$
Example: A study followed a random sample of 8474 people with normal blood pressure for about four years. All the individuals were free of heart disease at the beginning of the study. Each person took the Spielberger Trait Anger Scale test, which measures how prone a person is to sudden anger. Researchers also recorded whether each individual developed coronary heart disease (CHD). Here is a summary of the data:

           Low Anger   Moderate Anger   High Anger   Total
CHD             53           110              27         190
No CHD        3057          4621             606        8284
Total         3110          4731             633        8474

Do the data provide convincing evidence of an association between anger level and heart disease in the population of interest?

(Data for 13.2 WS 1, Problem 1)
Quality of Life    Canada   United States
Much better          75         541
Somewhat better      71         498
About the same       96         779
Somewhat worse       50         282
Much worse           19          65
Total               311        2165
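Here is a sketch of the test for independence for the anger and heart disease example, with rows = CHD status and columns = anger level. It uses a hypothetical `chi_square_two_way` helper and the exact df = 2 tail e^(−χ²/2).

```python
import math


def chi_square_two_way(observed):
    """Hypothetical helper: (chi-square statistic, df, expected counts) for a two-way table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    # Expected count = (row total)(column total) / (table total)
    expected = [[r * c / total for c in col_totals] for r in row_totals]
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df, expected


# Rows: CHD, no CHD; columns: low, moderate, high anger
anger = [[53, 110, 27], [3057, 4621, 606]]

chi2, df, expected = chi_square_two_way(anger)
assert min(min(row) for row in expected) >= 5   # Large Counts condition
p_value = math.exp(-chi2 / 2)                   # exact upper tail for df = 2
print(f"chi2 = {chi2:.3f}, df = {df}, P = {p_value:.4f}")
```

It gives χ² ≈ 16.08 with df = 2 and P ≈ 0.0003. That would be convincing evidence of an association between anger level and CHD in this population.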
13.2 WS 1
Determine if the Chi-Square test for Homogeneity or the Chi-Square test for Independence is the most
appropriate test in each circumstance. Then carry out the significance test.
1) (Chi-Square Test for Homogeneity) Canada has universal health care. The United States does not but often offers more elaborate treatment to patients with access. How do the two systems compare in treating heart attacks? Researchers compared random samples of U.S. and Canadian heart attack patients. One key outcome was the patients' own assessment of their quality of life relative to what it had been before the heart attack. The data for the patients who survived a year are in the Quality of Life table (Canada vs. United States) above. Is there a significant difference between the two distributions of quality-of-life ratings? Carry out an appropriate test at the α = 0.01 level.
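Here is a sketch for Problem 1 using the quality-of-life table, with rows = rating and columns = Canada, United States. With 5 rows and 2 columns df = 4, so P uses the closed form e^(−χ²/2)(1 + χ²/2). The helper name is hypothetical.

```python
import math


def chi_square_two_way(observed):
    """Hypothetical helper: (chi-square statistic, df, expected counts) for a two-way table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    # Expected count = (row total)(column total) / (table total)
    expected = [[r * c / total for c in col_totals] for r in row_totals]
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df, expected


# Rows: much better, somewhat better, about the same, somewhat worse, much worse
# Columns: Canada, United States
quality = [[75, 541], [71, 498], [96, 779], [50, 282], [19, 65]]

chi2, df, expected = chi_square_two_way(quality)
assert min(min(row) for row in expected) >= 5   # Large Counts condition
# df = 4: upper tail is exactly e^(-x/2) * (1 + x/2)
p_value = math.exp(-chi2 / 2) * (1 + chi2 / 2)
print(f"chi2 = {chi2:.3f}, df = {df}, P = {p_value:.4f}, significant at 0.01: {p_value < 0.01}")
```

This gives χ² ≈ 11.73 with df = 4 and P ≈ 0.0195. Since P > α = 0.01, the difference would not be significant at the 0.01 level.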
2) (Chi-Square Test for Independence) The owner of a local franchise benefits from the brand recognition, national advertising, and detailed guidelines provided by the franchise chain. In return, he or she pays fees to the franchise firm and agrees to follow its policies. The relationship between the local owner and the franchise firm is spelled out in a detailed contract. One clause that the contract may or may not contain is the entrepreneur's right to an exclusive territory. This means that the new outlet will be the only representative of the franchise in a specified territory and will not have to compete with other outlets of the same chain. How does the presence of an exclusive-territory clause in the contract relate to the survival of the business? A study designed to address this question collected data from a random sample of 170 new franchise firms. First, the franchisor was classified as successful or not based on whether or not it was still offering franchises as of a certain date. Second, the contract each franchisor offered to franchisees was classified according to whether or not there was an exclusive-territory clause. Here are the count data, arranged in a two-way table.

Success    Yes    No    Total
Yes        108    15     123
No          34    13      47
Total      142    28     170

Do these data provide convincing evidence at the α = 0.01 level of an association between an exclusive-territory clause and business survival for new franchise firms?
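Here is a sketch for Problem 2. For a 2 × 2 table df = 1, and a chi-square variable with df = 1 is the square of a standard Normal variable, so P(χ² ≥ x) = erfc(√(x/2)). No continuity correction is applied. That is an assumption matching the χ² formula in these notes.

```python
import math


def chi_square_two_way(observed):
    """Hypothetical helper: (chi-square statistic, df, expected counts) for a two-way table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    # Expected count = (row total)(column total) / (table total)
    expected = [[r * c / total for c in col_totals] for r in row_totals]
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df, expected


franchise = [[108, 15], [34, 13]]   # table entered as shown above

chi2, df, expected = chi_square_two_way(franchise)
smallest = min(min(row) for row in expected)
assert smallest >= 5                            # Large Counts condition
p_value = math.erfc(math.sqrt(chi2 / 2))        # exact upper tail for df = 1
print(f"smallest expected count = {smallest:.2f}")
print(f"chi2 = {chi2:.3f}, df = {df}, P = {p_value:.4f}, reject at 0.01: {p_value < 0.01}")
```

This gives χ² ≈ 5.91 with df = 1 and P ≈ 0.015 > 0.01. The smallest expected count is about 7.74, so the Large Counts condition holds.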
FOR 3 and 4: YOU DO NOT HAVE TO SHOW CONDITIONS OR THE WORK. YOU STILL NEED
TO NAME THE TEST, HYPOTHESES, P-VALUE, AND CONCLUSION
(3) Are men and women equally likely to suffer lingering fear from watching scary movies as children? Researchers asked a random sample of 117 college students to write narrative accounts of their exposure to scary movies before the age of 13. More than one-fourth of the students said that some of the fright symptoms are still present when they are awake. The following table breaks down these results by gender.

Fright Symptoms?   Male   Female   Total
Yes                  7      29       36
No                  31      50       81
Total               38      79      117

Do these data provide convincing evidence that males and females are not equally likely to suffer later in life from watching scary movies as children?
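Problem 3 only requires the test name, hypotheses, P-value, and conclusion, but the P-value can be checked with this sketch. The table is 2 × 2, so df = 1 and P = erfc(√(χ²/2)), with no continuity correction. The helper name is hypothetical.

```python
import math


def chi_square_two_way(observed):
    """Hypothetical helper: (chi-square statistic, df, expected counts) for a two-way table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    # Expected count = (row total)(column total) / (table total)
    expected = [[r * c / total for c in col_totals] for r in row_totals]
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df, expected


# Rows: fright symptoms yes, no; columns: male, female
fright = [[7, 29], [31, 50]]

chi2, df, expected = chi_square_two_way(fright)
assert min(min(row) for row in expected) >= 5   # Large Counts condition
p_value = math.erfc(math.sqrt(chi2 / 2))        # exact upper tail for df = 1
print(f"chi2 = {chi2:.3f}, df = {df}, P = {p_value:.4f}")
```

It gives χ² ≈ 4.03 with df = 1 and P ≈ 0.045.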

(4) How is the hatching of water python eggs influenced by the temperature of the snake's nest? Researchers randomly assigned newly laid eggs to one of three water temperatures: hot, neutral, or cold. Hot duplicates the extra warmth provided by the mother python, and cold duplicates the absence of the mother. Here are the data on the number of eggs that hatched and didn't hatch:

Hatched?   Cold   Neutral   Hot
Yes         16      38       75
No          11      18       29

Are the differences between the three groups statistically significant? Give appropriate evidence to support your answer.
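For Problem 4 the P-value can be checked with this sketch. The table is 2 × 3, so df = 2 and P = e^(−χ²/2) exactly. The helper name is hypothetical.

```python
import math


def chi_square_two_way(observed):
    """Hypothetical helper: (chi-square statistic, df, expected counts) for a two-way table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    # Expected count = (row total)(column total) / (table total)
    expected = [[r * c / total for c in col_totals] for r in row_totals]
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df, expected


# Rows: hatched yes, no; columns: cold, neutral, hot
eggs = [[16, 38, 75], [11, 18, 29]]

chi2, df, expected = chi_square_two_way(eggs)
assert min(min(row) for row in expected) >= 5   # Large Counts condition
p_value = math.exp(-chi2 / 2)                   # exact upper tail for df = 2
print(f"chi2 = {chi2:.3f}, df = {df}, P = {p_value:.3f}")
```

It gives χ² ≈ 1.70 with df = 2 and P ≈ 0.43.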
Section 14.1 Intro WS (Review from Chapter 3)
John's parents recorded his height at various ages up to 66 months. Below is a record of the results.

Age (months) A      36   48   54   60   66
Height (inches) H   35   38   41   43   45

a) What is the equation of the least-squares regression line of John's height on age? (Use the ŷ = a + bx form.)
b) Identify and interpret the coefficient of correlation.
c) Identify and interpret the coefficient of determination.
d) Identify and interpret the slope.
e) Identify and interpret the y-intercept.
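The Chapter 3 review quantities can be computed from the definitions b = Sxy/Sxx, a = ȳ − b·x̄, and r = Sxy/√(Sxx·Syy). This standard-library Python sketch uses John's five (age, height) pairs from the table.

```python
import math

age = [36, 48, 54, 60, 66]       # months
height = [35, 38, 41, 43, 45]    # inches

n = len(age)
x_bar = sum(age) / n
y_bar = sum(height) / n

# Sums of squares and cross-products about the means
sxx = sum((x - x_bar) ** 2 for x in age)
syy = sum((y - y_bar) ** 2 for y in height)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(age, height))

b = sxy / sxx                    # slope
a = y_bar - b * x_bar            # y-intercept
r = sxy / math.sqrt(sxx * syy)   # correlation

print(f"y-hat = {a:.3f} + {b:.4f}x")
print(f"r = {r:.4f}, r^2 = {r * r:.4f}")
```

This gives ŷ ≈ 22.324 + 0.3423x, r ≈ 0.994, and r² ≈ 0.988. Predicted height increases by about 0.34 inch per month of age, and about 98.8% of the variation in height is accounted for by the linear relationship with age. The intercept (about 22.3 inches at age 0) is an extrapolation far below the observed ages.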
Section 14.1 Notes Supplement
Does fidgeting keep you slim? Sometimes people don't gain weight even when they overeat. Perhaps fidgeting and other "non-exercise activity" (NEA) explain why: some people may spontaneously increase non-exercise activity when fed more. Researchers deliberately overfed a random sample of 16 healthy young adults for 8 weeks. They measured fat gain (in kilograms) as the response variable and change in energy use (in calories) from activity other than deliberate exercise (fidgeting, daily living, and the like) as the explanatory variable. The regression analysis is given below. Assume all conditions have been verified.

Regression Analysis: Fat Gain vs. NEA Change
Predictor     Coef          SE Coef      T       P
Constant      3.5051 (a)    0.3036       11.54   0.000
NEA Change    –0.0034415    0.0007414    –4.64   0.001
S = 0.739853   R-Sq = 60.6%   R-Sq (adj) = 57.8%
(Labels on the output: a = y-intercept; S = standard deviation of the residuals; the NEA Change row gives the slope.)

(a) Find and interpret a 95% confidence interval for the slope of the true regression line.
(b) Do these data provide convincing evidence at the α = 0.05 level of a negative linear relationship between fat gain and NEA change in the population of healthy young adults?
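Here is a sketch of the arithmetic for (a) and the test statistic for (b), using numbers read from the Minitab output. The critical value t* = 2.145 (df = n − 2 = 14, 95% confidence) is taken from a t table, because Python's standard library has no t distribution.

```python
b = -0.0034415      # slope: NEA Change coefficient
se_b = 0.0007414    # standard error of the slope (SE Coef)
n = 16
df = n - 2

# (a) 95% CI: b +/- t* * SE_b
t_star = 2.145      # t table value for df = 14, 95% confidence
margin = t_star * se_b
lower, upper = b - margin, b + margin
print(f"95% CI for slope: ({lower:.5f}, {upper:.5f})")

# (b) H0: beta = 0 vs Ha: beta < 0
t = b / se_b
print(f"t = {t:.2f}, df = {df}")
```

The interval is about (−0.00503, −0.00185) kg per calorie. With 95% confidence, each additional calorie of NEA change is associated with a decrease of between about 0.0019 and 0.0050 kg in predicted fat gain. For (b), t ≈ −4.64 with df = 14. Minitab's P column is two-sided, so the one-sided P-value is at most about half the reported 0.001, well below α = 0.05.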

14.1 WS 1
1) Use the least-squares regression analysis of flight time vs. drop height of a paper helicopter. The data came from dropping 70 paper helicopters from various heights and measuring flight times.

Regression Analysis: Flight Time vs. Drop Height
Predictor           Coef         SE Coef     T        P
Constant            –0.03761     0.05838     –0.64    0.522
Drop height (cm)     0.0057244   0.0002018   28.37    0.000
S = 0.168181   R-Sq = 92.2%   R-Sq (adj) = 92.1%

Assume all conditions have been verified. Find and interpret a 95% confidence interval for the slope of the true regression line.
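Here is a sketch of the interval arithmetic. The sample has n = 70, so df = 68. The sketch assumes t* = 2.000 from the df = 60 row of a t table, which is a conservative choice when df = 68 is not listed. Software gives a slightly smaller value, about 1.995.

```python
b = 0.0057244       # slope: Drop height (cm) coefficient
se_b = 0.0002018    # standard error of the slope
n = 70
df = n - 2

t_star = 2.000      # conservative t table value (df = 60 row), 95% confidence
margin = t_star * se_b
lower, upper = b - margin, b + margin
print(f"df = {df}, 95% CI for slope: ({lower:.6f}, {upper:.6f})")
```

The interval is about (0.00532, 0.00613). With 95% confidence, each additional centimeter of drop height increases the predicted flight time by between about 0.0053 and 0.0061, in the output's flight-time units.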
2) A random sample of 16 used Ford F-150 SuperCrew 4 x 4's was selected from among those for sale on autotrader.com. The number of miles driven and price (in dollars) were recorded for each of the trucks. The regression analysis is given below. Assume all conditions have been verified.

Regression Analysis: Price ($) versus Miles Drive
Predictor     Coef        SE Coef    T        P
Constant      38257       2446       15.64    0.000
Miles Drive   –0.16292    0.03096    –5.26    0.000
S = 5740.13   R-Sq = 66.4%   R-Sq (adj) = 64%

Find and interpret a 90% confidence interval for the slope of the true regression line.
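Here is a sketch for the 90% interval. It assumes t* = 1.761 (df = 16 − 2 = 14, 90% confidence) from a t table.

```python
b = -0.16292        # slope: Miles Drive coefficient (dollars per mile)
se_b = 0.03096      # standard error of the slope
n = 16
df = n - 2

t_star = 1.761      # t table value for df = 14, 90% confidence
margin = t_star * se_b
lower, upper = b - margin, b + margin
print(f"df = {df}, 90% CI for slope: ({lower:.4f}, {upper:.4f})")
```

The interval is about (−0.2174, −0.1084). With 90% confidence, each additional mile driven is associated with a decrease of between about $0.11 and $0.22 in predicted price.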
3) Infants who cry easily may be more easily stimulated than others. This may be a sign of higher IQ. Child development researchers explored the relationship between the crying of infants 4 to 10 days old and their later IQ scores. The researchers measured the "crycount" at age 4 to 10 days and then later measured IQ at age three. Thirty-eight children were sampled (SRS). Assume all conditions have been verified. The regression analysis is given below.

Regression Analysis: IQ vs Crycount
Predictor   Coef     SE Coef   T       P
Constant    91.268   8.934     10.22   0.000
Crycount    1.4929   0.4870    3.07    0.004
S = 17.50   R-Sq = 20.7%   R-Sq (adj) = 18.5%

Do these data provide convincing evidence of a positive linear relationship between crying counts and IQ in the population of interest? (Carry out a significance test.)
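Here is a sketch of the test statistic for H0: β = 0 versus Ha: β > 0, using the output's slope and standard error (df = 38 − 2 = 36). Minitab's P column reports a two-sided P-value, so the sketch halves it for the one-sided alternative.

```python
b = 1.4929          # slope: Crycount coefficient
se_b = 0.4870       # standard error of the slope
n = 38
df = n - 2

t = b / se_b                      # test statistic for H0: beta = 0
two_sided_p = 0.004               # P column from the output
one_sided_p = two_sided_p / 2     # Ha: beta > 0 and t > 0
print(f"t = {t:.2f}, df = {df}, one-sided P = {one_sided_p:.3f}")
```

Here t ≈ 3.07 and the one-sided P-value is about 0.004/2 = 0.002. That would be convincing evidence of a positive linear relationship at any common significance level.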
Chapter 14 Notes Day 2
1) Body weights and backpack weights were collected for 8 students. The regression analysis and
relevant graphs are given below. Is there statistically significant evidence of a positive linear relationship
between body weight and backpack weight? (a) Carry out an appropriate significance test at the α = 0.01 level. (b) Give a 99% confidence interval for the slope of the population regression line.
Predictor Coef Stdev t-ratio P
Constant 16.265 3.937 4.13 0.006
BodyWT 0.09080 0.02831 3.21 0.018
S = 2.270 R-sq = 63.2% R-sq (adj) = 57%
[Graphs: scatterplot of backpack weight (lbs) vs. body weight (lbs); residual plot of residuals vs. body weight (lbs); histogram of residuals (frequency vs. residual)]
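Here is a sketch for both parts of the backpack problem, using the output's slope and standard error with n = 8 (df = 6). It assumes t* = 3.707 for 99% confidence with df = 6 from a t table. For the one-sided alternative it halves the two-sided P-value reported in the output.

```python
b = 0.09080          # slope: BodyWT coefficient
se_b = 0.02831       # its standard error ("Stdev" column)
n = 8
df = n - 2

# (a) H0: beta = 0 vs Ha: beta > 0
t = b / se_b
two_sided_p = 0.018              # P column from the output
one_sided_p = two_sided_p / 2
print(f"t = {t:.2f}, df = {df}, one-sided P = {one_sided_p:.3f}, reject at 0.01: {one_sided_p < 0.01}")

# (b) 99% confidence interval for the slope
t_star = 3.707       # t table value for df = 6, 99% confidence
margin = t_star * se_b
lower, upper = b - margin, b + margin
print(f"99% CI: ({lower:.4f}, {upper:.4f})")
```

The sketch gives t ≈ 3.21 with one-sided P ≈ 0.009 < 0.01, and a 99% interval of about (−0.0141, 0.1957). The interval contains 0 even though the one-sided test rejects. That is because a two-sided 99% interval matches a two-sided test at α = 0.01, not a one-sided one.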
