Preface
This book introduces students to statistical modeling beyond what they learn in an introductory
course. We assume that students have successfully completed a Stat 101 college course or an AP
Statistics course. Building on basic concepts and methods learned in that course, we empower
students to analyze richer datasets that include more variables and address a broader range of
research questions.
Guiding Principles
Principles that have guided the development of this book include:
• Modeling as a unifying theme. Students will analyze many types of data structures with a
wide variety of purposes throughout this course. These purposes include making predictions,
understanding relationships, and assessing differences. The data structures include various
numbers of variables and different kinds of variables in both explanatory and response roles.
The unifying theme that connects all of these data structures and analysis purposes is statisti-
cal modeling. The idea of constructing statistical models is introduced at the very beginning,
in a setting that students encountered in their Stat 101 course. This modeling focus continues
throughout the course as students encounter new and increasingly more complicated scenarios.
Basic principles of statistical modeling that apply in all settings, such as the importance
of checking model conditions by analyzing residuals with graphical and numerical methods, are
emphasized throughout. Although it’s not feasible in this course to prepare students for all
possible contingencies that they might encounter when fitting models, we want students to
recognize when a model has substantial faults. Throughout the book, we offer two general
approaches for analyzing data when model conditions are not satisfied: data transformations
and computer-intensive methods such as bootstrapping and randomization tests.
Students will go beyond their Stat 101 experience by learning to develop and apply mod-
els with both quantitative and categorical response variables, with both quantitative and
categorical explanatory variables, and with multiple explanatory variables.
• Modeling as an interactive process. Students will discover that the practice of statisti-
cal modeling involves applying an interactive process. We employ a four-step process in all
statistical modeling: Choose a form for the model, fit the model to the data, assess how well
the model describes the data, and use the model to address the question of interest.
As students gain more and more facility with the interplay between data and models, they
will find that this modeling process is not as linear as it might appear. They will learn how to
apply their developing judgment about statistical modeling. This development of judgment,
and the growing realization that statistical modeling is as much an art as a science, are more
ways in which this second course is likely to differ from students’ Stat 101 experiences.
• Modeling of real, rich datasets. Students will encounter real and rich datasets through-
out this course. Analyzing and drawing conclusions from real data are crucial for preparing
students to use statistical modeling in their professional lives. Using real data to address gen-
uine research questions also helps to motivate students to study statistics. The richness stems
not only from interesting contexts in a variety of disciplines, but also from the multivariable
nature of most datasets.
This multivariable dimension is an important aspect of how this course builds on what stu-
dents learned in Stat 101 and prepares them to analyze data that they will see in our modern
world that is so permeated with data.
Prerequisites
We assume that students using this book have successfully completed an introductory statistics
course (Stat 101), including statistical inference for comparing two proportions and for comparing
two means. No further mathematical prerequisites are needed to learn the material in this book.
Some material on data transformations and logistic regression assumes that students are able to
understand and work with exponential and logarithmic functions.
predictors in Chapter 3. For a class of students with strong backgrounds, an instructor may choose
to move more quickly through the first chapters, treating that material mostly as review to help
students get “up to speed.”
Organization of Chapters
After completing this course, students should be able to work with statistical models where the
response variable is either quantitative or categorical and where explanatory/predictor variables
are quantitative or categorical (or with both kinds of predictors). Chapters are grouped to consider
models based on the type of response and type of predictors.
Chapter 0: Introduction. We remind students about basic statistical terminology and present
our four-step process for constructing statistical models in the context of a two-sample t-test.
Unit A (Chapters 1–4): Linear regression models. These four chapters develop and exam-
ine statistical models for a quantitative response variable, first with one quantitative predictor and
then with multiple predictors of both quantitative and categorical types.
Unit B (Chapters 5–8): Analysis of variance models. These four chapters also consider
models for a quantitative response variable, but specifically with categorical explanatory vari-
ables/factors. We start with a single factor (one-way ANOVA) and then move to models that
consider multiple factors. We follow this with an overview of experimental design issues.
Unit C (Chapters 9–11): Logistic regression models. These three chapters introduce models
for a binary response variable with either quantitative or categorical predictors.
• The next chapter of the unit (Chapters 3, 6, 10) extends these ideas to models with multiple
predictors/factors.
• Each unit then presents a chapter of additional topics that extend ideas discussed earlier
(Chapters 4, 7, 11). For example, Section 1.5 gives a brief and informal introduction to
outliers and influential points in linear regression models. Topic 4.3 covers these ideas in
more depth, introducing more formal methods to measure leverage and influence and to
detect outliers. The topics in these chapters are relatively independent and so allow for
considerable flexibility in choosing among the additional topics.
• Unit B also has a chapter providing an overview of experimental design issues (Chapter 8).
Exercises
Developing skills of statistical modeling requires considerable practice working with real data.
Homework exercises are an important component of this book. Exercises appear at the end of each
chapter, except for the “Additional Topics” chapters that have exercises after each independent
topic. These exercises are grouped into four categories:
• Conceptual exercises. These questions are brief and require minimal (if any) calculations.
They give students practice with applying basic terminology and assess students’ understand-
ing of concepts introduced in the chapter.
• Guided exercises. These exercises ask students to perform various stages of a modeling
analysis process by providing specific prompts for the individual steps.
• Open-ended exercises. These exercises ask for more complete analyses and reporting of
conclusions, without much or any step-by-step direction.
• Supplemental exercises. Topics for these exercises go somewhat beyond the scope of the
material covered in the chapter.
To the Student
In your introductory statistics course you saw many facets of statistics but you probably did little
if any work with the formal concept of a statistical model. To us, modeling is a very important
part of statistics. In this book, we develop statistical models, building on ideas you encountered in
your introductory course. We start by reviewing some topics from Stat 101 but adding the lens of
modeling as a way to view ideas. Then we expand our view as we develop more complicated models.
You will find a thread running through the book:
• Choose a type of model.
• Fit the model to the data.
• Assess how well the model describes the data.
• Use the fitted model to understand the data and the population from which they came.
We hope that the Choose, Fit, Assess, Use quartet helps you develop a systematic approach to
analyzing data.
Modern statistical modeling involves quite a bit of computing. Fortunately, good software exists
that enables flexible model fitting and easy comparisons of competing models. We hope that by
the end of your Stat2 course, you will be comfortable using software to fit models that allow for
deep understanding of complex problems.
Acknowledgments
We are grateful for the assistance of a great number of people in writing Stat2.
First, we thank all the reviewers and classroom testers listed at the end of this section. This group
of people gave us valuable advice, without which we would not have progressed far from early drafts
of our book.
We thank the students in our Stat2 classes who took handouts of rough drafts of chapters and gave
back the insight, suggestions, and kind of encouragement that only students can truly provide.
We thank our publishing team at W. H. Freeman, especially Roland Cheyney, Katrina Wilhelm,
Kirsten Watrud, Lisa Kinne, Liam Ferguson, and Ruth Baruth. It has been a pleasure working
with such a competent organization.
We thank the students, faculty colleagues, and other researchers who have generously provided
their data for use in this project. Rich, interesting data are the lifeblood of statistics and critical to
helping students learn and appreciate how to effectively model real-world situations.
We thank Emily Moore of Grinnell College for giving us our push into the uses of LaTeX typesetting.
We thank our families for their patience and support. The list would be very long if eight authors
listed all family members who deserve our thanks. But we owe them a lot and will continue to let
them know this.
Finally, we thank all our wonderful colleagues in the Statistics in the Liberal Arts Workshop
(SLAW). For 25 years, this group has met and supported one another through a variety of projects
and life experiences. Of the 11 current attendees of our annual meetings, 8 of us became the au-
thor team, but the others shared their ideas, criticism, and encouragement. These individuals are
Rosemary Roberts of Bowdoin College, Katherine Halvorsen of Smith College, and Joy Jordan of
Lawrence University.
We also thank four retired SLAW participants who were active with the group when the idea for a
Statistics 2 textbook went from a wish to a plan. These are the late Pete Hayslett of Colby College,
Gudmund Iversen of Swarthmore College, Don Bentley of Pomona College, and David Moore of
Purdue University. Pete taught us about balance in one’s life, and so a large author team allowed us
to make the project more fun and more social. Gudmund taught us early about the place of statis-
tics within the liberal arts, and we sincerely hope that our modeling approach will allow students to
see our discipline as a general problem-solving tool worthy of the liberal arts. Don taught us about
sticking to our guns and remaining proud of our roots in many disciplines, and we hope that our
commitment to a wide variety of applications, well represented by many datasets, will do justice to
his teaching. All of us in SLAW have been honored by David Moore’s enthusiastic participation in
our group until his retirement, and his leadership in the world of statistics education and writing
great textbooks will continue to inspire us for many years to come. His work and his teaching give
us a standard to aspire to.
Reviewers
Carmen O. Acuna Bucknell University
David C. Airey Vanderbilt School of Medicine
Jim Albert Bowling Green State University
Robert H. Carver Stonehill College
William F. Christensen Brigham Young University
Julie M. Clark Hollins University
Phyllis Curtiss Grand Valley State University
Lise DeShea University of Oklahoma Health Sciences Center
Christine Franklin University of Georgia
Susan K. Herring Sonoma State University
Martin Jones College of Charleston
David W. Letcher The College of New Jersey
Ananda Manage Sam Houston State University
John D. McKenzie, Jr. Babson College
Judith W. Mills Southern Connecticut State University
Alan Olinsky Bryant University
Richard Rockwell Pacific Union College
Laura Schultz Rowan University
Peter Shenkin John Jay College of Criminal Justice
Daren Starnes The Lawrenceville School
Debra K. Stiver University of Nevada, Reno
Linda Strauss Pennsylvania State University
Dr. Rocky Von Eye Dakota Wesleyan University
Jay K. Wood Memorial University
Jingjing Wu University of Calgary
Class Testers
Sarah Abramowitz Drew University
Ming An Vassar College
Christopher Barat Stevenson College
Nancy Boynton SUNY, Fredonia
Jessica Chapman St. Lawrence University
Michael Costello Bethesda-Chevy Chase High School
Michelle Everson University of Minnesota
Katherine Halvorsen Smith College
Joy Jordan Lawrence University
Jack Morse University of Georgia
Eric Nordmoe Kalamazoo College
Ivan Ramler St. Lawrence University
David Ruth U.S. Naval Academy
Michael Schuckers St. Lawrence University
Jen-Ting Wang SUNY, Oneonta
To David S. Moore,
with enduring affection, admiration, and thanks:
Thank you, David, for all that your leadership has done for our profession,
and thank you also for all that your friendship, support, and guidance
have done for each of us personally.
CHAPTER 0
What Is a Statistical Model?
The unifying theme of this book is the use of models in statistical data analysis. Statistical models
are useful for answering all kinds of questions. For example:
• Can we use the number of miles that a used car has been driven to predict the price that is
being asked for the car? How much less can we expect to pay for each additional 1000 miles
that the car has been driven? Would it be better to base our price predictions on the age of
the car in years, rather than its mileage? Is it helpful to consider both age and mileage, or do
we learn roughly as much about price by considering only one of these? Would the impact of
mileage on the predicted price be different for a Honda as opposed to a Porsche?
• Do babies begin to walk at an earlier age if they engage in a regimen of special exercises? Or
does any kind of exercise suffice? Or does exercise have no connection to when a baby begins
to walk?
• If we find a footprint and a handprint at the scene of a crime, are they helpful for predicting
the height of the person who left them? How about for predicting whether the person is male
or female?
• Can we distinguish among different species of hawks based solely on the lengths of their tails?
• Do students with a higher grade point average really have a better chance of being accepted
to medical school? How much better? How well can we predict whether or not an applicant
is accepted based on his or her GPA? Is there a difference between male and female students’
chances for admission? If so, does one sex retain its advantage even after GPA is accounted
for?
• Can a handheld device that sends a magnetic pulse into the head reduce pain for migraine
sufferers?
• When people serve ice cream to themselves, do they take more if they are using a bigger
bowl? What if they are using a bigger spoon?
• Which is more strongly related to the average score for professional golfers: driving distance,
driving accuracy, putting performance, or iron play? Are all of these useful for predicting a
golfer’s average score? Which are most useful? How much of the variability in golfers’ scores
can be explained by knowing all of these other values?
a. Making predictions. Examples include predicting the price of a car based on its age,
mileage, and model; predicting the length of a hawk’s tail based on its species; predicting the
probability of acceptance to medical school based on grade point average.
b. Understanding relationships. For example, after taking mileage into account, how is the
age of a car related to its price? How does the relationship between foot length and height
differ between men and women? How are the various measures of a golfer’s performance
related to each other and to the golfer’s scoring average?
c. Assessing differences. For example, is the difference in ages of first walking different enough
between an exercise group and a control group to conclude that exercise really does affect age
of first walking? Is the rate of headache relief for migraine sufferers who experience a magnetic
pulse sufficiently higher than those in the control group to advocate for the magnetic pulse
as an effective treatment?
As with all models, statistical models are simplifications of reality. George Box, a renowned statis-
tician, famously said that “all statistical models are wrong, but some are useful.” Statistical models
are not deterministic, meaning that their predictions are not expected to be perfectly accurate. For
example, we do not expect to predict the exact price of a used car based on its mileage. Even if
we were to record every imaginable characteristic of the car and include them all in the model, we
would still not be able to predict its price exactly. And we certainly do not expect to predict the
exact moment that a baby first walks based on the kind of exercise he or she engaged in. Statistical
models merely aim to explain as much of the variability as possible in whatever phenomenon is
being modeled. In fact, because human beings are notoriously variable and unpredictable, social
scientists who develop statistical models are often delighted if the model explains even a small part
of the variability.
A distinguishing feature of statistical models is that we pay close attention to possible simplifica-
tions and imperfections, seeking to quantify how much the model explains and how much it does
not. So, while we do not expect our model’s predictions to be exactly correct, we are able to state
how confident we are that our predictions fall within a certain range of the truth. And while we
do not expect to determine the exact relationship between two variables, we can quantify how far
off our model is likely to be. And while we do not expect to assess exactly how much two groups
may differ, we can draw conclusions about how likely they are to differ and by what magnitude.
0.1 Fundamental Terminology
or as
Y = f (X) + ϵ
The Y here represents the variable being modeled, X is the variable used to do the modeling,
and f is a function.¹ We start in Chapter 1 with just one quantitative, explanatory variable X
and with a linear function f . Then we will consider more complicated functions for f , often by
transforming Y or X or both. Later, we will consider multiple explanatory variables, which can be
either quantitative or categorical. In these initial models we assume that the response variable Y
is quantitative. Eventually, we will allow the response variable Y to be categorical.
The ϵ term in the model above is called the “error,” meaning the part of the response variable Y
that remains unexplained after considering the predictor X. Our models will sometimes stipulate
a probability distribution for this ϵ term, often a normal distribution. An important aspect of our
modeling process will be checking whether the stipulated probability distribution for the error term
seems reasonable, based on the data, and making appropriate adjustments to the model if it does
not.
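The form Y = f(X) + ϵ can be made concrete with a small simulation. The sketch below is only an illustration under stated assumptions: the linear function `f_linear`, its coefficients (2.0 and 0.5), the error standard deviation, and the x-values are all made up for the example and do not come from any dataset in this book.

```python
import random


def simulate_response(x_values, f, sigma, seed=0):
    """Generate Y = f(X) + error, with each error drawn from N(0, sigma)."""
    rng = random.Random(seed)
    return [f(x) + rng.gauss(0.0, sigma) for x in x_values]


def f_linear(x):
    """Hypothetical structural part of the model: a straight line."""
    return 2.0 + 0.5 * x


x_values = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
y_values = simulate_response(x_values, f_linear, sigma=1.0)

# The error is the part of Y that f(X) leaves unexplained.
errors = [y - f_linear(x) for x, y in zip(x_values, y_values)]

for x, y, e in zip(x_values, y_values, errors):
    print(f"x={x}  f(x)={f_linear(x):.2f}  y={y:.2f}  error={e:+.2f}")
```

With real data we observe only X and Y, so both f and the distribution of ϵ must be estimated and then checked against the data.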
The observational units in a study are the people, objects, or cases on which data are recorded.
The variables are the characteristics that are measured or recorded about each observational unit.
Example 0.1: In the study about predicting the price of a used car, the observational units are the cars. The
variables are the car’s price, mileage, age (in years), and manufacturer (Porsche or Honda).
⋄
Example 0.2: In the study about babies walking, the observational units are the babies. The variables are whether
or not the baby was put on an exercise regimen and the age at which the baby first walked.
⋄
¹ The term “model” is used to refer to the entire equation or just the structural part that we have denoted by
f(X).
Example 0.3: You may find it helpful to envision the data in a spreadsheet format. The row labels are cities, which
are observational units, and the columns correspond to the variables. For example, Figure 0.1 shows
part of a Minitab worksheet with data compiled by the U.S. Census Bureau on health-care facilities
in metropolitan areas. The observational units are the metropolitan areas and the variables count
the number of doctors, hospitals, and beds in each city as well as rates (number of doctors or beds
per 100,000 residents). The full dataset for 83 metropolitan areas is in the file MetroHealth83.
⋄
Variables can be classified into two types: quantitative and categorical. A quantitative variable
records numbers about the observational units. It must be sensible to perform ordinary arithmetic
operations on these numbers, so zip codes and jersey numbers are not quantitative variables. A
categorical variable records a category designation about the observational units. If there are
only two possible categories, the variable is also said to be binary.
Example 0.1 (continued): The price, mileage, and age of a car are all quantitative variables.
The model of the car is a categorical variable.
⋄
Example 0.2 (continued): Whether or not a baby was assigned to a special exercise regimen is
a categorical variable. The age at which the baby first walked is a quantitative variable.
⋄
Example 0.4: Whether or not an applicant is accepted for medical school is a binary variable, as is the gender of
the applicant. The applicant’s undergraduate grade point average is a quantitative variable.
⋄
Another important consideration is the role played by each variable in the study. The variable that
measures the outcome of interest is called the response variable. The variables whose relationship
to the response is being studied are called explanatory variables. (When the primary goal of the
model is to make predictions, the explanatory variables are also called predictor variables.)
Example 0.1 (continued): The price of the car is the response variable. The mileage, age, and
model of the car are all explanatory variables. ⋄
Example 0.2 (continued): The age at which the baby first walked is the response variable.
Whether or not a baby was assigned to a special exercise regimen is an explanatory variable. ⋄
Example 0.4 (continued): Whether or not an applicant is accepted for medical school is the
response variable. The applicant’s undergraduate grade point average and sex are explanatory
variables. ⋄
One reason that these classifications are important is that the choice of the appropriate analy-
sis procedure depends on the type of variables in the study and their roles. Regression analysis
(covered in Chapters 1–4) is appropriate when the response variable is quantitative and the ex-
planatory variables are also quantitative. In Chapter 3, you will also learn how to incorporate
binary explanatory variables into a regression analysis. Analysis of variance (ANOVA, considered
in Chapters 5–8) is appropriate when the response variable is quantitative, but the explanatory
variables are categorical. When the response variable is categorical, logistic regression (considered
in Chapters 9–11) can be used with either quantitative or categorical explanatory variables. These
various scenarios are displayed in Table 0.1.
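The pairing of variable types with analysis procedures described above can be written as a small lookup. This is a simplified sketch, not a rule from the book: the function name `suggest_procedure` and the type labels are hypothetical, and treating a mix of quantitative and categorical predictors as regression is our reading of how Chapter 3 is described.

```python
def suggest_procedure(response_type, predictor_types):
    """Map variable types to the family of models described in the text.

    response_type: "quantitative" or "categorical"
    predictor_types: list with one "quantitative"/"categorical" entry per predictor
    """
    if not predictor_types:
        raise ValueError("at least one explanatory variable is required")
    if response_type == "categorical":
        return "logistic regression (Chapters 9-11)"
    if response_type != "quantitative":
        raise ValueError(f"unknown response type: {response_type!r}")
    if all(p == "categorical" for p in predictor_types):
        return "analysis of variance (Chapters 5-8)"
    # Quantitative predictors, possibly alongside categorical ones.
    return "regression (Chapters 1-4)"


# Car price from mileage, age, and model
print(suggest_procedure("quantitative", ["quantitative", "quantitative", "categorical"]))
# Age of first walking from exercise group
print(suggest_procedure("quantitative", ["categorical"]))
# Medical school acceptance from GPA and sex
print(suggest_procedure("categorical", ["quantitative", "categorical"]))
```

Real studies rarely reduce to a lookup this cleanly, since the same variable can often be measured in more than one way.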
Keep in mind that variables are not always clear-cut to measure or even classify. For example,
measuring headache relief is not a straightforward proposition and could be done with a quantita-
tive measurement (intensity of pain on a 0–10 scale), a categorical scale (much relief, some relief,
no relief), or as a binary categorical variable (relief or not).
We collect data and fit models in order to understand populations, such as all students who are
applying to medical school, and parameters, such as the acceptance rate of all students with a
grade point average of 3.5. The collected data are a sample, and a characteristic of a sample, such
as the percentage of students with grade point averages of 3.5 who were admitted to medical school,
out of those who applied, is a statistic. Thus, sample statistics are used to estimate population
parameters.
An article about handwriting appeared in the October 11, 2006, issue of The Washington Post. The
article mentioned that among students who took the essay portion of the SAT exam in 2005–2006,
those who wrote in cursive style scored significantly higher on the essay, on average, than students
who used printed block letters. This is an example of an observational study since there was no
controlled assignment of the type of writing for each essay. While it shows an association between
handwriting and essay scores, we can’t tell whether better writers tend to choose to write in cursive
or if graders tend to score cursive essays more generously and printed ones more harshly. We might
also suspect that students with higher GPAs are more likely to use cursive writing. To examine
this carefully, we could fit a model with GPA as a covariate.
The article also mentioned a different study in which the identical essay was shown to many graders,
but some graders were shown a cursive version of the essay and the other graders were shown a
version with printed block letters. Again, the average score assigned to the essay with the cursive
style was significantly higher than the average score assigned to the essay with the printed block
letters. This second study involved an experiment since the binary explanatory factor of interest
(cursive versus block letters) was controlled by the researchers. In that case, we can infer that
using cursive writing produces better essay scores, on average, than printing block letters.
⋄
0.2 Four-Step Process
Losing weight is an important goal for many individuals. An article² in the Journal of the American
Medical Association describes a study in which researchers investigated whether financial incentives
would help people lose weight more successfully. Some participants in the study were randomly
assigned to a treatment group that offered financial incentives for achieving weight loss goals, while
others were assigned to a control group that did not use financial incentives. All participants were
monitored over a four-month period and the net weight change (Before − After in pounds) was
recorded for each individual. Note that a positive value corresponds to a weight loss and a nega-
tive change is a weight gain. The data are given in Table 0.2 and stored in WeightLossIncentive4.
² K. Volpp, L. John, A.B. Troxel, L. Norton, J. Fassbender, and G. Lowenstein (2008), “Financial Incentive-based
Approaches for Weight Loss: A Randomized Trial,” JAMA, 300(22): 2631–2637.
Table 0.2
Control     12.5   12.0    1.0   −5.0    3.0   −5.0    7.5   −2.5   20.0   −1.0
             2.0    4.5   −2.0  −17.0   19.0   −2.0   12.0   10.5    5.0
Incentive   25.5   24.0    8.0   15.5   21.0    4.5   30.0    7.5   10.0   18.0
             5.0   −0.5   27.0    6.0   25.5   21.0   18.5
The response variable in this situation (weight change) is quantitative and the explanatory factor
of interest (control versus incentive) is categorical and binary. The subjects were assigned to the
groups at random so this is a statistical experiment. Thus, we may investigate whether there is a
statistically significant difference in the distribution of weight changes due to the use of a financial
incentive.
CHOOSE
When choosing a model, we generally consider the question of interest and types of variables
involved, then look at graphical displays, and compute summary statistics for the data. Since the
weight loss incentive study has a binary explanatory factor and quantitative response, we examine
dotplots of the weight losses for each of the two groups (Figure 0.2) and find the sample mean and
standard deviation for each group.
The dotplots show a pair of reasonably symmetric distributions with roughly the same variability,
although the mean weight loss for the incentive group is larger than the mean for the control
group. One model for these data would be for the weight losses to come from a pair of normal
distributions, with different means (and perhaps different standard deviations) for the two groups.
Let the parameter µ1 denote the mean weight loss after four months without a financial incentive,
and let µ2 be the mean with the incentive. If σ1 and σ2 are the respective standard deviations
and we let the variable Y denote the weight losses, we can summarize the model as Y ∼ N (µi , σi ),
where the subscript indicates the group membership³ and the symbol ∼ signifies that the variable
has a particular distribution. To see this in the DATA = MODEL + ERROR format, this model
could also be written as
Y = µi + ϵ
where µi is the population mean for the ith group and ϵ ∼ N (0, σi ) is the random error term. Since
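we only have two groups, this model says that
Y = µ1 + ϵ ∼ N(µ1, σ1) for individuals in the control group
Y = µ2 + ϵ ∼ N(µ2, σ2) for individuals in the incentive group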
FIT
To fit this model, we need to estimate four parameters (the means and standard deviations for
each of the two groups) using the data from the experiment. The observed means and standard
deviations from the two samples provide obvious estimates. We let ȳ1 = 3.92 estimate the mean
weight loss for the control group and ȳ2 = 15.68 estimate the mean for a population getting the
incentive. Similarly, s1 = 9.11 and s2 = 9.41 estimate the respective standard deviations. The
fitted model (a prediction for the typical weight loss in either group) can then be expressed as⁴
ŷ = ȳi
that is, that ŷ = 3.92 pounds for individuals without the incentive and ŷ = 15.68 pounds for those
with the incentive.
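The four estimates can be reproduced directly from the values in Table 0.2. The following is a minimal sketch using only Python's standard library; the variable names (`control`, `incentive`, `estimates`) and the helper `fitted_value` are our own choices for illustration, not part of the study or its data file.

```python
from statistics import mean, stdev

# Weight changes (Before - After, in pounds) copied from Table 0.2.
control = [12.5, 12.0, 1.0, -5.0, 3.0, -5.0, 7.5, -2.5, 20.0, -1.0,
           2.0, 4.5, -2.0, -17.0, 19.0, -2.0, 12.0, 10.5, 5.0]
incentive = [25.5, 24.0, 8.0, 15.5, 21.0, 4.5, 30.0, 7.5, 10.0, 18.0,
             5.0, -0.5, 27.0, 6.0, 25.5, 21.0, 18.5]

groups = {"control": control, "incentive": incentive}

# Estimate each group's mean and (sample) standard deviation.
estimates = {name: (mean(values), stdev(values)) for name, values in groups.items()}


def fitted_value(group):
    """Fitted model: predict the sample mean of the individual's group."""
    return estimates[group][0]


for name, (ybar, s) in estimates.items():
    print(f"{name}: n={len(groups[name])}, mean={ybar:.2f}, sd={s:.2f}")
```

Note that `stdev` divides by n − 1, which is the usual sample standard deviation.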
Note that the error term does not appear in the fitted model since, when predicting a particular
weight loss, we don’t know whether the random error will be positive or negative. That does not
mean that we expect there to be no error, just that the best guess for the average weight loss under
either condition is the sample group mean, ȳi.
ASSESS
Our model indicates that departures from the mean in each group (the random errors) should
follow a normal distribution with mean zero. To check this, we examine the sample residuals or
deviations between what is predicted by the model and the actual weight losses:
residual = observed − predicted = y − ŷ
For subjects in the control group, we subtract ŷ = 3.92 from each weight loss and we subtract
ŷ = 15.68 for the incentive group. Dotplots of the residuals for each group are shown in Figure 0.3.
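A short Python sketch of the residual calculation follows. The data are hypothetical (made up for illustration), and `residuals` is a helper name of our own. Because each group's fitted value is its own sample mean, the residuals within a group always sum to zero, up to rounding.

```python
from statistics import mean

def residuals(values):
    """Return each observation minus the group mean (the fitted value)."""
    y_hat = mean(values)
    return [y - y_hat for y in values]

# Hypothetical weight losses in pounds (NOT the data from the study).
control = [2.0, -1.5, 8.0, 12.5, 0.0, 5.5]
incentive = [18.0, 7.5, 25.0, 11.0, 20.5, 14.0]

control_resid = residuals(control)
incentive_resid = residuals(incentive)

# Both sets of residuals are centered at zero by construction.
print(round(sum(control_resid), 10), round(sum(incentive_resid), 10))
```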
³ For this example, an assumption that the variances are equal, σ1² = σ2², might be reasonable, but that would lead
to the less familiar pooled variance version of the t-test. We explore this situation in more detail in a later chapter.
⁴ We use the caret symbol (ˆ) above a variable name to indicate a predicted value, and refer to this as y-hat.
Note that the distributions of the residuals are the same as the original data, except that both
are shifted to have a mean of zero. We don’t see any significant departures from normality in the
dotplots, but it’s difficult to judge normality from dotplots with so few points. Normal probability
plots (as shown in Figure 0.4) are a more informative technique for assessing normality. Departures
from a linear trend in such plots indicate a lack of normality in the data. Normal probability plots
will be examined in more detail in the next chapter.
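To make the idea of a normal probability plot concrete, the following Python sketch computes the coordinates such a plot would display: sorted residuals paired with the standard normal quantiles expected at the same ranks. The plotting positions (i − 0.5)/n are one common convention and are an assumption here; statistical software may use a slightly different formula. The residuals are hypothetical, and `normal_plot_points` is a name of our own.

```python
from statistics import NormalDist

def normal_plot_points(residuals):
    """Pair each sorted residual with its expected standard normal quantile."""
    n = len(residuals)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # Assumed plotting positions: (i - 0.5) / n for ranks i = 1, ..., n.
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, sorted(residuals)))

# Hypothetical residuals (observation minus group mean), in pounds.
example_residuals = [2.0, -8.5, 9.0, -5.0, 4.5, -2.0]

points = normal_plot_points(example_residuals)
for quantile, resid in points:
    print(f"{quantile:6.2f}  {resid:6.2f}")
```

If the residuals are roughly normal, these pairs fall close to a straight line when plotted.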
H0 : µ1 = µ2
Ha : µ1 ≠ µ2
The null hypothesis (H0 ) corresponds to the simpler model Y = µ+ϵ, which uses the same mean for
both the control and incentive groups. The alternative (Ha ) reflects the model we have considered
here that allows each group to have a different mean. Would the simpler (common mean) model
suffice for the weight loss data or do the two separate group means provide a significantly better
explanation for the data? One way to judge this is with the usual two-sample t-test (as shown in
the computer output below).
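As a rough sketch of the calculation behind such a test, the Python function below computes the unpooled (Welch) two-sample t-statistic and its approximate degrees of freedom from summary statistics. We assume the unpooled form because the footnote on equal variances describes the pooled version as the less familiar one. The means and standard deviations are the estimates from the FIT step; the sample sizes n1 = n2 = 20 are placeholders chosen only for illustration and are not the study's actual group sizes. The name `welch_t` is our own.

```python
from math import sqrt

def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Unpooled two-sample t-statistic and Welch-Satterthwaite degrees of freedom."""
    v1 = sd1 ** 2 / n1  # estimated variance of the first sample mean
    v2 = sd2 ** 2 / n2  # estimated variance of the second sample mean
    t = (mean2 - mean1) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Means and SDs from the text; n1 = n2 = 20 are HYPOTHETICAL sample sizes.
t_stat, df = welch_t(3.92, 9.11, 20, 15.68, 9.41, 20)
print(f"t = {t_stat:.2f}, df = {df:.1f}")
```

A large t-statistic relative to the t-distribution with these degrees of freedom favors the two-mean model over the common-mean model.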
The old Mission Mill (below) has been reconstructed on its original
foundations.
Lookouts were provided above each of the four gates entering the
plaza at San Jose. Unfriendly Indians, however, seem to have
seldom bothered the mission.
Around three sides of the plaza are reproduced the living quarters
of the mission Indians.
MISSION SAN FRANCISCO de la ESPADA
The wrought iron cross atop this mission is said to have been made
on the premises by the founders.
A fortified tower has thirty-six-inch walls. Holes for cannon muzzles
were created near the base. Musket loopholes can be seen higher.
The Moorish entrance of Mission Espada. A wooden cross beside
the door is a reminder of the efficacy of prayer.
Nestled in a thick grove of tall hackberry and pecan trees stands
Mission San Juan Capistrano. Founded in 1731, this Mission is less
imposing than the others in the area. San Juan Capistrano followed
the plan typical of the other missions, with an enclosed area
containing all the buildings. Although in ruins, the original
boundaries and foundations can still be seen. Unlike other missions,
the main buildings formed part of the rampart walls.
Of the chapel interior of San Juan Capistrano, the outer walls, the
three wooden statues and a few odd items represent the original
mission.
Looking through the entrance gate into “La Villita”, a restored
settlement of the oldest remaining residential section of the city. It
was started about 1722 shortly after the establishment of the
presidio San Antonio de Bejar.
The houses in La Villita are built of rock and adobe. The residents
were mostly soldiers, many of whom had intermarried with the
Indians, and their families. A feeling of class distinction was created
in 1731 with the coming of the Canary Islanders, who considered
themselves of noble lineage. The Islanders established their own
settlement and refused to have any relations with those living in La
Villita.
High walls, giving protection as well as privacy, enclose a patio of
the Cos House. The house itself is of adobe with very thick walls.
This picturesque old adobe house on Dawson Street is but a few
hundred yards from the Alamo and is typical of hundreds of similar
early homes still to be seen. At the door of this home is a metate
stone, still used by many Mexicans to grind their corn for a masa
mixture used in making tortillas.
The arrangement and furnishing of the ten rooms in the Spanish
Governors’ Palace give a picture of home life in the better class
Spanish homes of the day. In such homes there was a private
chapel such as this room of the Blessed Virgin.
In the cocina or kitchen of the Spanish Governors’ Palace the stove
is typical of the Spanish kitchen in which charcoal, fanned to flame
by bellows, is used.
This comedor (dining room) in the Governors’ Palace was the scene
of many gay and festive affairs.
The garden of the Spanish Governors’ Palace, filled with subtropical
shrubbery and flowers, could have been no more beautiful in the
days when Spanish viceroys ruled within its walls. The pebble
mosaic walks form interesting patterns in the patio.
Moses Austin, born in Connecticut, lost in 1819 the fortunes he had
made in the South and West and two days before Christmas of the
following year arrived in San Antonio seeking permission from the
Spanish authorities to bring 300 families from the states to found a
colony. This bronze statue of Moses Austin, modeled by Waldine
Tauch, stands on the City Hall grounds facing the restored Spanish
Governors’ Palace, from whence came permission to establish his
colony.
One of the several boat landings along the San Antonio River. Many
of the buildings bordering the river have overhanging balconies and
a few street level business houses can be reached from river bank
entrances.
The Arneson River Theatre, a unique outdoor playhouse, can be
reached through this Villita Street entrance which adjoins the Cos
House, as well as from the river walks. Seen through the arch is a
portion of the stage.