
Unit 3 Statistics Notes

The document provides an overview of data collection methods, including definitions of population, census, sample, and various sampling techniques such as simple random sampling and stratified sampling. It also discusses observational studies and experiments, highlighting the importance of random assignment and control in experimental design. Additionally, it addresses poor sampling methods, response bias, and ethical considerations in studies involving human participants.

Uploaded by

megan.s.aversa

3A Introduction to Data Collection

★ Population = a statistical study’s entire group of individuals that we want
information about
★ Census = collects data from EVERY individual
★ Sample = subset that we collect data from
★ Sample Survey = study that collects data from a sample to learn abt the population
○ first step is to pick population, then decide what to measure
★ Random sampling = using a chance process to determine which members of the
population are chosen for the sample
★ Observational study = observes individuals and measures variables of interest,
DOES NOT attempt to influence responses
○ Examples: sample surveys, recording behavior of animals to see food
preferences, tracking lung cancer for smokers vs nonsmokers over time
○ They can be retrospective (examining existing data for individuals) or
prospective (track individuals into the future)
○ GOAL = describe group/situation, compare groups, examine relationships
between variables
○ Inferences can only be made to the general population IF the sample was
randomly selected (otherwise it can only be generalized to those in sample)
○ CANNOT conclude cause and effect from observational studies, even if they
have random sampling
★ Experiment = deliberately imposes treatments on experimental units to measure
responses
○ GOAL = determine whether treatment causes change in response
3B Sampling and Surveys
★ Simple Random Sample (SRS) = a sample of size n chosen in such a way that every
group of n individuals in the population has an equal chance of being selected as
the sample
○ EX: Mrs. Smith writes the name of each of the 30 students in her class on a slip
of paper, and then randomly selects 5 of them out of a hat to call on
■ each group of 5 has an equal chance of being selected
○ This can also be done with technology
■ give each individual in the population a number from 1 to N (N being
the population size), use a random # generator to get n different
numbers from 1 to N, then choose the individuals that correspond
○ Can also be done with a table of random digits
■ give each individual in the population a label with the same # of
digits, read groups of that many digits from left to right across a
line in the table (ignore spaces, groups that are not used as labels,
and duplicates), stop when you have n different labels, then choose
the individuals that correspond
○ SRS samples without replacement, meaning the individual can only be
selected once, so repeated numbers should be ignored
■ on calc, press math, then prob, then randInt
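Both ways of choosing an SRS (random number generator and table of digits) can be sketched in Python. This is only an illustration: the function names are made up, and the line of digits is a hypothetical stand-in for a real table line.

```python
import random

def srs_labels(population_size, sample_size, seed=None):
    """Pick sample_size different labels from 1..population_size."""
    rng = random.Random(seed)
    # random.sample draws without replacement, so no label can repeat
    return sorted(rng.sample(range(1, population_size + 1), sample_size))

def srs_from_digit_line(digits, population_size, sample_size):
    """Read same-length labels left to right from one line of random digits."""
    width = len(str(population_size))   # every label has the same # of digits
    digits = digits.replace(" ", "")    # ignore spaces
    chosen = []
    for i in range(0, len(digits) - width + 1, width):
        label = int(digits[i:i + width])
        # skip groups that are not labels (00 or above N) and skip duplicates
        if 1 <= label <= population_size and label not in chosen:
            chosen.append(label)
        if len(chosen) == sample_size:
            break                       # stop once n different labels are found
    return chosen

# Class of 30 students labeled 01-30, choose 5 (hypothetical digit line)
line = "48205 13097 62719 05283 27140 91166"
table_sample = srs_from_digit_line(line, population_size=30, sample_size=5)
print(table_sample)  # [20, 30, 27, 14, 9]
print(srs_labels(30, 5, seed=1))
```

If the line of digits runs out before n labels are found, the function returns fewer than n labels; with a real table you would continue on the next line.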
★ Strata = groups of individuals in a population that share characteristics thought to
be associated with the variables being measured in a study
★ Stratified random sample = a sample selected by choosing an SRS from each stratum
and combining the SRSs into the overall sample
○ works best when the individuals within
each stratum are very similar with respect
to what is measured
○ EX. In a study about sleep habits, the
population is divided into 4 strata: 9th,
10th, 11th, and 12th graders. It works
because 9th graders will have different
sleep habits than 12th graders but they
should be fairly similar within each
grade.
○ How many individuals should be selected from each stratum? Keep each stratum’s
share of the sample proportional to its share of the population
■ EX: if 20% of the high school students are 12th graders, and we want a
stratified sample of 250, then we should have (0.20)(250)= 50 12th
graders
■ when strata are chosen wisely, the estimate tends to be more precise than
one from a simple random sample of the same size
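The proportional rule can be sketched in Python. The grade sizes below are hypothetical (chosen so that 12th graders are 20% of the school, as in the example); only the 250-student sample size and the 20% share come from the notes.

```python
import random

def proportional_allocation(stratum_sizes, total_sample):
    """How many to sample from each stratum, proportional to its size."""
    population = sum(stratum_sizes.values())
    # Rounding can make the counts miss total_sample by one or two when the
    # shares do not divide evenly; the numbers below divide evenly.
    return {name: round(total_sample * size / population)
            for name, size in stratum_sizes.items()}

def stratified_sample(strata, total_sample, seed=None):
    """Take an SRS inside each stratum and combine them."""
    rng = random.Random(seed)
    sizes = {name: len(members) for name, members in strata.items()}
    counts = proportional_allocation(sizes, total_sample)
    sample = []
    for name, members in strata.items():
        sample.extend(rng.sample(members, counts[name]))
    return sample

# Hypothetical enrollment: 1000 students, 20% of them 12th graders
grade_sizes = {"9th": 320, "10th": 280, "11th": 200, "12th": 200}
counts = proportional_allocation(grade_sizes, 250)
print(counts)  # {'9th': 80, '10th': 70, '11th': 50, '12th': 50}

strata = {g: [f"{g}-{i}" for i in range(n)] for g, n in grade_sizes.items()}
sample = stratified_sample(strata, 250, seed=3)
print(len(sample))  # 250
```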
★ Cluster = group of individuals in the population that are located near each other
★ Cluster sample = a sample selected by randomly choosing clusters and including
each member of the selected clusters in the sample
○ used when populations are large and spread out over a wide area
○ it means dividing the population into non overlapping groups of individuals
that are ‘near’ each other then randomly
selecting whole clusters to form the overall
sample
○ saves time and money, but clusters should
be similar to each other in composition
■ EX: Administrators want to survey
100 students, but it would be
difficult to track down 100 different
students, so instead they select an
SRS of 4 homerooms and give the survey to all 25 students in each
selected homeroom
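The homeroom example can be sketched in Python. The school layout (40 homerooms of 25 students each) and all names are assumptions for illustration; the notes only give the 4 homerooms and 25 students per homeroom.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose whole clusters and keep every member of each one."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # SRS of cluster names
    members = [m for name in chosen for m in clusters[name]]
    return chosen, members

# Hypothetical school: 40 homerooms, 25 students in each
homerooms = {
    f"HR{h:02d}": [f"HR{h:02d}-student{s:02d}" for s in range(1, 26)]
    for h in range(1, 41)
}
chosen_rooms, surveyed = cluster_sample(homerooms, n_clusters=4, seed=7)
print(chosen_rooms)
print(len(surveyed))  # 100
```

Note the contrast with a stratified sample: here the chance process picks the groups, and then everyone in a chosen group is included.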
★ Systematic Random Sample = a sample selected from an ordered arrangement of
the population by randomly selecting one of the first k individuals and choosing
every kth individual from there
○ might be helpful in exit polls, where an SRS would be impractical because we
would have to know in advance which voters will show up, number them all,
and then identify them as they leave
○ select k by dividing the population size by the desired sample size, if possible
○ EX: A pollster is asked to poll every 20th voter. They randomly select a number
from 1 to 20. If the number is 6, they poll the 6th voter, then the 26th, then
the 46th, etc
○ HOWEVER, if there are patterns in the way the population is ordered that
coincide with the pattern in a systematic random sample, the sample may not
be representative of the population
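A systematic random sample can be sketched in Python. The list of 100 voters is hypothetical; the k = 20 and starting position 6 match the exit poll example.

```python
import random

def choose_k(population_size, sample_size):
    """k = population size divided by desired sample size (whole number part)."""
    return population_size // sample_size

def systematic_sample(ordered_population, k, start=None, seed=None):
    """Randomly pick one of the first k individuals, then take every kth one."""
    if start is None:
        start = random.Random(seed).randint(1, k)  # position 1..k, inclusive
    # positions start, start + k, start + 2k, ... (list indexes begin at 0)
    return ordered_population[start - 1::k]

voters = list(range(1, 101))      # hypothetical: voters numbered in exit order
k = choose_k(len(voters), 5)      # 100 / 5 = 20
print(k)                          # 20
print(systematic_sample(voters, k, start=6))  # [6, 26, 46, 66, 86]
```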
★ Multistage sampling combines two or more sampling methods
Poor Sampling Methods
★ Convenience Sample = consists of individuals from a population who are easy to
reach (EX: going to the library to ask 30 students abt their hw time)
○ produces unreliable results bc the members of the sample often differ from
the population; the example would overestimate hw times
○ shows bias, meaning that the design of the study is very likely to
overestimate or underestimate the desired value
★ Voluntary Response Sample = consists of people who choose to be in the sample
by responding to a general invitation, sometimes called self-selected samples (EX:
advice columnist asked her followers if they’d want to have kids again if they had a
redo, and 70% said no)
○ shows bias bc people who respond are likely to feel strongly about it; the
example overestimates those who said no bc only those who felt strongly abt
it (that they wouldn’t want kids again) responded
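A bit of arithmetic shows how self-selection can inflate a result. Every number below is a hypothetical assumption (none of them come from the columnist's survey): suppose 30% of the population would truly say no, 10% of those people write in, and only 2% of everyone else responds.

```python
def share_of_no_among_responders(true_no, respond_if_no, respond_if_yes):
    """Expected share of 'no' answers among the people who choose to respond."""
    no_responders = true_no * respond_if_no
    yes_responders = (1 - true_no) * respond_if_yes
    return no_responders / (no_responders + yes_responders)

# Hypothetical rates: people who feel strongly respond 5 times as often
biased_share = share_of_no_among_responders(0.30, 0.10, 0.02)
print(round(biased_share, 2))  # 0.68

# If both groups responded at the same rate, there would be no distortion
fair_share = share_of_no_among_responders(0.30, 0.05, 0.05)
print(round(fair_share, 2))    # 0.3
```

Under these made-up rates the voluntary response sample reports about 68% no even though the true value is 30%, which is the kind of overestimate that bias describes.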
★ Undercoverage = occurs when some members of the population are less likely to be
chosen or cannot be chosen in the sample (EX: randomly calling telephone numbers
doesn’t include those who do not have telephones in the sample)
○ ideally, sample is chosen from a list of all the individuals in the population
(called the sampling frame)
★ Nonresponse = occurs when an individual chosen for the sample can’t be contacted
or refuses to participate (EX: some people are rarely at home and do not pick up the
phone)
○ can only occur after the sample is selected
★ Response Bias = occurs when there is a consistent pattern of inaccurate responses
to a survey question (EX: questions ordered/worded weirdly, characteristics or
behavior of interviewer affects responses, non-anonymous survey)
3C Experiments
★ Response variable = measures outcome of study
★ Explanatory variable = may help explain or predict changes in response variable
○ EX: in a study of whether vitamin D lowers the risk of diabetes, the response
is diabetes status and the explanatory is vitamin D level
★ However, from an observational study alone we cannot say that more vitamin D
lowers the risk of diabetes
○ there are too many possible confounding variables
○ Confounding = occurs when two variables are associated in such a way that
their effects on a response variable cannot be distinguished from each
other
■ EX: it is possible that those with healthier diets eat foods rich in
vitamin D, and that diet is a confounding variable because it is related
to both vitamin D consumption and diabetes status
★ An experiment was set up to determine whether vitamin D actually impacted
diabetes status
○ Treatment = specific condition applied to individuals in an experiment; if the
experiment has several explanatory variables, the treatment is a combination
of specific values of these variables
■ dose of vitamin D, no dose of vitamin D
○ Experimental Unit = the object to which a treatment is randomly assigned;
when experimental units are human beings, they are also called subjects
■ 500 patients with pre-diabetes
○ Placebo = a treatment that has no active ingredients but is otherwise like the
other treatments
■ treatment with no actual vitamin D given
★ placebo effect = describes the fact that some subjects in an experiment will respond
favorably to any treatment, even inactive treatment
○ because of this, it’s important that the subjects don’t know which treatment
they have; sometimes it’s beneficial that the experiment givers are also
unaware
■ double-blind experiment: neither the subjects nor those who interact
with them and measure the response variable know which treatment
a subject is receiving
■ single-blind experiment: either the subjects or the people who interact
with them and measure the response don’t know which treatment a
subject is receiving
○ The researchers avoided confounding by randomly assigning which
participants got the vitamin D and which didn’t, so people with healthier
diets were split roughly evenly between treatment groups
○ Factor = an explanatory variable that is manipulated and may cause change
in the response variable
■ there is one factor (explanatory variable) = vitamin D level
○ Levels = different values of a factor
■ there are two levels = 20000 mg vitamin D dose, 0 mg vitamin D dose
○ Control group = used to provide a baseline for comparing the effects of other
treatments; depending on the experiment, it can be the inactive treatment,
the active treatment, or no treatment at all
■ not all experiments actually require a control group, as long as there is
comparison in place
★ Basic principles of experimental design
○ Comparison: use a design that compares two or more treatments
○ Random Assignment: use a chance process to assign treatments to
experimental units to create roughly equivalent groups before treatments are
imposed
■ vitamin D or no vitamin D was randomly assigned to pre-diabetes
patients
○ Replication: use each treatment with enough experimental units so that the
effects of the treatments can be distinguished from chance differences
between the groups
■ does NOT mean replicating the experiment
■ EX: using 100 patients per treatment group instead of 1 patient
○ Control: keep other variables the same for all groups; it helps avoid
confounding and reduces the variation in the response variable, making it
easier to decide if the treatment is effective
★ Completely Randomized Design = the experimental units are assigned to the
treatments at random
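A completely randomized design for the vitamin D example can be sketched in Python. The patient ID numbers and the equal 250/250 split are assumptions for illustration; the notes only say that 500 patients were randomly assigned.

```python
import random

def completely_randomized(units, treatments, seed=None):
    """Shuffle all units, then deal them out to the treatments in turn."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)                 # the chance process
    groups = {t: [] for t in treatments}
    for i, unit in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(unit)
    return groups

patients = list(range(1, 501))            # hypothetical ID numbers 1-500
groups = completely_randomized(patients, ["vitamin D", "placebo"], seed=11)
print({t: len(members) for t, members in groups.items()})
# {'vitamin D': 250, 'placebo': 250}
```

Because chance alone decides who lands in which group, variables such as diet should be spread roughly evenly across the two groups.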
★ Randomized Block Design = forms groups (blocks) of experimental units that are
similar with respect to a variable that is expected to affect the response. Treatments
are assigned at random within each block, then the responses are compared within
each block and combined with the responses of other blocks after accounting for the
differences between the blocks.
○ it is easier to determine if one treatment is more effective than another this
way
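A randomized block design can be sketched in Python. The blocking variable (age group) and the 100 patients are hypothetical; the notes do not say what variable to block on.

```python
import random
from collections import defaultdict

def randomized_block(units, block_of, treatments, seed=None):
    """Form blocks, then randomly assign treatments separately in each block."""
    rng = random.Random(seed)
    blocks = defaultdict(list)
    for unit in units:
        blocks[block_of(unit)].append(unit)
    assignment = {}
    for block_name in sorted(blocks):
        members = list(blocks[block_name])
        rng.shuffle(members)              # chance process inside the block
        for i, unit in enumerate(members):
            assignment[unit] = treatments[i % len(treatments)]
    return assignment

# Hypothetical units: (patient ID, age group)
units = [(i, "under 50" if i <= 40 else "50 and over") for i in range(1, 101)]
assignment = randomized_block(units, block_of=lambda u: u[1],
                              treatments=["vitamin D", "placebo"], seed=5)

for block in ("under 50", "50 and over"):
    got_d = sum(1 for u, t in assignment.items()
                if u[1] == block and t == "vitamin D")
    print(block, got_d)
# under 50 20
# 50 and over 30
```

Each block ends up with half of its members on each treatment, so the blocking variable cannot be confounded with the treatment.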
★ Matched Pairs Design = a common form of randomized block design for comparing
two treatments; in some matched pairs designs, each subject receives both
treatments in a random order
○ in others, two very similar subjects are paired, and the two treatments are
randomly assigned within each pair
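The "random order" version of matched pairs can be sketched in Python. The subject names and the treatment labels A and B are placeholders, not from the notes.

```python
import random

def matched_pairs_orders(subjects, treatments=("A", "B"), seed=None):
    """Give every subject both treatments, in an order chosen by chance."""
    rng = random.Random(seed)
    orders = {}
    for subject in subjects:
        order = list(treatments)
        rng.shuffle(order)                # like a coin flip: A then B, or B then A
        orders[subject] = tuple(order)
    return orders

subjects = [f"subject{i}" for i in range(1, 9)]   # hypothetical subjects
orders = matched_pairs_orders(subjects, seed=4)
for subject, order in orders.items():
    print(subject, order)
```

For the paired-subjects version, the same shuffle would decide which member of each pair gets which treatment.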
★ Statistically Significant = when the observed difference in responses between the
groups in an experiment is so large that it is unlikely to be explained by chance
variation in the random assignment
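One way to check whether a difference is "unlikely to be explained by chance" is to redo the random assignment many times on the computer. This is a sketch of that idea only: the response values are invented, and the notes do not say which method a course would use.

```python
import random

def randomization_p_value(group_a, group_b, reps=10_000, seed=0):
    """Share of re-shuffled assignments whose difference in means is at
    least as large as the observed one."""
    rng = random.Random(seed)
    observed = sum(group_a) / len(group_a) - sum(group_b) / len(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(reps):
        rng.shuffle(pooled)               # pretend the treatment labels are arbitrary
        diff = (sum(pooled[:n_a]) / n_a
                - sum(pooled[n_a:]) / (len(pooled) - n_a))
        if abs(diff) >= abs(observed):
            extreme += 1
    return observed, extreme / reps

# Invented responses for two treatment groups of 6 subjects each
treatment = [10, 11, 12, 13, 14, 15]
control = [1, 2, 3, 4, 5, 6]
observed, p_value = randomization_p_value(treatment, control)
print(observed)   # 9.0
print(p_value)    # small: very few shuffles give a difference this large
```

A small share means chance variation in the random assignment rarely produces a difference as big as the one observed, which is what statistically significant describes.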
★ The scope of inference = describes the types of conclusions we can make based on
how data is collected
○ Inference about a population = requires that individuals are randomly
selected from the population
○ Inference about cause and effect = requires a well-designed experiment
with random assignment of treatments and statistically significant results
★ Data Ethics: studies involving humans must be screened in advance by an
institutional review board. All participants must give informed consent before
taking part; any info about the participants must be confidential
