The document provides an overview of data collection methods, including definitions of population, census, sample, and various sampling techniques such as simple random sampling and stratified sampling. It also discusses observational studies and experiments, highlighting the importance of random assignment and control in experimental design. Additionally, it addresses poor sampling methods, response bias, and ethical considerations in studies involving human participants.
★ Population = the entire group of individuals in a statistical study that we want information about
★ Census = collects data from EVERY individual in the population
★ Sample = the subset of the population that we actually collect data from
★ Sample Survey = a study that collects data from a sample to learn about the population
  ○ first step is to define the population, then decide what to measure
★ Random sampling = using a chance process to determine which members of the population are chosen for the sample
★ Observational study = observes individuals and measures variables of interest but DOES NOT attempt to influence the responses
  ○ Examples: sample surveys, recording the behavior of animals to see their food preferences, tracking lung cancer in smokers vs. nonsmokers over time
  ○ They can be retrospective (examining existing data for individuals) or prospective (tracking individuals into the future)
  ○ GOAL = describe a group/situation, compare groups, or examine relationships between variables
  ○ Inferences can be made to the general population only IF the sample was randomly selected (otherwise the results can only be generalized to the individuals in the sample)
  ○ CANNOT conclude cause and effect from observational studies, even if they use random sampling
★ Experiment = deliberately imposes treatments on experimental units to measure their responses
  ○ GOAL = determine whether a treatment causes a change in the response

3B Sampling and Surveys
★ Simple Random Sample (SRS) = a sample chosen so that every group of n individuals in the population (n = sample size) has an equal chance of being selected as the sample
  ○ EX: Mrs. Smith writes the name of each of her 30 students on a piece of paper, then randomly draws 5 out of a hat to call on
    ■ each group of 5 has an equal chance of being selected
  ○ This can also be done with technology
    ■ give each individual in the population a number from 1 to N (N = population size), use a random number generator to get n different numbers from 1 to N, then choose the individuals that correspond
  ○ It can also be done with a table of random digits
    ■ give each individual in the population a number with the same # of digits, read groups of the appropriate length from left to right across a line in the table (ignore spaces, groups that weren't used as labels, and duplicates), stop when you have n different labels, then choose the individuals that correspond
  ○ An SRS samples without replacement, meaning each individual can be selected only once, so repeated numbers should be ignored
    ■ on calc, press MATH, then PROB, then randInt (randInt can give the same number twice, so skip repeats)
★ Strata = groups of individuals in a population that share characteristics thought to be associated with the variables being measured in a study
★ Stratified random sample = a sample selected by choosing an SRS from each stratum and combining the SRSs into one overall sample
  ○ works best when the individuals within each stratum are very similar with respect to what is being measured
  ○ EX: In a study about sleep habits, the population is divided into 4 strata: 9th, 10th, 11th, and 12th graders. This works because 9th graders probably have different sleep habits than 12th graders, but students within each grade should be fairly similar.
  ○ How many individuals should be selected from each stratum?
    ■ keep the sample size for each stratum proportional to that stratum's share of the population
    ■ EX: if 20% of the high school's students are 12th graders and we want a stratified sample of 250, then we should sample (0.20)(250) = 50 12th graders
    ■ when strata are chosen wisely, the estimate can be much more precise than one from an SRS of the same size
★ Cluster = a group of individuals in the population that are located near each other
★ Cluster sample = a sample selected by randomly choosing clusters and including every member of the selected clusters in the sample
  ○ used when populations are large and spread out over a wide area
  ○ it means dividing the population into non-overlapping groups of individuals that are ‘near’ each other, then randomly selecting whole clusters to form the overall sample
  ○ saves time and money, but the clusters should be similar to each other in composition
    ■ EX: Administrators want to survey 100 students, but tracking down 100 different students would be difficult, so instead they select an SRS of 4 homerooms and give the survey to all 25 students in each selected homeroom
★ Systematic random sample = a sample selected from an ordered arrangement of the population by randomly selecting one of the first k individuals and then choosing every kth individual from there
  ○ useful for exit polls, where an SRS would be impossible: we would have to know in advance which voters will show up, number them all, and then identify them as they leave
  ○ select k by dividing the population size by the desired sample size, if possible
  ○ EX: A pollster is asked to poll every 20th voter, so they randomly select a number from 1 to 20.
    If the number is 6, they poll the 6th voter, then the 26th, then the 46th, etc.
  ○ HOWEVER, if there is a pattern in how the population is ordered that coincides with the pattern of the systematic sample, the sample may not be representative of the population
★ Multistage sampling = combines two or more sampling methods

Poor Sampling Methods
★ Convenience sample = consists of individuals from the population who are easy to reach (EX: going to the library to ask 30 students about their hw time)
  ○ produces unreliable results because the members of the sample often differ from the population in important ways; the example would likely overestimate hw time
  ○ shows bias, meaning the design of the study is very likely to overestimate (or very likely to underestimate) the value we want to know
★ Voluntary response sample = consists of people who choose to be in the sample by responding to a general invitation; also called a self-selected sample (EX: an advice columnist asked her readers whether they would have kids again if they had a redo, and 70% said no)
  ○ shows bias because the people who respond are likely to feel strongly about the issue; the example overestimates the share who would say no, since those who felt strongly (that they wouldn't want kids again) were the most likely to respond
★ Undercoverage = occurs when some members of the population are less likely to be chosen, or cannot be chosen, for the sample (EX: randomly dialing telephone numbers leaves out people who don't have telephones)
  ○ ideally, the sample is chosen from a list of all the individuals in the population (called the sampling frame)
★ Nonresponse = occurs when an individual chosen for the sample can't be contacted or refuses to participate (EX: some people are rarely home and don't answer the phone)
  ○ can only occur after the sample is selected
★ Response bias = occurs when there is a consistent pattern of inaccurate responses to a survey question (EX: oddly worded or ordered questions, an interviewer whose characteristics or behavior affect the responses, a non-anonymous survey)

3C Experiments
★ Response variable = measures the outcome of a study
★ Explanatory variable = may help explain or predict changes in the response variable
  ○ EX: in an observational study of whether vitamin D lowers the risk of diabetes, the response variable is diabetes status and the explanatory variable is vitamin D level
★ However, from such a study we cannot conclude that more vitamin D lowers the risk of diabetes
  ○ there are too many possible confounding variables
  ○ Confounding = occurs when two variables are associated in such a way that their effects on a response variable cannot be distinguished from each other
    ■ EX: it is possible that people with healthier diets eat more foods rich in vitamin D; diet is a confounding variable because it is related to both vitamin D level and diabetes status
★ An experiment was set up to determine whether vitamin D actually affects diabetes status
  ○ Treatment = a specific condition applied to the individuals in an experiment; if the experiment has several explanatory variables, a treatment is a combination of specific values of these variables
    ■ a dose of vitamin D, no dose of vitamin D
  ○ Experimental unit = the object to which a treatment is randomly assigned; when experimental units are human beings, they are also called subjects
    ■ 500 patients with prediabetes
  ○ Placebo = a treatment that has no active ingredients but is otherwise like the other treatments
    ■ a treatment given with no actual vitamin D
★ Placebo effect = the fact that some subjects in an experiment will respond favorably to any treatment, even an inactive one
  ○ because of this, it's important that subjects don't know which treatment they are receiving; sometimes it's also beneficial for the people who give the treatments and measure the responses to be unaware
    ■ Double-blind experiment: neither the subjects nor those who interact with them and measure the response variable know which treatment a subject is receiving
    ■ Single-blind experiment: either the subjects or the people who interact with them and measure the response (but not both) don't know which treatment a subject is receiving
  ○ The researchers avoided confounding by randomly assigning which participants got vitamin D and which didn't, so people with healthier diets should be split roughly evenly between the treatment groups
  ○ Factor = an explanatory variable that is manipulated and may cause a change in the response variable
    ■ there is one factor (explanatory variable): vitamin D dose
  ○ Levels = the different values of a factor
    ■ there are two levels: a 20,000 IU vitamin D dose and a 0 IU dose
  ○ Control group = used to provide a baseline for comparing the effects of other treatments; depending on the experiment, it may receive an inactive treatment, an active treatment, or no treatment at all
    ■ not every experiment needs a control group, as long as some comparison of treatments is built in
★ Basic principles of experimental design
  ○ Comparison: use a design that compares two or more treatments
  ○ Random assignment: use a chance process to assign treatments to experimental units, creating roughly equivalent groups before the treatments are imposed
    ■ vitamin D or no vitamin D was randomly assigned to the prediabetes patients
  ○ Replication: give each treatment to enough experimental units that the effects of the treatments can be distinguished from chance differences between the groups
    ■ does NOT mean repeating the whole experiment
    ■ EX: using 100 patients per treatment group instead of 1 patient
  ○ Control: keep other variables the same for all groups; this helps avoid confounding and reduces variation in the response variable, making it easier to decide whether a treatment is effective
★ Completely randomized design = the experimental units are assigned to the treatments completely at random
★ Randomized block design = forms groups (blocks) of experimental units that are similar with respect to a variable that is expected to affect the response.
  Treatments are assigned at random within each block; then the responses are compared within each block and combined across blocks, after accounting for the differences between blocks.
  ○ this makes it easier to determine whether one treatment is more effective than another
★ Matched pairs design = a common form of randomized block design for comparing two treatments
  ○ in some, each subject receives both treatments in a random order
  ○ in others, two very similar subjects are paired, and the two treatments are randomly assigned within each pair
★ Statistically significant = when the observed difference in responses between the groups in an experiment is so large that it is unlikely to be explained by chance variation in the random assignment
★ Scope of inference = describes the types of conclusions we can draw, based on how the data were collected
  ○ Inference about a population = requires that individuals are randomly selected from the population
  ○ Inference about cause and effect = requires a well-designed experiment with random assignment of treatments and statistically significant results
★ Data ethics: studies involving humans must be screened in advance by an institutional review board; all participants must give informed consent before taking part; and any information about individual participants must be kept confidential
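Going back to the SRS methods in 3B: the technology method (label individuals 1 to N, draw random numbers, skip repeats) can be sketched in Python. This is a minimal sketch, not a calculator emulation: `rng.randint(1, N)` stands in for randInt, and the function name `srs_labels` and the seed are hypothetical choices made for illustration. The usage line reuses the Mrs. Smith numbers (30 students, choose 5).

```python
import random

def srs_labels(N, n, rng):
    """Return n different labels from 1..N, drawn one at a time and skipping repeats."""
    if not 0 <= n <= N:
        raise ValueError("sample size n must be between 0 and N")
    chosen = []
    while len(chosen) < n:
        label = rng.randint(1, N)   # random whole number from 1 to N, both ends included
        if label not in chosen:     # an SRS samples without replacement, so ignore repeats
            chosen.append(label)
    return chosen

# Mrs. Smith's class: label the students 1-30 and call on 5 of them.
rng = random.Random(2024)           # seeded only so the run can be repeated
print(srs_labels(30, 5, rng))       # five different labels between 1 and 30
```

Python's `random.sample(range(1, N + 1), n)` returns n distinct labels in a single call; the loop above just mirrors the by-hand "skip repeats" rule.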
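The table-of-random-digits method for an SRS can be sketched the same way. Assumptions: the function name `srs_from_digit_table` is hypothetical, and the digit line is made up for this example (it is not from any published table). With N = 30, every label has two digits (01 to 30), so the line is read in pairs: 07, 28, 19, 07 (duplicate, skipped), 43 (not a label, skipped), 22, 85 (skipped), 30.

```python
def srs_from_digit_table(digits, N, n):
    """Choose up to n different labels from 1..N by reading a line of random digits left to right."""
    width = len(str(N))                                    # every label uses the same number of digits
    stream = "".join(ch for ch in digits if ch.isdigit())  # ignore the spaces between digit groups
    chosen = []
    for start in range(0, len(stream) - width + 1, width):
        label = int(stream[start:start + width])
        if 1 <= label <= N and label not in chosen:        # skip unused labels and duplicates
            chosen.append(label)
            if len(chosen) == n:                           # stop once n different labels are found
                break
    return chosen  # fewer than n labels means the line ran out; keep reading the next line

# Hypothetical digits, made up for this example.
line = "07281 90743 22853 06615"
print(srs_from_digit_table(line, 30, 5))   # → [7, 28, 19, 22, 30]
```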
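The proportional-allocation rule for a stratified random sample, (stratum share) × (sample size), followed by an SRS inside each stratum, can be sketched as below. The school is hypothetical: 1000 students with 200 (20%) 12th graders, chosen so that a sample of 250 reproduces the (0.20)(250) = 50 figure from the notes. The grade counts and function names are assumptions for illustration.

```python
import random

def proportional_allocation(stratum_sizes, n):
    """Sample size for each stratum, proportional to the stratum's share of the population."""
    N = sum(stratum_sizes.values())
    # multiply before dividing; rounding can make the total differ from n by a little
    return {name: round(size * n / N) for name, size in stratum_sizes.items()}

def stratified_sample(strata, n, rng):
    """Take an SRS of the allocated size from each stratum and combine them."""
    allocation = proportional_allocation({name: len(m) for name, m in strata.items()}, n)
    sample = []
    for name, members in strata.items():
        sample.extend(rng.sample(members, allocation[name]))  # SRS within this stratum
    return allocation, sample

# Hypothetical school of 1000 students; 200 of them (20%) are 12th graders.
sizes = {"9th": 300, "10th": 260, "11th": 240, "12th": 200}
strata = {grade: [f"{grade}-{i}" for i in range(size)] for grade, size in sizes.items()}
allocation, sample = stratified_sample(strata, 250, random.Random(42))
print(allocation)    # → {'9th': 75, '10th': 65, '11th': 60, '12th': 50}
print(len(sample))   # → 250
```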
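The homeroom example of a cluster sample (an SRS of 4 homerooms, then everyone in each chosen homeroom) can be sketched as follows. The notes don't say how many homerooms the school has, so the 40 homerooms of 25 students below are a hypothetical setup, as are the names `cluster_sample` and the homeroom labels.

```python
import random

def cluster_sample(clusters, num_clusters, rng):
    """Take an SRS of whole clusters, then include every member of each chosen cluster."""
    chosen = rng.sample(sorted(clusters), num_clusters)  # SRS of cluster names
    sample = [member for name in chosen for member in clusters[name]]
    return chosen, sample

# Hypothetical school: 40 homerooms with 25 students each.
homerooms = {f"HR{h:02d}": [f"HR{h:02d}-s{s:02d}" for s in range(1, 26)] for h in range(1, 41)}
chosen, surveyed = cluster_sample(homerooms, 4, random.Random(7))
print(chosen)          # 4 randomly chosen homerooms
print(len(surveyed))   # → 100 (4 homerooms × 25 students)
```

The randomness is only in which clusters get picked; once a homeroom is chosen, nobody inside it is left out, which is what separates this from a stratified sample.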
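The systematic random sample (random start among the first k, then every kth individual, with k = population size / sample size) can be sketched like this. The function names are hypothetical, and the usage line assumes 100 voters purely for illustration; it replays the pollster example with k = 20 and a random start of 6.

```python
import random

def systematic_labels(start, k, N):
    """Every kth label from 1..N, beginning at `start` (which should be between 1 and k)."""
    return list(range(start, N + 1, k))

def systematic_sample(N, n, rng):
    """Systematic random sample of n labels from 1..N (assumes 1 <= n <= N)."""
    k = N // n                                  # k = population size / sample size, rounded down
    start = rng.randint(1, k)                   # randomly select one of the first k individuals
    return systematic_labels(start, k, N)[:n]   # keep n labels if N is not a multiple of n

# Pollster example: k = 20 and the random start happened to be 6 (100 voters assumed).
print(systematic_labels(6, 20, 100))   # → [6, 26, 46, 66, 86]
```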
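Random assignment in a completely randomized design, as in the vitamin D experiment, can be sketched as "shuffle everyone, then deal them out to the treatments." The notes don't say how the researchers actually randomized or how large each group was, so the patient labels, the equal 250/250 split, and the function name `completely_randomized` are all assumptions.

```python
import random

def completely_randomized(units, treatments, rng):
    """Shuffle all units with a chance process, then deal them out to the treatments in turn."""
    shuffled = list(units)
    rng.shuffle(shuffled)
    # dealing in turn keeps group sizes within 1 of each other, even if they don't divide evenly
    return {t: shuffled[i::len(treatments)] for i, t in enumerate(treatments)}

patients = [f"patient{p:03d}" for p in range(1, 501)]   # hypothetical labels for the 500 patients
groups = completely_randomized(patients, ["vitamin D", "placebo"], random.Random(11))
print({t: len(g) for t, g in groups.items()})   # → {'vitamin D': 250, 'placebo': 250}
```

Because a chance process alone decides each patient's group, variables like diet should end up roughly balanced between the groups, which is the point of the random assignment principle.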
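A randomized block design runs a separate random assignment inside each block, and a matched pairs design is the special case where each block is a pair of similar subjects (or a single subject who gets both treatments in a random order). A minimal sketch under those definitions; the subjects, pairs, and function names below are hypothetical.

```python
import random

def randomized_block(blocks, treatments, rng):
    """Carry out a separate random assignment inside each block."""
    plan = {}
    for block, units in blocks.items():
        shuffled = list(units)
        rng.shuffle(shuffled)   # randomize only within this block
        plan[block] = {t: shuffled[i::len(treatments)] for i, t in enumerate(treatments)}
    return plan

def matched_pairs_order(subjects, treatments, rng):
    """Matched pairs where each subject gets both treatments: randomize the order per subject."""
    return {s: rng.sample(treatments, len(treatments)) for s in subjects}

# Hypothetical: four subjects paired by similarity, so each pair is a block of size 2.
pairs = {"pair 1": ["Ana", "Ben"], "pair 2": ["Cal", "Dee"]}
print(randomized_block(pairs, ["A", "B"], random.Random(5)))
# Hypothetical: two subjects who each receive both treatments.
print(matched_pairs_order(["Eve", "Finn"], ["A", "B"], random.Random(5)))
```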