Probability Sampling: Some Definitions
Some Definitions
Before I can explain the various probability sampling methods, we have
to define some basic terms. These are:
That's it. With those terms defined we can begin to define the different
probability sampling methods.
There are several major reasons why you might prefer stratified
sampling over simple random sampling. First, it assures that you will be
able to represent not only the overall population, but also key subgroups
of the population, especially small minority groups. If you want to be
able to talk about subgroups, this may be the only way to effectively
assure you'll be able to. If the subgroup is extremely small, you can use
different sampling fractions (f) within the different strata to randomly
over-sample the small group (although you'll then have to weight the
within-group estimates using the sampling fraction whenever you want
overall population estimates). When we use the same sampling fraction
within strata we are conducting proportionate stratified random
sampling. When we use different sampling fractions in the strata, we call
this disproportionate stratified random sampling. Second, stratified
random sampling will generally have more statistical precision than
simple random sampling. This will only be true if the strata or groups
are homogeneous. If they are, we expect that the variability within-
groups is lower than the variability for the population as a whole.
Stratified sampling capitalizes on that fact.
For example, let's say that the population of clients for our agency can
be divided into three groups: Caucasian, African-American and
Hispanic-American. Furthermore, let's assume that both the
African-Americans and Hispanic-Americans
are relatively small minorities of the clientele (10% and 5%
respectively). If we just did a simple random sample of n=100 with a
sampling fraction of 10%, we would expect to get only about 10 and 5
persons, respectively, from our two smaller groups.
And, by chance, we could get fewer than that! If we stratify, we can do
better. First, let's determine how many people we want to have in each
group. Let's say we still want to take a sample of 100 from the
population of 1000 clients over the past year. But we think that in order
to say anything about subgroups we will need at least 25 cases in each
group. So, let's sample 50 Caucasians, 25 African-Americans, and 25
Hispanic-Americans. We know that 10% of the population, or 100
clients, are African-American. If we randomly sample 25 of these, we
have a within-stratum sampling fraction of 25/100 = 25%. Similarly, we
know that 5% or 50 clients are Hispanic-American. So our within-
stratum sampling fraction will be 25/50 = 50%. Finally, by subtraction
we know that there are 850 Caucasian clients. Our within-stratum
sampling fraction for them is 50/850 = about 5.88%. Because the groups
are more homogeneous within-group than across the population as a
whole, we can expect greater statistical precision (less variance). And,
because we stratified, we know we will have enough cases from each
group to make meaningful subgroup inferences.
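The arithmetic in this example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the client identifiers, the function names stratified_sample and weighted_overall_mean, and the three stratum means are hypothetical and do not come from any real agency data. The overall estimate weights each stratum mean by its share of the population, which is one common way to undo the effect of over-sampling the small groups.

```python
import random

def stratified_sample(strata, sizes, seed=0):
    """Draw a simple random sample of sizes[name] units from each stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(units, sizes[name]) for name, units in strata.items()}

def weighted_overall_mean(stratum_means, stratum_sizes):
    """Combine stratum means, weighting each by its population share N_h / N."""
    total = sum(stratum_sizes.values())
    return sum(stratum_means[name] * stratum_sizes[name] / total
               for name in stratum_means)

# Hypothetical sampling frame: 1000 clients split 850 / 100 / 50.
strata = {
    "Caucasian": [f"C{i}" for i in range(850)],
    "African-American": [f"A{i}" for i in range(100)],
    "Hispanic-American": [f"H{i}" for i in range(50)],
}
# Disproportionate allocation: at least 25 cases in every group.
sizes = {"Caucasian": 50, "African-American": 25, "Hispanic-American": 25}

sample = stratified_sample(strata, sizes)

# Within-stratum sampling fraction f = (number sampled) / (stratum size).
fractions = {name: sizes[name] / len(units) for name, units in strata.items()}
for name, f in fractions.items():
    print(f"{name}: n={sizes[name]}, f={f:.2%}")

# Hypothetical stratum means (say, a satisfaction score), for illustration only.
stratum_means = {"Caucasian": 4.0, "African-American": 3.0, "Hispanic-American": 2.0}
stratum_sizes = {name: len(units) for name, units in strata.items()}
overall = weighted_overall_mean(stratum_means, stratum_sizes)

# The unweighted mean of the 100 sampled cases over-represents the small groups.
unweighted = sum(stratum_means[n] * sizes[n] for n in sizes) / sum(sizes.values())
print(f"weighted overall estimate: {overall:.2f}, unweighted: {unweighted:.2f}")
```

With these made-up means the weighted and unweighted figures differ noticeably, which is why the weighting step matters whenever the sampling fractions differ across strata.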
All of this will be much clearer with an example. Let's assume that we
have a population that
only has N=100 people in it and that you want to take a sample of n=20.
To use systematic sampling, the population must be listed in a random
order. The sampling fraction would be f = 20/100 = 20%. In this case,
the interval size, k, is equal to N/n = 100/20 = 5. Now, select a random
integer from 1 to 5. In our example, imagine that you chose 4. Now, to
select the sample, start with the 4th unit in the list and take every k-th
unit (every 5th, because k=5). You would be sampling units 4, 9, 14, 19,
and so on to 99, and you would wind up with 20 units in your sample.
For this to work, it is essential that the units in the population are
randomly ordered, at least with respect to the characteristics you are
measuring.
Why would you ever want to use systematic random sampling? For one
thing, it is fairly easy to do. You only have to select a
single random number to start things off. It may also be more precise
than simple random sampling. Finally, in some situations there is simply
no easier way to do random sampling. For instance, I once had to do a
study that involved sampling from all the books in a library. Once
selected, I would have to go to the shelf, locate the book, and record
when it last circulated. I knew that I had a fairly good sampling frame in
the form of the shelf list (which is a card catalog where the entries are
arranged in the order they occur on the shelf). To do a simple random
sample, I could have estimated the total number of books and generated
random numbers to draw the sample; but how would I find book
#74,329 easily if that is the number I selected? I couldn't very well count
the cards until I came to 74,329! Stratifying wouldn't solve that problem
either. For instance, I could have stratified by card catalog drawer and
drawn a simple random sample within each drawer. But I'd still be stuck
counting cards. Instead, I did a systematic random sample. I estimated
the number of books in the entire collection. Let's imagine it was
100,000. I decided that I wanted to take a sample of 1000 for a sampling
fraction of 1000/100,000 = 1%. To get the sampling interval k, I divided
N/n = 100,000/1000 = 100. Then I selected a random integer between 1
and 100. Let's say I got 57. Next I did a little side study to determine
how thick a hundred cards are in the card catalog (taking into account
the varying ages of the cards). Let's say that on average I found that two
cards that were separated by 100 cards were about .75 inches apart in the
catalog drawer. That information gave me everything I needed to draw
the sample. I counted to the 57th card by hand and recorded the book
information. Then, I took a compass. (Remember those from your high-
school math class? They're the funny little metal instruments with a
sharp pin on one end and a pencil on the other that you used to draw
circles in geometry class.) Then I set the compass at .75", stuck the pin
end in at the 57th card and pointed with the pencil end to the next card
(approximately 100 books away). In this way, I approximated selecting
the 157th, 257th, 357th, and so on. I was able to accomplish the entire
selection procedure in very little time using this systematic random
sampling approach. I'd probably still be there counting cards if I'd tried
another random sampling method. (Okay, so I have no life. I got
compensated nicely, I don't mind saying, for coming up with this
scheme.)
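The selection rule used in both examples (pick a random start between 1 and k, then take every k-th unit) can be written as a short Python sketch. The function name systematic_sample and its parameters are hypothetical, and the sketch assumes that N is an exact multiple of n, as it is in both examples above; it does not model the compass approximation, only the positions being approximated.

```python
import random

def systematic_sample(N, n, start=None, seed=None):
    """Return the 1-based list positions chosen by systematic sampling.

    Assumes N is an exact multiple of n so that the interval k = N / n
    is a whole number (a simplifying assumption for this sketch).
    """
    if N % n != 0:
        raise ValueError("this sketch assumes N is a multiple of n")
    k = N // n                      # sampling interval
    if start is None:
        # Random integer from 1 to k, inclusive.
        start = random.Random(seed).randint(1, k)
    if not 1 <= start <= k:
        raise ValueError("start must be between 1 and k")
    # Take the start unit and then every k-th unit after it.
    return list(range(start, N + 1, k))

# Small example: N=100, n=20, so k=5; random start assumed to be 4.
small = systematic_sample(100, 20, start=4)
print(small[:4], "...", len(small), "units")

# Library example: N=100,000, n=1000, so k=100; random start assumed to be 57.
cards = systematic_sample(100_000, 1000, start=57)
print(cards[:4], "...", len(cards), "units")
```

Leaving start unset lets the function draw the random start itself; passing it explicitly, as here, just reproduces the numbers used in the text.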
Multi-Stage Sampling
The four methods we've covered so far -- simple, stratified, systematic
and cluster -- are the simplest random sampling strategies. In most real
applied social research, we would use sampling methods that are
considerably more complex than these simple variations. The most
important principle here is that we can combine the simple methods
described earlier in a variety of useful ways that help us address our
sampling needs in the most efficient and effective manner possible.
When we combine sampling methods, we call this multi-stage
sampling.
For example, consider the idea of sampling New York State residents for
face-to-face interviews. Clearly we would want to do some type of
cluster sampling as the first stage of the process. We might sample
townships or census tracts throughout the state. But in cluster sampling
we would then go on to measure everyone in the clusters we select. Even
if we are sampling census tracts we may not be able to measure
everyone who is in the census tract. So, we might set up a stratified
sampling process within the clusters. In this case, we would have a two-
stage sampling process with stratified samples within cluster samples.
Or, consider the problem of sampling students in grade schools. We
might begin with a national sample of school districts stratified by
economics and educational level. Within selected districts, we might do
a simple random sample of schools. Within schools, we might do a
simple random sample of classes or grades. And, within classes, we
might even do a simple random sample of students. In this case, we have
three or four stages in the sampling process and we use both stratified
and simple random sampling. By combining different sampling methods
we are able to achieve a rich variety of probabilistic sampling methods
that can be used in a wide range of social research contexts.
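As a rough illustration of the grade-school example, the following Python sketch chains the stages together: a stratified sample of districts, then simple random samples of schools, classes and students. Everything in it is an assumption made for illustration: the frame is synthetic, the names build_frame and multistage_sample are hypothetical, and taking the same fixed number of units at every stage is a simplification rather than a recommended design.

```python
import random

def build_frame():
    """Build a small synthetic frame: 3 strata x 4 districts x 3 schools
    x 2 classes x 10 students. All names are made up."""
    districts = []
    for stratum in ("low", "middle", "high"):
        for d in range(4):
            schools = []
            for s in range(3):
                classes = []
                for c in range(2):
                    students = [f"{stratum}-D{d}-S{s}-C{c}-P{p}" for p in range(10)]
                    classes.append({"name": f"C{c}", "students": students})
                schools.append({"name": f"S{s}", "classes": classes})
            districts.append({"name": f"{stratum}-D{d}", "stratum": stratum,
                              "schools": schools})
    return districts

def multistage_sample(districts, per_stratum=2, n_schools=2, n_classes=1,
                      n_students=3, seed=0):
    """Stratified sample of districts, then simple random samples of
    schools, classes and students inside each selected unit."""
    rng = random.Random(seed)
    # Group districts by stratum for the first (stratified) stage.
    by_stratum = {}
    for district in districts:
        by_stratum.setdefault(district["stratum"], []).append(district)

    chosen = []
    for stratum, group in by_stratum.items():
        for district in rng.sample(group, per_stratum):                  # stage 1
            for school in rng.sample(district["schools"], n_schools):    # stage 2
                for cls in rng.sample(school["classes"], n_classes):     # stage 3
                    for student in rng.sample(cls["students"], n_students):  # stage 4
                        chosen.append((stratum, district["name"],
                                       school["name"], cls["name"], student))
    return chosen

frame = build_frame()
picked = multistage_sample(frame)
print(len(picked), "students selected; first:", picked[0])
```

Each stage only needs a list of the units inside the units already selected, which is the practical appeal of combining methods this way.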