Descriptive Statistics
Part I Statistics
Research results do not speak for themselves. They must be organized and ma-
nipulated so that whatever meaning they have can be quickly and easily under-
stood by the researcher and by his or her readers. Researchers use statistics to
clarify their results and communicate effectively. In this chapter, we consider
some commonly used techniques for presenting research results: percentages
and proportions, ratios and rates, percentage change, tables, charts, and graphs.
Mathematically speaking, these univariate descriptive statistics are not very com-
plex (although they are not as simple as they might seem at first glance), but they
are extremely useful for presenting research results clearly and concisely.
2.1 PERCENTAGES AND PROPORTIONS
Consider the following statement: “Of the 269 cases handled by the court, 167 resulted in prison sentences of five years or more.” While there is nothing wrong with this statement, the same fact could have been more clearly conveyed if it had been reported as a percentage: “About 62% of all cases resulted in prison sentences of five or more years.”
Percentages and proportions supply a frame of reference for reporting re-
search results, in the sense that they standardize the raw data: percentages to
the base 100 and proportions to the base 1.00. The mathematical definitions of
proportions and percentages are
FORMULA 2.1 Proportion:
$$p = \frac{f}{N}$$
FORMULA 2.2 Percentage:
$$\% = \left(\frac{f}{N}\right) \times 100$$
where f = frequency, or the number of cases in any category
N = the number of cases in all categories
CHAPTER 2 BASIC DESCRIPTIVE STATISTICS 23
*The slight discrepancy in the totals of the percentage column is due to rounding error. See the Preface (Basic Mathematical Review) for more on rounding.
To illustrate the computation of percentages, consider the data presented in Table 2.1. How can we find the percentage of cases in the first category (sentences of five years or more)? Note that there are 167 cases in the category (f = 167) and a total of 269 cases in all (N = 269). So
$$\text{Percentage } (\%) = \left(\frac{f}{N}\right) \times 100 = \left(\frac{167}{269}\right) \times 100 = (0.6208) \times 100 = 62.08\%$$
Using the same procedures, we can also find the percentage of cases in the sec-
ond category:
$$\text{Percentage } (\%) = \left(\frac{f}{N}\right) \times 100 = \left(\frac{72}{269}\right) \times 100 = (0.2677) \times 100 = 26.77\%$$
Both results could have been expressed as proportions. For example, the proportion of cases in the third category is 0.07:
$$\text{Proportion } (p) = \frac{f}{N} = \frac{20}{269} = 0.07$$
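Formulas 2.1 and 2.2 translate directly into code. The Python sketch below reproduces the court-case figures above. The function names `proportion` and `percentage` are illustrative and do not come from the text.

```python
def proportion(f: int, n: int) -> float:
    """Formula 2.1: p = f / N (cases in one category over cases in all categories)."""
    if n <= 0:
        raise ValueError("N must be a positive number of cases")
    return f / n


def percentage(f: int, n: int) -> float:
    """Formula 2.2: % = (f / N) * 100."""
    return proportion(f, n) * 100


N = 269  # total number of court cases
print(f"First category:  {percentage(167, N):.2f}%")
print(f"Second category: {percentage(72, N):.2f}%")
print(f"Third category (as a proportion): {proportion(20, N):.2f}")
```

The same two functions cover the supervisor's figures in Application 2.1, for example `percentage(50, 177)` and `percentage(6231, 16722)`.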
Percentages and proportions are easier to read and comprehend than fre-
quencies. This advantage is particularly obvious when attempting to compare
groups of different sizes. For example, based on the information presented in
Table 2.2, which college has the higher relative number of social science ma-
jors? Because the total enrollments are so different, comparisons are difficult to
make from the raw frequencies. Computing percentages eliminates the differ-
ence in size of the two campuses by standardizing both distributions to the base
of 100. The same data are presented in percentages in Table 2.3.
The percentages in Table 2.3 make it easier to identify differences as well as similarities between the two colleges. College A has a much higher percentage of social science majors (even though the absolute number of social science majors is less than at College B) and about the same percentage of humanities majors. How would you describe the differences in the remaining two major fields? (For practice in computing and interpreting percentages and proportions, see problems 2.1 and 2.2.)

Application 2.1
Not long ago, in a large social service agency, the following conversation took place between the executive director of the agency and a supervisor of one of the divisions.
Executive director: Well, I don’t want to seem abrupt, but I’ve only got a few minutes. Tell me, as briefly as you can, about this staffing problem you claim to be having.
Supervisor: Ma’am, we just don’t have enough people to handle our workload. Of the 177 full-time employees of the agency, only 50 are in my division. Yet, 6231 of the 16,722 cases handled by the agency last year were handled by my division.
Executive director (smothering a yawn): Very interesting. I’ll certainly get back to you on this matter.
How could the supervisor have presented his case more effectively? Because he wants to compare two sets of numbers (his staff versus the total staff and the workload of his division versus the total workload of the agency), proportions or percentages would be a more forceful way of presenting results. What if the supervisor had said, “Only 28.25% of the staff is assigned to my division, but we handle 37.26% of the total workload of the agency”? Is this a clearer message?
The first percentage is found by
$$\% = \left(\frac{f}{N}\right) \times 100 = \left(\frac{50}{177}\right) \times 100 = (0.2825) \times 100 = 28.25\%$$
and the second percentage is found by
$$\% = \left(\frac{f}{N}\right) \times 100 = \left(\frac{6231}{16{,}722}\right) \times 100 = (0.3726) \times 100 = 37.26\%$$
TABLE 2.3 DECLARED MAJOR FIELDS ON TWO COLLEGE CAMPUSES (fictitious data)

Step 1: Determine the values for f (number of cases in a category) and N (number of cases in all categories). Remember that f will be the number of cases in a specific category (e.g., males on your campus) and N will be the number of cases in all categories (e.g., all students, males and females, on your campus). Also remember that f will be smaller than N, except when the category and the entire group are the same (e.g., when all students are male). Proportions cannot exceed 1.00, and percentages cannot exceed 100.00%.
Step 2: For a proportion, divide f by N.
Step 3: For a percentage, multiply the value you calculated in step 2 by 100.

Here are some further guidelines on the use of percentages and proportions.
1. When working with a small number of cases (say, fewer than 20), it is usually preferable to report the actual frequencies rather than percentages or proportions. With a small number of cases, the percentages can change drastically with relatively minor changes in the size of the data set. For example, suppose you begin with a group of 10 males and 10 females (that is, 50% of each gender) and then add another female. The percentage distribution will change noticeably, to 52.38% female and 47.62% male. Of course, as the number of observations increases, each additional case will have a smaller impact. If we started with 500 people (250 males and 250 females) and then added one more female, the percentage of females would change by only about a tenth of a percentage point (from 50% to 50.10%).
2. Always report the number of observations along with proportions and per-
centages. This permits the reader to judge the adequacy of the sample size
and, conversely, helps to prevent the researcher from lying with statistics.
Statements like “Two out of three people questioned prefer courses in statis-
tics to any other course” might impress you, but the claim would lose its gloss
if you learned that only three people were tested. You should be extremely sus-
picious of reports that fail to report the number of cases that were tested.
3. Percentages and proportions can be calculated for variables at the ordinal and
nominal levels of measurement, in spite of the fact that they require division.
This is not a violation of the level-of-measurement guideline (see Table 1.2).
Percentages and proportions do not require the division of the scores of the
variable (as would be the case in computing the average score on a test, for
example) but rather the number of cases in a particular category (f ) of the vari-
able by the total number of cases in the sample (N). When we make a state-
ment like “43% of the sample is female,” we are merely expressing the relative
size of a category (female) of the variable (gender) in a convenient way.
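The small-sample warning in guideline 1 can be checked numerically. The sketch below reads “500 males and females” as 500 people in total, 250 of each sex, because that reading yields the 50.10% figure. The helper `percent_female` is hypothetical.

```python
def percent_female(females: int, males: int) -> float:
    """Percentage of the group that is female: (f / N) * 100."""
    return females / (females + males) * 100


# Small group: 10 males and 10 females, then one more female joins.
before, after = percent_female(10, 10), percent_female(11, 10)
print(f"N = 20 -> 21: {before:.2f}% -> {after:.2f}% female")

# Larger group (assumed 250 of each sex), then one more female joins.
before, after = percent_female(250, 250), percent_female(251, 250)
print(f"N = 500 -> 501: {before:.2f}% -> {after:.2f}% female")
```

The same single added case moves the percentage by more than two points in the small group but by only a tenth of a point in the larger one.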
2.2 RATIOS, RATES, AND PERCENTAGE CHANGE
Ratios, rates, and percentage change provide some additional ways of summarizing results simply and clearly. Although they are similar to each other, each statistic has a specific application and purpose.
Application 2.2
Application 2.3
In 2005, there were 2500 births in a city of 167,000. In 1965, when the population of the city was only 133,000, there were 2700 births. Is the birthrate rising or falling? Although this question can be answered from the preceding information, the trend in birthrates will be much more obvious if we compute birthrates for both years. Like crude death rates, crude birthrates are usually multiplied by 1000 to eliminate decimal points. For 1965:
$$\text{Crude birthrate} = \left(\frac{2700}{133{,}000}\right) \times 1000 = 20.30$$
In 1965, there were 20.30 births for every 1000 people in the city. For 2005:
$$\text{Crude birthrate} = \left(\frac{2500}{167{,}000}\right) \times 1000 = 14.97$$
In 2005, there were 14.97 births for every 1000 people in the city. With the help of these statistics, the decline in the birthrate is clearly expressed.
If there were 100 deaths during a given year in a town of 7000, the crude death
rate for that year would be
$$\text{Crude death rate} = \left(\frac{100}{7000}\right) \times 1000 = (0.01429) \times 1000 = 14.29$$
Or, for every 1000 people, there were 14.29 deaths during this particular year.
In the same way, if a city of 237,000 people experienced 120 auto thefts during
a particular year, the auto theft rate would be
$$\text{Auto theft rate} = \left(\frac{120}{237{,}000}\right) \times 100{,}000 = (0.0005063) \times 100{,}000 = 50.63$$
Or, for every 100,000 people, there were 50.63 auto thefts during the year in ques-
tion. (For practice in computing and interpreting rates, see problems 2.3 and 2.4a.)
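Every rate in this section follows the same pattern: actual occurrences are divided by possible occurrences (usually the total population), and the result is multiplied by a base such as 1000 or 100,000. The `rate` function below is a hypothetical helper written under that reading. It reproduces the death rate, the auto theft rate, and the two birthrates from Application 2.3.

```python
def rate(occurrences: int, population: int, per: int = 1000) -> float:
    """Actual occurrences per `per` possible occurrences (here, population)."""
    if population <= 0:
        raise ValueError("population must be positive")
    return occurrences / population * per


print(f"Crude death rate:      {rate(100, 7_000):.2f} per 1000")
print(f"Auto theft rate:       {rate(120, 237_000, per=100_000):.2f} per 100,000")
print(f"Crude birthrate, 1965: {rate(2700, 133_000):.2f} per 1000")
print(f"Crude birthrate, 2005: {rate(2500, 167_000):.2f} per 1000")
```

The base is a parameter because it is a presentation choice. It is chosen only to avoid awkward decimals and does not change the comparison between places or years.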
Application 2.4
In our example, f1 is the death rate in 1995 (f1 = 16) and f2 is the death rate in 2005 (f2 = 24). The formula tells us to subtract the earlier score from the later one and then divide by the earlier score. The value that results expresses the size of the change in scores (f2 − f1) relative to the score at the earlier time (f1). The value is then multiplied by 100 to express the change in the form of a percentage:
$$\text{Percent change} = \left(\frac{24 - 16}{16}\right) \times 100 = \left(\frac{8}{16}\right) \times 100 = (0.50) \times 100 = 50\%$$
The death rate in 2005 is 50% higher than in 1995. This means that the 2005 rate was equal to the 1995 rate plus half of the 1995 rate. If the rate had risen to 32 deaths per 1000, the percent change would have been 100% (the rate would have doubled). If the death rate had fallen to 8 per 1000, the percent change would have been −50%. Note the negative sign: it means that the death rate has decreased by 50%. The 2005 rate would have been half the size of the 1995 rate.
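The percent change formula, ((f2 − f1) / f1) × 100, can be sketched in Python. The sketch also shows the three death-rate scenarios just discussed. The function name is illustrative. It refuses an earlier score of zero because the formula divides by f1.

```python
def percent_change(f1: float, f2: float) -> float:
    """((f2 - f1) / f1) * 100, where f1 is the earlier score and f2 the later one."""
    if f1 == 0:
        raise ValueError("percent change is undefined when the earlier score is 0")
    return (f2 - f1) / f1 * 100


# 1995 death rate of 16 per 1000 compared with three possible 2005 rates.
for later in (24, 32, 8):
    print(f"16 -> {later}: {percent_change(16, later):+.0f}%")
```

A positive result means the score grew and a negative result means it shrank. This matches the reading of the −50% case above.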
An additional example should make the computation and interpretation of
the percentage change clearer. Suppose we wanted to compare the projected
population growth rates for various nations over the next 50 years. The necessary
information is presented in Table 2.4, which shows the actual population for each
nation in 2000 and the projected population for 2050. The “Increase/Decrease”
column shows how many people will be added or lost over the 50-year time span.
Casual inspection will give us some information about population trends. For
example, compare the “Increase/Decrease” column for China and the United
States. These societies are projected to add roughly similar numbers of people
(about 155 million for China, a little less for the United States), but, since China’s
2000 population is about five times the size of the population of the United States,
its percent change will be much lower (about 12% vs. almost 50%).
TABLE 2.4 PROJECTED POPULATION GROWTH FOR SIX NATIONS, 2000 –2050
Calculating percent change will make comparisons more precise. The right-
hand column shows the percent change in projected population for each nation.
These values were computed by subtracting the 2000 population ( f1) from the
2050 population ( f2), dividing by the 2000 population, and multiplying by 100.
Although China has the largest population of these six nations, it will grow
at the slowest rate (12.24%). The United States and Mexico will increase by about
50% (in 2050, their populations will be half again larger than in 2000) and Can-
ada will grow by about one-third. Italy will actually lose people and its popula-
tion will decline by over 12%. Nigeria has by far the highest growth rate: It will
add the most people and its population will increase in size by over 200%.
This means that in 2050 the population of Nigeria will be more than three times
its 2000 size. (For practice in computing and interpreting percent change, see
problem 2.4b.)
Step 2: Determine the number of possible occurrences. This value will usually be the total population for the area in question.

Step 3: Divide the quantity you found in step 2 by f1.
Step 4: Multiply the quantity you found in step 3 by 100.
30 PART I DESCRIPTIVE STATISTICS
2.3 FREQUENCY DISTRIBUTIONS: INTRODUCTION
Frequency distributions are tables that summarize the distribution of a variable by reporting the number of cases contained in each category of the variable. They are very helpful and commonly used ways of organizing and working with data. In fact, the construction of frequency distributions is almost always the first step in any statistical analysis.
To illustrate the usefulness of frequency distributions and to provide some
data for examples, assume that the counseling center at a university is assessing
the effectiveness of its services. Any realistic evaluation research would collect a
variety of information from a large group of students, but, for the sake of this ex-
ample, we will confine our attention to just four variables and 20 students. The
data are reported in Table 2.5.
Note that, even though the data in Table 2.5 represent an unrealistically low
number of cases, it is difficult to discern any patterns or trends. For example, try
to ascertain the general level of satisfaction of the students from Table 2.5. You
may be able to do so with just 20 cases, but it will take some time and effort. Imag-
ine the difficulty with 50 cases or 100 cases presented in this fashion. Clearly the
data need to be organized in a format that allows the researcher (and his or her
audience) to understand easily the distribution of the variables.
One general rule that applies to all frequency distributions is that the cate-
gories of the frequency distribution must be exhaustive and mutually exclusive.
In other words, the categories must be stated in a way that permits each case to
be counted in one and only one category. This basic principle applies to the con-
struction of frequency distributions for variables measured at all three levels of
measurement.
Beyond this rule, there are only guidelines to help you construct useful frequency distributions. As you will see, the researcher has a fair amount of discretion in stating the categories of the frequency distribution (especially with variables measured at the interval-ratio level). I will identify the issues to consider as you make decisions about the nature of any particular frequency distribution. Ultimately, however, the guidelines I state are aids for decision-making, nothing more than helpful suggestions. As always, the researcher has the final responsibility for making sensible decisions and presenting his or her data in a meaningful way.

TABLE 2.5
Student   Sex      Marital Status   Satisfaction with Services   Age
A         Male     Single           4                            18
B         Male     Married          2                            19
C         Female   Single           4                            18
D         Female   Single           2                            19
E         Male     Married          1                            20
F         Male     Single           3                            20
G         Female   Married          4                            18
H         Female   Single           3                            21
I         Male     Single           3                            19
J         Female   Divorced         3                            23
K         Female   Single           3                            24
L         Male     Married          3                            18
M         Female   Single           1                            22
N         Female   Married          3                            26
O         Male     Single           3                            18
P         Male     Married          4                            19
Q         Female   Married          2                            19
R         Male     Divorced         1                            19
S         Female   Divorced         3                            21
T         Male     Single           2                            20
Marital Status   Frequency (f)
Single           10
Married           7
Divorced          3
N                20

Marital Status   Frequency (f)
Married           7
Not married      13
N                20

Satisfaction     Frequency (f)   Percentage (%)
Satisfied        13              65
Dissatisfied      7              35
N                20              100%
distribution for variables measured at all levels. This table reports that most stu-
dents were either satisfied or very satisfied with the services of the counseling
center. The most common response (nearly half the sample) was “satisfied.” If the
researcher wanted to emphasize this major trend, the categories could be col-
lapsed as in Table 2.10. Again, the price paid for this increased compactness is
that some information (in this case, the exact breakdown of degrees of satisfac-
tion and dissatisfaction) is lost. (For practice in constructing and interpreting fre-
quency distributions for nominal- and ordinal-level variables, see problem 2.5.)
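For nominal and ordinal variables, building a frequency distribution amounts to counting cases per category. Collapsing means recoding before counting. The sketch below applies this to the Table 2.5 data. It assumes that satisfaction scores of 3 and 4 form the “satisfied” end of the scale, which is the split that reproduces the 13/7 figures above. The helper `frequency_table` is hypothetical.

```python
from collections import Counter

# Marital status and satisfaction scores for students A through T, copied from Table 2.5.
marital = ["Single", "Married", "Single", "Single", "Married", "Single", "Married",
           "Single", "Single", "Divorced", "Single", "Married", "Single", "Married",
           "Single", "Married", "Married", "Divorced", "Divorced", "Single"]
satisfaction = [4, 2, 4, 2, 1, 3, 4, 3, 3, 3, 3, 3, 1, 3, 3, 4, 2, 1, 3, 2]


def frequency_table(values, order):
    """Return (category, f, percent) rows in the given category order, plus N."""
    counts = Counter(values)
    n = len(values)
    return [(cat, counts[cat], counts[cat] / n * 100) for cat in order], n


# Full distribution of the nominal variable.
full, n = frequency_table(marital, ["Single", "Married", "Divorced"])

# Collapsed version: fewer categories, less detail.
married = ["Married" if m == "Married" else "Not married" for m in marital]
collapsed, _ = frequency_table(married, ["Married", "Not married"])

# Assumption: scores 3 and 4 count as "Satisfied", scores 1 and 2 as "Dissatisfied".
satisfied = ["Satisfied" if s >= 3 else "Dissatisfied" for s in satisfaction]
sat_table, _ = frequency_table(satisfied, ["Satisfied", "Dissatisfied"])

for rows in (full, collapsed, sat_table):
    for cat, f, pct in rows:
        print(f"{cat:<12} {f:3d} {pct:5.1f}%")
    print(f"{'N':<12} {n:3d}\n")
```

Recoding before counting makes the loss of detail explicit. The original scores remain available if the full breakdown is needed later.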
Application 2.5
The following list shows the ages of 50 prisoners enrolled in a work-release program. Is this group young or old? A frequency distribution will provide an accurate picture of the overall age structure.

18 60 57 27 19
20 32 62 26 20
25 35 75 25 21
30 45 67 41 30
37 47 65 42 25
18 51 22 52 30
22 18 27 53 38
27 23 32 35 42
32 37 32 40 45
55 42 45 50 47

We will use about 10 intervals to display these data. By inspection we see that the youngest prisoner is 18 and the oldest is 75. The range is thus 57. Interval size will be 57/10, or 5.7, which we can round off to either 5 or 6. Let’s use a six-year interval beginning at 18. The limits of the lowest interval will be 18–23. Now we must state the limits of all other intervals, count the number of cases in each interval, and display these counts in a frequency distribution. Columns may be added for percentages, cumulative percentages, and/or cumulative frequency. The complete distribution, with a column added for percentages, is

Ages    Frequency   Percentages
18–23   10          20
24–29    7          14
30–35    9          18
36–41    5          10
42–47    8          16
48–53    4           8
54–59    2           4
60–65    3           6
66–71    1           2
72–77    1           2
N       50          100%

The prisoners seem to be fairly evenly spread across the age groups up to the 48–53 interval. There is a noticeable lack of prisoners in the oldest age groups and a concentration of prisoners in their 20s and 30s.
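The counting step in Application 2.5 can be sketched as follows. The sketch assumes whole-number ages, so an interval such as 18–23 holds every score from 18 through 23 inclusive. The helper `group` is illustrative. It starts at 18 with a width of 6, as chosen above, and keeps adding intervals until the oldest prisoner is covered.

```python
ages = [18, 60, 57, 27, 19, 20, 32, 62, 26, 20, 25, 35, 75, 25, 21,
        30, 45, 67, 41, 30, 37, 47, 65, 42, 25, 18, 51, 22, 52, 30,
        22, 18, 27, 53, 38, 27, 23, 32, 35, 42, 32, 37, 32, 40, 45,
        55, 42, 45, 50, 47]


def group(scores, lowest, width):
    """Count whole-number scores in stated intervals lowest..lowest+width-1, and so on."""
    rows = []
    lower = lowest
    while lower <= max(scores):          # stop once an interval has covered the top score
        upper = lower + width - 1
        f = sum(lower <= s <= upper for s in scores)
        rows.append((lower, upper, f))
        lower += width
    return rows


rows = group(ages, lowest=18, width=6)
n = len(ages)
for lower, upper, f in rows:
    print(f"{lower}-{upper}  {f:2d}  {f / n * 100:.0f}%")
print("N =", n)
```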
For example, suppose you wished to report the distribution of the variable
“age” for a sample drawn from a community. Unlike the college data reported in
Table 2.5, a community sample would have a very broad range of ages. If you
simply reported the number of times that each year of age (or score) occurred,
you could easily wind up with a frequency distribution that contained 80, 90, or
even more categories. Such a large frequency distribution would not present a
concise picture. The scores (years) must be grouped into larger categories to
heighten clarity and ease of comprehension. How large should these categories
be? How many categories should be included in the table? Although there are no
hard-and-fast rules for making these decisions, they always involve a trade-off
between more detail (a greater number of narrow categories) and more compactness (a smaller number of wide categories).
TABLE 2.11
Age   Frequency (f)
18    5
19    6
20    3
21    2
22    1
23    1
24    1
25    0
26    1
N     20
from youngest to oldest, counting the number of times each score (year of age)
occurs, and then totaling the number of scores for each category. Table 2.11 pres-
ents the information and reveals a concentration or clustering of scores in the 18
and 19 class intervals.
Even though the picture presented in this table is fairly clear, assume for the
sake of illustration that you desire a more compact (less detailed) summary. To
do this, you will have to group scores into wider class intervals. By increasing the
interval width (say, to two years), you can reduce the number of intervals and
achieve a more compact expression. The grouping of scores in Table 2.12 clearly
emphasizes the relative predominance of younger respondents. This trend in the
data can be stressed even more by the addition of a column displaying the per-
centage of cases in each category.
Note that the class intervals in Table 2.12 have been stated with an appar-
ent gap between them (that is, the class intervals are separated by a distance of
one unit). At first glance, these gaps may appear to violate the principle of ex-
haustiveness; but, because age has been measured in whole numbers, the gaps
actually pose no problem. Given the level of precision of the measurement (in
whole years, as opposed to, say, 10ths of a year), no case could have a score
falling between these class intervals. For these data, the set of class intervals in
Table 2.12 is exhaustive and mutually exclusive. Each of the 20 respondents in
the sample can be sorted into one and only one age category.
TABLE 2.12
Age     Frequency (f)   Percentage (%)
18–19   11              55
20–21    5              25
22–23    2              10
24–25    1               5
26–27    1               5
N       20              100%
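Both age tables can be rebuilt from the ages in Table 2.5. The ungrouped version lists every year from youngest to oldest, including years with no cases. The grouped version uses two-year intervals starting at 18. The sketch below is illustrative and assumes ages recorded in whole years.

```python
from collections import Counter

# Ages of students A through T, copied from Table 2.5.
ages = [18, 19, 18, 19, 20, 20, 18, 21, 19, 23, 24, 18, 22, 26, 18, 19, 19, 19, 21, 20]
n = len(ages)

# Ungrouped distribution (as in Table 2.11): one row per year, zero counts included.
counts = Counter(ages)
ungrouped = [(age, counts[age]) for age in range(min(ages), max(ages) + 1)]

# Grouped distribution (as in Table 2.12): two-year intervals starting at 18.
width = 2
grouped = []
for lower in range(18, max(ages) + 1, width):
    upper = lower + width - 1
    f = sum(lower <= a <= upper for a in ages)
    grouped.append((f"{lower}-{upper}", f, f / n * 100))

for age, f in ungrouped:
    print(age, f)
for label, f, pct in grouped:
    print(f"{label}  {f:2d}  {pct:.0f}%")
print("N =", n)
```

Widening the interval from one year to two trades the detail of the first table for the compactness of the second. This is the trade-off described earlier in this section.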
Step 1: Find the upper and lower limits of the lowest interval in the frequency distribution. For any interval, the upper limit is the highest score included in the interval and the lower limit is the lowest score included in the interval. For example, for the top set of intervals in Table 2.13, the lowest interval (0–2) includes scores of 0, 1, and 2. The upper limit of this interval is 2 and the lower limit is 0.
Step 2: Add the upper and lower limits and divide by 2. For the interval 0–2: (0 + 2)/2 = 1. The midpoint for this interval is 1.
Step 3: Midpoints for other intervals can be found by repeating steps 1 and 2 for each interval. As an alternative, you can find the midpoint for any interval by adding the value of the interval width to the midpoint of the next lower interval. For example, the lowest interval in Table 2.13 is 0–2 and the midpoint is 1. Intervals are 3 units wide (that is, they each include three scores), so the midpoint for the next higher interval (3–5) is 1 + 3, or 4. The midpoint for the interval 6–8 is 4 + 3, or 7, and so forth.
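Both routes to the midpoints (steps 1 and 2 for every interval, or the step 3 shortcut of adding the interval width) can be checked against each other. The Python sketch below uses the first set of intervals from Table 2.13. The helper name `midpoint` is illustrative.

```python
def midpoint(lower: float, upper: float) -> float:
    """Step 2: add the upper and lower limits and divide by 2."""
    return (lower + upper) / 2


intervals = [(0, 2), (3, 5), (6, 8), (9, 11)]
mids = [midpoint(low, high) for low, high in intervals]
print(mids)

# Step 3 shortcut: each midpoint is the previous midpoint plus the interval width.
width = 3
shortcut = [mids[0] + k * width for k in range(len(intervals))]
print(shortcut == mids)
```

The shortcut only works when all intervals have the same width. With unequal intervals, steps 1 and 2 must be repeated for each one.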
However, consider the potential difficulties if age had been measured with
greater precision. If age had been measured in 10ths of a year, into which class
interval in Table 2.12 would a 19.4-year-old subject be placed? You can avoid
this ambiguity by always stating the limits of the class intervals at the same level
of precision as the data. Thus, if age were being measured in 10ths of a year,
the limits of the class intervals in Table 2.12 would be stated in 10ths of a year.
For example:
18.0–19.9
20.0–21.9
22.0–23.9
24.0–25.9
26.0–27.9
To maintain mutual exclusivity between categories, do not overlap the class in-
tervals. If you state the limits of the class intervals at the same level of precision
as the data (which might be in whole numbers, tenths, hundredths, etc.) and
maintain a “gap” between intervals, you will always produce a frequency distri-
bution where each case can be assigned to one and only one category.
Midpoints. On occasion, you will need to work with the midpoints of the class intervals, for example, when constructing or interpreting certain graphs. Midpoints are defined as the points exactly halfway between the upper and lower limits and can be found for any interval by dividing the sum of the upper and lower limits by 2. Table 2.13 displays midpoints for two different sets of class intervals. (For practice in finding midpoints, see problems 2.8b and 2.9b.)

TABLE 2.13
Class Intervals   Midpoints
0–2                1.0
3–5                4.0
6–8                7.0
9–11              10.0

Real Limits.¹ For certain purposes, you must eliminate the “gap” between class intervals and treat a distribution as a continuous series of categories that border each other. This is necessary for the construction of some graphs (see Section 2.7) and for computing summary statistics for variables that have been grouped into frequency distributions.

¹This section is optional. It is necessary for understanding the material presented in Chapters 3 and 4 on computing measures of central tendency and dispersion for grouped data.

Step 1: Find the distance (the “gap”) between the stated class intervals. In Table 2.12, for example, this value is 1.
Step 2: Divide the value found in step 1 in half.
Step 3: Add the value found in step 2 to all upper stated limits and subtract it from all lower stated limits.
To illustrate, we’ll begin with Table 2.12. Note the “gap” of one year between
intervals. As we saw before, the gap is only apparent: scores are measured in
whole years (i.e., 19, 21 vs. 19.5 or 21.3) and cannot fall between intervals. These
types of class intervals are called stated class limits and they organize the scores
of the variable into a series of discrete, nonoverlapping intervals.
To treat the variable as continuous, we must use the real class limits. To
find the real limits of any class interval, divide the distance between the stated
class intervals (the “gap”) in half and add the result to all upper stated limits and
subtract it from all lower stated limits. This process is illustrated below with the
class intervals stated in Table 2.12. The distance between intervals is one, so the
real limits can be found by adding 0.5 to all upper limits and subtracting 0.5 from
all lower limits.
Stated Limits Real Limits
18 –19 17.5–19.5
20 –21 19.5–21.5
22–23 21.5–23.5
24 –25 23.5–25.5
26 –27 25.5–27.5
TABLE 2.14
Stated Limits   Real Limits
3–5             2.5–5.5
6–8             5.5–8.5
9–11            8.5–11.5
Note that, with real limits, the class intervals overlap and the distribution can be
seen as continuous. Table 2.14 presents additional illustrations of real limits for
two different sets of class intervals. In both cases, the “gap” between the stated
limits is 1. (For practice in finding real limits, see problem 2.7c and problem 2.8d.)
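The three real-limit steps can be sketched in Python for equally spaced stated intervals. The gap is measured as the distance from one interval's upper limit to the next interval's lower limit (step 1). Half of it is then moved outward on each side (steps 2 and 3). The function name is hypothetical.

```python
def real_limits(stated):
    """Convert stated class limits to real limits by splitting the gap between intervals."""
    gap = stated[1][0] - stated[0][1]    # Step 1: e.g., 20 - 19 = 1
    half = gap / 2                       # Step 2
    return [(low - half, high + half) for low, high in stated]   # Step 3


stated = [(18, 19), (20, 21), (22, 23), (24, 25), (26, 27)]
real = real_limits(stated)
for (low, high), (rlow, rhigh) in zip(stated, real):
    print(f"{low}-{high}  ->  {rlow}-{rhigh}")

# With real limits, each interval ends exactly where the next one begins.
print(all(real[k][1] == real[k + 1][0] for k in range(len(real) - 1)))
```

The sketch assumes at least two intervals and a constant gap, as in Tables 2.12 and 2.14.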
TABLE 2.15
Age     Frequency (f)   Cumulative Frequency
18–19   11              11
20–21    5              16
22–23    2              18
24–25    1              19
26–27    1              20
N       20
tribution for percentages as in Table 2.12. This column shows the percentage of
all cases in each class interval. To find cumulative percentages, follow the same
addition pattern explained earlier for cumulative frequency. That is, the cumula-
tive percentage for the lowest class interval will be the same as the percentage of
cases in the interval. For the next-higher interval, the cumulative percentage is
the percentage of cases in the interval plus the percentage of cases in the first in-
terval, and so on. Table 2.16 shows the age data with a cumulative percentage
column added.
These cumulative columns are quite useful in situations where the researcher
wants to make a point about how cases are spread across the range of scores. For
example, Tables 2.15 and 2.16 show quite clearly that most students in the coun-
seling center survey are less than 21 years of age. If the researcher wishes to
impress this feature of the age distribution on his or her audience, then these
cumulative columns are quite handy. Most realistic research situations will be
concerned with many more than 20 cases and/or many more categories than our
tables have. Since the cumulative percentage column is clearer and easier to
interpret in such cases, it is normally preferred to the cumulative frequencies
column.
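Cumulative frequencies are running totals of the frequency column, and cumulative percentages follow from them. In the sketch below, the cumulative percentages are computed from the cumulative frequencies rather than by adding already-rounded percentages. This is one reasonable choice that keeps rounding error from accumulating down the column. The interval labels and frequencies are those of the age distribution in Table 2.12.

```python
from itertools import accumulate

intervals = ["18-19", "20-21", "22-23", "24-25", "26-27"]
freqs = [11, 5, 2, 1, 1]          # frequencies from Table 2.12
n = sum(freqs)

cum_f = list(accumulate(freqs))   # running total of cases
cum_pct = [c / n * 100 for c in cum_f]

for label, f, cf, cp in zip(intervals, freqs, cum_f, cum_pct):
    print(f"{label}  {f:2d}  {cf:2d}  {cp:5.1f}%")
print("N =", n)
```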
TABLE 2.17
Age            Frequency (f)   Cumulative Frequency
18–19          11              11
20–21           5              16
22–23           2              18
24–25           1              19
26–27           1              20
28 and older    1              21
N              21
intervals (28–29, 30–31, 32–33, etc.) with zero cases in them before we got to the
46–47 interval. This would waste space and probably be unclear and confusing.
An alternative way to handle the situation would be to add an “open-ended” in-
terval to the frequency distribution, as in Table 2.17:
The open-ended interval in Table 2.17 allows us to present the information
more compactly and efficiently than listing all of the empty intervals between
“28 –29” and “46 – 47.” Note also that we could handle an extremely low score
by adding an open-ended interval as the lowest class interval (e.g., “17 and
younger”). There is a small price to pay for this efficiency (there is no informa-
tion in Table 2.17 about the value of the scores included in the open-ended in-
terval), so this technique should not be used indiscriminately.
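An open-ended top interval is easy to add when grouping by code: build the closed intervals up to a cut-off, then count everything at or above it in one final category. The sketch below reuses the Table 2.5 ages and adds a hypothetical 47-year-old as the 21st case. The text places that case only in the 46–47 interval, so the exact age is an assumption. The helper name is also illustrative.

```python
def group_with_open_top(scores, lowest, width, open_from):
    """Equal-width closed intervals from `lowest`, then one open-ended top interval."""
    if (open_from - lowest) % width != 0:
        raise ValueError("open_from must fall on an interval boundary")
    rows = []
    for lower in range(lowest, open_from, width):
        upper = lower + width - 1
        rows.append((f"{lower}-{upper}", sum(lower <= s <= upper for s in scores)))
    rows.append((f"{open_from} and older", sum(s >= open_from for s in scores)))
    return rows


# Table 2.5 ages plus one hypothetical 47-year-old.
ages = [18, 19, 18, 19, 20, 20, 18, 21, 19, 23, 24, 18, 22, 26, 18, 19, 19, 19, 21, 20, 47]
rows = group_with_open_top(ages, lowest=18, width=2, open_from=28)
for label, f in rows:
    print(f"{label:<13} {f}")
print("N =", len(ages))
```

As the text warns, the last row hides the actual value of the extreme score.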
Intervals of Unequal Size. On some variables, most scores are tightly clustered
together but others are strewn across a broad range of scores. Consider, as an
example, the distribution of income in the United States. In 2005, most house-
holds (a little more than 50%) reported annual incomes between $20,000 and
$75,000 and a sizeable grouping (about 20%) earned less than that. The problem
(from a statistical point of view) comes with more affluent households. Many
of these cases are in the $75,000–$100,000 range, but some have incomes in the
high six-, seven-, and even eight-figure range. The number of very wealthy
households is quite small, of course, but we must still account for these extreme
cases.
If we tried to use a frequency distribution with equal intervals of, say,
$10,000 to summarize this variable, we would need 30 or 40 or more intervals
to include all of the more affluent households, and many of our intervals in the
higher income ranges—those over $100,000 —would have few or zero cases. In
situations such as this, researchers often use intervals of unequal size to sum-
marize the variable more efficiently. To illustrate, Table 2.18 uses unequal inter-
vals to summarize the distribution of income in the United States.
Some of the intervals in Table 2.18 are $10,000 wide, others are $25,000,
$50,000 or $150,000 wide, and two (the lowest and highest intervals) are open
ended. Tables that use intervals of mixed widths might be a little confusing for
the reader, but the trade-off in compactness and efficiency can be considerable.
(For practice in constructing and interpreting frequency distributions for inter-
val-ratio level variables, see problems 2.5 to 2.9.)
TABLE 2.18
Income   Households (Frequency)   Households (Percent)
2.6 CONSTRUCTING FREQUENCY DISTRIBUTIONS FOR INTERVAL-RATIO-LEVEL VARIABLES: A REVIEW
We covered a lot of ground in the preceding section, so let’s pause and review these principles by considering a specific research situation. The following data represent the numbers of visits received over the past year by 90 residents of a retirement community.

0 52 21 20 21 24 1 12 16 12
16 50 40 28 36 12 47 1 20 7
9 26 46 52 27 10 3 0 24 50
24 19 22 26 26 50 23 12 22 26
23 51 18 22 17 24 17 8 28 52
20 50 25 50 18 52 46 47 27 0
32 0 24 12 0 35 48 50 27 12
28 20 30 0 16 49 42 6 28 2
16 24 33 12 15 23 18 6 16 50
Listed in this format, the data are a hopeless jumble from which no one
could derive much meaning. The function of the frequency distribution is to
arrange and organize these data so that their meanings will be made obvious.
First, we must decide how many class intervals to use in the frequency dis-
tribution. Following the guidelines presented in the One Step at a Time: Con-
structing Frequency Distributions for Interval-Ratio Variables box, let’s use about
10 intervals (k = 10). By inspecting the data, we can see that the lowest score is
0 and the highest is 52. The range of these scores (R) is 52 − 0, or 52. To find the
approximate interval size (i ), divide the range (52) by the number of intervals
(10). Since 52/10 = 5.2, we can set the interval size at 5.
The lowest score is 0, so the lowest class interval will be 0 – 4. The highest
class interval will be 50 –54, which will include the high score of 52. All that re-
mains is to state the intervals in table format, count the number of scores that fall
into each interval, and report the totals in a frequency column. These steps have
been taken in Table 2.19, which also includes columns for the percentages and
cumulative percentages. Note that this table is the product of several relatively ar-
bitrary decisions. The researcher should remain aware of this fact and inspect the
frequency distribution carefully. If the table is unsatisfactory for any reason, it can
be reconstructed with a different number of categories and interval sizes.
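One way to do the counting for the retirement-community data is sketched below in Python. It assumes whole-number scores and the interval width of 5 chosen above, so the intervals run 0–4, 5–9, …, 50–54. It prints frequency, percentage, and cumulative percentage columns. The variable names are illustrative, and the layout is only one possible presentation.

```python
from itertools import accumulate

# Visits received by 90 residents, copied row by row from the data above.
visits = [
    0, 52, 21, 20, 21, 24, 1, 12, 16, 12,
    16, 50, 40, 28, 36, 12, 47, 1, 20, 7,
    9, 26, 46, 52, 27, 10, 3, 0, 24, 50,
    24, 19, 22, 26, 26, 50, 23, 12, 22, 26,
    23, 51, 18, 22, 17, 24, 17, 8, 28, 52,
    20, 50, 25, 50, 18, 52, 46, 47, 27, 0,
    32, 0, 24, 12, 0, 35, 48, 50, 27, 12,
    28, 20, 30, 0, 16, 49, 42, 6, 28, 2,
    16, 24, 33, 12, 15, 23, 18, 6, 16, 50,
]
width = 5
n = len(visits)

lowers = range(0, max(visits) + 1, width)   # lower limits 0, 5, ..., 50
freqs = [sum(low <= v <= low + width - 1 for v in visits) for low in lowers]
cum = list(accumulate(freqs))

for low, f, c in zip(lowers, freqs, cum):
    print(f"{low:2d}-{low + width - 1:2d}  {f:3d}  {f / n * 100:5.1f}  {c / n * 100:6.1f}")
print("N =", n)
```

Because the table is rebuilt from the raw scores, trying a different width (for example, `width = 4` or `width = 10`) is a one-line change. This makes step 7 of the box below, inspecting and reconstructing, inexpensive.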
CHAPTER 2 BASIC DESCRIPTIVE STATISTICS 41
ONE STEP AT A TIME: Constructing Frequency Distributions for Interval-Ratio Variables

Step 1: Decide how many class intervals (k) you wish to use. One reasonable convention suggests that the number of intervals should be about 10. Many research situations may require fewer than 10 intervals (k < 10), and it is common to find frequency distributions with as many as 15 intervals. Only rarely will more than 15 intervals be used, since the resultant frequency distribution would be too large for easy comprehension.

Step 2: Find the range (R) of the scores by subtracting the low score from the high score.

Step 3: Find the size of the class intervals (i) by dividing R (from step 2) by k (from step 1):

i = R / k

Round the value of i to a convenient whole number. This will be the interval size or width.

Step 4: State the lowest interval so that its lower limit is equal to or below the lowest score. By the same token, your highest interval will be the one that contains the highest score. Generally, intervals should be equal in size, but unequal and open-ended intervals may be used when convenient.

Step 5: State the limits of the class intervals at the same level of precision as you have used to measure the data. Do not overlap intervals. You will thereby define the class intervals so that each case can be sorted into one and only one category.

Step 6: Count the number of cases in each class interval, and report these subtotals in a column labeled “Frequency.” Report the total number of cases (N) at the bottom of this column. The table may also include a column for percentages, cumulative frequencies, and cumulative percentages.

Step 7: Inspect the frequency distribution carefully. Has too much detail been lost? If so, reconstruct the table with a greater number of class intervals (or smaller interval size). Is the table too detailed? If so, reconstruct the table with fewer class intervals (or use wider intervals). Are there too many intervals with no cases in them? If so, consider using open-ended intervals or intervals of unequal size. Remember that the frequency distribution results from a number of decisions you make in a rather arbitrary manner. If the appearance of the table seems less than optimal given the purpose of the research, redo the table until you are satisfied that you have struck the best balance between detail and conciseness.

Step 8: Give your table a clear, concise title, and number the table if your report contains more than one. All categories and columns must also be clearly labeled.
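Steps 2 through 6 are mechanical enough to express as a short function. The sketch below assumes whole-number scores, and it places the first lower limit at the nearest multiple of i at or below the lowest score; that starting rule is one convenient choice, not the only one Step 4 allows. Steps 1, 7, and 8 remain judgment calls for the researcher, and the function name is illustrative.

```python
def frequency_distribution(scores, k=10):
    """Return a list of (lower, upper, frequency) tuples for whole-number scores.

    A sketch of Steps 2-6; k is the desired number of intervals (Step 1).
    """
    low, high = min(scores), max(scores)
    R = high - low                      # Step 2: range
    i = max(1, round(R / k))            # Step 3: interval size, rounded to a whole number
    lower = low - (low % i)             # Step 4: first lower limit at or below the lowest score
    table = []
    while lower <= high:                # continue until an interval contains the high score
        upper = lower + i - 1           # Step 5: whole-number stated limits, no overlap
        f = sum(1 for s in scores if lower <= s <= upper)   # Step 6: count the cases
        table.append((lower, upper, f))
        lower += i
    return table
```

For example, `frequency_distribution([0, 5, 23, 24, 52])` works out R = 52 and i = 5 and returns the 11 intervals 0–4 through 50–54, with 2 cases in 20–24. Note that Python’s `round` rounds exact halves to the nearest even number (2.5 becomes 2), which is acceptable here because Step 3 only asks for a convenient whole number.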
TABLE 2.19 NUMBER OF VISITS PER YEAR, RETIREMENT COMMUNITY RESIDENTS

Class        Frequency   Cumulative   Percentage   Cumulative
Intervals    (f)         Frequency    (%)          Percentage
0–4          10          10           11.11%       11.11
5–9           5          15            5.56%       16.67
10–14         8          23            8.89%       25.56
15–19        12          35           13.33%       38.89
20–24        18          53           20.00%       58.89
25–29        12          65           13.33%       72.22
30–34         3          68            3.33%       75.55
35–39         2          70            2.22%       77.77
40–44         2          72            2.22%       79.99
45–49         6          78            6.67%       86.66
50–54        12          90           13.33%       99.99
N = 90                                99.99%*

*Percentage columns will occasionally fail to total to 100% because of rounding error. If the total is between 99.90% and 100.10%, ignore the discrepancy. Discrepancies of greater than 0.10% may indicate mathematical errors, and the entire column should be computed again.
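The percentage and cumulative columns of Table 2.19 can be recomputed from the frequency column alone. In this sketch the cumulative percentages are running totals of the rounded percentages, which matches how the table’s 99.99% total arises; the last step applies the footnote’s 99.90% to 100.10% tolerance check.

```python
from itertools import accumulate

# Frequencies for intervals 0-4, 5-9, ..., 50-54 (Table 2.19)
freqs = [10, 5, 8, 12, 18, 12, 3, 2, 2, 6, 12]

N = sum(freqs)                                          # 90
pcts = [round(f / N * 100, 2) for f in freqs]           # percentage column
cum_freq = list(accumulate(freqs))                      # cumulative frequency column
cum_pct = [round(x, 2) for x in accumulate(pcts)]       # running total of rounded percentages

total = round(sum(pcts), 2)                             # 99.99, not 100, because of rounding
rounding_ok = 99.90 <= total <= 100.10                  # footnote's tolerance check

for lo, f, cf, p, cp in zip(range(0, 55, 5), freqs, cum_freq, pcts, cum_pct):
    print(f"{lo}-{lo + 4}\t{f}\t{cf}\t{p:.2f}%\t{cp:.2f}")
print("N =", N, "total % =", total, "within tolerance:", rounding_ok)
```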
Now, with the aid of the frequency distribution, some patterns in the data
can be discerned. There are three distinct groupings of scores in the table. Ten
residents were visited rarely, if at all (the 0 – 4 visits per year interval). The single
largest interval, with 18 cases, is 20–24. Combined with the intervals immediately
above and below, this represents quite a sizeable grouping of cases (42 out
of 90, or 46.67% of all cases) and suggests that the dominant visiting rate is about
twice a month, or approximately 24 visits per year. The third grouping, in the
50 –54 class interval (12 cases), reflects a visiting rate of about once a week. The
cumulative percentage column indicates that the majority of the residents
(58.89%) were visited 24 or fewer times a year.
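Both summary figures in this paragraph come straight from the columns of Table 2.19. A quick check, in the same sketch style:

```python
freqs = [10, 5, 8, 12, 18, 12, 3, 2, 2, 6, 12]   # Table 2.19, intervals 0-4 ... 50-54
N = sum(freqs)

# Middle grouping: intervals 15-19, 20-24, and 25-29 (list positions 3, 4, 5)
middle = sum(freqs[3:6])                          # 12 + 18 + 12 = 42
middle_pct = round(middle / N * 100, 2)           # 46.67

# Residents visited 24 or fewer times: every interval up to and including 20-24
up_to_24 = sum(freqs[:5])                         # 53
up_to_24_pct = round(up_to_24 / N * 100, 2)       # 58.89

print(middle, middle_pct, up_to_24, up_to_24_pct)
```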
2.7 CHARTS AND GRAPHS Researchers frequently use charts and graphs to present their data in ways that
are visually more dramatic than frequency distributions. These devices are par-
ticularly useful for conveying an impression of the overall shape of a distribution
and for highlighting any clustering of cases in a particular range of scores. Many
graphing techniques are available, but we will examine just four. The first two,
pie and bar charts, are appropriate for discrete variables at any level of measure-
ment. The last two, histograms and line charts (or frequency polygons), are used
with both discrete and continuous interval-ratio variables but are particularly ap-
propriate for the latter.
The sections that follow explain how to construct graphs and charts “by
hand.” These days, however, computer programs are almost always used to pro-
duce graphic displays. Graphing software is sophisticated and flexible but also
relatively easy to use; if such programs are available to you, you should familiar-
ize yourself with them. The effort required to learn these programs will be repaid
in the quality of the final product. The SPSS for Windows section at the end of this
chapter includes a demonstration of how to produce bar charts and line charts.
Bar Charts. Like pie charts, bar charts are relatively straightforward. Con-
ventionally, the categories of the variable are arrayed along the horizontal axis
(or abscissa) and frequencies, or percentages if you prefer, along the vertical axis
(or ordinate). For each category of the variable, construct (or draw) a rectangle
of constant width and with a height that corresponds to the number of cases in
the category. The bar chart in Figure 2.2 reproduces the marital status data from
Figure 2.1 and Table 2.20.
The chart in Figure 2.2 would be interpreted in exactly the same way as the
pie chart in Figure 2.1, and researchers are free to choose between these two
methods of displaying data. However, if a variable has more than four or five
categories, the bar chart would be preferred. With too many categories, the pie
chart gets very crowded and loses its visual clarity. To illustrate, Figure 2.3 uses
a bar chart to display the data on visiting rates for the retirement community
presented in Table 2.19. A pie chart for these same data would have had 11 different
“slices,” a more complex or “busier” picture than that presented by the bar chart.
In Figure 2.3, the clustering of scores in the “20 to 24” range (approximately two
visits a month) is readily apparent, as are the groupings in the “0 to 4” and “50
to 54” ranges.

FIGURE 2.1 SAMPLE PIE CHART: MARITAL STATUS OF RESPONDENTS
[Pie chart: Single 50%, Married 35%, Divorced 15%]

TABLE 2.20 MARITAL STATUS OF RESPONDENTS, COUNSELING CENTER SURVEY (N = 20)

Status      Frequency (f)   Percentage (%)
Single      10              50
Married      7              35
Divorced     3              15
            N = 20          100%

FIGURE 2.2
[Bar chart of the marital status data in Table 2.20. Vertical axis: Frequency (0–12); horizontal axis: Marital status (Single, Married, Divorced).]
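Pie and bar charts are both driven by the same frequencies. As a rough, text-only sketch (graphing software would normally do the drawing), the code below converts each category of Table 2.20 into a percentage, the angle its pie slice would occupy (percentage of 360 degrees), and a bar whose length equals the frequency. The `#` bars and the function name `chart_rows` are illustrative stand-ins, not part of any charting library.

```python
def chart_rows(freqs):
    """For each category, return (label, f, percentage, pie-slice angle, text bar)."""
    N = sum(freqs.values())
    rows = []
    for label, f in freqs.items():
        pct = f / N * 100
        angle = pct / 100 * 360          # share of the 360-degree circle
        bar = "#" * f                    # bar length proportional to frequency
        rows.append((label, f, pct, angle, bar))
    return rows

marital_status = {"Single": 10, "Married": 7, "Divorced": 3}   # Table 2.20
for label, f, pct, angle, bar in chart_rows(marital_status):
    print(f"{label:<9}{bar:<11}f={f:<3}{pct:5.1f}%  {angle:5.1f} deg")
```

The slices work out to 180, 126, and 54 degrees, so the pie is half “Single.” With 11 categories, as in Table 2.19, the same code would produce many thin slices, which is the crowding problem described above.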
Bar charts are particularly effective ways to display the relative frequencies
for two or more categories of a variable when you want to emphasize some com-
parisons. Suppose, for example, that you wished to make a point about changing
rates of homicide victimization for white males and females since 1955. Figure 2.4
displays the data in a dramatic and easily comprehended way. The bar chart
shows that rates for males are higher than rates for females, that rates for both
sexes were highest in 1975, and that rates declined after that time. (For practice
in constructing and interpreting pie and bar charts, see problems 2.5b and 2.10.)
Histograms. Histograms look a lot like bar charts and, in fact, are con-
structed in much the same way. However, histograms use real limits rather than
stated limits, and the categories or scores of the variable border each other, as
if they merged into each other in a continuous series. Therefore, these graphs
are most appropriate for continuous interval-ratio-level variables, but they are
FIGURE 2.3 SAMPLE BAR CHART FOR VISITS PER YEAR, RETIREMENT COMMUNITY RESIDENTS (N = 90)
[Bar chart. Vertical axis: Frequency (0–20); horizontal axis: Number of visits, one bar per class interval from 0–4 to 50–54.]
FIGURE 2.4 HOMICIDE VICTIMIZATION RATES, 1955–2003 (selected rates, per 100,000 population, whites only)
[Bar chart. Vertical axis: Rate per 100,000 population (0–14); horizontal axis: Year (1955, 1965, 1975, 1985, 1995, 2000, 2003); paired bars for Males and Females.]
Source: U.S. Bureau of the Census. 2007. Statistical Abstract of the United States, 2007. Washington, D.C.: Government Printing Office. p. 195 (Available at: http://www.census.gov/prod/2006pubs/07statab/law.pdf)
[Histogram. Vertical axis: Frequency (0–250); horizontal axis: Age of respondent (15 to 90, in 5-year intervals).]
graph are 5 years wide, and their uneven heights reflect the varying number of
respondents for each 5-year group. The graph peaks around age 50, and the
sample has more respondents (higher bars) who are younger than 50 and fewer
respondents (lower bars) older than 50. Note also that there are no people in the
sample younger than age 18, the usual cutoff point for respondents to public-
opinion polls.
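Because histograms use real limits, the stated limits of a table such as Table 2.19 have to be converted first. For scores measured in whole numbers, the convention assumed in this sketch is to extend each stated limit by half a unit (0.5) in each direction; the midpoint is the same whichever set of limits you use. Adjacent intervals then share a real limit (4.5 is both the upper real limit of 0–4 and the lower real limit of 5–9), which is why histogram bars touch.

```python
def real_limits(lower, upper, unit=1):
    """Convert stated limits to real limits by extending half a measurement unit each way."""
    half = unit / 2
    return lower - half, upper + half

def midpoint(lower, upper):
    """Point exactly halfway between the limits of a class interval."""
    return (lower + upper) / 2

stated = [(lo, lo + 4) for lo in range(0, 55, 5)]     # 0-4, 5-9, ..., 50-54 (Table 2.19)
for lo, hi in stated:
    rl, ru = real_limits(lo, hi)
    print(f"{lo}-{hi}: real limits {rl} to {ru}, midpoint {midpoint(lo, hi)}")
```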
Line Charts. Construction of a line chart (or frequency polygon) is similar
to construction of a histogram. Instead of using bars to represent the frequencies,
however, use a dot at the midpoint of each interval. Straight lines then connect
the dots. Because the line is continuous from highest to lowest score, these graphs
are especially appropriate for continuous interval-ratio-level variables but are
frequently used with discrete interval-ratio-level variables. Figure 2.6 displays a
line chart for the visiting data previously displayed in the bar chart in Figure 2.3.
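The dots on a line chart are simply (midpoint, frequency) pairs. This sketch builds them for the visiting data in Table 2.19; joining each dot to the next with a straight line gives the polygon shown in Figure 2.6.

```python
freqs = [10, 5, 8, 12, 18, 12, 3, 2, 2, 6, 12]          # Table 2.19
intervals = [(lo, lo + 4) for lo in range(0, 55, 5)]     # stated limits 0-4 ... 50-54

# One dot per interval, placed over its midpoint at a height equal to its frequency
points = [((lo + hi) / 2, f) for (lo, hi), f in zip(intervals, freqs)]

# Straight line segments joining each dot to the next
segments = list(zip(points, points[1:]))

print(points[:3])      # [(2.0, 10), (7.0, 5), (12.0, 8)]
print(len(segments))   # 10
```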
Line charts can also be used to display trends across time. Figure 2.7 shows
both marriage and divorce rates per 1000 population for the United States since
1950. Note that both rates rose until the early 1980s and have been falling since,
with the marriage rate falling slightly faster.
Histograms and frequency polygons are alternative ways of displaying
essentially the same message. Thus, the choice between the two techniques is
left to the aesthetic preferences of the researcher. (For practice in constructing
and interpreting histograms and line charts, see problems 2.7b, 2.8e, 2.9d, 2.11,
and 2.12.)

FIGURE 2.6 NUMBER OF VISITS PER YEAR, RETIREMENT COMMUNITY RESIDENTS (N = 90)
[Line chart (frequency polygon). Vertical axis: Frequency; horizontal axis: Number of visits, class intervals 0–4 to 50–54.]

FIGURE 2.7 U.S. MARRIAGE AND DIVORCE RATES, 1950–2004 (rates per 1000 population)
[Line chart. Vertical axis: Rate (0–12); horizontal axis: Year (1950–2004); series: Marriage, Divorce.]
Source: U.S. Bureau of the Census. 2007. Statistical Abstract of the United States, 2007. Washington, D.C.: Government Printing Office. p. 63. (Available at: http://www.census.gov/prod/2006pubs/07statab/law.pdf)
2.8 INTERPRETING STATISTICS: USING PERCENTAGES, FREQUENCY DISTRIBUTIONS, CHARTS, AND GRAPHS TO ANALYZE CHANGING PATTERNS OF WORKPLACE SURVEILLANCE
A sizeable volume of statistical material has been introduced in this chapter, and
it will be useful to conclude by focusing on meaning and interpretation. What can
you say after you have calculated percentages, built a frequency distribution, or
constructed a graph or chart? Remember that statistics are tools to help us analyze
information and answer questions. They never speak for themselves, and they
always have to be understood in the context of some research question or test of
hypothesis. This section provides an example of interpretation by posing and
answering some questions from social science research. The interpretation (words)
will be explicitly linked to the statistics (numbers) so that you will be able to see
how and why conclusions are developed.
[Bar chart. Vertical axis: Percent “yes” (0–80); horizontal axis: Method of monitoring and surveillance.]

[Line chart. Vertical axis: Percent “yes” (0–60); horizontal axis: Year (1997–2005).]

Percent “Yes” by Type of Monitoring and Surveillance (columns: 1997, 1998, 1999, 2000, 2001, 2005)
You will find that the statistics covered in this chapter are frequently used in the research literature of the social sciences, as well as in the popular press and the media, and one of the goals of this text is to help you develop your skills in understanding and critically analyzing these types of statistical information. Fortunately, this task is usually quite straightforward, but these statistical tools are sometimes not as simple as they appear, and they can be misused.

Here are some ideas to keep in mind when reading research reports that use these statistics.

First, there are many different formats for presenting results, and the tables and graphs you find in the research literature will not necessarily follow the conventions used in this text. Second, because of space limitations, tables and graphs may be presented with a minimum of detail. For example, the researcher may present a frequency distribution with only a percentage column.

Begin your analysis by examining the statistics carefully. If you are reading a table or graph, first read the title, all labels (that is, row and/or column headings), and any footnotes. These will tell you exactly what information is being presented. Inspect the body of the table or graph with the author’s analysis in mind. See if you agree with the author’s analysis. (You almost always will, but it never hurts to double-check and exercise your critical abilities.)

Finally, remember that most research projects analyze interrelationships among many variables. Because the tables and graphs covered in this chapter display variables one at a time, they are unlikely to be included in such research reports (or perhaps, included only as background information). Even when such tables are not reported, you can be sure that the research began with an inspection of percentages, frequency distributions, or graphs for each variable. Univariate tables and graphs display a great deal of information about the variables in a compact, easily understood format and are almost universally used as descriptive devices.

STATISTICS IN THE PROFESSIONAL LITERATURE
Social scientists rely heavily on the U.S. census for information about the characteristics and trends of change in American society, including age composition, birthrates and death rates, residential patterns, educational levels, and a host of other variables. Census data are readily available (at www.census.gov), but since they represent information about the entire population (almost 290 million people), the numbers are often large, cumbersome, and awkward to use or understand. Thus, percentages, rates, and graphs are extremely useful statistical devices when analyzing or presenting census information.

Consider, for example, a recent report on the changing U.S. family.* The purpose of the report was to present information regarding the structure of the American family and to present and discuss recent changes and trends. Consider how this report might have read if the information had been given in words and raw numbers:

In 2003, there were about 57,320,000 married-couple households, 13,620,000 female-headed households, and 35,682,000 nonfamily households. Ten years earlier, in 1993, there were 52,457,000 married-couple households, 11,692,000 female-headed households, and 28,496,000 nonfamily households.

Can you distill any meaningful understandings about American family life from these sentences? Raw information simply does not speak for itself, and these facts have to be organized or placed in some context to reveal their meaning. Thus, social scientists almost always use percentages, rates, or graphs to present this kind of information so that they can understand it themselves, assess the meaning, and convey their interpretations to others.

In contrast with the foregoing raw information, consider the following table on family trends using percentages rather than raw numbers.

U.S. Households by Type, 1970 and 2003
                                    Percent of All Households
                                    1970        2003
Family households:
  Married couples with children     40.3%       23.3%
(continued next page)
MARITAL STATUS OF U.S. POPULATION, 1970–2003 (15 years of age or older, male)
[Line chart. Vertical axis: Percent (0–70); horizontal axis: Year (1970–2000); series: Married, Never Married, Divorced/Separated, Widowed.]

MARITAL STATUS OF U.S. POPULATION, 1970–2003 (15 years of age or older, female)
[Line chart. Vertical axis: Percent (0–70); horizontal axis: Year (1970–2000); series: Married, Never Married, Divorced/Separated, Widowed.]
As we have seen, graphs are almost always more efficient and understandable ways of expressing trends. Some of the fundamental changes in American family life are presented in the two line charts above, one for men and one for women. As you would expect, the graphs show essentially the same trends and, together with the frequency distribution, their message is pretty clear: The percentage of the population living in “married-couple households” is declining, and this is particularly true for married couples with children. Note also that the percentage of men and women who “never married” is increasing steadily. The author of the report attributes these changes to several factors, including people waiting longer to get married and a high divorce rate.

*Fields, Jason. 2004. “America’s Families and Living Arrangements: 2003.” U.S. Bureau of the Census: Current Population Reports. Available at http://www.census.gov/prod/2004pubs/p20-553.pdf
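One way to add context to the raw household counts quoted in the box is percent change (Formula 2.4 in the Summary of Formulas). The sketch below applies it to the 1993 and 2003 counts; it does not compute shares of all households, since the three household types quoted may not cover every type of household.

```python
def percent_change(f1, f2):
    """Formula 2.4: relative change from time 1 (f1) to time 2 (f2), times 100."""
    return (f2 - f1) / f1 * 100

households = {                      # (1993, 2003) counts quoted in the box
    "Married-couple": (52_457_000, 57_320_000),
    "Female-headed": (11_692_000, 13_620_000),
    "Nonfamily": (28_496_000, 35_682_000),
}

for kind, (y1993, y2003) in households.items():
    print(f"{kind:<15}{percent_change(y1993, y2003):6.2f}%")
```

Put this way, the raw counts become comparable: nonfamily households grew about 25% over the decade, female-headed households about 16%, and married-couple households about 9%.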
in monitoring and shows that monitoring of email messages, phone use, and
computer files has become particularly common.
Computers may be ubiquitous features of employment in our information-
age economy, but these data suggest that the potential for workplace monitor-
ing and surveillance is also increasing. The computer on your desk is a double-
edged sword. While it provides you with the necessary tools to do your job,
employers are increasingly using the very same technology to watch you.
SUMMARY
1. We considered several different ways of summarizing the distribution of a single variable and, more generally, reporting the results of our research. Our emphasis throughout was on the need to communicate our results clearly and concisely. You will often find that, as you strive to communicate statistical information to others, the meanings of the information will become clearer to you as well.
2. Percentages and proportions, ratios, rates, and percentage change represent several different ways to enhance clarity by expressing our results in terms of relative frequency. Percentages and proportions report the relative occurrence of some category of a variable compared with the distribution as a whole. Ratios compare two categories with each other, and rates report the actual occurrences of some phenomenon compared with the number of possible occurrences per some unit of time. Percentage change shows the relative increase or decrease in a variable over time.
3. Frequency distributions are tables that summarize the entire distribution of some variable. Statistical analysis almost always starts with the construction and review of these tables for each variable. Columns for percentages, cumulative frequency, and/or cumulative percentages often enhance the readability of frequency distributions.
4. Pie and bar charts, histograms, and line charts (or frequency polygons) are graphic devices used to express the basic information contained in the frequency distribution in a compact and visually dramatic way.
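SUMMARY OF FORMULAS
Proportions       2.1   $p = \frac{f}{N}$
Percentage        2.2   $\% = \left(\frac{f}{N}\right) \times 100$
Ratios            2.3   $\text{Ratio} = \frac{f_1}{f_2}$
Percent change    2.4   $\text{Percent change} = \left(\frac{f_2 - f_1}{f_1}\right) \times 100$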
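The four formulas translate directly into code. This sketch checks them against the chapter’s opening example (167 of 269 cases) and the marital status data in Table 2.20; the function names are illustrative.

```python
def proportion(f, N):
    """Formula 2.1: p = f / N."""
    return f / N

def percentage(f, N):
    """Formula 2.2: % = (f / N) * 100."""
    return f / N * 100

def ratio(f1, f2):
    """Formula 2.3: Ratio = f1 / f2."""
    return f1 / f2

def percent_change(f1, f2):
    """Formula 2.4: ((f2 - f1) / f1) * 100, where f1 is the earlier value."""
    return (f2 - f1) / f1 * 100

print(round(percentage(167, 269), 2))   # 62.08: cases with sentences of five years or more
print(round(proportion(7, 20), 2))      # 0.35: married respondents, Table 2.20
print(round(ratio(10, 7), 2))           # 1.43: single to married, Table 2.20
print(round(percent_change(52_457_000, 57_320_000), 2))   # 9.27: married-couple households, 1993 to 2003
```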
GLOSSARY
Bar chart. A graphic display device for discrete variables. Categories are represented by bars of equal width, the height of each corresponding to the number (or percentage) of cases in the category.
Class intervals. The categories used in the frequency distributions for interval-ratio variables.
Cumulative frequency. An optional column in a frequency distribution that displays the number of cases within an interval and all preceding intervals.
Cumulative percentage. An optional column in a frequency distribution that displays the percentage of cases within an interval and all preceding intervals.
Frequency distribution. A table that displays the number of cases in each category of a variable.
Frequency polygon. A graphic display device for interval-ratio variables. Class intervals are represented by dots placed over the midpoints, the height of each corresponding to the number (or percentage) of cases in the interval. All dots are connected by straight lines. Same as a line chart.
Histogram. A graphic display device for interval-ratio variables. Class intervals are represented by contiguous bars of equal width (equal to the class limits), the height of each corresponding to the number (or percentage) of cases in the interval.
Line chart. See Frequency polygon.
Midpoint. The point exactly halfway between the upper and lower limits of a class interval.
Percentage. The number of cases in a category of a variable divided by the number of cases in all categories of the variable, the entire quantity multiplied by 100.
Percent change. A statistic that expresses the magnitude of change in a variable from time 1 to time 2.
Pie chart. A graphic display device especially for discrete variables with only a few categories. A circle (the pie) is divided into segments proportional in size to the percentage of cases in each category of the variable.
Proportion. The number of cases in one category of a variable divided by the number of cases in all categories of the variable.
Rate. The number of actual occurrences of some phenomenon or trait divided by the number of possible occurrences per some unit of time.
Ratio. The number of cases in one category divided by the number of cases in some other category.
Real class limits. The class intervals of a frequency distribution when stated as continuous categories.
Stated class limits. The class intervals of a frequency distribution when stated as discrete categories.
PROBLEMS
2.1 SOC The tables that follow report the marital status of 20 respondents in two different apartment complexes. (HINT: Make sure that you have the correct numbers in the numerator and denominator before solving the following problems. For example, problem 2.1a asks for “the percentage of respondents who are married in each complex,” and the denominators will be 20 for these two fractions. Problem 2.1d, on the other hand, asks for the “percentage of the single respondents who live in Complex B,” and the denominator for this fraction will be 4 + 6, or 10.)

Status                           Complex A   Complex B
Married                               5          10
Unmarried (“living together”)         8           2
Single                                4           6
Separated                             2           1
Widowed                               0           1
Divorced                              1           0
                                     20          20

a. What percentage of the respondents in each complex are married?
b. What is the ratio of single to married respondents at each complex?
c. What proportion of each sample is widowed?
d. What percentage of the single respondents live in Complex B?
e. What is the ratio of the “unmarried/living together” to the “married” at each complex?

2.2 At St. Algebra College, the numbers of males and females in the various major fields of study are as follows:

Major              Males   Females   Totals
Humanities          117       83       200
Social sciences      97      132       229
Natural sciences     72       20        92
(continued next page)
a. Calculate the homicide rate per 100,000 population for each state and each province for each year. Relatively speaking, which state and which province had the highest homicide rates in each year? Which society seems to have the higher homicide rate? Write a paragraph describing these results.
b. Using the rates you calculated in part a, calculate the percent change between 1997 and 2005 for each state and each province. Which states and provinces had the largest increase and decrease? Which society seems to have the largest change in homicide rates? Summarize your results in a paragraph.

2.5 SOC The scores of 15 respondents on four variables are as reported next. These scores were taken from a public opinion survey called the General Social Survey, or the GSS. This data set is used for the computer exercises in this text. Small subsamples from the GSS will be used throughout the text to provide “real” data for problems. For the actual questions and other details, see Appendix G. The numerical codes for the variables are as follows:

Sex           Support for Gun Control   Level of Education     Age
1 = Male      1 = In favor              0 = Less than HS       Actual years
2 = Female    2 = Opposed               1 = HS
                                        2 = Jr. college
                                        3 = Bachelor’s
                                        4 = Graduate

Case                    Support for     Level of
Number   Sex            Gun Control     Education     Age
1        2              1               1             45
2        1              2               1             48
3        2              1               3             55
4        1              1               2             32
5        2              1               3             33
6        1              1               1             28
7        2              2               0             77
8        1              1               1             50
9        1              2               0             43
10       2              1               1             48
11       1              1               4             33
12       1              1               4             35
13       1              1               0             39
14       2              1               1             25
15       1              1               1             23

a. Construct a frequency distribution for each variable. Include a column for percentages.
b. Construct pie and bar charts to display the distributions of sex, support for gun control, and level of education.

2.6 SW A local youth service agency has begun a sex education program for teenage girls who have been referred by the juvenile courts. The girls were given a 20-item test for general knowledge about sex, contraception, and anatomy and physiology upon admission to the program and again after completing the program. The scores of the first 15 girls to complete the program are as follows.

Case   Pretest   Posttest      Case   Pretest   Posttest
A      8         12            I      5         7
B      7         13            J      15        12
C      10        12            K      13        20
D      15        19            L      4         5
E      10        8             M      10        15
F      10        17            N      8         11
G      3         12            O      12        20
H      10        11

Construct frequency distributions for the pretest and posttest scores. Include a column for percentages. (HINT: There were 20 items on the test, so the maximum range for these scores is 20. If you use 10 class intervals to display these scores, the interval size will be 2. Since there are no scores of 0 or 1 for either test, you may state the first interval as 2–3. To make comparisons easier, both frequency distributions should have the same intervals.)

2.7 SOC Sixteen high school students completed a class to prepare them for the College Boards. Their scores were as follows.

420   345   560   650
459   499   500   657
467   480   505   555
480   520   530   589

These same 16 students were given a test of math and verbal ability to measure their readiness for college-level work. Scores are reported here in terms of the percentage of correct answers for each test.

Math Test
67   45   68   70
72   85   90   99
50   73   77   78
52   66   89   75

Verbal Test
89   90   78   77
75   70   56   60
77   78   80   92
98   72   77   82

a. Display each of these variables in a frequency distribution with columns for percentages and cumulative percentages.
b. Construct a histogram and frequency polygon for these data.
c.² Find the upper and lower real limits for the intervals you established.

² This problem is optional.

2.8 GER Following are reported the number of times 25 residents of a community for senior citizens left their homes for any reason during the past week.

0    2    1    7    3
7    0    2    3    17
14   15   5    0    7
5    21   4    7    6
2    0    10   5    7

a. Construct a frequency distribution to display these data.
b. What are the midpoints of the class intervals?
c. Add columns to the table to display the percentage distribution, cumulative frequency, and cumulative percentages.
d.³ Find the real limits for the intervals you selected.
e. Construct a histogram and a frequency polygon to display this distribution.
f. Write a paragraph summarizing this distribution of scores.

³ This problem is optional.

2.9 SOC Twenty-five students completed a questionnaire that measured their attitudes toward interpersonal violence. Respondents who scored high believed that in many situations a person could legitimately use physical force against another person. Respondents who scored low believed that in no situation (or very few situations) could the use of violence be justified.

52   47   17   8    92
53   23   28   9    90
17   63   17   17   23
19   66   10   20   47
20   66   5    25   17

a. Construct a frequency distribution to display these data.
b. What are the midpoints of the class intervals?
c. Add columns to the table to display the percentage distribution, cumulative frequency, and cumulative percentage.
d. Construct a histogram and a frequency polygon to display these data.
e. Write a paragraph summarizing this distribution of scores.

2.10 PA/CJ As part of an evaluation of the efficiency of your local police force, you have gathered the following data on police response time to calls for assistance during two different years. (Response times were rounded off to whole minutes.) Convert both frequency distributions into percentages, and construct pie charts and bar charts to display the data. Write a paragraph comparing the changes in response time between the two years.

Response Time, 1995     Frequency (f)
21 minutes or more        35
16–20 minutes             75
11–15 minutes            180
6–10 minutes             375
Less than 6 minutes      210
                         875

Response Time, 2005     Frequency (f)
21 minutes or more        45
16–20 minutes             95
11–15 minutes            155
6–10 minutes             350
Less than 6 minutes      250
                         895

2.11 SOC Figures 2.10 through 2.12 display trends in crime in the United States over the last two decades. Write a paragraph describing each of these graphs. What similarities and differences can you observe among the three graphs? (For example, do crime rates always change in the same direction?) Note the differences in the vertical axes from chart to chart: for homicide the axis ranges from 0 to 12, while for burglary and auto theft the range is from 0 to 1600. The latter crimes are far more common, and a scale with larger intervals is needed to display the rates.

FIGURE 2.10 U.S. HOMICIDE RATES, 1984–2005 (per 100,000 population)
[Line chart. Vertical axis: Rate per 100,000 population (0–12); horizontal axis: Year (1984–2005).]

FIGURE 2.11 U.S. ROBBERY AND AGGRAVATED ASSAULT RATES, 1984–2005 (per 100,000 population)
[Line chart. Vertical axis: rate per 100,000 population (0–500); horizontal axis: Year (1984–2005); series: Robbery, Aggravated assault.]

FIGURE 2.12 U.S. BURGLARY AND CAR THEFT RATES, 1984–2005 (per 100,000 population)
[Line chart. Vertical axis: Rate per 100,000 population (0–1600); horizontal axis: Year (1984–2005); series: Burglary, Car theft.]

2.12 PA The city’s Department of Transportation has been keeping track of accidents on a particularly dangerous stretch of highway. Early in the year, the city lowered the speed limit on this highway and increased police patrols. Data on the number of accidents before and after the changes are presented here. Did the changes work? Is the