Descriptive Statistics


Part I Descriptive Statistics

Part I consists of four chapters, each devoted to a different application of univariate
descriptive statistics. Chapter 2 covers “basic” descriptive statistics, including
percentages, ratios, rates, frequency distributions, and graphs. It is a
lengthy chapter, but the material is relatively elementary and at least vaguely fa-
miliar to most people. Although the statistics covered in this chapter are “basic,”
they are not necessarily simple or obvious, and the explanations and examples
should be considered carefully before attempting the end-of-chapter problems or
using them in actual research.
Chapters 3 and 4 cover measures of central tendency and dispersion, respectively.
Measures of central tendency describe the typical case or average score
(e.g., the mean), while measures of dispersion describe the amount of variety or
diversity among the scores (e.g., the range, or the distance from the high score to
the low score). These two types of statistics are presented in separate chapters to
stress the point that centrality and dispersion are independent, separate charac-
teristics of a variable. You should realize, however, that both measures are nec-
essary and commonly reported together (along with some of the statistics pre-
sented in Chapter 2). To reinforce the idea that measures of centrality and
dispersion are complementary descriptive statistics, many of the problems at the
end of Chapter 4 require the computation of a measure of central tendency from
Chapter 3.
Chapter 5 is a pivotal chapter in the flow of the text. It takes some of the sta-
tistics from Chapters 2 through 4 and applies them to the normal curve, a concept
of great importance in statistics. The normal curve is a type of line chart or fre-
quency polygon (see Chapter 2), which can be used to describe the position of
scores using means (Chapter 3) and standard deviations (Chapter 4). Chapter 5
also uses proportions and percentages (Chapter 2).
In addition to its role in descriptive statistics, the normal curve is a central
concept in inferential statistics, the topic of Part II of this text. Thus, Chapter 5
serves a dual purpose: It ends the presentation of univariate descriptive statistics
and lays essential groundwork for the material to come.
2 Basic Descriptive Statistics
Percentages, Ratios and Rates,
Tables, Charts, and Graphs

LEARNING OBJECTIVES By the end of this chapter, you will be able to

1. Explain the purpose of descriptive statistics in making data comprehensible.


2. Compute and interpret percentages, proportions, ratios, rates, and percentage
change.
3. Construct and analyze frequency distributions for variables at each of the three
levels of measurement.
4. Construct and analyze bar and pie charts, histograms, and line graphs.

Research results do not speak for themselves. They must be organized and ma-
nipulated so that whatever meaning they have can be quickly and easily under-
stood by the researcher and by his or her readers. Researchers use statistics to
clarify their results and communicate effectively. In this chapter, we consider
some commonly used techniques for presenting research results: percentages
and proportions, ratios and rates, percentage change, tables, charts, and graphs.
Mathematically speaking, these univariate descriptive statistics are not very com-
plex (although they are not as simple as they might seem at first glance), but they
are extremely useful for presenting research results clearly and concisely.

2.1 PERCENTAGES AND PROPORTIONS

Consider the following statement: “Of the 269 cases handled by the court, 167
resulted in prison sentences of five years or more.” While there is nothing wrong
with this statement, the same fact could have been more clearly conveyed if it
had been reported as a percentage: “About 62% of all cases resulted in prison
sentences of five or more years.”
Percentages and proportions supply a frame of reference for reporting re-
search results, in the sense that they standardize the raw data: percentages to
the base 100 and proportions to the base 1.00. The mathematical definitions of
proportions and percentages are

FORMULA 2.1 Proportion: $p = \frac{f}{N}$

FORMULA 2.2 Percentage: $\% = \left(\frac{f}{N}\right) \times 100$

where f = frequency, or the number of cases in any category
      N = the number of cases in all categories
To illustrate the computation of percentages, consider the data presented
in Table 2.1.

TABLE 2.1 DISPOSITION OF 269 CRIMINAL CASES (fictitious data)*

Sentence               Frequency (f)   Percentage (%)   Proportion (p)
Five years or more          167             62.08            0.62
Less than five years         72             26.77            0.27
Suspended                    20              7.44            0.07
Acquitted                    10              3.72            0.04
                        N = 269            100.01%           1.00

*The slight discrepancy in the totals of the percentage column is due to rounding error. See the Preface
(Basic Mathematical Review) for more on rounding.

How can we find the percentage of cases in the first category (sentences of five
years or more)? Note that there are 167 cases in the category (f = 167) and a
total of 269 cases in all (N = 269). So

$$\text{Percentage (\%)} = \left(\frac{f}{N}\right) \times 100 = \left(\frac{167}{269}\right) \times 100 = (0.6208) \times 100 = 62.08\%$$
Using the same procedures, we can also find the percentage of cases in the second
category:

$$\text{Percentage (\%)} = \left(\frac{f}{N}\right) \times 100 = \left(\frac{72}{269}\right) \times 100 = (0.2677) \times 100 = 26.77\%$$

Both results could have been expressed as proportions. For example, the proportion
of cases in the third category is 0.07:

$$\text{Proportion } (p) = \frac{f}{N} = \frac{20}{269} = 0.07$$
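
Formulas 2.1 and 2.2 can be checked with a few lines of code. The following is a minimal sketch in Python; the function names `proportion` and `percentage` are illustrative assumptions, not part of any library, and the frequencies are the first three rows of Table 2.1.

```python
def proportion(f, n):
    """Formula 2.1: the share of cases in a category, on a base of 1.00."""
    if n <= 0:
        raise ValueError("N must be a positive number of cases")
    return f / n


def percentage(f, n):
    """Formula 2.2: the proportion rescaled to a base of 100."""
    return proportion(f, n) * 100


N = 269  # total number of cases in Table 2.1

print(round(percentage(167, N), 2))  # five years or more   -> 62.08
print(round(percentage(72, N), 2))   # less than five years -> 26.77
print(round(proportion(20, N), 2))   # suspended            -> 0.07
```

Rounding is applied only when the result is reported, so the unrounded value remains available for further computation.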
Percentages and proportions are easier to read and comprehend than fre-
quencies. This advantage is particularly obvious when attempting to compare
groups of different sizes. For example, based on the information presented in
Table 2.2, which college has the higher relative number of social science ma-
jors? Because the total enrollments are so different, comparisons are difficult to
make from the raw frequencies. Computing percentages eliminates the differ-
ence in size of the two campuses by standardizing both distributions to the base
of 100. The same data are presented in percentages in Table 2.3.
TABLE 2.2 DECLARED MAJOR FIELDS ON TWO COLLEGE CAMPUSES (fictitious data)

Major               College A    College B
Business                103         3120
Natural sciences         82         2799
Social sciences         137         1884
Humanities               93         2176
                    N = 415     N = 9979

Application 2.1

Not long ago, in a large social service agency, the following conversation took
place between the executive director of the agency and a supervisor of one of the
divisions.

Executive director: Well, I don’t want to seem abrupt, but I’ve only got a few
minutes. Tell me, as briefly as you can, about this staffing problem you claim to
be having.

Supervisor: Ma’am, we just don’t have enough people to handle our workload. Of
the 177 full-time employees of the agency, only 50 are in my division. Yet, 6231
of the 16,722 cases handled by the agency last year were handled by my division.

Executive director (smothering a yawn): Very interesting. I’ll certainly get back
to you on this matter.

How could the supervisor have presented his case more effectively? Because he
wants to compare two sets of numbers (his staff versus the total staff and the
workload of his division versus the total workload of the agency), proportions or
percentages would be a more forceful way of presenting results. What if the
supervisor had said, “Only 28.25% of the staff is assigned to my division, but
we handle 37.26% of the total workload of the agency”? Is this a clearer message?

The first percentage is found by

$$\% = \left(\frac{f}{N}\right) \times 100 = \left(\frac{50}{177}\right) \times 100 = (0.2825) \times 100 = 28.25\%$$

and the second percentage is found by

$$\% = \left(\frac{f}{N}\right) \times 100 = \left(\frac{6231}{16{,}722}\right) \times 100 = (0.3726) \times 100 = 37.26\%$$

The percentages in Table 2.3 make it easier to identify differences as well as
similarities between the two colleges. College A has a much higher percentage of
social science majors (even though the absolute number of social science majors
is less than at College B) and about the same percentage of humanities majors.
How would you describe the differences in the remaining two major fields? (For
practice in computing and interpreting percentages and proportions, see problems
2.1 and 2.2.)
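
The step from the raw frequencies in Table 2.2 to the percentages in Table 2.3 can be sketched as follows. This is a minimal illustration in Python; the helper name `percent_distribution` is hypothetical, and the frequencies are copied from Table 2.2.

```python
def percent_distribution(freqs):
    """Convert a dict of category frequencies to percentages of the total."""
    total = sum(freqs.values())
    return {category: round(100 * f / total, 2) for category, f in freqs.items()}


college_a = {"Business": 103, "Natural sciences": 82,
             "Social sciences": 137, "Humanities": 93}
college_b = {"Business": 3120, "Natural sciences": 2799,
             "Social sciences": 1884, "Humanities": 2176}

pct_a = percent_distribution(college_a)
pct_b = percent_distribution(college_b)

# Print the two distributions side by side, one major per line.
for major in college_a:
    print(f"{major:<18}{pct_a[major]:>8.2f}%{pct_b[major]:>8.2f}%")
```

Because both columns are now on the base of 100, the difference in enrollment (415 versus 9979) no longer obscures the comparison.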
TABLE 2.3 DECLARED MAJOR FIELDS ON TWO COLLEGE CAMPUSES (fictitious data)

Major               College A    College B
Business              24.82%       31.27%
Natural sciences      19.76%       28.05%
Social sciences       33.01%       18.88%
Humanities            22.41%       21.81%
                     100.00%      100.01%
                       (415)       (9979)

ONE STEP AT A TIME Finding Percentages and Proportions

Step 1: Determine the values for f (number of cases in a category) and N (number
of cases in all categories). Remember that f will be the number of cases in a
specific category (e.g., males on your campus) and N will be the number of cases
in all categories (e.g., all students, males and females, on your campus) and that
f will be smaller than N, except when the category and the entire group are the
same (e.g., when all students are male). Proportions cannot exceed 1.00, and
percentages cannot exceed 100.00%.

Step 2: For a proportion, divide f by N.

Step 3: For a percentage, multiply the value you calculated in step 2 by 100.

Here are some further guidelines on the use of percentages and proportions.

1. When working with a small number of cases (say, fewer than 20), it is usually
preferable to report the actual frequencies rather than percentages or proportions.
With a small number of cases, the percentages can change drastically
with relatively minor changes in the size of the data set. For example, if you
begin with a group of 10 males and 10 females (that is, 50% of each gender)
and then add another female, the percentage distributions will change noticeably
to 52.38% female and 47.62% male. Of course, as the number of observations
increases, each additional case will have a smaller impact. If we
started with 500 students, half male and half female, and then added one more
female, the percentage of females would change by only a tenth of a percentage
point (from 50% to 50.10%).
2. Always report the number of observations along with proportions and per-
centages. This permits the reader to judge the adequacy of the sample size
and, conversely, helps to prevent the researcher from lying with statistics.
Statements like “Two out of three people questioned prefer courses in statis-
tics to any other course” might impress you, but the claim would lose its gloss
if you learned that only three people were tested. You should be extremely sus-
picious of reports that fail to report the number of cases that were tested.
3. Percentages and proportions can be calculated for variables at the ordinal and
nominal levels of measurement, in spite of the fact that they require division.
This is not a violation of the level-of-measurement guideline (see Table 1.2).
Percentages and proportions do not require the division of the scores of the
variable (as would be the case in computing the average score on a test, for
example) but rather the number of cases in a particular category (f ) of the vari-
able by the total number of cases in the sample (N). When we make a state-
ment like “43% of the sample is female,” we are merely expressing the relative
size of a category (female) of the variable (gender) in a convenient way.
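
The first guideline is easy to verify numerically. The sketch below is a minimal illustration in Python; the function name `pct_female` is hypothetical, and the larger group is assumed to be 500 students split evenly (250 males and 250 females), which is the reading that reproduces the 50.10% figure.

```python
def pct_female(males, females):
    """Percentage of a group that is female (Formula 2.2)."""
    return 100 * females / (males + females)


# Small group: 10 males and 10 females, then one more female.
small_before = pct_female(10, 10)
small_after = pct_female(10, 11)

# Larger group: assumed to be 250 males and 250 females, then one more female.
large_before = pct_female(250, 250)
large_after = pct_female(250, 251)

print(round(small_before, 2), round(small_after, 2))
print(round(large_before, 2), round(large_after, 2))
```

One added case shifts the small group by more than two percentage points but shifts the larger group by only about a tenth of a point.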

2.2 RATIOS, RATES, AND PERCENTAGE CHANGE

Ratios, rates, and percentage change provide some additional ways of summarizing
results simply and clearly. Although they are similar to each other, each
statistic has a specific application and purpose.

Ratios. Ratios are especially useful for comparing categories of a variable in
terms of relative frequency. Instead of standardizing the distribution of the
variable to the base 100 or 1.00, as we did in computing percentages and proportions,
we determine ratios by dividing the frequency of one category by the frequency
in another. Mathematically, a ratio can be defined as

FORMULA 2.3 $\text{Ratio} = \frac{f_1}{f_2}$

where f1 = the number of cases in the first category
      f2 = the number of cases in the second category

To illustrate the use of ratios, suppose that you were interested in the relative sizes
of the various religious denominations and found that a particular community
included 1370 Protestant families and 930 Catholic families. To find the ratio of
Protestants (f1) to Catholics (f2), divide 1370 by 930:

$$\text{Ratio} = \frac{f_1}{f_2} = \frac{1370}{930} = 1.47$$

The ratio of 1.47 means that there are 1.47 Protestant families for every Catholic
family.
Ratios can be very economical ways of expressing the relative predominance
of two categories. That Protestants outnumber Catholics in our example
is obvious from the raw data. Percentages or proportions could have been used
to summarize the overall distribution (e.g., “59.57% of the families were Protestant,
40.43% were Catholic”). In contrast to these other methods, ratios express
the relative size of the categories: They tell us exactly how much one category
outnumbers the other.
Ratios are often multiplied by some power of 10 to eliminate decimal
points. For example, the ratio just computed might be multiplied by 100 and reported
as 147 instead of 1.47. This would mean that there are 147 Protestant
families for every 100 Catholic families in the community. To ensure clarity, the
comparison units for the ratio are often expressed as well. Based on a unit of
ones, the ratio of Protestants to Catholics would be expressed as 1.47:1. Based
on hundreds, the same statistic might be expressed as 147:100. (For practice in
computing and interpreting ratios, see problems 2.1 and 2.2.)

Application 2.2

In Table 2.2, how many natural science majors are there compared to social
science majors at College B? This question could be answered with frequencies,
but a more easily understood way of expressing the answer would be with a ratio.
The ratio of natural science to social science majors would be

$$\text{Ratio} = \frac{f_1}{f_2} = \frac{2799}{1884} = 1.49$$

For every social science major, there are 1.49 natural science majors at College B.
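
Formula 2.3, including the optional scaling by a power of 10, can be sketched in a few lines. This is a minimal illustration in Python; the function name `ratio` and its `base` parameter are assumptions introduced here for the example.

```python
def ratio(f1, f2, base=1):
    """Formula 2.3: cases in the first category per `base` cases in the second."""
    if f2 == 0:
        raise ValueError("the second category must contain at least one case")
    return f1 / f2 * base


protestant, catholic = 1370, 930

print(round(ratio(protestant, catholic), 2))    # per 1 Catholic family     -> 1.47
print(round(ratio(protestant, catholic, 100)))  # per 100 Catholic families -> 147
print(round(ratio(2799, 1884), 2))              # College B, Table 2.2      -> 1.49
```

Note that the order of the arguments matters: `ratio(catholic, protestant)` answers a different question (Catholic families per Protestant family).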

Rates. Rates provide still another way of summarizing the distribution of a
single variable. Rates are defined as the number of actual occurrences of some
phenomenon divided by the number of possible occurrences per some unit of
time. Rates are usually multiplied by some power of 10 to eliminate decimal
points. For example, the crude death rate for a population is defined as the number
of deaths in that population (actual occurrences) divided by the number of
people in the population (possible occurrences) per year. This quantity is then
multiplied by 1000. The formula for the crude death rate can be expressed as

$$\text{Crude death rate} = \frac{\text{Number of deaths}}{\text{Total population}} \times 1000$$

If there were 100 deaths during a given year in a town of 7000, the crude death
rate for that year would be

$$\text{Crude death rate} = \frac{100}{7000} \times 1000 = (0.01429) \times 1000 = 14.29$$

Or, for every 1000 people, there were 14.29 deaths during this particular year.
In the same way, if a city of 237,000 people experienced 120 auto thefts during
a particular year, the auto theft rate would be

$$\text{Auto theft rate} = \frac{120}{237{,}000} \times 100{,}000 = (0.0005063) \times 100{,}000 = 50.63$$

Or, for every 100,000 people, there were 50.63 auto thefts during the year in
question. (For practice in computing and interpreting rates, see problems 2.3 and 2.4a.)

Application 2.3

In 2005, there were 2500 births in a city of 167,000. In 1965, when the population
of the city was only 133,000, there were 2700 births. Is the birthrate rising or
falling? Although this question can be answered from the preceding information,
the trend in birthrates will be much more obvious if we compute birthrates for
both years. Like crude death rates, crude birthrates are usually multiplied by
1000 to eliminate decimal points. For 1965:

$$\text{Crude birthrate} = \frac{2700}{133{,}000} \times 1000 = 20.30$$

In 1965, there were 20.30 births for every 1000 people in the city. For 2005:

$$\text{Crude birthrate} = \frac{2500}{167{,}000} \times 1000 = 14.97$$

In 2005, there were 14.97 births for every 1000 people in the city. With the help
of these statistics, the decline in the birthrate is clearly expressed.
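
The rate computations in this section all follow one pattern, which can be sketched as a single function. This is a minimal illustration in Python; the function name `rate` and the parameter name `per` are assumptions made for the example, and the figures are the fictitious ones used in the text.

```python
def rate(actual, possible, per=1000):
    """Actual occurrences divided by possible occurrences, scaled by `per`."""
    if possible <= 0:
        raise ValueError("the number of possible occurrences must be positive")
    return actual / possible * per


death_rate = rate(100, 7000)              # deaths per 1000 population
theft_rate = rate(120, 237000, 100000)    # auto thefts per 100,000 population
birth_1965 = rate(2700, 133000)           # births per 1000 population, 1965
birth_2005 = rate(2500, 167000)           # births per 1000 population, 2005

print(round(death_rate, 2), round(theft_rate, 2))
print(round(birth_1965, 2), round(birth_2005, 2))
```

The multiplier is a reporting convention only; the underlying quotient is the same whichever power of 10 is chosen.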

Percentage Change. Measuring social change, in all its variety, is an important
task for all social sciences. One very useful statistic for this purpose is the
percentage change, which tells us how much a variable has increased or decreased
over a certain span of time.
To compute this statistic, we need the scores of a variable at two different
points in time. The scores could be in the form of frequencies, rates, or percentages.
The percentage change will tell us how much the score has changed at the
later time relative to the earlier time. Using death rates as an example once again,
imagine a society suffering from a devastating outbreak of disease in which
the death rate rose from 16 deaths per 1000 population in 1995 to 24 deaths per
1000 in 2005. Clearly, the death rate is higher in 2005, but by how much relative
to 1995?
The formula for the percent change is

FORMULA 2.4 $\text{Percent change} = \left(\frac{f_2 - f_1}{f_1}\right) \times 100$

where f1 = first score, frequency, or value
      f2 = second score, frequency, or value

In our example, f1 is the death rate in 1995 (f1 = 16) and f2 is the death rate in
2005 (f2 = 24). The formula tells us to subtract the earlier score from the later and
then divide by the earlier score. The value that results expresses the size of the
change in scores (f2 - f1) relative to the score at the earlier time (f1). The value
is then multiplied by 100 to express the change in the form of a percentage:

$$\text{Percent change} = \left(\frac{24 - 16}{16}\right) \times 100 = \left(\frac{8}{16}\right) \times 100 = (0.50) \times 100 = 50\%$$

The death rate in 2005 is 50% higher than in 1995. This means that the 2005 rate
was equal to the 1995 rate plus half of the earlier score. If the rate had risen
to 32 deaths per 1000, the percent change would have been 100% (the rate
would have doubled), and if the death rate had fallen to 8 per 1000, the percent
change would have been -50%. Note the negative sign: It means that the death
rate has decreased by 50%. The 2005 rate would have been half the size of the
1995 rate.

Application 2.4

The American family has been changing rapidly over the past several decades.
One major change has been an increase in the number of married women and
mothers with jobs outside the home. For example, in 1975, 36.7% of women with
children under the age of 6 worked outside the home. In 2005, this percentage had
risen to 68.4%.* How large has this change been?
It is obvious that the 2005 percentage is much higher, and calculating the
percentage change will give us an exact idea of the magnitude of the change. The
1975 percentage is f1 and the 2005 figure is f2, so

$$\text{Percent change} = \left(\frac{68.4 - 36.7}{36.7}\right) \times 100 = \left(\frac{31.7}{36.7}\right) \times 100 = (0.86376) \times 100 = 86.38\%$$

In the 30-year period between 1975 and 2005, the percentage of women with
children younger than 6 who worked outside the home increased by 86.38%. This is
an extremely large change (approaching 100%, or double the earlier percentage)
in a short time frame and signals major changes in this social institution.

*U.S. Bureau of the Census. 2007. Statistical Abstract of the United States, 2007.
Washington, DC: Government Printing Office. p. 380.
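
Formula 2.4 translates directly into code. The following is a minimal sketch in Python; the function name `percent_change` is an assumption made for illustration, and the inputs are the death rates and labor-force percentages used in this section.

```python
def percent_change(f1, f2):
    """Formula 2.4: change from the earlier score f1 to the later score f2,
    expressed as a percentage of f1. A negative result means a decrease."""
    if f1 == 0:
        raise ValueError("percent change is undefined when the earlier score is 0")
    return (f2 - f1) / f1 * 100


print(percent_change(16, 24))   # rate rose from 16 to 24   -> 50.0
print(percent_change(16, 32))   # rate doubled              -> 100.0
print(percent_change(16, 8))    # rate fell by half         -> -50.0
print(round(percent_change(36.7, 68.4), 2))  # 1975 to 2005 -> 86.38
```

The sign of the result carries the direction of the change, so it should be kept when the statistic is reported.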
An additional example should make the computation and interpretation of
the percentage change clearer. Suppose we wanted to compare the projected
population growth rates for various nations over the next 50 years. The necessary
information is presented in Table 2.4, which shows the actual population for each
nation in 2000 and the projected population for 2050. The “Increase/Decrease”
column shows how many people will be added or lost over the 50-year time span.
Casual inspection will give us some information about population trends. For
example, compare the “Increase/Decrease” column for China and the United
States. These societies are projected to add roughly similar numbers of people
(about 155 million for China, a little less for the United States), but, since China’s
2000 population is about five times the size of the population of the United States,
its percent change will be much lower (about 12% vs. almost 50%).

TABLE 2.4 PROJECTED POPULATION GROWTH FOR SIX NATIONS, 2000–2050

Nation           Population,      Population,      Increase/Decrease   Percent
                 2000 (f1)        2050 (f2)        (f2 - f1)           Change
China            1,268,853,000    1,424,162,000      155,309,000         12.24
United States      282,339,000      420,081,000      137,742,000         48.79
Canada              31,278,000       41,430,000       10,152,000         32.46
Mexico              99,927,000      147,908,000       47,981,000         48.02
Italy               57,719,000       50,390,000       -7,329,000        -12.70
Nigeria            114,307,000      356,524,000      242,217,000        211.90

Source: U.S. Bureau of the Census: http://www.census.gov/ipc/www/idbsum.html

Calculating percent change will make comparisons more precise. The right-
hand column shows the percent change in projected population for each nation.
These values were computed by subtracting the 2000 population ( f1) from the
2050 population ( f2), dividing by the 2000 population, and multiplying by 100.
Although China has the largest population of these six nations, it will grow
at the slowest rate (12.24%). The United States and Mexico will increase by about
50% (in 2050, their populations will be half again larger than in 2000) and Can-
ada will grow by about one-third. Italy will actually lose people and its popula-
tion will decline by over 12%. Nigeria has by far the highest growth rate: It will
add the most people and its population will increase in size by over 200%.
This means that in 2050 the population of Nigeria will be more than three times
its 2000 size. (For practice in computing and interpreting percent change, see
problem 2.4b.)
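
The right-hand columns of Table 2.4 can be reproduced from the two population columns. The sketch below is a minimal illustration in Python; the variable names are hypothetical, and the population figures are copied from Table 2.4.

```python
# (population in 2000, projected population in 2050), from Table 2.4
populations = {
    "China": (1_268_853_000, 1_424_162_000),
    "United States": (282_339_000, 420_081_000),
    "Canada": (31_278_000, 41_430_000),
    "Mexico": (99_927_000, 147_908_000),
    "Italy": (57_719_000, 50_390_000),
    "Nigeria": (114_307_000, 356_524_000),
}

changes = {}
for nation, (f1, f2) in populations.items():
    increase = f2 - f1                  # negative when the population shrinks
    pct = round(increase / f1 * 100, 2)  # Formula 2.4
    changes[nation] = (increase, pct)
    print(f"{nation:<15}{increase:>15,}{pct:>9.2f}")
```

Sorting `changes` by the percentage rather than by the raw increase would rank the nations by growth rate, which is the comparison the text is making.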

ONE STEP AT A TIME Finding Ratios, Rates, and Percent Change

Ratios

Step 1: Determine the values for f1 and f2. The value for f1 will be the number of
cases in the first category (e.g., the number of males on your campus), and the
value for f2 will be the number of cases in the second category (e.g., the number
of females on your campus).

Step 2: Divide the value of f1 by the value of f2.

Rates

Step 1: Determine the number of actual occurrences (e.g., births, deaths,
homicides, assaults). This value will be the numerator.

Step 2: Determine the number of possible occurrences. This value will usually be
the total population for the area in question.

Step 3: Divide the number of actual occurrences (step 1) by the number of possible
occurrences (step 2).

Step 4: Multiply the value you calculated in step 3 by some power of 10.
Conventionally, birthrates and death rates are multiplied by 1000 and crime rates
are multiplied by 100,000.

Percent Change

Step 1: Determine the values for f1 and f2. The former will be the score at time 1
(the earlier time) and the latter will be the score at time 2 (the later time).

Step 2: Subtract f1 from f2.

Step 3: Divide the quantity you found in step 2 by f1.

Step 4: Multiply the quantity you found in step 3 by 100.

2.3 FREQUENCY DISTRIBUTIONS: INTRODUCTION

Frequency distributions are tables that summarize the distribution of a variable
by reporting the number of cases contained in each category of the variable.
They are very helpful and commonly used ways of organizing and working with
data. In fact, the construction of frequency distributions is almost always the first
step in any statistical analysis.
To illustrate the usefulness of frequency distributions and to provide some
data for examples, assume that the counseling center at a university is assessing
the effectiveness of its services. Any realistic evaluation research would collect a
variety of information from a large group of students, but, for the sake of this ex-
ample, we will confine our attention to just four variables and 20 students. The
data are reported in Table 2.5.
Note that, even though the data in Table 2.5 represent an unrealistically low
number of cases, it is difficult to discern any patterns or trends. For example, try
to ascertain the general level of satisfaction of the students from Table 2.5. You
may be able to do so with just 20 cases, but it will take some time and effort. Imag-
ine the difficulty with 50 cases or 100 cases presented in this fashion. Clearly the
data need to be organized in a format that allows the researcher (and his or her
audience) to understand easily the distribution of the variables.
One general rule that applies to all frequency distributions is that the cate-
gories of the frequency distribution must be exhaustive and mutually exclusive.
In other words, the categories must be stated in a way that permits each case to
be counted in one and only one category. This basic principle applies to the con-
struction of frequency distributions for variables measured at all three levels of
measurement.
Beyond this rule, there are only guidelines to help you construct useful frequency
distributions. As you will see, the researcher has a fair amount of discretion
in stating the categories of the frequency distribution (especially with variables
measured at the interval-ratio level). I will identify the issues to consider as
you make decisions about the nature of any particular frequency distribution.
Ultimately, however, the guidelines I state are aids for decision-making, nothing
more than helpful suggestions. As always, the researcher has the final responsibility
for making sensible decisions and presenting his or her data in a meaningful
way.

TABLE 2.5 DATA FROM COUNSELING CENTER SURVEY

                                        Satisfaction
Student    Sex       Marital Status     with Services    Age
A          Male      Single                  4           18
B          Male      Married                 2           19
C          Female    Single                  4           18
D          Female    Single                  2           19
E          Male      Married                 1           20
F          Male      Single                  3           20
G          Female    Married                 4           18
H          Female    Single                  3           21
I          Male      Single                  3           19
J          Female    Divorced                3           23
K          Female    Single                  3           24
L          Male      Married                 3           18
M          Female    Single                  1           22
N          Female    Married                 3           26
O          Male      Single                  3           18
P          Male      Married                 4           19
Q          Female    Married                 2           19
R          Male      Divorced                1           19
S          Female    Divorced                3           21
T          Male      Single                  2           20

2.4 FREQUENCY DISTRIBUTIONS FOR VARIABLES MEASURED AT THE NOMINAL AND ORDINAL LEVELS

Nominal-Level Variables. For nominal-level variables, constructing frequency
distributions is typically very straightforward. Count the number of times
each category or score of the variable occurred and then display the frequencies
in table format. Table 2.6 displays a frequency distribution for the variable “sex”
from the counseling center survey. For purposes of illustration, a column for tallies
has been included in this table to show how the scores could be counted.
This column would not be included in the final form of the frequency distribution.
Note that the table has a descriptive title, clearly labeled categories (male
and female), and a report of the total number of cases at the bottom of the frequency
column. These items must be included in all tables regardless of the variable
or level of measurement. The meaning of the table is quite clear. There are
10 males and 10 females in the sample, a fact that is much easier to comprehend
from the frequency distribution than from the unorganized data presented in
Table 2.5.
For some nominal variables, the researcher might have to make some choices
about the number of categories he or she wishes to report. For example, the dis-
tribution of the variable “marital status” could be reported using the categories
listed in Table 2.5. Table 2.7 presents the resultant frequency distribution. Al-
though this is a perfectly fine frequency distribution, it may be too detailed for
some purposes. For example, the researcher might want to focus on unmarried
as opposed to married students. That is, the researcher might not be concerned
with the difference between single and divorced respondents but may want to
treat both as simply “not married.” In that case, these categories could be grouped
together and treated as a single entity, as in Table 2.8. Notice that, when categories
are collapsed like this, information and detail will be lost. This latter version of
the table would not allow the researcher to discriminate between the two un-
married states.
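
Counting cases by category, and then collapsing categories, can be sketched with Python's standard library. This is a minimal illustration: the list `marital_status` reproduces the "Marital Status" column of Table 2.5 in student order (A through T), and the names `detailed` and `collapsed` are hypothetical.

```python
from collections import Counter

# Marital status of students A through T, copied from Table 2.5.
marital_status = [
    "Single", "Married", "Single", "Single", "Married",
    "Single", "Married", "Single", "Single", "Divorced",
    "Single", "Married", "Single", "Married", "Single",
    "Married", "Married", "Divorced", "Divorced", "Single",
]

# Detailed distribution: one category per status.
detailed = Counter(marital_status)

# Collapsed distribution: single and divorced are both "Not married".
collapsed = Counter(
    "Married" if status == "Married" else "Not married"
    for status in marital_status
)

for name, dist in (("Detailed", detailed), ("Collapsed", collapsed)):
    print(name)
    for category, f in dist.items():
        print(f"  {category:<12}{f:>3}")
    print(f"  {'N =':<12}{sum(dist.values()):>3}")
```

The collapsed counts cannot be turned back into the detailed ones, which is the loss of information described above.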

TABLE 2.6 SEX OF RESPONDENTS, COUNSELING CENTER SURVEY

Sex        Tallies        Frequency (f)
Male       //// ////           10
Female     //// ////           10
                           N = 20

TABLE 2.7 MARITAL STATUS OF RESPONDENTS, COUNSELING CENTER SURVEY

Status       Frequency (f)
Single            10
Married            7
Divorced           3
              N = 20

TABLE 2.8 MARITAL STATUS OF RESPONDENTS, COUNSELING CENTER SURVEY

Status         Frequency (f)
Married              7
Not married         13
                N = 20

Ordinal-Level Variables. Frequency distributions for ordinal-level variables
are constructed following the same routines used for nominal-level variables.
Table 2.9 reports the frequency distribution of the “satisfaction” variable from the
counseling center survey. Note that a column of percentages by category has
been added to this table. Such columns heighten the clarity of the table (especially
with larger samples) and are common adjuncts to the basic frequency
distribution for variables measured at all levels. This table reports that most
students were either satisfied or very satisfied with the services of the counseling
center. The most common response (nearly half the sample) was “satisfied.” If the
researcher wanted to emphasize this major trend, the categories could be collapsed
as in Table 2.10. Again, the price paid for this increased compactness is
that some information (in this case, the exact breakdown of degrees of satisfaction
and dissatisfaction) is lost. (For practice in constructing and interpreting
frequency distributions for nominal- and ordinal-level variables, see problem 2.5.)

TABLE 2.9 SATISFACTION WITH SERVICES, COUNSELING CENTER SURVEY

Satisfaction             Frequency (f)   Percentage (%)
(4) Very satisfied             4               20
(3) Satisfied                  9               45
(2) Dissatisfied               4               20
(1) Very dissatisfied          3               15
                          N = 20              100%

TABLE 2.10 SATISFACTION WITH SERVICES, COUNSELING CENTER SURVEY

Satisfaction     Frequency (f)   Percentage (%)
Satisfied             13               65
Dissatisfied           7               35
                  N = 20              100%
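
A frequency distribution with a percentage column, and its collapsed version, can be sketched as follows. This is a minimal illustration in Python; the list `satisfaction` reproduces the "Satisfaction with Services" column of Table 2.5 in student order (A through T), and the helper name `with_percentages` is hypothetical.

```python
from collections import Counter

# Satisfaction scores of students A through T, copied from Table 2.5
# (4 = very satisfied, 3 = satisfied, 2 = dissatisfied, 1 = very dissatisfied).
satisfaction = [4, 2, 4, 2, 1, 3, 4, 3, 3, 3, 3, 3, 1, 3, 3, 4, 2, 1, 3, 2]


def with_percentages(scores):
    """Return {category: (frequency, percentage)} for a list of scores."""
    counts = Counter(scores)
    n = len(scores)
    return {cat: (f, 100 * f / n) for cat, f in counts.items()}


# Four-category distribution, listed from highest to lowest score.
detailed = with_percentages(satisfaction)
for score in sorted(detailed, reverse=True):
    f, pct = detailed[score]
    print(f"({score}) {f:>3} {pct:>5.0f}%")

# Collapsed distribution: scores of 3 or 4 count as "Satisfied".
collapsed = with_percentages(
    ["Satisfied" if s >= 3 else "Dissatisfied" for s in satisfaction]
)
print(collapsed)
```

Sorting by score before printing preserves the rank order of the categories, which matters for an ordinal variable but not for a nominal one.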

2.5 FREQUENCY DISTRIBUTIONS FOR VARIABLES MEASURED AT THE INTERVAL-RATIO LEVEL

Basic Considerations. In general, the construction of frequency distributions
for variables measured at the interval-ratio level is more complex than for nominal
and ordinal variables. Interval-ratio variables usually have a large number of
possible scores (that is, a wide range from the lowest to the highest score). The large
number of scores requires some collapsing or grouping of categories to produce
reasonably compact frequency distributions. To construct frequency distributions
for interval-ratio-level variables, you must decide how many categories to use and
how wide these categories should be.

Application 2.5

The following list shows the ages of 50 prisoners enrolled in a work-release
program. Is this group young or old? A frequency distribution will provide an
accurate picture of the overall age structure.

18 60 57 27 19
20 32 62 26 20
25 35 75 25 21
30 45 67 41 30
37 47 65 42 25
18 51 22 52 30
22 18 27 53 38
27 23 32 35 42
32 37 32 40 45
55 42 45 50 47

We will use about 10 intervals to display these data. By inspection we see that
the youngest prisoner is 18 and the oldest is 75. The range is thus 57. Interval
size will be 57/10, or 5.7, which we can round off to either 5 or 6. Let’s use a
six-year interval beginning at 18. The limits of the lowest interval will be 18–23.
Now we must state the limits of all other intervals, count the number of cases in
each interval, and display these counts in a frequency distribution. Columns may
be added for percentages, cumulative percentages, and/or cumulative frequency.
The complete distribution, with a column added for percentages, is

Ages       Frequency     Percentages

18–23         10             20
24–29          7             14
30–35          9             18
36–41          5             10
42–47          8             16
48–53          4              8
54–59          2              4
60–65          3              6
66–71          1              2
72–77          1              2
          N = 50            100%

The prisoners seem to be fairly evenly spread across the age groups up to the
48–53 interval. There is a noticeable lack of prisoners in the oldest age groups
and a concentration of prisoners in their 20s and 30s.
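
The counting in Application 2.5 can be checked with a few lines of code. What follows is a minimal Python sketch rather than part of the original application; the function name grouped_frequencies and its parameters are illustrative assumptions, and the sketch assumes whole-number scores that are no lower than the chosen starting point.

```python
from collections import Counter

# Ages of the 50 prisoners listed in Application 2.5.
ages = [
    18, 60, 57, 27, 19,
    20, 32, 62, 26, 20,
    25, 35, 75, 25, 21,
    30, 45, 67, 41, 30,
    37, 47, 65, 42, 25,
    18, 51, 22, 52, 30,
    22, 18, 27, 53, 38,
    27, 23, 32, 35, 42,
    32, 37, 32, 40, 45,
    55, 42, 45, 50, 47,
]

def grouped_frequencies(scores, start, width):
    """Count whole-number scores in equal-width class intervals.

    Hypothetical helper: assumes every score is >= start.
    Returns rows of (lower limit, upper limit, frequency, percentage).
    """
    # Index of the interval each score falls into (0 = lowest interval).
    counts = Counter((s - start) // width for s in scores)
    n_intervals = (max(scores) - start) // width + 1
    rows = []
    for j in range(n_intervals):
        lower = start + j * width
        upper = lower + width - 1  # stated upper limit for whole-number data
        f = counts.get(j, 0)       # empty intervals are reported as 0
        rows.append((lower, upper, f, 100 * f / len(scores)))
    return rows

rows = grouped_frequencies(ages, start=18, width=6)
for lower, upper, f, pct in rows:
    print(f"{lower}-{upper}  {f:3d}  {pct:5.1f}%")
```

Changing width to 5 (the other rounding of 5.7 mentioned above) would show how the same data look with narrower intervals.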

For example, suppose you wished to report the distribution of the variable
“age” for a sample drawn from a community. Unlike the college data reported in
Table 2.5, a community sample would have a very broad range of ages. If you
simply reported the number of times that each year of age (or score) occurred,
you could easily wind up with a frequency distribution that contained 80, 90, or
even more categories. Such a large frequency distribution would not present a
concise picture. The scores (years) must be grouped into larger categories to
heighten clarity and ease of comprehension. How large should these categories
be? How many categories should be included in the table? Although there are no
hard-and-fast rules for making these decisions, they always involve a trade-off
between more detail (a greater number of narrow categories) and more
compactness (a smaller number of wide categories).

Constructing the Frequency Distribution. To introduce the mechanics
and decision-making processes involved, we will construct a frequency
distribution to display the ages of the students in the counseling center survey. Because
of the narrow age range of a group of college students, we can use categories of
only one year (these categories are often called class intervals when working with
interval-ratio data). The frequency distribution is constructed by listing the ages
from youngest to oldest, counting the number of times each score (year of age)
occurs, and then totaling the number of scores for each category. Table 2.11
presents the information and reveals a concentration or clustering of scores in the 18
and 19 class intervals.

TABLE 2.11 AGE OF RESPONDENTS, COUNSELING CENTER SURVEY
(interval width = one year of age)

Class Intervals     Frequency (f)

18                       5
19                       6
20                       3
21                       2
22                       1
23                       1
24                       1
25                       0
26                       1
                    N = 20

Even though the picture presented in this table is fairly clear, assume for the
sake of illustration that you desire a more compact (less detailed) summary. To
do this, you will have to group scores into wider class intervals. By increasing the
interval width (say, to two years), you can reduce the number of intervals and
achieve a more compact expression. The grouping of scores in Table 2.12 clearly
emphasizes the relative predominance of younger respondents. This trend in the
data can be stressed even more by the addition of a column displaying the per-
centage of cases in each category.
Note that the class intervals in Table 2.12 have been stated with an appar-
ent gap between them (that is, the class intervals are separated by a distance of
one unit). At first glance, these gaps may appear to violate the principle of ex-
haustiveness; but, because age has been measured in whole numbers, the gaps
actually pose no problem. Given the level of precision of the measurement (in
whole years, as opposed to, say, 10ths of a year), no case could have a score
falling between these class intervals. For these data, the class intervals in
Table 2.12 are exhaustive and mutually exclusive. Each of the 20 respondents in
the sample can be sorted into one and only one age category.

TABLE 2.12 AGE OF RESPONDENTS, COUNSELING CENTER SURVEY
(interval width = two years of age)

Class Intervals     Frequency (f)     Percentage (%)

18–19                   11                 55
20–21                    5                 25
22–23                    2                 10
24–25                    1                  5
26–27                    1                  5
                    N = 20                100%

ONE STEP AT A TIME Finding Midpoints

Step 1: Find the upper and lower limits of the lowest interval in the frequency
distribution. For any interval, the upper limit is the highest score included in the
interval and the lower limit is the lowest score included in the interval. For
example, for the top set of intervals in Table 2.13, the lowest interval (0–2)
includes scores of 0, 1, and 2. The upper limit of this interval is 2 and the lower
limit is 0.

Step 2: Add the upper and lower limits and divide by 2. For the interval 0–2:
(0 + 2)/2 = 1. The midpoint for this interval is 1.

Step 3: Midpoints for other intervals can be found by repeating steps 1 and 2 for
each interval. As an alternative, you can find the midpoint for any interval by
adding the value of the interval width to the midpoint of the next lower interval.
For example, the lowest interval in Table 2.13 is 0–2 and the midpoint is 1.
Intervals are 3 units wide (that is, they each include three scores), so the midpoint
for the next higher interval (3–5) is 1 + 3, or 4. The midpoint for the interval 6–8
is 4 + 3, or 7, and so forth.

However, consider the potential difficulties if age had been measured with
greater precision. If age had been measured in 10ths of a year, into which class
interval in Table 2.12 would a 19.4-year-old subject be placed? You can avoid
this ambiguity by always stating the limits of the class intervals at the same level
of precision as the data. Thus, if age were being measured in 10ths of a year,
the limits of the class intervals in Table 2.12 would be stated in 10ths of a year.
For example:
17.0 –18.9
19.0 –20.9
21.0 –22.9
23.0 –24.9
25.0 –26.9
To maintain mutual exclusivity between categories, do not overlap the class in-
tervals. If you state the limits of the class intervals at the same level of precision
as the data (which might be in whole numbers, tenths, hundredths, etc.) and
maintain a “gap” between intervals, you will always produce a frequency distri-
bution where each case can be assigned to one and only one category.

Midpoints. On occasion, you will need to work with the midpoints of the
class intervals, for example, when constructing or interpreting certain graphs.
Midpoints are defined as the points exactly halfway between the upper and
lower limits and can be found for any interval by dividing the sum of the upper
and lower limits by 2. Table 2.13 displays midpoints for two different sets of
class intervals. (For practice in finding midpoints, see problems 2.8b and 2.9b.)
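
As a quick check on the midpoint rule, the following Python sketch recomputes the midpoints for the two sets of class intervals used in Table 2.13. The function name midpoint is an illustrative assumption, not notation from the text.

```python
def midpoint(lower, upper):
    """Midpoint of a class interval: (lower limit + upper limit) / 2."""
    return (lower + upper) / 2

# Stated limits of the two sets of class intervals in Table 2.13.
width3 = [(0, 2), (3, 5), (6, 8), (9, 11)]
width6 = [(100, 105), (106, 111), (112, 117), (118, 123)]

mid3 = [midpoint(lo, hi) for lo, hi in width3]
mid6 = [midpoint(lo, hi) for lo, hi in width6]

print(mid3)
print(mid6)

# Shortcut from step 3: each midpoint is the previous midpoint plus the width.
shortcut3 = [mid3[0] + 3 * j for j in range(len(width3))]
```

The shortcut list agrees with the direct calculation because equal-width intervals place their midpoints exactly one interval width apart.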

TABLE 2.13 MIDPOINTS

Class Interval Width = 3

Class Intervals     Midpoints

0–2                    1.0
3–5                    4.0
6–8                    7.0
9–11                  10.0

Class Interval Width = 6

Class Intervals     Midpoints

100–105              102.5
106–111              108.5
112–117              114.5
118–123              120.5

Real Limits.¹ For certain purposes, you must eliminate the “gap” between
class intervals and treat a distribution as a continuous series of categories that
border each other. This is necessary for the construction of some graphs (see
Section 2.7) and for computing summary statistics for variables that have been
grouped into frequency distributions.

¹This section is optional. It is necessary for understanding the material presented in Chapters
3 and 4 on computing measures of central tendency and dispersion for grouped data.

ONE STEP AT A TIME Finding Real Limits*

Step 1: Find the distance (the “gap”) between the stated class intervals. In
Table 2.12, for example, this value is 1.

Step 2: Divide the value found in step 1 in half.

Step 3: Add the value found in step 2 to all upper stated limits and subtract it
from all lower stated limits.

*This section is optional.

To illustrate, we’ll begin with Table 2.12. Note the “gap” of one year between
intervals. As we saw before, the gap is only apparent: scores are measured in
whole years (e.g., 19 or 21, not 19.5 or 21.3) and cannot fall between intervals.
Limits stated in this way are called stated class limits, and they organize the scores
of the variable into a series of discrete, nonoverlapping intervals.
To treat the variable as continuous, we must use the real class limits. To
find the real limits of any class interval, divide the distance between the stated
class intervals (the “gap”) in half and add the result to all upper stated limits and
subtract it from all lower stated limits. This process is illustrated below with the
class intervals stated in Table 2.12. The distance between intervals is one, so the
real limits can be found by adding 0.5 to all upper limits and subtracting 0.5 from
all lower limits.
Stated Limits Real Limits
18 –19 17.5–19.5
20 –21 19.5–21.5
22–23 21.5–23.5
24 –25 23.5–25.5
26 –27 25.5–27.5
Note that, with real limits, adjacent class intervals share a boundary (the upper
real limit of one interval is the lower real limit of the next), so the distribution can
be seen as continuous. Table 2.14 presents additional illustrations of real limits for
two different sets of class intervals. In both cases, the “gap” between the stated
limits is 1. (For practice in finding real limits, see problem 2.7c and problem 2.8d.)

TABLE 2.14 REAL CLASS LIMITS

Class Intervals (stated limits)     Real Class Limits

3–5                                   2.5–5.5
6–8                                   5.5–8.5
9–11                                  8.5–11.5

Class Intervals (stated limits)     Real Class Limits

100–105                              99.5–105.5
106–111                             105.5–111.5
112–117                             111.5–117.5
118–123                             117.5–123.5
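
The three steps for finding real limits can be sketched in Python as follows. This is an illustration under stated assumptions, not a procedure given in the text: the helper name real_limits is hypothetical, and it assumes at least two intervals, listed from lowest to highest, with the same gap between every pair of neighbors.

```python
def real_limits(stated):
    """Convert stated class limits to real class limits.

    Hypothetical helper. `stated` is a list of (lower, upper) pairs sorted
    from the lowest interval to the highest, with a constant gap between them.
    """
    gap = stated[1][0] - stated[0][1]  # step 1: distance between intervals
    half = gap / 2                     # step 2: divide the gap in half
    # Step 3: subtract from every lower limit, add to every upper limit.
    return [(lo - half, hi + half) for lo, hi in stated]

# Stated limits from Table 2.12 (age measured in whole years).
stated = [(18, 19), (20, 21), (22, 23), (24, 25), (26, 27)]
real = real_limits(stated)

for (lo, hi), (rlo, rhi) in zip(stated, real):
    print(f"{lo}-{hi}  ->  {rlo}-{rhi}")
```

With whole-number limits the arithmetic is exact. For limits stated in tenths or hundredths, ordinary floating-point subtraction can introduce tiny errors, so a decimal type would be a safer choice there.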

Cumulative Frequency and Cumulative Percentage. Two commonly
used adjuncts to the basic frequency distribution for interval-ratio data are the
cumulative frequency and cumulative percentage columns. Their primary
purpose is to allow the researcher (and his or her audience) to tell at a glance how
many cases fall below a given score or class interval in the distribution.
To construct a cumulative frequency column, begin with the lowest class
interval (i.e., the class interval with the lowest scores) in the distribution. The entry
in the cumulative frequency column for that interval will be the same as the
number of cases in the interval. For the next-higher interval, the cumulative frequency
will be all cases in the interval plus all the cases in the first interval. For the third
interval, the cumulative frequency will be all cases in the interval plus all cases in
the first two intervals. Continue adding (or accumulating) cases until you reach
the highest class interval, which will have a cumulative frequency of all the cases
in the interval plus all cases in all other intervals. For the highest interval, cumu-
lative frequency equals the total number of cases. Table 2.15 shows a cumulative
frequency column added to Table 2.12.
TABLE 2.15 AGE OF RESPONDENTS, COUNSELING CENTER SURVEY

Class Intervals     Frequency (f)     Cumulative Frequency

18–19                   11                  11
20–21                    5                  16
22–23                    2                  18
24–25                    1                  19
26–27                    1                  20
                    N = 20

The cumulative percentage column is quite similar to the cumulative
frequency column of Table 2.15. Begin by adding a column to the basic frequency
distribution for percentages as in Table 2.12. This column shows the percentage of
all cases in each class interval. To find cumulative percentages, follow the same
addition pattern explained earlier for cumulative frequency. That is, the cumula-
tive percentage for the lowest class interval will be the same as the percentage of
cases in the interval. For the next-higher interval, the cumulative percentage is
the percentage of cases in the interval plus the percentage of cases in the first in-
terval, and so on. Table 2.16 shows the age data with a cumulative percentage
column added.
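
The accumulation described above can be sketched in Python. This is a minimal illustration using the frequencies from Table 2.12; the variable names are assumptions. One design choice differs slightly from the wording of the text: the cumulative percentages here are computed from the cumulative frequencies rather than by adding up the percentage column, which gives the same values for these data and avoids accumulating rounding error in larger tables.

```python
from itertools import accumulate

# Class intervals and frequencies from Table 2.12.
intervals = ["18-19", "20-21", "22-23", "24-25", "26-27"]
freqs = [11, 5, 2, 1, 1]
n = sum(freqs)

# Running total of cases at or below each interval.
cum_f = list(accumulate(freqs))

# Percentage of cases in each interval, and at or below each interval.
pct = [100 * f / n for f in freqs]
cum_pct = [100 * c / n for c in cum_f]

for row in zip(intervals, freqs, cum_f, pct, cum_pct):
    print("{:<7}{:>4}{:>5}{:>7.0f}%{:>7.0f}%".format(*row))
```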
These cumulative columns are quite useful in situations where the researcher
wants to make a point about how cases are spread across the range of scores. For
example, Tables 2.15 and 2.16 show quite clearly that most students in the coun-
seling center survey are less than 21 years of age. If the researcher wishes to
impress this feature of the age distribution on his or her audience, then these
cumulative columns are quite handy. Most realistic research situations will be
concerned with many more than 20 cases and/or many more categories than our
tables have. Since the cumulative percentage column is clearer and easier to
interpret in such cases, it is normally preferred to the cumulative frequencies
column.

Unequal Class Intervals. As a general rule, the class intervals of frequency
distributions should be equal in size in order to maximize clarity and ease of
comprehension. For example, note that all of the class intervals in Tables 2.15
and 2.16 are the same width (2 years). There are several situations, however, in
which the researcher may choose to use open-ended class intervals or intervals
of unequal size. Open-ended intervals have an unspecified upper or lower limit
and can be used when there are a few cases with unusually high or low scores.
Intervals of unequal size can be used to collapse a variable with a wide range
of scores into more easily comprehended groupings. We will examine each
situation separately.

TABLE 2.16 AGE OF RESPONDENTS, COUNSELING CENTER SURVEY

Class         Frequency     Cumulative                     Cumulative
Intervals        (f)        Frequency      Percentage     Percentage

18–19            11            11             55%             55%
20–21             5            16             25%             80%
22–23             2            18             10%             90%
24–25             1            19              5%             95%
26–27             1            20              5%            100%
             N = 20                          100%

Open-Ended Intervals. What would happen to the frequency distribution in
Table 2.15 if we added one more student who was 47 years of age? We would
now have 21 cases and there would be a large gap between the oldest
respondent (now 47) and the second oldest (age 26). If we simply added the older
student to the frequency distribution, we would have to include nine new class
intervals (28–29, 30–31, 32–33, etc.) with zero cases in them before we got to the
46–47 interval. This would waste space and probably be unclear and confusing.
An alternative way to handle the situation would be to add an “open-ended”
interval to the frequency distribution, as in Table 2.17.

TABLE 2.17 AGE OF RESPONDENTS, COUNSELING CENTER SURVEY (N = 21)

Class Intervals     Frequency (f)     Cumulative Frequency

18–19                   11                  11
20–21                    5                  16
22–23                    2                  18
24–25                    1                  19
26–27                    1                  20
28 and older             1                  21
                    N = 21

The open-ended interval in Table 2.17 allows us to present the information
more compactly and efficiently than listing all of the empty intervals between
“28 –29” and “46 – 47.” Note also that we could handle an extremely low score
by adding an open-ended interval as the lowest class interval (e.g., “17 and
younger”). There is a small price to pay for this efficiency (there is no informa-
tion in Table 2.17 about the value of the scores included in the open-ended in-
terval), so this technique should not be used indiscriminately.
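
To see how an open-ended interval absorbs an extreme score, here is a small Python sketch that rebuilds the frequencies of Table 2.17. The ages are reconstructed from Table 2.11 plus the additional 47-year-old student; the function class_label and its default arguments are illustrative assumptions.

```python
from collections import Counter

# Ages from Table 2.11, plus the one additional 47-year-old student.
ages = [18] * 5 + [19] * 6 + [20] * 3 + [21] * 2 + [22, 23, 24, 26] + [47]

def class_label(age, start=18, width=2, open_from=28):
    """Return the class interval label for a whole-number age.

    Hypothetical helper: assumes age >= start. Any age at or above
    open_from is placed in a single open-ended top interval.
    """
    if age >= open_from:
        return f"{open_from} and older"
    lower = start + ((age - start) // width) * width
    return f"{lower}-{lower + width - 1}"

# Ages are listed in ascending order, so labels appear from lowest to highest.
table = Counter(class_label(a) for a in ages)
for label, f in table.items():
    print(f"{label:<14}{f:>3}")
```

The open-ended label records that one case lies at 28 or above but not that the case is 47, which is the loss of information noted in the text.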

Intervals of Unequal Size. On some variables, most scores are tightly clustered
together but others are strewn across a broad range of scores. Consider, as an
example, the distribution of income in the United States. In 2005, most house-
holds (a little more than 50%) reported annual incomes between $20,000 and
$75,000 and a sizeable grouping (about 20%) earned less than that. The problem
(from a statistical point of view) comes with more affluent households. Many
of these cases are in the $75,000–$100,000 range but some have incomes in the
high six-figure, seven-figure, or even eight-figure range. The number of very wealthy
households is quite small, of course, but we must still account for these extreme
cases.
If we tried to use a frequency distribution with equal intervals of, say,
$10,000 to summarize this variable, we would need 30 or 40 or more intervals
to include all of the more affluent households, and many of our intervals in the
higher income ranges—those over $100,000 —would have few or zero cases. In
situations such as this, researchers often use intervals of unequal size to sum-
marize the variable more efficiently. To illustrate, Table 2.18 uses unequal inter-
vals to summarize the distribution of income in the United States.
Some of the intervals in Table 2.18 are $10,000 wide, others are $25,000 or
$50,000 wide, and two (the lowest and highest intervals) are open
ended. Tables that use intervals of mixed widths might be a little confusing for
the reader, but the gain in compactness and efficiency can be considerable.
(For practice in constructing and interpreting frequency distributions for
interval-ratio-level variables, see problems 2.5 to 2.9.)

TABLE 2.18 DISTRIBUTION OF INCOME BY HOUSEHOLD, UNITED STATES, 2005

                            Households       Households
Income                      (Frequency)      (Percent)

Less than $20,000            23,848,000         20.9
$20,000 to $29,999           13,642,000         11.9
$30,000 to $39,999           12,388,000         10.8
$40,000 to $49,999           11,028,000          9.6
$50,000 to $74,999           21,031,000         18.4
$75,000 to $99,999           12,734,000         11.1
$100,000 to $149,999         12,132,000         10.6
$150,000 to $199,999          4,031,000          3.5
$200,000 to $249,999          1,529,000          1.3
$250,000 and above            2,023,000          1.8
                            114,386,000         99.9%

Source: U.S. Census Bureau, http://pubdb3.census.gov/macro/032006/hhinc/new06_000.htm

2.6 CONSTRUCTING FREQUENCY DISTRIBUTIONS FOR INTERVAL-RATIO-LEVEL VARIABLES: A REVIEW

We covered a lot of ground in the preceding section, so let’s pause and review
these principles by considering a specific research situation. The following data
represent the numbers of visits received over the past year by 90 residents of a
retirement community.

 0 52 21 20 21 24  1 12 16 12
16 50 40 28 36 12 47  1 20  7
 9 26 46 52 27 10  3  0 24 50
24 19 22 26 26 50 23 12 22 26
23 51 18 22 17 24 17  8 28 52
20 50 25 50 18 52 46 47 27  0
32  0 24 12  0 35 48 50 27 12
28 20 30  0 16 49 42  6 28  2
16 24 33 12 15 23 18  6 16 50

Listed in this format, the data are a hopeless jumble from which no one
could derive much meaning. The function of the frequency distribution is to
arrange and organize these data so that their meanings will be made obvious.
First, we must decide how many class intervals to use in the frequency dis-
tribution. Following the guidelines presented in the One Step at a Time: Con-
structing Frequency Distributions for Interval-Ratio Variables box, let’s use about
10 intervals (k = 10). By inspecting the data, we can see that the lowest score is
0 and the highest is 52. The range of these scores (R) is 52 - 0, or 52. To find the
approximate interval size (i), divide the range (52) by the number of intervals
(10). Since 52/10 = 5.2, we can set the interval size at 5.
The lowest score is 0, so the lowest class interval will be 0 – 4. The highest
class interval will be 50 –54, which will include the high score of 52. All that re-
mains is to state the intervals in table format, count the number of scores that fall
into each interval, and report the totals in a frequency column. These steps have
been taken in Table 2.19, which also includes columns for the percentages and
cumulative percentages. Note that this table is the product of several relatively ar-
bitrary decisions. The researcher should remain aware of this fact and inspect the
frequency distribution carefully. If the table is unsatisfactory for any reason, it can
be reconstructed with a different number of categories and interval sizes.
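
The decisions just described (number of intervals, range, interval size, and stated limits) can be traced in a short Python sketch. The variable names are illustrative assumptions, and rounding to the nearest whole number stands in for the text's "convenient whole number."

```python
# Visits per year for the 90 retirement community residents listed above.
visits = [
    0, 52, 21, 20, 21, 24, 1, 12, 16, 12,
    16, 50, 40, 28, 36, 12, 47, 1, 20, 7,
    9, 26, 46, 52, 27, 10, 3, 0, 24, 50,
    24, 19, 22, 26, 26, 50, 23, 12, 22, 26,
    23, 51, 18, 22, 17, 24, 17, 8, 28, 52,
    20, 50, 25, 50, 18, 52, 46, 47, 27, 0,
    32, 0, 24, 12, 0, 35, 48, 50, 27, 12,
    28, 20, 30, 0, 16, 49, 42, 6, 28, 2,
    16, 24, 33, 12, 15, 23, 18, 6, 16, 50,
]

k = 10                                 # target number of class intervals
low, high = min(visits), max(visits)
R = high - low                         # range of the scores
i = round(R / k)                       # interval size: 5.2 rounds to 5

# Stated limits, starting at the lowest score and covering the highest.
limits = [(lower, lower + i - 1) for lower in range(low, high + 1, i)]

print(f"R = {R}, i = {i}, intervals = {len(limits)}")
print(limits[0], limits[-1])
```

Because 5.2 is rounded down to 5, covering the full range takes 11 intervals rather than exactly 10, which is consistent with treating k as an approximate target.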

ONE STEP AT A TIME Finding Frequency Distributions for Interval-Ratio Variables

Step 1: Decide how many class intervals (k) you wish to use. One reasonable
convention suggests that the number of intervals should be about 10. Many research
situations may require fewer than 10 intervals (k < 10), and it is common to find
frequency distributions with as many as 15 intervals. Only rarely will more than
15 intervals be used, since the resultant frequency distribution would be too large
for easy comprehension.

Step 2: Find the range (R) of the scores by subtracting the low score from the
high score.

Step 3: Find the size of the class intervals (i) by dividing R (from step 2) by k
(from step 1):

i = R/k

Round the value of i to a convenient whole number. This will be the interval size
or width.

Step 4: State the lowest interval so that its lower limit is equal to or below the
lowest score. By the same token, your highest interval will be the one that contains
the highest score. Generally, intervals should be equal in size, but unequal and
open-ended intervals may be used when convenient.

Step 5: State the limits of the class intervals at the same level of precision as you
have used to measure the data. Do not overlap intervals. You will thereby define
the class intervals so that each case can be sorted into one and only one category.

Step 6: Count the number of cases in each class interval, and report these
subtotals in a column labeled “Frequency.” Report the total number of cases (N) at
the bottom of this column. The table may also include a column for percentages,
cumulative frequencies, and cumulative percentages.

Step 7: Inspect the frequency distribution carefully. Has too much detail been
lost? If so, reconstruct the table with a greater number of class intervals (or
smaller interval size). Is the table too detailed? If so, reconstruct the table with
fewer class intervals (or use wider intervals). Are there too many intervals with no
cases in them? If so, consider using open-ended intervals or intervals of unequal
size. Remember that the frequency distribution results from a number of decisions
you make in a rather arbitrary manner. If the appearance of the table seems less
than optimal given the purpose of the research, redo the table until you are
satisfied that you have struck the best balance between detail and conciseness.

Step 8: Give your table a clear, concise title, and number the table if your report
contains more than one. All categories and columns must also be clearly labeled.
the data. Do not overlap intervals. You will thereby

TABLE 2.19 NUMBER OF VISITS PER YEAR, 90 RETIREMENT COMMUNITY RESIDENTS

Class         Frequency     Cumulative     Percentage     Cumulative
Intervals        (f)        Frequency         (%)         Percentage (%)

0–4              10            10            11.11           11.11
5–9               5            15             5.56           16.67
10–14             8            23             8.89           25.56
15–19            12            35            13.33           38.89
20–24            18            53            20.00           58.89
25–29            12            65            13.33           72.22
30–34             3            68             3.33           75.55
35–39             2            70             2.22           77.77
40–44             2            72             2.22           79.99
45–49             6            78             6.67           86.66
50–54            12            90            13.33           99.99
             N = 90                          99.99%*

*Percentage columns will occasionally fail to total to 100% because of rounding error. If the total
is between 99.90% and 100.10%, ignore the discrepancy. Discrepancies of greater than 0.10% may
indicate mathematical errors, and the entire column should be computed again.
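
The rounding discrepancy described in the table note can be reproduced with a short Python sketch. The cumulative percentages in Table 2.19 appear to be running sums of the rounded percentages, which is why the column ends at 99.99. The variable names below are assumptions, and the second calculation (cumulative percentages taken directly from cumulative frequencies) is offered as an alternative, not as the method used in the text.

```python
from itertools import accumulate

# Frequencies from Table 2.19 (class intervals 0-4 through 50-54).
freqs = [10, 5, 8, 12, 18, 12, 3, 2, 2, 6, 12]
n = sum(freqs)

# Percentages rounded to two decimals, as reported in the table.
pct_rounded = [round(100 * f / n, 2) for f in freqs]

# Adding the rounded values carries the rounding error into the total.
total_of_rounded = round(sum(pct_rounded), 2)

# Alternative: derive each cumulative percentage from the cumulative frequency.
cum_pct_exact = [round(100 * c / n, 2) for c in accumulate(freqs)]

print(total_of_rounded)
print(cum_pct_exact[-1])

# The rule of thumb from the table note: a gap of 0.10 or less can be ignored.
within_tolerance = abs(100 - total_of_rounded) <= 0.10
```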

Now, with the aid of the frequency distribution, some patterns in the data
can be discerned. There are three distinct groupings of scores in the table. Ten
residents were visited rarely, if at all (the 0 – 4 visits per year interval). The single
largest interval, with 18 cases, is 20–24. Combined with the intervals
immediately above and below, this represents quite a sizeable grouping of cases (42 out
of 90, or 46.67% of all cases) and suggests that the dominant visiting rate is about
twice a month, or approximately 24 visits per year. The third grouping, in the
50 –54 class interval (12 cases), reflects a visiting rate of about once a week. The
cumulative percentage column indicates that the majority of the residents
(58.89%) were visited 24 or fewer times a year.

2.7 CHARTS AND GRAPHS

Researchers frequently use charts and graphs to present their data in ways that
are visually more dramatic than frequency distributions. These devices are par-
ticularly useful for conveying an impression of the overall shape of a distribution
and for highlighting any clustering of cases in a particular range of scores. Many
graphing techniques are available, but we will examine just four. The first two,
pie and bar charts, are appropriate for discrete variables at any level of measure-
ment. The last two, histograms and line charts (or frequency polygons), are used
with both discrete and continuous interval-ratio variables but are particularly ap-
propriate for the latter.
The sections that follow explain how to construct graphs and charts “by
hand.” These days, however, computer programs are almost always used to pro-
duce graphic displays. Graphing software is sophisticated and flexible but also
relatively easy to use; if such programs are available to you, you should familiar-
ize yourself with them. The effort required to learn these programs will be repaid
in the quality of the final product. The SPSS for Windows section at the end of this
chapter includes a demonstration of how to produce bar charts and line charts.

Pie Charts. To construct a pie chart, begin by computing the percentage of
all cases that fall into each category of the variable. Then divide a circle (the pie)
into segments (slices) proportional to the percentage distribution. Be sure that
the chart and all segments are clearly labeled.
Figure 2.1 is a pie chart that displays the distribution of “marital status” from
the counseling center survey. The frequency distribution (Table 2.7) is reproduced
as Table 2.20, with a column added for the percentage distribution. Since a full
circle contains 360°, we will apportion 180° (or 50%) for the first category, 126°
(35%) for the second, and 54° (15%) for the last category. The pie chart visually
reinforces the relative preponderance of single respondents and the relative absence
of divorced students in the counseling center survey.
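
The conversion from frequencies to slice angles is simple proportion, as this brief Python sketch shows for the marital status data. The dictionary layout and variable names are illustrative assumptions.

```python
# Marital status frequencies from the counseling center survey (N = 20).
freqs = {"Single": 10, "Married": 7, "Divorced": 3}
n = sum(freqs.values())

# Each slice receives the same share of 360 degrees as its share of cases.
percent = {status: 100 * f / n for status, f in freqs.items()}
degrees = {status: 360 * f / n for status, f in freqs.items()}

for status in freqs:
    print(f"{status:<9}{percent[status]:5.0f}%{degrees[status]:7.0f} degrees")
```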

Bar Charts. Like pie charts, bar charts are relatively straightforward. Con-
ventionally, the categories of the variable are arrayed along the horizontal axis
(or abscissa) and frequencies, or percentages if you prefer, along the vertical axis
(or ordinate). For each category of the variable, construct (or draw) a rectangle
of constant width and with a height that corresponds to the number of cases in
the category. The bar chart in Figure 2.2 reproduces the marital status data from
Figure 2.1 and Table 2.20.
FIGURE 2.1 SAMPLE PIE CHART: MARITAL STATUS OF RESPONDENTS (N = 20)
[Pie chart. Slices: Single 50%, Married 35%, Divorced 15%.]

TABLE 2.20 MARITAL STATUS OF RESPONDENTS, COUNSELING CENTER SURVEY

Status       Frequency (f)     Percentage (%)

Single           10                 50
Married           7                 35
Divorced          3                 15
             N = 20                100%

FIGURE 2.2 SAMPLE BAR CHART: MARITAL STATUS OF RESPONDENTS (N = 20)
[Bar chart. Horizontal axis: Marital status (Single, Married, Divorced). Vertical axis: Frequency.]

The chart in Figure 2.2 would be interpreted in exactly the same way as the
pie chart in Figure 2.1, and researchers are free to choose between these two
methods of displaying data. However, if a variable has more than four or five
categories, the bar chart would be preferred. With too many categories, the pie
chart gets very crowded and loses its visual clarity. To illustrate, Figure 2.3 uses
a bar chart to display the data on visiting rates for the retirement community
presented in Table 2.19. A pie chart for these same data would have had 11 different
“slices,” a more complex or “busier” picture than that presented by the bar chart.
In Figure 2.3, the clustering of scores in the “20 to 24” range (approximately two
visits a month) is readily apparent, as are the groupings in the “0 to 4” and “50
to 54” ranges.
Bar charts are particularly effective ways to display the relative frequencies
for two or more categories of a variable when you want to emphasize some com-
parisons. Suppose, for example, that you wished to make a point about changing
rates of homicide victimization for white males and females since 1955. Figure 2.4
displays the data in a dramatic and easily comprehended way. The bar chart
shows that rates for males are higher than rates for females, that rates for both
sexes were highest in 1975, and that rates declined after that time. (For practice
in constructing and interpreting pie and bar charts, see problems 2.5b and 2.10.)

FIGURE 2.3 SAMPLE BAR CHART FOR VISITS PER YEAR, RETIREMENT COMMUNITY RESIDENTS (N = 90)
[Bar chart. Horizontal axis: Number of visits (class intervals 0–4 through 50–54). Vertical axis: Frequency.]

FIGURE 2.4 HOMICIDE VICTIMIZATION RATES, 1955–2003 (selected rates, per 100,000 population, whites only)
[Bar chart. Horizontal axis: Year (1955, 1965, 1975, 1985, 1995, 2000, 2003). Vertical axis: Rate per 100,000 population. Separate bars for males and females.]

Source: U.S. Bureau of the Census. 2007. Statistical Abstract of the United States, 2007. Washington,
D.C.: Government Printing Office. p. 195 (Available at: http://www.census.gov/prod/2006pubs/07statab/law.pdf)

Histograms. Histograms look a lot like bar charts and, in fact, are
constructed in much the same way. However, histograms use real limits rather than
stated limits, and the categories or scores of the variable border each other, as
if they merged into each other in a continuous series. Therefore, these graphs
are most appropriate for continuous interval-ratio-level variables, but they are
commonly used for discrete interval-ratio-level variables as well. To construct a
histogram from a frequency distribution, follow these steps.
1. Array the real limits of the class intervals or scores along the horizontal axis
(abscissa).
2. Array frequencies along the vertical axis (ordinate).
3. For each category in the frequency distribution, construct a bar with height
corresponding to the number of cases in the category and with width corre-
sponding to the real limits of the class intervals.
4. Label each axis of the graph.
5. Title the graph.
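
The first three steps amount to computing, for each class interval, where its bar starts, how wide it is, and how tall it is. The Python sketch below works this out for the visiting data in Table 2.19 without drawing anything; the helper name histogram_bars is hypothetical, and the sketch assumes whole-number stated limits with a gap of 1 between intervals. The resulting triples could be handed to whatever graphing program is available.

```python
def histogram_bars(stated, freqs, gap=1):
    """Return (left edge, width, height) for each histogram bar.

    Hypothetical helper. Edges are the real limits, found by moving each
    stated limit outward by half the gap between intervals.
    """
    half = gap / 2
    bars = []
    for (lo, hi), f in zip(stated, freqs):
        left = lo - half
        right = hi + half
        bars.append((left, right - left, f))
    return bars

# Stated limits and frequencies from Table 2.19.
stated = [(lower, lower + 4) for lower in range(0, 55, 5)]
freqs = [10, 5, 8, 12, 18, 12, 3, 2, 2, 6, 12]

bars = histogram_bars(stated, freqs)
for left, width, height in bars:
    print(f"{left:5.1f} to {left + width:5.1f} | " + "#" * height)
```

Because each bar ends exactly where the next begins, the bars touch, which is what distinguishes the histogram from the bar chart in Figure 2.3.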
As an example, Figure 2.5 uses a histogram to display the distribution of ages
for a sample of respondents to a national public-opinion poll. The bars in the
graph are 5 years wide, and their uneven heights reflect the varying number of
respondents for each 5-year group. The graph peaks around age 50, and the
sample has more respondents (higher bars) who are younger than 50 and fewer
respondents (lower bars) older than 50. Note also that there are no people in the
sample younger than age 18, the usual cutoff point for respondents to
public-opinion polls.

FIGURE 2.5 AGE OF RESPONDENTS, 2006 GENERAL SOCIAL SURVEY
[Histogram. Horizontal axis: Age of respondent (15 to 90, in 5-year steps). Vertical axis: Frequency (0 to 250).]

Line Charts. Construction of a line chart (or frequency polygon) is similar
to construction of a histogram. Instead of using bars to represent the frequencies,
however, use a dot at the midpoint of each interval. Straight lines then connect
the dots. Because the line is continuous from highest to lowest score, these graphs
are especially appropriate for continuous interval-ratio-level variables but are
frequently used with discrete interval-ratio-level variables. Figure 2.6 displays a
line chart for the visiting data previously displayed in the bar chart in Figure 2.3.
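
The dots of a frequency polygon are simply (midpoint, frequency) pairs. As a minimal Python sketch for the visiting data, with variable names that are illustrative assumptions:

```python
# Stated limits and frequencies from Table 2.19.
stated = [(lower, lower + 4) for lower in range(0, 55, 5)]
freqs = [10, 5, 8, 12, 18, 12, 3, 2, 2, 6, 12]

# One dot per class interval: x is the interval midpoint, y is the frequency.
points = [((lo + hi) / 2, f) for (lo, hi), f in zip(stated, freqs)]

for x, y in points:
    print(f"midpoint {x:4.1f}: {y} cases")
```

Connecting these points with straight lines, in order, gives the line chart; the tallest point sits at the midpoint of the 20–24 interval.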
Line charts can also be used to display trends across time. Figure 2.7 shows
both marriage and divorce rates per 1000 population for the United States since
1950. Note that both rates rose until the early 1980s and have been falling since,
with the marriage rate falling slightly faster.
FIGURE 2.6 NUMBER OF VISITS PER YEAR, RETIREMENT COMMUNITY RESIDENTS (N = 90)
[Line chart. Horizontal axis: Number of visits (class intervals 0–4 through 50–54). Vertical axis: Frequency.]

FIGURE 2.7 U.S. MARRIAGE AND DIVORCE RATES, 1950–2004 (rates per 1000 population)
[Line chart. Horizontal axis: Year (1950 to 2004). Vertical axis: Rate. Separate lines for marriage and divorce.]

Source: U.S. Bureau of the Census. 2007. Statistical Abstract of the United States, 2007. Washington,
D.C.: Government Printing Office. p. 63. (Available at: http://www.census.gov/prod/2006pubs/07statab/law.pdf)

Histograms and frequency polygons are alternative ways of displaying
essentially the same message. Thus, the choice between the two techniques is
left to the aesthetic preferences of the researcher. (For practice in constructing
and interpreting histograms and line charts, see problems 2.7b, 2.8d, 2.9d, 2.11,
and 2.12.)

2.8 INTERPRETING STATISTICS: USING PERCENTAGES, FREQUENCY DISTRIBUTIONS, CHARTS, AND GRAPHS
TO ANALYZE CHANGING PATTERNS OF WORKPLACE SURVEILLANCE

A sizeable volume of statistical material has been introduced in this chapter, and
it will be useful to conclude by focusing on meaning and interpretation. What can
you say after you have calculated percentages, built a frequency distribution, or
constructed a graph or chart? Remember that statistics are tools to help us analyze
information and answer questions. They never speak for themselves and they always
have to be understood in the context of some research question or test of
hypothesis. This section provides an example of interpretation by posing and
answering some questions from social science research. The interpretation (words)
will be explicitly linked to the statistics (numbers) so that you will be able to see
how and why conclusions are developed.

Your New Job and Workplace Surveillance. Congratulations! You have
just landed a job with a major U.S. corporation and you now find yourself in the
middle of the lunch hour in your cubicle. Should you log on to the Internet and
spend a few minutes with your favorite role-playing game? Should you check
your personal email or contact your friends about plans for the weekend? Before
making a decision, consider a series of reports issued by the American Manage-
ment Association (AMA). These reports suggest that the chances are growing that
you may be the subject of workplace surveillance and that your email, telephone,
and, more recently, your Internet use may be monitored by your employer.

Monitoring and Surveillance in 2005. Although the computer has
become an important component of the workday for more and more people, your
employer may feel compelled to monitor its use. Table 2.21 reports the percent-
age of companies that indicated they practiced a specific form of monitoring and
surveillance.
Graphs are almost always a more effective method of presenting this type of
information. The variable (type of monitoring and surveillance) is nominal level
(the “types” are different from each other but do not form a scale), and, with nine
possible scores, a bar chart would be preferred to a pie chart. Figure 2.8 shows
that monitoring Internet connections was the most common form of surveillance,
with about 75% of the companies practicing it. Storage and review of email
messages (about 55%) and telephone monitoring (about 51%) were also very
common. This graph clearly indicates that it would be unwise to surf the net on
company time or to use the phone or email for personal business.
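To illustrate how a bar chart orders and scales nominal-level categories, here is a minimal
Python sketch that draws the Table 2.21 percentages as a text bar chart. The percentages are
copied from Table 2.21 (two labels are shortened); the function name text_bar_chart and the
choice of one "#" per two percentage points are assumptions made for this illustration.

```python
# Percent of companies answering "yes", copied from Table 2.21.
monitoring_2005 = {
    "Monitoring Internet connections": 76,
    "Storage and review of e-mail messages": 55,
    "Telephone use": 51,
    "Video surveillance for security purposes": 51,
    "Storage and review of computer files": 50,
    "Computer use": 36,
    "Recording and review of telephone conversations": 22,
    "Video recording of employee job performance": 16,
    "Storage and review of voice mail messages": 15,
}

def text_bar_chart(data, scale=2):
    """Return one line per category, longest bar first; one '#' per `scale` points."""
    ordered = sorted(data.items(), key=lambda item: item[1], reverse=True)
    width = max(len(label) for label in data)
    return [f"{label:<{width}} | {'#' * (pct // scale)} {pct}%"
            for label, pct in ordered]

for line in text_bar_chart(monitoring_2005):
    print(line)
```

Because the categories are nominal, their order carries no meaning; sorting by percentage
is simply a presentation choice that makes the most common practices easy to spot.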

Monitoring and Surveillance Over Time. What changes occurred in
specific workplace monitoring practices between 1997 and 2005? Table 2.22 dis-
plays trends for five types of monitoring and surveillance. In 1997, the levels of
monitoring were relatively low. Only about one-third of companies monitored
telephone use, and no more than 16% practiced the other forms of surveillance.
The levels rise quite dramatically over the period, and by 2005 three different
forms of surveillance were being practiced by half or more of the companies.
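The size of these increases can be made concrete with the percent change statistic (Formula
2.4). The sketch below applies it to the 1997 and 2005 columns of Table 2.22; the figures are
copied from that table, while the shortened labels and the function name percent_change are
choices made for this illustration.

```python
# Percent "yes" in 1997 and 2005, copied from Table 2.22.
table_2_22 = {
    "Voice mail messages": (5, 15),
    "Computer files": (14, 50),
    "Email messages": (15, 55),
    "Telephone use": (34, 51),
    "Computer use": (16, 36),
}

def percent_change(f1, f2):
    """Formula 2.4: ((f2 - f1) / f1) * 100, where f1 is the earlier score."""
    return (f2 - f1) / f1 * 100

for practice, (y1997, y2005) in table_2_22.items():
    points = y2005 - y1997                    # change in percentage points
    change = percent_change(y1997, y2005)     # change relative to 1997
    print(f"{practice:<20} +{points} points, {change:.1f}% change")
```

Note the difference between the two measures: telephone monitoring rose by 17 percentage
points, which is a 50% increase over its 1997 level, whereas voice mail monitoring rose by
only 10 points but tripled (a 200% increase) because it started from a much lower base.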
Once again, these trends and patterns would be more clearly presented and
appreciated in the form of a graph. Figure 2.9 presents the information in Table
2.22 in the form of a line graph. The graph clearly displays the general increase

TABLE 2.21 MONITORING AND SURVEILLANCE, 2005

Type of Monitoring and Surveillance                        % Yes

Monitoring Internet connections                             76%
Storage and review of e-mail messages                       55%
Telephone use (time spent, numbers called)                  51%
Video surveillance for security purposes                    51%
Storage and review of computer files                        50%
Computer use (time logged on, keystroke counts, etc.)       36%
Recording and review of telephone conversations             22%
Video recording of employee job performance                 16%
Storage and review of voice mail messages                   15%

FIGURE 2.8 MONITORING AND SURVEILLANCE OF EMPLOYEES, 2005
[Bar chart. Vertical axis: Percent “yes” (0 to 80); horizontal axis: Method.]

FIGURE 2.9 MONITORING AND SURVEILLANCE, 1997–2005
[Line chart. Vertical axis: Percent “yes” (0 to 60); horizontal axis: Year (1997 to 2005);
lines: Phone, Email, Computer files, Computer use, Voice mail.]

TABLE 2.22 MONITORING AND SURVEILLANCE, 1997–2005

                                                              Percent “Yes”

Type of Monitoring and Surveillance                      1997  1998  1999  2000  2001  2005

Storage and review of voice mail messages                   5     5     6     7     8    15
Storage and review of computer files                       14    20    21    31    36    50
Storage and review of email messages                       15    20    27    38    47    55
Telephone use (time spent, numbers called)                 34    40    39    44    43    51
Computer use (time logged on, keystroke counts, etc.)      16    16    15    19    19    36

READING STATISTICS 2: Percentages, Rates, Tables, and Graphs

You will find that the statistics covered in this chapter are frequently used in the
research literature of the social sciences, as well as in the popular press and the
media, and one of the goals of this text is to help you develop your skills in
understanding and critically analyzing these types of statistical information.
Fortunately, this task is usually quite straightforward, but these statistical tools are
sometimes not as simple as they appear and they can be misused. Here are some ideas to
keep in mind when reading research reports that use these statistics.
First, there are many different formats for presenting results, and the tables and
graphs you find in the research literature will not necessarily follow the conventions
used in this text. Second, because of space limitations, tables and graphs may be
presented with a minimum of detail. For example, the researcher may present a frequency
distribution with only a percentage column.
Begin your analysis by examining the statistics carefully. If you are reading a table
or graph, first read the title, all labels (that is, row and/or column headings), and any
footnotes. These will tell you exactly what information is being presented. Inspect the
body of the table or graph with the author’s analysis in mind. See if you agree with the
author’s analysis. (You almost always will, but it never hurts to double-check and
exercise your critical abilities.)
Finally, remember that most research projects analyze interrelationships among many
variables. Because the tables and graphs covered in this chapter display variables one at
a time, they are unlikely to be included in such research reports (or perhaps, included
only as background information). Even when not reported, you can be sure that the research
began with an inspection of percentages, frequency distributions, or graphs for each
variable. Univariate tables and graphs display a great deal of information about the
variables in a compact, easily understood format and are almost universally used as
descriptive devices.

STATISTICS IN THE PROFESSIONAL LITERATURE
Social scientists rely heavily on the U.S. census for information about the
characteristics and trends of change in American society, including age composition,
birthrates and death rates, residential patterns, educational levels, and a host of other
variables. Census data are readily available (at www.census.gov), but since they represent
information about the entire population (almost 290 million people), the numbers are often
large, cumbersome, and awkward to use or understand. Thus, percentages, rates, and graphs
are extremely useful statistical devices when analyzing or presenting census information.
Consider, for example, a recent report on the changing U.S. family.* The purpose of
the report was to present information regarding the structure of the American family and
to present and discuss recent changes and trends. Consider how this report might have read
if the information had been given in words and raw numbers:

In 2003, there were about 57,320,000 married-couple households, 13,620,000
female-headed households, and 35,682,000 nonfamily households. Ten years earlier, in
1993, there were 52,457,000 married-couple households, 11,692,000 female-headed
households, and 28,496,000 nonfamily households.

Can you distill any meaningful understandings about American family life from these
sentences? Raw information simply does not speak for itself, and these facts have to be
organized or placed in some context to reveal their meaning. Thus, social scientists
almost always use percentages, rates, or graphs to present this kind of information so
that they can understand it themselves, assess the meaning, and convey their
interpretations to others.
In contrast with the foregoing raw information, consider the following table on family
trends using percentages rather than raw numbers.

U.S. Households by Type, 1970 and 2003

                                        Percent of All Households
                                           1970        2003
Family households:
  Married couples with children           40.3%       23.3%
  Married couples without children        30.3%       28.2%
  Other                                   10.6%       16.4%
Nonfamily households:
  Women living alone                       5.6%       11.2%
  Men living alone                        11.5%       15.2%
  Other                                    1.7%        5.6%
                                         100.0%       99.9%

A quick comparison of the two years reveals a dramatic decrease in the percentage of
American households consisting of married couples living with children and an increase in
the percentages of men and women living alone. What other trends can you see in the table?

MARITAL STATUS OF U.S. POPULATION, 1970–2003 (15 years of age or older, male)
[Line chart. Vertical axis: Percent (0 to 70); horizontal axis: Year (1970 to 2000);
lines: Married, Never Married, Divorced/Separated, Widowed.]

MARITAL STATUS OF U.S. POPULATION, 1970–2003 (15 years of age or older, female)
[Line chart. Vertical axis: Percent (0 to 70); horizontal axis: Year (1970 to 2000);
lines: Married, Never Married, Divorced/Separated, Widowed.]

As we have seen, graphs are almost always more efficient and understandable ways of
expressing trends. Some of the fundamental changes in American family life are presented
in the two line charts above, one for men and one for women. As you would expect, the
graphs show essentially the same trends and, together with the frequency distribution,
their message is pretty clear: The percentage of the population living in “married-couple
households” is declining, and this is particularly true for married couples with children.
Note also that the percentage of men and women who “never married” is increasing steadily.
The author of the report attributes these changes to several factors, including people
waiting longer to get married and a high divorce rate.

*Fields, Jason. 2004. “America’s Families and Living Arrangements: 2003.” U.S. Bureau of
the Census: Current Population Reports. Available at
http://www.census.gov/prod/2004pubs/p20-553.pdf

in monitoring and shows that monitoring of email messages, phone use, and
computer files has become particularly common.
Computers may be ubiquitous features of employment in our information-
age economy, but these data suggest that the potential for workplace monitor-
ing and surveillance is also increasing. The computer on your desk is a double-
edged sword. While it provides you with the necessary tools to do your job,
employers are increasingly using the very same technology to watch you.

SUMMARY

1. We considered several different ways of summarizing the distribution of a single
   variable and, more generally, reporting the results of our research. Our emphasis
   throughout was on the need to communicate our results clearly and concisely. You will
   often find that, as you strive to communicate statistical information to others, the
   meanings of the information will become clearer to you as well.
2. Percentages and proportions, ratios, rates, and percentage change represent several
   different ways to enhance clarity by expressing our results in terms of relative
   frequency. Percentages and proportions report the relative occurrence of some category
   of a variable compared with the distribution as a whole. Ratios compare two categories
   with each other, and rates report the actual occurrences of some phenomenon compared
   with the number of possible occurrences per some unit of time. Percentage change shows
   the relative increase or decrease in a variable over time.
3. Frequency distributions are tables that summarize the entire distribution of some
   variable. Statistical analysis almost always starts with the construction and review of
   these tables for each variable. Columns for percentages, cumulative frequency, and/or
   cumulative percentages often enhance the readability of frequency distributions.
4. Pie and bar charts, histograms, and line charts (or frequency polygons) are graphic
   devices used to express the basic information contained in the frequency distribution
   in a compact and visually dramatic way.

SUMMARY OF FORMULAS

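Proportions      2.1   $p = \frac{f}{N}$
Percentage       2.2   $\% = \left(\frac{f}{N}\right) \times 100$
Ratios           2.3   $\text{Ratio} = \frac{f_1}{f_2}$
Percent change   2.4   $\text{Percent change} = \left(\frac{f_2 - f_1}{f_1}\right) \times 100$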

GLOSSARY

Bar chart. A graphic display device for discrete variables. Categories are represented by
  bars of equal width, the height of each corresponding to the number (or percentage) of
  cases in the category.
Class intervals. The categories used in the frequency distributions for interval-ratio
  variables.
Cumulative frequency. An optional column in a frequency distribution that displays the
  number of cases within an interval and all preceding intervals.
Cumulative percentage. An optional column in a frequency distribution that displays the
  percentage of cases within an interval and all preceding intervals.
Frequency distribution. A table that displays the number of cases in each category of a
  variable.
Frequency polygon. A graphic display device for interval-ratio variables. Class intervals
  are represented by dots placed over the midpoints, the height of each corresponding to
  the number (or percentage) of cases in the interval. All dots are connected by straight
  lines. Same as a line chart.
Histogram. A graphic display device for interval-ratio variables. Class intervals are
  represented by contiguous bars of equal width (equal to the class limits), the height of
  each corresponding to the number (or percentage) of cases in the interval.
Line chart. See Frequency polygon.
Midpoint. The point exactly halfway between the upper and lower limits of a class interval.
Percentage. The number of cases in a category of a variable divided by the number of cases
  in all categories of the variable, the entire quantity multiplied by 100.
Percent change. A statistic that expresses the magnitude of change in a variable from
  time 1 to time 2.
Pie chart. A graphic display device especially for discrete variables with only a few
  categories. A circle (the pie) is divided into segments proportional in size to the
  percentage of cases in each category of the variable.
Proportion. The number of cases in one category of a variable divided by the number of
  cases in all categories of the variable.
Rate. The number of actual occurrences of some phenomenon or trait divided by the number of
  possible occurrences per some unit of time.
Ratio. The number of cases in one category divided by the number of cases in some other
  category.
Real class limits. The class intervals of a frequency distribution when stated as
  continuous categories.
Stated class limits. The class intervals of a frequency distribution when stated as
  discrete categories.
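
Several of the terms defined above (class intervals, stated and real class limits, midpoint,
cumulative frequency, and cumulative percentage) fit together in a single table. The Python
sketch below builds such a table for a small set of hypothetical whole-number scores. The
function name frequency_distribution and the data are assumptions made for illustration, and
the real limits are taken to extend half a unit beyond the stated limits, which holds only
when scores are recorded as whole numbers.

```python
def frequency_distribution(scores, start, width):
    """Build one row per class interval, from `start` up to the highest score.

    Assumes whole-number scores, so real limits lie 0.5 beyond the stated limits.
    """
    n = len(scores)
    top = max(scores)
    rows, cum_f = [], 0
    lower = start
    while lower <= top:
        upper = lower + width - 1
        f = sum(lower <= s <= upper for s in scores)
        cum_f += f
        rows.append({
            "stated": (lower, upper),
            "real": (lower - 0.5, upper + 0.5),
            "midpoint": (lower + upper) / 2,
            "f": f,
            "pct": f * 100 / n,
            "cum_f": cum_f,
            "cum_pct": cum_f * 100 / n,
        })
        lower += width
    return rows

# Hypothetical scores (not from the text), grouped into intervals 0-4, 5-9, 10-14.
scores = [2, 3, 3, 5, 6, 6, 7, 9, 10, 14]
for row in frequency_distribution(scores, start=0, width=5):
    lo, hi = row["stated"]
    print(f"{lo:>2}-{hi:<2}  f={row['f']}  %={row['pct']:.1f}  "
          f"cum f={row['cum_f']}  cum %={row['cum_pct']:.1f}")
```

A quick check on any table of this kind: the last cumulative frequency should equal N, and
the last cumulative percentage should equal 100.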

PROBLEMS

2.1 SOC The tables that follow report the marital status of 20 respondents in two
    different apartment complexes. (HINT: Make sure that you have the correct numbers in
    the numerator and denominator before solving the following problems. For example,
    problem 2.1a asks for “the percentage of respondents who are married in each complex,”
    and the denominators will be 20 for these two fractions. Problem 2.1d, on the other
    hand, asks for the “percentage of the single respondents who live in Complex B,” and
    the denominator for this fraction will be 4 + 6, or 10.)

    Status                           Complex A   Complex B
    Married                              5          10
    Unmarried (“living together”)        8           2
    Single                               4           6
    Separated                            2           1
    Widowed                              0           1
    Divorced                             1           0
                                        20          20

    a. What percentage of the respondents in each complex are married?
    b. What is the ratio of single to married respondents at each complex?
    c. What proportion of each sample are widowed?
    d. What percentage of the single respondents live in Complex B?
    e. What is the ratio of the “unmarried/living together” to the “married” at each
       complex?

2.2 At St. Algebra College, the numbers of males and females in the various major fields
    of study are as follows:

    Major               Males   Females   Totals
    Humanities           117       83       200
    Social sciences       97      132       229
    Natural sciences      72       20        92
    Business             156      139       295
    Nursing                3       35        38
    Education             30       15        45
    Totals               475      424       899

    Read each of the following problems carefully before constructing the fraction and
    solving for the answer. (HINT: Be sure you place the proper number in the denominator
    of the fractions. For example, some problems use the total number of males or females
    as the denominator, whereas others use the total number of majors.)
    a. What percentage of social science majors are male?
    b. What proportion of business majors are female?
    c. For the humanities, what is the ratio of males to females?
    d. What percentage of the total student body are males?
    e. What is the ratio of males to females for the entire sample?
    f. What proportion of the nursing majors are male?
    g. What percentage of the sample are social science majors?
    h. What is the ratio of humanities majors to business majors?
    i. What is the ratio of female business majors to female nursing majors?
    j. What proportion of the males are education majors?

2.3 CJ The town of Shinbone, Kansas, has a population of 211,732 and experienced 47 bank
    robberies, 13 murders, and 23 auto thefts during the past year. Compute a rate for
    each type of crime per 100,000 population. (HINT: Make sure that you set up the
    fraction with size of population in the denominator.)
2.4 CJ The numbers of homicides in five states and five Canadian provinces for the years
    1997 and 2005 were as follows:

                              1997                       2005
    State             Homicides   Population     Homicides   Population
    New Jersey            338      8,053,000         417      8,717,925
    Iowa                   52      2,852,000          38      2,966,334
    Alabama               426      4,139,000         374      4,557,808
    Texas                1327     19,439,000        1407     22,859,968
    California           2579     32,268,000        2503     36,132,147

                              1997                       2005
    Province          Homicides   Population     Homicides   Population
    Nova Scotia            24        936,100          20        936,100
    Quebec                132      7,323,600         100      7,597,800
    Ontario               178     11,387,400         218     12,558,700
    Manitoba               31      1,137,900          49      1,174,100
    British Columbia      116      3,997,100          98      4,257,800

    Source: Statistics Canada, http://www.statcan.ca.

    a. Calculate the homicide rate per 100,000 population for each state and each province
       for each year. Relatively speaking, which state and which province had the highest
       homicide rates in each year? Which society seems to have the higher homicide rate?
       Write a paragraph describing these results.
    b. Using the rates you calculated in part a, calculate the percent change between 1997
       and 2005 for each state and each province. Which states and provinces had the
       largest increase and decrease? Which society seems to have the largest change in
       homicide rates? Summarize your results in a paragraph.

2.5 SOC The scores of 15 respondents on four variables are as reported next. These scores
    were taken from a public opinion survey called the General Social Survey, or the GSS.
    This data set is used for the computer exercises in this text. Small subsamples from
    the GSS will be used throughout the text to provide “real” data for problems. For the
    actual

    questions and other details, see Appendix G. The numerical codes for the variables are
    as follows:

    Sex          Support for Gun Control   Level of Education    Age
    1 = Male     1 = In favor              0 = Less than HS      Actual years
    2 = Female   2 = Opposed               1 = HS
                                           2 = Jr. college
                                           3 = Bachelor’s
                                           4 = Graduate

    Case               Support for      Level of
    Number    Sex      Gun Control      Education     Age
       1       2            1               1          45
       2       1            2               1          48
       3       2            1               3          55
       4       1            1               2          32
       5       2            1               3          33
       6       1            1               1          28
       7       2            2               0          77
       8       1            1               1          50
       9       1            2               0          43
      10       2            1               1          48
      11       1            1               4          33
      12       1            1               4          35
      13       1            1               0          39
      14       2            1               1          25
      15       1            1               1          23

    a. Construct a frequency distribution for each variable. Include a column for
       percentages.
    b. Construct pie and bar charts to display the distributions of sex, support for gun
       control, and level of education.

2.6 SW A local youth service agency has begun a sex education program for teenage girls
    who have been referred by the juvenile courts. The girls were given a 20-item test for
    general knowledge about sex, contraception, and anatomy and physiology upon admission
    to the program and again after completing the program. The scores of the first 15
    girls to complete the program are as follows.

    Case   Pretest   Posttest      Case   Pretest   Posttest
     A        8         12          I        5          7
     B        7         13          J       15         12
     C       10         12          K       13         20
     D       15         19          L        4          5
     E       10          8          M       10         15
     F       10         17          N        8         11
     G        3         12          O       12         20
     H       10         11

    Construct frequency distributions for the pretest and posttest scores. Include a
    column for percentages. (HINT: There were 20 items on the test, so the maximum range
    for these scores is 20. If you use 10 class intervals to display these scores, the
    interval size will be 2. Since there are no scores of 0 or 1 for either test, you may
    state the first interval as 2–3. To make comparisons easier, both frequency
    distributions should have the same intervals.)

2.7 SOC Sixteen high school students completed a class to prepare them for the College
    Boards. Their scores were as follows.

    420   345   560   650
    459   499   500   657
    467   480   505   555
    480   520   530   589

    These same 16 students were given a test of math and verbal ability to measure their
    readiness for college-level work. Scores are reported here in terms of the percentage
    of correct answers for each test.

    Math Test
    67   45   68   70
    72   85   90   99
    50   73   77   78
    52   66   89   75

    Verbal Test
    89   90   78   77
    75   70   56   60
    77   78   80   92
    98   72   77   82

    a. Display each of these variables in a frequency distribution with columns for
       percentages and cumulative percentages.
    b. Construct a histogram and frequency polygon for these data.
    c.² Find the upper and lower real limits for the intervals you established.

2.8 GER Following are reported the number of times 25 residents of a community for senior
    citizens left their homes for any reason during the past week.

     0    2    1    7    3
     7    0    2    3   17
    14   15    5    0    7
     5   21    4    7    6
     2    0   10    5    7

    a. Construct a frequency distribution to display these data.
    b. What are the midpoints of the class intervals?
    c. Add columns to the table to display the percentage distribution, cumulative
       frequency, and cumulative percentages.

² This problem is optional.

    d.³ Find the real limits for the intervals you selected.
    e. Construct a histogram and a frequency polygon to display this distribution.
    f. Write a paragraph summarizing this distribution of scores.

2.9 SOC Twenty-five students completed a questionnaire that measured their attitudes
    toward interpersonal violence. Respondents who scored high believed that in many
    situations a person could legitimately use physical force against another person.
    Respondents who scored low believed that in no situation (or very few situations)
    could the use of violence be justified.

    52   47   17    8   92
    53   23   28    9   90
    17   63   17   17   23
    19   66   10   20   47
    20   66    5   25   17

    a. Construct a frequency distribution to display these data.
    b. What are the midpoints of the class intervals?
    c. Add columns to the table to display the percentage distribution, cumulative
       frequency, and cumulative percentage.
    d. Construct a histogram and a frequency polygon to display these data.
    e. Write a paragraph summarizing this distribution of scores.

2.10 PA/CJ As part of an evaluation of the efficiency of your local police force, you have
    gathered the following data on police response time to calls for assistance during two
    different years. (Response times were rounded off to whole minutes.) Convert both
    frequency distributions into percentages, and construct pie charts and bar charts to
    display the data. Write a paragraph comparing the changes in response time between the
    two years.

    Response Time, 1995      Frequency (f)
    21 minutes or more            35
    16–20 minutes                 75
    11–15 minutes                180
    6–10 minutes                 375
    Less than 6 minutes          210
                                 875

    Response Time, 2005      Frequency (f)
    21 minutes or more            45
    16–20 minutes                 95
    11–15 minutes                155
    6–10 minutes                 350
    Less than 6 minutes          250
                                 895

³ This problem is optional.

2.11 SOC Figures 2.10 through 2.12 display trends in crime in the United States over the
    last two decades. Write a paragraph describing each of these graphs. What similarities
    and differences can you observe among the three graphs? (For example, do crime rates
    always change in the same direction?) Note the differences in the vertical axes from
    chart to chart: for homicide the axis ranges from 0 to 12, while for burglary and auto
    theft the range is from 0 to 1600. The latter crimes

FIGURE 2.10 U.S. HOMICIDE RATES, 1984–2005 (per 100,000 population)
[Line chart. Vertical axis: Rate per 100,000 population (0 to 12); horizontal axis: Year (1984 to 2005).]

FIGURE 2.11 U.S. ROBBERY AND AGGRAVATED ASSAULT RATES, 1984–2005 (per 100,000 population)
[Line chart. Vertical axis: Rate per 100,000 population (0 to 500); horizontal axis: Year (1984 to 2005);
lines: Robbery, Aggravated assault.]

FIGURE 2.12 U.S. BURGLARY AND CAR THEFT RATES, 1984–2005 (per 100,000 population)
[Line chart. Vertical axis: Rate per 100,000 population (0 to 1600); horizontal axis: Year (1984 to 2005);
lines: Burglary, Car theft.]

    are far more common, and a scale with smaller intervals is needed to display the
    rates.

2.12 PA The city’s Department of Transportation has been keeping track of accidents on a
    particularly dangerous stretch of highway. Early in the year, the city lowered the
    speed limit on this highway and increased police patrols. Data on the number of
    accidents before and after the changes are presented here. Did the changes work? Is
    the highway safer? Construct a line chart to display these two sets of data (use
    graphics software if available), and write a paragraph describing the changes.

                 12 Months   12 Months
    Month         Before       After
    January         23           25
    February        25           21
    March           20           18
    April           19           12
    May             15            9
    June            17           10
    July            24           11
    August          28           15
    September       23           17
    October         20           14
    November        21           18
    December        22           20

SPSS FOR WINDOWS

Using SPSS for Windows to Produce Frequency Distributions and Graphs

Click the SPSS icon on your monitor screen to start SPSS for Windows. Load the 2006
GSS by clicking the file name on the first screen or by clicking File, Open, and Data
on the SPSS Data Editor screen. You may have to change the drive specification
to locate the 2006 GSS data supplied with this text (probably named GSS2006.sav).
Double-click the file name to open the data set. When you see the message “SPSS
Processor is Ready” on the bottom of the screen, you are ready to proceed.

SPSS DEMONSTRATION 2.1 Frequency Distributions

We produced and examined a frequency distribution for the variable sex in Appendix F.
Use the same procedures to produce frequency distributions for the variables age and
marital (marital status). From the menu bar, click Analyze. From the menu that drops
down, click Descriptive Statistics and Frequencies. The Frequencies window ap-
pears, with the variables listed in alphabetical order in the left-hand box. The window
may display variables by name (e.g., abany, abhlth) or by label (e.g., ABORTION IF
WOMAN WANTS FOR ANY REASON). If labels are displayed, you may switch to vari-
able names by clicking Edit, Options, and then making the appropriate selections on
the “General” tab. Depending on the version of SPSS you are using, these changes may
not take effect until you load a new data set or restart SPSS. See Appendix F and Table
F.2 for further information. The variable age (AGE OF RESPONDENT) will be visible.
Click on it to highlight it, and then click the arrow button in the middle of the screen to
move age to the right-hand window.
Find marital in the left-hand box by using the slider button or the arrow keys on the
right-hand border to scroll through the variable list. As an alternative, type “m”; the cur-
sor will move to the first variable name in the list that begins with that letter. Highlight mar-
ital and click the arrow button in the center of the screen to move the variable name to
the Variables box. There should now be two variable names in the box, age and mari-
tal. SPSS will process together all variables listed in the right-hand box. Click OK in the
upper-right-hand corner, and SPSS will rush off to create the frequency distributions you
requested.
The table will be in the Output window that will now be “closest” to you on the
screen. The tables, along with other information, will be in the right-hand box of the Out-
put window. To change the size of the output window, click the middle symbol (shaped
like either a square or two intersecting squares) in the upper-right-hand corner of the
Output window.
