Chapter 6
Probability Theory and the Normal Probability Distribution
college. The laws of chance are tools for determining the degree of accuracy in
social science predictions. We refer to the analysis and understanding of chance oc-
currences as probability theory.
Probability Theory
The analysis and understanding of chance occurrences.
Discovery of the laws of chance began in ancient times and perhaps was
stimulated as much by leisure activities such as gaming as by work activities
(David 1962: 4-10). Among the artifacts of Egypt's first dynasty (3500 B.C.) are
board games, playing pieces, and animal astragali (joint bones), the precursors
of dice. In Egypt, cubical dice were in common use by 3000 B.C. Gaming was
so common in Roman times that it was prohibited on certain days. In Roman
literature there are references to a book by Claudius (10 B.C.-A.D. 54) entitled
How to Win at Dice. Astute gamblers had the statistical imagination. They
could think proportionately, recognizing that some "tosses of the bones"
occurred a greater proportion of the time than did others. By successfully
advising members of the ruling classes on how to increase their gambling winnings,
these early statisticians gained high status. Successful stock market analysts
and survey researchers as well as horse race handicappers are the modern
equivalent of these highly respected statistical advisers.
We may surmise that human interest in predicting the outcomes of future
events was not limited to games of chance. As far as humans are concerned,
the forces of nature (especially climate) involve chance, and environmental
adaptation is fraught with fate and good or bad luck. Cultural evolution is
stimulated by a society's need to anticipate what will happen next. For
example, by the middle dynastic period (circa 2000 B.C.) the ancient Egyptians had
developed complex irrigation and canal systems to regulate the annual
flooding of the Nile River. With data managed by a highly efficient bureaucracy,
they monitored the river's depth with "nilometers" placed at strategic points
along the Nile's vast 4,145-mile length. By studying and anticipating flows,
they used floodwater advantageously for crop irrigation (David 1962). The
accuracy of their predictions produced a stable economy that enhanced the
political power of the ruling dynasties. Many ancient cultures had their share of
empiricists: individuals who believed in the merits of observation and
measurement. Empiricism and the popularity of gambling, religion, and
fortune-telling attest to an innate human interest in betting on what will happen next
and preparing for it. Everything from predicting enemy troop strength to
deciding whether to carry an umbrella, invest in stocks, or propose marriage
requires measurements and estimations of the likelihood of success or failure.
Statistical analysis using probability theory is the tool by which predictions are
made with a maximum degree of accuracy.
What Is a Probability?
A probability (p) is a specification of how frequently a particular event of interest is
likely to occur over a large number of trials (situations in which the event can occur).
We call the probability of this interesting event occurring the probability of
success. Similarly, the probability of the event not occurring is called the prob-
ability of failure. Brackets are used to distinguish the targeted event of interest,
and a lowercase p is used to indicate "the probability" of a specific calculation.
Note that this symbol is the same one used in previous chapters for proportion.
This is done because probabilities are proportions, as we will discuss shortly.
A probability (p)
A specification of how frequently a particular event of interest is likely to occur
over a large number of trials.
Computing a Probability
For a coin there are two possible outcomes, and heads is one of them. Thus:
$p\,[\text{heads}] = \frac{\#\text{ of heads}}{\#\text{ of possible outcomes}} = \frac{1}{2} = .5000$
Example C: When randomly drawing a single marble from a box of 300 mar-
bles in which 100 are red and 200 are green:
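a. $p\,[\text{red}] = \frac{\#\text{ of reds in box}}{\text{total }\#\text{ marbles in box}} = \frac{100}{300} = .3333$
b. $p\,[\text{green}] = \frac{\#\text{ of greens in box}}{\text{total }\#\text{ marbles in box}} = \frac{200}{300} = .6667$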
Since probabilities are proportions, their lower numerical limit is zero (the
event cannot happen) and their upper numerical limit is 1.00 (the event must
happen). In other words, probabilities always calculate between 0.00 and 1.00
(or 0 percent and 100 percent). If this is not the case, a mathematical mistake
has occurred.
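The coin and marble calculations can be checked with a few lines of Python. This is only a minimal sketch: the helper name `probability` is our own invention for illustration, and it simply divides the number of successes by the number of possible outcomes and confirms that the result lies between 0 and 1.

```python
from fractions import Fraction


def probability(successes: int, possible_outcomes: int) -> Fraction:
    """Return p = (# of successes) / (# of possible outcomes)."""
    if possible_outcomes <= 0:
        raise ValueError("there must be at least one possible outcome")
    p = Fraction(successes, possible_outcomes)
    # A probability is a proportion, so it must lie between 0 and 1.
    if not 0 <= p <= 1:
        raise ValueError("a probability outside 0 to 1 signals a mistake")
    return p


p_heads = probability(1, 2)      # one head among two coin faces
p_red = probability(100, 300)    # red marbles in the box of 300
p_green = probability(200, 300)  # green marbles in the box of 300

print(float(p_heads), round(float(p_red), 4), round(float(p_green), 4))
```

Exact fractions are used so that the red and green probabilities sum to exactly 1; rounding to four decimal places happens only when the results are printed.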
Some events do have a zero probability of occurring-they never occur
(e.g., remaining alive underwater for 24 hours without life-support devices).
Some events occur with a 100 percent probability-they always happen (e.g.,
the sun will rise tomorrow). Many events, however, are not so definite; their
probabilities of occurrence are somewhere between never and always.
ace. The addition rule for alternative events states that the probability of
alternative events is equal to the sum of the probabilities of the individual
events. Therefore,
$p\,[\text{king or ace}] = p\,[\text{king}] + p\,[\text{ace}]$
Do not make this complicated. The addition rule is just a guide to help us
calculate a probability when there are several ways to gain success. In the case
of drawing an ace or a king, there are eight ways. (If you are not convinced,
count the aces and kings in a deck of cards.)
In later chapters we will use the symbol P (capitalized) to represent the
probability of success and Q to represent the probability of failure. (These
symbols probably evoked the old adage "Mind your p's and q's.") The addition
rule leads to an important point: The probability of success or failure must
be 1.00; that is, P + Q = 1. It follows from this that if we know P, then Q can be
computed quickly. That is,
$Q = 1 - P$
Similarly,
$P = 1 - Q$
For example, if P = p [king or ace], then
$Q = p\,[\text{any card other than a king or ace}] = 1 - P = 1 - .1538 = .8462$ (about 85%)
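As a quick check on the addition rule and on P + Q = 1, the king-or-ace example can be worked in Python. This is a sketch under the usual assumption of a standard 52-card deck with four kings and four aces; the variable names are ours.

```python
from fractions import Fraction

DECK_SIZE = 52  # assumed: a standard deck with no jokers

p_king = Fraction(4, DECK_SIZE)
p_ace = Fraction(4, DECK_SIZE)

# Addition rule for alternative events: a card cannot be both a king
# and an ace, so the two probabilities are simply summed.
P = p_king + p_ace  # probability of success
Q = 1 - P           # probability of failure

print(round(float(P), 4), round(float(Q), 4))
```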
This answer is incorrect. If we take a deck of cards and count the "success
cards" (kings, queens, and hearts), we will find 19, not 21. This is the case
because when adding the separate probabilities, we counted both the king of
hearts and the queen of hearts twice. By being a king and a heart, the king of
hearts is successful in two ways (put another way, the characteristics are not
mutually exclusive). Similarly, double success occurs for the queen of hearts.
When we have an event that double counts success or joins two aspects of
success, we call this a joint occurrence. (This is the same thing as a joint frequency
of occurrence of categories of the two variables in the cells of a cross-tabulation
table; see Chapter 2.) To compute the correct probability, we must subtract
every joint occurrence to eliminate these double counts. In this case, the queen
of hearts and the king of hearts each is a joint occurrence. Thus:
$p\,[\text{king or queen or heart}] = \frac{21}{52} - \frac{2}{52} = \frac{19}{52} = .3654$
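To see the double counting directly, we can build a deck and count the success cards, then compare that count with the addition rule adjusted for joint occurrences. The following is an illustrative sketch; the rank and suit labels and the helper `is_success` are our own names.

```python
from fractions import Fraction
from itertools import product

RANKS = ["2", "3", "4", "5", "6", "7", "8", "9", "10",
         "jack", "queen", "king", "ace"]
SUITS = ["clubs", "diamonds", "hearts", "spades"]
deck = list(product(RANKS, SUITS))  # 52 (rank, suit) pairs


def is_success(card) -> bool:
    """Success is drawing a king, a queen, or a heart."""
    rank, suit = card
    return rank in ("king", "queen") or suit == "hearts"


# Direct count: each card is counted once, even if it succeeds two ways.
success_cards = [card for card in deck if is_success(card)]
p_direct = Fraction(len(success_cards), len(deck))

# Addition rule, minus the two joint occurrences
# (the king of hearts and the queen of hearts).
p_rule = (Fraction(4, 52) + Fraction(4, 52) + Fraction(13, 52)
          - Fraction(2, 52))

print(len(success_cards), round(float(p_direct), 4), p_direct == p_rule)
```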
Some events have two or more parts to them. We call these multiple-part events
compound events (from chemistry, where a compound such as water is defined
as a substance composed of two or more elements, in this case hydrogen and
oxygen). For example, we may define success as drawing a pair of aces from
the deck, that is, drawing an ace, putting it back in, reshuffling (i.e.,
randomizing), and then drawing an ace again. The multiplication rule for compound
events states that the probability of a compound event is equal to the multiple of the
probabilities of the separate parts of the event. Thus,
$p\,[\text{ace then ace}] = p\,[\text{ace}] \cdot p\,[\text{ace}] = \frac{4}{52} \cdot \frac{4}{52} = \frac{16}{2{,}704} = .0059$
A simple trick to follow is to replace the word then (or and) with the
multiplication sign.
Do not make this complicated. Mathematically, the multiplication rule
simply extracts the number of successes in the numerator of the fraction and
the total number of possible events in the denominator. Accordingly, it turns
out that if we spent months drawing a card, replacing it, reshuffling, drawing
a second card, and recording the outcomes, we would discover that there are
2,704 possible combinations of two cards. And we would discover that there
are 16 combinations of ace pairs, as is shown in Figure 6-1. Thank
goodness for mathematicians! They astutely noticed that instead of having to
sort these combinations out piecemeal, we need only multiply the separate
probabilities.
FIGURE 6-1 Possible pairings of aces when randomly drawing a card, replacing it, and randomly drawing a second card
A simple coin-flipping exercise will further reinforce the simplicity of the
multiplication rule. Let us compute the probability of flipping a coin twice and
getting heads both times:
$p\,[\text{heads then heads}] = p\,[\text{heads}] \cdot p\,[\text{heads}] = .5 \cdot .5 = .2500$ (or 1 out of 4)
As is shown in Figure 6-2, flipping two coins (or flipping a single coin twice)
results in four possible outcomes, and only one of those outcomes is heads
then heads.
To really grasp this, make your own similar chart for the probability of
getting all heads when tossing three coins. (Mathematically, the probabilities
of dichotomous events are calculated by expanding the binomial distribution
formula; see Chapter 13.)
FIGURE 6-2 Possible outcomes of two flipped coins
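The charts of possible outcomes can also be generated by brute force. The sketch below lists every outcome for two and three coin flips and every ordered pair of cards drawn with replacement, then counts the successes. Numbering the cards 0 through 51 and treating cards 0 through 3 as the aces is purely our own labeling for illustration.

```python
from fractions import Fraction
from itertools import product

coin = ["heads", "tails"]

# Every possible outcome of two flips and of three flips.
two_flips = list(product(coin, repeat=2))
three_flips = list(product(coin, repeat=3))
p_two_heads = Fraction(two_flips.count(("heads", "heads")), len(two_flips))
p_three_heads = Fraction(three_flips.count(("heads",) * 3), len(three_flips))

# Every ordered pair of cards when the first card is replaced.
cards = range(52)  # assumed labeling: cards 0-3 stand for the four aces
pairs = list(product(cards, repeat=2))
ace_pairs = [pair for pair in pairs if pair[0] < 4 and pair[1] < 4]
p_ace_pair = Fraction(len(ace_pairs), len(pairs))

print(p_two_heads, p_three_heads)
print(len(pairs), len(ace_pairs), round(float(p_ace_pair), 4))
```

Counting outcomes one by one gives the same answers as multiplying the separate probabilities, which is the point of the multiplication rule.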
We stipulated that the first card drawn was to be returned to the deck before
the second card was drawn. This stipulation for calculating the probability of a
compound event is called "with replacement." If we had not returned the first
card, the calculation would have been done "without replacement" and the
computed probability would have been different:
$p\,[\text{ace then ace}]\text{ without replacement} = p\,[\text{ace}] \cdot p\,[\text{ace}] = \frac{4}{52} \cdot \frac{3}{51} = \frac{12}{2{,}652} = .0045$
The probability of the first ace is the same with or without replacement
because the event begins with 52 cards and four aces. But if the first card drawn
is an ace and it is not replaced, then for the second draw there are only 51 cards
in the deck and only three are aces. Close attention must be paid to issues of
replacement in compound events. Numerators and denominators are adjusted
accordingly. For instance, let us compute the following:
$p\,[\text{ace then king then ace}]\text{ without replacement} = p\,[\text{ace}] \cdot p\,[\text{king}] \cdot p\,[\text{ace}] = \frac{4}{52} \cdot \frac{4}{51} \cdot \frac{3}{50} = \frac{48}{132{,}600} = .0004$
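The adjustment of numerators and denominators for draws without replacement can be sketched as a small loop. The function `draw_sequence_probability` and the dictionary of card counts are hypothetical names of our own; after each draw the function removes one card of the drawn rank from the deck before computing the next probability.

```python
from fractions import Fraction


def draw_sequence_probability(targets, deck_counts) -> Fraction:
    """Probability of drawing the target ranks in order, without replacement."""
    counts = dict(deck_counts)        # copy so the caller's deck is unchanged
    cards_left = sum(counts.values())
    p = Fraction(1)
    for rank in targets:
        p *= Fraction(counts[rank], cards_left)
        counts[rank] -= 1             # the drawn card is not returned
        cards_left -= 1
    return p


deck_counts = {"ace": 4, "king": 4, "other": 44}  # 52 cards in all

p_ace_ace = draw_sequence_probability(["ace", "ace"], deck_counts)
p_ace_king_ace = draw_sequence_probability(["ace", "king", "ace"], deck_counts)

print(round(float(p_ace_ace), 4), round(float(p_ace_king_ace), 4))
```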
Finally, not all compound events involve issues of replacement. For
example, replacement is not an issue with coin tossing. The calculated probabilities
are the same for "heads then heads" in tossing two coins at once or tossing one
coin twice.
The rules of probability are fundamental; that is, they must be considered
in computing the probability of any event, no matter how simple or
complicated that event is. The simple examples presented in this chapter illustrate
these basic principles. Much more complex formulations of probabilities are
presented in advanced texts such as that of Lee and Maykovich (1995).
Fortunately for students and scholars today, it is not necessary to have extensive
mathematical skills to compute probabilities. Computer software writers
require us to learn only which buttons to push or what table to read to obtain
the answers to probability questions. A thorough understanding of basic
probability theory, however, is necessary to avoid misinterpreting such computer
output. Moreover, an understanding of probability theory is essential for
acquiring the statistical imagination.
$Z_X = \frac{X - \bar{X}}{s_X}$ = number of standard deviations (SD) from the mean
We noted that roughly 68 percent of the cases in a normally distributed
population have X-scores within 1 standard deviation distance to both sides of the
mean (i.e., between a Z-score of plus and minus 1). For example, suppose we
have the following information where X = height for a sample of men at a
health and fitness club: a mean of 69 inches and a standard deviation of 3 inches.
Since this distribution is normal, let us draw the normal curve to get a sense of
proportion about how many of the men are how tall. Our basic knowledge of
the normal curve tells us that roughly 68 percent are between 66 inches and
72 inches, as noted on the curve below. Moreover, since the median is located
at the mean, we know that half the men are below 69 inches (five feet,
nine inches) and half are above. And since over 99 percent of a normally
distributed population falls within three Z-scores to both sides of the mean,
very few are shorter than 60 inches or taller than 78 inches.
[Normal curve of height: X = 60, 63, 66, 69, 72, 75, 78 inches, corresponding to Z_X = -3SD, -2SD, -1SD, 0, +1SD, +2SD, +3SD]
Thus,
p [of X = 66 to X = 72] = approximately 68%
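The Z-score formula and the "68 percent" and "over 99 percent" figures can be verified with Python's standard library. This is a sketch that assumes the heights really are normally distributed with a mean of 69 inches and a standard deviation of 3 inches; the function name `z_score` is ours, and `NormalDist` stands in for the printed normal curve table.

```python
from statistics import NormalDist

MEAN_HEIGHT = 69.0  # inches
SD_HEIGHT = 3.0     # inches


def z_score(x: float, mean: float, sd: float) -> float:
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd


heights = NormalDist(mu=MEAN_HEIGHT, sigma=SD_HEIGHT)

# Area under the curve between two heights = difference of cumulative areas.
p_66_to_72 = heights.cdf(72) - heights.cdf(66)  # within 1 SD of the mean
p_60_to_78 = heights.cdf(78) - heights.cdf(60)  # within 3 SD of the mean

print(z_score(66, MEAN_HEIGHT, SD_HEIGHT), z_score(72, MEAN_HEIGHT, SD_HEIGHT))
print(round(p_66_to_72, 4), round(p_60_to_78, 4))
```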
In fact, with the help of a statistical table, we can compute Z-scores and
use them to determine any area under the curve. This procedure is called
partitioning areas under the normal curve, and we will do some partitioning
shortly.
As it turns out, areas under the normal curve represent probabilities of
occurrence. Notice that we use the symbol p to represent proportions and
probabilities. Probabilities are proportions of time for which success occurs out of all
possible occurrences. Knowing the proportion of success for the population as
a whole gives us the probability of success for a single subject. In other words,
a specified area under the normal curve provides the probability of occurrence
of any single score falling between any two score values.
To illustrate this connection, suppose we are hanging out at the health and
fitness club, killing time. To entertain ourselves, we play a whimsical game
called "guess the height." The rules of the game are such that when we hear
someone approaching from around the corner, we guess his height and then
ask him when he appears. If we are within 3 inches of the correct height, we
win.
How can we improve our chances of winning? We know that the fitness
club members' heights are normally distributed around a mean of 69 inches
with a standard deviation of 3 inches. This tells us that about 68 percent of the
men are between 66 and 72 inches tall. Let us think probabilistically; that is, let
us look at the long run. For every 100 men that approach, 68 will fall in the
"success" range:
These three interpretations are saying the same thing: About 68 percent of the
men are between 66 and 72 inches in height. Because of its probabilistic
interpretation, the normal curve often is referred to as a probability curve.
These distinctions also highlight an important point about the probabilities
of events. Although stated for a single type of "success," any probability is
based on the entire distribution of all possible events. A singular event is
assessed relative to a larger set of occurrences. This type of proportional
thinking is central to grasping the statistical imagination.
Where do the numbers in this table come from? Statisticians long ago
discovered how the occurrences of many natural phenomena fit the bell shape of the
normal curve. They worked out the mathematics of this phenomenon and
came up with the mean, the standard deviation, and Z-scores. Then they
formulated areas or proportions (p) under the curve. These areas are fixed and
apply to any normally distributed variable because normality is a natural
occurrence, much like gravity. The normal curve table provides precisely
calculated areas under the curve. One thing must be emphasized here: Such
partitioning of areas using the mean, the standard deviation, Z-scores, and the
normal curve works only if we have reason to believe that the scores in
a population are normally distributed. If the distribution of scores is skewed
or otherwise oddly shaped, the normal curve table cannot be used in
calculations.
FIGURE 6-3 Information provided in the columns of the normal distribution table (Statistical Table B in Appendix B)
A. In column A: Computed Z-scores for one side of the curve or the other
B. In column B: Area under the curve from the mean of X to the Z-score for a value of X; p [area from mean to a Z-score]
C. In column C: Area under the curve from the Z-score for a value of X and beyond
Column C of the normal curve table gives the area under the curve from a
Z-score and beyond in the "tail" of the curve, as in Figure 6-3C. For example,
.1587 (or 15.87 percent) of scores in a normal distribution fall to the right of a
Z-score of 1.00 or to the left of a Z-score of -1.00. This is found by looking at a
Z-score of 1.00 in column A and then observing the entry .1587 in column C.
We noted earlier that any normally distributed variable has a median
equal to the mean. Thus, 50 percent of the scores in any normal distribution
fall in either direction from the mean. Since the table provides half the curve,
note that for any Z-score, columns B and C sum to .5000, or 50 percent. Finally,
keep in mind that Z-scores may be positive or negative, depending on
whether a raw score is above or below the mean, respectively. Z-scores can be
infinitely large, although in practice they typically fall between about -3.00
and 3.00 because in a normal distribution nearly 100 percent of cases fall
within 3 standard deviations to both sides of the mean. The areas in columns B
and C of the normal curve table, however, are always positive; these areas
depict space. Zero space is the smallest amount we can have, and 100 percent
space is the largest.
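The entries in columns B and C can be reproduced from the cumulative area of the standard normal curve. The sketch below uses `statistics.NormalDist` from the Python standard library as a stand-in for the printed table; the function names `column_b` and `column_c` are our own labels for the two columns.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def column_b(z: float) -> float:
    """Area under the curve from the mean to the Z-score."""
    return STANDARD_NORMAL.cdf(abs(z)) - 0.5


def column_c(z: float) -> float:
    """Area under the curve from the Z-score and beyond, in the tail."""
    return 1.0 - STANDARD_NORMAL.cdf(abs(z))


# The table covers one side of the curve, so the sign of Z is ignored
# and the two columns always sum to .5000.
for z in (1.00, 1.50, 2.50):
    print(z, round(column_b(z), 4), round(column_c(z), 4))
```

For a Z-score of 1.00 this prints .3413 for column B and .1587 for column C, matching the table entry cited above.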
[Normal curve of self-esteem scores: X = 2, 4, 6, 8, 10, 12, 14, corresponding to Z_X = -3SD, -2SD, -1SD, 0, +1SD, +2SD, +3SD]
Our basic knowledge of the normal distribution readily tells us the following:
(1) 50 percent of the assistance recipients score above 8, and 50 percent score
below 8, (2) approximately 68 percent score between 6 and 10 on the self-
esteem measure, (3) approximately 95 percent score between 4 and 12, and (4)
nearly all-over 99 percent-score between 2 and 14.
We can use the normal curve table to answer several types of questions
about the distribution of self-esteem among recipients of family assistance.
Important Study Hint: The normal curve table requires Z-scores. When in
doubt about how to start a problem, compute Z-scores.
Problem Type 1: p [of Cases from the Mean to an X-Score]. Find the
proportion (p) of cases between the mean and some X-score.
Solution plan: Draw and label the normal curve for the variable X; shade the
target area (p) from the mean out to the specified X-score; compute the Z-score
for that X-score; locate the Z-score in column A of the normal curve table; get p
from column B; report the answer in everyday terms.
Illustration: What percentage of assistance recipients have self-esteem scores
between 5 and 8?
Identify this target area, p.
[Normal curve with the target area p shaded from X = 5 to the mean, X = 8]
Column B in the normal curve table provides areas under the curve from
the mean out to any Z-score. By drawing the curve, we can see that the target
area (p) is bordered by the mean; thus, p is a "column B type" area.
The next step in solving problems is to transform a raw score into a
Z-score:
$Z_X = \frac{X - \bar{X}}{s_X} = \frac{5 - 8}{2} = \frac{-3}{2} = -1.50$ SD
Remember that a Z-score is just another way to express a raw score. An
assistance recipient scoring 5 on self-esteem falls 1.50 SD below the mean, the
negative Z-score of -1.50; he or she is among those with rather low self-esteem. In
column A of the normal curve table, find 1.5 and treat it as -1.5. Look in column B
and report the answer as follows:
p [of X = 5 to X = 8] = .4332; % = p (100) = 43.32%
Finally, answer the question in everyday terms: A little over 43 percent of
assistance recipients scored between 5 and 8 on the self-esteem measure. (This is
a distributional interpretation describing the result in relation to the
distribution of scores of the population of recipients of family assistance.) If a
randomly selected name is chosen from the case files, there is about a 43 percent
chance that this person will score between 5 and 8 on the self-esteem measure.
(This is a probabilistic interpretation, the probability of a single randomly
drawn assistance recipient falling in the targeted area.) We compute
percentages and substitute the term chance for probability for clarity of expression to a
public audience.
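Problem Type 1 can be double-checked in Python. The sketch assumes, as the illustration does, that self-esteem scores are normally distributed with a mean of 8 and a standard deviation of 2; `NormalDist` from the standard library replaces the printed table, and the variable names are ours.

```python
from statistics import NormalDist

MEAN, SD = 8.0, 2.0
self_esteem = NormalDist(mu=MEAN, sigma=SD)

x = 5.0
z = (x - MEAN) / SD  # -1.50 SD

# Area from the X-score up to the mean (a "column B type" area).
p = self_esteem.cdf(MEAN) - self_esteem.cdf(x)

print(z, round(p, 4), f"{100 * p:.2f}%")
```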
Problem Type 2: p [of Cases Greater Than an X-Score]. Find the proportion
(p) of cases greater than a specified X-score.
Solution plan: Draw and label the normal curve for the variable X; shade the
target area (p) from the X-score out into the tail in the positive or "greater than"
direction; compute the Z-score and locate it in column A; get p from column C.
Illustration: What proportion of the assistance recipients score at or above 13
on the self-esteem scale?
Shade the target area, p:
[Normal curve with the target area p shaded in the upper tail from X = 13 and beyond]
$Z_X = \frac{X - \bar{X}}{s_X} = \frac{13 - 8}{2} = \frac{5}{2} = 2.50$ SD
Find 2.50 in column A of the normal curve table. Look in column C and report
the proportion of area greater than or equal to 13 as follows:
p [of X ≥ 13] = .0062; % = p (100) = 0.62%
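A tail area of this kind (column C) is one minus the cumulative area below the X-score. The following sketch again assumes a normal distribution of self-esteem scores with a mean of 8 and a standard deviation of 2; the variable names are ours.

```python
from statistics import NormalDist

MEAN, SD = 8.0, 2.0
self_esteem = NormalDist(mu=MEAN, sigma=SD)

x = 13.0
z = (x - MEAN) / SD  # 2.50 SD

# Area from the X-score and beyond in the upper tail (a column C area).
p_at_or_above = 1.0 - self_esteem.cdf(x)

print(z, round(p_at_or_above, 4))
```

Because the curve is continuous, "at or above 13" and "above 13" cover the same area.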
Problem Type 3: p [of Cases between Two X-scores on Different Sides of the
Mean]. Find the proportion (p) of cases between two X-scores, one below the
mean and one above the mean.
Solution plan: Draw and label the normal curve; shade the target area (p) from
one X-score to the other; compute the Z-scores for the two X-scores; locate
them in column A of the normal curve table; get areas PA and PB (drawn
below) from column B; compute the area (p), which will be the sum of PA and
PB.
[Normal curve with the target area shaded from X = 4 to X = 10; PA is the area from X = 4 to the mean, PB is the area from the mean to X = 10; Total p = PA + PB]
$Z_X = \frac{X - \bar{X}}{s_X} = \frac{4 - 8}{2} = \frac{-4}{2} = -2.00$ SD
$Z_X = \frac{X - \bar{X}}{s_X} = \frac{10 - 8}{2} = \frac{2}{2} = 1.00$ SD
Now use the normal curve table. In column A find each of the two Z-scores.
Look in column B to get areas PA and PB and report the answer as follows:
PA = p [of X = 4 to X = 8] = .4772
PB = p [of X = 8 to X = 10] = .3413
p [of X = 4 to X = 10] = PA + PB = .4772 + .3413 = .8185
% = p (100) = 81.85%
Answer the question in everyday terms: About 82 percent of assistance
recipients have self-esteem scores between 4 and 10. If a randomly selected name
is chosen from the case files, there is an 82 percent chance that this person will
have a self-esteem score between 4 and 10.
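The two-sided area can be sketched in Python by computing PA and PB separately and then summing them, just as with the table. The mean of 8 and standard deviation of 2 are taken from the illustration; the variable names are ours.

```python
from statistics import NormalDist

MEAN, SD = 8.0, 2.0
self_esteem = NormalDist(mu=MEAN, sigma=SD)

# PA: area from X = 4 up to the mean; PB: area from the mean up to X = 10.
pa = self_esteem.cdf(MEAN) - self_esteem.cdf(4)
pb = self_esteem.cdf(10) - self_esteem.cdf(MEAN)

# Summing the four-decimal table entries reproduces the hand calculation.
p_table_style = round(round(pa, 4) + round(pb, 4), 4)
# Summing at full precision differs only in the last decimal place.
p_full_precision = pa + pb

print(round(pa, 4), round(pb, 4), p_table_style, round(p_full_precision, 4))
```

The table-style sum is .8185, while the full-precision sum rounds to .8186; the small gap arises only because the table rounds each area before they are added.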
Problem Type 4: p [of Cases between Two X-scores on One Side of the
Mean]. Find the proportion (p) of cases between two X-scores on one side of
the mean.
Solution plan: Draw and label the curve; shade the target area (p) from one
X-score to the other; compute the Z-scores and locate them in column A of the
normal curve table; get areas PA and PB from column B; compute the area p,
which is PA minus PB.
Illustration: What proportion of the assistance recipients scored between 11 and
13 on the self-esteem scale? In the sample of 500, how many assistance
recipients is this?
Study Hint: By drawing the curve, we see that the target area p does not touch
the mean. Therefore, it is not a column B type area in the normal curve table;
neither is it a tail-shaped, column C type area. Thus, to solve this illustration,
we must compute p indirectly.
Shade the target area p:
[Normal curve with the target area p shaded between X = 11 and X = 13; PA is the area from the mean to X = 13, and PB is the area from the mean to X = 11]
$Z_X = \frac{X - \bar{X}}{s_X} = \frac{13 - 8}{2} = \frac{5}{2} = 2.50$ SD
$Z_X = \frac{X - \bar{X}}{s_X} = \frac{11 - 8}{2} = \frac{3}{2} = 1.50$ SD
In column A find each of the two Z-scores. Look in column B to get areas PA
and PB and report the answer as follows:
PA = p [of X = 8 to X = 13] = .4938
PB = p [of X = 8 to X = 11] = .4332
p [of X = 11 to X = 13] = PA - PB = .4938 - .4332 = .0606
% = p (100) = 6.06%
Study Hint: Subtract ps (i.e., areas under the curve), not Z-scores.
To determine how many of the 500 assistance recipients score in this range,
take the proportion of the sample size n as follows:
# = p (n)
where
# = number of cases in the sample for the designated area, p
p = proportion of area under the curve
n = sample size
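Problem Type 4 and the conversion from a proportion to a number of cases can be sketched together. The code assumes the same normal distribution of self-esteem scores (mean 8, standard deviation 2) and the sample of 500 recipients; the variable names are ours.

```python
from statistics import NormalDist

MEAN, SD = 8.0, 2.0
SAMPLE_SIZE = 500
self_esteem = NormalDist(mu=MEAN, sigma=SD)

# PA: area from the mean to X = 13; PB: area from the mean to X = 11.
pa = self_esteem.cdf(13) - self_esteem.cdf(MEAN)
pb = self_esteem.cdf(11) - self_esteem.cdf(MEAN)

# Subtract areas under the curve (never Z-scores) to get the target area.
p = pa - pb

# Number of cases in the designated area: # = p (n).
number_of_cases = p * SAMPLE_SIZE

print(round(pa, 4), round(pb, 4), round(p, 4), round(number_of_cases))
```

Under these assumptions about 30 of the 500 recipients would be expected to score between 11 and 13.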
Illustration: If a randomly selected name were chosen from the case files, what
is the probability that this assistance recipient would score at or below 6.5 on
the self-esteem scale?
Shade the target area, p:
[Normal curve with the target area p shaded in the lower tail from X = 6.5 and below]
$Z_X = \frac{X - \bar{X}}{s_X} = \frac{6.5 - 8}{2} = \frac{-1.5}{2} = -.75$ SD
In column A of the normal curve table, find .75 and treat it as though it were
-.75. Look in column C and report the answer as follows:
p [of X ≤ 6.5] = .2266
% = p (100) = 22.66%
Answer the question in everyday terms: The probability that a randomly se-
lected assistance recipient scored at or below 6.5 on the self-esteem scale is
about 23 percent.
Problem Type 6: p [of Cases Less Than an X-Score That Is Greater Than the
Mean]. Find the proportion (p) of cases less than a specified X-score which is
greater than the mean.
Solution plan: Draw the curve; shade the target area (p); compute the Z-score
and locate it in column A; get p from column B and add .5000.
Illustration: What is the probability (p) that a randomly selected assistance
recipient scores at or below 10.5 on the self-esteem scale?
Study Hint: Remember that the normal curve table gives areas only for one
side of the curve. Remember also that a normal curve has a median equal to
the mean; therefore, half (or a proportion of .5000) of the scores fall below the
mean. This illustration is solved by working with the area above the mean and
then adding the area below the mean. (Incidentally, to find the proportion (p)
of cases more than a specified X-score which is less than the mean, work from
the left side over. Calculate the area below the mean and then add it to .5000,
which is the area above the mean.)
[Normal curve with the target area shaded from the lower tail up to X = 10.5; PA is the area from the mean to X = 10.5; p = .5 + PA]
$Z_X = \frac{X - \bar{X}}{s_X} = \frac{10.5 - 8}{2} = \frac{2.5}{2} = 1.25$ SD
In column A of the normal curve table, find 1.25. Look in column B and report
the answer as follows:
PA = p [of X = 8 to X = 10.5] = .3944
p [of X ≤ 10.5] = PA + .5000 = .3944 + .5000 = .8944
Answer the question in everyday terms: The probability that a randomly
selected assistance recipient scored at or below 10.5 on the self-esteem scale is
over 89 percent.
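The "at or below" problems for X = 6.5 and X = 10.5 can both be checked against the cumulative area of the curve. In this sketch (mean 8, standard deviation 2, as in the illustrations) the cumulative area is computed directly and also rebuilt the table way, as .5000 plus the column B area; the variable names are ours.

```python
from statistics import NormalDist

MEAN, SD = 8.0, 2.0
self_esteem = NormalDist(mu=MEAN, sigma=SD)

# X-score below the mean: the answer is the tail area (column C).
p_at_or_below_6_5 = self_esteem.cdf(6.5)

# X-score above the mean: the lower half of the curve (.5000)
# plus the area from the mean to the X-score (column B).
pa = self_esteem.cdf(10.5) - self_esteem.cdf(MEAN)
p_at_or_below_10_5 = 0.5 + pa

print(round(p_at_or_below_6_5, 4), round(pa, 4), round(p_at_or_below_10_5, 4))
```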
Problem Type 7: Find the X-Score That Has a Specified p [of Cases] above or
below It. Find the value of a raw score X for which a specified percentage of
the sample or population falls above or below that value.
Solution plan: Whereas the previous problem types provided an X-score and
asked for an area (p), this problem provides information on p and asks for an
X-score. Draw and label the normal curve; roughly identify and shade the
target area p; find this area in column B or column C of the normal curve table,
whichever column is apparently appropriate from the drawing; read column
A to get the Z-score; solve for X as follows:
$Z_X = \frac{X - \bar{X}}{s_X}$; thus, $X = \bar{X} + (s_X)(Z_X)$
500 who were measured for self-esteem. Let us choose the 50 with the lowest
self-esteem because they are presumably at the greatest risk of depression.
What is the highest self-esteem score a recipient can have to qualify for the
program?
To identify the target area p, we compute the proportion of assistance
recipients who are to qualify:
p = 50/500 = .1000
[Normal curve with the target area p shaded in the lower tail below the unknown score X = ?]
Study Hint: At this point, estimate the answer from the graph. Our marking of
the position of X should be close. We know by now that only 15.87 percent of
cases fall below -1 SD, and so the 10 percent mark must be just below that.
Thus, our X-score should be slightly below 6. Estimating the answer in this
fashion not only encourages proportional thinking but also provides a
warning if our calculated answer is incorrect.
Now use the normal curve table. In column C find .1000 or the nearest
amount to it, in this case .1003. Look in column A to find the corresponding
Z-score of -1.28 and solve for X:
$X = \bar{X} + (s_X)(Z_X) = 8 + (2)(-1.28) = 8 - 2.56 = 5.44$
Answer the question in everyday terms: Those assistance recipients who score
less than or equal to 5.44 on the self-esteem scale fall in the lowest 10 percent
and therefore qualify for the depression-avoidance program.
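Working backward from an area to an X-score can be sketched with the inverse of the cumulative normal curve. The code assumes the same distribution (mean 8, standard deviation 2) and the target of the lowest 50 out of 500 recipients; `inv_cdf` plays the role of reading the table from column C back to column A, and the variable names are ours.

```python
from statistics import NormalDist

MEAN, SD = 8.0, 2.0
STANDARD_NORMAL = NormalDist()

# Proportion of recipients who are to qualify: the lowest 50 of 500.
p = 50 / 500

# Z-score with area p below it (negative because it lies below the mean).
z = STANDARD_NORMAL.inv_cdf(p)

# Solve for the raw score: X = mean + (standard deviation)(Z).
cutoff = MEAN + SD * z

print(round(z, 2), round(cutoff, 2))
```

The exact Z-score is slightly beyond the table's nearest entry of -1.28, but the cutoff still rounds to 5.44, in line with the estimate of "slightly below 6."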
Study Hint: Problem Type 7 shows that as long as we know the mean and the
standard deviation of a distribution and can assume that the distribution of
scores in the population is normally shaped, only one additional piece of
information is needed to solve any problem. This piece of information can be a
raw score, a standardized Z-score, or an area under the normal curve (p).
Thus:
If given an X-score, compute Z_X and use the normal curve table to get p.
If given a Z-score, use the normal curve table to get p or solve for X,
where $X = \bar{X} + (s_X)(Z_X)$.
If given a percentage or area, p, use the normal curve table to get the
corresponding Z-score and solve for X, where $X = \bar{X} + (s_X)(Z_X)$.
Critical Values and Critical Regions under the Normal Curve
As we will see in later chapters, there are certain Z-scores and areas under the
normal curve that are of critical (or great) importance in statistical procedures
and therefore are used frequently. These are called critical Z-scores and critical
regions of the curve. The critical regions are areas under the curve which, of
course, can be viewed as probabilities. These critical probabilities are signified
with the Greek letter alpha (α). Why do we call these scores and probabilities
critical? Because statistical procedures are based on probability theory. These
α-probabilities are decisive in determining the degree of confidence we may
place in our reported results (Chapter 8) and also are important for testing
hypotheses (Chapters 9 through 16). The notion of critical will become apparent
later. For the time being let us focus on the relationship of these critical
α-probabilities to the normal curve.
The most frequently used critical Z-score is ±1.96. Ninety-five percent of
the area under a normal curve falls between +1.96 and -1.96, leaving 5 percent
of the area distributed in the two tails (2.5 percent in each tail). It is the area in
the tails of the curve that constitutes the critical region or α-probability. Since
the focus is on two tails, this is called a two-tailed critical region. A critical
Z-score of ±1.96, then, corresponds to the critical region "α = .05, two tails."
We can also have a critical region concentrated on one side of the curve-a
one-tailed critical region. For example, the critical Z-score of 1.64 marks a
one-tailed critical region; 5 percent of the curve is beyond 1.64 on one side. A
critical Z-score of 1.64, then, corresponds to the critical region "α = .05, one tail."
These two critical scores and their critical regions are illustrated in Figure 6-4.
Table 6-1 lists several commonly used Z-scores and the sizes of their critical
regions (i.e., α-probabilities). Note that these critical regions are "comfortable"
sizes (i.e., 5 percent, 1 percent, and 0.1 percent). For instance, if asked to rate
the performance of the members of a rock and roll group, you might respond
that the group rates in the top 5 percent or 1 percent. You are not likely to use
an awkward percentage such as 4 percent.
FIGURE 6-4 Critical Z-scores for α = .05
Illustration A: Critical two-tailed Z-score of ±1.96; critical region area totals .05 (5%) distributed in the two tails. Designated: Critical region for α = .05, two tails. [P = .95 between Z = -1.96 and Z = 1.96; α = (.025) + (.025) = .05]
Illustration B: Critical one-tailed Z-score of 1.64; critical region area totals .05 (5%) in one tail. Designated: Critical region for α = .05, one tail. [P = .95 below Z = 1.64; α = .05]
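Critical Z-scores for the "comfortable" α levels can be recovered from the inverse cumulative normal curve. This is a sketch only: the function name `critical_z` is our own, and the values it prints are computed rather than copied from Table 6-1, so small rounding differences from a printed table are possible.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()


def critical_z(alpha: float, tails: int) -> float:
    """Positive Z-score that leaves a total area of alpha in the tail(s)."""
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    # Two tails: alpha is split evenly, half in each tail.
    area_in_upper_tail = alpha / tails
    return STANDARD_NORMAL.inv_cdf(1.0 - area_in_upper_tail)


for alpha in (0.05, 0.01, 0.001):
    two = critical_z(alpha, tails=2)
    one = critical_z(alpha, tails=1)
    print(alpha, round(two, 2), round(one, 2))
```

For α = .05 this gives 1.96 for two tails and 1.64 for one tail, the two critical scores discussed above.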
variable (see Chapter 2). For example, with regard to Problem Type 6, someone
who scored 10.5 on the self-esteem scale scored higher than did 89 percent of
the assistance recipients in the sample, a percentile rank of 89. Similarly, for
Problem Type 5, someone who scored 6.5 has a percentile rank of 23. When a
variable is normally distributed, we can use the normal curve table to quickly
compute percentile ranks.
Many tests, especially achievement, intelligence, and school admissions
tests, are specifically designed to produce a score distribution that is
normally distributed. We all remember receiving percentile ranks in addition
to the raw scores for such tests. The companies that distribute the tests
intentionally "normalized" them so that score distributions would fit the normal
curve. Once this normalization is accomplished, the normal curve table is used
to generate percentile ranks.
Finally, we should mention that percentile ranks can be determined for
distributions that are not normally distributed. All that is necessary to
compute any percentile rank is to determine what percentage of a distribution falls
below a specified X-score. Most computer programs provide this information
as the "cumulative percentage" of a distribution (see Chapter 2).
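A percentile rank for a distribution of any shape can be sketched as a simple count of the scores that fall below a given X-score. The example below is only an illustration: the function name percentile_rank and the list of scores are made up for the demonstration and do not come from any data set in this chapter.

```python
def percentile_rank(scores, x):
    """Return the percentage of scores that fall below the X-score x."""
    if not scores:
        raise ValueError("scores must not be empty")
    number_below = sum(1 for score in scores if score < x)
    return 100.0 * number_below / len(scores)


if __name__ == "__main__":
    # Hypothetical, visibly skewed (non-normal) set of ten scores.
    example_scores = [2, 3, 3, 4, 5, 5, 5, 7, 9, 12]
    rank = percentile_rank(example_scores, 6)
    print(f"Percentile rank of a score of 6: {rank:.0f}")
```

Because the function only counts cases, it makes no assumption about the shape of the distribution; the normal curve table is a shortcut that applies only when the scores are normally distributed.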
Calculations
                X     Zx    Percentile Rank
Ronald         24      0          50
Barry          28      1          84
Sophia         32      2          98
of 28 is only 4 points better than a 24. This is apparent when the raw scores
(X), standardized scores (Zx), and percentile ranks are compared, as they are in
Table 6-2.
This illustrates the importance of knowing how a distribution of scores is
spread. Barry is only 4 points better than Ronald on the raw score, but he is 34
percentage points better in terms of percentile rank. Barry, like Sophia, scored
better than the great majority of entering students. Raw scores by themselves
suggest otherwise and can be very misleading. The standard deviation as a
unit of measure with normal distributions is a powerful tool for gaining
accurate insight into the significance of a raw score.
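The comparison of Ronald, Barry, and Sophia can be reproduced with a short computation. This is a sketch under stated assumptions: the mean of 24 and standard deviation of 4 are inferred from the raw scores and Z-scores in the table (a raw score of 24 has a Z-score of 0, and 28 has a Z-score of 1), the function name is hypothetical, and Python's standard-library statistics.NormalDist stands in for the normal curve table.

```python
from statistics import NormalDist


def z_and_percentile_rank(x, mean, sd):
    """Return (Z-score, percentile rank) for raw score x on a normal curve."""
    z = (x - mean) / sd
    # Area under the standard normal curve below z, expressed as a percentage.
    rank = 100.0 * NormalDist().cdf(z)
    return z, rank


if __name__ == "__main__":
    MEAN, SD = 24.0, 4.0  # assumption: inferred from the X and Zx columns
    for name, raw_score in (("Ronald", 24), ("Barry", 28), ("Sophia", 32)):
        z, rank = z_and_percentile_rank(raw_score, MEAN, SD)
        print(f"{name}: X = {raw_score}, Z = {z:.0f}, percentile rank = {rank:.0f}")
```

The equal 4-point steps in raw score translate into unequal steps in percentile rank because most of the area under the normal curve lies close to the mean.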
Finally, the phenomenon of normality is the essence of statistical analysis.
It is very important that we learn how to roam about the normal curve and
develop the skills to partition areas under it. A quick look through the remainder
of this text should convince you of the importance of mastering the problems
in this chapter. Nearly every chapter after this one has depictions of the
normal curve or similar probability curves.
Imagine that Bob and Terri are gambling by playing a coin-tossing game. Bob
wins with heads, and Terri wins with tails. They take turns deciding how
much the next flip will be worth, choosing an amount from 5 cents to 25 cents.
Bob just won three flips in a row at 10 cents a flip. Should Terri increase the bet
to 25 cents for the next toss? Does the fact that heads fell three times in a row
increase the chances that tails will come up next?
The answer is no. A common statistical mistake in computing probabilities
involves the independence of the parts of compound events. Each coin is
flipped independently of what happened to it in previous flips. If we flip a
coin twice and get heads both times, this does not increase the probability of a
third flip coming up tails. That probability remains .5000.
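Independence can be checked by listing every equally likely sequence of tosses and looking only at those that begin with a streak of heads. The following is a minimal sketch that assumes a fair coin (so that every sequence is equally likely); the function name is hypothetical.

```python
from fractions import Fraction
from itertools import product


def prob_tails_after_heads_streak(streak_length: int) -> Fraction:
    """Probability of tails on the next toss, given a streak of heads.

    Assumes a fair coin, so every sequence of tosses is equally likely.
    """
    # All possible sequences of streak_length + 1 tosses.
    all_sequences = list(product("HT", repeat=streak_length + 1))
    # Keep only the sequences that start with the required run of heads.
    after_streak = [
        seq for seq in all_sequences
        if all(toss == "H" for toss in seq[:streak_length])
    ]
    tails_next = [seq for seq in after_streak if seq[-1] == "T"]
    return Fraction(len(tails_next), len(after_streak))


if __name__ == "__main__":
    for streak in (2, 3, 10):
        p = prob_tails_after_heads_streak(streak)
        print(f"After {streak} heads in a row, p[tails next] = {float(p):.4f}")
```

However long the streak, exactly half of the sequences that begin with it end in tails, which is why Bob's three wins give Terri no reason to raise the bet.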
This tendency to imagine that independent events are tied together is one
type of gambler's fallacy. When a gambler hits a streak of bad luck, he or she
may start to believe that a streak of good luck must follow. In the long run,
indeed, good and bad luck balance out. But what is the long run? Is it 3 tosses,
10 tosses, 1 million tosses? For a given gambler, is the long run longer than his
or her money will hold out? Moreover, the balance between good and bad luck
occurs among all gamblers, not within a single gambler. Thus, if 100 couples
were playing this coin-tossing game, over the course of an evening, chances
are great that about as many heads will be tossed as tails. But Bob and Terri
may end up tossing more heads, while Joe and Maggie may toss more tails.
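The difference between balance among all gamblers and balance within one couple can be illustrated with a small simulation. This is a sketch only: the figure of 50 tosses per couple for "an evening", the fixed random seed, and the function names are all assumptions made for the illustration.

```python
import random


def simulate_evening(couples=100, tosses_per_couple=50, seed=0):
    """Return the number of heads tossed by each couple (fair coin assumed)."""
    rng = random.Random(seed)  # fixed seed so the run can be repeated
    return [
        sum(rng.random() < 0.5 for _ in range(tosses_per_couple))
        for _ in range(couples)
    ]


def overall_heads_proportion(heads_per_couple, tosses_per_couple):
    """Proportion of heads across all couples combined."""
    total_tosses = len(heads_per_couple) * tosses_per_couple
    return sum(heads_per_couple) / total_tosses


if __name__ == "__main__":
    tosses = 50  # hypothetical number of tosses per couple in one evening
    heads = simulate_evening(couples=100, tosses_per_couple=tosses)
    print(f"All couples combined: p[heads] = {overall_heads_proportion(heads, tosses):.3f}")
    print(f"Fewest heads for one couple: {min(heads)} of {tosses}")
    print(f"Most heads for one couple:   {max(heads)} of {tosses}")
```

Across all couples the proportion of heads should sit close to .5, while individual couples can end the evening noticeably above or below it.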
To assume that coin tosses are linked is to think mistakenly that we know
the length of a "series," a sequence of tosses over the long run. Unfortunately,
there are an infinite number of possible sequences, because each toss is inde-
pendent of the next. For example, Bob's three heads in a row could be part of
any of the following series in which heads and tails fall an equal number of
times:
T, T, H, H, H, T, H, H, T, T
T, T, T, T, H, T, H, H, H, H, H, T
H, T, H, T, T, H, T, T, T, H, H, T, H, H, H, T
H, H, H, H, T, T, H, H, T, T, T, H, T, T, H, T, H, T, T, H
For a gambler to assume that he or she somehow knows the future sequence
of outcomes is to assume that the future can be seen to a greater extent than
what the basic probabilities of occurrence tell us. This obviously is not a sensi-
ble way to gamble.
4=p(n)
12. The mean, standard deviation, and normal curve are used most
appropriately with variables of what levels of measurement?
13. Why is it appropriate to use the same symbol p for proportion, probability,
and area under a normal curve?
14. When a score of a normally distributed variable is to the right of the mean,
it is in the ______ direction.
15. Explain why it is inappropriate to use Z-scores and the normal curve table
for any distribution of scores that is not normally shaped.
16. What information does a percentile rank provide?
17. Explain what it means to be very lucky or very unlucky.
2. Compute the following probabilities for the roll of one gaming die:
a. p [1]
b. p [5 then 6]
c. p [1 or 3 or 6]
3. Suppose you have a box of 100 red marbles, 50 blue marbles, and 50 green
marbles. Compute the probabilities of randomly drawing the following
from the box:
4. Suppose you have a box of well stirred dry beans: 150 red, 70 white, and
80 black. Compute the probabilities of randomly drawing the following
from this box:
5. For the toss of one coin (H = heads, T = tails), compute the following:
a. p [H]
b. p [T then T]
c. p [T then H then H]
6. For the toss of one coin (H = heads, T = tails), compute the following:
a. p [T]
b. p [H then T]
c. p [T then T then T]
a. p [10]
b. p [7 or king]
c. p [jack or diamond]
d. p [king then king, or ace then ace] without replacement
9. With a standard deck of 52 playing cards, are your chances of drawing two
aces in a row better with or without "replacement"? Illustrate with
computations.
10. With a standard deck of 52 playing cards, are your chances of drawing an
ace and then a king better with or without "replacement"? Illustrate with
computations.
11. Frank is conducting a telephone poll of the residential households in Big
Frog County. Foolishly, he uses the telephone book as a sampling frame
and randomly draws phone numbers from it. As it turns out, 5 percent of
county households have no telephone. Among households with phones, 30
percent have unlisted numbers. Moreover, 15 percent of the listed
numbers are for businesses even though they are in the White Pages. What
percent of Big Frog County households have any chance of being called by
Frank?
12. The Melodious Lamp Shades is the hottest new popular music act touring,
and it is booked to appear at the local coliseum in 14 days. Unfortunately,
the concert sold out before your ticket order got in. Your only chance to go
is to win a ticket in a local radio contest by being the first caller when a
Lamp Shade hit song is played, which occurs six times each day. At any
time, 20,000 people are listening to the station and 25 percent of them
attempt to call. If you attempt a call at every opportunity between now
and the concert, what is the probability that you will win a ticket?
13. We have the following descriptive statistics for a college admissions test.
Use these data to answer the following questions. Draw the normal curve
and label all target areas.
a. What proportion of those who took this test scored above 26?
b. What proportion of the scores fell between 17 and 19?
c. What proportion of the scores fell between 18 and 23?
d. Determine the score below which 90 percent of the scores fell.
e. If an applicant had to make at least the 90th percentile rank to get
into a college program, what score would he or she need to make
(short answer)?
14. We have the following descriptive statistics for job performance scores
where a high score indicates good work. Use these data to answer the
following questions. Draw the normal curve and label all target areas.
Y = job performance score
Ȳ = 78 points    sY = 8 points
n = 480
The distribution is normal.
a. What proportion of those rated scored above 90?
b. What proportion of the scores fell between 88 and 98?
c. What proportion of the scores fell between 70 and 90?
d. Determine the score below which 95 percent of the scores fell.
e. If an applicant had to make at least the 95th percentile rank to
obtain bonus pay, what score would he or she need to make (short
answer)?
15. You are an intake worker at a homeless shelter. When new clients arrive,
you administer the Center for Epidemiological Studies Depression Scale
(CESD), a community screening questionnaire, to determine who needs a
doctor's care for acute psychological depression. Among homeless people,
the mean CESD score is 23.5 with a standard deviation of 7.5, and the
distribution is normal. Any client scoring 16 or higher is to be sent to a
doctor. Draw a normal curve with the solution to each problem.
a. What is the probability that your next client will be sent to a doctor?
b. What is the probability that your next client will score 10 or below?
c. If those homeless scoring in the highest 15 percent on the CESD are
to be targeted for suicide prevention services, what score qualifies a
client for these services?
16. You have a population of young adults with a mean age of 22 years and a
standard deviation of 2 years. Ages in this population are normally
distributed.
a. p [of randomly drawing someone between the ages of 20 and 24].
b. p [of randomly drawing someone (19 years old or younger) or (25
years old or older)].
c. If the youngest 10 percent of the young adults are to be mailed a
letter, below what age will the letters be targeted?
17. Draw and label a normal curve to answer each of the following
questions.
18. Draw and label a normal curve to answer each of the following
questions.
a. What critical value of Z has .001 of the area beyond it on one side of
the mean?
b. This critical value (Zα) applies to what critical region (α)? (If
necessary, review Table 6-1.)
c. What critical value of Z has .001 of the area beyond it on both sides
of the mean combined?
d. This critical value (Zα) applies to what critical region (α)? (If
necessary, review Table 6-1.)
19. A statistician says that she rates the performance of a popular singing
artist at 2.33 standard deviations above the mean performance of all artists
she has seen. Percentagewise, how high does this statistician rate the
performing artist? In other words, according to this statistician's
judgment, what is the performer's percentile rank?
20. Jessica, Michele, and Caroline take an achievement test that is
normalized, that is, especially designed so that the score distribution fits a
normal curve. The mean of the test is 1,000 with a standard deviation of
100. Jessica scores 1,000, Michele scores 1,200, and Caroline scores 1,400.
Michele feels dejected because she believes she did not do much better
than Jessica. Use the normal curve and percentiles to show why Michele is
wrong to feel dejected.