A Primer of Operant Conditioning
Contents
Summary Statement 4
Respondent Conditioning 7
Operant Conditioning 8
Conditioned Reinforcers 11
RESEARCH IN OPERANT CONDITIONING 13
What Is Research? 13
Programing Equipment 21
ACQUISITION AND EXTINCTION OF OPERANT BEHAVIOR 25
Spontaneous Recovery 34
Discriminative Stimuli 36
Stimulus Generalization 37
Directions of Generalization 37
Responding 38
Response Generalization 39
a Discrimination 42
Supraordinate Stimuli 48
Sensory Preconditioning 49
Durability 57
Potency 57
Schedules 68
Effects of Changes in the Value of the Ratio or Interval on VR and VI Schedules 69
Mixed Schedules 72
SCHEDULES OF REINFORCEMENT
Conjunctive Schedules 82
Diagrams of Schedules 82
Alternative Schedules 83
Interlocking Schedules 83
Concurrent VI Schedules 91
RESPONDENT CONDITIONING 94
Unconditioned Respondents 95
Habituation of Respondents 95
Conditioned Respondents 95
Simultaneous Conditioning 96
Delayed Conditioning 97
Trace Conditioning 97
Backward Conditioning 97
Temporal Conditioning 97
Sensitization 97
Spontaneous Recovery 99
Disinhibition 99
Escape 104
Avoidance 106
Punishment 111
Effects of Punishment on Other Reinforced Behavior 114
Punishment with Various Schedules of Reinforcement and with Extinction 115
Respondent and Operant Components of Emotion 118
The Name of an Emotion 119
The Definition and Measurement of Aggression 124
The Control of Aggression 125
The Relationship Between Emotion and Motivation 125
Motivation in Operant Conditioning 126
one
Introduction to the experimental analysis of behavior
WHAT IS OPERANT CONDITIONING?
Operant conditioning is an experimental science of behavior. Strictly speaking, the term operant conditioning refers to a process in which the frequency of occurrence of a bit of behavior is modified by the consequences of the behavior. Over the years, however, operant conditioning has come to refer to an entire approach to psychological science. This approach is characterized in general by a deterministic and experimental analysis of behavior. It is also characterized by a concentration on the study of operant or instrumental behavior, although not to the exclusion of the study of instinctive and reflexive behavior.
As an approach to the study of behavior, operant conditioning consists of a series of assumptions about behavior and its environment; a set of definitions which can be used in the objective, scientific description of behavior and its environment; a group of techniques and procedures for the experimental study of behavior in the laboratory; and a large body of facts and principles which have been demonstrated by experiment.
Operant conditioning is concerned with the relationship between the behavior of organisms and their
environment. Research in operant conditioning gathers knowledge about behavior from the
experimental study of the effects on behavior of systematic changes in the surrounding environment.
Operant conditioning attempts to understand behavior by gaining knowledge of the factors that modify
behavior. As an objective science, it is restricted to the study of factors that can be observed, measured,
and reproduced. The science of operant conditioning has accumulated an enormous body of
knowledge and has taken great strides toward a complete answer to the question: What makes
organisms behave as they do?
The psychologists who use this approach differ greatly in their degree of commitment to the principles of operant conditioning. At one extreme of commitment are those who accept only the experimental techniques because they are convenient methods for studying behavior. At the other extreme are those who accept, at present partly on faith, the beliefs and findings of operant conditioning as being truly descriptive of behavior and as guides to the conduct of their personal lives.
This primer presents, concisely but as completely as possible, the concepts, methods, and findings of
operant conditioning. This first chapter will concentrate on basic assumptions underlying the science
and on definitions of fundamental concepts.
What makes organisms behave as they do? This question is notably subject to insufficient and
incomplete answers. A man is said to go to the store because he "wants" a certain article that is sold at
the store. A child is said to steal because his "superego" has failed to operate. A dog is said to perform
tricks because it "needs" affection. Explanations such as these - which are stated in terms of the will,
hypothetical divisions of an organism's mental apparatus, or the presumed needs of an organism - are
not acceptable in operant conditioning because they do not specify the actual environmental
conditions under which the behavior will reliably occur. Instead, they offer reasons which themselves
require explanation. Thus, it is still necessary to determine the conditions under which the man will
want to go, the child's superego will fail, or the dog's needs will be expressed in tricks.
In operant conditioning, an adequate explanation of behavior is one that specifies the actual conditions which reliably produce the behavior to be explained. Statements about the causes of behavior are accepted as valid only when they specify what can actually be done under given circumstances to produce that behavior. The behavior is understood only when it can be shown experimentally that under these circumstances, specified changes in the environment do actually result in the behavior. Because explanation in operant conditioning requires the experimental production and manipulation of behavior, the actual control of behavior becomes an essential part of the process of explanation. In operant research, to understand behavior is to control it, and vice versa.
The specification of the environmental conditions under which behavior will reliably occur is not so
difficult as it might seem. In fact, the science of operant conditioning has already made much progress
in demonstrating how behavior can be controlled by the environment and how the environment can be
described objectively and in detail.
There are two kinds of environmental determinants of behavior: the contemporary and the historical. The behavior of an organism at any one moment is determined not only by the currently acting, contemporary environment but also by the organism's previous experience with these, or similar, environmental conditions. Thus, a man brakes his car to a stop at an intersection not only because there is a red light but also because of his previous experiences with red lights. A child stops talking when told not only because he is told to stop but also because of his previous experiences with the consequences of not obeying. A dog runs to the kitchen when its food is taken from the shelf not only because of the noise of the moving can but also because of its previous experiences with such noises.
Operant conditioning is concerned with the experimental analysis of both kinds of determinants of behavior. In dealing with contemporary causes, it tries to determine, through observation and experiment, the particular environmental event which is responsible for the behavior. The man brakes only when the light is red, not when it is green; and he continues his journey when the light changes from red to green. The red light is the contemporary environmental condition which brings about the specific behavior of braking. By experimentally manipulating the contemporary conditions of which the behavior is a function, we can control the man's behavior. Thus, if we change the light at the corner to red, the man brakes; if we let it continue to be green, he does not.
Historical determinants of behavior are more difficult to specify, if only because they invariably involve several experiences over a period of time. However, the specification of historical determinants can be as exact as the specification of contemporary determinants. In the case of the dog's running to the kitchen, we might suppose that the noise of the can resulted in the running because of the dog's previous experience. Specifically, we may speculate that the running occurs after the noise because this behavior has previously been followed by food from the can. But this explanation, unless developed further, is little better than saying that the dog runs because it wants the food. We have not yet demonstrated exactly what historical experiences are necessary for the occurrence of the behavior. The fact that experience with the sequence noise-running-food is in part responsible for the current behavior can be established
experimentally by either of two possible methods. One is to change the dog's experience and see if this
brings about a change in its behavior. Since the dog's historical experience is in the past, it cannot be
changed directly; but it is possible to create a new history of experience for the dog by exposing it to
new and different experiences for several weeks. For example, suppose that from now on a whistle
always announces the dog's meals and the noise of the can does not. In practice, this involves both
taking the can down noisily from the shelf and not feeding the dog, even though it comes to the
kitchen, and taking the can down quietly and whistling when the dog is to be fed. If, as we have
supposed, the dog's previous experiences of receiving food when it ran to the kitchen following the
noise from the can were responsible for its running, the dog should run to the kitchen when we whistle
and no longer run when it hears the noise of the can. Through experience, the whistle comes to be the
environmental event after which running to the kitchen is followed by food, and the noise of the can
becomes an event after which running is not followed by food. If the dog's behavior does not change in
our experiment, then we have made an incorrect supposition about the historical determinants of its behavior. A second method of studying historical determinants of behavior is to create the same history of previous experiences in another, similar organism. If our assumptions are correct, any other dog should also run at the noise of the can if running to the kitchen after the noise has resulted in food in that dog's experience. As both of these methods indicate, operant conditioning rejects merely plausible speculations about the causes of behavior and aims at direct experimental demonstration of the contemporary and historical determinants of behavior.
Summary Statement
Experimental analyses such as those described above have led to the conclusion summarized in this statement: The characteristics of behavior and its probability of occurrence are determined by the environmental conditions and events which precede and accompany the behavior, by the environmental events which change after or as a consequence of the behavior, and by the organism's previous experience with the environment. It is within the context of this statement that operant conditioning studies behavior.
Behavior in this formulation refers to everything that organisms do. Most behavior, such as the dog's running to the kitchen, can be seen. Some behavior, such as speaking, may only be heard. Other behavior, such as thinking, is ordinarily accessible only to the organism that does the behaving. The environment in this formulation includes everything that has an effect on the organism, whether or not that effect is immediate. The environment thus includes the organism's own behavior, since one of the determinants of current behavior may be the behavior which preceded it. The consequences of behavior are simply the environmental events which follow the behavior closely in time. In our previous example, the food was a consequence of the dog's running to the kitchen.
In operant conditioning, emphasis is placed on the probability that behavior will occur. In our example, we tacitly assumed that the dog always ran at the noise and, later, that it always ran at the whistle; that is to say, we assumed that the probability of running was 1.0. Because such perfection is not always the case with behavior, we usually speak of the probability that the behavior will occur under given circumstances. If the dog ran only half the time when we whistled, the probability of running would be 0.5. Thus, one meaning of probability is the frequency of occurrence of the behavior relative to the frequency of occurrence of certain environmental conditions.
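This relative-frequency meaning of probability can be computed directly from counts. The short sketch below is only an illustration, not part of the original text; the counts in it are hypothetical.

    # A minimal sketch: probability of a response as the frequency of the
    # response relative to the frequency of the occasions (the environmental
    # conditions) on which it could have occurred.
    def response_probability(response_count, occasion_count):
        if occasion_count == 0:
            raise ValueError("at least one occasion is needed")
        return response_count / occasion_count

    # The dog ran to the kitchen on 5 of the 10 occasions on which we whistled.
    print(response_probability(5, 10))  # prints 0.5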
Any experimental science relies on description as well as on experiment. The descriptive system of a science breaks its subject matter down into elements that can be clearly defined and communicated. The basic concepts of operant conditioning describe behavior and the environment reliably and precisely. As a result, all members of the scientific community interested in behavior and its control are able to understand the descriptions and to reproduce the measurements of behavior and the environment that are the basis of the science.
[email protected]~I}diJj Q.Jling, \:Y.e... ink .of. behav)qr. 9.~ §~.8m!;! t~9 j n to lJ nH~s~!l~,g L~~P'9- ~~.eJ.· . ~l:lillk
_.QLthe .. env:iwnmenLGlJu gm nt@d jnto .. units ... called stimuli. Unfortunately, both terms are
somewhat misleading, because they do not refer in operant conditioning to what their ordinary
meanings suggest. Responses, the units of be
havior, need not be " replies" to the environment. Indeed, we shall see that one of the most
fundamental concepts of operant condition ing is that most behavior is not necessarily forced from the
organism by the environment. Nor do stimuli necessarily incite the organism to action. In fact, it is
fundamental in operant conditioning to ap proach the environment from an entirely opposite point of
view. Thus, it is necessary to understand the more precise definitions of these terms as they are used in
operant conditioning.
The responses composing behavior are separated into two classes: one class is called operant, or instrumental, responses; the other is called reflex, or respondent, responses. Members of these two kinds of response are called operants and respondents. The environment is divided into several classes of stimuli. One class, the eliciting stimuli, is composed of environmental events which regularly precede responses. These stimuli elicit relatively fixed and stereotyped responses, the respondents mentioned above. A second class of stimuli, the reinforcing stimuli, or reinforcers, is composed of environmental events which follow responses. Reinforcing stimuli increase the frequency of the responses they follow; they increase the probability that these responses will recur in the future behavior of the organism. The responses which become more probable when they are followed by reinforcers are the operants mentioned above. Members of a third class of stimuli, called discriminative stimuli, precede and accompany operants but do not elicit them as the eliciting stimuli elicit respondents. Rather, the presence of particular discriminative stimuli increases the probability of those operants which have previously been reinforced in the presence of the same discriminative stimuli. Still another class of stimuli is composed of neutral stimuli. This class includes all those environmental events which at any particular time bring about no change at all in behavior, whether they precede, accompany, or follow responses.
These divisions of behavior and the environment are the fundamental concepts of the approach to the study of behavior called operant conditioning. They have grown out of the efforts of experimental psychologists to describe behavior and the environment in ways that will be scientifically useful. So far, we have presented them only in a skeletal form. What follows is a more detailed discussion of each of these concepts.
All organisms are provided by nature with reflexes, or innate inherited responses to certain environmental events. Generally, these responses provide automatic behavioral protection and sustenance for the animal from its earliest hours of contact with its environment. A thorn piercing a dog's paw automatically elicits flexion, which raises the leg. A bright light on the eye elicits constriction of the pupil. Vinegar in the mouth elicits secretion of the salivary glands. Stroking the palm of a child elicits grasping. A sudden loud noise elicits startling. In each of these reflexes, a stimulus elicits a response because of the inherited structure of the organism and not because the organism has had any specific previous experience with the stimulus. The same stimulus elicits the same response from all normal organisms of the same species (and, to be technically precise, of the same sex and age). As defined above, such a stimulus is called an eliciting stimulus, and the response, a respondent.
Two characteristics of respondents should be given special notice because they play a major part in separating respondents from operants. First, the frequency of occurrence of a respondent depends primarily on the frequency of occurrence of its eliciting stimulus. Respondents rarely occur spontaneously, in the absence of an eliciting stimulus. To increase or decrease the frequency of occurrence of a respondent, it is necessary only to increase or decrease the frequency of its eliciting stimulus. Second, the consequences of respondents - the environmental events which follow them - do not usually affect their frequency. A thorn thrust into the sole of the foot, for example, elicits flexion of the leg regardless of whether or not the thorn comes out of the foot as a result of the flexion.
RESPONDENT CONDITIONING
The respondent behavior of an organism changes very little, if at all, throughout the organism's life. Leg
flexion elicited by a thorn in the foot of an old dog is essentially the same as flexion elicited from a
young dog by the same stimulus. What does happen during the life of an organism is that new stimuli,
previously ineffective, come to elicit respondents from the organism. This happens when a new
stimulus occurs again and again at the same time as (or slightly before) an eliciting stimulus. Gradually
the new stimulus comes to elicit a respondent similar to that originally produced only by the eliciting
stimulus.
The process whereby new stimuli gain the power to elicit respondents is called respondent conditioning. The traditional example involves the conditioning of the respondent, salivation. At first, only food or acid actually placed in the mouth elicits salivation. But gradually, during the early life of an organism, the sight and smell of food also come to elicit salivation because they regularly precede and accompany the original eliciting stimulus, food in the mouth.
Respondents and respondent conditioning are discussed in detail in Chapter 8. For the present it should suffice to remember only two facts: first, respondents are innate behavior, regularly elicited by specific stimuli which precede them and largely unaffected by stimuli which follow them; and, second, respondent conditioning involves the repeated presentation of a new stimulus along with a stimulus that already elicits a respondent. The new stimulus then acquires the power to elicit the respondent.
OPERANT CONDITIONING
Elicited respondents represent only a small proportion of the behavior of the higher organisms. The remaining behavior is operant. There is no environmental eliciting stimulus for operant behavior; it simply occurs. In the terminology of operant conditioning, operants are emitted by the organism. The dog walks, runs, and romps; the bird flies; the monkey swings from tree to tree; the human infant babbles vocally. In each case, the behavior occurs without any specific eliciting stimulus. The initial cause of operant behavior is within the organism itself. The organism simply uses its inherited muscular and skeletal structure in relation to the environment in which it finds itself. It is in the biological nature of organisms to emit operant behavior.
It is clear from observation that some operants occur more frequently than others and that the frequency with which a given operant occurs can change. Closer observation suggests that the frequency of occurrence of an operant is greatly influenced by the consequences of the operant. Whereas the frequency of respondent behavior is determined mainly by the frequency of its eliciting stimulus (the environmental event that precedes it), the frequency of operant behavior is primarily determined by its effect (the environmental event that follows it).
The effects or consequences of behavior may be either the appearance of an additional part of the environment or the disappearance of some part of the environment. If the appearance of a stimulus as a consequence of a response results in an increased probability that the response will recur in the future, the stimulus is called a positive reinforcing stimulus, or positive reinforcer. If the disappearance of a stimulus as a consequence of a response results in an increased probability that the response will recur in the future, the stimulus is called an aversive stimulus, or negative reinforcer.
A reinforcer is always defined in terms of its effects on the subsequent frequency of the response which immediately preceded it. A dog, for example, may one day open the door to its play area with a hard nudge of the forepaws. If this particular behavior occurs more frequently in the future, we call the opening of the door to the play area a positive reinforcer. Negative reinforcement, on the other hand, involves the disappearance of an aversive stimulus. Suppose that the dog manages to dislodge a tick from its paw by rubbing the paw down the side of a venetian blind. If in the future there is an increased tendency to rub the paw against the blind whenever a tick gets onto the paw, we call the presence of the tick an aversive stimulus and its removal a negative reinforcer, which reinforces the response of rubbing the paw on the blind.
Reinforcers are many and varied. Positive reinforcers, events which reinforce by their appearance, range from food and water to novel stimuli. Aversive stimuli, events which reinforce by their disappearance, range from discordant noise to life-threatening situations. In any case of reinforcement, an operant occurs, has an effect on the environment, and, because of the effect, occurs more frequently in the future.
Most operants occur with a high frequency only under certain conditions. One rarely, if ever, recites the Gettysburg Address unless faced with an audience of listeners. The dog enters the kitchen infrequently except at the usual time for its meals. One rarely turns off a radio which shows no signs of being on. These are examples of the control of operant behavior by discriminative stimuli. In each case, the probability of the operant is high only in the presence of certain environmental events - the discriminative stimuli - and it is low under other conditions. In operant conditioning, the discriminative stimuli are said to control the operant response. The rule for the control of behavior by discriminative stimuli is that an operant will occur at a high frequency in the presence of the discriminative stimuli which in the past have accompanied the occurrence of the operant and have set the occasion for its reinforcement.
To bring an operant under the control of a discriminative stimulus, it is necessary to reinforce occurrences of the operant in the presence of the stimulus and not in the absence of the stimulus. This procedure was followed, for example, when the dog was trained to run to the kitchen at the sound of a whistle. The whistle was a discriminative stimulus in whose presence the operant, running, was reinforced with food, and the whistle came to control the running. As another example, suppose that we want the dog to sit on command and that it already sits frequently because sitting has been previously reinforced with small pieces of sugar. To bring the operant, sitting, under the control of the discriminative stimulus, "Sit," we give a lump of sugar to the dog whenever we command "Sit" and the dog does, in fact, sit. At the same time, we do not reinforce sitting unless it is done on command. Gradually, the dog comes to sit promptly when told to sit and rarely does so otherwise. In operant conditioning, we say that the (operant) response of sitting has been brought under the control of the discriminative stimulus, "Sit," by reinforcing the response in the presence of the stimulus.
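The contingency in this example reduces to a simple rule: reinforce the response only when it occurs in the presence of the discriminative stimulus. The sketch below merely illustrates that rule; the function and variable names are hypothetical and not drawn from the text.

    # Hypothetical sketch of discrimination training: the operant (sitting) is
    # reinforced only when it occurs in the presence of the discriminative
    # stimulus (the command "Sit").
    def deliver_sugar(command_given, dog_sat):
        return command_given and dog_sat

    print(deliver_sugar(True, True))    # "Sit" given and the dog sits: reinforce
    print(deliver_sugar(False, True))   # the dog sits without the command: no sugar
    print(deliver_sugar(True, False))   # command given but no sitting: no sugar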
The relation between a discriminative stimulus and an operant is fundamentally different from the relation between an eliciting stimulus and a respondent. A discriminative stimulus controls the operant because the operant has been reinforced in its presence, not because of the inherited structure of the organism. There is nothing special about the stimulus, "Sit," which destines it for control over the response of sitting. Nor is there anything special about the operant, sitting, which fits it for control by the discriminative stimulus, "Sit." We can just as easily train a dog to sit when we say "Stand" and to stand when we say "Sit," simply by reinforcing the appropriate response when the appropriate command has been given. This is not the case, however, with the fixed eliciting relationship between food in the mouth and salivation, for example. The operant relation between a discriminative stimulus and an operant response is established and determined only by whether or not the operant is reinforced in the presence of the discriminative stimulus. The stimulus which precedes the response in the operant case is arbitrary. The control of behavior by discriminative stimuli is discussed further in Chapter 4.
Conditioned Reinforcers
Some stimuli, such as food and water, are able to reinforce behavior without the organism's having any particular previous experience with them. These stimuli are called primary, or unconditioned, reinforcers. Other stimuli, however, acquire the power to reinforce behavior during the lifetime and through the experience of the organism. These stimuli are called secondary, or conditioned, reinforcers.
Conditioned reinforcers acquire the power to reinforce operants through a procedure which is similar to that resulting in respondent conditioning. When a new stimulus is repeatedly presented to an organism at the same time as or just prior to another stimulus which already has the power to reinforce behavior, the new stimulus may itself acquire the power to reinforce behavior. If so, it becomes a conditioned reinforcer, and behavior which precedes it becomes more probable in the future. Notice that although both discriminative stimuli and conditioned reinforcers share the acquired power to increase the probability of a response, discriminative stimuli precede or accompany the occurrence of the behavior while conditioned reinforcers follow behavior as a consequence, just as do primary reinforcers.
One classic example of conditioned reinforcement involves the establishment of poker chips as reinforcers of a chimpanzee's behavior. The chimp's behavior may initially be reinforced by grapes, which it eats. If the chimp is repeatedly given a chance to exchange poker chips for grapes, the poker chips themselves become reinforcers. The poker chips can then be used to reinforce the chimp's behavior. The chimp will even operate a vending machine which dispenses poker chips. Because they have been exchanged for grapes, the poker chips have become conditioned reinforcers.
Instances of conditioned reinforcement involve orderly sequences of stimuli and responses, which in operant conditioning are called chains. In our example, the chimp operates the vending machine, receives a poker chip, exchanges the poker chip for a grape, and eats the grape. The response, operating the machine, is made in the presence of the discriminative stimuli afforded by the vending machine and is reinforced by a conditioned reinforcer, the appearance of the poker chip. The poker chip is also a discriminative stimulus (the second one in the chain), in whose presence the response of exchanging is reinforced by the appearance of the grape, another conditioned reinforcer. The grape is the third discriminative stimulus in the chain. In its presence, the response, popping it into the mouth, is reinforced by the primary reinforcing stimuli afforded by the
eating of the grape. The general formula for chains is that one response leads to a stimulus in whose presence another response leads to another stimulus. Each stimulus functions both as a conditioned reinforcer, when it reinforces the response which precedes it, and as a discriminative stimulus, when it occasions another response in its presence. Chains are thus orderly sequences of stimuli and responses held together by stimuli which function both as conditioned reinforcers and as discriminative stimuli. Conditioned reinforcement and the nature of chains are discussed further in Chapter 5.
So far, we have delineated the broad field of operant conditioning as an approach to the study of behavior, and we have defined the basic elements of stimuli and responses and the concepts of conditioning and reinforcement. Next, in Chapter 2, we will examine the nature and practice of research in operant conditioning.
two
Research in operant conditioning
WHAT IS RESEARCH?
Research is the cornerstone of an experimental science. Both the certainty of the conclusions and the rapidity of the progress of an experimental science depend intimately and ultimately on its research. As its root meaning ("to search again") implies, most research either results in a rediscovery, and hence a confirmation, of already known facts and principles or represents another painstaking attempt to answer a formerly unanswered question in an objective and repeatable fashion. But research also means the search for and the discovery of formerly misunderstood or unconceived principles and facts. Research is, in practice, a two-pronged fork with one tine in the past and the other in the future. An experiment attempts to confirm or deny what is already believed to be true and at the same time to go beyond existing knowledge toward either a more comprehensive body of facts or, if possible, toward a general principle around which all the known and verifiable facts about a subject may cluster in a logical, predictable, and sensible whole.
The ultimate goal of research is always a general principle. Rarely, however, does a single experiment
directly establish a general principle. A single experiment is concerned with the relation between a specific independent variable, which is manipulated by the experimenter, and a specific dependent variable, which changes as a result of changes in the independent variable. Each of such relations, established repeatedly in laboratories around the world,
contributes to the formulation of the general principle. For example, several hundred experiments
have shown that variations in a stimulus in whose presence a response has been reinforced (the
independent variable) produce in an organism a decreased tendency to emit the response (the
dependent variable). If responding has been reinforced in the presence of a bright light, the organism
will respond less and less as the light becomes dimmer and dimmer. If responding has been reinforced
in the presence of a green light, the organism will respond less in the presence of either a yellow or blue
light. Many other experiments have had similar results. Together, they all figure in the formulation of
the principle of stimulus generalization (to be discussed in Chapter 4). No single piece of research is
sufficient to formulate a general principle; rather, each experiment contributes, either by repeating and
verifying what is believed or by extending the generality of the principle.
In operant conditioning, research relates changes in the environment (the independent variable) to changes in behavior (the dependent variable). The experiments of operant conditioning arrange for the occurrence of specific environmental events and changes in them and for the measurement of behavior and its changes as a function of the changes in the environment. Each particular relation established between the environment and an organism's behavior helps form the basis for what operant experimenters hope will be a general principle concerning the prediction and control - that is to say, the understanding - of behavior.
Two characteristics distinguish the operant approach to research from other psychological approaches. In order to be accepted into the group of established facts, a given relation between the environment and behavior must meet two criteria. First, and primarily, it must be unequivocally demonstrated for every organism in the experiment.
To require that experimental conditions produce the same effects on each and every subject in the experiment is a stringent requirement for any science. Many sciences, including most kinds of psychological science, are able and willing to settle for average effects. A relation is considered to be established if the measurements of the dependent variable for one group of subjects differ on the average from the average of the measurements for another group of subjects which was treated differently in the experiment. In operant research, however, effects that are defined only by the averages of groups of organisms are not acceptable. The effect of a change in the independent variable of the environment is accepted as valid only if it in fact brings about the same change in the behavior of every single organism subjected to the change. It is not enough that the change in the environmental conditions bring about an effect on the average; it is absolutely necessary that the change in the environment change the behavior of every organism in the same way.
Of course, this does not always happen. Organisms differ, and they differ for different reasons. When the environmental changes arranged by the experiment result in different changes in the behavior of the individual subjects, the second hallmark of operant research - the experimental analysis of behavior - comes into prominence. The experimental analysis of behavior means nothing more than what we have emphasized in the first chapter: that research in operant conditioning strives to find the exact, real, and specifiable changes in the environment that actually do bring about exact, real, and specifiable changes in the behavior of organisms. When organisms differ, experimental analysis attempts to demonstrate exactly what factors in the history or present environment of each organism are responsible for the difference. Experimental analysis is hard, time-consuming work; but it pays off in knowledge that can be applied with certainty to the prediction and control of the behavior of individual organisms.
Human beings, especially persons institutionalized for mental illness, have also recently been used as experimental subjects. Naturally, their environment cannot be controlled as precisely as that of the usual experimental animals, but researchers are as rigorous as possible. Research with these institutionalized human beings has been quite successful; many people whose difficulties had not yielded to traditional methods of treatment have been helped by the techniques of operant conditioning.
The exact environmental control required by research in operant conditioning has produced a special technology that is particularly suited to the approach to behavior and the environment which we have described as characteristic of operant conditioning. The apparatus and recording equipment are specially suited to the problems involved in the study of operant behavior. Because the organisms that are studied differ in their sensory and behavioral propensities, the details of the apparatus are different for each organism. The basic features of each, however, are the same.
During each daily experimental session of a few hours, the organism under study in a particular experiment is housed in an isolated cubicle, called an experimental chamber. Isolation is essential in order to minimize the contribution to the results of the experiment of outside, extraneous influences. The experimental chamber is usually light-tight and sound-attenuating and generally has a loudspeaker which presents a flat hissing sound to drown out potentially disturbing noises from the outside. The chamber is ventilated because of the subject's relatively long daily stay in it. Electrical connections from the chamber to automatic programing and recording equipment make possible the remote control and remote recording of the environmental and behavioral events within the chamber. Not even the experimenter comes into direct contact with the subject during the experiment.
Inside the chamber there is provision for the delivery of a reinforcer, such as food or water. A variety of
other stimuli can also be provided, all under remote control. The stimuli are usually auditory or spatial
for rats and visual for pigeons, monkeys, and men. Finally, each chamber contains one or more devices
which define the operant responses to be studied.
We said in the previous chapter that behavior is divided into units called responses, which are
themselves divided into two types,
operant and respondent. For experimental purposes, we need a specific, empirical definition of an
operant response. How can we recognize a response when one is made? And how can we count the
number of responses that occur per minute?
An operant response is defined in terms of its effect on the environment. An operant is a class of behaviors, all of which change the environment in the same way. The response most commonly used in operant conditioning is the closing of a switch resembling a telegraph key, although any other objective effect on the environment might be chosen. Each closure of the switch counts as one occurrence of the response, regardless of the particular behavior that brought it about. The behavior composing the operant may be any of a large variety: the animal may close the switch by using its foot, nose, beak, head, or any part of its body in any sort of movement. These variations in the actual behavior are of no concern in defining the response. Regardless of the particular behavior involved at any one time, the effect, closing the switch, is still counted as one response.
By this definition, operant responses are rendered recognizable and countable. The response may be described exactly in terms of the switch - its location in the animal's environment, its physical characteristics (usually it is some kind of a lever), the amount of force that must be exerted on it in order to bring about closure, and the distance which it must move in order to close. The actual occurrence of a response and the number of times the animal responds in a given period of time are obtained simply by observing the switch and counting the number of times it operates. We will be further concerned with the definition of the operant response and its measurement in Chapter 3.
The specialized apparatus for the study of the pigeon's operant behavior appears in Figure 2.1, along with its associated experimental equipment. The important features of the apparatus can be seen in the picture. A close-up of the interior of the experimental chamber appears in Figure 2.2. Notice the walls and cover, which insulate the chamber against light and sound; the loudspeaker in the upper left corner of the front wall, for presenting auditory stimuli; and the diffuse illumination provided to the inside of the chamber through the window in the upper right corner of the front wall.
Provision has been made on the front wall of the chamber for the recording of responses and for the
presentation of the reinforcer and various discriminative stimuli. The operant is defined by means of
the round key in the upper center of the wall. The key is a plastic lever
Figure 2.1 The operant conditioning apparatus used with the pigeon, including the experimental chamber and programing and recording equipment on the relay rack. Courtesy of the Grason-Stadler Company, Inc.
Figure 2.2 A close-up of the interior of the experimental chamber used with the pigeon. Courtesy of the Grason-Stadler Company, Inc.
mounted flush to the outside of the thin metal wall. The arm of the lever is accessible to the pigeon through the circular, three-quarter-inch hole in the wall. A force of about fifteen or twenty grams exerted by the pigeon on the arm of the lever operates an electrical switch located near the fulcrum of the lever behind the wall. This signals to the programing and recording equipment that a response has occurred. The closing of the electrical switch defines the operant studied in this apparatus. Usually, the pigeon operates the lever and closes the switch by pecking on the key through the hole in the wall; that is to say, the responses composing this operant are the pigeon's pecks on the key.
The plastic key is translucent. Lights of various colors are arranged behind the wall so as to shine through and illuminate the key. These various colors serve as discriminative stimuli. They are particularly effective because the bird cannot fail to see them as it pecks on the key. Other stimuli may be provided when necessary by varying the intensity or the color of the general illumination, by presenting sounds through the loudspeaker, or by illuminating the key with geometric forms as well as with colors.
The reinforcer is usually about four seconds of access to mixed grain (equal parts of kafir, vetch, and hemp) in a hopper which appears behind the square hole beneath the key. The grain is not accessible except during periods of reinforcement. The reinforcer is presented along with distinctive stimuli: the general illumination and the discriminative stimuli lighting the key are turned off, and the grain hopper is lighted brightly from behind the hole. After four seconds of eating, the lighting returns to normal. This arrangement makes the reinforcer a clearly noticeable event, the start and finish of which are clearly defined. For special experiments, the apparatus may be modified to allow for more than one operant or more than one reinforcer.
In an experimental session, a single pigeon is enclosed in the chamber for a period of several hours,
usually two or three. During the entire session, the programing apparatus makes the changes in the
discriminative stimuli and in the presentations of the reinforcer called for by the passage of time or by
occurrences of the response. The details of the program depend on the particular processes being
studied in the experiment.
Figure 2.3 A close-up of the experimental chamber used with the rat or small squirrel monkey. Courtesy
of the Grason-Stadler Company, Inc.
The operant conditioning apparatus for the rat and monkey is essentially the same as that for the pigeon. Figure 2.3 shows a close-up of the chamber for a rat or a small squirrel monkey. The operant studied in this sort of apparatus is also defined by the activation of a lever. This lever, however, extends out into the chamber. (The chamber in the figure is equipped with two levers.) The lever is usually activated by depression with the paws, although the class of responses composing the operant may include any behavior that operates the lever. Although all kinds of discriminative stimuli may be effectively used with monkeys, those used with the rat differ in intensity and spatial location rather than in color. The reinforcer is usually in the form of pellets of dry food or a liquid (either water or a sweetened liquid similar to Metrecal). The apparatus in the figure is designed to dispense pellets into the hole in the lower center of the front wall. If a liquid reinforcer is used, it is made available for short periods of time in an automatic dipper reached through a hole in the floor. Distinctive stimuli accompany either the delivery of the pellets or the periods of access to the liquid reinforcer in the dipper.
PROGRAMING EQUIPMENT
Unlike the experimental chamber, the programing equipment does not differ appreciably with the species of the experimental animal. The programing equipment consists of electrically operated switches, timers, counters, and other devices. They are wired into circuits to determine the sequence of environmental events within the chamber and to bring the events into specified relationships with occurrences of the response.
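Relay circuits of this kind are nowadays usually replaced by software. The sketch below is only a loose modern analogue, not a description of the equipment in the text: it arranges the simplest possible program, in which every detected response is followed at once by the reinforcer, and it logs both kinds of events. The functions passed in (read_switch, operate_feeder) are hypothetical stand-ins for the hardware.

    import time

    # Hypothetical software analogue of the programing equipment: bring the
    # reinforcer into a specified relationship (here, immediate delivery)
    # with occurrences of the response, and record every event.
    def run_session(read_switch, operate_feeder, session_seconds):
        events = []
        start = time.time()
        while time.time() - start < session_seconds:
            if read_switch():                       # a response has occurred
                events.append((time.time() - start, "response"))
                operate_feeder()                    # present the reinforcer at once
                events.append((time.time() - start, "reinforcer"))
        return events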
There are many reasons why automatic electrical equipment is essential in experiments in behavioral research. First, the programs and the alternatives within the experiments are often too complex for a person to handle efficiently, if at all. Also, the two- or three-hour length of the sessions would put severe burdens on a person's efficiency. The automatic equipment easily, reliably, and objectively handles complex decisions throughout the entire experimental session.
Another reason for the use of automatic equipment is the speed required. As we shall see throughout this book, the effect of the environment on behavior depends critically on the timing of environmental events in relation to behavior. The human reaction time, at its best about one fifth of a second, is simply too long and too variable for the purpose. The automatic equipment operates efficiently and essentially invariably in less than one tenth the time.
Figure 2.4 The cumulative recorder. Courtesy of the Ralph Gerbrands Company, Inc.
Another advantage of automatic
equipment is the freedom it affords the researcher. Instead of tediously watching the organism hour
after hour, day after day, the researcher is freed by automation for more fruitful use of his time.
Freedom from human intervention during the experimental session also means freedom from bias. The
experimenter does not have to guess whether or not the lever was pressed each time the rat's paws
touch it. The equipment decides, always according to the same criterion.
Not the least advantage of automatic equipment is that it allows for the exact repetition of the experiment in another laboratory next door or halfway around the world. All one experimenter needs to do in order to repeat another's experiment is obtain an exact description of the chamber and the details of the programing and reproduce them faithfully. This possibility of unambiguous replication of experiments has done more than any other single factor to encourage the growth of operant conditioning as a science of behavior.
The most common recording device in operant conditioning is the cumulative recorder. This machine provides a graph of the cumulated (total) number of responses as a function of time. Such a recorder is shown in Figure 2.4 and is schematized in Figure 2.5.
During an experimental session, a motor feeds the paper out at a constant speed. Each operation of the key or lever moves the pen up one step. Thus, time is measured along the length of the paper (the abscissa), while responses are counted across its width (the ordinate). A continuous record of the behavior for the entire session appears in the resulting graph. When the pen reaches the top of the paper, usually after one thousand responses, it resets to the bottom and begins to trace another record beginning with the next response.
Other events within the chamber can also be indicated on the record. The occurrence of the reinforcer is traditionally indicated by a temporary displacement of the pen downward, making a mark on the record. Additional events may be indicated by stationary pens at the top or bottom of the record.
Figure 2.5 A schematic drawing of the cumulative recorder. The paper unrolls under the two pens as time passes. Each occurrence of the response moves the response-marking pen up one unit toward the top of the paper. Reinforcement is indicated by the hatch-marks on the cumulative record. Additional events during an experimental session can be indicated along the horizontal line at the bottom (or top) of the record by means of the event-marking pen.
Cumulative records are especially useful in studying the rate of occurrence of the response, because this rate and its changes over time can be easily read from the slope of the cumulative record. Since the paper moves at a constant speed, responding at a high rate produces a steep graph, infrequent responding produces a flat, nearly horizontal graph, and all intermediate rates of responding produce graphs of intermediate slope. Changes in the rate or probability of responding over time, as a function of the experimental manipulations, are reflected in changes in the slope of the record. The cumulative recorder is supplemented in most experiments by a variety of other recording devices, such as electrical counters, which record only the total number of responses in a given period of time, and timers, which may record the time between responses, between responses and stimuli, or between other successive events.
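A cumulative record is easy to reconstruct from a list of response times. The short sketch below, using hypothetical data not taken from the text, plots cumulated responses against session time, so that the slope of the curve at any point is the rate of responding.

    import matplotlib.pyplot as plt

    # Hypothetical response times, in seconds from the start of the session.
    response_times = [2, 5, 9, 10, 12, 30, 31, 33, 60, 61, 62]

    # The nth response steps the record up to a height of n responses.
    cumulative_counts = range(1, len(response_times) + 1)
    plt.step(response_times, cumulative_counts, where="post")
    plt.xlabel("Time in session (seconds)")    # the abscissa
    plt.ylabel("Cumulative responses")         # the ordinate
    plt.show()                                 # steep segments mean a high rate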
three
Acquisition and extinction of operant behavior
Acquisition in the case of respondents is a simple matter, because both the initial occurrence of respondents and their rate of occurrence depend almost completely on the presentation of the eliciting stimuli. There are other variables of importance in most specific instances, but they can, for the present, be considered negligible. Therefore, in order to bring forth a respondent never before performed, it is only necessary to present an effective eliciting stimulus to the organism. In order to enhance selectively the rate at which an organism engages in a particular bit of respondent behavior, it is only necessary to present its eliciting stimulus at a higher rate. A respondent changes little if at all during the lifetime of the organism. However, as was mentioned in Chapter 1, a new and previously ineffective stimulus may come to elicit the respondent. This phenomenon will be discussed in greater detail in Chapter 8.
Operants, on the other hand, have no eliciting stimuli. There is no stimulus, for example, which will elicit the word operant from all children or a lever-press from all rats. The creation of new operants and the selective enhancement of the frequency of existing operants are brought about not by any eliciting stimuli which precede the behavior but by the reinforcing stimuli which follow the behavior. Reinforcers, as we have seen in Chapter 1, are simply those stimuli that result in an increase in the frequency of the behavior which they follow.
If, for example, a reinforcer follows a particular gesture emitted by a speaker in making a point, the rate at which this gesture occurs will increase. However, before the rate of an operant response can be increased, it is always necessary to wait for an occurrence of the response, because there is no eliciting stimulus which produces it.
Since we must wait for the occurrence of a response before reinforcing it, it may at first seem impossible to create new operant behavior. However, new operant behavior can be created by a process called shaping, which uses a combination of reinforcement and nonreinforcement to change existing simple responses into new and more complex responses. In order to understand how shaping is done and how it works, we must first consider some of the effects of reinforcement and nonreinforcement on behavior.
Positive reinforcement of a response results not only in a substantial increase in the frequency of that particular response but also in an increase in the frequency of many other bits of the organism's behavior. The extent of the increase in each case depends on a variety of factors, some of which will be discussed in Chapter 4. The frequency of some of the behaviors that are not directly reinforced increases substantially, while the increases in the frequency of other behaviors are so small that they are virtually nonexistent. The effect of positive reinforcement, then, is to raise the organism's general level of activity. If we reinforce one response of a young child, the child will not only repeat that response but will also emit a flurry of other, varied responses. Positive reinforcement results in an active organism. This property of positive reinforcement plays an important part in shaping. It also makes it extremely difficult to reinforce inactivity.
Reinforcement affects not only the frequency but also the topography of responses. Topography refers to the physical nature of the responses which compose the operant. Thus, reinforcement modifies the exact form, force, and duration of successive responses, even though each reinforced response counts as an equivalent instance of the operant regardless of its particular form, force, and duration. For example, in the operant of lever-pressing, the response that depresses the lever may involve the left or right paw, a forceful or weak depression, or a short or long depression. Whenever one topographical variation is consistently reinforced, either by chance or because of the structure of the organism or the apparatus, that topography comes to predominate. Thus, if the organism happens to emit several short and forceful depressions in succession and each is reinforced, the class of responses composing the operant will come to contain predominantly short and forceful depressions of the lever. Reinforcement, therefore, not only increases the frequency of occurrence of the operant, pressing the lever, but it also changes the topography of the responses involved in pressing the lever.
In the preceding example, short, forceful lever-presses just happened to occur and were reinforced. We may, however, deliberately arrange the experimental apparatus so that only forceful, short presses are reinforced. In this case, we have changed the definition of the operant. Formerly, just about any movement would depress the bar and was thus an instance of the operant. Now, only short, forceful presses will move the bar enough to be considered instances of the operant. Whether selective reinforcement is fortuitous or systematic, the result is the same: short, forceful lever-presses come to predominate. In the first case, we say that the topography of responses has changed, since responses with other topographies will be reinforced if they occur. In the second case, a change has been made in the definition of the operant, because behavior with other topographies will not be reinforced. Although the result is the same, the distinction is important in analyzing the environmental causes of changes in the topography of responses.
Extinction refers to a procedure in which an operant that has previously been reinforced is no longer reinforced. The primary effect of extinction is a gradual decrease in the frequency of the operant. However, the behavior does not simply drop out or fade away. In fact, when reinforcement is first discontinued, the frequency of responding may temporarily increase before beginning its decline. Extinction also produces changes in the topography of the responses: at the start of extinction, the form of the behavior becomes more variable and its force increases.
Consider the familiar response of opening a door. This response usually consists of a rotating movement of the hand on the knob followed by a push, and it is usually reinforced by the opening of the door. Now suppose that no matter how many times the door is tried, it does not open. This constitutes extinction, because the response is no longer reinforced. Eventually, the frequency with which the subject tries to open the door will decrease, probably to zero, since we rarely try to open doors that are always locked. First, however, the force of the response will increase - perhaps the knob will be turned violently - and the form of the behavior will change - the other hand may be used and the door may even be kicked. Eventually, if the door still fails to open, the frequency of attempts to open it will decrease, along with the force and the variability of the behavior.
In shaping, the first step is to make sure that the reinforcer to be used will be effective. This is accomplished by depriving the organism of the reinforcer for some time before shaping begins. Next, we must analyze the exact behavior to be produced: Precisely what sequence of responses is required? Once we have decided on the final behavior, we are in a position to reinforce closer and closer approximations to it.
The general procedure used in shaping begins by raising the deprived organism's general level of activity. This may be done by reinforcing any of its responses; however, in order to shorten the shaping procedure, a response somewhat similar to the desired response is chosen for reinforcement. Reinforcement is then withdrawn, and, as discussed above, the variability and force of the behavior increase. Before the frequency of the behavior decreases, a response closer to the desired behavior is selected for reinforcement from the more forceful and variable behavior initially produced by extinction. This selective reinforcement increases the frequency of the variation that is reinforced. After this behavior has been firmly established and is occurring frequently, reinforcement is again discontinued, variation again increases for a short time, and a response still closer to the desired one is selected from the variation and is reinforced.
This process is called shaping because we actually shape a particular response from the available behavior of the organism in much the same way that a sculptor shapes a statue from the clay he has to work with. Thus, we might begin by reinforcing any movement which the organism makes. Then, we may reinforce only walking, then only walking in one direction, and so forth. By continually narrowing our definition of the response required for reinforcement, we increasingly define and shape the organism's behavior.
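The logic of successive approximation can be compressed into a short simulation. The sketch below is an illustration only, not part of the original text: behavior is reduced to a single number (say, how far the organism moves toward a pedal), each response varies randomly around the organism's current tendency, reinforcement pulls that tendency toward the reinforced value, and the criterion for reinforcement is tightened step by step toward a target of 10. All of the numbers are invented for the example.

    import random

    def shape(target=10.0, steps=2000, seed=1):
        """Toy simulation of shaping by successive approximation.

        Behavior is reduced to one number (e.g., distance walked toward a pedal).
        Each emitted response varies randomly around a current tendency; responses
        that meet the current criterion are 'reinforced', which shifts the tendency
        toward them. The criterion is then tightened toward the target.
        """
        random.seed(seed)
        tendency = 0.0          # the organism's current typical response value
        criterion = 1.0         # any response >= criterion is reinforced
        for _ in range(steps):
            response = tendency + random.gauss(0, 1.5)   # variability in topography
            if response >= criterion:                    # meets the current approximation
                tendency += 0.3 * (response - tendency)  # reinforcement strengthens it
                # narrow the definition of the reinforced response toward the target
                criterion = min(target, criterion + 0.2)
        return tendency, criterion

    if __name__ == "__main__":
        print(shape())  # the tendency ends near the target value of 10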
Now that we have seen the basic principles and procedures used in shaping, let us use them to shape
the behavior of a pigeon so that the bird will press down a pedal protruding an inch or so from the side
of its cage at a height of about two inches above the floor. This is a response which the pigeon will very
rarely make under ordinary circumstances.
Our first task is to arrange for the immediate reinforcement of responses. Delayed reinforcement is not as effective as immediate reinforcement, partially because it allows the organism to emit additional behavior between the response we wish to reinforce and the actual occurrence of the reinforcer. Thus, the intervening behavior is also reinforced, with the result that what is reinforced is the response followed by some other behavior rather than just the response alone. If we wish to reinforce the response of lifting the hand off the table, for example, it is not efficient to present the reinforcer when the hand is already being replaced on the table. We need a reinforcer that can be presented immediately after the lifting of the hand. Only then is the lifting alone reinforced.
The practical solution to the problem of providing immediate reinforcement is the use of a discriminative stimulus as a conditioned reinforcer. Auditory and visual stimuli can be presented immediately following any response we select, while food, for example, cannot follow the response immediately because the organism must emit additional responses to approach and ingest the food. In order to establish a discriminative stimulus as a conditioned reinforcer, we reinforce a response with food in the presence of a stimulus. For a pigeon, grain is a good reinforcer (provided that the bird has been deprived of grain), and the sound of the grain-delivery mechanism and a decrease of illumination in the bird's chamber have proved to be effective discriminative stimuli. Thus, we reinforce the behavior of walking to the grain and eating only in the presence of the sound of the operation of the grain magazine and decreased illumination. The grain is withdrawn after the pigeon has eaten a few pieces, and it is then presented again and again along with the stimuli. Several presentations of the stimuli and several reinforcements of the response in their presence may be needed to establish the control of the stimuli over the
response. When the bird goes immediately and reliably to the grain each time the noise and the decreased illumination occur, and does not go there when they do not occur, the discriminative stimuli are in control of the behavior and have become conditioned reinforcers capable of reinforcing other behavior. Note that this procedure has established a chain of stimuli and responses, as was discussed at the end of Chapter 1. A response during shaping can now be reinforced by the conditioned reinforcer of the noise of the magazine and the decrease in illumination. These stimuli are, in turn, discriminative stimuli, in whose presence the behavior of approaching and taking the grain will be reinforced by the ingestion of the grain.

Primary positive reinforcers, such as the grain, are usually effective only if the organism has been deprived of them in the recent past. (Deprivation as a motivational consideration will be discussed in Chapter 10.) In practice, grain is an extremely effective reinforcer for the pigeon if the pigeon's weight is kept at about eighty percent of the weight it attains when allowed to feed freely. The value of eighty percent not only makes grain an effective reinforcer, but it also keeps the bird active and alert.
Now that we have made sure that we have an effective reinforcer for our experiment in shaping, we must analyze the specific behavior to be shaped. In this case, we want the pigeon to walk to the pedal, place a foot on it, and depress it. Now we can begin to shape the pigeon's behavior by reinforcing with the conditioned reinforcer the first part of the response, walking. Or better, if the bird initially walks a great deal, we reinforce the more specific response of walking toward the pedal. Positive reinforcement, the food which the bird obtains in the presence of the conditioned reinforcer, will produce an increase in the bird's general activity. After a few presentations of food, the bird will be active, and we will have no difficulty selecting for reinforcement any activity that brings it closer to the pedal. A few reinforcements of walking toward the pedal will result in the bird's walking directly toward the pedal after it has eaten.
The next step is to reinforce the lifting of one foot when the bird is in front of the pedal. Since the bird walks to the pedal, this is not difficult, but it requires careful observation. It is necessary to reinforce immediately the lifting of the leg and not its replacement on the floor. Now, when the bird is in front of the pedal, we selectively reinforce those lifts of the leg which include the leg's movement toward the pedal and are high enough to place the foot above it. This will eventually bring the foot onto the pedal. Finally, we reinforce only depressions of the pedal, and the desired response has been shaped.
The careful and systematic application of the shaping procedure with an effective reinforcer is sufficient to teach any organism any operant behavior of which it is physically capable. For example, pigeons have been shaped to play ping-pong, rats to lift weights well over their own weight, and children to type acceptably at the age of two or three years. The potential of shaping for the behavioral capacity of both animals and men has hardly begun to be explored or exploited.
DEPENDENCIES, CONTINGENCIES, AND SUPERSTITIOUS BEHAVIOR
Sometimes changes in behavior are brought about by deliberate and systematic manipulations of the
environment, and sometimes they happen by chance. We noted this difference above in discussing the
effects of reinforcement on topography. In the shaping procedure, we have seen how deliberate
selective reinforcement changes old behavior into new; now we shall examine a process in which
changes in behavior are largely a matter of chance.
Environmental events may have either contingent or dependent relations with behavior. An environmental event is said to be dependent on behavior if the event must, by the nature of the situation, occur following the behavior. An environmental event is said to be contingent on behavior if the event does in fact follow the behavior but need not do so. For example, electrical circuitry determines that the lights must go out in a room when the switch is thrown. Thus, the relation between the behavior of turning the switch and the consequential darkness is dependent. The relation between turning the switch and other ensuing events, such as perhaps the bark of a dog in the next house, is likely to be contingent. There is no necessary connection between the thrown switch and the bark, but throwing the switch may occasionally be followed by a bark. Some contingencies are more reliable than others.
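The distinction can be made concrete with a small simulation. The sketch below is an illustration only, with an invented probability; it is not a description of any actual circuit or experiment. The dependent event follows the behavior every time, by construction, while the contingent event merely may follow it.

    import random

    random.seed(0)

    def throw_switch():
        """Simulate one throw of the switch and the events that follow it."""
        lights_out = True                    # dependent: must follow, by the wiring
        dog_barks = random.random() < 0.1    # contingent: may follow; 10% of the time here
        return lights_out, dog_barks

    trials = [throw_switch() for _ in range(1000)]
    print("lights out:", sum(l for l, _ in trials), "of 1000")   # always 1000
    print("dog barked:", sum(b for _, b in trials), "of 1000")   # roughly 100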
The distinction between contingencies and dependencies will prove extremely useful in the analysis of
behavior as a whole and particularly in the analysis of the control of behavior by occasional
reinforcement. The reader should note, however, that the word contingency and the phrase
contingencies of reinforcement are very frequently used in the current literature to refer to all
relationships that are involved in the reinforcement of behavior, whether they be contingencies or
dependencies. Nevertheless, as we shall see, the distinction we have made between them is real and
important.
Superstitious behavior results from the chance reinforcement of behavior, a true contingency. Suppose that a reinforcer is presented every fifteen seconds, no matter what the organism happens to be doing. Each presentation reinforces the behavior that occurs immediately before it, even though the behavior has nothing to do with the reinforcement. The frequency of the reinforced behavior increases, thus increasing the likelihood that it will be repeated just before the next occurrence of the reinforcer. This process of reinforcement of the behavior that happens to occur is repeated every fifteen seconds, and gradually, quite elaborate sequences of behaviors may be acquired. These sequences are called superstitions because they have absolutely nothing to do with the occurrence of the reinforcer. For example, rain dances do not cause rain; but they persist because they are occasionally reinforced by a downpour.
We have seen that extinction involves the nonreinforcement of a previously reinforced response, which sooner or later results in reducing the frequency of responding to a very low level. Usually, extinction virtually eliminates responding or brings it back to (or near) its level before reinforcement. We have also seen that extinction does not usually produce an immediate decrease in the frequency of the response. Rather, there is often a brief increase in responding immediately following the onset of extinction. The topography of the response also changes at the start of extinction, and the response becomes more forceful and variable for a short time.
The course of extinction varies a great deal, depending on the organism's experiences prior to and during extinction. The progress of extinction is best followed on a cumulative record, such as the one shown in Figure 3.1. The record shows a short, temporary increase in rate followed by a gradual decline which ends in a very low rate of responding (the record runs almost parallel to the abscissa).

[Figure 3.1 A representative cumulative record of responding during extinction after reinforcement. Ordinate: cumulative responses; abscissa: time in minutes; slope guides mark rates in responses per minute.]
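A cumulative record simply plots the running total of responses against time, so that the slope of the record at any moment is the rate of responding. The sketch below shows how such a record can be computed from a list of response times; the sample data at the end are fabricated for illustration and do not come from the figure.

    def cumulative_record(response_times, session_length, bin_size=1.0):
        """Return (time, cumulative responses) points for a cumulative record.

        The slope of the resulting curve is the local response rate; a flat
        stretch (slope near zero) is the signature of extinguished responding.
        """
        points = []
        count = 0
        t = 0.0
        times = sorted(response_times)
        i = 0
        while t <= session_length:
            while i < len(times) and times[i] <= t:
                count += 1
                i += 1
            points.append((t, count))
            t += bin_size
        return points

    # Fabricated data: brisk responding early in extinction, then a slowdown.
    demo = [0.5 * k for k in range(40)] + [20 + 2.0 * k for k in range(10)]
    for t, n in cumulative_record(demo, 40, bin_size=5):
        print(f"{t:5.1f} min  {n:4d} responses")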
The course of extinction can be summarized in terms of three of its parameters: the rate of decline in response frequency, the total number of responses emitted before responding either ceases or reaches some final, low level, and this final level below which the frequency does not sink over a reasonably long time. These parameters of extinction are influenced by a large number of variables, some of which act before extinction begins and some of which act while extinction is in progress. Together, these parameters determine what is called resistance to extinction - a rough estimate of the persistence of the tendency to emit the response after the response is no longer reinforced.
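As a rough illustration, the sketch below estimates these three parameters from a record of responses counted in successive minutes of extinction. It is a sketch only; the terminal-level criterion, the five-minute averaging window, and the sample record are all invented for the example.

    def extinction_summary(responses_per_minute, terminal_criterion=2):
        """Summarize an extinction record given response counts for successive minutes.

        Returns rough estimates of: (1) the rate of decline, (2) the total number
        of responses emitted before responding reaches a low terminal level, and
        (3) that terminal level itself.
        """
        # terminal level: the average of the last few minutes of the record
        terminal_level = sum(responses_per_minute[-5:]) / min(5, len(responses_per_minute))
        # minute at which responding first falls to (or below) the criterion
        reached = next((i for i, r in enumerate(responses_per_minute)
                        if r <= terminal_criterion), len(responses_per_minute))
        total_before_terminal = sum(responses_per_minute[:reached])
        peak = max(responses_per_minute)
        decline_rate = (peak - terminal_level) / max(reached, 1)   # responses/min per minute
        return decline_rate, total_before_terminal, terminal_level

    # Fabricated record: a brief burst, then a gradual decline toward a low level.
    record = [30, 35, 28, 22, 16, 11, 7, 4, 2, 1, 1, 0, 1, 0, 0]
    print(extinction_summary(record))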
The most important variable affecting the course of extinction is the schedule of reinforcement on which the operant was previously maintained. A schedule of reinforcement is a rule that tells which occurrences of a particular response will be reinforced. So far, we have dealt only with the extinction of an operant maintained by continuous reinforcement - that is, by reinforcement of every occurrence of the operant. However, reinforcement does not have to be continuous to be effective; in fact, intermittent reinforcement on a schedule is probably more common than continuous reinforcement. Such a schedule of intermittent reinforcement might, for example, prescribe that only every other occurrence of the response will be reinforced. As we will see in Chapters 6 and 7, there is an almost unlimited number of possible schedules of reinforcement.
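In this sense a schedule is simply a rule that maps each occurrence of the response to "reinforce" or "do not reinforce." The fragment below expresses continuous reinforcement and the every-other-response schedule mentioned here as such rules; it is an illustration, not a procedure taken from the text.

    def continuous(response_number):
        """Continuous reinforcement: every occurrence of the response is reinforced."""
        return True

    def every_other(response_number):
        """A simple intermittent schedule: only every second response is reinforced."""
        return response_number % 2 == 0

    # Which of the first ten responses would each rule reinforce?
    print([n for n in range(1, 11) if continuous(n)])    # all of them
    print([n for n in range(1, 11) if every_other(n)])   # 2, 4, 6, 8, 10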
The magnitude of the reinforcer and the number of reinforcements received prior to extinction also affect the course of extinction. Generally, the greater the magnitude of the reinforcer or the number of reinforced responses, the greater the resistance to extinction in terms of both the time and the number of responses required to reach a low, terminal rate of responding. The effects of these variables, however, are heavily modulated by the effects of the schedule of reinforcement.
Another variable influencing extinction is the number of previous extinctions experienced by the organism. The larger the number of previous extinctions, the more rapidly will extinction proceed. Organisms whose responses have been reinforced and then extinguished a large number of times generally exhibit very rapid extinction once reinforcement ceases. Apparently, through association with nonreinforcement, the occurrence of a number of unreinforced responses (or of a period of time without reinforced responses) becomes a discriminative stimulus associated with nonreinforcement and, hence, itself occasions a low rate of responding.
The magnitude of the organism's motivation during extinction can also affect the progress of extinction. Extinction is generally slower when it is carried out under more intense deprivation than prevailed during reinforcement. The effects of motivation and of the number of previous extinctions are small, however, when compared to the effects of the schedule of reinforcement.
Spontaneous Recovery
Once the decrease in responding begins, extinction usually proceeds continuously during any one experimental session. At the start of each session, however, the rate of responding is often higher than the rate which prevailed at the end of the previous session. Moreover, the longer the time between successive sessions of extinction, the larger the difference between the rates at the end of one session and at the beginning of the next session. This phenomenon is called spontaneous recovery because the rate of responding seems to return spontaneously to a higher level during the time between experimental sessions.
Spontaneous recovery represents responding in the presence of one set of stimuli - those associated with the beginning of the session - in whose presence the responses were previously reinforced. In order for extinction to be complete, it is necessary to extinguish the responses in the presence of each of the discriminative stimuli that have come to control the responses during the previous period of reinforcement. The stimulus conditions prevailing during the experimental session compose only one set of these discriminative stimuli. Another set is composed of the stimuli prevailing at the start of the session - for example, the recent entry of the animal into the chamber. Spontaneous recovery seems to reflect the control of responding by the latter stimuli.
four
Stimulus control
of operant behavior
Each reinforcement not only increases the probability of reoccurrence of the operant that it follows but also contributes to bringing the operant under the control of the stimuli that are present when the operant is reinforced. After the responses composing the operant have been reinforced in the presence of a particular stimulus a number of times, that stimulus comes to control the operant, i.e., the frequency of those responses is high in the presence of the stimulus and lower in its absence.
DISCRIMINATIVE STIMULI
In Chapter 1, we said that these controlling stimuli are called discriminative stimuli. A discriminative stimulus is a stimulus in whose presence a particular piece of operant behavior is highly probable because the behavior has previously been reinforced in the presence of that stimulus. We have also said that although discriminative stimuli precede responses, they do not elicit responses. Rather, discriminative stimuli are said to occasion operant responses. A discriminative stimulus sets the occasion on which the operant has previously been reinforced.
Because a response under the control of a discriminative stimulus is more frequent in the presence of that stimulus, the frequency of the response may be controlled by controlling the stimulus: it may be increased by presenting the stimulus or decreased by withholding the stimulus. However, the relationship between the controlling stimulus and the response is always probabilistic, since controlling stimuli only increase or decrease the chances that a response will occur. The presentation of the controlling stimulus never guarantees that the response will follow, as does the presentation of an eliciting stimulus. However, under appropriate circumstances, the chances are so high that we can be virtually certain that the response will occur when the discriminative stimulus is presented. In that case, even though the stimulus is in fact discriminative, it may seem to elicit the response.
In the last chapter, we saw that the effect of reinforcement on the probability of reoccurrence of the response is essentially instantaneous. The controlling power of a discriminative stimulus, however, develops gradually: several reinforced responses in the presence of the stimulus are always required before the stimulus effectively controls the response.
STIMULUS GENERALIZATION
Stimulus control is not an entirely selective process. Reinforcement of responses in the presence of one stimulus increases the tendency to respond not only in the presence of that stimulus but also in the presence of other stimuli, though to a lesser degree. When this occurs, an organism is said to generalize among stimuli. Generalization is defined functionally: An organism or its behavior is said to generalize to all those stimuli in whose presence the rate of responding increases after the response has been reinforced in the presence of one other stimulus.
Examples of stimulus generalization are plentiful. When a child is reinforced for calling his father "dada," he initially calls other people "dada" as well, though less readily. When a dog is trained to sit at the command "Sit," it initially tends to sit at any forceful, monosyllabic exclamation. When a pigeon's pecks on a red key are reinforced, it initially pecks, though less frequently, on keys of other colors. In each instance, reinforcement of a response in the presence of one stimulus increases the probability of responding not only in the presence of that stimulus but also in the presence of other stimuli.
Directions of Generalization
" The stimuli to which generalization occurs can be determined only by empirical methods of
experiment and observation. Fortu
nately, there are some general ground rules that are more or less dependable in predicting the
directions generalization will take. one~ rule is that generalization occurs to stimuli that are composed
of the same physical parameters and differ only in the value of the param eters. For example,
generalization readily occurs to visual stimuli that differ in color and brightness. Thus, if a pigeon's pecks
are reinforced in the presence of a bright red light, the pigeon will be much more likely to exhibit an
increased rate of pecking in the presence of a dim green light than in the presence of an auditory
stimulus.
Another rule applies to complex stimuli composed of separable parts. Generalization can be expected to occur to stimuli which have perceptible aspects in common with the stimulus that originally set the occasion for reinforcement. For example, if a pigeon's pecks on a triangle have been reinforced, the pigeon will be more likely to peck at stimuli with straight edges or sharp corners than to peck at circles or ovals, because stimuli with edges and sharp corners have those elements in common with the triangle.
Outside of the experimental laboratory, generalization is not often so easily analyzed as these two rules imply. The major difficulty is that it is not always clear simply from observation which stimulus controls the behavior. There is no replacement for experiment when the effective controlling stimulus is in doubt. A stimulus can be identified as being in control of behavior only when it can be demonstrated that the probability or frequency of occurrence of the behavior is different in the presence of the stimulus than in the absence of the stimulus.
When two objects have elements and dimensions in common, we are likely to say that they are similar. However, it is fundamentally incorrect and misleading to believe that the similarity we observe between stimuli is an explanation of generalization. It is not correct to say that an organism generalizes between stimuli because they appear similar to us and presumably to it. Rather, our labeling of stimuli as "similar" is a demonstration of our own tendency to generalize between or among them: it shows that we have responded to them in somewhat the same manner. We simply share this tendency with the organism whose behavior we observe.
Stimulus generalization must be distinguished from a simple increase in the overall tendency to emit the reinforced response regardless of the stimulus. Generalization is an increase in the frequency of responding which is dependent on the stimulus. Therefore, in order to attribute the occurrence of a response in the presence of a stimulus to generalization, we must show that the increased frequency does not occur in the absence of the stimulus. For example, in order to attribute a child's calling another person "dada" to generalization from the reinforcement of the response "dada" in the presence of the father, we must be sure that the emission of the response depends on the appearance of another person and that it does not simply reflect the fact that reinforcement has caused the child to repeat the word frequently.
Response Generalization
Reinforcement of an operant results not only in an increase in the frequency of the responses composing that operant but also in an increase in the frequency of similar responses. Thus, after reinforcement of the response "dada," a child will be more likely to say "baba" and "gaga," as well as "dada." This phenomenon is called response generalization. Continued reinforcement of "dada" and lack of reinforcement of other responses results in a preponderance of "dada" in the child's vocabulary. As we have seen in Chapter 3, this process is fundamental in the shaping of behavior.
The amount of stimulus generalization is expressed by the relationship between the rates of responding prevailing in the presence of each of a group of stimuli before and after reinforcement in the presence of one of them. For example, the following experiment describes the measurement of a pigeon's generalization to stimuli of different colors. The response to be measured is the pigeon's pecking on a key, and the discriminative stimuli are the illumination of the experimental chamber by red, orange, yellow, green, and blue lights.
First, before we reinforce any of the pigeon's responses, we measure the rate of responding in the presence of each stimulus by presenting the stimuli to the pigeon individually and counting the responses made in the presence of each. Responding before reinforcement is infrequent, as is shown by the typical rates in Figure 4.1 (the circles). The number of pecks per minute in the presence of each color ranges from zero to five.
[Figure 4.1 Rate of key pecking in the presence of each color before reinforcement (circles) and after reinforcement during yellow (triangles). Ordinate: responses per minute; abscissa: stimulus color.]

Since these rates of responding are typically very low, it has become customary to dispense with this part of the measurement of generalization. Its purpose is to be sure that there is no pronounced tendency to respond to one stimulus more than to another at the beginning of the experiment, since such a tendency could confound the results of the measurements. It is wise, therefore, to make these initial measurements, but traditional not to bother.
The next step in the measurement of generalization is reinforcement of the response, pecking, in the presence of one stimulus only - in this case, yellow illumination - for several hour-long daily sessions. Then, reinforcement is discontinued, and during extinction, the number of responses in the presence of each stimulus is measured. (Responding in this experiment is reinforced on a schedule of intermittent reinforcement known as the variable-interval schedule, which will be discussed in Chapter 6; it is used here because the extinction performance which follows its termination is very smooth and orderly.) In order to make the measurements more reliable, the rate is measured several times in the presence of each stimulus. While the rate of responding is gradually declining because of extinction, each color is presented for about thirty seconds until all five colors have been presented. Then they are presented several more times, each time in a different order and with no color being repeated until all the other colors have been presented. Usually five or six presentations of each color are sufficient to ensure reliable measurements.
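The balancing of presentation orders described here can be sketched as follows. The colors, the thirty-second presentations, and the six rounds are taken from this example, but the code itself and the response counts are invented for illustration. Each round presents every color once in a freshly shuffled order, and the rates from all rounds are then averaged for each color.

    import random

    random.seed(3)

    COLORS = ["red", "orange", "yellow", "green", "blue"]

    def test_order(rounds=6):
        """Build a presentation order: every color once per round, orders shuffled."""
        order = []
        for _ in range(rounds):
            block = COLORS[:]
            random.shuffle(block)
            order.extend(block)
        return order

    def mean_rates(order, counts, seconds_per_presentation=30):
        """Average responses-per-minute for each color across all its presentations."""
        rates = {c: [] for c in COLORS}
        for color, n in zip(order, counts):
            rates[color].append(n * 60 / seconds_per_presentation)
        return {c: sum(v) / len(v) for c, v in rates.items()}

    order = test_order()
    # Fabricated counts per 30-second presentation, declining as extinction proceeds.
    counts = [max(0, 18 - i) if c == "yellow" else max(0, 8 - i // 2)
              for i, c in enumerate(order)]
    print(mean_rates(order, counts))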
The triangles on the graph in Figure 4.1 represent the total rate of responding computed for all the presentations of each color. The differences in the rates of responding to the five stimuli are reliable, since we can expect that any differences in the rates caused by the progressive decline of responding in extinction have been averaged out by the different orders of
presentation of the stimuli. However, because the measurements were made during extinction while the rates were continuously decreasing, the rates cannot be taken to mean anything except the relative degree of control which the various colors exert over responding. They have no significance as absolute rates of responding; and it is not meaningful to compare various organisms with respect to these rates because their responding during extinction may decline at different speeds. Birds may, of course, be compared with respect to the relative difference between the rates prevailing in two colors, since organisms may differ in the breadth of their generalization. This difference can be measured by comparing the slopes of the curves in graphs like the one shown in Figure 4.1.
As Figure 4.1 illustrates, the effect of generalization is to increase the tendency to respond in the presence of each of the stimuli. The amount by which the rate prevailing before reinforcement in the presence of each stimulus has increased after reinforcement expresses the amount of generalization from the yellow illumination to the other stimuli. All the stimuli now control more responding because of the previous reinforcement of responding in the presence of yellow.
Notice that the amount of generalization becomes less and less the larger the difference between the yellow stimulus and the stimulus in question (as measured by the wavelength of each stimulus). This orderly decrease in responding as the value of the physical property being varied moves farther away from the value in whose presence responding was reinforced is called the gradient of generalization.
An organism is said to discriminate between two stimuli when it behaves differently in the presence of
each. If an organism responds identically in the presence of each of two or more stimuli, it does not
discriminate between them.
The generalization gradient reveals discrimination, therefore, insofar as the organism responds at a different rate in the presence of each stimulus. The discrimination is by no means perfect: the pigeon does not respond only in the presence of one stimulus and not at all in the presence of another. Rather, the tendency to respond is different in the presence of each. Sometimes the difference is large, as between the rates prevailing with red light and with yellow light in the above experiment. Sometimes it is very small or negligible, as it would be between the rates prevailing in the presence of two yellow lights of only slightly different brightnesses. The difference between the tendencies to respond in the presence of two stimuli is a measure of the degree of the organism's discrimination between the two stimuli.
THE FORMATION OF A DISCRIMINATION

When responding is extinguished in the presence of one stimulus and continues to be reinforced in the presence of another stimulus, the difference between the rates of responding in the presence of each of the two stimuli increases; the rate of reinforced responding in the presence of one stimulus remains high or increases, and the rate of unreinforced responding in the presence of the other stimulus decreases. This process is called the formation of a discrimination. The result of continued differential reinforcement is to build up a high probability of responding in the presence of one stimulus and a very low probability of responding in the presence of the other stimulus.

It may seem strange to call this process the formation of a discrimination when there is usually some difference between responding in the presence of the two stimuli from the beginning. Nevertheless, the term is current, and it covers those few cases in which there is no difference at all between the behaviors in the presence of each of the two stimuli before the beginning of differential reinforcement.
The formation of a discrimination is studied over time. The results of a typical study appear in Figure
4.2. Here the pigeon's pecking is reinforced in the presence of a red key and not reinforced in the
presence of a green key. The graph shows the rate of responding in the presence of each stimulus as a
function of successive sessions of differential reinforcement. Notice that the difference between the
rates prevailing in the presence of the two stimuli is small at first and gradually becomes greater as the
number of presentations of the stimuli increases. The small initial difference is due to substantial
generalization between the two stimuli. The difference becomes greater as the rate of unreinforced
responding declines and the rate of reinforced responding increases.
[Figure 4.2 Rates of responding in the presence of the reinforced (red) and unreinforced (green) keys over successive daily sessions of differential reinforcement. Ordinate: rate of responding; abscissa: daily sessions 1 through 9; the declining curve is labeled "Responding in extinction."]

The extent of the generalization between two stimuli influences the formation of a discrimination between them in several ways. In
order to understand these, it is best to review a rather long but basically simple experiment. Suppose that we can make a pigeon's key any one of three colors - red, orange, or yellow - and that these colors are presented in succession a number of times during each daily session. At first (Part 1), for a week of daily sessions, we reinforce responding in the presence of all three stimuli. Then (Part 2), for another week of sessions, we reinforce responding only when the key is red and never when the key is either orange or yellow. Next (Part 3), for a week, we extinguish responding in the presence of all three colors. Finally (Part 4), we again reinforce responding only when the key is red.
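The design just described can be restated compactly as a table of which key colors are reinforced in each part; the sketch below merely restates the procedure above in that form.

    # One week of sessions per part; True means pecks on that key color are reinforced.
    PROCEDURE = {
        "Part 1": {"red": True,  "orange": True,  "yellow": True},   # reinforce all three
        "Part 2": {"red": True,  "orange": False, "yellow": False},  # reinforce red only
        "Part 3": {"red": False, "orange": False, "yellow": False},  # extinguish all three
        "Part 4": {"red": True,  "orange": False, "yellow": False},  # reinforce red only again
    }

    for part, rule in PROCEDURE.items():
        reinforced = [color for color, r in rule.items() if r] or ["none"]
        print(f"{part}: reinforce pecks on {', '.join(reinforced)}")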
The results of this experiment appear in Figure 4.3 in the form of a graph showing the rate of responding in the presence of each of the three stimuli during each session under each of the four procedures described above.

[Figure 4.3 Rates of responding in the presence of the red, orange, and yellow keys during successive sessions of Parts 1 through 4 of the experiment. Ordinate: rate of responding; abscissa: daily sessions, divided into the four parts; one curve is labeled "Responding on yellow."]

These results reveal four effects of generalization on the process of formation of a discrimination:
1. When responding is reinforced only in the presence of the red key, the responding decreases more rapidly in the presence of the yellow key than in the presence of the orange key, even though responding is extinguished in the presence of each. Provided that responding continues to be reinforced in the presence of one stimulus, generalization from that stimulus helps determine how rapidly responding declines when extinguished in the presence of different stimuli. Thus, the greater generalization from red to orange than from red to yellow slows the decline of responding in the presence of the orange key.
2. As long as responses are reinforced in the presence of the red key, extinction does not reduce the rate of responding in the presence of the other two stimuli to zero. Even after extinction has been continued for some time, there is always some residual responding in the presence of the orange and yellow keys, although reliably less responding in the presence of the yellow key. This unreinforced responding is a result of generalization from the reinforced responding in the presence of red. Thus, as the graph shows, when responding in the presence of red is also extinguished, there is a further decrease in the rate of responding in the presence of both the orange and yellow keys.
3. Extinction produces the most rapid decrease in the rate of responding in the presence of the red key. Because there is no reinforcement in the presence of any other stimulus, extinction is not retarded by generalization from any other stimuli to red.
4. When responding is once again reinforced in the presence of the red key only, the rates of responding to orange and yellow eventually return to the former low level that prevailed in Part 2 of the
experiment, when the procedure was identical. This result is preceded, however, by a surge of responding up to a medium rate in the presence of orange and yellow and to a reliably higher rate in orange. This rate never exceeds about one half the rate prevailing in red, however; and it gradually decreases to the level maintained by generalization in Part 2.
These four effects of generalization involve changes in the rate of responding in the presence of each of two stimuli which are caused by changes in the consequences (reinforcement or nonreinforcement) of responding in the presence of a different stimulus. The rate of responding in the presence of the yellow key, for example, depends not only on the conditions prevailing during yellow but also on the conditions prevailing during red. Such effects are of fundamental importance in pointing out that the causes of behavior that occurs under a given set of circumstances may include events that occur under different circumstances. More examples of such multiple determination of behavior will come up during our discussion of schedules of reinforcement in Chapters 6 and 7.
In Parts 1 and 2 of the experiment whose results are shown in Figure 4.3, the conditions of reinforcement were always the same in the presence of the red key. The rate of responding during red was not constant, however. Rather, it increased when responding was no longer reinforced during orange and yellow. This phenomenon is called behavioral contrast: the rates of responding in the presence of two stimuli change in opposite directions; in our experiment, the rate decreased in orange and yellow and increased in red. We have an example of generalization when a child's unruly behavior extinguishes more slowly than usual at home because it is reinforced on the playground; we have an example of behavioral contrast when the extinction of unruly behavior at home makes for increasingly frequent unruly behavior on the playground.
Contrast seems to depend on the relation between the reinforcing conditions associated with the two stimuli. When the consequences of a response become less reinforcing in the presence of one stimulus, we can expect the frequency of the response to increase in the presence of another stimulus where its consequences remain reinforcing. Contrast, like generalization, furnishes an example of changes in behavior under one set of circumstances which are caused by events occurring under a different set of circumstances.
ATTENTION

It often happens that reinforcement of a response in the presence of a stimulus does not bring all properties of that stimulus into control over the behavior. Variation or elimination of the noncontrolling property has no effect on the behavior; and if the properties are separated, the generalization gradient across values of that property is flat and unchanging. When this happens, the organism is said to be not attending to that property of the stimulus.
As a simple example, the pecking of a pigeon was reinforced in the presence of a white triangle on a red background and was never reinforced in the presence of a white circle on a green background. Subsequently, with no reinforcement, the four parts of these two stimuli were presented separately: the circle and the triangle on grey backgrounds and the colored backgrounds without superimposed figures. None of the birds pecked at either part - the circle or the green background - of the stimulus in whose presence responding had not been reinforced. Some pecked only at the triangle, some only at the red background, and some at both at various rates. The birds that pecked at the triangle and not at the red background were attending only to the triangle, not to the red; their behavior had been brought under the control of the property of form but not of the property of color of the stimulus that had set the occasion for reinforcement. The behavior was, in all cases, under the control of the original stimulus when both parts were combined, but it was not always under the control of each individual part.
Attention is often directed to one sensory modality to the exclusion of others. For example, suppose that a pigeon's peck is reinforced in the presence of a brightly lighted key and a loud clicking sound from a loudspeaker. The pecking will be likely to come under the control of the brightness of the key and not under the control of the noise. Decreases in the brightness of the key will result in a decreased tendency to respond, while changes in the loudness of the noise will probably have very little or no effect on behavior. A pigeon is more likely to attend to a visual stimulus prominently displayed on the key that it pecks than to a relatively unlocalized noise. Also, pigeons in general have a greater tendency to attend to visual stimuli than to other kinds of sensory stimuli.
There are two factors that help determine which stimulus an organism will attend to at any particular time: the inherited characteristics of the organism and its experiences with its environment. Organisms are born with dispositions to attend to particular aspects of the environment. From all the stimuli available in its environment among which it is able to discriminate, the organism selectively attends only to some.
A common example of this phenomenon is the selective control of the cat's behavior by movement, almost to the total exclusion of color and brightness. If a cat is taught to discriminate between a moving, dim, red object and a stationary, bright, green object, the chances are excellent that movement will be the controlling property. The cat will come to respond to moving objects regardless of changes in brightness or hue. This does not mean, of course, that the cat is unable to discriminate brightness: it can easily be conditioned to enter the brighter of two identical chambers if its food is always placed there. Apparently, the cat can even discriminate between colors if color is the only difference between stimuli which are differentially associated with an adequate reinforcer. If left to its own devices, however, the cat is more likely to rely on movement as a stimulus.
The previous experience of an organism may also cause it to attend to one of several stimuli or properties of stimuli, even though all have the same consistent association with the reinforcement of responses. One rule seems to be that once attention to one property has been established, the organism will continue to attend to that property to the relative exclusion of other properties. For example, if the organism has previously made a series of discriminations between stimuli on the basis of brightness, it generally attends to the brightness of the stimuli in future discriminations. If the cat, for example, were placed in a world where movement had no consistent connection with reinforcement but brightness did, then the cat would come to attend selectively to the brightness of stimuli, probably to the relative exclusion of movement.
It follows from the above rule that once a discrimination has been established on the basis of a relatively large difference in one property of stimuli, a smaller difference introduced in another property of stimuli will usually be ignored, unless the conditions of reinforcement are changed. For example, if an organism develops a discrimination between a bright and a dim light and then a slight difference between two faint sounds is added to the discrimination, the organism will rarely attend to the sounds. When the sounds are presented later without the lights, they will have no differential effects on behavior. The lights as stimuli overshadow the sounds.
Supraordinate Stimuli
Supraordinate stimuli inform the organism about the currently relevant property of a group of stimuli. Technically speaking, they are stimuli in whose presence one property rather than another has, in the past, set the occasion for reinforcement of a response. With people, words are the most common supraordinate stimuli. "Tell me the colors of these cards" occasions from the listener responses to the colors rather than the shapes or sizes of the cards. In fact, after hearing these words, the shapes and sizes will control behavior so little that the subject probably will not even be able to recall them later.

Supraordinate stimuli also control the responses of animals. Suppose that a pigeon is exposed sequentially to all four combinations of white figures on colored backgrounds, with a triangle and circle as figures and red and green as backgrounds. When the bird's chamber is lighted yellow by a lamp on the side, responding is reinforced in the presence of those two of the four stimuli that contain red; when it is lighted blue, responding is reinforced in the presence of those two of the four stimuli that contain a triangle. The yellow and blue general illuminations indicate whether responses to the ground or to the figure will be reinforced. The bird quickly comes to respond appropriately.
Attention may be transferred from one group of stimuli to another by a procedure of simultaneous presentation of both stimuli followed by the fading of the originally controlling stimulus. Suppose that key pecking is currently under the control of the color of the key (the bird pecks at green and does not peck at red) and that we wish to shift control to geometric figures (so that the bird pecks at a triangle and does not peck at a circle). This could be accomplished by alternately presenting the triangle and the circle, each against a grey background, and by reinforcing pecking on the triangle while extinguishing pecking on the circle. It is more efficient, however, to present the figures initially on the appropriate colored backgrounds (the triangle on green and the circle on red) and then gradually to fade the backgrounds by lessening their intensities. At first the colors control the key pecking; but as the colors become dimmer, the forms come into control. If the fading is carried out at the proper rate, no change in the rate of responding in the presence of either stimulus will occur. Unfortunately, there is no information available on precisely when the new stimuli, the figures, come into control over the behavior. Nor is it known if the shift in attention is gradual or abrupt. But the fact that continued reinforcement in the presence of the triangle and the fading green ground seems to be necessary suggests that the transfer of control is gradual.
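A fading procedure of this sort can be sketched as a sequence of trials in which the background color is stepped down in intensity while the figures remain at full strength. The step size and the number of trials below are invented; the sketch only illustrates the general shape of such a schedule.

    def fading_schedule(trials=20):
        """Yield (trial, background_intensity, figure_intensity) for a fading procedure.

        The originally controlling stimulus (the colored background) is gradually
        dimmed; the figure, which is to acquire control, remains at full intensity.
        """
        for trial in range(1, trials + 1):
            background = max(0.0, 1.0 - trial / trials)   # fades from near 1.0 to 0.0
            figure = 1.0                                   # triangle or circle, unchanged
            yield trial, background, figure

    for trial, bg, fig in fading_schedule(10):
        print(f"trial {trial:2d}: background {bg:.1f}, figure {fig:.1f}")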
Sensory Preconditioning
There is some evidence that transfer of control may sometimes occur without explicit reinforcement. This phenomenon is called sensory preconditioning. Two stimuli - for example, a light and a sound - are first presented simultaneously to an organism several times. Then, some response is reinforced in the presence of only one of the stimuli until that stimulus comes to be an effective discriminative stimulus for that response. If the organism is then placed in the presence of only the other stimulus, it is found that the response has come under the control of this second stimulus also. In order to assure that this result is not simply a result of generalization between the two stimuli, other organisms are trained and tested in the same way, but without the initial simultaneous exposure to the two stimuli. The evidence, which is too intricate to be elaborated here, indicates that sensory preconditioning is effective in transferring control but that the control by the second stimulus is by no means as strong as the control generated by traditional methods. Sensory preconditioning is, of course, effective in producing the first few responses in the presence of the new stimulus, a situation which may be used as the basis for more conclusive procedures.
Although the least discriminable difference between stimuli may be very small, there is a limit to how fine a discrimination can be made. For example, there is a difference between the intensities of two sounds or the brightnesses of two lights that no organism can reliably discriminate, even under the most favorable circumstances. These limits have been extensively studied in the science of psychophysics and have been reliably established for many stimuli and animals.
It is dangerous, however, to maintain that a given organism cannot discriminate between two stimuli without exhaustive attempts to produce the discrimination; only one demonstration of a successful discrimination is needed to disprove the assertion of its impossibility. It is safe to say that all organisms have discriminative capacities that are never fully developed because their environments never provide differential consequences of selective behavior in the presence of minimally different stimuli. The cultured palate of the wine taster, the discerning nostrils of the perfumer, the critical ear of the conductor, the sensitive fingers of the safecracker, and the educated eyes of the painter are familiar illustrations of discriminative capacities that remain relatively untapped in most human beings.
five
Conditioned reinforcers
Some stimuli come to be reinforcers for an organism because of their association with reinforcement in the previous experience of the organism. These stimuli are called secondary, or conditioned, reinforcers in order to distinguish them from innate, primary, or unconditioned, reinforcers. If it were not for the phenomenon of conditioned reinforcement, we would all be limited to reinforcers whose effectiveness is innate. Instead, through experience, new stimuli are added to the class of effective reinforcers. A fraternity pin, meaningless at an earlier age, reinforces the behavior of a teen-ager. The voice of a dog's master, ineffectual at first, comes to reinforce the dog's behavior. Stock market quotations, at first dull lists of numbers, come to reinforce an investor's behavior. Under special circumstances, conditioned reinforcers may be highly individualized, as in the case of idiosyncratic fetishes.
Just as there are two kinds of primary or innate reinforcers, so there are two kinds of conditioned reinforcers. One kind is composed of stimuli whose appearance or presentation is reinforcing. These are called positive conditioned reinforcers. The other is composed of stimuli whose disappearance or withdrawal is reinforcing. These are called conditioned aversive stimuli (or, by some, negative conditioned reinforcers). Both positive conditioned reinforcers and conditioned aversive stimuli have several effects on behavior, but we are concerned here only with their effects as reinforcers.
Provided that a stimulus is discriminable and commands attention, it can become a conditioned positive reinforcer or a conditioned aversive stimulus. At the start, we have a stimulus whose presentation or withdrawal following an operant has no effect on the probability of reoccurrence of the responses. After the organism undergoes experience with the stimulus, however, it becomes a reinforcer. The necessary experience turns out to be reinforcement itself. A stimulus in whose presence a positive reinforcer occurs becomes a positive conditioned reinforcer. The conditioned reinforcer is said to be based on the reinforcer experienced by the organism in its presence. One interpretation of this process, as we shall see, is that conditioned reinforcers owe their effectiveness to the fact that they function as discriminative stimuli for later responses which are maintained by reinforcement in their presence. A stimulus in whose presence an aversive stimulus occurs becomes a conditioned aversive stimulus. The conditioned aversive stimulus is based on the aversive stimulus.

The formation of a conditioned reinforcer is usually a gradual process: several reinforcements or several occurrences of an aversive stimulus are necessary. Eventually, the stimulus takes on the reinforcing or aversive properties of the stimulus presented to the organism in its presence.
It makes little difference whether the reinforcer on which the conditioned reinforcer is based is itself innate or conditioned. A new conditioned positive reinforcer can usually be formed on the basis of either an innate or a conditioned positive reinforcer; and a new conditioned aversive stimulus can usually be formed on the basis of either an innate or a conditioned aversive stimulus.
As an example of the formation of a conditioned positive reinforcer, suppose that a hungry pigeon's pecking is reinforced with food in the presence of a red light behind its key. During alternating periods when the key is lit by a green light, pecking is not reinforced with food. Rather, pecks on the green key produce the red key. Under these conditions, the pigeon will peck both the red and the green key. It will peck the red key, of course, because pecking the red key is reinforced with food. It will peck the green key because pecking the green key also is reinforced - with a conditioned reinforcer, the red key. The red key has become a conditioned reinforcer because pecks on it have produced food in its presence. The red key can be further demonstrated as a reinforcer by showing that its presentation will reinforce not only pecking on the green key but any response that it follows.
An example of the formation of a conditioned aversive stimulus occurs when an electric shock is
delivered to the feet of a rat during periods of noise through a loudspeaker. The noise will become a
conditioned aversive stimulus; its termination will reinforce the response which terminates it, just as
the termination of the electric shock reinforces responses that terminate the shock.
Conditioned reinforcers generally occur within chains of responses and stimuli. A chain is composed of a series of responses joined together by stimuli that act both as conditioned reinforcers and as discriminative stimuli. A chain begins with the presentation of a discriminative stimulus. When the organism makes the appropriate response in the presence of this stimulus, a conditioned reinforcer is presented. This conditioned reinforcer is also a discriminative stimulus which occasions the next appropriate response. This response is reinforced with another conditioned reinforcer, which is also a discriminative stimulus for the next response, and so on. The last stimulus in the chain, on at least some occasions, is a primary, or innate, reinforcer.
One example of a chain of behavior is the sequence of responses we emit when we go out to eat at a restaurant. The initial discriminative stimuli may be a friend's telephone call, the time of day, or a strong hunger contraction. The chain that follows is composed of many responses: rising; opening the door; leaving the house; getting into, starting, driving, and parking the car; entering the restaurant; sitting down; reading the menu; ordering; and eating. The environmental stimulus that follows each response - the open doors, the running engine, the restaurant facade, the appearance of the waiter, the food - occasions the next response in the chain. We do not try to get into our car until the door is open. Nor do we attempt to order food unless there is a waiter at our table. Each of these discriminative stimuli is, in addition, a conditioned reinforcer. The opening of a door, for example, is reinforcing because the open door is a stimulus in whose presence a response is reinforced. The entire chain of behavior is maintained by the food we finally ingest; we simply do not go to restaurants which furnish either bad food or no food at all.
An experimental example of a chain may begin when a pigeon is presented with a blue key. When the pigeon pecks the key, it changes to red. After the key turns red, the pigeon presses a pedal which turns the key yellow. During yellow, displacing a bar changes the key to green. Finally, during green, pecks are reinforced with the operation of a grain-delivery mechanism and its associated stimuli, in the presence of which the bird approaches the grain magazine and eats. Each change of color is a conditioned reinforcer for the response that precedes and produces it as well as a discriminative stimulus for the response emitted in its presence. The entire sequence is maintained by the grain which the pigeon ultimately eats.

The Links in a Chain
Each unit composed of a discriminative stimulus, a response, and a reinforcer is called a link of the chain. The experimental chain described above, for example, has five links: blue - peck - red; red - treadle - yellow; yellow - bar - green; green - peck - grain-magazine operation (and other stimuli); grain magazine - eat - food ingestion. Because each stimulus has a dual function, as discriminative stimulus and conditioned reinforcer, the links overlap. In fact, it is this dual function of the stimuli that holds the chain together.
Theoretically, chains may have any number of links, although in practice there is an upper limit. With some long chains, responding will not occur in the presence of the first stimulus but will begin if the second stimulus is presented instead. In most experimental studies of the basic principles governing chains, three links are used: responding during one stimulus produces another, during which responding produces the primary reinforcer and its associated stimuli, in whose presence the organism moves to consume the reinforcer. The effectiveness of the second stimulus as a conditioned reinforcer is assessed by measuring the behavior which it reinforces in the presence of the first stimulus.
A chain of at least two links is always involved whenever any behavior is emitted in obtaining a primary
reinforcer. For example, a rat presses a lever which produces a buzz and the presentation of a liquid
diet. On hearing the buzz, the rat moves to eat the reinforcer.
The responses involved in the final link of a chain are usually treated as a single operant, defined in
terms of their common effect in the consumption of the reinforcer. However, this operant itself
involves a chain, which may also be analyzed into its components. For example, in the presence of the
buzz, the rat moves to the food dispenser; once there, it places its head near the hole; with the hole
and head properly aligned, the rat licks and swallows. Each response in the chain is reinforced by its consequences.
Such breakdowns as this are often useful or necessary in the refinement of the sequence or the
individual topographies of behaviors composing the original, grosser response. For example, if the rat
were taking an unnecessarily circuitous route to the food, we could speed it up by selectively changing
the consequences of only that bit of behavior in the chain. We could arrange that the move to the
magazine will be reinforced only if it is direct.
The fineness to which the analysis of behavior is carried is determined by the purpose of the analysis.
Consider the game of golf, for example. If we want to do no more than understand or manipulate the
golfer's general proficiency at the game, we need know only the total number of strokes he takes per
outing or per week. Each stroke is an instance of a response, without regard for the kind of club he
used. If, however, we want to manipulate, and presumably improve, his game, we will have to analyze the play of each individual hole into a chain,
probably defining the links of the chain in terms of the club that is employed. This will allow
independent manipulations of each stroke since its consequences can be dealt with individually. If,
finally, we want to improve the topography of an individual stroke, such as the drive or putt, we must
analyze that single set of behaviors into a chain. Then, using such consequences as distance, accuracy,
and our approval as the terminal reinforcers in the chain, we attempt to manipulate directly the
sequence and to
In chains of stimuli and responses, each stimulus except the first and the last functions as both a conditioned reinforcer and a discriminative stimulus. The two functions are clearly and functionally separate. As a conditioned reinforcer, the stimulus reinforces the responses made in the previous link of the chain, those responses which bring about its appearance. As a discriminative stimulus, the stimulus occasions the behavior emitted in its presence. This behavior is reinforced by the appearance of the stimulus in the next link of the chain. Two questions arise: Must every discriminative stimulus in a chain also function as a conditioned reinforcer? And must every conditioned reinforcer also function as a discriminative stimulus? In theory, the answer to both questions seems to be no; but in practice, the exceptions in each instance are of comparatively negligible importance. For most practical purposes, the functions as conditioned reinforcer and as discriminative stimulus coexist as properties of the same stimulus in a chain.
Consider the first question. All discriminative stimuli cannot also be conditioned reinforcers, because there is a limit to the number of links that will hang together in a chain. Experiments have shown that when a chain becomes so long that responding no longer occurs in the presence of the first stimulus, the one farthest from the terminal reinforcement, the stimulus of the second link is still effective as a discriminative stimulus; it will occasion at least the one response necessary for advancement into the third link. This of course means that the stimulus of the third link is an effective conditioned reinforcer. The fact remains, however, that the stimulus of the second link will cease to function as a conditioned reinforcer even though it continues to function as a discriminative stimulus. Because of the limit on the number of links in a chain, this state of affairs must be reached eventually if the number of links is continually increased. Except for this limit, discriminative stimuli have been shown to be effective conditioned reinforcers.
The second question is whether a conditioned reinforcer, having been present when a particular bit of
behavior was reinforced, will necessarily function as a discriminative stimulus for that behavior. This
necessarily happens in chains, where a particular response is reinforced in each link by the presentation
of the stimulus that functions as the discriminative stimulus of the next link. It also occurs if we set up a
series of stimuli which will occur in sequence regardless of the response emitted by the organism. If any
of the stimuli we present are conditioned reinforcers, they will reinforce whatever behavior happens to
precede them. Thus, as we saw in Chapter 3, the usual result is a chain of regular, although not required and therefore superstitious, behavior. Since each conditioned reinforcer has accompanied the reinforcer on which it is based, the conditioned reinforcer will fail to function as a discriminative stimulus only if neither a particular, required response nor regular, superstitious behavior has been conditioned by the other reinforcer. Although this state of affairs is conceivable, it probably never actually occurs.
The strength of a conditioned reinforcer is measured in terms of its durability and its potency.
Durability
Durability refers to the length of time during which the stimulus continues to be effective as a reinforcer.
Potency
Different conditioned reinforcers used in exactly the same way may differ in their effectiveness as reinforcers: one reinforcer may maintain a higher rate of responding than another. The higher the rate of responding maintained by presentations of the conditioned reinforcer, the higher the potency of the reinforcer. The potency of a conditioned reinforcer - that is, its ability to maintain a rate of responding - is determined by many factors. The following are four of the most important.
1. The potency of a conditioned reinforcer increases with higher frequencies of presentation, in its presence, of the primary or conditioned reinforcer on which it is based. The evidence suggests that this function is concave downward; that is to say, as the frequency of presentation becomes higher and higher, the potency of the conditioned reinforcer continues to increase, but less and less rapidly. A point is reached beyond which further increases in frequency of reinforcement in the presence of the conditioned reinforcer bring
about only negligible increases in the rate of responding that is maintained by presentations of the conditioned reinforcer (a curve of this general shape is sketched after this list).
2. The schedule of presentations of the stimulus on which the conditioned reinforcer is based also helps determine its potency. This phenomenon is discussed in Chapter 6.
3. In a chain, a conditioned reinforcer is less potent the greater its distance from the primary
reinforcement. Distance is measured in terms of either time or the number of links. Other factors being
equal, the organism encounters conditioned reinforcers of greater and greater potency as it works its
way through the chain.
4. The potency of a conditioned reinforcer also depends on the prevailing degree of motivation relevant to the primary reinforcer on which it is ultimately based. For example, conditioned reinforcers based on reinforcement by food are relatively weak when the organism is satiated with food.
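As a rough sketch of the shape described in point 1 above (the particular function and the constant are invented for illustration and are not given in the text), a negatively accelerated curve rises with the frequency of reinforcement, but less and less rapidly:

```python
# Illustrative only: a concave-downward (negatively accelerated) curve.
# The functional form and the constant k are assumptions, not from the text.
import math

def relative_potency(frequency, k=0.05):
    """Increases with reinforcement frequency, but less and less rapidly."""
    return 1 - math.exp(-k * frequency)

for frequency in (10, 20, 40, 80):
    print(frequency, round(relative_potency(frequency), 2))  # 0.39, 0.63, 0.86, 0.98
```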
It is possible to gain a degree of independence from the factors affecting the potency of a conditioned reinforcer by forming conditioned reinforcers based on two, several, or many primary reinforcers. In such cases, the conditioned reinforcer gains its potency from all of the reinforcers on which it is based. Such a stimulus is called a generalized conditioned reinforcer, to indicate the generality of its potency. Money, for example, owes its potency as a reinforcer to the wide range of primary and conditioned reinforcers on which it is based.
We will have many opportunities to apply these considerations about conditioned reinforcement in the
next two chapters, which treat schedules of reinforcement.
six
Simple schedules
of positive reinforcement
As we have said before, it is by no means necessary to reinforce every occurrence of a response in order
to increase or maintain the rate of responding. In fact, if continuous reinforcement were the only case
ever studied, procedures and results of great interest would never have been discovered and
developed, and since reinforcement outside of the laboratory is almost never continuous, nearly all applicability of the concept of reinforcement under natural conditions would be lost. A baby cries many times before one of its cries brings an attentive mother. We try many approaches before solving a difficult problem. A small boy may ask for lunch many times without success; but when a certain period of time has elapsed since breakfast, his request is granted. In each of these instances, only one occurrence of a response is reinforced and many are not. In the cases of crying and solving a problem, a
number of unreinforced responses occur before one of them is reinforced, even though that number
varies from occasion to occasion. In the case of asking for lunch, it must be lunchtime before a response
is reinforced; the number of responses is relatively unimportant.
A schedule of reinforcement is the rule followed by the environment - in an experiment, by the apparatus - in determining which among the many occurrences of a response will be reinforced. Schedules of reinforcement have regular, orderly, and profound effects on the organism's rate of responding. The importance of schedules of reinforcement cannot be overestimated. No description, account, or explanation of any operant behavior of any organism is complete unless the schedule of reinforcement is specified. Schedules are the mainsprings of behavioral control, and thus the study of schedules is central to the study of behavior. Every reinforcer occurs according to some schedule, although many schedules are so complicated that ingenuity, insight, and experimental analysis are needed to formulate them precisely. The effort is worth-while, however, because the rate of responding can be controlled more precisely by manipulating the schedule of reinforcement than by any other method. Behavior that has been attributed to the supposed drives, needs, expectations, ruminations, or insights of the organism can often be related much more exactly to regularities produced by schedules of reinforcement. Many apparently erratic shifts in the rate of responding, which had formerly been ascribed to nebulous motivational variables or to "free will," have been traced by experiment to the influence of schedules of reinforcement.
Simple schedules of reinforcement can be classified into two types: ratio schedules and interval schedules. Ratio schedules prescribe that a certain number of responses be emitted before one response is reinforced. The term ratio refers to the ratio of the total number of responses to the one reinforced response. With a ratio of 50 to 1, for example, an organism must emit 49 unreinforced responses preceding each reinforced response. Interval schedules prescribe that a given interval of time elapse before a response can be reinforced. The relevant interval can be measured from any event, but the end of the previous reinforcement is usually used.
Under ratio schedules the amount of time the organism takes to emit the necessary number of
responses is irrelevant, while under interval schedules the number of responses is irrelevant so long as
the organism emits the one response necessary for reinforcement after the interval has
elapsed. Interval schedules have a built-in safety factor which is absent in ratio schedules: if the number of responses required by a ratio schedule is too high, the animal may never emit enough responses for reinforcement, and responding may extinguish. The residual level of responding under extinction may then be too low to produce reinforcement. Under interval schedules, however, the mere passage of time brings an opportunity for reinforcement; as long as the interval has elapsed, only a single response is needed for reinforcement. This single reinforcement, then, increases the rate of responding and ensures that responding will not be extinguished.
Ratio and interval schedules can themselves be classified into two types: variable and fixed. When a variable-ratio schedule is operating, the number of responses required for one reinforcement varies from reinforcement to reinforcement in an irregular, but usually repeating, fashion. A typical sequence might reinforce the tenth response, then the hundredth, then the fiftieth, and continue following these numbers of responses: 5, 30, 150, 15, 40, 90, 210. Then, after ten more responses, the sequence would repeat and work through the same series of numbers over and over again until the session ended.
The value of a variable-ratio schedule is summarized by the average number of responses per reinforcement, here 70. For convenience, variable-ratio schedules are abbreviated VR, and a number following the abbreviation indicates the average value of the ratios. In this way, the schedule described above is designated VR 70.
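A quick check of the arithmetic behind the designation VR 70 (a minimal sketch, using the ratio series given above):

```python
# A quick check (not from the text) of the average ratio in the series above.
ratios = [10, 100, 50, 5, 30, 150, 15, 40, 90, 210]
print(sum(ratios) / len(ratios))   # 70.0, hence the designation VR 70
```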
A fixed-ratio schedule, on the other hand, consistently requires the same total number of responses for every reinforced response. Fixed-ratio schedules are abbreviated FR. A fixed-ratio schedule requiring a total of 50 responses for each reinforced response is designated FR 50.
Similarly, a variable-interval (VI) schedule varies the amount of time that must elapse before a response can be reinforced. A fixed-interval (FI) schedule holds the required lapse of time constant.
The basis of all known schedules of positive reinforcement, no matter how complicated, can be reduced
to variations of ratio and interval requirements, sometimes in combination with differential
reinforcement of particular properties of responding, such as pausing or high rates. In this chapter, we
will examine only the four elementary ratio and interval schedules - VR, FR, VI, and FI. We will discuss some of the more complicated cases in Chapter 7.
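Because a schedule is simply the rule the apparatus follows in deciding which response is reinforced, the four elementary schedules can be stated as such rules. The sketch below is illustrative only and not from the text; the class names are invented, and the ratios and intervals are drawn at random rather than in the repeating series described above.

```python
# Illustrative only: the four elementary schedules stated as apparatus rules.
# Each object is asked, at every response, whether that response is reinforced.
import random

class FixedRatio:
    """FR n: every nth response is reinforced; timing is irrelevant."""
    def __init__(self, n):
        self.n, self.count = n, 0
    def response(self, now):
        self.count += 1
        if self.count >= self.n:
            self.count = 0
            return True
        return False

class VariableRatio(FixedRatio):
    """VR: the required ratio varies from reinforcement to reinforcement."""
    def __init__(self, ratios):
        super().__init__(random.choice(ratios))
        self.ratios = ratios
    def response(self, now):
        reinforced = super().response(now)
        if reinforced:
            self.n = random.choice(self.ratios)
        return reinforced

class FixedInterval:
    """FI t: the first response after t seconds have elapsed is reinforced."""
    def __init__(self, t):
        self.t, self.start = t, 0.0   # interval measured from last reinforcement
    def response(self, now):
        if now - self.start >= self.t:
            self.start = now
            return True
        return False

class VariableInterval(FixedInterval):
    """VI: the required interval varies from reinforcement to reinforcement."""
    def __init__(self, intervals):
        super().__init__(random.choice(intervals))
        self.intervals = intervals
    def response(self, now):
        reinforced = super().response(now)
        if reinforced:
            self.t = random.choice(self.intervals)
        return reinforced
```

With these rules, FixedRatio(50) reinforces every fiftieth response regardless of timing, while FixedInterval(60) reinforces the first response emitted at least 60 seconds after the previous reinforcement.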
CHARACTERISTIC PERFORMANCES:
Each schedule of reinforcement produces a characteristic performance. Depending on the schedule
involved, the performance
may consist of a steady, predictable rate of responding or of regular, oscillating, and predictable
changes in rate. The appearance of this characteristic maintained performance is preceded by a period
of acquisition, which occurs when the animal's responding is first reinforced according to the schedule. Although performance during acquisition is also regular and predictable, it differs from the maintained performance. During acquisition, the performance is always changing; but gradually, it comes closer and closer to the final maintained performance on the
schedule. For example, when one schedule is terminated and replaced by another schedule, the
maintained performance of the first schedule gradually changes, through a transition period, to the maintained performance of the second. When a maintained performance on a schedule is followed by a period in which reinforcement is withdrawn entirely and responding extinguishes, the course and character of the extinction are to a very large extent determined by the preceding schedule of reinforcement. The following review of the simple schedules of reinforcement describes the performances typical of the various schedules and analyzes the variables which have been shown to determine the behavior in each case. A discussion of extinction following reinforcement on
the four simple schedules follows later in this chapter.
Before describing and analyzing the performance maintained by each of the simple schedules of reinforcement, it may be helpful to point out some of the more common examples of schedules with which many readers will already be familiar. Behavior is reinforced on a fixed-ratio schedule in any factory that pays its employees on a piecework system. Payment depends on how much work is accomplished: each time a fixed number of items has been manufactured or serviced, the amount of the payment increases. Most workers work quite rapidly when they are working. But there is usually a pause, or breather, before beginning each block of work. This behavior is typical of that generated by fixed-ratio schedules.
True fixed-interval schedules, in which the length of the interval between reinforced responses does not vary at all, are difficult to find outside of the laboratory. However, there are numerous approximations. One is the workday, whose duration is relatively constant. Preparations for leaving the office, for
example, increase in frequency as the time to leave draws nearer. It is not altogether clear in this case,
however, whether the reinforcing stimulus is the departure from the office or the arrival at home.
Reinforcement on a variable-interval schedule is exemplified by trying to reach a busy person on the telephone. In this case, the behavior is the telephoning, and the reinforcer is hearing the person's voice instead of a busy signal. The schedule of reinforcement involves intervals because time is required for the person to stop talking and hang up so that you can get through to him.
The intervals are variable because of the variable lengths of conversations on the telephone.
The best example of a variable-ratio schedule is the operation of the one-armed bandit, or slot machine. The machines may pay off every hundred plays on the average, but there are occasional instances of two successive winning plays and many instances of more than one hundred plays between pay-offs. The extremely high rate of responding and the persistence of many gamblers in playing these machines is typical of the behavior generated by variable-ratio schedules of reinforcement.
The effects of variable-ratio and variable-interval schedules are vastly different. During maintenance, VR schedules produce very high and nearly constant rates of responding, sometimes approaching the physical capabilities of the organism. The rate may go as high as fifteen responses per second in a pigeon. VI schedules also produce nearly constant rates of responding, but the rates are usually of lower values than those produced by VR schedules.
Figure 6.1 An illustration of the performance generated by variable-ratio and variable-interval schedules programed in yoked experimental chambers. (The records show the VR rate at about 5 responses per second; coordinate scales: 100 responses by 1 minute.)
how long it takes the first bird to emit the number of responses required by the ratio. Thus, by using
yoked experimental chambers, it is possible to hold the timing and frequency of reinforcement
constant and to compare directly the other effects of VR and VI schedules of reinforcement.
After sufficient exposure to this procedure, the performance of each bird stabilizes in the form shown in
Figure 6.1. Both birds exhibit a nearly constant rate of responding. However, the first bird, on the VR
schedule, has a rate nearly five times as fast as the second bird, on the VI schedule. It is not surprising
that both birds peck, since the pecking of each is reinforced. But what is responsible for the difference
in rate and for the apparent stability of both rates of responding over time?
Since both birds' pecks are reinforced at approximately the same time, the difference in rate must be
caused in some way by the interaction between the schedule and the behavior. Two factors of
importance are the differential reinforcement of interresponse times and the relation between how
rapidly the bird pecks and how often its pecks are reinforced.
Differential reinforcement of interresponse times. One aspect of the interaction between behavior and the schedule of reinforcement involves interresponse times, or IRTs. An interresponse time is simply the amount of time that passes between two responses. Except for the first response, every response in a sequence terminates the interresponse time measured from the previous response. Thus, a response comes at the end of the interresponse time with which it is associated. For example, if a pigeon pauses a relatively long time between two pecks on the key, the second peck is said to have a long interresponse time. If the pigeon waits a short time between pecks on the key, the second peck is said to have a short interresponse time. Any sequence of responses can be described not only in terms of the rate at which the responses occur but also in terms of the interresponse times which make up the sequence. It is possible simply to list the values of each IRT in the sequence. Usually, however, we settle for a description in terms of the number of IRTs of various durations in the sequence - that is to say, a frequency distribution of IRTs.
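As an illustration (the response times below are invented, not data from the text), the IRTs of a sequence and a crude frequency distribution of them can be computed directly from the times at which the responses occur:

```python
# Illustrative only: computing IRTs and a crude frequency distribution from a
# list of response times. The times and the 0.5-second class width are invented.
from collections import Counter

response_times = [0.0, 0.3, 0.5, 0.6, 2.1, 2.2, 2.4, 4.9, 5.0]   # seconds

# Each response after the first terminates the IRT measured from the previous one.
irts = [later - earlier for earlier, later in zip(response_times, response_times[1:])]

# Group the IRTs into 0.5-second class intervals and count them.
distribution = Counter(round(irt // 0.5) * 0.5 for irt in irts)
print(sorted(distribution.items()))
```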
The rate of responding and the IRTs of a sequence of responses are clearly related. Long IRTs are associated with low rates of responding; short IRTs, with high rates of responding. A sequence of responses at a high rate will contain relatively many short IRTs; a sequence of responses at a low rate will contain relatively many long IRTs. The IRT of a response is a characteristic that can be modified by selective reinforcement in the same way that the topography - the form, force, and duration - of a response can be modified (as discussed in Chapter 3). If we reinforce only those responses which come at the end of short IRTs, we find that short IRTs soon come to predominate in the performance, making for a high rate of responding. If we reinforce only those responses that come at the end of long IRTs, we soon find a low rate of
responding characterized by long IRTs. Responding is thus influenced by the differential reinforcement of IRTs. If the reinforced responses in a sequence of responses tend to occur at the end of the shorter IRTs in the sequence, the shorter IRTs will be differentially reinforced. This is what happens on a VR schedule, and the result is a tendency toward shorter IRTs and, hence, higher rates of responding. The VI schedule, on the other hand, differentially reinforces relatively long IRTs because the reinforced responses in a
sequence will predominantly be those that terminate the longer IRTs in the sequence. This results in a
tendency toward longer IRTs and, hence, lower rates of responding.
Differential reinforcement of IRTs is possible because an organism's IRTs are variable rather than constant. A typical sequence is indicated in Figure 6.2. As the organism begins to respond under any schedule of reinforcement, it emits a sequence of variable IRTs that includes groups of responses occurring more closely together than others. These groups are called bursts, and they are indicated by the brackets in Figure 6.2. When a VI schedule interacts with such a sequence of varying IRTs, the chances that any one response will be reinforced are greater the longer its IRT. For example, in the experiment with yoked chambers, as time passes, the bird on the VR is emitting more and more responses. The more responses it emits, the greater the chances that it will reach the number of responses required for reinforcement on the VR, which will in turn make reinforcement immediately available to the bird on the VI. Thus, the longer the bird on the VI schedule waits to emit each response, the more likely it is that the bird on the VR will have satisfied its ratio and, hence, the more likely it is that the response will be reinforced. Similarly, in an ordinary VI schedule programed only by time, the
longer the IRT, the greater the chances that the interval required by the schedule will have ended. This
differential reinforcement of long IRTs helps keep the rate of responding on VI schedules relatively low.
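The point can be illustrated numerically (a minimal sketch under assumptions not in the text: the intervals are drawn from an exponential distribution with an arbitrary mean, simply to stand in for a variable-interval rule): the longer the IRT, the more likely the required interval has ended by the time the response occurs.

```python
# Illustrative only: on an interval schedule, the chance that the required
# interval has already elapsed grows with the length of the IRT, so responses
# ending long IRTs are the ones most often reinforced. The exponential
# distribution and mean interval here are assumptions, not from the text.
import random

def chance_interval_has_elapsed(irt, mean_interval=30.0, trials=10_000):
    elapsed = sum(random.expovariate(1 / mean_interval) <= irt for _ in range(trials))
    return elapsed / trials

for irt in (1, 5, 15, 30):                    # seconds
    print(irt, round(chance_interval_has_elapsed(irt), 2))
```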
On a VR schedule, differential reinforcement of short IRTs comes about because it is more likely that
one of a burst of several responses will be reinforced than that the first response in the burst will be
reinforced. Reinforcement on the VR schedule depends, of course, on the number of responses
emitted. Now, because bursts of responses involve many responses, the chances that the number of responses required for reinforcement
will be reached by one of the later responses in a burst are greater than the chances that the first
response in the burst will reach the required number. And the later responses in a burst, of course,
have shorter IRTs than the first response in the burst. Thus, the shorter IRTs are differentially reinforced. The result is the high rate of responding typically produced by VR schedules.
Figure 6.2 A sequence of interresponse times (time scale: 1 second). The brackets indicate bursts of responses.
The effects of differential reinforcement of IRTs can be demonstrated experimentally by making two simple modifications of the schedule of reinforcement. These modifications arrange for the explicit
differential reinforcement of IRTs. They can be added to any schedule; here, however, we will use the VI
schedule.
The first modification restricts the value of the reinforced IRTs: in order to be reinforced, a response
must have an IRT that falls within certain limits of duration. This modification is made by hooking a
schedule that limits the reinforced IRTs onto the VI schedule. When one schedule is hooked onto the end of another, the resulting compound schedule is called a tandem schedule. A tandem schedule built on a VI schedule is developed by first establishing performance on the VI schedule alone and then adding the additional requirement, here the requirement restricting the length of the IRTs. There is no change in stimulus when one schedule is added to the other. After the next interval of the VI ends, the first response is not reinforced as usual; instead, reinforcement is withheld until the organism emits a response having an IRT of the required length. When an IRT is long or short enough, its terminal response is reinforced, and the next interval of
the VI schedule begins.
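A minimal sketch (not from the text; the interval values and the IRT limit are invented) of such a tandem arrangement, here with a minimum-IRT requirement that differentially reinforces long IRTs (an upper limit on the IRT would favor short IRTs instead):

```python
# Illustrative only: a tandem schedule consisting of a VI component followed by
# an IRT requirement, with no change in stimulus between the two components.
import random

class TandemVIThenIRT:
    def __init__(self, intervals, min_irt):
        self.intervals = intervals                  # VI intervals, in seconds
        self.min_irt = min_irt                      # required minimum IRT
        self.deadline = random.choice(intervals)    # end of the current interval
        self.last_response = None

    def response(self, now):
        """Called at each response; returns True if that response is reinforced."""
        irt = None if self.last_response is None else now - self.last_response
        self.last_response = now
        if now >= self.deadline and irt is not None and irt >= self.min_irt:
            # The interval has elapsed and the IRT requirement is met:
            # reinforce, and begin the next interval of the VI schedule.
            self.deadline = now + random.choice(self.intervals)
            return True
        return False
```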
After the restricting requirement is added and the second part of the tandem schedule begins, there is usually an initial decrease in the frequency of reinforcement because a reinforceable IRT usually does not occur immediately. Then, because of the differential reinforcement of IRTs, the rate of responding increases if the required IRT is shorter than that usually emitted on the VI schedule, or the rate decreases if the required IRT is longer than usual.
A second modification, which affects the differential reinforcement of short IRTs, is the limited hold. With an ordinary VI schedule, once reinforcement becomes available with the termination of each required interval, it remains available until the next response occurs and is reinforced. With a limited hold, however, the reinforcement remains available only for a definite, usually short, period of time. This period of time is called the limited hold. If no response occurs during the limited hold, there is no reinforcement and the next interval of the VI schedule begins. Thus, if the organism does not respond quickly enough, reinforcement is no longer available. A short limited hold functions to produce an increased rate of responding.
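A minimal sketch (not from the text; the interval values and hold length are invented) of a VI schedule with a limited hold:

```python
# Illustrative only: a VI schedule with a limited hold. Once an interval ends,
# reinforcement stays available only for `hold` seconds; if no response occurs
# in that window, the next interval of the VI schedule begins.
import random

class VIWithLimitedHold:
    def __init__(self, intervals, hold):
        self.intervals = intervals
        self.hold = hold
        self.available_at = random.choice(intervals)   # current interval ends here

    def response(self, now):
        """Called at each response; returns True if that response is reinforced."""
        # If the hold expired before this response, the missed availability is
        # lost and the next interval begins (repeatedly, if several have lapsed).
        while now > self.available_at + self.hold:
            self.available_at += self.hold + random.choice(self.intervals)
        if now >= self.available_at:
            # Reinforcement is available and still within the hold: reinforce,
            # and start the next interval from the moment of reinforcement.
            self.available_at = now + random.choice(self.intervals)
            return True
        return False
```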
The explicit modifications that are arranged by tandem schedules and the limited hold offer proof that
differential reinforcement of IRTs does indeed influence the rate of responding. The effects of
differential reinforcement of IRTs account for much of the character of the performances produced by
various schedules of reinforcement. The performances are complicated, however, and a single factor
can offer only a partial explanation. All of the factors contributing to a performance or to part of a
performance on a schedule need to be investigated before the performance can be thoroughly
understood. Differential reinforcement of IRTs is a major factor in determining the rates of responding
on VR and VI schedules, but it is not the only one.
Rate of responding and rate of reinforcement. A second factor affecting characteristic performance is the relation between the rate of responding and the rate of reinforcement: when the rate of reinforcement is dependent on the rate of responding, the rate of responding tends to be higher. This is the case on the VR schedule, where the faster the organism emits its ratio, the faster reinforcement comes. The resulting higher rate of responding, in turn, makes shorter IRTs available for differential reinforcement. Thus, in the case of the VR, the dependence of rate of reinforcement on rate of responding and the differential reinforcement of short IRTs have an additive effect resulting in
extremely high rates of responding.
On a VI schedule, higher rates of responding do not result in more frequent reinforcement, but extremely low rates may result in a lower frequency of reinforcement by causing significantly long delays between the availability of reinforcement and the reinforced response. Only at very low rates of responding, therefore, does the correlation between the rates of responding and reinforcement tend to affect the rate of responding on a VI schedule. This fact, in combination with the differential reinforcement of relatively long IRTs, helps maintain moderate rates of responding on VI schedules.
There are several factors responsible for the stability of the rates of responding on VI and VR schedules.
One of these involves the
actual values of the intervals and ratios composing the schedules: if the range and distribution of the
ratios or intervals are within certain limits, stability is maintained. These limits are not well defined at
present, but it is known that the sequence of intervals or ratios should be carefully chosen so that
neither time nor number is consistently correlated with reinforcement or nonreinforcement. This means that an acceptable sequence must include a proper balance ranging from very short to long intervals or ratios, without any systematic patterning in the sequence. In short, the ability of VI and VR schedules to maintain stable rates of responding depends on their variable nature; a stable rate of responding will be maintained as long as the organism is not required to go too long without reinforcement and as long as no discriminable feature of the schedule reliably precedes
the occurrence or nonoccurrence of reinforcement.
Once a stable performance has been established, two factors make it resistant to change. First, the behavior involved in responding at a constant rate becomes a conditioned reinforcer because it is present at the time of reinforcement. As a conditioned reinforcer, responding at a constant rate reinforces the behavior which results in its occurrence, and that behavior is precisely the emission of responses at the constant rate. Thus, constancy itself becomes reinforcing. Second, responding at a constant rate is superstitiously maintained. Although it is certainly not required by the schedule for reinforcement, responding at a constant rate is nevertheless reinforced, because it is the only rate emitted by the organism and it thus prevails at the time of reinforcement.
At almost all ratio values, VR schedules will produce the characteristic high, stable rate. However, beyond certain values, the range and distribution of the ratios comprising the schedule become crucially important. Individual ratios of greater than a certain value result in abrupt pauses in responding.
Pauses also occur if not enough short and medium ratios are included in the schedule. Naturally,
pauses both lower the rate of responding and disturb the stable nature of the performance.
Abrupt pauses in the normally smooth and rapid rate of responding on a ratio schedule are referred to as strain. Strain usually occurs when the value of the ratio is increased too rapidly in an experiment. It is possible to maintain responding on ratio schedules of extremely high values provided that the value is approached slowly from lower values. If the value of the ratio is increased too