Learning and Memory - (Chapter 4 - Behavioral Learning)

CHAPTER 4

Behavioral Learning

Chapter Outline
•• Learning Objectives
•• Overview
•• Operant Conditioning Theory
   ◦ Basic Components
•• Strengthening Behavior
   ◦ Kinds of Reinforcers
   ◦ Brain Basis for Reinforcement
   ◦ Factors That Impact Reinforcement
•• Schedules of Reinforcement
   ◦ Ratio Schedules
   ◦ Interval Schedules
   ◦ Other Schedules
•• Shaping
•• Response Chains
•• Stimulus Control
•• Biological Constraints
   ◦ Summary
•• Avoidance Conditioning
   ◦ Learned Helplessness
•• Weakening Behavior
   ◦ Effective Punishment
   ◦ Indirect Issues With Punishment
   ◦ Decelerators
•• Overview of Operant Conditioning Theory
•• Chapter Summary
•• Review Questions
•• Key Terms
•• Further Resources
•• References

Copyright © 2017. SAGE Publications, Incorporated. All rights reserved.

Rudmann, Darrell S.. Learning and Memory, SAGE Publications, Incorporated, 2017. ProQuest Ebook Central, http://ebookcentral.proquest.com/lib/roehampton-ebooks/detail.action?docID=6403233.
Created from roehampton-ebooks on 2024-11-11 17:27:23.
76 PART I Learning

Learning Objectives
1. Explain the basic assumptions of operant conditioning theory.
2. Describe the role of reinforcement for changing the rate of behavior.
3. Identify the impact on behavior change that different schedules of reinforcement have.
4. Classify different forms of reinforcement as demonstrations of shaping, chaining, and
stimulus control.
5. Explain the principle of biological constraints on reinforcement.
6. Describe avoidance conditioning as an alternate form of reinforcement.
7. Explain how operant conditioning theory explains punishment and why the theory sees
limited value in punishment.

Overview
Jane Goodall observed what was a then-surprising behavior of the chimpanzees she was
studying in Tanzania’s Gombe Stream National Park in the 1960s. She saw a male chim-
panzee trying to dig termites out of a termite mound (to eat) with blades of grass, rather
unsuccessfully, and then grab a twig to dig into the mound, a more useful approach. It
was commonly assumed at the time that only humans used tools. Since then, researchers
like Dr. Elizabeth Lonsdorf at the Lincoln Park Zoo have been studying how chimpan-
zees teach their young how to use tools (“Elizabeth Lonsdorf,” n.d.). Remarkably, not all
groups of chimpanzees learn and pass on the same set of behaviors for getting food. They
develop cultural behaviors for what they eat and how they use tools for food.
All animals can learn to associate behaviors with some goal, whether it is raising a
hand to ask a question, using a pen as a tool to dig out a bank card from the side of a car
seat, or avoiding traffic by taking an alternate route. In the previous chapter, we read
about research on several kinds of associationistic learning and their biological basis.
Simple associations, such as learning to make a connection between a cue that signals an
event or learning the movements needed to execute a skill, do not necessarily require
full awareness, and sometimes require none at all. The study of behavioral learning attempts
to explain the process of learning from a strictly observational perspective of the actions
an organism takes, using behaviors as data (Skinner, 1938). This process is strictly defined
as movements that the organism or "learner" is making and that can be observed by others
(Skinner, 1938, p. 6).

Our focus in this chapter is behavioral aspects of learning exclusively (see Fig. 4.1).
The major theory presented here, operant conditioning, combined with Pavlov's classical
conditioning theory, comprised behaviorism, the most popular school of thought in
American psychology for most of the twentieth century.

FIGURE 4.1   Behavioral approach to learning (panels: Cognitive, Behavioral, Affective, Social, Biological)

Operant Conditioning Theory

Exclusively behavioral explanations for how people and animals learn can seem very
constrictive today. In modern psychology, the idea of an internal, mental world is often
assumed and is the focus of
much study in both humans and animals. This was not always the case. Behavioral expla-
nations for learning are intentionally limited to what can be objectively observed in the
performance of subjects. No claims can be made about motivation, goals, expectations,
consciousness, personality, or knowledge. Historically, this limitation was embraced by
many in the field as a way to encourage psychological research to be as scientific and
rigorous as possible. Behaviorism was the catalyst that forced psychological research
to get its act together and clearly define how the field would approach its topic scientifi-
cally. One of the contributions that this strict approach made was the focus on research
methods that allowed for reproducibility. This meant well-thought-out experiments and
clearly defined operational definitions for variables.
This approach, while narrower than the approach most psychological researchers
take today, is still useful and occasionally quite relevant. Often it is worthwhile to avoid
making generalizations about the conscious experience of a subject, such as animals or
infants. As such, the power of the behavioral approach to describing the process of learning
lies not just in reproducible research but in its equal applicability to all learning organisms,
human or nonhuman, in any domain or activity. The scope of behavioral learning theories is
remarkably broad, and as a result they remain an effective way to talk about how people and
animals change their behavior based on the outcomes of their actions. The behavioral approach
to learning has not become outdated so much as it is now one approach within a set of
approaches. The one steadfast rule, when we limit ourselves to examining learning from a
behavioral perspective, is that there can be no conceptualization of the "mind" as an
explanation for how an organism learns.
Operant conditioning theory, a learning theory that relies solely on observable
behaviors to explain learning, describes how we learn from the consequences of behav-
iors. The outcome of the behaviors someone makes can alter the likelihood that the per-
son will behave in that way again. Like the theory that was the inspiration for it (classical
conditioning), operant conditioning is a content-neutral theory of learning that applies to
any organism that can make an action on the environment. In operant conditioning, the
behavior is usually optional or voluntary, whereas in classical conditioning the learned
association can be involuntary, or reflexive. It is one of the most heavily researched psy-
chological theories of the twentieth century. The theory has been broadly applied to situ-
ations as diverse as classroom management, drug addiction and rehabilitation, animal
conservation, treatment of phobias and other anxiety disorders, and management tech-
niques in business. The basic paradigm of the theory is that the organism can operate on
the environment (whether an object, animal, or another person); and assuming there is a
resulting consequence from the environment, that consequence will alter the chance that
the organism will make that same behavior again.
In the late nineteenth century, Edward Thorndike carried out the first studies in
“comparative animal psychology.” Comparative psychology is the study of animals to
gain insights into human psychology; as the name describes, the purpose is for compari-
son between humans and nonhumans. To find out if animals showed signs of intelligence,
he built cages that included trapdoors that would open if the animal performed some
basic action, like pulling a string or stepping on a lever.
First using chickens and, later, cats, he would place a hungry animal in one of his
“puzzle boxes” and see how long it took the animal to get out for food (see Fig. 4.2).
After the animals figured it out, he would return them to the cage. Using this method,
Thorndike found that animals usually spent a lot of time trying various behaviors before
accidentally finding how a specific behavior would make the door pop open. This kind
of learning is known as trial-and-error learning. Once an animal had learned how to


FIGURE 4.2   Diagram of a sample puzzle box

Source: By Jacob Sussman [Public domain], via Wikimedia Commons.

FIGURE 4.3   Typical "learning curve" of the amount of time for a cat to escape
one of Thorndike's puzzle boxes over many trials (Thorndike, 1898).
In this graph, learning is indicated by a lower value over time.
(Axes: Time to Escape (sec) vs. Trial.)

Note: Hypothetical values; approximate.

escape, it escaped much more quickly on later trials. In other words, the time it took the
animal to escape or escape latency dropped rapidly with repeated trials (see Fig. 4.3).
From this, Thorndike proposed his law of effect, stating that behaviors that lead to a
“preferred” or satisfying situation are likely to be repeated. In contrast, behaviors that lead
to an “annoying” or aversive situation were less likely to be repeated (Thorndike, 1911).
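The shape of Thorndike's learning curve can be illustrated with a small simulation. This is a toy sketch with invented action names and weights, not a model of his actual data: on each trial the "cat" tries randomly chosen actions until it hits the one that opens the door, and each escape strengthens that action's selection weight, standing in for the law of effect.

```python
import random

random.seed(1)  # fixed seed so the run is repeatable

# Candidate actions; only one of them opens the puzzle box. All names invented.
ACTIONS = ["paw at bars", "meow", "pull string", "push panel"]
ESCAPE = "pull string"

def run_trials(n_trials=24, boost=2.0):
    """Trial-and-error learning: each attempt picks an action at random,
    weighted by past success. An escape multiplies the weight of the
    successful action (a stand-in for the law of effect), so the number
    of attempts needed (the "latency") falls across trials."""
    weights = {a: 1.0 for a in ACTIONS}
    latencies = []
    for _ in range(n_trials):
        attempts = 0
        while True:
            attempts += 1
            action = random.choices(ACTIONS, weights=[weights[a] for a in ACTIONS])[0]
            if action == ESCAPE:
                break
        weights[ESCAPE] *= boost  # satisfying outcome strengthens the response
        latencies.append(attempts)
    return latencies

latencies = run_trials()
print("first five trials:", latencies[:5])
print("last five trials: ", latencies[-5:])
```

Plotting the latencies would reproduce the declining curve of Figure 4.3: early trials take many attempts, and late trials take close to one.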


For Thorndike, this comparison was fairly straightforward: Animals would work to attain
and preserve some situations and would try to avoid or get away from others. Situations
that were satisfying tended to encourage the behavior that preceded it. By this, he did not
mean satisfaction in terms of what is good for one’s life or for society in general. People could
engage in actions that seemed pleasing but were essentially bad habits. Nor could what any
one individual finds satisfying be determined in advance; what some would find satisfying,
others would find annoying.
By today’s standards, such a claim may seem intuitively correct and simple. At the
time however, another prevailing view was that solving a problem or puzzle required
“insight,” a moment of sudden realization of the solution, an idea popular with Gestalt
psychologists. Thorndike had produced a principle of learning that did not appear to be
due to insight but rather to acquired behaviors, and he provided a method for documenting
the learning process too. An animal that arrived at an insight solution for escaping the
cage would show a very different frequency of responding than an animal learning by trial and error.
Thorndike’s approach required discrete, separate trials, which meant that the ani-
mal was put into the cage and the researcher had to be present to watch and wait
for the animal to discover the solution. It was fairly time-intensive. This changed with
B. F. Skinner’s approach to behavioral learning.
B. F. Skinner, the person most identified with operant conditioning theory, took
Thorndike’s approach and made several improvements. The cage would house the
animal more continuously instead of on a trial-to-trial basis. The cages provided a
lever or switch for pressing to receive a small amount of food. The number of times
the lever was pressed was recorded as the dependent variable. Skinner’s adapted
cage from Thorndike’s “puzzle box” became known informally as a “Skinner box.”
(See Fig. 4.4.)
As a result, the actions of the animal could be recorded cumulatively, like an odom-
eter, allowing for a much more nuanced look at the pattern of behavior. This technique
provided some benefits. First, the continuous data recording meant the animal didn’t
have to be monitored continuously. Second, the data pattern could be analyzed from a
more systematized perspective—that is, showing closer evidence of the relationship
between the behavior and its consequence (see Fig. 4.5). Usually this consequence is the
delivery of food.
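The logic of a cumulative record can be sketched in a few lines of code (a hypothetical illustration with invented press times, not data from any actual experiment): each response adds one to a running total, and the slope of the resulting curve is the response rate.

```python
from itertools import accumulate

# Hypothetical times (in minutes) at which a lever press was recorded.
press_times = [1.0, 2.5, 3.0, 3.2, 3.4, 3.5, 3.6, 3.7]

# A cumulative record pairs each press with the running total of presses so
# far -- the same curve Skinner's recorder drew, with time on the x-axis.
record = list(zip(press_times, accumulate(1 for _ in press_times)))

def response_rate(record, t_start, t_end):
    """Presses per minute inside a time window: the slope of the curve there."""
    count = sum(1 for t, _ in record if t_start <= t < t_end)
    return count / (t_end - t_start)

print(record)
print("rate in min 0-2:", response_rate(record, 0, 2))  # shallow slope: slow responding
print("rate in min 3-4:", response_rate(record, 3, 4))  # steep slope: fast responding
```

A flat stretch of the record means no responding at all, which is why the cumulative format makes changes in rate so easy to see at a glance.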

FIGURE 4.4   Diagram of a Skinner box



Source: Adapted from Rice University/Wikimedia Commons. CC BY-SA 3.0.


FIGURE 4.5   Example of a cumulative response graph. In this graph, more
responses mean more learning is occurring.
(Axes: Cumulative Button Presses vs. Time (min).)

Note: Hypothetical values; approximate.

Finally, as we will see later, it was possible to rig the cage so that the lever would not
always present food, but do so only on some particular timing or schedule. This flexibility
led to many conceptual developments for operant conditioning theory as well as popular-
izing the approach.
Skinner also encouraged psychologists to get beyond viewing all behaviors as
stimulus-response driven, as Pavlov and Thorndike had done. In his view, too many
actions did not seem reflexive but involved an organism taking actions to earn or avoid
some consequence. His primary focus was the study of how existing behaviors could
become more likely to occur or “strengthened” after the presentation of a reward, a
process he named positive reinforcement. He studied the training process involved
to encourage animals to make sophisticated behaviors, such as pigeons playing ping
pong. He studied what happened when rewards were provided randomly to see if
animals developed superstitious behavior, and examined what happened when the
rewards stopped.
From the perspective of Skinner and most behaviorists, all actions by people or other
animals are done because they have been rewarded or positively reinforced in the past,
whether the behavior is socially desirable or not. A screaming child in a quiet store has
learned that tantrums work to get what he or she wants. Students have been encouraged
to keep taking notes by the better focus and exam performance that result. Someone who robs
a bank teller successfully will tend to want to do it again.
According to Skinner, all of society is affected by these basic principles of behavioral
learning. Skinner described a utopian society based on the principles of rewards rather
than punishments in Walden Two (Skinner, 1948/2005). The focus on behavior and condi-
tioning for learning, in this approach, is total. Knowledge itself is a kind of behavior, and
the language we produce and share is behavior as well (Skinner, 1957/2015). From this
perspective, language itself is knowledge, which is behavior.
Behavioral accounts of learning have relied on a popular set of techniques and tools,
including operant chambers fitted with levers or pecking keys and continuous data
recorders. Sometimes mazes are used with
pathways to force the organism (which could be a large variety of species, not limited to


a mouse, rat, cat, bird, or cockroach) to make a choice between two options that present
rewards at different rates. These are called “T mazes” because they are usually in the
shape of the letter T. Behavioral learning studies have included infants as well as adults,
but adults are usually presented with more sophisticated tasks, such as computerized
financial games in which players have to choose between different incentives to earn
small amounts of money.
Let’s take a closer look at the major tenets of the theory next.

Basic Components
In operant conditioning, the individual organism reacts to cues in the environment,
makes a behavior in response, and that behavior may become more or less likely to be
repeated depending on how the environment responds. The individual “operates” on the
environment, hence the name. When I see a vending machine, if I decide to drop some
coins into it, I can be conditioned by the result to repeat this behavior at a later time if
the machine provides me with what I requested. Three components must be present for
operant conditioning to occur: a cue that makes clear what behaviors are accepted (e.g.,
a vending machine), a behavior in response (e.g., paying), and a consequence from that
behavior (e.g., a bag of chips).
A behavioral response. Before any conditioning takes place, the organism has to
execute a behavior. This action is in response to the antecedent cues in the environment
that signal what behaviors may be acceptable. The behavior is called a response, even
though it precedes any consequence, since it is presumed that the behavior follows cues
in the environment that trigger the action, such as the presence of the vending machine.
The specific action is not actually of interest, from a theoretical perspective. The
likelihood of making that behavior is of interest, however: that is, the probability of
making that same action when presented with the same antecedent cue later on (the
rate of response or “response tendency”). Will I use the vending machine the next
time I walk by it and I’m hungry? The chance of performing the action is what oper-
ant conditioning theory is attempting to explain. Additionally, the behavior itself can
be helpful to others and oneself, or it may not be. Successfully robbing a bank does
produce a reward, despite the negative consequences for society and personal risk
to one’s safety (as well as freedom). Eating salty potato chips may not be in the best
interest of my health. In other words, the desirability of the behavior is irrelevant to
the theory. Many “bad” or socially deviant behaviors can have a short-term reward,
such as getting high.
A consequence. For operant conditioning to occur, a consequence must arrive as
a result of the behavior. Several options are possible: there may be no consequence to
the behavior at all, something may be presented to the organism, or something may be
removed. If there is no consequence for the behavior, then the behavior and the likeli-
hood for it occurring again is said to have become extinct. No reward or punishment
occurs. The individual is now less likely to do that behavior again, since it doesn’t produce
any results. As we saw with classical conditioning theory, the process of extinction from
a lack of a consequence applies to operant conditioning as well. This might happen when
a timid student raises his hand in class and the teacher doesn’t notice the hand. The
student may decide it’s not worth trying later on. Or, the vending machine might lock up
and not deliver the chips.


Besides no consequence, two options remain. Something may be presented, or some-
thing may be removed. In operant conditioning theory, when the environment provides
something as a result of an action, it is termed positive, indicating that something was
added. (The vending machine provides the chips.) When the environment removes
something as a result of an action, it is termed negative, indicating that something was
removed. The terms positive and negative are purely mathematical, and do not indicate
an emotion or the desirability of what is added or removed. That is a separate issue.
If the machine doesn’t provide my snack, then isn’t my losing the coins into the vend-
ing machine the something that I lost, a subtraction? While this seems sensible, this is
not how operant conditioning theory views it. The payment was the necessary action in
order to receive a consequence, since the vending machine is not likely to spit out a snack
as I pass by. For a consequence to be negative, something must be removed as a result of
the action, such as dropping all my papers as I stretch to reach the buttons and watching
the students' work fall and slip under the machine where I can't reach it.
A response tendency. Finally, the consequence has to have an impact on the likeli-
hood of the behavior, or response tendency, for operant conditioning to be complete.
Learning occurs when the response rate for that behavior has been affected. Assuming
a behavior was made and a consequence occurred as a result of that behavior, one of
two options is possible. The response tendency can be increased or “strengthened,”
meaning the behavior is more likely to occur in the future to the same cue, or it can be
decreased or “weakened,” meaning that the behavior is less likely to occur in the future
to the same cue.
There are specific terms and research findings that are involved when the response
tendency of a behavior is strengthened or weakened, and the next sections delve more
deeply into these issues.

Strengthening Behavior
If a behavior produces a consequence, and the consequence increases the chance of the
behavior happening again (what’s called “strengthening the behavior”), the learning is
called reinforcing, and the consequence itself is called a reinforcer. Keep in mind that
while in casual conversation people may use the word “reinforcing” to mean something
that is pleasing, from the perspective of operant conditioning theory, whether a conse-
quence is pleasing or not is irrelevant. How desirable a consequence is does not matter;
what matters is whether the probability of the behavior rises or not. Does the door open
when pushed? If so, then the act of pushing the door may be reinforced. If complaining
to mom and dad helps us to get what we want as children, then complaining may become
reinforced, a learned strategy we can take with us into adulthood. Sometimes what is
reinforcing is a behavior that gets us away from a consequence, such as lying to get out
of trouble.
Reinforcement shares many of the same features as Pavlov’s classical condition-
ing. Acquisition usually starts slowly and grows. If a previously reinforced behavior is
no longer reinforced (the vending machine stops working, or the door now appears to
be locked), then the behavior can follow the process of extinction, and it can be relearned
more quickly if reinforcement starts again (spontaneous recovery). Likewise, the learner
can generalize from one antecedent cue to another (other vending machines and doors)
as well as discriminate between some cues and others.
So far, Thorndike’s Law and our definition of reinforcement have only described the
scenario under which operant conditioning tends to occur. What is a reinforcer, besides


the consequence that makes the operant behavior more likely? Several theories have
been proposed to define what is learned in reinforcement beyond Thorndike’s Law of
Effect. Initially, Hull (1943) and Miller (1948) proposed that the reinforcer reduced a
biological need or drive. This certainly seems true for primary reinforcers that help an
organism to survive. Behaviors that secure water or food, or that avoid pain, satisfy a
need the organism has. But what about reinforcers that do not meet primary needs?
The most popular theory today to explain what is meant by “reinforcement” is called
Premack’s principle, the idea that the relative probability of some behavior to another
is what makes it reinforcing. Premack suggested that “any response A will reinforce
any other response B, if and only if the independent rate of A is greater than that of
B” (1959, p. 220). That is, all behaviors are on a continuum of frequency rates, and any
behavior that is likely to be performed is reinforcing for a behavior that is less likely
to be performed. A less probable behavior is reinforced by a more probable behavior.
As such, the nature of the behaviors is not as important as their relative likelihood.
For example, it is not that pressing a lever has a particularly high likelihood of being
performed by a mouse; rather, pressing the lever affords eating, a far more likely
behavior. Whenever parents tell a child, “you can play after you have finished
your homework,” they are implicitly using Premack’s principle. All other things being
equal, doing homework has a relatively low likelihood for being chosen by a child with
free time. But, if it is buttressed with a more preferable activity, then doing homework
seems more worthwhile. The less preferred activity opens the door to doing the more
preferred activity. This is a common way to encourage or reward a great many activities
we otherwise might not do, including household chores, paying bills, or studying.
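Premack's rule can be written directly as a comparison of baseline rates. The activities and numbers below are invented purely for illustration, a sketch of the principle rather than real behavioral data:

```python
# Hypothetical baseline rates: minutes per hour of free time a child chooses
# to spend on each activity when nothing is restricted. Numbers are invented.
baseline_minutes = {
    "play video games": 30,
    "eat a snack": 15,
    "do homework": 4,
    "do chores": 2,
}

def can_reinforce(a, b, rates=baseline_minutes):
    """Premack (1959): response A will reinforce response B if and only if
    the independent (baseline) rate of A is greater than that of B."""
    return rates[a] > rates[b]

print(can_reinforce("play video games", "do homework"))  # True: play can reward homework
print(can_reinforce("do chores", "do homework"))         # False: a rarer behavior cannot
```

Note that the rule is purely relative: the same activity (eating a snack) can reinforce chores yet fail to reinforce video-game play, because only its rank on the frequency continuum matters.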
An extension of Premack’s principle is the response deprivation hypothesis, which
claims that the reinforcement is not strictly based on the relative rates as much as
whether the rate of one behavior would restrict or hold back the rate of the other below
what is normally done. That is, when a behavior is restricted below its normal rate (for example,
a child is not allowed to play any video games for a weekend), it usually becomes more
reinforcing when we have the opportunity to do it again (Timberlake & Allison, 1974).

Kinds of Reinforcers
Reinforcers are classified by how they affect the learner's situation: whether they are
added to it (presented) or removed from it.
A reinforcer that is presented following a behavior is called a positive reinforcer to
denote its additive nature, whereas a reinforcer that is removed following a behavior is
called negative reinforcer, to denote how reinforcement was caused by its removal from
the situation. In either case, the learner is pleased or satisfied with the result (hence,
“reinforcement”).
Positive reinforcers are fairly intuitive and easy to understand, since we all have
experience with receiving something we want or like. They function as rewards. Positive
reinforcers can be ones that support basic primary life functions, like food and water; or
they can be more secondary, such as material goods, attention from someone important
to us, or the opportunity to do an activity we have been wanting to do. Simple praise from
another person is a positive reinforcer.
Negative reinforcers are less intuitive, but they are omnipresent nonetheless. By
definition, something had to have been removed as a result of a behavior, and that
removal was pleasing. This might be the removal of pain after taking medication, or this


could be making a payment on a credit card to avoid the phone calls. Typically, nega-
tive reinforcers operate on escape from something unwanted or aversive, and this could
include the removal of guilt or anxiety. This kind of learning is called escape condition-
ing, and we use these to get out of and avoid situations that are threatening or difficult.
Lying, in essence, is negatively reinforced, assuming it is successful, and demonstrates
that a behavior can be reinforced, but that doesn’t make the behavior socially desirable
or acceptable.
In the long run, behaviors can be performed preemptively to avoid an aversive
stimulus, such as taking an antihistamine before mowing the lawn. People tend to drive
in their lane to avoid accidents. Escape conditioning is usually applied to situations of
learning in the moment; in a longer time frame, the term avoidance conditioning is
used to describe when an organism works to avoid an aversive stimulus. If successful,
the behavior has become negatively reinforced. We’ll take a closer look at avoidance con-
ditioning later in this chapter.
Reinforcement is classified as conditioned reinforcement (or secondary reinforce-
ment) when the consequence can be replaced by some other stimulus that also becomes
reinforcing. For example, after a few grades have been given out in a class, an instructor
can find that just praise for good work is enough on some activities. Likewise, a puppy
being potty trained can be weaned off treats and be reinforced with pats and praise later
on, and “good dog!” becomes reinforcing on its own. The reinforcing consequence becomes
linked to other consequences in what most theorists believe is an example of classical
conditioning within the operant conditioning framework. Simple examples of conditioned
reinforcers are family photos and trophies we keep around our homes and places of work.

Brain Basis for Reinforcement


The basal ganglia, located deep inside the brain, is a major structure responsible for
supporting learning by reinforcement (see Fig. 4.6). It is a collection of components that
collects information from other areas of the cortex, the outer shell of the brain, and con-
nects to the thalamus, which will then route information back to the cortex, including
the motor cortex that is responsible for voluntary behaviors. The putamen and caudate
nucleus make up the striatum, which is the channel for input into the basal ganglia.
This routing system, overall, is called the cortico-striatal system. It works as a
feedback loop within the brain. This system, particularly the striatum, underlies rein-
forcement for actions (Shiflett & Balleine, 2011; e.g., Yin, Ostlund, Knowlton, & Balleine,
Copyright © 2017. SAGE Publications, Incorporated. All rights reserved.

2005). Additionally, there is evidence that damage to the striatum makes one unable to
learn from positive reinforcement. Cook and Kesner (1988) taught a group of rats to
make a left–right direction judgment inside of a maze. Another set of rats learned a
spatial judgment decision that was based on the place in the maze, but not their own
left–right perspective. Lesions were made to the caudate nucleus of all of the rats. The
ability to make left–right judgments was impaired in the first set of rats, but no performance
problems were found with the second. They hypothesized that the caudate nucleus
is needed for an organism to process spatial cues that reference the external environ-
ment relative to itself.
Neuroscientists have noted that many of the molecules involved in this learning pro-
cess have ties to the process of drug addiction, implying that drug addiction is possibly
a kind of accelerated learning with chemical assistance (Shiflett & Balleine, 2011). But,
it’s too soon to draw firm conclusions, since the exact molecular process in learning is not
yet well established.

Rudmann, Darrell S.. Learning and Memory, SAGE Publications, Incorporated, 2017. ProQuest Ebook Central, http://ebookcentral.proquest.com/lib/roehampton-ebooks/detail.action?docID=6403233.
Created from roehampton-ebooks on 2024-11-11 17:27:23.

FIGURE 4.6   Diagram of the basal ganglia and related areas (from Chapter 3), showing the caudate nucleus, globus pallidus, putamen, thalamus, and amygdala.

Source: Garrett (2015, Figure 11.20, p. 361).

At a neural level, the effectiveness of positive reinforcement appears to be related to


the neurotransmitter dopamine (Berridge & Robinson, 1998). Neurons in the ventral
tegmental area (VTA), a small region in the midbrain of mammals at the top of the brainstem, create the dopamine that will enter the striatum. When activated,
the VTA provides a rush of arousal and excitement (e.g., Bandler, Chi, & Flynn, 1972).
When a behavior results in a successful outcome, these neurons may release dopamine into the striatum. It could be that this dopamine release strengthens the association between the neurons that produce the behavior and the neurons that registered the result, making the behavior more likely to occur in the future (the dopamine reinforcement hypothesis). Or, it could be that the release strengthens the association between the cue and the rewarding outcome if acted on (the incentive salience hypothesis; e.g., Berridge, 2006; Robinson & Berridge, 1993). On this second view, the cue itself becomes connected to the reward through the same dopamine system, and this is what provides the motivation or urge to take action.

Factors That Impact Reinforcement


A variety of factors impact a reinforcer's effectiveness. The timing and frequency of the reinforcer matter, and shorter delays are better. Typically, the reinforcement must be fast, for several observed reasons. First, a delay can cause the learner to fail to connect the reinforcing consequence to the behavior. Also, the consequence could become linked to the wrong behavior, reinforcing an action that was never intended. Or, the perceived value of the reinforcer simply drops the further out in time it might be obtained.
The timeliness of the reinforcer is particularly important with special populations,
such as drug users (Critchfield & Kollins, 2001) and young children in comparison to
older adults (Green, Fry, & Myerson, 1994). The need to delay reinforcement over a long
period of time is sometimes necessary, but tolerating the delay can be difficult. Cocaine
addicts, for example, have been found to be strongly affected by delays in reinforcement


compared to controls, which seems related to their being more impulsive overall (Coffey,
Gudleski, Saladin, & Brady, 2003). The delay in the reward causes a discounting of the
quality or value of the reward, an effect called delay-discounting. Another study found
heroin addicts discount monetary rewards at a rate twice as high as controls (Kirby,
Petry, & Bickel, 1999). This effect has also been found in current smokers, who discount
the value of delayed payment more than people who never smoked and ex-smokers
(Bickel, Odum, & Madden, 1999).
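Delay-discounting is often summarized with a hyperbolic equation in which a reward's subjective value falls as its delay grows. The sketch below is a hypothetical illustration of that idea; the equation form is standard in the discounting literature rather than given in this chapter, and the k values are purely illustrative (a higher k means steeper discounting, as reported for the addicted groups):

```python
def discounted_value(amount, delay, k):
    """Hyperbolic delay discounting: subjective value of a delayed reward.

    amount -- objective size of the reward
    delay  -- time until delivery (e.g., days)
    k      -- discount rate; a higher k means steeper discounting
    """
    return amount / (1 + k * delay)

# Illustrative k values only: a steep discounter devalues a delayed
# $100 far more than a shallow discounter does.
shallow = discounted_value(100, delay=30, k=0.01)  # about 76.9
steep = discounted_value(100, delay=30, k=0.05)    # 40.0
```

In this model, the finding that heroin addicts discount at about twice the rate of controls corresponds to a k roughly twice as large.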
The frequency, or how often, the learner is reinforced plays a role over the long term.
Some behaviors can be continuously reinforced, so the individual earns the reinforce-
ment with each act. Most of the time, outside of laboratory conditions, this isn’t possible;
parents can’t reinforce their children for every appropriate behavior. Often behaviors
are on a reinforcement schedule of some kind, a topic within operant conditioning theory
that has been heavily researched. We will examine example schedules of reinforcement
more closely in a later section of this chapter.
Other known factors that affect reinforcement include the learner's history with a reinforcer and the schedule on which it has usually been provided. First, any change to an existing schedule is akin to changing the rules on the learner, and that can be met with problems. Second, learners can show resistance to being disrupted when the rate of behavior required is fairly high, a phenomenon known as behavioral momentum. Like the momentum of a moving body in physics, the individual keeps up a behavior and resists change. This often occurs when we start out with easy tasks that are quick
to master and begin moving on to more challenging ones. The focused effort and resis-
tance to distraction is desirable for successfully obtaining personal goals, such as quit-
ting smoking, for example. This means that typically it is best for learning to engage
heavily in a particular activity in a particular setting (such as an office or the soccer field),
so that disruptions will be less effective (Nevin & Grace, 2000). Individual motivation for
the behavior matters as well and is a topic we will encounter in the next chapter.

Schedules of Reinforcement
Reinforcement works fastest when the reinforcing consequence is provided with each
and every appropriate behavior, a continuous reinforcement schedule. Of course, this
isn’t always feasible; it’s too demanding on the trainer. (Perhaps the only situation when
continuous reinforcement can be provided in most real-world situations is by computer,
such as video games.) Reinforcing a behavior on an intermittent or noncontinuous schedule can provide some benefits. Researchers have found this kind of partial reinforcement
is not only effective but can make what is learned last for a longer period of time. They
have devised schedules of reinforcement where the reinforcement is contingent on how
many times a behavior has been performed or on a time interval. The performances
of people or animals on these schedules are then plotted on cumulative recordings for
comparison.
Putting a behavior on a noncontinuous schedule has a benefit for learning besides less
work for the teacher. Perhaps counterintuitively, the frequency of the conditioned behav-
ior will continue to stay high even if not being reinforced each time. When the behavior
will not trigger reinforcement every time, the behavior becomes harder to extinguish and,
instead, lasts longer. When a behavior isn’t reinforced, the learner will continue trying.
Intermittent reinforcement schedules make the behavior resistant to extinction, an effect
known as the partial reinforcement effect. So not only does partial reinforcement allow
for looser monitoring by the teacher, but also what is reinforced tends to continue.


Why this is good for the process of learning is not hard to see. It helps athletes if they
continue playing the sport even after a loss. The gambling industry would have a prob-
lem if slot machines paid out with every bet. Students study a lot of information that may
not make an appearance on an exam or even come up for discussion in class.
One possible reason for the partial reinforcement effect is that the learner may have trouble figuring out when the reinforcement is no longer happening. If the reinforcement doesn't happen with every relevant action, then the learner has to decide when to give up. According to the generalization decrement hypothesis (Capaldi, 1966), when nonreinforced trials (extinction) have been part of the conditioning, learners cannot distinguish occasions when their behavior is being extinguished from occasions when they will eventually be reinforced. Some days I give my dogs a rawhide chew before leaving for
work in the morning. I usually mean to, but sometimes I forget, I am running late, or I have simply run out of chews and have none to give them. The dogs
will continue to whine and look expectantly when I leave, because for them, the days
when they will get a treat and the days when they do not, appear to be identical. Another
example is playing the slots at a casino. Long periods without a payout are expected as
part of the normal interaction with the machine, and those instances when a payout may
occur “look” identical to those when one will not.
The four basic, most commonly studied kinds of schedules vary on two dimensions:
whether reinforcement is provided at a fixed or a variable number of correct behav-
iors (ratio schedules) and whether the reinforcement is provided over a fixed or vari-
able period of time (interval schedules). Initially, these were studied individually for
their effects on the rates of subject response. Now, these schedules are often presented
concurrently, with one schedule operating through one lever in a cage while another
operates on a second. This way, researchers can compare the two and see which one
the subject prefers.

Ratio Schedules
One of the four basic schedules is the fixed ratio (FR) schedule. With this schedule, the
learner receives reinforcement after performing the desired action a specific or fixed
number of times. This schedule is not uncommon in sales promotions: the online store
Amazon.com currently offers free shipping if the customer racks up $35 in purchases
of stock from their warehouses. Papa John’s offers customers a free large pizza for every
$25 spent with their “rewards” program. Kroger, a Midwest grocery chain, offers a $0.10 discount on fuel at their attached gas stations for each $100 of groceries purchased
in their store. School children might be offered a ticket for a prize after reading a cer-
tain number of pages or books. A very common reinforcement in “platformer” video
games such as the Super Mario Bros. is to give the player a bonus life after the collec-
tion of 100 coins. Freelance work is often like this: Payment is made whenever the work
is completed, such as writing reviews of household items for a consumer reviews web-
site. Google’s advertising payment system for YouTube can make payments based on the number of people who view a video and do not skip through its ad.
In any case, this schedule is defined by the delivery of the reinforcement being tied
to the correct number of behaviors being performed. Once completed, individuals can
continue for more as they wish (see Fig. 4.7).
Researchers denote these fixed ratio schedules with the number of required behaviors by using “FR” followed by the number. So, an “FR20” schedule means the animal needs to make the action twenty times before the reinforcement will be given.
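The contingency behind an FR schedule can be sketched as a simple counter. This is a hypothetical illustration, not code from the text:

```python
class FixedRatio:
    """An FR-n schedule: deliver one reinforcer after every n responses."""

    def __init__(self, n):
        self.n = n
        self.count = 0

    def respond(self):
        """Record one response; return True when reinforcement is due."""
        self.count += 1
        if self.count == self.n:
            self.count = 0  # the counter restarts after each reinforcer
            return True
        return False

# On an FR20 schedule, only the 20th, 40th, ... responses pay off.
fr20 = FixedRatio(20)
reinforced = [i + 1 for i in range(40) if fr20.respond()]  # [20, 40]
```

Every response "counts," but the consequence arrives only when the required number has accumulated, which is exactly what produces the run-then-pause pattern described below.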


FIGURE 4.7   Sample schedules of reinforcement (cumulative responses over time, with separate curves for VR, FR, VI, and FI). The tick marks indicate moments of reinforcement. As with other cumulative response graphs, an increase indicates more of a behavior, and a flat line indicates a pause.

Source: Adapted from Gazzaniga, M., Heatherton, T., & Halpern, D. (2012).

Likewise, variable ratio schedules are denoted with a “VR,” and fixed interval and
variable interval are denoted with “FI” and “VI” in kind.
One of the notable aspects of this kind of schedule is that it tends to produce “pauses”
in performing. Immediately after the reinforcing consequence, the individual tends to
take a break, called a postreinforcement pause. Once I have earned a free pizza and redeemed it, I am less likely to order more pizza the following night. A video game player might be fairly happy to receive the bonus life, but earning another will not be a top priority immediately afterward, and the pursuit of coins may taper off. Once a child
receives the prize for reading, he or she is not likely to sit down and continue. Most often,
the individual takes a break.
These pauses might be due to fatigue on the part of the individual, or perhaps satia-
tion. Once I’ve ordered the pizza, I don’t immediately need another. A comparison of
schedules of reinforcement with rats indicated that the pauses are most likely due to
the more pressing nature of short-term consequences over long-term consequences
(Tanno & Sakagami, 2008).

With the variable ratio (VR) schedule, the learner is reinforced on some average
number of performances. For example, slot machines provide a payout on some average
number of pulls. Despite some advertising, the actual rate of payouts on slot machines
is generally kept an industry secret. A top jackpot for a machine may run from 1 in
50,000 to 1 in 2.5 million, depending on the payout and casino (“House Edge (Gambling
Lessons),” n.d.). As a result, there is some average amount of betting behavior that will
cause a payout, but the player does not know how much it will be. Since the process is
random as well, there is really no telling which pull will be the lucky one.
Purely commission-based sales are like this—it’s not clear how many leads the sales-
person must approach before gaining a sale. Several could come in a row, or there can be
a dry spell.
This schedule tends to produce the highest rates of performance of all of the simplest
four schedules, but it can result in burnout. Most animals and people simply cannot keep
up this pattern of responding forever.


Interval Schedules
Interval schedules are reinforcement schedules that are keyed on a time period, rather
than the number of behavioral responses. Fixed interval (FI) schedules of reinforce-
ment mean reinforcement will arrive after a specific amount of time has elapsed, as long
as the desired behavior has been done by then. It’s a fairly common method of being paid
for work: usually there is a payday, and the employee is paid for the hours worked on
that day. Similarly, the knowledge a student gains is reinforced (if you will) by having the
opportunity to use the knowledge on a scheduled exam.
It’s a simple, common schedule to keep, but it does not necessarily encourage the
best performance. As shown in Figure 4.7, the performance tends to show a scalloping
function. Whatever behaviors are necessary tend to happen just prior to the scheduled
reinforcement. So, for a weekly Friday quiz, this pattern would expect that students
would study heavily Thursday and early Friday, but would not study after the quiz at all.
This isn’t a terrible concern if the quizzes are weekly, but if the course uses a traditional
midterm and final exam format that are scheduled two months apart, students will often
wait until about a week prior to each to really engage with the material.
This “scalloping” pattern of inactivity followed by incrementally more focus and work
occurs in a number of settings. Waiting at an airport gate, we become more alert to the boarding process as the time for boarding comes nearer. If we are waiting for a friend to pick us up, we won’t bother
to look until it gets close to time. As the time approaches, we will check almost continu-
ously. This can happen waiting for a store to open, waiting for the doctor to call us at the
appointed time from the waiting room, or when waiting for a bus. Waiting for a deposited
check to clear the bank on a given day encourages this. With the advent of online bank-
ing, it’s become easier to check and re-check whether it has been deposited—in a pattern
that seems so similar to animals in a Skinner box excitedly making the behaviors neces-
sary for the timed release of food.
In a variable interval (VI) schedule, reinforcement comes at an average time inter-
val, but it is not scheduled as rigidly as with fixed interval. So, instead of Friday quizzes,
there may be a pop quiz once a week, on any day. My dogs know that dinner for them
will be in the evening but exactly when is not clear, so their attentiveness (and probably
hunger) is on alert anytime in the evening that I am in the kitchen before they have been
fed. Likewise, the mail is delivered once a day at a general but not specific time, so we
generally check after we are sure the mail has probably been delivered. This is a very
different behavior from the FI schedules, under which we might check increasingly often if the delivery time were fixed.


Variable interval schedules produce a more uniform performance, but they don’t produce performance levels as high as VR schedules do. In fact, often the behavior rate matches
the reinforcement schedule itself, a pattern called the matching law. Once people learn
when they will generally be reinforced, they create a behavior pattern that keeps them-
selves from working unnecessarily. If mail delivery is nearly always before 1:00 p.m.,
then it’s best to wait until after then to keep from having to check more than once in the
same day. One behavioral response for one reinforcement.
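The matching law (Herrnstein's classic formulation, stated here in a form not given in the text) says that the proportion of behavior allocated to one option equals that option's proportion of the obtained reinforcement. A minimal sketch with illustrative numbers:

```python
def predicted_response_share(r1, r2):
    """Matching law: the share of behavior allocated to option 1
    equals option 1's share of total reinforcement earned across
    two concurrently available schedules.

    r1, r2 -- reinforcement rates obtained on options 1 and 2
    """
    return r1 / (r1 + r2)

# If one lever earns 30 reinforcers per hour and a second earns 10,
# the law predicts three quarters of responses go to the richer lever.
share = predicted_response_share(30, 10)  # 0.75
```

The numbers here are hypothetical; the point is that behavior is apportioned to match payoff, not to exceed it.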

Other Schedules
Researchers have experimented with a variety of schedules and combinations of sched-
ules beyond the basic four. Reinforcing a response only if a certain amount of time has passed since the previous response is known as a differential reinforcement of low rates (DRL) schedule. This


schedule promotes a lower rate of responding, but with a lot of wasted efforts (Richards,
Sabol, & Seiden, 1993). If DRL reinforcement is set at every 15 seconds, then a response
at 14 seconds will be ignored and it will take another 15 seconds before a response is
reinforced. This means the learner has to pause for a period of time that he or she must
estimate. While it might make sense always to wait longer than necessary, such as 30 seconds, it's common to find study participants respond right around the threshold, so more than half of their responses go unreinforced. As a result, the organism has to create other behaviors
that take up time in between the intervals.
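The DRL contingency can be sketched as a clock that restarts with every response, so only sufficiently spaced responses pay off. This is hypothetical code using the 15-second example above:

```python
class DRL:
    """Differential reinforcement of low rates: a response is
    reinforced only if at least `interval` seconds have passed since
    the previous response; any response, early or not, restarts the
    clock."""

    def __init__(self, interval):
        self.interval = interval
        self.last = 0.0  # treat the session start as time zero

    def respond(self, t):
        """Register a response at time t (seconds); True if reinforced."""
        reinforced = (t - self.last) >= self.interval
        self.last = t
        return reinforced

drl15 = DRL(15)
early = drl15.respond(14)  # too soon: ignored, and the clock restarts
later = drl15.respond(29)  # 15 s after the early response: reinforced
```

The restart on every response is what makes premature responding so costly: an early press at 14 seconds throws away the time already waited.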
Alternatively, researchers can encourage bursts of high responding using a differential
reinforcement of high rates (DRH) schedule, in which a fixed number of responses have
to occur within a certain time frame. Because it reinforces high rates of responding, this schedule typically produces higher bursts of responses than any other.

Shaping
Having said all that, the concept of reinforcement depends on the behavior being pres-
ent to be reinforced. It doesn’t make assumptions about the learner’s ability to make
attempts at the desired behavior or to model the behavior of others. Thus, it’s theoreti-
cally necessary for the theory to explain how reinforcement works when the learner has
not yet made the behavior that would earn reinforcement. The basic idea is that a new
behavior can be produced if a series of approximate behaviors can be reinforced. This is
called shaping. Initially, any simple behavior that acts as a first step or attempt at the
behavior is reinforced. Then, gradually, the behavior that will be reinforced has to come
closer and closer to the ideal behavior. It’s similar to lowering the bar for what is accept-
able, and then carefully changing the rules to accept only a higher and higher standard
of what will qualify for the reinforcer.
For example, no one expects a child entering kindergarten to be able to write numbers
or letters perfectly on the first day. So, parents and teachers will encourage the attempt
and continuous improvement by reinforcing basic actions initially. Several months in, the
child will be expected to have made several months’ worth of progress, and an identi-
cal behavior then will not be acceptable. Similarly, a soccer coach will encourage basic
accomplishments on a “set play,” such as a corner kick, but will expect more later on in
the season. When training a dog to use a doggie door to eliminate outside, simply going
outside might be reinforced at first. Then, investigating the door and, later, using the
door as well as going outside would be reinforced.

In most formal education settings, students are not expected to have the material and
skills mastered on the first day of the term. Hence, acquiring the fundamental skills early on
is reinforced by grades or praise; but, by the end of the term, those fundamental skills
will simply be assumed. Typically, shaping involves using incremental steps for improve-
ment that are small enough that the learner has a reasonable chance of success (and
thus, reinforcement). This means not setting intermediate goals that are unrealistic. So,
expecting all As the semester after a rough term of Fs and Ds might not be wise; focus-
ing on Cs would be a smarter move for the next term.

Response Chains
Several behaviors can be taught in a sequence as well, which are called response chains
or, more simply, chaining. As the term sounds, the learner is reinforced for producing
not just one behavior but several in a row. These can be the same action repeatedly,


but often they are several different behaviors that must be performed in a row before
reinforcement. Animal shows at zoos and dog obedience competitions provide many
examples of trained chains of behavior. Each trained behavior has a history of being
reinforced, and each behavior is on a cue. Sometimes the cues are presented by the
trainer; other times the environment naturally provides them, such as the presentation
of a hoop to jump through. A prior behavior can be the cue for the next as well; often our
morning routines for getting ready for the day are chained behaviors.
The three most commonly researched kinds of response chains are forward chains,
backward chains, and total task chains. A total task chain practices the individual behaviors in random order, the “random block” of movements discussed in the skill-learning section of Chapter 2.
In a forward chain, the learner is taught the behaviors in the order they will later
be retrieved. The chain of behaviors is broken down into steps, and each step is worked
on one at a time separately and then recombined with the other steps learned so far.
For example, this would mean working on the walk up to the lane to bowl a ball first,
before focusing on the arm movement. Or, a child who has to memorize the preamble to
the Declaration of Independence would start at the beginning (“When in the course of
human events, . . .”) and memorize the first section before moving on. In a forward chain,
the learner gets very good at the beginning of the chain and usually ends up lost or tired,
or both. Starting at the beginning is probably by far the most common approach people take when trying to learn a response chain, but it may not be the best.
In a backward chain, the learner starts with the last, final behavior and works
backward from that. This approach has several advantages over the forward chain, even though it seems odd at first. So, the child memorizing the preamble would start at
the end first. Why might this be beneficial? Unlike the forward chain, when the learner
studies each step, the remaining steps are familiar and easy. Simultaneously, the easier
steps are at the end, when the learner is more likely to be fatigued by the performance.
Finally, during learning, each attempt is more likely to end on a positive note—familiar,
comfortable territory—than in forward chaining, where each new step ends someplace
awkward and unfamiliar. In forward chaining, the learner faces what he or she has not
learned yet with each practice. In backward chaining, each attempt should end on a suc-
cess. This is why animal trainers may prefer this approach—having teaching sessions
end on a high note, as the learner tires, is a big advantage.
Which of the three is best may depend on the needs of the situation. A child working
on a series of multiplication problems that she will have to solve in class under a time limit might find that the total task chain (or random block) is best, assuming the in-class test presents the math problems completely randomized. If the in-class timed test
looks exactly like the practice homework, then the forward or backward chains would be
more appropriate. Similarly, we usually don’t expect people to recall the preamble to the
Declaration of Independence in a randomized fashion, hence the forward or backward
chains are more appropriate.

Stimulus Control
Both of the topics of shaping and response chains involve how to encourage the learner
to create the optimal behavior(s) in a given situation. But underlying both is a need for the environment (for example, a coach) to cue which behavior is wanted, in a way the learner can accurately perceive. This relates to the issues of how the learner generalizes that cue
and discriminates it from others. Altogether this involves the extent to which the cue is


under stimulus control: how the behavior is influenced by different stimuli. In some
situations, such as a construction worker operating heavy machinery or an orchestra fol-
lowing a conductor, a very precise level of stimulus control is desired, because inaccurate
communication about what behaviors are necessary can be disastrous. In a classroom
environment, educators talk about “prompting”: setting the stage for the proper set of
responses. This might involve nonverbal cues to have the class settle down (e.g., flicking
the light switch), or verbal cues that acknowledge the reinforcement for doing as asked:
“If we clean up quickly, we may be able to have an extra few minutes of recess.”
As it was with classical conditioning (Chapter 2), the roles of generalization and dis-
crimination apply to the antecedent cues that the learner may respond to. In operant
conditioning, generalization defines the act of responding similarly to stimuli that are
similar in some way. Generally, this is desirable. We want young children to learn to heed
the command to “stop!” despite the exact pitch or tenor of the voice saying it. Imagine
professional soccer players being unsure how to behave during a game because the soccer
ball was a color they hadn’t seen before. At the same time, there have to be some limits.
A soccer player should be able to distinguish a youth ball from a professional one. In operant
conditioning, discrimination defines the act of responding differently to different stimuli.
The ability to generalize and discriminate contexts appears to involve the hippocam-
pus (Maren, 2001). Experiments using rats attempt to isolate the relevant parts of the
brain by providing two similar but not identical cages (of different shapes or sizes). The
different cages act as different contexts. By exposing the rats to an aversive stimulus in
one cage and not the other, researchers gain some idea of whether the rats can tell the
difference between the cages by how they show fear. Damage to the dorsal (upper back) portion of the hippocampus has been found to make the rats more likely to become fearful in
both contexts, rather than just one (Antoniadis & McDonald, 2000; Frankland, Cestari,
Filipkowski, McDonald, & Silva, 1998; Gilbert, Kesner, & Lee, 2001). The hippocampus
appears to be necessary for being able to tell different locations apart. Neuroscientists
believe the hippocampus plays a role in indexing our experiences into separate memo-
ries, so our experiences don’t simply run together.

Biological Constraints
From earlier examples, it might begin to sound like reinforcement, particularly positive
reinforcement, can be used to train animals, children, coworkers, or spouses to do almost
anything. Despite a lot of enthusiasm from early behaviorist psychologists, it became clear after a while that the trainable behaviors had to be within the species’ biological options.
Whatever abilities or endowments evolution had given the animal, those are the only
capabilities or talents that are available to train. In fact, those behaviors that are normal
actions (behaviors such as chewing, rooting, pecking) tend to be the most easily trained.
In some cases, these “default” behaviors can overcome reinforced training over time, despite the reinforcement. The return of instinctive behaviors notwithstanding training is called instinctive drift. This is a situation where innately based behaviors beat out conditioned behaviors.
In this way, operant conditioning can be framed as a process of communication. Instead
of replacing behavior or subverting it, the kind of learning operant conditioning is most
useful for is channeling existing behavior so that it occurs when it is most appropriate
and effective. In this sense, reinforcement is not a kind of manipulation as much as it is
instruction.


Summary
In sum, reinforcement is one of two theoretical outcomes when the environment provides
a consequence for a behavior, and it can be delivered on a continuous or an intermittent
schedule. If the goal behavior is not yet occurring, shaping can be used, reinforcing
approximate actions in order to arrive at the goal behavior. One behavior can be linked to
another to form a chain of behaviors, and such response chains can be trained
chronologically, in reverse, or as a whole block. When a behavior is properly conditioned
to a cue, it is said to be under stimulus control, generalizing to other appropriate,
similar cues while discriminating against inappropriate ones.

Avoidance Conditioning
Avoidance conditioning is defined as learning to make a behavior that prevents or delays
an aversive stimulus, and it is conceptually similar to negative reinforcement, or escape
conditioning. In either case, the learner has learned to make a behavior so as to avoid or
get away from some consequence: paying bills early to avoid a late fee, for example, or
putting on a sweater when cold. Escape conditioning refers to making a behavior to
remove an aversive stimulus that is already present, like twisting to get out of a tangle
of bedsheets in the middle of the night. Avoidance conditioning is a broader concept that
includes behaviors that prevent the aversive from presenting itself at all, such as going
to the gym earlier in the day to avoid a crowd.
For avoidance conditioning to happen, the learner must be able to detect a cue that
signals the approaching aversive stimulus. This preaversive stimulus alerts the indi-
vidual to the possibility of danger or pain and may have been classically conditioned
in the past. A car that is weaving while being driven is a clue to others that the driver
may not be fully alert. When I notice my teenage son’s bleary eyes on the way to drop-
ping him off at school in the morning, I suspect he stayed up too late and will be a real
grouch today.
Since the behavior becomes more likely as the result of avoiding something annoy-
ing, such as a crowded gym, the process is classified as reinforcement, even though an
aversive is involved. An aversive stimulus is typically involved in punishment as well; but,
as we will see in the next section, the aversive stimulus arrives as a result of the behavior
and is usually inescapable. If I’m not paying attention and I slam my hand in the car door
as I close it, there isn’t a point during the consequence that I can “undo” the aversive to
escape it, making the pain a punisher. However, I can be more careful next time, which
is avoidance conditioning.
In some situations, the aversive stimulus has to be actively avoided, such as making
sure I move my hand before closing the door. In other situations, what has been learned
is passive avoidance. For example, I might not make eye contact with my boss and might
remain silent when she asks for volunteers to serve on a lengthy committee.
Avoidance learning of either kind (active or passive) tends to be difficult to unlearn.
A successful strategy for avoiding something unpleasant provides a sense of control
and comfort; yet unlearning that avoidance (e.g., of confrontation, of getting blood work
done, of paying bills on time) is often better for us in the long run. In some cases,
successfully avoiding something unpleasant means never developing the skills and
experience we need to handle similar situations in the future (such as turning down
a request from a friend or coworker).


Learned Helplessness
We make behaviors to avoid situations we dislike fairly routinely, whether trying not to
stand too close to others in elevators, or choosing to sit in a quiet area of the library to
study. These avoidant behaviors are an attempt by the organism to exert control over his
or her environment. Positive reinforcement is similar in this sense; both involve attempt-
ing to improve a situation by gaining something pleasing or avoiding a problem.
In some cases, an organism can exert control over a situation but has learned not
to try. Seligman and colleagues demonstrated that it is possible to be conditioned into
helplessness (Overmier & Seligman, 1967; Seligman & Maier, 1967), a phenomenon
called learned helplessness. As a result of prior experience with a situation, someone
will not try to exert control even when it is possible. In this study (as well as others
in this line of research), two groups of eight dogs were restrained in a manner similar
to the way Pavlov had restrained his dogs. These dogs received electrical shocks that were
not physically harmful but were painful. The independent variable was whether a dog
could end the shocks by pressing a panel with its head (the "escape group") or could
not (the "no-escape group"); a third control group of eight dogs did not participate at
this point (the "no-harness group"). In the learning phase, the dogs in the escape and no-escape
conditions received sixty-four shocks at ninety-second intervals. Normally, dogs are very
quick to learn that they can escape a punishing situation through some action. They are
conditioned to recognize any signals that predict the shock, such as a flashing light or a
buzzer. But in this study, one group was not allowed to try. The dogs in the escape condi-
tion learned that pressing the panel ended the shocks quickly; the dogs in the no-escape
condition stopped trying by the thirtieth shock.
The following day, dogs from all three conditions were placed into a cage that had a
partition separating one side from the other. An electrical current could be presented
through the floor on either side of the box. The box was designed so that the shock was
easy to escape by jumping over the partition. A light would signal which side of the box
was about to deliver a shock ten seconds later. A dog that jumped the barrier would
escape the shock completely. The dependent variables were the percentage of each group
that failed to escape the shocks and the mean time in seconds it took them to escape
over ten trials.
As shown in Figure 4.8, the dogs in the escape and no-harness conditions avoided the
punishment quickly; the no-harness group took longer on average, most likely because
they were learning about the relationship between the light and the shock during testing.
Six of the dogs in the no-escape group didn't try to escape the shock at all, or did so only
once. These six were retested a week later, and five of the six failed to escape on every
trial, indicating that they had learned not to try. Even when one of the no-escape dogs
escaped the shock on one trial, it would revert to making no avoidance response on the
next (see Fig. 4.8).
Our history with a situation informs us about the likelihood that any operant behavior
will be successful; behaviors that seem to have little effect on the environment are
extinguished (Maier & Seligman, 1976). When actions and outcomes are completely
independent, the organism is conditioned not to attempt any avoidance behaviors, given
its history in that situation.
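This contingency idea can be sketched as a toy calculation. The function below is an illustration of the response-outcome independence account, not part of the original studies; the probability values are assumptions chosen to mirror the two conditions.

```python
def perceived_control(p_relief_with_response, p_relief_without_response):
    """Toy contingency measure in the spirit of Maier & Seligman (1976):
    control is the gap between the probability of relief when the animal
    responds and when it does nothing. A gap near zero means actions and
    outcomes are independent, the condition for learned helplessness."""
    return p_relief_with_response - p_relief_without_response

# Pressing the panel always ended the shocks for the escape group...
escape_group = perceived_control(1.0, 0.0)      # full control
# ...while nothing the no-escape dogs did mattered.
no_escape_group = perceived_control(0.0, 0.0)   # zero control

print(escape_group, no_escape_group)  # 1.0 0.0
```

On this reading, what the no-escape dogs learned in the harness was precisely this zero contingency, which then transferred to the shuttle box.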
Studies like this are unpleasant due to the use of punishment with animals, but they
provide a striking explanation of how people learn to exert less control over their envi-
ronments in a variety of situations. Learned helplessness has also been found in humans
using unavoidable loud noises. (See Abramson, Seligman, & Teasdale, 1978, for a discus-
sion about how learned helplessness is different in humans and animals.) Seligman and col-
leagues’ work on learned helplessness ties to a wide range of real-life experiences. People
often find they have no control over some events in some part of their life, such as the death


FIGURE 4.8   When given the opportunity to escape, dogs that could not previously avoid the shock would often not try to do so. [Bar graph of the mean number of failures to escape, comparing dogs whose shocks were previously inescapable with those whose shocks were previously escapable.]

Source: Adapted from Fancher, R. E., & Rutherford, A. (2011).
of a loved one, or a chronic, serious illness. Whatever control we can otherwise exert over
our lives, such experiences of lost control can explain the helplessness people show later.
This helplessness may manifest itself as depression, or as simply "giving in" to the situation.
Learned helplessness can be unlearned, and possibly prevented. Simply walking the
no-escape dogs over the barrier led them to begin crossing it on their own. Even better,
animals can be inoculated against learned helplessness: if they are first exposed to shock
in a situation where they can escape, then later exposure to an inescapable situation does
not seem to teach learned helplessness (Williams & Lierle, 1986). Early experiences
matter for learning that control of a situation is possible. Initial experiences in the
classroom, for example, should probably provide children with success (Seligman, 1975).
Skinner and Thorndike were themselves more impressed with what reinforcement
could do to shape learning than with what punishment could; but punishment can be
instructive. Unfortunately, a reliance on aversives in learning can also lead to serious
secondary problems besides learned helplessness, as we will see in the next section.
Weakening Behavior
As opposed to a consequence that increases the likelihood of a behavior, a behavior can
also produce a consequence that lowers the likelihood of that behavior's happening again,
weakening it. This is the formal definition of punishment. Like reinforcement, punishment
may be experienced when some aversive stimulus or activity is added after the behavior
(positive punishment) or when some valued object or activity is removed (negative
punishment). Recall that the terms "positive" and "negative" refer not to the emotional
result of punishment but to the addition or subtraction of a stimulus. Punishment, by
definition, decreases the behavior, so it is always experienced as something emotionally
negative.
Premack’s principle can be extended to punishment. When a probable behavior is
followed by a less probable behavior, the less probable behavior is punishing. A child
may want to talk out of turn and disrupt class normally; but if it results in consequences
he or she doesn’t usually engage in, those behaviors are punishing (e.g., loss of a “good
behavior” ticket, writing his or her name on the board, or a visit to the principal’s office).


Positive punishment means the punishment comes in the form of the consequence
presenting a situation or required activity that is irritating, harmful, or even painful.
Examples of positive punishment are relatively easy to come by, simply because it is so
frequently experienced. A speeding driver may be pulled over and given a ticket. Talking
out of turn in class may provoke a verbal reprimand from the teacher. Using a hot oven
without care may result in getting burned. A parent might give a child extra chores due
to back talk.
Negative punishment, feeling punished by the removal or omission of something valued
or desired, is akin to a loss of privileges (Lerman & Vorndran, 2002). An aversive is not
delivered as part of the consequence; rather, the consequence is that a valued activity is
taken away (in terms of Premack's principle, a behavior with a higher likelihood of
occurring is no longer permitted). This can take a range of forms, but common examples
include a cut in hours on the job or a demotion, a loss of driving privileges, losing a
phone due to carelessness, or being grounded. Arguably, running out of gas embodies
both forms of punishment: dealing with the difficulty of being stranded, having to call
for help and rearrange plans, as well as losing the freedom that comes with driving. We
will see more on the topic of punishment after reviewing the extensive research on
positive reinforcement.
As we saw in the prior section on learned helplessness, a learner can learn from
punishment. For a variety of reasons that we will encounter below, B. F. Skinner saw
society as overrelying on punishment as a tool for teaching, and he considered
reinforcement the better method for encouraging behavior. Note that punishment, by
itself, does not indicate a more correct or optimal action. This is part of the problem
Skinner and others have cited: the sole use of punishment to change and correct behavior
doesn't communicate what should occur, just what should not. Admonishing a child to
stop being so messy, for example, doesn't focus on how or why he should keep his room
clean. Scolding doesn't encourage cleaning so much as punish messiness. It can be
subtle, but the direct message of the punishment is aimed at ending a behavior, even if
the punisher's goal is to encourage some other behavior.
In contrast to reinforcement, there has been a long history of debate about both the
effectiveness and the disadvantages of punishment, as we will see next. The use of
punishment among people is so common that it's possible to intuit a number of its
disadvantages. One disadvantage, discussed above, is that punishment tends to stop
a specific behavior rather than discourage a range of related behaviors, and it may be
specific to the context. When people learn where a police officer is likely to be parked
along a stretch of road looking for speeders, they will slow down, but often only in that
area or only when they have spotted the officer.

Effective Punishment
A recent literature review of research on punishment and its effectiveness identified
several problems (Lerman & Vorndran, 2002). For instance, the use of punishment in
real-world situations, such as classrooms, the workplace, clinics, or the home, can be
complicated, and punishers vary tremendously. Also, an undesirable behavior that needs
to be suppressed might be reinforced on one schedule while punishment is administered
alongside it. Yet most research studies on punishment rely solely on electric shock with
animals, and usually without concurrent schedules; the reinforcement schedule is
undergoing extinction just as the punishment schedule begins, a kind of research
confound. There has not been extensive, high-quality research on the use of
punishment in applied settings, despite the clear need. Also, it's not certain how many
studies have found punishment to be ineffective, because such studies often are not
submitted for publication or are rejected for their lack of results. This makes it hard to
gauge the overall effectiveness of punishment.
Advocating the use of punishment is a problem, as Vollmer (2002) notes. While
trained behaviorists and clinicians are aware of the nuances and issues surrounding
the use of punishment, many of the people who might decide to carry it out will not be
highly trained. Without clear guidelines that would stem from having extensive, applied
research findings available, it is difficult to offer solid advice to those people (foster par-
ents, teachers, nurses) who would administer the punishment.
But, it’s clear humans and animals can learn from punishment, even when a behav-
ior is also being reinforced. In some cases, a behavior must be stopped (self-injury, for
example), and some form of punishment may have to be used. Under what arrangements
has punishment been found to be effective?
First, research has repeatedly found that punishment has to be introduced at
full strength for the response to be suppressed (Lerman & Vorndran, 2002). It
cannot be increased slowly, step-by-step, or else the learner can habituate to the
punishment. This is a mistake many parents make by first cajoling, then scolding, and
then yelling at a misbehaving child; by then, the child has begun ignoring the verbal
reprimands. Escalation of punishment is a problem: the punisher doesn't get the desired
result at a lower level, becomes angry, and escalates until the punishment has gone
overboard.
Habituation to a punisher tends to hold true only for mild punishers; stronger
punishers do not usually have this problem. This presents an ethical quandary in real-
world situations. Generally, caregivers, parents, and teachers will want to use the least
amount of punishment necessary to suppress the behavior; yet research finds escalation
to be a problem, so it's unclear where the optimal amount lies. Often, practitioners have
to rely on common practices or guidelines from others, rather than on research findings,
to determine what is currently considered appropriate and safe.
Since nearly all research on punishment has relied on a single form of punisher
(electric shock), it's not clear whether other kinds of punishment follow this same
pattern. There is some evidence that smaller amounts of punishment may be effective
after larger punishments have been used, or when an alternative behavior is being
reinforced at the same time.
Second, the punishment has to be delivered with each misbehavior, every time, on
what's called a "continuous schedule." The learner has to find that the behavior will,
without a doubt, produce the punishing consequence. This is, of course, an issue in
real-world settings, where most punishments occur on an intermittent, occasional
schedule. Often, when a behavior that is punished only sometimes continues to occur,
the caregiver may move to a continuous schedule of punishment. Unfortunately,
research has found that a behavior that has been punished intermittently will tend to
resist a continuous schedule (Lerman & Vorndran, 2002). As a result, the intensity
of punishment may have to be increased, which can run into the escalation problem
described earlier.
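The difference between the two schedules can be pictured with a deliberately simple model. The multiplicative update and the decrement value below are assumptions made for the sketch, not parameters from the punishment literature.

```python
def likelihood_after_punishment(share_punished, trials=50,
                                start=1.0, decrement=0.15):
    """Toy model: the behavior's future likelihood shrinks a little on
    each trial, in proportion to how often occurrences actually draw a
    punisher (1.0 = continuous schedule, 0.25 = one time in four)."""
    likelihood = start
    for _ in range(trials):
        likelihood *= 1 - decrement * share_punished
    return likelihood

continuous = likelihood_after_punishment(1.0)
intermittent = likelihood_after_punishment(0.25)
# Continuous punishment suppresses the behavior far more thoroughly.
assert continuous < intermittent
```

Real behavior is nothing like this tidy, but the qualitative point survives: the less reliably the consequence follows the behavior, the more of the behavior remains.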
Third, the punishment has to be delivered immediately after the undesired behavior,
without delay. This makes clear which behavior specifically produced the punishing con-
sequence. Of course, the person being punished may avoid the punishment, sometimes by
simply running away. Even delays of 30 seconds have been found to reduce the effective-
ness of punishment when used with adults (Banks & Vogel-Sprott, 1965). Young children
may forget the behavior even took place a few minutes ago, and it's not entirely clear that
animals can recall the past at all (Roberts, 2002). This presents a problem, since often the
individual responsible for punishment may not be around when the behavior occurs, or a
complete lack of delay may not be possible. What may be needed is a process to “bridge”
when a behavior occurred with the use of punishment; but here, again, applied research
on these techniques is sparse (Lerman & Vorndran, 2002). Activities such as recording
the behavior for discussion later might be useful in some situations.
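One way to picture the delay problem is as a decaying effectiveness curve. The exponential form and the 15-second half-life below are illustrative assumptions, not figures from Banks and Vogel-Sprott (1965).

```python
import math

def punisher_effectiveness(delay_seconds, half_life_seconds=15.0):
    """Toy model: a punisher's effectiveness halves for every
    half-life of delay between the behavior and the consequence."""
    return math.exp(-math.log(2) * delay_seconds / half_life_seconds)

print(round(punisher_effectiveness(0), 2))    # immediate delivery
print(round(punisher_effectiveness(30), 2))   # a 30-second delay
```

Bridging techniques, such as recording the behavior for later discussion, can be read as attempts to pull this effective delay back toward zero.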
The immediate timing of a punisher may not always be enough, either. Most behaviors
are already being reinforced, and if the punishment arrives before the reinforcement,
then the effectiveness of the punishment is reduced (or the punishment is simply
ignored). Consider a teacher who, knowing Timothy is about to talk out of turn and say
something inappropriate because he usually does, frowns at him and says his name firmly
just as Timothy blurts out some foul language in front of the class. The class reacts in
shock and laughter a second later, which Timothy enjoys. The reinforcing attention may
completely negate the on-time delivery of the teacher's punishment. To be effective, the
punishment has to follow the reinforcement for the behavior.
It can be helpful to create a conditioned punisher, a cue that signals that punishment
is imminent if the behavior continues. This can be something as simple as the word "no."
Once a conditioned punisher has been established, the cue can be used in other
environments as well, generalizing the punishment. However, research on conditioned
punishers is still considered fairly preliminary (Lerman & Vorndran, 2002).
Finally, some other alternative behavior must be reinforced. Ideally it should be
some behavior that is logically or physically incompatible with the punished behavior.
For example, if a child is being punished for wandering around the classroom, then the
child should also be reinforced for sitting properly. Performing the reinforced behavior
makes the punished behavior impossible, and the combination of the two consequences
clearly conditions the child toward the wanted behavior.

Indirect Issues With Punishment


Some of the concerns regarding the use of punishment are not about the effectiveness
of punishment itself, but about the by-products or secondary effects that stem from it.
Generally, these have to do with the negative emotions associated with punishment and
the actions the punished individual may take in response to being punished.
Unlike reinforcement, the emotional nature of being punished can become a major
issue for this approach to learning, particularly with positive punishment. Whether
something we like is removed or something aversive is added, people and animals can
become upset. This may seem obvious, but this effect is noticeably absent from
reinforcement. When told to write their names on the board, students may react with
fear, shame, or anger. Few people enjoy being pulled over by the police and waiting as
other people drive by, watching. Regardless of its manner or form, punishment involves
creating a situation that is unwanted or disliked, and a negative emotional reaction is
not surprising.
In what may stem from the emotional reaction, retaliation against the punisher or
anyone else nearby is possible with punishment and virtually never exists with rein-
forcement; we are rarely angry when our actions produce an outcome that is satisfying.
The punished individual can seek to get back at the situation or person that delivered
the punishment. This can be trivial, such as punching a countertop after standing up
and feeling the sharp, unexpected pain from ramming the top of your head into an
open cabinet door in your kitchen. Or it can be much more severe, such as carrying out
acts of violence against people in some authoritative position (parents, teachers, work
supervisors, politicians, police officers).
Similarly, punished individuals may seek to simply continue the behavior, but in other
contexts—a kind of escape. Instead of learning not to do the behavior at all, the punished
animal or person will just do it somewhere else where the punisher is not around. This
could include learning not to cuss around parents, for example, instead of not cussing at
all. Essentially, punishment can become a situation that encourages avoidance learning
of the punisher, rather than suppressing the action entirely. Ironically, avoidance condi-
tioning from escaping punishment is quick and easy to learn, and it can be more effective
than the punishment itself.
Often, the issue underlying retaliation and avoidance is that the punished individual
does not necessarily learn that the behavior provoked the punishment. More often, the
individual learns that the behavior plus the presence of the punisher provoked the
punishment; what the learner may have acquired is knowledge of the nature of the
punisher. For a parent or a teacher, this is not an ideal framework for communication.
It removes the onus of the behavior from the individual, places it on the situation, and
what is learned is essentially misdirected.
Another potential downside to punishment is total withdrawal of the individual from
the situation. As we saw with learned helplessness, a punished individual may essentially
give up on the situation and completely stop interacting with the environment. Repeated
poor scores in one class could lead a student to simply give up across all classes, like a
generalized form of learned helplessness. While this does not always occur with the use
of punishment, the risk of total withdrawal increases with the frequency of its use.
Punishment can present problems for the person administering the punishment as
well. Since punishment works best when the punishment is delivered with every single
instance of the behavior, the person relying on punishment to teach proper behavior
constantly has to monitor the situation, which can be fatiguing or simply not possible.
Additionally, compliance on the part of the punisher can be an issue. The individual in
the punishing role is often aware of at least some of the problems with punishment. As
a result, often the individuals who are supposed to mete out punishment according to
school policy (or police department policy, or what-have-you) will not apply the con-
sequences evenly. Since not everyone is equally engaged or committed to seeing this
approach through, the end result is not uniform. For example, a school district may
rely on a student color-coding system that indicates how well each child is performing.
Each child starts the school day on “green” and then moves to different colors as they
misbehave, down to “black.” Enough of the “black” marks may mean a loss of privi-
leges, parent conferences, or suspension. There is no formal way to recover or move
back up the color chain, so it’s purely a system of punishment. As a result, different
teachers are likely to carry this system out unevenly. Some are sure to use it con-
stantly, but others will apply it only in truly bad, egregious situations; or allow kids to
move back up; or use the system only during formal instruction time. (People can be
pretty creative or intentionally lazy with implementing systems in which they do not
see the merit.)

Decelerators
What can be missed in debates about whether punishment is appropriate or effective,
whether in child rearing, in education, or in public policy on criminal behavior, is that
punishment can take many forms. Some of those forms may be more suitable or
necessary depending on the situation. Below are kinds of decelerators: forms of behavior
control used to slow down or stop some behavior. Some, but not all, of them involve
punishment.
One way to end a behavior is to no longer allow it to be reinforced (extinction).
Technically, this approach is not punishment, since a stimulus is not being added or
removed as a result of the behavior. An assumption of operant conditioning theory
is that all behaviors are performed in order to earn or avoid a consequence. If a
child has discovered that she likes the attention from the teacher that comes from
disrupting the class, then she will do it when she wants that attention. Her teacher
could start by withholding the attention, that is, ignoring the behavior.
This approach may or may not be possible. First, if the actions are dangerous (e.g.,
standing on a desk), highly disruptive (e.g., screaming), or considered immoral (e.g.,
yelling inflammatory curse words at others), then immediate action has to be taken.
Second, the strength of one reinforcement tends to be greater than the strength of one
extinction trial, especially if the reinforcement was not continuous in the past; so
extinguishing a behavior can take a while. Finally, because the reinforcement is no
longer being provided, emotional outbursts called extinction bursts can occur. Behavior
can also regress to earlier actions that seem immature. Essentially, when extinction is
occurring, the rules of reinforcement are being changed, and the individual is likely to
be upset for a period of time. Anyone who tries the extinction approach has to be willing
to stick with it, because giving in at some point means reinforcing on a new, intermittent
schedule, making the behavior even more resistant to extinction.
Another decelerator is overcorrection: having the individual make restitution by fixing
the problem and making it better than before, or having the individual demonstrate the
correct steps (known as "positive-practice overcorrection"). Someone who is arrested
for painting on the school walls, for example, could be brought in on a Saturday to
repaint them.
A decelerator that similarly promotes the proper behavior is escape extinction. The
individual cannot dodge the work or activity but must do it; no escape is allowed. A
child who refuses to eat a healthy dinner, preferring to skip the meal and then snack
late into the night, might be told he or she must now eat dinner before leaving the
table, with no exceptions until enough has been eaten. While not enjoyable to
administer, this makes the point and forces the individual to give up learned avoidance
responses.
Response blocking is the term for not allowing a behavior to occur by restraining
the individual. When is this necessary? When the individual is about to do something that
will injure himself or others. Like escape extinction, the goal is not to allow the undesired
action to occur.
One unusual but potentially effective method of curtailing interest in a reinforcer
is stimulus saturation. (Sometimes this technique is referred to informally as a kind
of “reverse psychology.”) If an individual is overly focused on a reinforcer, and it’s dis-
rupting other activities, the stimulus saturation approach calls for providing as much of
the reinforcer as possible—overloading the individual with it—until he or she has had
enough and interest in it wanes. A kid who has a strong need for attention from peers and
spends class time using jokes for the attention could be asked to bring a joke to class each
day to write on the board. (There are a lot of days in the school year.)
Some decelerators are forms of negative punishment. A response cost is a cost imposed
each and every time a behavior is done, such as a late fee for not paying a bill on time.
Some classroom teachers use tokens such as tickets as rewards for good
behavior, and they can take them away as part of a planned response cost for undesir-
able behavior. Time-outs are a form of negative punishment, in that the child is removed
from participation with the group for a short period of time. Of course, this approach
assumes that the child doesn’t like the isolation; however, some children find the isolation
reinforcing.
Decelerators that involve positive punishment include verbal reprimands (scolding)
as well as spanking and other forms of corporal punishment. Rare, sharp verbal
reprimands are known to help curb mild misbehavior in a classroom environment,
such as calling someone's name sharply and telling them to focus. Corporal or physical
punishment in school settings is against the law in most U.S. states, but it is legal for
parents to use in all fifty states, and there is controversy surrounding it in the media.
The majority of psychological research, however, argues against the practice, certainly as a
primary parenting tool, for a number of reasons. First, there is a high risk of all of the
disadvantages of punishment occurring when a physical aversive is presented by
a parent. Second, physical punishment is readily mimicked by the child with his or her
peers and siblings, which means the use of physical force is taught as a method
for handling relationship problems with others. It's also not uncommon for physical
punishment to be applied during a period of high emotional strain for the parent,
when smarter, saner thinking does not prevail; avoiding its use entirely helps ensure
it is never applied under duress. Simply put, there are many disciplinary options
other than hitting a child, and these usually provide similar results
without resorting to violence.
With any decelerator that relies on punishment, it’s important that a more appro-
priate behavior be reinforced. Ideally, the reinforced behavior should make the pun-
ished behavior impossible to do. With the reinforcement of incompatible behavior,
the desired action is encouraged while the undesired action is discouraged. So, if a
child is being punished for getting up out of her seat and wandering around the room,
then she should also be reinforced for sitting properly. If a child is forgetting to put
his name on his homework, then in addition to the point deduction (punishment), he
should also be reinforced when he gets it correct. Without this reinforcement, the
punisher is relying on avoidance learning to encourage the child to sit properly or
to remember to put down his name. It’s more direct to simply reinforce the desired
behavior.
Overview of Operant Conditioning Theory


The basic components of operant conditioning theory are straightforward, but grasping
them can present problems, since the terms are sometimes misused in the media
and in casual conversation. In response to some cue in the environment, an organism
(the learner) emits a behavior that may or may not invite a response from the envi-
ronment. This operant behavior has a particular baseline rate, or strength, which could
be high, low, or in between. The behavior might receive a consequence from the envi-
ronment, which will then strengthen the likelihood of the behavior's happening again
(reinforcement) or weaken it (punishment). If there is no noticeable consequence to
the behavior, then the behavior is undergoing extinction, from the perspective of operant con-
ditioning theory. Figure 4.9 shows the four possibilities when a consequence results
from a behavior.


FIGURE 4.9   A table of operant conditioning situations, by whether a stimulus is
presented or removed and whether the behavior is strengthened or weakened

                        Behavior strengthened      Behavior weakened
Stimulus presented      Positive reinforcement     Positive punishment
Stimulus removed        Negative reinforcement     Negative punishment

Source: Adapted from Seligman, M. E., & Maier, S. F. (1967).

Notice that operant conditioning does not claim merely that learning might occur: the conse-
quence of a behavior will exert an influence on the base rate of the behavior (although this
may not always be true, as we will see in the next chapter). Additionally, every behavior
we make is done because of prior experience from the consequences of that behavior:
behavior is never done purposelessly. The learner is working toward an improved state
of affairs or is trying to avoid a worse one. Finally, since what is being taught and learned
is the relative frequency of behaviors, the overall desirability or societal value placed
on a behavior does not matter. Even behavior that might provoke a punishment may be
reinforced in other ways. As a result, a child may find the attention he or she gets from
acting out in class reinforcing, despite the negative consequences that follow. Getting a
tattoo is painful, but the reinforcement for it outweighs the short-term punishment.

CHAPTER SUMMARY
Operant conditioning theory attempts to explain human and animal learning using
strictly behavioral observations. Reinforcement is when an individual gives a behavioral
response to a cue in the environment that results in a consequence that makes that
response more likely to happen again. Punishment is when an individual gives a
behavioral response that results in a consequence that makes that response less likely
to happen again. In either case, a stimulus can be presented as part of the consequence
(positive reinforcement or positive punishment), or a stimulus can be removed (negative
reinforcement or negative punishment). If a desired behavior is not being made yet, the
trainer can shape the behavior by reinforcing smaller behaviors that gradually approximate
the desired behavior. Ultimately, all reinforcement is restricted to those behaviors that the
animal or human is normally capable of biologically.

Reinforcement is typically not continuous, and behaviors on a noncontinuous
reinforcement schedule show a benefit of being more resistant to extinction. Commonly
studied schedules include ratio schedules, which involve reinforcement for a fixed or
average number of behaviors, and interval schedules, which involve reinforcement for a
fixed or average length of time.

In addition to positive reinforcement, operant conditioning theory also attempts to
explain avoidance conditioning and punishment. Avoidance conditioning involves making
a behavior to escape from an unwanted consequence, and success here means being
reinforced for avoiding something. Of course, sometimes we avoid situations that it would
be better for us to confront.

Punishment can be effective, but it has a number of drawbacks. Among other problems,
punishment can become habituated, can produce anger and other negative emotional
responses, can provoke a dislike of the punisher, and can result in avoidance conditioning
rather than ending the behavior. These are typically most true for positive punishment.
To be effective, punishment has to be strong and delivered without delay after the
undesired behavior, each time it occurs. It is best if the learner has an interest in
improving on his or her own, and an optimal behavior should be reinforced simultaneously.

Operant conditioning restricts itself to the observable and has become a strong theory
in its own right. It is now one of the older learning theories within the field of psychology.
It isn't that operant conditioning has been debunked or discarded, but rather that theory
and research have moved beyond it. New theory includes the social context, such as
learning from watching others; motivations and goals; as well as mental representations of
knowledge, as we will find in the next chapters.

REVIEW QUESTIONS
1. What is the primary driver of all learned behavior, according to operant conditioning theory?
2. Can punishment be used to increase the frequency of a behavior? Why not?
3. How can shaping, response chains, and stimulus control be used to communicate to a learner which behavior needs to be performed?
4. Can any behavior be taught with any animal using reinforcement techniques? Why not?
5. What are the four primary kinds of schedules, and how do they impact the frequency of behaviors?
6. Both avoidance conditioning and punishment involve an aversive stimulus or situation, but in different ways and to different effects. Explain the difference.

KEY TERMS

Antecedent cues 81
Avoidance conditioning 84
Backward chain 91
Basal ganglia 84
Behavioral momentum 86
Behavioral response 81
Caudate nucleus 84
Comparative psychology 77
Conditioned reinforcement 84
Consequence 81
Continuous reinforcement schedule 86
Corporal punishment 101
Cortico-striatal system 84
Decelerators 100
Delay-discounting 86
Delays 85
Differential reinforcement of high rates 90
Differential reinforcement of low rates 89
Discriminate 82
Discrimination 92
Dopamine 85
Dopamine reinforcement hypothesis 85
Escape conditioning 84
Escape extinction 100
Escape latency 78
Extinction 81
Extinction bursts 100
Fixed interval 89
Fixed ratio 87
Forward chain 91
Generalization 92
Generalization decrement hypothesis 87
Generalize 82
Incentive salience hypothesis 85
Instinctive drift 92
Interval schedules 87
Law of effect 78
Matching law 89
Negative 82
Negative punishment 95
Negative reinforcer 83
Operant conditioning 77
Overcorrection 100
Partial reinforcement effect 86
Positive 82
Positive punishment 95
Positive reinforcement 80
Positive reinforcer 83
Postreinforcement pause 88
Preaversive stimulus 93
Premack's principle 83
Punishment 95
Putamen 84
Ratio schedules 87
Reinforcement of incompatible behavior 101
Reinforcer 82
Reinforcing 82
Response 81
Response blocking 100
Response chains 90
Response cost 100
Response deprivation hypothesis 83
Response tendency 82
Schedule 86
Shaping 90
Spontaneous recovery 82
Stimulus control 92
Stimulus saturation 100
Striatum 84
Thalamus 84
Time-outs 101
Total task chain 91
Trial-and-error learning 77
Variable interval 89
Variable ratio 88
Ventral tegmental area 85
Verbal reprimands 101

FURTHER RESOURCES

• A dog that dances to the Grease soundtrack with her trainer:
  https://www.youtube.com/watch?v=n936e073z58&feature=youtu.be
• "What Shamu Taught Me About a Happy Marriage": A New York Times article on dolphin training techniques and marriage:
  http://www.nytimes.com/2006/06/25/fashion/25love.html?pagewanted=all&_r=0
• "It's a bird! It's a plane!" An article by B. F. Skinner describing his research on training pigeons to guide missiles:
  http://www.army.mil/article/111511/It_s_a_bird__It_s_a_plane_/
• An NPR story on approaching drug addiction as a matter of changing habits:
  http://www.npr.org/blogs/health/2015/01/05/371894919/what-heroin-addiction-tells-us-about-changing-bad-habits
• A parent asks the police to supervise his spanking of his daughter. Would an operant conditioning theorist be supportive of this technique for improving behavior?
  http://www.today.com/health/florida-father-asks-police-supervise-daughters-spanking-1D80400270


REFERENCES

Abramson, L. Y., Seligman, M. E., & Teasdale, J. D. (1978). Learned helplessness in humans: Critique and reformulation. Journal of Abnormal Psychology, 87(1), 49.

Antoniadis, E. A., & McDonald, R. J. (2000). Amygdala, hippocampus and discriminative fear conditioning to context. Behavioural Brain Research, 108(1), 1–19. https://doi.org/10.1016/S0166-4328(99)00121-7

Bandler, R. J., Chi, C. C., & Flynn, J. P. (1972). Biting attack elicited by stimulation of the ventral midbrain tegmentum of cats. Science, 177(4046), 364–366. https://doi.org/10.1126/science.177.4046.364

Banks, R. K., & Vogel-Sprott, M. (1965). Effect of delayed punishment on an immediately rewarded response in humans. Journal of Experimental Psychology, 70(4), 357–359. https://doi.org/10.1037/h0022233

Berridge, K. C. (2006). The debate over dopamine's role in reward: The case for incentive salience. Psychopharmacology, 191(3), 391–431. https://doi.org/10.1007/s00213-006-0578-x

Berridge, K. C., & Robinson, T. E. (1998). What is the role of dopamine in reward: Hedonic impact, reward learning, or incentive salience? Brain Research Reviews, 28(3), 309–369. https://doi.org/10.1016/S0165-0173(98)00019-8

Bickel, W. K., Odum, A. L., & Madden, G. J. (1999). Impulsivity and cigarette smoking: Delay discounting in current, never, and ex-smokers. Psychopharmacology, 146(4), 447.

Capaldi, E. J. (1966). Partial reinforcement: A hypothesis of sequential effects. Psychological Review, 73(5), 459–477. https://doi.org/10.1037/h0023684

Coffey, S. F., Gudleski, G. D., Saladin, M. E., & Brady, K. T. (2003). Impulsivity and rapid discounting of delayed hypothetical rewards in cocaine-dependent individuals. Experimental and Clinical Psychopharmacology, 11(1), 18–25. https://doi.org/10.1037/1064-1297.11.1.18

Cook, D., & Kesner, R. P. (1988). Caudate nucleus and memory for egocentric localization. Behavioral and Neural Biology, 49(3), 332–343. https://doi.org/10.1016/S0163-1047(88)90338-X

Critchfield, T. S., & Kollins, S. H. (2001). Temporal discounting: Basic research and the analysis of socially important behavior. Journal of Applied Behavior Analysis, 34(1), 101–122.

Elizabeth Lonsdorf, primatologist, emerging explorer. (n.d.). National Geographic. Retrieved from http://www.nationalgeographic.com/explorers/bios/elizabeth-lonsdorf/

Frankland, P. W., Cestari, V., Filipkowski, R. K., McDonald, R. J., & Silva, A. J. (1998). The dorsal hippocampus is essential for context discrimination but not for contextual conditioning. Behavioral Neuroscience, 112(4), 863–874. https://doi.org/10.1037/0735-7044.112.4.863

Gilbert, P. E., Kesner, R. P., & Lee, I. (2001). Dissociating hippocampal subregions: A double dissociation between dentate gyrus and CA1. Hippocampus, 11(6), 626–636. https://doi.org/10.1002/hipo.1077

Green, L., Fry, A. F., & Myerson, J. (1994). Discounting of delayed rewards: A life-span comparison. Psychological Science, 5(1), 33–36.

House Edge (Gambling Lessons). (n.d.). Retrieved April 6, 2017, from http://vegasclick.com/gambling/houseedge

Hull, C. L. (1943). Principles of behavior: An introduction to behavior theory. Retrieved from http://doi.apa.org/psycinfo/1944-00022-000

Kirby, K. N., Petry, N. M., & Bickel, W. K. (1999). Heroin addicts have higher discount rates for delayed rewards than non-drug-using controls. Journal of Experimental Psychology: General, 128(1), 78–87. https://doi.org/10.1037/0096-3445.128.1.78

Lerman, D. C., & Vorndran, C. M. (2002). On the status of knowledge for using punishment: Implications for treating behavior disorders. Journal of Applied Behavior Analysis, 35(4), 431–464. https://doi.org/10.1901/jaba.2002.35-431

Maier, S. F., & Seligman, M. E. (1976). Learned helplessness: Theory and evidence. Journal of Experimental Psychology: General, 105(1), 3–46. https://doi.org/10.1037/0096-3445.105.1.3

Maren, S. (2001). Neurobiology of Pavlovian fear conditioning. Annual Review of Neuroscience, 24(1), 897–931.

Miller, N. E. (1948). Studies of fear as an acquirable drive: I. Fear as motivation and fear-reduction as reinforcement in the learning of new responses. Journal of Experimental Psychology, 38(1), 89.

Nevin, J. A., & Grace, R. C. (2000). Behavioral momentum: Empirical, theoretical, and metaphorical issues. Behavioral and Brain Sciences, 23(1), 117–125.

Overmier, J. B., & Seligman, M. E. (1967). Effects of inescapable shock upon subsequent escape and avoidance responding. Journal of Comparative and Physiological Psychology, 63(1), 28.

Premack, D. (1959). Toward empirical behavior laws: I. Positive reinforcement. Psychological Review, 66(4), 219–233. https://doi.org/10.1037/h0040891

Richards, J. B., Sabol, K. E., & Seiden, L. S. (1993). DRL interresponse-time distributions: Quantification by peak deviation analysis. Journal of the Experimental Analysis of Behavior, 60(2), 361–385. https://doi.org/10.1901/jeab.1993.60-361

Roberts, W. A. (2002). Are animals stuck in time? Psychological Bulletin, 128(3), 473–489. https://doi.org/10.1037/0033-2909.128.3.473

Robinson, T. E., & Berridge, K. C. (1993). The neural basis of drug craving: An incentive-sensitization theory of addiction. Brain Research Reviews, 18(3), 247–291. https://doi.org/10.1016/0165-0173(93)90013-P

Seligman, M. E., & Maier, S. F. (1967). Failure to escape traumatic shock. Journal of Experimental Psychology, 74(1), 1.

Seligman, M. E. P. (1975). Helplessness: On depression, development, and death (Vol. xv). New York, NY: W. H. Freeman/Times Books/Henry Holt and Co.

Shiflett, M. W., & Balleine, B. W. (2011). Molecular substrates of action control in cortico-striatal circuits. Progress in Neurobiology, 95(1), 1–13. https://doi.org/10.1016/j.pneurobio.2011.05.007

Skinner, B. F. (1938). The behavior of organisms: An experimental analysis. Retrieved from http://psycnet.apa.org/psycinfo/1939-00056-000

Skinner, B. F. (1948). Superstition in the pigeon. Journal of Experimental Psychology, 38, 168–172.

Skinner, B. F. (2005). Walden Two. Indianapolis, IN: Hackett Publishing Company, Inc. (Original work published 1948)

Skinner, B. F. (2015). Verbal behavior. Mansfield Centre, CT: Martino Fine Books. (Original work published 1957)

Tanno, T., & Sakagami, T. (2008). On the primacy of molecular processes in determining response rates under variable-ratio and variable-interval schedules. Journal of the Experimental Analysis of Behavior, 89(1), 5–14. https://doi.org/10.1901/jeab.2008.89-5

Thorndike, E. L. (1898). Animal intelligence. Nature, 58, 390.

Thorndike, E. L. (1911). Animal intelligence: Experimental studies. New York, NY: Macmillan.

Timberlake, W., & Allison, J. (1974). Response deprivation: An empirical approach to instrumental performance. Psychological Review, 81(2), 146–164. https://doi.org/10.1037/h0036101

Vollmer, T. R. (2002). Punishment happens: Some comments on Lerman and Vorndran's review. Journal of Applied Behavior Analysis, 35(4), 469–473. https://doi.org/10.1901/jaba.2002.35-469

Williams, J. L., & Lierle, D. M. (1986). Effects of stress controllability, immunization, and therapy on the subsequent defeat of colony intruders. Animal Learning and Behavior, 14(3), 305–314.

Yin, H. H., Ostlund, S. B., Knowlton, B. J., & Balleine, B. W. (2005). The role of the dorsomedial striatum in instrumental conditioning: Striatum and instrumental conditioning. European Journal of Neuroscience, 22(2), 513–523. https://doi.org/10.1111/j.1460-9568.2005.04218.x
