Unit III - Part 2 - Operant Conditioning
Operant conditioning divides into reinforcement (which increases behavior) and punishment (which decreases behavior), each of which can be positive (a stimulus is added) or negative (a stimulus is removed):
- Positive reinforcement: add an appetitive stimulus following correct behavior.
- Negative reinforcement: remove a noxious stimulus following correct behavior.
  - Escape: the behavior removes a noxious stimulus that is present.
  - Active avoidance: the behavior avoids the noxious stimulus altogether.
- Positive punishment: add a noxious stimulus following behavior.
- Negative punishment: remove an appetitive stimulus following behavior.
However, both kinds of learning can affect behavior. Classically conditioned stimuli, such as a picture of sweets on a box, might enhance operant conditioning by encouraging a child to approach and open the box.
Schedules of reinforcement
Schedules of reinforcement are rules that control the delivery of reinforcement. The rules specify
either the time that reinforcement is to be made available, or the number of responses to be made, or both.
Many rules are possible, but the following are the most basic and commonly used.
Fixed interval schedule: Reinforcement occurs following the first response after a fixed time has
elapsed after the previous reinforcement. This schedule yields a "break-run" pattern of response; that is,
after training on this schedule, the organism typically pauses after reinforcement, and then begins to
respond rapidly as the time for the next reinforcement approaches.
Variable interval schedule: Reinforcement occurs following the first response after a variable
time has elapsed from the previous reinforcement. This schedule typically yields a relatively steady rate of
response that varies with the average time between reinforcements.
Fixed ratio schedule: Reinforcement occurs after a fixed number of responses have been emitted
since the previous reinforcement. An organism trained on this schedule typically pauses for a while after a
reinforcement and then responds at a high rate. If the response requirement is low there may be no pause;
if the response requirement is high the organism may quit responding altogether.
Variable ratio schedule: Reinforcement occurs after a variable number of responses have been
emitted since the previous reinforcement. This schedule typically yields a very high, persistent rate of
response.
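These delivery rules are mechanical enough to simulate. The sketch below is a minimal illustration, not taken from the source; the function names and the uniform spread used for the variable schedules are assumptions:

```python
import random

def make_schedule(kind, value):
    """Return a response() function that reports whether a reinforcer is
    delivered under one of the four basic schedules: "FR", "VR" (ratio,
    `value` = number of responses) or "FI", "VI" (interval, `value` = seconds)."""
    state = {"responses": 0, "elapsed": 0.0, "requirement": float(value)}

    def next_requirement():
        # Variable schedules draw a fresh requirement around the average.
        return random.uniform(0.5 * value, 1.5 * value) if kind.startswith("V") else value

    def response(seconds_since_last=1.0):
        state["responses"] += 1
        state["elapsed"] += seconds_since_last
        if kind in ("FR", "VR"):   # ratio: count responses emitted
            met = state["responses"] >= state["requirement"]
        else:                      # interval: first response after the time elapses
            met = state["elapsed"] >= state["requirement"]
        if met:                    # deliver the reinforcer and reset the schedule
            state["responses"], state["elapsed"] = 0, 0.0
            state["requirement"] = next_requirement()
        return met

    return response

fr5 = make_schedule("FR", 5)        # fixed ratio 5: every fifth response pays off
print([fr5() for _ in range(10)])   # [False, False, False, False, True, False, ...]
```

A variable ratio schedule (`make_schedule("VR", 5)`) behaves the same way except that the required count is redrawn after each reinforcer, which is what produces its characteristically steady, persistent response rate.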
Contingencies of reinforcement
Contingencies of reinforcement refer to the specific conditions and rules that govern the
relationship between behavior and its consequences in operant conditioning. These contingencies
determine when and how reinforcement or punishment is delivered based on an individual's behavior.
There are four primary types of contingencies of reinforcement: positive reinforcement, negative
reinforcement, positive punishment, and negative punishment. Here's an explanation of each with suitable
examples:
Positive Reinforcement:
Positive reinforcement occurs when a desirable consequence is added after a behavior, making the
behavior more likely to be repeated in the future. E.g., a child receives a sticker (desirable consequence)
for completing their homework (behavior). The child is more likely to continue completing homework in
the future to earn more stickers.
Negative Reinforcement:
Negative reinforcement involves the removal or avoidance of an aversive stimulus after a
behavior, increasing the likelihood of the behavior being repeated. E.g., a driver fastens their seatbelt
(behavior) to stop the annoying seatbelt warning sound (aversive stimulus). The driver is more likely to
buckle up in the future to avoid the sound.
Positive Punishment:
Positive punishment involves adding an aversive stimulus after a behavior to decrease the
likelihood of that behavior happening again. E.g., a child receives a scolding (aversive stimulus) for hitting
their sibling (behavior). The child is less likely to engage in hitting behavior in the future to avoid being
scolded.
Negative Punishment:
Negative punishment is the removal of a desirable stimulus after a behavior, leading to a decrease
in the likelihood of that behavior in the future. E.g., a teenager loses their phone (removal of a desirable
stimulus) for breaking curfew (behavior). The teenager is less likely to violate curfew in the future to
avoid losing their phone.
These contingencies of reinforcement are fundamental principles in behaviorism and are used to
understand and modify behavior in various contexts, from parenting and education to animal training and
psychology. The choice of which contingency to apply depends on the desired outcome and the specific
behavior that needs modification.
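Since each contingency is fully determined by two choices (stimulus added or removed; stimulus appetitive or aversive), the four types reduce to a lookup table. A minimal sketch; the function name is invented for illustration:

```python
def contingency(action, stimulus):
    """Classify an operant contingency from whether a stimulus is added or
    removed, and whether that stimulus is appetitive or aversive."""
    table = {
        ("add", "appetitive"):    "positive reinforcement",  # behavior increases
        ("remove", "aversive"):   "negative reinforcement",  # behavior increases
        ("add", "aversive"):      "positive punishment",     # behavior decreases
        ("remove", "appetitive"): "negative punishment",     # behavior decreases
    }
    return table[(action, stimulus)]

# The seatbelt example: buckling up removes the aversive warning sound.
print(contingency("remove", "aversive"))   # negative reinforcement
```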
Shaping
Shaping is a conditioning method much used in animal training and in teaching nonverbal humans.
It depends on operant variability and reinforcement, as described above. The trainer starts by identifying
the desired final (or "target") behavior. Next, the trainer chooses a behavior that the animal or person
already emits with some probability. The form of this behavior is then gradually changed across
successive trials by reinforcing behaviors that approximate the target behavior more and more closely.
When the target behavior is finally emitted, it may be strengthened and maintained by the use of a
schedule of reinforcement.
Shaping is a technique used in operant conditioning to gradually develop and reinforce a target
behavior by rewarding successive approximations of that behavior. It involves breaking down a complex
behavior into smaller, more manageable steps and reinforcing each step until the desired behavior is
achieved. Shaping is particularly useful when the target behavior is not initially present or is rare.
Example of shaping in operant conditioning:
Teaching a Rat to Press a Lever
Let's say you want to teach a rat to press a lever, but the rat has never done this before.
Initial Behavior: The rat does not naturally press the lever, so there's no behavior to reinforce initially.
Identify Steps: You break down the desired behavior into smaller steps. These steps could include:
Approaching the lever
Touching the lever with its nose
Nudging the lever with its paw
Pressing the lever with its paw
Reinforce Approximations: You start by reinforcing the rat for any behavior that is a step closer to the
final goal. For instance, you may initially reward the rat for approaching the lever. Once that behavior is
established, you only reward the rat when it touches the lever with its nose, and so on.
Gradual Progress: As the rat becomes accustomed to receiving rewards for each successive
approximation, it will naturally start engaging in behaviors that are closer and closer to pressing the lever.
Final Behavior: Over time, with consistent reinforcement of each step, the rat learns to press the lever
with its paw, which is the final target behavior.
Shaping allows you to train complex behaviors that might not occur naturally or immediately. It relies on
the principles of positive reinforcement to guide an organism toward the desired behavior by rewarding
incremental progress.
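The procedure above (order the approximations, reinforce only behavior at or beyond the current criterion, then tighten the criterion) can be sketched as a loop. Everything here, including the interface, is a hypothetical illustration:

```python
def shape(attempts, steps, reward):
    """Reinforce successive approximations of the target behavior.
    `attempts` yields observed behaviors; `steps` orders approximations
    from crudest to target; `reward` is called to reinforce one."""
    rank = {behavior: i for i, behavior in enumerate(steps)}
    criterion = 0                      # start at the crudest approximation
    for behavior in attempts:
        if rank.get(behavior, -1) >= criterion:
            reward(behavior)           # reinforce this approximation
            if behavior == steps[-1]:
                return True            # target behavior reached and reinforced
            criterion = min(criterion + 1, len(steps) - 1)  # tighten criterion
    return False

steps = ["approach lever", "touch lever with nose",
         "nudge lever with paw", "press lever with paw"]
attempts = ["approach lever", "approach lever",   # second approach no longer rewarded
            "touch lever with nose", "nudge lever with paw", "press lever with paw"]
rewarded = []
done = shape(attempts, steps, rewarded.append)
print(done, rewarded)   # True, with one reward per successive approximation
```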
This technique is not limited to rats; it can be used with dogs, dolphins, humans, and many other
species to teach a wide range of behaviors, from simple actions to highly complex tasks. Shaping is often
used in animal training, education, and therapy to help individuals learn and acquire new skills.
Learned helplessness
Learned helplessness is the behavior exhibited by a subject after enduring repeated aversive
stimuli beyond their control. It was initially thought to be caused by the subject's acceptance of their
powerlessness, by way of their discontinuing attempts to escape or avoid the aversive stimulus, even when
such alternatives are unambiguously presented. Upon exhibiting such behavior, the subject was said to
have acquired learned helplessness. Over the past few decades, neuroscience has provided insight into
learned helplessness and shown that the original theory had it backward: the brain's default state is to
assume that control is not present, and the sense of control is what is learned. However, this learning is undone when a subject is faced with prolonged aversive stimulation.
Learned helplessness is a state that occurs after a person has experienced a stressful situation
repeatedly. They come to believe that they are unable to control or change the situation, so they do not try, even when opportunities for change become available. Learned helplessness was first described in 1967 by Prof. Martin Seligman after a series of experiments on animals; he suggested that the findings could apply to humans. Learned helplessness leads to increased feelings of stress and depression. For
some people, it is linked with post-traumatic stress disorder (PTSD).
According to the American Psychological Association, learned helplessness occurs when someone
repeatedly faces uncontrollable, stressful situations, then does not exercise control when it becomes
available. They have “learned” that they are helpless in that situation and no longer try to change it, even
when change is possible. Once a person discovers that they cannot control events
around them, they lose motivation. Even if an opportunity arises that allows the person to alter their
circumstances, they do not take action. Individuals experiencing learned helplessness are often less able to
make decisions. Learned helplessness can increase a person’s risk of depression.
Prof. Martin Seligman, one of the psychologists credited with defining learned helplessness, has
detailed three key features:
- becoming passive in the face of trauma
- difficulty learning that responses can control trauma
- an increase in stress levels
Biofeedback
Biofeedback is the process of gaining greater awareness of many physiological functions of one's
own body by using electronic or other instruments, with the goal of being able to manipulate the body's
systems at will. When you raise your hand to wave hello to a friend, or lift your knee to take another step
on the Stairmaster, you control these actions. Other body functions - like heart rate, skin temperature,
and blood pressure - are controlled involuntarily by your nervous system. You don't think about making
your heart beat faster. It just happens in response to your environment, like when you're nervous, excited,
or exercising.
One technique can help you gain more control over these normally involuntary functions. It's
called biofeedback, and the therapy is used to help prevent or treat conditions, including migraine
headaches, chronic pain, incontinence, and high blood pressure. The idea behind biofeedback is that, by
harnessing the power of your mind and becoming aware of what's going on inside your body, you can gain
more control over your health.
Biofeedback promotes relaxation, which can help relieve a number of conditions that are related to
stress. During a biofeedback session, electrodes are attached to your skin. Finger sensors can also be used.
These electrodes/sensors send signals to a monitor, which displays a sound, flash of light, or image that
represents your heart and breathing rate, blood pressure, skin temperature, sweating, or muscle activity.
When you're under stress, these functions change. Your heart rate speeds up, your muscles tighten, your blood
pressure rises, you start to sweat, and your breathing quickens. You can see these stress responses as they
happen on the monitor, and then get immediate feedback as you try to stop them. Biofeedback sessions are
typically done in a therapist's office, but there are computer programs that connect the biofeedback sensor to
your own computer.
A biofeedback therapist helps you practice relaxation exercises, which you fine-tune to control
different body functions. For example, you might use a relaxation technique to turn down the brainwaves that
activate when you have a headache. Several different relaxation exercises are used in biofeedback therapy,
including:
Deep breathing
Progressive muscle relaxation -- alternately tightening and then relaxing different muscle groups
Guided imagery -- concentrating on a specific image (such as the color and texture of an orange) to focus
your mind and make you feel more relaxed
Mindfulness meditation -- focusing your thoughts and letting go of negative emotions
As you slow your heart rate, lower your blood pressure, and ease muscle tension, you'll get instant feedback on
the screen. Eventually, you'll learn how to control these functions on your own, without the biofeedback
equipment.
Drive
If you are hungry, your drive is increased to one. If you are really hungry, your drive becomes two. If you are also thirsty, your combined drive to satisfy hunger and thirst becomes three. As drives accumulate, your overall motivation increases. (A detailed version will be discussed in the next unit.)
Premack Principle
The Premack principle is a theory of reinforcement that states that a less desired behavior can be
reinforced by the opportunity to engage in a more desired behavior. The theory is named after its
originator, psychologist David Premack.
The Premack principle states that a higher probability behavior will reinforce a less probable
behavior. Created by psychologist David Premack, the principle has become a hallmark of applied
behavior analysis and behavior modification. The Premack principle has received empirical support and is
frequently applied in child rearing and dog training. It is also known as the relativity theory of reinforcement
or grandma's rule.
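In code, the principle amounts to ranking behaviors by their baseline frequency: any higher-frequency behavior can serve as a reinforcer for a lower-frequency one. The sketch below is an illustration with invented names and numbers:

```python
def premack_pairs(baseline_freq):
    """Given baseline frequencies of behaviors, return every (low, high)
    pair where access to `high` can reinforce `low` under the Premack
    principle (a more probable behavior reinforces a less probable one)."""
    ranked = sorted(baseline_freq, key=baseline_freq.get)  # least to most frequent
    return [(low, high)
            for i, low in enumerate(ranked)
            for high in ranked[i + 1:]]

# Hypothetical baseline rates (occurrences per day) for a child's behaviors:
pairs = premack_pairs({"eat vegetables": 1, "do homework": 2, "play video games": 10})
print(pairs)
# [('eat vegetables', 'do homework'), ('eat vegetables', 'play video games'),
#  ('do homework', 'play video games')]
```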
Origins of the Premack Principle
Before the Premack principle was introduced, operant conditioning held that reinforcement was
contingent upon the association of a single behavior and a single consequence. For example, if a student
does well on a test, the studying behavior that resulted in his success will be reinforced if the teacher
compliments him. In 1965, psychologist David Premack expanded on this idea to show that one behavior
could reinforce another.
Premack was studying Cebus monkeys when he observed that behaviors that an individual
naturally engages in at a higher frequency are more rewarding than those the individual engages in at a
lower frequency. He suggested that the more rewarding, higher-frequency behaviors could reinforce the
less rewarding, low-frequency behaviors.
Since Premack first shared his ideas, multiple studies with both people and animals have supported
the principle that bears his name. One of the earliest studies was conducted by Premack himself. Premack
first determined if his young child participants preferred playing pinball or eating candy. He then tested
them in two scenarios: one in which the children had to play pinball in order to eat candy and the other in
which they had to eat candy in order to play pinball. Premack found that in each scenario, only the
children who preferred the second behavior in the sequence showed a reinforcement effect, evidence for
the Premack principle.
A later study by Allen and Iwata demonstrated that exercising among a group of people with
developmental disabilities increased when playing games (a high-frequency behavior) was made
contingent on exercising (a low-frequency behavior).
In another study, Welsh, Bernstein, and Luthans found that when fast food workers were promised
more time working at their favorite stations if their performance met specific standards, the quality of their
performance at other workstations improved.
Brenda Geiger found that providing seventh and eighth grade students with time to play on the
playground could reinforce learning by making play contingent on the completion of their work in the
classroom. In addition to increasing learning, this simple reinforcer increased students’ self-discipline and
the time they spent on each task, and reduced the need for teachers to discipline students.
The Premack principle can successfully be applied in many settings and has become a hallmark of
applied behavior analysis and behavior modification. Two areas in which the application of the Premack
principle has proven especially useful are child rearing and dog training. For example, when teaching a dog
how to play fetch, the dog must learn that if he wants to chase the ball again (highly desired behavior), he
must bring the ball back to his owner and drop it (less desired behavior).
The Premack principle is used all the time with children. Many parents have told children they
must eat their vegetables before they can have dessert or they have to finish their homework before they’re
allowed to play a video game. This tendency of caregivers to use the principle is why it is sometimes
called “grandma’s rule.” While it can be very effective with children of all ages, it’s important to note that
not all children are equally motivated by the same rewards. Therefore, in order to successfully apply the
Premack principle, caregivers must determine the behaviors that are most highly motivating to the child.
Limitations of the Premack Principle
There are several limitations to the Premack principle. First, one’s response to an application of the
principle is dependent on context. The other activities available to the individual at a given moment and
the individual’s preferences will play a role in whether the chosen reinforcer will produce the less-
probable behavior.
Second, a high-frequency behavior will often occur at a lower rate when it’s contingent on a low-
frequency behavior than when it’s not contingent on anything. This could be the result of there being too
great a difference between the probability of performing the high and low frequency behaviors. For
example, if one hour of study time only earns one hour of video game play and studying is an extremely
low-frequency behavior while video game playing is an extremely high-frequency behavior, the individual
may decide against studying to earn video game time because the large amount of study time is too
onerous.
Response Deprivation Theory
Response deprivation theory, a refinement of the Premack principle, holds that a behavior becomes reinforcing when access to it is restricted below its baseline rate. Consider watching TV or playing video games versus doing homework:
Baseline Behavior: Your baseline behavior is the amount of time you typically spend watching TV or
playing video games when there are no restrictions. Let's say you usually spend two hours on these
activities.
Deprivation: Your parents decide to restrict your screen time to just one hour per day until you finish
your homework. This creates a perceived deprivation because you want to spend more time engaging in
the enjoyable behavior.
Motivation: Due to the restriction, you are now motivated to engage in the behavior (screen time) even
more intensely because you're deprived of the opportunity to do so. You really want to watch TV and play
games, but you can only do it for one hour.
Completion of Homework: To gain access to the enjoyable behavior (screen time), you're more
motivated to complete your homework quickly and effectively. The restriction has made homework
completion more appealing because it allows you to enjoy your favorite activities.
So, in this scenario, the Response Deprivation Theory can be explained as follows: When access to a
desired behavior (screen time) is restricted or deprived, it increases your motivation to engage in that
behavior, which, in turn, makes you more willing to complete a less desirable task (homework) to gain
access to the enjoyable behavior.
The theory suggests that individuals will engage in less desirable behaviors if it's necessary to gain
access to more enjoyable ones, creating a balance between different activities.
In summary, the Response Deprivation Theory helps explain how restrictions or limitations on
desired behaviors can motivate individuals to engage in those behaviors more intensively to regain access,
even if they must complete less enjoyable tasks to do so.
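The deprivation condition itself can be written as a single inequality: the schedule acts as a reinforcer when it grants proportionally less of the desired behavior than its baseline ratio. This sketch follows the response deprivation hypothesis in spirit; the one-hour homework baseline is invented for illustration:

```python
def deprives(baseline_instrumental, baseline_contingent,
             required_instrumental, granted_contingent):
    """Response deprivation condition: the contingent behavior acts as a
    reinforcer when the schedule allows proportionally less of it than
    the baseline ratio, i.e. granted/required < baseline ratio."""
    return (granted_contingent / required_instrumental
            < baseline_contingent / baseline_instrumental)

# Baseline: 2 hours of screen time alongside (say) 1 hour of homework.
# Schedule: 1 hour of screen time allowed per hour of homework completed.
print(deprives(baseline_instrumental=1, baseline_contingent=2,
               required_instrumental=1, granted_contingent=1))   # True
```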
***********************************