Operant Conditioning

The presentation provides an overview of operant conditioning and explains the relevant terminology, including concepts such as learning, behavior, reinforcement, punishment, and shaping. It describes Edward Thorndike's puzzle box and the laws of learning he inferred from it, and details B. F. Skinner's experiment and his findings on reinforcement and the schedules of reinforcement.


OPERANT CONDITIONING

WHAT IS LEARNING?
• Learning can be defined as any relatively permanent change in behavior brought about by
experience or practice.
• Here, 'relatively permanent' refers to the fact that when people learn anything, some part of
their brain is physically changed to record what they've learned (Farmer et al., 2013; Loftus &
Loftus, 1980). It is a process of memory.
• The kind of experience that follows a response plays a role in whether it is repeated.
• Some changes are controlled by the genetic blueprint, as a result of biological maturation. Once
maturational readiness has been reached, practice and experience play their important part.
• Children learn to walk because their nervous systems, muscle strength, and sense of balance
have reached the point where walking is physically possible for them.
BEHAVIOR:

• The actions or reactions of a person or an animal in response to external or internal stimuli.
• There are two kinds of behavior that all organisms are capable of: involuntary and voluntary.
• Classical conditioning is the kind of learning that occurs with automatic, involuntary behavior.
OPERANT CONDITIONING:

• Operant conditioning refers to the kind of learning that applies to voluntary behavior. It is
similar to classical conditioning in some respects but differs in others.
• Also referred to as instrumental conditioning, it is a method of learning that employs rewards
and punishments for behavior. Through operant conditioning, an association is made between a
behavior and a consequence (whether negative or positive) for that behavior.
EDWARD LEE THORNDIKE’S PUZZLE BOX:

• Edward Thorndike (1874-1949) was one of the first researchers to explore and attempt to outline the
laws of learning voluntary responses.
• Thorndike placed a hungry cat inside a 'puzzle box' from which the only escape was to press a lever
located on the floor of the box. He placed a dish of food outside the box, so the cat was highly motivated
to get out. He observed that the cat would move around the box, pushing and rubbing against the walls
to escape, and eventually would accidentally push the lever, opening the door. The cat was then fed from
the dish placed just outside the box.
• Here, the lever is the stimulus, pushing it is the response, and the consequence is both escape
and food.
• The cat did not learn to push the lever and escape right away. Over a number of trials and errors, the
cat took less and less time to push the lever.
THORNDIKE’S LAWS OF LEARNING:

On the basis of his research, Thorndike explained three primary laws of learning and also gave
five secondary laws in connection with his trial-and-error learning theory.
Primary Laws:
• Law of Readiness:
Readiness implies physical and mental preparedness to undertake a task. It is the basis of learning.
The law of readiness is expressed by the statement, "When an individual is ready to act or to learn,
he acts or learns more effectively and with greater satisfaction than when not ready." The condition
of readiness has two effects: satisfaction and annoyance. When an individual is ready to act and is
permitted to, they experience satisfaction; if not permitted, they are annoyed. Similarly, when an
individual who is not ready to learn is made to learn, they are annoyed; if prevented from learning,
they are satisfied.
• Law of Exercise:
According to Thorndike, the law of exercise is foremost in the process of learning. If a response
to a stimulus is repeated again and again, a connection gets established between the stimulus
and the response. This connection strengthens with practice and weakens with disuse; a person
learns by practice and repetition. This law has two aspects, and as such two related or allied
doctrines: (i) the Law of Use and (ii) the Law of Disuse. The Law of Use states, "When a
modifiable connection is made between a situation and a response, that connection's strength is,
other things being equal, increased."
• Similarly, the Law of Disuse states, "When a modifiable connection is not made between a
situation and a response over a length of time, the connection's strength is decreased." Briefly,
we may say that, other things being equal, exercise strengthens and lack of exercise weakens
the bond between situation and response.
• Law of Effect:
According to Thorndike, the principle of effect is the fundamental law of teaching and learning.
The law states that "When pleasant or satisfying consequences follow or attend a response, the
latter tends to be repeated. When painful or annoying consequences attend a response, it tends
to be eliminated." That is, the bond between the situation and the response strengthens with
satisfying results and weakens with displeasure and discomfort. An action which brings a feeling
of pleasure is more effectively learned, whereas an action which brings a feeling of displeasure is
not properly learned. When an action is associated with a feeling of annoyance, the individual
tends to avoid it.
SECONDARY LAWS OF LEARNING:

• Law of Multiple Response: According to this law, the organism varies or changes its response
until an appropriate behavior is produced. Without varying the responses, the correct response
for the solution might never be elicited. If an individual wants to solve a puzzle, they need to try
different ways rather than mechanically persisting in the same way. Thorndike's cat in the puzzle
box moved about and tried many ways to get out until it finally hit the lever with its paw, which
opened the door, and it jumped out.
• Law of Set or Attitude: Learning is guided by a total set or attitude of the organism, which
determines not only what the person will do but also what will satisfy or annoy them. How an
organism responds to a specific stimulus depends on the learner's attitude or mental set. One
learner may be very keen to learn a task while another individual has no interest in learning it.
For instance, unless a cricketer sets himself to make a century, he will not be able to score many
runs.

• Law of Pre-potency of Elements: According to this law, the learner reacts selectively to the
important or essential elements in the situation and neglects the other features or elements that
may be irrelevant or non-essential. The ability to deal with the essential or relevant part of the
situation makes analytical and insightful learning possible.
• Law of Response by Analogy: According to this law, the individual makes use of old experiences
or acquisitions while learning in a new situation. There is a tendency to utilize elements of the
new situation that are common to a similar past situation. Learning to drive a car, for instance,
is facilitated by the earlier acquired skill of driving a motorcycle or even riding a bicycle, because
the experience of maintaining balance and controlling the handlebars helps in steering the car.

• Law of Associative Shifting: According to this law, we may get a response of which a learner is
capable associated with any other situation to which the learner is sensitive. Thorndike
illustrated this by teaching a cat to stand up at a command. A fish was dangled before the cat
while he said 'stand up'. After a number of trials of presenting the fish after uttering the
command 'stand up', he later omitted the fish, and the command 'stand up' alone was found
sufficient to evoke the response: the cat stood up on its hind legs.
BURRHUS FREDERIC SKINNER (1904-1990):

Skinner was the behaviorist who assumed leadership of the field after John Watson. He believed
that psychologists should study only measurable, observable behavior. He found in the work of
Thorndike a way to explain all behavior as the product of learning, and he gave the learning of
voluntary behavior the name operant conditioning. Voluntary behavior is what people and
animals do to operate in the world.

He is called the Father of Operant Conditioning.


SKINNER’S EXPERIMENT:
Skinner invented the Skinner box, which is also called the operant chamber. Inside the box there
is a lever which, when pressed, activates a device that delivers food pellets into a tray. A hungry
rat is left inside the box. The rat exhibits random activity while exploring the box. Accidentally,
the rat presses the lever and a pellet of food is delivered. The first time this happens, the rat does
not learn the connection between the lever-pressing response and the food pellet. Sooner or later,
however, the rat learns that the consequence of lever pressing is positive: lever pressing brings
food.
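
The contingency in the Skinner box can be expressed as a short simulation. The following minimal Python sketch is not from the presentation; the starting probability and the increment are illustrative assumptions. It shows the law of effect in miniature: each reinforced lever press makes the press response more likely.

    import random

    # Toy model of the Skinner box contingency (all numbers are illustrative
    # assumptions): a lever press delivers a pellet, and each reinforced press
    # strengthens the response, per the law of effect.
    press_probability = 0.05      # at first, the hungry rat presses only by accident

    for trial in range(200):
        pressed = random.random() < press_probability
        if pressed:               # press -> pellet -> response strengthened
            press_probability = min(1.0, press_probability + 0.05)

    print(f"Press probability after training: {press_probability:.2f}")

Run for enough trials, the press probability climbs toward 1.0: the accidental response has become the learned response.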
REINFORCEMENT:

One of Skinner's major contributions to behaviorism is the concept of reinforcement. The word
itself means 'to strengthen', and Skinner defined reinforcement as anything that, when following
a response, causes that response to be more likely to happen again. Typically, it is a consequence
that is in some way pleasurable to the organism, relating back to Thorndike's law of effect.
Its effect is at the heart of operant conditioning.
POSITIVE AND NEGATIVE REINFORCEMENT:

The reinforcement of a response by the addition or experience of a pleasurable consequence,
such as a reward or a pat on the back, is called positive reinforcement. Getting food when hungry
or a paycheck when you need money are examples of positive reinforcement.

Pain can be a reinforcer too, if it is removed. If a person's behavior gets pain to stop, the person
is more likely to do the same thing again. Thus, following a response with the removal of or
escape from something unpleasant will also increase the likelihood of that response being
repeated. This process is called negative reinforcement.
SCHEDULES OF REINFORCEMENT:
The timing of reinforcement can make a tremendous difference in the speed at which learning
occurs and the strength of the learned response. However, Skinner (1956) found that reinforcing
every response was not necessarily the best schedule of reinforcement.
Partial reinforcement effect: Responses that are reinforced after some, but not all, correct
responses will be more resistant to extinction than a response that receives continuous
reinforcement (a reinforcer for each and every correct response).
It may be easier to teach a new behavior using continuous reinforcement, but partially reinforced
behavior is not only more difficult to suppress but also more like real life. In the real world, people
get reinforced partially for their work.
There can be different patterns or schedules of partial reinforcement (a simple simulation of the
four combinations is sketched below).
• When the timing of the response is what matters, it is called an interval schedule.
• When it is the number of responses that matters, the schedule is called a ratio schedule,
because a certain number of responses is required for each reinforcer.
• The other way in which schedules can differ is in whether the number of responses or the
interval of time is fixed or variable.
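
To see how the four schedules differ, here is a minimal Python sketch. It is not from the presentation; the class names, parameters, and the one-press-per-time-step demo are illustrative assumptions. Each schedule is simply a rule deciding whether a given response earns a reinforcer.

    import random

    # Illustrative sketch: each schedule of reinforcement is a rule deciding
    # whether a response earns a reinforcer. .respond() is one lever press;
    # .tick() advances time by one unit.

    class FixedRatio:
        """Reinforce every n-th response, like piecework pay."""
        def __init__(self, n):
            self.n, self.count = n, 0
        def respond(self):
            self.count += 1
            if self.count == self.n:
                self.count = 0
                return True          # reinforcer delivered
            return False

    class VariableRatio:
        """Reinforce after a varying number of responses, like a slot machine."""
        def __init__(self, n):
            self.n, self.count = n, 0
            self.target = random.randint(1, 2 * n)   # averages roughly n
        def respond(self):
            self.count += 1
            if self.count >= self.target:
                self.count = 0
                self.target = random.randint(1, 2 * self.n)
                return True
            return False

    class FixedInterval:
        """Reinforce the first response after a fixed time has passed (payday)."""
        def __init__(self, interval):
            self.interval, self.elapsed = interval, 0
        def tick(self):
            self.elapsed += 1
        def respond(self):
            if self.elapsed >= self.interval:
                self.elapsed = 0     # only the first response after the interval pays
                return True
            return False

    class VariableInterval:
        """Reinforce the first response after a varying, unpredictable interval."""
        def __init__(self, interval):
            self.interval, self.elapsed = interval, 0
            self.target = random.randint(1, 2 * interval)
        def tick(self):
            self.elapsed += 1
        def respond(self):
            if self.elapsed >= self.target:
                self.elapsed = 0
                self.target = random.randint(1, 2 * self.interval)
                return True
            return False

    # Demo: 100 time steps with one lever press per step.
    for schedule in (FixedRatio(5), VariableRatio(5), FixedInterval(5), VariableInterval(5)):
        rewards = 0
        for _ in range(100):
            if hasattr(schedule, "tick"):
                schedule.tick()
            rewards += schedule.respond()
        print(type(schedule).__name__, rewards)

In this demo every schedule pays off at roughly the same average rate; what differs, as the following slides explain, is the pattern of responding each rule encourages in a real learner.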
FIXED INTERVAL SCHEDULE OF REINFORCEMENT:
The reinforcer is received after a certain, fixed interval of time has passed. If Professor Conner
were teaching a rat to press a lever to get food pellets, she might require the rat to push the lever
at least once within a 2-minute time span to get a pellet. It wouldn't matter how many times the
rat pushed the bar; the rat would only get a pellet at the end of the interval, and only if it had
pressed the bar at least once. It is the first correct response after the interval that gets reinforced.
Such schedules do not produce a fast rate of responding. The response rate goes up just before
the reinforcer is due and drops off immediately after, until it is almost time for the next food
pellet. It is similar to the way factory workers speed up production just before payday and slow
down just after payday.
VARIABLE INTERVAL SCHEDULE:

The interval of time after which the individual must respond in order to receive a reinforcer
changes from one time to the next. For example, a rat might receive a food pellet for pushing the
lever every 5 minutes on average; sometimes the interval might be 2 minutes, sometimes 10, but
the rat must push the lever at least once after that interval to get the pellet. Because the rat
cannot predict how long the interval is going to be, it pushes the bar more or less continuously,
producing a smooth response pattern. Once again, speed is not important, so the rate of
responding is slow but steady.
FIXED RATIO SCHEDULE:
Here it is the number of responses that counts. The number of responses required to receive each
reinforcer is always the same.
The rate of responding is very fast, especially when compared to the fixed interval schedule, and
there are small 'breaks' in the response pattern immediately after a reinforcer is given. The rapid
response rate occurs because the rat wants to get to the next reinforcer as fast as possible, and
the number of lever pushes is what counts. The pauses or breaks come right after a reinforcer
because the rat knows about how many lever pushes will be needed to get to the next reinforcer;
it is always the same. Fixed schedules, both ratio and interval, are predictable, which allows rest
breaks.
In human terms, anyone who does piecework, in which a certain number of items have to be
completed before payment is given, is reinforced on a fixed ratio schedule.
VARIABLE RATIO SCHEDULE:

Here the number of responses required changes from one trial to the next. In the rat example,
the rat might be expected to push the bar an average of 20 times to get reinforcement. That
means that sometimes the rat would push the lever only 10 times before a reinforcer comes, but
at other times it might take 30 lever pushes or more.
The response rate is just as rapid as on the fixed ratio schedule, because the number of responses
still matters, but the pattern is much smoother because the rat takes no rest breaks. It cannot
afford to, because it does not know how many times it may have to push the lever to get the
next food pellet. The rat pushes as fast as it can and eats while pushing. It is the unpredictability
of the variable schedule that makes the responses more or less continuous, just as in a variable
interval schedule.
Buying lottery tickets, or any kind of gambling, is an example of this schedule. People don't know
how many tickets they will have to buy, and they're afraid that if they don't buy the next one,
that will be the ticket that would have won, so they keep buying and buying.
• Regardless of the schedule of reinforcement one uses, two additional factors contribute to
making reinforcement of a behavior as effective as possible.
• Timing: A reinforcer should be given as immediately as possible after the desired behavior.
Delayed reinforcement tends not to work well, especially when dealing with animals and small
children.
• Behavior reinforced: The second factor is to reinforce only the desired
behavior. Many parents make the mistake of giving a child who has not done
some chore the promised treat anyway, completely undermining the child’s
learning of the chore.
PUNISHMENT:

• Punishment is actually the opposite of reinforcement. It is any event or stimulus that, when
following a response, causes that response to be less likely to happen again. It weakens
responses, whereas reinforcement, both positive and negative, strengthens responses.
• There are two ways in which a punishment can happen:
• Punishment by application: It occurs when something unpleasant is added to the
situation or applied. This is the kind of punishment that many child development specialists
strongly recommend parents avoid using with their children because it can escalate into abuse.
• Punishment by removal: Behavior is punished by the removal of something pleasurable or
desired after the behavior occurs. Grounding a teenager is removing the freedom to do what
the teenager wants to do and is an example of this kind of punishment.
SHAPING:
Shaping, or behavior-shaping, is a variant of operant conditioning. Instead of waiting for the subject to
exhibit the desired behavior, any behavior leading toward the target behavior is rewarded. For example,
Skinner found that, in order to train a rat to push a lever, any movement in the direction of the lever had
to be rewarded, until finally the rat was trained to push the lever. Once the target behavior is reached,
however, no other behavior is rewarded. In other words, the subject's behavior is shaped, or molded, into
the desired form.
• For example, if Jody wanted to train his dog to jump through a hoop, he would have to start with some
behavior that the dog is already capable of doing on its own. Then he would gradually 'mold' that starting
behavior into the jump, something the dog is capable of doing but not likely to do on its own. Jody would
have to start with the hoop on the ground in front of Rover's face and then call the dog through the hoop,
using a treat as bait. After Rover steps through the hoop (as the shortest way to the treat), Jody should
give Rover the treat (positive reinforcement). Then he could raise the hoop just a little, reward him for
walking through it again, raise the hoop, reward him again, and so on, until Rover is jumping through the
hoop to get the treat. The goal is achieved by reinforcing each successive approximation: small steps, one
after the other, that get closer and closer to the goal (see the sketch below). By pairing a sound such as a
whistle or clicker with the primary reinforcer of food, animal trainers can use the sound as a secondary
reinforcer and avoid having an overfed learner.
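
To make the successive-approximation loop concrete, here is a minimal Python sketch. It is not from the presentation; the heights, criterion steps, and the observe_behavior stand-in are illustrative assumptions. The point is the structure: reward whatever meets the current criterion, then tighten the criterion toward the target.

    import random

    # Minimal sketch of shaping by successive approximations (all numbers are
    # illustrative assumptions): raise the criterion in small steps, and treat
    # any behavior meeting the current criterion as reinforced, which extends
    # what the learner will reliably do.

    TARGET_HEIGHT = 1.0                 # hypothetical final hoop height

    def observe_behavior(ability):
        """Stand-in for watching the dog: the height cleared on one attempt."""
        return random.uniform(0, ability)

    def shape(steps=5, trials_per_step=200):
        ability = 0.3                   # what the dog does on its own at first
        for step in range(1, steps + 1):
            criterion = TARGET_HEIGHT * step / steps    # raise the hoop a little
            for _ in range(trials_per_step):
                if observe_behavior(ability) >= criterion:
                    # Reinforced: the just-rewarded behavior becomes part of the
                    # dog's repertoire, slightly beyond the current criterion.
                    ability = max(ability, criterion + 0.25)
                    break
        return ability >= TARGET_HEIGHT

    print("Target behavior reached:", shape())

Note the design point the sketch makes: if the criterion were set to TARGET_HEIGHT from the start, observe_behavior(0.3) could never meet it and no reward would ever be earned. That is exactly why shaping begins with behavior the animal already emits.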
Thank you!!
