Operant Conditioning
WHAT IS LEARNING?
• Learning can be defined as any relatively permanent change in behavior brought about by
experience or practice.
• Here, ‘relatively permanent’ refers to the fact that when people learn anything, some part of
their brain is physically changed to record what they’ve learned (Farmer et al., 2013; Loftus &
Loftus, 1980). It is a process of memory.
• The kind of consequence that follows a response plays a role in whether that response is repeated.
• Some changes are controlled by a genetic blueprint, the result of biological maturation. Once
maturational readiness has been reached, practice and experience play their important part.
• Children learn to walk because their nervous systems, muscle strength, and sense of balance
have reached the point where walking is physically possible for them.
LEARNING VOLUNTARY BEHAVIOR:
• Edward Thorndike (1874-1949) was one of the first researchers to explore and attempt to outline the
laws of learning voluntary responses.
• Thorndike placed a hungry cat inside a ‘puzzle box’ from which the only escape was to press a lever
located on the floor of the box. He placed a dish of food outside the box, so that the cat was highly
motivated to get out. He observed that the cat would move around the box, pushing and rubbing against
the walls to escape, and would eventually push the lever by accident, opening the door. The cat was then
fed from the dish placed just outside the box.
• Here, the lever is the stimulus, the pushing of it is the response, and the consequence is both escape
and food.
• The cat did not learn to push the lever and escape right away. Over a number of trials (and errors), the
cat took less and less time to push the lever.
THORNDIKE’S PUZZLE BOX:
THORNDIKE’S LAWS OF LEARNING:
Thorndike explained three laws of learning on the basis of his research and also gave five
secondary laws in connection with his trial and error learning theory.
Primary Laws:
• Law of readiness:
Readiness implies physical and mental preparedness to undertake a task. It is the basis of learning.
The law of readiness is explained by the statement, “When an individual is ready to act or to learn,
he acts or learns more effectively and with greater satisfaction than when not ready.” The condition
of readiness has two effects: satisfaction and annoyance. When an individual who is ready to act is
permitted to act, they experience satisfaction; if not permitted, they are annoyed. Similarly, when an
individual who is not ready to learn is asked to learn, they are annoyed, and if prevented from learning,
they are satisfied.
• Law of Exercise: According to Thorndike, the law of exercise is foremost in the process of
learning. If a response to a stimulus is repeated again and again, a connection gets established
between the stimulus and the response. This connection strengthens with practice and weakens
with disuse. A person learns by practice and repetition. This Law has two aspects and as such
has two related or allied doctrines, (i) Law of Use and (ii) Law of Disuse. The Law of Use states,
“When a modifiable connection is made between a situation and a response, that connection’s
strength is, other things being equal, increased.”
• Similarly, the Law of Disuse states, “When a modifiable connection is not made between a
situation and response, over a length of time, the connection’s strength is decreased.” Briefly
we may say that other things being equal, exercise strengthens and lack of exercise weakens
the bond between situation and response.
• Law of Effect:
• According to Thorndike, the principle of effect is the fundamental law of teaching and
learning. The law states that “When pleasant or satisfying consequences follow or attend a
response, the latter tends to be repeated. When painful or annoying consequences attend a
response, it tends to be eliminated.” That is, the bond between the situation and the response
strengthens with satisfying results and weakens with displeasure and discomfort.
An action which brings a feeling of pleasure is learned more effectively, whereas an action which
brings a feeling of displeasure is not properly learned. When an action is associated with a
feeling of annoyance, the individual tends to avoid it.
SECONDARY LAWS OF LEARNING:
• The law of pre-potency of elements: According to this law, the learner reacts
selectively to the important or essential elements in the situation and neglects the other
features or elements which may be irrelevant or non-essential. The ability to deal with the
essential or the relevant part of the situation, makes analytical and insightful learning
possible.
• Law of Response by Analogy:
According to this law, the individual makes use of old experiences or acquisitions while learning a new
situation. There is a tendency to utilize common elements in the new situation that existed in a similar
past situation. Learning to drive a car, for instance, is facilitated by the earlier acquired skill of
driving a motorcycle or even riding a bicycle, because the experience of maintaining balance and
controlling the handlebar helps in steering the car.
• Law of Associative Shifting:
According to this law, we may get any response of which a learner is capable associated with any other
situation to which the learner is sensitive. Thorndike illustrated this by teaching a cat to stand up at a
command. A fish was dangled before the cat while he said ‘stand up’. After a number of trials of
presenting the fish after uttering the command ‘stand up’, the fish was later omitted, and the command
‘stand up’ alone was sufficient to evoke the response: the cat stood up on her hind legs.
BURRHUS FREDERIC SKINNER (1904-1990):
One of Skinner’s major contributions to behaviorism is the concept of reinforcement. The word
itself means ‘to strengthen’, and Skinner defined reinforcement as anything that, when following
a response, causes that response to be more likely to happen again. Typically, it is a
consequence that is in some way pleasurable to the organism, relating back to Thorndike’s law
of effect.
Reinforcement is the heart of operant conditioning.
POSITIVE AND NEGATIVE REINFORCEMENT:
Positive reinforcement occurs when a response is followed by the addition of something
pleasurable (such as food or praise), making that response more likely to be repeated.
Pain can be a reinforcer too if it is removed. If a person’s behavior gets pain to stop,
the person is more likely to do the same thing again. Thus, following a response with
the removal of, or escape from, something unpleasant will also increase the likelihood of
that response being repeated. This process is called negative reinforcement.
SCHEDULES OF REINFORCEMENT:
The timing of reinforcement can make a tremendous difference in the speed at which learning
occurs and the strength of the learned response. However, Skinner (1956) found that reinforcing
every response was not necessarily the best schedule of reinforcement.
Partial reinforcement effect: Responses that are reinforced after some, but not all, correct
responses will be more resistant to extinction than responses that receive continuous
reinforcement (a reinforcer for each and every correct response).
It may be easier to teach a new behavior using continuous reinforcement, but partially reinforced
behavior is not only more difficult to suppress but also more like real life. In the real world, people
get reinforced partially for their work.
There can be different patterns or schedules of partial reinforcement.
• When the timing of the response is what matters, it is called an interval schedule.
• When it is the number of responses that is important, the schedule is called ratio schedule
because a certain number of responses is required for each reinforcer.
• The other way in which schedules can differ is in whether the number of responses or interval of
time is fixed or variable.
FIXED INTERVAL SCHEDULE OF REINFORCEMENT:
A reinforcer is received after a certain fixed interval of time has
passed. If Professor Conner were teaching a rat to press a lever
to get food pellets, she might require the rat to push the lever at
least once within a 2-minute time span to get a pellet. It
wouldn’t matter how many times the rat pushed the bar; it
would only get a pellet at the end of the interval if it had
pressed the bar at least once. It is the first correct response that
gets reinforced at the end of the interval.
Such schedules do not produce a fast rate of responding. The
response rate goes up just before the reinforcer and then drops
off immediately after, until it is almost time for the next food
pellet. It’s similar to the way in which factory workers speed up
production just before payday and slow down just after payday.
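As a minimal sketch (the function name, press times, and 120-second interval are illustrative assumptions, not from the text), the fixed interval rule above can be simulated: only the first press after the interval has elapsed earns a pellet, and extra presses in between earn nothing.

```python
def fixed_interval_session(press_times, interval=120):
    """Simulate a fixed-interval (FI) schedule of reinforcement.

    press_times: times (in seconds) at which the rat presses the lever.
    interval:    seconds that must elapse before a press can be reinforced.
    Returns the number of pellets (reinforcers) delivered.
    """
    pellets = 0
    next_available = interval              # reinforcement first available after one interval
    for t in sorted(press_times):
        if t >= next_available:
            pellets += 1                   # first press after the interval is reinforced
            next_available = t + interval  # the interval clock restarts after each pellet
        # presses before next_available earn nothing, no matter how many occur
    return pellets

# Pressing many times early does not help; only the timing of presses matters.
print(fixed_interval_session([10, 20, 30, 130, 140, 260, 500]))  # prints 3
```

Because extra presses within an interval are wasted, this rule is consistent with the slow response rate described above: responding pays off only near the end of each interval.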
VARIABLE INTERVAL SCHEDULE:
In a variable interval schedule, the interval of time after which a response is reinforced changes
unpredictably from one reinforcer to the next. Because the learner cannot predict when reinforcement
will become available, responding tends to be slow but steady, without the pauses seen in fixed
interval schedules.
PUNISHMENT:
• Punishment is actually the opposite of reinforcement. It is any event or stimulus that, when
following a response, causes that response to be less likely to happen again. It weakens
responses, whereas reinforcement, both positive and negative, strengthens responses.
• There are two ways in which a punishment can happen:
• Punishment by application: It occurs when something unpleasant is added to the
situation or applied. This is the kind of punishment that many child development specialists
strongly recommend parents avoid using with their children because it can escalate into abuse.
• Punishment by removal: Behavior is punished by the removal of something pleasurable or
desired after the behavior occurs. Grounding a teenager is removing the freedom to do what
the teenager wants to do and is an example of this kind of punishment.
SHAPING:
Shaping, or behavior-shaping, is a variant of operant conditioning. Instead of waiting for a subject to exhibit a
desired behavior, any behavior leading to the target behavior is rewarded. For example, Skinner discovered
that, in order to train a rat to push a lever, any movement in the direction of the lever had to be rewarded, until
finally, the rat was trained to push a lever. Once the target behavior is reached, however, no other behavior is
rewarded. In other words, the subject’s behavior is shaped, or molded, into the desired form.
• For example, if Jody wanted to train his dog to jump through a hoop, he would have to start with some
behavior that the dog is already capable of doing on its own. Then he would gradually “mold” that starting
behavior into the jump, something the dog is capable of doing but not likely to do on its own. Jody would
have to start with the hoop on the ground in front of Rover’s face and then call the dog through the hoop,
using a treat as bait. After Rover steps through the hoop (as the shortest way to the treat), Jody should
give Rover the treat (positive reinforcement). Then he could raise the hoop just a little, reward him for
walking through it again, raise the hoop, reward him, and so on, until Rover is jumping through the hoop
to get the treat. The goal is achieved by reinforcing each successive approximation (small steps, one after
the other, that get closer and closer to the goal). Through pairing a sound such as a whistle or clicker with
the primary reinforcer of food, animal trainers can use the sound as a secondary reinforcer and avoid
having an overfed learner.
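The successive-approximation loop in Jody’s example can be sketched in code. This is only an illustration under stated assumptions: the names `subject`, `target`, and `step` are hypothetical, and the sketch assumes the subject eventually meets each criterion (as Rover does when lured by the treat).

```python
def shape(subject, target=1.0, step=0.25):
    """Shape behavior by successive approximations.

    subject: callable mapping the current criterion (e.g., hoop height)
             to the behavior the subject actually produces.
    Returns the list of criteria at which reinforcement was delivered.
    """
    criterion = 0.0                           # start with the hoop on the ground
    reinforced_at = []
    while criterion <= target:
        achieved = subject(criterion)
        if achieved >= criterion:             # response meets the current approximation
            reinforced_at.append(criterion)   # deliver the treat (positive reinforcement)
            criterion += step                 # raise the hoop a little
        # otherwise, keep the criterion where it is and let the subject try again
    return reinforced_at

# A cooperative dog that always clears the current height:
print(shape(lambda height: height))  # prints [0.0, 0.25, 0.5, 0.75, 1.0]
```

Each reinforced step is one successive approximation; once the target is reached, the loop stops and only the target behavior would continue to be rewarded.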
Thank you!!