
AI-832 Reinforcement Learning

Instructor: Dr. Zuhair Zafar

Lecture # 11 & 12: Model-Free Prediction


Recap

• Synchronous vs. Asynchronous Dynamic Programming

• Ideas for asynchronous DP?


Today’s Agenda

• Incremental Monte-Carlo Learning

• Temporal-Difference Learning

• Advantages and Disadvantages of MC vs. TD


Model-Free Reinforcement Learning

• Previous lectures:
  • Planning by dynamic programming
  • Solve a known MDP
• This week's lectures:
  • Model-free prediction
  • Estimate the value function of an unknown MDP
• Subsequent lectures:
  • Model-free control
  • Optimize the value function of an unknown MDP

Monte-Carlo Reinforcement Learning

Monte-Carlo Policy Evaluation
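
For reference, the standard quantities that MC policy evaluation works with, where γ is the discount factor and T the (random) termination time:

$$G_t = R_{t+1} + \gamma R_{t+2} + \dots + \gamma^{T-t-1} R_T, \qquad v_\pi(s) = \mathbb{E}_\pi\!\left[G_t \mid S_t = s\right]$$

Monte-Carlo policy evaluation uses the empirical mean return in place of this expected return: it learns directly from complete episodes, needs no knowledge of the MDP's transitions or rewards, and therefore applies only to episodic (terminating) MDPs.
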
First-Visit Monte-Carlo Policy Evaluation

Every-Visit Monte-Carlo Policy Evaluation
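
The difference between the two, stated compactly in the standard tabular form with a visit counter N(s) and return sum S(s):

$$N(s) \leftarrow N(s) + 1, \qquad S(s) \leftarrow S(s) + G_t, \qquad V(s) = S(s)/N(s)$$

First-visit MC applies this update only at the first time step at which s is visited within each episode; every-visit MC applies it at every visit. In both cases V(s) → v_π(s) as N(s) → ∞, by the law of large numbers.
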
Blackjack Example

Blackjack Value Function after Monte-Carlo Learning

Incremental Mean

The mean of a sequence x_1, x_2, … can be computed incrementally:

$$\mu_k \;=\; \frac{1}{k}\sum_{j=1}^{k} x_j \;=\; \mu_{k-1} + \frac{1}{k}\left(x_k - \mu_{k-1}\right)$$

The new mean, μ_k, is the old estimate, μ_{k−1}, plus a step size, 1/k, times the error term (x_k − μ_{k−1}): a small increment towards the difference between the new element and what we thought the mean was.
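
A quick check in Python that the incremental form reproduces the batch mean (illustrative values only):

```python
xs = [3.0, -1.0, 4.0, 1.5]          # arbitrary sample values
mu = 0.0
for k, x in enumerate(xs, start=1):
    mu += (x - mu) / k              # mu_k = mu_{k-1} + (1/k)(x_k - mu_{k-1})
assert abs(mu - sum(xs) / len(xs)) < 1e-12   # matches the batch mean
```
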
Incremental Monte-Carlo Updates
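
Applying the same idea to returns gives the standard incremental form: after each episode, for every state S_t with return G_t,

$$N(S_t) \leftarrow N(S_t) + 1, \qquad V(S_t) \leftarrow V(S_t) + \frac{1}{N(S_t)}\left(G_t - V(S_t)\right)$$

In non-stationary problems it can be useful to track a running mean instead, i.e. forget old episodes, by replacing 1/N(S_t) with a constant step size α:

$$V(S_t) \leftarrow V(S_t) + \alpha\left(G_t - V(S_t)\right)$$
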
Example of Monte Carlo

Example episodes

• Episode 1: S1, a1, +3, S1, a1, +2, S2, a2, −4, S1, a1, +4, S2, a2, −3, terminate

• Episode 2: S2, a2, −2, S1, a1, +3, S2, a2, −3, terminate

Compute V(S1) and V(S2) using the first-visit and every-visit Monte-Carlo methods.
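
A minimal sketch of the computation, assuming an undiscounted return (γ = 1) and reading each triple as "state, action, reward received on that step":

```python
from collections import defaultdict

# Each episode as a list of (state, reward) pairs in time order.
episodes = [
    [("S1", +3), ("S1", +2), ("S2", -4), ("S1", +4), ("S2", -3)],
    [("S2", -2), ("S1", +3), ("S2", -3)],
]

first_returns = defaultdict(list)   # returns recorded at first visits only
every_returns = defaultdict(list)   # returns recorded at every visit

for episode in episodes:
    rewards = [r for _, r in episode]
    seen = set()
    for t, (state, _) in enumerate(episode):
        g = sum(rewards[t:])                 # G_t with gamma = 1
        every_returns[state].append(g)
        if state not in seen:                # first visit in this episode
            seen.add(state)
            first_returns[state].append(g)

for s in ("S1", "S2"):
    fv = sum(first_returns[s]) / len(first_returns[s])
    ev = sum(every_returns[s]) / len(every_returns[s])
    print(f"{s}: first-visit V = {fv:+.2f}, every-visit V = {ev:+.2f}")

# S1: first-visit V = +1.00, every-visit V = +0.50
# S2: first-visit V = -2.50, every-visit V = -2.75
```
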
Today’s Agenda

• Incremental Monte-Carlo Learning

• Temporal-Difference Learning

• Advantages and Disadvantages of MC vs. TD


Temporal Difference Learning

Substituting the remainder of the trajectory with our current estimate of what will happen from that point onwards is called bootstrapping: we update our guess of the value function towards a subsequent guess.
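
Concretely, the simplest temporal-difference algorithm, TD(0), updates V(S_t) towards the estimated return rather than waiting for the actual one:

$$V(S_t) \leftarrow V(S_t) + \alpha\left(R_{t+1} + \gamma V(S_{t+1}) - V(S_t)\right)$$

Here R_{t+1} + γV(S_{t+1}) is called the TD target, and δ_t = R_{t+1} + γV(S_{t+1}) − V(S_t) the TD error.
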
Monte-Carlo and Temporal Difference Learning
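
A sketch of the two tabular updates side by side (hypothetical helper functions, assuming an episodic task, a fixed policy, and a dict-like value table V):

```python
def mc_update(V, episode, alpha, gamma=1.0):
    """Every-visit MC with constant step size: update each visited state
    towards its actual return G_t. `episode` is a list of (state, reward)
    pairs; the update can only run after the episode terminates."""
    g = 0.0
    for state, reward in reversed(episode):   # accumulate returns backwards
        g = reward + gamma * g                # G_t = R_{t+1} + gamma * G_{t+1}
        V[state] += alpha * (g - V[state])    # target: the actual return

def td0_update(V, state, reward, next_state, alpha, gamma=1.0):
    """TD(0): update online, one transition at a time, towards the
    bootstrapped target R_{t+1} + gamma * V(S_{t+1})."""
    td_target = reward + gamma * V[next_state]
    V[state] += alpha * (td_target - V[state])
```

The practical difference: mc_update must wait for termination and so works only in episodic environments, while td0_update learns after every step, from incomplete episodes and in continuing environments.
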
Driving Home Example

Driving Home Example: MC vs. TD

Today’s Agenda

• Incremental Monte-Carlo Learning

• Temporal-Difference Learning

• Advantages and Disadvantages of MC vs. TD


Advantages and Disadvantages of MC vs. TD

Bias/Variance Trade-Off
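
In short: the return G_t is an unbiased estimate of v_π(S_t) but has high variance, since it depends on many random actions, transitions, and rewards over the rest of the episode; the TD target R_{t+1} + γV(S_{t+1}) is a biased estimate (it leans on the current guess V) but has much lower variance, since it depends on only one random action, transition, and reward.
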
Advantages and Disadvantages of MC vs. TD

Random Walk Example

Random Walk: MC vs. TD
