Lectures 11-12 - Model-Free Prediction, Monte-Carlo Learning, Temporal-Difference Learning
Today's Agenda
• Monte-Carlo Learning
• Temporal-Difference Learning
• Previous lectures:
• Planning by dynamic programming
• Solve a known MDP
• This week's lectures:
• Model-free prediction
• Estimate the value function of an unknown MDP
• Subsequent lectures:
• Model-free control
• Optimize the value function of an unknown MDP
Monte-Carlo Reinforcement Learning
Monte-Carlo Policy Evaluation
First Visit Monte-Carlo Policy Evaluation
Every-Visit Monte Carlo Policy Evaluation
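As a concrete reference for the two variants, here is a minimal sketch in Python. The episode representation as lists of (state, reward) pairs, the function name, and the gamma parameter are illustrative assumptions, not notation from the lecture:

```python
from collections import defaultdict

def mc_policy_evaluation(episodes, gamma=1.0, first_visit=True):
    """Estimate V(s) as the mean return following visits to s.

    episodes: list of trajectories, each a list of (state, reward) pairs,
    where reward is the reward received after leaving that state.
    """
    returns_sum = defaultdict(float)
    returns_count = defaultdict(int)

    for episode in episodes:
        # Scan backwards so each return G_t is built incrementally.
        G = 0.0
        step_returns = []
        for state, reward in reversed(episode):
            G = reward + gamma * G
            step_returns.append((state, G))
        step_returns.reverse()  # step_returns[t] == (S_t, G_t)

        visited = set()
        for state, G in step_returns:
            if first_visit and state in visited:
                continue  # first-visit: only the first occurrence counts
            visited.add(state)
            returns_sum[state] += G
            returns_count[state] += 1

    return {s: returns_sum[s] / returns_count[s] for s in returns_sum}
```

The only difference between the two variants is the `visited` check: every-visit MC averages the return over every occurrence of a state within an episode, first-visit MC over the first occurrence only.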
Blackjack Example
Blackjack Value Function after Monte-Carlo Learning
Incremental Mean
The new mean, μ_k, is the old mean, μ_{k-1}, plus a step of size 1/k (a little increment) towards the difference between the new sample, x_k, and what you thought the mean was:

μ_k = μ_{k-1} + (1/k)(x_k − μ_{k-1})

Here μ_{k-1} is the old estimate and (x_k − μ_{k-1}) is the error term.
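A quick check of this rule with made-up numbers: for the sequence x = (3, 7, 8),

μ_1 = 3
μ_2 = 3 + (1/2)(7 − 3) = 5
μ_3 = 5 + (1/3)(8 − 5) = 6

which matches the batch mean (3 + 7 + 8)/3 = 6.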
Incremental Monte-Carlo Updates
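A sketch of the incremental form, assuming the same (state, reward) episode representation as above. It is exactly the incremental-mean rule with the return G_t as the new sample; a constant step size alpha in place of 1/N(s) is the usual choice for non-stationary problems:

```python
def incremental_mc_update(V, N, episode, gamma=1.0, alpha=None):
    """Update value table V in place after one complete episode.

    V: dict state -> value estimate, N: dict state -> visit count.
    If alpha is None, use the running-mean step size 1/N(s);
    pass a constant alpha for non-stationary problems.
    """
    G = 0.0
    # Walk the episode backwards so returns accumulate incrementally.
    for state, reward in reversed(episode):
        G = reward + gamma * G
        N[state] = N.get(state, 0) + 1
        step = alpha if alpha is not None else 1.0 / N[state]
        # Move the old estimate towards the observed return (the error term).
        V[state] = V.get(state, 0.0) + step * (G - V[state])
```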
Example of Monte Carlo
Example episodes
Compute V(S1) and V(S2) using first-visit and every-visit Monte-Carlo methods.
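With a hypothetical pair of episodes (gamma = 1, invented here for illustration rather than taken from the slide), the two variants disagree exactly when a state recurs within an episode. Using the mc_policy_evaluation sketch from above:

```python
# Hypothetical episodes:
# Episode 1: S1 (r=2) -> S2 (r=1) -> S1 (r=3) -> terminal
# Episode 2: S2 (r=4) -> terminal
ep1 = [("S1", 2), ("S2", 1), ("S1", 3)]
ep2 = [("S2", 4)]

print(mc_policy_evaluation([ep1, ep2], first_visit=True))
# First-visit:  V(S1) = 6.0 (return from the first S1 visit only),
#               V(S2) = (4 + 4) / 2 = 4.0
print(mc_policy_evaluation([ep1, ep2], first_visit=False))
# Every-visit:  V(S1) = (6 + 3) / 2 = 4.5, V(S2) = 4.0
```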
Today’s Agenda
• Temporal-Difference Learning
Substituting the remainder of the trajectory with our estimate of what will happen from that point onwards is called bootstrapping: we update our guess of the value function towards a subsequent guess.
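As a concrete illustration of bootstrapping, a minimal TD(0) update sketch under the same dict-based conventions as above (alpha and gamma are assumed hyperparameters; terminal states implicitly have value 0 because they never appear as keys):

```python
def td0_update(V, state, reward, next_state, alpha=0.1, gamma=1.0):
    """One TD(0) step: move V(S_t) towards the TD target
    R_{t+1} + gamma * V(S_{t+1}) instead of the full return G_t."""
    td_target = reward + gamma * V.get(next_state, 0.0)
    td_error = td_target - V.get(state, 0.0)  # the TD error, delta_t
    V[state] = V.get(state, 0.0) + alpha * td_error

# Applied online, after every transition, without waiting for the episode to end:
V = {}
for state, reward, next_state in [("A", 0, "B"), ("B", 1, "terminal")]:
    td0_update(V, state, reward, next_state)
```

Unlike the Monte-Carlo updates, this learns from incomplete episodes, which is what the driving-home example below contrasts.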
Monte-Carlo and Temporal-Difference Learning
Driving Home Example
Driving Home Example: MC vs. TD