Fundamentals of Reinforcement Learning: Learning Objectives
Module 00: Welcome to the Course
Understand the prerequisites, goals and roadmap for the course.
Module 01: The K-Armed Bandit Problem
Lesson 1: The K-Armed Bandit Problem
Define reward
Understand the temporal nature of the bandit problem
Define k-armed bandit
Define action-values
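For reference, in the standard notation (which Sutton and Barto use, and I assume the course follows), the value of an action is the expected reward given that the action is selected; the k-armed bandit problem is to maximize reward when these values are unknown:

    q_*(a) \doteq \mathbb{E}[R_t \mid A_t = a]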
Lesson 2: What to Learn? Estimating Action Values
Define action-value estimation methods
Define exploration and exploitation
Select actions greedily using an action-value function
Define online learning
Understand a simple online sample-average action-value estimation method
Define the general online update equation
Understand why we might use a constant stepsize in the case of non-stationarity
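A minimal Python sketch of the general online update, NewEstimate <- OldEstimate + StepSize * (Target - OldEstimate); the function name is illustrative, not from the course. With step size 1/n this is exactly the sample average; a constant step size weights recent rewards more heavily, which helps when the reward distribution drifts (non-stationarity):

    def online_update(q, n, reward, step_size=None):
        """One general online update of an action-value estimate q.
        step_size=None gives the sample-average rule (alpha = 1/n);
        a constant step_size (e.g. 0.1) suits non-stationary problems."""
        alpha = (1.0 / n) if step_size is None else step_size
        return q + alpha * (reward - q)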
Lesson 3: Exploration vs. Exploitation Tradeoff
Define epsilon-greedy
Compare the short-term benefits of exploitation and the long-term benefits of exploration
Understand optimistic initial values
Describe the benefits of optimistic initial values for early exploration
Explain the criticisms of optimistic initial values
Describe the upper confidence bound action selection method
Define optimism in the face of uncertainty
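The two selection rules above can be sketched in a few lines of Python (a sketch under the usual bandit assumptions; names and the exploration constant c are illustrative, and q_values and counts are NumPy arrays):

    import numpy as np

    rng = np.random.default_rng(0)

    def epsilon_greedy(q_values, epsilon=0.1):
        """Explore uniformly with probability epsilon; otherwise exploit
        the current action-value estimates, breaking ties randomly."""
        if rng.random() < epsilon:
            return int(rng.integers(len(q_values)))
        return int(rng.choice(np.flatnonzero(q_values == q_values.max())))

    def ucb(q_values, counts, t, c=2.0):
        """Upper confidence bound selection: add an uncertainty bonus to
        each estimate (optimism in the face of uncertainty). Actions
        never tried so far are selected first."""
        if np.any(counts == 0):
            return int(np.argmin(counts))
        return int(np.argmax(q_values + c * np.sqrt(np.log(t) / counts)))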
Module 02: Markov Decision Processes
Lesson 1: Introduction to Markov Decision Processes
Understand Markov Decision Processes, or MDPs
Describe how the dynamics of an MDP are defined
Understand the graphical representation of a Markov Decision Process
Explain how many diverse processes can be written in terms of the MDP framework (as illustrated below)
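In the standard notation, the dynamics of a finite MDP are defined by a single function p giving the joint probability of the next state and reward:

    p(s', r \mid s, a) \doteq \Pr\{S_t = s', R_t = r \mid S_{t-1} = s, A_{t-1} = a\}

Any process whose state summarizes everything relevant to its future (the Markov property) can be written in this framework.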
Lesson 2: Goal of Reinforcement Learning
Describe how rewards relate to the goal of an agent
Understand episodes and identify episodic tasks
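For an episodic task ending at a final time step T, the agent's goal is to maximize the expected return, the sum of rewards from t onward:

    G_t \doteq R_{t+1} + R_{t+2} + \cdots + R_T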
Lesson 3: Continuing Tasks
Formulate returns for continuing tasks using discounting
Describe how returns at successive time steps are related to each other
Understand when to formalize a task as episodic or continuing
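With discount rate \gamma (strictly less than 1 for continuing tasks, so the infinite sum is finite for bounded rewards), the return and its recursive relation between successive time steps are:

    G_t \doteq \sum_{k=0}^{\infty} \gamma^k R_{t+k+1} = R_{t+1} + \gamma G_{t+1}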
Module 03: Value Functions & Bellman Equations
Lesson 1: Policies and Value Functions
Recognize that a policy is a distribution over actions for each possible state
Describe the similarities and differences between stochastic and deterministic policies
Identify the characteristics of a well-defined policy
Generate examples of valid policies for a given MDP
Describe the roles of state-value and action-value functions in reinforcement learning
Describe the relationship between value functions and policies
Create examples of valid value functions for a given MDP
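In the standard notation, a policy \pi(a \mid s) gives the probability of taking action a in state s, and the two value functions are expected returns under that policy:

    v_\pi(s) \doteq \mathbb{E}_\pi[G_t \mid S_t = s], \qquad
    q_\pi(s, a) \doteq \mathbb{E}_\pi[G_t \mid S_t = s, A_t = a]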
Lesson 2: Bellman Equations
Derive the Bellman equation for state-value functions
Derive the Bellman equation for action-value functions
Understand how Bellman equations relate current and future values
Use the Bellman equations to compute value functions
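The Bellman equation for state values relates the value of a state to the values of its possible successors, one expected reward step ahead:

    v_\pi(s) = \sum_a \pi(a \mid s) \sum_{s', r} p(s', r \mid s, a) \bigl[ r + \gamma v_\pi(s') \bigr]

The action-value version is analogous, conditioning on the first action and following \pi thereafter.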
Lesson 3: Optimality (Optimal Policies & Value Functions)
Define an optimal policy
Understand how a policy can be at least as good as every other policy in every state
Identify an optimal policy for given MDPs
Derive the Bellman optimality equation for state-value functions
Derive the Bellman optimality equation for action-value functions
Understand how the Bellman optimality equations relate to the previously introduced Bellman equations
Understand the connection between the optimal value function and optimal policies
Verify the optimal value function for given MDPs
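The Bellman optimality equations replace the expectation over the policy's actions with a max, coupling the optimal value function to itself:

    v_*(s) = \max_a \sum_{s', r} p(s', r \mid s, a) \bigl[ r + \gamma v_*(s') \bigr]

    q_*(s, a) = \sum_{s', r} p(s', r \mid s, a) \bigl[ r + \gamma \max_{a'} q_*(s', a') \bigr]

Any policy that is greedy with respect to v_* is optimal.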
Module 04: Dynamic Programming
Lesson 1: Policy Evaluation (Prediction)
Understand the distinction between policy evaluation and control
Explain the setting in which dynamic programming can be applied, as well as its limitations
Outline the iterative policy evaluation algorithm for estimating state values under a given policy
Apply iterative policy evaluation to compute value functions
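A minimal Python sketch of iterative policy evaluation for a small finite MDP; the array layout (P[s, a, s'] for transition probabilities, R[s, a] for expected rewards, policy[s, a] for action probabilities) is an assumption of this sketch, not the course's notation:

    import numpy as np

    def policy_evaluation(P, R, policy, gamma=0.9, theta=1e-8):
        """Sweep over states, replacing each value with its Bellman
        expectation backup, until the largest change is below theta."""
        n_states, n_actions = R.shape
        V = np.zeros(n_states)
        while True:
            delta = 0.0
            for s in range(n_states):
                v = sum(policy[s, a] * (R[s, a] + gamma * P[s, a] @ V)
                        for a in range(n_actions))
                delta = max(delta, abs(v - V[s]))
                V[s] = v
            if delta < theta:
                return V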
Lesson 2: Policy Iteration (Control)
Understand the policy improvement theorem
Use a value function for a policy to produce a better policy for a given MDP
Outline the policy iteration algorithm for finding the optimal policy
Understand “the dance of policy and value”
Apply policy iteration to compute optimal policies and optimal value functions
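Continuing the sketch above (same assumed array layout), policy improvement acts greedily with respect to the current value function, and policy iteration alternates the two steps until the policy stops changing:

    import numpy as np

    def greedy_policy(P, R, V, gamma=0.9):
        """Policy improvement: by the policy improvement theorem, acting
        greedily with respect to V yields a policy at least as good."""
        q = R + gamma * np.einsum('sat,t->sa', P, V)   # q[s, a]
        policy = np.zeros_like(R)
        policy[np.arange(R.shape[0]), q.argmax(axis=1)] = 1.0
        return policy

    def policy_iteration(P, R, gamma=0.9):
        """'The dance of policy and value': evaluate, improve, repeat.
        Reuses policy_evaluation from the previous sketch."""
        policy = np.full_like(R, 1.0 / R.shape[1])     # uniform random start
        while True:
            V = policy_evaluation(P, R, policy, gamma)
            improved = greedy_policy(P, R, V, gamma)
            if np.array_equal(improved, policy):
                return policy, V
            policy = improved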
Lesson 3: Generalized Policy Iteration
Understand the framework of generalized policy iteration
Outline value iteration, an important example of generalized policy iteration
Understand the distinction between synchronous and asynchronous dynamic programming methods
Describe brute-force search as an alternative method for searching for an optimal policy
Describe Monte Carlo as an alternative method for learning a value function
Understand the advantage of dynamic programming and “bootstrapping” over these alternative strategies for finding the optimal policy (see the sketch below)
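Value iteration, as a concrete instance of generalized policy iteration, truncates evaluation to a single Bellman optimality backup per state. A sketch under the same assumed array layout as above:

    import numpy as np

    def value_iteration(P, R, gamma=0.9, theta=1e-8):
        """Repeatedly apply the Bellman optimality backup (max over
        actions) to every state. Far cheaper than brute-force search
        over all policies, and unlike Monte Carlo it bootstraps: each
        new value estimate is built from the current estimates."""
        V = np.zeros(R.shape[0])
        while True:
            q = R + gamma * np.einsum('sat,t->sa', P, V)   # q[s, a]
            V_new = q.max(axis=1)
            if np.max(np.abs(V_new - V)) < theta:
                return V_new, q.argmax(axis=1)   # optimal values, greedy policy
            V = V_new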