ITEC327-W11-Asynchronous
Artificial Intelligence and Machine Learning
Reinforcement Learning
Review of Last Week
• Data Representation
• Stacked (Deep) Autoencoders
• Generative Adversarial Networks (GANs)
• Generator: generates data that looks similar to the training data
• Discriminator: tells real data from fake data
Key concepts in RL
agent – observations – actions – environment – rewards
Robotics
• agent: the program controlling a robot
• environment: the real world
• observations: the agent observes the environment through a set of sensors
• actions: sending signals to activate the robot's motors
• positive rewards: whenever the robot approaches the target destination
• negative rewards: whenever it wastes time or goes in the wrong direction
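The agent–observation–action–reward loop above can be sketched in code. The following is a hypothetical toy version of the robotics example (the environment, the target at position +5, and all names such as `ToyEnv` and `greedy_agent` are illustrative assumptions, not part of the course material):

```python
class ToyEnv:
    """A hypothetical 1-D world: the agent starts at position 0 and
    earns a positive reward for moving toward the target at +5,
    a negative reward otherwise (mirrors the robotics example)."""
    def __init__(self):
        self.position = 0

    def reset(self):
        self.position = 0
        return self.position          # the initial observation

    def step(self, action):
        # action: +1 (move right) or -1 (move left)
        self.position += action
        reward = 1.0 if action == +1 else -1.0   # target is to the right
        done = self.position >= 5                # reached the target
        return self.position, reward, done

def greedy_agent(observation):
    # A trivial policy for illustration: always move toward the target.
    return +1

env = ToyEnv()
obs, total_reward, done = env.reset(), 0.0, False
while not done:
    action = greedy_agent(obs)            # agent chooses an action
    obs, reward, done = env.step(action)  # environment responds
    total_reward += reward                # rewards accumulate over the episode
print(total_reward)  # → 5.0 (five rewarded steps to reach the target)
```

The same four-part loop (observe, act, receive a reward, repeat) underlies every example in this week's workshop, including Ms Pac-Man below.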
Ms Pac-Man
• agent: the program controlling Ms. Pac-Man
• environment: a simulation of the Atari game
• actions: 9 possible joystick positions (upper left, down, centre,…)
• observations: screenshots
• rewards: the game points
CartPole
• The cart is now moving toward the right (obs[1] > 0). The pole is still
tilted toward the right (obs[2] > 0), but its angular velocity is now
negative (obs[3] < 0), so it will likely be tilted toward the left after
the next step.
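This reasoning uses CartPole's four-element observation vector: [cart position, cart velocity, pole angle, pole angular velocity]. A minimal hand-coded policy sketch based on it (the function name `basic_policy` and the sample observation values are illustrative):

```python
def basic_policy(obs):
    # obs = [cart position, cart velocity, pole angle, pole angular velocity]
    # Push left (action 0) if the pole tilts left, right (action 1) otherwise.
    angle = obs[2]
    return 0 if angle < 0 else 1

# The situation described above: cart moving right (obs[1] > 0),
# pole tilted right (obs[2] > 0), angular velocity negative (obs[3] < 0).
obs = [0.02, 0.20, 0.05, -0.15]
print(basic_policy(obs))  # → 1 (pole tilts right, so push right)
```

Such a hard-coded policy keeps the pole up only briefly; it motivates the policy-search and policy-gradient methods discussed next.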
• Solution:
Run many episodes and normalize all the action returns (by subtracting
the mean and dividing by the standard deviation).
Then, actions with a negative advantage are considered bad, while actions
with a positive advantage are considered good.
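The normalization step above can be sketched with NumPy. This is a minimal sketch, not the workshop's exact code: the function names `discount_rewards` and `normalize_returns` and the discount factor gamma=0.97 are assumptions for illustration.

```python
import numpy as np

def discount_rewards(rewards, gamma=0.97):
    """Compute the discounted sum of future rewards at each step."""
    discounted = np.array(rewards, dtype=np.float64)
    for step in range(len(rewards) - 2, -1, -1):
        discounted[step] += gamma * discounted[step + 1]
    return discounted

def normalize_returns(all_rewards, gamma=0.97):
    """Discount each episode's rewards, then normalize across ALL episodes
    by subtracting the global mean and dividing by the global std."""
    all_discounted = [discount_rewards(r, gamma) for r in all_rewards]
    flat = np.concatenate(all_discounted)
    mean, std = flat.mean(), flat.std()
    return [(d - mean) / std for d in all_discounted]

# Two short episodes; after normalization, below-average returns come out
# negative (a "bad" advantage) and above-average returns come out positive.
normalized = normalize_returns([[1.0, 0.0, -5.0], [0.0, 1.0]])
```

Normalizing across all episodes (rather than per episode) is what makes the advantages comparable: an action is judged against the average return over many runs, not just within its own episode.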
Self-test
• Key Concepts:
• Environment / agent / rewards / actions
• TF-Agents Library
• Policy search
After workshop
• Review today’s workshop, including the slides, Jupyter notebook, and
textbook (Chapter 17)
• Go through TensorFlow’s tutorials on Reinforcement Learning and DQN