Methods for solving RL problems
- Temporal difference (TD) methods (Sutton, 1988), e.g., Q-learning (Watkins, 1989)
- TD methods: Sarsa (Rummery and Niranjan, 1994; Singh and Sutton, 1996)
- Policy search (Williams, 1992; Baxter and Bartlett, 2001)
- Direct policy search (Ng and Jordan, 2000)
- Dynamic programming (Bellman, 1957)
- Model-based or model-learning methods (Moore and Atkeson, 1993; Kearns and Singh, 1998)
- Bayesian RL (Dearden et al., 1999)
- Instance-based methods (Ormoneit and Sen, 2002)
- Relational reinforcement learning (RRL) (Dzeroski et al., 2001)
- Batch learning methods, e.g., Least-Squares Policy Iteration (Lagoudakis and Parr, 2003) and Fitted Q Iteration (Ernst et al., 2005)
- Function approximators for representing Q and π over large or infinite (e.g., continuous) state spaces
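Q-learning and Sarsa, the two TD methods listed above, differ only in the bootstrap target: Q-learning backs up the greedy value max_a' Q(s', a'), while Sarsa backs up Q(s', a') for the action a' the behavior policy actually takes next. The following is a minimal tabular sketch of both updates. It assumes a hypothetical 5-state chain environment (`step`, `N_STATES`), ε-greedy exploration, and illustrative hyperparameters; none of these come from the cited papers.

```python
import random

# Hypothetical toy environment (illustration only): a 5-state chain,
# start in state 0, reward 1.0 on reaching state 4, which ends the episode.
N_STATES = 5
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(s, a):
    """Return (next_state, reward, terminal) for the toy chain."""
    s_next = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
    terminal = s_next == N_STATES - 1
    return s_next, (1.0 if terminal else 0.0), terminal


def epsilon_greedy(Q, s, eps, rng):
    """Random action with probability eps, else a greedy action (ties broken randomly)."""
    if rng.random() < eps:
        return rng.choice(ACTIONS)
    best = max(Q[s])
    return rng.choice([a for a in ACTIONS if Q[s][a] == best])


def q_learning_update(Q, s, a, r, s_next, terminal, alpha, gamma):
    """Off-policy TD target: bootstrap from the greedy value in s_next."""
    target = r if terminal else r + gamma * max(Q[s_next])
    Q[s][a] += alpha * (target - Q[s][a])


def sarsa_update(Q, s, a, r, s_next, a_next, terminal, alpha, gamma):
    """On-policy TD target: bootstrap from the action actually chosen in s_next."""
    target = r if terminal else r + gamma * Q[s_next][a_next]
    Q[s][a] += alpha * (target - Q[s][a])


def train(method, episodes=500, alpha=0.1, gamma=0.9, eps=0.1, seed=0):
    """Run tabular Q-learning or Sarsa on the toy chain and return the Q-table."""
    rng = random.Random(seed)
    Q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        s = 0
        a = epsilon_greedy(Q, s, eps, rng)
        terminal = False
        while not terminal:
            s_next, r, terminal = step(s, a)
            # Both methods act epsilon-greedily; only the update target differs.
            a_next = epsilon_greedy(Q, s_next, eps, rng)
            if method == "q_learning":
                q_learning_update(Q, s, a, r, s_next, terminal, alpha, gamma)
            elif method == "sarsa":
                sarsa_update(Q, s, a, r, s_next, a_next, terminal, alpha, gamma)
            else:
                raise ValueError(f"unknown method: {method}")
            s, a = s_next, a_next
    return Q
```

For example, taking `max(ACTIONS, key=lambda a: Q[s][a])` for each non-terminal state of `Q = train("sarsa")` reads off the learned greedy policy; on this chain both methods should come to prefer action 1 (move right) everywhere. Sarsa's values are those of the ε-greedy policy it follows, whereas Q-learning's values estimate those of the greedy policy.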