
Reinforcement Learning - Theoretical Foundations: Part I

Finally the most intriguing part (to me)

2021.01.04 · 5 min read · by Zhenlin Wang · updated 2022-08-19

Introduction

Recently I’ve been learning about reinforcement learning from David Silver’s amazing lectures. They give an overview of the classical algorithms in RL and of the open challenges for future research. In this and the subsequent blogs, I’ll talk about the major aspects of RL and provide some solid math details on how RL algorithms are executed.

What is Reinforcement Learning

Reinforcement learning is built around a classical dilemma: exploration vs exploitation. There is no supervisor, only a reward signal. Feedback is often delayed, not instantaneous. Time really matters (the data is sequential, not i.i.d.), and the agent’s actions affect the subsequent data it receives.

An RL agent may include one or more of these components: a policy (the agent’s behaviour function), a value function (a prediction of how good each state and/or action is), and a model (the agent’s representation of the environment). A rough sketch of how these pieces fit together follows.
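Here is a minimal, hypothetical sketch of those three components in code; the class and attribute names are illustrative only, not from the lectures:

```python
import random

class Agent:
    def __init__(self, actions):
        self.actions = actions   # available actions
        self.v = {}              # value function: state -> predicted return
        self.model = {}          # model: (state, action) -> (reward, next_state)

    def policy(self, state, epsilon=0.1):
        """Behaviour function: mostly greedy w.r.t. the model and value
        function, occasionally random (exploration vs exploitation)."""
        if random.random() < epsilon:
            return random.choice(self.actions)            # explore
        def score(a):
            reward, nxt = self.model.get((state, a), (0.0, None))
            return reward + self.v.get(nxt, 0.0)
        return max(self.actions, key=score)               # exploit
```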

Prediction vs Control

RL problems are often classified as either prediction or control problems. A prediction problem evaluates how well a given policy does, i.e. estimates $v_{\pi}$; a control problem searches for the best behaviour, i.e. finds $v_{*}$ and an optimal policy $\pi_{*}$.

Markov Decision Process (MDP)

Before venturing into the exact algorithms, let’s lay out some fundamental math concepts here.

Prior Knowledge

Problem setup

  1. This is an RL setting where the environment is fully observable
  2. The current state completely characterises the process (the Markov property; see the sketch after this list)
  3. Almost all RL problems can be formalised as MDPs (bandits are MDPs with a single state and finitely or infinitely many actions)
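
To make points 1–3 precise, here is a brief sketch in the standard notation (following the slides). The Markov property says the current state is a sufficient statistic of the history,

$$\mathbb{P}[S_{t+1} \mid S_t] = \mathbb{P}[S_{t+1} \mid S_1, \dots, S_t],$$

and an MDP is a tuple $\langle \mathcal{S}, \mathcal{A}, \mathcal{P}, \mathcal{R}, \gamma \rangle$ with transition kernel $\mathcal{P}^a_{ss'} = \mathbb{P}[S_{t+1} = s' \mid S_t = s, A_t = a]$, reward function $\mathcal{R}^a_s = \mathbb{E}[R_{t+1} \mid S_t = s, A_t = a]$, and discount factor $\gamma \in [0, 1]$.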

Terminologies

  1. Markov Reward Process
  2. Return
  3. State-value function
  4. Bellman Equation
  5. MDP
  6. Policy
  7. State-value function and Action-value function for an MDP (differs from item 3)
  8. Applying the Bellman equation to $v_{\pi}(s)$ and $q_{\pi}(s,a)$ (written out in the sketch after this list)
  9. Optimality
  10. Solving for optimality
  11. To read more on extensions, refer to page 49 of these slides.
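
To make the list concrete, here is a minimal sketch of the core quantities in the notation of the slides (the derivations are deferred to the subsequent blogs). The return, the two value functions, and the Bellman expectation and optimality equations are:

$$G_t = R_{t+1} + \gamma R_{t+2} + \dots = \sum_{k=0}^{\infty} \gamma^k R_{t+k+1}$$

$$v_{\pi}(s) = \mathbb{E}_{\pi}[G_t \mid S_t = s], \qquad q_{\pi}(s,a) = \mathbb{E}_{\pi}[G_t \mid S_t = s, A_t = a]$$

$$v_{\pi}(s) = \sum_{a} \pi(a \mid s) \Big( \mathcal{R}^a_s + \gamma \sum_{s'} \mathcal{P}^a_{ss'} \, v_{\pi}(s') \Big), \qquad v_{*}(s) = \max_{a} \Big( \mathcal{R}^a_s + \gamma \sum_{s'} \mathcal{P}^a_{ss'} \, v_{*}(s') \Big)$$

For a Markov Reward Process (item 1), the Bellman equation is linear, so the state-value function has the closed form $v = (I - \gamma \mathcal{P})^{-1} \mathcal{R}$. Below is a small numerical sketch of this; the three-state MRP (transition matrix, rewards, discount) is hypothetical, chosen only for illustration:

```python
import numpy as np

# Solve the MRP Bellman equation v = R + gamma * P v in closed form:
# (I - gamma * P) v = R is a plain linear system.
# The states, transitions, and rewards here are made up for illustration.

P = np.array([[0.5, 0.5, 0.0],   # P[s, s'] = probability of moving s -> s'
              [0.2, 0.0, 0.8],
              [0.0, 0.0, 1.0]])  # the third state is absorbing
R = np.array([1.0, -2.0, 0.0])   # expected immediate reward in each state
gamma = 0.9                      # gamma < 1 keeps I - gamma * P invertible

v = np.linalg.solve(np.eye(len(R)) - gamma * P, R)
print(v)  # value of each state under these dynamics
```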