Reinforcement Learning: Theoretical Foundations, Part I

A practical introduction to reinforcement learning concepts: agent, environment, state, action, reward, policy, return, and the exploration-exploitation tradeoff.

2021.01.04 · 1 min read · by Zhenlin Wang

Introduction

Reinforcement learning (RL) studies how an agent learns to make decisions by interacting with an environment.

At each time step:

  1. The agent observes a state.
  2. The agent chooses an action.
  3. The environment returns a reward and a new state.
  4. The agent updates its behavior to get more reward over time.
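The four steps above can be sketched as a short loop. This is a minimal illustration, not a reference implementation: `CoinFlipEnv`, `run_episode`, and `always_heads` are hypothetical names made up for this example, the `reset`/`step` interface is an assumed convention, and the learning update in step 4 is left out so that only the interaction itself is visible.

```python
import random


class CoinFlipEnv:
    """Hypothetical toy environment: guess the outcome of a biased coin."""

    def __init__(self, p_heads=0.7, horizon=10, seed=0):
        self.p_heads = p_heads
        self.horizon = horizon
        self.rng = random.Random(seed)
        self.t = 0

    def reset(self):
        """Start a new episode and return the initial state."""
        self.t = 0
        return self.t  # the state here is simply the time step

    def step(self, action):
        """Apply an action (1 = guess heads, 0 = guess tails)."""
        outcome = 1 if self.rng.random() < self.p_heads else 0
        reward = 1.0 if action == outcome else 0.0
        self.t += 1
        done = self.t >= self.horizon
        return self.t, reward, done


def run_episode(env, policy):
    """Run one episode and return the sum of rewards collected."""
    state = env.reset()                          # step 1: observe a state
    total_reward = 0.0
    done = False
    while not done:
        action = policy(state)                   # step 2: choose an action
        state, reward, done = env.step(action)   # step 3: reward and new state
        total_reward += reward                   # step 4 (learning) is omitted
    return total_reward


def always_heads(state):
    """A fixed policy that ignores the state and always guesses heads."""
    return 1


total = run_episode(CoinFlipEnv(), always_heads)
print(total)
```

A learning agent would replace `always_heads` with a policy that changes in response to the rewards it sees.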

Core Terms

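The core terms are the agent (the learner and decision maker), the environment (everything the agent interacts with), the state (the information the agent observes at a time step), the action (a choice the agent makes), the reward (a scalar feedback signal from the environment), the policy (a mapping from states to actions, or to a distribution over actions), and the return (the cumulative reward from a time step onward).

The goal is to learn a policy that maximizes expected return.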
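One common way to write this goal down, shown here only as a sketch: the symbols $\pi$ (a policy), $\pi^*$ (an optimal policy), and $\mathbb{E}_{\pi}$ (expectation over trajectories generated by following $\pi$) are notation assumed for this example, and $G_0$ is the return from the first time step, as defined in the Return and Discounting section.

$$ \pi^* \in \arg\max_{\pi} \; \mathbb{E}_{\pi}\left[ G_0 \right] $$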

Return and Discounting

The discounted return is:

$$ G_t = r_{t+1} + \gamma r_{t+2} + \gamma^2 r_{t+3} + \dots $$

where $\gamma \in [0, 1]$ is the discount factor and $r_{t+k+1}$ is the reward received $k+1$ steps after time $t$.

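Discounting keeps the return finite on long or infinite horizons: with bounded rewards and $\gamma < 1$, the sum converges. It also encodes how much future rewards matter, since a $\gamma$ near 0 makes the agent short-sighted and a $\gamma$ near 1 makes it weigh distant rewards almost as much as immediate ones.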
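For a finite list of rewards, the discounted return can be computed directly. The sketch below uses a hypothetical helper, `discounted_return`, and works backward through the rewards using the recursion $G_t = r_{t+1} + \gamma G_{t+1}$, which follows from the definition above.

```python
def discounted_return(rewards, gamma):
    """Compute G_t for rewards given as [r_{t+1}, r_{t+2}, ...]."""
    g = 0.0
    # Work backward so each step applies G_t = r_{t+1} + gamma * G_{t+1}.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


# Three rewards of 1 with gamma = 0.5: 1 + 0.5 + 0.25
print(discounted_return([1.0, 1.0, 1.0], 0.5))  # prints 1.75
```

With `gamma=1.0` the same function returns the plain, undiscounted sum of rewards.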

Exploration and Exploitation

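The agent must balance exploration, trying actions whose value is still uncertain in order to learn about them, against exploitation, choosing the action it currently believes to be best.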

Too little exploration can trap the agent in a poor policy. Too much exploration can waste reward.
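One simple way to make this tradeoff concrete is epsilon-greedy action selection on a multi-armed bandit. The post does not prescribe this method; it is swapped in here purely as an illustration. The function name `epsilon_greedy_bandit`, the Gaussian reward noise, and the arm means below are all assumptions made for the example.

```python
import random


def epsilon_greedy_bandit(true_means, epsilon, steps, seed=0):
    """Run epsilon-greedy on a bandit with Gaussian rewards (unit variance)."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms        # how often each arm was pulled
    estimates = [0.0] * n_arms   # running average reward per arm
    total_reward = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the highest current estimate.
            arm = max(range(n_arms), key=lambda a: estimates[a])
        reward = rng.gauss(true_means[arm], 1.0)
        counts[arm] += 1
        # Incremental update of the sample mean for the chosen arm.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward
    return estimates, counts, total_reward


estimates, counts, total_reward = epsilon_greedy_bandit(
    true_means=[0.1, 0.5, 0.9], epsilon=0.1, steps=2000
)
print(counts, [round(e, 2) for e in estimates])
```

Setting `epsilon=0.0` gives a purely greedy agent that can lock onto whichever arm looked best early on, while `epsilon=1.0` gives an agent that never uses what it has learned. Comparing `total_reward` across a few values of `epsilon` shows both failure modes.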

Why RL Is Hard

RL is difficult because:

RL is powerful, but it should not be used when supervised learning, planning, or simpler optimization would solve the problem.

Closing

The foundation of RL is the agent-environment loop. Everything else (value functions, policy gradients, Q-learning, and actor-critic methods) builds on this loop.