Introduction
HERON is a private, domain-agnostic multi-agent reinforcement learning framework for hierarchical control systems whose deployment timing is not clean, synchronized, or fully observable.
Most MARL policies are trained under env.step(), where every agent observes, acts, and updates in lockstep. Real systems behave differently: sensors tick at different rates, observations arrive late, and actuators respond after delay. HERON keeps the same agent definitions across step-based training and event-driven evaluation, so the deployment gap can be measured instead of guessed.
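The split between lockstep training and event-driven evaluation can be sketched roughly as follows. This is a hypothetical illustration, not HERON's actual API: the `Agent`, `run_lockstep`, and `run_event_driven` names are assumptions, and the "agent" here only records its decision times so the two execution modes can be compared.

```python
# Hedged sketch: one agent definition reused by both a lockstep loop
# (training) and an event-driven loop (evaluation). All names here are
# illustrative, not HERON's real interfaces.
import heapq

class Agent:
    def __init__(self, name, period):
        self.name = name
        self.period = period    # seconds between decisions in event-driven mode
        self.actions = []       # decision timestamps, for comparing the modes

    def act(self, t):
        self.actions.append(t)

def run_lockstep(agents, steps, dt=1.0):
    # Training mode: every agent acts on every env.step(), in lockstep.
    for i in range(steps):
        for a in agents:
            a.act(i * dt)

def run_event_driven(agents, horizon):
    # Evaluation mode: each agent fires on its own clock via an event queue,
    # so a fast sensor and a slow controller no longer tick together.
    queue = [(0.0, a.name, a) for a in agents]
    heapq.heapify(queue)
    while queue:
        t, _, a = heapq.heappop(queue)
        if t >= horizon:
            break
        a.act(t)
        heapq.heappush(queue, (t + a.period, a.name, a))

fast, slow = Agent("sensor", 1.0), Agent("controller", 3.0)
run_event_driven([fast, slow], horizon=9.0)
print(len(fast.actions), len(slow.actions))  # prints: 9 3
```

The point of keeping one `Agent` class for both loops is that any performance difference between the two runs is attributable to timing, not to a change in the agent itself.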
What it focuses on
- Heterogeneous agent schedules with delay and jitter controls
- Declarative feature visibility for realistic observability
- Pluggable coordination protocols for hierarchical agents
- Dual-mode execution for step-based training and event-driven testing
- Power-grid and built-in demo environments for policy stress testing
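One of the ideas above, declarative feature visibility with observation delay, can be sketched in a few lines. The `VisibilitySpec` and `observe` names and the field layout are assumptions for illustration only, not HERON's real API.

```python
# Hedged sketch of declarative observability: each agent sees only the
# state keys its spec declares, optionally lagged by a fixed delay.
# VisibilitySpec/observe are hypothetical names, not HERON's interfaces.
from dataclasses import dataclass

@dataclass
class VisibilitySpec:
    fields: set    # global-state keys this agent may observe
    delay: int = 0 # observations lag the true state by this many ticks

def observe(history, spec):
    # history: list of global-state dicts, newest last.
    state = history[max(0, len(history) - 1 - spec.delay)]
    return {k: v for k, v in state.items() if k in spec.fields}

history = [{"voltage": 1.00, "freq": 50.0},
           {"voltage": 0.97, "freq": 49.9}]
local = VisibilitySpec(fields={"voltage"})           # fresh, voltage only
stale = VisibilitySpec(fields={"voltage"}, delay=1)  # voltage, one tick late
print(observe(history, local))  # prints: {'voltage': 0.97}
print(observe(history, stale))  # prints: {'voltage': 1.0}
```

Declaring visibility per agent like this makes partial observability an experimental variable: the same policy can be evaluated with and without the delay to measure its cost in isolation.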
Why I built it
The point of HERON is to make asynchronous deployment failures attributable. Instead of treating performance drops as one vague sim-to-real gap, the framework lets experiments isolate timing, observability, and coordination structure as separate causes.