The master equation in mean field games, part 1 (via Delarue 2021)

These are my informal notes on the master equation in mean field game theory, closely following the accessible sketch in Delarue (2021) with the same notation and supplemented when necessary.

See the reference for full details.

What is the master equation?

Originally from physics, the idea of a master equation is to have a single unified equation that models the evolution of a system over time and across its (probabilistic) states (Wikipedia).

“When the probabilities of the elementary processes are known, one can write down a continuity equation for W, from which all other equations can be derived and which we will call therefore the ‘master’ equation.”

Hence, the master equation in mean field game theory is a unified equation that succinctly describes a mean field game system: the “backward” equation for agent optimization and the “forward” equation for population dynamics, together with their boundary conditions.

Part 1: setting up the mean field game

Basic setup

Let players be \(i \in \{1, \dots, N\}\) and time \(t \in \mathcal{T} = [0, T]\). Each player \(i\) has a dynamic \(d\)-dimensional state \(X^i_t \in \mathbb{R}^d\), which follows the dynamics:

\[\begin{align} dX_t^i &= \alpha_t^i dt + dW_t^i & t \in [0, T] \end{align}\]

with an idiosyncratic Brownian motion \(W^i = \{W_t^i\}_{t \in \mathcal{T}}\) and a progressively-measurable control process \(\alpha^i = \{\alpha_t^i\}_{t \in \mathcal{T}}\), where the pairs \((W^j, \alpha^j)\) are independent across players \(j\). The initial conditions \(X_0^i\) are IID.

Player \(i\) faces the cost functional:

\[\begin{align} J^i (\alpha^1, \dots, \alpha^N) &= \mathbb{E}\left[ g(X_T^i, \bar{\mu}_T^N) + \int_0^T \left(f(X_t^i, \bar{\mu}_t^N) + \frac{1}{2} |\alpha_t^i|^2\right) dt \right] &\text{cost functional} \\ \bar{\mu}_t^N &= \frac{1}{N} \sum_{i=1}^N \delta_{X_t^i} &\text{empirical measure} \end{align}\]

noting \(\alpha^i\) is a stochastic process \(\{\alpha^i_t\}_{t \in \mathcal{T}}\). Intuitively, player \(i\) chooses its strategy \(\alpha^i\) to minimize \(J^i\), given the strategies \(\alpha^j\) of the other players \(j \neq i\).
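
To make the setup concrete, here is a minimal simulation sketch of the \(N\)-player dynamics (Euler-Maruyama), with the cost functional estimated by averaging over players. The values of \(N\) and \(T\), the mean-reverting feedback control, and the quadratic costs \(f\) and \(g\) are all illustrative assumptions, not from Delarue (2021).

```python
import numpy as np

rng = np.random.default_rng(0)
N, d = 1000, 1            # players, state dimension
T, n_steps = 1.0, 100     # horizon and number of Euler-Maruyama steps
dt = T / n_steps

# Hypothetical quadratic costs: penalize distance to the empirical mean.
f = lambda X, m: 0.5 * np.sum((X - m) ** 2, axis=1)
g = f

# Hypothetical feedback control: steer each player toward the mean.
control = lambda X, m: -(X - m)

X = rng.standard_normal((N, d))       # IID initial conditions X_0^i
cost = np.zeros(N)                    # running-cost accumulator per player

for _ in range(n_steps):
    m = X.mean(axis=0)                # mean of the empirical measure
    a = control(X, m)
    cost += (f(X, m) + 0.5 * np.sum(a ** 2, axis=1)) * dt
    dW = np.sqrt(dt) * rng.standard_normal((N, d))
    X = X + a * dt + dW               # dX^i = alpha^i dt + dW^i

cost += g(X, X.mean(axis=0))          # terminal cost
print("average realized cost J:", cost.mean())
```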

The mean field game (MFG) problem

Generalizing this (and removing subscripts \(i\)), let us formulate the mean field game problem.

First, the state of any (identical) agent still follows the same dynamics as above.

\[\begin{align} d X_t &= \alpha_t dt + d W_t \end{align}\]

Second, the distribution of agent states evolves as a flow of probability measures \(\{\mu_t\}_{t \in \mathcal{T}}\), where each \(\mu_t \in \mathcal{P}_2(\mathbb{R}^d)\), the space of probability measures on \(\mathbb{R}^d\) with finite second moment.

Third, any identical agent minimizes the cost functional according to an identical strategy \(\alpha = \{\alpha_t\}_{t \in \mathcal{T}}\):

\[\begin{align} J(\alpha) &= \mathbb{E} \left[ g(X_T, \mu_T) + \int_0^T \left( f(X_t, \mu_t) +\frac{1}{2} |\alpha_t|^2 \right) dt \right] \end{align}\]

noting that this requires the mean field flow \(\{\mu_t\}_t\).

Lastly, the initial condition is a random variable \(X_0 \in \mathbb{R}^d\) defined on a probability space \((\Omega, \mathcal{F}, \mathbb{P})\), with law \(\mu_0\).

The goal of the mean field game problem is to find a flow \(\{\mu_t\}_t\) that is a fixed point: the flow determines the optimal states, and the law of those optimal states must in turn reproduce the flow. Assuming there is a unique optimizer \(\{X_t^{*, \mu}\}_{t \in \mathcal{T}}\), the fixed point problem is:

\[\begin{align} \mu_t &= \mathcal{L}(X_t^{*, \mu}) & \forall t \in \mathcal{T} \end{align}\]

where \(\mathcal{L}(\cdot)\) denotes the law of a random variable. This specifies a mean field in the (distribution of) states. The scheme is, for each \(t\), the loop \(\mu_t \mapsto X_t^{*} \mapsto \mu_t\): each agent optimizes their state \(X_t^{*}\) against the mean field \(\mu_t\), and in turn the mean field is generated by the law of those optimized states. The associated optimal control \(\alpha_t^{*}\) generates \(X_t^{*}\) and therefore the mean field. A schematic sketch of this fixed-point iteration follows.
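
Schematically, the fixed-point scheme is a Picard iteration on the flow. The `best_response` map below stands in for the full optimal-control solve (the HJB step derived next); the contracting toy instance is purely an illustrative assumption so the sketch runs end-to-end.

```python
import numpy as np

def solve_mfg_fixed_point(mu, best_response, n_iter=100, tol=1e-10):
    """Picard iteration for the MFG scheme: mu -> X^{*,mu} -> L(X^{*,mu}).

    `best_response` is the composed map taking a flow mu to the law of the
    optimally controlled state; here it is an abstract placeholder.
    """
    for _ in range(n_iter):
        mu_new = best_response(mu)
        if np.max(np.abs(mu_new - mu)) < tol:   # crude convergence check
            return mu_new
        mu = mu_new
    return mu

# Toy instance: represent the flow by its per-time means, and pretend the
# best response contracts them toward 0.5 * m + 1 (fixed point m = 2).
toy = lambda m: 0.5 * m + 1.0
print(solve_mfg_fixed_point(np.zeros(5), toy))  # -> [2. 2. 2. 2. 2.]
```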

The MFG problem, in standard PDE form

Now we characterize the above problem as a pair of coupled partial differential equations, per the standard MFG setup. First, note we can define the value function \(u(t, x)\) as the infimum of the cost-to-go:

\[\begin{align} u(t, x) &= \inf_{\alpha} \mathbb{E}\left[ g(X_T, \mu_T) + \int_t^T \left( f(X_s, \mu_s) + \frac{1}{2} |\alpha_s|^2 \right) ds \,\middle|\, X_t = x \right] \end{align}\]

The HJB equation governs the dynamics of this value function \(u\). For a small time increment \(\delta t\), the steps of finding the optimal control and plugging it back into the HJB (giving the post-optimization form) are briefly:

\[\begin{align} u(t,x) &= \inf_{\alpha} \mathbb{E}\left[ \int_t^{t+\delta t} \left(f(X_s, \mu_s) + \frac{1}{2} |\alpha_s|^2\right) ds + u(t+\delta t, X_{t + \delta t}) \,\middle|\, X_t = x \right] & \text{dynamic programming relation} \\ du &= \left(\partial_t u + \alpha_t \cdot \nabla_x u + \frac{1}{2} \Delta_x u\right) dt + \nabla_x u \cdot dW_t &\text{Ito's lemma on the value function} \\ 0 &= \partial_t u + \inf_{\alpha_t} \left[ \alpha_t \cdot \nabla_x u + \frac{1}{2} |\alpha_t|^2 \right] + \frac{1}{2} \Delta_x u + f(x, \mu_t) & \text{HJB, pre-optimization} \\ \alpha_t^{*} &= -\nabla_x u & \text{optimal control via the minimization} \\ 0 &= \partial_t u + \frac{1}{2} \Delta_x u - \frac{1}{2} | \nabla_x u |^2 + f(x, \mu_t) & \text{HJB, post-optimization (using $\alpha_t^*$)} \end{align}\]

with shorthand \(u := u(t, x)\) and terminal (boundary) condition \(u(T, x) = g(x, \mu_T)\). This constitutes the backward equation of the MFG system.
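
For completeness, the minimization in the pre-optimization HJB is a pointwise completion of the square in the control:

\[\begin{align} \alpha \cdot \nabla_x u + \frac{1}{2} |\alpha|^2 &= \frac{1}{2} |\alpha + \nabla_x u|^2 - \frac{1}{2} |\nabla_x u|^2 \;\geq\; -\frac{1}{2} |\nabla_x u|^2 \end{align}\]

with equality exactly at \(\alpha = -\nabla_x u\), which gives both the optimal control \(\alpha_t^* = -\nabla_x u\) and the \(-\frac{1}{2}|\nabla_x u|^2\) term in the post-optimization form.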

Since this gives us the optimal control in feedback form, \(\alpha_t^* = -\nabla_x u(t, X_t)\), we can write the dynamics of the optimized state as:

\[\begin{align} d X_t^{*, \mu} &= -\nabla_x u(t, X_t^{*, \mu}) dt + dW_t \end{align}\]

Then the Fokker-Planck equation for the density \(\mu_t\) of \(X_t^{*, \mu}\) is given by:

\[\begin{align} \partial_t \mu_t &= \text{div}_x (\nabla_x u(t, x) \mu_t) + \frac{1}{2} \Delta_x \mu_t \end{align}\]

with initial (boundary) condition \(\mu_{t=0} = \mu_0\), for some initial distribution. This constitutes the forward equation of the MFG system.
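
As a sanity check on the sign of the divergence term: for a general drift \(b(t, x)\), the Fokker-Planck equation for the law of \(dX_t = b(t, X_t) dt + dW_t\) reads

\[\begin{align} \partial_t \mu_t &= -\text{div}_x (b(t, x) \mu_t) + \frac{1}{2} \Delta_x \mu_t \end{align}\]

and substituting the optimal drift \(b = -\nabla_x u\) flips the sign, giving the \(+\,\text{div}_x(\nabla_x u \, \mu_t)\) term above.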

Checkpoint: summary of the MFG system so far

Collecting the partial differential equations above, we obtain the MFG system of the coupled backward-forward equations:

\[\begin{align} 0 &= \partial_t u + \frac{1}{2} \Delta_x u - \frac{1}{2} | \nabla_x u |^2 + f(x, \mu_t) &\text{HJB, backward equation} \\ \partial_t \mu_t &= \text{div}_x (\nabla_x u(t, x) \mu_t) + \frac{1}{2} \Delta_x \mu_t &\text{FP, forward equation} \\ u(T,x) &= g(x, \mu_T) \hspace{1 cm} \mu_{t=0} = \mu_0 &\text{backward, forward boundary conditions} \end{align}\]

Assume that existence and uniqueness of a solution hold for this system. Our goal is to obtain the master equation which unifies both the backward and forward dynamics.
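
To see how the coupled system behaves, here is a minimal 1D finite-difference sketch: given a guess for the flow, solve the HJB backward, then the Fokker-Planck equation forward, and iterate (a damped Picard scheme). The quadratic costs, grid sizes, crude boundary handling, and damping factor are all illustrative assumptions; the explicit time-stepping needs \(\delta t \lesssim \delta x^2\) for stability.

```python
import numpy as np

# Grid (illustrative sizes; explicit schemes need dt <~ dx^2 to be stable).
x = np.linspace(-4.0, 4.0, 81)
dx, (T, n_t) = x[1] - x[0], (0.5, 400)
dt = T / n_t

dxc = lambda v: np.gradient(v, dx)    # centered first derivative
def lap(v):                           # 1D Laplacian, crude at boundaries
    out = np.zeros_like(v)
    out[1:-1] = (v[2:] - 2 * v[1:-1] + v[:-2]) / dx**2
    return out

def f(mu):                            # hypothetical running cost: track the mean
    m = np.sum(x * mu) * dx
    return 0.5 * (x - m)**2
g = f                                 # terminal cost of the same form

mu0 = np.exp(-x**2)                   # initial density: a Gaussian bump
mu0 /= mu0.sum() * dx
mu_flow = np.tile(mu0, (n_t + 1, 1))  # initial guess for the whole flow

for it in range(30):                  # damped Picard iteration on the flow
    # Backward HJB: 0 = u_t + (1/2) u_xx - (1/2) (u_x)^2 + f.
    u = np.empty((n_t + 1, len(x)))
    u[n_t] = g(mu_flow[n_t])
    for n in range(n_t - 1, -1, -1):
        ux = dxc(u[n + 1])
        u[n] = u[n + 1] + dt * (0.5 * lap(u[n + 1]) - 0.5 * ux**2
                                + f(mu_flow[n + 1]))
    # Forward FP: mu_t = div(u_x mu) + (1/2) mu_xx.
    mu_new = np.empty_like(mu_flow)
    mu_new[0] = mu0
    for n in range(n_t):
        mu_new[n + 1] = mu_new[n] + dt * (dxc(dxc(u[n]) * mu_new[n])
                                          + 0.5 * lap(mu_new[n]))
        mu_new[n + 1] = np.clip(mu_new[n + 1], 0.0, None)
        mu_new[n + 1] /= mu_new[n + 1].sum() * dx   # keep unit mass
    err = np.max(np.abs(mu_new - mu_flow))
    mu_flow = 0.5 * mu_flow + 0.5 * mu_new          # damped update
    if err < 1e-6:
        break

print(f"Picard iterations: {it + 1}, flow residual: {err:.2e}")
```

In practice one would use implicit or semi-Lagrangian schemes as in the numerical MFG literature, but the explicit version keeps the backward-forward structure visible.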

Unifying the forward-backward dynamics and the key challenge

Suppose we already have existence and uniqueness for the above MFG system, so that from an initial condition \(\mu^\circ\) at time \(t^\circ\) there is an equilibrium flow \(\{\mu_t\}_t\) over the time range \(t \in [t^\circ, T]\). Write \(u^\mu\) for the value function under this flow, and define the generalized value function by \(\mathcal{U}(t^\circ, x^\circ, \mu^\circ) := u^\mu(t^\circ, x^\circ)\). Then the state dynamics, the HJB equation, and the dynamic programming relation are given by:

\[\begin{align} dX_t &= -\nabla_x u^\mu (t, X_t) dt + dW_t, \quad X_{t^\circ} = x^\circ &\text{state dynamics} \\ \partial_t u^\mu (t, x) &= -\frac{1}{2} \Delta_x u^\mu(t,x) + \frac{1}{2} |\nabla_x u^\mu(t,x)|^2 - f(x, \mu_t) &\text{HJB} \\ u^\mu (T, x) &= g(x, \mu_T) &\text{HJB, terminal condition} \\ \mathcal{U}(t^\circ, x^\circ, \mu^\circ) &= \mathbb{E}\left[ \int_{t^\circ}^{t^\circ + \varepsilon} \left( f(X_s, \mu_s) + \frac{1}{2}|\nabla_x u^\mu(s, X_s)|^2 \right) ds + \mathcal{U}(t^\circ + \varepsilon, X_{t^\circ + \varepsilon}, \mu_{t^\circ + \varepsilon}) \right] &\text{dynamic programming relation} \end{align}\]

where, as a reminder, the running cost comes from plugging the optimal control \(\alpha_t^* = -\nabla_x u^\mu\) into the original cost functional, so that \(\frac{1}{2}|\alpha_t^*|^2 = \frac{1}{2}|\nabla_x u^\mu|^2\). Since the generalized value function \(\mathcal{U}\) represents the minimization under the mean field flow, the goal will be to find a suitable expansion by applying a stochastic chain rule to this relation, thereby obtaining an “HJB-style” equation in these now-unified forward-backward dynamics.

The difficulty, however, is that this generalized value function \(\mathcal{U}\) depends directly on the mean field flow \(\mu\) (as well as indirectly via \(u^\mu\)), and this \(\mu\) is a probability measure. So we need a way to take derivatives of functions defined on the space of probability measures, where the probability measures themselves are the primary objects of interest.