Non-Markovian MFGs: Memory and Dynamics

Updated 4 July 2026

Non-Markovian mean-field games are models where agents’ decisions depend on both current states and historical dynamics, integrating memory effects into strategic interactions.
These frameworks employ techniques such as time-fractional derivatives, path-dependent survivor measures, and generalized BSDEs to capture non-Markovian behavior.
They offer robust approaches for analyzing Nash equilibria in complex systems, including anomalous diffusion, absorption phenomena, and recursive utility settings.

Non-Markovian mean-field games are mean-field game models in which the representative agent’s optimization problem or the consistency condition depends on temporal history rather than only on the current state. In the supplied literature, non-Markovianity appears in several distinct forms: subdiffusive dynamics generated by an inverse stable subordinator and encoded by Caputo time-fractional derivatives (Qing et al., 2018); path-dependent interactions through the empirical sub-probability measure of survivors and the history of absorptions (Campi et al., 2019); weak-form games with coefficients depending on the joint distribution of states and controls on path space (Possamaï et al., 2021); and recursive utility portfolio games with Epstein-Zin preferences formulated without any Markov structure and characterized by BSDEs (Fu et al., 12 May 2025). Across these formulations, the common objective is to define Nash equilibria in a continuum of strategically interacting agents when memory, delayed effects, absorption history, or non-separable intertemporal preferences invalidate the classical dynamic programming paradigm.

1. Core formulations of non-Markovianity

A first class of non-Markovian mean-field games arises from anomalous diffusion. In the time-fractional formulation, the microscopic state process is obtained by time-changing a diffusion $Y_s$ with the inverse $E_t$ of a strictly increasing $\beta$-stable subordinator $D_t$, $\beta\in(0,1)$. The resulting trajectory

$X_t=Y_{E_t}, \qquad dY_s=v(s,Y_s)\,ds+\sqrt{2}\,dB_s$

is continuous, non-Markovian, non-Gaussian, and carries a long-term memory because $E_t$ “pauses” the motion for random heavy-tailed times (Qing et al., 2018). In this setting, the law $m(t,x)$ satisfies a time-fractional Fokker-Planck equation, and equilibrium is described by a coupled time-fractional HJB-FP system.
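The time-change construction above can be illustrated numerically. The following is a minimal sketch, not the method of the cited paper: it assumes Kanter's representation for sampling positive stable increments, a uniform grid in operational time, an Euler scheme for $Y$, and a hypothetical mean-reverting drift `v(s, y) = -y`. The names `stable_increments` and `simulate_subdiffusion` are illustrative only.

```python
import numpy as np

def stable_increments(beta, ds, size, rng):
    """Increments of a beta-stable subordinator over steps of length ds.

    Uses Kanter's representation of a positive stable variable S with
    Laplace transform E[exp(-s*S)] = exp(-s**beta), then self-similar scaling.
    """
    u = rng.uniform(0.0, np.pi, size)
    w = rng.exponential(1.0, size)
    s = (np.sin(beta * u) / np.sin(u) ** (1.0 / beta)
         * (np.sin((1.0 - beta) * u) / w) ** ((1.0 - beta) / beta))
    return ds ** (1.0 / beta) * s

def simulate_subdiffusion(beta, T, n_t, ds, drift, x0, rng):
    """Return (t, E, X) on a uniform grid of physical time [0, T]."""
    # Subordinator D on the operational-time grid s_k = k*ds, grown until D > T.
    D = np.array([0.0])
    while D[-1] <= T:
        D = np.concatenate([D, D[-1] + np.cumsum(stable_increments(beta, ds, 1024, rng))])
    # Inverse subordinator: last grid index k with D[k] <= t, so E_t ~ k*ds.
    t = np.linspace(0.0, T, n_t + 1)
    k = np.searchsorted(D, t, side="right") - 1
    # Euler scheme for dY = v(s, Y) ds + sqrt(2) dB in operational time.
    Y = np.empty(k.max() + 1)
    Y[0] = x0
    for j in range(k.max()):
        Y[j + 1] = Y[j] + drift(j * ds, Y[j]) * ds + np.sqrt(2.0 * ds) * rng.normal()
    return t, k * ds, Y[k]

rng = np.random.default_rng(0)
t, E, X = simulate_subdiffusion(beta=0.7, T=1.0, n_t=200, ds=1e-3,
                                drift=lambda s, y: -y, x0=0.0, rng=rng)
```

Flat stretches of `E` correspond to the random pauses of the motion: while `E` is constant, `X` does not move, which is the source of the memory effect described above.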

A second class is path dependence through absorption. In the framework of games with smooth dependence on past absorptions, each player’s trajectory lives in $X=C([0,T];\mathbb{R}^d)$, with absorption time

$\tau^\varphi=\inf\{t\in[0,T]:\varphi(t)\notin O\}\wedge T,$

and interaction enters through the non-normalized empirical sub-probability measure of survivors,

$\mu_t^N=\frac1N\sum_{i=1}^N \delta_{X_t^{N,i}}\,1_{\{t<\tau^{N,i}\}},$

together with the fraction absorbed,

$1-\mu_t^N(\mathbb{R}^d)=\frac1N\sum_{i=1}^N 1_{\{t\ge\tau^{N,i}\}}.$

In the MFG limit, both the survivor measure and the absorbed fraction retain the history of losses from the game (Campi et al., 2019).
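As a hedged illustration of the survivor measure and the absorbed fraction, the sketch below evaluates both for simulated one-dimensional Brownian players. The domain $O=(-1,1)$, the uniform initial law, the uncontrolled dynamics, and the function name `survivor_statistics` are assumptions made for the example and are not taken from the cited paper.

```python
import numpy as np

def survivor_statistics(paths, times, lower=-1.0, upper=1.0):
    """Survivor mass, absorbed fraction and survivor first moment for O=(lower, upper).

    paths has shape (N, len(times)); row i is the trajectory of player i.
    """
    outside = (paths <= lower) | (paths >= upper)
    first_exit = np.argmax(outside, axis=1)          # first index outside O (0 if none)
    never_exits = ~outside.any(axis=1)
    # Absorption time tau = (first exit time) capped at T, as in the definition above.
    tau = np.where(never_exits, times[-1], times[first_exit])
    alive = times[None, :] < tau[:, None]            # indicator 1_{t < tau}
    mass = alive.mean(axis=0)                        # mu_t^N(R), total mass of survivors
    absorbed = 1.0 - mass                            # fraction absorbed by time t
    first_moment = (paths * alive).mean(axis=0)      # integral of x against mu_t^N
    return mass, absorbed, first_moment

rng = np.random.default_rng(1)
N, n_steps, T = 2000, 400, 1.0
times = np.linspace(0.0, T, n_steps + 1)
x0 = rng.uniform(-0.5, 0.5, size=N)
increments = rng.normal(0.0, np.sqrt(T / n_steps), size=(N, n_steps))
paths = np.concatenate([x0[:, None], x0[:, None] + np.cumsum(increments, axis=1)], axis=1)
mass, absorbed, first_moment = survivor_statistics(paths, times)
```

The survivor mass is nonincreasing in time, so at each instant it encodes all past exits. Because the absorption time is capped at $T$, the indicator $1_{\{t<\tau\}}$ vanishes at $t=T$ and the computed mass at the final grid point is zero by convention.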

A third class is fully non-Markovian weak-form control. Here the state process is specified as the coordinate process on path space, the control is a predictable process taking values in an action set, and the controlled law is defined by Girsanov transformation. The coefficients may depend on the full stopped path $X_{\cdot\wedge t}$ and on the time-indexed flow of laws, allowing interaction through the joint distribution of players’ states and controls (Possamaï et al., 2021).

A fourth class comes from recursive utility. In mean field portfolio games with Epstein-Zin preferences, the game is developed “in a general non-Markovian framework” and “without assuming any Markov structure.” The representative agent’s utility is defined by a BSDE-type recursion with an externality process, and equilibrium requires

$E_t$ 8

The non-Markovian feature is not only path dependence of coefficients, but also recursion in utility and conditioning on common noise (Fu et al., 12 May 2025).

These formulations show that “non-Markovian” is not a single modeling choice. It may refer to memory in the state dynamics, memory in the interaction term, a weak formulation on path space, or recursive intertemporal preferences.

2. Time-fractional mean-field games and subdiffusive memory

The time-fractional MFG system studied in “Variational time-fractional Mean Field Games” couples a backward time-fractional HJB equation for the value function $E_t$ 9 with a forward time-fractional Fokker-Planck equation for the density $\beta\in(0,1)$ 0: $\beta\in(0,1)$ 1

$\beta\in(0,1)$ 2

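Here the forward and backward Caputo derivatives are nonlocal in time, since they are convolutions of the time derivative against the kernel $t^{-\beta}/\Gamma(1-\beta)$, and therefore explicitly encode memory (Qing et al., 2018).

In its standard form, the forward Caputo derivative of order $\beta$ of a function $g$ is

$\partial_{(0,t]}^{\beta} g(t)=\frac{1}{\Gamma(1-\beta)}\int_0^t \frac{g'(s)}{(t-s)^{\beta}}\,ds,$

and the backward Caputo derivative on $[t,T)$ is

$\partial_{[t,T)}^{\beta} g(t)=-\frac{1}{\Gamma(1-\beta)}\int_t^T \frac{g'(s)}{(s-t)^{\beta}}\,ds.$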

Because these operators are nonlocal in time, the instantaneous rate of change depends on the entire history. In the limit $\beta\to1$, the memory kernel concentrates at the origin and the operators reduce to the classical local time derivatives of the Markovian case (Qing et al., 2018).
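The nonlocality can be made concrete with a small numerical experiment. The sketch below uses the classical L1 finite-difference scheme for the forward Caputo derivative; this scheme is a standard discretization chosen here for illustration and is not claimed to be the one used in the cited paper. The function name `caputo_l1` is hypothetical.

```python
import numpy as np
from math import gamma

def caputo_l1(f_vals, h, beta):
    """L1 approximation of the forward Caputo derivative of order beta in (0, 1).

    Approximates (1/Gamma(1-beta)) * int_0^t f'(s) (t-s)^(-beta) ds on the
    uniform grid t_i = i*h; the scheme is exact for piecewise-linear f.
    """
    n = len(f_vals) - 1
    j = np.arange(n)
    b = (j + 1.0) ** (1.0 - beta) - j ** (1.0 - beta)   # history weights
    df = np.diff(f_vals)
    out = np.zeros(n + 1)
    for i in range(1, n + 1):
        # Every past increment df[0..i-1] contributes to the value at t_i.
        out[i] = np.dot(b[:i][::-1], df[:i])
    return out * h ** (-beta) / gamma(2.0 - beta)

beta, h = 0.6, 0.01
t = np.linspace(0.0, 1.0, 101)

# Check against the closed form for f(t) = t: t^(1-beta) / Gamma(2-beta).
d_linear = caputo_l1(t, h, beta)
exact = t ** (1.0 - beta) / gamma(2.0 - beta)

# Memory effect: f is constant on [0.5, 1], yet its Caputo derivative there is nonzero.
d_flat = caputo_l1(np.minimum(t, 0.5), h, beta)
```

In the second experiment the ordinary derivative vanishes on $[0.5,1]$, but `d_flat` stays positive there because the operator still weighs the increments accumulated before $t=0.5$.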

In the quadratic case $D_t$ 0, the system becomes

$D_t$ 1

The paper interprets this as an extension of variational mean-field games to the subdiffusive situation, providing an Eulerian interpretation of time-fractional MFG systems (Qing et al., 2018).

Under the assumptions that $D_t$ 2 is continuous in $D_t$ 3, satisfies $D_t$ 4, is increasing in $D_t$ 5, and the Lasry-Lions-type monotonicity condition

$D_t$ 6

holds, together with $D_t$ 7 and $D_t$ 8, the paper proves existence of a classical solution

$D_t$ 9

uniqueness under monotonicity, and mass conservation and positivity: $X_t=Y_{E_t}, \qquad dY_s=v(s,Y_s)\,ds+\sqrt{2}\,dB_s$ 0 and $X_t=Y_{E_t}, \qquad dY_s=v(s,Y_s)\,ds+\sqrt{2}\,dB_s$ 1 for all $X_t=Y_{E_t}, \qquad dY_s=v(s,Y_s)\,ds+\sqrt{2}\,dB_s$ 2 (Qing et al., 2018).

The motivating applications listed in the paper are anomalous transport in porous media and biological cells, latency and long-memory in high-frequency trading, and subdiffusive supply-demand dynamics. A plausible implication is that time-fractional MFGs are especially suited to strategic environments where waiting-time heterogeneity is itself part of the aggregate interaction.

3. Path-dependent interactions through absorption and survivor measures

The absorption framework develops a non-Markovian MFG in which players leave the game when their private states hit the boundary of the domain $O$. The controlled representative-player dynamics are

$X_t=Y_{E_t}, \qquad dY_s=v(s,Y_s)\,ds+\sqrt{2}\,dB_s$ 4

where $X_t=Y_{E_t}, \qquad dY_s=v(s,Y_s)\,ds+\sqrt{2}\,dB_s$ 5 is obtained from

$X_t=Y_{E_t}, \qquad dY_s=v(s,Y_s)\,ds+\sqrt{2}\,dB_s$ 6

through the re-parameterization

$X_t=Y_{E_t}, \qquad dY_s=v(s,Y_s)\,ds+\sqrt{2}\,dB_s$ 7

and

$X_t=Y_{E_t}, \qquad dY_s=v(s,Y_s)\,ds+\sqrt{2}\,dB_s$ 8

This construction makes the interaction depend on the measure of surviving states and therefore on the cumulative history of absorptions (Campi et al., 2019).

The representative-player cost is

$X_t=Y_{E_t}, \qquad dY_s=v(s,Y_s)\,ds+\sqrt{2}\,dB_s$ 9

When $E_t$ 0, the paper states that one sees explicitly the dependence on the whole history $E_t$ 1 and equivalently on $E_t$ 2 (Campi et al., 2019).

A strict feedback MFG solution is a pair $E_t$ 3 such that $E_t$ 4 minimizes $E_t$ 5 among all feedbacks and, if $E_t$ 6 solves the controlled SDE under $E_t$ 7, then

$E_t$ 8

for all $E_t$ 9. The same formulation is extended to relaxed feedbacks $m(t,x)$ 0. In fixed-point form, the solution is characterized by

$m(t,x)$ 1

or equivalently through a law-operator $m(t,x)$ 2 whose fixed points are MFG solutions (Campi et al., 2019).

Under Lipschitz and sub-linear growth of $m(t,x)$ 3 in $m(t,x)$ 4, continuity in the measure variable relative to $m(t,x)$ 5-Wasserstein at Wiener-absolutely-continuous laws, compactness of $m(t,x)$ 6, nondegeneracy of $m(t,x)$ 7, admissible initial law $m(t,x)$ 8, and a convexity assumption $m(t,x)$ 9 on the Hamiltonian minimizers, the paper proves existence of both a relaxed feedback solution $X=C([0,T];\mathbb{R}^d)$ 0 and, under an extra control-convexity assumption, a strict feedback solution $X=C([0,T];\mathbb{R}^d)$ 1 (Campi et al., 2019).

Uniqueness is established under additional assumptions: $X=C([0,T];\mathbb{R}^d)$ 2 splits as $X=C([0,T];\mathbb{R}^d)$ 3, the Lasry-Lions condition

$X=C([0,T];\mathbb{R}^d)$ 4

holds for all $X=C([0,T];\mathbb{R}^d)$ 5, the drift is independent of $X=C([0,T];\mathbb{R}^d)$ 6, and for each fixed $X=C([0,T];\mathbb{R}^d)$ 7 the single-player control problem has a unique minimizer. In that case, any two feedback solutions coincide (Campi et al., 2019).

The same paper links the continuum model to the finite-$N$ game. In the finite-dimensional interaction case

$X=C([0,T];\mathbb{R}^d)$ 9

a feedback MFG solution induces an $\varepsilon_N$-Nash equilibrium for the $N$-player game with $\varepsilon_N\to0$ as $N\to\infty$. No explicit rate is given, although the text notes that in many examples a rate can be obtained by standard law-of-large-numbers arguments (Campi et al., 2019).

4. Weak formulation, McKean-Vlasov BSDEs, and equilibrium characterization

The weak-form theory developed in “Non-asymptotic convergence rates for mean-field games: weak formulation and McKean-Vlasov BSDEs” considers a fully non-Markovian setting with drift control and interaction through the joint distribution of states and controls. Under a reference probability measure, the uncontrolled coordinate process satisfies

$\tau^\varphi=\inf\{t\in[0,T]:\varphi(t)\notin O\}\wedge T,$ 6

Given an admissible control and a path-law flow, the controlled measure is defined by the Girsanov density

$\mu_t^N=\frac1N\sum_{i=1}^N \delta_{X_t^{N,i}}\,1_{\{t<\tau^{N,i}\}},$ 0

under which

$\mu_t^N=\frac1N\sum_{i=1}^N \delta_{X_t^{N,i}}\,1_{\{t<\tau^{N,i}\}},$ 1

The drift is bounded, Lipschitz in path, measure, and control, and dissipative in the path variable. The running and terminal reward data are Borel, of polynomial growth, and Lipschitz in all arguments (Possamaï et al., 2021).
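The role of the Girsanov density in the weak formulation can be checked by Monte Carlo. The sketch below is an illustration under simplifying assumptions that are not taken from the cited paper: a scalar state, a constant volatility `sigma`, a reference dynamics $dX_t=\sigma\,dW_t$ started at zero, and a constant test drift. Paths are simulated once under the reference measure and reweighted by the stochastic exponential; the name `girsanov_weights` is hypothetical.

```python
import numpy as np

def girsanov_weights(drift, sigma, T, n_steps, n_paths, rng):
    """Simulate X under the reference measure and return (X_T, density).

    The density is the discretized stochastic exponential
    exp(sum(theta*dW) - 0.5*sum(theta^2*dt)) with theta = drift / sigma.
    """
    dt = T / n_steps
    x = np.zeros(n_paths)
    log_density = np.zeros(n_paths)
    for k in range(n_steps):
        dw = rng.normal(0.0, np.sqrt(dt), n_paths)
        theta = drift(k * dt, x) / sigma           # evaluated before the step (predictable)
        log_density += theta * dw - 0.5 * theta ** 2 * dt
        x += sigma * dw                            # reference dynamics: no drift
    return x, np.exp(log_density)

rng = np.random.default_rng(2)
T, sigma, b = 1.0, 1.0, 0.5
x_T, density = girsanov_weights(lambda t, x: b * np.ones_like(x), sigma, T,
                                n_steps=20, n_paths=200_000, rng=rng)

mean_density = density.mean()                      # should be close to 1
reweighted_mean = (density * x_T).mean()           # should be close to b*T
```

The state is never re-simulated with a drift: the control acts only through the weights. This is why the weak formulation accommodates drifts that depend on the whole past path, since `drift` could equally be given the stored trajectory instead of the current value `x`.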

For a pair $\mu_t^N=\frac1N\sum_{i=1}^N \delta_{X_t^{N,i}}\,1_{\{t<\tau^{N,i}\}},$ 5, the objective is

$\mu_t^N=\frac1N\sum_{i=1}^N \delta_{X_t^{N,i}}\,1_{\{t<\tau^{N,i}\}},$ 6

and a mean-field equilibrium is defined by optimality of $\mu_t^N=\frac1N\sum_{i=1}^N \delta_{X_t^{N,i}}\,1_{\{t<\tau^{N,i}\}},$ 7 under $\mu_t^N=\frac1N\sum_{i=1}^N \delta_{X_t^{N,i}}\,1_{\{t<\tau^{N,i}\}},$ 8 together with the consistency condition

$\mu_t^N=\frac1N\sum_{i=1}^N \delta_{X_t^{N,i}}\,1_{\{t<\tau^{N,i}\}},$ 9

The paper then introduces the Hamiltonian

$E_t$ 00

with measurable selection $E_t$ 01, and proves that $E_t$ 02 is a mean-field equilibrium if and only if there exist processes $E_t$ 03 solving a generalized McKean-Vlasov BSDE, with

$E_t$ 04

Moreover $E_t$ 05 (Possamaï et al., 2021).

This BSDE characterization is the central replacement for the Markovian HJB/master-equation route. Under the Lipschitz and dissipativity assumptions on the drift, and either a small terminal payoff or a terminal payoff that is smooth in the measure argument, the BSDE admits a unique solution in

$E_t$ 09

The proof combines change of measure, a fixed-point argument, contraction in a weighted norm, and the use of dissipativity to obtain global-in-time bounds (Possamaï et al., 2021).

A common misconception is that well-posedness and uniqueness in mean-field games necessarily require short time horizon, separability assumptions, or Lasry-Lions monotonicity. In this weak non-Markovian setting, the paper explicitly states that its existence and uniqueness results “do not require short time horizon, separability assumptions on the coefficients, nor Lasry and Lions's monotonicity conditions, but rather smallness, or alternatively regularity, conditions on the terminal reward and a dissipativity condition on the drift” (Possamaï et al., 2021).

5. Recursive utilities and non-Markovian portfolio games

The portfolio-game model with Epstein-Zin preferences provides a different non-Markovian mechanism, centered on recursive utility rather than on path-dependent state dynamics alone. On a filtered probability space carrying a common noise and an idiosyncratic noise, the wealth dynamics of a representative agent are

$E_t$ 14

with $E_t$ 15, bounded progressive coefficients, and $E_t$ 16 (Fu et al., 12 May 2025).

Given an externality process, the Epstein-Zin utility satisfies the recursion

$E_t$ 18

where

$E_t$ 19

$E_t$ 20

$E_t$ 21

In the MFG limit, equilibrium requires the externality to coincide with the conditional log-average of optimal consumption and terminal wealth. The paper establishes a uniqueness result by proving a one-to-one correspondence between Nash equilibria and the solutions to a class of BSDEs (Fu et al., 12 May 2025).

The core BSDE is

$E_t$ 23

with the driver

$E_t$ 24

The corresponding equilibrium controls are

$E_t$ 25

A necessary stochastic maximum principle tailored to Epstein-Zin utility and a nonlinear transformation are the key ingredients in obtaining this characterization (Fu et al., 12 May 2025).

The stochastic maximum principle introduces adjoint processes $E_t$ 26 and $E_t$ 27, with Hamiltonian

$E_t$ 28

and first-order conditions

$E_t$ 29

The nonlinear transformation

$E_t$ 30

then converts the adjoint system into the BSDE for equilibrium. According to the paper, this “log-ratio” transform is the key link between the SMP adjoints and the final BSDE characterization of the MFG Nash equilibrium (Fu et al., 12 May 2025).

In the deterministic case, where the model coefficients depend only on time, the BSDE reduces to a deterministic ODE and then to a Riccati equation, yielding an explicit closed-form solution for the equilibrium investment and consumption policies (Fu et al., 12 May 2025).

6. Solution concepts, convergence, and analytical themes

Across the supplied works, non-Markovian MFGs are solved through several distinct but related analytical mechanisms.

Framework	Equilibrium device	Principal well-posedness or limit statement
Time-fractional subdiffusion (Qing et al., 2018)	Variational formulation via convex duality	Existence of a classical solution; uniqueness under monotonicity
Past absorptions (Campi et al., 2019)	Fixed point in feedback or relaxed feedback form	Existence of relaxed and strict feedback solutions; approximate Nash equilibria with $\varepsilon_N\to0$ as $N\to\infty$
Weak path-dependent MFG (Possamaï et al., 2021)	Generalized McKean-Vlasov BSDE	Unique solution under dissipativity and smallness or smoothness of the terminal reward
Epstein-Zin portfolio game (Fu et al., 12 May 2025)	BSDE plus stochastic maximum principle	One-to-one correspondence between Nash equilibria and a class of BSDEs; uniqueness result

In the time-fractional model, the coupled HJB-FP system is the Euler-Lagrange system of two dual convex problems. The primal functional $E_t$ 35 is defined on $E_t$ 36 with $E_t$ 37, the dual functional $E_t$ 38 is defined on $E_t$ 39 satisfying the fractional-FP constraint, and Fenchel-Rockafellar duality yields

$E_t$ 40

Taking first variations recovers precisely the HJB and FP equations with the correct fractional-time operators (Qing et al., 2018).

In the absorption model, existence proceeds by truncating drift and cost, applying the bounded-data existence result of Campi-Delarue via Kakutani’s theorem, passing to the limit using tightness and a martingale-problem characterization, and finally using a mimicking theorem of Brunick-Shreve to convert the limit into feedback form (Campi et al., 2019).

In the weak non-Markovian path-space formulation, the equilibrium BSDE is used both for well-posedness and for limit theory. The paper proves non-asymptotic convergence estimates for the $E_t$ 42-player game: $E_t$ 43 and

$E_t$ 44

Here $E_t$ 45 measures the control-interaction correction and

$E_t$ 46

In the state-dependent-law case, the paper states that one can insert classical rates such as $E_t$ 47, leading overall to error of order $E_t$ 48 in dimension $E_t$ 49 (Possamaï et al., 2021).

The same weak-form program also treats closed-loop equilibria and, in the Markovian case, the master equation. When a smooth master solution exists, the paper recovers

$E_t$ 50

and establishes

$E_t$ 51

which yields convergence of the finite-$N$ HJB system to the master equation (Possamaï et al., 2021).

A plausible implication of these results is that non-Markovianity shifts the analytic emphasis from PDE structure alone toward convex duality, weak formulations, stochastic maximum principles, and BSDE fixed points.

7. Conceptual distinctions, misconceptions, and research directions

One recurring distinction is between memory in dynamics and memory in interaction. In the time-fractional model, memory is generated at the microscopic level by the subdiffusion process $X_t=Y_{E_t}$, and the macroscopic equations inherit Caputo derivatives (Qing et al., 2018). In the absorption model, the state dynamics are standard diffusions but the mean-field interaction depends on survivor measures and the accumulated fraction absorbed, both of which are path-dependent through the history of absorptions (Campi et al., 2019). In the weak formulation and Epstein-Zin portfolio game, non-Markovianity is instead encoded through full path dependence of coefficients, law dependence on path space, or recursive utility characterized by BSDEs (Possamaï et al., 2021, Fu et al., 12 May 2025).

A second conceptual point concerns the role of monotonicity. In the time-fractional and absorption settings, uniqueness is obtained under monotonicity hypotheses of Lasry-Lions type (Qing et al., 2018, Campi et al., 2019). By contrast, the weak-form BSDE framework explicitly provides existence and uniqueness results that do not require Lasry-Lions monotonicity, replacing it with dissipativity of the drift and smallness or regularity assumptions on the terminal reward (Possamaï et al., 2021). This does not eliminate monotonicity from the subject; rather, it identifies an alternative route to well-posedness in non-Markovian environments.

A third misconception is that abandoning Markov structure necessarily makes equilibrium characterization intractable. The supplied papers provide four counterexamples. Time-fractional systems preserve a variational Euler-Lagrange structure (Qing et al., 2018). Absorption models admit strict and relaxed feedback fixed-point formulations and induce approximate Nash equilibria for large finite games (Campi et al., 2019). Weak path-dependent games are characterized by generalized McKean-Vlasov BSDEs and support non-asymptotic convergence rates (Possamaï et al., 2021). Recursive-utility portfolio games admit a one-to-one correspondence between Nash equilibria and BSDE solutions, and even explicit closed forms in the deterministic case (Fu et al., 12 May 2025).

The available research directions stated explicitly in the supplied material also point to broader generality. The Epstein-Zin study states that its combination of stochastic maximum principle, martingale-optimality principle, nonlinear adjoint-to-value transformation, and BSDE fixed-point arguments in BMO spaces can be exported to forward-utility MFGs, rank-dependent or habit-formation utilities, partial-information or jump-diffusion environments, and state-constraint or impulse-control MFGs (Fu et al., 12 May 2025). This suggests that non-Markovian mean-field games are less a narrow subclass than a collection of methodologies for equilibrium analysis beyond the classical dynamic programming paradigm.