
Time-Consistent Equilibrium Strategy

Updated 13 November 2025
  • A Time-Consistent Equilibrium Strategy is a control policy that is locally optimal in the sense that no future self can benefit from a one-shot deviation.
  • It is constructed via a dynamic programming-type backward recursion with nonlinear Hamiltonians, which handles risk-sensitive objectives and converges to risk-neutral policies as the risk parameter $\epsilon$ tends to zero.
  • The framework generalizes classical dynamic programming, effectively addressing time-inconsistent objectives such as non-exponential discounting and mean-variance criteria.

A time-consistent equilibrium strategy describes a control policy for stochastic or dynamic systems with time-inconsistent objectives, where classical optimality fails due to non-standard preferences (e.g., non-exponential discounting, mean-variance, or recursive utilities). Instead of global optimality, equilibrium strategies are locally optimal in the sense that at any future time, no infinitesimal one-shot deviation by any "future self" can improve (reduce) the cost or increase the reward to first order. This subgame-perfect Nash equilibrium perspective recasts control into a dynamic game between temporally distinct selves, yielding strategies that are credible and dynamically stable.

1. Formal Definition and Motivation

Consider a finite-horizon, countable-state Markov decision process (MDP) with action space $U$, running costs $f_{T,t}$, terminal cost $g_T$, and transition kernel $q_\epsilon$. The risk-sensitive objective for a policy $\pi=(\pi_1,\ldots,\pi_T)$ and initial state $x$ at time $t$ is

$$J^{\epsilon}_{t,T}(x;\pi) = \epsilon \log \mathbb{E}^{\pi}_x \left[ \exp \left( \epsilon^{-1} \left( \sum_{s=t}^T f_{T,s}(X_s, \pi_s(X_s)) + g_T(X_{T+1}) \right)\right)\right].$$

This cost is time-inconsistent: the Bellman principle fails, so a globally optimal policy is unattainable.

A $T$-step policy $\pi^* \in \Pi$ is an $\epsilon$-equilibrium if for every $t \in \{1,\dots,T\}$, state $x \in X$, and deviation $u \in U(x)$,

$$J^{\epsilon}_{t,T}(x;\pi^*) \le J^{\epsilon}_{t,T}\bigl(x;(u, \pi^*_{t+1}, \ldots, \pi^*_T)\bigr),$$

i.e., no one-shot deviation at any time yields a lower objective. The limiting ($\epsilon \to 0$) case defines a risk-neutral equilibrium with the same step-optimality in the limiting cost functional.
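For a finite state and action space, both the objective and the equilibrium condition can be checked directly. The sketch below is a minimal tabular illustration (the array layout, function names, and policy representation are my own assumptions, not taken from the referenced paper): it evaluates $J^\epsilon_{t,T}$ by propagating the exponentiated cost-to-go backward and tests the one-shot-deviation inequality.

```python
import numpy as np

def J_eps(x0, t, policy, f, g, q, eps):
    """Risk-sensitive cost J^eps_{t,T}(x0; policy) for a finite-horizon tabular MDP.

    f[s] is an (nx, nu) array of running costs at stage s (stages 0..T-1),
    g an (nx,) terminal cost, q[u] an (nx, nx) transition matrix for action u,
    and policy[s] an integer array giving the action chosen in each state.
    """
    nx = len(g)
    w = np.exp(g / eps)                          # E[exp(cost-to-go / eps)] after the last stage
    for s in range(len(f) - 1, t - 1, -1):       # backward over stages s = T-1, ..., t
        u = policy[s]
        stage = f[s][np.arange(nx), u]           # running cost f_{T,s}(x, pi_s(x)) in every state
        w = np.exp(stage / eps) * np.array([q[u[x]][x] @ w for x in range(nx)])
    return eps * np.log(w[x0])

def violates_one_shot(policy, f, g, q, eps, tol=1e-9):
    """True if some stage t, state x, and action u admit a strictly cheaper one-shot deviation."""
    nx, nu = f[0].shape
    for t in range(len(f)):
        for x in range(nx):
            base = J_eps(x, t, policy, f, g, q, eps)
            for u in range(nu):
                dev = [p.copy() for p in policy]
                dev[t][x] = u                    # deviate once at (t, x), then follow policy again
                if J_eps(x, t, dev, f, g, q, eps) < base - tol:
                    return True
    return False
```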

2. Dynamic Programming-Type Characterization

The equilibrium strategy is constructed via a dynamic programming-type backward recursion that incorporates non-classical Hamiltonians:

  • For $\epsilon > 0$, define

$$A_\epsilon(x,u;h) = \epsilon \log \sum_{z \in X} e^{\epsilon^{-1} h(z)}\, q_\epsilon(z|x,u).$$

  • In the $\epsilon \to 0$ limit, large deviations yield

$$A_0(x,u;h) = \sup_{z \in X} \bigl\{ h(z) - I(z|x,u) \bigr\},$$

where $I(z|x,u)$ is the large-deviation rate function.
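On a finite state space both Hamiltonians are elementary to evaluate. The following sketch (illustrative only; the tabular kernel $q$ and rate function $I$ are assumed to be supplied as arrays) mirrors the two definitions above, using a log-sum-exp form for numerical stability at small $\epsilon$.

```python
import numpy as np

def A_eps(x, u, h, q, eps):
    """Risk-sensitive Hamiltonian: eps * log sum_z exp(h(z)/eps) * q(z|x,u)."""
    logits = h / eps + np.log(q[u][x] + 1e-300)   # tiny offset avoids log(0) warnings
    m = logits.max()                              # log-sum-exp stabilization
    return eps * (m + np.log(np.exp(logits - m).sum()))

def A_0(x, u, h, rate):
    """Limiting Hamiltonian: sup_z { h(z) - I(z|x,u) }, with rate[u][x, z] = I(z|x,u)."""
    return float(np.max(h - rate[u][x]))
```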

The backward induction for $t = T, T-1, \ldots, 1$:

  • Terminal: $\Theta^\epsilon_{T,T+1}(x) = g_T(x)$
  • Recursion:

$$\pi^\epsilon_t(x) \in \arg\min_{u \in U(x)} \bigl\{ f_{T,t}(x,u) + A_\epsilon(x,u; \Theta^\epsilon_{T,t+1}) \bigr\},$$

$$\Theta^\epsilon_{T,t}(x) = f_{T,t}(x, \pi^\epsilon_t(x)) + A_\epsilon(x, \pi^\epsilon_t(x); \Theta^\epsilon_{T,t+1}).$$

The $\epsilon \to 0$ limit recovers the risk-neutral equilibrium policy via the deterministic min–sup recursion.
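In the same tabular setting as the earlier sketches (again an assumed layout, not the paper's construction), the backward sweep becomes a short loop that reuses the A_eps helper defined above; this is a minimal sketch of the recursion rather than a definitive implementation.

```python
import numpy as np

def equilibrium_backward_sweep(f, g, q, eps):
    """Backward induction for the equilibrium policy and value tables.

    f: list of (nx, nu) running-cost arrays (stages 0..T-1); g: (nx,) terminal cost;
    q: list of (nx, nx) transition matrices, one per action; eps: risk parameter.
    Returns (policy, theta) with policy[t][x] the equilibrium action at stage t
    and theta[t][x] = Theta^eps_{T,t}(x); theta[T] holds the terminal cost g.
    """
    T = len(f)
    nx, nu = f[0].shape
    theta = [None] * (T + 1)
    theta[T] = g.copy()                           # terminal condition
    policy = [np.zeros(nx, dtype=int) for _ in range(T)]
    for t in range(T - 1, -1, -1):                # stages T-1, ..., 0
        theta[t] = np.empty(nx)
        for x in range(nx):
            # one-step minimization of running cost plus nonlinear Hamiltonian
            values = [f[t][x, u] + A_eps(x, u, theta[t + 1], q, eps) for u in range(nu)]
            policy[t][x] = int(np.argmin(values))
            theta[t][x] = values[policy[t][x]]
    return policy, theta
```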

3. Existence and Uniqueness of Equilibria

Under regularity assumptions:

  • Lyapunov tightness of the transition kernel,
  • Continuity and inf-compactness of costs,
  • Uniform control over rare transitions,

the backward recursion admits a unique solution $\Theta^\epsilon_{T,t}$ in a weighted Banach space. The constructed $\pi^\epsilon$ is an $\epsilon$-equilibrium. Any equilibrium policy must solve these backward equations; in the limit, the unique solution yields a risk-neutral time-consistent policy.

4. Limiting Case and Convergence

As $\epsilon \to 0^+$, the risk-sensitive model converges to a risk-neutral control problem characterized by large deviation theory:

  • The value functions $\Theta^\epsilon_{T,t}$ converge uniformly in weighted sup-norm to $\Theta^0_{T,t}$.
  • Any sequence of $\epsilon$-equilibria $\{\pi^\epsilon\}$ has limit points that are $0$-equilibria (risk-neutral).
  • If the risk-neutral recursion admits a unique solution, then the convergence is strong: both value functions and policies converge pointwise.

5. Time-Consistency and Strategic Structure

A time-consistent equilibrium strategy enforces local optimality at every time: once adopted, no future decision-maker has an incentive to deviate at their decision point. This local optimality ensures dynamic stability even though global optimality is unattainable. Each equilibrium policy arises from a tractable finite-horizon backward sweep, structurally similar to classical dynamic programming but with nonlinear Hamiltonians reflecting the underlying time-inconsistent criteria.

The min–sup structure in the $\epsilon \to 0$ limit reflects a deterministic min–max principle: the decision-maker optimizes against the most adverse future states, as characterized by the large deviation rate function. In risk-sensitive control problems with "rare-state" transitions, the equilibrium strategy can hedge against vanishing probabilities in a robust fashion.
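For concreteness, substituting $A_0$ into the recursion of Section 2 yields the limiting min–sup form (written out here; the notation follows the definitions above):

$$\pi^0_t(x) \in \arg\min_{u \in U(x)} \Bigl\{ f_{T,t}(x,u) + \sup_{z \in X} \bigl[\Theta^0_{T,t+1}(z) - I(z|x,u)\bigr] \Bigr\}, \qquad \Theta^0_{T,t}(x) = f_{T,t}(x,\pi^0_t(x)) + A_0(x,\pi^0_t(x);\Theta^0_{T,t+1}),$$

with terminal condition $\Theta^0_{T,T+1}(x) = g_T(x)$. The inner supremum selects the worst admissible future state, penalized by how unlikely it is under the rate function $I$.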

6. Computational Considerations and Practical Implementation

The stepwise recursive computation of equilibrium strategies is tractable and does not require global optimization over the entire policy space. Provided the state space $X$ and action sets $U(x)$ are of manageable size, the backward induction can be solved efficiently by numerical methods. The convergence properties guarantee stability of the algorithm as $\epsilon$ is varied, and if uniqueness holds in the limiting recursion, the computed policy remains time-consistent under small perturbations of the model parameters.
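As a usage illustration (all numbers hypothetical), the sketches above can be run end to end on a two-state, two-action problem; sweeping $\epsilon$ downward also makes the convergence behaviour of Section 4 visible, and the one-shot check from Section 1 should pass by construction.

```python
import numpy as np

# Hypothetical toy problem: 2 states, 2 actions, 3 stages (illustrative numbers only).
T = 3
f = [np.array([[1.0, 2.0],
               [0.5, 1.5]]) for _ in range(T)]          # running costs f_{T,t}(x, u)
g = np.array([0.0, 3.0])                                 # terminal cost g_T
q = [np.array([[0.9, 0.1], [0.2, 0.8]]),                 # transition matrix for action 0
     np.array([[0.5, 0.5], [0.6, 0.4]])]                 # transition matrix for action 1

# Moderate eps values; very small eps would need a log-domain version of J_eps to avoid overflow.
for eps in (1.0, 0.3, 0.1):
    policy, theta = equilibrium_backward_sweep(f, g, q, eps)
    assert not violates_one_shot(policy, f, g, q, eps)   # equilibrium property from Section 1
    print(eps, [p.tolist() for p in policy], np.round(theta[0], 3).tolist())
```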

7. Connections to Broader Theory

The time-consistent equilibrium strategy framework generalizes classical dynamic programming to contexts where preference structures (risk-sensitivity, non-exponential discounting, recursive utilities) inherently violate time-consistency. It has direct analogs in mean-variance portfolio selection, mixed objective stochastic control, robust MDPs, and dynamic games between temporally distinct "selves." The methodology yields solutions that, while not globally optimal, are subgame-perfect Nash equilibria in the sense of self-enforcement and local optimality (Mei, 2019).

The framework offers flexibility for modelers to accommodate complex real-world preferences while maintaining computational tractability and dynamic credibility of policies.
