Time-Consistent Equilibrium Strategy
- Time-Consistent Equilibrium Strategy is a control policy that achieves local optimality: no future self can benefit from a one-shot deviation.
- It is constructed via a dynamic programming-type backward recursion with nonlinear Hamiltonians, handling risk-sensitive objectives and yielding convergence to risk-neutral policies in the limit.
- The framework generalizes classical dynamic programming, effectively addressing time-inconsistent objectives such as non-exponential discounting and mean-variance criteria.
A time-consistent equilibrium strategy is a control policy for stochastic or dynamic systems with time-inconsistent objectives, where classical optimality fails due to non-standard preferences (e.g., non-exponential discounting, mean-variance, or recursive utilities). Instead of global optimality, equilibrium strategies are locally optimal: at any future time, no one-shot deviation by any "future self" can reduce the cost (or increase the reward) to first order. This subgame-perfect Nash equilibrium perspective recasts control as a dynamic game between temporally distinct selves, yielding strategies that are credible and dynamically stable.
1. Formal Definition and Motivation
Consider a finite-horizon, countable-state Markov decision process (MDP) with action sets $A(x)$, running costs $c(x,a)$, terminal cost $c_N(x)$, and transition kernels $p_\theta(\cdot \mid x, a)$ indexed by a risk parameter $\theta > 0$. The risk-sensitive objective for a policy $\pi$, initial state $x$, and time $n$ is
$$J_n^\theta(\pi, x) = \theta \log \mathbb{E}_x^{\pi,\theta}\left[ \exp\left( \tfrac{1}{\theta}\Big( \sum_{k=n}^{N-1} c(X_k, A_k) + c_N(X_N) \Big) \right) \right].$$
This cost is time-inconsistent: the Bellman principle fails, so a globally optimal policy is in general unattainable.
An $N$-step Markov policy $\pi = (\pi_0, \dots, \pi_{N-1})$ is a $\theta$-equilibrium if for every time $n$, state $x$, and deviating action $a \in A(x)$,
$$J_n^\theta(\pi, x) \le J_n^\theta\big((a \oplus_n \pi), x\big),$$
where $(a \oplus_n \pi)$ plays $a$ at time $n$ and follows $\pi$ from time $n+1$ onward; i.e., no one-shot deviation at any time yields a lower objective. The limiting ($\theta \to 0$) case defines a risk-neutral equilibrium ($0$-equilibrium) with the same one-step optimality in the limiting cost functional.
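To make the one-shot-deviation condition concrete, here is a minimal sketch assuming a hypothetical finite toy model (the arrays `P`, `c`, `cN` and the helper names are illustrative, not from the source). It evaluates $J_n^\theta$ for a fixed Markov policy by log-space backward policy evaluation and then tests every single-period deviation.

```python
import numpy as np

# Hypothetical toy model: S states, A actions, horizon N. P[a, x, :] is the
# successor distribution for action a in state x; c[x, a] the running cost,
# cN[x] the terminal cost; theta > 0 is the risk parameter of
# J_n^theta = theta * log E[ exp( (1/theta) * remaining cost ) ].
S, A, N = 3, 2, 4
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(S), size=(A, S))   # P[a, x, :] sums to 1
c = rng.uniform(0.0, 1.0, size=(S, A))
cN = rng.uniform(0.0, 1.0, size=S)

def policy_value(pi, theta):
    """J_n^theta(pi, x) for all n, x via J_n = c + theta*log sum_y P e^{J_{n+1}/theta}."""
    J = np.empty((N + 1, S))
    J[N] = cN
    for n in range(N - 1, -1, -1):
        a = pi[n]                            # pi[n, x]: action at time n in state x
        rows = P[a, np.arange(S)]            # kernel rows under pi_n
        L = J[n + 1] / theta
        m = L.max()                          # stabilised log-sum-exp
        J[n] = c[np.arange(S), a] + theta * (m + np.log(rows @ np.exp(L - m)))
    return J

def deviation_value(J, n, x, a, theta):
    """Cost of playing a at (n, x), then reverting to the evaluated policy."""
    L = J[n + 1] / theta
    m = L.max()
    return c[x, a] + theta * (m + np.log(P[a, x] @ np.exp(L - m)))

def is_theta_equilibrium(pi, theta, tol=1e-9):
    """True iff no one-shot deviation lowers J_n^theta at any (n, x)."""
    J = policy_value(pi, theta)
    return all(
        deviation_value(J, n, x, a, theta) >= J[n, x] - tol
        for n in range(N) for x in range(S) for a in range(A)
    )

# Example: the myopic policy argmin_a c(x, a) at every stage.
pi = np.tile(c.argmin(axis=1), (N, 1))
print(is_theta_equilibrium(pi, theta=0.5))
```

A policy produced by the backward sweep of Section 2 passes this check by construction (up to numerical tolerance); the myopic policy above generally does not.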
2. Dynamic Programming-Type Characterization
The equilibrium strategy is constructed via a dynamic programming-type backward recursion that incorporates non-classical Hamiltonians:
- For $\theta > 0$, define the one-step Hamiltonian
$$(T_n^\theta f)(x) = \min_{a \in A(x)} \Big[ c(x,a) + \theta \log \sum_{y} p_\theta(y \mid x, a)\, e^{f(y)/\theta} \Big].$$
- In the limit $\theta \to 0$, large deviations yield
$$(T_n^0 f)(x) = \min_{a \in A(x)} \sup_{y} \big[ c(x,a) + f(y) - I(x,a,y) \big],$$
where $I(x,a,y) = -\lim_{\theta \to 0} \theta \log p_\theta(y \mid x, a)$ is the large-deviation rate function of the transition kernels.
The backward induction for $n = N, N-1, \dots, 0$:
- Terminal: $V_N^\theta(x) = c_N(x)$.
- Recursion: $V_n^\theta(x) = (T_n^\theta V_{n+1}^\theta)(x)$, with the equilibrium action $\pi_n^\theta(x)$ chosen as a minimizer.
The limit ($\theta \to 0$) recovers the risk-neutral equilibrium policy deterministically via the min-sup recursion $V_n^0 = T_n^0 V_{n+1}^0$; both sweeps are sketched in code below.
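A minimal sketch of the two backward sweeps, assuming a hypothetical toy family of kernels $p_\theta(y \mid x,a) \propto q(y \mid x,a)\, e^{-I(x,a,y)/\theta}$ so that $\theta \log p_\theta \to -I$; the model and function names are illustrative, not from the source.

```python
import numpy as np

# Hypothetical toy family p_theta(y|x,a) ∝ q(y|x,a) * exp(-I(x,a,y)/theta),
# so that theta * log p_theta -> -I: transitions with I > 0 are the "rare" ones.
S, A, N = 3, 2, 4
rng = np.random.default_rng(1)
q = rng.dirichlet(np.ones(S), size=(A, S))                  # base kernel
I = rng.choice([0.0, 1.5], p=[0.7, 0.3], size=(A, S, S))    # rate function
I[:, :, 0] = 0.0                     # every (x, a) keeps one typical successor
c = rng.uniform(0.0, 1.0, size=(S, A))
cN = rng.uniform(0.0, 1.0, size=S)

def sweep_theta(theta):
    """Backward sweep with (T^theta f)(x) = min_a [c + theta*log sum_y p_theta e^{f/theta}]."""
    P = q * np.exp(-I / theta)
    P /= P.sum(axis=2, keepdims=True)                # normalise p_theta
    V, pi = cN.copy(), np.empty((N, S), dtype=int)
    for n in range(N - 1, -1, -1):
        m = V.max()                                  # stabilised log-sum-exp
        Q = c.T + m + theta * np.log(P @ np.exp((V - m) / theta))   # (A, S)
        pi[n], V = Q.argmin(axis=0), Q.min(axis=0)
    return V, pi

def sweep_minsup():
    """theta -> 0 limit: (T^0 f)(x) = min_a sup_y [c + f(y) - I(x,a,y)]."""
    V, pi = cN.copy(), np.empty((N, S), dtype=int)
    for n in range(N - 1, -1, -1):
        Q = c.T + (V[None, None, :] - I).max(axis=2)                # (A, S)
        pi[n], V = Q.argmin(axis=0), Q.min(axis=0)
    return V, pi

print(sweep_theta(0.05)[1])   # should already resemble...
print(sweep_minsup()[1])      # ...the limiting min-sup policy
```

On toy instances like this one, the $\theta$-sweep policy typically coincides with the min-sup policy already at moderately small $\theta$, in line with the convergence statements of Section 4.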
3. Existence and Uniqueness of Equilibria
Under regularity assumptions:
- Lyapunov tightness of the transition kernels,
- Continuity and inf-compactness of the costs,
- Uniform control over rare transitions,

the backward recursion admits a unique solution in a weighted Banach space, and the constructed policy $\pi^\theta$ is a $\theta$-equilibrium. Any equilibrium policy must solve these backward equations; in the limit $\theta \to 0$, the unique solution yields a risk-neutral time-consistent policy.
4. Limiting Case and Convergence
As $\theta \to 0$, the risk-sensitive model converges to a risk-neutral control problem characterized by large deviation theory:
- The value functions $V_n^\theta$ converge uniformly in weighted sup-norm to $V_n^0$.
- Any sequence of $\theta$-equilibria has limit points that are $0$-equilibria (risk-neutral).
- If the risk-neutral recursion admits a unique solution, then the convergence is strong: both value functions and policies converge pointwise.
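The convergence is easy to visualize numerically. The self-contained sketch below (same hypothetical toy family $p_\theta \propto q\, e^{-I/\theta}$ as above; all names illustrative) prints the sup-norm gap $\max_x |V_0^\theta(x) - V_0^0(x)|$ for a decreasing sequence of $\theta$; the gap should shrink toward zero.

```python
import numpy as np

# Empirical check of V^theta -> V^0 on the illustrative toy family.
S, A, N = 3, 2, 4
rng = np.random.default_rng(1)
q = rng.dirichlet(np.ones(S), size=(A, S))
I = rng.choice([0.0, 1.5], p=[0.7, 0.3], size=(A, S, S))
I[:, :, 0] = 0.0
c = rng.uniform(0.0, 1.0, size=(S, A))
cN = rng.uniform(0.0, 1.0, size=S)

def V0():
    """Limiting min-sup sweep; returns the time-0 value function."""
    V = cN.copy()
    for _ in range(N):
        V = (c.T + (V[None, None, :] - I).max(axis=2)).min(axis=0)
    return V

def Vtheta(theta):
    """Risk-sensitive sweep under p_theta; returns the time-0 value function."""
    P = q * np.exp(-I / theta)
    P /= P.sum(axis=2, keepdims=True)
    V = cN.copy()
    for _ in range(N):
        m = V.max()
        V = (c.T + m + theta * np.log(P @ np.exp((V - m) / theta))).min(axis=0)
    return V

for theta in [1.0, 0.3, 0.1, 0.03, 0.01]:
    print(theta, np.abs(Vtheta(theta) - V0()).max())  # gap shrinks with theta
```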
5. Time-Consistency and Strategic Structure
A time-consistent equilibrium strategy enforces local optimality at every time: once adopted, no future decision-maker has an incentive to deviate at their decision point. This local optimality ensures dynamic stability even though global optimality is unattainable. Each equilibrium policy arises from a tractable finite-horizon backward sweep, structurally similar to classical dynamic programming but with nonlinear Hamiltonians reflecting the underlying time-inconsistent criteria.
The min–sup structure in the limit reflects a deterministic min–max principle: the decision-maker optimizes against the most adverse reachable states, as weighted by the large deviation rate function. In risk-sensitive control problems with "rare-state" transitions, the equilibrium strategy thus hedges robustly against high-cost events whose probabilities vanish.
6. Computational Considerations and Practical Implementation
The stepwise recursive computation of equilibrium strategies is tractable and requires no global optimization over the entire policy space: for a finite truncation of the state space, one backward sweep costs $O(N\,|X|^2\,|A|)$ operations. Provided the state space and action constraints are manageable, the backward induction can be efficiently solved numerically. The convergence properties guarantee stability of the algorithm as $\theta$ is varied, and if uniqueness holds in the limiting recursion, the computed policy remains time-consistent under small perturbations of the model parameters, as the spot-check below illustrates.
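As a crude robustness spot-check, one can perturb the model slightly and compare the computed limiting policies; the sketch below (same hypothetical toy model, illustrative names) applies small random perturbations to the running costs.

```python
import numpy as np

# Spot-check: limiting equilibrium policy under small cost perturbations.
S, A, N = 3, 2, 4
rng = np.random.default_rng(1)
q = rng.dirichlet(np.ones(S), size=(A, S))   # base kernel (full support here)
I = rng.choice([0.0, 1.5], p=[0.7, 0.3], size=(A, S, S))
I[:, :, 0] = 0.0
c = rng.uniform(0.0, 1.0, size=(S, A))
cN = rng.uniform(0.0, 1.0, size=S)

def minsup_policy(cost):
    """Equilibrium actions from the limiting min-sup backward sweep."""
    V, pi = cN.copy(), np.empty((N, S), dtype=int)
    for n in range(N - 1, -1, -1):
        Q = cost.T + (V[None, None, :] - I).max(axis=2)
        pi[n], V = Q.argmin(axis=0), Q.min(axis=0)
    return pi

base = minsup_policy(c)
agree = [
    (minsup_policy(c + 1e-4 * rng.standard_normal(c.shape)) == base).mean()
    for _ in range(20)
]
print(min(agree))   # 1.0 expected unless some (n, x) has a near-tie in Q
```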
7. Connections to Broader Theory
The time-consistent equilibrium strategy framework generalizes classical dynamic programming to contexts where preference structures (risk-sensitivity, non-exponential discounting, recursive utilities) inherently violate time-consistency. It has direct analogs in mean-variance portfolio selection, mixed objective stochastic control, robust MDPs, and dynamic games between temporally distinct "selves." The methodology yields solutions that, while not globally optimal, are subgame-perfect Nash equilibria in the sense of self-enforcement and local optimality (Mei, 2019).
The framework offers flexibility for modelers to accommodate complex real-world preferences while maintaining computational tractability and dynamic credibility of policies.