Closed-form CE-ML Estimator
- Closed-form CE-ML is a statistical estimator that performs inverse learning of agent payoffs in 2x2 games from observed action frequencies under correlated equilibrium assumptions.
- It exploits the tractable structure of the CE polytope to derive closed-form solutions for both the equilibrium distribution and payoff parameters via payoff ratios α and β.
- Empirical evaluations reveal that CE-ML delivers superior accuracy and computational efficiency compared to ICE and LBR-ML in scenarios such as coordination and traffic games.
A Closed-form Correlated Equilibrium Maximum-Likelihood Estimator (CE-ML) provides a statistically efficient and interpretable method for inverse learning of agent payoffs in games under the assumption that observed joint action frequencies are generated according to a Correlated Equilibrium (CE). The CE-ML estimator exploits the tractable combinatorial structure of the CE polytope in strict coordination games and yields parameters that are directly consistent with empirical frequencies, leveraging a closed-form solution for both the equilibrium distribution and underlying payoff parameters. This approach is specialized for scenarios where agent strategies coordinate through CE, and offers explicit trade-offs between interpretability, computational efficiency, and fidelity to observed behavior (Salazar et al., 15 Jan 2026).
1. Inverse Learning in Games and Correlated Equilibrium
Consider a two-player game in which each player $i \in \{1,2\}$ has two actions, yielding four joint action profiles $a \in \mathcal{A}$ with $|\mathcal{A}| = 4$. Each player's utility for a profile is specified via a feature mapping and linear parameterization, $u_i(a) = \theta_i^\top \phi_i(a)$, with parameter vector $\theta = (\theta_1, \theta_2)$. Observing $N$ i.i.d. samples drawn from an unknown equilibrium strategy $\pi$, inverse game-theoretic learning asks for parameters $\theta$ under which some CE of the induced game matches the empirical action frequencies.
A joint distribution $\pi$ over $\mathcal{A}$ is a correlated equilibrium for payoffs $\theta$ if, for each player $i$ and each recommended action $a_i$, any deviation $a_i'$ satisfies
$$\sum_{a_{-i}} \pi(a_i, a_{-i})\,\big[u_i(a_i, a_{-i}) - u_i(a_i', a_{-i})\big] \;\ge\; 0.$$
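These incentive constraints are straightforward to verify numerically. The sketch below checks whether a candidate joint distribution is a CE of a 2x2 game; the chicken-style payoff matrices in the usage note are illustrative assumptions, not values from the paper.

```python
import itertools
import numpy as np

def is_correlated_equilibrium(pi, u1, u2, tol=1e-9):
    """Check the CE inequalities for a 2x2 game.

    pi : 2x2 joint distribution over action profiles (rows: player 1's action).
    u1, u2 : 2x2 payoff matrices for players 1 and 2.
    For each player and each recommended action a, deviating to the other
    action d must not raise expected utility, weighted by pi's mass on a.
    """
    pi = np.asarray(pi, float)
    u1 = np.asarray(u1, float)
    u2 = np.asarray(u2, float)
    for a, d in itertools.product(range(2), repeat=2):
        if a == d:
            continue
        # Player 1: recommended row a, candidate deviation to row d.
        if np.dot(pi[a], u1[d] - u1[a]) > tol:
            return False
        # Player 2: recommended column a, candidate deviation to column d.
        if np.dot(pi[:, a], u2[:, d] - u2[:, a]) > tol:
            return False
    return True
```

For instance, with the classic chicken payoffs $u_1 = [[6,2],[7,0]]$ and $u_2 = u_1^\top$, the distribution placing mass $1/3$ on each of (C,C), (C,D), (D,C) passes the check, while the pure profile (D,D) does not.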
2. Structure of the CE Polytope
Under the assumption of no strictly dominated strategies, the CE polytope of a 2x2 coordination game has exactly five extreme points (vertices) $\pi^{(1)}, \dots, \pi^{(5)}$ [Calvo-Armengol 2003]. Each CE is a mixture over these vertices, $\pi = \sum_{k=1}^{5} \lambda_k \pi^{(k)}$ with $\lambda_k \ge 0$ and $\sum_k \lambda_k = 1$. Of particular importance is $\pi^{(5)}$, the unique interior CE, whose probabilities are determined by two payoff ratios $\alpha$ and $\beta$ formed from the players' utility differences between profiles: each entry of $\pi^{(5)}$ is an explicit function of $(\alpha, \beta)$, normalized so the four probabilities sum to one.
The other four vertices are degenerate CEs, placing probability mass on one or two pure profiles.
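Because the CE set is a polytope, any convex mixture of its vertices is again a CE. The sketch below demonstrates this mixture property numerically for two degenerate pure-Nash vertices, using assumed chicken-style payoffs (the full five-vertex description is game-specific and not reproduced here).

```python
import itertools
import numpy as np

def ce_violation(pi, u1, u2):
    """Largest violation of the CE incentive inequalities (<= 0 means pi is a CE)."""
    worst = -np.inf
    for a, d in itertools.product(range(2), repeat=2):
        if a != d:
            worst = max(worst,
                        np.dot(pi[a], u1[d] - u1[a]),           # player 1 deviates a -> d
                        np.dot(pi[:, a], u2[:, d] - u2[:, a]))  # player 2 deviates a -> d
    return worst

# Illustrative chicken payoffs (assumed, not taken from the paper).
u1 = np.array([[6., 2.], [7., 0.]])
u2 = u1.T
# Two degenerate vertices: the pure Nash profiles (C,D) and (D,C).
v1 = np.array([[0., 1.], [0., 0.]])
v2 = np.array([[0., 0.], [1., 0.]])
# Every convex mixture of CE vertices stays inside the CE polytope.
for lam in np.linspace(0.0, 1.0, 5):
    mix = lam * v1 + (1 - lam) * v2
    assert ce_violation(mix, u1, u2) <= 1e-9
```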
3. Maximum-Likelihood Estimation: Closed-Form Solution
Given action counts $n_a$ and empirical frequencies $\hat{p}_a = n_a / N$, the log-likelihood under the CE parameterization is $\ell(\lambda, \theta) = \sum_{a \in \mathcal{A}} n_a \log \pi_{\lambda}(a)$, where $\pi_{\lambda} = \sum_k \lambda_k \pi^{(k)}$ and the interior vertex $\pi^{(5)}$ depends on $\theta$ through the ratios $(\alpha, \beta)$.
Due to non-concavity of the log-likelihood in the joint parameters $(\lambda, \theta)$, the optimum always lies at a CE vertex. The estimator proceeds by:
- Computing the log-likelihood $\ell_k = \sum_a n_a \log \pi^{(k)}(a)$ for each vertex $k = 1, \dots, 5$.
- Selecting $k^\star = \arg\max_k \ell_k$ and setting $\hat{\pi} = \pi^{(k^\star)}$.
- Optimizing $\theta$ for the selected vertex.
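The vertex-selection step amounts to scoring each candidate vertex by its multinomial log-likelihood and taking the argmax. A minimal sketch, with a hypothetical vertex list (the actual five vertices depend on the estimated game):

```python
import numpy as np

def select_vertex(counts, vertices):
    """Pick the CE vertex maximizing the multinomial log-likelihood.

    counts   : length-4 sequence of observed profile counts n_a.
    vertices : candidate probability vectors over the four profiles
               (hypothetical stand-ins for the five polytope vertices).
    A vertex with pi(a) = 0 for some observed profile scores -inf and is
    therefore never selected.
    """
    counts = np.asarray(counts, float)
    best_k, best_ll = None, -np.inf
    for k, v in enumerate(vertices):
        v = np.asarray(v, float)
        with np.errstate(divide="ignore", invalid="ignore"):
            # Terms with n_a = 0 contribute nothing, even when pi(a) = 0.
            ll = float(np.sum(np.where(counts > 0, counts * np.log(v), 0.0)))
        if ll > best_ll:
            best_k, best_ll = k, ll
    return best_k, best_ll
```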
When $k^\star = 5$ (interior CE), setting the partial derivatives of the log-likelihood with respect to $\alpha$ and $\beta$ to zero yields closed-form MLEs $\hat{\alpha}$ and $\hat{\beta}$ expressed directly in terms of the empirical frequencies $\hat{p}_a$.
Substituting the definitions of $\alpha$ and $\beta$ then translates these estimates into linear constraints on $\theta$. With a normalization (e.g., fixing $\|\theta\| = 1$ or pinning one component), this yields $\hat{\theta}$ explicitly. If another vertex is selected, the constraints collapse to linear equalities reflecting the induced payoff ordering for pure or edge CEs.
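Under the assumption that the linear constraints can be stacked into a homogeneous system $A\theta = 0$ (the exact rows depend on the feature mapping, which is not specified here), a sketch of scale-normalized recovery via the SVD null vector:

```python
import numpy as np

def recover_theta(A):
    """Solve A @ theta = 0 for theta up to scale, normalized to ||theta|| = 1.

    A : (m, d) matrix stacking the linear equality constraints induced by
        the estimated payoff ratios (hypothetical encoding).
    Returns the right-singular vector for the smallest singular value,
    i.e. the direction closest to the null space of A.
    """
    _, _, vt = np.linalg.svd(np.asarray(A, float))
    theta = vt[-1]
    return theta / np.linalg.norm(theta)
```

For a single constraint row such as $[1, -2]$, the recovered $\hat{\theta}$ is proportional to $(2, 1)$, unique up to sign.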
4. Assumptions, Regularity, and Computational Properties
CE-ML relies on:
- Absence of strictly dominated strategies, so the ratios $\alpha, \beta$ remain finite and the polytope has five vertices.
- Sufficient data support, with $n_a > 0$ for the relevant profiles, to ensure the estimates $\hat{\alpha}, \hat{\beta}$ are defined.
- Unique maximal vertex for the likelihood (ties resolved arbitrarily).
Computation is efficient: single-pass statistics and closed-form solutions yield total complexity $O(N)$ in the number of samples. In degenerate data scenarios (e.g., samples occupy just two action cells), if denominators in the MLE vanish, any $\theta$ consistent with the observed payoff orderings suffices.
Numerical stability is ensured by selecting vertices that allocate zero mass to unobserved profiles when $\hat{p}_a = 0$ for some $a$.
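The single-pass statistics step reduces to counting profile occurrences, linear in the number of samples. A sketch, with placeholder labels for the four joint action cells:

```python
from collections import Counter

def profile_frequencies(samples, profiles=("a11", "a12", "a21", "a22")):
    """One pass over the N observed joint actions: counts -> frequencies.

    `profiles` are hypothetical labels for the four joint action cells.
    """
    counts = Counter(samples)   # single O(N) pass over the data
    n = len(samples)
    return {a: counts.get(a, 0) / n for a in profiles}
```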
5. Illustrative Application: Chicken-Dare Game
In the "chicken-dare" case [Bestick et al. 2013], 1000 simulated samples yield empirical frequencies over the four joint profiles, to which the closed-form estimator is applied.
The interior vertex likelihood dominates, so the payoff-ratio constraints directly recover $\hat{\alpha}$, $\hat{\beta}$, and $\hat{\theta}$ up to scale, matching the true payoffs within numerical tolerance. This outcome demonstrates the estimator's ability to reconstruct the underlying game parameters precisely when agent behavior is CE-conforming.
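The vertex-selection logic in this example can be reproduced schematically. The counts and vertex set below are hypothetical stand-ins (the paper's exact frequencies are not reproduced here); the point is that any vertex assigning zero probability to an observed cell is eliminated, leaving the vertex that mixes the observed cells.

```python
import numpy as np

# Hypothetical chicken-dare setup; profiles ordered (C,C), (C,D), (D,C), (D,D).
vertices = [
    np.array([0., 1., 0., 0.]),        # degenerate CE on (C,D)
    np.array([0., 0., 1., 0.]),        # degenerate CE on (D,C)
    np.array([1/3, 1/3, 1/3, 0.]),     # candidate CE mixing three cells
]

counts = np.array([334, 333, 333, 0])  # hypothetical counts from N = 1000

def vertex_loglik(counts, v):
    """Multinomial log-likelihood of counts under vertex distribution v."""
    with np.errstate(divide="ignore", invalid="ignore"):
        return float(np.sum(np.where(counts > 0, counts * np.log(v), 0.0)))

lls = [vertex_loglik(counts, v) for v in vertices]
best = int(np.argmax(lls))  # -> 2: only the three-cell vertex scores finitely
```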
6. Empirical Evaluation and Performance
CE-ML was evaluated alongside Inverse Correlated Equilibrium (ICE) and Logit Best Response ML (LBR-ML) estimators on synthetic and SUMO traffic interaction data, with four primary experiments:
- E1 (Chicken via CE): As the sample size $N$ increases, CE-ML's MAE/RMSE improve, besting ICE and LBR-ML.
- E2 (Traffic, maximum-entropy CE): CE-ML attains the lowest MAE of the three estimators (ICE: $0.111$), and also gives the best KL divergence and prediction accuracy.
- E3 (Traffic with signaling device): CE-ML identifies the correct mixture vertex and reaches 86.4% decision accuracy (ICE: 41.8%).
- E4 (No coordination): CE-ML fails on this non-CE data, but LBR-ML with a fitted rationality parameter $\lambda$ attains 72.6% accuracy, equaling the best fixed-rationality baseline.
7. Comparison to Logit Best Response ML and Practical Recommendations
| Criterion | CE-ML | LBR-ML |
|---|---|---|
| Interpretability | Explicit payoff-ratio (α,β); mixture vertex interpretable | Includes rationality λ; models stochastic adaptation |
| Computational Cost | $O(N)$, closed-form solution | Requires a linear system and nonconvex optimization; higher overall cost |
| Behavioral Assumptions | One-shot correlation device, perfect regret consistency | Repeated logit best responses, bounded rationality; robust to non-CE behavior |
CE-ML achieves fast, closed-form inverse learning for small games when agent behavior plausibly arises from a CE, such as in coordinated, signaled, or regulated environments. In settings without a central correlating device—such as unregulated traffic or when stochastic adaptation is prominent—LBR-ML better captures bounded rationality and noisy, non-equilibrium patterns, albeit with higher computational overhead and additional parameters (Salazar et al., 15 Jan 2026).