
Closed-form CE-ML Estimator

Updated 22 January 2026
  • Closed-form CE-ML is a statistical estimator that infers agent payoffs in 2×2 games from observed action frequencies under correlated equilibrium assumptions.
  • It exploits the tractable structure of the CE polytope to derive closed-form solutions for both the equilibrium distribution and payoff parameters via payoff ratios α and β.
  • Empirical evaluations reveal that CE-ML delivers superior accuracy and computational efficiency compared to ICE and LBR-ML in scenarios such as coordination and traffic games.

A Closed-form Correlated Equilibrium Maximum-Likelihood Estimator (CE-ML) provides a statistically efficient and interpretable method for inverse learning of agent payoffs in 2×2 games under the assumption that observed joint action frequencies are generated according to a Correlated Equilibrium (CE). The CE-ML estimator exploits the tractable combinatorial structure of the CE polytope in strict coordination games and yields parameters that are directly consistent with empirical frequencies, leveraging a closed-form solution for both the equilibrium distribution and the underlying payoff parameters. This approach is specialized for scenarios where agent strategies coordinate through CE, and offers explicit trade-offs between interpretability, computational efficiency, and fidelity to observed behavior (Salazar et al., 15 Jan 2026).

1. Inverse Learning in 2×2 Games and Correlated Equilibrium

Consider a two-player game in which each player i ∈ {1, 2} has two actions, A_1 = {a_1^1, a_1^2} and A_2 = {a_2^1, a_2^2}, giving four joint action profiles A = {a(1), a(2), a(3), a(4)}, where a(1) = (a_1^1, a_2^1), a(2) = (a_1^1, a_2^2), a(3) = (a_1^2, a_2^1), and a(4) = (a_1^2, a_2^2). Each player's utility for a profile a(l) is specified via a feature mapping and linear parameterization, u^i(a(l)) = φ_i(a(l))^⊤ w_i with w_i ∈ R^d. Observing T i.i.d. samples D = {a^(t)} drawn from an unknown equilibrium strategy σ*, inverse game-theoretic learning asks for parameters w = (w_1, w_2) such that σ*(w), interpreted as a CE, matches the empirical action frequencies.

A joint distribution σ over A is a correlated equilibrium for payoffs u if no player gains by deviating from a recommended action: for each player i, each recommended action a_i ∈ A_i, and each deviation a_i' ∈ A_i,

\sum_{a_{-i}\in A_{-i}} \sigma[(a_i,a_{-i})]\left[u^i(a_i,a_{-i})-u^i(a_i',a_{-i})\right]\geq 0.
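These incentive constraints are straightforward to verify numerically. The sketch below (a hypothetical helper, not from the paper) checks the per-recommendation CE conditions for a candidate joint distribution over the four profiles, given 2×2 payoff matrices; the chicken payoffs in the usage example are standard illustrative values.

```python
import numpy as np

def is_correlated_equilibrium(sigma, u1, u2, tol=1e-9):
    """Check the CE incentive constraints for a 2x2 game.

    sigma: 2x2 array, sigma[j, k] = P(row action j, column action k).
    u1, u2: 2x2 payoff matrices for the row and column player.
    """
    # Row player: conditioned on each recommended row j, deviating to
    # another row jp must not yield a positive expected gain.
    for j in range(2):
        for jp in range(2):
            if jp == j:
                continue
            regret = sum(sigma[j, k] * (u1[jp, k] - u1[j, k]) for k in range(2))
            if regret > tol:
                return False
    # Column player: symmetric check over recommended columns.
    for k in range(2):
        for kp in range(2):
            if kp == k:
                continue
            regret = sum(sigma[j, k] * (u2[j, kp] - u2[j, k]) for j in range(2))
            if regret > tol:
                return False
    return True
```

For example, with classic chicken payoffs (row/column action 0 = dare, 1 = chicken), the well-known CE that spreads mass 1/3 over (dare, chicken), (chicken, dare), and (chicken, chicken) passes the check, while a point mass on (dare, dare) fails it.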

2. Structure of the 2×2 CE Polytope

Under the assumption that no strategy is strictly dominated, the CE polytope of a 2×2 coordination game has exactly five extreme points (vertices) {σ_(v)(w)}, v = 1, …, 5 [Calvo-Armengol 2003]. Every CE is a mixture over these vertices, parameterized as σ(w, y) = Σ_{v=1}^{5} y_v σ_(v)(w) with y ∈ Δ_5. Of particular importance is σ_(3)(w), the unique interior CE, whose equilibrium probabilities are determined by payoff ratios:

  • Define

\alpha = \frac{|u^1(a(1))-u^1(a(3))|}{|u^1(a(4))-u^1(a(2))|},\qquad \beta = \frac{|u^2(a(1))-u^2(a(2))|}{|u^2(a(4))-u^2(a(3))|}.

  • The interior CE probabilities are:

\begin{aligned} \sigma_{(3)}[a(1)] &= \frac{1}{(1+\alpha)(1+\beta)}, \\ \sigma_{(3)}[a(2)] &= \frac{\alpha}{(1+\alpha)(1+\beta)}, \\ \sigma_{(3)}[a(3)] &= \frac{\beta}{(1+\alpha)(1+\beta)}, \\ \sigma_{(3)}[a(4)] &= \frac{\alpha\beta}{(1+\alpha)(1+\beta)}. \end{aligned}

The other four vertices are degenerate CEs, placing probability mass on one or two pure profiles.
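Given the ratios α and β, the interior vertex can be computed directly from the formulas above. A minimal sketch (the function name is illustrative):

```python
def interior_ce(alpha, beta):
    """Interior CE vertex sigma_(3), returned as probabilities over
    the joint profiles a(1), a(2), a(3), a(4)."""
    z = (1 + alpha) * (1 + beta)  # normalizing constant
    return [1 / z, alpha / z, beta / z, alpha * beta / z]
```

For α = β = 1 this reduces to the uniform distribution over the four profiles, and the four entries sum to one by construction.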

3. Maximum-Likelihood Estimation: Closed-Form Solution

Given action counts T_l = #{t : a^(t) = a(l)} and empirical frequencies f_l = T_l / T, the log-likelihood under the CE parameterization is

\ell(w,y) = \sum_{l=1}^{4} T_l \log\left[\sigma(w,y)[a(l)]\right].

Due to the non-concavity of the log-likelihood in y, the optimum always lies at a CE vertex. The estimator proceeds by:

  • Computing ℓ_v(w) for v = 1, …, 5 (the log-likelihood under each vertex).
  • Selecting v* = argmax_v ℓ_v(w) and setting y = e_{v*}.
  • Optimizing w for the selected vertex.
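The vertex-selection step can be sketched as follows, assuming the candidate vertex distributions have already been evaluated as length-4 probability vectors (helper and variable names are illustrative):

```python
import math

def select_vertex(counts, vertex_dists):
    """Return the index of the CE vertex maximizing the log-likelihood.

    counts: observed counts [T_1, ..., T_4] over the joint profiles.
    vertex_dists: candidate distributions sigma_(v), each a length-4
    probability vector.
    """
    def loglik(sigma):
        ll = 0.0
        for t, p in zip(counts, sigma):
            if p == 0.0:
                if t > 0:  # vertex puts zero mass on an observed cell
                    return -math.inf
            else:
                ll += t * math.log(p)
        return ll
    return max(range(len(vertex_dists)), key=lambda v: loglik(vertex_dists[v]))
```

Degenerate vertices that assign zero probability to an observed profile receive log-likelihood −∞ and are never selected, which matches the numerical-stability convention described in Section 4.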

When v* = 3 (the interior CE), setting the partial derivatives with respect to α and β to zero yields closed-form MLEs:

\hat{\alpha} = \frac{T_2+T_4}{T_1+T_3}, \qquad \hat{\beta} = \frac{T_3+T_4}{T_1+T_2}.
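In code, the closed-form estimates reduce to two ratios of count sums (a minimal sketch; the example counts in the test are illustrative):

```python
def ce_ml_ratios(counts):
    """Closed-form MLE of the interior-vertex payoff ratios.

    counts = [T_1, T_2, T_3, T_4]; assumes T_1+T_3 > 0 and T_1+T_2 > 0
    so both denominators are nonzero.
    """
    T1, T2, T3, T4 = counts
    alpha_hat = (T2 + T4) / (T1 + T3)
    beta_hat = (T3 + T4) / (T1 + T2)
    return alpha_hat, beta_hat
```

This is a single pass over four precomputed statistics, which is what makes the overall estimator O(T).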

Substituting the definitions of α and β, this translates to constraints on w_i that become linear once the signs inside the absolute values are fixed by the payoff ordering:

\frac{|u^1(a(1))-u^1(a(3))|}{|u^1(a(4))-u^1(a(2))|} = \frac{T_2+T_4}{T_1+T_3}, \qquad \frac{|u^2(a(1))-u^2(a(2))|}{|u^2(a(4))-u^2(a(3))|} = \frac{T_3+T_4}{T_1+T_2}.

With a normalization (e.g., fixing ‖w_i‖ = 1 or specifying one component), this yields w_i explicitly. If another vertex is selected, the constraints collapse to linear equalities reflecting the payoff ordering induced by the pure or edge CEs.
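To make the recovery step concrete, here is a toy sketch under an assumed one-hot feature map (so u^1(a(l)) is simply the l-th component of w_1), with the hypothetical normalization u^1(a(2)) = u^1(a(3)) = 0, u^1(a(4)) = 1 and the positive branch chosen inside the absolute values; these choices are illustrative, not taken from the paper:

```python
def recover_row_payoffs(alpha_hat):
    """Solve |u1(a(1)) - u1(a(3))| / |u1(a(4)) - u1(a(2))| = alpha_hat
    for u1(a(1)), under the assumed normalization u1(a(2)) = u1(a(3)) = 0,
    u1(a(4)) = 1 and a positive sign convention (illustrative choices)."""
    u_a2, u_a3, u_a4 = 0.0, 0.0, 1.0
    u_a1 = u_a3 + alpha_hat * abs(u_a4 - u_a2)  # positive branch
    return [u_a1, u_a2, u_a3, u_a4]
```

The returned payoffs satisfy the ratio constraint by construction; any rescaling or sign flip consistent with the same ordering would fit the data equally well, which is the "up to scale" caveat discussed in Section 5.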

4. Assumptions, Regularity, and Computational Properties

CE-ML relies on:

  • Absence of strictly dominated strategies, so that α and β remain finite and the polytope has five vertices.
  • Sufficient data support: T_1 + T_3 > 0 and T_1 + T_2 > 0, so that α̂ and β̂ are defined.
  • A unique likelihood-maximizing vertex (ties are resolved arbitrarily).

Computation is efficient: single-pass statistics and closed-form solutions yield total complexity O(T). In degenerate data scenarios (e.g., all samples occupy just two action cells), if a denominator in the MLE vanishes, any w consistent with the observed payoff orderings suffices.

Numerical stability is ensured by selecting vertices that allocate zero mass to unobserved profiles whenever T_l = 0 for some l.

5. Illustrative Application: Chicken-Dare Game

In the "chicken-dare" case [Bestick et al. 2013], 1000 simulated samples yield frequencies f_1 = 0.28, f_2 = 0.17, f_3 = 0.12, f_4 = 0.43. Applying the closed form:

\hat{\alpha} = \frac{0.17+0.43}{0.28+0.12} = \frac{0.60}{0.40} = 1.5, \qquad \hat{\beta} = \frac{0.12+0.43}{0.28+0.17} = \frac{0.55}{0.45} \approx 1.22.

The interior vertex likelihood dominates, so payoff ratio constraints directly recover w1w_1 and w2w_2 up to scale, matching the true payoffs within numerical tolerance. This outcome demonstrates the estimator's ability to precisely reconstruct underlying game parameters when agent behavior is CE-conforming.

6. Empirical Evaluation and Performance

CE-ML was evaluated alongside Inverse Correlated Equilibrium (ICE) and Logit Best Response ML (LBR-ML) estimators on synthetic and SUMO traffic interaction data, with four primary experiments:

  • E1 (Chicken via CE): as T increases, CE-ML MAE/RMSE improves from 0.133/0.138 to 0.084/0.097, besting ICE and LBR-ML.
  • E2 (Traffic, maximum-entropy CE): CE-ML achieves MAE ≈ 0.017 versus 0.111 for ICE and ≈ 0.041 for LBR-ML, and also gives the best KL divergence and prediction accuracy.
  • E3 (Traffic with signaling device): CE-ML identifies the correct mixture vertex (y_4 ≈ 1) with 86.4% decision accuracy (ICE: 41.8%).
  • E4 (No coordination): CE-ML fails on non-CE data, while LBR-ML with fitted λ = (1, 3) attains 72.6% accuracy, equaling the best fixed-rationality baseline.

7. Comparison to Logit Best Response ML and Practical Recommendations

| Criterion | CE-ML | LBR-ML |
|---|---|---|
| Interpretability | Explicit payoff ratios (α, β); mixture vertex interpretable | Includes rationality parameter λ; models stochastic adaptation |
| Computational cost | O(T), closed-form solution | Requires a 4×4 linear system and nonconvex optimization, O(iter · 4³) |
| Behavioral assumptions | One-shot correlation device, perfect regret consistency | Repeated logit best responses, bounded rationality; robust to non-CE behavior |

CE-ML achieves fast, closed-form inverse learning for small 2×2 games when agent behavior plausibly arises from a CE, such as in coordinated, signaled, or regulated environments. In settings without a central correlating device—such as unregulated traffic or when stochastic adaptation is prominent—LBR-ML better captures bounded rationality and noisy, non-equilibrium patterns, albeit with higher computational overhead and additional parameters (Salazar et al., 15 Jan 2026).
