
Closed-form CE-ML Estimator

Updated 22 January 2026
  • Closed-form CE-ML is a statistical estimator that infers agent payoffs in 2×2 games from observed action frequencies under correlated equilibrium assumptions.
  • It exploits the tractable structure of the CE polytope to derive closed-form solutions for both the equilibrium distribution and payoff parameters via payoff ratios α and β.
  • Empirical evaluations reveal that CE-ML delivers superior accuracy and computational efficiency compared to ICE and LBR-ML in scenarios such as coordination and traffic games.

A Closed-form Correlated Equilibrium Maximum-Likelihood Estimator (CE-ML) provides a statistically efficient and interpretable method for inverse learning of agent payoffs in 2×2 games under the assumption that observed joint action frequencies are generated according to a Correlated Equilibrium (CE). The CE-ML estimator exploits the tractable combinatorial structure of the CE polytope in strict coordination games and yields parameters that are directly consistent with empirical frequencies, leveraging a closed-form solution for both the equilibrium distribution and the underlying payoff parameters. This approach is specialized for scenarios where agent strategies coordinate through CE, and offers explicit trade-offs between interpretability, computational efficiency, and fidelity to observed behavior (Salazar et al., 15 Jan 2026).

1. Inverse Learning in 2×2 Games and Correlated Equilibrium

Consider a two-player game in which each player i ∈ {1, 2} has two actions, A_1 = {a_1^1, a_1^2} and A_2 = {a_2^1, a_2^2}, giving four joint action profiles A = {a(1), a(2), a(3), a(4)}, where a(1) = (a_1^1, a_2^1), a(2) = (a_1^1, a_2^2), a(3) = (a_1^2, a_2^1), and a(4) = (a_1^2, a_2^2). Each player's utility for a profile a(l) is specified via a feature mapping and linear parameterization, u^i(a(l)) = φ_i(a(l))^⊤ w_i with w_i ∈ R^d. Observing T i.i.d. samples D = {a^(t)} drawn from an unknown equilibrium strategy σ*, inverse game-theoretic learning asks for parameters w = (w_1, w_2) such that σ*(w), interpreted as a CE, matches the empirical action frequencies.

A joint distribution σ over A is a correlated equilibrium for payoffs u if no player gains by deviating from a recommended action: for each player i, each recommended action a_i ∈ A_i, and each deviation a_i' ∈ A_i,

\sum_{a_{-i}\in A_{-i}} \sigma[(a_i,a_{-i})]\left[u^i(a_i,a_{-i})-u^i(a_i',a_{-i})\right]\geq 0.
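These incentive constraints are straightforward to verify numerically. The sketch below (a hypothetical helper, not from the paper) checks the per-recommendation CE conditions for a candidate joint distribution over the four profiles, given 2×2 payoff matrices; the chicken payoffs in the usage example are standard illustrative values.

```python
import numpy as np

def is_correlated_equilibrium(sigma, u1, u2, tol=1e-9):
    """Check the CE incentive constraints for a 2x2 game.

    sigma: 2x2 array, sigma[j, k] = P(row action j, column action k).
    u1, u2: 2x2 payoff matrices for the row and column player.
    """
    # Row player: conditioned on each recommended row j, deviating to
    # another row jp must not yield a positive expected gain.
    for j in range(2):
        for jp in range(2):
            if jp == j:
                continue
            regret = sum(sigma[j, k] * (u1[jp, k] - u1[j, k]) for k in range(2))
            if regret > tol:
                return False
    # Column player: symmetric check over recommended columns.
    for k in range(2):
        for kp in range(2):
            if kp == k:
                continue
            regret = sum(sigma[j, k] * (u2[j, kp] - u2[j, k]) for j in range(2))
            if regret > tol:
                return False
    return True
```

For example, with classic chicken payoffs (row/column action 0 = dare, 1 = chicken), the well-known CE that spreads mass 1/3 over (dare, chicken), (chicken, dare), and (chicken, chicken) passes the check, while a point mass on (dare, dare) fails it.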

2. Structure of the 2×2 CE Polytope

Under the assumption that no strategy is strictly dominated, the CE polytope of a 2×2 coordination game has exactly five extreme points (vertices) {σ_(v)(w)}, v = 1, …, 5 [Calvo-Armengol 2003]. Every CE is a mixture over these vertices, parameterized as σ(w, y) = Σ_{v=1}^{5} y_v σ_(v)(w) with y ∈ Δ_5. Of particular importance is σ_(3)(w), the unique interior CE, whose equilibrium probabilities are determined by payoff ratios:

  • Define

\alpha = \frac{|u^1(a(1))-u^1(a(3))|}{|u^1(a(4))-u^1(a(2))|},\qquad \beta = \frac{|u^2(a(1))-u^2(a(2))|}{|u^2(a(4))-u^2(a(3))|}.

  • The interior CE probabilities are:

\begin{aligned} \sigma_{(3)}[a(1)] &= \frac{1}{(1+\alpha)(1+\beta)}, \\ \sigma_{(3)}[a(2)] &= \frac{\alpha}{(1+\alpha)(1+\beta)}, \\ \sigma_{(3)}[a(3)] &= \frac{\beta}{(1+\alpha)(1+\beta)}, \\ \sigma_{(3)}[a(4)] &= \frac{\alpha\beta}{(1+\alpha)(1+\beta)}. \end{aligned}

The other four vertices are degenerate CEs, placing probability mass on one or two pure profiles.
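Given the ratios α and β, the interior vertex can be computed directly from the formulas above. A minimal sketch (the function name is illustrative):

```python
def interior_ce(alpha, beta):
    """Interior CE vertex sigma_(3), returned as probabilities over
    the joint profiles a(1), a(2), a(3), a(4)."""
    z = (1 + alpha) * (1 + beta)  # normalizing constant
    return [1 / z, alpha / z, beta / z, alpha * beta / z]
```

For α = β = 1 this reduces to the uniform distribution over the four profiles, and the four entries sum to one by construction.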

3. Maximum-Likelihood Estimation: Closed-Form Solution

Given action counts T_l = #{t : a^(t) = a(l)} and empirical frequencies f_l = T_l / T, the log-likelihood under the CE parameterization is

\ell(w,y) = \sum_{l=1}^{4} T_l \log\left[\sigma(w,y)[a(l)]\right].

Due to the non-concavity of the log-likelihood in y, the optimum always lies at a CE vertex. The estimator proceeds by:

  • Computing ℓ_v(w) for v = 1, …, 5 (the log-likelihood under each vertex).
  • Selecting v* = argmax_v ℓ_v(w) and setting y = e_{v*}.
  • Optimizing w for the selected vertex.
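The vertex-selection step can be sketched as follows, assuming the candidate vertex distributions have already been evaluated as length-4 probability vectors (helper and variable names are illustrative):

```python
import math

def select_vertex(counts, vertex_dists):
    """Return the index of the CE vertex maximizing the log-likelihood.

    counts: observed counts [T_1, ..., T_4] over the joint profiles.
    vertex_dists: candidate distributions sigma_(v), each a length-4
    probability vector.
    """
    def loglik(sigma):
        ll = 0.0
        for t, p in zip(counts, sigma):
            if p == 0.0:
                if t > 0:  # vertex puts zero mass on an observed cell
                    return -math.inf
            else:
                ll += t * math.log(p)
        return ll
    return max(range(len(vertex_dists)), key=lambda v: loglik(vertex_dists[v]))
```

Degenerate vertices that assign zero probability to an observed profile receive log-likelihood −∞ and are never selected, which matches the numerical-stability convention described in Section 4.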

When v* = 3 (the interior CE), setting the partial derivatives with respect to α and β to zero yields closed-form MLEs:

\hat{\alpha} = \frac{T_2+T_4}{T_1+T_3}, \qquad \hat{\beta} = \frac{T_3+T_4}{T_1+T_2}.
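In code, the closed-form estimates reduce to two ratios of count sums (a minimal sketch; the example counts in the test are illustrative):

```python
def ce_ml_ratios(counts):
    """Closed-form MLE of the interior-vertex payoff ratios.

    counts = [T_1, T_2, T_3, T_4]; assumes T_1+T_3 > 0 and T_1+T_2 > 0
    so both denominators are nonzero.
    """
    T1, T2, T3, T4 = counts
    alpha_hat = (T2 + T4) / (T1 + T3)
    beta_hat = (T3 + T4) / (T1 + T2)
    return alpha_hat, beta_hat
```

This is a single pass over four precomputed statistics, which is what makes the overall estimator O(T).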

Substituting the definitions of α and β, this translates to constraints on w_i that become linear once the signs inside the absolute values are fixed by the payoff ordering:

\frac{|u^1(a(1))-u^1(a(3))|}{|u^1(a(4))-u^1(a(2))|} = \frac{T_2+T_4}{T_1+T_3}, \qquad \frac{|u^2(a(1))-u^2(a(2))|}{|u^2(a(4))-u^2(a(3))|} = \frac{T_3+T_4}{T_1+T_2}.

With a normalization (e.g., fixing ‖w_i‖ = 1 or specifying one component), this yields w_i explicitly. If another vertex is selected, the constraints collapse to linear equalities reflecting the payoff ordering induced by the pure or edge CEs.
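To make the recovery step concrete, here is a toy sketch under an assumed one-hot feature map (so u^1(a(l)) is simply the l-th component of w_1), with the hypothetical normalization u^1(a(2)) = u^1(a(3)) = 0, u^1(a(4)) = 1 and the positive branch chosen inside the absolute values; these choices are illustrative, not taken from the paper:

```python
def recover_row_payoffs(alpha_hat):
    """Solve |u1(a(1)) - u1(a(3))| / |u1(a(4)) - u1(a(2))| = alpha_hat
    for u1(a(1)), under the assumed normalization u1(a(2)) = u1(a(3)) = 0,
    u1(a(4)) = 1 and a positive sign convention (illustrative choices)."""
    u_a2, u_a3, u_a4 = 0.0, 0.0, 1.0
    u_a1 = u_a3 + alpha_hat * abs(u_a4 - u_a2)  # positive branch
    return [u_a1, u_a2, u_a3, u_a4]
```

The returned payoffs satisfy the ratio constraint by construction; any rescaling or sign flip consistent with the same ordering would fit the data equally well, which is the "up to scale" caveat discussed in Section 5.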

4. Assumptions, Regularity, and Computational Properties

CE-ML relies on:

  • Absence of strictly dominated strategies, so that α and β remain finite and the polytope has five vertices.
  • Sufficient data support: T_1 + T_3 > 0 and T_1 + T_2 > 0, so that α̂ and β̂ are defined.
  • A unique likelihood-maximizing vertex (ties are resolved arbitrarily).

Computation is efficient: single-pass statistics and closed-form solutions yield total complexity O(T). In degenerate data scenarios (e.g., all samples occupy just two action cells), if a denominator in the MLE vanishes, any w consistent with the observed payoff orderings suffices.

Numerical stability is ensured by selecting vertices that allocate zero mass to unobserved profiles whenever T_l = 0 for some l.

5. Illustrative Application: Chicken-Dare Game

In the "chicken-dare" case [Bestick et al. 2013], 1000 simulated samples yield frequencies f_1 = 0.28, f_2 = 0.17, f_3 = 0.12, f_4 = 0.43. Applying the closed form:

\hat{\alpha} = \frac{0.17+0.43}{0.28+0.12} = \frac{0.60}{0.40} = 1.5, \qquad \hat{\beta} = \frac{0.12+0.43}{0.28+0.17} = \frac{0.55}{0.45} \approx 1.22.

The interior vertex likelihood dominates, so payoff ratio constraints directly recover w1w_1 and w2w_2 up to scale, matching the true payoffs within numerical tolerance. This outcome demonstrates the estimator's ability to precisely reconstruct underlying game parameters when agent behavior is CE-conforming.

6. Empirical Evaluation and Performance

CE-ML was evaluated alongside Inverse Correlated Equilibrium (ICE) and Logit Best Response ML (LBR-ML) estimators on synthetic and SUMO traffic interaction data, with four primary experiments:

  • E1 (Chicken via CE): as T increases, CE-ML MAE/RMSE improves from 0.133/0.138 to 0.084/0.097, besting ICE and LBR-ML.
  • E2 (Traffic, maximum-entropy CE): CE-ML achieves MAE ≈ 0.017 versus 0.111 for ICE and ≈ 0.041 for LBR-ML, and also gives the best KL divergence and prediction accuracy.
  • E3 (Traffic with signaling device): CE-ML identifies the correct mixture vertex (y_4 ≈ 1) with 86.4% decision accuracy (ICE: 41.8%).
  • E4 (No coordination): CE-ML fails on non-CE data, while LBR-ML with fitted λ = (1, 3) attains 72.6% accuracy, equaling the best fixed-rationality baseline.

7. Comparison to Logit Best Response ML and Practical Recommendations

| Criterion | CE-ML | LBR-ML |
|---|---|---|
| Interpretability | Explicit payoff ratios (α, β); mixture vertex interpretable | Includes rationality parameter λ; models stochastic adaptation |
| Computational cost | O(T), closed-form solution | Requires a 4×4 linear system and nonconvex optimization, O(iter · 4³) |
| Behavioral assumptions | One-shot correlation device, perfect regret consistency | Repeated logit best responses, bounded rationality; robust to non-CE behavior |

CE-ML achieves fast, closed-form inverse learning for small 2×2 games when agent behavior plausibly arises from a CE, such as in coordinated, signaled, or regulated environments. In settings without a central correlating device—such as unregulated traffic or when stochastic adaptation is prominent—LBR-ML better captures bounded rationality and noisy, non-equilibrium patterns, albeit with higher computational overhead and additional parameters (Salazar et al., 15 Jan 2026).
