LBR-ML: Logit Best Response MLE in 2x2 Games

Updated 22 January 2026
  • LBR-ML is a parametric inverse learning method that models adaptive, bounded-rational game play in repeated 2x2 settings using logit best responses.
  • It employs a Markov chain framework to derive a unique stationary distribution for inferring player utility parameters and rationality levels.
  • Empirical studies, including traffic and synthetic experiments, demonstrate its effectiveness in recovering decision metrics under various sample sizes.

The Logit Best Response Maximum-Likelihood Estimator (LBR-ML) is a parametric inverse-learning approach for modeling strategic adaptation in repeated $2\times 2$ games. Designed to infer player utility parameters and rationality levels from joint-action data, LBR-ML emphasizes the stochastic, path-dependent dynamics arising from bounded rationality, rather than consistency with static equilibrium concepts. The method connects behavioral game theory and statistical estimation by directly fitting the long-run stationary distribution induced by repeated logit best-response updates to observed data via maximum-likelihood optimization (Salazar et al., 15 Jan 2026).

1. Game Model and Stochastic Logit Best Response

LBR-ML operates in two-player, $2\times 2$ normal-form games. Each player $i \in \{1,2\}$ selects between two actions $A_i = \{a_i^1, a_i^2\}$, yielding four possible joint action profiles $A = A_1 \times A_2 = \{a(1), a(2), a(3), a(4)\}$ in standard lexicographic order:

  • $a(1) = (a_1^1, a_2^1)$
  • $a(2) = (a_1^1, a_2^2)$
  • $a(3) = (a_1^2, a_2^1)$
  • $a(4) = (a_1^2, a_2^2)$

Each player's payoff is linear in known features: $u_i^{w_i}(a) = \phi_i(a)^\top w_i$ for $w_i \in \mathbb{R}^d$. The core behavioral assumption is that agents select actions via the (Blume) logit best response: at each stage $k$, given the opponent's last action, player $i$ chooses $a_i$ with probability

$$\sigma_i(\lambda_i, w_i)[a_i \mid a_{-i}(k)] = \frac{\exp\left(\lambda_i \, u_i^{w_i}(a_i, a_{-i}(k))\right)}{\sum_{a_i'} \exp\left(\lambda_i \, u_i^{w_i}(a_i', a_{-i}(k))\right)}$$

where $\lambda_i \geq 0$ is the rationality (inverse-temperature) parameter. As $\lambda_i \to \infty$, choices concentrate on best responses; as $\lambda_i \to 0$, behavior becomes uniformly random.
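As a concrete illustration, the logit response above can be computed directly. This is a minimal sketch; the payoff table and action encoding below are hypothetical, not taken from the paper:

```python
import math

def logit_response_prob(lam, u_own, opp_action):
    """Blume logit best response: probability over player i's two actions,
    given the opponent's last action. lam is the rationality parameter;
    u_own maps (own_action, opp_action) -> payoff u_i^{w_i}."""
    z = [math.exp(lam * u_own[(a, opp_action)]) for a in (1, 2)]
    total = sum(z)
    return [v / total for v in z]  # [P(a_i^1 | a_-i), P(a_i^2 | a_-i)]

# Hypothetical payoff table for player 1 (numbers made up for illustration)
u1 = {(1, 1): 0.3, (1, 2): 0.7, (2, 1): 0.5, (2, 2): 0.1}
p = logit_response_prob(2.0, u1, opp_action=2)  # response to a_2^2
```

At $\lambda_i = 0$ both actions are chosen with probability $1/2$ regardless of payoffs; as $\lambda_i$ grows, the mass concentrates on the payoff-maximizing action.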

2. Markov Transition Dynamics and Stationary Distribution

One-step transition probabilities form a $4\times 4$ Markov chain, with

$$P(\lambda, w)_{k\ell} = \prod_{i=1}^2 \sigma_i(\lambda_i, w_i)[a_i(\ell) \mid a_{-i}(k)]$$

Defining

  • $s_1 = \sigma_1(\lambda_1, w_1)[a_1^1 \mid a_2^1]$, $s_2 = \sigma_1(\lambda_1, w_1)[a_1^1 \mid a_2^2]$,
  • $t_1 = \sigma_2(\lambda_2, w_2)[a_2^1 \mid a_1^1]$, $t_2 = \sigma_2(\lambda_2, w_2)[a_2^1 \mid a_1^2]$

the explicit transition matrix $P(\lambda, w)$ is given by:

| Row/Col | $a(1)$ | $a(2)$ | $a(3)$ | $a(4)$ |
| --- | --- | --- | --- | --- |
| $a(1)$ | $s_1 t_1$ | $s_1(1-t_1)$ | $(1-s_1)t_1$ | $(1-s_1)(1-t_1)$ |
| $a(2)$ | $s_2 t_1$ | $s_2(1-t_1)$ | $(1-s_2)t_1$ | $(1-s_2)(1-t_1)$ |
| $a(3)$ | $s_1 t_2$ | $s_1(1-t_2)$ | $(1-s_1)t_2$ | $(1-s_1)(1-t_2)$ |
| $a(4)$ | $s_2 t_2$ | $s_2(1-t_2)$ | $(1-s_2)t_2$ | $(1-s_2)(1-t_2)$ |
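The table translates directly into code. A minimal sketch, assuming the response probabilities $s_k, t_k$ have already been computed from the logit responses (the numeric values below are placeholders):

```python
def transition_matrix(s1, s2, t1, t2):
    """Assemble the 4x4 transition matrix over joint actions a(1)..a(4)
    in lexicographic order, given
    s_k = P(player 1 plays a_1^1 | player 2 last played a_2^k) and
    t_k = P(player 2 plays a_2^1 | player 1 last played a_1^k)."""
    # From each state, player 1 reacts to player 2's component of that
    # state and player 2 reacts to player 1's component, independently.
    rows = [(s1, t1),  # from a(1) = (a_1^1, a_2^1)
            (s2, t1),  # from a(2) = (a_1^1, a_2^2)
            (s1, t2),  # from a(3) = (a_1^2, a_2^1)
            (s2, t2)]  # from a(4) = (a_1^2, a_2^2)
    return [[s * t, s * (1 - t), (1 - s) * t, (1 - s) * (1 - t)]
            for s, t in rows]

P = transition_matrix(0.7, 0.4, 0.6, 0.2)  # placeholder probabilities
```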

For $0 < \lambda_i < \infty$, $P$ is primitive, guaranteeing a unique stationary distribution $\sigma^*(\lambda, w)$, which solves

$$\sigma^* = \sigma^* P, \qquad \sum_{\ell=1}^4 \sigma^*[a(\ell)] = 1$$

Closed-form expressions are available for the action marginals $x = \Pr(a_1^1)$, $y = \Pr(a_2^1)$:

$$x = \frac{s_2 + (s_1 - s_2) t_2}{1 - (s_1 - s_2)(t_1 - t_2)}, \qquad y = \frac{t_2 + (t_1 - t_2) s_2}{1 - (s_1 - s_2)(t_1 - t_2)}$$

The joint-action probabilities then factorize as $\sigma^*[a(1)] = xy$, $\sigma^*[a(2)] = x(1-y)$, $\sigma^*[a(3)] = (1-x)y$, $\sigma^*[a(4)] = (1-x)(1-y)$.
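These closed forms are cheap to evaluate. A small check (with placeholder values for $s_1, s_2, t_1, t_2$) confirms that the product-form distribution is indeed a fixed point of the transition matrix:

```python
def stationary(s1, s2, t1, t2):
    """Closed-form stationary marginals x = Pr(a_1^1), y = Pr(a_2^1)
    and the induced product-form joint distribution over a(1)..a(4)."""
    denom = 1.0 - (s1 - s2) * (t1 - t2)
    x = (s2 + (s1 - s2) * t2) / denom
    y = (t2 + (t1 - t2) * s2) / denom
    return x, y, [x * y, x * (1 - y), (1 - x) * y, (1 - x) * (1 - y)]

s1, s2, t1, t2 = 0.7, 0.4, 0.6, 0.2  # placeholder response probabilities
x, y, sigma = stationary(s1, s2, t1, t2)

# Sanity check: sigma P = sigma for the transition matrix of Section 2
P = [[s * t, s * (1 - t), (1 - s) * t, (1 - s) * (1 - t)]
     for s, t in [(s1, t1), (s2, t1), (s1, t2), (s2, t2)]]
sigma_next = [sum(sigma[k] * P[k][l] for k in range(4)) for l in range(4)]
```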

3. Likelihood-based Parameter Estimation

Given $T$ i.i.d. samples $D = \{a^{(t)}\}_{t=1}^T$ from the stationary regime, LBR-ML seeks $(\lambda, w)$ maximizing the likelihood

$$L_T(\lambda, w) = \prod_{t=1}^T \sigma^*(\lambda, w)[a^{(t)}]$$

or, equivalently, minimizing the negative log-likelihood

$$\ell(\lambda, w) = -\sum_{t=1}^T \log \sigma^*(\lambda, w)[a^{(t)}]$$

The parameter set consists of $\lambda = (\lambda_1, \lambda_2) \ge 0$ and $w = (w_1, w_2)$. No regularizer is imposed in basic LBR-ML, but $\ell_2$ or $\ell_1$ penalties are possible.

Optimization iterates the following steps:

  • Computing $s_1, s_2, t_1, t_2$ given $(\lambda, w)$
  • Assembling $P(\lambda, w)$ and evaluating the stationary marginals $x, y$
  • Calculating $\ell$ and its gradients via the chain rule
  • Taking an update step using an off-the-shelf optimizer (e.g., L-BFGS)

Per-iteration computational cost is $O(T + d)$: $O(T)$ for the likelihood/gradient and $O(1)$ for the stationary distribution.
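The steps above fit together as follows. This is a minimal end-to-end sketch, not the paper's implementation: utilities are given as plain payoff tables rather than feature maps $\phi_i(a)^\top w_i$, and a coarse grid over $(\lambda_1, \lambda_2)$ stands in for the gradient-based L-BFGS update:

```python
import math

def sigma_cond(lam, u, opp):
    """P(own action 1 | opponent played opp) under the logit response;
    u[(a, opp)] is the player's payoff table (plain numbers for brevity)."""
    z1 = math.exp(lam * u[(1, opp)])
    z2 = math.exp(lam * u[(2, opp)])
    return z1 / (z1 + z2)

def nll(params, counts):
    """Negative log-likelihood of observed joint-action counts under the
    stationary distribution induced by params = (lam1, lam2, u1, u2)."""
    lam1, lam2, u1, u2 = params
    s1, s2 = sigma_cond(lam1, u1, 1), sigma_cond(lam1, u1, 2)
    t1, t2 = sigma_cond(lam2, u2, 1), sigma_cond(lam2, u2, 2)
    denom = 1.0 - (s1 - s2) * (t1 - t2)
    x = (s2 + (s1 - s2) * t2) / denom        # stationary Pr(a_1^1)
    y = (t2 + (t1 - t2) * s2) / denom        # stationary Pr(a_2^1)
    sigma = [x * y, x * (1 - y), (1 - x) * y, (1 - x) * (1 - y)]
    return -sum(c * math.log(p) for c, p in zip(counts, sigma))

# Hypothetical payoff tables and observed counts over a(1)..a(4)
u1 = {(1, 1): 0.3, (1, 2): 0.7, (2, 1): 0.5, (2, 2): 0.1}
u2 = {(1, 1): 0.4, (1, 2): 0.2, (2, 1): 0.6, (2, 2): 0.3}
counts = [310, 240, 270, 180]

# Coarse grid search over (lam1, lam2) with payoffs held fixed,
# as a stand-in for the optimizer update in the text
best = min((nll((l1, l2, u1, u2), counts), l1, l2)
           for l1 in (0.5, 1.0, 2.0, 4.0)
           for l2 in (0.5, 1.0, 2.0, 4.0))
```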

4. Identifiability and Optimization Properties

LBR-ML's identifiability hinges on variation of the feature maps $\phi_i(a)$ across all $a \in A$; otherwise $w_i$ is not uniquely determined. In addition, jointly rescaling $\lambda_i$ up and $w_i$ down by the same factor produces identical behavior. Common practice is therefore to fix $\lambda_i$ or anchor a component of $w$ for scale identification.
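The scale degeneracy can be verified numerically: multiplying $\lambda_i$ by any $c > 0$ while dividing $w_i$ by $c$ leaves every choice probability unchanged. The feature vectors and weights below are made up for illustration:

```python
import math

def choice_prob(lam, w, phi_a, phi_b):
    """P(action a | opponent's action) for a two-action logit response
    with linear utility u = phi . w."""
    ua = sum(f * wi for f, wi in zip(phi_a, w))
    ub = sum(f * wi for f, wi in zip(phi_b, w))
    return math.exp(lam * ua) / (math.exp(lam * ua) + math.exp(lam * ub))

phi_a, phi_b = [1.0, 0.2], [0.3, 0.9]   # hypothetical feature vectors
w, c = [0.3, 0.7], 5.0
p_orig = choice_prob(1.2, w, phi_a, phi_b)
p_scaled = choice_prob(1.2 * c, [wi / c for wi in w], phi_a, phi_b)
```

This is why one $\lambda_i$ must be fixed, or a weight component anchored, before the fitted magnitudes can be interpreted.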

The loss surface $\ell(\lambda, w)$ is typically non-convex in the joint parameters; thus, global convergence is not assured and multiple random restarts are necessary. Locally, smoothness and Lipschitz-gradient properties follow from the analytic structure of the exponentials and rational functions involved.

5. Empirical Evaluation and Use Cases

Chicken–Dare Synthetic Experiment

For ground-truth parameters $w_1^* = [0.3, 0.7]$, $w_2^* = [0.4, 0.6]$ and sample sizes $T \in \{500, 1000, 2000\}$, LBR-ML achieves:

| $T$ | MAE | RMSE |
| --- | --- | --- |
| 500 | 0.155 | 0.194 |
| 1000 | 0.433 | 0.442 |
| 2000 | 0.131 | 0.176 |

Predicted joint-action distributions match the correlated equilibrium maximum-likelihood (CE-ML) baseline less closely at small TT but improve as TT increases.

SUMO Traffic Interaction

With $w$ (dimension 8) generated under CE assumptions, LBR-ML reports MAE $\approx$ 0.04–0.05, outperforming inverse compositional estimation (ICE) but trailing CE-ML (MAE $\approx$ 0.017). Fitted $(\lambda_1, \lambda_2) \approx (3.0, 3.0)$ suggest near-deterministic agent policies.

Traffic Without Coordination

Under scenarios lacking any correlated-equilibrium structure, LBR-ML with $\lambda$ estimated delivers $\approx 72.6\%$ decision-prediction accuracy, compared to $\approx 62\%$ for fixed $\lambda = 1.0$. The estimates $\lambda_1 = 1.0$, $\lambda_2 = 3.0$ reflect heterogeneous bounded rationality across the two agents.

6. Implementation Considerations and Guidelines

Practical guidance includes:

  • Normalize the feature maps $\phi_i$ for stability.
  • Initialize $\lambda_i \approx 1$ and $w_i$ via ordinary logistic regression on the marginal distributions.
  • Employ L-BFGS with line search and at least 10 random restarts.
  • Monitor $\lambda_i$: large values indicate near-deterministic best response; small values approach random play.

Identifiability issues persist if features lack variation or scale is unconstrained; fixing $\lambda_i$ or setting a reference weight component circumvents the degeneracy.

Advantages:

  • Captures adaptive, path-dependent strategies without centralized correlating devices.
  • Recovers both individual utility weights and agent rationality parameters.
  • Interpretable even absent any static equilibrium in the generated data.

Limitations:

  • Non-convexity in $(\lambda, w)$ necessitates random restarts.
  • The scale between $\lambda_i$ and $w_i$ is non-identifiable unless anchored.
  • Assumes the data are stationary under a single logit-response regime.

Recommended Usage:

  • Preferable whenever emergent game interaction lacks static correlated equilibrium structure.
  • Suitable for traffic modeling and other repeated-interaction domains where agent bounded rationality and adaptation are central.
  • Not suitable when data closely follows a correlated equilibrium generated from a central device, where CE-ML provides superior fit.

LBR-ML provides a practical and theoretically grounded framework for inferring behaviorally meaningful models from aggregate multi-agent data, leveraging the analytic tractability of the $2\times 2$ case to support robust optimization and comprehensive parameter recovery (Salazar et al., 15 Jan 2026).
