LBR-ML: Logit Best Response MLE in 2x2 Games
- LBR-ML is a parametric inverse learning method that models adaptive, bounded-rational game play in repeated 2x2 settings using logit best responses.
- It employs a Markov chain framework to derive a unique stationary distribution for inferring player utility parameters and rationality levels.
- Empirical studies, including traffic and synthetic experiments, demonstrate its effectiveness in recovering decision metrics under various sample sizes.
The Logit Best Response Maximum-Likelihood Estimator (LBR-ML) is a parametric inverse learning approach for modeling strategic adaptation in repeated games. Designed to infer player utility parameters and rationality levels from joint-action data, LBR-ML uniquely emphasizes stochastic, path-dependent dynamics arising from bounded rationality, as opposed to consistency with static equilibrium concepts. This methodology formalizes the connection between behavioral game theory and statistical estimation by directly fitting the long-run stationary distribution induced by repeated logit best response updates to observed data through maximum-likelihood optimization (Salazar et al., 15 Jan 2026).
1. Game Model and Stochastic Logit Best Response
LBR-ML operates in two-player, normal-form games. Each player selects between two actions, labeled $\{0, 1\}$, yielding four possible joint action profiles in standard lexicographic order: $(0,0), (0,1), (1,0), (1,1)$.
Each player's payoff is linear in known features, $u_i(a_1, a_2) = \theta_i^\top \phi_i(a_1, a_2)$ for $i \in \{1, 2\}$. The core behavioral assumption is that agents select actions via the (Blume) logit best response: at each stage $t$, given the opponent's last action $a_{-i}^{t-1}$, player $i$ chooses action $a$ with probability

$$P_i\big(a \mid a_{-i}^{t-1}\big) = \frac{\exp\!\big(\lambda_i\, u_i(a, a_{-i}^{t-1})\big)}{\sum_{a'} \exp\!\big(\lambda_i\, u_i(a', a_{-i}^{t-1})\big)},$$

where $\lambda_i > 0$ is the rationality (inverse-temperature) parameter. As $\lambda_i \to \infty$, choices concentrate on best responses; as $\lambda_i \to 0$, behavior becomes uniformly random.
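This choice rule can be sketched numerically as follows; it is a minimal illustration, and the helper name and the max-subtraction stabilization are assumptions, not details from the paper:

```python
import numpy as np

def logit_response(lam, u_a, u_b):
    """Probability of choosing action a over action b under a logit best
    response with rationality (inverse temperature) lam."""
    z = lam * np.array([u_a, u_b])
    z -= z.max()                 # subtract the max for numerical stability
    e = np.exp(z)
    return e[0] / e.sum()

# lam near 0 gives near-uniform play; large lam concentrates on the
# utility-maximizing action.
```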
2. Markov Transition Dynamics and Stationary Distribution
One-step play forms a Markov chain over the four joint-action profiles: each player logit-responds to the opponent's last action, so the transition probabilities factorize as

$$P\big((a_1, a_2) \to (a_1', a_2')\big) = P_1\big(a_1' \mid a_2\big)\, P_2\big(a_2' \mid a_1\big).$$

Defining
- $p_0 = P_1(1 \mid 0)$, $p_1 = P_1(1 \mid 1)$,
- $q_0 = P_2(1 \mid 0)$, $q_1 = P_2(1 \mid 1)$,
the explicit transition matrix over the lexicographically ordered states is:

| Row/Col | $(0,0)$ | $(0,1)$ | $(1,0)$ | $(1,1)$ |
|---|---|---|---|---|
| $(0,0)$ | $(1-p_0)(1-q_0)$ | $(1-p_0)q_0$ | $p_0(1-q_0)$ | $p_0 q_0$ |
| $(0,1)$ | $(1-p_1)(1-q_0)$ | $(1-p_1)q_0$ | $p_1(1-q_0)$ | $p_1 q_0$ |
| $(1,0)$ | $(1-p_0)(1-q_1)$ | $(1-p_0)q_1$ | $p_0(1-q_1)$ | $p_0 q_1$ |
| $(1,1)$ | $(1-p_1)(1-q_1)$ | $(1-p_1)q_1$ | $p_1(1-q_1)$ | $p_1 q_1$ |

For finite $\lambda_1, \lambda_2$, every entry of $P$ is strictly positive, so $P$ is primitive, guaranteeing a unique stationary distribution $\pi$, which solves $\pi^\top P = \pi^\top$ with $\sum_s \pi_s = 1$.

Closed-form expressions are available for the stationary action marginals: writing $\mu = \Pr(a_1 = 1)$ and $\nu = \Pr(a_2 = 1)$, stationarity requires $\mu = p_0(1-\nu) + p_1\nu$ and $\nu = q_0(1-\mu) + q_1\mu$, a two-equation linear system with an explicit solution. Because each player's next action depends only on the opponent's last action, the product measure at these fixed-point marginals is itself invariant, and by uniqueness it equals $\pi$; joint-action probabilities are then products of the marginals.
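The matrix assembly and stationary-distribution solve can be sketched as follows; this is a self-contained illustration, and the function names and eigenvector-based solve are assumptions rather than the paper's implementation:

```python
import numpy as np

def transition_matrix(p0, p1, q0, q1):
    """4x4 chain over joint actions in lex order (0,0),(0,1),(1,0),(1,1).

    p0/p1: prob. player 1 plays action 1 after the opponent played 0/1;
    q0/q1: prob. player 2 plays action 1 after the opponent played 0/1."""
    states = [(0, 0), (0, 1), (1, 0), (1, 1)]
    P = np.zeros((4, 4))
    for i, (a1, a2) in enumerate(states):
        p = p1 if a2 else p0          # player 1 responds to a2
        q = q1 if a1 else q0          # player 2 responds to a1
        for j, (b1, b2) in enumerate(states):
            P[i, j] = (p if b1 else 1 - p) * (q if b2 else 1 - q)
    return P

def stationary(P):
    """Stationary distribution of a primitive stochastic matrix, taken as
    the left eigenvector at eigenvalue 1, normalized to sum to one."""
    w, V = np.linalg.eig(P.T)
    v = np.real(V[:, np.argmin(np.abs(w - 1.0))])
    return v / v.sum()
```

When all four response probabilities lie strictly inside $(0, 1)$, every entry of the matrix is positive, so the eigenvector at eigenvalue 1 is unique up to scale.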
3. Likelihood-based Parameter Estimation
Given $n$ i.i.d. samples $\{(a_1^{(k)}, a_2^{(k)})\}_{k=1}^{n}$ from the stationary regime, LBR-ML seeks parameters maximizing the log-likelihood

$$\ell(\theta, \lambda) = \sum_{k=1}^{n} \log \pi_{\theta, \lambda}\big(a_1^{(k)}, a_2^{(k)}\big),$$

or equivalently, minimizing the negative log-likelihood $-\ell(\theta, \lambda)$. The parameter set consists of the utility weights $\theta_1, \theta_2$ and the rationality levels $\lambda_1, \lambda_2$. No regularizer is imposed in the basic LBR-ML, but $\ell_1$ or $\ell_2$ penalties are possible.
Optimization proceeds by alternating:
- Computing the utilities and logit response probabilities from the current $(\theta, \lambda)$
- Assembling $P$ and evaluating the stationary distribution $\pi_{\theta, \lambda}$
- Calculating the negative log-likelihood and its gradients via the chain rule
- Taking an update step using an off-the-shelf optimizer (e.g., L-BFGS)
Per-iteration computational overhead is modest: $O(n)$ for the likelihood and gradient (or $O(1)$ once the four joint-action counts are tabulated), plus an $O(1)$ solve for the stationary distribution of the $4 \times 4$ chain.
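The optimization loop above can be sketched end-to-end with SciPy's L-BFGS-B. Everything concrete here is an illustrative assumption: the random feature maps stand in for domain features, the joint-action counts are synthetic, the parameters are packed as $(\theta_1, \theta_2, \log\lambda_1, \log\lambda_2)$ to keep $\lambda_i$ positive, and gradients are left to finite differences rather than the paper's chain-rule computation:

```python
import numpy as np
from scipy.optimize import minimize

d = 2
rng = np.random.default_rng(0)
# Hypothetical feature maps phi[i, a1, a2]: a d-vector per player and
# joint action; a real application would supply domain features instead.
phi = rng.normal(size=(2, 2, 2, d))

def neg_log_lik(params, counts):
    """Negative log-likelihood of joint-action counts (lex state order)."""
    th = [params[:d], params[d:2 * d]]
    lam = np.exp(params[2 * d:])
    # r[i, a_opp] = P(player i plays 1 | opponent's last action a_opp)
    r = np.empty((2, 2))
    for i in range(2):
        for a_opp in range(2):
            own = [(a, a_opp) if i == 0 else (a_opp, a) for a in range(2)]
            u = np.array([phi[i, a1, a2] @ th[i] for a1, a2 in own])
            z = lam[i] * u
            z -= z.max()                 # stabilized logit response
            e = np.exp(z)
            r[i, a_opp] = e[1] / e.sum()
    states = [(0, 0), (0, 1), (1, 0), (1, 1)]
    P = np.array([[(r[0, a2] if b1 else 1 - r[0, a2]) *
                   (r[1, a1] if b2 else 1 - r[1, a1])
                   for b1, b2 in states] for a1, a2 in states])
    w, V = np.linalg.eig(P.T)            # stationary distribution
    pi = np.real(V[:, np.argmin(np.abs(w - 1.0))])
    pi = pi / pi.sum()
    return -np.sum(counts * np.log(np.clip(pi, 1e-12, None)))

counts = np.array([40, 10, 10, 40])      # toy joint-action counts
x0 = np.concatenate([rng.normal(size=2 * d), np.zeros(2)])
res = minimize(neg_log_lik, x0, args=(counts,), method="L-BFGS-B")
```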
4. Identifiability and Optimization Properties
LBR-ML's identifiability hinges on variation of the feature maps $\phi_i(a_1, a_2)$ across all joint actions; otherwise, $\theta_i$ is not uniquely determined, especially since multiplicative rescaling between $\lambda_i$ and $\theta_i$ produces identical behavior. Common practice is to fix $\lambda_i = 1$ or anchor a component of $\theta_i$ for scale identification.
The loss surface is typically non-convex in joint parameters; thus, global convergence is not assured and multiple random restarts are necessary. Locally, smoothness and Lipschitz-gradient properties follow from the analytic structure of exponentials and rational functions.
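A restart wrapper in the spirit of this recommendation can look as follows; the function name, restart count, and the toy double-well objective in the usage line are illustrative choices, not from the paper:

```python
import numpy as np
from scipy.optimize import minimize

def fit_with_restarts(nll, dim, n_restarts=10, scale=0.5, seed=0):
    """Minimize a (possibly non-convex) negative log-likelihood from several
    random initializations and keep the best local optimum found."""
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_restarts):
        x0 = rng.normal(scale=scale, size=dim)
        res = minimize(nll, x0, method="L-BFGS-B")
        if best is None or res.fun < best.fun:
            best = res
    return best

# Demo on a double-well objective whose two local minima differ in value;
# restarts are what let the search find the deeper basin.
best = fit_with_restarts(lambda x: (x[0] ** 2 - 1) ** 2 + 0.1 * x[0], dim=1)
```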
5. Empirical Evaluation and Use Cases
Chicken–Dare Synthetic Experiment
For fixed ground-truth utility and rationality parameters and sample sizes $n \in \{500, 1000, 2000\}$, LBR-ML achieves:

| $n$ | MAE | RMSE |
|---|---|---|
| 500 | 0.155 | 0.194 |
| 1000 | 0.433 | 0.442 |
| 2000 | 0.131 | 0.176 |
Predicted joint-action distributions match the correlated-equilibrium maximum-likelihood (CE-ML) baseline less closely at small $n$ but improve as $n$ increases.
SUMO Traffic Interaction
With SUMO data (feature dimension 8) generated under CE assumptions, LBR-ML reports MAE 0.04–0.05, outperforming inverse compositional estimation (ICE) but trailing CE-ML (MAE 0.017). The fitted $\lambda_i$ suggest near-deterministic agent policies.
Traffic Without Coordination
Under scenarios lacking any correlated equilibrium structure, LBR-ML with estimated rationality levels $\lambda_i$ delivers higher decision-prediction accuracy than the variant with $\lambda$ fixed in advance. The estimated $\lambda_1$ and $\lambda_2$ differ across the two agents, reflecting heterogeneous bounded rationality.
6. Implementation Considerations and Guidelines
Practical guidance includes:
- Normalize feature maps for stability.
- Initialize $\theta_i$ and $\lambda_i$ via ordinary logistic regression on the marginal action distributions.
- Employ L-BFGS with line-search and minimum 10 random restarts.
- Monitor $\lambda_i$: large values indicate near-deterministic best responses; small values approach random play.
Identifiability issues persist if features lack variation or scale is unconstrained; fixing $\lambda_i = 1$ or setting a reference weight component circumvents the degeneracy.
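One concrete way to resolve the scale ambiguity is to fit an unconstrained weight vector and read the rationality level off its norm afterwards; this is a common convention assumed here, not a choice prescribed by the paper:

```python
import numpy as np

def anchor_scale(theta_hat):
    """Resolve the lambda-theta scale ambiguity: rescale an unconstrained
    utility-weight estimate to unit norm and report the norm as lambda."""
    lam = np.linalg.norm(theta_hat)
    return theta_hat / lam, lam

# Example: an unconstrained estimate [3, 4] splits into a unit-norm
# direction and a scalar rationality level.
theta, lam = anchor_scale(np.array([3.0, 4.0]))
```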
7. Advantages, Limitations, and Recommended Usage
Advantages:
- Captures adaptive, path-dependent strategies without centralized correlating devices.
- Recovers both individual utility weights and agent rationality parameters.
- Interpretable even absent any static equilibrium in the generated data.
Limitations:
- Non-convexity in $(\theta, \lambda)$ necessitates random restarts.
- The scale between $\lambda_i$ and $\theta_i$ is non-identifiable unless anchored.
- Assumes the data are stationary under a single logit-response regime.
Recommended Usage:
- Preferable whenever emergent game interaction lacks static correlated equilibrium structure.
- Suitable for traffic modeling and other repeated-interaction domains where agent bounded rationality and adaptation are central.
- Not suitable when data closely follows a correlated equilibrium generated from a central device, where CE-ML provides superior fit.
LBR-ML provides a practical and theoretically grounded framework for inferring behaviorally meaningful models from aggregate multi-agent data, leveraging the analytic tractability of the $2 \times 2$ case to support robust optimization and comprehensive parameter recovery (Salazar et al., 15 Jan 2026).