NegotiationGym: Simulation Platform
- NegotiationGym is a configurable simulation environment that formalizes automated negotiation and multi-agent social interactions.
- It integrates an OpenAI Gym-inspired API with modular scenario definitions and agent abstractions for systematic experimentation.
- The platform supports various negotiation games, enabling both LLM-based and rule-based agents to learn and optimize negotiation strategies.
NegotiationGym is a general-purpose, configuration-driven simulation and benchmarking environment for research on automated negotiation and multi-agent social interaction. It defines an API, actor abstraction, and extensible scenario registry that formalizes canonical negotiation games—bargaining, resource exchange, multi-issue dialogue—and supports LLM-based, rule-based, and hybrid agents. NegotiationGym is specifically architected for experimental and learning-based research, enabling systematic evaluation and iteration of agent policies, utility criteria, and self-improving negotiation behaviors across diverse scenarios (Mangla et al., 5 Oct 2025, Bianchi et al., 2024, Laroche, 2017).
1. Architectural Foundations and Core Components
NegotiationGym extends core constructs from the OpenAI Gym interface to negotiation dialogues among multiple agents. Its architecture comprises:
- Scenario definitions: Each scenario specifies state spaces (resources, valuations, hidden beliefs), action grammars (offers, partial/complete acceptances, message primitives), dialogue structure, and terminal conditions (e.g., agreement, maximal rounds).
- Agent abstractions: Agents, implemented through a `UtilityAgent` base class, wrap LLMs or other policy modules. They expose optimization hooks—`compute_utility()`, `learn_from_feedback()`, and optionally `optimize()`—enabling direct or indirect self-improvement as they participate in repeated games.
- Environment orchestrator: The central driver (e.g., `SelectorGCSimulation`) coordinates scenario initialization from a JSON/YAML config, manages turn-taking via group chat logic, applies environment constraints, and logs each turn and outcome.
- Persistent state and logging: All dialogue actions and hidden agent states are serialized (typically as JSON), isolating public observation streams from private reasoning for robust analysis and possible counterfactual replay.
- User interface: Both CLI and web-based GUI (backed by a MongoDB queue) support definition, execution, and monitoring of negotiation experiments.
This architecture is shaped by requirements for modular configuration, structured and auditable dialogues, reproducible experiments, and agent-agnostic extension with learning or heuristic policies (Mangla et al., 5 Oct 2025, Bianchi et al., 2024).
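The agent abstraction above can be illustrated with a minimal sketch. Class and method names follow the hooks named in the text (`compute_utility`, `learn_from_feedback`, `optimize`); the exact signatures in the released codebase may differ.

```python
from abc import ABC, abstractmethod


class UtilityAgent(ABC):
    """Minimal sketch of the agent abstraction: wraps an LLM or
    scripted policy and exposes the self-improvement hooks used
    across repeated games."""

    def __init__(self, name, policy, self_improve=False):
        self.name = name
        self.policy = policy            # callable: observation -> action
        self.self_improve = self_improve
        self.episode_log = []           # (observation, action) pairs

    @abstractmethod
    def compute_utility(self, episode):
        """Return a scalar utility for a completed episode."""

    def act(self, observation):
        # Delegate to the wrapped policy and record the turn.
        action = self.policy(observation)
        self.episode_log.append((observation, action))
        return action

    def learn_from_feedback(self, episode):
        """Default: no-op; subclasses may reflect or retrain."""

    def optimize(self, episodes):
        """Optional batch-level optimization over recent episodes."""
```

A rule-based or LLM-backed agent then only needs to supply a `policy` callable and a `compute_utility` implementation.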
2. Scenario and Game Formalizations
NegotiationGym supports a wide spectrum of negotiation environments by formalizing the following scenario categories:
- Resource Exchange: Agents possess resource vectors and attempt to maximize aggregate resources through bilateral trades. Turn-based proposals encode what each agent gives and receives; utility is typically the aggregate resource count after trades are applied (Bianchi et al., 2024).
- Multi-Turn Ultimatum: A one-shot or multi-round game in which one agent proposes a division of a pot $P$, and the other may accept or counter. Payoffs correspond to accepted splits, or zero if the proposal is rejected (Bianchi et al., 2024).
- Buyer–Seller Games (Incomplete Information): Agents are endowed with hidden private valuations—a minimum price $s$ for the seller, a maximum price $b$ for the buyer. For a deal at price $p$, utilities are $p - s$ for the seller and $b - p$ for the buyer, with normalized forms for analysis and benchmarking surplus sharing (Mangla et al., 5 Oct 2025).
- Complex Multi-Issue Negotiation: Each option is a tuple of features with per-feature or per-option costs; agents communicate through offer, request, repeat, and accept/partial-accept actions. Terminal rewards depend on the option chosen or the absence of agreement, explicitly accounting for misunderstanding states and adversarial/cooperative attitudes via configurable reward weights (Laroche, 2017).
The system can encode additional scenarios by subclassing a base scenario class and supplying state definitions, turn protocols, action grammars, and payoff mappings.
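The subclassing pattern described above can be sketched as follows. The base-class and method names here (`NegotiationScenario`, `legal_actions`, `payoff`) are illustrative placeholders, not the package's actual API; the ultimatum subclass mirrors the game formalized in the previous list.

```python
class NegotiationScenario:
    """Illustrative base scenario: subclasses supply state,
    turn protocol, action grammar, and payoff mapping."""

    def __init__(self, max_rounds=10):
        self.max_rounds = max_rounds

    def initial_state(self):
        raise NotImplementedError

    def legal_actions(self, state, agent):
        raise NotImplementedError

    def is_terminal(self, state):
        raise NotImplementedError

    def payoff(self, state, agent):
        raise NotImplementedError


class UltimatumScenario(NegotiationScenario):
    """One-shot ultimatum over an integer pot P."""

    def __init__(self, pot=100):
        super().__init__(max_rounds=2)
        self.pot = pot

    def initial_state(self):
        return {"proposal": None, "accepted": None}

    def legal_actions(self, state, agent):
        if state["proposal"] is None:
            return [("propose", x) for x in range(self.pot + 1)]
        return [("accept",), ("reject",)]

    def is_terminal(self, state):
        return state["accepted"] is not None

    def payoff(self, state, agent):
        if not state["accepted"]:
            return 0
        x = state["proposal"]
        return x if agent == "proposer" else self.pot - x
```

Registering such a subclass with the scenario registry then makes it instantiable from a config file like any built-in game.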
3. Formal Models and Utility Functions
NegotiationGym employs mathematically explicit reward and transition models:
Resource Exchange:
- State: each agent $i$ holds a resource vector $r_i$.
- Action: a bilateral trade $(g_{i \to j}, g_{j \to i})$, feasible iff each agent holds the resources it gives ($g_{i \to j} \le r_i$ componentwise).
- Payoff: $U_i = \sum_m r_{i,m}$, the aggregate resource count after all trades are applied.
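The feasibility constraint and payoff for resource exchange can be made concrete with a small sketch (function names are illustrative; holdings are represented as dicts rather than vectors for readability):

```python
def feasible(give, holdings):
    """A trade direction is feasible iff the giver holds
    at least the quantity of each resource it gives."""
    return all(holdings.get(r, 0) >= q for r, q in give.items())


def apply_trade(r_i, r_j, give_i, give_j):
    """Apply a bilateral trade; both directions must be feasible."""
    if not (feasible(give_i, r_i) and feasible(give_j, r_j)):
        raise ValueError("infeasible trade")
    new_i, new_j = dict(r_i), dict(r_j)
    for r, q in give_i.items():              # i -> j transfers
        new_i[r] = new_i.get(r, 0) - q
        new_j[r] = new_j.get(r, 0) + q
    for r, q in give_j.items():              # j -> i transfers
        new_j[r] = new_j.get(r, 0) - q
        new_i[r] = new_i.get(r, 0) + q
    return new_i, new_j


def payoff(holdings):
    """Aggregate resource count after trades are applied."""
    return sum(holdings.values())
```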
Ultimatum:
- State: Pot $P$ to be split.
- Action: Proposal $(x, P - x)$; accept/reject.
- Payoff: $(x, P - x)$ if accepted, else $0$.
Buyer–Seller:
- Parameters: $p$ (final price), $s$ (seller min), $b$ (buyer max), $a$ (ask).
- Utilities:
- $U_s = p - s$ for the seller,
- $U_b = b - p$ for the buyer,
- Surplus shares: $\sigma_s = (p - s)/(b - s)$, $\sigma_b = (b - p)/(b - s)$.
- No-deal episodes yield zero utility.
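As a worked instance of the buyer–seller model: for a deal at price $p$ with seller minimum $s$ and buyer maximum $b$ (symbol names are illustrative), the utilities and normalized surplus shares are:

```python
def buyer_seller_utilities(p, s, b):
    """Utilities and normalized surplus shares for a closed deal at
    price p, given seller minimum s and buyer maximum b (s <= p <= b).
    Surplus shares sum to 1 and benchmark how the total surplus
    b - s is split between the parties."""
    u_seller = p - s
    u_buyer = b - p
    total = b - s
    return {
        "U_s": u_seller,
        "U_b": u_buyer,
        "sigma_s": u_seller / total,
        "sigma_b": u_buyer / total,
    }
```

A deal at the midpoint of $[s, b]$ yields equal surplus shares of 0.5; no-deal episodes bypass this computation entirely and score zero for both sides.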
Multi-Issue (Complex Negotiation):
- Each option is a tuple in a feature space $F_1 \times \cdots \times F_n$.
- Cost: $c_i(o)$ for option $o$ and agent $i$.
- Rewards: a function of the agreed option's costs for agreement, with other forms for failure or misunderstanding; reward weights allow modeling adversarial or cooperative attitudes (Laroche, 2017).
- Observations integrate noise models (a per-feature error rate), supporting stochasticity in communication channels.
This formal structure enables direct application of RL, multi-agent RL, or bandit optimization solvers and supports parametric analysis of strategic behavior and equilibria.
4. Agent Model and Self-Optimization
Agents in NegotiationGym act in repeated simulation rounds, each consisting of interactive dialogue with one or more partners. The UtilityAgent abstraction supports:
- Utility evaluation: `compute_utility(E)` returns the scalar utility for completed episodes, enabling learning-from-outcome.
- Self-improvement: `learn_from_feedback(E)` and optionally `optimize(E)`. The default method is prompt-based reflection: for agents with `self_improve=true`, the agent assembles a window of recent episodes, generates a reflection prompt, and invokes its LLM to synthesize updated system prompts aimed at utility maximization. Alternate strategies—contextual bandits, offline RL, or gradient-free search—are permitted via override (Mangla et al., 5 Oct 2025).
- Interaction with the environment: At each turn, the agent formats the observation into a prompt (which may include scenario context, dialogue history, and feedback), invokes the underlying policy (LLM or scripted), and parses its output into structured actions using dedicated XML/JSON grammar modules (Bianchi et al., 2024).
This model supports side-by-side comparison of reflection-driven adaptation, hand-crafted policy variants, and learning-based optimization in negotiation tasks.
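The default prompt-based reflection loop can be sketched as follows. The template wording and function signature are illustrative assumptions, not the package's actual implementation; `llm` stands for any string-to-string callable (an API client in practice):

```python
REFLECTION_TEMPLATE = """You are improving a negotiation agent.
Recent episodes (utility and transcript excerpt):
{history}
Current system prompt:
{prompt}
Rewrite the system prompt to increase future utility."""


def reflect(agent_prompt, episodes, llm, window=5):
    """Prompt-based reflection: summarize the last `window`
    episodes and ask the LLM for an updated system prompt."""
    recent = episodes[-window:]
    history = "\n".join(
        f"- utility={e['utility']:.2f}, transcript={e['transcript'][:200]}"
        for e in recent
    )
    query = REFLECTION_TEMPLATE.format(history=history, prompt=agent_prompt)
    return llm(query)  # the returned text becomes the new system prompt
```

Overriding this hook with a contextual bandit or offline-RL update, as the text notes, only requires replacing the body while keeping the episode-window interface.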
5. API, Configuration, and Extensibility
NegotiationGym exposes both programmatic and configuration-driven APIs:
- Configuration schema (JSON/YAML): Specifies the scenario, agent list, negotiation parameters, termination rules, and agent-level optimization metadata. Below is a representative agent spec:
```json
{
  "name": "Buyer",
  "description": "Seeks the lowest possible price",
  "prompt": "...",
  "utility_class": "BuyerAgent",
  "strategy": { "max_price": 400 },
  "self_improve": true,
  "optimization_target": true
}
```

- Environment class: Implements Gym-style `reset()`, `step(action)`, and `render()`, and provides access to cumulative state and logs.
- Agent interface: Agents are loaded per their declared `utility_class` and may be extended via subclassing for rule-based, RL, imitation learning, or LLM-centric approaches.
- Scenario registry: Out-of-the-box scenarios (e.g., resource exchange, buyer–seller, multi-issue scheduling, time-pressured bargaining) are parameterized and registered with the environment for easy instantiation.
- Serialization and logging: Full state—including private parameters and complete interaction history—can be serialized for reproducibility and audit, supporting advanced analysis like counterfactual replay or multi-agent diagnostics.
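The Gym-style interface implies a simple episode driver; the sketch below shows the shape of that loop. The function and the four-tuple `step` return mirror the interface described above, not the exact package API:

```python
def run_episode(env, agents, max_turns=50):
    """Drive one negotiation episode: reset the environment,
    alternate agent turns, and log a serializable transcript."""
    obs = env.reset()
    transcript = []
    for turn in range(max_turns):
        agent = agents[turn % len(agents)]       # simple round-robin
        action = agent.act(obs)
        obs, reward, done, info = env.step(action)
        transcript.append({"turn": turn, "agent": agent.name,
                           "action": action, "reward": reward})
        if done:
            break
    return transcript
```

Because every turn is appended as a plain dict, the transcript can be dumped to JSON for the audit and counterfactual-replay analyses the text describes.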
CLI and GUI frontends support simulation execution, job management, monitoring, and output visualization, further enabling reproducible research and collaborative workflows (Mangla et al., 5 Oct 2025).
6. Evaluation Protocols and Metrics
NegotiationGym supports a rigorous evaluation regime, including:
- Per-episode metrics: Utility for each agent, surplus shares, deal success (binary).
- Aggregate statistics: Cumulative average utility, win rate, deal rate, surplus Pareto front analysis.
- Behavioral diagnostics: Anchoring coefficient (Spearman correlation between initial and final offers), acceptance-curve estimation $\hat{P}(\mathrm{accept}\mid\mathrm{offer}=k)$, and the inter-agent utility gap $|R_1 - R_2|$. Evaluation runs over $N$ repeated episodes, sometimes alternating roles (first/second mover). Metrics are visualized as matrices/heatmaps, and full game transcripts are available for post hoc inspection.
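The per-episode and aggregate metrics above can be computed from episode records with a short sketch (field names and the helper functions are illustrative, and the Spearman implementation assumes distinct offer values, i.e., no rank ties):

```python
from statistics import mean


def spearman(xs, ys):
    """Spearman rank correlation for tie-free samples."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0] * len(v)
        for rank, i in enumerate(order):
            r[i] = rank
        return r
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))


def episode_metrics(episodes):
    """Aggregate deal rate, mean utilities, and an anchoring
    coefficient (initial offer vs. final price) over episode dicts
    with keys: deal, u_buyer, u_seller, first_offer, final_price."""
    deals = [e for e in episodes if e["deal"]]
    return {
        "deal_rate": len(deals) / len(episodes),
        "mean_u_buyer": mean(e["u_buyer"] for e in deals) if deals else 0.0,
        "mean_u_seller": mean(e["u_seller"] for e in deals) if deals else 0.0,
        "anchoring": spearman([e["first_offer"] for e in deals],
                              [e["final_price"] for e in deals])
                     if len(deals) > 2 else None,
    }
```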
Experimental case studies have demonstrated, for example, that prompt-reflecting buyers in buyer–seller settings systematically increase their utility while adversely affecting seller returns, and that when both parties are permitted to reflect, outcomes become more balanced with higher deal rates (Mangla et al., 5 Oct 2025).
7. Implementation Practices and Advanced Use
Implementation is facilitated via repository cloning and standard Python environment setup:
```shell
git clone https://github.com/chrishokamp/multi-agent-social-simulation.git
cd multi-agent-social-simulation
pip install -r requirements.txt
negotiation_gym run --config my_config.json
```

The GUI frontend is launched against its MongoDB backend:

```shell
negotiation_gym ui --db mongodb://localhost:27017/negotiation_gym
```
Best practices for extending or adapting NegotiationGym include:
- Isolating parsing logic for structured action grammars (XML/JSON) to allow scenario-specific extensions and alternate natural language templates.
- Seeding all stochasticity (costs, beliefs) for experimental reproducibility.
- Encapsulating resource and trade operations in custom objects for transactional integrity.
- Separating public dialogue channels from private agent logics to allow partial observability and privacy-preserving analysis.
- Maintaining modular scenario and agent registries for rapid prototyping.
- Bundling reference analysis scripts/notebooks to compute key statistics and visualize negotiation trajectories.
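The first practice above—isolating parsing logic for structured action grammars—can be sketched as a standalone parser. This is an illustrative example, not the package's actual grammar module; it accepts either a JSON object or a simple XML-style tag and returns `None` on unparseable output so the caller can re-prompt the agent:

```python
import json
import re


def parse_action(raw):
    """Parse one LLM turn into a structured action dict.

    Accepts {"type": "offer", "price": 350} or <offer price="350"/>;
    returns None when the output matches neither grammar."""
    raw = raw.strip()
    if raw.startswith("{"):                      # JSON branch
        try:
            action = json.loads(raw)
        except json.JSONDecodeError:
            return None
        return action if "type" in action else None
    # XML-style branch: tag name becomes the action type,
    # attributes become string-valued fields.
    m = re.match(r'<(\w+)((?:\s+\w+="[^"]*")*)\s*/?>', raw)
    if not m:
        return None
    action = {"type": m.group(1)}
    for k, v in re.findall(r'(\w+)="([^"]*)"', m.group(2)):
        action[k] = v
    return action
```

Keeping this logic in one module, as recommended, lets each scenario swap in its own grammar or natural-language template without touching agent or environment code.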
These principles, embodied across NegotiationGym, facilitate both empirical and theoretical research across negotiation, multi-agent learning, and algorithmic social science (Mangla et al., 5 Oct 2025, Bianchi et al., 2024, Laroche, 2017).