NegotiationGym: Simulation Platform
- NegotiationGym is a configurable simulation environment that formalizes automated negotiation and multi-agent social interactions.
- It integrates an OpenAI Gym-inspired API with modular scenario definitions and agent abstractions for systematic experimentation.
- The platform supports various negotiation games, enabling both LLM-based and rule-based agents to learn and optimize negotiation strategies.
NegotiationGym is a general-purpose, configuration-driven simulation and benchmarking environment for research on automated negotiation and multi-agent social interaction. It defines an API, actor abstraction, and extensible scenario registry that formalizes canonical negotiation games—bargaining, resource exchange, multi-issue dialogue—and supports LLM-based, rule-based, and hybrid agents. NegotiationGym is specifically architected for experimental and learning-based research, enabling systematic evaluation and iteration of agent policies, utility criteria, and self-improving negotiation behaviors across diverse scenarios (Mangla et al., 5 Oct 2025, Bianchi et al., 2024, Laroche, 2017).
1. Architectural Foundations and Core Components
NegotiationGym extends core constructs from the OpenAI Gym interface to negotiation dialogues among multiple agents. Its architecture comprises:
- Scenario definitions: Each scenario specifies state spaces (resources, valuations, hidden beliefs), action grammars (offers, partial/complete acceptances, message primitives), dialogue structure, and terminal conditions (e.g., agreement, maximal rounds).
- Agent abstractions: Agents, implemented through a `UtilityAgent` base class, wrap LLMs or other policy modules. They expose optimization hooks—`compute_utility()`, `learn_from_feedback()`, and optionally `optimize()`—enabling direct or indirect self-improvement as they participate in repeated games.
- Environment orchestrator: The central driver (e.g., `SelectorGCSimulation`) coordinates scenario initialization from a JSON/YAML config, manages turn-taking via group chat logic, applies environment constraints, and logs each turn and outcome.
- Persistent state and logging: All dialogue actions and hidden agent states are serialized (typically as JSON), isolating public observation streams from private reasoning for robust analysis and possible counterfactual replay.
- User interface: Both CLI and web-based GUI (backed by a MongoDB queue) support definition, execution, and monitoring of negotiation experiments.
This architecture is shaped by requirements for modular configuration, structured and auditable dialogues, reproducible experiments, and agent-agnostic extension with learning or heuristic policies (Mangla et al., 5 Oct 2025, Bianchi et al., 2024).
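The agent abstraction above can be illustrated with a minimal sketch. Class and method names follow the hooks named in the text (`compute_utility`, `learn_from_feedback`, `optimize`); the exact signatures in the released codebase may differ.

```python
from abc import ABC, abstractmethod


class UtilityAgent(ABC):
    """Minimal sketch of the agent abstraction: wraps an LLM or
    scripted policy and exposes the self-improvement hooks used
    across repeated games."""

    def __init__(self, name, policy, self_improve=False):
        self.name = name
        self.policy = policy            # callable: observation -> action
        self.self_improve = self_improve
        self.episode_log = []           # (observation, action) pairs

    @abstractmethod
    def compute_utility(self, episode):
        """Return a scalar utility for a completed episode."""

    def act(self, observation):
        # Delegate to the wrapped policy and record the turn.
        action = self.policy(observation)
        self.episode_log.append((observation, action))
        return action

    def learn_from_feedback(self, episode):
        """Default: no-op; subclasses may reflect or retrain."""

    def optimize(self, episodes):
        """Optional batch-level optimization over recent episodes."""
```

A rule-based or LLM-backed agent then only needs to supply a `policy` callable and a `compute_utility` implementation.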
2. Scenario and Game Formalizations
NegotiationGym supports a wide spectrum of negotiation environments by formalizing the following scenario categories:
- Resource Exchange: Agents possess resource vectors and attempt to maximize aggregate resources through bilateral trades. Turn-based proposals encode what each agent gives and receives; utility is typically the aggregate resource count after trades are applied (Bianchi et al., 2024).
- Multi-Turn Ultimatum: A one-shot or multi-round game in which one agent proposes a division of a pot $P$, and the other may accept or counter. Payoffs correspond to accepted splits, or zero if the proposal is rejected (Bianchi et al., 2024).
- Buyer–Seller Games (Incomplete Information): Agents are endowed with hidden private valuations—a minimum price $s$ for the seller, a maximum price $b$ for the buyer. For a deal at price $p$, utilities are $p - s$ for the seller and $b - p$ for the buyer, with normalized forms for analysis and benchmarking surplus sharing (Mangla et al., 5 Oct 2025).
- Complex Multi-Issue Negotiation: Each option is a tuple of features with per-feature or per-option costs; agents communicate through offer, request, repeat, and accept/partial-accept actions. Terminal rewards depend on the option chosen or the absence of agreement, explicitly accounting for misunderstanding states and adversarial/cooperative attitudes via configurable reward weights (Laroche, 2017).
The system can encode additional scenarios by subclassing a base scenario class and supplying state definitions, turn protocols, action grammars, and payoff mappings.
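The subclassing pattern described above can be sketched as follows. The base-class and method names here (`NegotiationScenario`, `legal_actions`, `payoff`) are illustrative placeholders, not the package's actual API; the ultimatum subclass mirrors the game formalized in the previous list.

```python
class NegotiationScenario:
    """Illustrative base scenario: subclasses supply state,
    turn protocol, action grammar, and payoff mapping."""

    def __init__(self, max_rounds=10):
        self.max_rounds = max_rounds

    def initial_state(self):
        raise NotImplementedError

    def legal_actions(self, state, agent):
        raise NotImplementedError

    def is_terminal(self, state):
        raise NotImplementedError

    def payoff(self, state, agent):
        raise NotImplementedError


class UltimatumScenario(NegotiationScenario):
    """One-shot ultimatum over an integer pot P."""

    def __init__(self, pot=100):
        super().__init__(max_rounds=2)
        self.pot = pot

    def initial_state(self):
        return {"proposal": None, "accepted": None}

    def legal_actions(self, state, agent):
        if state["proposal"] is None:
            return [("propose", x) for x in range(self.pot + 1)]
        return [("accept",), ("reject",)]

    def is_terminal(self, state):
        return state["accepted"] is not None

    def payoff(self, state, agent):
        if not state["accepted"]:
            return 0
        x = state["proposal"]
        return x if agent == "proposer" else self.pot - x
```

Registering such a subclass with the scenario registry then makes it instantiable from a config file like any built-in game.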
3. Formal Models and Utility Functions
NegotiationGym employs mathematically explicit reward and transition models:
Resource Exchange:
- State: each agent $i$ holds a resource vector $r_i$.
- Action: a bilateral trade $(g_{i \to j}, g_{j \to i})$, feasible iff each agent holds the resources it gives ($g_{i \to j} \le r_i$ componentwise).
- Payoff: $U_i = \sum_m r_{i,m}$, the aggregate resource count after all trades are applied.
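The feasibility constraint and payoff for resource exchange can be made concrete with a small sketch (function names are illustrative; holdings are represented as dicts rather than vectors for readability):

```python
def feasible(give, holdings):
    """A trade direction is feasible iff the giver holds
    at least the quantity of each resource it gives."""
    return all(holdings.get(r, 0) >= q for r, q in give.items())


def apply_trade(r_i, r_j, give_i, give_j):
    """Apply a bilateral trade; both directions must be feasible."""
    if not (feasible(give_i, r_i) and feasible(give_j, r_j)):
        raise ValueError("infeasible trade")
    new_i, new_j = dict(r_i), dict(r_j)
    for r, q in give_i.items():              # i -> j transfers
        new_i[r] = new_i.get(r, 0) - q
        new_j[r] = new_j.get(r, 0) + q
    for r, q in give_j.items():              # j -> i transfers
        new_j[r] = new_j.get(r, 0) - q
        new_i[r] = new_i.get(r, 0) + q
    return new_i, new_j


def payoff(holdings):
    """Aggregate resource count after trades are applied."""
    return sum(holdings.values())
```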
Ultimatum:
- State: Pot $P$ to be split.
- Action: Proposal $(x, P - x)$; accept/reject.
- Payoff: $(x, P - x)$ if accepted, else $0$.
Buyer–Seller:
- Parameters: $p$ (final price), $s$ (seller min), $b$ (buyer max), $a$ (ask).
- Utilities:
- $U_s = p - s$ for the seller,
- $U_b = b - p$ for the buyer,
- Surplus shares: $\sigma_s = (p - s)/(b - s)$, $\sigma_b = (b - p)/(b - s)$.
- No-deal episodes yield zero utility.
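As a worked instance of the buyer–seller model: for a deal at price $p$ with seller minimum $s$ and buyer maximum $b$ (symbol names are illustrative), the utilities and normalized surplus shares are:

```python
def buyer_seller_utilities(p, s, b):
    """Utilities and normalized surplus shares for a closed deal at
    price p, given seller minimum s and buyer maximum b (s <= p <= b).
    Surplus shares sum to 1 and benchmark how the total surplus
    b - s is split between the parties."""
    u_seller = p - s
    u_buyer = b - p
    total = b - s
    return {
        "U_s": u_seller,
        "U_b": u_buyer,
        "sigma_s": u_seller / total,
        "sigma_b": u_buyer / total,
    }
```

A deal at the midpoint of $[s, b]$ yields equal surplus shares of 0.5; no-deal episodes bypass this computation entirely and score zero for both sides.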
Multi-Issue (Complex Negotiation):
- Each option is a tuple in a feature space $F_1 \times \cdots \times F_n$.
- Cost: $c_i(o)$ for option $o$ and agent $i$.
- Rewards: a function of the agreed option's costs for agreement, with other forms for failure or misunderstanding; reward weights allow modeling adversarial or cooperative attitudes (Laroche, 2017).
- Observations integrate noise models (a per-feature error rate), supporting stochasticity in communication channels.
This formal structure enables direct application of RL, multi-agent RL, or bandit optimization solvers and supports parametric analysis of strategic behavior and equilibria.
4. Agent Model and Self-Optimization
Agents in NegotiationGym act in repeated simulation rounds, each consisting of interactive dialogue with one or more partners. The UtilityAgent abstraction supports:
- Utility evaluation: `compute_utility(E)` returns the scalar utility for completed episodes, enabling learning-from-outcome.
- Self-improvement: `learn_from_feedback(E)` and optionally `optimize(E)`. The default method is prompt-based reflection: for agents with `self_improve=true`, the agent assembles a window of recent episodes, generates a reflection prompt, and invokes its LLM to synthesize updated system prompts aimed at utility maximization. Alternate strategies—contextual bandits, offline RL, or gradient-free search—are permitted via override (Mangla et al., 5 Oct 2025).
- Interaction with the environment: At each turn, the agent formats the observation into a prompt (which may include scenario context, dialogue history, and feedback), invokes the underlying policy (LLM or scripted), and parses its output into structured actions using dedicated XML/JSON grammar modules (Bianchi et al., 2024).
This model supports side-by-side comparison of reflection-driven adaptation, hand-crafted policy variants, and learning-based optimization in negotiation tasks.
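The default prompt-based reflection loop can be sketched as follows. The template wording and function signature are illustrative assumptions, not the package's actual implementation; `llm` stands for any string-to-string callable (an API client in practice):

```python
REFLECTION_TEMPLATE = """You are improving a negotiation agent.
Recent episodes (utility and transcript excerpt):
{history}
Current system prompt:
{prompt}
Rewrite the system prompt to increase future utility."""


def reflect(agent_prompt, episodes, llm, window=5):
    """Prompt-based reflection: summarize the last `window`
    episodes and ask the LLM for an updated system prompt."""
    recent = episodes[-window:]
    history = "\n".join(
        f"- utility={e['utility']:.2f}, transcript={e['transcript'][:200]}"
        for e in recent
    )
    query = REFLECTION_TEMPLATE.format(history=history, prompt=agent_prompt)
    return llm(query)  # the returned text becomes the new system prompt
```

Overriding this hook with a contextual bandit or offline-RL update, as the text notes, only requires replacing the body while keeping the episode-window interface.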
5. API, Configuration, and Extensibility
NegotiationGym exposes both programmatic and configuration-driven APIs:
- Configuration schema (JSON/YAML): Specifies the scenario, agent list, negotiation parameters, termination rules, and agent-level optimization metadata. Below is a representative agent spec:
```json
{
  "name": "Buyer",
  "description": "Seeks the lowest possible price",
  "prompt": "...",
  "utility_class": "BuyerAgent",
  "strategy": { "max_price": 400 },
  "self_improve": true,
  "optimization_target": true
}
```

- Environment class: Implements Gym-style `reset()`, `step(action)`, and `render()`, and provides access to cumulative state and logs.
- Agent interface: Agents are loaded per their declared `utility_class` and may be extended via subclassing for rule-based, RL, imitation learning, or LLM-centric approaches.
- Scenario registry: Out-of-the-box scenarios (e.g., resource exchange, buyer–seller, multi-issue scheduling, time-pressured bargaining) are parameterized and registered with the environment for easy instantiation.
- Serialization and logging: Full state—including private parameters and complete interaction history—can be serialized for reproducibility and audit, supporting advanced analysis like counterfactual replay or multi-agent diagnostics.
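The Gym-style interface implies a simple episode driver; the sketch below shows the shape of that loop. The function and the four-tuple `step` return mirror the interface described above, not the exact package API:

```python
def run_episode(env, agents, max_turns=50):
    """Drive one negotiation episode: reset the environment,
    alternate agent turns, and log a serializable transcript."""
    obs = env.reset()
    transcript = []
    for turn in range(max_turns):
        agent = agents[turn % len(agents)]       # simple round-robin
        action = agent.act(obs)
        obs, reward, done, info = env.step(action)
        transcript.append({"turn": turn, "agent": agent.name,
                           "action": action, "reward": reward})
        if done:
            break
    return transcript
```

Because every turn is appended as a plain dict, the transcript can be dumped to JSON for the audit and counterfactual-replay analyses the text describes.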
CLI and GUI frontends support simulation execution, job management, monitoring, and output visualization, further enabling reproducible research and collaborative workflows (Mangla et al., 5 Oct 2025).
6. Evaluation Protocols and Metrics
NegotiationGym supports a rigorous evaluation regime, including:
- Per-episode metrics: Utility for each agent, surplus shares, deal success (binary).
- Aggregate statistics: Cumulative average utility, win rate, deal rate, surplus Pareto front analysis.
- Behavioral diagnostics: Anchoring coefficient (Spearman correlation between initial and final offers), acceptance-curve estimation $\hat{P}(\mathrm{accept}\mid\mathrm{offer}=k)$, and the inter-agent utility gap $|R_1 - R_2|$. Evaluation runs over $N$ repeated episodes, sometimes alternating roles (first/second mover). Metrics are visualized as matrices/heatmaps, and full game transcripts are available for post hoc inspection.
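The per-episode and aggregate metrics above can be computed from episode records with a short sketch (field names and the helper functions are illustrative, and the Spearman implementation assumes distinct offer values, i.e., no rank ties):

```python
from statistics import mean


def spearman(xs, ys):
    """Spearman rank correlation for tie-free samples."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0] * len(v)
        for rank, i in enumerate(order):
            r[i] = rank
        return r
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))


def episode_metrics(episodes):
    """Aggregate deal rate, mean utilities, and an anchoring
    coefficient (initial offer vs. final price) over episode dicts
    with keys: deal, u_buyer, u_seller, first_offer, final_price."""
    deals = [e for e in episodes if e["deal"]]
    return {
        "deal_rate": len(deals) / len(episodes),
        "mean_u_buyer": mean(e["u_buyer"] for e in deals) if deals else 0.0,
        "mean_u_seller": mean(e["u_seller"] for e in deals) if deals else 0.0,
        "anchoring": spearman([e["first_offer"] for e in deals],
                              [e["final_price"] for e in deals])
                     if len(deals) > 2 else None,
    }
```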
Experimental case studies have demonstrated, for example, that prompt-reflecting buyers in buyer–seller settings systematically increase their utility while adversely affecting seller returns, and that when both parties are permitted to reflect, outcomes become more balanced with higher deal rates (Mangla et al., 5 Oct 2025).
7. Implementation Practices and Advanced Use
Implementation is facilitated via repository cloning and standard Python environment setup:
```shell
git clone https://github.com/chrishokamp/multi-agent-social-simulation.git
cd multi-agent-social-simulation
pip install -r requirements.txt
negotiation_gym run --config my_config.json
```

The GUI frontend is launched against its MongoDB backend:

```shell
negotiation_gym ui --db mongodb://localhost:27017/negotiation_gym
```
Best practices for extending or adapting NegotiationGym include:
- Isolating parsing logic for structured action grammars (XML/JSON) to allow scenario-specific extensions and alternate natural language templates.
- Seeding all stochasticity (costs, beliefs) for experimental reproducibility.
- Encapsulating resource and trade operations in custom objects for transactional integrity.
- Separating public dialogue channels from private agent logics to allow partial observability and privacy-preserving analysis.
- Maintaining modular scenario and agent registries for rapid prototyping.
- Bundling reference analysis scripts/notebooks to compute key statistics and visualize negotiation trajectories.
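The first practice above—isolating parsing logic for structured action grammars—can be sketched as a standalone parser. This is an illustrative example, not the package's actual grammar module; it accepts either a JSON object or a simple XML-style tag and returns `None` on unparseable output so the caller can re-prompt the agent:

```python
import json
import re


def parse_action(raw):
    """Parse one LLM turn into a structured action dict.

    Accepts {"type": "offer", "price": 350} or <offer price="350"/>;
    returns None when the output matches neither grammar."""
    raw = raw.strip()
    if raw.startswith("{"):                      # JSON branch
        try:
            action = json.loads(raw)
        except json.JSONDecodeError:
            return None
        return action if "type" in action else None
    # XML-style branch: tag name becomes the action type,
    # attributes become string-valued fields.
    m = re.match(r'<(\w+)((?:\s+\w+="[^"]*")*)\s*/?>', raw)
    if not m:
        return None
    action = {"type": m.group(1)}
    for k, v in re.findall(r'(\w+)="([^"]*)"', m.group(2)):
        action[k] = v
    return action
```

Keeping this logic in one module, as recommended, lets each scenario swap in its own grammar or natural-language template without touching agent or environment code.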
These principles, embodied across NegotiationGym, facilitate both empirical and theoretical research across negotiation, multi-agent learning, and algorithmic social science (Mangla et al., 5 Oct 2025, Bianchi et al., 2024, Laroche, 2017).