WhoDunitEnv: Multi-Agent Collaboration Environment
- WhoDunitEnv is a multi-agent framework where agents collaborate under partial observability to identify a hidden culprit.
- It offers tunable parameters such as number of suspects, attribute sets, and communication bandwidth to adjust experimental complexity.
- Its structured roles and communication protocols enable the study of error prevention, robust intervention, and distributed decision-making.
WhoDunitEnv is a configurable, multi-agent collaboration environment designed as an episodic, turn-based stochastic game, with a primary application in the study of coordination, information-sharing, and robust communication among agents operating under information asymmetry or partial observability. It was introduced as a research platform to test monitoring and intervention mechanisms for error prevention in multi-agent systems, particularly in scenarios where a single “rogue” agent’s erroneous action can terminate a collaborative process and induce failure (Barbi et al., 9 Feb 2025).
1. Formal Specification and RL-Style Framework
WhoDunitEnv is formally defined as a tuple $(\mathcal{S}, \{\mathcal{A}_i\}, \mathcal{O}, T, R, \gamma)$, where:
- $\mathcal{S}$ denotes the state space. Each state is given by a tuple $(S, c, K, C, t, i)$:
- $S$: set of suspect IDs.
- $c$: the (hidden) culprit index.
- $K = \{K_i\}$: agent knowledge sets, with private, distributed information.
- Asymmetric 2-agent: $K_{\text{Accuser}}$ (“Accuser”) = full attributes of $c$; $K_{\text{Intel}}$ (“Intel”) = all suspects’ full descriptions, but not the identity of $c$.
- Symmetric $m$-agent: each agent receives a disjoint subset of the culprit’s attributes.
- $C$: communication channel, storing the structured message history.
- $t$: current turn.
- $i$: index of the active agent.
- $\{\mathcal{A}_i\}$: agent-specific action spaces:
- Asymmetric (Accuser): request-specific, request-broad, accuse.
- Asymmetric (Intel): respond, respond-broad.
- Symmetric: one action space shared by all $m$ agents.
- $\mathcal{O}$ (observations): at each turn, the active agent $i$ receives its private knowledge $K_i$, the full lineup (depending on mode), the communication channel $C$, and the turn $t$.
- Transition map $T$:
- For communication actions (e.g., request-specific or respond): the message is appended to $C$, $t$ is incremented, and control passes in round-robin order.
- For accuse($s$): terminal transition; the outcome is evaluated as “success” if $s = c$.
- Reward $R$: defined at episode end (on accusation or timeout), sparse and zero-discounted. A correct accusation yields $+1$, an incorrect one $-1$, and a timeout or non-terminal step $0$.
- Discount $\gamma$: the system uses zero-discounted rewards; only the episode outcome matters.
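The state tuple and the sparse terminal reward can be sketched as follows. This is a minimal illustration, not the authors' implementation: the names `State` and `terminal_reward` are hypothetical, and the field order simply mirrors the `(S, c, K, C, t, i)` tuple used in the rollout pseudocode.

```python
from typing import Any, Dict, List, NamedTuple, Optional


class State(NamedTuple):
    """Hypothetical typed view of the state tuple (S, c, K, C, t, i)."""
    S: List[int]             # suspect IDs
    c: int                   # hidden culprit index
    K: List[Dict[Any, Any]]  # one private knowledge set per agent
    C: List[Any]             # communication channel (ordered message history)
    t: int                   # current turn
    i: int                   # index of the active agent


def terminal_reward(state: State, accused: Optional[int]) -> int:
    """Sparse reward: +1 for a correct accusation, -1 for an incorrect one,
    and 0 when no accusation was made (timeout or non-terminal step)."""
    if accused is None:
        return 0
    return 1 if accused == state.c else -1
```

Because the reward is only nonzero on an accusation, every communication step is scored identically, which is what makes a single premature accusation so costly.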
2. Modular Parameters and Environmental Knobs
WhoDunitEnv exposes a range of tunable parameters critical for experimental design:
- $N$: Number of suspects (typically 6–25).
- Number of attributes per suspect (typically 10–20).
- Cardinality of each attribute (often 2–3).
- $T_{\max}$: Maximum turns per episode before termination (e.g., 31 for asymmetric, 20 for symmetric).
- Communication structure:
- Role symmetry (asymmetric 2-agent vs. symmetric m-agent).
- Message channel format (property=value + candidate lists, or flexible/free-form).
- Bandwidth: one query or fact per turn by default; this limit is set by the action space and can be changed.
- Intervention hyperparameter: Maximum number of mid-episode resets (0, 1, or 2), for experimenter-imposed agent corrections.
These parameters allow systematic control over instance difficulty, information distribution, and coordination demands.
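One way to keep these knobs together is a small validated configuration object. The sketch below is an assumption for illustration only: the class `WhoDunitConfig`, its field names, and its defaults are hypothetical and are not part of any published WhoDunitEnv API. It also assumes every attribute shares the same cardinality.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WhoDunitConfig:
    """Hypothetical container for the experimental knobs listed above."""
    n_suspects: int = 6        # N
    n_attributes: int = 10     # attributes per suspect
    cardinality: int = 2       # values per attribute (assumed uniform)
    max_turns: int = 31        # T_max
    variant: str = "asymmetric"
    n_agents: int = 2          # m, only free in the symmetric variant
    max_resets: int = 0        # intervention hyperparameter

    def __post_init__(self) -> None:
        if self.variant not in ("asymmetric", "symmetric"):
            raise ValueError(f"unknown variant: {self.variant!r}")
        if self.variant == "asymmetric" and self.n_agents != 2:
            raise ValueError("the asymmetric variant uses exactly 2 agents")
        if self.max_resets not in (0, 1, 2):
            raise ValueError("max_resets must be 0, 1, or 2")
        if min(self.n_suspects, self.n_attributes, self.cardinality) < 1:
            raise ValueError("sizes must be positive")

    def n_possible_profiles(self) -> int:
        """Number of distinct attribute profiles a suspect can have."""
        return self.cardinality ** self.n_attributes
```

Comparing `n_possible_profiles()` with `n_suspects` gives a rough sense of how likely two suspects are to share a profile, which is one driver of instance difficulty.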
3. Environment Initialization and Rollout Pseudocode
WhoDunitEnv provides a reproducible setup and simulation API, as specified by the provided pseudocode:
```python
import random


def make_whodunit_env(N, attributes, variant="asymmetric", Tmax=31, m=2):
    # `m` (number of symmetric agents) is an added parameter: the original
    # pseudocode read it from an undefined `agents` list.
    # 1. Sample suspect pool: one random value per attribute for each suspect
    S = list(range(1, N + 1))
    suspect_profiles = {}
    for s in S:
        suspect_profiles[s] = {a: random.choice(attributes[a]) for a in attributes}

    # 2. Choose culprit
    c = random.choice(S)

    # 3. Build private knowledge (one entry per agent)
    if variant == "asymmetric":
        K_accuser = suspect_profiles[c].copy()  # culprit's attributes only
        K_intel = suspect_profiles.copy()       # all profiles, culprit unmarked
        K = [K_accuser, K_intel]
    else:  # symmetric: deal the culprit's facts into m disjoint subsets
        facts = list(suspect_profiles[c].items())
        random.shuffle(facts)
        K = [dict(facts[j::m]) for j in range(m)]

    # 4. Initialize dialogue; state = (S, c, K, C, t, i)
    C = []
    state = (S, c, K, C, 1, 0)
    return state, suspect_profiles


def format_message(i, action):
    # Placeholder: the source does not define the message format.
    return (i, action)


def step(state, action, Tmax=31):
    S, c, K, C, t, i = state
    if action.type == "accuse":
        # Terminal transition: success iff the accused suspect is the culprit
        correct = (action.s_id == c)
        reward = +1 if correct else -1
        return (state, reward, True)
    # Communication: append message, advance turn, pass control round-robin
    C2 = C + [format_message(i, action)]
    i2 = (i + 1) % len(K)
    t2 = t + 1
    done = (t2 > Tmax)
    reward = 0
    return ((S, c, K, C2, t2, i2), reward, done)
```
This procedural design enables episodic rollouts, agent-driven action selection, and stepwise state progression.
4. Communication, Roles, and Knowledge Distribution
The core research utility of WhoDunitEnv arises from its explicit structuring of knowledge, role asymmetry, and communication:
- In the asymmetric 2-agent case, one agent (Accuser) has exclusive access to the culprit’s attributes, but no lineup context, while the other (Intel) has complete suspect profiles but must infer which facts isolate the culprit.
- In the symmetric m-agent variant, agents each possess a mutually exclusive subset of the culprit’s facts, enforcing distributed evidence integration.
- Dialogue unfolds as agents transmit structured messages, either targeted or broad, with every query, response, or assertion appended to a persistent, ordered channel.
- Action spaces and message formatting enforce constraints on bandwidth and informativeness, resulting in tension between efficiency, certainty, and failure propagation.
This suggests the environment is well-suited for isolating the impact of knowledge partitioning, interface design, and error-prevention interventions within multi-agent communication protocols.
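The symmetric knowledge split described above can be illustrated with a short sketch. The helper names `partition_facts` and `is_valid_partition` are hypothetical; the striding scheme follows the `facts[i::m]` pattern from the rollout pseudocode, and the seed argument is an added assumption for reproducibility.

```python
import random
from typing import Dict, List


def partition_facts(facts: Dict[str, str], m: int, seed: int = 0) -> List[Dict[str, str]]:
    """Shuffle the culprit's facts and deal them into m disjoint shares."""
    if m < 1:
        raise ValueError("m must be at least 1")
    items = list(facts.items())
    random.Random(seed).shuffle(items)
    # Share j takes items j, j+m, j+2m, ... so no fact appears twice.
    return [dict(items[j::m]) for j in range(m)]


def is_valid_partition(facts: Dict[str, str], shares: List[Dict[str, str]]) -> bool:
    """True if the shares are mutually exclusive and jointly cover all facts."""
    seen: Dict[str, str] = {}
    for share in shares:
        for key, value in share.items():
            if key in seen:
                return False
            seen[key] = value
    return seen == facts
```

Since no single share covers all of the culprit's attributes when $m > 1$, an agent generally cannot narrow the lineup as far on its own, which is what forces evidence to be pooled through the channel.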
5. Illustrative Toy Scenario and Gameplay Flow
A representative minimal scenario illustrates the sequential and communicative nature of play:
- 3 suspects with differing attributes (e.g., hat color, glasses, mood).
- The Accuser, knowing all attributes of the culprit, issues structured queries to Intel, who holds all profiles but does not know the culprit’s index.
- Over alternating turns, the Accuser requests attribute confirmations or broad-narrowing facts, with Intel responding per available knowledge.
- The episode ends upon accusation, with success contingent on the correct identification.
The following table summarizes action roles in the asymmetric 2-agent variant:
| Agent Role | Knowledge Access | Action Types |
|---|---|---|
| Accuser | Full description of culprit | request-specific, request-broad, accuse |
| Intel | Full descriptions of all suspects, without knowing which one is the culprit | respond, respond-broad |
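
The toy scenario can be traced with a small scripted exchange. This is a sketch under stated assumptions: the function `play_toy_episode`, the message tuples, and the rule that Intel answers each request with the list of suspects still consistent with all facts so far are all hypothetical simplifications, not the environment's actual protocol.

```python
from typing import Dict, List, Optional, Tuple

Profile = Dict[str, str]


def play_toy_episode(profiles: Dict[int, Profile], culprit: int) -> Tuple[Optional[int], List[tuple]]:
    """Scripted Accuser/Intel dialogue that narrows the lineup fact by fact."""
    accuser_knowledge = dict(profiles[culprit])  # Accuser: culprit's attributes only
    candidates = sorted(profiles)                # Intel: sees all profiles, not the culprit
    channel: List[tuple] = []
    for prop, value in accuser_knowledge.items():
        # Accuser's turn: share one property=value fact about the culprit.
        channel.append(("Accuser", "request-specific", prop, value))
        # Intel's turn: reply with the suspects that still match every fact.
        candidates = [s for s in candidates if profiles[s][prop] == value]
        channel.append(("Intel", "respond", prop, tuple(candidates)))
        if len(candidates) == 1:
            break
    # Accuse only when the lineup has been narrowed to a single suspect.
    accused = candidates[0] if len(candidates) == 1 else None
    return accused, channel


profiles = {
    1: {"hat": "red", "glasses": "yes", "mood": "calm"},
    2: {"hat": "red", "glasses": "no", "mood": "calm"},
    3: {"hat": "blue", "glasses": "no", "mood": "angry"},
}
accused, channel = play_toy_episode(profiles, culprit=2)
```

Here the hat color removes suspect 3 and the glasses fact removes suspect 1, so the Accuser can name suspect 2 after four messages without ever using the mood attribute.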
6. Extensibility and Research Applications
WhoDunitEnv’s modularity supports a broad spectrum of experimental manipulations:
- Adjust $N$, attribute sets, and value cardinalities to vary reasoning complexity.
- Alter communication protocols (structured vs. free-form), message cost, or add reward shaping (e.g., negative per turn).
- Employ different intervention hyperparameters for robustness analysis.
- Evaluate interventions to detect and override “rogue agent” actions, as demonstrated by agent monitoring and mid-episode resets (Barbi et al., 9 Feb 2025).
This modularity enables controlled studies of distributed inference, emergent language, and failure recovery within multi-agent RL and coordination contexts.
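Two of these manipulations, a per-turn penalty and a bounded reset budget, can be sketched as below. Everything here is an illustrative assumption: `shaped_reward`, `ResetBudgetMonitor`, the scalar `confidence` score, and the threshold rule are hypothetical stand-ins, and this section does not specify how the cited monitoring mechanism actually scores agent actions.

```python
def shaped_reward(base_reward: float, turns_used: int, turn_penalty: float = 0.01) -> float:
    """Terminal reward minus a small cost per turn (hypothetical shaping)."""
    return base_reward - turn_penalty * turns_used


class ResetBudgetMonitor:
    """Hypothetical monitor that may trigger at most `max_resets` resets."""

    def __init__(self, max_resets: int, threshold: float) -> None:
        if max_resets < 0:
            raise ValueError("max_resets must be non-negative")
        self.max_resets = max_resets
        self.threshold = threshold
        self.resets_used = 0

    def review(self, confidence: float) -> str:
        """Return "reset" for a low-confidence action while budget remains,
        otherwise "allow" so the action goes through."""
        if confidence < self.threshold and self.resets_used < self.max_resets:
            self.resets_used += 1
            return "reset"
        return "allow"
```

With `max_resets=0` the monitor always returns `"allow"`, which corresponds to the no-intervention baseline; raising the budget to 1 or 2 lets the experimenter measure how much each additional correction helps.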
7. Empirical Impact and Significance
Experiments leveraging WhoDunitEnv, alongside code generation and systemic simulation tasks, demonstrate the effectiveness of monitoring and intervention architectures in reducing catastrophic system-level failures attributed to agent errors. Success-rate improvements up to 17.4% were reported on WhoDunitEnv tasks when such strategies were employed (Barbi et al., 9 Feb 2025). This establishes WhoDunitEnv as a benchmark for both algorithmic robustness research and the analysis of communication protocol design in distributed AI.