Extending consensus sampling to multiple or agentic interactions

Extend Algorithm 1 (consensus sampling from k distributions, risk-competitive with the safest s) from the single-shot prompt–response setting to settings with multiple interactions, including agentic ones.

Background

The paper proposes a consensus sampling algorithm that aggregates k generative models to produce outputs with risk bounded relative to the safest s models, and analyzes per-prompt robustness, abstention, and information leakage.
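To make the single-shot setting concrete, here is a minimal sketch of a consensus-style rejection sampler over small discrete distributions. It is not a transcription of the paper's Algorithm 1. The acceptance rule (the mean of the s smallest probabilities divided by the mean of all k), the round budget `max_rounds`, the `ABSTAIN` sentinel, and the function name `consensus_sample` are all assumptions made for illustration and should be checked against the paper.

```python
import random

ABSTAIN = None  # hypothetical sentinel meaning "no output was agreed on"


def consensus_sample(dists, s, max_rounds, rng=None):
    """Illustrative consensus-style sampler (assumed rule, not the paper's text).

    dists: list of k dicts mapping each candidate output to its probability.
    s: number of lowest-probability models used in the acceptance ratio.
    max_rounds: number of proposals to try before abstaining.
    """
    rng = rng or random.Random()
    k = len(dists)
    if not 1 <= s <= k:
        raise ValueError("s must satisfy 1 <= s <= k")

    for _ in range(max_rounds):
        # Propose an output from a uniformly chosen model.
        proposer = rng.choice(dists)
        outputs = list(proposer)
        weights = [proposer[y] for y in outputs]
        y = rng.choices(outputs, weights=weights, k=1)[0]

        # Probability each model assigns to the proposal, in ascending order.
        probs = sorted(p.get(y, 0.0) for p in dists)
        low_mean = sum(probs[:s]) / s   # mean of the s smallest probabilities
        all_mean = sum(probs) / k       # positive, since the proposer supports y

        # Assumed acceptance rule: accept with probability low_mean / all_mean.
        if rng.random() < low_mean / all_mean:
            return y

    return ABSTAIN
```

Under this assumed rule, models that agree exactly are always accepted on the first proposal, while two models with disjoint supports and s = 1 always abstain. A multi-round extension would have to account for how such per-call guarantees compose, which is the open question this problem raises.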

The section “Varying prompts” treats conditioning on a single prompt, but the paper's guarantees are primarily single-shot. The authors explicitly identify multi-step and agentic interactions as an open direction. Such settings raise challenges that the current framework does not address, including how guarantees compose across rounds and how information leakage may accumulate.

References

There are many open questions such as how to extend it to multiple (possibly agentic) interactions.

— Consensus Sampling for Safer Generative AI (2511.09493 - Kalai et al., 12 Nov 2025) in Section: Conclusions, Limitations, and Future Work — Future directions
