Refute-or-Promote: Adversarial Multi-Agent Review for High-Precision Defect Discovery

This presentation examines a novel adversarial pipeline designed to address the precision crisis in language model-assisted defect discovery. Through staged adversarial review, context isolation, cross-model critique, and mandatory empirical validation, the Refute-or-Promote methodology rejects 83% of candidate findings before human review while still surfacing real CVEs and compiler bugs. The talk reveals why model consensus is uninformative and demonstrates that reliability emerges only from systematic adversarial challenge and runtime verification.
Script
When curl permanently closed its bug bounty and HackerOne paused the Internet Bug Bounty, the reason was clear: language models were flooding maintainers with plausible-sounding defect reports that were almost always wrong. The Refute-or-Promote methodology addresses this precision crisis through adversarial multi-agent review—a pipeline where agents are explicitly mandated to destroy findings, not refine them.
The problem starts with plausibility bias and correlated errors. When multiple language models see the same context, they make the same mistakes together, reaching consensus even when the analysis is fundamentally wrong. The pipeline counters this through stratified context hunting and context asymmetry: candidates are generated in parallel across isolated contexts, and cold-start reviewers, who see a candidate without the generating agent's reasoning, are assigned to eliminate anchoring.
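To make the isolation concrete, here is a minimal Python sketch of what such orchestration might look like. Everything in it is illustrative: `query_model`, `parse_finding`, the prompts, and the data layout are assumptions for exposition, not artifacts from the methodology itself.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    claim: str       # the alleged defect, stated on its own
    evidence: str    # code excerpt the generator cites
    reasoning: str   # generator's analysis, withheld from cold-start reviewers

def query_model(model: str, prompt: str) -> str:
    """Placeholder for a real LLM API call; assumed, not part of the source."""
    raise NotImplementedError

def parse_finding(raw: str) -> tuple[str, str, str]:
    """Naive illustrative parser: expects claim / evidence / reasoning sections."""
    parts = (raw.split("\n", 2) + ["", ""])[:3]
    return parts[0], parts[1], parts[2]

def generate_candidates(code_slices: list[str], model: str) -> list[Candidate]:
    # Stratified hunting: each generator sees only its own slice of the
    # target, so one agent's framing cannot anchor the others.
    out = []
    for code in code_slices:
        raw = query_model(model, "Find one defect. Give claim, evidence, and "
                                 f"reasoning on separate lines.\n\n{code}")
        out.append(Candidate(*parse_finding(raw)))
    return out

def cold_start_review(candidate: Candidate, model: str) -> bool:
    # Context asymmetry: the reviewer sees the claim and cited code only,
    # never the generator's reasoning, so it cannot inherit that bias.
    verdict = query_model(
        model,
        f"Claim: {candidate.claim}\nCode: {candidate.evidence}\n"
        "Verify independently. Answer VALID or INVALID.")
    return verdict.strip().upper().startswith("VALID")
```

The key design point the sketch tries to capture is that the reviewer's prompt contains no trace of how the candidate was produced.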
Adversarial kill mandates form the core mechanism. Agents receive explicit track assignments with a destructive role: find grounds to disprove, not evaluate or refine. This is not cooperative debate. A final cross-model critic stage deploys agents from distinct model families to catch error modes invisible to homogeneous ensembles—in one library, 16% of same-family approvals were flagged for correctness errors by cross-model review.
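A sketch of the kill-mandate stage might look like the following, reusing the `Candidate` and `query_model` placeholders from the previous sketch. The refutation prompt and the model-family names are assumptions, not the methodology's actual orchestration artifacts.

```python
KILL_PROMPT = (
    "Your sole task is to DISPROVE this defect claim. Do not refine it. "
    "If you find any grounds to refute it, answer REFUTED: <reason>; "
    "otherwise answer STANDS.\n\nClaim: {claim}\nCode: {evidence}"
)

# Critics drawn from distinct model families so their error modes are less
# correlated; these names are placeholders, not the families actually used.
CRITIC_MODELS = ["family-a/critic", "family-b/critic", "family-c/critic"]

def survives_kill_review(candidate: Candidate) -> bool:
    for model in CRITIC_MODELS:
        verdict = query_model(model, KILL_PROMPT.format(
            claim=candidate.claim, evidence=candidate.evidence))
        if verdict.strip().upper().startswith("REFUTED"):
            return False  # a single successful refutation kills the finding
    # No critic in any family could refute it: promote toward the empirical
    # gate rather than declaring it a true positive outright.
    return True
```

Note the asymmetry: critics can only kill or pass, never strengthen, which is what distinguishes this from cooperative debate.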
Over 31 days across seven targets, including security libraries and the ISO C++ standard, the pipeline processed 171 candidates. The adversarial review rejected 83% before human operators ever saw them. The survivors produced 4 CVEs affecting libfuse, lcms2, and OpenSSL, plus confirmed compiler conformance bugs and merged security fixes. But here's the critical lesson: in one case, 80 agents unanimously endorsed a Bleichenbacher oracle in OpenSSL that was later falsified by empirical testing.
That false positive—endorsed by unanimous consensus but disproven by runtime validation—reveals the non-negotiable requirement: no candidate advances absent empirical demonstration. Model agreement is uninformative in the presence of shared bias. Reliability emerges only from adversarial challenge, context isolation, cross-model critique, and empirical grounding. The tradeoff is real: this unidirectional precision filter depends on human orchestration to resurrect improperly killed true positives.
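What might such a gate look like in practice? A minimal sketch follows, assuming the candidate's proof-of-concept is packaged as a runnable command; `empirical_gate` and its pass/fail criterion are illustrative, since the source mandates empirical demonstration without prescribing a harness.

```python
import subprocess

def empirical_gate(poc_command: list[str], timeout_s: int = 60) -> bool:
    """Promote a finding only if its proof-of-concept reproduces the defect.

    Here a nonzero exit (crash, sanitizer abort, failed assertion) counts as
    empirical confirmation; a real harness would match the specific failure
    signature. Unanimous model consensus never substitutes for this check.
    """
    try:
        result = subprocess.run(poc_command, capture_output=True,
                                timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return False  # a hang is inconclusive, not a confirmation
    return result.returncode != 0
```

Under this discipline the 80-agent OpenSSL endorsement fails exactly where it should: at the runtime check, not at the consensus tally.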
Refute-or-Promote demonstrates that consensus is not correctness—that precision in high-stakes language model applications demands systematic adversarial decomposition and empirical gates. The methodology, its orchestration artifacts, and the evidence from production deployments are all available. If you're interested in exploring how adversarial architecture can move beyond plausible generation toward trustworthy discovery, visit EmergentMind.com to learn more and create your own video explaining this work.