Question-Objection Dialogue Protocol
- Question–objection dialogue is a structured protocol that refines answers through iterative adversarial questioning and explicit justifications.
- It employs distinct roles—Defender, Objectioner, and Host—with turn-taking to identify logical gaps and drive methodical revisions.
- Empirical results, such as a 19.4 percentage-point accuracy gain on GSM8K for a small model (Llama3.2:1B), indicate its impact on reasoning quality and coherence.
A question–objection dialogue is a structured protocol for iterative reasoning, where an initial answer to a question is subjected to external criticism solely in the form of questions, obligating explicit, revisable justification in response. Unlike standard single-turn or Chain-of-Thought (CoT) prompting—which proceed as unidirectional monologues—question–objection dialogue introduces an explicit, asymmetric adversarial interplay between solution and critique, with strict turn-taking roles: a Defender proposes or revises the answer, an Objectioner raises only interrogative challenges without offering alternative solutions, and a Host adjudicates the final synthesis. This methodology is formally instantiated in the FOR-Prompting (From Objection to Revision) protocol, which demonstrates significant empirical gains on both reasoning benchmarks and qualitative measures of solution quality (Zhang et al., 2 Oct 2025).
1. Formal Roles and Turn-Taking Structure
The FOR-Prompting protocol defines three agent roles, each realized by a dedicated prompt template and (typically) a separate model call per turn:
- Defender ($D$): Proposes an initial answer to the question $q$ and iteratively revises it in light of received objections. At each round $t$, the Defender’s output is $a_t$, with $a_0$ serving as the draft solution.
- Objectioner (Debater, $O$): Receives the Defender’s latest answer and issues a set of adversarial, interrogative criticisms $Q_t$, targeting logical gaps, unstated assumptions, or overlooked edge cases. The Objectioner is explicitly prohibited from suggesting fixes.
- Host ($H$): Aggregates the history of the exchange after $T$ revision rounds, producing the final answer $a^\ast$ as the protocol’s output.
The protocol executes as follows:
- The Defender produces the initial answer $a_0$.
- For $t = 1$ to $T$:
  - The Objectioner observes $a_{t-1}$ and emits $Q_t$ (a question set).
  - The Defender revises, generating $a_t$ conditioned on the question, its previous answer $a_{t-1}$, and the objections received so far, $Q_1, \dots, Q_t$.
- The Host synthesizes a final answer $a^\ast$, generally echoing $a_T$ or making minimal presentational refinements.
This role separation induces a controlled, adversarial revision process, bounding the revision trajectory and analysis to a single accountable reasoning chain.
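The turn-taking loop above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function `for_prompting`, the three callable signatures, and the toy role functions are all hypothetical names introduced here, and each role is assumed to wrap a single model call. Passing the question to the Objectioner is also an assumption.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical role interfaces: each role is any callable wrapping one model call.
DefenderFn = Callable[[str, Optional[str], List[List[str]]], str]
ObjectionerFn = Callable[[str, str], List[str]]
HostFn = Callable[[str, List[str], List[List[str]]], str]


def for_prompting(question: str, defender: DefenderFn, objectioner: ObjectionerFn,
                  host: HostFn, rounds: int = 3) -> Tuple[str, List[str], List[List[str]]]:
    """Run a fixed number of objection/revision rounds, then let the Host finalize."""
    answers = [defender(question, None, [])]  # initial draft
    objections: List[List[str]] = []
    for _ in range(rounds):
        # The Objectioner sees only the latest answer and returns questions.
        objections.append(objectioner(question, answers[-1]))
        # The Defender revises given its previous answer and all objections so far.
        answers.append(defender(question, answers[-1], objections))
    return host(question, answers, objections), answers, objections


# Toy stand-ins for model calls, used only to exercise the control flow.
def toy_defender(question, previous, objections):
    return "4" if not objections else "5"


def toy_objectioner(question, answer):
    return ["Could you show your counting process?"]


def toy_host(question, answers, objections):
    return answers[-1]


final, answers, objections = for_prompting(
    "How many r's in 'strarrtrabbbery'?", toy_defender, toy_objectioner, toy_host, rounds=2)
print(final, answers)  # 5 ['4', '5', '5']
```

Keeping the roles as plain callables makes it easy to swap in different models per role, which matches the description of one dedicated prompt template and model call per turn.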
2. Pseudocode and Template Formalization
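At the highest level, FOR-Prompting proceeds according to the following pseudocode (omitting implementation-specific details), where $q$ is the question, $a_t$ the Defender’s answer after round $t$, $Q_t$ the objection set of round $t$, and $T$ the number of rounds:
$$
\begin{aligned}
& a_0 \leftarrow D(q) \\
& \textbf{for } t = 1, \dots, T: \\
& \quad Q_t \leftarrow O(a_{t-1}) \\
& \quad a_t \leftarrow D(q, a_{t-1}, Q_{1:t}) \\
& a^\ast \leftarrow H(q, a_{0:T}, Q_{1:T})
\end{aligned}
$$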
The Objectioner receives a canonical prompt: “Produce a concise list of clarifying or adversarial questions that expose gaps, hidden assumptions, or counterexamples. Do not provide any answers or solutions.” Formally, each objection round is
$$Q_t = \{o_{t,1}, \dots, o_{t,k}\}$$
where each $o_{t,i}$ is a natural language question challenging the Defender’s position. The Defender then receives all objections so far (“Here is the original question $q$ and the following objections... Please revise your reasoning...”) and outputs
$$a_t = D(q, a_{t-1}, Q_{1:t})$$
where $a_{t-1}$ is its previous answer and $Q_{1:t}$ denotes the objection sets from rounds $1$ through $t$.
This formal separation ensures that only questions guide the revision process, with no direct provision of fixes by the adversarial agent.
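As a rough sketch of how these templates might be assembled in practice, the Python below builds the two prompts and applies a simple question-only filter. Only the Objectioner instruction is quoted from the text; the Defender wording, the prompt layout, the function names, and the `keep_questions` heuristic (keep lines ending in a question mark) are assumptions made for illustration.

```python
from typing import Iterable, List

# Objectioner instruction as quoted in the text above.
OBJECTIONER_INSTRUCTION = (
    "Produce a concise list of clarifying or adversarial questions that expose "
    "gaps, hidden assumptions, or counterexamples. "
    "Do not provide any answers or solutions."
)


def build_objectioner_prompt(question: str, answer: str) -> str:
    """Hypothetical layout: the question, the current answer, then the instruction."""
    return f"Question:\n{question}\n\nCurrent answer:\n{answer}\n\n{OBJECTIONER_INSTRUCTION}"


def build_defender_prompt(question: str, objection_rounds: List[List[str]]) -> str:
    """Hypothetical revision prompt listing every objection raised so far."""
    lines = [f"Here is the original question: {question}", "Objections raised so far:"]
    for t, round_questions in enumerate(objection_rounds, start=1):
        for i, objection in enumerate(round_questions, start=1):
            lines.append(f"  [{t}.{i}] {objection}")
    lines.append("Please revise your reasoning and answer in light of these objections.")
    return "\n".join(lines)


def keep_questions(lines: Iterable[str]) -> List[str]:
    """Heuristic guard (an assumption, not part of the protocol): keep only questions."""
    stripped = (line.strip() for line in lines)
    return [s for s in stripped if s.endswith("?")]


print(keep_questions(["Are you sure?", "The answer is 5.", "  Why?  "]))
# ['Are you sure?', 'Why?']
```

A filter like this is one cheap way to enforce the constraint that the adversarial agent contributes questions only, although a declarative sentence phrased as a question would still pass it.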
3. Empirical Performance and Quantitative Evaluation
FOR-Prompting’s effectiveness has been demonstrated on the GSM8K mathematical reasoning benchmark using high-capacity and small neural models (Zhang et al., 2 Oct 2025). The protocol yields the following results:
| Protocol/Model | Accuracy | Reasoning | Coherence | Holistic Judge (gpt4.1_eval) |
|---|---|---|---|---|
| Single-prompt baseline | 0.68 | 0.04 | 0.10 | 0.51 |
| CoT (GPT-4.1) | 0.90 | 0.18 | 0.31 | 0.96 |
| FOR-Prompting (3 rounds) | 0.90 | 0.31 | 0.41 | 0.97 |
| Single-prompt (Llama3.2:1B) | 5.6% | — | — | — |
| FOR (1 round, Llama3.2:1B) | 24.3% | — | — | — |
| FOR (3 rounds, Llama3.2:1B) | 25.0% | — | — | — |
Compared to the single-prompt baseline, FOR-Prompting achieves a 22 percentage-point increase in accuracy (0.68 to 0.90), matches CoT in correctness, and exceeds CoT by 13 points on judged reasoning (0.31 vs. 0.18) and 10 points on coherence (0.41 vs. 0.31). Notably, on Llama3.2:1B, FOR-Prompting raises GSM8K accuracy from 5.6% to 25.0% (three rounds), an absolute gain of 19.4 percentage points. This demonstrates the protocol’s value even for small-scale, on-device models.
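The gains quoted here follow directly from the table. The short check below recomputes them, distinguishing absolute (percentage-point) differences from the relative improvement factor; the helper name `point_gain` is hypothetical, and rounding is applied only to hide floating-point noise.

```python
def point_gain(before: float, after: float) -> float:
    """Absolute difference between two scores in [0, 1], in percentage points."""
    return round((after - before) * 100, 1)


# Accuracy, single-prompt baseline vs. FOR-Prompting (3 rounds).
print(point_gain(0.68, 0.90))    # 22.0
# Judged reasoning and coherence, CoT vs. FOR-Prompting.
print(point_gain(0.18, 0.31))    # 13.0
print(point_gain(0.31, 0.41))    # 10.0
# Llama3.2:1B accuracy, single prompt vs. FOR (3 rounds).
print(point_gain(0.056, 0.250))  # 19.4
# The same Llama3.2:1B change expressed as a relative factor.
print(round(0.250 / 0.056, 2))   # 4.46
```

The Llama3.2:1B result is therefore a gain of 19.4 percentage points, or roughly a 4.5-fold relative improvement over a very low baseline.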
4. Illustrative Example
A representative FOR-Prompting exchange on the task “How many r’s in ‘strarrtrabbbery’?” proceeds as follows:
- Defender ($a_0$): "There are 4 ‘r’s in strarrtrabbbery."
- Objectioner ($Q_1$):
  1. "Are you sure there are only 4 ‘r’s?"
  2. "Could you show your counting process?"
  3. "Did you verify each position in the string matches your tally?"
- Defender Revision ($a_1$): "Let’s index the letters 1…15 and highlight each ‘r’: ... That is 5 ‘r’s (positions 3,5,6,8,14)."
- Host ($a^\ast$): "Answer: There are 5 ‘r’s."
Objections directly surface an oversight (the undercount), and the enforced revision yields the correct answer. Additional rounds could further refine methodological details, such as explicit indexing, formatting, or edge-case discussion.
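The revised count in this example can be verified mechanically. The snippet below indexes the letters from 1, as the Defender's revision does, and lists the positions of each ‘r’; the helper name `letter_positions` is introduced here for illustration.

```python
def letter_positions(word: str, letter: str) -> list:
    """Return the 1-based positions at which `letter` occurs in `word`."""
    return [i for i, ch in enumerate(word, start=1) if ch == letter]


word = "strarrtrabbbery"
positions = letter_positions(word, "r")
print(len(word), len(positions), positions)  # 15 5 [3, 5, 6, 8, 14]
```

This agrees with the revised answer: 15 letters, with 5 occurrences of ‘r’ at positions 3, 5, 6, 8, and 14.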
5. Theoretical Distinctions from Related Reasoning Protocols
Relative to prior structured prompting approaches, FOR-Prompting implements several key departures:
- Chain-of-Thought (CoT): Structures reasoning as a serial, internal monologue; no explicit adversarial or external questioning.
- Tree-of-Thought (ToT): Explores multiple hypothetical reasoning trajectories, branching and searching diverse solution candidates, but lacks an external interrogative role.
- FOR-Prompting: Explicitly divides generation and critique, with the Objectioner limited to raising questions, never proposing new solutions. This asymmetry enforces a single, legible, accountable chain of reasoning (the Defender’s), while “external” pressure from questions compels self-revision and more explicit justification.
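Formally, whereas CoT generates a single answer $a$ from the question $q$ in one pass, FOR-Prompting progresses as a revision cascade:
$$a_0 \xrightarrow{Q_1} a_1 \xrightarrow{Q_2} \cdots \xrightarrow{Q_T} a_T \longrightarrow a^\ast$$
with each objection set $Q_t$ strictly interrogative.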
6. Extensions, Limitations, and Open Questions
FOR-Prompting’s role-structured, prompt-level architecture is task-agnostic and model-agnostic. Demonstrated domains include mathematics (GSM8K) and open-ended planning; natural extensions include legal reasoning, code review, requirement elicitation, and taxonomy refinement, wherever systematic questioning surfaces hidden assumptions.
Potential extensions include:
- Adaptive round control: A stopping criterion based on Objectioner exhaustion (no new questions) or answer stabilization, instead of a fixed number of rounds $T$.
- Learned objection policies: Training the Objectioner to select maximally informative questions (active learning approaches).
- Domain-specific validators: Integrating the Host or Objectioner with formal tools (unit tests, SMT solvers) to ground objections in external constraints.
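To make the first of these extensions concrete, the sketch below shows one possible form of adaptive round control. It is an assumption-laden illustration rather than a method from the paper: the function name `for_prompting_adaptive`, the role signatures, the case-insensitive exact-match test for "no new questions", and the string-equality test for "answer stabilized" are all choices made here.

```python
from typing import Callable, List, Optional, Tuple

DefenderFn = Callable[[str, Optional[str], List[List[str]]], str]
ObjectionerFn = Callable[[str, str], List[str]]


def for_prompting_adaptive(question: str, defender: DefenderFn, objectioner: ObjectionerFn,
                           max_rounds: int = 5) -> Tuple[str, List[List[str]]]:
    """Revise until no new questions arrive, the answer stops changing, or the cap is hit."""
    answer = defender(question, None, [])
    seen = set()                      # normalized questions already asked
    history: List[List[str]] = []
    for _ in range(max_rounds):
        fresh = [q for q in objectioner(question, answer)
                 if q.strip().lower() not in seen]
        if not fresh:                 # Objectioner exhausted: nothing new to ask
            break
        seen.update(q.strip().lower() for q in fresh)
        history.append(fresh)
        revised = defender(question, answer, history)
        if revised.strip() == answer.strip():   # answer stabilized
            break
        answer = revised
    return answer, history


# Toy roles: the Objectioner repeats itself, so the loop stops after one revision.
def toy_defender(question, previous, objections):
    return "4" if not objections else "5"


def toy_objectioner(question, answer):
    return ["Could you show your counting process?"]


answer, history = for_prompting_adaptive("How many r's?", toy_defender, toy_objectioner)
print(answer, len(history))  # 5 1
```

Exact-match deduplication is a crude proxy for exhaustion; paraphrased repeats would still count as new questions, so a real system would likely need a semantic similarity check.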
Limitations include: increased token cost and latency per round (with possible mitigations for small-model scenarios); no convergence guarantees if objection prompts are poorly designed; and reliance on robust models in all roles, since a weak Objectioner may fail to surface critical gaps.
7. Significance and Broader Implications
FOR-Prompting exemplifies an externalized, question-driven revision protocol that formalizes adversarial questioning as a root mechanism for self-revision. This enables detailed tracing of reasoning, transparent surfacing of trade-offs and assumptions, and empirical gains across model scales. FOR-Prompting matches or exceeds the best-known prompt-only monologue protocols in correctness, surpasses them in human-judged and model-judged reasoning quality, and generalizes robustly even in resource-constrained (small model, edge device) contexts. These findings highlight the broad potential of question–objection dialogue, both as a research instrument for studying objection-guided reasoning and as a deployable prompting protocol for complex, high-stakes problem-solving (Zhang et al., 2 Oct 2025).