
AI systems out-persuade expert humans

Published 15 Jun 2026 in cs.CY and cs.AI | (2606.16475v1)

Abstract: Many societal decisions are settled by contests of persuasion. Conversational AI is a powerful new entrant in these contests, but whether it can out-persuade skilled and highly incentivized humans has remained unclear. Here, in a series of four preregistered experiments (n = 18,978 conversations from 6,923 people), we pitted AI systems against a range of human persuaders, including laypeople, winners of a separately preregistered four-round online persuasion tournament, professional canvassers, and world championship debaters. We found that AI systems were reliably more persuasive than expert humans, even when expert humans chose their issues, researched in advance, underwent hours of live, structured practice, and were incentivized with £1,000 cash bonuses. In a follow-up study, AI's advantage persisted after experts received a coaching tool that let them practice against the AI that beat them, review their performance history, and see what AI would have said at key moments. We found converging evidence that AI's advantage stemmed from rapidly deploying larger quantities of information: after coaching, expert humans could tie an AI constrained to respond at human speeds and with human-length messages. In a final study, we show that AI's advantage extends to consequential real-world behavior: AI was nearly 3x more effective than professional canvassers from a UK fundraising firm at raising real-money donations to Save the Children. Together, these results establish that frontier AI systems out-persuade expert humans in conversation, with significant implications for political communication.

Summary

  • The paper demonstrates that advanced AI systems outperform expert human persuaders in both attitudinal shifts and donation behavior, with improvements up to +8.2 percentage points.
  • The paper attributes AI’s persuasive advantage to high throughput and fact density; fact density nearly fully accounts for the observed persuasive effect (R² = 0.89).
  • The paper discusses societal implications, highlighting benefits for democratized advocacy alongside risks such as disinformation and automated coercion.

Summary of "AI systems out-persuade expert humans" (2606.16475)

Experimental Design and Scope

The paper conducts four preregistered online experiments comprising 18,978 conversations with 6,923 participants to rigorously compare the persuasive efficacy of advanced conversational AI and expert human persuaders. Human comparators include random laypersons, tournament-selected laypersons, elite debaters (world and continental champions), professional canvassers, and coached elite debaters. Multiple state-of-the-art LLMs (Claude Opus 4.1/4.6, ChatGPT-4o, GPT-5.4, Grok 4.20, Gemini 2.5) are deployed using empirically validated prompting strategies targeting attitudinal and behavioral shifts. Incentives and preparation are maximized for human persuaders, including extensive issue research, live practice, and substantial cash bonuses. The evaluation encompasses both attitudinal persuasion (policy stance shifts) and consequential real-world behaviors (charitable giving).

Quantitative Outcomes and Key Claims

AI systems consistently outperform every human class in both attitude change and behavioral persuasion. Specifically:

  • Against random laypersons, AI yields an additional +8.2 percentage point (pp) shift.
  • Against tournament-selected laypersons, AI achieves +5.6 pp.
  • Against elite debaters, AI leads by +4.6 pp.
  • Against professional canvassers, AI is +5.9 pp more persuasive in attitude shift and nearly 3x more effective in eliciting real-money donations (+10.8 pp; AI: +17.2 pp vs. canvassers: +6.4 pp).
  • Coaching elite debaters using a tailored AI tool increases their message length and fact density but only marginally narrows the gap (+1.0 pp), failing to achieve parity.
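
The donation figures in the list above can be checked with simple arithmetic. The sketch below assumes that "nearly 3x" refers to the ratio of the two reported donation effects, which is one plausible reading rather than something the summary states; the variable names are illustrative only.

```python
# Donation effects as reported in the summary, in percentage points (pp).
ai_effect_pp = 17.2
canvasser_effect_pp = 6.4

# Absolute gap between AI and professional canvassers.
gap_pp = round(ai_effect_pp - canvasser_effect_pp, 1)

# Relative effectiveness (assumption: "nearly 3x" is this ratio).
ratio = ai_effect_pp / canvasser_effect_pp

print(f"gap: +{gap_pp} pp, ratio: {ratio:.2f}x")
```

Under that reading, the gap matches the reported +10.8 pp and the ratio comes out a little under 2.7, which the summary rounds to "nearly 3x".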

Robustness checks confirm AI superiority at finer granularity: no individual human persuader outperformed pooled AI, and AI's advantage persisted in every policy issue and across demographic/political subgroups.

Analysis of Persuasive Mechanisms

The authors attribute AI's advantage primarily to throughput: AI delivers far more fact-checkable information per conversational turn due to higher word count and rapid response. When AI is throttled to human message length and speed, its advantage vanishes (from +4.1 pp to statistically insignificant 0.0 pp). Regression analyses show that fact density nearly fully accounts for persuasive effect (overall R² = 0.89). Perceived informational strength (the degree to which the partner's arguments are strong and the conversation is educational) drops precipitously when AI throughput is constrained, whereas affective dimensions (empathy, enjoyment) show much smaller declines.
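
To make the R² statistic concrete, here is a minimal sketch of how the share of variance explained by fact density could be computed. It assumes a simple one-predictor least-squares fit, which may differ from the paper's actual regression specification; the function name and the toy numbers are invented for illustration and are not data from the paper.

```python
from statistics import mean

def r_squared(x, y):
    """R^2 of an ordinary least-squares line fitted to the points (x, y)."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residual and total sums of squares.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

# Toy numbers invented for illustration only; NOT data from the paper.
facts_per_chat = [2, 5, 9, 14, 20]
shift_pp = [1.0, 2.5, 3.5, 6.5, 8.0]
print(round(r_squared(facts_per_chat, shift_pp), 3))
```

An R² close to 1 means that differences in fact density line up almost perfectly with differences in persuasive effect; a value near 0 means they are unrelated.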

Real-World Generalization and Behavioral Impact

The paper extends analysis beyond laboratory persuasion to real-world behavioral outcomes, with AI eliciting not only substantial attitude shifts but also higher rates and quantities of charitable giving. AI is rated superior by participants on all seven hypothesized causal mechanisms for donation, including implementation intentions, commitment escalation, and impact-efficacy information, despite being prompted for only one strategy. This suggests effective prompt generalization and flexible resource deployment by frontier models.

Professional canvassers—selected from a firm with extensive experience fundraising for Save the Children—are outperformed in both donation frequency and amount, even when given optimal issue selection and preparation windows.

Theoretical and Practical Implications

The findings recast the landscape of political and commercial persuasion, indicating that contests may increasingly be determined by access to frontier LLMs rather than traditional human expertise. This raises the prospect of consolidation of influence—either to actors who possess or deploy the most capable models (e.g., corporations, campaigns, states) or to model vendors themselves, with the power to bias which arguments are advanced.

Potential societal benefits include democratization of advocacy, enabling under-resourced actors to access skilled rhetorical tools. However, outcomes are critically contingent on the truthfulness and selectivity of AI information compared to the persuasion it displaces. Risks include amplification of disinformation, scaling of coercive influence operations, and susceptibility of human auditors/judges to AI manipulation. Self-limiting factors may include platform access controls, public awareness (though explicit labeling as AI-generated does not reduce persuasive impact), and real-world exposure constraints.

Future Research Directions

Three domains are identified for further inquiry:

  1. Modality sensitivity—extension to audio/video and face-to-face settings, where rapport and embodied empathy may be more salient.
  2. High-stakes persuasion—evaluation of AI efficacy in materially significant outcomes (e.g., voting, major donations).
  3. Displacement effects—quantifying how AI-driven persuasive exposure alters net societal impact compared to current baseline exposure.

Conclusion

The paper establishes that advanced LLMs, prompted for high information-density, reliably out-persuade even maximally incentivized expert humans across both attitudinal shifts and real-world actions. The throughput advantage—rapid delivery of voluminous factual claims—is the dominant explanatory variable. Human improvement via direct coaching on AI strategies is insufficient to close the gap. As access to persuasive AI proliferates, fundamental questions arise regarding deployment control, fairness, and the societal consequences of mass automated persuasion.

Explain it Like I'm 14

What is this paper about?

This paper asks a simple but important question: in a back‑and‑forth conversation, can today’s AI chatbots convince people better than top human persuaders? The authors ran big, careful tests and found that, yes, modern AI systems are usually more persuasive than skilled humans, including world‑champion debaters and professional canvassers. They also show why: AI can share more useful information, faster.

What did the researchers want to find out?

In plain terms, they asked four questions:

  • Can AI beat expert humans at changing people’s opinions on political and social issues?
  • Are there any situations where humans can catch up or match AI?
  • Why does AI have an advantage—is it about speed, message length, or something else?
  • Does AI’s edge show up in real actions (like donating money), not just opinions?

How did they test it?

The team ran four preregistered experiments. “Preregistered” means they publicly wrote down their plan and measurements before starting, to keep things fair and avoid changing the rules after seeing results.

They used a custom chat platform that randomly paired people (the “persuadees”) with either:

  • An AI model, or
  • A human persuader (different types, explained below)

Before and after each chat, people rated their opinions on a 0–100 scale. The main idea was simple: see how much each chat shifted someone’s opinion compared to a neutral, non‑political chat (the “control”).
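
The before/after comparison can be sketched as a difference in average opinion change between a persuasion chat and the control chat. This is a simplified illustration: it assumes higher ratings mean more agreement with the persuader's position, and the paper's actual analysis may use a different statistical model. The function names and ratings are made up for the example.

```python
from statistics import mean

def mean_shift(pre, post):
    """Average post-minus-pre change on the 0-100 opinion scale."""
    return mean(after - before for before, after in zip(pre, post))

def effect_vs_control(treat_pre, treat_post, ctrl_pre, ctrl_post):
    """Persuasion effect: treatment shift minus control shift, in points."""
    return mean_shift(treat_pre, treat_post) - mean_shift(ctrl_pre, ctrl_post)

# Made-up ratings for illustration only; NOT data from the paper.
effect = effect_vs_control(
    treat_pre=[40, 55, 30], treat_post=[52, 60, 41],
    ctrl_pre=[45, 50, 35], ctrl_post=[46, 50, 37],
)
print(round(effect, 2))
```

Subtracting the control shift removes any change that would have happened anyway, for example from simply answering the same question twice.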

To keep the language clear:

  • “Throughput” = how much text and how fast a partner can write. AI writes long, detailed replies almost instantly; humans write shorter messages and take longer.
  • “Fact density” = how many checkable facts someone includes. The researchers used an automated checker to count fact‑like claims in each conversation.
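
The two terms above can be turned into simple per-conversation measurements. The sketch below is only an illustration: the `Message` fields are hypothetical, the claim counts are assumed to come from a separate automated checker, and the paper's exact definitions may differ.

```python
from dataclasses import dataclass

@dataclass
class Message:
    text: str
    seconds_to_write: float  # time the persuader took to send this reply
    checkable_claims: int    # assumed output of an upstream claim checker

def words_per_minute(messages):
    """Throughput: words written per minute of writing time."""
    total_words = sum(len(m.text.split()) for m in messages)
    total_minutes = sum(m.seconds_to_write for m in messages) / 60
    return total_words / total_minutes

def fact_density(messages):
    """Total checkable claims across one persuader's messages."""
    return sum(m.checkable_claims for m in messages)

# Placeholder text and counts, for illustration only.
chat = [
    Message("word " * 50, seconds_to_write=90.0, checkable_claims=2),
    Message("word " * 40, seconds_to_write=90.0, checkable_claims=1),
]
print(words_per_minute(chat), fact_density(chat))
```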

Study 1: AI vs different kinds of humans

They compared AI to:

  • Random adults online (laypeople)
  • Top performers from a separate persuasion tournament (“selected laypeople”)
  • Elite competitive debaters (including world and continental champions)

All humans had good incentives (bonuses and prize pools). Debaters even chose the issues they felt best at and were paid to research in advance.

Study 2: Can humans catch up—or can we slow AI down?

Two tests:

  • Coaching debaters: Debaters got a special practice tool to study the AI that beat them, review their past chats, and practice responses.
  • Constraining AI: They forced AI to reply at human‑like speed and length (about 50 words, ~90 seconds delay), to see if its advantage depends on writing more, faster.
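
The "constraining AI" idea can be sketched as a wrapper that shortens a reply and delays it. This is a crude stand-in and not the paper's method: it truncates text after the fact, whereas the study may have instructed the model to write short replies. The function name and the injectable `sleep` parameter are assumptions made for the example.

```python
import time

def throttle_reply(reply, max_words=50, delay_seconds=90.0, sleep=time.sleep):
    """Cap a reply at max_words and wait delay_seconds before returning it."""
    words = reply.split()
    capped = " ".join(words[:max_words])
    sleep(delay_seconds)  # simulate human-like response latency
    return capped

# Usage with a no-op sleep so the example returns immediately.
short = throttle_reply("word " * 200, sleep=lambda seconds: None)
print(len(short.split()))
```

Passing the sleep function as a parameter keeps the default human-like delay while letting the sketch be tried without actually waiting 90 seconds.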

They also checked results across individual persuaders, different topics, and different kinds of people being persuaded.

Study 3: Real‑world experts

They tested AI against professional canvassers (people whose job is to persuade in everyday life) on shifting opinions.

Study 4: Real‑world action (donations)

They asked whether AI can inspire people to donate real money. After chatting with AI or a professional canvasser, people could donate part of a £1 bonus to Save the Children.

What did they find?

Here are the main results, explained simply:

  • AI out‑persuaded every human group on opinions:
    • Better than random laypeople
    • Better than the top tournament performers
    • Better than elite debaters (even with prep, practice, and cash incentives)
    • Better than professional canvassers
  • Coaching helped humans improve a bit, but not enough to close the gap.
  • When AI was forced to type like a human (shorter and slower), its advantage disappeared. In that “fair-speed” setup, coached debaters and constrained AI were about equally persuasive.
  • No individual human persuader beat the (unconstrained) AI overall. Even the best person didn’t surpass AI’s average effect.
  • Why is AI better? Because it packs in more useful, checkable information per chat. Two clues:
    • When AI was constrained to human speed/length, people said its arguments felt weaker and they learned less.
    • The more facts a chat contained, the more it changed opinions—for humans and AI alike. Fact density strongly predicted persuasion.
  • It’s not just words—it changed behavior too. In the donation test, AI raised nearly three times as much money as the professional canvassers, both by getting more people to donate and by increasing the average amount given. Even though the AI was only instructed to use one donation tactic (showing concrete impact), it still outperformed canvassers across many persuasive techniques (like helping people plan how to act, or strengthening their commitment).

Why does this matter?

This work suggests we’re entering a world where access to strong AI systems can decide contests of influence—like political campaigns, fundraising drives, or public debates. That could:

  • Concentrate power with groups who can afford the best AI systems (big campaigns, companies, or governments).
  • Shift influence toward the companies that build these AI tools (depending on what they allow their systems to argue for).
  • Help under‑resourced groups too, if powerful persuasion becomes cheap and widely available (e.g., small charities, public defenders).
  • Inform citizens better—if the information AI provides is accurate. Accuracy varies by model, so safeguards and fact‑checking matter.
  • Create risks: better tools for scams, disinformation, or manipulative campaigns.

There are also limits. Powerful persuasion doesn’t guarantee reach—platform rules and identity checks can block access. People might also learn to distrust AI messages (though early evidence says labeling content as AI doesn’t reduce its persuasiveness much). And outside of study settings, getting long, focused conversations is hard.

Key takeaways

  • Today’s AI chat systems are, on average, more persuasive than skilled humans in text conversations.
  • The edge comes mainly from throughput: AI writes more facts, faster.
  • If you make AI write like a human (shorter and slower), humans can keep up.
  • AI’s advantage shows up in real actions, like raising more money for charity.
  • Society should prepare for both benefits (cheap, high‑quality advocacy) and risks (manipulation, unequal access), with attention to accuracy and safeguards.

Knowledge Gaps

Knowledge gaps, limitations, and open questions

Below is a single, concrete list of what remains missing, uncertain, or unexplored in the paper, framed to guide actionable future research.

  • External validity beyond a UK, English-speaking online sample: replicate in non-English languages, non-Western contexts, offline populations, and among minors (the study enrolled 18+ only).
  • Modality constraints: the study is text-only; assess whether AI’s advantage persists, shrinks, or grows in audio, video, phone, or face-to-face settings with embodied cues.
  • Durability of effects: no longitudinal follow-up; measure persistence of attitude shifts and donation behavior over days/weeks/months, decay rates, and downstream actions (e.g., petition signing, voting, recurring donations).
  • Higher-stakes outcomes: extend beyond a £1 “house money” donation to own-money decisions, larger sums, recurring gifts, vote choice, and public-health compliance; include ethically designed simulations of harmful behaviors (e.g., scams) to assess risk.
  • Two-sided persuasion contests: test head-to-head debates where persuadees see simultaneous pro/con arguments from AI vs. humans and AI vs. AI; study arms-race dynamics when both sides deploy frontier AI.
  • Personalization and microtargeting: quantify how individualized messaging (based on demographics, attitudes, or digital traces) alters AI’s advantage and subgroup susceptibility relative to generic “information-first” prompts.
  • Causal mechanism isolation: disentangle length vs. speed vs. fact density by manipulating each independently (e.g., hold length constant while varying factual content, or vary latency at fixed content) to establish which throughput dimension drives impact.
  • Information quality and truthfulness: causally test how verified accuracy, source credibility, selective omission, and misinformation rates affect persuasion; examine whether AI retains its edge under strict “verified facts only” constraints or with live fact-checking.
  • Real-world delivery constraints: evaluate performance under platform conditions (e.g., SMS/Twitter character limits, WhatsApp latency, email deliverability, content moderation, rate limits, authentication frictions).
  • Disclosure effects: persuadees were not told partner identity; directly test how explicit AI labeling, watermarking, or “likely-AI” cues change persuasiveness and trust.
  • Defense and resilience interventions: benchmark inoculation messages, media-literacy prompts, friction (e.g., forced reflection), adversarial fact-checking bots, and platform UI warnings at reducing AI’s persuasive advantage.
  • Human–AI teaming: beyond offline coaching, test real-time AI copilot tools (suggested replies, retrieval-augmented facts, rebuttal libraries) to see if human persuaders can match or exceed AI with assistance.
  • Equity and vulnerability: extend subgroup analyses to low-literacy/low-numeracy populations, neurodivergent individuals, younger teens, and intersectional groups; identify who is disproportionately susceptible and why.
  • Issue and cause scope: go beyond 10 UK policy stances and one charity to highly moralized/identity-laden topics, contested or misinformation-prone issues, and diverse causes (including unpopular or stigmatized ones).
  • Donation ecology and mechanisms: move beyond lab-bonus framing to real payment instruments, external donation funnels, and field settings; measure cost per dollar raised, donor retention, and upgrade rates.
  • Cost-effectiveness at scale: compare marginal costs (API usage, orchestration) and throughput to human labor for “cost per percentage-point attitude shift” and “cost per dollar raised” under realistic workloads.
  • Repeated exposure and saturation: test cumulative effects of multiple AI contacts, timing/spacing schedules, habituation, potential backfire, and spillovers in social networks.
  • Safety guardrails trade-offs: quantify how policies that reduce aggressiveness, limit message length, enforce source citation, or restrict certain tactics affect persuasiveness and harm.
  • Model dependence and reproducibility: assess sensitivity to model/version changes, prompt variations, and open-source vs. proprietary systems; preregistered replications across providers and time.
  • Measurement validity of “fact density”: validate automated claim extraction and verification against human coding; assess whether “number of fact-checkable claims” captures quality (relevance, novelty, source credibility) rather than sheer count.
  • Attrition and engagement bias: condition-level attrition differed in Studies 1–2; analyze dropout reasons and whether selective nonresponse biases partner ratings or measured mechanisms.
  • Cognitive load and comprehension: test whether long, dense AI messages overwhelm readers; directly measure reading time, comprehension, and cognitive load as mediators/moderators of persuasion.
  • Group and public contexts: extend to group chats, public comment threads, and broadcast messaging where peer effects, social norms, and reputational concerns may alter effectiveness.
  • Counter-persuasion and refutation: study settings where persuadees actively push back or receive live refutations; evaluate AI robustness to real-time challenges and corrections from opponents.
  • Legal and institutional constraints: experimentally simulate authentication, identity verification, and electoral/compliance rules to estimate how governance frictions attenuate real-world impact.

Practical Applications

Immediate Applications

The following use cases can be deployed with today’s models and infrastructure, drawing directly on the paper’s findings (AI outperforms expert humans in live text persuasion; fact density and throughput drive impact; “impact-efficacy” messaging boosts donations) and methods (information-first prompting, claim extraction/fact-checking, throughput throttling as a safety control).

  • AI fundraising copilots for charities and NGOs (deploy on web chat, SMS, email)
    • Sectors: nonprofit, social impact, marketing.
    • What: Conversational agents that use “impact-efficacy” messages to convert visitors and supporters; integrate with CRM for follow-ups and recurring gifts.
    • Tools/workflows: LLM-based chat widget; prompt templates emphasizing concrete outcomes per donation; A/B testing; donation-intent micro-forms; handoff to human for high-value prospects.
    • Assumptions/dependencies: Access to high-performing LLMs; consent and disclosures; data privacy compliance; real-world conversion may be lower than in 10–14 minute lab chats.
  • Information-first AI persuasion agents for advocacy and campaigns (with guardrails)
    • Sectors: political communication, civic tech, public interest.
    • What: Issue explainer chatbots that present dense, accurate facts to shift attitudes or mobilize actions (e.g., petition signing).
    • Tools/workflows: Retrieval-augmented generation (RAG) from vetted sources; fact-density prompts; claim extraction plus automated verification; logging for audit.
    • Assumptions/dependencies: Jurisdiction-specific rules on political messaging; model truthfulness varies—must include fact-checking and source citations; platform terms of service.
  • AI-assisted sales conversion and customer retention
    • Sectors: software/SaaS, e-commerce, customer success.
    • What: In-product chat that deploys high-information arguments for upgrades, plan retention, or cross-sell.
    • Tools/workflows: Product usage-aware prompts; “information-first” playbooks; real-time fact packs (features, ROI, case studies); KPI dashboards tracking conversion vs. claim density.
    • Assumptions/dependencies: Consent and transparency; avoid manipulative practices; ensure claims match product capabilities.
  • Public health information companions
    • Sectors: healthcare, public health.
    • What: Chat agents that provide fact-dense, empathetic guidance on vaccines, screening, medication adherence.
    • Tools/workflows: Evidence-linked content; throughput throttling to mitigate overreach; clinical oversight; “teach-back” checks for comprehension.
    • Assumptions/dependencies: Regulatory and clinical governance; ensure accuracy; higher-stakes outcomes require careful evaluation beyond small-stakes donation evidence.
  • Debate and advocacy practice labs for students and professionals
    • Sectors: education, professional training, law.
    • What: Practice against frontier AI persuaders on chosen topics; receive annotated transcripts and suggested counters.
    • Tools/workflows: Session review with impact markers; claim density and accuracy scores; replay “what the AI would say” at key moments.
    • Assumptions/dependencies: Paper’s coaching tool improved content volume and facts but didn’t eliminate the AI gap; still valuable for practice and feedback.
  • Persuasion safety controls for platforms (“throughput throttles”)
    • Sectors: platform governance, compliance, security.
    • What: Rate-limit persuasive agents by capping response length and speed to human-like levels where risk is high (e.g., political ads, vulnerable users).
    • Tools/workflows: Policy-based dynamic caps; context-aware toggles; audit trails; user controls.
    • Assumptions/dependencies: The paper shows throttling removes AI’s advantage; enforceability depends on platform control over agents and channels.
  • Persuasion risk audits for AI deployments
    • Sectors: enterprise AI, AI governance.
    • What: Pre-deployment tests benchmarking persuasion impact vs. controls; subgroup analysis and per-issue sensitivity; assess fact density vs. accuracy.
    • Tools/workflows: Mixed-effects evaluation harness; standardized tasks; automated claim extraction and verification; red-team protocols.
    • Assumptions/dependencies: Representative samples and tasks; model versions drift over time; requires robust data pipelines.
  • “Information-first” prompt libraries and fact-density meters
    • Sectors: software, content operations.
    • What: Reusable prompt patterns that bias toward evidence-rich content; live indicators showing claims per message and per conversation.
    • Tools/workflows: Prompt kits; content linting for unsupported claims; source attachment requirements.
    • Assumptions/dependencies: Claim extraction and verification quality; tailoring by domain.
  • Customer support with informed persuasion for self-service resolution
    • Sectors: telecom, fintech, utilities.
    • What: Agents that use factual, step-by-step explanations to reduce escalations and improve acceptance of policy resolutions.
    • Tools/workflows: Policy knowledge bases; empathetic framing; measurement of resolution rates vs. claim density.
    • Assumptions/dependencies: Avoid dark patterns; disclose AI use; customer trust considerations.
  • HR and change management copilots
    • Sectors: enterprise operations.
    • What: Agents that explain rationale, evidence, and benefits for new tools/policies to improve adoption.
    • Tools/workflows: Role- and team-tailored fact packs; two-way Q&A; opt-in.
    • Assumptions/dependencies: Cultural acceptance; internal governance of persuasive AI.
  • Fact-checking and conversation analytics pipelines
    • Sectors: journalism, trust & safety, academia.
    • What: Adopt the paper’s claim-extraction + web verification pipeline to score conversations on fact density and veracity.
    • Tools/workflows: Modular claim extractor; verification engines; dashboards; sampling+human review for accuracy.
    • Assumptions/dependencies: Verification coverage and lag; access to high-quality sources; false positives/negatives management.
  • Real-time, human–AI live matching platforms for studies and CX
    • Sectors: UX research, customer experience.
    • What: Use the paper’s matchmaking design to run controlled A/B tests of human vs. AI agents in live conversations.
    • Tools/workflows: Randomization, turn-taking control, outcome collection, session analytics.
    • Assumptions/dependencies: Ethical and privacy compliance; participant recruitment.

Long-Term Applications

These use cases require further research, scaling, or regulatory development before broad deployment. They build on the paper’s core insights about throughput, fact density, and comparative performance across humans and AI.

  • Orchestrated fleets of persuasive AI agents (“persuasion ops” platforms)
    • Sectors: political campaigns, marketing, advocacy.
    • What: Multi-agent systems that coordinate messaging across channels, adapt fact packs per segment, and optimize for both exposure and per-contact persuasiveness.
    • Tools/workflows: Audience segmentation; exposure optimization; content provenance; dynamic safety constraints.
    • Assumptions/dependencies: Regulatory limits on political microtargeting; platform access; strong governance to prevent abuse.
  • Public-interest persuasion services (government or NGO-provided)
    • Sectors: public sector, civic education.
    • What: Citizen-facing assistants that explain policies with evidence, tailored to questions and literacy levels, aiming to inform rather than manipulate.
    • Tools/workflows: Verified knowledge bases; transparent sourcing; throughput caps; user choice and oversight boards.
    • Assumptions/dependencies: Trust, legitimacy, and legal frameworks; proven accuracy; accessibility across modalities.
  • Clinical-grade behavior-change companions
    • Sectors: healthcare.
    • What: Longitudinal agents for smoking cessation, diabetes management, or medication adherence using high-information and empathetic dialogue.
    • Tools/workflows: Integration with EHRs; outcome trials; safety boundaries; escalation to clinicians.
    • Assumptions/dependencies: Evidence across high-stakes outcomes (beyond small donations); medical device regulation; liability frameworks.
  • Courtroom and administrative advocacy augmentation
    • Sectors: legal services, access to justice.
    • What: Tools generating high fact-density briefs, preparing arguments, and simulating counter-positions for litigants and public defenders.
    • Tools/workflows: Jurisdiction-specific legal knowledge; citation checking; judge- and topic-specific simulations.
    • Assumptions/dependencies: Ethical and professional rules; verified legal accuracy; risk of over-reliance.
  • Multimodal persuasive agents (voice/video and embodied robotics)
    • Sectors: education, customer service, assistive tech, robotics.
    • What: Extend findings from text to voice/video/live settings where rapport and nonverbal cues matter.
    • Tools/workflows: Prosody control, affect recognition; safety constraints on intensity and pace; user studies.
    • Assumptions/dependencies: Mixed evidence on video bonuses; robust evaluation needed; hardware constraints in robotics.
  • Model cards and audits with “persuasion impact” metrics
    • Sectors: AI safety, standards.
    • What: Standardized testing of models for per-conversation persuasion effects, subgroup disparities, and accuracy at given throughput levels.
    • Tools/workflows: Benchmarks, datasets, and protocols derived from the paper’s design; reporting in model cards.
    • Assumptions/dependencies: Community adoption and standards bodies; reproducibility across versions.
  • Platform-level “persuasion budget” and provenance controls
    • Sectors: platforms, regulators.
    • What: Allocate a capped “persuasion budget” per agent or campaign (messages, claim density, rate) with auditable provenance and user-facing labels.
    • Tools/workflows: Cryptographic provenance; enforcement APIs; risk-tiered caps; user opt-outs.
    • Assumptions/dependencies: Regulatory mandates and platform coordination; robust agent identification; evidence on label effectiveness.
  • High-fidelity simulation testbeds for policy and message design
    • Sectors: academia, policy labs, communications.
    • What: Use controlled, high-engagement chat testbeds to predict impact of policy messaging before field deployment; iterate on fact packs and strategies.
    • Tools/workflows: Representative recruit panels; outcome models linking chat effects to field outcomes; exposure modeling.
    • Assumptions/dependencies: External validity from lab-to-field; model of exposure volume vs. per-contact effect.
  • Accuracy-first persuasive AI with veracity guarantees
    • Sectors: software, education, public sector.
    • What: Systems that maximize fact density while provably minimizing falsehoods via constrained generation and verified retrieval.
    • Tools/workflows: Structured knowledge graphs; citation gating; adversarial verification; conservative decoding.
    • Assumptions/dependencies: Trade-offs between throughput, style, and veracity; maintaining coverage across topics.
  • Human performance augmentation rather than training alone
    • Sectors: sales, canvassing, advocacy.
    • What: Tools that boost human throughput (templated micro-arguments, real-time fact injection, dynamic briefings) to narrow the AI gap identified in the paper.
    • Tools/workflows: Second-screen assistants; live cue cards with citations; automated follow-up drafting.
    • Assumptions/dependencies: Worker acceptance; quality control; ethical boundaries; effect sizes need validation.
  • Exposure-aware persuasion planning
    • Sectors: advertising, political comms.
    • What: Integrate the paper’s per-conversation effects with exposure models (variability in reach dominates real-world impact) to optimize channel mix and frequency.
    • Tools/workflows: Media planning algorithms combining persuasion uplift and inventory reach; simulation frameworks.
    • Assumptions/dependencies: Accurate exposure data; compliance with ad policies; diminishing returns modeling.
  • Cross-cultural and high-stakes outcome validation
    • Sectors: global NGOs, public policy, elections.
    • What: Extend evaluation to non-UK contexts and high-stakes decisions (vote choice, large donations, compliance), including multimodal channels.
    • Tools/workflows: Country-specific panels; ethics review; pre-registered trials.
    • Assumptions/dependencies: Cultural transferability; legal permissibility; careful harm mitigation.
  • Defensive tools against manipulative persuasion
    • Sectors: cybersecurity, consumer protection.
    • What: Detectors and user agents that flag high fact-density persuasive attempts, inject friction, or summarize arguments with counterpoints.
    • Tools/workflows: Client-side plugins; cognitive inoculation content; adaptive “slow-down” UX.
    • Assumptions/dependencies: Usability and adoption; false positive tolerance; alignment with platform policies.
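
As a rough illustration of the exposure-aware planning item above, the sketch below multiplies a per-conversation effect by reach and applies diminishing returns to repeat contacts. The `Channel` fields, the geometric `decay` assumption, and every number in the usage note are hypothetical placeholders, not values or methods from the paper.

```python
from dataclasses import dataclass


@dataclass
class Channel:
    name: str
    reach: int                 # people contacted at least once (hypothetical input)
    per_contact_effect: float  # attitude shift per conversation, in percentage points
    frequency: int = 1         # contacts per person
    decay: float = 0.5         # ASSUMED geometric diminishing returns per repeat contact


def per_person_effect(ch: Channel) -> float:
    """Cumulative shift for one person: e + e*d + e*d^2 + ... over `frequency` contacts."""
    return sum(ch.per_contact_effect * ch.decay ** k for k in range(ch.frequency))


def total_impact(ch: Channel) -> float:
    """Reach-weighted impact: people reached times cumulative per-person shift."""
    return ch.reach * per_person_effect(ch)


def rank_channels(channels):
    """Order channels from largest to smallest reach-weighted impact."""
    return sorted(channels, key=total_impact, reverse=True)
```

For example, under these made-up inputs, `Channel("chat", reach=1000, per_contact_effect=8.0, frequency=2)` scores lower than `Channel("broadcast", reach=50000, per_contact_effect=1.0)`, which is one way to see how variation in reach can outweigh a larger per-contact effect.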

Notes on feasibility and constraints across applications:

  • Access and cost: Frontier LLMs with top persuasive performance may be paywalled or rate-limited.
  • Truthfulness variance: Some models were more accurate than humans and others were less accurate, so deploy with verification and sourcing.
  • Engagement limits: The studies involved multi-minute, turn-based chats; real-world users may not sustain such engagement without incentives.
  • Regulatory and ethical boundaries: Political persuasion, healthcare advice, and fundraising carry domain-specific constraints (data protection, informed consent, consumer protection, election law).
  • Throughput as both lever and risk: Advantage stems from high throughput and fact density; the same knob can be used for safety (throttling) or for performance (augmentation).
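
The throttling idea in the last note can be sketched as a wrapper that caps reply length and delays sending to roughly human speed. The glossary below quotes the paper as drawing latencies from a truncated-normal sampler, but the function names, word cap, typing speed, and latency parameters here are all assumptions chosen for illustration, not the paper's settings.

```python
import random


def truncated_normal(mean, sd, low, high, rng):
    """Rejection-sample a normal draw restricted to [low, high].

    Simple but slow if [low, high] lies far in the tail of the distribution.
    """
    if not low < high:
        raise ValueError("low must be below high")
    while True:
        x = rng.gauss(mean, sd)
        if low <= x <= high:
            return x


def constrain_reply(text, max_words, words_per_minute, rng,
                    latency_mean=5.0, latency_sd=2.0,
                    latency_low=1.0, latency_high=15.0):
    """Cap reply length and compute a human-like send delay in seconds.

    All default latency parameters are hypothetical placeholders.
    """
    words = text.split()[:max_words]                       # enforce message-length cap
    typing_seconds = len(words) / words_per_minute * 60.0  # time to "type" the reply
    think_seconds = truncated_normal(latency_mean, latency_sd,
                                     latency_low, latency_high, rng)
    return " ".join(words), typing_seconds + think_seconds
```

A caller would hold the returned reply for the returned number of seconds before sending it; passing a seeded `random.Random` keeps the delays reproducible.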

Glossary

  • Active control: A non-treatment comparison condition designed to control for engagement while avoiding the persuasive intervention. "relative to an active control (a non-persuasive chat with ChatGPT-4o on a neutral, non-political topic; SI Appendix, Section 3.7.5)."
  • Anticipated regret: A behavioral mechanism that motivates action by prompting people to consider the regret they would feel if they did not act. "emotional activation, implementation intentions, identity labeling, commitment escalation, anticipated regret, issue-focused information, and impact-efficacy information [13]."
  • Asymptotic Wald (z-based) inference: A large-sample statistical approach that uses normal approximations to compute tests and confidence intervals. "using asymptotic Wald (z-based) inference"
  • Benjamini-Hochberg correction: A multiple-testing procedure that controls the false discovery rate across a family of hypotheses. "after Benjamini-Hochberg correction across the family"
  • Between-subjects (design): An experimental design where different participants are assigned to different conditions. "four preregistered, single-blind, between-subjects online experiments (Studies 1-4)"
  • Commitment escalation: A persuasion strategy that increases compliance by progressively deepening a person’s commitment. "commitment escalation (+12.6 pp)"
  • Constrained AI: An experimental condition that limits an AI system’s speed and message length to match human throughput. "we added a Constrained AI condition to Study 2 alongside Coached Elite Debaters and unconstrained AI"
  • Crossed random intercepts: A mixed-model specification including random effects for multiple non-nested grouping factors. "and persuader and persuadee enter as crossed random intercepts."
  • Dunnett correction: A multiple-comparison adjustment used when comparing several treatments to a single control. "we do not apply emmeans's Dunnett correction for trt. vs. ctrl contrasts"
  • Empirical Bayes: A method that estimates prior parameters from the data to shrink noisy estimates toward a group mean. "the highest individual estimate (an empirical-Bayes random-effect prediction) was 9.9 pp"
  • Emotional activation: A persuasion mechanism that seeks to elicit emotions to influence attitudes or behavior. "emotional activation, implementation intentions, identity labeling, commitment escalation, anticipated regret, issue-focused information, and impact-efficacy information [13]."
  • Extensive margin: The component of an effect measuring whether any action occurs (as opposed to how much). "extensive margin: +6.0 pp, 95% CI [+2.6, +9.5]"
  • Fixed effect: A parameter capturing systematic differences associated with specific levels of a factor, treated as non-random. "issue is a 10-level fixed effect"
  • Identity labeling: A technique framing behavior as part of someone’s identity to encourage consistency with that identity. "emotional activation, implementation intentions, identity labeling, commitment escalation, anticipated regret, issue-focused information, and impact-efficacy information [13]."
  • Impact-efficacy information: Information that links an individual’s action (e.g., a donation) to concrete, measurable outcomes. "an 'impact-efficacy' prompt emphasizing the tangible outcomes of individual donations"
  • Implementation intentions: A strategy prompting specific plans (when/where/how) to increase the likelihood of follow-through. "implementation intentions (+15.0 pp)"
  • Intensive margin: The component of an effect measuring how much is done among those who act. "intensive margin: +12.9 pp, 95% CI [+10.0, +15.8]"
  • Issue-focused information: Factual content centered on the topic or policy at hand to inform and persuade. "emotional activation, implementation intentions, identity labeling, commitment escalation, anticipated regret, issue-focused information, and impact-efficacy information [13]."
  • Lagged, adaptive design: A procedure that updates targets or parameters over time based on accumulated data, with adjustments applied to subsequent observations. "set via prompt using a lagged, adaptive matching design; Methods"
  • Lee bounds: Partial-identification bounds that account for differential attrition to bound treatment effects. "Lee bounds on the headline AI-minus-human contrasts nonetheless remain strictly positive"
  • Linear mixed-effects model: A regression model that includes both fixed effects and random effects to handle clustered or hierarchical data. "from a linear mixed-effects model pooling Studies 1-3, with random intercepts for persuader and persuadee and pre-treatment attitude and issue as fixed-effect covariates"
  • Moderator (statistical): A variable that changes the strength or direction of a treatment effect. "Of the 14 persuadee variables tested as moderators, two significantly moderated AI's advantage"
  • Open Science Framework (OSF): An online platform for preregistration and sharing of research materials and data. "were each preregistered on the Open Science Framework before data collection"
  • Ordinary least squares (OLS): A method for estimating linear regression parameters by minimizing the sum of squared residuals. "Dashed line: overall OLS fit; R² for overall / within-humans / within-AI fits annotated inline."
  • Preregistration: Publicly registering study hypotheses, designs, and analysis plans before data collection to reduce bias. "The four studies, and the separate selection tournament that fed Study 1, were each preregistered on the Open Science Framework before data collection"
  • REML (Restricted Maximum Likelihood): An estimation method for variance components in mixed models that reduces bias. "For Elite Debaters the REML fit was singular (î = 0)"
  • Single-blind: A study design in which participants are unaware of their assignment (e.g., AI vs. human), though researchers may know. "four preregistered, single-blind, between-subjects online experiments"
  • Superforecasters: Individuals who consistently outperform others in prediction tasks, often in tournaments. "prior work has shown that 'superforecasters' consistently outperform peers in prediction tournaments [26, 25]."
  • Throughput (in conversational AI): The rate at which an AI system produces content (e.g., words per unit time). "AI's vastly superior throughput, the rate at which it produces written content."
  • Truncated-normal distribution: A normal distribution restricted to values within specified bounds. "latencies were drawn from a truncated-normal sampler"
  • Wald confidence interval (Wald CI): A confidence interval derived from the Wald (normal) approximation to the sampling distribution. "pooled estimated persuasive impact on political attitudes (percentage points; 95% Wald CIs)"
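
To make the Benjamini-Hochberg entry above concrete, here is a minimal sketch of the standard step-up procedure in plain Python. It illustrates the textbook algorithm only; the paper's analysis code is not shown on this page, and the function name and the p-values in the usage note are invented for the example.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return booleans marking which hypotheses are rejected at FDR level alpha."""
    m = len(p_values)
    # Indices of the p-values in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Step-up rule: find the largest 1-based rank k with p_(k) <= (k / m) * alpha.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below the cutoff,
    # even if its own p-value missed its own threshold.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[idx] = True
    return rejected
```

With the made-up family `[0.01, 0.04, 0.03, 0.20]` only the first hypothesis is rejected, whereas `[0.04, 0.03, 0.02, 0.045]` is rejected in full because the largest p-value clears its threshold and the step-up rule carries the smaller ones with it.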
