Precise marginal effects of reasoning token quantity on misuse assistance

Determine the precise marginal effect of reasoning token quantity on the probability that reasoning-enabled large language models (Claude Sonnet 3.7, Claude Sonnet 4, Claude Sonnet 4.5, Claude Opus 4.1, OpenAI o4-mini, and OpenAI o4-mini-deep-research) achieve high actionability and information access scores in multi-turn fraud and cybercrime long-form tasks, while controlling for the differences across model families that currently prevent comparable estimation.

Background

The paper analyzes the effect of chain-of-thought-style reasoning tokens on assistance levels for harmful tasks using a Bayesian logistic regression restricted to six models with configurable reasoning. While the analysis finds that more reasoning tokens are associated with increased assistance, the authors note that differences in how reasoning is specified across model families obstruct a precise, comparable marginal effects calculation across models.

This open problem aims to quantify the exact marginal effects of reasoning token quantity on assistance probabilities (actionability and information access), controlling for model-family-specific differences in reasoning implementations, to enable rigorous cross-model comparison.
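To make the estimation target concrete, the following is a minimal sketch of how a marginal effect of reasoning token quantity could be computed from a logistic model with model-family controls. All variable names, the simulated data, and the coefficient values are illustrative assumptions, not the paper's dataset; a plain maximum-likelihood fit stands in here for the paper's Bayesian logistic regression.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Illustrative simulated data (NOT the paper's dataset):
# a continuous "reasoning tokens" predictor (in thousands),
# dummies for 3 hypothetical model families, and a binary
# outcome for "high actionability score".
n = 2000
tokens = rng.uniform(0, 10, n)                 # reasoning tokens (thousands)
family = rng.integers(0, 3, n)                 # hypothetical model families
F = np.eye(3)[family]                          # one-hot family indicators
X = np.column_stack([np.ones(n), tokens, F[:, 1:]])  # intercept, tokens, dummies

true_beta = np.array([-2.0, 0.3, 0.5, -0.4])   # assumed positive token effect
y = rng.binomial(1, 1 / (1 + np.exp(-X @ true_beta)))

def neg_log_lik(beta):
    z = X @ beta
    # numerically stable Bernoulli negative log-likelihood
    return np.sum(np.logaddexp(0, z) - y * z)

beta_hat = minimize(neg_log_lik, np.zeros(X.shape[1]), method="BFGS").x

# Average marginal effect (AME) of tokens on P(high score):
# mean over observations of dp/dtokens = p * (1 - p) * beta_tokens.
p_hat = 1 / (1 + np.exp(-X @ beta_hat))
ame_tokens = np.mean(p_hat * (1 - p_hat) * beta_hat[1])
print(f"AME of +1k reasoning tokens on P(high score): {ame_tokens:.4f}")
```

The open problem amounts to making an estimate like `ame_tokens` comparable across model families whose reasoning-budget parameterizations differ, rather than interpretable only within a single family.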

References

Due to differences in how reasoning is specified across model families, we cannot provide precise marginal effects on the probability of high scores, though the positive coefficient indicates that reasoning consistently increases assistance levels across models.

A Multi-Turn Framework for Evaluating AI Misuse in Fraud and Cybercrime Scenarios (2602.21831 - Mai et al., 25 Feb 2026) in Results, subsection 'Impact of Reasoning and Search'