Cause of DeepSeek‑V3’s elevated token usage in natural‑language proof planning
Determine the underlying cause of DeepSeek‑V3’s markedly higher token usage in the natural‑language proof‑planning setting than in the symbolic graph‑connectivity setting, under the paper’s evaluation conditions.
References
The reason for V3's markedly higher token usage in the natural-language setting, relative to its behavior for symbolic graphs (Figure~\ref{fig:token_usage_graph}), remains unclear.
— Rameshkumar et al., “Reasoning Models Reason Well, Until They Don’t” (arXiv:2510.22371, 25 Oct 2025), Section 4.2 (Proof Planning in Deductive Reasoning)