Effectiveness of Layout-as-Thought beyond document parsing

Determine whether the Layout-as-Thought mechanism in Qianfan-OCR (an optional thinking phase, triggered by <think> tokens, that produces structured layout representations) is effective on key information extraction, document question answering, and chart understanding. Specifically, rigorously evaluate whether enabling the thinking phase improves performance on these tasks relative to the default no-think mode.
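One way to make the comparison rigorous is a paired design: run both modes on the same examples, score each output with the task's usual metric (for example ANLS for document QA), and bootstrap the per-example differences. The sketch below illustrates that idea under those assumptions; it is not the paper's evaluation protocol, and the function name `paired_bootstrap_delta` and its defaults are hypothetical.

```python
import random
from statistics import fmean


def paired_bootstrap_delta(think, no_think, n_resamples=10_000, alpha=0.05, seed=0):
    """Mean per-example gain of think mode over no-think mode, with a
    percentile bootstrap confidence interval.

    think[i] and no_think[i] must be scores for the same example i
    (hypothetical helper, not part of any Qianfan-OCR tooling).
    """
    if len(think) != len(no_think) or not think:
        raise ValueError("need two non-empty score lists of equal length")

    # Pairing removes between-example difficulty from the comparison.
    deltas = [t - n for t, n in zip(think, no_think)]
    n = len(deltas)
    rng = random.Random(seed)

    # Resample examples with replacement and record each resample's mean gain.
    boot = sorted(fmean(rng.choices(deltas, k=n)) for _ in range(n_resamples))

    # Percentile interval at level 1 - alpha.
    lo = boot[int((alpha / 2) * n_resamples)]
    hi = boot[int((1 - alpha / 2) * n_resamples) - 1]
    return fmean(deltas), (lo, hi)
```

For eight examples with exact-match scores `[1, 1, 0, 1, 1, 0, 1, 1]` (think) and `[1, 0, 0, 1, 0, 0, 1, 1]` (no-think), the mean gain is 0.25; because no difference is negative, the interval's lower bound cannot drop below zero. An interval that excludes zero on a realistically sized benchmark would be some evidence that the gain is not resampling noise.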

Background

Qianfan-OCR introduces Layout-as-Thought, an optional intermediate reasoning phase that generates bounding boxes, element labels, and reading order before producing final outputs. This mechanism is intended both to recover explicit layout analysis within an end-to-end architecture and to improve accuracy on documents with complex structures.
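To make "bounding boxes, element labels, and reading order" concrete, here is a minimal sketch of one possible schema for such a layout representation. The field names, the tab-separated line format, and the `<think>`/`</think>` wrapping are assumptions for illustration only; the source does not specify Qianfan-OCR's actual serialization.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LayoutElement:
    # Hypothetical schema: a label, a pixel box (x0, y0, x1, y1), and a reading-order index.
    label: str
    bbox: tuple[int, int, int, int]
    order: int


def has_valid_reading_order(elements: list[LayoutElement]) -> bool:
    """True if the order indices are exactly 0..n-1, with no gaps or duplicates."""
    return sorted(e.order for e in elements) == list(range(len(elements)))


def serialize_layout(elements: list[LayoutElement]) -> str:
    """One tab-separated line per element, emitted in reading order (hypothetical format)."""
    lines = []
    for el in sorted(elements, key=lambda e: e.order):
        x0, y0, x1, y1 = el.bbox
        lines.append(f"{el.order}\t{el.label}\t[{x0},{y0},{x1},{y1}]")
    return "\n".join(lines)


def wrap_as_thought(elements: list[LayoutElement]) -> str:
    # Assumes the thinking phase is delimited by <think> ... </think>;
    # the closing tag is a guess, not taken from the source.
    return f"<think>\n{serialize_layout(elements)}\n</think>"
```

Given a paragraph at order 1 and a title at order 0, supplied in that order, `serialize_layout` emits the title line first. A validity check such as `has_valid_reading_order` is one way to separate malformed layout output from downstream answer errors when comparing the two modes.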

The paper validates Layout-as-Thought on OmniDocBench v1.5 for document parsing, observing targeted benefits on structurally complex pages and potential overhead on simpler layouts. However, the authors explicitly note that its effectiveness on other task categories, including key information extraction, document QA, and chart understanding, has not been investigated.

References

Its effectiveness on other tasks -- such as key information extraction, document QA, and chart understanding -- remains unexplored.

Qianfan-OCR: A Unified End-to-End Model for Document Intelligence (2603.13398 - Dong et al., 11 Mar 2026) in Section 7: Limitations and Future Work — Layout-as-Thought