Semantic-equivalence-aware evaluation for document parsing

Develop evaluation methodologies for document parsing that are semantic-equivalence-aware, so that different but semantically equivalent outputs receive fair and consistent scores. Such methods must explicitly account for two kinds of ambiguity. Format-level ambiguity arises when the same content has several valid encodings, such as HTML versus Markdown representations of tables, or alternative LaTeX commands that encode the same mathematical content. Structural-level ambiguity arises when the same layout admits several valid structures, such as representing a bilingual aligned word list either as line-by-line paired text blocks or as a two-column table.

Background

The paper introduces OmniDocBench v1.6 and its Multi-Granularity Adaptive Matching to correct element-matching biases but acknowledges inherent limitations in the element-matching paradigm. These limitations arise because semantically identical content can be represented using different formats (e.g., HTML vs. Markdown tables, different LaTeX commands) and different structural choices (e.g., modeling the same visual layout as text pairs or as a table), leading to disagreements even among human annotators.
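The format-level case can be made concrete with a small sketch. It is not the paper's method: it only illustrates one possible direction, which is to reduce an HTML table and a Markdown table to the same canonical grid of cell strings before scoring. The function names `html_table_to_grid` and `markdown_table_to_grid` are hypothetical, and the sketch assumes simple tables (no `rowspan`/`colspan`, no escaped pipes, no nested tables).

```python
from html.parser import HTMLParser


class _TableGrid(HTMLParser):
    """Collect the text of each <td>/<th> cell, row by row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            if not self.rows:  # tolerate a cell that appears without <tr>
                self.rows.append([])
            self.rows[-1].append("")
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        # Inline markup inside a cell (e.g. <b>) is ignored; its text is kept.
        if self._in_cell:
            self.rows[-1][-1] += data


def _clean(cell):
    """Collapse runs of whitespace so layout differences do not matter."""
    return " ".join(cell.split())


def html_table_to_grid(html):
    """Hypothetical helper: HTML table -> list of rows of cell strings."""
    parser = _TableGrid()
    parser.feed(html)
    parser.close()
    return [[_clean(c) for c in row] for row in parser.rows if row]


def markdown_table_to_grid(md):
    """Hypothetical helper: pipe-style Markdown table -> same grid form."""
    grid = []
    for line in md.strip().splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue
        cells = [c.strip() for c in line.strip("|").split("|")]
        # Skip the header separator row, e.g. |---|:--:|
        if all(c and set(c) <= set("-:") for c in cells):
            continue
        grid.append([_clean(c) for c in cells])
    return grid


html = ("<table><tr><th>Word</th><th>Meaning</th></tr>"
        "<tr><td>chat</td><td>cat</td></tr></table>")
md = "| Word | Meaning |\n|---|---|\n| chat | cat |"
same_content = html_table_to_grid(html) == markdown_table_to_grid(md)
```

Once both outputs share one representation, any table metric can be applied to the grids without penalizing the choice of markup. Canonicalization of this kind addresses only format-level ambiguity; it does not help when the prediction and the reference disagree about whether the region is a table at all.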

The authors explicitly state that, despite improvements in matching strategies, a broader evaluation approach is needed to recognize semantic equivalence across heterogeneous output formats and structures. This motivates developing evaluation methods that fairly score different yet semantically equivalent outputs, which they identify as an open problem.
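The structural case is harder, because the two outputs do not even share an element type. As a rough illustration only (again, not something the paper proposes), one could flatten both structures into a reading-order token sequence and compare the sequences, so that a bilingual word list scores the same whether it was emitted as paired text lines or as a two-column table. All function names below are hypothetical, and the sketch assumes that row-major order of the table matches the reading order of the text blocks.

```python
from difflib import SequenceMatcher


def tokens_from_blocks(blocks):
    """Flatten line-by-line text blocks into a reading-order token list."""
    return [tok for block in blocks for tok in block.split()]


def tokens_from_grid(grid):
    """Flatten a table (list of rows of cell strings) in row-major order."""
    return [tok for row in grid for cell in row for tok in cell.split()]


def structure_agnostic_similarity(pred_tokens, ref_tokens):
    """Similarity in [0, 1]; 1.0 means the token sequences are identical."""
    matcher = SequenceMatcher(None, pred_tokens, ref_tokens, autojunk=False)
    return matcher.ratio()


# The same bilingual word list under two structural choices.
as_blocks = ["chat cat", "chien dog"]
as_table = [["chat", "cat"], ["chien", "dog"]]

score = structure_agnostic_similarity(
    tokens_from_blocks(as_blocks), tokens_from_grid(as_table)
)
```

Flattening discards exactly the structure that element-level metrics measure, so a score like this could at most complement them. Deciding when structure may be ignored and when it carries meaning is part of what makes the problem open.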

References

Developing semantic-equivalence-aware evaluation methods that account for both format and structural ambiguity remains an open problem.

MinerU2.5-Pro: Pushing the Limits of Data-Centric Document Parsing at Scale (2604.04771 - Wang et al., 6 Apr 2026) in Conclusion, Limitations and Future Directions: Fundamental challenges in evaluation