LVLM Abilities for Long-Context Document Understanding
Determine whether Large Vision-Language Models (LVLMs) can reliably understand and answer questions over lengthy, multi-page documents, and thereby establish their capabilities for long-context document understanding (DU).
References
However, their abilities on long-context DU remain an open problem.
— MMLongBench-Doc: Benchmarking Long-context Document Understanding with Visualizations
(arXiv:2407.01523, Ma et al., 1 Jul 2024), Abstract, page 1