Unlimited OCR Works: Breaking the Long Document Bottleneck

This presentation explores a breakthrough in end-to-end Optical Character Recognition that solves a critical scalability problem. While recent OCR models leverage powerful language models, they suffer from explosive memory growth and declining speed when processing long documents. Unlimited OCR introduces Reference Sliding Window Attention, a novel mechanism inspired by human working memory that maintains constant speed and memory usage regardless of document length, achieving state-of-the-art accuracy while parsing dozens of pages in a single pass.
Script
Modern OCR models built on language model decoders hit a wall when processing long documents. Memory explodes and speed plummets as they parse page after page, making single-pass analysis of multi-page documents prohibitively expensive.
Unlimited OCR solves this with Reference Sliding Window Attention, a mechanism that mimics human working memory. Every decoded token attends globally to the original visual input but only locally to the last 128 output tokens. This keeps the memory cache bounded at a constant size, no matter how long the output grows.
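The attention pattern described above can be sketched as a boolean mask. This is a minimal illustration, not the authors' implementation: the function name, the layout (visual tokens first, then output tokens), and the choice to count the current token as part of the 128-token window are all assumptions made for the example.

```python
def reference_sliding_window_mask(num_visual, num_output, window=128):
    """Build mask[i][j]: may output token i attend to position j?

    Assumed layout: positions 0..num_visual-1 hold the visual (reference)
    tokens, followed by the output tokens in decoding order.
    """
    total = num_visual + num_output
    mask = []
    for i in range(num_output):
        row = [False] * total
        # Global part: every output token sees all visual reference tokens.
        for j in range(num_visual):
            row[j] = True
        # Local part: causal attention to the most recent `window` output
        # tokens. Including the current token in the window is an assumption.
        start = max(0, i - window + 1)
        for k in range(start, i + 1):
            row[num_visual + k] = True
        mask.append(row)
    return mask


# Toy sizes so the pattern is easy to inspect (not the real configuration).
mask = reference_sliding_window_mask(num_visual=4, num_output=6, window=2)
last_row = mask[-1]
print(sum(last_row))  # 4 visual tokens + 2 recent output tokens = 6
```

Each row has at most num_visual + window true entries, so the number of positions a token attends to stops growing once the output is longer than the window.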
The architecture pairs a high-compression vision encoder, which reduces the visual token count 16-fold, with a 3-billion-parameter Mixture-of-Experts decoder. Only half a billion parameters activate per inference pass, maximizing speed while the sliding attention keeps resource usage flat across arbitrarily long outputs.
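To see why resource usage stays flat, it helps to count cache entries. The sketch below is back-of-the-envelope arithmetic under stated assumptions: the cache holds one entry per visual token plus one per retained output token, and the patch and token counts in the example are hypothetical, not figures from the paper. Only the 16-fold compression and the 128-token window come from the text.

```python
def compressed_visual_tokens(patch_tokens, ratio=16):
    """Visual tokens left after the encoder's 16-fold reduction."""
    return patch_tokens // ratio


def kv_cache_entries(num_visual_tokens, tokens_decoded, window=None):
    """Cached positions after `tokens_decoded` output tokens.

    window=None models ordinary full attention, where the cache grows with
    every decoded token. With a window, only the most recent `window`
    output tokens are kept alongside the visual reference tokens.
    """
    if window is None:
        return num_visual_tokens + tokens_decoded
    return num_visual_tokens + min(tokens_decoded, window)


# Hypothetical workload: 16,000 raw patch tokens, 6,000 decoded tokens.
visual = compressed_visual_tokens(16_000)           # 1000
print(kv_cache_entries(visual, 6_000))              # 7000 with full attention
print(kv_cache_entries(visual, 6_000, window=128))  # 1128 with the window
```

Under these assumptions the windowed cache reaches its ceiling after 128 decoded tokens and stays there, while the full-attention cache keeps growing linearly with output length.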
On the OmniDocBench evaluation, Unlimited OCR scores 93.23, a 6-percentage-point jump over its predecessor. More striking is the throughput story: while baseline models slow down by 35 percent as outputs lengthen, Unlimited OCR maintains a constant speed of 5,580 tokens per second, even past 6,000 output tokens.
The benefits generalize across document types. Whether parsing presentations, academic papers, books, magazines, or handwritten notes, Unlimited OCR maintains edit distances below 0.11 even beyond 40 pages. The sliding window does not sacrifice accuracy for speed.
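For readers unfamiliar with the metric, an edit distance of this kind can be sketched as follows. This assumes character-level Levenshtein distance divided by the length of the longer string, which gives a value between 0 (identical) and 1; the exact normalization used in the evaluation is not stated here, so treat this as one common convention rather than the benchmark's definition.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, and
    substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def normalized_edit_distance(predicted, reference):
    """Levenshtein distance scaled to [0, 1] by the longer length
    (an assumed normalization)."""
    if not predicted and not reference:
        return 0.0
    return levenshtein(predicted, reference) / max(len(predicted), len(reference))


print(levenshtein("kitten", "sitting"))  # 3
```

On this scale, a score below 0.11 means that fewer than 11 percent of characters, relative to the longer text, would need to change to match the reference.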
By designing attention to mirror how humans actually work with references, keeping the source document always visible but only recent outputs in working memory, Unlimited OCR proves that scalable long-horizon parsing is possible.