FormNetV2: Multimodal Graph Contrastive Learning for Form Document Information Extraction (2305.02549v2)

Published 4 May 2023 in cs.CL, cs.CV, and cs.LG

Abstract: The recent advent of self-supervised pre-training techniques has led to a surge in the use of multimodal learning in form document understanding. However, existing approaches that extend masked language modeling to other modalities require careful multi-task tuning, complex reconstruction target designs, or additional pre-training data. In FormNetV2, we introduce a centralized multimodal graph contrastive learning strategy to unify self-supervised pre-training for all modalities in one loss. The graph contrastive objective maximizes the agreement of multimodal representations, providing a natural interplay for all modalities without special customization. In addition, we extract image features within the bounding box that joins a pair of tokens connected by a graph edge, capturing more targeted visual cues without loading a sophisticated and separately pre-trained image embedder. FormNetV2 establishes new state-of-the-art performance on FUNSD, CORD, SROIE and Payment benchmarks with a more compact model size.
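The abstract describes two mechanisms: pooling image features inside the union bounding box of two tokens joined by a graph edge, and a contrastive objective that maximizes agreement between representations of the same item across views. The sketch below illustrates both ideas in minimal, self-contained Python; all function names, shapes, and the toy 1-D InfoNCE similarity are illustrative assumptions, not the paper's actual implementation.

```python
import math

def union_box(box_a, box_b):
    """Union bounding box of two (x0, y0, x1, y1) token boxes.

    This is the region joining an edge's token pair, from which
    edge-level image features would be pooled.
    """
    return (min(box_a[0], box_b[0]), min(box_a[1], box_b[1]),
            max(box_a[2], box_b[2]), max(box_a[3], box_b[3]))

def pool_region(feature_map, box):
    """Mean-pool a 2D feature map (list of rows of scalars) over an
    integer box given as (x0, y0, x1, y1), exclusive of x1/y1."""
    x0, y0, x1, y1 = box
    vals = [feature_map[y][x] for y in range(y0, y1) for x in range(x0, x1)]
    return sum(vals) / len(vals)

def info_nce(view_a, view_b, temperature=0.1):
    """InfoNCE-style agreement loss over paired scalar embeddings
    (toy 1-D case; real models would use cosine similarity on vectors).

    Matching indices across the two views are positives; all other
    pairs are negatives. Lower loss means better agreement."""
    def sim(u, v):  # negative squared distance as a toy similarity
        return -(u - v) ** 2 / temperature
    loss = 0.0
    n = len(view_a)
    for i in range(n):
        logits = [sim(view_a[i], view_b[j]) for j in range(n)]
        log_den = math.log(sum(math.exp(l) for l in logits))
        loss += -(logits[i] - log_den)  # cross-entropy toward positive j == i
    return loss / n
```

In this sketch, perfectly aligned views yield a near-zero loss while mismatched views are penalized, which is the "maximize agreement" behavior the abstract refers to; the union-box pooling stands in for the paper's targeted extraction of visual cues along graph edges.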

