Unimodal and Multimodal Representation Training for Relation Extraction (2211.06168v1)

Published 11 Nov 2022 in cs.CL

Abstract: Multimodal integration of text, layout and visual information has achieved SOTA results in visually rich document understanding (VrDU) tasks, including relation extraction (RE). However, despite its importance, evaluation of the relative predictive capacity of these modalities is less prevalent. Here, we demonstrate the value of shared representations for RE tasks by conducting experiments in which each data type is iteratively excluded during training. In addition, text and layout data are evaluated in isolation. While a bimodal text and layout approach performs best (F1=0.684), we show that text is the most important single predictor of entity relations. Additionally, layout geometry is highly predictive and may even be a feasible unimodal approach. Despite being less effective, we highlight circumstances where visual information can bolster performance. In total, our results demonstrate the efficacy of training joint representations for RE.
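The core experimental design is a modality ablation: the model is retrained with each data type (text, layout, visual) iteratively excluded, plus text-only and layout-only runs. The sketch below illustrates one way such an ablation loop could be structured. It is not the paper's architecture; the fusion strategy (summed linear projections), feature dimensions, and module names are all illustrative assumptions.

```python
import torch
import torch.nn as nn

# Illustrative sketch of a modality-ablation setup for relation extraction.
# Dimensions, module names, and the additive fusion are assumptions, not
# the paper's actual architecture.

class MultimodalREModel(nn.Module):
    def __init__(self, hidden=256, num_relations=2,
                 use_text=True, use_layout=True, use_visual=True):
        super().__init__()
        self.use_text, self.use_layout, self.use_visual = use_text, use_layout, use_visual
        self.text_proj = nn.Linear(768, hidden)    # e.g. pooled token embeddings
        self.layout_proj = nn.Linear(4, hidden)    # normalized bounding box (x0, y0, x1, y1)
        self.visual_proj = nn.Linear(512, hidden)  # e.g. pooled region features
        self.classifier = nn.Linear(hidden, num_relations)

    def forward(self, text_feats, layout_feats, visual_feats):
        h = torch.zeros(text_feats.size(0), self.classifier.in_features)
        # Excluded modalities contribute nothing to the shared representation.
        if self.use_text:
            h = h + self.text_proj(text_feats)
        if self.use_layout:
            h = h + self.layout_proj(layout_feats)
        if self.use_visual:
            h = h + self.visual_proj(visual_feats)
        return self.classifier(h)

# Iterate over the ablation settings: drop one modality at a time,
# plus the unimodal text and layout configurations.
settings = {
    "text+layout+visual": (True, True, True),
    "text+layout":        (True, True, False),   # best-performing in the paper (F1 = 0.684)
    "text+visual":        (True, False, True),
    "layout+visual":      (False, True, True),
    "text only":          (True, False, False),
    "layout only":        (False, True, False),
}

batch = 8
text = torch.randn(batch, 768)
layout = torch.rand(batch, 4)
visual = torch.randn(batch, 512)

for name, (t, l, v) in settings.items():
    model = MultimodalREModel(use_text=t, use_layout=l, use_visual=v)
    logits = model(text, layout, visual)
    print(f"{name}: logits shape {tuple(logits.shape)}")
```

In a real ablation study each configuration would be trained to convergence and scored on a held-out set; the loop above only shows how excluding a modality at training time can be expressed as a model configuration rather than a data-preprocessing step.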

Authors (6)
  1. Ciaran Cooney
  2. Rachel Heyburn
  3. Liam Madigan
  4. Mairead O'Cuinn
  5. Chloe Thompson
  6. Joana Cavadas