On Analyzing the Role of Image for Visual-enhanced Relation Extraction (2211.07504v1)

Published 14 Nov 2022 in cs.CL, cs.AI, cs.CV, cs.IR, and cs.LG

Abstract: Multimodal relation extraction is an essential task for knowledge graph construction. In this paper, we conduct an in-depth empirical analysis indicating that inaccurate information in the visual scene graph leads to poor modal alignment weights, which further degrades performance. Moreover, visual shuffle experiments illustrate that current approaches may not take full advantage of visual information. Based on these observations, we further propose a strong baseline with implicit fine-grained multimodal alignment based on the Transformer for multimodal relation extraction. Experimental results demonstrate the better performance of our method. Code is available at https://github.com/zjunlp/DeepKE/tree/main/example/re/multimodal.

Authors (6)
  1. Lei Li (1293 papers)
  2. Xiang Chen (343 papers)
  3. Shuofei Qiao (19 papers)
  4. Feiyu Xiong (53 papers)
  5. Huajun Chen (198 papers)
  6. Ningyu Zhang (148 papers)
Citations (11)

