
Cross-media Structured Common Space for Multimedia Event Extraction (2005.02472v1)

Published 5 May 2020 in cs.MM, cs.CL, cs.CV, and cs.LG

Abstract: We introduce a new task, MultiMedia Event Extraction (M2E2), which aims to extract events and their arguments from multimedia documents. We develop the first benchmark and collect a dataset of 245 multimedia news articles with extensively annotated events and arguments. We propose a novel method, Weakly Aligned Structured Embedding (WASE), that encodes structured representations of semantic information from textual and visual data into a common embedding space. The structures are aligned across modalities by employing a weakly supervised training strategy, which enables exploiting available resources without explicit cross-media annotation. Compared to uni-modal state-of-the-art methods, our approach achieves 4.0% and 9.8% absolute F-score gains on text event argument role labeling and visual event extraction, respectively. Compared to state-of-the-art multimedia unstructured representations, we achieve 8.3% and 5.0% absolute F-score gains on multimedia event extraction and argument role labeling, respectively. By utilizing images, we extract 21.4% more event mentions than traditional text-only methods.

Authors (7)
  1. Manling Li (47 papers)
  2. Alireza Zareian (16 papers)
  3. Qi Zeng (42 papers)
  4. Spencer Whitehead (18 papers)
  5. Di Lu (37 papers)
  6. Heng Ji (266 papers)
  7. Shih-Fu Chang (131 papers)
Citations (97)