Semi-supervised multimodal coreference resolution in image narrations (2310.13619v1)
Abstract: In this paper, we study multimodal coreference resolution, specifically the setting where a longer descriptive text, i.e., a narration, is paired with an image. This poses significant challenges due to fine-grained image-text alignment, the inherent ambiguity of narrative language, and the unavailability of large annotated training sets. To tackle these challenges, we present a data-efficient semi-supervised approach that utilizes image-narration pairs to perform coreference resolution and narrative grounding in a multimodal context. Our approach incorporates losses for both labeled and unlabeled data within a cross-modal framework. Our evaluation shows that the proposed approach outperforms strong baselines both quantitatively and qualitatively on the tasks of coreference resolution and narrative grounding.
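The abstract does not spell out the exact loss terms, but a minimal sketch can illustrate the general shape of an objective that combines a supervised loss on labeled image-narration pairs with an unsupervised term on unlabeled ones. The pseudo-labeling scheme, function name, confidence threshold, and loss weighting below are illustrative assumptions, not the paper's actual formulation.

```python
# Sketch of a semi-supervised objective: supervised cross-entropy on
# labeled data plus a pseudo-label term on unlabeled data. All names and
# hyperparameters here are hypothetical, chosen only for illustration.
import torch
import torch.nn.functional as F

def semi_supervised_loss(logits_labeled, targets, logits_unlabeled,
                         unsup_weight=0.5, confidence_threshold=0.9):
    # Supervised term: standard cross-entropy on annotated examples.
    sup_loss = F.cross_entropy(logits_labeled, targets)

    # Unsupervised term: derive pseudo-labels from confident model
    # predictions on unlabeled pairs (a common semi-supervised choice).
    with torch.no_grad():
        probs = logits_unlabeled.softmax(dim=-1)
        confidence, pseudo_targets = probs.max(dim=-1)
        mask = confidence >= confidence_threshold

    if mask.any():
        unsup_loss = F.cross_entropy(logits_unlabeled[mask],
                                     pseudo_targets[mask])
    else:
        # No confident predictions in this batch: skip the unlabeled term.
        unsup_loss = logits_unlabeled.new_zeros(())

    return sup_loss + unsup_weight * unsup_loss
```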
- Arushi Goel
- Basura Fernando
- Frank Keller
- Hakan Bilen