UniVSE: Robust Visual Semantic Embeddings via Structured Semantic Representations (1904.05521v2)

Published 11 Apr 2019 in cs.CV, cs.CL, and cs.LG

Abstract: We propose Unified Visual-Semantic Embeddings (UniVSE) for learning a joint space of visual and textual concepts. The space unifies concepts at different levels, including objects, attributes, relations, and full scenes. A contrastive learning approach is proposed for fine-grained alignment from image-caption pairs alone. Moreover, we present an effective approach for enforcing the coverage of semantic components that appear in the sentence. We demonstrate the robustness of UniVSE in defending against text-domain adversarial attacks on cross-modal retrieval tasks. Such robustness also enables the use of visual cues to resolve word dependencies in novel sentences.

Authors (7)
  1. Hao Wu (623 papers)
  2. Jiayuan Mao (55 papers)
  3. Yufeng Zhang (67 papers)
  4. Yuning Jiang (106 papers)
  5. Lei Li (1293 papers)
  6. Weiwei Sun (93 papers)
  7. Wei-Ying Ma (39 papers)
Citations (6)