Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
80 tokens/sec
GPT-4o
59 tokens/sec
Gemini 2.5 Pro Pro
43 tokens/sec
o3 Pro
7 tokens/sec
GPT-4.1 Pro
50 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Robust Audio-Visual Instance Discrimination via Active Contrastive Set Mining (2204.12366v1)

Published 26 Apr 2022 in cs.MM

Abstract: The recent success of audio-visual representation learning can be largely attributed to their pervasive property of audio-visual synchronization, which can be used as self-annotated supervision. As a state-of-the-art solution, Audio-Visual Instance Discrimination (AVID) extends instance discrimination to the audio-visual realm. Existing AVID methods construct the contrastive set by random sampling based on the assumption that the audio and visual clips from all other videos are not semantically related. We argue that this assumption is rough, since the resulting contrastive sets have a large number of faulty negatives. In this paper, we overcome this limitation by proposing a novel Active Contrastive Set Mining (ACSM) that aims to mine the contrastive sets with informative and diverse negatives for robust AVID. Moreover, we also integrate a semantically-aware hard-sample mining strategy into our ACSM. The proposed ACSM is implemented into two most recent state-of-the-art AVID methods and significantly improves their performance. Extensive experiments conducted on both action and sound recognition on multiple datasets show the remarkably improved performance of our method.

User Edit Pencil Streamline Icon: https://streamlinehq.com
Authors (7)
  1. Hanyu Xuan (4 papers)
  2. Yihong Xu (20 papers)
  3. Shuo Chen (127 papers)
  4. Zhiliang Wu (15 papers)
  5. Jian Yang (505 papers)
  6. Yan Yan (242 papers)
  7. Xavier Alameda-Pineda (69 papers)