Learning Cross-modal Contrastive Features for Video Domain Adaptation (2108.11974v1)

Published 26 Aug 2021 in cs.CV

Abstract: Learning transferable and domain adaptive feature representations from videos is important for video-relevant tasks such as action recognition. Existing video domain adaptation methods mainly rely on adversarial feature alignment, which has been derived from the RGB image space. However, video data is usually associated with multi-modal information, e.g., RGB and optical flow, and thus it remains a challenge to design a better method that considers the cross-modal inputs under the cross-domain adaptation setting. To this end, we propose a unified framework for video domain adaptation, which simultaneously regularizes cross-modal and cross-domain feature representations. Specifically, we treat each modality in a domain as a view and leverage the contrastive learning technique with properly designed sampling strategies. As a result, our objectives regularize feature spaces, which originally lack the connection across modalities or have less alignment across domains. We conduct experiments on domain adaptive action recognition benchmark datasets, i.e., UCF, HMDB, and EPIC-Kitchens, and demonstrate the effectiveness of our components against state-of-the-art algorithms.
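The abstract's core idea, treating each modality (RGB, optical flow) as a view and pulling their features together with a contrastive objective, can be illustrated with a minimal sketch. This is not the authors' exact formulation; it is a generic cross-modal InfoNCE loss, assuming one embedding per video clip and per modality, with the same clip's other-modality embedding as the positive and all other clips in the batch as negatives.

```python
import numpy as np

def info_nce_cross_modal(rgb, flow, temperature=0.1):
    """Cross-modal InfoNCE sketch: for each clip, its RGB and flow
    embeddings form the positive pair; other clips are negatives.
    rgb, flow: arrays of shape (batch, dim)."""
    # L2-normalize each modality's embeddings
    rgb = rgb / np.linalg.norm(rgb, axis=1, keepdims=True)
    flow = flow / np.linalg.norm(flow, axis=1, keepdims=True)
    # Pairwise cosine similarities between RGB anchors and flow candidates
    logits = rgb @ flow.T / temperature  # shape (batch, batch)
    # Positives lie on the diagonal (same clip, other modality)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))

# Toy usage: flow features correlated with RGB give a lower loss
# than unrelated features, i.e. the modalities are "aligned".
rng = np.random.default_rng(0)
rgb_feats = rng.normal(size=(8, 128))
flow_corr = rgb_feats + 0.05 * rng.normal(size=(8, 128))
flow_rand = rng.normal(size=(8, 128))
loss_aligned = info_nce_cross_modal(rgb_feats, flow_corr)
loss_random = info_nce_cross_modal(rgb_feats, flow_rand)
```

In the paper's setting this objective is applied with designed sampling strategies across both modalities and domains; the sketch above shows only the basic single-batch, two-view case.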

Authors (7)
  1. Donghyun Kim
  2. Yi-Hsuan Tsai
  3. Bingbing Zhuang
  4. Xiang Yu
  5. Stan Sclaroff
  6. Kate Saenko
  7. Manmohan Chandraker
Citations (64)
