Rethinking Multi-view Representation Learning via Distilled Disentangling (2403.10897v2)

Published 16 Mar 2024 in cs.CV and cs.MM

Abstract: Multi-view representation learning aims to derive robust representations that are both view-consistent and view-specific from diverse data sources. This paper presents an in-depth analysis of existing approaches in this domain, highlighting a commonly overlooked aspect: the redundancy between view-consistent and view-specific representations. To this end, we propose an innovative framework for multi-view representation learning, which incorporates a technique we term 'distilled disentangling'. Our method introduces the concept of masked cross-view prediction, enabling the extraction of compact, high-quality view-consistent representations from various sources without incurring extra computational overhead. Additionally, we develop a distilled disentangling module that efficiently filters out consistency-related information from multi-view representations, resulting in purer view-specific representations. This approach significantly reduces redundancy between view-consistent and view-specific representations, enhancing the overall efficiency of the learning process. Our empirical evaluations reveal that higher mask ratios substantially improve the quality of view-consistent representations. Moreover, we find that reducing the dimensionality of view-consistent representations relative to that of view-specific representations further refines the quality of the combined representations. Our code is accessible at: https://github.com/Guanzhou-Ke/MRDD.
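
The abstract names two components: masked cross-view prediction, which yields a compact view-consistent code, and a distilled disentangling module, which strips consistency-related information out of the view-specific codes. The sketch below is a minimal, hedged interpretation of those two ideas in PyTorch; it is not the authors' implementation (see the linked MRDD repository for that), and the module names, dimensions, pooling choices, and the cross-correlation redundancy penalty are all illustrative assumptions.

```python
# Minimal sketch of the two ideas described in the abstract. NOT the authors'
# code (see https://github.com/Guanzhou-Ke/MRDD for the official MRDD release).
# Module names, dimensions, and the redundancy penalty are assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F


class MaskedCrossViewPredictor(nn.Module):
    """Extract a compact view-consistent code by masking patches of one view
    and predicting a target derived from another view."""

    def __init__(self, patch_dim=64, consistent_dim=16, mask_ratio=0.75):
        super().__init__()
        self.mask_ratio = mask_ratio
        self.encoder = nn.Sequential(nn.Linear(patch_dim, 128), nn.ReLU(),
                                     nn.Linear(128, consistent_dim))
        self.decoder = nn.Sequential(nn.Linear(consistent_dim, 128), nn.ReLU(),
                                     nn.Linear(128, patch_dim))

    def forward(self, patches_a, patches_b):
        # patches_*: (batch, num_patches, patch_dim)
        b, n, d = patches_a.shape
        keep = max(1, int(n * (1.0 - self.mask_ratio)))
        # Randomly keep a small subset of view A's patches (high mask ratio).
        idx = torch.rand(b, n, device=patches_a.device).argsort(dim=1)[:, :keep]
        visible = torch.gather(patches_a, 1,
                               idx.unsqueeze(-1).expand(-1, -1, d))
        # Pool the visible patches of view A into one consistent code.
        c = self.encoder(visible).mean(dim=1)            # (b, consistent_dim)
        # Predict a pooled summary of view B from that code.
        recon_b = self.decoder(c)                        # (b, patch_dim)
        loss = F.mse_loss(recon_b, patches_b.mean(dim=1))
        return c, loss


class DistilledDisentangler(nn.Module):
    """Produce a view-specific code and penalize its overlap with the
    consistent code; a simple cross-correlation penalty stands in here for
    the paper's distillation objective (an assumption in this sketch)."""

    def __init__(self, patch_dim=64, specific_dim=48):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(patch_dim, 128), nn.ReLU(),
                                     nn.Linear(128, specific_dim))

    def forward(self, patches, consistent_code):
        s = self.encoder(patches.mean(dim=1))            # (b, specific_dim)
        c = consistent_code.detach()
        # Penalize correlation between the standardized codes to reduce
        # redundancy between view-specific and view-consistent parts.
        s_n = (s - s.mean(0)) / (s.std(0) + 1e-6)
        c_n = (c - c.mean(0)) / (c.std(0) + 1e-6)
        redundancy = (s_n.T @ c_n / s.shape[0]).pow(2).mean()
        return s, redundancy


if __name__ == "__main__":
    x1 = torch.randn(8, 49, 64)   # view 1: 8 samples, 7x7 patches, 64-dim
    x2 = torch.randn(8, 49, 64)   # view 2, same layout (toy data)
    mcp = MaskedCrossViewPredictor(mask_ratio=0.75)
    dd = DistilledDisentangler()
    c, pred_loss = mcp(x1, x2)
    s, red_loss = dd(x1, c)
    total = pred_loss + 0.1 * red_loss
    total.backward()
    print(c.shape, s.shape, float(total))
```

Following the abstract's empirical observations, the sketch uses a high mask ratio (0.75) and makes the view-consistent code lower-dimensional than the view-specific one (16 vs. 48); these exact values are placeholders, not figures reported in the paper.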
