
Efficient Multi-Model Fusion with Adversarial Complementary Representation Learning

Published 24 Apr 2024 in cs.LG, cs.AI, cs.SD, and eess.AS | (2404.15704v1)

Abstract: Single-model systems often fall short in tasks such as speaker verification (SV) and image classification because they rely heavily on partial prior knowledge during decision-making, which results in suboptimal performance. Although multi-model fusion (MMF) can mitigate some of these issues, redundancy in the learned representations may limit the improvement. To this end, we propose an adversarial complementary representation learning (ACoRL) framework that enables newly trained models to avoid previously acquired knowledge, allowing each individual component model to learn maximally distinct, complementary representations. We provide three detailed explanations of why this works, and experimental results demonstrate that our method improves performance more efficiently than traditional MMF. Furthermore, attribution analysis validates that the model trained under ACoRL acquires more complementary knowledge, highlighting the efficacy of our approach in enhancing efficiency and robustness across tasks.
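
The abstract's point that redundancy limits the gains from fusion can be illustrated with a standard statistical identity. The sketch below is not the paper's analysis or the ACoRL method; it assumes a simplified setting in which each model's score error has the same variance and every pair of models has the same error correlation, and the function name `fused_error_variance` is hypothetical.

```python
def fused_error_variance(sigma2: float, rho: float, n_models: int) -> float:
    """Error variance of the plain average of n_models scores.

    Assumes (for illustration only) that every model's error has variance
    sigma2 and that every pair of models has error correlation rho.
    Var(mean) = sigma2 * (1 + (n_models - 1) * rho) / n_models.
    """
    if n_models < 1:
        raise ValueError("n_models must be at least 1")
    return sigma2 * (1.0 + (n_models - 1) * rho) / n_models


if __name__ == "__main__":
    # Compare fully redundant (rho = 1), partly redundant, and
    # uncorrelated (rho = 0) models when fusing two or four of them.
    for rho in (1.0, 0.5, 0.0):
        for n in (2, 4):
            print(f"rho={rho}, n={n}: {fused_error_variance(1.0, rho, n)}")
```

Under these assumptions, fully redundant models (correlation 1) gain nothing from averaging, while lower correlation lets the fused error variance shrink toward `sigma2 / n_models`. This is the intuition behind training component models to learn complementary rather than overlapping representations.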

