Multi-Scale and Multi-Layer Contrastive Learning for Domain Generalization (2308.14418v5)
Abstract: During the past decade, deep neural networks have led to fast-paced progress and significant achievements in computer vision problems, for both academia and industry. Yet despite their success, state-of-the-art image classification approaches fail to generalize well in previously unseen visual contexts, as required by many real-world applications. In this paper, we focus on this domain generalization (DG) problem and argue that the generalization ability of deep convolutional neural networks can be improved by taking advantage of multi-layer and multi-scale representations of the network. We introduce a framework that aims at improving domain generalization of image classifiers by combining both low-level and high-level features at multiple scales, enabling the network to implicitly disentangle representations in its latent space and learn domain-invariant attributes of the depicted objects. Additionally, to further facilitate robust representation learning, we propose a novel objective function, inspired by contrastive learning, which aims at constraining the extracted representations to remain invariant under distribution shifts. We demonstrate the effectiveness of our method by evaluating on the domain generalization datasets PACS, VLCS, Office-Home and NICO. Through extensive experimentation, we show that our model is able to surpass the performance of previous DG methods and consistently produce competitive and state-of-the-art results on all datasets.
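To make the two ideas in the abstract concrete, the following is a minimal NumPy sketch, not the paper's implementation. It assumes that feature maps from several layers are average-pooled at a few grid sizes and concatenated, and it uses a generic same-class contrastive loss as a stand-in for the proposed objective, whose exact form is not given here. The function names (`multi_scale_pool`, `embed`, `same_class_contrastive_loss`), the pooling scales, and the temperature are all hypothetical choices made for illustration.

```python
import numpy as np

def multi_scale_pool(fmap, scales=(1, 2)):
    """Average-pool a (C, H, W) feature map onto s x s grids and flatten.

    Assumption: H and W are divisible by every value in `scales`.
    """
    c, h, w = fmap.shape
    parts = []
    for s in scales:
        # Split H into (s, H//s) and W into (s, W//s), then average each cell.
        pooled = fmap.reshape(c, s, h // s, s, w // s).mean(axis=(2, 4))
        parts.append(pooled.reshape(-1))
    return np.concatenate(parts)

def embed(feature_maps, scales=(1, 2)):
    """Concatenate pooled features from several layers and L2-normalise."""
    vec = np.concatenate([multi_scale_pool(f, scales) for f in feature_maps])
    return vec / (np.linalg.norm(vec) + 1e-12)

def same_class_contrastive_loss(z, labels, temperature=0.1):
    """Generic contrastive loss: samples sharing a label are positives.

    z: (N, D) array of L2-normalised embeddings, labels: (N,) integer array.
    Illustrative stand-in only; it needs at least one same-label pair.
    """
    n = z.shape[0]
    not_self = ~np.eye(n, dtype=bool)
    sim = np.where(not_self, z @ z.T / temperature, -np.inf)
    # Numerically stable log-softmax over all other samples in the batch.
    m = sim.max(axis=1, keepdims=True)
    log_prob = sim - m - np.log(np.exp(sim - m).sum(axis=1, keepdims=True))
    pos = (labels[:, None] == labels[None, :]) & not_self
    has_pos = pos.any(axis=1)
    summed = -np.where(pos, log_prob, 0.0).sum(axis=1)
    per_anchor = summed[has_pos] / pos.sum(axis=1)[has_pos]
    return float(per_anchor.mean())

# Toy usage: two "layers" with different channel counts and resolutions.
rng = np.random.default_rng(0)
layers = [rng.normal(size=(4, 8, 8)), rng.normal(size=(8, 4, 4))]
vec = embed(layers)

# Embeddings that cluster by label give a lower loss than mismatched labels.
z = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
loss_aligned = same_class_contrastive_loss(z, np.array([0, 0, 1, 1]))
loss_mixed = same_class_contrastive_loss(z, np.array([0, 1, 0, 1]))
```

In this toy setup the loss is small when same-label embeddings coincide and large when labels cut across the clusters. That is the behaviour a domain-invariance objective of this kind would reward when the positives are same-class images drawn from different domains.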
Authors: Aristotelis Ballas, Christos Diou