Grounding Stylistic Domain Generalization with Quantitative Domain Shift Measures and Synthetic Scene Images (2405.15961v1)
Abstract: Domain Generalization (DG) is a challenging machine learning task that requires a model to cope with shifts across domains by extracting domain-invariant features. DG performance is typically evaluated through image classification across domains of differing image styles. However, current methodology lacks a quantitative understanding of shifts between stylistic domains and relies on vast amounts of pre-training data, such as ImageNet1K, that are predominantly photo-realistic in style and carry weakly supervised class labels. Such a data-driven practice could result in spurious correlations and inflated performance on DG benchmarks. In this paper, we introduce a new DG paradigm to address these risks. We first introduce two new quantitative measures, ICV and IDD, which describe domain shift in terms of the consistency of classes within one domain and the similarity between two stylistic domains, respectively. We then present SuperMarioDomains (SMD), a novel synthetic multi-domain dataset sampled from video game scenes, with more consistent classes and sufficient dissimilarity relative to ImageNet1K. Finally, we demonstrate our DG method, SMOS, which first uses SMD to train a precursor model and then uses that model to ground training on a DG benchmark. We observe that SMOS contributes to state-of-the-art performance across five DG benchmarks, with large improvements on abstract domains and on-par or slightly improved performance on photo-realistic domains. Our qualitative analysis suggests that these improvements can be attributed to reduced distributional divergence between originally distant domains. Our data are available at https://github.com/fpsluozi/SMD-SMOS .
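The abstract does not give the formulas for ICV or IDD, so the following is only an illustrative sketch of how dissimilarity between two stylistic domains might be quantified. It is not the paper's definition. It assumes each domain is summarized as a discrete histogram (for example, over quantized feature bins) and compares the two with the Jensen-Shannon divergence. The function name `js_divergence` and the two example histograms are hypothetical.

```python
import math


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions.

    Inputs are non-negative weights of equal length and are normalized
    internally. The result is symmetric and lies in [0, 1].
    Illustrative stand-in only: NOT the paper's IDD measure.
    """
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    total_p, total_q = sum(p), sum(q)
    if total_p <= 0 or total_q <= 0:
        raise ValueError("histograms must have positive total mass")

    # Normalize both histograms to probability distributions.
    p = [x / total_p for x in p]
    q = [x / total_q for x in q]
    # Mixture distribution used as the common reference.
    m = [0.5 * (a + b) for a, b in zip(p, q)]

    def kl(a, b):
        # Terms with a == 0 contribute nothing; b > 0 wherever a > 0
        # because b is the mixture of a with another distribution.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Hypothetical feature histograms for two stylistic domains.
photo_hist = [40, 30, 20, 10]
sketch_hist = [5, 10, 25, 60]
score = js_divergence(photo_hist, sketch_hist)
```

A value near 0 indicates that the two histograms are nearly identical, while a value near 1 indicates almost disjoint support. A bounded, symmetric score of this kind is convenient for comparing domain pairs, though the paper's own measures may be constructed quite differently.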