GDA: Generalized Diffusion for Robust Test-time Adaptation (2404.00095v2)
Abstract: Machine learning models struggle to generalize when they encounter out-of-distribution (OOD) samples with unexpected distribution shifts. For vision tasks, recent studies have shown that test-time adaptation with diffusion models can achieve state-of-the-art accuracy improvements on OOD samples by generating new samples aligned with the model's domain, without modifying the model's weights. Unfortunately, those studies have primarily focused on pixel-level corruptions and therefore lack the generality to handle a broader range of OOD types. We introduce Generalized Diffusion Adaptation (GDA), a novel diffusion-based test-time adaptation method that is robust against diverse OOD types. Specifically, GDA iteratively steers the reverse sampling process with a marginal entropy loss derived from the model, combined with style and content preservation losses. In other words, GDA jointly accounts for the model's output behavior and the semantic content of the samples, reducing ambiguity for downstream tasks during generation. Evaluations across popular model architectures and OOD benchmarks show that GDA consistently outperforms prior diffusion-driven adaptation methods. Notably, it achieves the highest classification accuracy improvements, ranging from 4.4\% to 5.02\% on ImageNet-C and from 2.5\% to 7.4\% on the Rendition, Sketch, and Stylized benchmarks. This performance highlights GDA's generalization to a broader range of OOD benchmarks.
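The abstract describes the guidance mechanism only at a high level. As a rough illustration of what loss-guided reverse sampling can look like, the sketch below shows one plausible wiring in PyTorch: at each reverse step, a marginal entropy term (entropy of the mean prediction over augmented views) and a content term nudge the intermediate sample via its gradient. Everything here is an assumption rather than the paper's actual implementation: the `denoiser`, `classifier`, and `augment` callables, the simple L2 content loss standing in for GDA's style and content preservation losses, and the loss weights and guidance scale are all illustrative.

```python
# Minimal sketch of loss-guided reverse diffusion in the spirit of GDA.
# All names and hyperparameters are illustrative assumptions; the abstract
# only states that a marginal entropy loss plus style/content preservation
# losses steer the reverse sampling process.
import torch
import torch.nn.functional as F

def marginal_entropy(classifier, x, augment, n_aug=4):
    """Entropy of the mean prediction over several augmented views of x."""
    probs = torch.stack(
        [F.softmax(classifier(augment(x)), dim=-1) for _ in range(n_aug)]
    ).mean(dim=0)
    return -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1).mean()

@torch.enable_grad()
def guided_reverse_step(denoiser, classifier, x_t, t, x_ref, augment,
                        w_ent=1.0, w_content=1.0, guidance_scale=0.1):
    """One reverse diffusion step with gradient guidance on x_t.

    `denoiser(x_t, t)` is assumed to return an estimate of the clean image
    x0; the L2 term against the reference input x_ref is a stand-in for the
    paper's style/content preservation losses.
    """
    x_t = x_t.detach().requires_grad_(True)
    x0_hat = denoiser(x_t, t)  # predicted clean image at this step
    loss = (w_ent * marginal_entropy(classifier, x0_hat, augment)
            + w_content * F.mse_loss(x0_hat, x_ref))
    grad = torch.autograd.grad(loss, x_t)[0]
    # Nudge the sample against the loss gradient, then hand the result
    # back to the standard (DDPM/DDIM-style) sampler for the next step.
    return (x_t - guidance_scale * grad).detach()
```

In a full sampler, `guided_reverse_step` would be called once per timestep inside the usual reverse loop; the key design point the abstract implies is that guidance happens during sampling, so the classifier's weights are never updated.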