
Diffusion-TTA: Test-time Adaptation of Discriminative Models via Generative Feedback (2311.16102v2)

Published 27 Nov 2023 in cs.CV, cs.AI, cs.LG, and cs.RO

Abstract: The advancements in generative modeling, particularly the advent of diffusion models, have sparked a fundamental question: how can these models be effectively used for discriminative tasks? In this work, we find that generative models can be great test-time adapters for discriminative models. Our method, Diffusion-TTA, adapts pre-trained discriminative models such as image classifiers, segmenters and depth predictors, to each unlabelled example in the test set using generative feedback from a diffusion model. We achieve this by modulating the conditioning of the diffusion model using the output of the discriminative model. We then maximize the image likelihood objective by backpropagating the gradients to discriminative model's parameters. We show Diffusion-TTA significantly enhances the accuracy of various large-scale pre-trained discriminative models, such as, ImageNet classifiers, CLIP models, image pixel labellers and image depth predictors. Diffusion-TTA outperforms existing test-time adaptation methods, including TTT-MAE and TENT, and particularly shines in online adaptation setups, where the discriminative model is continually adapted to each example in the test set. We provide access to code, results, and visualizations on our website: https://diffusion-tta.github.io/.

Understanding Diffusion-TTA

Introduction to Test-time Adaptation

In the field of machine learning, the ability to accurately predict outputs from given inputs is critical. Discriminative models, which include image classifiers, object detectors, segmenters, and image captioners, are designed to make these predictions by mapping inputs, like images, to outputs or classifications. Such models perform well when the inputs are similar to data seen during training but can struggle with inputs that are quite different or "out-of-distribution." This limitation has led researchers to explore how generative models can be used to adapt discriminative models to new, unseen data at test time.

Generative Models for Adaptation

Generative models learn to synthesize new data samples, and they can also be run "in reverse" for recognition: a class-conditional generative model scores each candidate label by how well it explains the input. Prior research has shown that such generative classifiers can generalize better to images outside the training distribution. However, standalone generative methods have not outperformed discriminative ones on benchmark datasets. This suggests that generative models should not replace discriminative ones, but instead be used in conjunction with them to leverage the strengths of both.
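The "run in reverse" idea can be made concrete with a toy sketch. The snippet below is purely illustrative (the linear `denoiser` and the `class_embed` table are hypothetical stand-ins for a real class-conditional diffusion model): each candidate class is scored by how well the conditional denoiser predicts the noise added to the input, and the class with the lowest denoising loss wins.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical toy stand-ins: a class-embedding table and a linear
# "denoiser" play the role of a class-conditional diffusion model.
num_classes, dim = 3, 8
class_embed = nn.Embedding(num_classes, dim)
denoiser = nn.Linear(dim * 2, dim)  # predicts the noise that was added

x = torch.randn(1, dim)             # one "image" (a toy feature vector)
noise = torch.randn_like(x)
noisy = x + noise                    # a single fixed noise level

with torch.no_grad():
    losses = []
    for c in range(num_classes):
        cond = class_embed(torch.tensor([c]))               # condition on class c
        pred = denoiser(torch.cat([noisy, cond], dim=-1))   # predict the noise
        losses.append((pred - noise).pow(2).mean().item())  # denoising error

# Generative classification: pick the class that best explains the input.
predicted = min(range(num_classes), key=losses.__getitem__)
```

A real system would average this loss over many noise levels and noise samples, which is one reason standalone generative classifiers are expensive at inference time.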

The Diffusion-TTA Method

The proposed method, Diffusion-TTA, takes this hybrid approach: it adapts discriminative models at test time using feedback from a generative model, specifically an image diffusion model. Diffusion-TTA uses the discriminative model's output to modulate the conditioning of the diffusion model. By maximizing the diffusion likelihood (in practice, minimizing the denoising loss) and backpropagating the gradients into the discriminative model's parameters, the method adapts pre-trained models, such as image classifiers and depth predictors, to every unlabelled example in the test set. This is especially effective in the online adaptation setup, where the model is continually adapted to new data as it arrives.
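The adaptation loop described above can be sketched in a few lines. This is a minimal, hypothetical toy (a linear classifier, embedding table, and linear denoiser stand in for the real networks, and a single fixed noise level replaces sampled diffusion timesteps); the point is only to show the gradient path: predicted class probabilities weight the conditioning, the denoising loss is computed, and its gradient flows back into the classifier alone.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical toy stand-ins for the real models.
num_classes, dim = 3, 8
classifier = nn.Linear(dim, num_classes)   # the discriminative model (adapted)
class_embed = nn.Embedding(num_classes, dim)  # conditioning table (frozen)
denoiser = nn.Linear(dim * 2, dim)            # "diffusion model" (frozen)
for p in list(class_embed.parameters()) + list(denoiser.parameters()):
    p.requires_grad_(False)

x = torch.randn(1, dim)                    # one unlabelled test example
opt = torch.optim.SGD(classifier.parameters(), lr=0.1)

for step in range(5):
    probs = classifier(x).softmax(dim=-1)      # discriminative prediction
    # Modulate the diffusion conditioning with the predicted probabilities:
    cond = probs @ class_embed.weight          # (1, dim) expected embedding
    noise = torch.randn_like(x)
    noisy = x + noise                          # one fixed "timestep"
    pred = denoiser(torch.cat([noisy, cond], dim=-1))
    loss = (pred - noise).pow(2).mean()        # diffusion (denoising) loss
    opt.zero_grad()
    loss.backward()                            # gradients reach the classifier
    opt.step()                                 # only classifier weights move
```

Because the conditioning is a probability-weighted mixture of class embeddings, lowering the denoising loss nudges the classifier toward labels under which the diffusion model explains the image well.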

Results and Analysis

Diffusion-TTA shows marked accuracy gains across various model architectures compared with existing test-time adaptation methods such as TTT-MAE and TENT. It has been validated on a range of discriminative tasks, including open-vocabulary classification, segmentation, and depth prediction, across multiple datasets. It is effective on both in-distribution and out-of-distribution examples, and detailed ablations reveal that adapting both the discriminative and generative models yields the largest gains, signaling a potent synergy between the two.

Concluding Thoughts

The research reinforces the notion that coupling discriminative encoding with generative decoding offers a promising direction for handling images that fall outside the training distribution. By publicly sharing code and trained models, the authors of Diffusion-TTA encourage further exploration into combining these two powerful approaches, potentially leading to broader real-world applications where machine learning models must deal with changing data contexts.

Authors (5)
  1. Mihir Prabhudesai (12 papers)
  2. Tsung-Wei Ke (10 papers)
  3. Alexander C. Li (10 papers)
  4. Deepak Pathak (91 papers)
  5. Katerina Fragkiadaki (61 papers)
Citations (9)