NeurIPS 2023

Test-time Adaptation of Discriminative Models via Diffusion Generative Feedback

Mihir Prabhudesai, Tsung-Wei Ke, Alexander C. Li, Deepak Pathak, Katerina Fragkiadaki


Abstract

Figure 1: Diffusion-TTA improves state-of-the-art pre-trained image classifiers and CLIP models across various benchmarks. Panels (y-axis: Top-1 Accuracy): (a) ImageNet-C (online); (b) FGVC Aircraft (single-sample). Our model adapts pre-trained image discriminative models using feedback from pre-trained image generative diffusion models. Left (a): Image classification performance of pre-trained image classifiers improves after online adaptation. The image classifiers are pre-trained on ImageNet and adapted on ImageNet-C in an unsupervised manner using generative feedback. The improvement holds across various model architectures. Right (b): Accuracy of open-vocabulary CLIP classifiers improves after single-sample adaptation, where we adapt to each unlabelled sample in the test set independently. CLIP is trained on millions of image-text pairs collected from the Internet [38]; here we test it on the FGVC Aircraft dataset [30].
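The two evaluation protocols named in the caption can be contrasted with a minimal sketch. It assumes that "online" means adapted weights are carried over from one test sample to the next (the usual meaning in test-time adaptation), while "single-sample" restarts from the pre-trained weights for every sample, as the caption states. The names `adapt_step`, `predict`, `toy_adapt_step`, and `toy_predict` are hypothetical; the toy update merely stands in for the unsupervised generative-feedback update and is not the paper's method.

```python
import copy
from typing import Any, Callable, List, Sequence


def single_sample_adaptation(
    model: Any,
    samples: Sequence[Any],
    adapt_step: Callable[[Any, Any], Any],
    predict: Callable[[Any, Any], Any],
) -> List[Any]:
    """Adapt to each unlabelled sample independently."""
    preds = []
    for x in samples:
        # Every sample starts again from the pre-trained weights.
        local = adapt_step(copy.deepcopy(model), x)
        preds.append(predict(local, x))
    return preds


def online_adaptation(
    model: Any,
    samples: Sequence[Any],
    adapt_step: Callable[[Any, Any], Any],
    predict: Callable[[Any, Any], Any],
) -> List[Any]:
    """Adapt on a stream, keeping the adapted weights between samples (assumption)."""
    current = copy.deepcopy(model)
    preds = []
    for x in samples:
        current = adapt_step(current, x)
        preds.append(predict(current, x))
    return preds


# Hypothetical toy stand-ins: the "model" is a single bias value, and the
# "adaptation" moves it halfway toward the current sample.
def toy_adapt_step(model: dict, x: float) -> dict:
    updated = dict(model)
    updated["bias"] += 0.5 * (x - updated["bias"])
    return updated


def toy_predict(model: dict, x: float) -> float:
    return model["bias"]


pretrained = {"bias": 0.0}
stream = [1.0, 1.0]
single_preds = single_sample_adaptation(pretrained, stream, toy_adapt_step, toy_predict)
online_preds = online_adaptation(pretrained, stream, toy_adapt_step, toy_predict)
```

In this toy run the single-sample predictions are identical for identical inputs, because nothing is retained between samples, whereas the online predictions drift as updates accumulate. The pre-trained weights themselves are left untouched in both cases.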