- The paper introduces the Adaptive VAR Classifier (A-VARC+), which leverages visual autoregressive (VAR) models for efficient generative classification.
- It employs likelihood smoothing and partial-scale candidate pruning to balance computational efficiency with classification accuracy.
- The method enhances visual explainability and resists catastrophic forgetting in class-incremental learning scenarios.
Efficient and Explainable Generative Classification with VAR Models
Introduction
The paper "Your VAR Model is Secretly an Efficient and Explainable Generative Classifier" introduces an innovative approach to generative classification using visual autoregressive (VAR) models. By leveraging recent advances in VAR modeling, the paper proposes the Adaptive VAR Classifier (A-VARC+), which significantly enhances the efficiency and applicability of generative classifiers. While traditional diffusion-based generative classifiers face computational challenges, the VAR-based approach balances accuracy and speed, offering a new perspective on generative classifiers.
Methodology
Generative Classifiers
Generative classifiers estimate class-conditional likelihoods p(x∣y) and use Bayes' theorem to calculate the posterior p(y∣x). Unlike discriminative classifiers that directly model p(y∣x), generative approaches potentially offer robustness to distribution shifts but face challenges in scalability and computational efficiency, particularly with diffusion models.
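The Bayes step above can be sketched in a few lines. The log-likelihood values here are placeholders for whatever per-class scores a generative model assigns to an input, and a uniform class prior is assumed:

```python
import numpy as np

def posterior_from_loglik(log_lik, log_prior=None):
    """Turn per-class log-likelihoods log p(x|y) into a posterior p(y|x) via Bayes' rule."""
    if log_prior is None:
        # Uniform prior: p(y) = 1/K cancels out in the normalization.
        log_prior = np.zeros_like(log_lik)
    log_joint = log_lik + log_prior
    # Shift by the max before exponentiating for numerical stability.
    log_joint = log_joint - log_joint.max()
    probs = np.exp(log_joint)
    return probs / probs.sum()

# Example: three classes; class 1 assigns the highest likelihood to x.
post = posterior_from_loglik(np.array([-1200.0, -1150.0, -1300.0]))
```

The max-shift matters in practice: image log-likelihoods are large negative numbers, and exponentiating them directly would underflow to zero.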
VAR-Based Classifier
The paper leverages VAR models for generative classification by utilizing their tractable likelihoods. A naive implementation of a VAR classifier (VARC) demonstrates suboptimal performance, prompting the need for enhancements. The proposed A-VARC+ employs two main techniques to improve performance:
- Likelihood Smoothing: This technique mitigates the sensitivity to perturbations in token maps, stabilizing the likelihood estimates and enhancing accuracy with minimal computational overhead.
- Partial-Scale Candidate Pruning: This approach reduces the computational burden by exploiting multi-scale token maps for efficient candidate pruning, significantly speeding up inference by focusing resources on the most likely classes.
Figure 1: Visual explanation of the VAR-based classifier using PMI. From left to right: the input image, PMI conditioned on the true label, PMI conditioned on the highest-ranked incorrect label, and the contrastive explanation between them.
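The pruning idea can be sketched as follows. VAR models score an image coarse-to-fine across token-map scales, so cheap coarse scales can eliminate most classes before the expensive fine scales are evaluated. The per-scale likelihood function and the fixed keep-fraction schedule below are illustrative assumptions, not the paper's exact rule:

```python
def classify_with_pruning(scale_loglik, num_classes, num_scales, keep_fraction=0.25):
    """Evaluate coarse scales for every class, then prune to the most
    promising candidates before paying for the finer scales.

    scale_loglik(y, s) stands in for the log-likelihood a VAR model
    assigns to the input's scale-s token map under class y.
    """
    candidates = list(range(num_classes))
    totals = {y: 0.0 for y in candidates}
    for s in range(num_scales):
        for y in candidates:
            totals[y] += scale_loglik(y, s)
        # After each scale, keep only the top fraction of candidates.
        if len(candidates) > 1:
            k = max(1, int(len(candidates) * keep_fraction))
            candidates = sorted(candidates, key=lambda y: totals[y], reverse=True)[:k]
    return max(candidates, key=lambda y: totals[y])
```

Because early scales contain few tokens, most of the total cost sits in the final scales, which under this scheme are only evaluated for the surviving candidates.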
Condition Contrastive Alignment (CCA)
The VAR model benefits from CCA finetuning, which strengthens class-conditional information and mitigates the trade-off between discriminative and generative performance. CCA aligns the model more closely with class distinctions, improving classification results while maintaining generative capabilities.
Comparative Analysis
The A-VARC+ demonstrates competitive accuracy on ImageNet-100 with a minimal loss in performance compared to diffusion classifiers while achieving a 160× speed-up. This efficiency is crucial for large-scale applications where computational resources are a bottleneck.
Figure 2: Confusion matrices of the VAR classifier evaluated on the first 10 classes of ImageNet.
Robustness and Limitations
While the VAR-based approach does not inherit the robustness to distribution shifts reported for diffusion-based classifiers, it excels in other areas. Its capacity for visual explainability and its resistance to catastrophic forgetting in class-incremental learning mark a distinct advantage over conventional methods.
Intriguing Properties
Visual Explainability
Token-wise pointwise mutual information (PMI) lets the VAR model expose its decision-making process: it highlights which image tokens provide evidence for a predicted class and yields contrastive evidence between competing class labels.
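A minimal sketch of the PMI computation, assuming the conditional and unconditional token log-probabilities have already been read out of the model (the arrays here stand in for those values):

```python
import numpy as np

def token_pmi(cond_logprobs, uncond_logprobs):
    """Token-wise pointwise mutual information between tokens and a label:
    PMI(x_i; y) = log p(x_i | context, y) - log p(x_i | context).
    Positive values mark tokens that provide evidence for the label."""
    return np.asarray(cond_logprobs) - np.asarray(uncond_logprobs)

def contrastive_map(pmi_true, pmi_wrong):
    """Contrastive explanation: evidence favoring the true label over the
    highest-ranked incorrect label, as visualized in the paper's figures."""
    return pmi_true - pmi_wrong
```

Reshaped to the token-map grid, these per-token values give the heatmaps shown in Figures 1 and 3.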
Figure 3: Impact of CCA finetuning. From left to right: the input image, PMI conditioned on the true label, PMI conditioned on the highest-ranked incorrect label, and the contrastive explanation between them.
Resistance to Catastrophic Forgetting
Because a generative classifier models each class-conditional likelihood independently, the VAR classifier inherently resists catastrophic forgetting, a significant issue in class-incremental learning: a new class can be added without retraining on, or degrading, previously learned ones. This provides a pathway toward adaptable and scalable classification systems without retraining overheads.
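A minimal sketch of why this holds, with toy per-class log-likelihood functions standing in for trained class-conditional VAR models:

```python
class GenerativeClassifier:
    """Class-incremental generative classifier: one likelihood model per class.
    Adding a class registers a new model without touching the ones already
    learned, so predictions for existing classes are unaffected."""

    def __init__(self):
        self.models = {}

    def add_class(self, label, loglik_fn):
        # loglik_fn(x) -> log p(x | label), trained only on the new class's data.
        self.models[label] = loglik_fn

    def predict(self, x):
        # Bayes rule with a uniform prior reduces to picking the max likelihood.
        return max(self.models, key=lambda y: self.models[y](x))
```

A discriminative classifier would instead have to update shared weights for the new class, which is exactly what causes forgetting of the old ones.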
Conclusion
The paper demonstrates the potential of VAR-based generative classifiers as a practical alternative to diffusion-based methods. By combining efficiency with explainability, A-VARC+ offers a scalable solution for generative classification and opens the door to future classifiers that build on advances in autoregressive modeling. Properties such as visual explainability and resistance to catastrophic forgetting highlight the opportunity for VAR models to reshape generative classification.