Promoting Ensemble Diversity with Interactive Bayesian Distributional Robustness for Fine-tuning Foundation Models (2506.07247v1)

Published 8 Jun 2025 in cs.LG

Abstract: We introduce Interactive Bayesian Distributional Robustness (IBDR), a novel Bayesian inference framework that allows modeling the interactions between particles, thereby enhancing ensemble quality through increased particle diversity. IBDR is grounded in a generalized theoretical framework that connects the distributional population loss with the approximate posterior, motivating a practical dual optimization procedure that enforces distributional robustness while fostering particle diversity. We evaluate IBDR's performance against various baseline methods using the VTAB-1K benchmark and a commonsense reasoning language task. The results consistently show that IBDR outperforms these baselines, underscoring its effectiveness in real-world applications.

Summary

  • The paper presents IBDR, an interactive Bayesian framework that optimizes ensemble diversity by mitigating particle collapse in posterior distributions.
  • It leverages Wasserstein-based robustness optimization and a novel divergence loss to jointly minimize empirical and distributional losses.
  • Empirical results on VTAB-1K and commonsense reasoning tasks demonstrate improved top-1 accuracy and calibration over state-of-the-art baselines.

Interactive Bayesian Distributional Robustness for Fine-tuning Foundation Models

The paper introduces Interactive Bayesian Distributional Robustness (IBDR), an interactive framework designed to optimize ensemble diversity and distributional robustness when fine-tuning foundation models. The approach reconfigures how particle models interact within Bayesian neural networks (BNNs), addressing the common shortcoming of particle collapse, in which independently sampled particles from the posterior converge to a single mode. By placing a joint distribution over otherwise independent posteriors and augmenting it with a divergence loss, IBDR induces particle interactions that yield more diverse and robust model ensembles.
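
The paper's exact divergence loss is not reproduced here; as a minimal sketch of the general idea, a pairwise repulsion term over ensemble particles penalizes members that cluster in parameter space. The RBF kernel, bandwidth, and weighting below are illustrative assumptions, not IBDR's actual formulation.

```python
import torch

def pairwise_repulsion(particles: torch.Tensor, bandwidth: float = 1.0) -> torch.Tensor:
    """Illustrative diversity penalty: mean RBF-kernel similarity between
    distinct particles (rows of flattened parameter vectors). Minimizing
    this term pushes particles apart, discouraging collapse to one mode.
    Kernel choice and bandwidth are assumptions, not the paper's loss."""
    sq_dists = torch.cdist(particles, particles).pow(2)        # (n, n) squared distances
    kernel = torch.exp(-sq_dists / (2.0 * bandwidth ** 2))     # RBF similarity
    n = particles.shape[0]
    off_diag = kernel.sum() - kernel.diagonal().sum()          # drop self-similarity
    return off_diag / (n * (n - 1))

# Hypothetical usage: combine with the task loss during fine-tuning.
# total_loss = task_loss + lambda_div * pairwise_repulsion(stacked_particle_params)
```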

The authors ground their method in a theoretical framework linking the distributional population loss to approximate posterior distributions, enabling a dual optimization process that bolsters both robustness and diversity. In practical terms, the framework employs Wasserstein-based distributional robustness optimization, which allows for a more generalized risk function and supports joint product distributions. This is formalized in Theorem 4.1, which bounds the general loss in terms of empirical losses over the training data. Notably, Corollary 4.2 illustrates the connection to Sharpness-Aware Minimization (SAM), extending it to joint distributions coupled with a new divergence loss for enhanced particle interaction.
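
The paper's exact statements of Theorem 4.1 and Corollary 4.2 are not reproduced here; as a hedged sketch of the standard objects they build on, a generic Wasserstein distributionally robust objective and the worst-case parameter-perturbation objective of SAM take the following forms:

```latex
% Generic Wasserstein DRO objective over a ball of radius \rho around the
% empirical distribution \hat{P}_n (standard form, not the paper's theorem):
\min_{\theta} \; \sup_{Q \,:\, W(Q, \hat{P}_n) \le \rho}
    \mathbb{E}_{(x,y) \sim Q} \big[ \ell(\theta; x, y) \big]

% SAM as a related worst-case objective over parameter perturbations,
% the special case Corollary 4.2 connects to:
\min_{\theta} \; \max_{\|\epsilon\|_2 \le \rho} L(\theta + \epsilon)
```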

The empirical evaluation is multifaceted, utilizing the VTAB-1K benchmark for image classification with the Vision Transformer (ViT) architecture and a commonsense reasoning task using the LLaMA-2 LLM. Across these tasks, IBDR demonstrates superior performance over existing baseline methods, including Sharpness-Aware Bayesian Neural Networks (SA-BNN), Stochastic Gradient Langevin Dynamics (SGLD), and Stein Variational Gradient Descent (SVGD), particularly underscoring its efficacy in maintaining distributional robustness while increasing ensemble diversity.

Key numerical results from the experiments indicate that IBDR provides significant improvements in predictive accuracy and Expected Calibration Error (ECE) across a diverse set of data domains. For instance, on the VTAB-1K benchmark, IBDR achieved a higher average top-1 accuracy than all considered baselines, highlighting its robust adaptation across natural, specialized, and structured domains. The framework's adaptability to varying datasets suggests promising applications in the transfer learning scenarios prevalent in modern machine learning tasks.
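
For reference, ECE bins predictions by confidence and takes a weighted average of the gap between confidence and accuracy per bin. A minimal sketch follows; the 15 equal-width bins are a common default and not necessarily the paper's implementation.

```python
import numpy as np

def expected_calibration_error(confidences, predictions, labels, n_bins=15):
    """ECE: weighted average of |accuracy - confidence| over confidence bins.
    Inputs are 1-D arrays; the equal-width binning scheme is assumed."""
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            acc = (predictions[mask] == labels[mask]).mean()   # bin accuracy
            conf = confidences[mask].mean()                    # bin confidence
            ece += mask.mean() * abs(acc - conf)               # weight by bin size
    return ece
```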

The detailed interaction of particle models in IBDR addresses the dual goals of robustness and diversity, positioning it as an insightful contribution to parameter-efficient fine-tuning, particularly relevant given the computational constraints of large-scale model adaptation. Future research could further explore the theoretical underpinnings of such particle interactions to optimize broader classes of neural architectures beyond those covered in this paper. Additionally, examining how IBDR scales to larger models and heavier computational budgets could provide deeper insight into its applicability across various sectors of artificial intelligence. Overall, IBDR paves the way for more resilient Bayesian frameworks, with its interaction- and diversity-centric approach offering valuable perspectives for the fine-tuning of foundation models.
