GimmBO: Bayesian Adapter Merging
- GimmBO is an interactive framework for merging low-rank adapters in diffusion-based image synthesis using a probabilistic approach.
- It employs a two-stage Preferential Bayesian Optimization strategy with Gaussian process priors to efficiently explore high-dimensional merging coefficient spaces.
- Experimental results show enhanced convergence, sample efficiency, and user engagement compared to manual slider tuning and random search methods.
GimmBO (Generative Image Model Merging via Bayesian Optimization) is an interactive framework for high-dimensional adapter merging in diffusion-based image generation, designed to efficiently optimize subjective, user-driven visual objectives via Preferential Bayesian Optimization (PBO). It addresses the exploration of the vast, sparse merging-coefficient spaces that arise from community-created, fine-tuned adapters, streamlining workflows that previously relied on inadequate manual slider-based tuning (Liu et al., 26 Jan 2026).
1. Problem Formulation and Motivation
Given a pretrained diffusion model with weights $\theta_0$ and a collection of $d$ adapters (most commonly low-rank adapters such as LoRA) with weight deltas $\Delta\theta_1, \dots, \Delta\theta_d$, GimmBO investigates the space of image generators formed by linear adapter merges $\theta(w) = \theta_0 + \sum_{i=1}^{d} w_i \,\Delta\theta_i$, where $w = (w_1, \dots, w_d)$ parameterizes the nonnegative merge coefficients. The feasible set is typically the unit simplex or its bounded variant.
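The linear merge above admits a direct sketch. The following is a minimal NumPy illustration, assuming per-layer adapter deltas of the same shape as the base weights; the `merge_adapters` helper and the toy shapes are illustrative, not from the paper:

```python
import numpy as np

def merge_adapters(theta0, deltas, w):
    """Linearly merge adapter weight deltas into the base weights:
    theta(w) = theta0 + sum_i w_i * delta_i.

    theta0: base model weight tensor for one layer.
    deltas: list of adapter deltas (e.g. LoRA products B @ A), same shape as theta0.
    w:      nonnegative merge coefficients, one per adapter.
    """
    w = np.asarray(w, dtype=float)
    assert len(deltas) == len(w) and np.all(w >= 0)
    merged = theta0.copy()
    for wi, di in zip(w, deltas):
        merged += wi * di
    return merged

# Toy example: a 4x4 "layer" with two adapters.
rng = np.random.default_rng(0)
theta0 = rng.normal(size=(4, 4))
deltas = [rng.normal(size=(4, 4)) for _ in range(2)]
w = [0.6, 0.3]  # lies inside the unit simplex (nonnegative, sum <= 1)
theta = merge_adapters(theta0, deltas, w)
```

In practice the same coefficient vector $w$ is applied across all adapted layers, so the search space stays $d$-dimensional regardless of model size.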
The central challenge is to optimize a latent utility function $u(w)$, the user's subjective quality assessment, over this space, despite access only to pairwise image preferences $I(w_0) \succ I(w_1)$, with:
- $u$ latent (never directly observed),
- $I(\cdot)$ the deterministic image synthesis mapping under a fixed prompt and fixed diffusion inference settings,
- preferences obtained from user comparisons of $I(w_0)$ vs. $I(w_1)$.
Existing approaches such as manual or slider-based exploration become infeasible even for modest $d$, due to the combinatorial growth in possibilities. In contrast, GimmBO employs human-in-the-loop PBO, both learning a surrogate model of user preference efficiently and proposing new queries to optimize $u$.
2. Preferential Bayesian Optimization Framework
GimmBO adopts a probabilistic surrogate model for the latent utility, placing a Gaussian process (GP) prior $u \sim \mathcal{GP}(m, k)$ with a selectable kernel (Matérn or RBF for small $d$; a SAAS (Sparse Axis-Aligned Subspace) prior for high-dimensional settings) and a mean function $m$ typically set to zero.
User interaction manifests as a dataset $\mathcal{D} = \{(w_a^{(j)}, w_b^{(j)})\}_j$ of pairwise preferences $w_a \succ w_b$, modeled by a probit likelihood (Chu & Ghahramani, 2005), $p(w_a \succ w_b \mid u) = \Phi\big((u(w_a) - u(w_b))/(\sqrt{2}\,\sigma)\big)$, with $\Phi$ the standard normal CDF and $\sigma$ representing human inconsistency.
Posterior inference proceeds by MAP estimation of the latent utilities at the observed points $\{w^{(j)}\}$, interrogating the GP posterior (hyperparameters inferred by NUTS under a SAAS prior in high dimensions) and yielding a mixture model over $u$ (averaging predictions over hyperparameter samples), from which the predictive mean $\mu(w)$ and variance $\sigma^2(w)$ are extracted for query selection.
3. Two-Stage Sampling and Optimization Strategy
GimmBO introduces a two-stage BO regime that exploits empirical properties of adapter merges—namely sparsity of active coefficients and dominance of bounded regions.
Stage 1 (Coarse, Sparse Search):
- The search domain is a capped simplex $\Delta_c = \{\,w \ge 0 : \textstyle\sum_i w_i \le c\,\}$, with the cap $c$ a tuned hyperparameter.
- Initialization uses randomized stick-breaking (truncated Dirichlet), followed by thresholding small coefficients to enforce additional sparsity.
- Acquisition uses the UCB (Upper Confidence Bound) criterion $\alpha(w) = \mu(w) + \beta\,\sigma(w)$, with batches of candidates optimized via multi-start L-BFGS-B in the stick-breaking parameterization.
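A minimal sketch of the Stage 1 ingredients, assuming a unit cap, an illustrative sparsity threshold, and a standard UCB trade-off parameter (all hyperparameter values here are placeholders, not those of the paper):

```python
import numpy as np

def stick_breaking_init(d, cap=1.0, tau=0.05, rng=None):
    """Sample a sparse starting point on the capped simplex: draw from a
    symmetric Dirichlet (equivalent to randomized stick-breaking), scale by
    the cap, then zero out coefficients below a threshold tau to enforce
    additional sparsity.  cap and tau are illustrative values."""
    rng = rng or np.random.default_rng()
    w = rng.dirichlet(np.ones(d)) * cap
    w[w < tau] = 0.0
    return w

def ucb(mu, sigma, beta=2.0):
    """Upper Confidence Bound acquisition: predictive mean plus
    beta times predictive standard deviation (beta is a placeholder)."""
    return mu + beta * sigma
```

In the full system these would seed the multi-start L-BFGS-B optimization of the acquisition surface; the sketch only shows the sampling and scoring rules themselves.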
Stage 2 (Polishing Active Set):
- After a fixed number of Stage 1 iterations, the nonzero coefficients of the best-performing candidates define a reduced-dimension active set.
- The search proceeds in the reduced simplex, restricting all other coordinates to zero, with the GP re-initialized and refined over the existing evaluations for the remaining iterations.
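The Stage 1 to Stage 2 hand-off can be sketched as an active-set extraction. The helper names and the zero threshold below are illustrative assumptions:

```python
import numpy as np

def active_set(w_best, eps=1e-8):
    """Indices of the nonzero coefficients in the best Stage 1 candidate;
    Stage 2 optimizes only over these coordinates."""
    return np.flatnonzero(np.abs(w_best) > eps)

def restrict(w_full, idx):
    """Project a full coefficient vector onto the reduced simplex:
    keep the active coordinates, clamp every other coordinate to zero."""
    w = np.zeros_like(w_full)
    w[idx] = w_full[idx]
    return w
```

Because the GP in Stage 2 is re-initialized over the active coordinates only, the effective dimension of the search drops to the size of the active set, which is what makes the polishing phase sample-efficient.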
Variable relevance is selected via the SAAS prior in high dimensions, performing Bayesian model selection and dimensionality reduction.
4. Interactive User Interface and Optimization Loop
GimmBO’s interactive workflow iterates over the following steps:
- Batch proposal: The PBO backend selects a batch of candidate $w$ vectors by maximizing the exploit/explore acquisition criterion.
- Render: Images $I(w)$ are synthesized for these candidates, along with retrieval of several high-utility past samples.
- Preference elicitation: The user is presented with the batch of images pre-sorted by GP mean and prompted to provide a top-$k$ ranking.
- Data augmentation: Rankings induce pairwise comparisons, expanding $\mathcal{D}$ for GP updating.
- Surrogate update: MAP inference for utilities is performed, re-estimating GP hyperparameters (NUTS).
- Iteration: The next batch is proposed based on the current posterior.
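The data-augmentation step above can be sketched as follows. One plausible expansion rule (an assumption, not a detail stated in the source) is that each ranked image is preferred over all images ranked below it and over every unranked image in the same batch:

```python
def ranking_to_pairs(ranked, unranked):
    """Expand a user's top-k ranking into pairwise preferences.

    ranked:   image ids in preference order (best first).
    unranked: the remaining image ids in the batch.
    Returns (a, b) tuples meaning "a preferred over b"."""
    pairs = []
    for i, a in enumerate(ranked):
        for b in ranked[i + 1:]:
            pairs.append((a, b))   # a beats every lower-ranked image
        for b in unranked:
            pairs.append((a, b))   # a beats every unranked image
    return pairs

# A top-3 ranking of a 5-image batch yields 3 + (2+3) + (1+... ) = 9 pairs:
pairs = ranking_to_pairs(["img2", "img0", "img4"], ["img1", "img3"])
```

This quadratic expansion is why a single ranking interaction feeds the probit-GP surrogate many more comparisons than a single pairwise query would.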
Additional heuristics include "free" past samples that strengthen the model without extra rendering, automatic UI transition from Stage 1 to Stage 2 after iteration 11, and slider constraints during Stage 1.
5. Experimental Methodology and Evaluation
Simulated User Studies
- Setup: 20-dimensional problem instances, with 5 initialization iterations and 20 subsequent optimization iterations.
- Metrics:
- DreamSim similarity (normalized [0,1]) to the target image.
- F1 score for the recovered support of the ground-truth coefficient vector $w^\ast$.
- Baselines:
- Sequential Slider BO (1 sample/iteration).
- Gallery BO (a grid of samples per iteration).
- Random coordinate descent.
- Random directional descent.
Results Summary
| Method | DreamSim (10 iters) | DreamSim (20 iters) | Support F1 |
|---|---|---|---|
| GimmBO | 0.90 | >0.95 | ~0.95 |
| Baselines | — | plateau at 0.80–0.85 | <0.6 |

In 30D/40D stress tests, GimmBO exceeds the strongest baseline by roughly 0.10–0.15 DreamSim.
- A plausible implication is that GimmBO's two-stage strategy ensures both convergence and scalable performance as the dimensionality $d$ increases.
User Study (12 participants)
- Interfaces Compared: Slider, Gallery, Top-$k$ (GimmBO)
- Outcomes:
- GimmBO (Top-$k$): Final DreamSim ≈ 0.91, success rate (DreamSim > 0.90) of 75%
- Gallery: 0.85 DreamSim, 50% success
- Slider: 0.82 DreamSim, 40% success
- Subjective: The top-$k$ ranking interaction was considered more engaging, better guided, and lower in cognitive load than the alternatives.
Ablation Findings
- The selected simplex cap yields the best performance among the bounds tested; both the looser and the tighter alternatives underperform.
- Top-5 ranking doubles sample efficiency relative to top-1.
- Absence of “free” past samples attenuates convergence by 20–30%.
6. Applications, Limitations, and Future Directions
GimmBO is directly applicable to style blending, novel concept composition, and fine-grained content merging in creative diffusion-based image generation. The linear adapter merging weights $w$ identified can serve as reusable presets applicable to new prompts (e.g., via SDEdit).
Integration with community-driven adapter repositories (e.g. Stylus) is facilitated via the plug-and-play architecture.
The present methodology is limited to linear merging; extensions to nonlinear approaches (such as Fisher-weighted merges) remain unexplored. Preference violations of transitivity (as identified by Tversky & Kahneman 1981) may affect the GP posterior’s representational fidelity; more sophisticated feedback models could address this. The stick-breaking acquisition can induce coordinate bias—projection-based methods are alternatives. Diffusion inference latency is a bottleneck, suggesting value in asynchronous or anticipatory UI designs.
GimmBO establishes a robust framework for interactively exploring high-dimensional, subjectively-evaluated generative model spaces, combining domain-specific statistical priors, efficient PBO, and user-centric preference elicitation (Liu et al., 26 Jan 2026).