
GimmBO: Bayesian Adapter Merging

Updated 2 February 2026
  • GimmBO is an interactive framework for merging low-rank adapters in diffusion-based image synthesis using a probabilistic approach.
  • It employs a two-stage Preferential Bayesian Optimization strategy with Gaussian process priors to efficiently explore high-dimensional merging coefficient spaces.
  • Experimental results show enhanced convergence, sample efficiency, and user engagement compared to manual slider tuning and random search methods.

GimmBO (Generative Image Model Merging via Bayesian Optimization) is an interactive framework for high-dimensional adapter merging in diffusion-based image generation, targeting efficient optimization of subjective, user-driven visual objectives via Preferential Bayesian Optimization (PBO). It addresses the exploration of the vast, sparse merging-coefficient spaces arising from community-created, fine-tuned adapters, streamlining workflows that previously relied on inadequate manual slider-based tuning (Liu et al., 26 Jan 2026).

1. Problem Formulation and Motivation

Given a pretrained diffusion model with weights $W_0$ and a collection of $N$ adapters $\{\Delta W_1, \ldots, \Delta W_N\}$ (most commonly low-rank adapters such as LoRA), GimmBO investigates the space of image generators formed by linear adapter merges:

$$W_{\mathrm{merged}}(w) = W_0 + \sum_{i=1}^{N} w_i \,\Delta W_i,$$

where $w \in \mathbb{R}^N_{\ge 0}$ parameterizes the nonnegative merge coefficients. The feasible set is typically the unit simplex $\Delta = \{ w \in \mathbb{R}^N_{\ge 0} \mid \sum_i w_i = 1 \}$ or a bounded variant.
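As a concrete illustration of the merge equation above, the following sketch linearly combines adapter updates into a base weight matrix (the function name and toy shapes are illustrative, not from the paper):

```python
import numpy as np

def merge_adapters(W0, deltas, w):
    """Linearly merge N adapter updates into the base weights.

    W0     : base weight matrix
    deltas : list of N adapter updates Delta W_i (same shape as W0)
    w      : nonnegative merge coefficients, typically on the simplex
    """
    w = np.asarray(w, dtype=float)
    assert np.all(w >= 0), "merge coefficients must be nonnegative"
    merged = W0.copy()
    for wi, dWi in zip(w, deltas):
        merged += wi * dWi  # W_merged = W0 + sum_i w_i * Delta W_i
    return merged

# Toy example: two rank-1 "adapters" over a 4x4 base matrix.
rng = np.random.default_rng(0)
W0 = rng.standard_normal((4, 4))
deltas = [np.outer(rng.standard_normal(4), rng.standard_normal(4))
          for _ in range(2)]
W = merge_adapters(W0, deltas, [0.7, 0.3])
```

In practice this merge is applied per layer to the low-rank adapter deltas; the sketch uses a single dense matrix for brevity.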

The central challenge is to optimize a latent utility function $f(w)$, the user's subjective quality assessment, over this space, despite access only to pairwise image preferences:

$$\max_{w \in \Delta} f(w)$$

with
  • $f: \Delta \rightarrow \mathbb{R}$ latent (never directly observed),
  • $g(w)$ the deterministic image-synthesis mapping under a fixed prompt and fixed diffusion inference settings,
  • preferences obtained from user comparisons of rendered images $g(w_i)$ vs. $g(w_j)$.

Existing approaches such as manual or slider-based exploration become infeasible even for modest $N$, due to the combinatorial growth in possibilities. In contrast, GimmBO employs human-in-the-loop PBO, both learning a surrogate model of user preference efficiently and proposing new queries to optimize $f$.

2. Preferential Bayesian Optimization Framework

GimmBO adopts a probabilistic surrogate model for the latent utility, placing a Gaussian process (GP) prior $f \sim \mathcal{GP}(m(w), k(w, w'))$ with a selectable kernel (Matérn or RBF for small $N$; a SAAS (Sparse Axis-Aligned Subspace) prior for high-dimensional settings) and mean function $m$ typically set to zero.

User interaction manifests as a dataset $\mathcal{D}$ of pairwise preferences, each recording that one merge $w_i$ was preferred over another $w_j$, and is modeled by a probit likelihood (Chu & Ghahramani, 2005): $P(w_i \succ w_j \mid f) = \Phi\!\big((f(w_i) - f(w_j)) / (\sqrt{2}\,\sigma)\big)$, with $\Phi$ the standard normal CDF and $\sigma$ a noise scale representing human inconsistency.
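The probit preference likelihood is standard and easy to compute; a minimal sketch (the function name is illustrative):

```python
import numpy as np
from scipy.stats import norm

def probit_preference_loglik(f_i, f_j, sigma=1.0):
    """Log-probability that the user prefers item i over item j,
    given latent utilities f_i, f_j and noise scale sigma
    (probit model of Chu & Ghahramani, 2005)."""
    z = (f_i - f_j) / (np.sqrt(2.0) * sigma)
    return norm.logcdf(z)

# Equal utilities -> the preference is a coin flip (probability 0.5).
p_equal = np.exp(probit_preference_loglik(0.0, 0.0))
```

Larger utility gaps drive the preference probability toward 1, while larger $\sigma$ flattens it toward 0.5, modeling a noisier rater.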

Posterior inference proceeds by MAP estimation of the latent utilities at the observed merge coefficients, interrogating the GP posterior (hyperparameters inferred by NUTS under the SAAS prior in high dimensions) and yielding a mixture model for the posterior over $f$, from which the predictive mean $\mu(w)$ and variance $\sigma^2(w)$ are extracted for query selection.
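A compact sketch of the MAP step under a zero-mean GP prior and the probit likelihood, assuming a fixed RBF kernel with illustrative hyperparameters (the paper's full pipeline additionally infers hyperparameters with NUTS, which is omitted here):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def rbf_kernel(X, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel matrix over the rows of X."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

def map_latent_utilities(X, prefs, sigma=1.0, jitter=1e-6):
    """MAP estimate of latent utilities f at the observed points X.

    prefs is a list of (i, j) index pairs meaning 'x_i preferred over x_j'.
    Minimizes the (convex) negative log posterior:
    -sum log Phi((f_i - f_j)/(sqrt(2) sigma)) + 0.5 f^T K^{-1} f.
    """
    K = rbf_kernel(X) + jitter * np.eye(len(X))
    K_inv = np.linalg.inv(K)

    def neg_log_posterior(f):
        z = np.array([(f[i] - f[j]) / (np.sqrt(2.0) * sigma)
                      for i, j in prefs])
        return -norm.logcdf(z).sum() + 0.5 * f @ K_inv @ f

    res = minimize(neg_log_posterior, np.zeros(len(X)), method="L-BFGS-B")
    return res.x

# Three 1-D points with fully consistent preferences: x2 > x1 > x0.
X = np.array([[0.0], [0.5], [1.0]])
prefs = [(2, 0), (2, 1), (1, 0)]
f_map = map_latent_utilities(X, prefs)
```

Because the probit log-likelihood is concave and the prior term quadratic, the MAP problem is convex and L-BFGS-B converges reliably; the recovered utilities respect the stated ordering.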

3. Two-Stage Sampling and Optimization Strategy

GimmBO introduces a two-stage BO regime that exploits empirical properties of adapter merges—namely sparsity of active coefficients and dominance of bounded regions.

Stage 1 (Coarse, Sparse Search):

  • The search domain is a capped simplex, i.e. the nonnegative orthant with an upper bound on the total coefficient mass $\sum_i w_i$.
  • Initialization uses randomized stick-breaking (truncated Dirichlet), followed by thresholding small coefficients to enforce additional sparsity.
  • Acquisition through the UCB (Upper Confidence Bound) criterion:

$$\alpha_{\mathrm{UCB}}(w) = \mu(w) + \beta\,\sigma(w),$$

with exploration weight $\beta$, and batches of candidate points optimized via multi-start L-BFGS-B in the stick-breaking parameterization.
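The Stage-1 initialization and acquisition steps can be sketched as follows; the cap, threshold, Beta parameters, and $\beta$ are illustrative values, not the paper's:

```python
import numpy as np

def stick_breaking_sample(n, rng, cap=1.0, threshold=0.05):
    """Draw a sparse point on the capped simplex via randomized
    stick-breaking, then threshold small coefficients for extra sparsity.
    (cap, threshold, and the Beta(1, 2) fractions are illustrative.)"""
    order = rng.permutation(n)
    w = np.zeros(n)
    remaining = cap
    for idx in order[:-1]:
        frac = rng.beta(1.0, 2.0)   # random fraction of the remaining mass
        w[idx] = frac * remaining
        remaining -= w[idx]
    w[order[-1]] = remaining
    w[w < threshold] = 0.0          # enforce additional sparsity
    w *= cap / w.sum()              # renormalize onto the capped simplex
    return w

def ucb(mu, sigma, beta=2.0):
    """Upper Confidence Bound acquisition: alpha(w) = mu(w) + beta * sigma(w)."""
    return mu + beta * sigma

rng = np.random.default_rng(42)
w = stick_breaking_sample(10, rng)
```

In the full method the UCB score is evaluated against the preference-trained GP posterior and maximized by multi-start L-BFGS-B; here `mu` and `sigma` are passed in directly for clarity.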

Stage 2 (Polishing Active Set):

  • After a fixed number of Stage-1 iterations, the nonzero coefficients of the best merges found so far define a reduced-dimension active set.
  • The search proceeds in the reduced simplex, fixing all other coefficients to zero, with the GP re-initialized and refined over the existing evaluations for further iterations.

Variable relevance is selected via the SAAS prior in high dimensions, performing Bayesian model selection and dimensionality reduction.

4. Interactive User Interface and Optimization Loop

GimmBO’s interactive workflow iterates over the following steps:

  1. Batch proposal: The PBO backend selects a batch of candidate merge-coefficient vectors $w$ by maximizing exploit/explore criteria.
  2. Render: Images $g(w)$ are synthesized for these candidates, along with retrieval of several high-utility past samples.
  3. Preference elicitation: The user is presented with the images pre-sorted by GP mean and prompted to rank their top $k$.
  4. Data augmentation: Rankings induce pairwise comparisons, expanding the preference dataset for GP updating.
  5. Surrogate update: MAP inference for the utilities is performed, re-estimating GP hyperparameters (NUTS).
  6. Iteration: The next batch is proposed based on the current posterior.
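The data-augmentation step (ranking to pairwise comparisons) admits a simple expansion; one plausible convention, sketched below, is that every ranked item beats each item ranked below it and every item the user left unranked (whether unranked items count as losses is an assumption, not stated in the source):

```python
from itertools import combinations

def ranking_to_pairs(ranked_ids, unranked_ids):
    """Expand a user's top-k ranking into (winner, loser) preference pairs.

    ranked_ids   : item ids in preference order, best first
    unranked_ids : shown items the user left out of the top-k
    """
    # Each ranked item beats everything ranked after it...
    pairs = [(a, b) for a, b in combinations(ranked_ids, 2)]
    # ...and (by assumption) everything the user declined to rank.
    pairs += [(a, b) for a in ranked_ids for b in unranked_ids]
    return pairs

pairs = ranking_to_pairs(["img3", "img1"], ["img0", "img2"])
```

A top-$k$ ranking over a batch of $m$ images thus yields $\binom{k}{2} + k(m-k)$ comparisons from a single interaction, which is why ranking is far more sample-efficient than one comparison per round.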

Additional heuristics include "free" past samples to strengthen the model without extra rendering, an automatic UI transition from Stage 1 to Stage 2 after a fixed iteration count, and slider constraints during Stage 1.

5. Experimental Methodology and Evaluation

Simulated User Studies

  • Setup: 20-dimensional problem instances, with 5 initialization iterations and 20 subsequent BO iterations.
  • Metrics:
    • DreamSim similarity (normalized [0,1]) to the target image.
    • F1 score for the recovered support of $w$.
  • Baselines:
    • Sequential Slider BO (1 sample/iteration).
    • Gallery BO (2 samples/iteration in a 3×3 grid).
    • Random coordinate descent.
    • Random directional descent.
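The support-F1 metric above can be computed directly from the estimated and ground-truth coefficient vectors; a minimal sketch (the tolerance is an illustrative choice):

```python
import numpy as np

def support_f1(w_est, w_true, tol=1e-8):
    """F1 score of the recovered support: which adapters receive
    nonzero weight in the estimate vs. the ground truth."""
    est = np.abs(np.asarray(w_est)) > tol
    true = np.abs(np.asarray(w_true)) > tol
    tp = np.sum(est & true)
    if tp == 0:
        return 0.0
    precision = tp / est.sum()
    recall = tp / true.sum()
    return 2 * precision * recall / (precision + recall)

# One correct and one spurious nonzero out of two true supports.
f1 = support_f1([0.5, 0.0, 0.5, 0.0], [0.5, 0.5, 0.0, 0.0])
```

A perfect recovery of the active adapter set yields F1 = 1, regardless of the exact coefficient magnitudes.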

Results Summary

| Method | DreamSim (10 iters) | DreamSim (20 iters) | Support F1 | Plateau DreamSim (baselines) |
|--------|---------------------|---------------------|------------|------------------------------|
| GimmBO | 0.90 | >0.95 | ~0.95 | 0.80–0.85 |
| Baselines | — | — | <0.6 | — |

In 30D/40D stress tests, GimmBO's DreamSim remains ~0.10–0.15 above the baselines.
  • A plausible implication is that GimmBO's two-stage strategy ensures both convergence and scalable performance as $N$ increases.

User Study (12 participants)

  • Interfaces Compared: Slider, Gallery, Top-$k$ ranking (GimmBO)
  • Outcomes:
    • GimmBO Top-$k$: final DreamSim ≈ 0.91, success rate (DreamSim > 0.90) of 75%
    • Gallery: 0.85 DreamSim, 50% success
    • Slider: 0.82 DreamSim, 40% success
    • Subjective: Top-$k$ ranking was considered more engaging and better guided, and reduced cognitive load relative to the alternatives.

Ablation Findings

  • The selected simplex cap outperforms both tighter and looser bounds.
  • Top-5 ranking doubles sample efficiency relative to top-1.
  • Absence of “free” past samples attenuates convergence by 20–30%.

6. Applications, Limitations, and Future Directions

GimmBO is directly applicable to style blending, novel concept composition, and fine-grained content merging in creative diffusion-based image generation. The linear adapter merging weights $w$ identified can serve as reusable presets applicable to new prompts (e.g., via SDEdit).

Integration with community-driven adapter repositories (e.g. Stylus) is facilitated via the plug-and-play architecture.

The present methodology is limited to linear merging; extensions to nonlinear approaches (such as Fisher-weighted merges) remain unexplored. Violations of preference transitivity (as identified by Tversky & Kahneman, 1981) may affect the representational fidelity of the GP posterior; more sophisticated feedback models could address this. The stick-breaking acquisition can induce coordinate bias, for which projection-based methods are alternatives. Diffusion inference latency is a bottleneck, suggesting value in asynchronous or anticipatory UI designs.

GimmBO establishes a robust framework for interactively exploring high-dimensional, subjectively-evaluated generative model spaces, combining domain-specific statistical priors, efficient PBO, and user-centric preference elicitation (Liu et al., 26 Jan 2026).
