GimmBO: Bayesian Adapter Merging
- GimmBO is an interactive framework for merging low-rank adapters in diffusion-based image synthesis using a probabilistic approach.
- It employs a two-stage Preferential Bayesian Optimization strategy with Gaussian process priors to efficiently explore high-dimensional merging coefficient spaces.
- Experimental results show enhanced convergence, sample efficiency, and user engagement compared to manual slider tuning and random search methods.
GimmBO (Generative Image Model Merging via Bayesian Optimization) is an interactive framework for high-dimensional adapter merging in diffusion-based image generation, designed to efficiently optimize subjective, user-driven visual objectives via Preferential Bayesian Optimization (PBO). It addresses the exploration of the vast, sparse merging-coefficient spaces that arise from community-created, fine-tuned adapters, streamlining workflows that previously relied on inadequate manual slider-based tuning (Liu et al., 26 Jan 2026).
1. Problem Formulation and Motivation
Given a pretrained diffusion model with weights $\theta_0$ and a collection of $d$ adapters (most commonly low-rank adapters such as LoRA) with weight deltas $\Delta\theta_1, \dots, \Delta\theta_d$, GimmBO investigates the space of image generators formed by linear adapter merges $\theta(w) = \theta_0 + \sum_{i=1}^{d} w_i \,\Delta\theta_i$, where $w = (w_1, \dots, w_d)$ parameterizes the nonnegative merge coefficients. The feasible set is typically the unit simplex or its bounded variant.
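The linear merge above admits a direct sketch. The following is a minimal NumPy illustration, assuming per-layer adapter deltas of the same shape as the base weights; the `merge_adapters` helper and the toy shapes are illustrative, not from the paper:

```python
import numpy as np

def merge_adapters(theta0, deltas, w):
    """Linearly merge adapter weight deltas into the base weights:
    theta(w) = theta0 + sum_i w_i * delta_i.

    theta0: base model weight tensor for one layer.
    deltas: list of adapter deltas (e.g. LoRA products B @ A), same shape as theta0.
    w:      nonnegative merge coefficients, one per adapter.
    """
    w = np.asarray(w, dtype=float)
    assert len(deltas) == len(w) and np.all(w >= 0)
    merged = theta0.copy()
    for wi, di in zip(w, deltas):
        merged += wi * di
    return merged

# Toy example: a 4x4 "layer" with two adapters.
rng = np.random.default_rng(0)
theta0 = rng.normal(size=(4, 4))
deltas = [rng.normal(size=(4, 4)) for _ in range(2)]
w = [0.6, 0.3]  # lies inside the unit simplex (nonnegative, sum <= 1)
theta = merge_adapters(theta0, deltas, w)
```

In practice the same coefficient vector $w$ is applied across all adapted layers, so the search space stays $d$-dimensional regardless of model size.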
The central challenge is to optimize a latent utility function $u(w)$, the user's subjective quality assessment, over this space, despite access only to pairwise image preferences $I(w_0) \succ I(w_1)$, with:
- $u$ latent (never directly observed),
- $I(\cdot)$ the deterministic image synthesis mapping under a fixed prompt and fixed diffusion inference settings,
- preferences obtained from user comparisons of $I(w_0)$ vs. $I(w_1)$.
Existing approaches such as manual or slider-based exploration become infeasible even for modest $d$, due to the combinatorial growth in possibilities. In contrast, GimmBO employs human-in-the-loop PBO, both learning a surrogate model of user preference efficiently and proposing new queries to optimize $u$.
2. Preferential Bayesian Optimization Framework
GimmBO adopts a probabilistic surrogate model for the latent utility, placing a Gaussian process (GP) prior $u \sim \mathcal{GP}(m, k)$ with a selectable kernel (Matérn or RBF for small $d$; a SAAS (Sparse Axis-Aligned Subspace) prior for high-dimensional settings) and a mean function $m$ typically set to zero.
User interaction manifests as a dataset $\mathcal{D} = \{(w_a^{(j)}, w_b^{(j)})\}_j$ of pairwise preferences $w_a \succ w_b$, modeled by a probit likelihood (Chu & Ghahramani, 2005), $p(w_a \succ w_b \mid u) = \Phi\big((u(w_a) - u(w_b))/(\sqrt{2}\,\sigma)\big)$, with $\Phi$ the standard normal CDF and $\sigma$ representing human inconsistency.
Posterior inference proceeds by MAP estimation of the latent utilities at the observed points $\{w^{(j)}\}$, interrogating the GP posterior (hyperparameters inferred by NUTS under a SAAS prior in high dimensions) and yielding a mixture model over $u$ (averaging predictions over hyperparameter samples), from which the predictive mean $\mu(w)$ and variance $\sigma^2(w)$ are extracted for query selection.
3. Two-Stage Sampling and Optimization Strategy
GimmBO introduces a two-stage BO regime that exploits empirical properties of adapter merges—namely sparsity of active coefficients and dominance of bounded regions.
Stage 1 (Coarse, Sparse Search):
- The search domain is a capped simplex $\Delta_c = \{\,w \ge 0 : \textstyle\sum_i w_i \le c\,\}$, with the cap $c$ a tuned hyperparameter.
- Initialization uses randomized stick-breaking (truncated Dirichlet), followed by thresholding small coefficients to enforce additional sparsity.
- Acquisition uses the UCB (Upper Confidence Bound) criterion $\alpha(w) = \mu(w) + \beta\,\sigma(w)$, with batches of candidates optimized via multi-start L-BFGS-B in the stick-breaking parameterization.
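A minimal sketch of the Stage 1 ingredients, assuming a unit cap, an illustrative sparsity threshold, and a standard UCB trade-off parameter (all hyperparameter values here are placeholders, not those of the paper):

```python
import numpy as np

def stick_breaking_init(d, cap=1.0, tau=0.05, rng=None):
    """Sample a sparse starting point on the capped simplex: draw from a
    symmetric Dirichlet (equivalent to randomized stick-breaking), scale by
    the cap, then zero out coefficients below a threshold tau to enforce
    additional sparsity.  cap and tau are illustrative values."""
    rng = rng or np.random.default_rng()
    w = rng.dirichlet(np.ones(d)) * cap
    w[w < tau] = 0.0
    return w

def ucb(mu, sigma, beta=2.0):
    """Upper Confidence Bound acquisition: predictive mean plus
    beta times predictive standard deviation (beta is a placeholder)."""
    return mu + beta * sigma
```

In the full system these would seed the multi-start L-BFGS-B optimization of the acquisition surface; the sketch only shows the sampling and scoring rules themselves.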
Stage 2 (Polishing Active Set):
- After a fixed number of Stage 1 iterations, the nonzero coefficients of the best-performing candidates define a reduced-dimension active set.
- The search proceeds in the reduced simplex, restricting all other coordinates to zero, with the GP re-initialized and refined over the existing evaluations for the remaining iterations.
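The Stage 1 to Stage 2 hand-off can be sketched as an active-set extraction. The helper names and the zero threshold below are illustrative assumptions:

```python
import numpy as np

def active_set(w_best, eps=1e-8):
    """Indices of the nonzero coefficients in the best Stage 1 candidate;
    Stage 2 optimizes only over these coordinates."""
    return np.flatnonzero(np.abs(w_best) > eps)

def restrict(w_full, idx):
    """Project a full coefficient vector onto the reduced simplex:
    keep the active coordinates, clamp every other coordinate to zero."""
    w = np.zeros_like(w_full)
    w[idx] = w_full[idx]
    return w
```

Because the GP in Stage 2 is re-initialized over the active coordinates only, the effective dimension of the search drops to the size of the active set, which is what makes the polishing phase sample-efficient.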
Variable relevance is selected via the SAAS prior in high dimensions, performing Bayesian model selection and dimensionality reduction.
4. Interactive User Interface and Optimization Loop
GimmBO’s interactive workflow iterates over the following steps:
- Batch proposal: The PBO backend selects a batch of candidate $w$ vectors by maximizing the exploit/explore acquisition criterion.
- Render: Images $I(w)$ are synthesized for these candidates, along with retrieval of several high-utility past samples.
- Preference elicitation: The user is presented with the batch of images pre-sorted by GP mean and prompted to provide a top-$k$ ranking.
- Data augmentation: Rankings induce pairwise comparisons, expanding $\mathcal{D}$ for GP updating.
- Surrogate update: MAP inference for utilities is performed, re-estimating GP hyperparameters (NUTS).
- Iteration: The next batch is proposed based on the current posterior.
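The data-augmentation step above can be sketched as follows. One plausible expansion rule (an assumption, not a detail stated in the source) is that each ranked image is preferred over all images ranked below it and over every unranked image in the same batch:

```python
def ranking_to_pairs(ranked, unranked):
    """Expand a user's top-k ranking into pairwise preferences.

    ranked:   image ids in preference order (best first).
    unranked: the remaining image ids in the batch.
    Returns (a, b) tuples meaning "a preferred over b"."""
    pairs = []
    for i, a in enumerate(ranked):
        for b in ranked[i + 1:]:
            pairs.append((a, b))   # a beats every lower-ranked image
        for b in unranked:
            pairs.append((a, b))   # a beats every unranked image
    return pairs

# A top-3 ranking of a 5-image batch yields 3 + (2+3) + (1+... ) = 9 pairs:
pairs = ranking_to_pairs(["img2", "img0", "img4"], ["img1", "img3"])
```

This quadratic expansion is why a single ranking interaction feeds the probit-GP surrogate many more comparisons than a single pairwise query would.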
Additional heuristics include "free" past samples that strengthen the model without extra rendering, automatic UI transition from Stage 1 to Stage 2 after iteration 11, and slider constraints during Stage 1.
5. Experimental Methodology and Evaluation
Simulated User Studies
- Setup: 20-dimensional problem instances, with 5 initialization iterations and 20 subsequent optimization iterations.
- Metrics:
- DreamSim similarity (normalized [0,1]) to the target image.
- F1 score for the recovered support of the ground-truth coefficient vector $w^\ast$.
- Baselines:
- Sequential Slider BO (1 sample/iteration).
- Gallery BO (a grid of samples per iteration).
- Random coordinate descent.
- Random directional descent.
Results Summary
| Method | DreamSim (10 iters) | DreamSim (20 iters) | Support F1 |
|---|---|---|---|
| GimmBO | 0.90 | >0.95 | ~0.95 |
| Baselines | — | plateau at 0.80–0.85 | <0.6 |

In 30D/40D stress tests, GimmBO exceeds the strongest baseline by roughly 0.10–0.15 DreamSim.
- A plausible implication is that GimmBO's two-stage strategy ensures both convergence and scalable performance as the dimensionality $d$ increases.
User Study (12 participants)
- Interfaces Compared: Slider, Gallery, Top-$k$ (GimmBO)
- Outcomes:
- GimmBO (Top-$k$): Final DreamSim ≈ 0.91, success rate (DreamSim > 0.90) of 75%
- Gallery: 0.85 DreamSim, 50% success
- Slider: 0.82 DreamSim, 40% success
- Subjective: The top-$k$ ranking interaction was considered more engaging, better guided, and lower in cognitive load than the alternatives.
Ablation Findings
- The selected simplex cap yields the best performance among the bounds tested; both the looser and the tighter alternatives underperform.
- Top-5 ranking doubles sample efficiency relative to top-1.
- Absence of “free” past samples attenuates convergence by 20–30%.
6. Applications, Limitations, and Future Directions
GimmBO is directly applicable to style blending, novel concept composition, and fine-grained content merging in creative diffusion-based image generation. The linear adapter merging weights $w$ identified can serve as reusable presets applicable to new prompts (e.g., via SDEdit).
Integration with community-driven adapter repositories (e.g. Stylus) is facilitated via the plug-and-play architecture.
The present methodology is limited to linear merging; extensions to nonlinear approaches (such as Fisher-weighted merges) remain unexplored. Preference violations of transitivity (as identified by Tversky & Kahneman 1981) may affect the GP posterior’s representational fidelity; more sophisticated feedback models could address this. The stick-breaking acquisition can induce coordinate bias—projection-based methods are alternatives. Diffusion inference latency is a bottleneck, suggesting value in asynchronous or anticipatory UI designs.
GimmBO establishes a robust framework for interactively exploring high-dimensional, subjectively-evaluated generative model spaces, combining domain-specific statistical priors, efficient PBO, and user-centric preference elicitation (Liu et al., 26 Jan 2026).