GASP: Latent Bayesian Optimization
- GASP is a framework that uses learned latent representations and Gaussian process surrogates to efficiently explore high-dimensional, structured input spaces.
- It integrates deep generative models, multi-task learning, and meta-learning to address challenges in robotics, combinatorial design, and AutoML.
- The method leverages advanced acquisition functions and dynamic compression strategies to optimize latent spaces while reducing evaluation costs.
Latent Bayesian Optimization (GASP) encompasses a family of Bayesian optimization (BO) methods that employ probabilistic surrogates on learned or constructed latent spaces for sample-efficient black-box optimization, particularly in scenarios with structured, high-dimensional, or heterogeneous input domains. Instead of direct modeling in the original input or objective space—which is often prohibitively large, discontinuous, or otherwise ill-suited for traditional Gaussian process (GP) surrogates—GASP methods learn or define lower-dimensional latent representations that facilitate scalable, data-efficient optimization. These approaches integrate advances from deep generative modeling, multi-task learning, structural embedding, and transfer/meta-learning for problems in robotics, combinatorial design, automated machine learning, and adversarial prompt generation.
1. Foundations and Theoretical Frameworks
Latent Bayesian Optimization (GASP) formalizes BO in a latent (possibly learned) representation, supporting cases where the true input space is high-dimensional, composite, or structured. The canonical form is as follows:
For an unknown black-box function $f: \mathcal{X} \to \mathbb{R}$, optimization is performed by leveraging a mapping

$$z = g(x), \qquad g: \mathcal{X} \to \mathcal{Z},$$

where $g$ is a learned or constructed encoder embedding the input $x \in \mathcal{X} \subseteq \mathbb{R}^D$ into a lower-dimensional latent $z \in \mathcal{Z} \subseteq \mathbb{R}^d$ (with possibly $d \ll D$), enabling GP surrogates or similar probabilistic models.
In the composite-function setting, as in Joint Composite Latent Space Bayesian Optimization (JoCo) (Maus et al., 2023), this extends to

$$f(x) = g\big(h(x)\big), \qquad h: \mathcal{X} \to \mathcal{Y},$$

with explicit encoders $\phi_x: \mathcal{X} \to \mathcal{Z}_x$ for the input $x$ and $\phi_y: \mathcal{Y} \to \mathcal{Z}_y$ for the intermediate output $y = h(x)$, yielding the surrogate chain

$$\phi_x(x) \;\xrightarrow{\ \mathrm{GP}_1\ }\; \phi_y(y) \;\xrightarrow{\ \mathrm{GP}_2\ }\; f(x),$$

where $\mathrm{GP}_1$ (multi-output, from input latents to intermediate-output latents) and $\mathrm{GP}_2$ (scalar, from intermediate-output latents to the objective) are jointly optimized GPs.
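The composite setting can be made concrete with a toy numerical sketch. All names and dimensions here are illustrative: the random matrices `A`, `Ex`, and `Ey` merely stand in for the black-box intermediate map and the two learned encoders.

```python
import numpy as np

# Toy composite objective f(x) = g(h(x)); all names/dimensions are illustrative.
rng = np.random.default_rng(0)
D, M, d = 1000, 200, 8             # input dim, intermediate-output dim, latent dim

A = rng.standard_normal((M, D)) / np.sqrt(D)

def h(x):                          # high-dimensional intermediate output
    return np.tanh(A @ x)

def g(y):                          # scalar objective on the intermediate output
    return -float(np.sum(y ** 2))

Ex = rng.standard_normal((d, D)) / np.sqrt(D)   # stand-in input encoder
Ey = rng.standard_normal((d, M)) / np.sqrt(M)   # stand-in output encoder

x = rng.standard_normal(D)
zx, y = Ex @ x, h(x)               # encode input, compute intermediate output
zy, fx = Ey @ y, g(y)              # encode output, evaluate final objective
print(zx.shape, zy.shape, fx <= 0)  # (8,) (8,) True
```

Both latents are 8-dimensional even though the input and intermediate spaces have 1000 and 200 dimensions, which is what makes GP surrogates tractable on each stage of the chain.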
Alternative frameworks include GASP for task embedding (Atkinson et al., 2020), where discrete or non-Euclidean task features $t$ are assigned continuous latents $z_t$ jointly inferred with the GP, and Ordinal Bayesian Optimization (Picheny et al., 2019), which eschews metric structure in favor of order-based latent embeddings that accommodate discontinuity and nonstationarity.
2. Encoder Architectures and Latent Representation Learning
GASP methods vary in how latent representations are constructed:
- Direct encoding of known structure: For multi-task or heterogeneous settings, discrete system features are embedded as continuous latents and inferred using GP variational inference (Atkinson et al., 2020).
- Unsupervised generative models: Deep generative architectures, such as variational autoencoders (VAEs) and sequential VAEs (SVAEs), are trained to encode complex trajectories, molecules, or combinatorial objects into a tractable latent space (Antonova et al., 2019, Deshwal et al., 2021).
- Supervised/joint training: JoCo (Maus et al., 2023) extends latent BO by jointly training neural encoders for both inputs and (possibly high-dimensional) intermediate outputs, optimizing both for (i) high-fidelity reconstruction and (ii) direct relevance to the final optimization target, rather than reconstruction loss alone.
- Order/ordinal warpings: For discontinuous or ill-conditioned problems, GASP instantiates monotonic, order-preserving warpings of both inputs and outputs, learning step vectors $\delta$ that encode the observed order structure (Picheny et al., 2019).
Encoding strategies may further encompass adversarial regularization for transferability (forcing latent codes of different algorithms to overlap and thus enabling cross-problem information sharing) (Ishikawa et al., 13 Feb 2025).
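As a minimal illustration of the order-based idea, the sketch below replaces observed outputs with normalized ranks, a monotone warping that preserves order while discarding metric scale. This is a simplification for intuition, not the exact step-vector construction of Picheny et al. (2019).

```python
# Hedged sketch: a rank-based, order-preserving warping of observed outputs.
def ordinal_warp(values):
    """Map each value to its normalized rank in [0, 1]; order-preserving."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    for r, i in enumerate(order):
        ranks[i] = r / max(len(values) - 1, 1)   # normalize; guard n == 1
    return ranks

ys = [3.2, -100.0, 3.3, 7e9]      # wildly nonstationary scale
print(ordinal_warp(ys))           # [0.3333333333333333, 0.0, 0.6666666666666666, 1.0]
```

The warped values are well-conditioned for a stationary GP even though the raw outputs span eleven orders of magnitude.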
3. Surrogate Models and Acquisition in Latent Spaces
Across GASP methods, the surrogate model is typically a Gaussian process placed over the latent space, which enables effective modeling even in the presence of high-dimensional, structured, or non-stationary objectives:
- GP priors on latent spaces: The surrogate GP is fit to the outputs observed at decoded latents, with kernels defined over the latent variables. For example, JoCo (Maus et al., 2023) fits multi-output GPs from the encoded input latents $\phi_x(x)$ to the encoded intermediate-output latents $\phi_y(y)$, and scalar GPs from $\phi_y(y)$ to the final objective value $f(x)$.
- Task-augmented surrogates: In BEBO/GASP (Atkinson et al., 2020), the GP is defined on the augmented space $(x, z_t)$, pooling data across tasks/systems by inferring the task latents $z_t$.
- Structure-coupled kernels: For combinatorial or molecular objectives, LADDER (Deshwal et al., 2021) formalizes a kernel $k$ that couples the DGM latent embedding kernel $k_z$ with a structural similarity kernel $k_s$ in the decoded space:

$$K = K_s \odot K_z,$$

where $K_s$ and $K_z$ are the Gram matrices in structural and latent spaces, respectively, and $\odot$ denotes their element-wise coupling.
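One simple way to realize such a coupling, sketched below under illustrative assumptions, is the element-wise (Hadamard) product of the two Gram matrices, which preserves positive semi-definiteness by the Schur product theorem. The toy Jaccard similarity over token sets stands in for a real string or graph kernel over decoded structures.

```python
import numpy as np

rng = np.random.default_rng(1)
Z = rng.standard_normal((5, 3))          # latent embeddings of 5 candidates

def rbf_gram(Z, ls=1.0):
    """RBF Gram matrix over latent embeddings."""
    d2 = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * ls ** 2))

# Stand-in "structural" similarity: a toy Jaccard Gram over decoded token
# sets; a real system would use e.g. a string or graph kernel instead.
structs = [{0, 1}, {1, 2}, {0, 1, 2}, {3}, {1, 3}]
Ks = np.array([[len(a & b) / len(a | b) for b in structs] for a in structs])

Kz = rbf_gram(Z)
K = Ks * Kz       # Hadamard product of two PSD matrices is PSD (Schur)
print(np.all(np.linalg.eigvalsh(K) > -1e-9))   # True -> valid kernel matrix
```

Because each factor is itself a valid kernel matrix, the coupled matrix can be handed directly to a GP without further correction.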
- Dynamic compression: Penalizing or scaling latent regions associated with undesirable outcomes, as in SVAE-DC (Antonova et al., 2019), compresses exploration in the latent space to reject unpromising trajectories.
Acquisition functions are optimized in the latent space and typically include:
- Expected Improvement (EI): Common across GASP applications (Maus et al., 2023, Basani et al., 2024, Ishikawa et al., 13 Feb 2025).
- Thompson Sampling (TS): Used in JoCo (Maus et al., 2023) and rapid empirical ablations (Basani et al., 2024).
- LCB or UCB: For both continuous and combinatorial domains (Picheny et al., 2019, Antonova et al., 2019).

Optimization of the acquisition function is performed via gradient-based or zeroth-order methods (e.g., CMA-ES in LADDER (Deshwal et al., 2021)), depending on the geometry and differentiability of the latent space.
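For reference, the closed-form EI for maximization under a Gaussian posterior $\mathcal{N}(\mu, \sigma^2)$ can be sketched as follows; the optional exploration margin `xi` is a common extension, not specific to any of the cited methods.

```python
import math

def expected_improvement(mu, sigma, best, xi=0.0):
    """Closed-form EI for maximization under a Gaussian posterior N(mu, sigma^2)."""
    if sigma <= 0:
        return max(mu - best - xi, 0.0)     # deterministic limit
    z = (mu - best - xi) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2)))
    return (mu - best - xi) * cdf + sigma * pdf

# Both a higher posterior mean and more posterior uncertainty raise EI:
print(expected_improvement(1.0, 0.5, best=0.8) >
      expected_improvement(0.9, 0.1, best=0.8))   # True
```

In latent BO this function is evaluated at candidate latents using the GP posterior over the latent space, then maximized to select the next point to decode.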
4. Algorithmic Workflow and Implementation
GASP-style BO typically proceeds through the following loop:
- Initialization: Sample initial points in the latent space (or sample in the original design space and encode them into the latent space).
- Surrogate model update: Fit or update the GP surrogate on the current dataset in latent space, possibly jointly with encoders.
- Acquisition optimization: Maximize the acquisition function in the latent space to propose next candidates.
- Decoding and evaluation: Map latent candidates back to properly structured inputs (e.g., controller parameters, molecular graphs, or combinatorial objects) and evaluate the objective.
- Iterative refinement: Optionally adjust the encoder/decoder or latent space using the latest data to improve alignment with the optimization objective. In JoCo, encoders and GPs are re-trained after every step, using both final objective values and intermediate observations (Maus et al., 2023).
- Stopping and recommendation: After budget exhaustion, recommend the best candidate evaluated so far.
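The loop above can be sketched end to end in a few dozen lines. This is a minimal illustration, not any cited method's implementation: the fixed random linear decoder stands in for a trained generative model, the GP is a bare exact-inference RBF regressor, and all constants are illustrative.

```python
import numpy as np
from math import erf, sqrt, pi

rng = np.random.default_rng(0)
D, d = 50, 2                                  # ambient and latent dimensions

# Stand-in for a trained decoder: a fixed random linear map (illustrative only).
W = rng.standard_normal((D, d)) / np.sqrt(d)
decode = lambda z: W @ z

def objective(x):                             # toy black box on R^D
    return -float(np.sum((x - 0.5) ** 2))

def rbf(A, B, ls=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * ls ** 2))

def gp_posterior(Zt, yt, Zq, jitter=1e-4):
    """Exact GP regression with an RBF kernel over the latent space."""
    K = rbf(Zt, Zt) + jitter * np.eye(len(Zt))
    Ks = rbf(Zt, Zq)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, yt))
    v = np.linalg.solve(L, Ks)
    mu = Ks.T @ alpha
    var = np.clip(1.0 - (v ** 2).sum(0), 1e-12, None)
    return mu, np.sqrt(var)

norm_cdf = np.vectorize(lambda t: 0.5 * (1 + erf(t / sqrt(2))))
norm_pdf = np.vectorize(lambda t: np.exp(-0.5 * t * t) / sqrt(2 * pi))

# Initialization: random latents, decoded and evaluated.
Z = rng.uniform(-2, 2, size=(5, d))
y = np.array([objective(decode(z)) for z in Z])

for _ in range(15):                           # the BO loop, run in latent space
    cand = rng.uniform(-2, 2, size=(256, d))  # candidate latents
    mu, sd = gp_posterior(Z, y, cand)
    u = (mu - y.max()) / sd
    ei = (mu - y.max()) * norm_cdf(u) + sd * norm_pdf(u)  # EI acquisition
    znext = cand[np.argmax(ei)]               # acquisition optimization
    Z = np.vstack([Z, znext])                 # decode, evaluate, augment data
    y = np.append(y, objective(decode(znext)))

print(len(y), np.isfinite(y).all())           # 20 True
```

After the budget is spent, the recommendation is simply `decode(Z[np.argmax(y)])`, the decoded latent with the best observed objective.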
Specialized design choices include dynamic trust regions (TuRBO (Maus et al., 2023)), adversarial regularization plus meta-feature–guided PTEM selection (for joint CASH/hyperparameter search (Ishikawa et al., 13 Feb 2025)), and order-based partition tree search (for ordinal BO (Picheny et al., 2019)).
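A TuRBO-style trust-region schedule can be sketched as follows; the expansion/shrinkage constants are illustrative, and the restart rule applied when the region collapses below a minimum edge length is omitted for brevity.

```python
# Hedged sketch of a TuRBO-style trust-region schedule (constants illustrative).
def update_trust_region(length, success, n_succ, n_fail,
                        succ_tol=3, fail_tol=5, lmax=1.6):
    """Double the region edge after repeated successes, halve it after failures."""
    if success:
        n_succ, n_fail = n_succ + 1, 0
    else:
        n_succ, n_fail = 0, n_fail + 1
    if n_succ >= succ_tol:
        length, n_succ = min(2 * length, lmax), 0
    elif n_fail >= fail_tol:
        length, n_fail = length / 2, 0
    return length, n_succ, n_fail

length, s, f = 0.8, 0, 0
for ok in [True, True, True]:      # three successes in a row -> expand
    length, s, f = update_trust_region(length, ok, s, f)
print(length)                      # 1.6
```

Confining acquisition optimization to a box of this edge length around the incumbent keeps the surrogate locally accurate in high-dimensional latent spaces.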
5. Applications and Empirical Performance
GASP variants have demonstrated strong empirical results across diverse applications:
- Robotics: Ultra data-efficient controller tuning (requiring only 10–20 iterations), with dynamic compression focusing exploration away from undesired states (Antonova et al., 2019).
- Combinatorial optimization: Molecular and sequence design, using deep generative latent spaces and structure-coupled kernels for BO (Deshwal et al., 2021).
- Multi-task/few-shot learning: Rapid adaptation to previously unseen tasks, leveraging latent space sharing among tasks as in BEBO/GASP (Atkinson et al., 2020).
- Composite functions and adversarial optimization: Efficiently optimizing over composite-structured spaces with extremely high input and intermediate-output dimensions (thousands to hundreds of thousands) (Maus et al., 2023). Black-box adversarial prompt generation for LLM red-teaming, with LBO–guided exploration in the latent LoRA space of suffix selectors (Basani et al., 2024).
- Automated Machine Learning (AutoML): Simultaneous algorithm selection and hyperparameter optimization (CASH) with joint latent embeddings, adversarial pretraining, and meta-feature–based transfer, yielding superior performance to independent BO or meta-learning baselines (Ishikawa et al., 13 Feb 2025).
6. Methodological Comparisons and Key Innovations
Most prior GASP methods (e.g., LADDER (Deshwal et al., 2021), SVAE-DC (Antonova et al., 2019), basic GASP (Atkinson et al., 2020), and CASH variants (Ishikawa et al., 13 Feb 2025)) rely on offline pre-trained latent mappings or unsupervised embeddings, which are then held fixed during BO. JoCo (Maus et al., 2023) departs from this by jointly training both (i) encoder mappings for inputs and/or intermediate outputs and (ii) GP surrogates, leveraging supervision directly from the optimization target. This enables JoCo to adapt its latent embedding online, focusing representational capacity solely on features critical to maximizing the final objective $f$ and discarding irrelevant details.
Ordinal Bayesian Optimization (Picheny et al., 2019) provides a special case where the latent mapping is purely order-based, allowing BO in non-metric, discontinuous spaces that evade standard GP assumptions.
Empirical ablations confirm the benefits of joint training (JoCo), dynamic compression (SVAE-DC), structure-coupled kernels (LADDER), and meta-feature–driven PTEM selection (CASH-GASP) in both sample efficiency and final solution quality.
7. Limitations, Extensions, and Theoretical Properties
Challenges for GASP methods include the complexity of encoder training in high-dimensional or highly-structured domains, and computational cost—especially when variational inference or joint optimization is invoked at every iteration (Picheny et al., 2019). Certain approaches (e.g., ordinal BO) may lose metric information, over-specializing once warpings collapse large regions; partition-based strategies can also face exponential cell proliferation in higher dimensions.
Theoretical properties, such as no-regret guarantees, carry over from stationary latent-space GPs under smoothness assumptions, but may require adaptation when latent mappings are nonstationary, discontinuous, or stochastic (Picheny et al., 2019).
Extensions include hybrid schemes for global exploration, GP-LVM priors for handling mixed-data types, and stronger inductive biases via task/model meta-features (Ishikawa et al., 13 Feb 2025).
Key primary references include:
- "Joint Composite Latent Space Bayesian Optimization" (Maus et al., 2023)
- "Bayesian task embedding for few-shot Bayesian optimization" (Atkinson et al., 2020)
- "Bayesian Optimization in Variational Latent Spaces with Dynamic Compression" (Antonova et al., 2019)
- "Combining Latent Space and Structured Kernels for Bayesian Optimization over Combinatorial Spaces" (Deshwal et al., 2021)
- "Bayesian Optimization for Simultaneous Selection of Machine Learning Algorithms and Hyperparameters on Shared Latent Space" (Ishikawa et al., 13 Feb 2025)
- "Ordinal Bayesian Optimisation" (Picheny et al., 2019)
- "GASP: Efficient Black-Box Generation of Adversarial Suffixes for Jailbreaking LLMs" (Basani et al., 2024)