
High-Dimensional Bayesian Optimisation with Variational Autoencoders and Deep Metric Learning (2106.03609v3)

Published 7 Jun 2021 in cs.LG

Abstract: We introduce a method combining variational autoencoders (VAEs) and deep metric learning to perform Bayesian optimisation (BO) over high-dimensional and structured input spaces. By adapting ideas from deep metric learning, we use label guidance from the blackbox function to structure the VAE latent space, facilitating the Gaussian process fit and yielding improved BO performance. Importantly for BO problem settings, our method operates in semi-supervised regimes where only few labelled data points are available. We run experiments on three real-world tasks, achieving state-of-the-art results on the penalised logP molecule generation benchmark using just 3% of the labelled data required by previous approaches. As a theoretical contribution, we present a proof of vanishing regret for VAE BO.

Citations (54)

Summary

  • The paper integrates variational autoencoders with deep metric learning to create informative latent spaces for effective Bayesian optimisation in complex, high-dimensional domains.
  • It achieves state-of-the-art performance on the penalised logP benchmark using only 3% of the labelled data required by prior approaches, significantly improving data efficiency in semi-supervised regimes.
  • The study provides a theoretical proof of vanishing regret, underscoring the robustness of this approach for practical applications such as drug discovery and materials design.

High-Dimensional Bayesian Optimisation with Variational Autoencoders and Deep Metric Learning

The paper introduces a methodology that combines Variational Autoencoders (VAEs) with Deep Metric Learning (DML) to enhance Bayesian Optimisation (BO) over high-dimensional and structured input spaces. The central idea is to overcome the difficulty of scaling BO to such domains by using label guidance from the black-box function to construct a more informative latent space.

Methodological Overview

Bayesian Optimisation is a valuable tool for addressing black-box optimisation problems, particularly where evaluations are costly. However, its application to high-dimensional settings remains nontrivial. Here, the authors leverage VAEs, which effectively compress structured data (like graphs or molecular structures) into a low-dimensional latent space, making the subsequent optimisation more feasible.
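To make the latent-space BO setting concrete, the loop can be sketched as follows. This is a generic illustration, not the paper's implementation: the VAE encoder/decoder are replaced by an identity map on a toy 2-D "latent" space, the objective is a synthetic stand-in for decode-then-score, and the RBF kernel, UCB acquisition, and all constants are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def objective(z):
    # Toy black-box standing in for decode-then-evaluate; optimum at (0.5, -0.3).
    return -((z - np.array([0.5, -0.3])) ** 2).sum(-1)

def rbf(A, B, ls=0.5):
    # Squared-exponential kernel between rows of A and rows of B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls ** 2)

def gp_posterior(X, y, Xq, noise=1e-5):
    # Standard GP posterior mean/variance at query points Xq.
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks = rbf(Xq, X)
    mu = Ks @ np.linalg.solve(K, y)
    v = np.linalg.solve(K, Ks.T)
    var = 1.0 - np.einsum("ij,ji->i", Ks, v)  # k(z, z) = 1 for this kernel
    return mu, np.maximum(var, 1e-12)

# Small initial labelled set in the 2-D latent space.
Z = rng.uniform(-1, 1, size=(5, 2))
y = objective(Z)

for _ in range(15):
    cand = rng.uniform(-1, 1, size=(256, 2))  # candidate latent points
    mu, var = gp_posterior(Z, y, cand)
    ucb = mu + 2.0 * np.sqrt(var)             # UCB acquisition
    z_next = cand[np.argmax(ucb)]             # query point with highest UCB
    Z = np.vstack([Z, z_next])
    y = np.append(y, objective(z_next))

print(f"best value found: {y.max():.3f}")
```

In the actual method, `objective` would decode the latent point into a structured object (e.g. a molecule) and evaluate the expensive black-box score, and the GP would be fit in the VAE latent space that DML has shaped.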

  1. VAE and DML Integration: Integrating deep metric learning into the VAE framework is the pivotal aspect of this work. DML shapes the VAE latent space using label guidance from the black-box function, encouraging latent encodings to cluster according to their function values. This structure facilitates the Gaussian Process (GP) surrogate fit and is particularly beneficial in semi-supervised regimes where labelled data are scarce.
  2. Semi-supervised Regime: An important contribution is that the method operates under semi-supervised conditions. It achieves state-of-the-art results on the penalised logP molecule generation benchmark using merely 3% of the labelled data required by previous methods, underscoring the utility of the DML-VAE combination in label-scarce scenarios.
  3. Theoretical Contributions: The paper proves vanishing regret for VAE BO, adding theoretical grounding to its practical applications.
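The label-guided clustering idea in point 1 can be illustrated with a triplet-style metric loss. Note this is a generic sketch, not necessarily the paper's exact DML objective: here a "positive" for an anchor is any point whose black-box value is within a threshold `tau`, a "negative" is any point outside it, and `margin`, `tau`, and the nearest-pair selection are all illustrative assumptions.

```python
import numpy as np

def fvalue_triplet_loss(z, y, margin=1.0, tau=0.5):
    """Triplet-style loss on latent codes z guided by function values y.

    For each anchor i, a "positive" j satisfies |y_i - y_j| < tau and a
    "negative" k satisfies |y_i - y_k| >= tau. The loss penalises anchors
    whose nearest positive is not at least `margin` closer than the
    nearest negative, pulling similar-value points together in latent space.
    """
    losses = []
    for i in range(len(z)):
        close = np.abs(y - y[i]) < tau
        close[i] = False          # exclude the anchor itself
        far = np.abs(y - y[i]) >= tau
        if not close.any() or not far.any():
            continue              # anchor has no valid triplet
        d_pos = np.linalg.norm(z[close] - z[i], axis=1).min()
        d_neg = np.linalg.norm(z[far] - z[i], axis=1).min()
        losses.append(max(0.0, d_pos - d_neg + margin))
    return float(np.mean(losses)) if losses else 0.0
```

In training, a term like this would be added to the VAE's ELBO objective for the labelled subset of the data, so that the latent geometry reflects function-value similarity and the GP surrogate faces a smoother regression problem.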

Empirical Evaluation

The authors conducted experiments on three real-world tasks, which underscored the strong performance of their method. Notably:

  • Penalised logP Molecule Generation: The method outperformed previous approaches by a large margin in data efficiency, achieving high logP scores from minimal labelled samples.

Implications and Future Research Directions

The methodological innovations presented hold significant implications for both practical applications and future research:

  • Applications: The approach is particularly suited for tasks requiring exploration in complex input spaces—such as drug discovery, materials design, and similar areas where the search space is vast and continuous.
  • Research Directions: Future research could focus on extending the range of DML losses or integrating more sophisticated chemical information in molecule generation tasks. Additionally, theoretical advancements to relax the assumptions around decoder capabilities could yield more versatile applications.

The paper presents a notable contribution to improving the efficacy of Bayesian optimisation in challenging high-dimensional domains, reinforcing the role of structured latent space learning in complex optimisation tasks. This work lays foundational insights for further advancements in the optimisation of structured inputs through machine learning techniques.