PG-LBO: Enhancing High-Dimensional Bayesian Optimization with Pseudo-Label and Gaussian Process Guidance (2312.16983v1)
Abstract: Variational Autoencoder based Bayesian Optimization (VAE-BO) has demonstrated excellent performance in addressing high-dimensional structured optimization problems. However, current mainstream methods overlook the potential of utilizing a pool of unlabeled data to construct the latent space, concentrating instead on designing sophisticated models to leverage the labeled data. Despite their effective usage of labeled data, these methods often require extra network structures and additional procedures, resulting in computational inefficiency. To address this issue, we propose a novel method that effectively utilizes unlabeled data with the guidance of labeled data. Specifically, we tailor the pseudo-labeling technique from semi-supervised learning to explicitly reveal the relative magnitudes of optimization objective values hidden within the unlabeled data. Based on this technique, we assign appropriate training weights to unlabeled data to enhance the construction of a discriminative latent space. Furthermore, we treat the VAE encoder and the Gaussian Process (GP) in Bayesian optimization as a unified deep kernel learning process, allowing the direct utilization of labeled data, which we term Gaussian Process guidance. This directly and effectively integrates the goal of improving GP accuracy into the VAE training, thereby guiding the construction of the latent space. Extensive experiments demonstrate that our proposed method outperforms existing VAE-BO algorithms in various optimization scenarios. Our code will be published at https://github.com/TaicaiChen/PG-LBO.
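The "GP guidance" idea described above (treating the VAE encoder and the GP surrogate as one deep kernel, so that fitting the GP on labeled data shapes the latent space) can be illustrated with a minimal sketch. This is not the paper's implementation: the encoder is stood in for by a toy linear projection, and all names (`encode`, `rbf_kernel`, `gp_nlml`) are hypothetical. The key point is that the GP's negative log marginal likelihood is a function of the encoder parameters, so it can serve as an extra training loss for the encoder (in practice minimized via autodiff).

```python
import numpy as np

def encode(X, W):
    """Toy stand-in for the VAE encoder: a linear projection into latent space."""
    return X @ W

def rbf_kernel(Z, lengthscale=1.0):
    """RBF (squared-exponential) kernel on latent codes."""
    sq = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * sq / lengthscale ** 2)

def gp_nlml(X, y, W, noise=1e-2):
    """GP negative log marginal likelihood computed on encoded inputs.

    Because the kernel is evaluated on encode(X, W), this quantity depends on
    the encoder parameters W; minimizing it alongside the VAE loss 'guides'
    the latent space toward one where the GP fits the objective values well.
    """
    Z = encode(X, W)
    n = len(y)
    K = rbf_kernel(Z) + noise * np.eye(n)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return 0.5 * y @ alpha + np.log(np.diag(L)).sum() + 0.5 * n * np.log(2 * np.pi)

# Hypothetical combined objective: VAE reconstruction/KL loss plus the GP term.
# total_loss = vae_loss(X, W) + lambda_gp * gp_nlml(X_labeled, y_labeled, W)
```

In the actual method the encoder is a neural network and the GP hyperparameters are learned jointly, but the structure of the loss is the same: one differentiable objective over both the encoder and the GP.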