Bayesian Optimization of Function Networks
- BOFN is a methodology that optimizes expensive black-box objectives by leveraging a network of interconnected functions modeled with probabilistic surrogates.
- It exploits intermediate outputs and node-level models like Gaussian Processes and Bayesian Neural Networks to enhance sample efficiency and cost effectiveness.
- Specialized acquisition functions, including expected improvement and knowledge gradient methods, enable cost-aware and parallel evaluations in diverse applications.
Bayesian Optimization of Function Networks (BOFN) is a paradigm for optimizing expensive black-box objectives composed as networks—often directed acyclic graphs—of multiple functions, with outputs from some nodes serving as inputs for others. BOFN encompasses techniques that model, exploit, and optimize the structural properties of such networks. This enables improved sample efficiency, cost-awareness, and scalability, especially in scientific, engineering, and machine learning domains where evaluation cost, partial observability, variable costs across components, and parallelism are central concerns.
1. Foundational Principles and Problem Setting
The classical Bayesian optimization (BO) framework sequentially optimizes an expensive, typically black-box scalar function $f: \mathcal{X} \to \mathbb{R}$, guided by a probabilistic surrogate model and an acquisition function that quantifies the value of querying specific inputs. BOFN generalizes this to objectives defined as the output of a function network: a composition or network in which
- each node represents a function,
- edges represent data dependencies (outputs feeding into inputs),
- and the overall objective is the output at the leaf/terminal node (a minimal code sketch of such a network follows this list).
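To make the structure concrete, the following minimal Python sketch evaluates a two-node network in which a first-stage simulator feeds a second-stage simulator and every node's output is recorded. The node functions are toy stand-ins chosen for illustration, not examples from the cited papers.

```python
# A minimal sketch of a two-node function network: node f1 consumes the design
# variable x, node f2 consumes x and f1's output, and the objective is the
# terminal node's output. The node functions are illustrative placeholders.

import numpy as np

def f1(x):
    # First-stage "expensive" simulator (toy stand-in).
    return np.sin(3.0 * x) + 0.5 * x

def f2(x, y1):
    # Second-stage simulator consuming both the design x and the
    # intermediate output y1 of the upstream node.
    return -((y1 - 1.0) ** 2) + 0.1 * np.cos(5.0 * x)

def evaluate_network(x):
    """Evaluate the whole network and return every node's output.

    In BOFN the intermediate value y1 is recorded and modeled,
    not discarded as it would be in black-box BO.
    """
    y1 = f1(x)
    y2 = f2(x, y1)
    return {"f1": y1, "f2": y2}  # objective = output of terminal node f2

print(evaluate_network(0.3))
```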
Critically, BOFN leverages:
- Intermediate outputs: Observations from internal nodes are used for learning and diagnosis, not just the final objective value (Astudillo et al., 2021).
- Node-level modeling: Each node can be modeled independently, typically via Gaussian Processes (GPs) or, increasingly, Bayesian neural networks and operator networks (Snoek et al., 2015, Li et al., 2020, Kim et al., 2021, Guilhoto et al., 3 Apr 2024).
BOFN encompasses settings such as:
- Sequential manufacturing (multi-stage processes with variable costs per stage)
- ML pipelines with staged pre-processing and modeling
- Engineering design where outputs are composite functionals of simulation outputs
- Scientific discovery workflows with multi-fidelity or multi-modal simulation/experimentation
2. Probabilistic Surrogates and Uncertainty Quantification
Probabilistic surrogates in BOFN must accommodate the networked structure and potentially high dimensionality of the problem. Two broad classes of surrogates are commonly employed:
- Gaussian Processes (GPs):
- Each node is modeled as an independent GP, with recursive sampling of the outputs to propagate uncertainty through the network (Astudillo et al., 2021); a code sketch of this recursive propagation appears at the end of this section.
- Surrogate posteriors for internal nodes provide richer information than only modeling the global objective as a black box.
- The induced posterior on the output is generally non-Gaussian, requiring sample-based inference for expected improvement and other acquisition calculations.
- Bayesian Neural Networks (BNNs) and Deep Operator Networks:
- BNNs provide scalable uncertainty quantification, enabling applications to high-dimensional, structured, or function-valued outputs (Snoek et al., 2015, Kim et al., 2021, Makrygiorgos et al., 14 Apr 2025).
- Recent developments in operator learning and epistemic neural networks (e.g., NEON) allow direct modeling of mappings between function spaces, with well-calibrated uncertainty and high parameter efficiency (Guilhoto et al., 3 Apr 2024).
- Moment-matching and variational Bayesian inference methods enable computational tractability when propagating uncertainty through deep neural stacks (Li et al., 2020, Li et al., 2021).
Well-calibrated uncertainty estimates are essential for effective acquisition-function optimization, trustworthy decision-making, and the handling of partial or noisy observations.
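The recursive-sampling idea behind GP-based BOFN surrogates can be sketched as follows. This is a minimal illustration using scikit-learn GPs with toy node functions, not the implementation of Astudillo et al. (2021): one GP is fit per node, and Monte Carlo samples are propagated node by node to obtain the (generally non-Gaussian) induced posterior on the terminal output.

```python
# A minimal sketch, assuming per-node GP surrogates and Monte Carlo
# propagation; toy node functions stand in for real simulators.

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)

# Toy two-node network: y1 = f1(x), y2 = f2(x, y1); objective = y2.
f1 = lambda x: np.sin(3.0 * x) + 0.5 * x
f2 = lambda x, y1: -((y1 - 1.0) ** 2) + 0.1 * np.cos(5.0 * x)

# Training designs and full-network observations (intermediates are kept).
X = rng.uniform(-2.0, 2.0, size=(12, 1))
Y1 = f1(X[:, 0])
Y2 = f2(X[:, 0], Y1)

# One independent GP per node.
gp1 = GaussianProcessRegressor(kernel=RBF(), alpha=1e-6, normalize_y=True).fit(X, Y1)
gp2 = GaussianProcessRegressor(kernel=RBF(), alpha=1e-6, normalize_y=True).fit(
    np.column_stack([X[:, 0], Y1]), Y2
)

def sample_objective(x_star, n_samples=256):
    """Draw samples of the terminal output at x_star by recursive sampling:
    sample y1 from gp1's posterior, then sample y2 from gp2's posterior
    conditioned on (x_star, y1). The resulting distribution on y2 is
    generally non-Gaussian."""
    mu1, sd1 = gp1.predict(np.array([[x_star]]), return_std=True)
    y1_samples = rng.normal(mu1[0], sd1[0], size=n_samples)
    inputs2 = np.column_stack([np.full(n_samples, x_star), y1_samples])
    mu2, sd2 = gp2.predict(inputs2, return_std=True)
    return rng.normal(mu2, sd2)

samples = sample_objective(0.3)
print("posterior mean estimate:", samples.mean(), "std:", samples.std())
```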
3. Acquisition Functions and Partial Evaluation
Acquisition functions in BOFN are tailored to the network structure and cost regime:
- Expected Improvement (EI) for Function Networks: EI is computed with respect to the posterior induced by the full network or, in the multi-node setting, by recursively sampling through each GP to estimate the improvement over the current best observed value (Astudillo et al., 2021). Closed-form EI expressions are unavailable; instead, sample average approximation (SAA) provides efficient, differentiable estimates for optimization (see the sketch at the end of this section).
- Knowledge Gradient (KG) with Partial Evaluations (p-KGFN): To exploit situations where nodes can be evaluated individually at varying costs, the KG acquisition function is reformulated to select, at each step, both the node and input to query, maximizing benefit per cost (Buathong et al., 2023). Specifically,
$$\alpha_n^{\text{p-KGFN}}(k, z_k) \;=\; \frac{\mathbb{E}_n\!\left[\max_{x \in \mathcal{X}} \mu_{n+1}^{K}(x)\right] - \max_{x \in \mathcal{X}} \mu_n^{K}(x)}{c_k(z_k)},$$
where the expectation is over the outcome of evaluating node $k$ at input $z_k$, $c_k(z_k)$ is the evaluation cost, and $\max_{x \in \mathcal{X}} \mu_n^{K}(x)$ is the current best posterior mean for the output node $K$.
- Accelerated p-KGFN: To alleviate the high computational overhead of nested acquisition computations, a recent method accelerates acquisition optimization by generating node-specific candidate inputs with one global Monte Carlo simulation and evaluating acquisition functions over a judiciously constructed discrete set. This substantially reduces per-iteration runtime, with only modest loss in sample efficiency (Buathong et al., 13 Jun 2025).
- Extensions:
- Cost-aware selection and batch querying strategies, leveraging meta-information about node relationships, costs, and dependencies (Buathong et al., 2023, Li et al., 2021).
- Entropy-based and mutual information-based acquisition functions suitable for multi-fidelity and multi-output settings (Li et al., 2020, Li et al., 2021).
- Accommodating partial evaluations, noisy outputs, and side-information.
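As a concrete illustration of SAA-based acquisition estimation, the sketch below computes a sample-average EI for a two-node network using fixed base samples. The node posterior mean/std functions are illustrative stand-ins for fitted surrogates, not the formulation from the cited papers; the key point is that reusing the same base samples for every candidate makes the EI estimate a smooth, deterministic function of the input.

```python
# A minimal sketch of sample-average-approximation (SAA) EI for a two-node
# network, assuming Gaussian node posteriors with known mean/std functions.

import numpy as np

rng = np.random.default_rng(1)

# Stand-ins for node-level posterior mean/std functions (e.g., from GPs).
mu1 = lambda x: np.sin(3.0 * x)
sd1 = lambda x: 0.2 + 0.05 * np.abs(x)
mu2 = lambda x, y1: -((y1 - 1.0) ** 2)
sd2 = lambda x, y1: 0.1 + 0.02 * np.abs(y1)

N = 128
Z1 = rng.standard_normal(N)  # fixed base samples: reused for every x,
Z2 = rng.standard_normal(N)  # which is what makes this an SAA estimate

def ei_fn(x, best_so_far):
    """SAA estimate of expected improvement of the terminal node at x."""
    y1 = mu1(x) + sd1(x) * Z1          # reparameterized node-1 samples
    y2 = mu2(x, y1) + sd2(x, y1) * Z2  # propagate through node 2
    return np.mean(np.maximum(y2 - best_so_far, 0.0))

# Maximize the acquisition over a simple grid of candidates.
grid = np.linspace(-2.0, 2.0, 401)
best = 0.0  # current best observed terminal value (illustrative)
x_next = grid[np.argmax([ei_fn(x, best) for x in grid])]
print("next query:", x_next)
```

In practice the same fixed-base-sample construction allows gradient-based optimization of the acquisition rather than a grid search.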
4. Computational and Practical Advantages
BOFN methods yield substantial gains in both computational and experimental efficiency, as demonstrated by empirical studies:
- Sample efficiency: By exploiting intermediate observations and node-specific evaluation, BOFN methods find global or high-quality optima with fewer expensive queries than black-box BO (Astudillo et al., 2021, Buathong et al., 2023).
- Cost amortization: Selective evaluation at lower-cost nodes focuses learning on informative or bottleneck components, reserving expensive evaluations (e.g., final or costly stages) for promising candidates (Buathong et al., 2023).
- Scalability: Surrogate models based on neural networks (DNGO, BNNs, NEON) or multi-task Bayesian linear regression exhibit linear scaling with data, in contrast to cubic scaling in GPs (Snoek et al., 2015, Perrone et al., 2017, Kim et al., 2021). This enables their use in large, parallel, or high-dimensional settings.
- Computational acceleration: Fast p-KGFN drastically reduces the time required for acquisition optimization, from many minutes to tens of seconds per iteration, with negligible compromise on optimization quality in practical benchmarks (Buathong et al., 13 Jun 2025); a schematic sketch of discrete candidate-set selection follows at the end of this section.
These properties are crucial in domains where evaluation cost, model complexity, and scale make classical BO approaches infeasible.
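The sketch below illustrates, under simplified assumptions, the discrete candidate-set pattern used to accelerate cost-aware node selection: a pool of candidate inputs per node, a cheap acquisition score for each (node, input) pair, and selection by value per unit cost. The pool construction and scoring function here are placeholders, not the Fast p-KGFN procedure itself.

```python
# A hypothetical sketch of cost-aware node/input selection over a discrete
# candidate set: each node gets a pool of candidate inputs, an acquisition
# score is computed for every (node, candidate) pair, and the pair with the
# best score per unit cost is queried next.

import numpy as np

rng = np.random.default_rng(2)

node_costs = {"stage_1": 1.0, "stage_2": 10.0}  # illustrative per-node costs

# Candidate inputs per node; in Fast p-KGFN these come from one global
# Monte Carlo simulation, here they are simply random (illustrative).
candidates = {
    "stage_1": rng.uniform(-2.0, 2.0, size=(32, 1)),
    "stage_2": rng.uniform(-2.0, 2.0, size=(32, 2)),  # (x, upstream output)
}

def acquisition_value(node, z):
    """Stand-in for the value of evaluating `node` at input `z`
    (e.g., a knowledge-gradient-style improvement estimate)."""
    return float(np.exp(-np.sum(z ** 2)))

best_pair, best_score = None, -np.inf
for node, pool in candidates.items():
    for z in pool:
        score = acquisition_value(node, z) / node_costs[node]  # value per cost
        if score > best_score:
            best_pair, best_score = (node, z), score

print("query node:", best_pair[0], "at input:", best_pair[1])
```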
5. Applications and Empirical Evidence
BOFN methods have been validated and applied across a range of realistic settings:
- Manufacturing and engineering design: Optimization of multi-step or multi-fidelity processes (e.g., vaccine production, mechanical design), leveraging both sequence structure and cost-aware node selection (Astudillo et al., 2021, Buathong et al., 2023, Li et al., 2020, Li et al., 2021).
- Scientific simulation and inverse design: Use of operator neural surrogates for optimizing composite functionals over unknown mappings (e.g., PDE-driven objectives, optical or chemical process optimization) (Guilhoto et al., 3 Apr 2024).
- Drug/material discovery: Sequential selection of experimental or computational domains (e.g., molecular design with cheap simulation and costly measurement stages) (Buathong et al., 2023).
- Hyperparameter and neural architecture optimization: Tree-structured or conditional parameter spaces (function networks with conditional activation) are efficiently optimized using tree-structured kernels and cost-aware strategies (Ma et al., 2020).
- Graph and combinatorial domains: Optimization over node subsets in graphs, using function networks that capture both graph structure and combinatorics (Liang et al., 24 May 2024).
- Policy search and reinforcement learning: Multi-stage simulation or learning pipelines, where internal evaluations can be obtained or simulated at variable cost or fidelity (Astudillo et al., 2021).
Empirical studies demonstrate that BOFN variants (e.g., p-KGFN, NEON, BMBO-DARN) achieve or surpass state-of-the-art performance, improve convergence, and often reduce experimental or computational overhead dramatically.
6. Limitations, Open Problems, and Future Directions
- Surrogate model misspecification: While neural surrogates improve scalability and expressivity, uncertainty quantification is more challenging than with GPs, requiring robust Bayesian treatments or calibration (Kim et al., 2021, Makrygiorgos et al., 14 Apr 2025).
- Structure dependence: Existing BOFN methods often require explicit knowledge of the function network topology; automatic discovery of network structure remains challenging (Ma et al., 2020).
- Acquisition function computational cost: Acquisition optimization in general networks can remain costly for high-dimensional or deeply nested architectures, though recent advances (e.g., fast p-KGFN) alleviate this (Buathong et al., 13 Jun 2025).
- Exploitation of additional information: Advanced settings—such as leveraging derivative or side-channel observations, incorporating user priors, or designing non-myopic acquisition functions—are active areas of research (Makrygiorgos et al., 14 Apr 2025, Müller et al., 2023, Kong et al., 2023).
- High-dimensional and combinatorial scalability: Extensions to very high-dimensional, graph-structured, or combinatorial design spaces present both algorithmic and statistical challenges (Phan-Trong et al., 2023, Liang et al., 24 May 2024).
- Theory: Asymptotic consistency is established for the general EI-FN setting, including cases where not all regions are densely sampled; practical finite-time regret and convergence properties for advanced surrogates and acquisition functions remain active topics (Astudillo et al., 2021, Phan-Trong et al., 2023).
7. Summary Table: Key Features of BOFN Approaches
Feature | GP-based BOFN | BNN/Deep Operator BOFN | Partial Eval. BOFN (p-KGFN) | Fast Acquisition (Fast p-KGFN) |
---|---|---|---|---|
Node-level Modeling | Yes | Yes | Yes | Yes |
Leverages intermediates | Yes | Yes | Yes | Yes |
Acquisition Function | SAA EI, KG | SAA/Monte Carlo-based | KG with cost, node choice | Accelerated on discrete set |
Cost Awareness | Limited | Limited | Per-node, per-step | Per-node, efficient |
Computational Scaling | Cubic in data (GP inference) | Linear in data | High (nested MC, opt.) | Much lower |
Empirical Validation | Multiple domains | High-dimensional, composite | Synthetic and real-world networks | Superior wall-clock runtime |
References
- (Snoek et al., 2015): "Scalable Bayesian Optimization Using Deep Neural Networks"
- (Perrone et al., 2017): "Multiple Adaptive Bayesian Linear Regression for Scalable Bayesian Optimization with Warm Start"
- (Astudillo et al., 2021): "Bayesian Optimization of Function Networks"
- (Buathong et al., 2023): "Bayesian Optimization of Function Networks with Partial Evaluations"
- (Buathong et al., 13 Jun 2025): "Fast Bayesian Optimization of Function Networks with Partial Evaluations"
- (Kim et al., 2021): "Deep Learning for Bayesian Optimization of Scientific Problems with High-Dimensional Structure"
- (Li et al., 2020): "Multi-Fidelity Bayesian Optimization via Deep Neural Networks"
- (Guilhoto et al., 3 Apr 2024): "Composite Bayesian Optimization In Function Spaces Using NEON -- Neural Epistemic Operator Networks"
- (Phan-Trong et al., 2023): "Neural-BO: A Black-box Optimization Algorithm using Deep Neural Networks"
- (Ma et al., 2020): "Additive Tree-Structured Conditional Parameter Spaces in Bayesian Optimization: A Novel Covariance Function and a Fast Implementation"
- (Liang et al., 24 May 2024): "Bayesian Optimization of Functions over Node Subsets in Graphs"
BOFN represents a convergence of advances in probabilistic modeling, scalable Bayesian computation, and structural exploitation, enabling principled, practical optimization in complex network-structured problem settings that are increasingly common in modern science and engineering.