Bayesian Optimization of Function Networks

Updated 30 June 2025
  • BOFN is a methodology that optimizes expensive black-box objectives by leveraging a network of interconnected functions modeled with probabilistic surrogates.
  • It exploits intermediate outputs and node-level models like Gaussian Processes and Bayesian Neural Networks to enhance sample efficiency and cost effectiveness.
  • Specialized acquisition functions, including expected improvement and knowledge gradient methods, enable cost-aware and parallel evaluations in diverse applications.

Bayesian Optimization of Function Networks (BOFN) is a paradigm for optimizing expensive black-box objectives composed as networks—often directed acyclic graphs—of multiple functions, with outputs from some nodes serving as inputs for others. BOFN encompasses techniques that model, exploit, and optimize the structural properties of such networks. This enables improved sample efficiency, cost-awareness, and scalability, especially in scientific, engineering, and machine learning domains where evaluation cost, partial observability, variable costs across components, and parallelism are central concerns.

1. Foundational Principles and Problem Setting

The classical Bayesian optimization (BO) framework focuses on sequentially optimizing an expensive and typically black-box scalar function $f: \mathcal{X} \to \mathbb{R}$, guided by a probabilistic surrogate model and an acquisition function that quantifies the value of querying specific inputs. BOFN generalizes this to objectives defined as the output of a function network: a composition or network $g(x)$ in which

  • each node $f_k$ represents a function,
  • edges represent data dependencies (outputs feeding into inputs),
  • and the overall objective is the output at the leaf/terminal node (a minimal sketch of such a network follows this list).
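
As a concrete illustration of this structure, the short Python sketch below composes two toy node functions, f1 and f2, into a network objective. The names and functional forms are hypothetical stand-ins for expensive simulators, not an implementation from the cited papers; the point is that each query exposes the intermediate output as well as the final value.

```python
import numpy as np

# Hypothetical two-node function network: node f1 feeds its output into the
# terminal node f2. In practice each node is an expensive simulator or
# experiment; these closed-form stand-ins are for illustration only.
def f1(x):
    # First-stage node: maps the design variable to an intermediate quantity.
    return np.sin(3.0 * x) + 0.5 * x

def f2(y1, x):
    # Terminal node: consumes the intermediate output (and, here, also the
    # original input) and returns the scalar objective.
    return -(y1 - 1.0) ** 2 + 0.1 * x

def network_objective(x):
    # The overall objective g(x) is the output of the terminal node.
    y1 = f1(x)
    return f2(y1, x)

# BOFN records both the intermediate y1 and the final value at each query;
# black-box BO would observe only network_objective(x).
x = 0.7
print(network_objective(x), f1(x))
```

More general networks are directed acyclic graphs in which several such nodes feed one another; the same principle applies: every node's output is observed and modeled.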

Critically, BOFN leverages:

  • Intermediate outputs: Observations from internal nodes are used for learning and diagnosis, not just the final objective value (2112.15311).
  • Node-level modeling: Each node can be modeled independently, typically via Gaussian Processes (GPs) or, increasingly, Bayesian neural networks and operator networks (1502.05700, 2007.03117, 2104.11667, 2404.03099).

BOFN encompasses settings such as:

  • Sequential manufacturing (multi-stage processes with variable costs per stage)
  • ML pipelines with staged pre-processing and modeling
  • Engineering design where outputs are composite functionals of simulation outputs
  • Scientific discovery workflows with multi-fidelity or multi-modal simulation/experimentation

2. Probabilistic Surrogates and Uncertainty Quantification

Probabilistic surrogates in BOFN must accommodate the networked structure and potentially high dimensionality of the problem. Two broad classes of surrogates are commonly employed:

  • Gaussian Processes (GPs):
    • Each node is modeled as an independent GP, with recursive sampling of the outputs to propagate uncertainty through the network (2112.15311); a minimal sketch of this recursive sampling appears after this list.
    • Surrogate posteriors for internal nodes provide richer information than only modeling the global objective as a black box.
    • The induced posterior on the output is generally non-Gaussian, requiring sample-based inference for expected improvement and other acquisition calculations.
  • Bayesian Neural Networks (BNNs) and Deep Operator Networks:
    • BNNs provide scalable uncertainty quantification, enabling applications to high-dimensional, structured, or function-valued outputs (1502.05700, 2104.11667, 2504.10076).
    • Recent developments in operator learning and epistemic neural networks (e.g., NEON) allow direct modeling of mappings between function spaces, with well-calibrated uncertainty and high parameter efficiency (2404.03099).
    • Moment-matching and variational Bayesian inference methods enable computational tractability when propagating uncertainty through deep neural stacks (2007.03117, 2106.09884).
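
The recursive sampling scheme for GP surrogates can be sketched as follows. The toy two-node chain, the scikit-learn models, and all data here are illustrative assumptions rather than the implementation of the cited papers.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

rng = np.random.default_rng(0)

# Toy training data from a two-node chain f1 -> f2; in practice these are
# past (possibly partial) evaluations of the real node functions.
X = rng.uniform(-1.0, 1.0, size=(10, 1))
Y1 = np.sin(3.0 * X).ravel()          # observed node-1 outputs
Y2 = -(Y1 - 0.5) ** 2                 # observed node-2 outputs (node-2 input is Y1)

# One independent GP surrogate per node.
gp1 = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True).fit(X, Y1)
gp2 = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True).fit(
    Y1.reshape(-1, 1), Y2)

def sample_network_output(x_new, n_samples=256):
    """Monte Carlo propagation through the network: draw samples from node 1's
    posterior at x_new, then push each draw through node 2's posterior.
    The induced distribution over the terminal output is non-Gaussian."""
    y1 = gp1.sample_y(np.atleast_2d(x_new), n_samples=n_samples).ravel()
    mu2, sd2 = gp2.predict(y1.reshape(-1, 1), return_std=True)
    return rng.normal(mu2, sd2)       # one terminal-output draw per y1 sample

samples = sample_network_output([0.3])
print(samples.mean(), samples.std())  # mean and spread of the induced posterior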

Uncertainty predictions are essential for effective acquisition function optimization, trustworthy decision-making, and for handling partial or noisy observations.

3. Acquisition Functions and Partial Evaluation

Acquisition functions in BOFN are tailored to the network structure and cost regime:

  • Expected Improvement (EI) for Function Networks: EI is computed with respect to the posterior induced by the full network or, in the multi-node setting, by recursively sampling through each GP to estimate the improvement over the current best observed value (2112.15311). Closed-form EI expressions are unavailable; instead, sample average approximation (SAA) provides efficient, differentiable estimates for optimization. A minimal sketch of this estimator appears after this list.
  • Knowledge Gradient (KG) with Partial Evaluations (p-KGFN): To exploit situations where nodes can be evaluated individually at varying costs, the KG acquisition function is reformulated to select, at each step, both the node and input to query, maximizing benefit per cost (2311.02146). Specifically,

$$
\alpha_{n,k}(z_k) = \frac{\mathbb{E}_{y_k}\!\left[\nu^*_{n+1}\right] - \nu^*_{n}}{c_k(z_k)}
$$

where $c_k(z_k)$ is the evaluation cost and $\nu^*_{n}$ is the current best posterior mean for the output node.

  • Accelerated p-KGFN: To alleviate the high computational overhead of nested acquisition computations, a recent method accelerates acquisition optimization by generating node-specific candidate inputs with one global Monte Carlo simulation and evaluating acquisition functions over a judiciously constructed discrete set. This reduces per-iteration runtime by up to $16\times$, with only modest loss in sample efficiency (2506.11456).
  • Extensions:
    • Cost-aware selection and batch querying strategies, leveraging meta-information about node relationships, costs, and dependencies (2311.02146, 2106.09884).
    • Entropy-based and mutual information-based acquisition functions suitable for multi-fidelity and multi-output settings (2007.03117, 2106.09884).
    • Accommodating partial evaluations, noisy outputs, and side-information.
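
To make the SAA construction concrete, the following minimal sketch estimates EI from Monte Carlo draws of the terminal node and selects the next query point over a grid; toy_sampler, the candidate grid, and the incumbent value are illustrative assumptions, not from the cited papers. In the cited work, fixed base samples make the estimate deterministic so that it can be optimized with gradients rather than by grid search.

```python
import numpy as np

def network_expected_improvement(sample_terminal, x_cand, best_value, n_samples=256):
    """Sample-average approximation (SAA) of EI for a function network.

    sample_terminal(x, n) should return n Monte Carlo draws of the terminal
    node's posterior at x, obtained by recursively sampling the node-level
    surrogates (as in the propagation sketch in Section 2). Because the
    induced posterior on the network output is non-Gaussian, EI has no closed
    form and is estimated by averaging the positive part of the improvement.
    """
    y = np.asarray(sample_terminal(x_cand, n_samples))
    return np.maximum(y - best_value, 0.0).mean()

def toy_sampler(x, n):
    # Stand-in for recursive posterior sampling through the node surrogates.
    rng = np.random.default_rng(0)
    return np.sin(3.0 * float(np.ravel(x)[0])) + 0.1 * rng.standard_normal(n)

# Score a grid of candidate inputs and pick the maximizer. p-KGFN instead
# scores (node, input) pairs and divides a knowledge-gradient value by the
# per-node evaluation cost c_k(z_k).
candidates = np.linspace(-1.0, 1.0, 51)
best_so_far = 0.2  # incumbent: best observed terminal-node value
scores = [network_expected_improvement(toy_sampler, [x], best_so_far)
          for x in candidates]
x_next = candidates[int(np.argmax(scores))]
print(x_next)
```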

4. Computational and Practical Advantages

BOFN methods yield substantial gains in both computational and experimental efficiency, as demonstrated by empirical studies:

  • Sample efficiency: By exploiting intermediate observations and node-specific evaluation, BOFN methods deliver improved performance in terms of finding global or high-quality optima with fewer expensive queries than black-box BO (2112.15311, 2311.02146).
  • Cost amortization: Selective evaluation at lower-cost nodes focuses learning on informative or bottleneck components, reserving expensive evaluations (e.g., final or costly stages) for promising candidates (2311.02146).
  • Scalability: Surrogate models based on neural networks (DNGO, BNNs, NEON) or multi-task Bayesian linear regression exhibit linear scaling with data, in contrast to cubic scaling in GPs (1502.05700, 1712.02902, 2104.11667). This enables their use in large, parallel, or high-dimensional settings.
  • Computational acceleration: Fast p-KGFN drastically reduces the time required for acquisition optimization, from many minutes to tens of seconds per iteration, with negligible compromise on optimization quality in practical benchmarks (2506.11456).

These properties are crucial in domains where evaluation cost, model complexity, and scale make classical BO approaches infeasible.

5. Applications and Empirical Evidence

BOFN methods have been validated and applied across a range of realistic settings:

  • Manufacturing and engineering design: Optimization of multi-step or multi-fidelity processes (e.g., vaccine production, mechanical design), leveraging both sequence structure and cost-aware node selection (2112.15311, 2311.02146, 2007.03117, 2106.09884).
  • Scientific simulation and inverse design: Use of operator neural surrogates for optimizing composite functionals over unknown mappings (e.g., PDE-driven objectives, optical or chemical process optimization) (2404.03099).
  • Drug/material discovery: Sequential selection of experimental or computational domains (e.g., molecular design with cheap simulation and costly measurement stages) (2311.02146).
  • Hyperparameter and neural architecture optimization: Tree-structured or conditional parameter spaces (function networks with conditional activation) are efficiently optimized using tree-structured kernels and cost-aware strategies (2010.03171).
  • Graph and combinatorial domains: Optimization over node subsets in graphs, using function networks that capture both graph structure and combinatorics (2405.15119).
  • Policy search and reinforcement learning: Multi-stage simulation or learning pipelines, where internal evaluations can be obtained or simulated at variable cost or fidelity (2112.15311).

Empirical studies demonstrate that BOFN variants (e.g., p-KGFN, NEON, BMBO-DARN) achieve or surpass state-of-the-art performance, improve convergence, and often reduce experimental or computational overhead dramatically.

6. Limitations, Open Problems, and Future Directions

  • Surrogate model misspecification: While neural surrogates improve scalability and expressivity, uncertainty quantification is more challenging than with GPs, requiring robust Bayesian treatments or calibration (2104.11667, 2504.10076).
  • Structure dependence: Existing BOFN methods often require explicit knowledge of the function network topology; automatic discovery of network structure remains challenging (2010.03171).
  • Acquisition function computational cost: Acquisition optimization in general networks can remain costly for high-dimensional or deeply nested architectures, though recent advances (e.g., fast p-KGFN) alleviate this (2506.11456).
  • Exploitation of additional information: Advanced settings—such as leveraging derivative or side-channel observations, incorporating user priors, or designing non-myopic acquisition functions—are active areas of research (2504.10076, 2305.17535, 2310.10614).
  • High-dimensional and combinatorial scalability: Extensions to very high-dimensional, graph-structured, or combinatorial design spaces present both algorithmic and statistical challenges (2303.01682, 2405.15119).
  • Theory: Asymptotic consistency is established for the general EI-FN setting, including cases where not all regions are densely sampled; practical finite-time regret and convergence properties for advanced surrogates and acquisition functions remain active topics (2112.15311, 2303.01682).

7. Summary Table: Key Features of BOFN Approaches

| Feature | GP-based BOFN | BNN/Deep Operator BOFN | Partial Eval. BOFN (p-KGFN) | Fast Acquisition (Fast p-KGFN) |
|---|---|---|---|---|
| Node-level modeling | Yes | Yes | Yes | Yes |
| Leverages intermediates | Yes | Yes | Yes | Yes |
| Acquisition function | SAA EI, KG | SAA/Monte Carlo-based | KG with cost, node choice | Accelerated on discrete set |
| Cost awareness | Limited | Limited | Per-node, per-step | Per-node, efficient |
| Computational scaling | $\mathcal{O}(N^3)$ | $\mathcal{O}(N)$ | High (nested MC, opt.) | Much lower (up to $16\times$ speedup) |
| Empirical validation | Multiple domains | High-dimensional, composite | Synthetic and real-world networks | Superior wall-clock runtime |

References

  • (1502.05700): "Scalable Bayesian Optimization Using Deep Neural Networks"
  • (1712.02902): "Multiple Adaptive Bayesian Linear Regression for Scalable Bayesian Optimization with Warm Start"
  • (2112.15311): "Bayesian Optimization of Function Networks"
  • (2311.02146): "Bayesian Optimization of Function Networks with Partial Evaluations"
  • (2506.11456): "Fast Bayesian Optimization of Function Networks with Partial Evaluations"
  • (2104.11667): "Deep Learning for Bayesian Optimization of Scientific Problems with High-Dimensional Structure"
  • (2007.03117): "Multi-Fidelity Bayesian Optimization via Deep Neural Networks"
  • (2404.03099): "Composite Bayesian Optimization In Function Spaces Using NEON -- Neural Epistemic Operator Networks"
  • (2303.01682): "Neural-BO: A Black-box Optimization Algorithm using Deep Neural Networks"
  • (2010.03171): "Additive Tree-Structured Conditional Parameter Spaces in Bayesian Optimization: A Novel Covariance Function and a Fast Implementation"
  • (2405.15119): "Bayesian Optimization of Functions over Node Subsets in Graphs"

BOFN represents a convergence of advances in probabilistic modeling, scalable Bayesian computation, and structural exploitation, enabling principled, practical optimization in complex network-structured problem settings that are increasingly common in modern science and engineering.