
Functional Gradient Ascent (FGA)

Updated 18 November 2025
  • Functional Gradient Ascent (FGA) is a method for optimizing over infinite-dimensional, function-valued inputs by computing Fréchet derivatives of acquisition functions derived from Gaussian process models.
  • It employs a scalarized upper confidence bound (UCB) acquisition function, obtained by integrating the functional output against a weight function, to navigate infinite-dimensional function spaces.
  • Empirical benchmarks demonstrate that FGA significantly reduces regret and improves sample efficiency compared to alternative optimization methods in scientific and engineering applications.

The functional gradient ascent algorithm (FGA) optimizes acquisition functions derived from a function-on-function Gaussian process (FFGP) model over infinite-dimensional, function-valued inputs within function-on-function Bayesian optimization (FFBO). This setting arises when both the input and output of the optimization are elements of function spaces, as frequently encountered in scientific and engineering domains requiring optimization of curves, shapes, or other high-dimensional function-valued objects (Huang et al., 16 Nov 2025).

1. Mathematical Formulation of Function-on-Function BO

In FFBO, the objective is to maximize a black-box map $f: \mathcal{X}^p \to \mathcal{Y}$, where $\mathcal{X} \subset L^2(\Omega_x)$ is an infinite-dimensional Hilbert space of square-integrable functions on $\Omega_x$ and $\mathcal{Y} = L^2(\Omega_y)$ is the output space (also a Hilbert space of functions on $\Omega_y$). The principal challenge lies in modeling posterior uncertainty and optimizing over $\mathcal{X}$.

The FFGP prior is defined by a mean $\mu \in \mathcal{Y}$ and a separable operator-valued kernel

$$K(\bm x, \bm x') = \sigma^2\, k_x(\bm x, \bm x')\, T_\mathcal{Y},$$

where $k_x$ is a positive-definite scalar-valued kernel (e.g., a Matérn kernel on the $L^2$ metric) and $T_\mathcal{Y}$ is a nonnegative self-adjoint operator on $\mathcal{Y}$ (typically an integral operator).

For observation pairs $(\bm x_i, y_i)$, the posterior mean and covariance of $f(\bm x)$ are given by

$$\begin{aligned} \hat f(\bm x) &= \mu + \bm K_n(\bm x)^\top (\mathbf{K}_n + \tau^2 I_\mathcal{Y})^{-1} (\bm Y_n - \mathbf{1}_n\mu), \\ \hat K(\bm x,\bm x) &= K(\bm x, \bm x) - \bm K_n(\bm x)^\top (\mathbf{K}_n + \tau^2 I_\mathcal{Y})^{-1} \bm K_n(\bm x), \end{aligned}$$

where $\mathbf{K}_n$ is an $n \times n$ block matrix of operator-valued kernels and $\tau^2$ is the functional output noise variance.
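
To make this update concrete, the following NumPy sketch discretizes the input and output functions on uniform grids, approximates $T_\mathcal{Y}$ by an output-grid kernel matrix $k_y(s,t)$, and assembles $\mathbf{K}_n$ as a Kronecker product of the scalar input Gram matrix with that matrix. The function names, the Matérn-5/2 choice for both kernels, the constant prior mean, and the Riemann-sum quadrature are illustrative assumptions, not the reference implementation of (Huang et al., 16 Nov 2025).

```python
import numpy as np

def l2_dist(xa, xb, dt):
    # L2 distance between two input functions sampled on a common grid with spacing dt
    return np.sqrt(np.sum((xa - xb) ** 2) * dt)

def matern52(r, psi):
    # Matern-5/2 kernel profile evaluated at distance r with lengthscale psi
    a = np.sqrt(5.0) * r / psi
    return (1.0 + a + a ** 2 / 3.0) * np.exp(-a)

def ffgp_posterior(X, Y, x_star, dt, ds, psi_x=1.0, psi_y=1.0,
                   sigma2=1.0, tau2=1e-3, mu=0.0):
    """Posterior mean and covariance of f(x_star) under a separable FFGP prior.

    X : (n, m_x) observed input functions on a grid with spacing dt
    Y : (n, m_y) observed output functions on a grid with spacing ds
    mu is treated as a constant prior mean function (assumption).
    """
    n, m_y = Y.shape
    t = np.arange(m_y) * ds
    # T_Y approximated by a kernel matrix k_y(s, t) on the output grid
    Ty = matern52(np.abs(t[:, None] - t[None, :]), psi_y)

    # scalar input kernel k_x evaluated on L2 distances between input functions
    kx = np.array([[matern52(l2_dist(X[i], X[j], dt), psi_x) for j in range(n)]
                   for i in range(n)])
    kx_star = np.array([matern52(l2_dist(X[i], x_star, dt), psi_x) for i in range(n)])

    # block operator K_n + tau^2 I_Y, discretized as an (n*m_y, n*m_y) matrix
    Kn = sigma2 * np.kron(kx, Ty) + tau2 * np.eye(n * m_y)
    resid = (Y - mu).reshape(-1)              # stacked residuals Y_n - 1_n mu
    alpha = np.linalg.solve(Kn, resid)

    # posterior mean: mu + K_n(x*)^T (K_n + tau^2 I)^{-1} (Y_n - 1_n mu)
    Kstar = sigma2 * np.kron(kx_star[:, None], Ty)     # (n*m_y, m_y)
    mean = mu + Kstar.T @ alpha
    # posterior covariance at x*: K(x*, x*) - K_n(x*)^T (K_n + tau^2 I)^{-1} K_n(x*)
    cov = sigma2 * Ty - Kstar.T @ np.linalg.solve(Kn, Kstar)
    return mean, cov
```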

2. Scalarized Acquisition and UCB Functional

A scalar acquisition function is defined through a functional $L_\phi$ mapping $f(\bm x)$ to a scalar,

$$L_\phi: f(\bm x) \mapsto \int_{\Omega_y} \phi(t)\, f(\bm x)(t)\, dt,$$

for a weight function $\phi \in L^2(\Omega_y)$. The scalarized mean and variance become

$$\begin{aligned} \hat\mu_g(\bm x) &= L_\phi \mu + \bm k_x^{(n)}(\bm x)^\top \big(\mathbf{K}_x^{(n)} + \tfrac{\tau^2}{c}I\big)^{-1} (L_\phi \bm Y_n - \mathbf{1}_n\mu^g), \\ \hat k_g(\bm x,\bm x) &= c\big[k_x(\bm x,\bm x) - \bm k_x^{(n)}(\bm x)^\top \big(\mathbf{K}_x^{(n)} + \tfrac{\tau^2}{c}I\big)^{-1} \bm k_x^{(n)}(\bm x)\big], \end{aligned}$$

where $c = \iint \phi(t)\phi(s)\, k_y(s,t)\, ds\, dt$ and $k_y$ is the kernel underlying $T_\mathcal{Y}$.

The upper confidence bound (UCB) acquisition function is then

$$\alpha_{\mathrm{UCB}}(\bm x) = \hat\mu_g(\bm x) + \sqrt{\beta_t}\, \sqrt{\hat k_g(\bm x,\bm x)},$$

with exploration parameter $\beta_t$.
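
The scalarization admits the same kind of finite-grid sketch. Here $L_\phi$ and the constant $c$ are approximated by Riemann sums, the prior mean is again assumed constant, and `kx_fn` is a hypothetical callable returning the scalar input kernel $k_x$ between two discretized input functions; none of these names come from the source.

```python
import numpy as np

def scalarized_ucb(X, Y, x_star, phi, ds, ky, kx_fn, tau2=1e-3, mu=0.0, beta_t=4.0):
    """Scalarized UCB acquisition alpha_UCB(x_star) for a weight function phi.

    X     : (n, m_x)   observed input functions
    Y     : (n, m_y)   observed output functions on a grid with spacing ds
    phi   : (m_y,)     weight function on the output grid
    ky    : (m_y, m_y) output kernel matrix k_y(s, t)
    kx_fn : callable (xa, xb) -> scalar input kernel value
    """
    n = X.shape[0]
    # c = double integral of phi(t) phi(s) k_y(s, t), via a Riemann sum
    c = phi @ ky @ phi * ds * ds

    # L_phi applied to the observed outputs and to the (constant) prior mean
    L_Y = Y @ phi * ds                       # (n,)
    mu_g = mu * np.sum(phi) * ds

    # scalar input Gram matrix and cross-covariances to the candidate x_star
    Kx = np.array([[kx_fn(X[i], X[j]) for j in range(n)] for i in range(n)])
    kx_star = np.array([kx_fn(X[i], x_star) for i in range(n)])

    A = Kx + (tau2 / c) * np.eye(n)
    w = np.linalg.solve(A, L_Y - mu_g)

    mean_g = mu_g + kx_star @ w                                        # scalarized mean
    var_g = c * (kx_fn(x_star, x_star) - kx_star @ np.linalg.solve(A, kx_star))
    return mean_g + np.sqrt(beta_t) * np.sqrt(max(var_g, 0.0))
```

In an FFBO loop, this scalar acquisition is what the functional gradient ascent of the next section maximizes over the candidate input.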

3. Functional Gradient Ascent Algorithm

FGA is applied to maximize $\alpha_{\mathrm{UCB}}(\bm x)$ over the infinite-dimensional input space,

$$\bm x_{n+1} = \arg\max_{\bm x \in \mathcal{X}^p} \alpha_{\mathrm{UCB}}(\bm x).$$

The Fréchet derivative (the functional generalization of the gradient) is computed as

$$\nabla_{\bm x} \alpha_{\mathrm{UCB}} = g_1(\bm x) + g_2(\bm x),$$

where

$$\begin{aligned} g_1(\bm x) &= \nabla \bm k_x^{(n)}(\bm x)^\top \big(\mathbf{K}_x^{(n)} + \tfrac{\tau^2}{c}I\big)^{-1} (L_\phi\bm Y_n - \mathbf{1}_n\mu^g), \\ g_2(\bm x) &= -2c\, \nabla \bm k_x^{(n)}(\bm x)^\top \big(\mathbf{K}_x^{(n)} + \tfrac{\tau^2}{c}I\big)^{-1} \bm k_x^{(n)}(\bm x). \end{aligned}$$

For the $L^2$-distance based kernel (e.g., Matérn), the Fréchet derivative, identified with an element of $L^2$, is

$$\nabla_{\bm x} k_x(\bm x_i, \bm x)(s) = 2\, k_x\!\left(\frac{\|\bm x_i - \bm x\|}{\psi_x}\right) \frac{x(s) - x_i(s)}{\psi_x\, \|\bm x_i - \bm x\|}.$$

The optimization proceeds by iterative ascent,

$$\bm x^{(\ell)} = \bm x^{(\ell-1)} + \gamma_\ell\, \nabla_{\bm x} \alpha_{\mathrm{UCB}}(\bm x^{(\ell-1)}),$$

with step size $\gamma_\ell$, iterated until convergence.
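
A minimal sketch of the ascent loop on a discretized input grid is given below. For a closed-form Fréchet derivative it swaps in a squared-exponential kernel on the $L^2$ distance (the paper uses a Matérn kernel), treats $k_x(\bm x, \bm x)$ as constant, and differentiates $\sqrt{\hat k_g}$ by the chain rule; the fixed step size and all function names are illustrative assumptions rather than the paper's exact procedure.

```python
import numpy as np

def fga_maximize_ucb(X, L_Y, x0, dt, psi_x=1.0, tau2=1e-3, c=1.0,
                     mu_g=0.0, beta_t=4.0, gamma=0.1, n_steps=200, tol=1e-6):
    """Functional gradient ascent on the scalarized UCB over a discretized input grid.

    X   : (n, m_x) observed input functions on a grid with spacing dt
    L_Y : (n,)     scalarized observations L_phi y_i
    x0  : (m_x,)   initial input function
    c   : scalarization constant from Section 2
    """
    def kx(xa, xb):
        # squared-exponential kernel on the L2 distance (assumption, for a simple gradient)
        d2 = np.sum((xa - xb) ** 2) * dt
        return np.exp(-d2 / (2.0 * psi_x ** 2))

    def grad_kx(xi, x):
        # Frechet derivative of k_x(x_i, .) at x, returned as a function on the grid
        return -kx(xi, x) * (x - xi) / psi_x ** 2

    n = X.shape[0]
    K = np.array([[kx(X[i], X[j]) for j in range(n)] for i in range(n)])
    A = K + (tau2 / c) * np.eye(n)
    w = np.linalg.solve(A, L_Y - mu_g)          # weights defining the scalarized mean

    x = x0.copy()
    for _ in range(n_steps):
        k_vec = np.array([kx(X[i], x) for i in range(n)])
        grads = np.array([grad_kx(X[i], x) for i in range(n)])   # (n, m_x)
        Ainv_k = np.linalg.solve(A, k_vec)

        var_g = c * (1.0 - k_vec @ Ainv_k)                       # k_hat_g(x, x)
        grad_mean = grads.T @ w                                  # g_1(x)
        grad_var = -2.0 * c * grads.T @ Ainv_k                   # gradient of k_hat_g
        grad_std = grad_var / (2.0 * np.sqrt(max(var_g, 1e-12))) # chain rule for the sqrt
        step = grad_mean + np.sqrt(beta_t) * grad_std            # Frechet gradient of alpha_UCB

        x = x + gamma * step
        if np.sqrt(np.sum(step ** 2) * dt) < tol:                # L2 norm of the update
            break
    return x
```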

4. Theoretical Properties and Convergence

Under regularity conditions (Matérn kernel, trace-class operator $T_\mathcal{Y}$, bounded $L^2$ norms, and appropriate vanishing noise scaling), the posterior for $f(\bm x)$ remains well-defined, and truncation at finite rank $m$ yields

$$\|\hat f_m(\bm x) - \hat f(\bm x)\| \le C m^{-1},$$

regardless of the choice of basis. The FFBO regret with FGA satisfies, with high probability,

$$R_T \le \sqrt{B_1\, T\, \beta_T\, \gamma_T} + \frac{\pi^2}{6},$$

with information gain $\gamma_T$; thus simple regret is $O^*(\sqrt{T})$. This provides a nontrivial guarantee for global optimization in infinite-dimensional settings (Huang et al., 16 Nov 2025).
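
For intuition, and assuming the standard convention that $R_T = \sum_{t=1}^{T} r_t$ with nonnegative instantaneous regrets $r_t$, the cumulative bound above controls the simple regret through the average,

$$\min_{1 \le t \le T} r_t \;\le\; \frac{R_T}{T} \;\le\; \sqrt{\frac{B_1\, \beta_T\, \gamma_T}{T}} + \frac{\pi^2}{6T},$$

which vanishes as $T \to \infty$ whenever $\beta_T \gamma_T$ grows sublinearly in $T$.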

5. Empirical Performance and Benchmarks

FGA-based FFBO is evaluated against three baselines: FIBO (function-input BO), FOBO (function-output BO), and MTBO (multi-task BO). On synthetic and real-world tasks involving optimization over function-valued domains (e.g., one-dimensional curves, stress–strain waveform matching), FFBO converges faster and attains lower regret for the same query budget across all tested scenarios, remains robust to observation noise, and is consistently more sample-efficient (Huang et al., 16 Nov 2025).

6. Context within Operator-Based Bayesian Optimization

Alternative approaches in the operator-learning and surrogate-modeling literature include surrogate construction via parametric operator networks (e.g., NEON, as in (Guilhoto et al., 3 Apr 2024)), which operate over deterministic mappings $h: X \to C(\mathcal{Y}, \mathbb{R}^{d_s})$ and use backpropagation-based optimizers (e.g., L-BFGS) in the reduced design space $X \subset \mathbb{R}^{d_u}$. These methods contrast with FGA in their reliance on finite-parametric representations and backpropagation instead of Fréchet-based optimization over function spaces.

7. Implications and Extensions

FGA represents a principled approach to functional optimization in the fully infinite-dimensional regime, circumventing the intrinsic limitations of finite parameterizations. A plausible implication is that FGA enables direct exploitation of smoothness and structural priors on function spaces, reflected in the kernel and operator choices, yielding both theoretical and empirical advantages in problems where function-valued inputs and outputs are intrinsic (Huang et al., 16 Nov 2025). Such methods lay the foundation for further advances in functional sequential design, shape optimization, and scientific computing with high-dimensional function spaces.
