
Composite Function Vectors

Updated 20 January 2026
  • Composite function vectors are functions formed by composing a smooth mapping with a scalar- or vector-valued function, characterized by structured rules for their derivatives.
  • Their analysis employs multivariate Bell polynomials and Faà di Bruno formulas to precisely enumerate derivative contributions and combinatorial structures.
  • Applications span derivative-free and stochastic optimization, finite-difference schemes, and composition operators in analytic function spaces.

A composite function vector refers to the broad class of functions formed by composing a (possibly vector-valued) smooth mapping with another (scalar- or vector-valued) function, as well as the associated mathematical structures and rules for their derivatives, finite differences, and analytical properties. The study of composite function vectors is central to areas such as multivariate calculus, derivative-free optimization, stochastic optimization, and the theory of analytic function spaces. Special attention goes to the structure of their derivatives, encapsulated by multivariate generalizations of the Faà di Bruno and Bell polynomial formulas, as well as to their discrete and functional-analytic generalizations.

1. Multivariate Bell Polynomials and the Faà di Bruno Formula

The most precise symbolic description of repeated derivatives of a composite function vector $f(g(x))$, where $g:\mathbb{R}^n\to\mathbb{R}^m$ and $f:\mathbb{R}^m\to\mathbb{R}^p$, is given by the multivariate Faà di Bruno formula, expressed in terms of multivariate Bell polynomials (Schumann, 2019). For multi-indices $n\in\mathbb{N}^n$ and $k\in\mathbb{N}^m$, the partial multivariate Bell polynomial $B_{n,k}(y_j;\, j)$ packages all combinatorial contributions of higher-order partials of $g$, such that

$$\partial_x^n [f\circ g](x) = \sum_{|k|=0}^{|n|} f^{(k)}(g(x)) \cdot B_{n,k}\left(g^{(j)}(x);\, j\right)$$

where $g^{(j)}(x)$ are multi-indexed derivatives of $g$, and $f^{(k)}$ those of $f$. This generalization recovers the classical (single-variable) Faà di Bruno and Bell polynomial structure when $n=m=1$, but extends to all orders and mixed partials in higher dimensions. Combinatorially, the Bell polynomials $B_{n,k}$ enumerate all ways to partition derivatives among the arguments, a necessity given the tensor character of $g$ and $f$.
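As a sanity check, the single-variable case ($n=m=p=1$) can be verified symbolically with SymPy, whose `bell(n, k, symbols)` implements the partial Bell polynomials $B_{n,k}$; the concrete choices of $f$ and $g$ below are illustrative assumptions:

```python
import sympy as sp

x, u = sp.symbols('x u')

# Single-variable Faà di Bruno via partial Bell polynomials:
# d^n/dx^n f(g(x)) = sum_{k=1}^n f^(k)(g(x)) * B_{n,k}(g', ..., g^{(n-k+1)})
g = sp.exp(x**2)          # concrete smooth inner function (an assumption)
f = sp.sin(u)             # concrete smooth outer function (an assumption)
n = 3

lhs = sp.diff(f.subs(u, g), x, n)
rhs = sum(
    sp.diff(f, u, k).subs(u, g)
    * sp.bell(n, k, [sp.diff(g, x, j) for j in range(1, n - k + 2)])
    for k in range(1, n + 1)
)
assert sp.simplify(sp.expand(lhs - rhs)) == 0
```

The same check passes for any smooth pair $(f, g)$; only the number of Bell-polynomial arguments, $n-k+1$, depends on the derivative order.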

2. Discrete Composite Function Vector: Finite-Difference Structure

The discrete analog of composite function vectors is governed by a finite-difference version of the Faà di Bruno formula (Duarte et al., 2008). For $g:\mathbb{R}^n\to\mathbb{R}^m$ and $f:\mathbb{R}^m\to\mathbb{R}^p$, and for multi-step directions $u_1,\ldots,u_k$, the repeated difference operator $A^\alpha$ applied to $f\circ g$ is given by

$$A^\alpha(f\circ g)(x) = \sum_{r=1}^{|\alpha|} \sum_{\{\alpha^1,\ldots,\alpha^r\}} A_{A^{\alpha^1}g(x),\ldots,A^{\alpha^r}g(x)}\, f(g(x))$$

where the inner sum is over all partitions of the binary multi-index $\alpha$ into $r$ nonzero multi-indices $\alpha^j$, and $A_{v^1,\ldots,v^r}f(y)$ is the $r$-fold directional difference. This formula is entirely algebraic and applies in any abelian group, with no smoothness assumptions, providing a direct combinatorial correspondence to the continuous case.
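The lowest-order case is easy to check numerically: for $|\alpha|=1$ the partition sum collapses to the single term $A_u(f\circ g)(x) = A_{A_u g(x)}f(g(x))$, an identity that holds exactly with no smoothness on $f$ or $g$. A minimal sketch (the test functions are arbitrary assumptions):

```python
import numpy as np

def A(h, x, u):
    """First difference of h at x with step u: A_u h(x) = h(x + u) - h(x)."""
    return h(x + u) - h(x)

# Illustrative assumptions; note g need not be smooth -- the identity
# A_u (f o g)(x) = A_{A_u g(x)} f(g(x)) is purely algebraic.
g = lambda t: np.floor(3 * t) + t ** 2
f = lambda y: np.abs(y) * y

x, u = 0.7, 0.3
lhs = A(lambda t: f(g(t)), x, u)               # A_u (f o g)(x)
rhs = f(g(x) + A(g, x, u)) - f(g(x))           # A_{A_u g(x)} f(g(x))
assert np.isclose(lhs, rhs)
```

Higher orders add the partition sum over $\{\alpha^1,\ldots,\alpha^r\}$, which this sketch omits.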

3. Analytical Properties and Fully Linear Model Approximations

Composite function vectors arise as core objects in derivative-free optimization (DFO) when the objective function is a composition $f(x)=g(F(x))$ with $g:\mathbb{R}^m\to\mathbb{R}\cup\{+\infty\}$ convex and lower semicontinuous and $F:\mathbb{R}^n\to\mathbb{R}^m$ smooth (Hare, 2016). When each component $F_i$ can be approximated by fully linear models $m_{F_i,\Delta}$ satisfying

$$|F_i(y) - m_{F_i,\Delta}(y)| \leq \kappa_F \Delta^2, \qquad \|\nabla F_i(y) - \nabla m_{F_i,\Delta}(y)\| \leq \kappa_G \Delta,$$

the error in the composite model $m_{f,\Delta}(y) = g(m_{F,\Delta}(y))$ is

$$|f(y) - m_{f,\Delta}(y)| \leq L\, m\, \kappa_F \Delta^2$$

with $L$ a local Lipschitz constant. Subdifferential proximity at a focal point $\bar x$ can be guaranteed with

$$\operatorname{dist}\big(\partial f(\bar x), \partial m_f(\bar x)\big) \leq M \sqrt{m}\, \kappa_G \Delta,$$

where $M$ is the Lipschitz modulus of $g$ at $F(\bar x)$. These error bounds justify the use of composite models in derivative-free trust-region algorithms even when $f$ is nonsmooth but retains composite structure.
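The $O(\Delta^2)$ composite-model error is easy to observe numerically. The sketch below uses a first-order Taylor model as a stand-in for a fully linear model (a standard choice when derivatives happen to be available) and the 1-norm as the convex outer function $g$; all concrete functions are illustrative assumptions:

```python
import numpy as np

# Inner smooth map F: R^2 -> R^2, its Jacobian J, and convex outer g = 1-norm.
# All concrete functions here are illustrative assumptions.
F = lambda x: np.array([np.sin(x[0]) + x[1] ** 2, np.exp(x[0] * x[1])])
J = lambda x: np.array([
    [np.cos(x[0]), 2 * x[1]],
    [x[1] * np.exp(x[0] * x[1]), x[0] * np.exp(x[0] * x[1])],
])
g = lambda z: np.abs(z).sum()
f = lambda x: g(F(x))

x0 = np.array([0.4, -0.2])
# First-order Taylor model of F at x0: fully linear on B(x0, Delta).
m_F = lambda y: F(x0) + J(x0) @ (y - x0)

# The composite-model error |f(y) - g(m_F(y))| should scale like Delta^2,
# so err / Delta^2 stays roughly constant as Delta shrinks.
d = np.array([0.6, -0.8])                     # unit direction
ratios = [
    abs(f(x0 + Delta * d) - g(m_F(x0 + Delta * d))) / Delta ** 2
    for Delta in (0.1, 0.05, 0.025)
]
print(ratios)
```

The printed ratios stay roughly constant as $\Delta$ shrinks, consistent with the bound $|f(y) - m_{f,\Delta}(y)| \le L m \kappa_F \Delta^2$.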

4. Stochastic Composite Function Vector Optimization

The composite function vector framework is central in modern optimization for problems where the objective takes the form $\Phi(x)=f(g(x))+r(x)$, with $g(x)=\mathbb{E}_\xi[g_\xi(x)]$ or $g(x)=\frac{1}{N}\sum_i g_i(x)$, $f$ smooth, and $r$ a regularizer (Zhang et al., 2019). Writing $F(x)=f(g(x))$ for the smooth part, the chain rule for the exact gradient reads

$$\nabla F(x) = [\nabla g(x)]^T \nabla f(g(x)),$$

but naive stochastic estimators of $\nabla F(x)$ are biased due to the nonlinear action of $\nabla f$ on the inner estimate $\tilde g(x)$. The Composite Incremental Variance-Reduced (CIVR) algorithm introduces independent SARAH/SPIDER-style variance reduction for $g$ and $\nabla g$, yielding total sample complexity $O(\epsilon^{-3/2})$ for the expectation form and $O(N+\sqrt{N}\,\epsilon^{-1})$ for the finite-sum form, matching the best known first-order methods for such nonconvex composite structures. The approach retains practical efficiency for a broad range of machine learning and reinforcement learning problems.
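The bias of the naive estimator can be exhibited with a one-dimensional Monte Carlo sketch (the specific $f$, $g_\xi$, and noise model are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative assumption: g_xi(x) = x + xi with xi ~ N(0, 1), so the inner
# mean is g(x) = x, and the smooth outer function is f(y) = y^3.
# Exact gradient of F(x) = f(g(x)) = x^3 is 3 x^2.
x = 1.0
true_grad = 3 * x ** 2                          # = 3.0

# Naive estimator: plug one noisy inner sample into f'(.) and average.
xi = rng.standard_normal(1_000_000)
naive = np.mean(3 * (x + xi) ** 2)              # E[3 (x + xi)^2] = 3 x^2 + 3

print(true_grad, naive)   # the naive average converges to 6, not 3
```

Averaging more outer samples does not remove the gap; reducing the variance of the inner estimate $\tilde g(x)$, as CIVR does, shrinks it, because the bias scales with that variance.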

5. Composition Operators on Vector-Valued Analytic Function Spaces

In functional analysis, composite function vectors appear as composition operators $C_\varphi$ acting on Banach spaces $E(X)$ of $X$-valued analytic functions via $C_\varphi(f)(z)=f(\varphi(z))$ (Laitila et al., 2015). The boundedness, compactness, and weak compactness of $C_\varphi$ depend on both the operator's action on scalar-valued spaces and the structure of $X$. Key facts include:

  • $C_\varphi: E(X)\to E(X)$ is bounded iff $C_\varphi: E\to E$ is bounded.
  • For $X$ infinite-dimensional, $C_\varphi$ is never compact on strong spaces $E(X)$, but can be weakly compact under conditions derived from the scalar case (e.g., Shapiro's condition for $H^1$).
  • For reflexive $X$, weak compactness of $C_\varphi$ on $E(X)$ is equivalent to compactness on $E$ for large classes of spaces, such as $H^1(X)$, weighted Bergman spaces, the Bloch space, and $BMOA(X)$.
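The scalar-versus-vector compactness dichotomy can be made concrete for the simplest symbol $\varphi(z)=sz$ with $|s|<1$, where $C_\varphi$ acts diagonally on the monomial basis of $H^2$; a small numerical sketch:

```python
import numpy as np

# Composition operator C_phi f = f o phi with phi(z) = s z, |s| < 1.
# On the scalar Hardy space H^2 the monomials z^n form an orthonormal basis
# and C_phi z^n = s^n z^n, so C_phi is diagonal with entries s^n -> 0:
# the scalar operator is compact.
s, N = 0.5, 12
C = np.diag([s ** n for n in range(N)])   # truncation in the monomial basis

sv = np.linalg.svd(C, compute_uv=False)
print(sv[:4])                             # singular values decay geometrically

# Vector-valued analogue H^2(X): each diagonal entry becomes s^n * Id_X.
# For dim X = infinity every "eigenvalue" s^n has infinite multiplicity, so
# no finite-rank approximation converges -- C_phi is never compact there,
# in line with the scalar-versus-vector dichotomy above.
```

The decay of the scalar singular values, versus their infinite multiplicity in the vector-valued setting, is exactly what separates compactness from mere boundedness here.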

This theory connects composition of vector-valued functions to operator theory and function space geometry, exposing deep links between vectorial analytic structure and function-theoretic operator properties.

6. Concrete Examples and Special Cases

Explicit calculation in low dimension clarifies the structure of the composite function vector theory. For $x=(x_1,x_2)\in\mathbb{R}^2$, $g:\mathbb{R}^2\to\mathbb{R}^2$, and $f:\mathbb{R}^2\to\mathbb{R}$, the mixed partial $\partial_{x_1 x_2}[f\circ g]$ contains contributions from both second-order partials of $g$ (via $f^{(1)}$) and products of first partials (via $f^{(2)}$), with the multivariate Bell polynomial structure systematically packaging all terms (Schumann, 2019). For finite differences, the combinatorics are determined by partitions of the multi-index, with the formula reducing to familiar forms for $n=m=1$, $p=1$ (Duarte et al., 2008).
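This two-variable mixed partial can be verified symbolically; the sketch below expands $\partial_{x_1 x_2}[f\circ g]$ by hand into its $f^{(1)}$ and $f^{(2)}$ contributions and checks the result against SymPy's direct differentiation (the concrete $f$ and $g$ are illustrative assumptions):

```python
import sympy as sp

x1, x2, u1, u2 = sp.symbols('x1 x2 u1 u2')

# Illustrative assumptions for the outer f and inner g = (g1, g2).
f = sp.sin(u1) * u2
g1 = x1 * sp.exp(x2)
g2 = sp.sin(x1) + x2 ** 2

sub = {u1: g1, u2: g2}
lhs = sp.diff(f.subs(sub), x1, x2)       # direct mixed partial of f o g

# Hand-expanded chain rule: second partials of g enter through f^{(1)},
# products of first partials through f^{(2)}.
D = sp.diff
rhs = (D(f, u1, u1) * D(g1, x1) * D(g1, x2)
       + D(f, u1, u2) * (D(g1, x1) * D(g2, x2) + D(g1, x2) * D(g2, x1))
       + D(f, u2, u2) * D(g2, x1) * D(g2, x2)
       + D(f, u1) * D(g1, x1, x2)
       + D(f, u2) * D(g2, x1, x2)).subs(sub)

assert sp.simplify(lhs - rhs) == 0
```

The five terms of `rhs` are exactly the Bell-polynomial contributions $B_{n,1}$ (second partials of $g$ times $f^{(1)}$) and $B_{n,2}$ (products of first partials times $f^{(2)}$) for this mixed index.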

7. Extensions, Assumptions, and Open Problems

The theory is sensitive to the assumptions on $g$ and $f$. For convex composite optimization, interior-point assumptions and model interpolation are critical (Hare, 2016). In analytic function spaces, key open questions concern the extension of Hilbert-Schmidt boundedness and weak compactness of composition operators to non-classical spaces and operator-weighted settings (Laitila et al., 2015). In stochastic optimization, extensions to multilevel compositional problems are tractable by similar variance-reduction techniques, facilitating application to hierarchical models in machine learning (Zhang et al., 2019).


References

  • "Multivariate Bell Polynomials and Derivatives of Composed Functions" (Schumann, 2019)
  • "A discrete Faa di Bruno's formula" (Duarte et al., 2008)
  • "Compositions of Convex Functions and Fully Linear Models" (Hare, 2016)
  • "Composition operators on vector-valued analytic function spaces: a survey" (Laitila et al., 2015)
  • "A Stochastic Composite Gradient Method with Incremental Variance Reduction" (Zhang et al., 2019)
