CoVariance Filters and Neural Networks over Hilbert Spaces
(2509.13178v1)
Published 16 Sep 2025 in cs.LG and eess.SP
Abstract: CoVariance Neural Networks (VNNs) perform graph convolutions on the empirical covariance matrix of signals defined over finite-dimensional Hilbert spaces, motivated by robustness and transferability properties. Yet, little is known about how these arguments extend to infinite-dimensional Hilbert spaces. In this work, we take a first step by introducing a novel convolutional learning framework for signals defined over infinite-dimensional Hilbert spaces, centered on the (empirical) covariance operator. We constructively define Hilbert coVariance Filters (HVFs) and design Hilbert coVariance Networks (HVNs) as stacks of HVF filterbanks with nonlinear activations. We propose a principled discretization procedure, and we prove that empirical HVFs can recover the Functional PCA (FPCA) of the filtered signals. We then describe the versatility of our framework with examples ranging from multivariate real-valued functions to reproducing kernel Hilbert spaces. Finally, we validate HVNs on both synthetic and real-world time-series classification tasks, showing robust performance compared to MLP and FPCA-based classifiers.
Summary
The paper introduces a framework that extends covariance-based neural networks to infinite-dimensional Hilbert spaces, enabling recovery of FPCA scores.
It develops Hilbert coVariance Filters in both spectral and polynomial (spatial) forms, with the polynomial form scaling without requiring explicit eigendecomposition.
Empirical evaluations on synthetic and ECG time-series data demonstrate improved robustness and sample efficiency compared to MLP and FPCA baselines.
CoVariance Filters and Neural Networks over Hilbert Spaces: A Technical Analysis
Introduction and Motivation
This work addresses the extension of covariance-based neural architectures from finite-dimensional to infinite-dimensional Hilbert spaces. Covariance Neural Networks (VNNs) have demonstrated robust performance and transferability by leveraging the empirical covariance matrix as a graph shift operator, effectively combining the representational power of PCA with the stability of GNNs. However, their applicability has been limited to finite-dimensional settings. The paper introduces a principled framework for convolutional learning over general Hilbert spaces, centered on the covariance operator, and develops Hilbert coVariance Filters (HVFs) and Hilbert coVariance Networks (HVNs). The framework is made practical via a discretization procedure, and its theoretical properties are established, including the recovery of Functional PCA (FPCA) through empirical HVFs.
Theoretical Framework
Covariance Operator and Spectral Domain
Given a separable Hilbert space H and a square-integrable H-valued random variable X, the covariance operator C:H→H is defined as
$$Cv = \mathbb{E}\big[(X - \mu)\,\langle X - \mu, v\rangle\big],$$
where μ = E[X]. The operator C is compact, self-adjoint, and trace-class, admitting the spectral decomposition
$$Cv = \sum_{\ell \ge 1} \lambda_\ell \langle v, \phi_\ell\rangle\, \phi_\ell,$$
with nonnegative eigenvalues $\lambda_\ell$ and orthonormal eigenfunctions $\phi_\ell$.
The Hilbert coVariance Fourier Transform (HVFT) of x ∈ H is defined as $\hat{x}[\ell] = \langle x, \phi_\ell\rangle$, which is equivalent to the FPCA transform up to recentering. This spectral domain provides a natural basis for defining convolutional operations and learning architectures.
Hilbert coVariance Filters (HVFs)
Two classes of HVFs are introduced:
Spectral HVFs: For a bounded Borel function h, the filter is
$$h(C)x = \sum_{\ell = 1}^{\infty} h(\lambda_\ell)\,\langle x, \phi_\ell\rangle\,\phi_\ell + h(0)\,x_{\perp},$$
where $x_{\perp}$ is the projection of x onto ker(C). Filtering acts pointwise in the HVFT domain: $\hat{g}[\ell] = h(\lambda_\ell)\,\hat{x}[\ell]$.
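To make this concrete, here is a minimal NumPy sketch (not code from the paper) of a spectral HVF acting on discretized signals: eigenvectors of the empirical covariance matrix stand in for the eigenfunctions $\phi_\ell$, and the bounded response h(λ) = λ/(1+λ) is purely illustrative, as are all sizes and data.

```python
# Minimal sketch: spectral HVF on discretized signals. Eigenvectors of the
# empirical covariance matrix play the role of the eigenfunctions phi_l.
import numpy as np

rng = np.random.default_rng(0)
n, m = 200, 64                              # n samples, m discretization points
X = rng.standard_normal((n, m))             # placeholder data, one signal per row
Xc = X - X.mean(axis=0)                     # center the samples
C_hat = Xc.T @ Xc / n                       # empirical covariance matrix

eigvals, eigvecs = np.linalg.eigh(C_hat)    # lambda_l and (discretized) phi_l

def h(lam):
    return lam / (1.0 + lam)                # illustrative bounded frequency response

x = Xc[0]                                   # one centered signal
x_hvft = eigvecs.T @ x                      # HVFT coefficients x_hat[l] = <x, phi_l>
g_hvft = h(eigvals) * x_hvft                # pointwise filtering: g_hat[l] = h(lambda_l) x_hat[l]
g = eigvecs @ g_hvft                        # filtered signal back in the signal domain
```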
Spatial (Polynomial) HVFs: For order J and parameters $w_0, \dots, w_J$,
$$h(C) = \sum_{j=0}^{J} w_j\, C^j.$$
The corresponding frequency response is $h(\lambda) = \sum_{j=0}^{J} w_j \lambda^j$. This form avoids explicit eigendecomposition and scales to fine discretizations.
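By contrast, a spatial HVF needs only repeated covariance-vector products. The sketch below (again illustrative, with arbitrary filter taps rather than learned parameters) applies an order-2 polynomial HVF without ever forming an eigendecomposition.

```python
# Minimal sketch: spatial (polynomial) HVF, h(C)x = sum_j w_j C^j x,
# computed with covariance-vector products only.
import numpy as np

def polynomial_hvf(C_hat, x, w):
    """Apply h(C)x = sum_{j=0}^{J} w[j] * C^j x by iterating matrix-vector products."""
    out = np.zeros_like(x)
    Cjx = x.copy()                          # current term C^j x, starting at j = 0
    for w_j in w:
        out += w_j * Cjx
        Cjx = C_hat @ Cjx                   # advance to the next power of C
    return out

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 64))          # placeholder discretized signals
Xc = X - X.mean(axis=0)
C_hat = Xc.T @ Xc / len(Xc)                 # empirical covariance matrix
y = polynomial_hvf(C_hat, Xc[0], w=[1.0, 0.5, 0.25])   # order J = 2, illustrative taps
```

Each application costs J covariance-vector (or covariance-matrix) products, which is what makes the polynomial form attractive for fine discretizations.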
Hilbert coVariance Networks (HVNs)
HVNs are constructed as layered architectures stacking banks of HVFs and nonlinear activations. The t-th layer propagates as
$$x_{t+1}^{u} = \sigma\!\left(\sum_{i=1}^{F_t} h_t^{u,i}(C)\, x_t^{i}\right),$$
where σ is a nonlinear operator on H, $F_t$ is the number of features at layer t, i indexes input features, and u indexes output features. The architecture generalizes VNNs and GNNs to infinite-dimensional settings.
Discretization and Implementation
Empirical Covariance Operator
Given n i.i.d. samples $x_1, \dots, x_n \in H$, the empirical covariance operator is
$$\hat{C}_n v = \frac{1}{n} \sum_{i=1}^{n} \langle x_i - \bar{x}, v\rangle\,(x_i - \bar{x}),$$
with $\bar{x} = n^{-1} \sum_{i=1}^{n} x_i$. The operator $\hat{C}_n$ is finite-rank, and its spectral decomposition enables empirical HVFs.
Discretization Operator
A bounded discretization operator $S_m : H \to \mathbb{R}^m$ is defined via bounded linear functionals, yielding discretized samples $x_i^{(m)} = S_m x_i$. The empirical covariance matrix of the discretized signals is
$$\hat{C}_n^{(m)} = \frac{1}{n} \sum_{i=1}^{n} \big(x_i^{(m)} - \bar{x}^{(m)}\big)\big(x_i^{(m)} - \bar{x}^{(m)}\big)^{\top}.$$
It is shown that $\hat{C}_n^{(m)} = S_m \hat{C}_n S_m^{*}$, ensuring commutativity between filtering and discretization under orthonormal sampling.
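This identity is easy to check numerically. The sketch below assumes a fine grid of M points stands in for H and takes $S_m$ to be bin-averaging onto m coarse bins, so that under the Euclidean inner product used here $S_m^{*} = S_m^{\top}$; all shapes and data are illustrative.

```python
# Numerical check of C_hat^(m) = S_m C_hat_n S_m^T for a bin-averaging discretization.
import numpy as np

M, m, n = 240, 12, 500
rng = np.random.default_rng(4)
X = rng.standard_normal((n, M))                  # fine-grid samples standing in for H
S = np.kron(np.eye(m), np.ones((1, M // m)) / (M // m))   # bin-averaging S_m, shape (m, M)

Xc = X - X.mean(axis=0)
C_fine = Xc.T @ Xc / n                           # empirical covariance on the fine grid
X_m = X @ S.T                                    # discretized samples x_i^(m) = S_m x_i
X_mc = X_m - X_m.mean(axis=0)
C_coarse = X_mc.T @ X_mc / n                     # empirical covariance of discretized signals

# With the Euclidean inner product, S_m^* = S_m^T, and the identity holds exactly.
assert np.allclose(C_coarse, S @ C_fine @ S.T)
```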
Discrete HVFs and HVNs
Discrete HVFs and HVNs are implemented by replacing C with $\hat{C}_n^{(m)}$ and signals with their discretized versions. The layerwise propagation in matrix form is
$$X_{t+1}^{(m)} = \sigma\!\left(\sum_{j=0}^{J} \big(\hat{C}_n^{(m)}\big)^{j} X_t^{(m)} W_{t,j}\right),$$
where $W_{t,j}$ are learnable weight matrices.
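A single discrete HVN layer then amounts to a few matrix products. The following hedged sketch uses a ReLU in place of σ, randomly initialized weights, and arbitrary illustrative shapes; in the paper the weights $W_{t,j}$ are learned end-to-end.

```python
# Minimal sketch of one discrete HVN layer: sigma( sum_j (C_hat)^j X_t W[j] ).
import numpy as np

def hvn_layer(C_hat, X_t, W):
    """W is a list of (F_in, F_out) weight matrices, one per filter order j."""
    Z = np.zeros((X_t.shape[0], W[0].shape[1]))
    CjX = X_t.copy()                        # (C_hat)^j X_t, starting at j = 0
    for W_j in W:
        Z += CjX @ W_j
        CjX = C_hat @ CjX                   # next covariance power
    return np.maximum(Z, 0.0)               # pointwise Lipschitz nonlinearity (ReLU)

rng = np.random.default_rng(2)
data = rng.standard_normal((200, 64))       # placeholder discretized training signals
C_hat = np.cov(data, rowvar=False)          # empirical covariance matrix (64 x 64)
X0 = rng.standard_normal((64, 1))           # one input signal, F_0 = 1 feature
W = [0.1 * rng.standard_normal((1, 8)) for _ in range(3)]   # J = 2, F_1 = 8 output features
X1 = hvn_layer(C_hat, X0, W)                # output of shape (64, 8)
```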
Theoretical Guarantees
A key result is that empirical polynomial HVFs can recover the empirical FPCA scores. For each distinct eigenvalue $\alpha$ of $\hat{C}_n$, there exists a polynomial HVF $h_\alpha$ such that
$$h_\alpha(\hat{C}_n)\,x = P_\alpha x,$$
where $P_\alpha$ is the orthogonal projector onto the eigenspace of $\alpha$. This establishes a direct connection between HVFs and FPCA, generalizing the classical result that PCA scores are outputs of a bank of narrowband spectral filters.
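One standard way to realize such a filter, sketched below under the assumption that the empirical eigenvalues are simple (which holds generically for the random data used here), is the Lagrange polynomial that equals 1 at the chosen eigenvalue and 0 at all others; evaluating it at $\hat{C}_n$ reproduces the projector, and applying it to a signal returns that signal's FPCA component. This is an illustrative construction, not necessarily the paper's exact one.

```python
# Sketch: a polynomial HVF that recovers an FPCA projector via Lagrange interpolation.
import numpy as np

rng = np.random.default_rng(3)
X = rng.standard_normal((50, 8))                 # 50 samples, 8 discretization points
Xc = X - X.mean(axis=0)
C_hat = Xc.T @ Xc / len(Xc)

eigvals, eigvecs = np.linalg.eigh(C_hat)
alpha = eigvals[-1]                              # leading eigenvalue (assumed simple)

h_alpha_C = np.eye(len(C_hat))                   # accumulates h_alpha(C_hat)
for beta in eigvals[:-1]:                        # zero the response at every other eigenvalue
    h_alpha_C = h_alpha_C @ (C_hat - beta * np.eye(len(C_hat))) / (alpha - beta)

phi = eigvecs[:, -1:]                            # leading eigenvector (discretized phi_alpha)
P_alpha = phi @ phi.T                            # orthogonal projector onto its span
assert np.allclose(h_alpha_C, P_alpha, atol=1e-6)

x = Xc[0]
score = phi.ravel() @ x                          # empirical FPCA score <x, phi_alpha>
assert np.allclose(h_alpha_C @ x, score * phi.ravel())   # filter output = score * phi_alpha
```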
Practical Use-Cases
The framework is applicable to a variety of settings:
Functional Data (e.g., time series): $H = L^2([0,1], \mathbb{R}^d)$, with discretization via bin-averaging.
Infinite Sequences: $H = \ell^2(\mathbb{N})$, with discretization via canonical projection.
Nonlinearities are typically pointwise Lipschitz functions, ensuring stability and well-posedness.
Empirical Evaluation
The paper presents experiments on both synthetic and real-world time-series classification tasks.
Synthetic Task: Binary classification of multivariate time-series bags, where class information is encoded in cross-channel covariance. HVNs outperform MLPs (which only access first-order information) and FPCA-based classifiers (which suffer from coordinate drift and instability).
Figure 1: Test accuracy versus number of samples, demonstrating the superior sample efficiency and robustness of HVNs compared to MLP and FPCA baselines.
ECG5000 Dataset: Classification of ECG time-series with varying discretization resolution. HVNs consistently outperform both MLP and FPCA, especially as the discretization becomes finer, highlighting the benefit of robust covariance-aware learning.
Implementation Considerations
Computational Complexity: Spatial HVFs avoid eigendecomposition, making them scalable to high-dimensional discretizations.
Stability: The use of the covariance operator as a shift ensures transferability and robustness to stochastic perturbations in the empirical covariance, especially in high-dimension/low-sample regimes.
Nonlinearity Choice: Pointwise Lipschitz activations are recommended for L2 spaces to preserve integrability and stability.
Discretization: The choice of discretization operator (e.g., bin-averaging, canonical projection, point evaluation) should be tailored to the underlying Hilbert space and application domain.
Implications and Future Directions
The proposed framework generalizes covariance-based neural architectures to infinite-dimensional settings, enabling principled learning from functional, sequential, or kernelized data. The theoretical connection to FPCA provides interpretability and guarantees on representational power. The empirical results demonstrate robust performance and transferability, particularly in scenarios where discriminative information is encoded in second-order statistics.
Future work should address:
Convergence Analysis: Studying the convergence of discrete HVFs/HVNs to their infinite-dimensional counterparts as discretization resolution increases.
Transferability: Formal analysis of the transferability properties of HVNs across domains and tasks.
Extension to Other Operators: Generalization to other classes of operators (e.g., covariance density operators) for enhanced robustness and control over stability-discriminability trade-offs.
Conclusion
This paper establishes a rigorous and practical framework for convolutional learning over Hilbert spaces via covariance operators, unifying and extending covariance-based neural architectures to infinite-dimensional domains. The approach is theoretically grounded, computationally tractable, and empirically validated, with broad applicability to functional data analysis, time-series modeling, and kernel methods. The results open new avenues for robust, interpretable, and transferable learning in high- and infinite-dimensional settings.