
Geometry-Aware Stochastic Neural Architecture

Updated 12 January 2026
  • The paper introduces a framework where underlying geometry directly informs stochastic modeling, search, and neural architecture design.
  • The methodology leverages latent field models, structured NAS landscapes, and natural gradient updates to enhance robustness and efficiency.
  • Practical implications include improved sample efficiency, zero-shot generalization, and reduced Wasserstein distance in simulation tasks.

A geometry-aware stochastic neural architecture is any neural system in which the underlying geometry—either of the input space, the architecture space, or an explicit geometric manifold—plays a direct and formally encoded role in the stochastic modeling, search, learning, or generation of neural architectures and responses. These approaches utilize a combination of stochastic processes, geometric descriptors, and often variational or autoregressive modeling to exploit structural information that is intrinsic to the parametrization or the connectivity of the network, or to the domain itself.

1. Geometric Parametrization and Latent Field Models

Geometry-aware frameworks formalize architectural stochasticity through latent random fields defined on geometric manifolds or structured spaces. A representative example is the probabilistic model in which neural architectures themselves are instantiated as stochastic realizations of a latent anisotropic Gaussian random field on a compact, boundaryless, multiply connected manifold. Here, both neuron locations and connectivity are random, induced by geodesic proximity and field affinity, with parameters governed by the solution of a covariance-driven stochastic PDE

$$(\tau_0 - \nabla \cdot (K(x) \nabla))\, U(x) = W(x),$$

where the metric tensor field $K(x)$ encodes local anisotropy and $W(x)$ is white noise on the manifold. The field induces a spatial intensity for neuron sampling (an inhomogeneous Poisson process with density proportional to the field variance). Edge weights are then coupled to both geodesic distance and field-driven affinity, and sparsification is achieved by percentile-based diffusion masking, ultimately yielding a sparse, geometry-aware random neural network whose stochasticity and expressivity are driven by the underlying manifold and SPDE hyperparameters (Soize, 11 Dec 2025).
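As a rough illustration, the pipeline above (latent field, Poisson sampling of neuron sites, affinity-weighted edges, percentile masking) can be sketched on a flat periodic grid. The neighbor-averaging smoother, the energy-based intensity, and all length scales below are illustrative stand-ins, not the actual construction of Soize (11 Dec 2025):

```python
import numpy as np

rng = np.random.default_rng(0)

# Latent field: white noise on a periodic grid, smoothed by repeated
# neighbor averaging -- a crude stand-in for solving the SPDE
# (tau0 - div(K(x) grad)) U(x) = W(x) on a manifold.
n = 48
U = rng.standard_normal((n, n))
for _ in range(20):
    U = (U + np.roll(U, 1, 0) + np.roll(U, -1, 0)
           + np.roll(U, 1, 1) + np.roll(U, -1, 1)) / 5.0

# Neuron sites from an inhomogeneous Poisson process; here the intensity
# is proportional to local field energy as a proxy for field variance.
lam = U**2 / (U**2).max()
keep = rng.random((n, n)) < 0.4 * lam
ys, xs = np.nonzero(keep)
u = U[ys, xs]

# Edge weights couple periodic ("geodesic") distance with field affinity.
pts = np.stack([ys, xs], axis=1).astype(float)
d = np.abs(pts[:, None, :] - pts[None, :, :])
d = np.minimum(d, n - d)                        # wrap-around distance
dist = np.hypot(d[..., 0], d[..., 1])
W = np.exp(-(dist / 6.0) ** 2) * np.exp(-np.abs(u[:, None] - u[None, :]))
np.fill_diagonal(W, 0.0)

# Percentile-based masking sparsifies the random graph.
A = np.where(W >= np.percentile(W[W > 0], 80), W, 0.0)
```

The result is a sparse, symmetric, geometry-driven random adjacency matrix; in the actual framework the smoothing operator, sampling intensity, and masking threshold are all tied to the SPDE hyperparameters.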

This approach provides rigorous guarantees regarding well-posedness, measurability, and expressive stochastic capacity, allowing for learned, non-Gaussian stochastic mappings between inputs and outputs in neural systems, especially for highly structured domains.

2. Geometry-Aware Stochastic Search and Optimization in Architecture Space

Geometry-aware stochastic search refers to NAS algorithms that incorporate explicit geometric notions—such as neighborhoods, distances, and flatness—in the space of neural architectures. In discrete NAS spaces, architectures are often encoded as sequences or DAGs, and geometric structure is defined by Hamming-type distances representing minimal sequences of edits (edge operations, node additions, etc.). Neighborhoods, paths, and associated flatness/barrier metrics are then computed, revealing a clustered landscape in which high-performing architectures assemble in flat regions, while suboptimal ones are isolated.
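The edit-distance geometry is easy to make concrete for fixed-length operation sequences; the op names below are merely illustrative of NAS-Bench-201-style encodings:

```python
OPS = ("none", "skip", "conv1x1", "conv3x3", "avgpool")   # illustrative op set

def hamming(a, b):
    """Number of positions at which two fixed-length op sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def neighbors(arch):
    """All architectures exactly one op substitution away."""
    return [arch[:i] + (op,) + arch[i + 1:]
            for i in range(len(arch))
            for op in OPS if op != arch[i]]

# A cell with 6 operation slots, as in NAS-Bench-201-style encodings:
a = ("conv3x3", "skip", "none", "conv1x1", "avgpool", "conv3x3")
nbrs = neighbors(a)
```

Flatness and barrier metrics are then computed over these neighborhoods, e.g. by comparing the accuracy of `a` with the accuracy distribution of `nbrs`.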

Advanced geometry-aware optimization strategies, such as Architecture-Aware Minimization (A²M), inject explicit flatness-aware perturbations into the architecture parameterization (e.g., via the DARTS continuous relaxation). By biasing the stochastic gradient descent in the architecture space toward flat basins—using SAM-inspired finite-difference Hessian terms and local perturbation—the search avoids sharp or brittle solutions, attaining robust generalization and improved test accuracy across NAS-Bench-201 and DARTS search spaces (Gambella et al., 13 Mar 2025).
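A minimal sketch of a SAM-style flatness-aware step on architecture parameters, with a toy quadratic standing in for the actual bilevel DARTS objective (every name and constant here is hypothetical, not A²M's implementation):

```python
import numpy as np

def sam_arch_step(alpha, grad_fn, rho=0.05, lr=0.1):
    """One SAM-style, flatness-aware update on architecture parameters:
    step toward the local worst case within a rho-ball, evaluate the
    gradient there, and apply it back at the original point."""
    g = grad_fn(alpha)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)   # normalized ascent direction
    return alpha - lr * grad_fn(alpha + eps)      # descend on perturbed gradient

# Toy quadratic "validation loss" over 5 architecture parameters,
# minimized at alpha = 1 (stands in for the DARTS objective).
grad_fn = lambda a: 2.0 * (a - 1.0)
alpha = np.zeros(5)
for _ in range(200):
    alpha = sam_arch_step(alpha, grad_fn)
```

Because the gradient is taken at the perturbed point, minima surrounded by sharp walls are penalized and the iterates settle in flat basins.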

The geometry-aware stochastic perspective thus provides a mechanism to structure NAS search not only for performance, but also for robustness and transferability by aligning the stochastic exploration with the geometry of the loss landscape.

3. Stochastic Relaxation and Geometry-Aware Natural Gradients

Continuous relaxation and stochastic optimization methods play a foundational role in geometry-aware neural architecture search. Stochastic relaxation replaces combinatorial search over a discrete architecture variable $c$ with a parameterized probability distribution $p_\theta(c)$, facilitating efficient gradient-based optimization via Monte Carlo sampling. The expectation

$$J(x, \theta) = \mathbb{E}_{c \sim p_\theta} [f(x, c)]$$

is maximized with respect to both network weights and architecture distribution parameters.
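A standard way to realize this maximization is the score-function (REINFORCE) estimator, which needs only samples from the architecture distribution. In the toy sketch below, a fixed vector scores four candidate operations, standing in for the true objective $f(x, c)$; the running baseline is an assumed variance-reduction choice, not part of the formulation above:

```python
import numpy as np

rng = np.random.default_rng(1)

def softmax(theta):
    e = np.exp(theta - theta.max())
    return e / e.sum()

# Toy per-architecture objective over 4 candidate operations.
f = np.array([0.1, 0.3, 1.0, 0.2])

theta, baseline = np.zeros(4), 0.0
for _ in range(2000):
    p = softmax(theta)
    c = rng.choice(4, p=p)              # sample an architecture c ~ p_theta
    grad_logp = -p
    grad_logp[c] += 1.0                 # grad_theta log p_theta(c) for softmax
    # Score-function (REINFORCE) estimate of grad_theta E[f(c)],
    # with a running baseline for variance reduction.
    theta += 0.1 * (f[c] - baseline) * grad_logp
    baseline = 0.9 * baseline + 0.1 * f[c]
```

After training, the distribution concentrates on the highest-scoring operation, from which a discrete architecture can be sampled or taken as the mode.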

Natural gradient methods, which utilize the Fisher information matrix corresponding to the geometry of the statistical manifold of $p_\theta$, deliver geometry-aware updates enjoying fast convergence and robust monotonic improvement. The adaptive stochastic natural gradient (ASNG) algorithm augments this approach by dynamically adjusting the effective trust region size in response to the local signal-to-noise ratio. This natural geometric coupling between architecture parametrization and stochastic optimization enables universal applicability across diverse search spaces and supports high-performance, sample-efficient NAS (Akimoto et al., 2019).
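For a single categorical distribution under the softmax parametrization, the natural gradient has a simple closed form: the vanilla gradient is $p_k(f_k - \mathbb{E}[f])$, and preconditioning by the (pseudo-)inverse Fisher matrix $\mathrm{diag}(p) - pp^\top$ yields the direction $f - \mathbb{E}[f]$ (up to the softmax nullspace). The exact-expectation sketch below illustrates this; ASNG additionally adapts the step size, which is fixed here:

```python
import numpy as np

def softmax(theta):
    e = np.exp(theta - theta.max())
    return e / e.sum()

# Exact objective J(theta) = E_{c~p_theta}[f(c)] for a toy 4-op search.
f = np.array([0.1, 0.3, 1.0, 0.2])
theta = np.zeros(4)

for _ in range(100):
    p = softmax(theta)
    # Natural-gradient direction f - E[f]: unlike the vanilla gradient
    # p * (f - E[f]), it does not vanish as p concentrates.
    nat_grad = f - p @ f
    theta += 0.3 * nat_grad
```

The update keeps a constant-magnitude pull toward the best operation even when the distribution is already sharply peaked, which is the source of the fast, monotone convergence.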

4. Mirror Descent, Bregman Geometry, and Sparsification

Geometry-aware optimization frameworks for NAS frequently rely on mirror descent, which generalizes gradient descent to non-Euclidean geometry. For architecture parameters that live on product-of-simplex spaces (e.g., the softmax weights over edge operations), using the negative entropy as a distance-generating function induces Kullback–Leibler geometry, enabling multiplicative exponentiated-gradient updates that inherently respect the simplex constraints and promote sparsity.

This Bregman geometry admits fast stochastic convergence ($O(1/\sqrt{T})$) and dimension-robust performance, as the exponentiated updates naturally zero out less promising operations, enhancing the stability of discrete sampling post-optimization. When combined with appropriate continuous relaxations and co-designed optimizers, geometry-aware mirror-descent methods can match or outperform more elaborate bilevel algorithms on NAS-Bench-201, DARTS, and other state-of-the-art NAS benchmarks (Li et al., 2020).
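The exponentiated-gradient update induced by the negative-entropy mirror map is one line of code; the per-operation losses below are illustrative:

```python
import numpy as np

def eg_step(p, grad, eta):
    """Mirror-descent step with the negative-entropy mirror map (KL
    geometry): a multiplicative update that stays on the simplex."""
    w = p * np.exp(-eta * grad)
    return w / w.sum()

# Toy per-operation losses on one DARTS-style edge; lower is better.
loss = np.array([0.9, 0.8, 0.1, 0.7, 0.85])
p = np.full(5, 0.2)
for _ in range(50):
    p = eg_step(p, loss, eta=1.0)
```

Because the update is multiplicative, weights of weak operations shrink geometrically toward zero rather than merely drifting, which is what makes the final discrete sampling stable.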

5. Optimal Transport, Gradient Flows, and Semi-Discrete Geometry

Frameworks based on optimal transport principles endow the product space of network weights and discrete architectures ($\mathbb{R}^d \times G$ for a finite architecture graph $G$) with a metric akin to the $W_2$ Wasserstein distance. The resulting geometry supports rigorous variational analysis and gradient-flow dynamics for entropy-regularized objectives. In the semi-discrete setting, coupled reaction–diffusion PDEs govern the dynamics of particle ensembles representing both weights and architectures.

This stochastic geometry-aware paradigm allows, via JKO (Jordan–Kinderlehrer–Otto) minimizing-movement schemes, for particle-based stochastic search that intrinsically balances exploration (by Metropolis-weighted architecture moves) with exploitation (by negative gradient drift). These flows are grounded in the formal Riemannian geometric structure of the probability simplex over $\mathbb{R}^d \times G$, and yield practical algorithms whose mean-field limit provably corresponds to the continuous flow, thereby unifying discrete combinatorial search with geometry-aware stochastic optimization (Trillos et al., 2020).
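The exploration/exploitation balance can be caricatured with a finite particle ensemble: Langevin drift on the continuous weights interleaved with Metropolis-weighted jumps between discrete architectures. The energy function below is a toy, not the entropy-regularized NAS objective:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy energy on (weight w in R, architecture g in {0, 1, 2});
# architecture 1 has the lowest achievable energy.
offsets = np.array([1.0, 0.0, 2.0])
def energy(w, g):
    return (w - 1.0) ** 2 + offsets[g]

beta, eps, n_particles = 5.0, 0.05, 200
w = rng.standard_normal(n_particles)
g = rng.integers(0, 3, n_particles)

for _ in range(400):
    # Exploitation: noisy negative-gradient (Langevin) drift on weights.
    w += (-eps * 2.0 * (w - 1.0)
          + np.sqrt(2.0 * eps / beta) * rng.standard_normal(n_particles))
    # Exploration: Metropolis-weighted jumps between discrete architectures.
    prop = rng.integers(0, 3, n_particles)
    dE = energy(w, prop) - energy(w, g)
    accept = rng.random(n_particles) < np.exp(-beta * np.maximum(dE, 0.0))
    g = np.where(accept, prop, g)
```

In the mean-field limit of such ensembles, the empirical measure over weight–architecture pairs follows the continuous gradient flow described above.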

6. Geometry-Aware Modeling for Irregular Domains and Manifolds

For tasks involving data with intrinsic geometric structure (e.g., point clouds, meshes, or manifold-valued representations), geometry-aware stochastic neural architectures leverage kernels or random fields that directly encode manifold geometry. In Bayesian deep learning on manifolds, geometry-aware kernels (derived from Laplace–Beltrami diffusions or mean-curvature flows) yield priors and representations that natively encode geodesic distances and local curvature, supporting inferences, regressions, and feature aggregations that are consistent with the geometry of the input domain.
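A discrete analogue of such a geometry-aware kernel is the heat kernel $\exp(-tL)$ of a graph Laplacian, whose entries decay with diffusion (and hence geodesic-like) distance on the discretized domain. A minimal sketch on a cycle graph standing in for a mesh:

```python
import numpy as np

def heat_kernel(L, t):
    """Heat kernel exp(-t L) of a graph Laplacian: a geometry-aware
    covariance whose entries decay with diffusion distance."""
    vals, vecs = np.linalg.eigh(L)
    return (vecs * np.exp(-t * vals)) @ vecs.T

# A 12-node cycle graph as a stand-in for a mesh/manifold discretization.
n = 12
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0
L = np.diag(A.sum(axis=1)) - A
K = heat_kernel(L, t=1.0)
```

Used as a GP covariance, `K` assigns high correlation to nodes that are close along the graph, not merely close in the ambient embedding.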

These constructions can be stacked hierarchically (e.g., deep GPs with geometry-aware kernels) and fused with neural networks for end-to-end learning, supporting both stochastic feature extraction and probabilistic inference on complex, irregular domains. Experimental evidence demonstrates improved sample efficiency and accuracy over geometry-agnostic methods in learning on manifolds (Fan et al., 2021).

7. Zero-Shot Generalization and Practical Implications

Geometry-aware stochastic architectures have demonstrated substantial practical benefits, particularly for applications demanding transferability and data-driven reasoning across structurally diverse settings. For instance, the Geometry-Aware Autoregressive Model (GAAM) for fast calorimeter simulation integrates per-cell geometry descriptors into a conditional autoregressive generative network, enabling simulation of unseen detector geometries without retraining and yielding over 50% reduction in Wasserstein distance relative to geometry-unaware baselines for physics-driven summary statistics (Liu et al., 2023).
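The conditioning pattern can be sketched abstractly: each autoregressive step consumes both per-cell geometry descriptors and previously generated cells. The descriptors, coefficients, and noise model below are entirely hypothetical illustrations, not GAAM's architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_shower(geom, coef, n_cells):
    """Autoregressively sample per-cell energies, conditioning each step
    on (hypothetical) geometry descriptors and the previous cell."""
    e = np.zeros(n_cells)
    prev = 0.0
    for i in range(n_cells):
        # Mean depends on this cell's geometry and the running deposit.
        mu = coef[0] * geom[i, 0] + coef[1] * geom[i, 1] + coef[2] * prev
        e[i] = max(0.0, mu + 0.1 * rng.standard_normal())
        prev = e[i]
    return e

# Geometry descriptors (e.g. cell size, depth) for an unseen layout:
geom = np.column_stack([np.linspace(1.0, 0.2, 10), np.linspace(0.0, 1.0, 10)])
coef = np.array([0.8, -0.3, 0.5])   # illustrative, not fitted
shower = sample_shower(geom, coef, 10)
```

Because the sampler reads geometry only through the descriptor vectors, the same learned coefficients can be applied to a new layout without retraining, which is the mechanism behind the zero-shot behavior discussed next.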

Zero-shot generalization is evident when interpolating to new, untrained spatial segmentations, though reliable extrapolation beyond the convex hull of training geometries remains an open challenge. The ability to encode geometry in a learned embedding, and fuse this information into stochastic, likelihood-based architectures, is thereby crucial for foundational models seeking broad deployment across evolving domains and geometries.


In summary, geometry-aware stochastic neural architectures form a broad and formally grounded class of models and algorithms in which geometric constraints, spatial relations, or manifold structures are inherently embedded in the stochastic parametrization, optimization, or generative process. This enables principled handling of structured data, robust and transferable architecture search, and improved generalization—all while maintaining rigorous mathematical properties and scalable practical implementations (Liu et al., 2023, Gambella et al., 13 Mar 2025, Li et al., 2020, Trillos et al., 2020, Soize, 11 Dec 2025, Akimoto et al., 2019, Fan et al., 2021).
