Perceptual Manifolds: A Geometric Framework

Updated 2 July 2026

Perceptual manifolds are geometric representations capturing all identity-preserving variations of a percept through neural or model responses.
They provide a framework for invariant object recognition by analyzing manifold geometry, dimensionality, and separation across biological and artificial systems.
Applications include enhancing generative models, quality assessment, and developing metrics for neural coding and deep network representation.

A perceptual manifold (PM) is the set of neural, behavioral, or model states elicited by all physically equivalent variations of a particular percept—such as the ensemble of neuronal activations representing an object under identity-preserving transformations, or the subspace of deep neural network features encoding all views of a 3D shape. PMs provide a unifying geometric framework for describing, analyzing, and benchmarking both biological and artificial perceptual systems. The geometry, dimensionality, and capacity of PMs underpin theories of invariant object recognition, neural coding, evaluation metrics in sensory source separation, and generative models. This entry systematically surveys theoretical foundations, mathematical modeling, empirical estimators, and practical applications of perceptual manifolds across neuroscience, computer vision, and machine learning.

1. Mathematical Definitions and Core Models

The central formalization of a perceptual manifold is as follows. In a neural or model response space $\mathbb R^N$, each object, stimulus, or percept $\mu$ under a set of identity-preserving transformations $s \in S \subseteq \mathbb R^D$ sweeps out the set

$M_\mu = \left\{ \mathbf x^\mu(s) = \mathbf x_0^\mu + \sum_{i=1}^D s_i\,\mathbf u_i^\mu \;\middle|\; f(s) \leq 0 \right\}$

where $\mathbf x_0^\mu$ is the manifold center, $\{\mathbf u_i^\mu\}$ is an orthonormal basis for the local directions of variability, and the constraint $f(s) \leq 0$ delimits the allowed coordinates, with boundary $f(s) = 0$ (e.g., $\|s\|_2 \leq 1$ for unit $L_2$ balls, or a set of linear inequalities for polytopes such as the convex hull of a discrete sample cloud) (Chung et al., 2017, Chung et al., 2015, Chung, 2021). In deep networks, the PM is the set of feature vectors associated with all appearances (e.g., views) of a single object instance (Lin et al., 2017).
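The definition can be made concrete with a small sampling sketch. It assumes an $L_2$-ball constraint $f(s) = \|s\|_2 - R \leq 0$ (the unit ball is the case $R = 1$) and a random orthonormal basis obtained by QR decomposition; the function name `sample_ball_manifold` and all sizes are illustrative choices, not part of the cited work.

```python
import numpy as np

def sample_ball_manifold(x0, U, radius, n_samples, rng):
    """Sample x(s) = x0 + sum_i s_i u_i subject to f(s) = ||s||_2 - radius <= 0.

    x0: (N,) manifold center; U: (N, D) matrix whose columns u_i are orthonormal.
    Returns the (n_samples, N) response vectors and their (n_samples, D) coordinates s.
    """
    D = U.shape[1]
    # Uniform sampling inside a D-ball: isotropic direction times radius * u^(1/D).
    directions = rng.standard_normal((n_samples, D))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    lengths = radius * rng.random(n_samples) ** (1.0 / D)
    s = directions * lengths[:, None]
    return x0 + s @ U.T, s

rng = np.random.default_rng(0)
N, D = 50, 3
x0 = rng.standard_normal(N)                        # hypothetical manifold center
U, _ = np.linalg.qr(rng.standard_normal((N, D)))   # orthonormal basis of variability
X, S = sample_ball_manifold(x0, U, radius=0.5, n_samples=1000, rng=rng)
```

Because the columns of `U` are orthonormal, each sample's distance from `x0` equals $\|s\|_2$, so every response vector stays within `radius` of the center.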

Key geometric invariants:

Manifold dimension ($D$): Degrees of freedom along which responses vary.
Manifold radius ($R$): Typical maximum response displacement from the center.
Manifold shape: Defined by the convex hull of allowed response variations (e.g., ellipsoid, polytope, ring).
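Given sampled responses, these invariants can be estimated directly. The sketch below takes the center as the sample mean, reports both the maximum and the root-mean-square displacement as radius estimates, and uses the participation ratio $(\sum_k \lambda_k)^2 / \sum_k \lambda_k^2$ of the covariance eigenvalues as a dimension proxy. The participation ratio is a common stand-in chosen here for illustration; it is not the anchor-point-based effective dimension used in the capacity theory.

```python
import numpy as np

def manifold_invariants(X):
    """Estimate center, radius, and a dimension proxy from samples X of shape (n, N)."""
    center = X.mean(axis=0)
    displacements = X - center
    dists = np.linalg.norm(displacements, axis=1)
    radius_max = float(dists.max())                      # maximum displacement from the center
    radius_rms = float(np.sqrt(np.mean(dists ** 2)))     # root-mean-square displacement
    # Participation ratio of the covariance spectrum, used here as a dimension proxy.
    eigvals = np.clip(np.linalg.eigvalsh(np.cov(displacements, rowvar=False)), 0.0, None)
    dim_pr = float(eigvals.sum() ** 2 / np.sum(eigvals ** 2))
    return center, radius_max, radius_rms, dim_pr

rng = np.random.default_rng(1)
N, D = 50, 3
x0 = rng.standard_normal(N)
U, _ = np.linalg.qr(rng.standard_normal((N, D)))
X = x0 + rng.standard_normal((5000, D)) @ U.T        # isotropic variation in a 3-D subspace
center, radius_max, radius_rms, dim_pr = manifold_invariants(X)
```

For isotropic variation confined to a 3-D subspace, the participation ratio comes out just below 3, while the two radius estimates differ because the maximum is driven by the most extreme sample.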

In multi-oscillator neurodynamical models, a PM can emerge as the steady-state subspace of a system of coupled processes $p_1, \dots, p_n$ whose coupling weights $w_{ij}$ encode perceptual distances, making the steady state isomorphic to the metric structure of the perceptual domain (e.g., a discrete line or a color space) (Kraikivski, 2019).

2. Perceptual Manifolds in Biological and Artificial Vision

Neural populations represent each object or percept by a manifold of response vectors as the object is transformed by pose, scaling, illumination, or other nuisance variables. In both biological and model systems, these manifolds tend to be smooth and low-dimensional, even though they are embedded in a high-dimensional ambient response space (Chung, 2021, Chung et al., 2017, Chung et al., 2015, Lin et al., 2017).

In deep convolutional networks, PMs arise in the feature space as the set of embedded representations of all possible views of an object $o$, $M_o = \{\phi(v) \mid v \in V_o\}$, where $v \in V_o$ denotes each input appearance and $\phi$ the network mapping (Lin et al., 2017). After re-training with object-persistence constraints, the tightness (low intra-object variance) and separation (high inter-object variance) of PMs strongly predict view-invariant recognition and align with human similarity judgments (Lin et al., 2017).
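Tightness and separation can be quantified with simple variance statistics. This sketch assumes precomputed embeddings $\phi(v)$ with one object label per view; tightness is the mean squared distance of views to their object centroid, separation is the spread of the centroids, and their ratio is a Fisher-style summary chosen for illustration, not the exact criterion used by Lin et al. (2017).

```python
import numpy as np

def tightness_separation(features, labels):
    """Return (intra-object variance, inter-object variance, their ratio).

    features: (n, N) embeddings phi(v), one row per view; labels: (n,) object ids.
    """
    objects = np.unique(labels)
    centroids = np.stack([features[labels == o].mean(axis=0) for o in objects])
    # Tightness: mean squared distance of each object's views to that object's centroid.
    intra = float(np.mean([
        np.mean(np.sum((features[labels == o] - c) ** 2, axis=1))
        for o, c in zip(objects, centroids)
    ]))
    # Separation: mean squared distance of object centroids to their grand mean.
    inter = float(np.mean(np.sum((centroids - centroids.mean(axis=0)) ** 2, axis=1)))
    return intra, inter, inter / intra

rng = np.random.default_rng(0)
n_objects, n_views, dim = 5, 40, 16
labels = np.repeat(np.arange(n_objects), n_views)
centers = 3.0 * rng.standard_normal((n_objects, dim))          # hypothetical object centers
tight = centers[labels] + 0.2 * rng.standard_normal((n_objects * n_views, dim))
loose = centers[labels] + 2.0 * rng.standard_normal((n_objects * n_views, dim))
intra_tight, _, ratio_tight = tightness_separation(tight, labels)
intra_loose, _, ratio_loose = tightness_separation(loose, labels)
```

The synthetic "tight" embeddings mimic representations after object-persistence training: views cluster closely around each object, so the separation-to-tightness ratio is much larger than for the "loose" set.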

Empirical studies of neural and deep network representations show that successive processing stages reduce the PMs’ radii and dimensions, thereby increasing their linear separability and “untangling” object representations for readout neurons or classifiers (Chung, 2021, Chung et al., 2015). For example, in ImageNet-trained networks, the effective manifold radius $R_M$ and dimension $D_M$ decrease from early to late layers, while the classification capacity (the number of linearly separable objects per neuron) increases correspondingly (Chung, 2021).

3. Classification Capacity and Manifold Geometry

A foundational result is that PM geometry controls the linear classification capacity $\alpha_c = P/N$ (the maximum number $P$ of linearly separable objects per neuron, in a population of $N$ neurons), a key theoretical metric for evaluating representations (Chung et al., 2015, Chung et al., 2017, Chung, 2021). The main capacity results are:

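For $L_2$-ball manifolds of radius $R$ in $D$ dimensions, the zero-margin capacity $\alpha_B(R, D)$ satisfies

$\alpha_B^{-1}(R, D) = \int_0^\infty dt\,\chi_D(t)\left[\int_{-t/R}^{tR} Dt_0\,\frac{(tR - t_0)^2}{1 + R^2} + \int_{-\infty}^{-t/R} Dt_0\,\left(t_0^2 + t^2\right)\right]$

where $\chi_D(t) = \frac{2^{1-D/2}}{\Gamma(D/2)}\,t^{D-1}e^{-t^2/2}$ is the chi distribution with $D$ degrees of freedom and $Dt_0 = e^{-t_0^2/2}\,dt_0/\sqrt{2\pi}$ is the Gaussian measure. The first integral counts manifolds that touch the margin, the second those fully embedded in it; balls lying entirely on the correct side contribute nothing.

Analytic simplifications follow for key limits, e.g., $\alpha_B \to 2$ (Gardner’s point capacity) as $R \to 0$, and $\alpha_B \to 1/(D + \tfrac12)$ as $R \to \infty$, which gives $\alpha_B \to 2/3$ for long line segments ($D = 1$).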
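The zero-margin ball capacity can be evaluated numerically. The sketch below computes $\alpha_B^{-1}(R, D)$ as an average over a chi-distributed variable $t$ of two Gaussian integrals over $t_0$: a touching term $(tR - t_0)^2/(1 + R^2)$ on $[-t/R,\, tR]$ and an embedded term $t_0^2 + t^2$ on $(-\infty, -t/R]$. The inner integrals use closed forms in the normal CDF $\Phi$ and density $\varphi$; the outer one uses the trapezoid rule. The grid size and the truncation at $t = \sqrt{D} + 12$ are illustrative choices, adequate for $R > 0$ and moderate $D$.

```python
import math
import numpy as np

_SQRT2 = math.sqrt(2.0)
# Standard normal CDF Phi, vectorized over NumPy arrays.
_Phi = np.vectorize(lambda x: 0.5 * math.erfc(-x / _SQRT2))

def _phi(x):
    """Standard normal density."""
    return np.exp(-0.5 * x ** 2) / math.sqrt(2.0 * math.pi)

def ball_capacity(R, D, n_grid=20001):
    """Zero-margin capacity alpha_B(R, D) of random L2-ball manifolds, for R > 0 and D >= 1.

    1/alpha_B = E_t[ int_{-t/R}^{tR} (tR - t0)^2 / (1 + R^2) Dt0
                     + int_{-inf}^{-t/R} (t0^2 + t^2) Dt0 ],  t ~ chi distribution with D dof.
    """
    t = np.linspace(0.0, math.sqrt(D) + 12.0, n_grid)
    chi = 2.0 ** (1 - D / 2) / math.gamma(D / 2) * t ** (D - 1) * np.exp(-0.5 * t ** 2)
    a, b = -t / R, t * R                       # boundaries of the touching regime
    Pa, Pb, pa, pb = _Phi(a), _Phi(b), _phi(a), _phi(b)
    # Closed form of int_a^b (b - t0)^2 phi(t0) dt0, using int x phi = -phi and
    # int x^2 phi = Phi - x phi; the offset tR coincides with the upper limit b.
    touching = ((b ** 2 + 1.0) * (Pb - Pa) + 2.0 * b * (pb - pa) - (b * pb - a * pa)) / (1.0 + R ** 2)
    # Closed form of int_{-inf}^a (t0^2 + t^2) phi(t0) dt0.
    embedded = (Pa - a * pa) + t ** 2 * Pa
    integrand = chi * (touching + embedded)
    inv_alpha = np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(t))  # trapezoid rule
    return 1.0 / inv_alpha
```

As a sanity check, `ball_capacity(1e-6, 1)` should come out close to 2 (the capacity for points), and `ball_capacity(1e4, D)` close to $1/(D + \tfrac12)$, the capacity of $D$-dimensional affine subspaces; capacity falls monotonically as $R$ grows.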

For arbitrary convex PMs in high $D$, only the effective radius $R_M$ and effective dimension $D_M$ matter:

$\alpha_M \approx \alpha_B(R_M, D_M)$

where $\alpha_B(R_M, D_M)$ is the capacity of $L_2$-ball manifolds with radius $R_M$ and dimension $D_M$, $R_M^2 = \langle \|\tilde{\mathbf s}(\vec T)\|^2 \rangle_{\vec T}$, $D_M = \langle (\vec t \cdot \hat{\mathbf s}(\vec T))^2 \rangle_{\vec T}$ with $\hat{\mathbf s} = \tilde{\mathbf s}/\|\tilde{\mathbf s}\|$, the averages run over Gaussian vectors $\vec T = (\vec t, t_0)$, and $\tilde{\mathbf s}(\vec T)$ is the anchor point touching the margin in the convex decomposition (Chung et al., 2017).

This statistical-mechanical framework generalizes Gardner’s point capacity to arbitrary manifold geometries, including ellipsoids, polytopes such as $L_1$ balls, and smooth ring manifolds (Chung et al., 2017, Chung et al., 2015, Chung, 2021). The theory predicts how label sparsity, manifold correlations, and geometry affect readout capacity, establishing precise design targets for representation learning.

4. Perceptual Manifolds in Generative and Quality Models

Recent work on generative diffusion models uses the PM framework to concentrate samples on the subset of images that match human perceptual quality. In latent diffusion, the perceptual manifold $\mathcal M_p$ is defined as a submanifold of the data manifold $\mathcal M$ or of its noisy versions $\mathcal M_t$, consisting of latents $\mathbf z$ whose decoded outputs $\mathcal D(\mathbf z)$ are perceptually consistent with reference features $\phi(\mathbf y)$ (Saini et al., 31 May 2025). Perceptual consistency is enforced during sampling by adding gradients of a perceptual loss, e.g., $\nabla_{\mathbf z_t}\,\|\phi(\mathcal D(\hat{\mathbf z}_0(\mathbf z_t))) - \phi(\mathbf y)\|_2^2$, alongside the standard data terms.
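Perceptually guided sampling can be sketched with linear stand-ins. In this toy, the decoder $\mathcal D$ and the feature extractor $\phi$ are hypothetical random linear maps, a standard Gaussian prior score stands in for the learned denoiser, stochastic noise and the diffusion schedule are omitted, and the guidance `weight` is an arbitrary illustrative value. Each step follows the data term while descending the squared feature distance $\|\phi(\mathcal D(\mathbf z)) - \phi(\mathbf y)\|_2^2$.

```python
import numpy as np

rng = np.random.default_rng(0)
latent_dim, image_dim, feat_dim = 8, 32, 16
# Hypothetical linear stand-ins for the decoder D(z) and the fixed feature extractor phi(x).
A = rng.standard_normal((image_dim, latent_dim)) / np.sqrt(latent_dim)
B = rng.standard_normal((feat_dim, image_dim)) / np.sqrt(image_dim)
M = B @ A                                          # composite map z -> phi(D(z))

def perceptual_loss(z, phi_ref):
    """Squared feature distance ||phi(D(z)) - phi_ref||^2."""
    r = M @ z - phi_ref
    return float(r @ r)

def perceptual_grad(z, phi_ref):
    """Analytic gradient of perceptual_loss with respect to z (valid for the linear stand-ins)."""
    return 2.0 * M.T @ (M @ z - phi_ref)

def prior_score(z):
    """Score of a standard Gaussian prior, standing in for the learned data term."""
    return -z

def guided_step(z, phi_ref, step=0.01, weight=5.0):
    """Follow the data term while descending the perceptual loss (sampling noise omitted)."""
    return z + step * (prior_score(z) - weight * perceptual_grad(z, phi_ref))

phi_ref = M @ rng.standard_normal(latent_dim)      # reference features of a hypothetical target
z = rng.standard_normal(latent_dim)
loss_before = perceptual_loss(z, phi_ref)
for _ in range(1000):
    z = guided_step(z, phi_ref)
loss_after = perceptual_loss(z, phi_ref)
```

The update is gradient descent on the prior energy $\tfrac12\|\mathbf z\|^2$ plus the weighted perceptual loss, so in this toy the perceptual loss ends well below its starting value while the prior keeps the latent bounded.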

Pixel-level diffusion models (e.g., PixelGen) extend this approach by learning not the full image manifold $\mathcal M$ but its perceptually meaningful submanifold $\mathcal M_p$, defined by proximity in fixed feature spaces (e.g., VGG for local texture, DINO for global semantics) (Ma et al., 2 Feb 2026). Explicit perceptual losses and feature alignment during training bias the generative process toward images that reside on $\mathcal M_p$, improving alignment with human perceptual judgments and sample diversity across scales.
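A training objective of this kind can be sketched as a pixel term plus feature-space terms. The descriptors below are hypothetical stand-ins for VGG-style local features and DINO-style global features (per-patch statistics and a fixed random projection), and the weights `lam_local` and `lam_global` are illustrative; this is not PixelGen's actual loss.

```python
import numpy as np

def local_features(x, patch=4):
    """Hypothetical texture descriptor: per-patch mean and std of a 1-D signal."""
    p = x.reshape(-1, patch)               # len(x) must be a multiple of patch
    return np.concatenate([p.mean(axis=1), p.std(axis=1)])

def global_features(x, W):
    """Hypothetical semantic descriptor: a fixed linear projection of the whole signal."""
    return W @ x

def perceptual_objective(x_pred, x_true, W, lam_local=1.0, lam_global=0.5):
    """Pixel loss + local feature distance + global feature misalignment (1 - cosine)."""
    pixel = np.mean((x_pred - x_true) ** 2)
    local = np.mean((local_features(x_pred) - local_features(x_true)) ** 2)
    g_pred, g_true = global_features(x_pred, W), global_features(x_true, W)
    cos = g_pred @ g_true / (np.linalg.norm(g_pred) * np.linalg.norm(g_true) + 1e-12)
    return float(pixel + lam_local * local + lam_global * (1.0 - cos))

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 64))
x_true = rng.standard_normal(64)
x_close = x_true + 0.01 * rng.standard_normal(64)    # small perturbation of the target
x_far = rng.standard_normal(64)                      # unrelated signal
```

Minimizing such an objective pulls predictions toward the target both pixel-wise and in the two feature spaces, which is one way to bias training toward the perceptually meaningful submanifold.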

5. Algorithmic Estimation and Applications

In practice, PMs are estimated through both explicit manifold learning and deep feature-based distance metrics. Key computational pipelines include:

Animation video resequencing: A perceptual manifold $\mathcal M$ is constructed as a finite set of frames $\{f_1, \dots, f_n\}$ endowed with a learned pairwise distance $d(f_i, f_j)$, where $d$ averages per-channel CNN feature differences, weighted and calibrated via human judgments. The graph built from these distances supports shortest-path or spanning-tree traversals for generating novel sequences (Morace et al., 2021).
Quality assessment in audio source separation: The PM is realized via self-supervised embeddings of reference waveforms and their fundamental distortions. Diffusion maps yield a manifold on which the Mahalanobis distances from system outputs to their attributed reference-distortion clusters quantify self-distortion. The Perceptual Match (PM) score is then computed from the upper tail of a gamma distribution fitted to these distances, providing granular, differentiable, and uncertainty-quantified perceptual evaluation (Ivry et al., 11 Sep 2025).
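The distance-and-tail stage can be sketched under simplifying assumptions. Here the self-supervised encoder and the diffusion-map step are omitted (embeddings are random vectors), a single reference-distortion cluster is modeled by its mean and covariance, the gamma distribution is fitted by the method of moments, and the score is taken to be the upper-tail probability $P(X \geq d)$ of the fitted gamma at an output's Mahalanobis distance $d$. These are hypothetical choices for illustration; the exact scoring rule of Ivry et al. may differ.

```python
import math
import numpy as np

def mahalanobis(x, mean, cov_inv):
    """Mahalanobis distance of x from a cluster with the given mean and inverse covariance."""
    d = x - mean
    return float(np.sqrt(d @ cov_inv @ d))

def fit_gamma_moments(samples):
    """Method-of-moments gamma fit: shape k = m^2 / v, scale theta = v / m."""
    m, v = float(np.mean(samples)), float(np.var(samples))
    return m * m / v, v / m

def gamma_upper_tail(d, k, theta, n_grid=20001):
    """P(X >= d) for X ~ Gamma(k, theta), by trapezoid integration of the density.

    Assumes k >= 1 so the density is finite at the lower end of the grid.
    """
    lo = max(d, 1e-12)
    hi = lo + k * theta + 60.0 * math.sqrt(k) * theta   # far into the upper tail
    x = np.linspace(lo, hi, n_grid)
    log_pdf = (k - 1) * np.log(x) - x / theta - math.lgamma(k) - k * math.log(theta)
    pdf = np.exp(log_pdf)
    return float(np.sum(0.5 * (pdf[1:] + pdf[:-1]) * np.diff(x)))

rng = np.random.default_rng(0)
ref_cluster = rng.standard_normal((2000, 4))        # hypothetical embeddings of one cluster
mean = ref_cluster.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(ref_cluster, rowvar=False))
ref_dists = np.array([mahalanobis(x, mean, cov_inv) for x in ref_cluster])
k, theta = fit_gamma_moments(ref_dists)

score_near = gamma_upper_tail(mahalanobis(mean + 0.1, mean, cov_inv), k, theta)
score_far = gamma_upper_tail(mahalanobis(mean + 6.0, mean, cov_inv), k, theta)
```

An output lying close to its attributed cluster gets a tail probability near 1, while one far outside the cluster's typical distances gets a score near 0; a library gamma fit (e.g., maximum likelihood) could replace the moment estimates.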

Empirical benchmarks show that PM-based metrics achieve near-perfect correspondence with mean opinion scores (MOS) for image and audio quality, surpassing traditional signal-based or hand-crafted methods. In animation resequencing, constructing the PM through metric learning enables smooth transitions and novel frame arrangements without supervision (Morace et al., 2021).
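The graph traversal used for resequencing can be sketched with Dijkstra's algorithm. The sketch assumes per-frame, per-channel feature summaries are already extracted, uses a weighted mean of absolute channel differences as the frame distance, and keeps only edges below a hypothetical threshold `max_edge` so that each step is a perceptually small transition; the function names and the threshold are illustrative, not taken from Morace et al. (2021).

```python
import heapq
import numpy as np

def frame_distance_matrix(features, channel_weights):
    """Pairwise distances: weighted mean over channels of absolute feature differences.

    features: (n_frames, n_channels) per-channel feature summaries (assumed precomputed).
    channel_weights: (n_channels,) nonnegative weights, e.g. calibrated on human judgments.
    """
    diffs = np.abs(features[:, None, :] - features[None, :, :])
    return (diffs * channel_weights).sum(axis=2) / channel_weights.sum()

def shortest_frame_path(dist, start, goal, max_edge):
    """Dijkstra over the graph whose edges are frame pairs with distance <= max_edge."""
    n = dist.shape[0]
    best = {start: 0.0}
    prev = {}
    heap = [(0.0, start)]
    while heap:
        cost, i = heapq.heappop(heap)
        if i == goal:
            break
        if cost > best[i]:
            continue                          # stale heap entry
        for j in range(n):
            w = dist[i, j]
            if j != i and w <= max_edge and cost + w < best.get(j, float("inf")):
                best[j] = cost + w
                prev[j] = i
                heapq.heappush(heap, (cost + w, j))
    if goal not in best:
        return None                           # goal unreachable with small transitions
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1]

features = np.arange(6, dtype=float)[:, None] * np.ones((1, 3))   # frames along a smooth trajectory
dist = frame_distance_matrix(features, channel_weights=np.ones(3))
path = shortest_frame_path(dist, start=0, goal=5, max_edge=1.0)
```

Restricting edges to small perceptual distances is what makes the traversal favor smooth transitions; with a tighter threshold the function reports that no acceptable sequence exists.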

6. Extensions: Neurodynamics and Cortical Manifold Structures

Beyond purely geometric or algebraic constructions, PMs can be encoded by dynamical systems. In Kraikivski’s oscillator framework, each node (process) $p_i$ is coupled to all others by weights proportional to perceptual distances, and the steady-state amplitudes realize a manifold isomorphic to the perceptual metric space (e.g., a Euclidean spatial line or color differences) (Kraikivski, 2019). The network also exhibits self-interpretation: each process can be algebraically reconstructed from its complement, a completeness property connected to theories of consciousness.

In biologically motivated geometry, Barbieri et al. formalize the functionality of the primary visual cortex via a 5D contact manifold $\mathcal M$ with coordinates $(x, y, t, \theta, v)$ (2D position, time, orientation, velocity). The manifold’s sub-Riemannian structure and associated stochastic kernels implement Gestalt principles for contour and motion integration. Connectivity in this manifold encodes association fields and trajectory prediction (Barbieri et al., 2013).

7. Generalizations and Implications

The PM formalism extends naturally to diverse perceptual domains:

Any metric stimulus space (space, color, timbre) admits a representation as a PM via appropriate choice of distance or feature metric (Kraikivski, 2019).
In deep learning, view- and instance-level PMs can be untangled via metric learning with object-persistence constraints, yielding representations transferable to novel object categories and aligning with human similarity structure (Lin et al., 2017).
Statistical-mechanical extensions handle correlated manifolds, heterogeneous mixtures, sparse labels, and nonlinear embedding maps (Chung et al., 2017, Chung, 2021).
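When perceptual distances are Euclidean, one standard way to realize them as a point configuration is classical multidimensional scaling (MDS). This is a swapped-in technique used only to illustrate the claim about metric stimulus spaces; it is not Kraikivski's oscillator mechanism, and it reproduces the distances exactly only when they are Euclidean-embeddable in the chosen dimension.

```python
import numpy as np

def classical_mds(dist, dim):
    """Embed points in R^dim so Euclidean distances approximate the given distance matrix."""
    n = dist.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    G = -0.5 * J @ (dist ** 2) @ J               # Gram matrix of the centered configuration
    eigvals, eigvecs = np.linalg.eigh(G)
    order = np.argsort(eigvals)[::-1][:dim]      # keep the largest eigenvalues
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale

positions = np.array([0.0, 1.0, 2.5, 4.0])       # hypothetical stimuli on a perceptual line
dist = np.abs(positions[:, None] - positions[None, :])
Y = classical_mds(dist, dim=1)
recovered = np.abs(Y - Y.T)                      # pairwise distances of the embedding
```

The recovered configuration matches the original up to translation and reflection, so its pairwise distances reproduce the perceptual metric exactly in this one-dimensional case.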

PMs thus provide a mathematically rigorous, data-driven framework for understanding, comparing, and optimizing complex perceptual systems, both natural and artificial, across modalities and tasks. Their role in quantifying invariance, capacity, perceptual similarity, and generative consistency continues to drive methodological and empirical advances in neuroscience and machine learning.