Fractal & Regular Geometry in DNNs
- Fractal and regular geometry together form a framework for understanding deep neural networks, blending self-similar, multiscale patterns with locally affine transformations.
- Rigorous methods like box-counting, Hausdorff dimension, and persistent homology quantify intricate network structures that influence convergence and functional richness.
- Architectural innovations such as FractalNet utilize recursive, self-similar motifs to achieve parameter efficiency and improved performance across diverse scales.
Fractal and regular geometry are fundamental to understanding the expressivity, dynamics, and generalization phenomena of deep neural networks (DNNs). At multiple scales—spanning weight spaces, internal representations, optimization trajectories, and loss landscapes—DNNs integrate simple regular structures with multiscale, self-similar, and fractal patterns. Rigorous analysis leveraging box-counting, Hausdorff, and persistent homology dimensions has revealed that the geometric complexity of neural networks both constrains and enables their functional richness.
1. Fractal Structure in Optimization and Trainability Boundaries
A canonical insight into the fractal geometry of deep learning emerges from examining the convergence properties of gradient-based training algorithms. Standard full-batch gradient descent on a parameter vector $\theta$ with learning rate $\eta$ iterates the map
$$\theta_{t+1} = \theta_t - \eta \, \nabla_\theta L(\theta_t),$$
analogous to function iteration in classical complex dynamics (e.g., the quadratic map $z \mapsto z^2 + c$ underlying the Mandelbrot and Julia sets). Here, the outcome (convergent or divergent optimization) depends sensitively on hyperparameters (such as $\eta$), and the locus in hyperparameter space separating convergent from divergent regimes (the "trainability boundary") is found to be fractal.
The formal definition of the trainability boundary in a 2D hyperparameter slice involves running the optimizer for a fixed number of steps at each grid point and partitioning the slice into convergent ($\mathcal{C}$) and divergent ($\mathcal{D}$) subsets; the trainability boundary is the set of points arbitrarily close to both. Experimental visualizations using fine hyperparameter grids and zoom sequences spanning more than 10 decades reveal persistent, self-similar structure. Quantitative box-counting yields non-integer boundary dimensions strictly between 1 and 2 across architectures and data regimes, confirming bona fide fractality; the measured dimension varies with the setup, e.g., between full-batch ReLU networks, tanh or minibatch (stochastic) configurations, and (learning rate, initialization scale) grids (Sohl-Dickstein, 2024).
This establishes that the apparent "edge of chaos" in neural hyperparameter space is not a smooth manifold but a fractal frontier.
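To make this concrete, the sketch below scans a toy two-hyperparameter slice (learning rate, initialization scale) for a tiny tanh network and estimates the box-counting dimension of the resulting convergent/divergent boundary. This is a minimal illustration: the architecture, grid ranges, divergence threshold, and all function names are assumptions, not details from the cited paper.

```python
import numpy as np

def diverged(lr, init_scale, steps=100, seed=0):
    """Full-batch gradient descent on a tiny 1-hidden-layer tanh network
    (toy setup); returns True if the training loss blows up."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(16, 2))               # fixed toy inputs
    y = rng.normal(size=(16, 1))               # fixed toy targets
    W1 = init_scale * rng.normal(size=(2, 8))
    W2 = init_scale * rng.normal(size=(8, 1))
    for _ in range(steps):
        H = np.tanh(X @ W1)                    # forward pass
        err = H @ W2 - y
        if not np.isfinite(err).all() or (err ** 2).mean() > 1e6:
            return True                        # divergent hyperparameters
        gW2 = 2 * H.T @ err / len(X)           # backprop through the MSE loss
        gH = 2 * err @ W2.T / len(X)
        W1 -= lr * (X.T @ (gH * (1 - H ** 2)))
        W2 -= lr * gW2
    return False

def boundary_dimension(grid):
    """Box-count the cells that straddle the convergent/divergent boundary
    at dyadic scales; the log-log slope estimates the boundary dimension."""
    n, sizes, counts = grid.shape[0], [], []
    for k in (2, 4, 8, 16, 32):
        b = n // k
        straddling = sum(
            grid[i*b:(i+1)*b, j*b:(j+1)*b].min() != grid[i*b:(i+1)*b, j*b:(j+1)*b].max()
            for i in range(k) for j in range(k))
        sizes.append(k)
        counts.append(max(straddling, 1))
    return np.polyfit(np.log(sizes), np.log(counts), 1)[0]

lrs = np.linspace(0.01, 2.0, 128)
scales = np.linspace(0.1, 3.0, 128)
grid = np.array([[diverged(lr, s) for lr in lrs] for s in scales])
print("estimated boundary dimension:", boundary_dimension(grid))
```

Re-running the scan on a zoomed-in sub-window of the grid is the corresponding self-similarity check; at this coarse resolution the dimension estimate is of course crude.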
2. Fractal and Regular Geometry in Network Functions
The interplay between fractal and regular geometry is reflected in the expressive power of feedforward networks. A ReLU DNN implements a continuous piecewise-linear (CPwL) map; locally, each linear region is a regular affine transformation, but the global partition of input space can be extremely intricate and, under specific constructions, exhibit self-similar, fractal structure.
A key mechanism for fractal function generation comes via Iterated Function Systems (IFS): starting from a family of contractive affine maps $f_1, \dots, f_m$, repeated application yields a sequence of nested convex sets or polytopes converging to a fractal attractor. The corresponding CPwL functions representing the $k$-th iterate, with a number of linear regions growing exponentially in $k$, can be exactly implemented by a deep ReLU network whose parameter count grows only linearly in $k$; that is, parameter efficiency coexists with exponential region proliferation (Dym et al., 2019).
This construction exemplifies how DNN depth enables encoding of highly "irregular" (fractal) functions, despite only using elementary (regular) nonlinearities. In practice, real networks globally combine regular partitions with hierarchical self-similarity, encoding fractal boundary sets with affine interiors.
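The depth-efficiency mechanism is visible already in one dimension with a standard construction (a Telgarsky-style sawtooth, used here as a stand-in for the IFS construction of Dym et al., which this sketch does not reproduce): composing a two-piece ReLU "hat" map with itself $k$ times yields $2^k$ linear regions from only $O(k)$ parameters.

```python
import numpy as np

def hat(x):
    """Two-piece ReLU 'hat' map on [0, 1]: 2x for x <= 1/2, 2 - 2x after.
    Implemented exactly as 2*relu(x) - 4*relu(x - 0.5)."""
    relu = lambda t: np.maximum(t, 0.0)
    return 2 * relu(x) - 4 * relu(x - 0.5)

def sawtooth(x, depth):
    """Compose the hat map `depth` times: O(depth) parameters in total,
    yet the result is piecewise linear with 2**depth regions on [0, 1]."""
    for _ in range(depth):
        x = hat(x)
    return x

x = np.linspace(0.0, 1.0, 2**12 + 1)
for k in (1, 3, 6):
    y = sawtooth(x, k)
    slopes = np.sign(np.diff(y))                      # slope sign per sample gap
    pieces = 1 + np.count_nonzero(np.diff(slopes))    # count slope changes
    print(f"depth {k}: {pieces} linear regions (expected {2**k})")
```

Each composition reuses the same two ReLU units, so the region count doubles per layer while the parameter count grows additively, which is exactly the depth-versus-width trade-off the IFS construction exploits.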
3. Statistical Geometry: Activation Regularity, Depth, and Boundary Roughness
The geometric character of network decision surfaces—and its evolution with depth and activation choice—is rigorously analyzed using excursion set geometry and Hausdorff dimension. For infinitely wide random networks, let $f_\ell$ denote the network function at depth $\ell$. The boundary (level set) $\{x : f_\ell(x) = u\}$, for a fixed threshold $u$, exhibits distinct regimes:
- Non-regular (fractal) activations (e.g., Heaviside): the boundary's Hausdorff dimension increases monotonically with depth $\ell$, approaching the full input dimension ($d$) for large $\ell$; deeper networks realize rough, fractal-like classifiers.
- Regular activations (ReLU, tanh, logistic): the expected boundary volume, governed by a spectral parameter $\kappa$ linked to the activation kernel, exhibits a trichotomy:
  - $\kappa < 1$: boundary volume decays to zero (over-regularization);
  - $\kappa = 1$: boundary volume is constant (critical regime);
  - $\kappa > 1$: exponential growth with depth (chaotic proliferation of boundaries).
Monte Carlo simulations confirm that, for non-smooth activations, increased depth produces boundary roughness and divergent nodal length, while for smooth cases the boundary complexity is tightly controlled by $\kappa$ (Lillo et al., 8 Apr 2025).
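A quick Monte Carlo check in the spirit of (but not reproducing) these simulations: propagate a 1-D input slice through random deep networks and count sign changes of a scalar readout as a proxy for boundary volume. The width, variance scaling, and depth schedule below are arbitrary illustrative choices.

```python
import numpy as np

def sign_changes(activation, depth, width=256, n_points=2000, seed=0):
    """Sign changes of a random deep net's scalar output along a 1-D input
    slice: a crude proxy for the boundary's nodal count at this depth."""
    rng = np.random.default_rng(seed)
    t = np.linspace(-3.0, 3.0, n_points)
    H = np.stack([t, np.ones_like(t)], axis=1)     # a line in 2-D input space
    for _ in range(depth):
        W = rng.normal(size=(H.shape[1], width)) / np.sqrt(H.shape[1])
        H = activation(H @ W)
    f = (H @ rng.normal(size=(width, 1))).ravel()
    return int(np.count_nonzero(np.diff(np.sign(f))))

step = lambda z: np.sign(z)     # non-regular, Heaviside-like activation
for depth in (2, 8, 32):
    print(f"depth {depth}: step {sign_changes(step, depth):4d}   "
          f"tanh {sign_changes(np.tanh, depth):4d}")
```

Under the theory above, the step-activation count should grow rapidly with depth while the smooth tanh network stays comparatively tame.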
4. Fractal Geometry of Weights, Representations, and Learning Trajectories
Multiple geometric frameworks have been developed for quantifying the fractal dimension of neural network weights, representations, and parameter evolution:
- Weight-space fractality is captured through coarse group actions on discrete grids; using recursively defined "fractal transformations," segmentation and box-counting across scales yield Hausdorff–Besicovitch dimensions for network layers. Most architectures combine Euclidean (integer-dimensional) and fractal (non-integer-dimensional) behaviors depending on scale; skip connections and architectural symmetry induce persistent self-similarity (Moharil et al., 18 Mar 2025).
- The intrinsic persistent homology dimension ($\dim_{\mathrm{PH}}$) of SGD trajectories—estimated via topological data analysis—quantifies the effective fractal dimension of the weight path. Empirically, lower $\dim_{\mathrm{PH}}$ strongly predicts better generalization, providing a topological capacity control unrelated to parameter count (Birdal et al., 2021).
- Internal representations: Layerwise activations, treated as sampled manifolds, are analyzed with persistent homological fractal dimension (PHdim). Convolutional networks exhibit a "hump" in PHdim at intermediate depth (peak feature diversity), with simplification towards the final layer. Transformers and attention-based models retain higher PHdim throughout, reflecting distributed representation (Magai, 2023).
These analyses clarify that fractal complexity is manifest both in the raw weights and in the geometry of forward and backward signal and information propagation.
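As a lightweight stand-in for the persistent-homology estimator of Birdal et al. (not their method), the sketch below applies a Grassberger–Procaccia correlation-dimension fit to a synthetic "SGD trajectory"; the toy dynamics and all constants are assumptions for illustration.

```python
import numpy as np

def correlation_dimension(points, radii):
    """Grassberger–Procaccia estimate: C(r) is the fraction of point pairs
    within distance r; the log-log slope of C(r) vs r approximates the
    fractal (correlation) dimension of the point cloud."""
    diffs = points[:, None, :] - points[None, :, :]
    d = np.sqrt((diffs ** 2).sum(-1))
    pairs = d[np.triu_indices(len(points), k=1)]
    C = np.array([(pairs < r).mean() for r in radii])
    mask = C > 0
    return np.polyfit(np.log(radii[mask]), np.log(C[mask]), 1)[0]

# Synthetic stand-in for an SGD weight path: noisy contraction in 10-D.
rng = np.random.default_rng(1)
theta, traj = np.zeros(10), []
for _ in range(2000):
    theta += -0.05 * theta + 0.05 * rng.normal(size=10)  # decay + gradient noise
    traj.append(theta.copy())
cloud = np.array(traj[500:])[::3]          # drop the transient, thin the path

radii = np.geomspace(0.05, 1.0, 10)
print("estimated trajectory dimension:", correlation_dimension(cloud, radii))
```

On a real training run, `cloud` would be the recorded weight iterates (or a random projection of them); the estimate here merely illustrates the dimension-from-point-cloud recipe.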
5. Fractal Frontiers, Finite Size Effects, and Information Propagation
Mean-field theory of information propagation in deep random networks predicts sharp phase boundaries (e.g., the edge of chaos) that are smooth in the infinite-width limit. However, finite width introduces stochasticity: in a network of width $N$, the one-step recurrence for the input correlation $c_\ell$ becomes a noisy map,
$$c_{\ell+1} = \mathcal{R}(c_\ell) + \xi_\ell,$$
where the $\xi_\ell$ are random finite-size fluctuations of order $1/\sqrt{N}$. The attractor basin boundaries in hyperparameter space (e.g., the plane of weight and bias initialization variances $(\sigma_w^2, \sigma_b^2)$) then generically become fractal, with non-integer box-counting dimension at finite $N$, rather than smooth curves. Extensions to convolutional and Fourier-structured layers preserve this fractal nature, which is universal across architectures (D'Inverno et al., 5 Aug 2025).
From a design perspective, this yields a "Cantor-band" of acceptable hyperparameters at finite width $N$; as depth increases, the effective safety margin required in hyperparameter selection must scale accordingly.
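A cartoon of the mechanism (an illustrative toy map, not the recurrence of D'Inverno et al.): a bistable one-step map plus $O(1/\sqrt{N})$ noise, whose basin boundary in a two-hyperparameter plane is smooth at infinite width but acquires fine structure at finite width.

```python
import numpy as np

def settles_positive(a, b, width, depth=250, seed=0):
    """Iterate a toy bistable map with O(1/sqrt(width)) finite-size noise
    and report which attractor (by sign) the iterate settles into."""
    noise = np.random.default_rng(seed).normal(size=depth) / np.sqrt(width)
    c = 0.0
    for xi in noise:
        c = np.tanh(a * c + b) + xi
    return c > 0

# Basin partition of a toy (a, b) hyperparameter plane at finite width.
a_vals = np.linspace(1.0, 3.0, 100)
b_vals = np.linspace(-0.3, 0.3, 100)
basin = np.array([[settles_positive(a, b, width=64) for a in a_vals]
                  for b in b_vals])
# At infinite width the boundary is the smooth line b = 0 (by symmetry);
# at finite width it develops noise-driven, increasingly fine structure.
print("fraction reaching the positive attractor:", basin.mean())
```

Increasing `width` suppresses the noise term and visibly smooths the basin boundary, which is the finite-size effect the section describes.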
6. Architectures and Learning Dynamics: Explicit Fractality and Scale-Invariance
Certain deep architectures explicitly incorporate fractal geometry at the macro level. FractalNet replaces residual connections with self-similar, recursively defined expansion rules producing a hierarchy of subpaths whose lengths follow a truncated fractal pattern. Drop-path regularization forces independent competence in all subpaths and supports "anytime" inference, where subnetworks of varying depth can be queried for accuracy/latency trade-offs. Empirically, FractalNet exhibits trainability and test performance matching or exceeding standard ResNets, despite lacking explicit identity connections (Larsson et al., 2016).
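The expansion rule is easy to state recursively. The sketch below (a structural illustration only; layer contents and the join operation are abstracted away) enumerates the path-length distribution of an order-$k$ fractal block, assuming the standard rule $f_1 = \text{layer}$ and $f_{k+1} = \text{join}(\text{layer},\, f_k \circ f_k)$.

```python
from collections import Counter

def path_depths(k):
    """Multiset of path lengths through a FractalNet block of order k:
    f_1 is a single layer; f_{k+1} joins a new layer in parallel with
    f_k composed with f_k (the fractal expansion rule)."""
    if k == 1:
        return Counter({1: 1})
    inner = path_depths(k - 1)
    out = Counter({1: 1})                  # the new parallel layer
    for d1, n1 in inner.items():           # composing f_{k-1} with itself:
        for d2, n2 in inner.items():       # path lengths add
            out[d1 + d2] += n1 * n2
    return out

for k in (1, 2, 3, 4):
    depths = path_depths(k)
    print(f"order {k}: {sum(depths.values())} paths, "
          f"depths {min(depths)}..{max(depths)}")
```

The path count grows as $P(k+1) = 1 + P(k)^2$ while the maximum depth doubles each order; this is the truncated fractal length distribution that drop-path samples from.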
Scale-invariant diagnostic methodologies measure box-counting fractal dimension and "roughness" across weight segments and layers, revealing a correlation between intermediate (neither minimal nor maximal) fractal dimension and optimal generalization. Phase-flow analysis shows stabilization of fractal measures and the emergence of attractors during training, linking chaotic exploration to eventual convergence (Moharil et al., 2024).
7. Fractal Features, CNN Limitations, and Hybrid Geometries
Attempts to probe whether CNNs intrinsically learn fractal features show that standard architectures are not aligned with global fractal dimensionality: CKA and CCA similarities between fractal-based features and deep-layer representations are negligible, in contrast to the high similarities found for classical texture or edge features. However, shallow networks trained directly on zoomed-in fractal features efficiently solve specific classification tasks—particularly those involving scale-invariant structure—with up to 30% higher accuracy and 84% less training time than deep CNNs (Zini et al., 2024).
A plausible implication is that classical CNNs, characterized by regular, fixed-scale receptive fields, do not automatically acquire scale-invariant discriminatory statistics, unless augmented with specific architectural or loss-based fractal priors.
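For concreteness, below is a generic box-counting fractal-dimension feature of the kind such studies compare against (a textbook estimator, not the specific pipeline of Zini et al.), validated on a Sierpinski-triangle test pattern of known dimension.

```python
import numpy as np

def box_counting_dimension(mask):
    """Box-counting dimension of a binary image mask: count occupied boxes
    at dyadic scales and fit the slope of log N(s) vs log(1/s)."""
    n = mask.shape[0]
    sizes, counts = [], []
    s = n // 2
    while s >= 1:
        k = n // s
        occupied = mask.reshape(k, s, k, s).any(axis=(1, 3)).sum()
        sizes.append(s)
        counts.append(occupied)
        s //= 2
    return np.polyfit(np.log(1.0 / np.array(sizes)), np.log(counts), 1)[0]

# Sierpinski-triangle pattern: known dimension log(3)/log(2) ~ 1.585.
mask = np.ones((1, 1), dtype=bool)
while mask.shape[0] < 256:
    z = np.zeros_like(mask)
    mask = np.block([[mask, mask], [mask, z]])
print("estimated:", box_counting_dimension(mask),
      " true:", np.log(3) / np.log(2))
```

A scalar of this kind, computed over thresholded image patches, is precisely the sort of scale-invariant statistic that the cited comparisons find standard CNN representations do not capture on their own.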
In summary, the geometry of deep neural networks is a hybrid of regular (locally affine, smooth) and fractal (self-similar, scale-invariant, multiscale) attributes across all facets: parameter spaces, functional representations, learning dynamics, and architectural motifs. Fractal structure explains the notorious sensitivity of hyperparameter optimization, the capacity for vast expressivity, and the nuanced behavior of generalization and regularization. The emerging body of work systematically applying fractal geometry, persistent homology, and related tools provides a quantitative and predictive theory of DNN geometry that unifies disparate observations across architectures, tasks, and regimes (Sohl-Dickstein, 2024; Dym et al., 2019; Lillo et al., 8 Apr 2025; Birdal et al., 2021; Magai, 2023; Amari et al., 2018; Moharil et al., 18 Mar 2025; Moharil et al., 2024; Zini et al., 2024; D'Inverno et al., 5 Aug 2025).