
Local properties of neural networks through the lens of layer-wise Hessians

Published 20 Oct 2025 in cs.LG | (2510.17486v2)

Abstract: We introduce a methodology for analyzing neural networks through the lens of layer-wise Hessian matrices. The local Hessian of each functional block (layer) is defined as the matrix of second derivatives of a scalar function with respect to the parameters of that layer. This concept provides a formal tool for characterizing the local geometry of the parameter space. We show that the spectral properties of local Hessians, such as the distribution of eigenvalues, reveal quantitative patterns associated with overfitting, underparameterization, and expressivity in neural network architectures. We conduct an extensive empirical study involving 111 experiments across 37 datasets. The results demonstrate consistent structural regularities in the evolution of local Hessians during training and highlight correlations between their spectra and generalization performance. These findings establish a foundation for using local geometric analysis to guide the diagnosis and design of deep neural networks. The proposed framework connects optimization geometry with functional behavior and offers practical insight for improving network architectures and training stability.

Summary

  • The paper introduces a novel method using layer-wise Hessians to analyze the local geometry of neural networks and diagnose training behaviors.
  • It reports 111 experiments across 37 datasets that relate the spectral properties of local Hessians to network performance.
  • The findings offer actionable guidelines for tuning architectures, addressing overfitting, and enhancing model generalization.

Analysis of Neural Networks through Layer-wise Hessians

Introduction

The paper "Local properties of neural networks through the lens of layer-wise Hessians" (2510.17486) introduces an approach for analyzing neural networks through layer-wise Hessians. It formalizes the local Hessian of each layer as a tool for describing the geometry of the parameter space and relates its spectrum to overfitting, underparameterization, and expressivity. Drawing on 111 experiments across 37 datasets, the study proposes spectral diagnostics for neural architectures that help explain their training dynamics and generalization.
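For reference, the definition of a local Hessian given in the abstract can be written out explicitly. The notation (θ_ℓ for the parameters of layer ℓ, d_ℓ for their count, 𝓛 for the scalar function, typically the training loss) is chosen here for illustration and may differ from the paper's.

```latex
\theta = (\theta_1, \dots, \theta_L), \qquad \theta_\ell \in \mathbb{R}^{d_\ell}

H_\ell(\theta) = \nabla^2_{\theta_\ell} \mathcal{L}(\theta)
  = \frac{\partial^2 \mathcal{L}(\theta)}{\partial \theta_\ell \, \partial \theta_\ell^{\top}}
  \in \mathbb{R}^{d_\ell \times d_\ell},
\qquad
\bigl(H_\ell\bigr)_{ij} = \frac{\partial^2 \mathcal{L}(\theta)}{\partial \theta_{\ell,i} \, \partial \theta_{\ell,j}}

H_\ell = Q_\ell \Lambda_\ell Q_\ell^{\top}, \qquad
\Lambda_\ell = \operatorname{diag}(\lambda_{\ell,1}, \dots, \lambda_{\ell,d_\ell})
```

Because only the parameters of layer ℓ are differentiated, H_ℓ is the ℓ-th diagonal block of the full Hessian of 𝓛 with respect to θ. For a twice continuously differentiable 𝓛 it is symmetric, so its eigenvalues are real and it admits the eigendecomposition in the last line; the eigenvalue distributions discussed in the study are spectra of this kind.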

Methodology and Experimental Framework

To validate the methodology, the study ran a large-scale set of experiments focused on the spectral properties of local Hessians across different neural network architectures. The networks varied in configuration, including the number of layers, weight initialization, and optimization algorithm, so that the models ranged from small to overparameterized. At multiple checkpoints, the study recorded spectral characteristics of weights, gradients, and local Hessians, together with quality metrics.
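The summary does not describe how the local Hessians were computed. The following is a minimal NumPy sketch, assuming a hypothetical two-layer tanh MLP without biases and a mean-squared-error loss, of how the local Hessian of one layer and its eigenvalue spectrum could be obtained by central finite differences while the other layer is held fixed. The names `mlp_loss` and `local_hessian` are illustrative; finite differences cost O(d²) loss evaluations for a layer with d parameters, so at scale one would use automatic differentiation or Hessian-vector products instead.

```python
import numpy as np


def mlp_loss(params, X, y):
    """Mean squared error of a hypothetical two-layer tanh MLP (no biases)."""
    W1, W2 = params
    hidden = np.tanh(X @ W1)
    pred = hidden @ W2
    return 0.5 * np.mean((pred - y) ** 2)


def local_hessian(loss_fn, params, layer, X, y, eps=1e-4):
    """Central-difference Hessian of loss_fn w.r.t. params[layer] only.

    All other layers are held fixed, so the result is the diagonal block
    of the full Hessian that belongs to this layer.
    """
    shape = params[layer].shape
    theta = params[layer].ravel().copy()
    d = theta.size
    H = np.zeros((d, d))

    def loss_at(flat):
        trial = list(params)  # shallow copy; only this layer's entry is replaced
        trial[layer] = flat.reshape(shape)
        return loss_fn(trial, X, y)

    for i in range(d):
        for j in range(i, d):
            e_i = np.zeros(d)
            e_i[i] = eps
            e_j = np.zeros(d)
            e_j[j] = eps
            # Four-point stencil for the (mixed) second partial derivative.
            H[i, j] = (loss_at(theta + e_i + e_j) - loss_at(theta + e_i - e_j)
                       - loss_at(theta - e_i + e_j) + loss_at(theta - e_i - e_j)) / (4 * eps ** 2)
            H[j, i] = H[i, j]  # the Hessian is symmetric
    return H


rng = np.random.default_rng(0)
X = rng.normal(size=(32, 3))
y = rng.normal(size=(32, 1))
params = [0.5 * rng.normal(size=(3, 4)), 0.5 * rng.normal(size=(4, 1))]

H2 = local_hessian(mlp_loss, params, layer=1, X=X, y=y)  # output layer, 4 x 4
eigenvalues = np.linalg.eigvalsh(H2)  # real eigenvalues of the symmetric matrix, ascending
print(eigenvalues)
```

For the output layer the loss is quadratic in W2, so the finite-difference result should match the closed form hᵀh / n, where h = tanh(X W1) and n is the number of samples; this gives a quick sanity check of the sketch.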

The experiments indicate that spectral characteristics are strong indicators of a network's internal structure and functional behavior. Canonical correlation analysis (CCA) was used to quantify the relationship between quality metrics and the spectral properties of network parameters, and the strength of that relationship depended on architectural choices.
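The summary does not give the exact form of the CCA or of the "CCA score" reported in the figures. As a hedged illustration of the underlying technique, the sketch below computes canonical correlations between two sets of variables with NumPy: rows are, hypothetically, training checkpoints, the columns of one matrix are spectral features, and the columns of the other are quality metrics. The data are synthetic and the function names are illustrative.

```python
import numpy as np


def _inv_sqrt(C, reg):
    """Inverse square root of a symmetric positive semi-definite matrix, with a ridge term."""
    vals, vecs = np.linalg.eigh(C + reg * np.eye(C.shape[0]))
    return vecs @ np.diag(vals ** -0.5) @ vecs.T


def canonical_correlations(X, Y, reg=1e-8):
    """Canonical correlations between the columns of X (n, p) and Y (n, q).

    Returns min(p, q) values between 0 and 1 (up to rounding), largest first.
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    n = X.shape[0]
    Cxx = Xc.T @ Xc / (n - 1)
    Cyy = Yc.T @ Yc / (n - 1)
    Cxy = Xc.T @ Yc / (n - 1)
    # Singular values of the whitened cross-covariance are the canonical correlations.
    T = _inv_sqrt(Cxx, reg) @ Cxy @ _inv_sqrt(Cyy, reg)
    return np.linalg.svd(T, compute_uv=False)


# Synthetic stand-in data: 200 checkpoints, 4 spectral features, 2 quality metrics.
rng = np.random.default_rng(1)
spectral = rng.normal(size=(200, 4))
quality = spectral @ rng.normal(size=(4, 2)) + 0.1 * rng.normal(size=(200, 2))
corr = canonical_correlations(spectral, quality)
print(corr)
```

The small ridge term `reg` is an assumption added to keep the inverse square roots stable when features are nearly collinear; with unrelated inputs the leading canonical correlation would instead be small.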

Spectral Analysis Findings

A salient observation is that spectral properties differ markedly across architectures, particularly in gradient propagation dynamics and Hessian eigenvalue distributions. Large architectures exhibit more stable spectral characteristics, which the study associates with better generalization (Figure 1).

Figure 1: Comparison of CCA score statistics across architectures. Large architectures ("huge") exhibit the highest stability, with a standard deviation of 0.082, while small architectures ("no") show extreme variability (std = 0.976).

The spectral analysis of gradients and local Hessians shows that architecture size has a strong effect: both gradient propagation and Hessian structure differed substantially in variance between small and large models, suggesting that these models face different optimization landscapes (Figure 2).

Figure 2: Comparison of spectral characteristics of third-layer gradients. The "huge" architecture shows PSD values exceeding those of the "no" architecture by over 100 times, indicating qualitatively different gradient propagation dynamics.
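The summary does not specify how the gradient spectra (or the PSD values in Figure 2) were computed. One common way to describe the spectrum of a layer's gradient, used here purely as an assumption and not necessarily the paper's statistic, is the set of singular values of the gradient matrix. The sketch computes them for a hypothetical linear layer with loss 𝓛(W) = ‖XW − Y‖²_F / (2n), whose gradient has the closed form Xᵀ(XW − Y) / n. The function names are illustrative.

```python
import numpy as np


def linear_layer_gradient(W, X, Y):
    """Gradient of L(W) = ||X W - Y||_F^2 / (2n) with respect to W."""
    n = X.shape[0]
    return X.T @ (X @ W - Y) / n


def gradient_spectrum(G):
    """Singular values of a gradient matrix, largest first."""
    return np.linalg.svd(G, compute_uv=False)


rng = np.random.default_rng(2)
X = rng.normal(size=(64, 8))
Y = rng.normal(size=(64, 3))
W = 0.1 * rng.normal(size=(8, 3))

sv = gradient_spectrum(linear_layer_gradient(W, X, Y))
print(sv)              # 3 singular values for an 8 x 3 gradient
print(sv[0] / sv[-1])  # spread between the largest and smallest singular value
```

Comparing such spectra across architectures, for example via the largest singular value or the spread between largest and smallest, is one way to quantify the kind of gradient-propagation differences described above.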

Implications and Practical Guidelines

The study offers practical guidelines for designing neural network architectures: balance the allocation of parameters across layers, detect insufficient expressivity through spectral analysis, and identify overfitting through concentration of Hessian eigenvalues. In addition, adapting the optimizer strategy to the condition numbers of local Hessians could improve training dynamics (Figure 3).

Figure 3: Distribution of canonical X-weights across architectures. Large architectures ("huge") show a more uniform distribution, while small architectures ("no") exhibit a concentrated structure with dominant components.
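The guidelines on Hessian condition numbers and eigenvalue concentration can be made concrete with a small sketch. The paper's exact definitions are not given in this summary, so the ones below are assumptions: the condition number is taken as the ratio of the largest to the smallest non-negligible eigenvalue magnitude, and concentration as the share of total eigenvalue magnitude carried by the top-k eigenvalues. `hessian_spectrum_summary` is a hypothetical helper.

```python
import numpy as np


def hessian_spectrum_summary(H, top_k=1, tol=1e-12):
    """Summaries of the eigenvalue spectrum of a symmetric (local) Hessian H."""
    eigvals = np.linalg.eigvalsh(H)        # real eigenvalues, ascending
    mags = np.sort(np.abs(eigvals))[::-1]  # magnitudes, largest first
    significant = mags[mags > tol]         # ignore numerically zero directions
    total = mags.sum()
    return {
        "lambda_max": float(eigvals[-1]),
        "lambda_min": float(eigvals[0]),
        # Ratio of largest to smallest non-negligible eigenvalue magnitude.
        "condition_number": float(significant[0] / significant[-1]) if significant.size else float("inf"),
        # Fraction of total eigenvalue magnitude carried by the top_k eigenvalues.
        "top_k_share": float(mags[:top_k].sum() / total) if total > 0 else 0.0,
    }


flat = hessian_spectrum_summary(np.eye(4))                          # evenly spread curvature
peaked = hessian_spectrum_summary(np.diag([100.0, 1.0, 1.0, 1.0]))  # one dominant direction
print(flat)
print(peaked)
```

The identity matrix spreads curvature evenly (condition number 1, top-1 share 0.25), whereas the second matrix concentrates it in one direction (condition number 100, top-1 share about 0.97). Any threshold for flagging overfitting would need to be calibrated empirically; none is implied here.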

Beyond architecture optimization, the findings suggest that local Hessian analysis can serve as a diagnostic tool for detecting hidden training issues, refining architecture design, and improving model stability.

Conclusion

This work contributes to both the theoretical and the empirical understanding of neural network dynamics, offering methods for diagnosing networks and improving their architectures. By exploiting the local geometric properties of neural networks, the study provides actionable insights and highlights the role of Hessians in evaluating model performance.

Future research directions suggested by the authors include further exploration of layer-specific spectral properties, application of the methodology to new architectures, and automation of architecture optimization based on local Hessian analysis.

Figure 4: Distribution of architectures in the space of spectral characteristics of Hessians after dimensionality reduction. Three distinct clusters correspond to small ("no"), medium ("sure"), and large ("huge") architectures, indicating qualitative differences in their parameter spaces.
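The summary does not say which dimensionality-reduction method produced Figure 4. Assuming principal component analysis (PCA), a minimal NumPy version is sketched below; each row is a hypothetical model described by a vector of Hessian spectral features, and the three synthetic groups merely stand in for small, medium, and large architectures. No real results are reproduced, and `pca_2d` is an illustrative name.

```python
import numpy as np


def pca_2d(features):
    """Project rows of an (n_models, n_features) matrix onto its first two principal components."""
    centered = features - features.mean(axis=0)
    scale = centered.std(axis=0)
    scale[scale == 0] = 1.0  # leave constant features unscaled
    z = centered / scale     # standardise so no single feature dominates
    _, s, vt = np.linalg.svd(z, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)  # share of variance per component
    return z @ vt[:2].T, explained[:2]


# Synthetic stand-in data: 10 "models" in each of three groups, 6 spectral features each.
rng = np.random.default_rng(3)
groups = [rng.normal(loc=m, scale=0.1, size=(10, 6)) for m in (0.0, 1.0, 3.0)]
features = np.vstack(groups)
coords, explained = pca_2d(features)
print(coords.shape, explained)
```

Standardising the features first is a design choice: spectral statistics such as top eigenvalues and condition numbers can differ by orders of magnitude, and without scaling the largest one would dominate the principal components.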
