Quantization robustness from dense representations of sparse functions in high-capacity kernel associative memory

Published 22 Apr 2026 in cs.NE | (2604.20333v1)

Abstract: High-capacity associative memories based on Kernel Logistic Regression (KLR) are known for their exceptional performance but are hindered by high computational costs. This paper investigates the compressibility of KLR-trained Hopfield networks to understand the geometric principles of their robust encoding. We provide a comprehensive geometric theory based on spontaneous symmetry breaking and Walsh analysis, and validate it with compression experiments (quantization and pruning). Our experiments reveal a striking contrast: the network is extremely robust to low-precision quantization but highly sensitive to pruning. Our theory explains this via a “sparse function, dense representation” principle, where a sparse input mapping is implemented with a dense, bimodal parameterization. Our findings not only provide a practical path to hardware-efficient kernel memories but also offer new insights into the geometric principles of robust representation in neural systems.

Authors (1)

Summary

  • The paper demonstrates that high-capacity KLR Hopfield networks maintain perfect recall accuracy under extreme quantization because their learned weights self-organize into a bimodal distribution that coarse quantization levels already match.
  • The study applies geometric and Walsh analysis to reveal a “sparse function, dense representation” duality that underpins the network’s robustness.
  • Empirical results indicate that while extreme quantization preserves attractor basins, even modest magnitude pruning leads to catastrophic performance loss.

Quantization Robustness in High-Capacity Kernel Associative Memory

Introduction

This paper presents an in-depth analysis of the quantization robustness and sparsity sensitivity of high-capacity Hopfield networks trained via kernel logistic regression (KLR). The study investigates the geometric origins and practical implications of a phenomenon in which such networks, optimized on the "Ridge of Optimization", exhibit extreme robustness to weight quantization but are highly sensitive to magnitude pruning. The findings rely on combining rigorous empirical analysis with a geometric interpretation centered around spontaneous symmetry breaking and Walsh analysis, culminating in the identification of a “sparse function, dense representation” duality. This work not only establishes a theoretical framework for understanding compression robustness in kernel memories but also emphasizes implications for neuromorphic hardware and biological plausibility.

Empirical Dichotomy: Robustness to Quantization Versus Sensitivity to Pruning

Extensive post-training compression experiments reveal a stark dichotomy in the resilience of KLR-trained Hopfield networks. The networks maintain perfect recall accuracy under extreme quantization, retaining performance at 2-bit and even 1-bit precision. Bit accuracy and stability margin are unaffected until the bit depth reaches its lowest settings, and even then the sign of the retrieval force is preserved, so the attractor basins stay deep. This phenomenon, termed "Quantization Saturation," is attributed to a bimodal self-organization of the dual variables, which amplifies retrieval forces despite coarse parameter discretization (Figure 1).

Figure 1: Bit accuracy and stability margin are invariant under decreasing quantization bit depth, demonstrating exceptional robustness to uniform quantization.
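
The kind of post-training quantization described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's procedure: the summary does not specify the quantizer, so a symmetric uniform grid is assumed, and the function name `quantize_uniform` and the choice of scaling the 1-bit case by the mean magnitude are hypothetical.

```python
import numpy as np

def quantize_uniform(w, bits):
    """Symmetric uniform quantizer (an assumed scheme, not necessarily the paper's)."""
    w = np.asarray(w, dtype=float)
    w_max = np.max(np.abs(w))
    if w_max == 0.0:
        return w.copy()
    if bits == 1:
        # 1-bit case: keep only the sign, scaled by the mean magnitude.
        return np.sign(w) * np.mean(np.abs(w))
    # 2**bits levels evenly spaced over [-w_max, +w_max].
    levels = 2 ** bits
    step = 2.0 * w_max / (levels - 1)
    idx = np.round((w + w_max) / step)
    return idx * step - w_max

# Toy weights clustered near +3 and -3 (illustrative values only).
w = np.array([-3.1, -2.9, 2.8, 3.2])
q1 = quantize_uniform(w, 1)
q2 = quantize_uniform(w, 2)
q8 = quantize_uniform(w, 8)
```

On weights that are already clustered around two symmetric values, the 1-bit and 2-bit outputs keep every sign, which is the property the text ties to preserved retrieval forces.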

The situation is inverted for magnitude pruning (sparsification). Even modest sparsity results in catastrophic performance loss, indicating that all weights play an essential role, including those of small magnitude. Pruning does not merely remove redundant parameters; it destroys distributed information, reflecting the network's reliance on a dense parameterization (Figure 2).

Figure 2: Severe degradation of performance occurs with increasing sparsity, indicating pronounced sensitivity to magnitude pruning.
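
For reference, magnitude pruning in its usual form zeroes the fraction of weights with the smallest absolute value. The sketch below shows that operation under the assumption of a single global threshold. The helper name `magnitude_prune` is hypothetical, and the paper may prune per row or per layer instead.

```python
import numpy as np

def magnitude_prune(w, sparsity):
    """Zero out the `sparsity` fraction of entries with the smallest |w| (global threshold)."""
    w = np.asarray(w, dtype=float)
    k = int(round(sparsity * w.size))
    if k == 0:
        return w.copy()
    # Indices (into the flattened array) of the k smallest-magnitude entries.
    drop = np.argsort(np.abs(w), axis=None)[:k]
    pruned = w.copy().ravel()
    pruned[drop] = 0.0
    return pruned.reshape(w.shape)

# Toy example: 50% sparsity removes the two smallest-magnitude entries.
w = np.array([[0.1, -3.0], [2.9, -0.2]])
p = magnitude_prune(w, 0.5)
```

Unlike quantization, which moves each weight to a nearby level, pruning replaces the selected weights with exactly zero regardless of how the remaining weights are distributed.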

This empirically observed contrast motivates a geometric account of how information is distributed and safeguarded in KLR Hopfield networks.

Geometric Theory: Bimodality, Walsh Influence, and Representational Duality

Symmetry Breaking and Bimodal Distribution

The emergence of a bimodal weight distribution is explained by a force-balance interpretation of the KLR objective. The tension between $L_2$ regularization (favoring weight shrinkage) and the margin-maximizing data term (favoring large, separable weights) results in spontaneous symmetry breaking. This creates a double-well potential landscape in parameter space, collapsing the distribution of learned weights to two distinct non-zero values symmetrically placed around the origin (Figure 3).

Figure 3: The learned weights form a bimodal distribution, strongly localized around $\pm 3$ with few weights near zero.

This naturally binarized parameter regime induces an intrinsic robustness to quantization: quantization merely aligns with the existing weight clusters and seldom introduces disruptive error.
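
A small numerical sketch can make the alignment argument concrete. It compares the relative error of 1-bit (sign) quantization on synthetic bimodal weights clustered near $\pm 3$ with the same quantizer applied to unimodal Gaussian weights. Everything here is an assumption for illustration: the cluster width of 0.3, the sample size, and the mean-magnitude scaling are arbitrary choices, and the weights are synthetic rather than taken from a trained network.

```python
import numpy as np

def one_bit_relative_error(w):
    """Relative L2 error of sign quantization scaled by the mean magnitude."""
    q = np.sign(w) * np.mean(np.abs(w))
    return np.linalg.norm(w - q) / np.linalg.norm(w)

rng = np.random.default_rng(0)
n = 10_000

# Synthetic bimodal weights: two tight clusters near +3 and -3 (assumed width 0.3).
signs = rng.choice([-1.0, 1.0], size=n)
bimodal = 3.0 * signs + rng.normal(0.0, 0.3, size=n)

# Synthetic unimodal baseline: standard Gaussian weights.
gaussian = rng.normal(0.0, 1.0, size=n)

err_bimodal = one_bit_relative_error(bimodal)
err_gaussian = one_bit_relative_error(gaussian)
print(f"bimodal: {err_bimodal:.3f}, gaussian: {err_gaussian:.3f}")
```

Under these assumptions the bimodal error is several times smaller than the Gaussian one, because each cluster sits close to the single magnitude the quantizer can represent.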

Walsh Influence Analysis

A Walsh analysis of the input-output mapping reveals that, functionally, the network performs feature selection, with only a subset of input dimensions exerting significant influence. However, in the natural KLR regime (as opposed to $L_1$-regularized/lasso regimes), the parameterization is dense: many weights have nontrivial cross-influence, distributing information in a holographic code. This “dense representation, sparse influence” structure is fundamental for the observed dichotomy (Figure 4).

Figure 4: Dense cross-influence tail under $L_2$ regularization compared to highly skewed, sparser influence with $L_1$ regularization.
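
To illustrate what "influence" of an input dimension means, the sketch below computes the standard coordinate influence from Boolean function analysis, $\mathrm{Inf}_i(f) = \Pr_x[f(x) \neq f(x^{\oplus i})]$, by brute-force enumeration over $\{-1,+1\}^n$. This is the textbook definition and may differ from the exact Walsh-based quantity used in the paper. The function name and the two toy functions are hypothetical.

```python
import itertools
import numpy as np

def coordinate_influences(f, n):
    """Inf_i(f): probability that flipping input i changes f, for x uniform on {-1,+1}^n."""
    counts = np.zeros(n)
    for bits in itertools.product([-1, 1], repeat=n):
        x = np.array(bits)
        fx = f(x)
        for i in range(n):
            y = x.copy()
            y[i] = -y[i]  # flip coordinate i
            if f(y) != fx:
                counts[i] += 1
    return counts / 2 ** n

def dictator(x):
    # Depends on the first input only: a maximally sparse function.
    return int(x[0])

def majority3(x):
    # Majority vote of the first three inputs; any further inputs are ignored.
    return int(np.sign(x[0] + x[1] + x[2]))

inf_dictator = coordinate_influences(dictator, 4)
inf_majority = coordinate_influences(majority3, 4)
```

A function is sparse in this sense when most entries of its influence vector are zero or near zero, independently of how many parameters are used to implement it.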

Quantitative Scaling of Degradation

Quantization-induced degradation scales as a sub-quadratic power law in the quantization step size. This is a predictable, smooth response, consistent with high curvature of the loss landscape near the Ridge. The observed exponent is slightly below 2, which is explained by the anisotropy and non-Gaussianity of the parameter space (Figure 5).

Figure 5: Stability margin degradation follows a robust power-law with quantization granularity, confirming geometric smoothness.
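
A power-law exponent of this kind is commonly estimated by a straight-line fit in log-log space. The sketch below shows that procedure on synthetic placeholder data. The exponent 1.8 and prefactor 0.7 are arbitrary values chosen only to exercise the fit, not results from the paper, and `fit_power_law` is a hypothetical helper.

```python
import numpy as np

def fit_power_law(step, degradation):
    """Fit degradation ~ c * step**p by least squares on log-transformed data."""
    slope, intercept = np.polyfit(np.log(step), np.log(degradation), 1)
    return slope, np.exp(intercept)

# Synthetic placeholder data, NOT measurements from the paper.
step = np.array([0.05, 0.1, 0.2, 0.4, 0.8])
degradation = 0.7 * step ** 1.8

p_hat, c_hat = fit_power_law(step, degradation)
print(f"estimated exponent p = {p_hat:.3f}, prefactor c = {c_hat:.3f}")
```

With real measurements, a fitted exponent below 2 would correspond to the sub-quadratic scaling described above.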

Ridge Geometry and Locality

Quantization robustness is maximized on the Ridge of Optimization and degrades rapidly on either side: toward more global regimes (lower $\gamma$) and toward high locality (higher $\gamma$). The high-locality side corresponds to flatter landscapes, as evidenced by asymptotic insensitivity to quantization (Figure 6).

Figure 6: Quantization sensitivity exhibits a minimal valley aligned with the Ridge, confirming it as a robustness sweet spot.

Experimental Validation and Practical Consequences

Noise robustness comparisons between full-precision and quantized (2-bit) networks demonstrate that attractor depths and recall accuracy are largely invariant to quantization, even for substantial input corruption. The advantage of quantization is not offset by any practical loss in functional capacity (Figure 7).

Figure 7: Bit-wise recall accuracy under noise is preserved between full-precision and 2-bit quantized models.

These phenomena hold across different storage loads, indicating that dense, bimodal parameterization is a general feature of high-capacity, optimally loaded KLR memories rather than an artifact of a single load setting (Figure 8).

Figure 8: The quantization robustness persists even at reduced storage load, confirming generality of the results.

Conversely, shifting away from the Ridge, especially toward overloaded regimes, produces catastrophic degradation as quantization becomes coarser (Figure 9).

Figure 9: Stability margin degrades rapidly under quantization in the local regime, emphasizing the specificity of the Ridge region.

Broader Implications

Hardware-Efficient Neuromorphic Designs

The results provide a blueprint for edge devices and neuromorphic architectures: robustness to low-precision arithmetic enables large reductions in memory footprint and computational cost, while dense parameterization supports error tolerance and high capacity. In contrast to approaches that pursue efficiency through sparsity, the study argues that for memory systems it is better to maximize quantization insensitivity through dense, holographic codes. Bitwise operations, such as XNOR and population counts, can replace floating-point arithmetic, yielding substantial energy and speed improvements.
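
The XNOR and population-count replacement mentioned above rests on a simple identity for vectors with entries in $\{-1,+1\}$: the dot product equals the number of agreeing positions minus the number of differing ones. The sketch below demonstrates that identity with Python integers as bit words. The helper names are hypothetical, and it assumes both the 1-bit weights and the network states are stored as signs.

```python
def pack_bits(v):
    """Pack a +1/-1 vector into a Python int: +1 -> bit 1, -1 -> bit 0."""
    word = 0
    for i, s in enumerate(v):
        if s > 0:
            word |= 1 << i
    return word

def binary_dot(a_word, b_word, n):
    """Dot product of two +1/-1 vectors of length n from their packed words.

    XOR marks the positions where the signs differ (XNOR would mark agreement), so
    dot = (#agree) - (#differ) = n - 2 * popcount(a XOR b).
    """
    differ = bin(a_word ^ b_word).count("1")
    return n - 2 * differ

# Toy check against the ordinary dot product.
a = [1, -1, 1, 1, -1, -1, 1, -1]
b = [1, 1, -1, 1, -1, 1, 1, -1]
fast = binary_dot(pack_bits(a), pack_bits(b), len(a))
slow = sum(x * y for x, y in zip(a, b))
```

On hardware, the loop in `pack_bits` is done once when the weights are stored, and the per-query work reduces to one XOR and one popcount per machine word.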

Biological Plausibility

The principles identified resonate with experimental constraints on biological memory. Hippocampal synapses exhibit low precision and highly overlapping, distributed connectivity. The “dense, bimodal parameter; sparse functional influence” motif may represent a convergent solution for robust associative memory in both artificial and biological substrates.

Theoretical Implications

Spectral analysis connects the observed robustness with Fisher information spectra: sharp, dominant eigenmodes in the parameter space enable stability and high capacity near the Ridge. This supports a positive re-interpretation of so-called “pathological” spectral bias as advantageous for reliable memory in kernel-based neural systems.

Conclusion

By systematically dissecting the compression behavior of KLR-trained Hopfield networks, this study establishes a geometric foundation for their exceptional quantization robustness and pronounced pruning sensitivity. The work articulates and empirically validates a representational duality: the network achieves a sparse functional mapping in input space, but stores it with dense, intrinsically digital (bimodal) parameters. Quantization aligns with this digital structure, preserving attractor landscapes and recall capabilities, while attempts to sparsify weights are destructive.

This dual strategy underlies robust, energy-efficient storage. It also frames the Ridge of Optimization as a critical sweet spot that balances capacity against catastrophic interference, and it bridges theoretical high-dimensional models with practical neuromorphic and biological instantiations.

Reference: "Quantization robustness from dense representations of sparse functions in high-capacity kernel associative memory" (2604.20333)
