
QK Normalization: Methods & Applications

Updated 2 October 2025
  • QK Normalization is an umbrella term spanning quaternionic-Kähler geometry, Query/Key vector normalization in transformers, and related algorithmic normalization methods.
  • It employs geometric correspondences, Rogers dilogarithm identities, and cluster algebra mutations to maintain metric and algebraic consistency.
  • Its applications span differential geometry, quantum information, and deep learning, driving advances in both theoretical physics and modern machine learning.

QK Normalization encompasses a spectrum of concepts and methodologies centered on the normalization or regularization of structures involving quaternionic-Kähler (QK) geometry, Query/Key (QK) vectors in transformers, quaternionic polynomials, and their functional-analytic and algorithmic analogs. Its applications span differential geometry, representation theory, complex analysis, algebraic geometry, deep learning, and quantum information. Central themes include dual correspondences (notably QK/HK), normalization procedures ensuring geometric, algebraic, or computational consistency, and normalization schemes in learning architectures that regulate scaling, stability, and representation. This article presents core principles, mathematical constructions, and contemporary algorithms and applications.

1. QK/HK Geometric Correspondence and Metric Normalization

QK normalization is rooted in the interplay between quaternionic-Kähler manifolds and their hyperkähler (HK) duals. The QK/HK correspondence establishes a duality whereby a QK manifold $M$ with a quaternionic isometry is matched to an HK manifold $\tilde{M}$ with a rotational isometry, together with a hyperholomorphic line (or circle) bundle. The procedure is implemented via the Swann bundle $S$, a hyperkähler cone over $M$, and a hyperkähler quotient at nonzero moment map level, resulting in $P(\vec{r}) = \{s \in S : \vec{\mu}(s) = \vec{r}\}$ as a circle bundle over $\tilde{M}$; the quotient metric is hyperkähler, and the Levi-Civita connection induces a hyperholomorphic connection on $L \to \tilde{M}$.

At the twistor space level, Darboux coordinates and transition functions from the QK and HK sides are identified, yielding a geometric encoding of the normalization problem. For D-instanton corrections in string theory, or BPS state counting in gauge theory, nontrivial jumps induced by wall-crossing are controlled by functional relations of the Rogers dilogarithm function, ensuring global consistency of the metric and connection (Alexandrov et al., 2011).

2. Wall-Crossing, Dilogarithm Identities, and Cluster Algebraic Structures

In four-dimensional $\mathcal{N}=2$ theories, normalization is governed by the Kontsevich–Soibelman (KS) wall-crossing formula, which encapsulates the discontinuous jumps in BPS indices $\Omega(\gamma, z)$ as the moduli cross walls of marginal stability. Twistor coordinates $X_\gamma = e^{-2\pi i \langle \gamma, \Xi \rangle}$ undergo symplectomorphisms $U_\gamma$ parameterized by these indices. The motivic (quantum) wall-crossing formula ensures that the composition of these symplectomorphisms leaves the overall symplectic structure invariant.

Upon lifting to contact transformations, the discontinuities in the contact coordinate $\alpha$ are succinctly captured by the Rogers dilogarithm:

$$\Delta_\gamma = \frac{\Omega(\gamma)}{2\pi^2}\, L_{\sigma(\gamma)}(X_\gamma),$$

with $L(z) = \mathrm{Li}_2(z) + \frac{1}{2}\log(z)\log(1-z)$ and quadratic refinement $\sigma(\gamma)$. Functional identities (five-term, six-term, eight-term) among dilogarithms correspond precisely to the cluster algebra mutation sequences associated with Dynkin quivers ($A_2$, $B_2$, $G_2$). These identities guarantee that global normalization of the QK metric is obtained as the cancellation of accumulated translations, underpinning the algebraic stability of the moduli space (Alexandrov et al., 2011).
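As a concrete check, the five-term (pentagon) identity underlying the $A_2$ case can be verified numerically directly from the definition of $L(z)$ above. The following sketch uses mpmath and is purely illustrative; the variable names are not drawn from the cited paper.

```python
# Numerical check of the five-term (pentagon) identity for the Rogers
# dilogarithm L(z) = Li2(z) + (1/2) log(z) log(1-z), valid for 0 < x, y < 1:
#   L(x) + L(y) = L(xy) + L((x - xy)/(1 - xy)) + L((y - xy)/(1 - xy))
from mpmath import mp, polylog, log

mp.dps = 30  # working precision in decimal digits

def rogers_L(z):
    """Rogers dilogarithm for 0 < z < 1."""
    return polylog(2, z) + 0.5 * log(z) * log(1 - z)

x, y = mp.mpf("0.3"), mp.mpf("0.6")
lhs = rogers_L(x) + rogers_L(y)
rhs = (rogers_L(x * y)
       + rogers_L((x - x * y) / (1 - x * y))
       + rogers_L((y - x * y) / (1 - x * y)))
print(lhs - rhs)  # ~0 to working precision: the A2 mutation identity
```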

3. Extensions: Para-QK, Function-Space, and Topological Normalization

The normalization paradigm extends to para-quaternionic geometries, where the underlying complex structure algebra is altered (via sign permutations), and the HK/QK correspondence is generalized to para-HK/QK correspondence, constructing metrics on para-quaternionic Kähler manifolds with similar normalization guarantees. One-parameter deformations, especially in supergravity c-map metrics, reside within this framework, with normalization conditions reflected in reduced scalar curvature and closure of fundamental four-forms (Dyckmanns et al., 2016).

In function spaces, normalization also arises in QK-type analytic function spaces $\mathcal{Q}_K$, where the norm is defined via a kernel $K$ and modulus conditions. Notably, for Hardy space functions $f \in H^2$:

$$f \in \mathcal{Q}_K \iff |f| \in \mathcal{Q}_K(\partial\mathbb{D}) \ \text{and} \ \sup_{a\in\mathbb{D}}\int_\mathbb{D} \frac{\left(\int_{\partial\mathbb{D}}|f(\zeta)|\,d\mu_z(\zeta)-|f(z)|\right)^2}{(1-|z|^2)^2}\, K\!\left(1-|\sigma_a(z)|^2\right)dA(z) < \infty,$$

providing a normalization criterion in terms of the modulus (Bao et al., 2016). Inner-outer factorization likewise hinges on a normalization involving the interplay between the outer function's norm and the inner function's "defect".

In topology, the notion of a $Q^*$-normal space generalizes normality (as separation via open sets) to $Q^*$-closed sets (those with empty interior), and several preservation theorems establish that $Q^*$-normality is retained under images of open continuous injective maps and almost $Q^*g$-continuous surjections. These results reflect an underlying normalization principle in the separation of sets with prescribed closure properties (Kumar et al., 18 Jun 2025).

4. Algorithmic and Computational Normalization

Normalization arises algorithmically in algebraic geometry, notably in the computation of normalizations of affine algebras or polynomial rings. Parallel and modular algorithms stratify the singular locus $\mathrm{Sing}(A)$, compute local normalizations in each stratum, and patch the results to recover the global normalization:

$$\bar{A} = \sum_{V \in \mathrm{Strata}(A)} A(V),$$

implemented in the SINGULAR system, and achieving substantial complexity reductions via modular Gröbner bases and Chinese remainder theorem lifting, especially when $K = \mathbb{Q}$ (Boehm et al., 2011).
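The lifting step can be illustrated in isolation. The sketch below is a hedged toy example, assuming a target coefficient of 22/7 known only through its modular images; the Gröbner-basis computations themselves are performed by systems such as SINGULAR and are not shown.

```python
# Toy sketch of the coefficient-lifting step behind modular normalization:
# compute a rational coefficient modulo several primes, combine the images
# via the Chinese remainder theorem, then rationally reconstruct.
from math import isqrt
from sympy import Rational
from sympy.ntheory.modular import crt

def rational_reconstruct(a, m):
    # Wang-style rational reconstruction: find p/q with p = q*a (mod m)
    # and |p|, |q| roughly bounded by sqrt(m/2), via half-extended Euclid.
    bound = isqrt(m // 2)
    r0, t0, r1, t1 = m, 0, a % m, 1
    while r1 > bound:
        q = r0 // r1
        r0, r1 = r1, r0 - q * r1
        t0, t1 = t1, t0 - q * t1
    if t1 == 0:
        raise ValueError("reconstruction failed")
    return Rational(r1, t1)

# Target coefficient 22/7, known only through its images mod three primes:
primes = [10007, 10009, 10037]
images = [22 * pow(7, -1, p) % p for p in primes]
combined, modulus = crt(primes, images)
print(rational_reconstruct(int(combined), int(modulus)))  # -> 22/7
```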

In quaternionic polynomial algebras, the normalization of polynomials is achieved via certification of a non-commutative reduced Gröbner basis under a conjugate-alternating order, together with novel reduction techniques controlling subsequences and local jumping heads. The normal form of a polynomial is computed by top reduction with respect to the certified basis, enabling efficient manipulation in applications spanning geometry, imaging, and control (Li et al., 21 Apr 2025).
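The need for non-commutative machinery is visible already at the level of quaternion arithmetic, as the minimal sketch below shows ($ij = k$ while $ji = -k$); the tuple representation and function name are illustrative, not from the cited work.

```python
# Pure-Python Hamilton product on quaternions represented as (w, x, y, z),
# illustrating the non-commutativity that forces non-commutative Groebner
# techniques when normalizing quaternionic polynomials.
def qmul(p, q):
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (a1*a2 - b1*b2 - c1*c2 - d1*d2,
            a1*b2 + b1*a2 + c1*d2 - d1*c2,
            a1*c2 - b1*d2 + c1*a2 + d1*b2,
            a1*d2 + b1*c2 - c1*b2 + d1*a2)

i, j = (0, 1, 0, 0), (0, 0, 1, 0)
print(qmul(i, j))  # (0, 0, 0, 1)  ->  ij = k
print(qmul(j, i))  # (0, 0, 0, -1) ->  ji = -k
```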

Quantum algorithms further accelerate orthogonal normalization and QR decomposition by deploying quantum phase estimation and Hamiltonian simulation, extracting the orthogonal complement in one unitary step. The query complexity, $O(N^2\,\mathrm{poly}(\log N))$, represents a polynomial speedup; the method is robust under repeated measurements and guarantees nearly orthogonal output vectors, opening avenues for scalable quantum linear algebra (Li et al., 26 Dec 2024).
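For orientation, the classical task such a quantum routine accelerates, extracting the orthogonal complement of a span, can be sketched with a dense SVD in numpy; this baseline is purely illustrative and is not the quantum algorithm itself.

```python
# Classical reference computation: an orthonormal basis of the orthogonal
# complement of the column span of A, obtained from a full SVD.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 2))              # span of 2 vectors in R^6
U, s, Vt = np.linalg.svd(A, full_matrices=True)
complement = U[:, 2:]                        # orthonormal basis, dim 6-2=4
print(np.allclose(A.T @ complement, 0.0))    # True: orthogonal to col(A)
```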

5. QK Normalization in Machine Learning and Deep Architectures

In deep learning, QK normalization refers to the explicit normalization of Query and Key vectors in transformer architectures; it is crucial for training stability, controlling logit magnitudes, preventing gradient saturation, and permitting higher learning rates. Canonical strategies include applying LayerNorm to Q and K vectors before dot-product attention ("QK_norm"), or normalizing after the QKV linear projections ("QKV_norm"), often in conjunction with softmax capping to avoid extreme softmax outputs:

$$\text{logits} = \frac{1}{\sqrt{d}}\, \mathrm{LN}(X W^Q) \cdot [\mathrm{LN}(X W^K)]^\top,$$

$$\text{attention} = \mathrm{softmax}\!\left(\tanh(\text{logits}/c)\, c\right),$$

with $c$ a capping constant. These techniques yield measurable improvements in stability and perplexity, enabling effective learning rate increases (Rybakov et al., 22 Oct 2024, Loshchilov et al., 1 Oct 2024). Hypersphere-based normalization (all hidden states, embeddings, and projection vectors renormalized to unit norm) ensures that attention maps reflect cosine similarity, promoting stability and accelerating convergence (Loshchilov et al., 1 Oct 2024). Further, in data normalization for KANs and copula-based learning, CDF-based quantile normalization (mapping inputs using $\mathrm{CDF}_n(x) = \frac{1}{2}\left[1 + \mathrm{erf}(x/\sqrt{2})\right]$) yields nearly uniform marginals, better suited for orthonormal polynomial bases and reducing overfitting (Strawa et al., 16 Jul 2025).
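A minimal numpy sketch of the "QK_norm" recipe with softmax capping described above; the layer normalization here is a plain per-vector LayerNorm without learned scale, and the single-head setting, shapes, and capping constant are illustrative assumptions rather than any specific model's configuration.

```python
# QK-normalized single-head attention with tanh softmax capping.
import numpy as np

def layer_norm(x, eps=1e-6):
    # Per-vector LayerNorm over the feature axis (no learned gain/bias).
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def qk_norm_attention(X, Wq, Wk, Wv, c=50.0):
    # logits = (1/sqrt(d)) LN(X Wq) LN(X Wk)^T, then soft-capped by c.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d = Q.shape[-1]
    logits = layer_norm(Q) @ layer_norm(K).T / np.sqrt(d)
    logits = np.tanh(logits / c) * c            # caps |logits| at c
    w = np.exp(logits - logits.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)          # row-wise softmax
    return w @ V

rng = np.random.default_rng(0)
T, d = 4, 8
X = rng.standard_normal((T, d))
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
print(qk_norm_attention(X, Wq, Wk, Wv).shape)   # (4, 8)
```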

Training-free methods for KV cache compression in large autoregressive transformers (Q-Filters) exploit the geometric anisotropy of QK vectors, constructing SVD-based filters that estimate the relevance of KV pairs:

$$\mathbb{E}_{Q_i^h} \langle Q_i^h, K_j^h \rangle \simeq \kappa^h \langle K_j^h, u^h \rangle,$$

enabling efficient, high-accuracy cache reduction and memory savings without explicit attention map computation (Godey et al., 4 Mar 2025).
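A hedged numpy sketch of this scoring rule: estimate the dominant singular direction $u$ of recent queries, orient it so the expected query projection is positive, and rank cached keys by their projection onto $u$. The shapes, retention budget, and names are illustrative assumptions, not the released Q-Filters implementation.

```python
# Score cached keys by projection onto the dominant query direction, then
# keep only the top-scoring KV pairs (training-free cache compression).
import numpy as np

rng = np.random.default_rng(1)
T, d, keep = 128, 64, 32
Q = rng.standard_normal((T, d)) + 0.5      # anisotropic query distribution
K = rng.standard_normal((T, d))            # cached keys

_, _, Vt = np.linalg.svd(Q, full_matrices=False)
u = Vt[0]                                  # dominant query direction u^h
u *= np.sign(Q.mean(axis=0) @ u)           # orient so E<Q, u> > 0

scores = K @ u                             # proxy for E_Q <Q_i, K_j>
kept = np.argsort(scores)[-keep:]          # indices of retained KV pairs
print(kept.shape)                          # (32,)
```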

6. Functional Analysis, Renormalization, and QK Spaces

Renormalization principles in QK-type function spaces and their quasi-normal variants provide analytic criteria for global normalization. For families of holomorphic mappings $F \subset \mathrm{Hol}(Q, M)$, a Zalcman-type renormalization theorem characterizes non-quasi-normality via the existence of sequences $g_j(\zeta) = f_j(w_{j,p} + \rho_{j,p}\zeta)$ that either converge to nonconstant entire maps or diverge compactly, equating blow-up behavior with a failure of normalization in the function family (Datt et al., 2015).

Inner-outer factorization theorems and modulus criteria for QK function spaces offer necessary and sufficient conditions for membership in $\mathcal{Q}_K$, each linked to normalization properties of the function and its modulus on the boundary (Bao et al., 2016).

7. Summary and Perspectives

QK normalization, across its geometric, algebraic, analytic, and machine learning formulations, subsumes mechanisms that enforce regularity, global smoothness, algebraic canonical forms, and algorithmic stability. These are achieved through well-defined correspondences, normalization conditions, functional identities (notably Rogers dilogarithm relations), stratification and modular computation, norm-constrained weight and state updates, unit-norm embeddings on hyperspheres, and renormalization procedures in function spaces. The relationships between wall-crossing, cluster algebras, unit-norm constraints, and spectral anisotropy reveal structural regularities exploited in both theoretical physics and data-driven computation.

Modern research continues to extend QK normalization paradigms into para-geometries, quantum-accelerated linear algebra, high-dimensional deep architectures, and probabilistic learning frameworks, guided by preservation theorems, uniqueness results, and computational efficiency. The normalization principle remains fundamental for ensuring consistency, interpretability, and scalability in models where quaternionic structure, query-key interactions, or kernel-induced regularity are essential.
