
Non-Negative Stiefel Approximating Flow: Orthogonalish Matrix Optimization for Interpretable Embeddings (2511.06425v1)

Published 9 Nov 2025 in stat.ML, cs.CV, cs.LG, and stat.ME

Abstract: Interpretable representation learning is a central challenge in modern machine learning, particularly in high-dimensional settings such as neuroimaging, genomics, and text analysis. Current methods often struggle to balance the competing demands of interpretability and model flexibility, limiting their effectiveness in extracting meaningful insights from complex data. We introduce Non-negative Stiefel Approximating Flow (NSA-Flow), a general-purpose matrix estimation framework that unifies ideas from sparse matrix factorization, orthogonalization, and constrained manifold learning. NSA-Flow enforces structured sparsity through a continuous balance between reconstruction fidelity and column-wise decorrelation, parameterized by a single tunable weight. The method operates as a smooth flow near the Stiefel manifold with proximal updates for non-negativity and adaptive gradient control, yielding representations that are simultaneously sparse, stable, and interpretable. Unlike classical regularization schemes, NSA-Flow provides an intuitive geometric mechanism for manipulating sparsity at the level of global structure while simplifying latent features. We demonstrate that the NSA-Flow objective can be optimized smoothly and integrates seamlessly with existing pipelines for dimensionality reduction while improving interpretability and generalization in both simulated and real biomedical data. Empirical validation on the Golub leukemia dataset and in Alzheimer's disease demonstrates that the NSA-Flow constraints can maintain or improve performance over related methods with little additional methodological effort. NSA-Flow offers a scalable, general-purpose tool for interpretable ML, applicable across data science domains.

Summary

  • The paper introduces NSA-Flow, a novel method balancing reconstruction fidelity and orthogonality by tuning a continuous weight.
  • It employs a soft-retraction gradient flow with proximal updates to achieve sparse, nearly orthogonal embeddings for applications in genomics and neuroimaging.
  • Empirical results show improved predictive accuracy and clearer biomarker selection compared to traditional PCA and SPCA methods.

Non-Negative Stiefel Approximating Flow: Orthogonalish Matrix Optimization for Interpretable Embeddings

Introduction

The Non-Negative Stiefel Approximating Flow (NSA-Flow) framework addresses the complex trade-off between interpretability and flexibility in high-dimensional matrix factorization tasks common to neuroimaging, bioinformatics, and text mining. Standard dimensionality reduction methods—such as PCA, NMF, and their sparse and orthogonal variants—struggle to impose both non-negativity and orthogonality without loss of fidelity or interpretability in latent factors. NSA-Flow offers a unified approach by parameterizing the balance between data reconstruction and column-wise decorrelation through a continuous weight ($w$), operating near the Stiefel manifold while enforcing non-negativity via proximal updates. This method generalizes prior sparse, orthogonal, and manifold-learning methods, yielding interpretable, sparse, and stable representations suitable for downstream predictive analytics.

NSA-Flow: Optimization Principles

NSA-Flow casts the matrix approximation problem as minimizing a composite energy:

$$\min_{Y \in \mathbb{R}^{p \times k},\; Y \geq 0} E(Y) = (1-w)\, L_{\text{fid}}(Y, X_0) + w\, L_{\text{orth}}(Y),$$

where $L_{\text{fid}}$ is the squared Frobenius reconstruction loss and $L_{\text{orth}}$ quantifies deviation from Stiefel manifold orthogonality ($Y^\top Y = I_k$):

$$L_{\text{orth}}(Y) = \frac{1}{2}\, \big\| Y^\top Y - I_k \big\|_F^2.$$
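For reference, the gradient of $L_{\text{orth}}$ follows directly from this definition, and, assuming the fidelity term takes the direct squared-Frobenius form $L_{\text{fid}}(Y, X_0) = \tfrac{1}{2}\|Y - X_0\|_F^2$ (the paper's exact choice may differ), so does the fidelity gradient:

$$\nabla_Y L_{\text{orth}}(Y) = 2\, Y \left( Y^\top Y - I_k \right), \qquad \nabla_Y L_{\text{fid}}(Y, X_0) = Y - X_0.$$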

To improve scaling and conditioning, the default implementation uses a scale-invariant penalty that measures cosine similarities irrespective of vector norms. The optimization proceeds via a soft-retraction gradient flow: after an unconstrained gradient step, the update is convexly combined with a polar retraction toward the Stiefel manifold, weighted by $w$. This avoids the computational burden and non-smooth corrections inherent in hard manifold retractions or Cayley transforms.
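A minimal sketch of this soft-retraction step, assuming a plain polar retraction computed via thin SVD (function and variable names here are illustrative, not the package's API):

```python
import numpy as np

def polar_retraction(Y):
    """Nearest matrix with orthonormal columns (the polar factor), via thin SVD."""
    U, _, Vt = np.linalg.svd(Y, full_matrices=False)
    return U @ Vt

def soft_retraction_step(Y, grad, lr, w):
    """One soft-retraction update: an unconstrained gradient step, convexly
    combined (with weight w) with a polar retraction toward the Stiefel manifold."""
    Y_free = Y - lr * grad
    return (1.0 - w) * Y_free + w * polar_retraction(Y_free)
```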

For non-negativity, NSA-Flow applies proximal projection (clamping, ReLU, or softplus), ensuring both the geometric and sign constraints are satisfied in each iterate. Adaptive learning rate control (optionally via Armijo line search or Bayesian estimation) safeguards stability, and built-in diagnostic monitoring facilitates robust deployment across variable matrix sizes ($p$) and ranks ($k$).
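Continuing the sketch above, the non-negativity prox and a fixed-step iteration might look as follows (the ReLU clamp is one of the proximal choices named above; adaptive step-size control is omitted for brevity, and the fidelity gradient uses the assumed squared-Frobenius form):

```python
def prox_nonneg(Y):
    """Proximal projection onto the non-negative orthant (ReLU-style clamping)."""
    return np.maximum(Y, 0.0)

def energy_grad(Y, X0, w):
    """Gradient of E(Y) = (1-w) L_fid + w L_orth, assuming
    L_fid(Y, X0) = 0.5 * ||Y - X0||_F^2 (see the gradients above)."""
    k = Y.shape[1]
    return (1.0 - w) * (Y - X0) + w * 2.0 * Y @ (Y.T @ Y - np.eye(k))

def nsa_flow(X0, w=0.5, lr=1e-2, iters=500):
    """Illustrative fixed-step NSA-Flow loop: gradient step on the composite
    energy, soft retraction toward the Stiefel manifold, then the sign prox."""
    Y = prox_nonneg(X0.copy())              # warm start at the clamped target
    for _ in range(iters):
        Y = soft_retraction_step(Y, energy_grad(Y, X0, w), lr, w)
        Y = prox_nonneg(Y)                  # keep every iterate non-negative
    return Y
```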

Sparse, Orthogonal Matrix Factorization and Embedding

NSA-Flow naturally induces sparsity through global orthogonality. As $w$ is increased, column supports become more disjoint, enhancing interpretability without explicit $\ell_1$ or $\ell_0$ penalties. Integration with sparse PCA is straightforward: NSA-Flow operates as a proximal operator in the optimization pipeline, enforcing both non-negativity and column decorrelation simultaneously. This contrasts with classical SPCA pipelines, which decouple sparsity (via soft-thresholding) and orthogonality (via QR reorthogonalization), resulting in less coherent latent representations.
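One way to read this contrast in code: a few NSA-Flow iterates replace the decoupled soft-threshold/QR pair in an SPCA loop. A hedged sketch of both updates, reusing the functions above (inner-loop settings are illustrative):

```python
def spca_classic_update(V, lam):
    """Classical SPCA refinement: sparsity and orthogonality handled separately."""
    V = np.sign(V) * np.maximum(np.abs(V) - lam, 0.0)   # l1 soft-thresholding
    Q, _ = np.linalg.qr(V)                              # QR reorthogonalization
    return Q

def spca_nsa_update(V, w=0.8, n_inner=20, lr=1e-2):
    """NSA-Flow as a proximal operator: refine loadings V toward a non-negative,
    near-orthogonal matrix in one coupled step, using V as the fidelity target."""
    Y = prox_nonneg(V)
    for _ in range(n_inner):
        Y = soft_retraction_step(Y, energy_grad(Y, V, w), lr, w)
        Y = prox_nonneg(Y)
    return Y
```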

Algorithmic choices—such as optimizer (ASGD, LARS, etc.), retraction type, and proximal mapping—can be easily customized to target specific application characteristics (e.g., tall-skinny vs. wide matrices). Empirical evidence suggests that practical cost scales as $O(pk^2)$ but remains manageable for moderate $p$ and $k$ through batching and sparse linear algebra, with convergence typically achieved in fewer than 1000 iterations.

Empirical Validation and Applications

Synthetic Simulation

NSA-Flow recovers structured, non-negative, nearly orthogonal embeddings from noisy mixtures, with a smooth trade-off between reconstruction fidelity and orthogonality defect controlled by $w$. As $w \to 1$, basis columns become highly sparse and decorrelated; as $w \to 0$, the embeddings resemble dense PCA solutions.
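A toy reproduction of this qualitative behavior using the sketch functions above (ground truth, noise level, and metrics here are illustrative choices, not the paper's simulation design):

```python
rng = np.random.default_rng(0)
p, k = 200, 5

# Ground truth: a non-negative basis with disjoint block supports, plus noise.
B = np.zeros((p, k))
for j in range(k):
    B[j * (p // k):(j + 1) * (p // k), j] = 1.0
X0 = B + 0.1 * np.abs(rng.standard_normal((p, k)))
X0 /= np.linalg.norm(X0, axis=0)            # unit columns for conditioning

for w in (0.1, 0.5, 0.9, 0.99):
    Y = nsa_flow(X0, w=w, iters=1000)
    sparsity = float(np.mean(Y < 1e-6))
    defect = float(np.linalg.norm(Y.T @ Y - np.eye(k)))
    print(f"w={w:.2f}  sparsity={sparsity:.2f}  orth. defect={defect:.3f}")
```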

Genomics – Golub Leukemia Dataset

NSA-Flow-based sparse PCA outperforms vanilla PCA and standard soft-thresholding SPCA in the classification of leukemia subtypes, achieving a cross-validated accuracy of 88.3% (Table 2), improving interpretability and biomarker selection despite a moderate decrease in explained variance. Selected gene sets are biologically plausible and less entangled across components compared to PCA, simplifying downstream biological inference.

| Method             | Explained Var. | Sparsity | Orthog. Defect | CV Accuracy |
|--------------------|----------------|----------|----------------|-------------|
| Standard PCA       | 0.290          | 0.000    | 0.000          | 0.819       |
| Sparse PCA (Basic) | 0.158          | 0.800    | 0.006          | 0.864       |
| NSA-Flow SPCA      | 0.172          | 0.704    | 0.000          | 0.883       |
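As an illustration of how such a comparison can be run (the classifier, fold count, and projection scheme here are assumptions for the sketch, not the paper's exact protocol):

```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def cv_accuracy(X, y, loadings):
    """Mean cross-validated accuracy of a linear classifier on Z = X @ loadings."""
    Z = X @ loadings
    return cross_val_score(LogisticRegression(max_iter=1000), Z, y, cv=5).mean()

# Usage pattern: compare raw PCA loadings against their NSA-Flow refinement.
# acc_pca = cv_accuracy(X_genes, labels, V_pca)
# acc_nsa = cv_accuracy(X_genes, labels, spca_nsa_update(V_pca, w=0.8))
```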

Neuroimaging – Alzheimer's Disease

NSA-Flow refines PCA network components derived from cortical thickness, yielding sparser, near-orthogonal loadings that improve both interpretability and clinical prediction. NSA-Flow embeddings achieve a mean AUC of 0.765 in multi-class random forest diagnosis (CN vs MCI vs AD), compared to 0.719 for PCA, with consistent improvements in prediction of cognitive outcomes and biomarker identification.

Generalization

NSA-Flow is highly extensible. Its modular Python implementation (and R wrapper) can be pip-installed and deployed as a layer in deep nets or as a preprocessing/refinement step in unsupervised ML. It is pertinent to any domain requiring interpretable, sparse decompositions under constraints familiar from signal processing, bioinformatics, and recommender systems.
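The package name and exact API are not given here, so as a stand-in, the refinement pattern looks roughly as follows using the sketch functions above (toy data; all names illustrative):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
X = np.abs(rng.standard_normal((120, 300)))     # samples x features (toy data)
V = PCA(n_components=5).fit(X).components_.T    # p x k loading matrix
V_refined = spca_nsa_update(V, w=0.8)           # sparse, near-orthogonal, >= 0
Z = X @ V_refined                               # interpretable embedding
```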

Theoretical and Practical Implications

NSA-Flow generalizes manifold optimization strategies for interpretable matrix factorization by providing continuous, dataset-level control over the sparsity/decorrelation trade-off via a single tunable parameter $w$. This extends beyond traditional regularization schemes, which rely on non-intuitive component-wise penalties and strict orthogonality projections. Moreover, the soft-retraction update demonstrates stable convergence within proximal operator theory and is empirically robust across high-dimensional, structured noise regimes.

The modular approach yields:

  • Intuitive geometric control: Adjusts sparsity/orthogonality globally.
  • Flexible integration: Compatible with stochastic optimizers, proximal algorithms, and deep learning frameworks.
  • Scalability: Efficient for moderate $k$, extensible via batching/sparse operations.
  • Improved interpretability: Disjoint latent bases simplify feature selection and biological/clinical inference.

Limitations include scaling challenges as $k$ grows (matrix inversion in the polar retraction), potential convergence issues in extremely noisy or ill-conditioned matrices, and sensitivity of $w$ tuning to application context. These can be partially mitigated through initialization schemes, automated hyperparameter selection, and future sparse manifold extensions.

NSA-Flow does not guarantee global optimality due to nonconvexity; however, via the Kurdyka–Łojasiewicz (KL) inequality and the nonexpansiveness of proximal maps, convergence to critical points is assured for generic smooth initializations. Future directions include stochastic and second-order variants for large-scale applications and customization for multi-modal/domain-specific constraints.

Conclusion

NSA-Flow constitutes a robust, interpretable optimization framework for non-negative, semi-orthogonal matrix learning. By operating as a soft flow near the Stiefel manifold with tunable proximity to constraint satisfaction, NSA-Flow yields sparse and disjoint latent representations amenable to clinical, biological, and engineering tasks. Extensive empirical evidence from genomics and neuroimaging substantiates its performance gains over classical techniques, with clear advantages in interpretability and predictive utility. The open-source implementation further facilitates its adoption in diverse data science workflows, and future research should prioritize extension to large-scale, structured, and multi-modal data environments.
