
Deep Semi-NMF: Hierarchical Matrix Factorization

Updated 16 December 2025
  • Deep Semi-NMF is a hierarchical multi-layer matrix factorization framework that decomposes high-dimensional data into interpretable, non-negative soft cluster memberships.
  • It extends traditional Semi-NMF by stacking linear transformations to uncover nested feature hierarchies and reveal complex, overlapping attributes.
  • The method employs greedy layer-wise pretraining followed by joint fine-tuning with multiplicative update rules to improve clustering accuracy and classification performance.

Deep Semi-Non-negative Matrix Factorization (Deep Semi-NMF) is a hierarchical matrix factorization framework designed to recover interpretable, multi-level attribute representations from high-dimensional data. Extending the concept of Semi-NMF to a deep, multi-layer architecture, it models the generative structure of data as a product of stacked linear transformations culminating in non-negative latent factors. Unlike classical flat matrix factorization, Deep Semi-NMF captures hierarchies of attributes, with each layer producing a non-negative feature matrix interpreted as soft cluster memberships for latent factors. This approach allows the uncovering of complex, nested structure in datasets, particularly when clustering or class labels reflect multiple, overlapping factors of variation (Trigeorgis et al., 2015).

1. Model Architecture

Given a data matrix $X \in \mathbb{R}^{p \times n}$, Deep Semi-NMF factorizes $X$ as:

$$X \approx Z_1 Z_2 \cdots Z_m H_m$$

where $Z_i \in \mathbb{R}^{k_{i-1} \times k_i}$ (with $k_0 = p$) are stacked “basis” matrices with mixed signs and $H_m \in \mathbb{R}_{\geq 0}^{k_m \times n}$ is a non-negative feature (“attribute”) matrix. Intermediate layers introduce feature matrices $H_1, \ldots, H_{m-1}$, each non-negative, yielding the layerwise structure:

$$\begin{align*} X &\approx Z_1 H_1 \\ H_1 &\approx Z_2 H_2 \\ &\;\;\vdots \\ H_{m-1} &\approx Z_m H_m \end{align*}$$

Each $H_i$ is interpreted as a soft cluster-membership matrix for $k_i$ latent attributes (Trigeorgis et al., 2015).
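As a shape-level illustration, the sketch below assembles a random two-layer factorization with hypothetical sizes ($p=100$, $n=500$, $k_1=60$, $k_2=20$); it is a minimal sketch of the model's structure, not the paper's fitting procedure.

```python
import numpy as np

# Hypothetical sizes: data dim p, samples n, layer widths k1 > k2.
p, n, k1, k2 = 100, 500, 60, 20

rng = np.random.default_rng(0)
Z1 = rng.standard_normal((p, k1))     # mixed-sign basis, layer 1
Z2 = rng.standard_normal((k1, k2))    # mixed-sign basis, layer 2
H2 = rng.random((k2, n))              # non-negative attributes, layer 2

X_hat = Z1 @ Z2 @ H2                  # model reconstruction, shape (p, n)
H1_implied = Z2 @ H2                  # layer-1 features implied by the stack;
                                      # non-negativity of H1 is only enforced
                                      # during optimization, not in this demo
print(X_hat.shape, H1_implied.shape)  # (100, 500) (60, 500)
```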

2. Objective Functions and Constraints

For the unsupervised (purely linear) form, the Deep Semi-NMF objective is:

$$\min_{Z_1, \ldots, Z_m,\; H_m \geq 0} C_{\text{deep}} = \frac{1}{2} \left\| X - Z_1 \cdots Z_m H_m \right\|_F^2$$

Optionally, non-negativity constraints can be enforced on all intermediate $H_i$ matrices.

A trace-based reformulation is also valid:

$$C_{\text{deep}} = \frac{1}{2} \operatorname{Tr}\left[ X^T X - 2 X^T (Z_1 \cdots Z_m H_m) + (Z_1 \cdots Z_m H_m)^T (Z_1 \cdots Z_m H_m) \right]$$
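As a quick numerical check, the following sketch (with hypothetical random factors) evaluates $C_{\text{deep}}$ both as a Frobenius norm and via the trace identity above, and confirms they agree.

```python
import numpy as np

rng = np.random.default_rng(0)
p, n, k1, k2 = 100, 500, 60, 20           # hypothetical sizes
X = rng.standard_normal((p, n))
Z1 = rng.standard_normal((p, k1))
Z2 = rng.standard_normal((k1, k2))
H2 = rng.random((k2, n))

R = Z1 @ Z2 @ H2                          # reconstruction Z1 Z2 H2
c_frob = 0.5 * np.linalg.norm(X - R, "fro") ** 2
c_trace = 0.5 * np.trace(X.T @ X - 2 * X.T @ R + R.T @ R)
assert np.isclose(c_frob, c_trace)        # the two forms coincide
```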

When partial attribute labels are available, the semi-supervised extension, Deep WSF, augments the objective with graph Laplacian regularizers:

$$C_{\text{WSF}} = C_{\text{deep}} + \frac{1}{2} \sum_{i=1}^m \lambda_i \operatorname{Tr}\left( H_i L_i H_i^T \right)$$

where each $L_i$ is the graph Laplacian built from the available class labels for layer $i$, and $\lambda_i$ is a tuning hyperparameter (Trigeorgis et al., 2015).
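One simple way to realize $L_i$ from partial labels is to connect samples that share a known label; this exact weighting is an illustrative assumption, not prescribed by the paper.

```python
import numpy as np

def label_laplacian(labels):
    """Build L = D - W from partial labels; -1 marks an unlabeled sample.
    Samples sharing a known label get edge weight 1 (illustrative choice)."""
    labels = np.asarray(labels)
    known = labels[:, None] != -1
    W = ((labels[:, None] == labels[None, :]) & known & known.T).astype(float)
    np.fill_diagonal(W, 0.0)              # no self-loops
    D = np.diag(W.sum(axis=1))            # diagonal degree matrix
    return D - W, W, D

labels = np.array([0, 0, 1, -1, 1])       # partial labels for 5 samples
L, W, D = label_laplacian(labels)

H = np.random.default_rng(0).random((3, 5))   # k_i = 3 attributes, n = 5
penalty = np.trace(H @ L @ H.T)               # Tr(H_i L_i H_i^T)
```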

3. Optimization Methods

Deep Semi-NMF utilizes a two-phase optimization scheme:

A. Greedy Layer-wise Pre-training:

Each layer solves a two-factor Semi-NMF on the output of the previous layer:

  • For $i = 1$ to $m$, factor $H_{i-1} \approx Z_i H_i$ (with $H_0 = X$).
  • Alternate updates:
    • $Z_i \gets H_{i-1} H_i^{\dagger}$, where $\dagger$ denotes the Moore–Penrose pseudoinverse.
    • $H_i$ is updated multiplicatively to ensure $H_i \geq 0$:

$$H_i \gets H_i \odot \sqrt{ \frac{ [Z_i^T H_{i-1}]^{+} + [Z_i^T Z_i]^{-} H_i }{ [Z_i^T H_{i-1}]^{-} + [Z_i^T Z_i]^{+} H_i } }$$

    with $A^{+} = (|A| + A)/2$ and $A^{-} = (|A| - A)/2$ taken elementwise. A runnable sketch of this pretraining loop follows.
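The sketch below implements the pretraining phase under stated assumptions: random rather than SVD-based initialization, a fixed iteration budget, and a small `eps` guard against division by zero; the function names are mine.

```python
import numpy as np

def pos(A): return (np.abs(A) + A) / 2.0   # elementwise A^+
def neg(A): return (np.abs(A) - A) / 2.0   # elementwise A^-

def semi_nmf(X, k, iters=200, eps=1e-9, seed=0):
    """Two-factor Semi-NMF: X ~ Z H with H >= 0 and Z mixed-sign."""
    rng = np.random.default_rng(seed)
    H = rng.random((k, X.shape[1])) + eps
    for _ in range(iters):
        Z = X @ np.linalg.pinv(H)                      # exact LS step for Z
        ZtX, ZtZ = Z.T @ X, Z.T @ Z
        H *= np.sqrt((pos(ZtX) + neg(ZtZ) @ H) /
                     (neg(ZtX) + pos(ZtZ) @ H + eps))  # multiplicative H step
    return Z, H

def pretrain(X, layer_sizes):
    """Greedily factor H_{i-1} ~ Z_i H_i for each layer, with H_0 = X."""
    Zs, Hs, H_prev = [], [], X
    for k in layer_sizes:
        Z, H = semi_nmf(H_prev, k)
        Zs.append(Z)
        Hs.append(H)
        H_prev = H
    return Zs, Hs
```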

B. Joint Fine-Tuning:

After pre-training, all factors are updated via alternating minimization:

  • For each layer $i$, define $\Psi_i = Z_1 \cdots Z_{i-1}$ (with $\Psi_1 = I$) and $\tilde{H}_i$ (equal to $H_i$ if $i = m$, else $Z_{i+1} \cdots Z_m H_m$).

  • $Z_i$-update (closed-form least squares):

$$Z_i \gets \Psi_i^{\dagger} X \tilde{H}_i^{\dagger}$$

  • $H_i$-update (multiplicative, preserves non-negativity), with $\Phi_i = \Psi_i Z_i = Z_1 \cdots Z_i$:

$$H_i \gets H_i \odot \sqrt{ \frac{ [\Phi_i^T X]^{+} + [\Phi_i^T \Phi_i]^{-} H_i }{ [\Phi_i^T X]^{-} + [\Phi_i^T \Phi_i]^{+} H_i } }$$

A sketch of one full fine-tuning sweep appears below.
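A minimal sketch of one fine-tuning sweep under these update rules (unsupervised case; `pos`/`neg` are the elementwise splits defined earlier, and the function name is mine):

```python
import numpy as np
from functools import reduce

def pos(A): return (np.abs(A) + A) / 2.0   # elementwise A^+
def neg(A): return (np.abs(A) - A) / 2.0   # elementwise A^-

def finetune_sweep(X, Zs, Hs, eps=1e-9):
    """One alternating pass over all layers; Zs, Hs come from pretraining."""
    m = len(Zs)
    for i in range(m):
        # Psi_i = Z_1 ... Z_{i-1} (identity for the first layer)
        Psi = reduce(np.matmul, Zs[:i], np.eye(X.shape[0]))
        # H~_i = H_m at the top layer, else Z_{i+1} ... Z_m H_m
        Ht = Hs[-1] if i == m - 1 else reduce(np.matmul, Zs[i + 1:]) @ Hs[-1]
        Zs[i] = np.linalg.pinv(Psi) @ X @ np.linalg.pinv(Ht)    # Z_i update
        Phi = Psi @ Zs[i]                                       # Z_1 ... Z_i
        PtX, PtP = Phi.T @ X, Phi.T @ Phi
        Hs[i] *= np.sqrt((pos(PtX) + neg(PtP) @ Hs[i]) /
                         (neg(PtX) + pos(PtP) @ Hs[i] + eps))   # H_i update
    return Zs, Hs
```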

For Deep WSF, the $H_i$-update includes $\lambda_i$-weighted Laplacian regularization terms in the numerator and denominator, leveraging partial supervision.

4. Semi-Supervised Extension: Deep WSF

When partial attribute-label supervision is present, Deep WSF (Deep Weakly Supervised Factorization) incorporates a graph-based smoothness term into each layer’s factorization. For samples with known memberships $y_j^{(i)}$ in the classes at layer $i$, a similarity graph $W_i$ is constructed and its Laplacian $L_i = D_i - W_i$ (with $D_i$ the diagonal degree matrix) is added to the loss as

$$\text{Penalty}_i = \operatorname{Tr}\left( H_i L_i H_i^T \right)$$

The resulting optimization uses the same multiplicative $H_i$ update rule as in Deep Semi-NMF, but adds $\lambda_i (H_i W_i)$ to the numerator and $\lambda_i (H_i D_i)$ to the denominator, promoting smoother, label-consistent attribute representations; a sketch of the modified step follows below. The pretraining step for each layer is switched from standard Semi-NMF to WSF so that available label information is integrated from the outset (Trigeorgis et al., 2015).
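A sketch of the correspondingly modified $H_i$ step, with $\Phi_i$ as in Section 3; the function name and argument layout are illustrative assumptions.

```python
import numpy as np

def wsf_h_update(H, Phi, X, W, D, lam, eps=1e-9):
    """Deep WSF H_i step: the Laplacian split L = D - W contributes
    lam*(H W) to the numerator and lam*(H D) to the denominator."""
    pos = lambda A: (np.abs(A) + A) / 2.0
    neg = lambda A: (np.abs(A) - A) / 2.0
    PtX, PtP = Phi.T @ X, Phi.T @ Phi
    num = pos(PtX) + neg(PtP) @ H + lam * (H @ W)
    den = neg(PtX) + pos(PtP) @ H + lam * (H @ D) + eps
    return H * np.sqrt(num / den)
```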

5. Algorithmic Summary and Computational Complexity

The training process is as follows:

  1. Layer-wise Pre-training:

    • Initialize $Z_i$ and $H_i > 0$ (e.g., via SVD-based initialization).
    • For $i = 1$ to $m$, run Semi-NMF (or WSF if supervised) on $H_{i-1} \approx Z_i H_i$ (with $H_0 = X$).
    • Persist the factors $Z_i$ and $H_i$.
  2. Joint Fine-Tuning:
    • Alternate the $Z_i$ and $H_i$ updates for each layer until convergence.

Per-iteration computational complexity is $O(m[pnk + (p+n)k^2])$, where $k \approx \max_i k_i$ (Trigeorgis et al., 2015). A toy end-to-end run composing the two phases appears below.
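Composing the earlier sketches (`pretrain` and `finetune_sweep`, both defined above under the same assumptions, with illustrative layer sizes) gives a toy end-to-end run:

```python
import numpy as np

X = np.random.default_rng(1).random((100, 500))   # toy non-negative data
Zs, Hs = pretrain(X, layer_sizes=[60, 20])        # 1. layer-wise pretraining
for _ in range(50):                               # 2. joint fine-tuning
    Zs, Hs = finetune_sweep(X, Zs, Hs)
err = np.linalg.norm(X - np.linalg.multi_dot([*Zs, Hs[-1]]), "fro")
print(f"final reconstruction error: {err:.3f}")
```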

6. Empirical Results and Benchmarks

Deep Semi-NMF and Deep WSF have been validated on several standard face datasets:

| Dataset | Samples | Subjects | Attributes |
|---|---|---|---|
| XM2VTS | 2,360 | 295 | 8 images/subject |
| CMU PIE | 2,856 | 68 | 42 illuminations/poses |
| CMU Multi-PIE (subset) | 7,905 | 147 | 5 poses, 6 expressions |

Input features included raw pixels (all non-negative) and image-gradient–orientation (IGO) descriptors (mixed-sign). Baselines comprised NMF, Semi-NMF, GNMF, Multi-layer NMF, NeNMF, WSF, DNMF, and CNMF.

Performance was measured primarily by clustering accuracy (AC), with classification accuracy used to assess feature quality. Notable empirical findings:

  • Two-layer Deep Semi-NMF outperformed all single-layer and multi-layer NMF baselines in clustering, with AC gains of up to 15%.
  • With IGO features, Deep Semi-NMF achieved better class separation than Semi-NMF.
  • Supervised pretraining (Deep WSF on XM2VTS used to initialize Deep Semi-NMF on CMU PIE) improved clustering accuracy by 5–8%.
  • On CMU Multi-PIE, Deep WSF’s per-layer attributes most accurately classified the corresponding ground-truth factors (pose, expression, and identity), each at a different layer (Trigeorgis et al., 2015).

7. Hierarchical Attribute Representation and Interpretability

Each non-negative matrix $H_i$ in the deep hierarchy can be interpreted as a soft clustering over $k_i$ latent factors, corresponding to different attributes in the data. In multi-attribute face datasets, empirical assessment shows:

  • Layer 1 (largest $k_1$): broad separation (e.g., head-pose clusters).
  • Layer 2 (medium $k_2$): refinement into expression groups.
  • Layer 3 (small $k_3$): subject-identity clusters.

Columns of each $Z_i$ represent “basis portraits” or latent prototypes, with rows of $H_i$ indicating degrees of membership. Visualizing $H_i$ across layers reveals a staged “peeling away” of data variability: initial layers partition by high-variance attributes (e.g., pose), while later layers resolve lower-variance ones (e.g., identity). This layered decomposition underpins the method’s capacity to learn disentangled, attribute-aware representations (Trigeorgis et al., 2015). A small sketch of how to read the memberships follows.
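As a reading aid (an illustrative helper, not from the paper), column-normalizing each $H_i$ and taking the row-wise argmax yields a hard attribute assignment per sample at every layer:

```python
import numpy as np

def layer_assignments(Hs):
    """For each layer's H_i (k_i x n), return the dominant attribute index
    for every sample together with its membership strength."""
    out = []
    for H in Hs:
        soft = H / (H.sum(axis=0, keepdims=True) + 1e-12)  # column-normalize
        out.append((soft.argmax(axis=0), soft.max(axis=0)))
    return out
```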

References

  1. Trigeorgis, G., Bousmalis, K., Zafeiriou, S., & Schuller, B. W. (2015). A Deep Matrix Factorization Method for Learning Attribute Representations.
