Deep Semi-NMF: Hierarchical Matrix Factorization
- Deep Semi-NMF is a hierarchical multi-layer matrix factorization framework that decomposes high-dimensional data into interpretable, non-negative soft cluster memberships.
- It extends traditional Semi-NMF by stacking linear transformations to uncover nested feature hierarchies and reveal complex, overlapping attributes.
- The method employs greedy layer-wise pretraining followed by joint fine-tuning with multiplicative update rules to improve clustering accuracy and classification performance.
Deep Semi-Non-negative Matrix Factorization (Deep Semi-NMF) is a hierarchical matrix factorization framework designed to recover interpretable, multi-level attribute representations from high-dimensional data. Extending the concept of Semi-NMF to a deep, multi-layer architecture, it models the generative structure of data as a product of stacked linear transformations culminating in non-negative latent factors. Unlike classical flat matrix factorization, Deep Semi-NMF captures hierarchies of attributes, with each layer producing a non-negative feature matrix interpreted as soft cluster memberships for latent factors. This approach allows the uncovering of complex, nested structure in datasets, particularly when clustering or class labels reflect multiple, overlapping factors of variation (Trigeorgis et al., 2015).
1. Model Architecture
Given a data matrix $X \in \mathbb{R}^{p \times n}$, Deep Semi-NMF factorizes it as:

$$X \approx Z_1 Z_2 \cdots Z_m H_m$$

where the $Z_i$ (with $Z_i \in \mathbb{R}^{k_{i-1} \times k_i}$, $k_0 = p$) are stacked “basis” matrices with mixed signs and $H_m \geq 0$ is a non-negative feature (“attribute”) matrix. Intermediate layers introduce feature matrices $H_1, \dots, H_{m-1}$, each non-negative, yielding the layer-wise structure:
\begin{align*}
X &\approx Z_1 H_1 \\
H_1 &\approx Z_2 H_2 \\
&\;\;\vdots \\
H_{m-1} &\approx Z_m H_m
\end{align*}
Each $H_i$ is interpreted as a soft cluster-membership matrix for $k_i$ latent attributes (Trigeorgis et al., 2015).
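As a concrete illustration of the stacked shapes, the following minimal NumPy sketch (with arbitrary, illustrative dimensions not taken from the paper) builds random factors of the appropriate sizes and checks that their product reconstructs a matrix of the same shape as $X$.

```python
import numpy as np

# Illustrative dimensions (not from the paper): p features, n samples,
# and two layers of sizes k1 > k2.
p, n, k1, k2 = 100, 500, 40, 10

rng = np.random.default_rng(0)
X = rng.standard_normal((p, n))          # mixed-sign data matrix

Z1 = rng.standard_normal((p, k1))        # mixed-sign basis, layer 1
Z2 = rng.standard_normal((k1, k2))       # mixed-sign basis, layer 2
H2 = rng.random((k2, n))                 # non-negative attribute matrix

X_hat = Z1 @ Z2 @ H2                     # X ~ Z1 Z2 H2
H1_implied = Z2 @ H2                     # intermediate features: H1 ~ Z2 H2

assert X_hat.shape == X.shape
assert H1_implied.shape == (k1, n)
```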
2. Objective Functions and Constraints
For the unsupervised (purely linear) form, the Deep Semi-NMF objective is:

$$C_{\text{deep}} = \left\lVert X - Z_1 Z_2 \cdots Z_m H_m \right\rVert_F^2, \qquad \text{s.t. } H_m \geq 0.$$

Optionally, non-negativity constraints can be enforced on all intermediate matrices $H_1, \dots, H_{m-1}$ as well.
A trace-based reformulation is also valid:

$$C_{\text{deep}} = \mathrm{tr}\!\left[ X^\top X - 2\, X^\top Z_1 \cdots Z_m H_m + H_m^\top (Z_1 \cdots Z_m)^\top Z_1 \cdots Z_m H_m \right].$$

When partial attribute labels are available, the semi-supervised extension, Deep WSF, augments the objective with graph Laplacian regularizers:

$$C_{\text{WSF}} = \left\lVert X - Z_1 \cdots Z_m H_m \right\rVert_F^2 + \sum_{i=1}^{m} \lambda_i\, \mathrm{tr}\!\left( H_i L_i H_i^\top \right),$$

where each $L_i$ is the Laplacian matrix built from available class labels for layer $i$, and $\lambda_i$ is a tuning hyperparameter (Trigeorgis et al., 2015).
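For concreteness, here is a minimal sketch of how these two objectives can be evaluated with NumPy, assuming the factors `Zs`, `Hs` and the per-layer Laplacians `Ls` are supplied as arrays; the function names are illustrative.

```python
import numpy as np
from functools import reduce

def deep_seminmf_objective(X, Zs, H_m):
    """Unsupervised Deep Semi-NMF cost: ||X - Z1 Z2 ... Zm Hm||_F^2."""
    recon = reduce(np.matmul, Zs) @ H_m
    return np.linalg.norm(X - recon, "fro") ** 2

def deep_wsf_objective(X, Zs, Hs, Ls, lambdas):
    """Deep WSF cost: reconstruction error plus per-layer graph penalties
    sum_i lambda_i * tr(H_i L_i H_i^T). Hs[-1] is H_m."""
    cost = deep_seminmf_objective(X, Zs, Hs[-1])
    for H_i, L_i, lam_i in zip(Hs, Ls, lambdas):
        cost += lam_i * np.trace(H_i @ L_i @ H_i.T)
    return cost
```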
3. Optimization Methods
Deep Semi-NMF utilizes a two-phase optimization scheme:
A. Greedy Layer-wise Pre-training:
Each layer solves a two-factor Semi-NMF on the output of the previous layer:
- For $i = 1$ to $m$, factor $H_{i-1} \approx Z_i H_i$ (with $H_0 = X$ and $H_i \geq 0$).
- Alternate updates:
  - $Z_i \leftarrow H_{i-1} H_i^{\dagger}$, where $\dagger$ denotes the Moore–Penrose inverse.
  - $H_i$ is updated multiplicatively to ensure $H_i \geq 0$:
$$H_i \leftarrow H_i \odot \sqrt{\frac{[Z_i^\top H_{i-1}]^{+} + [Z_i^\top Z_i]^{-} H_i}{[Z_i^\top H_{i-1}]^{-} + [Z_i^\top Z_i]^{+} H_i}},$$
with $[A]^{+} = \tfrac{1}{2}(|A| + A)$ and $[A]^{-} = \tfrac{1}{2}(|A| - A)$ taken elementwise.
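A minimal NumPy sketch of these two alternating updates for a single layer is given below; the small `eps` guard and the random non-negative initialization are numerical conveniences assumed here, not prescriptions from the paper.

```python
import numpy as np

def _pos(A):
    """Elementwise positive part: (|A| + A) / 2."""
    return (np.abs(A) + A) / 2.0

def _neg(A):
    """Elementwise negative part: (|A| - A) / 2."""
    return (np.abs(A) - A) / 2.0

def semi_nmf(A, k, n_iter=200, eps=1e-9, seed=0):
    """Two-factor Semi-NMF: A (p x n) ~ Z (p x k, mixed sign) @ H (k x n, H >= 0)."""
    rng = np.random.default_rng(seed)
    H = rng.random((k, A.shape[1])) + eps          # non-negative initialization
    for _ in range(n_iter):
        Z = A @ np.linalg.pinv(H)                  # closed-form Z update (Moore-Penrose)
        ZtA, ZtZ = Z.T @ A, Z.T @ Z
        num = _pos(ZtA) + _neg(ZtZ) @ H            # numerator of the multiplicative rule
        den = _neg(ZtA) + _pos(ZtZ) @ H + eps      # denominator (guarded against zeros)
        H *= np.sqrt(num / den)                    # preserves non-negativity of H
    return Z, H
```

Layer-wise pretraining then amounts to calling `semi_nmf` on $X$ to obtain $(Z_1, H_1)$, then on $H_1$ to obtain $(Z_2, H_2)$, and so on.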
B. Joint Fine-Tuning:
After pre-training, all factors are updated via alternating minimization:
For each layer $i$, define $\Psi = Z_1 \cdots Z_{i-1}$ (the identity for $i = 1$) and $\tilde{H}_i$ (equal to $H_m$ if $i = m$, else $Z_{i+1} \cdots Z_m H_m$).
- $Z_i$-update (closed-form least squares):
$$Z_i \leftarrow \Psi^{\dagger} X \tilde{H}_i^{\dagger}$$
- $H_i$-update (multiplicative, preserves non-negativity), writing $\Phi = \Psi Z_i = Z_1 \cdots Z_i$:
$$H_i \leftarrow H_i \odot \sqrt{\frac{[\Phi^\top X]^{+} + [\Phi^\top \Phi]^{-} H_i}{[\Phi^\top X]^{-} + [\Phi^\top \Phi]^{+} H_i}}$$
- Repeat until convergence (Trigeorgis et al., 2015).
For Deep WSF, the $H_i$-update includes $\lambda_i$-weighted Laplacian regularization terms in the numerator and denominator, leveraging partial supervision.
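The sketch below performs one such fine-tuning sweep over all layers, assuming lists `Zs` and `Hs` produced by the layer-wise pretraining sketch above; the Deep WSF variant would add the Laplacian terms described in Section 4.

```python
import numpy as np
from functools import reduce

def _pos(A): return (np.abs(A) + A) / 2.0
def _neg(A): return (np.abs(A) - A) / 2.0

def finetune_sweep(X, Zs, Hs, eps=1e-9):
    """One alternating pass of joint fine-tuning; Zs[i], Hs[i] hold layer i+1's factors."""
    m = len(Zs)
    for i in range(m):
        # Psi = Z_1 ... Z_{i-1} (identity for the first layer).
        Psi = reduce(np.matmul, Zs[:i], np.eye(X.shape[0]))
        # H_tilde_i = Z_{i+1} ... Z_m H_m (reduces to H_m for the last layer).
        H_tilde = reduce(np.matmul, Zs[i + 1:], np.eye(Hs[i].shape[0])) @ Hs[-1]
        # Closed-form least-squares update of Z_i.
        Zs[i] = np.linalg.pinv(Psi) @ X @ np.linalg.pinv(H_tilde)
        # Multiplicative update of H_i using Phi = Z_1 ... Z_i.
        Phi = Psi @ Zs[i]
        PtX, PtP = Phi.T @ X, Phi.T @ Phi
        num = _pos(PtX) + _neg(PtP) @ Hs[i]
        den = _neg(PtX) + _pos(PtP) @ Hs[i] + eps
        Hs[i] = Hs[i] * np.sqrt(num / den)
    return Zs, Hs
```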
4. Semi-Supervised Extension: Deep WSF
When partial attribute-label supervision is present, Deep WSF (Deep Weakly Supervised Factorization) incorporates a graph-based smoothness term into each layer's factorization. For samples with known class memberships at layer $i$, a similarity graph $W_i$ is constructed and its Laplacian $L_i = D_i - W_i$ (with diagonal degree matrix $D_i$) added to the loss as

$$\lambda_i\, \mathrm{tr}\!\left( H_i L_i H_i^\top \right).$$

The resulting optimization uses the same multiplicative update rule as in Deep Semi-NMF, but adds $\lambda_i H_i W_i$ in the numerator and $\lambda_i H_i D_i$ in the denominator, promoting smoother, label-consistent attribute representations. The pretraining step for each layer is switched from standard Semi-NMF to WSF to integrate available label information from the outset (Trigeorgis et al., 2015).
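A sketch of the correspondingly modified $H$-update is shown below; the placement of the graph terms follows the standard splitting of the Laplacian gradient ($H L = H D - H W$) and should be read as an assumption consistent with the text, not a verbatim transcription of the paper's update.

```python
import numpy as np

def _pos(A): return (np.abs(A) + A) / 2.0
def _neg(A): return (np.abs(A) - A) / 2.0

def wsf_h_update(X, Phi, H, W, D, lam, eps=1e-9):
    """Multiplicative H-update with graph penalty lam * tr(H L H^T), L = D - W.
    Phi is the stacked basis down to this layer, Phi = Z_1 ... Z_i."""
    PtX, PtP = Phi.T @ X, Phi.T @ Phi
    num = _pos(PtX) + _neg(PtP) @ H + lam * (H @ W)        # graph attraction term
    den = _neg(PtX) + _pos(PtP) @ H + lam * (H @ D) + eps  # graph degree term
    return H * np.sqrt(num / den)
```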
5. Algorithmic Summary and Computational Complexity
The training process is as follows:
Layer-wise Pre-training:
- Initialize $Z_i$ and $H_i$ for each layer (e.g., SVD-based initialization).
- For $i = 1$ to $m$, run Semi-NMF (or WSF if supervised) on $H_{i-1}$ (with $H_0 = X$).
- Persist the factors $Z_i$, $H_i$.

Joint Fine-Tuning:
- Alternate the $Z_i$ and $H_i$ updates for each layer until convergence.

Per-iteration computational cost is dominated by the Moore–Penrose pseudoinverses and multiplicative updates performed at each of the $m$ layers (Trigeorgis et al., 2015).
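Putting the two phases together, and assuming the `semi_nmf` and `finetune_sweep` sketches from Section 3 are in scope, a minimal end-to-end training driver (with arbitrary, illustrative layer sizes) could look like this:

```python
import numpy as np

def deep_semi_nmf(X, layer_sizes, n_finetune=100):
    """Greedy layer-wise pretraining followed by joint fine-tuning sweeps."""
    Zs, Hs, A = [], [], X
    for k in layer_sizes:                  # pretraining: factor H_{i-1} ~ Z_i H_i
        Z, H = semi_nmf(A, k)
        Zs.append(Z)
        Hs.append(H)
        A = H                              # the next layer factors the current features
    for _ in range(n_finetune):            # joint fine-tuning of all factors
        Zs, Hs = finetune_sweep(X, Zs, Hs)
    return Zs, Hs

# Toy usage with decreasing layer sizes (coarse-to-fine attribute granularity).
rng = np.random.default_rng(0)
X_toy = rng.standard_normal((100, 400))
Zs, Hs = deep_semi_nmf(X_toy, layer_sizes=(60, 30, 10))
```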
6. Empirical Results and Benchmarks
Deep Semi-NMF and Deep WSF have been validated on several standard face datasets:
| Dataset | Samples | Subjects | Variation per subject |
|---|---|---|---|
| XM2VTS | 2,360 | 295 | 8 images/subject |
| CMU PIE | 2,856 | 68 | 42 illuminations/poses |
| CMU Multi-PIE subset | 7,905 | 147 | 5 poses, 6 expressions |
Input features included raw pixels (all non-negative) and image-gradient–orientation (IGO) descriptors (mixed-sign). Baselines comprised NMF, Semi-NMF, GNMF, Multi-layer NMF, NeNMF, WSF, DNMF, and CNMF.
Performance metrics:
- Clustering: accuracy (AC), normalized mutual information (NMI), AUC of precision-recall.
- Downstream classification: accuracy of a linear SVM trained on the learned features $H_m$ (see the evaluation sketch below).
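For concreteness, a small scikit-learn sketch of how such metrics can be computed from learned features is shown below; it is illustrative only and not the authors' evaluation code.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import accuracy_score, normalized_mutual_info_score
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

def evaluate_features(H_m, labels, n_clusters):
    """Cluster and classify samples using the columns of H_m as feature vectors."""
    feats = H_m.T                                           # one row per sample
    pred = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(feats)
    nmi = normalized_mutual_info_score(labels, pred)        # clustering quality
    X_tr, X_te, y_tr, y_te = train_test_split(feats, labels, test_size=0.3, random_state=0)
    acc = accuracy_score(y_te, LinearSVC().fit(X_tr, y_tr).predict(X_te))
    return nmi, acc
```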
Notable empirical findings:
- Two-layer Deep Semi-NMF outperformed all single-layer and multi-layer NMF baselines in clustering, with gains of up to 15% in accuracy (AC).
- With mixed-sign IGO features, Deep Semi-NMF achieved better class separation than single-layer Semi-NMF.
- Supervised pretraining (using Deep WSF trained on XM2VTS to initialize the model on CMU PIE) improved clustering accuracy by 5–8%.
- On CMU Multi-PIE, Deep WSF's per-layer attribute representations best matched the corresponding ground-truth factors, with pose, expression, and identity each captured most accurately at a different layer (Trigeorgis et al., 2015).
7. Hierarchical Attribute Representation and Interpretability
Each non-negative matrix $H_i$ in the deep hierarchy can be interpreted as a soft clustering over latent factors, corresponding to different attributes in the data. In multi-attribute face datasets, empirical assessment shows:
- Layer 1 (largest layer size $k_1$): broad separation (e.g., head-pose clusters).
- Layer 2 (intermediate $k_2$): refinement into expression groups.
- Layer 3 (smallest $k_3$): subject-identity clusters.
Columns of each $Z_i$ represent “basis portraits” or latent prototypes, with the rows of $H_i$ indicating degrees of membership. Visualizing the $H_i$ across layers reveals a staged “peeling away” of data variability: initial layers partition by high-variance attributes (e.g., pose), while later layers resolve lower-variance ones (e.g., identity). This layered decomposition underwrites the method's capacity to learn disentangled, attribute-aware representations (Trigeorgis et al., 2015).
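As an illustration of this interpretation, the following sketch (assuming factor lists `Zs`, `Hs` such as those produced by the training sketches above) extracts a hard attribute assignment per layer from the soft memberships and maps each layer's basis back to input space so its columns can be viewed as prototypes.

```python
import numpy as np
from functools import reduce

def layer_assignments(Hs):
    """Hard attribute label per sample and layer: argmax over the soft
    membership values in each column of H_i."""
    return [np.argmax(H, axis=0) for H in Hs]

def basis_portraits(Zs, layer):
    """Map layer `layer`'s (0-indexed) basis to input space as Z_1 ... Z_{layer+1};
    for pixel inputs, each column can be reshaped to image dimensions and displayed."""
    return reduce(np.matmul, Zs[:layer + 1])
```

In the face experiments described above, the first layer's columns would then correspond to coarse (e.g., pose-level) prototypes, with deeper layers yielding increasingly fine-grained ones.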