Deep WSF: Weakly-Supervised Deep Semi-NMF

Updated 20 April 2026
  • The paper demonstrates that integrating weak supervision into deep semi-NMF frameworks enhances representation learning and improves clustering and classification accuracy.
  • The methodology involves multi-layer factorization with graph Laplacian regularization, enabling hierarchical learning and encoding of partial labels.
  • Empirical results show significant gains over standard Semi-NMF, validating its effectiveness on multi-attribute datasets like facial recognition.

Weakly-Supervised Deep Semi-NMF (Deep WSF) is a multi-layer matrix factorization framework designed to learn hierarchical and attribute-specific representations from partially labeled data. This paradigm builds on the classical Semi-Nonnegative Matrix Factorization (Semi-NMF) and incorporates partial prior information using weak supervision, extending factorization into multiple nonnegative layers. Deep WSF simultaneously enables unsupervised learning of hidden semantic features and explicit encoding of available label information, supporting both clustering and classification across complex, multi-attribute datasets (Trigeorgis et al., 2015).

1. Model Formulation and Layered Factorization

At its core, Deep WSF operates on a data matrix $X \in \mathbb{R}^{p \times n}$, where $p$ is the feature dimension and $n$ is the number of samples. The factorization proceeds through $m$ layers:

$$X \approx Z_1 Z_2 \cdots Z_m H_m$$

  • $Z_i \in \mathbb{R}^{k_{i-1} \times k_i}$: linear transformation (basis) matrices at layer $i$, with $k_0 = p$ and $k_m$ the dimension of the final code.
  • $H_i \in \mathbb{R}^{k_i \times n}$: activation (code) matrices; a nonnegativity constraint $H_i \geq 0$ is imposed at all layers.

Each intermediate layer reconstructs its input as $H_{i-1} \approx Z_i H_i$ (with $H_0 = X$), giving a hierarchical structure analogous to deep networks. Optional entrywise nonlinearities $g(\cdot)$ can be inserted, making $H_{i-1} \approx g(Z_i H_i)$, but the principal model is linear in the cited work.
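The shapes in the layered model are easy to get wrong, so a minimal NumPy sketch may help. It only illustrates the linear chain $X \approx Z_1 \cdots Z_m H_m$; the sizes, the random factors, and the helper name `layer_reconstructions` are all hypothetical and chosen for illustration, not taken from the cited work.

```python
import numpy as np

rng = np.random.default_rng(0)

p, n = 20, 50            # feature dimension and number of samples
layer_sizes = [12, 6]    # hypothetical k_1, k_2 (m = 2 layers)

# Z_i has shape (k_{i-1}, k_i) with k_0 = p; its entries may have mixed signs.
dims = [p] + layer_sizes
Z = [rng.standard_normal((dims[i], dims[i + 1])) for i in range(len(layer_sizes))]

# The deepest code H_m has shape (k_m, n) and is nonnegative.
H_m = rng.random((layer_sizes[-1], n))

def layer_reconstructions(Z, H_m):
    """Return [R_1, ..., R_m] with R_m = H_m and R_{i-1} = Z_i @ R_i.

    In the trained model each H_{i-1} is only approximated by Z_i H_i;
    here the chain is followed exactly just to show the shapes.
    """
    recs = [H_m]
    for Z_i in reversed(Z[1:]):
        recs.insert(0, Z_i @ recs[0])
    return recs

recs = layer_reconstructions(Z, H_m)
X_hat = Z[0] @ recs[0]   # equals Z_1 Z_2 ... Z_m H_m, shape (p, n)
```

Each entry of `recs` has one column per sample, so every layer yields a representation of the same $n$ samples at a different dimensionality.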

For weak supervision, Deep WSF is equipped to integrate partial label information on one or more attributes at each layer, modeled via graph Laplacian regularization (Trigeorgis et al., 2015).

2. Objective Function and Regularization

The full optimization criterion is

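$$\min_{Z_1,\dots,Z_m,\;H_1,\dots,H_m} \; \frac{1}{2}\,\| X - Z_1 Z_2 \cdots Z_m H_m \|_F^2 + \frac{1}{2} \sum_{i=1}^{m} \lambda_i \,\mathrm{Tr}\!\left(H_i L_i H_i^\top\right)$$

Subject to: $H_i \geq 0$ for all $i = 1, \dots, m$.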

  • Reconstruction Loss: $\| X - Z_1 Z_2 \cdots Z_m H_m \|_F^2$ enforces a compact encoding for $X$.
  • Graph Laplacian Regularization: Each $L_i$ is a Laplacian derived from partial labels or attribute similarity graphs at layer $i$; it penalizes divergence of low-dimensional codes $H_i$ for samples known (or presumed) to share an attribute.

If a nonlinearity $g$ is used, the reconstruction loss generalizes accordingly. The regularization parameter $\lambda_i$ controls the influence of supervision at each layer and is typically selected from a limited range of values.
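To make the two terms concrete, here is a minimal sketch of how such an objective could be evaluated in the linear model. It assumes the usual form of a Frobenius reconstruction term plus per-layer trace penalties $\mathrm{Tr}(H_i L_i H_i^\top)$ weighted by $\lambda_i$; the $\tfrac{1}{2}$ scaling and the function name `deep_wsf_objective` are assumptions for illustration.

```python
import numpy as np

def deep_wsf_objective(X, Z, H, L, lam):
    """Evaluate 0.5*||X - Z_1...Z_m H_m||_F^2 + 0.5*sum_i lam_i*tr(H_i L_i H_i^T).

    Z, H, L, lam are lists with one entry per layer (assumed layout).
    """
    recon = H[-1]
    for Z_i in reversed(Z):
        recon = Z_i @ recon                      # builds Z_1 ... Z_m H_m
    loss = 0.5 * np.linalg.norm(X - recon, "fro") ** 2
    for H_i, L_i, lam_i in zip(H, L, lam):
        loss += 0.5 * lam_i * np.trace(H_i @ L_i @ H_i.T)
    return loss

# Tiny check: exact reconstruction, one layer, two samples joined by one edge.
Z_demo = [np.array([[1.0], [2.0]])]
H_demo = [np.array([[1.0, 3.0]])]
X_demo = Z_demo[0] @ H_demo[0]
L_demo = [np.array([[1.0, -1.0], [-1.0, 1.0]])]
value = deep_wsf_objective(X_demo, Z_demo, H_demo, L_demo, lam=[0.5])
# The reconstruction term is 0 and the trace term is (1 - 3)^2 = 4, so value == 1.0
```

In the toy check the only cost comes from the two linked samples having different codes, which is the behavior the Laplacian term is meant to penalize.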

3. Weak Supervision via Attribute Graphs

At each layer, weak supervision is achieved by constructing an adjacency graph $A_i$ reflecting known partial labels for the relevant attribute. The Laplacian is then $L_i = D_i - A_i$, where $D_i$ is the corresponding degree matrix.

  • $(A_i)_{jl} = 1$ if samples $j$ and $l$ share a known label for the supervised attribute at layer $i$, and $(A_i)_{jl} = 0$ otherwise.
  • The penalty $\mathrm{Tr}(H_i L_i H_i^\top)$ becomes a sum over $\tfrac{1}{2}(A_i)_{jl}\,\| h_j - h_l \|^2$, where $h_j$ is the $j$-th column of $H_i$, encouraging codes to cluster for must-linked items.
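The graph construction can be sketched in a few lines. The example below assumes a binary adjacency matrix in which two samples are linked only when both carry the same known label, with `None` marking an unlabeled sample; the helper name `label_laplacian` and the toy labels are hypothetical. It also checks numerically that the trace penalty equals the weighted sum of pairwise squared distances between code columns.

```python
import numpy as np

def label_laplacian(labels):
    """Build adjacency A, degree D, and Laplacian D - A from partial labels."""
    n = len(labels)
    A = np.zeros((n, n))
    for j in range(n):
        for l in range(j + 1, n):
            # Link only pairs whose labels are both known and equal.
            if labels[j] is not None and labels[j] == labels[l]:
                A[j, l] = A[l, j] = 1.0
    D = np.diag(A.sum(axis=1))
    return A, D, D - A

labels = ["a", "a", None, "b", "b", "a"]   # hypothetical partial labels
A, D, L = label_laplacian(labels)

rng = np.random.default_rng(0)
H = rng.random((3, len(labels)))           # a nonnegative code, one column per sample

trace_penalty = np.trace(H @ L @ H.T)
pairwise = 0.5 * sum(
    A[j, l] * np.sum((H[:, j] - H[:, l]) ** 2)
    for j in range(len(labels))
    for l in range(len(labels))
)
# trace_penalty and pairwise agree up to floating-point rounding.
```

Unlabeled samples get an all-zero row in `A`, so they contribute nothing to the penalty and are shaped only by the reconstruction term.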

In datasets with multiple known attributes (e.g., identity, pose, expression), Deep WSF can be configured so that each layer encodes a representation specialized for one attribute, with separate graphs and Laplacians at each level.

4. Optimization Algorithm and Practicalities

Training proceeds in two stages:

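  • Greedy Layerwise Pre-training: For each layer $i = 1, \dots, m$, optimize the single-layer WSF subproblem:

$$\min_{Z_i,\; H_i \geq 0} \; \frac{1}{2}\,\| H_{i-1} - Z_i H_i \|_F^2 + \frac{\lambda_i}{2}\,\mathrm{Tr}\!\left(H_i L_i H_i^\top\right), \qquad H_0 = X,$$

using multiplicative updates for $H_i$ and least-squares or pseudo-inverse for $Z_i$.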

  • Global Fine-tuning: Alternately update all $(Z_i, H_i)$ by:

    • $Z_i$: least-squares solution from reconstructed code.
    • $H_i$ via component-wise multiplicative update:

    $$H_i \leftarrow H_i \odot \sqrt{\frac{[\Psi^\top X]^{+} + [\Psi^\top \Psi]^{-} H_i + \lambda_i H_i A_i}{[\Psi^\top X]^{-} + [\Psi^\top \Psi]^{+} H_i + \lambda_i H_i D_i}}$$

    where $\Psi = Z_1 Z_2 \cdots Z_i$, $[M]^{+} = (|M| + M)/2$, $[M]^{-} = (|M| - M)/2$, and $A_i$, $D_i$ are the adjacency and degree matrices of the layer-$i$ graph (so that $L_i = D_i - A_i$).
    • Optionally, renormalize to keep the factors bounded.
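The two stages above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the code update is the standard semi-NMF square-root multiplicative rule, extended by splitting the Laplacian into its degree and adjacency parts; initialization is random rather than SVD-based; and all function names (`pos`, `neg`, `update_code`, `pretrain`, `finetune_pass`) are hypothetical.

```python
import numpy as np

def pos(M):
    """Positive part: (|M| + M) / 2."""
    return (np.abs(M) + M) / 2.0

def neg(M):
    """Negative part: (|M| - M) / 2, so that M = pos(M) - neg(M)."""
    return (np.abs(M) - M) / 2.0

def update_code(Psi, X, H, A, lam, eps=1e-10):
    """One multiplicative update of a nonnegative code H for X ~ Psi @ H."""
    D = np.diag(A.sum(axis=1))
    PtX, PtP = Psi.T @ X, Psi.T @ Psi
    num = pos(PtX) + neg(PtP) @ H + lam * (H @ A)
    den = neg(PtX) + pos(PtP) @ H + lam * (H @ D)
    return H * np.sqrt(num / (den + eps))   # eps guards against division by zero

def pretrain(X, layer_sizes, graphs, lams, n_iter=100, seed=0):
    """Greedy stage: factorize X, then H_1, then H_2, one layer at a time."""
    rng = np.random.default_rng(seed)
    Z, H, inp = [], [], X
    for k, A, lam in zip(layer_sizes, graphs, lams):
        H_i = rng.random((k, X.shape[1])) + 0.1      # random init (assumption)
        for _ in range(n_iter):
            Z_i = inp @ np.linalg.pinv(H_i)          # least-squares basis
            H_i = update_code(Z_i, inp, H_i, A, lam)
        Z.append(inp @ np.linalg.pinv(H_i))          # basis matching the final code
        H.append(H_i)
        inp = H_i                                    # next layer factorizes this code
    return Z, H

def finetune_pass(X, Z, H, graphs, lams):
    """One sweep over all layers, updating Z_i and then H_i in place."""
    m = len(Z)
    for i in range(m):
        H_tilde = H[-1]                              # code rebuilt from deeper layers
        for j in range(m - 1, i, -1):
            H_tilde = Z[j] @ H_tilde
        Psi = np.eye(X.shape[0])
        for j in range(i):
            Psi = Psi @ Z[j]                         # product of the bases before layer i
        Z[i] = np.linalg.pinv(Psi) @ X @ np.linalg.pinv(H_tilde)
        H[i] = update_code(Psi @ Z[i], X, H[i], graphs[i], lams[i])
    return Z, H

# Toy run: random data, two layers, one must-link pair shared by both layers.
rng = np.random.default_rng(1)
X = rng.standard_normal((15, 30))
A = np.zeros((30, 30))
A[0, 1] = A[1, 0] = 1.0
Z, H = pretrain(X, [8, 4], [A, A], [0.1, 0.1])
for _ in range(20):
    Z, H = finetune_pass(X, Z, H, [A, A], [0.1, 0.1])
```

Because the update multiplies each entry of the code by a nonnegative factor, nonnegativity of every $H_i$ is preserved without an explicit projection step, while the bases remain free to take mixed signs.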

The stopping criterion is typically a small relative objective change or a fixed iteration count; settings of 500–1,000 iterations are reported. Initialization is typically via SVD-based heuristics (NNDSVD or Gillis–Glineur).
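A stopping rule of this kind can be sketched generically. The helper below is hypothetical (`run_until_converged`, `step`, and `objective` are illustrative names): it stops when the relative change in the objective falls below a tolerance or when an iteration cap is reached, with the cap defaulting to 1,000 in line with the reported settings. The tolerance value is an assumption.

```python
def run_until_converged(step, objective, state, tol=1e-6, max_iter=1000):
    """Apply `step` until the relative objective change drops below `tol`.

    Returns the final state and the number of iterations performed.
    """
    prev = objective(state)
    for it in range(1, max_iter + 1):
        state = step(state)
        cur = objective(state)
        # Relative change, guarded against a zero previous objective.
        if abs(prev - cur) <= tol * max(abs(prev), 1e-12):
            return state, it
        prev = cur
    return state, max_iter

# Toy usage: a scalar iteration that contracts toward 1.0.
final_state, n_iters = run_until_converged(
    step=lambda x: 1.0 + (x - 1.0) / 2.0,
    objective=lambda x: x * x,
    state=3.0,
)
```

In a factorization setting, `state` would hold the current factors, `step` would perform one update sweep, and `objective` would evaluate the training loss.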

5. Empirical Evaluation and Attribute Decoupling

In experimental settings on face datasets (XM2VTS, CMU-PIE, CMU-Multi-PIE), Deep WSF demonstrates statistically significant improvements in clustering accuracy (AC) and classification against Semi-NMF and alternative nonnegative or semi-supervised matrix factorization methods.

  • For example, on XM2VTS (final layer dimension 40), Semi-NMF achieves AC $\approx$ 0.61, while Deep Semi-NMF yields AC $\approx$ 0.68. Using Image Gradient Orientation features, Deep models reach AC $\approx$ 0.77 (vs. Semi-NMF 0.63) (Trigeorgis et al., 2015).
  • In the three-attribute classification on CMU-Multi-PIE, Deep WSF learns layerwise representations optimized for pose, expression, and identity, outperforming all previous semi-supervised NMF variants on identity classification by about 10%, with attribute-specific accuracies of 100%, 82.9%, and 65.2%, respectively.
  • Supervised pre-training on one dataset can transfer beneficially to another, as shown by AC improvements from 0.56 to 0.62 on CMU-PIE after pre-training on XM2VTS.
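Since the results above are stated in terms of clustering accuracy (AC), a small sketch of the metric may be useful. It assumes the common definition of AC as the best accuracy over one-to-one mappings from cluster ids to class labels; the source does not spell out its exact definition. The function name is hypothetical, and the brute-force search over permutations is only practical for a handful of clusters (an assignment solver such as `scipy.optimize.linear_sum_assignment` is the usual choice at scale).

```python
from itertools import permutations

def clustering_accuracy(true_labels, cluster_ids):
    """Best fraction of samples matched under a one-to-one cluster-to-class map.

    Assumes labels are sortable and that None is not used as a true label.
    """
    classes = sorted(set(true_labels))
    clusters = sorted(set(cluster_ids))
    # Pad with dummy classes so every cluster can receive a distinct target.
    size = max(len(classes), len(clusters))
    classes = classes + [None] * (size - len(classes))
    best = 0
    for perm in permutations(classes, len(clusters)):
        mapping = dict(zip(clusters, perm))
        hits = sum(mapping[c] == t for t, c in zip(true_labels, cluster_ids))
        best = max(best, hits)
    return best / len(true_labels)

true_labels = [0, 0, 1, 1, 2, 2]
cluster_ids = [1, 1, 0, 0, 2, 0]   # cluster names are arbitrary; one sample is misplaced
ac = clustering_accuracy(true_labels, cluster_ids)   # 5 of 6 samples match
```

The metric is invariant to how clusters are numbered, which is why a matching step is needed before counting correct assignments.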

6. Computational Complexity and Implementation Guidelines

Computational complexity for Deep WSF (in the linear model) is reported separately for the two training stages:

  • Pre-training: cost scales with the number of pre-training iterations.
  • Fine-tuning: cost scales with the number of fine-tuning iterations.

Key guidelines:

  • Number of layers $m$ typically set to 2 or 3.
  • Hidden sizes $k_i$ are dataset- and attribute-dependent, with reported values ranging up to 70.
  • Regularization parameters $\lambda_i$ are tuned via validation under partial supervision.
  • Careful initialization is critical for convergence and stability.

7. Significance and Distinct Features

Deep WSF enables learning of deep, layered representations that are explicitly aligned with weak, attribute-level supervision, providing a principled methodology for capturing both global and attribute-specific structure in complex datasets (Trigeorgis et al., 2015). Its layerwise Laplacian regularization fosters disentanglement along known axes of variability, in contrast to flat NMF methods. Empirical results show superior clustering and classification, with robustness to mixed or partial labels. Deep WSF also supports multi-attribute learning, yielding layerwise representations specialized for each attribute, a capability not present in conventional shallow factorization frameworks.

A plausible implication is that Deep WSF stands as a foundation for future research on deep factorizations with multi-attribute or graph-based weak supervision, and it continues to inform more modern deep semi-NMF frameworks employing more advanced prior or label constraints (Zhang et al., 2020, Trigeorgis et al., 2015).
