
Learnable Manifold Alignment (LeMA)

Updated 20 May 2026
  • Learnable Manifold Alignment (LeMA) is a semi-supervised cross-modality framework that integrates hyperspectral (HS) and multispectral (MS) data for enhanced land cover and land use classification.
  • It employs a joint optimization strategy that simultaneously learns a data-driven graph, a shared projection subspace, and a classifier to capture intrinsic data geometry.
  • Experiments on remote sensing datasets show LeMA achieving 5–10% improvements in overall accuracy and kappa coefficient over conventional methods.

Learnable Manifold Alignment (LeMA) is a semi-supervised cross-modality learning framework designed to exploit scarce but highly discriminative hyperspectral (HS) data together with abundant but less discriminative multispectral (MS) data for land cover and land use classification. Unlike prior manifold alignment techniques that rely on fixed Gaussian-kernel graphs, LeMA learns a data-driven graph structure jointly with the projection and classification parameters, enabling more effective cross-modality knowledge transfer and improved decision boundaries in the MS domain (Hong et al., 2019).

1. Problem Formulation

LeMA addresses cross-modality semi-supervised classification with the following structure:

  • Let $N$ paired, labeled HS and MS samples be given:
    • $X_H \in \mathbb{R}^{d_H \times N}$: hyperspectral feature matrix.
    • $X_M \in \mathbb{R}^{d_M \times N}$: multispectral feature matrix.
    • $Y \in \{0,1\}^{c \times N}$: one-hot class label matrix for $c$ classes.
  • Let $N_U$ unlabeled MS samples be provided: $X_U \in \mathbb{R}^{d_M \times N_U}$.

The objective is to use the small set of labeled HS samples, in combination with the large set of unlabeled MS samples, to induce a classifier with good generalization on MS data.

Joint data matrices are constructed for paired and semi-supervised settings:

  • Labeled joint data: $\widetilde X = \begin{bmatrix} X_H & 0 \\ 0 & X_M \end{bmatrix} \in \mathbb{R}^{(d_H + d_M) \times 2N}$, $\widetilde Y = [Y, Y] \in \mathbb{R}^{c \times 2N}$.
  • Extended data with unlabeled MS: $\widetilde X' = \begin{bmatrix} X_H & 0 & 0 \\ 0 & X_M & X_U \end{bmatrix} \in \mathbb{R}^{(d_H + d_M) \times (2N + N_U)}$.
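The block layout of these joint matrices can be checked with a short NumPy sketch. The function name `build_joint_matrices`, the toy dimensions, and the random data are illustrative assumptions, not part of the original method description.

```python
import numpy as np

def build_joint_matrices(X_H, X_M, Y, X_U):
    """Assemble the block-structured joint matrices (hypothetical helper)."""
    d_H, N = X_H.shape
    d_M, N_U = X_U.shape
    assert X_M.shape == (d_M, N) and Y.shape[1] == N
    # Labeled joint data: HS samples fill the top-left block, MS the bottom-right.
    X_tilde = np.block([
        [X_H, np.zeros((d_H, N))],
        [np.zeros((d_M, N)), X_M],
    ])
    # Each paired sample appears twice (once per modality), so labels repeat.
    Y_tilde = np.hstack([Y, Y])
    # Extended data: unlabeled MS samples are appended as extra MS columns.
    X_tilde_ext = np.block([
        [X_H, np.zeros((d_H, N)), np.zeros((d_H, N_U))],
        [np.zeros((d_M, N)), X_M, X_U],
    ])
    return X_tilde, Y_tilde, X_tilde_ext

# Toy sizes chosen only for illustration.
rng = np.random.default_rng(0)
d_H, d_M, N, N_U, c = 6, 3, 5, 4, 2
X_H = rng.normal(size=(d_H, N))
X_M = rng.normal(size=(d_M, N))
X_U = rng.normal(size=(d_M, N_U))
Y = np.eye(c)[:, rng.integers(0, c, size=N)]  # one-hot columns
X_tilde, Y_tilde, X_tilde_ext = build_joint_matrices(X_H, X_M, Y, X_U)
print(X_tilde.shape, Y_tilde.shape, X_tilde_ext.shape)
```

The zero blocks mean that a single projection applied to the joint matrix acts on HS columns only through its first $d_H$ columns and on MS columns only through its last $d_M$ columns.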

The framework seeks a shared $d$-dimensional subspace via a linear projection $\Theta \in \mathbb{R}^{d \times (d_H + d_M)}$ with orthonormal rows, $\Theta\Theta^\top = I$, such that the projected features $\Theta\widetilde X$ are maximally informative for classification. A linear regression/classification matrix $P \in \mathbb{R}^{c \times d}$ is learned to map $\Theta\widetilde X$ to labels.

2. Objective Function and Mathematical Formulation

The LeMA optimization problem is defined as:

$$\min_{P,\,\Theta,\,W}\ \frac{1}{2}\big\|\widetilde Y - P\Theta\widetilde X\big\|_F^2 + \frac{\alpha}{2}\|P\|_F^2 + \frac{\beta}{2}\,\mathrm{tr}\big(\Theta\widetilde X' L \widetilde X'^\top\Theta^\top\big)$$

Subject to:

  • $\Theta\Theta^\top = I$
  • $W = W^\top$, $W_{ij} \ge 0$
  • $\|W\|_{1,1} = s$ (scale constraint)
  • $W_{ij} \le 1/N_k$ for $i, j$ both labeled in class $k$

Here, $P$ is the regression/classification matrix, $\Theta$ the projection, $x_i$ denotes the $i$-th column of $\widetilde X'$, and $N_k$ is the number of labeled samples in class $k$; $\alpha$ and $\beta$ are regularization weights and $s > 0$ fixes the total edge weight. The graph Laplacian is $L = D - W$ with $D_{ii} = \sum_j W_{ij}$. The sum $\sum_{i,j} W_{ij}\|\Theta x_i - \Theta x_j\|_2^2$, which equals $2\,\mathrm{tr}(\Theta\widetilde X' L \widetilde X'^\top\Theta^\top)$ for symmetric $W$, captures the smoothness constraint. Critically, $W$, the adjacency matrix defining the manifold structure, is learned jointly with $\Theta$ and $P$.

Orthogonality on $\Theta$ prevents degenerate scaling, and the constraints on $W$ ensure it is a valid similarity graph. The upper bound $1/N_k$ for class $k$ enforces degree comparability with an LDA-like graph.
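To make the roles of the three terms concrete, the sketch below evaluates an objective of the form described in this section: a least-squares fit, a ridge penalty, and a Laplacian smoothness term. It also checks numerically that the trace form equals half the weighted sum of pairwise squared distances. The names `Theta`, `P`, `W`, `alpha`, `beta`, the 1/2 coefficients, and the toy data are assumptions made for illustration.

```python
import numpy as np

def graph_laplacian(W):
    """L = D - W with D_ii = sum_j W_ij."""
    return np.diag(W.sum(axis=1)) - W

def lema_objective(P, Theta, W, X_lab, Y_lab, X_all, alpha, beta):
    """Fit + ridge + manifold smoothness (assumed form, see lead-in)."""
    fit = 0.5 * np.linalg.norm(Y_lab - P @ Theta @ X_lab, "fro") ** 2
    ridge = 0.5 * alpha * np.linalg.norm(P, "fro") ** 2
    Z = Theta @ X_all
    smooth = 0.5 * beta * np.trace(Z @ graph_laplacian(W) @ Z.T)
    return fit + ridge + smooth

def pairwise_smoothness(Theta, W, X_all):
    """0.5 * sum_ij W_ij * ||Theta x_i - Theta x_j||^2."""
    Z = Theta @ X_all
    sq = (Z ** 2).sum(axis=0)
    dist2 = sq[:, None] + sq[None, :] - 2.0 * (Z.T @ Z)
    return 0.5 * np.sum(W * dist2)

rng = np.random.default_rng(0)
D, d, c, n_lab, n_all = 7, 3, 2, 6, 10
X_all = rng.normal(size=(D, n_all))
X_lab = X_all[:, :n_lab]
Y_lab = np.eye(c)[:, rng.integers(0, c, size=n_lab)]
Q, _ = np.linalg.qr(rng.normal(size=(D, d)))
Theta = Q.T                      # orthonormal rows: Theta @ Theta.T = I
P = rng.normal(size=(c, d))
A = rng.random((n_all, n_all))
W = (A + A.T) / 2                # symmetric, nonnegative
np.fill_diagonal(W, 0.0)
W /= W.sum()                     # scale constraint with s = 1

Z = Theta @ X_all
trace_form = np.trace(Z @ graph_laplacian(W) @ Z.T)
pair_form = pairwise_smoothness(Theta, W, X_all)
value = lema_objective(P, Theta, W, X_lab, Y_lab, X_all, alpha=0.1, beta=0.1)
print(np.isclose(trace_form, pair_form), value)
```

The pairwise form makes it clear why a large weight between two samples pushes their projections together, which is the mechanism by which a learned graph shapes the subspace.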

3. Graph-based Label Propagation

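After learning, labels can be propagated over the learned graph by minimizing a regularized objective that balances smoothness of the label scores on the graph against agreement with the initial labels. The propagated labels can be obtained either from a closed-form solution or from an equivalent iterative update.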

This label propagation further sharpens the decision boundaries by leveraging the learned data manifold.
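As an illustration of graph-based label propagation, the sketch below uses one widely used variant (propagation with the symmetrically normalized affinity matrix). This specific form, the mixing parameter `a`, and the toy graph are assumptions; the exact propagation rule used with LeMA is given in (Hong et al., 2019) and may differ.

```python
import numpy as np

def normalized_affinity(W):
    """S = D^{-1/2} W D^{-1/2}; assumes every node has positive degree."""
    inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    return W * inv_sqrt[:, None] * inv_sqrt[None, :]

def propagate_closed_form(W, Y0, a=0.9):
    """Fixed point of F = a*S*F + (1-a)*Y0, obtained by a linear solve."""
    n = W.shape[0]
    S = normalized_affinity(W)
    return (1.0 - a) * np.linalg.solve(np.eye(n) - a * S, Y0)

def propagate_iterative(W, Y0, a=0.9, n_iter=500):
    """Same fixed point, reached by repeated propagation steps."""
    S = normalized_affinity(W)
    F = Y0.copy()
    for _ in range(n_iter):
        F = a * (S @ F) + (1.0 - a) * Y0
    return F

# Two tightly connected triangles joined by one weak edge (2-3).
W = np.array([
    [0, 1, 1, 0,   0, 0],
    [1, 0, 1, 0,   0, 0],
    [1, 1, 0, 0.1, 0, 0],
    [0, 0, 0.1, 0, 1, 1],
    [0, 0, 0, 1,   0, 1],
    [0, 0, 0, 1,   1, 0],
], dtype=float)
Y0 = np.zeros((6, 2))            # rows are samples, columns are classes
Y0[0, 0] = 1.0                   # node 0 labeled as class 0
Y0[5, 1] = 1.0                   # node 5 labeled as class 1

F_closed = propagate_closed_form(W, Y0)
F_iter = propagate_iterative(W, Y0)
pred = F_closed.argmax(axis=1)
print(pred)
```

Here rows index samples, so the label matrix is the transpose of the $c \times N$ layout used earlier. With a learned graph in place of the toy `W`, the same two routines would spread the labeled scores along the learned manifold.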

4. ADMM-Based Optimization Strategy

LeMA applies block coordinate descent, with the alternating direction method of multipliers (ADMM) used for the constrained subproblems, to alternately update the classifier $P$, the projection $\Theta$, and the graph $W$:

  • $P$-update: Closed-form ridge regression with ridge weight $\alpha$:

$$P = \widetilde Y(\Theta\widetilde X)^\top\big(\Theta\widetilde X(\Theta\widetilde X)^\top + \alpha I\big)^{-1}$$

  • $\Theta$-update (ADMM): Solves

$$\min_{\Theta}\ \frac{1}{2}\big\|\widetilde Y - P\Theta\widetilde X\big\|_F^2 + \frac{\beta}{2}\,\mathrm{tr}\big(\Theta\widetilde X' L \widetilde X'^\top\Theta^\top\big)\quad \text{s.t.}\quad \Theta\Theta^\top = I$$

where $L$ is the Laplacian of $W$ and $\beta$ weights the manifold term. The problem is split using auxiliary variables. Updates involve Lagrange multipliers and auxiliary variables; one of the variable updates uses a thin SVD.

  • $W$-update (ADMM): $W$ is partitioned into block matrices; labeled–labeled blocks are set by LDA-like rules, while the cross-part (labeled–unlabeled and unlabeled–unlabeled) is optimized subject to symmetry, nonnegativity, bounds, and scale constraints. Multiple auxiliary splits and soft-thresholding/proximal operators are used.
  • Convergence: Monitored by the relative change in the global objective or by ADMM residual norms. Recommended hyperparameter settings are reported in (Hong et al., 2019).
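Two of the building blocks named above can be sketched in a few lines of NumPy: the closed-form ridge update for the classifier, and a thin-SVD step that returns the nearest matrix with orthonormal rows. The second function illustrates the standard way a thin SVD enforces an orthogonality constraint; whether the ADMM update in (Hong et al., 2019) has exactly this form is an assumption, as are the names `update_P`, `nearest_orthonormal_rows`, `Theta`, and `alpha`.

```python
import numpy as np

def update_P(Theta, X_lab, Y_lab, alpha):
    """Minimize 0.5*||Y - P H||_F^2 + 0.5*alpha*||P||_F^2 with H = Theta X."""
    H = Theta @ X_lab
    d = H.shape[0]
    A = H @ H.T + alpha * np.eye(d)          # symmetric positive definite
    # P A = Y H^T  <=>  A P^T = H Y^T; solve instead of inverting A.
    return np.linalg.solve(A, H @ Y_lab.T).T

def nearest_orthonormal_rows(M):
    """Closest matrix to M (Frobenius norm) with orthonormal rows."""
    U, _, Vt = np.linalg.svd(M, full_matrices=False)  # thin SVD
    return U @ Vt

rng = np.random.default_rng(1)
D, d, c, n = 8, 3, 4, 20
X_lab = rng.normal(size=(D, n))
Y_lab = np.eye(c)[:, rng.integers(0, c, size=n)]
Theta = nearest_orthonormal_rows(rng.normal(size=(d, D)))
alpha = 0.1
P = update_P(Theta, X_lab, Y_lab, alpha)

# At the minimizer the gradient (P H - Y) H^T + alpha P vanishes.
H = Theta @ X_lab
grad = (P @ H - Y_lab) @ H.T + alpha * P
print(np.abs(grad).max())
```

Using a linear solve rather than an explicit inverse is a numerical convenience; for positive `alpha` the system matrix is always invertible, so the update is well defined even when the projected features are rank deficient.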

5. Inference and Decision Boundary Construction

After learning the classifier $P$, the projection $\Theta$, and the graph $W$, classification of a novel MS sample $x \in \mathbb{R}^{d_M}$ proceeds via:

  1. Project $x$ into the common subspace: $z = \Theta_M x$, where $\Theta_M$ denotes the last $d_M$ columns of $\Theta$, i.e. the block acting on MS features.
  2. Predict class scores: $\hat y = P z$; assign class $\arg\max_k \hat y_k$.
  3. Optional: Extend the graph and perform label propagation for refined predictions.

The decision boundary in the latent space $\mathbb{R}^d$ lies where the two largest coordinates of $Pz$ are equal, reflecting the linearity of $P$. In practice, the projected features $z$ can also be input to off-the-shelf classifiers such as linear SVM or Canonical Correlation Forest (CCF) for performance comparison.
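The first two inference steps amount to two matrix products and an argmax. The sketch below assumes the projection (here `Theta`) acts on stacked HS-then-MS features, so an MS-only sample uses only the last $d_M$ columns; the names `classify_ms`, `Theta`, `P`, and the random parameters are placeholders rather than learned values.

```python
import numpy as np

def classify_ms(x_ms, Theta, P, d_H):
    """Project an MS sample into the shared subspace and score the classes."""
    Theta_M = Theta[:, d_H:]        # columns of Theta that act on MS features
    z = Theta_M @ x_ms              # latent representation
    scores = P @ z                  # one linear score per class
    return int(np.argmax(scores)), scores

rng = np.random.default_rng(2)
d_H, d_M, d, c = 6, 3, 2, 4
Q, _ = np.linalg.qr(rng.normal(size=(d_H + d_M, d)))
Theta = Q.T                         # stand-in for a learned projection
P = rng.normal(size=(c, d))         # stand-in for a learned classifier
x_new = rng.normal(size=d_M)

label, scores = classify_ms(x_new, Theta, P, d_H)

# Equivalent view: zero-pad the HS part and apply the full projection.
padded = np.concatenate([np.zeros(d_H), x_new])
print(label, np.allclose(scores, P @ Theta @ padded))
```

Because the scores are linear in `z`, the region assigned to each class is an intersection of half-spaces in the latent space.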

6. Experimental Setup and Results

LeMA was evaluated on:

  • University of Houston and Chikusei (HS with simulated Sentinel-2 MS),
  • DFC2018 MS-LiDAR & HS data.

Evaluation metrics included Overall Accuracy (OA), Average Accuracy (AA), and Cohen’s $\kappa$. Comparisons were performed against:

  • raw MS,
  • GLP (fixed-graph label propagation),
  • SMA (supervised MA),
  • S-SMA (semi-supervised MA),
  • CoSpace (joint subspace learning),
  • S-CoSpace (semi-supervised CoSpace),
  • and LeMA.

On both OA and $\kappa$, LeMA outperformed the baselines by 5–10%. Key findings:

  • Small amounts of HS labels can reliably guide the larger MS domain.
  • Learning the graph $W$ from the data instead of using a fixed Gaussian kernel graph produces a better manifold.
  • Semi-supervised alignment plus label propagation yields the most accurate decision boundaries.
  • Both linear SVM and CCF performed well on features learned via LeMA (Hong et al., 2019).
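For reference, the three metrics used in this section (OA, AA, and Cohen's kappa) can all be derived from a confusion matrix. The sketch below uses the standard definitions; the function name and the small label vectors are made up for illustration and are not results from the paper.

```python
import numpy as np

def classification_metrics(y_true, y_pred, n_classes):
    """Return (OA, AA, kappa) from integer class labels."""
    C = np.zeros((n_classes, n_classes), dtype=np.int64)
    np.add.at(C, (y_true, y_pred), 1)        # rows: true class, cols: predicted
    total = C.sum()
    oa = np.trace(C) / total                 # fraction of correct predictions
    per_class = np.diag(C) / C.sum(axis=1)   # recall per class (each class must occur)
    aa = per_class.mean()
    # Chance agreement from the row and column marginals.
    pe = (C.sum(axis=1) * C.sum(axis=0)).sum() / total ** 2
    kappa = (oa - pe) / (1.0 - pe)
    return oa, aa, kappa

y_true = np.array([0, 0, 0, 0, 1, 1, 1, 2, 2, 2])
y_pred = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2, 0])
oa, aa, kappa = classification_metrics(y_true, y_pred, n_classes=3)
print(round(oa, 4), round(aa, 4), round(kappa, 4))
```

OA weights every sample equally, AA weights every class equally, and kappa discounts the agreement expected by chance, which is why the three are usually reported together for imbalanced land cover classes.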

7. Algorithm Summary

An overview of the LeMA algorithm flow is as follows:

  1. Initialization: Set a random orthonormal projection $\Theta$, a zero classifier $P$, an LDA-like assignment of the graph $W$ on its labeled blocks, and a random initialization of the blocks of $W$ that are optimized.
  2. Preprocessing: Compute the quantities that remain fixed across iterations.
  3. Block Coordinate Descent:
    • Update $P$.
    • Update $\Theta$ (ADMM, see Algorithm 2 in (Hong et al., 2019)).
    • Update the labeled–unlabeled block of $W$ and its transpose (ADMM, Algorithm 3); enforce symmetry.
    • Update the unlabeled–unlabeled block of $W$ (ADMM, Algorithm 4).
    • Assemble $W$ and compute the Laplacian $L = D - W$.
    • Compute objective and check for convergence.
  4. Return: The final classifier $P$, projection $\Theta$, and graph $W$.
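The graph initialization in step 1 can be sketched as follows, assuming the usual LDA-like rule in which two labeled samples of the same class $k$ are connected with weight $1/N_k$ and samples of different classes are not connected. The function names, the small random scale for the unlabeled blocks, and the toy labels are assumptions; the learned updates of those blocks are not shown.

```python
import numpy as np

def lda_like_graph(labels):
    """W_ij = 1/N_k if samples i and j share class k, else 0."""
    labels = np.asarray(labels)
    same = labels[:, None] == labels[None, :]
    counts = np.bincount(labels)             # N_k for every class k
    return same / counts[labels][None, :]

def init_full_graph(labels_lab, n_unlabeled, rng, scale=0.01):
    """Labeled block from the LDA-like rule, small random values elsewhere."""
    n_lab = len(labels_lab)
    n = n_lab + n_unlabeled
    R = rng.random((n, n)) * scale
    W = (R + R.T) / 2                        # symmetric, nonnegative start
    np.fill_diagonal(W, 0.0)
    W[:n_lab, :n_lab] = lda_like_graph(labels_lab)
    return W

def graph_laplacian(W):
    """L = D - W with D_ii = sum_j W_ij."""
    return np.diag(W.sum(axis=1)) - W

# Labels of the 2N labeled columns: the HS copy followed by the MS copy.
y = np.array([0, 0, 1, 1, 1])
labels_lab = np.concatenate([y, y])
rng = np.random.default_rng(3)
W_lab = lda_like_graph(labels_lab)
W = init_full_graph(labels_lab, n_unlabeled=3, rng=rng)
L = graph_laplacian(W)
print(W_lab.sum(axis=1))
```

With this rule every row of the labeled block sums to one, so all labeled nodes start with the same degree regardless of class size.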

All update rules, ADMM decompositions, and convergence criteria are reported explicitly in (Hong et al., 2019). This level of detail supports reproducibility and systematic extension of the LeMA methodology for cross-modality semi-supervised learning.
