Learnable Manifold Alignment (LeMA)

Updated 20 May 2026

Learnable Manifold Alignment (LeMA) is a semi-supervised cross-modality framework that integrates hyperspectral (HS) and multispectral (MS) data for enhanced land cover and land use classification.
It employs a joint optimization strategy that simultaneously learns a data-driven graph, a shared projection subspace, and a classifier to capture intrinsic data geometry.
Experiments on remote sensing datasets show LeMA achieving 5–10% improvements in overall accuracy and kappa coefficient over conventional methods.

Learnable Manifold Alignment (LeMA) is a semi-supervised cross-modality learning framework designed to exploit limited but highly discriminative hyperspectral (HS) data and abundant but poorly discriminative multispectral (MS) data for land cover and land use classification. Unlike prior manifold alignment techniques that rely on fixed Gaussian-kernel graphs, LeMA learns a data-driven graph structure jointly with the projection and classification parameters, enabling more effective cross-modality knowledge transfer and improved decision boundaries in the MS domain (Hong et al., 2019).

1. Problem Formulation

LeMA addresses cross-modality semi-supervised classification with the following structure:

Let $N$ paired, labeled HS and MS samples be given:
- $X_H \in \mathbb{R}^{d_H \times N}$: hyperspectral feature matrix.
- $X_M \in \mathbb{R}^{d_M \times N}$: multispectral feature matrix.
- $Y \in \{0,1\}^{c \times N}$: one-hot class label matrix for $c$ classes.
Let $N_U$ unlabeled MS samples be provided: $X_U \in \mathbb{R}^{d_M \times N_U}$.

The objective is to use the small set of labeled HS samples, in combination with the large set of unlabeled MS samples, to induce a classifier with good generalization on MS data.

Joint data matrices are constructed for paired and semi-supervised settings:

Labeled joint data: $\widetilde X = \begin{bmatrix} X_H & 0 \\ 0 & X_M \end{bmatrix} \in \mathbb{R}^{(d_H + d_M) \times 2N}$, $\widetilde Y = [Y, Y] \in \mathbb{R}^{c \times 2N}$.
Extended data with unlabeled MS: $\widetilde X' = \begin{bmatrix} X_H & 0 & 0 \\ 0 & X_M & X_U \end{bmatrix} \in \mathbb{R}^{(d_H + d_M) \times (2N + N_U)}$.
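
As a minimal NumPy sketch of the block layout above, the joint matrices can be assembled as follows. The function name `build_joint_matrices` and the toy dimensions are illustrative assumptions, not part of the original method description.

```python
import numpy as np

def build_joint_matrices(X_H, X_M, Y, X_U):
    """Assemble the labeled and extended joint matrices (hypothetical helper)."""
    d_H, N = X_H.shape
    d_M, N_U = X_U.shape
    assert X_M.shape == (d_M, N) and Y.shape[1] == N
    # Labeled joint data: HS columns fill the top block, MS columns the bottom block.
    X_lab = np.block([
        [X_H, np.zeros((d_H, N))],
        [np.zeros((d_M, N)), X_M],
    ])
    Y_lab = np.hstack([Y, Y])  # labels repeated for the HS and MS copies
    # Extended data: unlabeled MS columns are appended in the MS (bottom) block.
    X_ext = np.hstack([X_lab, np.vstack([np.zeros((d_H, N_U)), X_U])])
    return X_lab, Y_lab, X_ext

# Toy data (random values, for shape checking only).
rng = np.random.default_rng(0)
d_H, d_M, N, N_U, c = 6, 3, 4, 5, 2
X_H = rng.normal(size=(d_H, N))
X_M = rng.normal(size=(d_M, N))
X_U = rng.normal(size=(d_M, N_U))
Y = np.eye(c)[:, rng.integers(0, c, size=N)]  # one-hot columns
X_lab, Y_lab, X_ext = build_joint_matrices(X_H, X_M, Y, X_U)
print(X_lab.shape, Y_lab.shape, X_ext.shape)  # (9, 8) (2, 8) (9, 13)
```

The zero blocks are what keep the two modalities in separate feature coordinates, so a single projection acting on the stacked features can treat HS and MS columns differently.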

The framework seeks a shared $d$-dimensional subspace, $\mathbb{R}^{d}$, via a linear projection $\Theta \in \mathbb{R}^{d \times (d_H + d_M)}$ with orthogonality $\Theta\Theta^{\top} = I$, such that projected features $\Theta\widetilde X$ are maximally informative for classification. A linear regression/classification matrix $P \in \mathbb{R}^{c \times d}$ is learned to map $\Theta\widetilde X$ to labels.

2. Objective Function and Mathematical Formulation

The LeMA optimization problem is defined as:

$\min_{P,\,\Theta,\,W} \ \frac{1}{2}\|\widetilde Y - P\Theta\widetilde X\|_F^2 + \frac{\alpha}{2}\|P\|_F^2 + \frac{\beta}{2}\operatorname{tr}\big(\Theta\widetilde X' L \widetilde X'^{\top}\Theta^{\top}\big)$

Subject to:

$\Theta\Theta^{\top} = I$
$W = W^{\top}$, $W_{ij} \ge 0$
$\|W\|_{1,1} = s$ (scale constraint)
$W_{ij} \le 1/N_k$ for $i, j$ both labeled in class $k$

Here, $x_i$ denotes the $i$-th column of $\widetilde X'$ and $N_k$ the number of labeled samples in class $k$; $\alpha$ and $\beta$ are regularization weights and $s$ fixes the total edge weight. The graph Laplacian is $L = D - W$ with $D_{ii} = \sum_j W_{ij}$. The sum $\frac{1}{2}\sum_{i,j} W_{ij}\|\Theta x_i - \Theta x_j\|_2^2 = \operatorname{tr}\big(\Theta\widetilde X' L \widetilde X'^{\top}\Theta^{\top}\big)$ captures the smoothness constraint. Critically, $W$ (the adjacency matrix defining manifold structure) is learned jointly with $\Theta$ and $P$.

Orthogonality on $\Theta$ prevents degenerate scaling, and constraints on $W$ ensure it is a valid similarity graph. An upper-bound $1/N_k$ for class $k$ enforces degree comparability with an LDA-like graph.
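
The following sketch evaluates an objective of this kind and numerically checks the identity between the pairwise smoothness sum and its trace form. It assumes the objective consists of a least-squares fit, a ridge penalty on the regression matrix, and a graph-Laplacian smoothness term with weights `alpha` and `beta`; the function names and the 1/2 scaling factors are illustrative assumptions, and the exact form should be checked against (Hong et al., 2019).

```python
import numpy as np

def laplacian(W):
    """Unnormalized graph Laplacian L = D - W, with D the diagonal degree matrix."""
    return np.diag(W.sum(axis=1)) - W

def lema_objective(P, Theta, W, X_lab, Y_lab, X_ext, alpha, beta):
    """Fit + ridge + smoothness terms (assumed form, see the text above)."""
    Z_lab = Theta @ X_lab            # projected labeled samples
    Z_ext = Theta @ X_ext            # projected labeled + unlabeled samples
    fit = 0.5 * np.linalg.norm(Y_lab - P @ Z_lab, "fro") ** 2
    ridge = 0.5 * alpha * np.linalg.norm(P, "fro") ** 2
    smooth = 0.5 * beta * np.trace(Z_ext @ laplacian(W) @ Z_ext.T)
    return fit + ridge + smooth

# Toy problem with random data.
rng = np.random.default_rng(0)
D, n_lab, n_all, d, c = 7, 6, 10, 3, 2
X_ext = rng.normal(size=(D, n_all))
X_lab = X_ext[:, :n_lab]
Y_lab = np.eye(c)[:, rng.integers(0, c, size=n_lab)]
Theta = np.linalg.qr(rng.normal(size=(D, d)))[0].T   # orthonormal rows
P = rng.normal(size=(c, d))
A = rng.random((n_all, n_all))
W = 0.5 * (A + A.T)                                   # symmetric, nonnegative
np.fill_diagonal(W, 0.0)

# Identity: tr(Z L Z^T) = 1/2 * sum_ij W_ij ||z_i - z_j||^2 for symmetric W.
Z = Theta @ X_ext
pairwise = sum(W[i, j] * np.sum((Z[:, i] - Z[:, j]) ** 2)
               for i in range(n_all) for j in range(n_all))
trace_form = np.trace(Z @ laplacian(W) @ Z.T)
value = lema_objective(P, Theta, W, X_lab, Y_lab, X_ext, alpha=0.1, beta=0.01)
print(np.isclose(trace_form, 0.5 * pairwise), value)
```

The trace form is what makes the smoothness term cheap to evaluate and differentiate: it replaces a double loop over sample pairs with three matrix products.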

3. Graph-based Label Propagation

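After learning the graph and the shared subspace, labels can be propagated over the learned graph by minimizing a regularized objective that balances smoothness of the predicted label scores along graph edges against fidelity to the known labels. Because this objective is quadratic, it has a closed-form solution obtained from a single linear system; alternatively, it can be solved iteratively by repeatedly spreading label scores to graph neighbours and re-injecting the known labels.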

This label propagation further sharpens the decision boundaries by leveraging the learned data manifold.
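
For concreteness, here is a hedged sketch of graph label propagation using the well-known normalized scheme of Zhou et al. ("local and global consistency"). This specific scheme is an assumption chosen for illustration: the variant used in LeMA may differ in normalization and weighting. The mixing parameter `a`, the function names, and the toy graph are all hypothetical.

```python
import numpy as np

def normalized_affinity(W):
    """S = D^{-1/2} W D^{-1/2}; assumes every node has positive degree."""
    deg = W.sum(axis=1)
    return W / np.sqrt(np.outer(deg, deg))

def propagate_closed_form(W, Y0, a=0.9):
    """F* = (1 - a) * Y0 (I - a S)^{-1}; Y0 is c x n with zero columns for unlabeled nodes."""
    S = normalized_affinity(W)
    n = W.shape[0]
    # I - a S is symmetric, so solve (I - a S) F^T = (1 - a) Y0^T.
    return np.linalg.solve(np.eye(n) - a * S, (1 - a) * Y0.T).T

def propagate_iterative(W, Y0, a=0.9, n_iter=500):
    """Fixed-point iteration F <- a F S + (1 - a) Y0, converging to the closed form."""
    S = normalized_affinity(W)
    F = Y0.copy()
    for _ in range(n_iter):
        F = a * F @ S + (1 - a) * Y0
    return F

# Toy graph: two tight triangles joined by one weak edge.
W = np.zeros((6, 6))
for block in ([0, 1, 2], [3, 4, 5]):
    for i in block:
        for j in block:
            if i != j:
                W[i, j] = 1.0
W[2, 3] = W[3, 2] = 0.1
Y0 = np.zeros((2, 6))
Y0[0, 0] = 1.0   # node 0 is labeled class 0
Y0[1, 5] = 1.0   # node 5 is labeled class 1
F_closed = propagate_closed_form(W, Y0)
F_iter = propagate_iterative(W, Y0)
pred = F_closed.argmax(axis=0)
print(pred)  # nodes 0-2 follow the class-0 seed, nodes 3-5 the class-1 seed
```

With a learned graph in place of the toy `W`, the same routine would spread the few labeled scores along whatever neighbourhood structure the graph encodes.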

4. ADMM-Based Optimization Strategy

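LeMA applies block coordinate descent combined with the alternating direction method of multipliers (ADMM) to alternately update $P$, $\Theta$, and $W$:

$P$-update: Closed-form ridge regression:

$P = \widetilde Y(\Theta\widetilde X)^{\top}\big[(\Theta\widetilde X)(\Theta\widetilde X)^{\top} + \alpha I\big]^{-1}$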
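
A minimal sketch of such a ridge-regression update is shown below, assuming the subproblem is a least-squares fit of the labels to the projected features plus a squared Frobenius penalty with weight `alpha`. The function name `update_P` and the toy data are illustrative; the check at the end confirms that the gradient of the assumed subproblem vanishes at the returned solution.

```python
import numpy as np

def update_P(Theta, X_lab, Y_lab, alpha):
    """Minimize 1/2 ||Y - P Z||_F^2 + alpha/2 ||P||_F^2 over P, with Z = Theta X."""
    Z = Theta @ X_lab
    G = Z @ Z.T + alpha * np.eye(Z.shape[0])   # symmetric positive definite for alpha > 0
    # P G = Y Z^T  <=>  G P^T = Z Y^T; solve instead of forming an explicit inverse.
    return np.linalg.solve(G, Z @ Y_lab.T).T

rng = np.random.default_rng(1)
D, n, d, c, alpha = 8, 12, 3, 4, 0.5
X_lab = rng.normal(size=(D, n))
Y_lab = np.eye(c)[:, rng.integers(0, c, size=n)]
Theta = np.linalg.qr(rng.normal(size=(D, d)))[0].T   # orthonormal rows
P = update_P(Theta, X_lab, Y_lab, alpha)

# First-order optimality: (P Z - Y) Z^T + alpha P should be (numerically) zero.
Z = Theta @ X_lab
grad = (P @ Z - Y_lab) @ Z.T + alpha * P
print(np.abs(grad).max())
```

Solving the linear system is preferred over `np.linalg.inv` here because it is both cheaper and numerically more stable for a small `d x d` Gram matrix.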

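$\Theta$-update (ADMM): Solves

$\min_{\Theta} \ \frac{1}{2}\|\widetilde Y - P\Theta\widetilde X\|_F^2 + \frac{\beta}{2}\operatorname{tr}\big(\Theta\widetilde X' L \widetilde X'^{\top}\Theta^{\top}\big) \quad \text{s.t.} \quad \Theta\Theta^{\top} = I$

with two auxiliary variable splits. Updates involve Lagrange multipliers and auxiliary variables; the update that enforces orthogonality uses thin SVD.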

$W$-update (ADMM): $W$ is partitioned into block matrices; labeled–labeled blocks are set by LDA-like rules, while the cross-part (labeled–unlabeled and unlabeled–unlabeled) is optimized subject to symmetry, nonnegativity, bounds, and scale constraints. Multiple auxiliary splits and soft-thresholding/proximal operators are used.
Convergence: Monitored by relative change in global objective or ADMM residual norms. Recommended hyperparameter values are reported in (Hong et al., 2019).
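
Two of the building blocks mentioned above can be sketched in isolation. The first function maps a matrix to the nearest one with orthonormal rows via a thin SVD, which is the standard way to enforce an orthogonality constraint inside a splitting scheme. The second is only a simple heuristic that restores symmetry, nonnegativity, and a fixed total weight for a graph matrix; it is not the ADMM graph update of the paper and ignores the per-class upper bounds. All names here are illustrative assumptions.

```python
import numpy as np

def nearest_row_orthonormal(A):
    """Closest matrix to a full-rank A (Frobenius norm) with orthonormal rows, via thin SVD."""
    U, _, Vt = np.linalg.svd(A, full_matrices=False)
    return U @ Vt

def repair_graph(W, scale):
    """Heuristic feasibility step: symmetric, nonnegative, entries summing to `scale`."""
    W = 0.5 * (W + W.T)           # symmetrize
    W = np.clip(W, 0.0, None)     # remove negative weights
    return W * (scale / W.sum())  # rescale total weight (assumes W.sum() > 0)

def relative_change(prev, curr, eps=1e-12):
    """Relative change of a scalar objective between two iterations."""
    return abs(curr - prev) / max(abs(prev), eps)

rng = np.random.default_rng(2)
A = rng.normal(size=(3, 7))
Theta = nearest_row_orthonormal(A)
W = repair_graph(rng.normal(size=(5, 5)), scale=5.0)
print(np.allclose(Theta @ Theta.T, np.eye(3)), np.isclose(W.sum(), 5.0))
```

A stopping rule of the kind described would compare `relative_change(prev_obj, curr_obj)` against a small tolerance after each outer iteration.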

5. Inference and Decision Boundary Construction

After learning $\Theta$, $P$, and $W$, classification of a novel MS sample $x \in \mathbb{R}^{d_M}$ proceeds via:

Project $x$ into the common subspace: $z = \Theta \begin{bmatrix} 0 \\ x \end{bmatrix}$.
Predict class scores: $\hat y = P z$; assign class $\arg\max_k \hat y_k$.
Optional: Extend graph and perform label propagation for refined predictions.

The decision boundary in latent space $\mathbb{R}^{d}$ is defined where the two largest coordinates of $P z$ are equal, reflecting the linearity of $P$. In practice, $z$ can also be input to off-the-shelf classifiers such as linear SVM or Canonical Correlation Forest (CCF) for performance comparison.
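
The inference steps can be sketched as below, assuming the projection matrix acts on stacked features ordered as HS first and MS second (matching the block layout of the joint data matrix), so that zero-padding the HS part is equivalent to using only the MS columns of the projection. The function name `classify_ms` and the random toy parameters are hypothetical.

```python
import numpy as np

def classify_ms(x_ms, Theta, P, d_H):
    """Project an MS sample into the shared subspace and pick the top-scoring class."""
    Theta_M = Theta[:, d_H:]      # columns of Theta that act on MS features
    z = Theta_M @ x_ms            # same as Theta @ [0; x_ms]
    scores = P @ z                # linear class scores
    return int(np.argmax(scores)), scores, z

rng = np.random.default_rng(3)
d_H, d_M, d, c = 6, 4, 3, 5
Theta = np.linalg.qr(rng.normal(size=(d_H + d_M, d)))[0].T   # stand-in for a learned projection
P = rng.normal(size=(c, d))                                   # stand-in for a learned classifier
x_ms = rng.normal(size=d_M)
label, scores, z = classify_ms(x_ms, Theta, P, d_H)

# Zero-padding the HS block gives the same latent vector.
z_padded = Theta @ np.concatenate([np.zeros(d_H), x_ms])
print(label, np.allclose(z, z_padded))
```

Because the scores are linear in `z`, the boundary between any two classes `j` and `k` is the hyperplane where `(P[j] - P[k]) @ z` equals zero.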

6. Experimental Setup and Results

LeMA was evaluated on:

University of Houston and Chikusei (HS and simulated Sentinel-2 MS),
DFC2018 MS-LiDAR & HS data.

Evaluation metrics included Overall Accuracy (OA), Average Accuracy (AA), and Cohen’s $\kappa$. Comparisons were performed against:

raw MS,
GLP (fixed-graph label propagation),
SMA (supervised MA),
S-SMA (semi-supervised MA),
CoSpace (joint subspace learning),
S-CoSpace (semi-supervised CoSpace),
and LeMA.

On both OA and $\kappa$, LeMA outperformed baselines by 5–10%. Key findings:

Small amounts of HS labels can reliably guide the larger MS domain.
Learning $W$ from the data instead of using a fixed Gaussian kernel graph produces a better manifold.
Semi-supervised alignment plus label propagation yields the most accurate decision boundaries.
Both linear SVM and CCF performed well on features learned via LeMA (Hong et al., 2019).
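
For reference, the three evaluation metrics named in this section can be computed from a confusion matrix as in the sketch below. The function name and the toy label vectors are illustrative and are not results from the paper; the sketch assumes every class appears at least once in the ground truth.

```python
import numpy as np

def classification_metrics(y_true, y_pred, n_classes):
    """Overall accuracy, average (per-class) accuracy, and Cohen's kappa."""
    C = np.zeros((n_classes, n_classes), dtype=float)
    for t, p in zip(y_true, y_pred):
        C[t, p] += 1                                   # rows: true class, columns: predicted
    n = C.sum()
    oa = np.trace(C) / n                               # fraction of correct predictions
    aa = np.mean(np.diag(C) / C.sum(axis=1))           # mean of per-class recalls
    p_e = np.sum(C.sum(axis=0) * C.sum(axis=1)) / n**2 # chance agreement
    kappa = (oa - p_e) / (1 - p_e)
    return oa, aa, kappa

y_true = [0, 0, 0, 0, 1, 1, 2, 2, 2, 2]
y_pred = [0, 0, 0, 1, 1, 1, 2, 2, 0, 2]
oa, aa, kappa = classification_metrics(y_true, y_pred, n_classes=3)
print(round(oa, 4), round(aa, 4), round(kappa, 4))  # 0.8 0.8333 0.697
```

AA weights every class equally regardless of its size, which is why it can differ noticeably from OA on imbalanced land cover maps.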

7. Algorithm Summary

An overview of the LeMA algorithm flow is as follows:

Initialization: Set random orthonormal $\Theta$, zero $P$, LDA-like assignment of $W$ on labeled blocks, random initialization of optimization blocks for $W$.
Preprocessing: Compute the fixed, data-dependent quantities that are reused in every iteration.
Block Coordinate Descent:
- Update $P$.
- Update $\Theta$ (ADMM, see Algorithm 2 in (Hong et al., 2019)).
- Update the labeled–unlabeled block of $W$ and its transpose (ADMM, Algorithm 3); enforce symmetry.
- Update the unlabeled–unlabeled block of $W$ (ADMM, Algorithm 4).
- Assemble $W$ and compute Laplacian $L = D - W$.
- Compute objective and check for convergence.
Return: The final classifier $P$, projection $\Theta$, and graph $W$.
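
The initialization step can be sketched as follows. This is an illustration under stated assumptions: the LDA-like rule is taken to be the common definition (weight 1/N_k between two samples of class k, zero otherwise), the random orthonormal projection is drawn via a QR factorization, and the small random values used for the remaining graph blocks are an arbitrary choice. The names `lda_like_graph` and `init_lema` are hypothetical.

```python
import numpy as np

def lda_like_graph(labels):
    """W_ij = 1/N_k if samples i and j both belong to class k, else 0."""
    labels = np.asarray(labels)
    same = labels[:, None] == labels[None, :]
    counts = np.bincount(labels)              # N_k for each class k
    return same / counts[labels][None, :]

def init_lema(labels_joint, n_unlabeled, D, d, c, seed=0):
    """Random orthonormal projection, zero classifier, block-initialized graph."""
    rng = np.random.default_rng(seed)
    Q, _ = np.linalg.qr(rng.normal(size=(D, d)))
    Theta = Q.T                               # d x D with orthonormal rows
    P = np.zeros((c, d))
    n_lab = len(labels_joint)
    n = n_lab + n_unlabeled
    W = np.zeros((n, n))
    W[:n_lab, :n_lab] = lda_like_graph(labels_joint)
    # Remaining blocks: small random nonnegative values, symmetrized.
    R = 1e-2 * rng.random((n, n))
    R = 0.5 * (R + R.T)
    mask = np.ones((n, n), dtype=bool)
    mask[:n_lab, :n_lab] = False
    W[mask] = R[mask]
    return Theta, P, W

Theta, P, W = init_lema([0, 0, 1, 1, 1, 0], n_unlabeled=4, D=9, d=3, c=2)
print(Theta.shape, P.shape, W.shape)  # (3, 9) (2, 3) (10, 10)
```

With this rule every row of the labeled block sums to one, which gives a natural reference scale for the learned blocks of the graph.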

All update rules, ADMM decompositions, and convergence criteria are reported explicitly in (Hong et al., 2019). This encapsulation enables full reproducibility and systematic extension of the LeMA methodology for cross-modality semi-supervised learning.

References

Hong et al. (2019). Learnable Manifold Alignment (LeMA): A Semi-supervised Cross-modality Learning Framework for Land Cover and Land Use Classification.
