Class-Specific Multilinear Discriminant Analysis

Updated 20 March 2026

The paper introduces MCSDA, extending class-specific discriminant analysis to tensor data with mode-wise linear projections.
It employs an alternating optimization strategy to update projection matrices for optimal in-class clustering and out-of-class separation.
Empirical results show that MCSDA is much faster than vector-based CSDA at a small cost in accuracy for facial verification, and achieves the best average F1 score in stock price prediction.

Multilinear Class-Specific Discriminant Analysis (MCSDA) is a tensor-based subspace learning technique that generalizes traditional class-specific discriminant analysis to high-order data. MCSDA maximizes the discrimination of each individual class in a feature space defined by low-dimensional multilinear projections, while maintaining the spatial and structural integrity of the original tensor representations. The method is formulated as an iterative optimization that alternately updates mode-wise projection matrices to achieve optimal class separation, with applications demonstrated in facial image verification and stock price prediction (Tran et al., 2017).

1. Problem Formulation and Objective

MCSDA addresses the challenge of transferring discriminant analysis techniques from vectorized data to high-order tensors. Given $N$ labeled samples $\{\mathcal{X}_j \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_K},\ l_j\}_{j=1}^N$ with $K$ modes, the task is to learn, for each class $c \in \{1, \dots, C\}$, a set of mode-specific linear projection matrices $\{U_c^{(n)} \in \mathbb{R}^{I_n \times I_n'}\}_{n=1}^K$ with $I_n' < I_n$.

Samples of class $c$ are treated as the positive class and all others as negative. The mean tensor of class $c$ is $\mathcal{M}_c = \frac{1}{n_c} \sum_{j: l_j = c} \mathcal{X}_j$, where $n_c$ is the number of samples with label $c$. Each tensor is projected via

$\mathcal{Y}_j = \mathcal{X}_j \times_1 (U_c^{(1)})^\top \times_2 (U_c^{(2)})^\top \cdots \times_K (U_c^{(K)})^\top \in \mathbb{R}^{I_1' \times \cdots \times I_K'}$

with the goal that in-class projected tensors cluster tightly around the projected mean, while out-of-class samples are maximally separated. This one-versus-rest construction yields a set of class-specific models, each optimized for its respective class in the tensor feature space.
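The projection above can be sketched in NumPy as a sequence of mode-$n$ products. This is a minimal illustration, assuming dense tensors stored as NumPy arrays with 0-indexed modes; `mode_product` and `project` are hypothetical helper names, and the $40 \times 30$ input and $10 \times 8$ target sizes are arbitrary.

```python
import numpy as np


def mode_product(tensor: np.ndarray, matrix: np.ndarray, mode: int) -> np.ndarray:
    """Mode-n product of `tensor` with `matrix` of shape (J, I_mode)."""
    # Contract the matrix's columns with the chosen tensor mode,
    # then move the new axis (size J) back into position `mode`.
    result = np.tensordot(matrix, tensor, axes=([1], [mode]))
    return np.moveaxis(result, 0, mode)


def project(tensor: np.ndarray, projections) -> np.ndarray:
    """Y = X x_1 U1^T x_2 U2^T ... x_K UK^T (modes are 0-indexed here)."""
    out = tensor
    for mode, U in enumerate(projections):
        out = mode_product(out, U.T, mode)
    return out


rng = np.random.default_rng(0)
X = rng.standard_normal((40, 30))                # e.g. one 40 x 30 image
# Random matrices with orthonormal columns via reduced QR.
U1, _ = np.linalg.qr(rng.standard_normal((40, 10)))
U2, _ = np.linalg.qr(rng.standard_normal((30, 8)))
Y = project(X, [U1, U2])
print(Y.shape)                                   # (10, 8)
print(np.allclose(Y, U1.T @ X @ U2))             # True
```

For a second-order tensor (a matrix) the multilinear projection reduces to $U_c^{(1)\top} \mathcal{X}_j U_c^{(2)}$, which the final check confirms.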

2. Multilinear Class-Specific Scatter and Optimization Criterion

For a fixed class $c$, MCSDA defines the following distances:

The in-class (within-class) distance

$D_I^{(c)} = \sum_{j: l_j = c} \left\| \mathcal{X}_j \times_{n=1}^K (U_c^{(n)})^\top - \mathcal{M}_c \times_{n=1}^K (U_c^{(n)})^\top \right\|_F^2$

The out-of-class distance

$D_O^{(c)} = \sum_{j: l_j \neq c} \left\| \mathcal{X}_j \times_{n=1}^K (U_c^{(n)})^\top - \mathcal{M}_c \times_{n=1}^K (U_c^{(n)})^\top \right\|_F^2$

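Their ratio is the class-specific criterion. Let $\mathbf{x}_j = \operatorname{vec}(\mathcal{X}_j)$ and $\mathbf{m}_c = \operatorname{vec}(\mathcal{M}_c)$ denote column-major vectorizations and $\mathbf{U}_c = U_c^{(K)} \otimes \cdots \otimes U_c^{(1)}$, so that $\operatorname{vec}(\mathcal{Y}_j) = \mathbf{U}_c^\top \mathbf{x}_j$. The criterion then takes the trace-ratio form:

$J_c \left(U_c^{(1)}, \dots, U_c^{(K)}\right) = \frac{D_O^{(c)}}{D_I^{(c)}} = \frac{\operatorname{tr}\left(\mathbf{U}_c^\top S_B^{(c)} \mathbf{U}_c\right)}{\operatorname{tr}\left(\mathbf{U}_c^\top S_W^{(c)} \mathbf{U}_c\right)}$

where $S_W^{(c)} = \sum_{j: l_j = c} (\mathbf{x}_j - \mathbf{m}_c)(\mathbf{x}_j - \mathbf{m}_c)^\top$ is the in-class scatter, $S_B^{(c)} = \sum_{j: l_j \neq c} (\mathbf{x}_j - \mathbf{m}_c)(\mathbf{x}_j - \mathbf{m}_c)^\top$ is the out-of-class scatter around the class mean, and each $U_c^{(n)}$ is constrained to have orthonormal columns. The objective is to maximize $J_c$ for each class, so that the projected out-of-class scatter is as large as possible relative to the projected in-class scatter.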
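The equivalence between the distance definitions and the trace-ratio form can be checked numerically. This minimal NumPy sketch uses small synthetic third-order tensors (an assumption for illustration) and column-major vectorization, under which the vectorized projection is given by the Kronecker product $U_c^{(K)} \otimes \cdots \otimes U_c^{(1)}$; `class_distances` and the other helper names are hypothetical.

```python
import numpy as np
from functools import reduce


def mode_product(tensor, matrix, mode):
    """Mode-n product with a matrix of shape (J, I_mode)."""
    return np.moveaxis(np.tensordot(matrix, tensor, axes=([1], [mode])), 0, mode)


def project(tensor, projections):
    """Y = X x_1 U1^T ... x_K UK^T (0-indexed modes)."""
    for mode, U in enumerate(projections):
        tensor = mode_product(tensor, U.T, mode)
    return tensor


def class_distances(Xs, labels, c, projections):
    """(D_I, D_O): squared Frobenius distances of projected samples to the projected class-c mean."""
    M_proj = project(Xs[labels == c].mean(axis=0), projections)
    sq = np.array([np.sum((project(X, projections) - M_proj) ** 2) for X in Xs])
    return sq[labels == c].sum(), sq[labels != c].sum()


rng = np.random.default_rng(1)
shape, reduced = (4, 3, 2), (2, 2, 1)
Xs = rng.standard_normal((12, *shape))            # 12 synthetic third-order samples
labels = np.repeat([0, 1, 2], 4)
Us = [np.linalg.qr(rng.standard_normal((I, J)))[0] for I, J in zip(shape, reduced)]
D_I, D_O = class_distances(Xs, labels, 0, Us)

# Trace form: rows of x are column-major vec(X_j); U = U^(3) kron U^(2) kron U^(1).
x = np.stack([X.reshape(-1, order="F") for X in Xs])
diff = x - x[labels == 0].mean(axis=0)
S_W = diff[labels == 0].T @ diff[labels == 0]     # in-class scatter
S_B = diff[labels != 0].T @ diff[labels != 0]     # out-of-class scatter around the class mean
U = reduce(np.kron, Us[::-1])
J_trace = np.trace(U.T @ S_B @ U) / np.trace(U.T @ S_W @ U)
print(np.isclose(J_trace, D_O / D_I))             # True
```

Forming the Kronecker matrix is only practical for tiny tensors, which is why MCSDA instead optimizes one mode at a time.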

3. Alternating Mode-Wise Optimization Algorithm

Because all mode-wise matrices are coupled in $J_c$, MCSDA optimizes them alternately: each $U_c^{(n)}$ is updated while the others are held fixed. For mode $n$, each centered sample is projected along all modes except $n$ and then unfolded along mode $n$:

$Z_j^{(n)} = \left[ (\mathcal{X}_j - \mathcal{M}_c) \times_{q \neq n} (U_c^{(q)})^\top \right]_{(n)} \in \mathbb{R}^{I_n \times \prod_{q \neq n} I_q'}$

From these, two $I_n \times I_n$ scatter matrices are computed.

In-class scatter:

$S_{I}^{\,n(c)} = \sum_{j: l_j = c} Z_j^{(n)} \left(Z_j^{(n)}\right)^\top$

Out-of-class scatter:

$S_{O}^{\,n(c)} = \sum_{j: l_j \neq c} Z_j^{(n)} \left(Z_j^{(n)}\right)^\top$
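A direct NumPy sketch of the two mode-$n$ scatter matrices, assuming samples are stacked along the first array axis and modes are 0-indexed; `mode_scatters` and `unfold` are hypothetical names. The unfolding orders columns differently from the Kolda and Bader convention, which does not change $Z Z^\top$.

```python
import numpy as np


def mode_product(tensor, matrix, mode):
    """Mode-n product with a matrix of shape (J, I_mode)."""
    return np.moveaxis(np.tensordot(matrix, tensor, axes=([1], [mode])), 0, mode)


def unfold(tensor, mode):
    """Mode-n unfolding: rows follow mode n, columns enumerate all other modes."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def mode_scatters(Xs, labels, c, projections, n):
    """In-class and out-of-class scatter matrices S_I^{n(c)}, S_O^{n(c)} for mode n."""
    M_c = Xs[labels == c].mean(axis=0)
    I_n = Xs.shape[1 + n]
    S_I, S_O = np.zeros((I_n, I_n)), np.zeros((I_n, I_n))
    for X, label in zip(Xs, labels):
        D = X - M_c
        for q, U in enumerate(projections):      # project along every mode except n
            if q != n:
                D = mode_product(D, U.T, q)
        Z = unfold(D, n)                          # shape (I_n, product of other reduced sizes)
        if label == c:
            S_I += Z @ Z.T
        else:
            S_O += Z @ Z.T
    return S_I, S_O


rng = np.random.default_rng(2)
Xs = rng.standard_normal((20, 6, 5))             # 20 synthetic 6 x 5 samples
labels = np.repeat([0, 1], 10)
Us = [np.linalg.qr(rng.standard_normal((I, 2)))[0] for I in (6, 5)]
S_I, S_O = mode_scatters(Xs, labels, c=0, projections=Us, n=0)
print(S_I.shape, S_O.shape)                      # (6, 6) (6, 6)
```

With the other modes fixed, $\operatorname{tr}\big((U_c^{(n)})^\top S_I^{\,n(c)} U_c^{(n)}\big)$ equals the in-class distance $D_I^{(c)}$, and the same holds for $S_O^{\,n(c)}$ and $D_O^{(c)}$, so the mode-$n$ subproblem keeps the trace-ratio form.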

The optimization subproblem for each mode reduces to:

$\max_{U_c^{(n)}} \frac{\operatorname{tr}(U_c^{(n)\top} S_{O}^{\,n(c)} U_c^{(n)})}{\operatorname{tr}(U_c^{(n)\top} S_{I}^{\,n(c)} U_c^{(n)})},\quad \text{subject to } (U_c^{(n)})^\top U_c^{(n)} = I$

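This subproblem is solved by taking the eigenvectors associated with the $I_n'$ largest eigenvalues of the generalized eigenproblem $S_{O}^{\,n(c)} u = \lambda S_{I}^{\,n(c)} u$, the standard ratio-trace approximation to the trace-ratio problem. These eigenvectors are $S_{I}^{\,n(c)}$-orthogonal rather than orthonormal, so they can be orthonormalized when the constraint must hold exactly. The algorithm cycles through all modes, updating one projection at a time, until convergence.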
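One way to solve the mode-$n$ subproblem with NumPy alone is to whiten by a Cholesky factor of $S_{I}^{\,n(c)}$ and then solve a standard symmetric eigenproblem. This is a sketch under stated assumptions: the function name `top_generalized_eigvecs` and the ridge parameter `reg` are hypothetical, and the ridge is one common way to handle a singular in-class scatter, not something specified in the source. (`scipy.linalg.eigh(S_O, S_I)` solves the same generalized problem directly.)

```python
import numpy as np


def top_generalized_eigvecs(S_O, S_I, k, reg=1e-6):
    """Leading k solutions of S_O u = lambda S_I u for symmetric S_O and S_I.

    `reg` adds a small ridge to S_I (an assumption) so that it is positive definite.
    """
    n = S_I.shape[0]
    S_I = S_I + reg * max(np.trace(S_I) / n, 1.0) * np.eye(n)
    L_inv = np.linalg.inv(np.linalg.cholesky(S_I))   # S_I = L L^T
    # Whitened standard problem C v = lambda v with C = L^{-1} S_O L^{-T}.
    C = L_inv @ S_O @ L_inv.T
    eigvals, V = np.linalg.eigh((C + C.T) / 2)        # symmetrize against round-off
    top = np.argsort(eigvals)[::-1][:k]
    return eigvals[top], L_inv.T @ V[:, top]          # back-transform u = L^{-T} v


rng = np.random.default_rng(3)
A = rng.standard_normal((6, 6))
B = rng.standard_normal((6, 6))
S_O, S_I = A @ A.T, B @ B.T + np.eye(6)
lam, U = top_generalized_eigvecs(S_O, S_I, k=2, reg=0.0)
print(np.allclose(S_O @ U, S_I @ U * lam))        # True
U_orth, _ = np.linalg.qr(U)                        # orthonormal basis of the same subspace
```

The QR step keeps the span of the leading generalized eigenvectors while enforcing orthonormal columns.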

4. Structural Preservation and Computational Properties

MCSDA preserves the intrinsic spatial and structural information of tensor data by avoiding vectorization. Projecting along each tensor mode separately retains the multilinear dependencies that characterize high-order data such as images (spatial $\times$ spatial) or time series (features $\times$ time). The number of projection parameters also drops from multiplicative to additive: a single projection of the vectorized data has $\prod_n I_n \times \prod_n I_n'$ entries, whereas MCSDA learns $\sum_n I_n I_n'$. Likewise, each mode-$n$ scatter matrix is only $I_n \times I_n$ rather than $\prod_n I_n \times \prod_n I_n$, which mitigates the curse of dimensionality and alleviates the small-sample-size problem of conventional vector-based approaches.
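The parameter savings are easy to quantify. In this small sketch the target sizes are hypothetical choices for illustration (the source does not report them); the input sizes are those of the face ($40 \times 30$) and limit-order-book ($144 \times 10$) tensors, and `param_counts` is a hypothetical name.

```python
from math import prod


def param_counts(in_dims, out_dims):
    """(vectorized, multilinear) number of projection parameters."""
    vectorized = prod(in_dims) * prod(out_dims)                 # one matrix acting on vec(X)
    multilinear = sum(i * o for i, o in zip(in_dims, out_dims))  # one matrix per mode
    return vectorized, multilinear


print(param_counts((40, 30), (10, 10)))    # (120000, 700)
print(param_counts((144, 10), (20, 5)))    # (144000, 2930)
```

Similarly, the vector-based scatter matrices for a $40 \times 30$ input would be $1200 \times 1200$, whereas the mode-wise ones are only $40 \times 40$ and $30 \times 30$.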

5. Empirical Evaluation

MCSDA's utility is demonstrated in two domains:

Face verification (ORL, CMU-PIE): Each image is a $40 \times 30$ tensor. Competing methods include vector-based CSDA, multilinear LDA (MDA), and enriched variants with HOG features. The metric is mean Average Precision (mAP) across class splits and train/test ratios. Findings show that MCSDA is $10$–$100\times$ faster than CSDA with only a slight drop in mAP, consistently outperforms MDA, and benefits from feature enrichment when data are abundant.
Stock price prediction (FI-2010 limit order book dataset): Each sample is a $144 \times 10$ tensor. Baselines include linear ridge regression, single-layer neural networks, BoF, neural BoF, CSDA, and MDA. Evaluated by per-class F1 and overall accuracy, MCSDA achieves the highest average F1 ($\approx 46.7\%$), surpassing even the neural BoF model, which is attributed to its capturing both the feature and the temporal multilinear structure of the data.

6. Algorithmic Outline

The MCSDA learning procedure for a chosen class $c$ proceeds as follows:

Algorithm MCSDA(Class c)
Input:
    • Training tensors {X_j, l_j}
    • Target subspace sizes I_1' × ... × I_K'
    • Max iterations τ, threshold ε
Initialization:
    For each mode n, set U_c^{(n)} ← all-ones or random orthonormal
For t = 1 to τ:
    For n = 1 to K:
        1. Compute in-class scatter S_I^{n(c)} via mode-n unfolding
        2. Compute out-of-class scatter S_O^{n(c)} likewise
        3. Solve S_O^{n(c)} u = λ S_I^{n(c)} u
           and set U_c^{(n)} to the eigenvectors of the I_n' largest λ
    End For
    If sum_{n=1}^K || U_c^{(n)}(t) U_c^{(n)}(t)^T - U_c^{(n)}(t-1) U_c^{(n)}(t-1)^T ||_F ≤ ε: stop
End For
Output: {U_c^{(n)}}

Repeat for each class c to obtain class-specific projections.
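The outline can also be sketched as runnable NumPy code. Several details are assumptions rather than part of the source: random orthonormal initialization, a small ridge `reg` on $S_I^{n(c)}$, QR orthonormalization of the generalized eigenvectors, and a stopping rule based on the change of the projectors $U U^\top$, which is insensitive to sign flips of eigenvectors. The names `mcsda_fit_class` and `mcsda_fit` are hypothetical and the data are synthetic.

```python
import numpy as np


def mode_product(tensor, matrix, mode):
    """Mode-n product with a matrix of shape (J, I_mode)."""
    return np.moveaxis(np.tensordot(matrix, tensor, axes=([1], [mode])), 0, mode)


def unfold(tensor, mode):
    """Mode-n unfolding (column order is irrelevant for Z @ Z.T)."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def mcsda_fit_class(Xs, labels, c, out_dims, max_iter=20, eps=1e-6, reg=1e-6, seed=0):
    """Alternating mode-wise updates of {U_c^(n)} for one class (a sketch)."""
    rng = np.random.default_rng(seed)
    K = Xs.ndim - 1
    in_class = labels == c
    D = Xs - Xs[in_class].mean(axis=0)            # every sample centered on the class-c mean
    # Random orthonormal initialization.
    Us = [np.linalg.qr(rng.standard_normal((Xs.shape[1 + n], out_dims[n])))[0]
          for n in range(K)]
    for _ in range(max_iter):
        change = 0.0
        for n in range(K):
            I_n = Xs.shape[1 + n]
            S_I, S_O = np.zeros((I_n, I_n)), np.zeros((I_n, I_n))
            for d, is_in in zip(D, in_class):
                for q in range(K):                # project along every mode except n
                    if q != n:
                        d = mode_product(d, Us[q].T, q)
                Z = unfold(d, n)
                if is_in:
                    S_I += Z @ Z.T
                else:
                    S_O += Z @ Z.T
            # Ridge term (assumption) keeps S_I positive definite.
            S_I += reg * max(np.trace(S_I) / I_n, 1.0) * np.eye(I_n)
            # Generalized eigenproblem S_O u = lambda S_I u via Cholesky whitening.
            L_inv = np.linalg.inv(np.linalg.cholesky(S_I))
            C = L_inv @ S_O @ L_inv.T
            w, V = np.linalg.eigh((C + C.T) / 2)
            top = np.argsort(w)[::-1][: out_dims[n]]
            U_new = np.linalg.qr(L_inv.T @ V[:, top])[0]   # orthonormal basis of the span
            # Projector change: invariant to sign flips and rotations within the subspace.
            change += np.linalg.norm(U_new @ U_new.T - Us[n] @ Us[n].T)
            Us[n] = U_new
        if change <= eps:
            break
    return Us


def mcsda_fit(Xs, labels, out_dims, **kwargs):
    """One-vs-rest training: one set of mode-wise projections per class."""
    return {c: mcsda_fit_class(Xs, labels, c, out_dims, **kwargs) for c in np.unique(labels)}


rng = np.random.default_rng(4)
Xs = rng.standard_normal((30, 8, 6))              # 30 synthetic 8 x 6 samples
labels = np.repeat([0, 1, 2], 10)
Xs[labels == 0] += 1.5                            # make class 0 easier to separate
models = mcsda_fit(Xs, labels, out_dims=(3, 2), max_iter=10)
print([U.shape for U in models[0]])               # [(8, 3), (6, 2)]
```

Because the per-class problems are independent, the loop in `mcsda_fit` can be parallelized across classes.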

The full algorithm performs independent one-vs-rest training for each class, yielding a collection of discriminative multilinear subspaces suitable for high-order data classification tasks (Tran et al., 2017).

References

Tran et al. (2017). Multilinear Class-Specific Discriminant Analysis.