
Classwise Feature Learning Module (CFLM)

Updated 27 January 2026
  • CFLM is a family of modules designed to extract class-specific features using targeted memory, attention, or decomposition mechanisms.
  • It integrates into various pipelines, boosting performance in fine-grained image recognition, high-dimensional time series analysis, and remote sensing segmentation.
  • By focusing on per-class evidence, CFLM enhances model discrimination, leading to measurable improvements in accuracy and robustness.

The Classwise Feature Learning Module (CFLM) denotes a family of architectural units designed to extract, encode, and exploit class-specific information for enhanced discrimination in machine learning tasks. CFLM is characterized by mechanisms—memory, attention, or classwise decomposition—that explicitly focus feature processing or representation on per-class statistics or evidence, differing fundamentally from global or class-agnostic feature extractors. CFLM is now realized in diverse modalities: categorical memory modules for fine-grained image recognition (Deng et al., 2020), class-specific functional subspace projection for high-dimensional time series (Chatterjee et al., 2021), and per-class multiscale attention/decoding for robust remote sensing segmentation (Kieu et al., 20 Jan 2026).

1. Core Principles and Motivation

CFLM frameworks stem from the observation that feature distributions, discriminative patterns, and optimal evidence often align with class boundaries rather than remaining agnostic to label structure. This is particularly acute in fine-grained identification, small-sample high-dimensional series, or tasks with divergent modality requirements per class.

Classwise memory schemes, as in categorical memory networks, store prototypes as running means per class to pool inter-class variations and leverage similarities as additional cues (Deng et al., 2020). In functional analysis, each class admits a distinct basis of dominant principal components, facilitating projection into class-controlled low-dimensional subspaces (Chatterjee et al., 2021). In multimodal and multiresolution fusion, CFLM employs per-class spatial attention and classwise decoding pipelines to account for anisotropic and class-dependent modality importance (Kieu et al., 20 Jan 2026).

2. Architectural Instantiations

CFLM is instantiated in various neural or classical statistical pipelines. Below, key paradigm examples are summarized:

| CFLM Variant | Domain / Task | Central Mechanism |
|---|---|---|
| Prototype Memory (CMN) | Fine-grained image recognition | Per-class feature buffer, attention-weighted readout, augmentation (Deng et al., 2020) |
| Functional Classwise PCA | Functional time series | Classwise covariance eigenspace, projection (Chatterjee et al., 2021) |
| Classwise Attention & Decoding | Remote sensing segmentation | Per-class spatial attention, multiscale decoding (Kieu et al., 20 Jan 2026) |

Categorical Memory Networks (CMN):

CFLM operates as a post-backbone memory module: (i) the global-average-pooled convolutional feature $f \in \mathbb{R}^D$ accesses an external buffer $M \in \mathbb{R}^{C \times D}$, where each row $m_i$ tracks a class prototype. (ii) Attention $\mathbf{w}$ over prototypes is computed by the softmax-normalized similarity $w_i = \exp\bigl((f \cdot m_i)/\tau\bigr) / \sum_{j=1}^{C} \exp\bigl((f \cdot m_j)/\tau\bigr)$. A response feature $\widetilde{m} = \sum_{i=1}^{C} w_i m_i$ is added to $f$ to yield $f_{\text{aug}} = f + \widetilde{m}$, which is then classified via a fully connected layer (Deng et al., 2020).
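The readout-and-augment step above can be sketched in a few lines of NumPy. This is an illustrative implementation of the softmax readout and feature augmentation, not the authors' code; shapes and names (`cmn_readout`, `tau`) are assumptions for the sketch.

```python
import numpy as np

def cmn_readout(f, M, tau=1.0):
    """Attention-weighted readout over class prototypes (sketch of the
    CMN-style CFLM readout; variable names are illustrative).

    f:   (D,) global-average-pooled backbone feature
    M:   (C, D) external memory, one prototype per class
    tau: softmax temperature
    """
    logits = M @ f / tau                       # (C,) similarities f . m_i / tau
    logits -= logits.max()                     # numerical stability
    w = np.exp(logits) / np.exp(logits).sum()  # attention over prototypes
    m_tilde = w @ M                            # response feature: sum_i w_i m_i
    return f + m_tilde                         # augmented feature f_aug

# toy usage with random feature and memory
rng = np.random.default_rng(0)
f = rng.normal(size=8)
M = rng.normal(size=(5, 8))
f_aug = cmn_readout(f, M, tau=0.5)
```

Lowering `tau` sharpens the attention toward the nearest prototype; `tau → ∞` recovers a uniform average of all prototypes.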

Functional Classwise PCA:

CFLM projects each sequence (smoothed and basis-expanded) onto class-specific principal subspaces. For class $k$, the covariance operator $\mathcal{C}_k$ yields eigenfunctions $\psi_{k\ell}$. For a sample $x(t)$, the projections $\alpha_{k\ell} = \langle x - \mu_k, \psi_{k\ell} \rangle$ over all $k, \ell$ are concatenated for classification (Chatterjee et al., 2021).
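On curves discretized to a common grid, the classwise projection reduces to per-class PCA followed by score concatenation. The sketch below works with plain vectors rather than the paper's smoothed functional representation; the function name and shapes are assumptions.

```python
import numpy as np

def classwise_pca_features(X_train, y_train, X, n_comp=3):
    """Classwise-PCA feature sketch on discretized curves (illustrative;
    the original method operates on smoothed, basis-expanded functions).

    X_train: (N, T) training curves sampled on a common grid
    y_train: (N,)   integer class labels
    X:       (M, T) curves to embed
    Returns: (M, n_classes * n_comp) concatenated classwise scores.
    """
    feats = []
    for k in np.unique(y_train):
        Xk = X_train[y_train == k]
        mu_k = Xk.mean(axis=0)                  # class mean function mu_k
        # leading eigenfunctions of the class covariance via SVD
        _, _, Vt = np.linalg.svd(Xk - mu_k, full_matrices=False)
        psi_k = Vt[:n_comp]                     # (n_comp, T) class basis
        feats.append((X - mu_k) @ psi_k.T)      # scores alpha_{k, l}
    return np.concatenate(feats, axis=1)

# toy usage: two classes of noisy sinusoid-like curves
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 50)
X_train = np.vstack([np.sin(2 * np.pi * t) + 0.1 * rng.normal(size=(20, 50)),
                     np.cos(2 * np.pi * t) + 0.1 * rng.normal(size=(20, 50))])
y_train = np.array([0] * 20 + [1] * 20)
Z = classwise_pca_features(X_train, y_train, X_train[:5], n_comp=3)
```

Each sample thus carries its scores under every class's subspace, so the downstream classifier (LDA in the paper) can compare how well each class-specific basis explains it.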

Classwise Attention/Decoding (Remote Sensing):

CFLM receives fused multiscale features $F_1, \dots, F_4$. For each class, a learned query $q_k$ generates a spatial attention map $\alpha_k = \mathrm{softmax}(q_k \cdot X / \sqrt{C})$, with $X$ the flattened final feature. This mask modulates $F_4$ to produce a class-specific feature $M_k$. Each $M_k$ is decoded up through all scales using per-class decoders, and the final per-class features are concatenated and classified (Kieu et al., 20 Jan 2026).
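The masking step can be sketched as follows. This toy version computes one spatial attention map per class query and modulates the final feature map; the per-class multiscale decoders are omitted, and all shapes and names are illustrative assumptions.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def classwise_attention_masks(F4, queries):
    """Per-class spatial attention over a fused feature map (sketch;
    the per-class decoders of the full pipeline are not shown).

    F4:      (C, H, W) final fused feature map with C channels
    queries: (K, C)    one learned query per class
    Returns: list of K class-specific feature maps M_k = alpha_k * F4.
    """
    C, H, W = F4.shape
    X = F4.reshape(C, H * W)                      # flatten spatial dims
    masks = []
    for q_k in queries:
        alpha_k = softmax(q_k @ X / np.sqrt(C))   # (H*W,) spatial attention
        M_k = F4 * alpha_k.reshape(1, H, W)       # modulate every channel
        masks.append(M_k)
    return masks

# toy usage: 16 channels, 8x8 map, 6 classes
F4 = np.random.default_rng(0).normal(size=(16, 8, 8))
queries = np.random.default_rng(1).normal(size=(6, 16))
masks = classwise_attention_masks(F4, queries)
```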

3. Mathematical and Algorithmic Formulation

Each CFLM instantiation is governed by rigorously defined update and inference rules:

  • Memory Module Update (Image recognition):
    • Training-time only: update the prototype of the true class $y$ via a moving average,
      $m_y \leftarrow m_y + \beta\,(f - m_y)$.
      Only $m_y$ is updated; all other rows remain unchanged, and no gradients flow into $M$.
  • Classwise Attention and Response:
    • Attention weights $w_i$ as above.
    • Readout response: $\widetilde{m} = \sum_i w_i m_i$.
    • Augmented feature: $f_{\text{aug}} = f + \widetilde{m}$.
  • Functional Classwise Projection:
    • For class $k$: compute $\mathcal{C}_k$ and obtain the eigenbasis $\{\psi_{k\ell}\}$.
    • Project:
      $\alpha_{k\ell} = \int_T [x(t) - \mu_k(t)]\,\psi_{k\ell}(t)\,dt$.
    • Final representation: $z = [\alpha_{1,1}, \dots, \alpha_{c,L}] \in \mathbb{R}^{cL}$.
  • Remote Sensing (DIS2-CFLM):
    • Attention and masking as above.
    • Multiscale per-class decoding using nested convolution blocks and upsampling over fused feature hierarchies.
    • The penultimate feature $Z$ aggregates per-class information for both the final segmentation and as the endpoint for feature-level distillation.
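The no-gradient moving-average memory update is small enough to show in full. The sketch below assumes an in-place NumPy buffer; in a deep-learning framework the same rule would run under a no-gradient context.

```python
import numpy as np

def update_memory(M, f, y, beta=0.1):
    """Training-time prototype update m_y <- m_y + beta * (f - m_y)
    (sketch; performed outside the gradient graph in practice).

    M:    (C, D) class-prototype memory, updated in place
    f:    (D,)   feature of the current training sample
    y:    int,   true class label of the sample
    beta: update rate of the moving average
    """
    M[y] += beta * (f - M[y])   # only the true-class row moves
    return M

# toy usage: push class 1's prototype halfway toward an all-ones feature
M = np.zeros((3, 4))
f = np.ones(4)
update_memory(M, f, 1, beta=0.5)
```

With a fixed `beta`, each prototype is an exponential moving average of the features of its class, which keeps the memory current as the backbone's representation drifts during training.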

4. Integration into Model Pipelines

CFLM units are highly modular, typically slotted after major feature extraction (CNN, transformer blocks, functional basis expansion, or fused encoders), and interfacing directly with classification or segmentation heads. Notably:

  • In CMN (Deng et al., 2020), CFLM follows the GAP stage and precedes a linear softmax head.
  • In functional time-series classification (Chatterjee et al., 2021), CFLM forms the entire project-classify front-end prior to LDA.
  • In DIS2 (Kieu et al., 20 Jan 2026), CFLM is embedded after feature fusion and before segmentation output, supplying both intermediate distilled features and final predictions.

In segmentation with missing modalities, CFLM is pivotal for class-dependent evidence selection; in fine-grained recognition, it anchors the semantics of prototype resemblance.

5. Empirical Results and Benchmarking

Performance impact of CFLM varies by domain and problem structure:

  • Image Classification: Adding CFLM to ResNet-50 produces consistent 2–4% top-1 accuracy improvements in fine-grained regimes (e.g., CUB-200-2011: baseline 85.8% → 88.2%; FGVC Aircraft: 90.0% → 93.8%) with negligible computational overhead (a single $C \times D$ attention matrix multiply per sample) and zero added trainable parameters. No benefit, and possible degradation, is observed on generic datasets such as CIFAR-100, suggesting CFLM leverages fine category-level structure (Deng et al., 2020).
  • Functional Time Series: Classwise functional PCA-based CFLM is empirically shown to regularize noise, alleviate the curse of dimensionality, and retain discriminative axes in structured, small-sample, high-dimensional regimes (Chatterjee et al., 2021).
  • Remote Sensing Segmentation: DIS2 demonstrates that CFLM with classwise attention and decoding attains state-of-the-art accuracy under both full- and missing-modality scenarios, as well as strong feature and logit-level distillation alignment (Kieu et al., 20 Jan 2026).

6. Training Protocols and Practical Considerations

CFLM training incorporates:

  • No-gradient external updates for category memory (Deng et al., 2020).
  • Pooled covariance estimation and analytic eigen-decomposition in functional data (Chatterjee et al., 2021).
  • Deep supervision using auxiliary losses at all decoder scales and multimodal distillation objectives in segmentation (Kieu et al., 20 Jan 2026).
  • Standard loss choices such as cross-entropy, Dice, or LDA-based log-likelihood, with hyperparameters calibrated for each architecture and scenario.

Test-time usage typically “freezes” any learned prototypes or basis (no memory or eigenspace updates), operating in a read-only or project-only inference mode.

7. Extensions and Theoretical Implications

Several extensions are proposed:

  • Adaptive or neural basis learning in functional data settings (e.g., via autoencoders), nonlinear kernelization, and stacking multiple CFLM layers for deep compositional learning (Chatterjee et al., 2021).
  • Joint distillation and feature alignment targets (as in DLKD) for robust cross-modal learning (Kieu et al., 20 Jan 2026).
  • The unifying principle of CFLM—direct classwise control of feature induction—suggests broad applicability wherever class-conditional statistics diverge meaningfully from marginal distributions. A plausible implication is that CFLM or related paradigms may generalize to large-scale open-set recognition, meta/continual learning, or structured anomaly detection via class-partitioned feature control.

CFLM represents a technically versatile, domain-adaptive tool for per-class feature specialization and supervised discrimination, with mounting empirical and algorithmic evidence across domains (Deng et al., 2020, Chatterjee et al., 2021, Kieu et al., 20 Jan 2026).
