MAUC Decomposition Based Feature Selection
- MDFS is a filter-based feature selection method that optimizes the multi-class AUC by decomposing the problem into binary subproblems for equitable class pair evaluation.
- It uses an interleaved selection strategy based on per-pair AUC scores, mitigating the bias from easily separable class pairs and addressing imbalanced scenarios.
- Variants like MDFS-1D and MDFS-2D extend its application to high-dimensional data while highlighting challenges in computational cost and selection stability.
MAUC Decomposition Based Feature Selection (MDFS) is a filter-based feature selection method specifically designed to maximize the multi-class Area Under the ROC Curve (MAUC). Traditional feature selection methods typically focus on maximizing classification accuracy, a metric recognized as insufficient for many multiclass and imbalanced scenarios. MDFS addresses this need by decomposing the MAUC objective into binary subproblems, ensuring all pairwise class distinctions contribute equitably to the selection of features optimized for MAUC (Wang et al., 2011).
1. Theoretical Foundations and MAUC Definition
In the multiclass setting with $c$ classes, the MAUC generalizes the binary AUC by considering every ordered pair of classes. A classifier maps an instance $x$ to a vector of confidence scores $(s_1(x), \ldots, s_c(x))$. For a given ordered pair $(i, j)$, $\mathrm{AUC}(i, j)$ is computed by treating class $i$ as positive and class $j$ as negative, using the $i$-th component $s_i(x)$ as the score. The overall MAUC is defined as:

$$\mathrm{MAUC} = \frac{1}{c(c-1)} \sum_{i \neq j} \mathrm{AUC}(i, j)$$
This unweighted averaging ensures each class pair is treated symmetrically, which is critical for fair evaluation and selection, especially in imbalanced multiclass problems. Optimizing for MAUC thus requires careful consideration of performance across all pairs, not just dominant or well-separated classes (Wang et al., 2011).
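The unweighted pairwise average above can be computed directly from a matrix of per-class confidence scores. The following is a minimal NumPy sketch (the helper names `pairwise_auc` and `mauc` are illustrative, not from the cited papers), assuming integer labels $0, \ldots, c-1$ that index the score columns:

```python
import numpy as np

def pairwise_auc(scores, labels, pos, neg):
    """AUC for ordered class pair (pos, neg): class `pos` is treated as
    positive, `neg` as negative, using the pos-th score column."""
    s_pos = scores[labels == pos, pos]
    s_neg = scores[labels == neg, pos]
    # Compare every positive against every negative; ties count as 0.5.
    diff = s_pos[:, None] - s_neg[None, :]
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / (s_pos.size * s_neg.size)

def mauc(scores, labels):
    """Unweighted average of AUC(i, j) over all ordered class pairs."""
    classes = np.unique(labels)
    c = len(classes)
    total = sum(pairwise_auc(scores, labels, i, j)
                for i in classes for j in classes if i != j)
    return total / (c * (c - 1))
```

Because every ordered pair contributes the same weight, a rare class influences the final score exactly as much as a dominant one, which is the property the text highlights for imbalanced problems.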
2. MDFS Feature Scoring and Selection Mechanism
MDFS assigns a score to each feature based on its ability to discriminate between classes, quantified as the AUC it achieves for each class pair. Specifically, for every feature $f$ and every ordered class pair $(i, j)$, compute:

$$\mathrm{AUC}_{i,j}(f) = \frac{1}{n_i n_j} \sum_{x \in C_i} \sum_{y \in C_j} I\big(f(x), f(y)\big)$$

where $I(a, b)$ is 1 if $a > b$, 0.5 if $a = b$, and 0 otherwise; $f(x)$ denotes the value of feature $f$ for instance $x$; and $n_i$, $n_j$ are the sizes of classes $C_i$ and $C_j$.
Unlike simple averaging, MDFS preserves the per-pair feature rankings and employs an interleaved selection strategy (below) to ensure equal attention across all pairs, avoiding bias toward features that primarily discriminate easily separable class pairs. This addresses the “siren pitfall” encountered in direct MAUC or accuracy-based selection, where features good for easy subproblems dominate the selection, to the detriment of difficult pairs (Wang et al., 2011).
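A per-pair scoring pass like the one just described can be sketched as follows; this is a naive $O(n_i n_j)$ implementation for clarity (function names are mine), producing one descending feature ranking per ordered class pair, which is the input the interleaved selection step consumes:

```python
import numpy as np

def feature_pair_auc(X, y, feat, i, j):
    """AUC of a single feature for ordered class pair (i, j):
    fraction of cross-class instance pairs where the class-i value
    exceeds the class-j value, with ties counted as 0.5."""
    vi = X[y == i, feat]
    vj = X[y == j, feat]
    diff = vi[:, None] - vj[None, :]
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / (vi.size * vj.size)

def per_pair_rankings(X, y):
    """For every ordered class pair, rank all features by AUC, best first."""
    classes = np.unique(y)
    rankings = {}
    for i in classes:
        for j in classes:
            if i == j:
                continue
            aucs = np.array([feature_pair_auc(X, y, f, i, j)
                             for f in range(X.shape[1])])
            rankings[(i, j)] = np.argsort(-aucs)  # descending AUC order
    return rankings
```

Keeping one ranking list per pair, rather than averaging the per-pair AUCs into a single score, is exactly what lets the selection step give hard pairs the same attention as easy ones.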
3. MDFS Algorithm and Computational Properties
The MDFS algorithm proceeds as follows:
- Initialization: For each ordered class pair $(i, j)$, construct the data subset $D_{i,j}$ containing the instances of classes $i$ and $j$.
- Feature Scoring: For each feature $f$ and pair $(i, j)$, compute $\mathrm{AUC}_{i,j}(f)$ and rank features descendingly to construct ranking lists $R_{i,j}$.
- Interleaved Selection: While the selected feature set $S$ has fewer than $k$ elements:
- Randomly select a class pair $(i, j)$.
- Choose the highest-ranked feature from $R_{i,j}$ not already in $S$ and add it to $S$.
- Termination: Return $S$ once $|S| = k$.
The computational complexity of MDFS is dominated by AUC calculations:
- For $d$ features, $n$ samples, and $c$ classes, the overall runtime is $O(c^2 d\, n \log n)$ when each per-pair AUC is computed by sorting feature values (naive pairwise counting instead costs $O(c^2 d\, n^2)$ in the worst case).
- The random scheduling and interleaving provide equitable pairwise coverage with minimal computational overhead (Wang et al., 2011).
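The interleaved selection loop from the steps above can be sketched as follows. This is a minimal sketch under the stated assumptions: `rankings` maps each ordered class pair to a best-first list of feature indices (as produced by per-pair AUC scoring), and `k` does not exceed the total number of features:

```python
import random

def mdfs_select(rankings, k, seed=0):
    """Interleaved selection: repeatedly pick a random class pair and take
    its highest-ranked feature not yet selected, until k features are chosen.

    Assumes k <= total number of distinct features in the ranking lists."""
    rng = random.Random(seed)
    pairs = list(rankings.keys())
    selected, selected_set = [], set()
    while len(selected) < k:
        pair = rng.choice(pairs)          # equitable coverage of class pairs
        for f in rankings[pair]:          # best remaining feature for this pair
            if f not in selected_set:
                selected.append(f)
                selected_set.add(f)
                break
    return selected
```

Because the pair is drawn uniformly at random at each step, no single pair's ranking list can dominate the selected set, which is how the algorithm sidesteps the "siren pitfall" described earlier.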
4. Extensions: MDFS Variants for High-Dimensional and Correlated Data
Polewko-Klim and Rudnicki adapted MDFS for high-dimensional RNA-Seq data by employing both one-dimensional (MDFS-1D) and two-dimensional (MDFS-2D) scoring mechanisms (Polewko-Klim et al., 2020):
- MDFS-1D: Each feature is tested in isolation. Its mutual information with the class is estimated (typically by histogram or $k$-nearest-neighbor estimators) and converted to a $p$-value, which is used for ranking.
- MDFS-2D: Each feature pair $(f, g)$ is evaluated jointly, estimating the mutual information of the pair with the class, with the lowest $p$-value among all pairs involving $f$ used as feature $f$'s score. This incorporates feature–feature synergy.
On high-dimensional data such as RNA-Seq expression matrices, exhaustive pairwise scoring in MDFS-2D is computationally intensive ($O(d^2)$ pair evaluations for $d$ features), necessitating GPU acceleration. No explicit regularization is conducted within MDFS; practitioners remove highly correlated features after ranking, typically by thresholding the absolute Spearman correlation at a chosen cutoff (Polewko-Klim et al., 2020).
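The post-ranking correlation pruning mentioned above can be sketched as a greedy filter; this is an illustrative implementation (function name and default threshold are mine, and ties are ranked arbitrarily rather than averaged as a full Spearman estimator would):

```python
import numpy as np

def spearman_prune(X, ranked_feats, threshold=0.9):
    """Greedy post-ranking redundancy filter: walk features best-first and
    drop any feature whose absolute Spearman correlation with an already
    kept feature exceeds `threshold` (the 0.9 default is illustrative)."""
    # Spearman rho = Pearson correlation of the rank-transformed columns.
    # (argsort-of-argsort ranks; ties broken arbitrarily, not averaged.)
    ranks = np.argsort(np.argsort(X, axis=0), axis=0).astype(float)
    kept = []
    for f in ranked_feats:
        if all(abs(np.corrcoef(ranks[:, f], ranks[:, g])[0, 1]) <= threshold
               for g in kept):
            kept.append(f)
    return kept
```

Walking the list best-first means that whenever two features are nearly monotone transforms of each other, the higher-ranked one survives, which matches the intent of pruning after (not during) MDFS ranking.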
5. Empirical Performance and Stability
Extensive empirical evaluation demonstrates MDFS’s advantages and limitations:
- Performance: Across multiple multiclass benchmarks (e.g., ISOLET, MNIST, and TCGA RNA-Seq datasets), MDFS consistently achieves higher MAUC than competitive baselines, especially when the number of selected features is moderate. Notably, the variant (MDFS-1D or MDFS-2D) yielding the best results varies by dataset (Wang et al., 2011; Polewko-Klim et al., 2020).
- Stability: MDFS-2D, while capable of detecting strong synergistic biomarkers (notably in KIRC data), exhibits low stability in the selected feature set unless restricted to a small number of features. MDFS-1D demonstrates slightly better stability but remains inferior to non-information-theoretic methods such as the t-test (Polewko-Klim et al., 2020).
- Scalability: MDFS-1D is computationally efficient (its mutual-information estimates scale linearly in the number of features), but MDFS-2D requires substantial resources at large feature counts.
Table: Summary of MDFS Experimental Results (RNA-Seq Cancer Data) (Polewko-Klim et al., 2020)
| Dataset | Best Method | Features | Top AUC |
|---|---|---|---|
| BRCA | t-test | — | 0.995 |
| HNSC | MDFS-1D | — | — |
| KIRC | MDFS-2D | — | — |
| LUAD | MDFS-1D / U-test (tie) | — | — |
An ensemble (the union of the top-k sets from all selectors) yields intermediate AUC, never surpassing the best individual selector.
6. Practical Recommendations and Limitations
- Classifier Agnosticism: MDFS is not tied to a particular classifier; it performs robustly with nearest-neighbor, tree-based, probabilistic, and SVM classifiers.
- Imbalanced and Cost-Sensitive Contexts: Because the MAUC metric does not depend on class priors, MDFS is especially suitable for imbalanced and cost-sensitive scenarios (Wang et al., 2011).
- Feature Redundancy: While MDFS does not explicitly account for inter-feature redundancy, post-selection pruning (e.g., with mRMR) can mitigate redundancy at minimal cost.
- Feature Types: Numeric or ordinal-valued features are required; nominal features should be discretized.
- Hyperparameter Tuning: MDFS itself has no regularization parameters, but MI estimation requires a binning or $k$-nearest-neighbor choice. Practitioners should validate on held-out folds and monitor MAUC to select an appropriate feature subset size.
Limitations include the high computational demands of MDFS-2D for large feature counts, low selection stability (especially when many features are selected), and strong dependence of the optimal MDFS variant on the specific data domain (Polewko-Klim et al., 2020).
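The held-out validation procedure recommended above can be sketched end to end. This is a hypothetical illustration, not the cited papers' protocol: features are assumed pre-ranked, candidate subset sizes are scored with a simple nearest-class-mean model (negative distance to each class mean as the confidence score), and the size maximizing validation MAUC is kept:

```python
import numpy as np

def mauc(scores, y):
    """Unweighted mean of pairwise AUCs over ordered class pairs
    (labels assumed to be integers 0..c-1 indexing score columns)."""
    cls = np.unique(y)
    total, npairs = 0.0, 0
    for i in cls:
        for j in cls:
            if i == j:
                continue
            d = scores[y == i, i][:, None] - scores[y == j, i][None, :]
            total += ((d > 0).sum() + 0.5 * (d == 0).sum()) / d.size
            npairs += 1
    return total / npairs

def choose_k(Xtr, ytr, Xva, yva, ranked_feats, ks):
    """Pick the subset size k maximizing validation MAUC, scoring with a
    nearest-class-mean model fitted on the training split."""
    best_k, best_m = None, -1.0
    cls = np.unique(ytr)
    for k in ks:
        F = ranked_feats[:k]
        means = np.stack([Xtr[ytr == c][:, F].mean(axis=0) for c in cls])
        # Confidence for class c = negative Euclidean distance to its mean.
        scores = -np.linalg.norm(Xva[:, F][:, None, :] - means[None], axis=2)
        m = mauc(scores, yva)
        if m > best_m:
            best_k, best_m = k, m
    return best_k, best_m
```

Any classifier producing per-class confidence scores could replace the nearest-class-mean scorer; the point is that the subset size is chosen by the same MAUC metric the selector optimizes, on data it has not seen.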
7. Comparative Analysis and Related Methods
Empirical analyses consistently demonstrate that MDFS surpasses traditional filter methods—such as chi-square, symmetrical uncertainty, ReliefF, and direct MAUC ranking—on the MAUC metric. The critical distinction is MDFS’s one-vs-one decomposition and randomized interleaving, which circumvents the tendency for “easy” class pairs to dominate selection (“siren pitfall”).
A plausible implication is that, in domains where discriminative difficulty is highly unbalanced across class pairs, MDFS offers a principled, efficient solution that achieves stronger generalization on the MAUC objective. If further reduction in feature set redundancy is desired, complementary use with mRMR or post-hoc thresholding on pairwise correlations may be beneficial (Wang et al., 2011, Polewko-Klim et al., 2020).