MAUC Decomposition Based Feature Selection
- MDFS is a filter-based feature selection method that optimizes the multi-class AUC by decomposing the problem into binary subproblems for equitable class pair evaluation.
- It uses an interleaved selection strategy based on per-pair AUC scores, mitigating the bias from easily separable class pairs and addressing imbalanced scenarios.
- Variants like MDFS-1D and MDFS-2D extend its application to high-dimensional data while highlighting challenges in computational cost and selection stability.
MAUC Decomposition Based Feature Selection (MDFS) is a filter-based feature selection method specifically designed to maximize the multi-class Area Under the ROC Curve (MAUC). Traditional feature selection methods typically focus on maximizing classification accuracy, a metric recognized as insufficient for many multiclass and imbalanced scenarios. MDFS addresses this need by decomposing the MAUC objective into binary subproblems, ensuring all pairwise class distinctions contribute equitably to the selection of features optimized for MAUC (Wang et al., 2011).
1. Theoretical Foundations and MAUC Definition
In the multiclass setting with $c$ classes, the MAUC generalizes the binary AUC by considering every ordered pair of classes. A classifier maps an instance $x$ to a vector of confidence scores $(s_1(x), \ldots, s_c(x))$. For a given ordered pair $(i, j)$, $\mathrm{AUC}(i, j)$ is computed by treating class $i$ as positive and class $j$ as negative, using the $i$-th component $s_i(x)$ as the score. The overall MAUC is defined as:

$$\mathrm{MAUC} = \frac{1}{c(c-1)} \sum_{i \neq j} \mathrm{AUC}(i, j)$$
This unweighted averaging ensures each class pair is treated symmetrically, which is critical for fair evaluation and selection, especially in imbalanced multiclass problems. Optimizing for MAUC thus requires careful consideration of performance across all pairs, not just dominant or well-separated classes (Wang et al., 2011).
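The unweighted pairwise average above can be computed directly from a matrix of per-class confidence scores. The following is a minimal NumPy sketch (the helper names `pairwise_auc` and `mauc` are illustrative, not from the cited papers), assuming integer labels $0, \ldots, c-1$ that index the score columns:

```python
import numpy as np

def pairwise_auc(scores, labels, pos, neg):
    """AUC for ordered class pair (pos, neg): class `pos` is treated as
    positive, `neg` as negative, using the pos-th score column."""
    s_pos = scores[labels == pos, pos]
    s_neg = scores[labels == neg, pos]
    # Compare every positive against every negative; ties count as 0.5.
    diff = s_pos[:, None] - s_neg[None, :]
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / (s_pos.size * s_neg.size)

def mauc(scores, labels):
    """Unweighted average of AUC(i, j) over all ordered class pairs."""
    classes = np.unique(labels)
    c = len(classes)
    total = sum(pairwise_auc(scores, labels, i, j)
                for i in classes for j in classes if i != j)
    return total / (c * (c - 1))
```

Because every ordered pair contributes the same weight, a rare class influences the final score exactly as much as a dominant one, which is the property the text highlights for imbalanced problems.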
2. MDFS Feature Scoring and Selection Mechanism
MDFS assigns a score to each feature based on its ability to discriminate between classes, quantified as the AUC it achieves for each class pair. Specifically, for every feature $f$ and every ordered class pair $(i, j)$, compute:

$$\mathrm{AUC}_{i,j}(f) = \frac{1}{n_i n_j} \sum_{x \in C_i} \sum_{y \in C_j} I\big(f(x), f(y)\big)$$

where $I(a, b)$ is 1 if $a > b$, 0.5 if $a = b$, and 0 otherwise; $f(x)$ denotes the value of feature $f$ for instance $x$; and $n_i$, $n_j$ are the sizes of classes $C_i$ and $C_j$.
Unlike simple averaging, MDFS preserves the per-pair feature rankings and employs an interleaved selection strategy (below) to ensure equal attention across all pairs, avoiding bias toward features that primarily discriminate easily separable class pairs. This addresses the “siren pitfall” encountered in direct MAUC or accuracy-based selection, where features good for easy subproblems dominate the selection, to the detriment of difficult pairs (Wang et al., 2011).
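A per-pair scoring pass like the one just described can be sketched as follows; this is a naive $O(n_i n_j)$ implementation for clarity (function names are mine), producing one descending feature ranking per ordered class pair, which is the input the interleaved selection step consumes:

```python
import numpy as np

def feature_pair_auc(X, y, feat, i, j):
    """AUC of a single feature for ordered class pair (i, j):
    fraction of cross-class instance pairs where the class-i value
    exceeds the class-j value, with ties counted as 0.5."""
    vi = X[y == i, feat]
    vj = X[y == j, feat]
    diff = vi[:, None] - vj[None, :]
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / (vi.size * vj.size)

def per_pair_rankings(X, y):
    """For every ordered class pair, rank all features by AUC, best first."""
    classes = np.unique(y)
    rankings = {}
    for i in classes:
        for j in classes:
            if i == j:
                continue
            aucs = np.array([feature_pair_auc(X, y, f, i, j)
                             for f in range(X.shape[1])])
            rankings[(i, j)] = np.argsort(-aucs)  # descending AUC order
    return rankings
```

Keeping one ranking list per pair, rather than averaging the per-pair AUCs into a single score, is exactly what lets the selection step give hard pairs the same attention as easy ones.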
3. MDFS Algorithm and Computational Properties
The MDFS algorithm proceeds as follows:
- Initialization: For each ordered class pair $(i, j)$, construct the data subset $D_{i,j}$ containing the instances of classes $i$ and $j$.
- Feature Scoring: For each feature $f$ and pair $(i, j)$, compute $\mathrm{AUC}_{i,j}(f)$ and rank features descendingly to construct ranking lists $R_{i,j}$.
- Interleaved Selection: While the selected feature set $S$ has fewer than $k$ elements:
- Randomly select a class pair $(i, j)$.
- Choose the highest-ranked feature from $R_{i,j}$ not already in $S$ and add it to $S$.
- Termination: Return $S$ once $|S| = k$.
The computational complexity of MDFS is dominated by AUC calculations:
- For $d$ features, $n$ samples, and $c$ classes, the overall runtime is $O(c^2 d\, n \log n)$ when each per-pair AUC is computed by sorting feature values (naive pairwise counting instead costs $O(c^2 d\, n^2)$ in the worst case).
- The random scheduling and interleaving provide equitable pairwise coverage with minimal computational overhead (Wang et al., 2011).
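The interleaved selection loop from the steps above can be sketched as follows. This is a minimal sketch under the stated assumptions: `rankings` maps each ordered class pair to a best-first list of feature indices (as produced by per-pair AUC scoring), and `k` does not exceed the total number of features:

```python
import random

def mdfs_select(rankings, k, seed=0):
    """Interleaved selection: repeatedly pick a random class pair and take
    its highest-ranked feature not yet selected, until k features are chosen.

    Assumes k <= total number of distinct features in the ranking lists."""
    rng = random.Random(seed)
    pairs = list(rankings.keys())
    selected, selected_set = [], set()
    while len(selected) < k:
        pair = rng.choice(pairs)          # equitable coverage of class pairs
        for f in rankings[pair]:          # best remaining feature for this pair
            if f not in selected_set:
                selected.append(f)
                selected_set.add(f)
                break
    return selected
```

Because the pair is drawn uniformly at random at each step, no single pair's ranking list can dominate the selected set, which is how the algorithm sidesteps the "siren pitfall" described earlier.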
4. Extensions: MDFS Variants for High-Dimensional and Correlated Data
Polewko-Klim and Rudnicki adapted MDFS for high-dimensional RNA-Seq data by employing both one-dimensional (MDFS-1D) and two-dimensional (MDFS-2D) scoring mechanisms (Polewko-Klim et al., 2020):
- MDFS-1D: Each feature is tested in isolation. Its mutual information with the class is estimated (typically by histogram or $k$-nearest-neighbor estimators) and converted to a $p$-value, which is used for ranking.
- MDFS-2D: Each feature pair $(f, g)$ is evaluated jointly, estimating the mutual information of the pair with the class, with the lowest $p$-value among all pairs involving $f$ used as feature $f$'s score. This incorporates feature–feature synergy.
On high-dimensional data such as RNA-Seq expression matrices, exhaustive pairwise scoring in MDFS-2D is computationally intensive ($O(d^2)$ pair evaluations for $d$ features), necessitating GPU acceleration. No explicit regularization is conducted within MDFS; practitioners remove highly correlated features after ranking, typically by thresholding the absolute Spearman correlation at a chosen cutoff (Polewko-Klim et al., 2020).
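The post-ranking correlation pruning mentioned above can be sketched as a greedy filter; this is an illustrative implementation (function name and default threshold are mine, and ties are ranked arbitrarily rather than averaged as a full Spearman estimator would):

```python
import numpy as np

def spearman_prune(X, ranked_feats, threshold=0.9):
    """Greedy post-ranking redundancy filter: walk features best-first and
    drop any feature whose absolute Spearman correlation with an already
    kept feature exceeds `threshold` (the 0.9 default is illustrative)."""
    # Spearman rho = Pearson correlation of the rank-transformed columns.
    # (argsort-of-argsort ranks; ties broken arbitrarily, not averaged.)
    ranks = np.argsort(np.argsort(X, axis=0), axis=0).astype(float)
    kept = []
    for f in ranked_feats:
        if all(abs(np.corrcoef(ranks[:, f], ranks[:, g])[0, 1]) <= threshold
               for g in kept):
            kept.append(f)
    return kept
```

Walking the list best-first means that whenever two features are nearly monotone transforms of each other, the higher-ranked one survives, which matches the intent of pruning after (not during) MDFS ranking.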
5. Empirical Performance and Stability
Extensive empirical evaluation demonstrates MDFS’s advantages and limitations:
- Performance: Across multiple multiclass benchmarks (e.g., ISOLET, MNIST, and TCGA RNA-Seq datasets), MDFS consistently achieves higher MAUC than competitive baselines, especially when the number of selected features is moderate. Notably, the variant (MDFS-1D or MDFS-2D) yielding the best results varies by dataset (Wang et al., 2011; Polewko-Klim et al., 2020).
- Stability: MDFS-2D, while capable of detecting strong synergistic biomarkers (notably in KIRC data), exhibits low stability in the selected feature set unless restricted to a small number of features. MDFS-1D demonstrates slightly better stability but remains inferior to non-information-theoretic methods such as the t-test (Polewko-Klim et al., 2020).
- Scalability: MDFS-1D is computationally efficient (its mutual-information estimates scale linearly in the number of features), but MDFS-2D requires substantial resources at large feature counts.
Table: Summary of MDFS Experimental Results (RNA-Seq Cancer Data) (Polewko-Klim et al., 2020)
| Dataset | Best Method | Features | Top AUC |
|---|---|---|---|
| BRCA | t-test | — | 0.995 |
| HNSC | MDFS-1D | — | — |
| KIRC | MDFS-2D | — | — |
| LUAD | MDFS-1D / U-test (tie) | — | — |
An ensemble (the union of the top-k sets from all selectors) yields intermediate AUC, never surpassing the best individual selector.
6. Practical Recommendations and Limitations
- Classifier Agnosticism: MDFS is not tied to a particular classifier; it performs robustly with nearest-neighbor, tree-based, probabilistic, and SVM classifiers.
- Imbalanced and Cost-Sensitive Contexts: Because the MAUC metric does not depend on class priors, MDFS is especially suitable for imbalanced and cost-sensitive scenarios (Wang et al., 2011).
- Feature Redundancy: While MDFS does not explicitly account for inter-feature redundancy, post-selection pruning (e.g., with mRMR) can mitigate redundancy at minimal cost.
- Feature Types: Numeric or ordinal-valued features are required; nominal features should be discretized.
- Hyperparameter Tuning: MDFS itself has no regularization parameters, but MI estimation requires a binning or $k$-nearest-neighbor choice. Practitioners should validate on held-out folds and monitor MAUC to select an appropriate feature subset size.
Limitations include the high computational demands of MDFS-2D for large feature counts, low selection stability (especially when many features are selected), and strong dependence of the optimal MDFS variant on the specific data domain (Polewko-Klim et al., 2020).
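The held-out validation procedure recommended above can be sketched end to end. This is a hypothetical illustration, not the cited papers' protocol: features are assumed pre-ranked, candidate subset sizes are scored with a simple nearest-class-mean model (negative distance to each class mean as the confidence score), and the size maximizing validation MAUC is kept:

```python
import numpy as np

def mauc(scores, y):
    """Unweighted mean of pairwise AUCs over ordered class pairs
    (labels assumed to be integers 0..c-1 indexing score columns)."""
    cls = np.unique(y)
    total, npairs = 0.0, 0
    for i in cls:
        for j in cls:
            if i == j:
                continue
            d = scores[y == i, i][:, None] - scores[y == j, i][None, :]
            total += ((d > 0).sum() + 0.5 * (d == 0).sum()) / d.size
            npairs += 1
    return total / npairs

def choose_k(Xtr, ytr, Xva, yva, ranked_feats, ks):
    """Pick the subset size k maximizing validation MAUC, scoring with a
    nearest-class-mean model fitted on the training split."""
    best_k, best_m = None, -1.0
    cls = np.unique(ytr)
    for k in ks:
        F = ranked_feats[:k]
        means = np.stack([Xtr[ytr == c][:, F].mean(axis=0) for c in cls])
        # Confidence for class c = negative Euclidean distance to its mean.
        scores = -np.linalg.norm(Xva[:, F][:, None, :] - means[None], axis=2)
        m = mauc(scores, yva)
        if m > best_m:
            best_k, best_m = k, m
    return best_k, best_m
```

Any classifier producing per-class confidence scores could replace the nearest-class-mean scorer; the point is that the subset size is chosen by the same MAUC metric the selector optimizes, on data it has not seen.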
7. Comparative Analysis and Related Methods
Empirical analyses consistently demonstrate that MDFS surpasses traditional filter methods—such as chi-square, symmetrical uncertainty, ReliefF, and direct MAUC ranking—on the MAUC metric. The critical distinction is MDFS’s one-vs-one decomposition and randomized interleaving, which circumvents the tendency for “easy” class pairs to dominate selection (“siren pitfall”).
A plausible implication is that, in domains where discriminative difficulty is highly unbalanced across class pairs, MDFS offers a principled, efficient solution that achieves stronger generalization on the MAUC objective. If further reduction in feature set redundancy is desired, complementary use with mRMR or post-hoc thresholding on pairwise correlations may be beneficial (Wang et al., 2011, Polewko-Klim et al., 2020).