Penalized mRMR for Sparse Feature Selection
- Penalized mRMR is a feature selection framework that integrates target relevance and inter-feature redundancy control via a continuous, penalized optimization formulation.
- It reformulates the discrete mRMR criterion using dependency measures like mutual information, applying LASSO, SCAD, or MCP penalties to encourage sparsity.
- By leveraging nonconvex penalties and a knockoff filter for FDR control, the method robustly identifies informative features in high-dimensional settings.
Penalized Minimum Redundancy Maximum Relevance (mRMR) refers to a family of feature selection methodologies for high-dimensional data that aim to extract subsets of features that are simultaneously maximally relevant to the target variable and minimally redundant with respect to each other. The penalized mRMR principle extends classical mRMR by incorporating explicit penalization—typically via continuous optimization with regularization or via explicit penalty parameters—so as to provide sharper control over feature sparsity, redundancy, and stability, including guarantees such as false discovery rate (FDR) control.
1. Mathematical Formulation
At the core of penalized mRMR methods is the reinterpretation of the discrete mRMR objective as a continuous penalized optimization. The standard mRMR criterion selects a subset of features $S \subseteq \{1, \dots, p\}$ maximizing the trade-off between relevance and redundancy:

$$\max_{S} \; \frac{1}{|S|} \sum_{j \in S} d(X_j, Y) \;-\; \frac{1}{|S|^2} \sum_{j, k \in S} d(X_j, X_k),$$

where $d(\cdot, \cdot)$ is a dependency measure (e.g., mutual information, HSIC).
The penalized mRMR procedure introduces a vector of relaxation parameters $\beta = (\beta_1, \dots, \beta_p)^\top$ with $\beta_j \ge 0$. The loss to be minimized is

$$L(\beta) = -\sum_{j=1}^{p} \beta_j\, \widehat{d}(X_j, Y) + \frac{1}{2} \sum_{j,k=1}^{p} \beta_j \beta_k\, \widehat{d}(X_j, X_k) + \sum_{j=1}^{p} p_\lambda(\beta_j),$$

with $\widehat{d}$ a V-statistic estimator for an association measure, and $\beta_j$ (continuous, non-negative) representing the importance of feature $j$. Sparsity is induced by the penalty term $p_\lambda$, where $p_\lambda$ is, e.g., the LASSO penalty $p_\lambda(t) = \lambda |t|$, or one of the SCAD or MCP nonconvex regularizers, such as

$$p_\lambda^{\mathrm{MCP}}(t) = \lambda \int_0^{|t|} \Big( 1 - \frac{x}{\gamma \lambda} \Big)_{+} \, dx, \qquad \gamma > 1.$$

This convex–nonconvex hybrid framework ensures that features with insufficient marginal utility or redundant information are assigned zero coefficients.
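To make the objective concrete, the following minimal sketch assembles the loss from a V-statistic HSIC estimator with a Gaussian kernel. The function names (`hsic_v`, `penalized_mrmr_loss`) and the fixed bandwidth are illustrative assumptions, not the reference implementation:

```python
import numpy as np

def gaussian_gram(x, sigma=1.0):
    """Gram matrix of a Gaussian kernel for a 1-D sample (illustrative bandwidth)."""
    d2 = (x[:, None] - x[None, :]) ** 2
    return np.exp(-d2 / (2.0 * sigma ** 2))

def hsic_v(x, y, sigma=1.0):
    """V-statistic (biased) estimator of HSIC: tr(K H L H) / n^2."""
    n = len(x)
    H = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    K, L = gaussian_gram(x, sigma), gaussian_gram(y, sigma)
    return np.trace(K @ H @ L @ H) / n ** 2

def penalized_mrmr_loss(beta, X, y, lam, sigma=1.0):
    """Evaluate -relevance + redundancy + LASSO penalty for coefficients beta >= 0."""
    p = X.shape[1]
    rel = np.array([hsic_v(X[:, j], y, sigma) for j in range(p)])
    red = np.array([[hsic_v(X[:, j], X[:, k], sigma) for k in range(p)]
                    for j in range(p)])
    return -rel @ beta + 0.5 * beta @ red @ beta + lam * np.abs(beta).sum()
```

In practice the relevance vector and redundancy matrix would be precomputed once, after which the loss is cheap to evaluate for any candidate $\beta$.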
A table summarizes choices:
Term | Description | Typical Choices |
---|---|---|
$d(\cdot,\cdot)$ | Dependency measure | MI, HSIC, projection correlation |
$p_\lambda(\cdot)$ | Penalty | LASSO / SCAD / MCP |
$\beta_j$ | Feature coefficient (relax.) | Continuous, $\beta_j \ge 0$ |
2. Feature Selection Mechanism and Sparsity
This penalized framework achieves feature selection by driving many coefficients $\beta_j$ to zero. Nonconvex penalties (such as SCAD or MCP) are explicitly designed to ensure:
- Small coefficients are shrunk towards zero (eliminating inactive features).
- Large, informative coefficients face negligible penalty (avoiding estimation bias).
- Sparsistency: Under appropriate regularity conditions, the method consistently identifies the true set of non-informative features (i.e., those with $\beta_j = 0$).
Features with high relevance $\widehat{d}(X_j, Y)$ and low redundancy with the already-selected features will be retained. Inactive or highly redundant features whose added value falls below the penalty threshold are systematically eliminated.
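As a concrete illustration of this shrinkage behavior, here is a minimal sketch of the textbook SCAD and MCP penalties (standard parameterizations with $a > 2$ and $\gamma > 1$; function names are ours). The SCAD derivative is included because it supplies the weights used by the LLA algorithm mentioned in Section 5:

```python
import numpy as np

def scad(t, lam, a=3.7):
    """SCAD penalty (Fan & Li): LASSO-like near zero, constant beyond a*lam."""
    t = np.abs(t)
    small = t <= lam
    mid = (t > lam) & (t <= a * lam)
    return np.where(small, lam * t,
           np.where(mid, (2 * a * lam * t - t ** 2 - lam ** 2) / (2 * (a - 1)),
                    lam ** 2 * (a + 1) / 2))

def mcp(t, lam, gamma=3.0):
    """MCP penalty (Zhang): tapers the LASSO slope to zero at gamma*lam."""
    t = np.abs(t)
    return np.where(t <= gamma * lam, lam * t - t ** 2 / (2 * gamma),
                    gamma * lam ** 2 / 2)

def scad_deriv(t, lam, a=3.7):
    """SCAD derivative p'_lam(t); usable as LLA reweighting factors."""
    t = np.abs(t)
    return lam * ((t <= lam) +
                  np.maximum(a * lam - t, 0) / ((a - 1) * lam) * (t > lam))
```

Small inputs incur the full LASSO slope $\lambda$, while inputs beyond $a\lambda$ (or $\gamma\lambda$) incur zero marginal penalty, which is exactly the "negligible penalty on large coefficients" property listed above.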
3. FDR Control via Knockoff Multi-Stage Selection
To control false discoveries, the penalized mRMR pipeline incorporates a multi-stage procedure using the knockoff filter. The procedure is:
(a) Knockoff construction: Generate, for each feature $X_j$, an auxiliary knockoff $\tilde{X}_j$ with coordinated statistical properties (same means and covariances).
(b) Statistic computation: For each feature, compute the knockoff statistic $W_j = |\hat{\beta}_j| - |\hat{\beta}_{\tilde{j}}|$, where $\hat{\beta}_j$ is the estimated coefficient for $X_j$ and $\hat{\beta}_{\tilde{j}}$ that for $\tilde{X}_j$.
(c) Thresholding: Set a level $\alpha$ for FDR control, and select features such that $W_j \ge \tau$, where $\tau$ is the minimal threshold ensuring

$$\frac{1 + \#\{ j : W_j \le -\tau \}}{\max\big(1,\; \#\{ j : W_j \ge \tau \}\big)} \le \alpha.$$
This adaptive procedure ensures—conditional on the screening—that the expected FDR among the selected features does not exceed the user-specified level $\alpha$.
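Given fitted coefficients for the original features and their knockoffs, steps (b)–(c) reduce to a few lines. This is a minimal sketch of the standard knockoff+ thresholding rule, with variable names of our choosing:

```python
import numpy as np

def knockoff_select(beta_orig, beta_knock, alpha=0.1):
    """Select features via the knockoff+ threshold at target FDR level alpha."""
    W = np.abs(beta_orig) - np.abs(beta_knock)   # knockoff statistics W_j
    # Candidate thresholds are the magnitudes of the nonzero statistics.
    candidates = np.sort(np.abs(W[W != 0]))
    for tau in candidates:                        # ascending: first hit is minimal
        fdp_hat = (1 + np.sum(W <= -tau)) / max(1, np.sum(W >= tau))
        if fdp_hat <= alpha:
            return np.where(W >= tau)[0]          # indices of selected features
    return np.array([], dtype=int)                # no threshold achieves level alpha
```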
In high-dimensional settings ($2p > n$), a data splitting step is used: pre-screening is run on one part of the data to reduce the feature set so that $2p < n$ holds before knockoff construction, and the screened features are merged with the main data for final selection, as sketched below.
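A schematic of this split-screen-select pipeline, under the assumption that screening ranks features by marginal relevance. The helpers `hsic_v` and `knockoff_select` are the earlier sketches, and `fit_with_knockoffs` is a hypothetical placeholder for fitting the penalized model on features and knockoffs jointly:

```python
import numpy as np

def split_screen_select(X, y, alpha=0.1, seed=0):
    """Illustrative pipeline: screen on one half so 2k < n, select on the rest."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    idx = rng.permutation(n)
    screen_idx, select_idx = idx[: n // 2], idx[n // 2 :]

    # Screening half: keep the top-k features by marginal relevance so that
    # 2k < n_select holds before knockoff construction.
    k = max(1, len(select_idx) // 2 - 1)
    rel = np.array([hsic_v(X[screen_idx, j], y[screen_idx]) for j in range(p)])
    kept = np.argsort(rel)[::-1][:k]

    # Selection half: fit the penalized mRMR model with knockoffs on the kept
    # features (fit_with_knockoffs is a placeholder for that fitting step).
    beta_orig, beta_knock = fit_with_knockoffs(X[np.ix_(select_idx, kept)],
                                               y[select_idx])
    return kept[knockoff_select(beta_orig, beta_knock, alpha)]
```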
4. Comparison with HSIC-LASSO and Related Methods
Penalized mRMR shares a conceptual structure with other dependency-based sparse methods such as HSIC-LASSO. Both:
- Use a kernel-based measure of feature–target dependence.
- Penalize redundancy via pairwise similarity in the quadratic term.
- Rely on $\ell_1$ or nonconvex penalties for sparsity.
Distinctive aspects of penalized mRMR:
- Tends to be more conservative in the number of retained features, yielding sparser models.
- Feature selection depends only on specifying an FDR threshold $\alpha$, not an explicit number of features.
- Use of nonconvex penalties ensures better support recovery compared to HSIC-LASSO's LASSO-only regime, especially under strong feature correlations.
Simulation studies and real high-dimensional biological data show that penalized mRMR usually selects a smaller (or comparable) set of active features with similar classification accuracy and improved FDR control compared to HSIC-LASSO.
5. Practical Implementation and Usage
Penalty/solver: LASSO-penalized variants are convex and can be implemented using standard solvers (e.g., CVXPY). Nonconvex penalties (SCAD, MCP) require custom solvers and are commonly handled with the local linear approximation (LLA) algorithm, which can be initialized with the LASSO path.
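For the convex LASSO variant, a direct CVXPY formulation is straightforward. In this sketch the redundancy matrix `Q` and relevance vector `rel` are assumed precomputed (e.g., with the HSIC estimator from Section 1), and a Cholesky factor expresses the quadratic term in solver-friendly form:

```python
import numpy as np
import cvxpy as cp

def solve_lasso_mrmr(Q, rel, lam):
    """Solve the LASSO-penalized mRMR objective with nonnegative coefficients."""
    p = len(rel)
    # Q is a Gram matrix of dependency estimates, hence PSD (jitter for safety);
    # Q = C C^T lets us write beta^T Q beta as ||C^T beta||^2.
    C = np.linalg.cholesky(Q + 1e-10 * np.eye(p))
    beta = cp.Variable(p, nonneg=True)
    objective = cp.Minimize(0.5 * cp.sum_squares(C.T @ beta)
                            - rel @ beta + lam * cp.norm1(beta))
    cp.Problem(objective).solve()
    return beta.value
```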
Parameter selection:
- $\lambda$ (regularization strength) and $\alpha$ (FDR threshold) are chosen via cross-validation or a hold-out validation set.
- If the knockoff filter finds no active features at a candidate FDR level, the threshold is relaxed and the procedure is repeated, as in the sketch after this list.
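A hedged sketch of that relax-and-repeat loop, reusing the `knockoff_select` helper from Section 3 (the grid of candidate levels is an illustrative choice):

```python
import numpy as np

def select_with_relaxation(beta_orig, beta_knock, levels=(0.05, 0.1, 0.2, 0.3)):
    """Try increasingly permissive FDR levels until some feature is selected."""
    for alpha in levels:
        selected = knockoff_select(beta_orig, beta_knock, alpha)
        if len(selected) > 0:
            return selected, alpha
    return np.array([], dtype=int), None  # nothing selected at any level
```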
Association measure: The method is agnostic to the association measure $d$—projection correlation and normalized HSIC are both shown to work well.
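For example, a normalized variant of the HSIC estimator from Section 1 can be dropped in as $\widehat{d}$ without changing anything else (a minimal sketch; the normalization by the marginal HSIC terms is the standard one):

```python
import numpy as np

def nhsic_v(x, y, sigma=1.0):
    """Normalized HSIC: HSIC(x, y) / sqrt(HSIC(x, x) * HSIC(y, y))."""
    num = hsic_v(x, y, sigma)
    denom = np.sqrt(hsic_v(x, x, sigma) * hsic_v(y, y, sigma))
    return num / denom if denom > 0 else 0.0
```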
High-dimensional adaptation: For $p \gg n$, data splitting and screening are essential to satisfy the model-X knockoff's requirement ($2p < n$).

Software: A reference implementation is provided at https://github.com/PeterJackNaylor/SmRMR (Naylor et al., 26 Aug 2025).

6. Empirical Performance and Applications

Empirical evaluation on synthetic processes (linear, nonlinear, discrete response) and real-world datasets (gene expression, GWAS, high-dimensional images) demonstrates that penalized mRMR typically selects a smaller (or comparable) set of active features while maintaining classification accuracy and controlling the FDR. This suggests penalized mRMR is especially suitable in scientific settings where controlling the number of discoveries is critical and model parsimony is valued. A plausible implication is that advances in knockoff generation for $p \gg n$ and scalable nonconvex optimization will further enhance the applicability of penalized mRMR to ultra-high-dimensional biological and sensor data.

In summary, the penalized mRMR approach reinterprets feature selection as a continuous, sparsity-inducing optimization problem in which feature importance is determined through the joint minimization of redundancy and maximization of target relevance—augmented by explicit FDR control in the presence of correlation structure—offering a robust platform for discovery-oriented variable selection in modern high-dimensional data regimes (Naylor et al., 26 Aug 2025).
7. Limitations and Considerations