Weighted and Penalized mRMR
- Weighted and Penalized mRMR is a continuous, weight-based feature selection method that balances relevance and redundancy through a quadratic objective.
- It incorporates nonconvex penalties such as SCAD and MCP to enforce sparsity and accurately recover active features in high-dimensional settings.
- The approach employs a two-stage knockoff+ procedure to control the false discovery rate and ensure oracle screening properties.
Weighted and penalized minimum Redundancy Maximum Relevance (mRMR), formalized in the SmRMR framework, denotes a class of feature selection methods that unifies three components: continuous, weight-based, redundancy-aware variable screening; sparsity-inducing nonconvex penalization; and formal false discovery rate (FDR) control through a multi-stage knockoff procedure. SmRMR is motivated by the need for scalable, model-free selection of relevant features in ultra-high-dimensional datasets, particularly where classical mRMR is computationally prohibitive and provides no explicit statistical error control (Naylor et al., 26 Aug 2025).
1. Continuous Weighted mRMR Objective
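SmRMR generalizes the classic discrete mRMR approach by introducing a continuous optimization framework with feature-wise nonnegative weights. Let $X_1, \dots, X_p$ denote the features and $Y$ the target. Define an association measure $D(\cdot,\cdot)$ that is nonnegative and zero iff its arguments are independent (e.g., the Hilbert-Schmidt Independence Criterion (HSIC) or Projection Correlation). The model constructs two arrays of pairwise associations:
- $c \in \mathbb{R}^p$ with $c_k = D(X_k, Y)$ (feature-response associations)
- $Q \in \mathbb{R}^{p \times p}$ with $Q_{kl} = D(X_k, X_l)$ (feature-feature associations)
The continuous, unpenalized mRMR objective is
$$\max_{\theta \ge 0} \; \theta^\top c - \tfrac{1}{2}\,\theta^\top Q\,\theta,$$
which corresponds to minimizing
$$L(\theta) = -\theta^\top c + \tfrac{1}{2}\,\theta^\top Q\,\theta \quad \text{over } \theta \in \mathbb{R}^p_{\ge 0}.$$
The feature weights $\theta_k$ directly capture each variable's overall redundancy-aware utility.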
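The construction above can be sketched in a few lines. This is a minimal illustration, assuming a biased Gaussian-kernel HSIC estimator as the association measure and reading the quadratic objective as "negative relevance plus half the weighted redundancy"; the names `association_arrays`, `mrmr_loss`, `c`, and `Q` are illustrative and are not taken from the SmRMR code base.

```python
import numpy as np

def gaussian_gram(v, sigma=1.0):
    """Gaussian-kernel Gram matrix of a 1-D sample v (length n)."""
    d2 = (v[:, None] - v[None, :]) ** 2
    return np.exp(-d2 / (2.0 * sigma ** 2))

def hsic(u, v, sigma=1.0):
    """Biased empirical HSIC, trace(K H L H) / n^2, which is nonnegative."""
    n = len(u)
    H = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    Kc = H @ gaussian_gram(u, sigma) @ H
    Lc = H @ gaussian_gram(v, sigma) @ H
    return np.sum(Kc * Lc) / n ** 2

def association_arrays(X, y):
    """Relevance vector c (feature vs. target) and redundancy matrix Q (feature vs. feature)."""
    p = X.shape[1]
    c = np.array([hsic(X[:, k], y) for k in range(p)])
    Q = np.array([[hsic(X[:, k], X[:, l]) for l in range(p)] for k in range(p)])
    return c, Q

def mrmr_loss(theta, c, Q):
    """Assumed quadratic form: reward relevance, penalize pairwise redundancy."""
    return -theta @ c + 0.5 * theta @ Q @ theta
```

A weight vector that puts mass on two strongly associated features pays for it through the `Q` cross term, which is how the objective discourages redundant selections.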
2. Nonconvex Penalization and Sparse Optimization
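To induce sparsity (i.e., automatic exclusion of inactive features), SmRMR augments the unpenalized objective, written $L(\theta)$ here, with a componentwise penalty:
$$\min_{\theta \ge 0} \; L(\theta) + \sum_{k=1}^{p} p_\lambda(\theta_k),$$
where $p_\lambda$ is a nonconvex regularizer with regularization parameter $\lambda > 0$, such as:
- SCAD (Smoothly Clipped Absolute Deviation; param $a > 2$)
- MCP (Minimax Concave Penalty; param $b > 0$)
The explicit forms, stated through the derivative for $t \ge 0$, are:
| Penalty | Derivative $p'_\lambda(t)$ |
|---|---|
| SCAD | $\lambda$, if $t \le \lambda$ <br>$(a\lambda - t)/(a-1)$, if $\lambda < t \le a\lambda$ <br>$0$, if $t > a\lambda$ |
| MCP | $\lambda - t/b$, if $t \le b\lambda$ <br>$0$, if $t > b\lambda$ |
Asymptotically, $a \to \infty$ or $b \to \infty$ recovers the LASSO. Nonconvex penalties sharpen the zeroing of truly inactive coefficients, and the sparsity pattern of the estimate $\hat\theta$ gives the support of relevant features.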
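For concreteness, the standard SCAD and MCP derivatives can be written as short vectorized functions. This is a sketch using the textbook definitions of the two penalties; the default shape parameters (`a=3.7`, `b=3.0`) are conventional choices from the wider literature and are assumptions here, not values taken from the source.

```python
import numpy as np

def scad_derivative(t, lam, a=3.7):
    """SCAD derivative: lam on [0, lam], linear decay on (lam, a*lam], then 0."""
    t = np.abs(np.asarray(t, dtype=float))
    return np.where(t <= lam, lam, np.maximum(a * lam - t, 0.0) / (a - 1.0))

def mcp_derivative(t, lam, b=3.0):
    """MCP derivative: lam - t/b on [0, b*lam], then 0."""
    t = np.abs(np.asarray(t, dtype=float))
    return np.maximum(lam - t / b, 0.0)
```

Both derivatives vanish for large arguments, so large weights are not shrunk, whereas the LASSO derivative stays at `lam` everywhere; letting `a` or `b` grow makes the two functions approach that constant.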
3. Numerical Algorithm: Local Linear Approximation
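SmRMR uses the Local Linear Approximation (LLA) algorithm to handle the nonconvexity of SCAD/MCP:
- Initialize $\theta^{(0)}$ by solving the nonnegative LASSO (i.e., $p_\lambda(t) = \lambda t$).
- Iterate for $m = 1, 2, \dots$:
- (a) Compute the weights $w_k^{(m)} = p'_\lambda(\theta_k^{(m-1)})$,
- (b) Solve the convex weighted-LASSO problem with penalty $\sum_k w_k^{(m)} \theta_k$ to obtain $\theta^{(m)}$ (via coordinate descent or QP),
- (c) If $\|\theta^{(m)} - \theta^{(m-1)}\| \le \varepsilon$ for a tolerance $\varepsilon$, terminate.
- Return $\theta^{(m)}$.
Typically, only a few iterations suffice for convergence. This iterative reweighting adapts the penalty to the current coefficient estimates, promoting sharper recovery of the feature support.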
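The loop above can be sketched as follows. This is an illustrative implementation under stated assumptions, not the authors' code: the loss is taken to be the quadratic form $-\theta^\top c + \tfrac12 \theta^\top Q \theta$ with a symmetric positive semidefinite `Q` whose diagonal is strictly positive, the inner solver is plain cyclic coordinate descent, and all function names, iteration caps, and tolerances are hypothetical.

```python
import numpy as np

def scad_derivative(t, lam, a=3.7):
    """Standard SCAD derivative (a=3.7 is a conventional default, assumed here)."""
    t = np.abs(np.asarray(t, dtype=float))
    return np.where(t <= lam, lam, np.maximum(a * lam - t, 0.0) / (a - 1.0))

def weighted_nonneg_lasso(c, Q, w, n_sweeps=500, tol=1e-10):
    """Coordinate descent for min_{theta>=0} -c'theta + 0.5 theta'Q theta + sum_k w_k theta_k."""
    theta = np.zeros(len(c))
    for _ in range(n_sweeps):
        old = theta.copy()
        for k in range(len(c)):
            # Gradient contribution of all other coordinates.
            others = Q[k] @ theta - Q[k, k] * theta[k]
            # Exact one-dimensional minimizer, projected onto theta_k >= 0.
            theta[k] = max((c[k] - w[k] - others) / Q[k, k], 0.0)
        if np.max(np.abs(theta - old)) < tol:
            break
    return theta

def lla(c, Q, lam, penalty_derivative=scad_derivative, max_iter=10, eps=1e-8):
    """Local Linear Approximation: repeatedly solve reweighted nonnegative LASSO problems."""
    theta = weighted_nonneg_lasso(c, Q, np.full(len(c), lam))   # LASSO initialization
    for _ in range(max_iter):
        w = penalty_derivative(theta, lam)                      # step (a)
        new = weighted_nonneg_lasso(c, Q, w)                    # step (b)
        if np.linalg.norm(new - theta) <= eps:                  # step (c)
            return new
        theta = new
    return theta
```

On a toy problem with `Q` equal to the identity, `c = [2.0, 0.3, 0.0]`, and `lam = 0.5`, the LASSO start shrinks the first weight to 1.5, while the reweighted iterations remove that shrinkage and keep the two weak features at zero.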
4. Multi-Stage Knockoff+ Filter for FDR Control
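SmRMR integrates a two-stage knockoff pipeline to ensure explicit control of the false discovery rate at a user-specified level $\alpha$:
Stage 1: Screening.
- Randomly split the data into two disjoint subsamples of sizes $n_1$ and $n_2$.
- Apply penalized mRMR to the $n_1$ samples; retain a working set $\hat{S}_1$ of size $d$, with $d$ small enough relative to $n_2$ for the knockoff stage.
Stage 2: Knockoff+ Selection.
- Generate model-X knockoff variables $\tilde{X}_k$ for $k \in \hat{S}_1$ (e.g., using the equi-correlated construction).
- On the $n_2$ samples, solve penalized mRMR over all $X_k$ and $\tilde{X}_k$, yielding weights $\hat\theta_k$, $\tilde\theta_k$.
- Compute $W_k = \hat\theta_k - \tilde\theta_k$.
- Define the knockoff+ threshold at FDR level $\alpha$:
$$T_\alpha = \min\left\{ t > 0 : \frac{1 + \#\{k : W_k \le -t\}}{\max(1, \#\{k : W_k \ge t\})} \le \alpha \right\}$$
- Select $\hat{S} = \{k : W_k \ge T_\alpha\}$. Theoretical guarantees (Thm 3.3) ensure $\mathrm{FDR} \le \alpha$.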
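The final thresholding step can be illustrated with the standard knockoff+ rule. This sketch assumes the statistics `W` have already been computed as the difference between each feature's weight and its knockoff's weight; the function names are illustrative and the knockoff generation itself is not shown.

```python
import numpy as np

def knockoff_plus_threshold(W, alpha):
    """Smallest t among the nonzero |W_k| whose estimated FDP is at most alpha."""
    W = np.asarray(W, dtype=float)
    for t in np.sort(np.unique(np.abs(W[W != 0]))):
        # The "+1" in the numerator is what distinguishes knockoff+ from plain knockoff.
        fdp_hat = (1 + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if fdp_hat <= alpha:
            return t
    return np.inf  # no threshold meets the target level

def knockoff_plus_select(W, alpha):
    """Indices of features whose statistic reaches the knockoff+ threshold."""
    W = np.asarray(W, dtype=float)
    return np.flatnonzero(W >= knockoff_plus_threshold(W, alpha))
```

When no candidate threshold satisfies the bound, the threshold is infinite and the selection is empty, which is the situation that motivates relaxing the FDR level in practice.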
5. Oracle Screening Properties and Theoretical Guarantees
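Consider a sequence of problems with $p_n$ features, a true active set $\mathcal{A}$ of size $s_n$, and oracle pseudo-truth $\theta^*$ (with $\theta^*_k > 0$ for $k \in \mathcal{A}$, and $\theta^*_k = 0$ otherwise). Key assumptions include uniqueness of $\theta^*$, bounded eigenvalues of the feature-feature association matrix, and regularity of the penalty:
- Minimum signal: a lower bound on $\min_{k \in \mathcal{A}} \theta^*_k$
Under these, Theorem 3.1 shows the existence of a local minimizer $\hat\theta$ of the penalized objective that is consistent for $\theta^*$.
Theorem 3.2 (sparsistency) asserts that, for SCAD/MCP and under suitable rate conditions,
$$P\big(\hat\theta_k = 0 \ \text{for all } k \notin \mathcal{A}\big) \to 1,$$
i.e., all inactive features are correctly excluded as $n \to \infty$ (Naylor et al., 26 Aug 2025). This provides formal oracle-screening guarantees in high-dimensional regimes.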
6. FDR Threshold Choice and Output Guarantees
The FDR level $\alpha$ is a direct control lever for model complexity. Practical choices of $\alpha$ trade sparsity against power: lower values yield sparser selections and lower FDP but may reduce power. If no variables are selected for a conservative $\alpha$, a variant “SmRMR” procedure increments $\alpha$ until at least one selection is made, trading a marginal FDR increase for guaranteed nonempty sets. Cross-validation of the tuning parameter is performed only during the knockoff stage, not at initial screening.
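The relaxation idea can be sketched as a small wrapper around knockoff+ selection. The increment `step`, the cap `alpha_max`, and the function names are assumptions made for illustration; the source does not state how the level is incremented.

```python
import numpy as np

def select_at_level(W, alpha):
    """Knockoff+ selection at a fixed level; returns an empty array if nothing passes."""
    W = np.asarray(W, dtype=float)
    for t in np.sort(np.unique(np.abs(W[W != 0]))):
        if (1 + np.sum(W <= -t)) / max(1, np.sum(W >= t)) <= alpha:
            return np.flatnonzero(W >= t)
    return np.array([], dtype=int)

def select_with_relaxed_alpha(W, alpha, step=0.05, alpha_max=1.0):
    """Raise the level in fixed increments until the selection is nonempty (hypothetical scheme)."""
    i = 0
    while alpha + i * step <= alpha_max + 1e-12:
        level = alpha + i * step
        selected = select_at_level(W, level)
        if selected.size > 0:
            return selected, level   # report the level actually used
        i += 1
    return np.array([], dtype=int), alpha_max
```

Returning the level that was actually used makes the trade-off explicit: any selection obtained this way is controlled at the relaxed level rather than at the one originally requested.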
7. Empirical Comparisons and Implementation Considerations
SmRMR with SCAD/MCP matches the predictive accuracy of HSIC-LASSO while selecting fewer features (by a factor of 2 or more) and keeping the FDR below the target level $\alpha$, a more conservative selection than HSIC-LASSO, which often has higher FDP. Classical discrete mRMR is impractical for large $p$ and less effective in redundancy control. On GWAS and high-dimensional biological datasets, SmRMR yields similar or better test performance than HSIC-LASSO with fewer selected SNPs/genes, simplifying interpretation.
Key implementation considerations:
- Each mRMR fit is dominated by the cost of building the association matrices (kernel or $U$-statistic construction); block HSIC or projection correlation can accelerate screening.
- The knockoff stage requires enough samples relative to the number of candidate features, necessitating the two-stage split; data recycling may increase power.
- Projection correlation (PC) may be preferable to HSIC for heavy-tailed or non-Euclidean data.
Code is publicly available at https://github.com/PeterJackNaylor/SmRMR (Naylor et al., 26 Aug 2025).