Weighted and Penalized mRMR

Updated 4 March 2026

Weighted and Penalized mRMR is a continuous, weight-based feature selection method that balances relevance and redundancy through a quadratic objective.
It incorporates nonconvex penalties such as SCAD and MCP to enforce sparsity and accurately recover active features in high-dimensional settings.
The approach employs a two-stage knockoff+ procedure to control the false discovery rate, and the penalized estimator satisfies oracle screening properties.

Weighted and penalized minimum Redundancy Maximum Relevance (mRMR), formalized in the SmRMR framework, is a class of feature selection methods that combines continuous, weight-based, redundancy-aware variable screening with sparsity-inducing nonconvex penalization and formal false discovery rate (FDR) control through a multi-stage knockoff procedure. SmRMR is motivated by the need for scalable, model-free selection of relevant features in ultra-high-dimensional datasets, where classical mRMR is computationally prohibitive and provides no explicit statistical error control (Naylor et al., 26 Aug 2025).

1. Continuous Weighted mRMR Objective

SmRMR generalizes the classic discrete mRMR approach by introducing a continuous optimization framework with feature-wise nonnegative weights. Let $X_1,\ldots,X_p$ denote $p$ features and $Y$ the target. Define an association measure $D(\cdot,\cdot)$ that is nonnegative and zero if and only if its arguments are independent (e.g., the Hilbert-Schmidt Independence Criterion (HSIC) or Projection Correlation). The model constructs a relevance vector and a redundancy matrix:

$R_{yX} \in \mathbb{R}^p$ with $(R_{yX})_j = D(X_j, Y)$ (feature-response associations)
$R_{XX} \in \mathbb{R}^{p \times p}$ with $(R_{XX})_{j,k} = D(X_j, X_k)$ (feature-feature associations)

The continuous, unpenalized mRMR objective is

$\max_{w \geq 0} Q(w) := w^\top R_{yX} - \frac{1}{2} w^\top R_{XX} w,$

which corresponds to minimizing

$\mathcal{L}_0(w) := -w^\top R_{yX} + \frac{1}{2} w^\top R_{XX} w.$

Feature weights $w = (w_1,\ldots,w_p)^\top \in \mathbb{R}_+^p$ directly capture each variable's overall redundancy-aware utility.
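The construction of $R_{yX}$, $R_{XX}$, and $Q(w)$ can be sketched as follows. This is a minimal illustration, assuming the biased (V-statistic) HSIC estimator with a Gaussian kernel of fixed bandwidth as the association measure $D$; the function names `centered_gram`, `association_matrices`, and `mrmr_objective` and the bandwidth default are hypothetical choices for this example, not part of the SmRMR code base.

```python
import numpy as np

def centered_gram(x, bandwidth=1.0):
    """Doubly centered Gaussian Gram matrix H K H for a 1-D sample x."""
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    K = np.exp(-((x - x.T) ** 2) / (2.0 * bandwidth ** 2))
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H

def association_matrices(X, y, bandwidth=1.0):
    """Return (R_yX, R_XX) using biased HSIC: trace(K H L H) / n^2."""
    n, p = X.shape
    grams = np.stack([centered_gram(X[:, j], bandwidth) for j in range(p)])
    gram_y = centered_gram(y, bandwidth)
    flat = grams.reshape(p, -1)           # each row is one flattened Gram matrix
    R_yX = flat @ gram_y.ravel() / n**2   # HSIC(X_j, Y) for every j
    R_XX = flat @ flat.T / n**2           # HSIC(X_j, X_k) for every pair
    return R_yX, R_XX

def mrmr_objective(w, R_yX, R_XX):
    """Continuous mRMR objective Q(w) = w^T R_yX - 0.5 * w^T R_XX w."""
    w = np.asarray(w, dtype=float)
    return float(w @ R_yX - 0.5 * w @ R_XX @ w)
```

With this particular estimator, the entries of $R_{XX}$ are Frobenius inner products of centered Gram matrices, so $R_{XX}$ is positive semidefinite and $Q(w)$ is concave in $w$.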

2. Nonconvex Penalization and Sparse Optimization

To induce sparsity (i.e., automatic exclusion of inactive features), SmRMR augments $\mathcal{L}_0(w)$ with a componentwise penalty:

$\min_{w \geq 0} \mathcal{L}(w) := \mathcal{L}_0(w) + \sum_{j=1}^p P_\lambda(w_j),$

where $P_\lambda$ is a nonconvex regularizer such as:

SCAD (Smoothly Clipped Absolute Deviation; parameter $a>2$)
MCP (Minimax Concave Penalty; parameter $b>0$)

The explicit forms are:

Penalty	$P_\lambda(x)$	Derivative $\partial P_\lambda(x)$
SCAD	$\lambda x$ for $0\leq x\leq\lambda$; $\frac{-x^2 + 2a\lambda x - \lambda^2}{2(a-1)}$ for $\lambda < x \leq a\lambda$; $\frac{(a+1)\lambda^2}{2}$ for $x>a\lambda$	$\lambda\,\mathbf{1}_{x\leq\lambda} + \frac{(a\lambda - x)_+}{a-1}\,\mathbf{1}_{x>\lambda}$
MCP	$\lambda x - \frac{x^2}{2b}$ for $0\leq x\leq b\lambda$; $\frac{b\lambda^2}{2}$ for $x>b\lambda$	$(\lambda - x/b)_+$

Asymptotically, $a\rightarrow\infty$ or $b\rightarrow\infty$ recovers LASSO. Nonconvex penalties sharpen zeroing of truly inactive coefficients, with the sparsity pattern in $w$ providing the support of relevant features.
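The penalty formulas above translate directly into vectorized functions. The sketch below assumes nonnegative inputs (as the weights satisfy $w \geq 0$); the function names and the default values `a=3.7` and `b=3.0` are illustrative assumptions, not values prescribed by the source.

```python
import numpy as np

def scad_penalty(x, lam, a=3.7):
    """SCAD penalty for x >= 0 (requires a > 2)."""
    x = np.asarray(x, dtype=float)
    linear = lam * x
    quadratic = (-x**2 + 2 * a * lam * x - lam**2) / (2 * (a - 1))
    constant = (a + 1) * lam**2 / 2
    return np.where(x <= lam, linear, np.where(x <= a * lam, quadratic, constant))

def scad_derivative(x, lam, a=3.7):
    """Derivative of SCAD: lam on [0, lam], then linearly decaying to 0 at a*lam."""
    x = np.asarray(x, dtype=float)
    return np.where(x <= lam, lam, np.maximum(a * lam - x, 0.0) / (a - 1))

def mcp_penalty(x, lam, b=3.0):
    """MCP penalty for x >= 0 (requires b > 0)."""
    x = np.asarray(x, dtype=float)
    return np.where(x <= b * lam, lam * x - x**2 / (2 * b), b * lam**2 / 2)

def mcp_derivative(x, lam, b=3.0):
    """Derivative of MCP: (lam - x / b)_+."""
    x = np.asarray(x, dtype=float)
    return np.maximum(lam - x / b, 0.0)
```

Both derivatives equal $\lambda$ at $x = 0$ and vanish for large $x$, which is what lets large weights escape shrinkage while small ones are still pushed to zero.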

3. Numerical Algorithm: Local Linear Approximation

SmRMR leverages the Local Linear Approximation (LLA) algorithm to address the nonconvexity of SCAD/MCP:

Initialize $w^{(0)}$ by solving the nonnegative LASSO (i.e., $P_\lambda(x)=\lambda x$).
Iterate for $s = 1, \dots, M$:
- (a) Compute $v_j = \partial P_\lambda(w_j^{(s-1)})$ for $j = 1, \dots, p$,
- (b) Solve the convex weighted-LASSO problem $w^{(s)} = \arg\min_{w\geq 0} \mathcal{L}_0(w) + \sum_j v_j w_j$ (via coordinate descent or QP),
- (c) If $\|w^{(s)}-w^{(s-1)}\|_2 < \epsilon$, terminate.
Return the final iterate $w^{(s)}$.

Typically, $M=2$ suffices for convergence. This iterative reweighting adapts the penalty based on the current coefficient estimates, promoting sharper recovery of the feature support.
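The LLA loop can be sketched with a simple coordinate-descent solver for the inner weighted problem. This is an illustrative implementation under stated assumptions: the diagonal of $R_{XX}$ is strictly positive, the MCP derivative is used as the default reweighting rule, and the names `weighted_nonneg_lasso` and `lla` are hypothetical. The coordinate update follows from setting the partial derivative of $\mathcal{L}_0(w) + \sum_j v_j w_j$ to zero and projecting onto $w_j \geq 0$.

```python
import numpy as np

def mcp_derivative(x, lam, b=3.0):
    """MCP derivative (lam - x / b)_+, used to reweight the penalty."""
    return np.maximum(lam - np.asarray(x, dtype=float) / b, 0.0)

def weighted_nonneg_lasso(R_yX, R_XX, v, w_init=None, n_sweeps=200, tol=1e-10):
    """Coordinate descent for min_{w>=0} -w^T R_yX + 0.5 w^T R_XX w + sum_j v_j w_j."""
    p = len(R_yX)
    w = np.zeros(p) if w_init is None else np.array(w_init, dtype=float)
    for _ in range(n_sweeps):
        w_old = w.copy()
        for j in range(p):
            # Gradient contribution of all other coordinates.
            others = R_XX[j] @ w - R_XX[j, j] * w[j]
            w[j] = max(0.0, (R_yX[j] - v[j] - others) / R_XX[j, j])
        if np.linalg.norm(w - w_old) < tol:
            break
    return w

def lla(R_yX, R_XX, lam, penalty_derivative=mcp_derivative, max_iter=2, eps=1e-8):
    """Local Linear Approximation: LASSO initialization, then reweighted solves."""
    p = len(R_yX)
    w = weighted_nonneg_lasso(R_yX, R_XX, np.full(p, lam))  # nonnegative LASSO
    for _ in range(max_iter):
        v = penalty_derivative(w, lam)                       # step (a)
        w_new = weighted_nonneg_lasso(R_yX, R_XX, v, w_init=w)  # step (b)
        converged = np.linalg.norm(w_new - w) < eps          # step (c)
        w = w_new
        if converged:
            break
    return w
```

As a toy check with $R_{XX} = I$, $R_{yX} = (1, 0.05, 0)$, and $\lambda = 0.1$, the LASSO initialization gives $(0.9, 0, 0)$, and the first reweighted solve removes the shrinkage on the large coordinate to give $(1, 0, 0)$.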

4. Multi-Stage Knockoff+ Filter for FDR Control

SmRMR integrates a two-stage knockoff pipeline to ensure explicit control of the false discovery rate at a user-specified level $\alpha$.

Stage 1: Screening.

Randomly split the data into two disjoint parts of sizes $n_0$ and $n_1$.
Apply penalized mRMR to the $n_0$ samples; retain a working set $S_0$ of size $s_0$ with $2s_0 < n_1$.
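The bookkeeping of the screening stage can be sketched as follows. The split fraction, the helper names `split_indices` and `top_weights`, and the rule of keeping the largest positive weights are assumptions made for illustration; the source only states that $S_0$ has size $s_0$ with $2s_0 < n_1$.

```python
import numpy as np

def split_indices(n, frac_screen=0.5, seed=0):
    """Randomly split n sample indices into a screening part and a knockoff part.

    Also returns the largest s0 satisfying 2 * s0 < n1.
    """
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n)
    n0 = int(round(frac_screen * n))
    idx0, idx1 = perm[:n0], perm[n0:]
    n1 = len(idx1)
    s0_max = (n1 - 1) // 2  # largest integer s0 with 2 * s0 < n1
    return idx0, idx1, s0_max

def top_weights(w, s0):
    """Indices of at most s0 features with the largest strictly positive weights."""
    w = np.asarray(w, dtype=float)
    order = np.argsort(-w)[:s0]
    return np.array([j for j in order if w[j] > 0], dtype=int)
```

For example, with $n = 100$ and an even split, $n_1 = 50$ and the working set may contain at most $24$ features.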

Stage 2: Knockoff+ Selection.

Generate model-X knockoff variables $\widetilde X_j$ for $j \in S_0$ (e.g., using the equi-correlated construction).
On the $n_1$ samples, solve penalized mRMR jointly over $X_{S_0}$ and $\widetilde X_{S_0}$, yielding weights $\widehat w_j$ for the original features and $\widetilde w_j$ for their knockoffs.
Compute the knockoff statistic $W_j = \widehat w_j - \widetilde w_j$ for each $j \in S_0$.
Define the knockoff+ threshold at FDR level $\alpha$:

$T(\alpha) = \min\left\{ t>0: \frac{1 + \#\{j: W_j\leq -t\}}{\max\{1,\#\{j:W_j\geq t\}\}} \leq \alpha \right\}$

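Select $\widehat S(\alpha) = \{ j \in S_0 : W_j \geq T(\alpha)\}$. Theorem 3.3 guarantees $E[\mathrm{FDP}\mid\text{all true features}\in S_0] \leq \alpha$, i.e., the expected false discovery proportion is controlled conditional on the screening stage having retained every true feature.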
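The threshold and selection rule can be sketched as below. The search is restricted to the nonzero values of $|W_j|$, which is where the ratio in $T(\alpha)$ can change; that restriction, the convention of returning infinity when no threshold qualifies, and the function names are assumptions of this example.

```python
import numpy as np

def knockoff_plus_threshold(W, alpha):
    """Smallest t in {|W_j| : W_j != 0} with (1 + #{W_j <= -t}) / max(1, #{W_j >= t}) <= alpha."""
    W = np.asarray(W, dtype=float)
    candidates = np.unique(np.abs(W[W != 0]))  # sorted ascending
    for t in candidates:
        ratio = (1 + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if ratio <= alpha:
            return float(t)
    return np.inf  # no threshold achieves the target level

def knockoff_plus_select(W, alpha):
    """Positions j (within the working set) with W_j >= T(alpha)."""
    W = np.asarray(W, dtype=float)
    return np.flatnonzero(W >= knockoff_plus_threshold(W, alpha))
```

For instance, with `W = [5, 4, 3, 2, 1.5, -1, 0.5, 0, 0]` and `alpha = 0.25`, the ratios at $t = 0.5$ and $t = 1$ are $2/6$ and $2/5$, and the first qualifying value is $t = 1.5$ with ratio $1/5$, so the first five features are selected. Because of the $+1$ in the numerator, at least $1/\alpha$ statistics must exceed the threshold for any selection to occur.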

5. Oracle Screening Properties and Theoretical Guarantees

Consider a sequence of problems with $p_n$ features, true active set $S_n$ of size $s_n$, and oracle pseudo-truth $\theta_n^0$ (with $\theta_{n,j}^0\neq 0$ for $j\leq s_n$ and $\theta_{n,j}^0 = 0$ otherwise). Key assumptions include uniqueness of $\theta_n^0$, bounded eigenvalues of $R_{XX}$, and regularity of the penalty:

$a_n := \max_{j\in S_n}\partial_2 P_\lambda(\theta_{n,j}^0)=O(n^{-1/2})$
$b_n := \max_{j\in S_n} \partial_{22}^2 P_\lambda(\theta_{n,j}^0)\to 0$
Minimum signal: $\min_{j\in S_n} \theta_{n,j}^0/\lambda_n \to \infty$

Under these, Theorem 3.1 shows existence of a local minimizer $\hat\theta_n$ with

$\|\hat\theta_n - \theta_n^0\|_2 = O_p\left( \sqrt{p_n s_n^2\log p_n} (n^{-1/2} + a_n) \right).$

Theorem 3.2 (sparsistency) asserts that, for SCAD/MCP, if $\lambda_n\to 0$ and $\sqrt{n/(p_ns_n^2\log p_n)} \lambda_n \to \infty$ ,

$P\left( \hat\theta_{n,j}=0 \text{ for all } j>s_n \right) \to 1,$

i.e., all inactive features are correctly excluded as $n\to\infty$ (Naylor et al., 26 Aug 2025). This provides formal oracle-screening guarantees in high-dimensional regimes.
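As a worked consequence of these conditions (a sketch using only the penalty derivatives given earlier, with $a$ and $b$ held fixed): the SCAD derivative and its second derivative vanish for $x > a\lambda$, and the same holds for MCP when $x > b\lambda$. The minimum-signal condition implies that, for $n$ large enough, $\theta_{n,j}^0 > a\lambda_n$ (respectively $> b\lambda_n$) for every $j \in S_n$, so

$a_n = 0, \quad b_n = 0 \quad \text{for all sufficiently large } n,$

and the bound of Theorem 3.1 reduces to

$\|\hat\theta_n - \theta_n^0\|_2 = O_p\left( \sqrt{p_n s_n^2 \log p_n / n} \right).$

For the LASSO penalty $P_\lambda(x) = \lambda x$, by contrast, $a_n = \lambda_n$. Requiring $a_n = O(n^{-1/2})$ then gives $\sqrt{n/(p_n s_n^2\log p_n)}\, \lambda_n = O\left( (p_n s_n^2 \log p_n)^{-1/2} \right)$, which stays bounded (for $p_n \geq 2$) rather than diverging as the condition of Theorem 3.2 requires.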

6. FDR Threshold Choice and Output Guarantees

The FDR level $\alpha$ directly controls model complexity. Practical choices are $\alpha \in [0.2, 0.4]$; lower values yield sparser selections and a lower FDP but may reduce power. If no variables are selected at a conservative $\alpha$, a variant procedure, SmRMR$_2$, increments $\alpha$ until at least one selection is made, trading a marginal FDR increase for a guaranteed nonempty set. Cross-validation for $\lambda_n$ is performed only during the knockoff stage, not at initial screening.
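The relaxation idea behind SmRMR$_2$ can be sketched as a loop over increasing levels. The step size of `0.05`, the cap `alpha_max`, and the function names are assumptions made for illustration, since the source does not specify how $\alpha$ is incremented; the knockoff+ rule is repeated here so the example is self-contained.

```python
import numpy as np

def knockoff_plus_select(W, alpha):
    """Knockoff+ selection over candidate thresholds {|W_j| : W_j != 0}."""
    W = np.asarray(W, dtype=float)
    for t in np.unique(np.abs(W[W != 0])):
        if (1 + np.sum(W <= -t)) / max(1, np.sum(W >= t)) <= alpha:
            return np.flatnonzero(W >= t)
    return np.array([], dtype=int)

def select_with_relaxation(W, alpha, step=0.05, alpha_max=1.0):
    """Raise alpha in fixed steps until the selection is nonempty.

    Returns the selected positions and the level at which they were obtained.
    """
    k = 0
    while alpha + k * step <= alpha_max:
        level = alpha + k * step
        selected = knockoff_plus_select(W, level)
        if selected.size > 0:
            return selected, level
        k += 1
    return np.array([], dtype=int), alpha_max
```

With `W = [3, 2, 1, -0.5]` and a starting level of `0.2`, the smallest attainable ratio is $1/3$, so the loop returns the first three features at a level of about `0.35`. The returned level, not the requested one, is the one to report alongside the selection.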

7. Empirical Comparisons and Implementation Considerations

SmRMR with SCAD/MCP matches the predictive accuracy of HSIC-LASSO while selecting $2$ to $4\times$ fewer features and keeping the FDR below $\alpha$; HSIC-LASSO is less conservative and often has a higher FDP. Classical discrete mRMR is impractical for large $p$ and less effective in redundancy control. On GWAS and high-dimensional biological datasets, SmRMR yields similar or better test performance than HSIC-LASSO with fewer selected SNPs/genes, simplifying interpretation.

Key implementation considerations:

Each mRMR fit incurs $O(n^2p)$ complexity (kernel or $V$-statistic construction); block HSIC or projection correlation can accelerate screening.
The knockoff+ stage requires $n_1>2s_0$, necessitating the two-stage split; data recycling may increase power.
Projection correlation (PC$^2$) may be preferable over HSIC for heavy-tailed or non-Euclidean data.

Code is publicly available at https://github.com/PeterJackNaylor/SmRMR (Naylor et al., 26 Aug 2025).

References

Naylor et al. (26 Aug 2025). Sparse minimum Redundancy Maximum Relevance for feature selection.
