Weighted and Penalized mRMR

Updated 4 March 2026
  • Weighted and Penalized mRMR is a continuous, weight-based feature selection method that balances relevance and redundancy through a quadratic objective.
  • It incorporates nonconvex penalties such as SCAD and MCP to enforce sparsity and accurately recover active features in high-dimensional settings.
  • The approach employs a two-stage knockoff+ procedure to control the false discovery rate, and under regularity conditions the penalized estimator enjoys oracle screening properties.

Weighted and penalized minimum Redundancy Maximum Relevance (mRMR), formalized in the SmRMR framework, denotes a class of feature selection methods that unifies continuous, weight-based, redundancy-aware variable screening with sparsity-inducing nonconvex penalization and formal false discovery rate (FDR) control through a multi-stage knockoff procedure. SmRMR is motivated by the need for scalable, model-free selection of relevant features in ultra-high-dimensional datasets, where classical mRMR is computationally prohibitive and provides no explicit statistical error control (Naylor et al., 26 Aug 2025).

1. Continuous Weighted mRMR Objective

SmRMR generalizes the classic discrete mRMR approach by introducing a continuous optimization framework with feature-wise nonnegative weights. Let $X_1,\ldots,X_p$ denote $p$ features and $Y$ the target. Define an association measure $D(\cdot,\cdot)$ that is nonnegative and equals zero if and only if its arguments are independent (e.g., the Hilbert-Schmidt Independence Criterion (HSIC) or projection correlation). The model constructs a vector and a matrix:

  • $R_{yX} \in \mathbb{R}^p$ with $(R_{yX})_j = D(X_j, Y)$ (feature-response associations)
  • $R_{XX} \in \mathbb{R}^{p \times p}$ with $(R_{XX})_{j,k} = D(X_j, X_k)$ (feature-feature associations)
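The two association objects can be computed directly from data. The following is a minimal sketch, assuming HSIC as $D$, a Gaussian kernel with a median-heuristic bandwidth, and the biased estimator $\operatorname{tr}(\tilde K \tilde L)/n^2$ on double-centered Gram matrices $\tilde K, \tilde L$. The helper names (`gaussian_gram`, `center`, `hsic_v`, `association_matrices`) are hypothetical and are not taken from the SmRMR code.

```python
import numpy as np


def gaussian_gram(x, sigma=None):
    """Gaussian-kernel Gram matrix of a 1-D sample; median heuristic if sigma is None."""
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    d2 = (x - x.T) ** 2
    if sigma is None:
        positive = d2[d2 > 0]
        # Median pairwise distance; fall back to 1.0 for a constant column.
        sigma = float(np.median(np.sqrt(positive))) if positive.size else 1.0
    return np.exp(-d2 / (2.0 * sigma**2))


def center(K):
    """Double-center a Gram matrix: H K H with H = I - 11^T / n."""
    n = K.shape[0]
    H = np.eye(n) - 1.0 / n
    return H @ K @ H


def hsic_v(Kc, Lc):
    """Biased (V-statistic) HSIC from two centered, symmetric Gram matrices: tr(Kc Lc) / n^2."""
    n = Kc.shape[0]
    return float(np.sum(Kc * Lc)) / n**2


def association_matrices(X, y):
    """Return R_yX (length p) and R_XX (p x p) using HSIC as the association D."""
    _, p = X.shape
    Kc = [center(gaussian_gram(X[:, j])) for j in range(p)]
    Lc = center(gaussian_gram(y))
    R_yX = np.array([hsic_v(Kc[j], Lc) for j in range(p)])
    R_XX = np.empty((p, p))
    for j in range(p):
        for k in range(j, p):
            R_XX[j, k] = R_XX[k, j] = hsic_v(Kc[j], Kc[k])
    return R_yX, R_XX


# Toy data: Y depends nonlinearly on X_0 only.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X[:, 0] ** 2 + 0.1 * rng.normal(size=200)
R_yX, R_XX = association_matrices(X, y)
```

Because each entry of $R_{XX}$ here is a Frobenius inner product of centered Gram matrices, the resulting matrix is positive semidefinite, so the quadratic objective below stays convex when this HSIC estimator is used.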

The continuous, unpenalized mRMR objective is

$$\max_{w \geq 0} Q(w) := w^\top R_{yX} - \frac{1}{2} w^\top R_{XX} w,$$

which corresponds to minimizing

$$\mathcal{L}_0(w) := -w^\top R_{yX} + \frac{1}{2} w^\top R_{XX} w.$$

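The linear term rewards association with $Y$, while the quadratic term penalizes placing weight on mutually associated features. The feature weights $w = (w_1,\ldots,w_p)^\top \in \mathbb{R}_+^p$ therefore capture each variable's overall redundancy-aware utility.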
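A two-feature calculation, with hypothetical association values chosen only for illustration, shows how the quadratic term handles redundancy. Suppose both features are equally relevant, $R_{yX} = (r, r)^\top$ with $r > 0$, and $R_{XX}$ has diagonal entries $d > 0$ and off-diagonal entry $\rho \in [0, d)$:

$$
\begin{aligned}
Q(w) &= r(w_1 + w_2) - \tfrac{1}{2}\left(d\,w_1^2 + 2\rho\, w_1 w_2 + d\,w_2^2\right),\\
\nabla Q(w) = 0 &\iff
\begin{cases} r - d\,w_1 - \rho\, w_2 = 0\\ r - \rho\, w_1 - d\,w_2 = 0 \end{cases}
\iff w_1^\star = w_2^\star = \frac{r}{d + \rho}.
\end{aligned}
$$

Both weights are positive, so the constraint $w \geq 0$ is inactive, and since $R_{XX}$ has eigenvalues $d \pm \rho > 0$, this stationary point is the maximizer. A single feature on its own would receive weight $r/d$; as $\rho \to d$ (near-duplicate features), each weight tends to $r/(2d)$, so the redundant pair shares roughly the weight one feature would get instead of doubling it.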

2. Nonconvex Penalization and Sparse Optimization

To induce sparsity (i.e., automatic exclusion of inactive features), SmRMR augments $\mathcal{L}_0(w)$ with a componentwise penalty:

$$\min_{w \geq 0} \mathcal{L}(w) := \mathcal{L}_0(w) + \sum_{j=1}^p P_\lambda(w_j),$$

where $P_\lambda$ is a nonconvex regularizer such as:

  • SCAD (Smoothly Clipped Absolute Deviation; parameter $a>2$)
  • MCP (Minimax Concave Penalty; parameter $b>0$)

The explicit forms are:

| Penalty | $P_\lambda(x)$ | Derivative $\partial P_\lambda(x)$ |
|---|---|---|
| SCAD | $\lambda x$, $0\leq x\leq\lambda$<br>$\dfrac{-x^2 + 2a\lambda x - \lambda^2}{2(a-1)}$, $\lambda < x \leq a\lambda$<br>$\dfrac{(a+1)\lambda^2}{2}$, $x>a\lambda$ | $\lambda\,\mathbf{1}_{x\leq\lambda} + \dfrac{(a\lambda - x)_+}{a-1}\,\mathbf{1}_{x>\lambda}$ |
| MCP | $\lambda x - \dfrac{x^2}{2b}$, $0\leq x\leq b\lambda$<br>$\dfrac{b\lambda^2}{2}$, $x>b\lambda$ | $(\lambda - x/b)_+$ |

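As $a\to\infty$ (SCAD) or $b\to\infty$ (MCP), both penalties reduce to the LASSO penalty $\lambda x$. For finite $a$ or $b$, the penalty derivative decreases to zero for large weights, so strong signals are not shrunk the way the LASSO shrinks them, while small weights are still thresholded to exactly zero. The sparsity pattern of $w$ gives the support of the selected features.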
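The two penalties and their derivatives translate directly into vectorized functions. This is a minimal NumPy sketch; the defaults $a = 3.7$ (the value conventionally suggested for SCAD) and $b = 3$ are illustrative assumptions, not settings taken from SmRMR.

```python
import numpy as np


def scad(x, lam, a=3.7):
    """SCAD penalty P_lambda(x) for x >= 0 (absolute value taken for safety)."""
    x = np.abs(np.asarray(x, dtype=float))
    middle = (-x**2 + 2 * a * lam * x - lam**2) / (2 * (a - 1))
    return np.where(x <= lam, lam * x,
                    np.where(x <= a * lam, middle, (a + 1) * lam**2 / 2))


def scad_deriv(x, lam, a=3.7):
    """SCAD derivative: lam on [0, lam], then (a*lam - x)_+ / (a - 1)."""
    x = np.abs(np.asarray(x, dtype=float))
    return np.where(x <= lam, lam, np.maximum(a * lam - x, 0.0) / (a - 1))


def mcp(x, lam, b=3.0):
    """MCP penalty: lam*x - x^2/(2b) on [0, b*lam], constant b*lam^2/2 beyond."""
    x = np.abs(np.asarray(x, dtype=float))
    return np.where(x <= b * lam, lam * x - x**2 / (2 * b), b * lam**2 / 2)


def mcp_deriv(x, lam, b=3.0):
    """MCP derivative: (lam - x/b)_+."""
    x = np.abs(np.asarray(x, dtype=float))
    return np.maximum(lam - x / b, 0.0)
```

Evaluating `scad(x, lam, a)` or `mcp(x, lam, b)` with a very large shape parameter (e.g. `1e6`) returns values close to `lam * x`, which is the LASSO limit; with the defaults, `scad_deriv` is exactly zero beyond `a * lam`, which is what removes the bias on strong signals.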

3. Numerical Algorithm: Local Linear Approximation

SmRMR uses the Local Linear Approximation (LLA) algorithm to handle the nonconvexity of SCAD/MCP:

  1. Initialize $w^{(0)}$ by solving the nonnegative LASSO (i.e., $P_\lambda(x)=\lambda x$).
  2. For $s=1,\dots,M$:
    • (a) compute $v_j = \partial P_\lambda(w_j^{(s-1)})$;
    • (b) solve the convex weighted-LASSO problem $\min_{w\geq 0} \mathcal{L}_0(w) + \sum_j v_j w_j$ (via coordinate descent or QP) to obtain $w^{(s)}$;
    • (c) if $\|w^{(s)}-w^{(s-1)}\|_2 < \epsilon$, terminate.
  3. Return $w^{(s)}$.

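Typically, $M=2$ iterations suffice. Each iteration linearizes the concave penalty at the current estimate, so features with large weights receive little or no penalty in the next solve while features with small weights remain fully penalized, which sharpens recovery of the feature support.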
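The LLA loop can be sketched in a few lines. This is a minimal illustration under stated assumptions: the inner convex problem is solved by cyclic coordinate descent (the source allows coordinate descent or QP), the penalty is SCAD with $a = 3.7$, and the function names `weighted_nn_lasso` and `lla_smrmr` are hypothetical rather than part of the SmRMR code.

```python
import numpy as np


def scad_deriv(x, lam, a=3.7):
    """SCAD penalty derivative for x >= 0."""
    x = np.asarray(x, dtype=float)
    return np.where(x <= lam, lam, np.maximum(a * lam - x, 0.0) / (a - 1))


def weighted_nn_lasso(r, R, v, w0=None, max_sweeps=1000, tol=1e-10):
    """Coordinate descent for min_{w>=0} -w'r + 0.5 w'Rw + sum_j v_j w_j.

    Assumes R is symmetric positive semidefinite with a positive diagonal.
    """
    p = len(r)
    w = np.zeros(p) if w0 is None else np.array(w0, dtype=float)
    for _ in range(max_sweeps):
        max_change = 0.0
        for j in range(p):
            # Contribution of all other coordinates to the j-th gradient entry.
            cross = R[j] @ w - R[j, j] * w[j]
            # Exact 1-D minimizer of the convex quadratic, clipped at zero.
            new = max(0.0, (r[j] - v[j] - cross) / R[j, j])
            max_change = max(max_change, abs(new - w[j]))
            w[j] = new
        if max_change < tol:
            break
    return w


def lla_smrmr(r, R, lam, deriv=scad_deriv, M=2, eps=1e-8):
    """LLA: nonnegative-LASSO start, then up to M reweighted convex solves."""
    p = len(r)
    w = weighted_nn_lasso(r, R, np.full(p, lam))   # step 1: P(x) = lam * x
    for _ in range(M):
        v = deriv(w, lam)                          # step 2(a): penalty slopes
        w_new = weighted_nn_lasso(r, R, v, w0=w)   # step 2(b): convex solve
        converged = np.linalg.norm(w_new - w) < eps
        w = w_new
        if converged:                              # step 2(c): stop early
            break
    return w


# With R = I the problem separates: the LASSO start shrinks the two strong
# weights by lam, and the SCAD reweighting step removes that shrinkage.
r = np.array([1.0, 0.8, 0.05, 0.0])
w_hat = lla_smrmr(r, np.eye(4), lam=0.1)
```

In this toy case the nonnegative LASSO start is $(0.9, 0.7, 0, 0)$; the SCAD slopes at the two large weights are zero, so the next solve returns the unshrunk values $(1.0, 0.8, 0, 0)$ while the weak third feature stays at zero.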

4. Multi-Stage Knockoff+ Filter for FDR Control

SmRMR integrates a two-stage knockoff pipeline that explicitly controls the false discovery rate at a user-specified level $\alpha$.

Stage 1: Screening.

  • Randomly split the $n$ samples into two disjoint parts of sizes $n_0$ and $n_1$.
  • Apply penalized mRMR to the $n_0$ samples and retain a working set $S_0$ of size $s_0$ with $2s_0 < n_1$.
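A sketch of the Stage 1 bookkeeping follows. The source does not specify how $S_0$ is extracted from the Stage 1 weights; this illustration assumes it keeps up to $s_0$ features with the largest positive weights. Both helper names, `split_samples` and `working_set`, are hypothetical.

```python
import numpy as np


def split_samples(n, n0, rng):
    """Randomly split sample indices into disjoint stage-1 (n0) and stage-2 (n - n0) sets."""
    perm = rng.permutation(n)
    return perm[:n0], perm[n0:]


def working_set(w, s0, n1):
    """Keep up to s0 features with the largest positive stage-1 weights (assumed rule)."""
    if 2 * s0 >= n1:
        raise ValueError("the knockoff stage needs 2 * s0 < n1")
    w = np.asarray(w, dtype=float)
    active = np.flatnonzero(w > 0)
    ranked = active[np.argsort(-w[active], kind="stable")]
    return np.sort(ranked[:s0])


rng = np.random.default_rng(0)
idx0, idx1 = split_samples(n=100, n0=40, rng=rng)
# Stage-1 weights would come from a penalized mRMR fit on the idx0 rows.
S0 = working_set([0.0, 0.5, 0.1, 0.9, 0.0, 0.3], s0=3, n1=len(idx1))
```

The explicit check on $2s_0 < n_1$ mirrors the requirement stated above; it is what makes the augmented Stage 2 problem, with $2s_0$ original and knockoff variables, well posed on $n_1$ samples.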

Stage 2: Knockoff+ Selection.

  • Generate model-X knockoff variables $\widetilde X_j$ for $j \in S_0$ (e.g., using the equi-correlated construction).
  • On the $n_1$ held-out samples, solve penalized mRMR over the $2s_0$ augmented variables $X_{S_0}$ and $\widetilde X_{S_0}$, yielding weights $\widehat w_j$ and $\widetilde w_j$.
  • Compute $W_j = \widehat w_j - \widetilde w_j$; large positive values indicate that the original feature outperforms its knockoff.
  • Define the knockoff+ threshold at FDR level $\alpha$:

$$T(\alpha) = \min\left\{ t>0: \frac{1 + \#\{j: W_j\leq -t\}}{\max\{1,\#\{j:W_j\geq t\}\}} \leq \alpha \right\}$$

  • Select $\widehat S(\alpha) = \{ j \in S_0 : W_j \geq T(\alpha)\}$. Theoretical guarantees (Thm 3.3) ensure $E[\mathrm{FDP}\mid\text{all true features}\in S_0] \leq \alpha$.
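The threshold and selection rule can be sketched as follows. The function names and the example weights are hypothetical; the only design choice beyond the formula is to search $t$ over the observed magnitudes $\{|W_j| : W_j \neq 0\}$, as in the standard knockoff+ filter.

```python
import numpy as np


def knockoff_plus_threshold(W, alpha):
    """Smallest t in {|W_j| : W_j != 0} with (1 + #{W_j <= -t}) / max(1, #{W_j >= t}) <= alpha."""
    W = np.asarray(W, dtype=float)
    for t in np.unique(np.abs(W[W != 0])):  # np.unique returns sorted values
        fdp_hat = (1 + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if fdp_hat <= alpha:
            return float(t)
    return np.inf  # no feasible threshold: nothing is selected


def knockoff_select(w_orig, w_knock, S0, alpha):
    """Compute W_j = w_hat_j - w_tilde_j and return (selected subset of S0, threshold)."""
    W = np.asarray(w_orig, dtype=float) - np.asarray(w_knock, dtype=float)
    T = knockoff_plus_threshold(W, alpha)
    return np.asarray(S0)[W >= T], T


# Hypothetical Stage 2 weights for a working set of six features.
S0 = [10, 11, 12, 13, 14, 15]
selected, T = knockoff_select([5, 4, 3, 2, 1, 0], [0, 0, 0, 0, 0, 0.5], S0, alpha=0.2)
```

Because both counts only change at the observed magnitudes $|W_j|$, restricting the search to those values yields the same selected set as minimizing over all $t > 0$. The $+1$ in the numerator makes the filter conservative when few features pass: with five positive statistics, no threshold can certify an estimated FDP below $1/5$, so the same weights with $\alpha = 0.1$ select nothing.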

5. Oracle Screening Properties and Theoretical Guarantees

Consider a sequence of problems with $p_n$ features, true active set $S_n = \{1,\ldots,s_n\}$ of size $s_n$, and oracle pseudo-truth $\theta_n^0$ (with $\theta_{n,j}^0\neq 0$ for $j\leq s_n$ and $\theta_{n,j}^0 = 0$ otherwise). Key assumptions include uniqueness of $\theta_n^0$, bounded eigenvalues of $R_{XX}$, and regularity of the penalty:

  • $a_n := \max_{j\in S_n}\partial_2 P_\lambda(\theta_{n,j}^0)=O(n^{-1/2})$
  • $b_n := \max_{j\in S_n} \partial_{22}^2 P_\lambda(\theta_{n,j}^0)\to 0$
  • Minimum signal: $\min_{j\in S_n} \theta_{n,j}^0/\lambda_n \to \infty$

Under these conditions, Theorem 3.1 establishes the existence of a local minimizer $\hat\theta_n$ with

$$\|\hat\theta_n - \theta_n^0\|_2 = O_p\left( \sqrt{p_n s_n^2\log p_n}\, (n^{-1/2} + a_n) \right).$$

Theorem 3.2 (sparsistency) states that, for SCAD/MCP, if $\lambda_n\to 0$ and $\sqrt{n/(p_ns_n^2\log p_n)}\, \lambda_n \to \infty$, then

$$P\left( \hat\theta_{n,j}=0 \text{ for all } j>s_n \right) \to 1,$$

i.e., all inactive features are correctly excluded with probability tending to one as $n\to\infty$ (Naylor et al., 26 Aug 2025). This provides formal oracle-screening guarantees in high-dimensional regimes.

6. FDR Threshold Choice and Output Guarantees

The FDR level $\alpha$ directly controls model complexity. Practical choices are $\alpha \in [0.2, 0.4]$; lower values yield sparser selections and lower FDP but may reduce power. If no variables are selected at a conservative $\alpha$, a variant, SmRMR$_2$, increases $\alpha$ until at least one variable is selected, trading a marginal increase in FDR for a guaranteed nonempty set. Cross-validation for $\lambda_n$ is performed only during the knockoff stage, not during initial screening.
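The SmRMR$_2$ fallback can be sketched as a loop over an increasing grid of $\alpha$ values. The grid step, the cap `alpha_max`, and the function names are hypothetical; the source does not state how $\alpha$ is incremented.

```python
import numpy as np


def knockoff_plus_threshold(W, alpha):
    """Knockoff+ threshold over t in {|W_j| : W_j != 0}; np.inf if none qualifies."""
    W = np.asarray(W, dtype=float)
    for t in np.unique(np.abs(W[W != 0])):
        if (1 + np.sum(W <= -t)) / max(1, np.sum(W >= t)) <= alpha:
            return float(t)
    return np.inf


def select_nonempty(W, alpha0, step=0.05, alpha_max=1.0):
    """Raise alpha on a grid (hypothetical step) until at least one W_j passes.

    Returns (selected indices, alpha used), or (empty array, None) if even
    alpha_max selects nothing.
    """
    W = np.asarray(W, dtype=float)
    n_steps = int(np.floor((alpha_max - alpha0) / step + 1e-9))
    for k in range(n_steps + 1):
        alpha = alpha0 + k * step  # computed from k to avoid accumulating rounding error
        selected = np.flatnonzero(W >= knockoff_plus_threshold(W, alpha))
        if selected.size > 0:
            return selected, alpha
    return np.array([], dtype=int), None


# With five positive statistics, alpha = 0.1 selects nothing; the loop stops at 0.2.
selected, alpha_used = select_nonempty([5, 4, 3, 2, 1, -0.5], alpha0=0.1)
```

A finer grid stops closer to the smallest feasible level; the attained FDR guarantee then corresponds to `alpha_used` rather than the original target.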

7. Empirical Comparisons and Implementation Considerations

SmRMR with SCAD/MCP matches the predictive accuracy of HSIC-LASSO while selecting 2 to 4 times fewer features and keeping the FDR below $\alpha$; its selections are more conservative than those of HSIC-LASSO, which often has a higher FDP. Classical discrete mRMR is impractical for large $p$ and less effective at controlling redundancy. On GWAS and other high-dimensional biological datasets, SmRMR achieves similar or better test performance than HSIC-LASSO with fewer selected SNPs/genes, which simplifies interpretation.

Key implementation considerations:

  • Each mRMR fit costs $O(n^2p)$ (kernel or $V$-statistic construction); block HSIC or projection correlation can accelerate screening.
  • The knockoff stage requires $n_1>2s_0$, which necessitates the two-stage split; data recycling may increase power.
  • Projection correlation (PC$^2$) may be preferable to HSIC for heavy-tailed or non-Euclidean data.
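One way a per-evaluation cost linear in $p$ can arise for HSIC: the biased HSIC between two features is a Frobenius inner product of their centered Gram matrices $\tilde K_j$, so the quadratic form $w^\top R_{XX} w = \|\sum_j w_j \tilde K_j\|_F^2 / n^2$ can be evaluated in $O(n^2 p)$ time without forming the $p \times p$ matrix. This is an illustration assuming a Gaussian kernel with unit bandwidth; the function names are hypothetical, and it is not a description of how the SmRMR code is organized.

```python
import numpy as np


def centered_gram(x, sigma=1.0):
    """Centered Gaussian Gram matrix H K H for a 1-D sample (unit bandwidth assumed)."""
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    K = np.exp(-((x - x.T) ** 2) / (2 * sigma**2))
    n = K.shape[0]
    H = np.eye(n) - 1.0 / n
    return H @ K @ H


def quad_form_explicit(Kcs, w):
    """w' R_XX w after building the full p x p HSIC matrix: O(n^2 p^2)."""
    n, p = Kcs[0].shape[0], len(Kcs)
    R = np.array([[np.sum(Kcs[j] * Kcs[k]) for k in range(p)] for j in range(p)]) / n**2
    return float(w @ R @ w)


def quad_form_fast(Kcs, w):
    """Same value via ||sum_j w_j K_j||_F^2 / n^2: O(n^2 p), no p x p matrix."""
    n = Kcs[0].shape[0]
    S = np.zeros((n, n))
    for wj, Kc in zip(w, Kcs):
        S += wj * Kc
    return float(np.sum(S * S)) / n**2


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 8))
Kcs = [centered_gram(X[:, j]) for j in range(X.shape[1])]
w = rng.uniform(size=8)
```

The trade-off is memory: keeping the $p$ centered Gram matrices requires $O(n^2 p)$ storage.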

Code is publicly available at https://github.com/PeterJackNaylor/SmRMR (Naylor et al., 26 Aug 2025).
