Minimum Redundancy Maximum Relevance

Updated 26 May 2026
  • Minimum Redundancy Maximum Relevance (mRMR) is a method that selects features by maximizing mutual information with the target while minimizing redundancy among features.
  • It is typically implemented with greedy forward selection, and scalable implementations, including distributed frameworks, let it handle high-dimensional datasets efficiently.
  • mRMR has proven effective in diverse applications such as genomics, biomedical analysis, and remote sensing, improving predictive accuracy with fewer, more informative features.

Minimum Redundancy Maximum Relevance (mRMR) is a foundational information-theoretic approach for feature selection that aims to identify compact subsets of features which maximize predictive relevance for a target variable while simultaneously minimizing redundancy among themselves. mRMR is widely used across machine learning domains, prominently in genomics, biomedical data analysis, remote sensing, functional data analysis, high-dimensional benchmarking, and interpretable model composition. Its operational core is jointly maximizing the mutual information between selected features and the response (maximum relevance), while penalizing or minimizing the aggregate mutual information among selected features (minimum redundancy).

1. Mathematical Formulation and Selection Objective

mRMR formalizes feature selection via the simultaneous optimization of two criteria over subsets $S$ of $m$ features from a candidate pool $F$:

  • Relevance:

$$D(S) = \frac{1}{|S|}\sum_{f_i \in S} I(f_i; c)$$

where $I(f_i; c)$ is the mutual information between feature $f_i$ and the class or target variable $c$.

  • Redundancy:

$$R(S) = \frac{1}{|S|^2}\sum_{f_i, f_j \in S} I(f_i; f_j)$$

quantifying the average pairwise mutual information among features in $S$.

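mRMR then optimizes a scalar objective over candidate sets $S$. The classical choice is the difference $\max_S \, [D(S) - R(S)]$, the "mutual information difference" (MID); alternatively, some variants employ the quotient $\max_S \, [D(S)/R(S)]$ to yield the "mutual information quotient" (MIQ) (Barker et al., 2024, Bowyer et al., 25 May 2026).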
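To make the two objectives concrete, the sketch below evaluates $D(S)$, $R(S)$, MID and MIQ for a given subset, assuming the mutual information values have already been estimated. The relevance vector and MI matrix are hypothetical numbers chosen for illustration, and `mrmr_objectives` is a hypothetical helper. Following the $1/|S|^2$ double sum above, the diagonal entries $I(f_i; f_i)$ are included in $R(S)$; whether to include them is a convention choice.

```python
import numpy as np

def mrmr_objectives(relevance, mi_matrix, subset):
    """Return D(S), R(S), MID = D - R and MIQ = D / R for the feature indices in `subset`.

    relevance[i] holds I(f_i; c); mi_matrix[i, j] holds I(f_i; f_j).
    R(S) averages over all |S|^2 ordered pairs, including the diagonal i == j.
    """
    idx = np.asarray(subset)
    d = float(relevance[idx].mean())               # D(S): mean relevance
    r = float(mi_matrix[np.ix_(idx, idx)].mean())  # R(S): mean pairwise MI
    return d, r, d - r, d / r

# Hypothetical, pre-estimated MI values for three features.
relevance = np.array([0.8, 0.6, 0.5])
mi_matrix = np.array([[1.0, 0.7, 0.1],
                      [0.7, 0.9, 0.2],
                      [0.1, 0.2, 0.8]])

for subset in ([0, 1], [0, 2]):
    d, r, mid, miq = mrmr_objectives(relevance, mi_matrix, subset)
    print(subset, f"D={d:.3f} R={r:.3f} MID={mid:.3f} MIQ={miq:.3f}")
```

With these numbers, $\{f_0, f_2\}$ scores higher than $\{f_0, f_1\}$ under both MID and MIQ even though $\{f_0, f_1\}$ has higher mean relevance, because $f_0$ and $f_1$ share much of their information.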

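For computational tractability, mRMR is generally implemented via a sequential forward-selection strategy. Given the set $S_{m-1}$ of $m-1$ features already selected, the next feature $f_j$ is chosen by

$$\max_{f_j \in F \setminus S_{m-1}} \left[ I(f_j; c) - \frac{1}{m-1}\sum_{f_i \in S_{m-1}} I(f_j; f_i) \right]$$

and $f_j$ is greedily added to $S_{m-1}$ to form $S_m$ (Barker et al., 2024, Mehrabi et al., 2023, Ebiele et al., 30 Mar 2026, Elmaizi et al., 2022).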
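A single step of this incremental criterion can be sketched as follows, again assuming pre-estimated MI values (the numbers are hypothetical). `greedy_step` is an illustrative helper, not a library function; with an empty $S$ it falls back to pure relevance, since there is no redundancy term to average.

```python
import numpy as np

def greedy_step(relevance, mi_matrix, selected):
    """Return the next feature index under the incremental MID criterion:
    argmax over unselected j of I(f_j; c) - mean_{i in S} I(f_j; f_i)."""
    candidates = [j for j in range(len(relevance)) if j not in selected]
    if not selected:
        # No redundancy term yet: choose the most relevant feature.
        return max(candidates, key=lambda j: relevance[j])
    # mi_matrix[j, selected] gathers I(f_j; f_i) for every f_i already in S.
    return max(candidates, key=lambda j: relevance[j] - mi_matrix[j, selected].mean())

# Hypothetical, pre-estimated MI values for three features.
relevance = np.array([0.8, 0.6, 0.5])
mi_matrix = np.array([[1.0, 0.7, 0.1],
                      [0.7, 0.9, 0.2],
                      [0.1, 0.2, 0.8]])

selected = []
while len(selected) < 3:
    selected.append(greedy_step(relevance, mi_matrix, selected))
print(selected)
```

Feature 2 is picked second despite being less relevant than feature 1, because feature 1 is largely redundant with the already-selected feature 0.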

2. Algorithmic Realizations, Scalability, and Extensions

Greedy Forward Selection

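The canonical procedure iteratively builds $S$ via:

  1. Initialization: $S = \emptyset$.
  2. Add the feature $f_j \in F$ with the highest relevance $I(f_j; c)$.
  3. At each subsequent iteration, select the unselected feature $f_j \in F \setminus S$ maximizing $I(f_j; c) - \frac{1}{|S|}\sum_{f_i \in S} I(f_j; f_i)$ and add it to $S$.
  4. Repeat until $|S| = m$.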
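The four steps above can be sketched end to end for discrete data. This assumes integer-coded features and a plug-in (empirical) MI estimate, and it recomputes every redundancy term at each step to mirror the listed procedure literally; `mutual_info` and `mrmr_select` are hypothetical helper names, and the toy data are constructed so that feature 1 duplicates feature 0.

```python
import numpy as np
from collections import Counter

def mutual_info(x, y):
    """Plug-in (empirical) mutual information in nats between two discrete 1-D arrays."""
    x, y = np.asarray(x).tolist(), np.asarray(y).tolist()
    n = len(x)
    joint, px, py = Counter(zip(x, y)), Counter(x), Counter(y)
    return sum(c / n * np.log(c * n / (px[a] * py[b])) for (a, b), c in joint.items())

def mrmr_select(X, y, m):
    """Greedy mRMR (MID form), recomputing all redundancy terms at every step."""
    p = X.shape[1]
    relevance = [mutual_info(X[:, j], y) for j in range(p)]
    selected = [int(np.argmax(relevance))]    # steps 1-2: start from the most relevant feature
    while len(selected) < m:                  # step 4: stop once |S| = m
        best_j, best_score = None, -np.inf
        for j in range(p):                    # step 3: score every unselected feature
            if j in selected:
                continue
            redundancy = np.mean([mutual_info(X[:, j], X[:, i]) for i in selected])
            score = relevance[j] - redundancy
            if score > best_score:
                best_j, best_score = j, score
        selected.append(best_j)
    return selected

# Toy data: feature 1 duplicates feature 0; feature 2 is a weaker but different view of y.
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
f0 = np.array([0, 0, 0, 1, 1, 1, 1, 1])
f2 = np.array([0, 0, 0, 0, 0, 0, 1, 1])
X = np.column_stack([f0, f0.copy(), f2])
print(mrmr_select(X, y, 2))
```

Ranking by relevance alone would pick features 0 and 1, which carry identical information; the redundancy term steers the second pick to feature 2.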

This process is linear in the number of candidate features and quadratic in the size of the selected set. It is tractable for moderate $|F|$ but becomes a bottleneck in ultra-high-dimensional settings because of the large number of mutual information computations it requires (Mehrabi et al., 2023, Reggiani et al., 2017, Vivek et al., 2022).
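One way to see where the quadratic dependence on $m$ comes from, and how it can be avoided, is to cache each candidate's running redundancy sum so that every step needs only the MI with the feature added last. This caching strategy is an illustrative assumption, not necessarily what the cited implementations do. With $p = |F|$ candidates, recomputing everything costs $p + \sum_{k=1}^{m-1} k(p-k)$ MI evaluations, while the cached version below costs $p + \sum_{k=1}^{m-1} (p-k)$; the sketch counts its evaluations to make this visible.

```python
import numpy as np
from collections import Counter

def mutual_info(x, y):
    """Plug-in (empirical) mutual information in nats between two discrete 1-D arrays."""
    x, y = np.asarray(x).tolist(), np.asarray(y).tolist()
    n = len(x)
    joint, px, py = Counter(zip(x, y)), Counter(x), Counter(y)
    return sum(c / n * np.log(c * n / (px[a] * py[b])) for (a, b), c in joint.items())

def mrmr_cached(X, y, m):
    """Greedy mRMR that keeps a running redundancy sum per candidate.
    Returns the selected indices and the number of MI evaluations performed."""
    p = X.shape[1]
    relevance = np.array([mutual_info(X[:, j], y) for j in range(p)])
    redundancy_sum = np.zeros(p)          # sum_{i in S} I(f_j; f_i), updated incrementally
    selected = [int(np.argmax(relevance))]
    mi_calls = p                          # one call per relevance term
    while len(selected) < m:
        last = selected[-1]
        best_j, best_score = None, -np.inf
        for j in range(p):
            if j in selected:
                continue
            redundancy_sum[j] += mutual_info(X[:, j], X[:, last])  # only the new pair
            mi_calls += 1
            score = relevance[j] - redundancy_sum[j] / len(selected)
            if score > best_score:
                best_j, best_score = j, score
        selected.append(best_j)
    return selected, mi_calls

y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
f0 = np.array([0, 0, 0, 1, 1, 1, 1, 1])
f2 = np.array([0, 0, 0, 0, 0, 0, 1, 1])
X = np.column_stack([f0, f0.copy(), f2])
print(mrmr_cached(X, y, 3))
```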

Distributed and Scalable Implementations

To address high dimensionality, distributed mRMR variants leverage MapReduce and Spark primitives, using either row-wise layouts (efficient for tall datasets) or column-wise/broadcast layouts (for wide, short datasets) (Reggiani et al., 2017, Vivek et al., 2022). These implementations cache entropy and MI computations, enable partitioned aggregation, and reduce communication overhead, yielding large runtime reductions (up to 97%, and 4–6× speedups over naïve versions).
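The row-wise idea can be imitated in plain Python: each row partition emits joint value counts (map), the counts are merged (reduce), and MI is computed once from the merged table. This is a single-process sketch of the pattern, not the Spark or MapReduce code of the cited works, and the function names are hypothetical. Because the plug-in estimate depends only on the merged counts, the result does not depend on how rows are partitioned.

```python
import numpy as np
from collections import Counter
from functools import reduce
from operator import add

def partition_counts(X_part, y_part, j):
    """Map step: joint counts of (x_j, y) values within one row partition."""
    return Counter(zip(X_part[:, j].tolist(), y_part.tolist()))

def mi_from_joint_counts(joint):
    """Plug-in MI (nats) from a Counter mapping (x, y) pairs to counts."""
    n = sum(joint.values())
    px, py = Counter(), Counter()
    for (a, b), c in joint.items():
        px[a] += c
        py[b] += c
    return sum(c / n * np.log(c * n / (px[a] * py[b])) for (a, b), c in joint.items())

def distributed_relevance(X, y, n_partitions):
    """Relevance I(f_j; c) for every feature, aggregated over row partitions."""
    row_blocks = np.array_split(np.arange(len(y)), n_partitions)
    relevance = []
    for j in range(X.shape[1]):
        partials = [partition_counts(X[rows], y[rows], j) for rows in row_blocks]  # map
        joint = reduce(add, partials)                                              # reduce
        relevance.append(mi_from_joint_counts(joint))
    return relevance

y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
X = np.column_stack([[0, 0, 0, 1, 1, 1, 1, 1], [0, 0, 0, 0, 0, 0, 1, 1]])
print(distributed_relevance(X, y, 3))
```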

Non-convex, Penalized, and Global Optima

Recent innovations include the continuous penalized mRMR (SmRMR), which solves a convex or nonconvex regularized minimization of an mRMR-inspired loss (incorporating, e.g., SCAD or MCP penalties for sparsity) (Naylor et al., 26 Aug 2025). Additionally, polyhedral relaxations yield provably optimal mixed-integer linear programming (MILP) formulations, enabling globally optimal mRMR feature sets for hundreds of features (He et al., 22 Aug 2025).
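For intuition about what "globally optimal" means here, the sketch below finds the best size-$m$ subset under the MID objective by brute-force enumeration. It is a stand-in, not the MILP formulation of He et al.: enumeration visits all $\binom{|F|}{m}$ subsets and is only usable for tiny problems, which is why exact formulations that scale further are of interest. The MI values and helper names are hypothetical.

```python
import itertools
import numpy as np

def mid_objective(relevance, mi_matrix, subset):
    """D(S) - R(S), with R(S) averaged over all |S|^2 ordered pairs."""
    idx = list(subset)
    return relevance[idx].mean() - mi_matrix[np.ix_(idx, idx)].mean()

def exhaustive_mrmr(relevance, mi_matrix, m):
    """Globally optimal size-m subset under MID, found by enumerating every subset."""
    return max(itertools.combinations(range(len(relevance)), m),
               key=lambda s: mid_objective(relevance, mi_matrix, s))

# Hypothetical, pre-estimated MI values for three features.
relevance = np.array([0.8, 0.6, 0.5])
mi_matrix = np.array([[1.0, 0.7, 0.1],
                      [0.7, 0.9, 0.2],
                      [0.1, 0.2, 0.8]])
print(exhaustive_mrmr(relevance, mi_matrix, 2))
```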

3. Mutual Information Estimation and Alternative Association Measures

Estimation of $I(\cdot\,;\cdot)$ depends on the data and the variable types:

  • Discrete variables: Use empirical plug-in/histogram estimates.
  • Continuous variables: Discretize into bins or use k-nearest-neighbor density estimators (KSG estimator, PCA-corrected KSG for continuous responses) (Bowyer et al., 25 May 2026, Schellhas et al., 2020).
  • Alternative measures: Distance correlation and related association statistics substitute for MI in some contexts (e.g., functional data analysis); they need no tuning parameters or smoothing and capture nonlinear dependence (Berrendero et al., 2015, Schellhas et al., 2020). Such variants can achieve higher accuracy and select fewer features than MI-based mRMR.
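The first two estimation routes can be sketched with NumPy: a plug-in estimate from the empirical joint distribution for discrete variables, and equal-width binning followed by the same plug-in estimate for a continuous feature. The helper names and the default of 10 bins are illustrative assumptions, and the bin choice changes the estimate. For k-nearest-neighbor estimates, scikit-learn's `mutual_info_classif` and `mutual_info_regression` are one readily available option.

```python
import numpy as np

def plugin_mi(x, y):
    """Plug-in MI (nats) for discrete 1-D arrays via the empirical joint distribution."""
    _, xi = np.unique(x, return_inverse=True)   # map values to 0..kx-1
    _, yi = np.unique(y, return_inverse=True)
    joint = np.zeros((xi.max() + 1, yi.max() + 1))
    np.add.at(joint, (xi, yi), 1)               # contingency table of counts
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0                                # skip empty cells (0 * log 0 = 0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

def binned_mi(x_cont, y, n_bins=10):
    """Discretize a continuous feature into equal-width bins, then use the plug-in estimate."""
    edges = np.histogram_bin_edges(x_cont, bins=n_bins)
    x_disc = np.digitize(x_cont, edges[1:-1])   # bin index 0..n_bins-1
    return plugin_mi(x_disc, y)

print(plugin_mi([0, 0, 1, 1], [0, 0, 1, 1]))
print(binned_mi(np.array([0.1, 0.2, 0.3, 0.9, 1.0, 1.1]), [0, 0, 0, 1, 1, 1], n_bins=2))
```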

4. Integration with Hybrid and Wrapper Methods

mRMR is often employed as a filter stage, followed by or in combination with wrapper-based (model-dependent) feature selection:

  • Hybrid frameworks: Combine mRMR with classifier-guided elimination (e.g., SVM-RFE). A convex combination of SVM weights and mRMR scores boosts predictive accuracy and yields more stable, interpretable subsets (Ding et al., 2024).
  • Metaheuristic wrappers: mRMR is paired with population-based optimizers (e.g., Binary Horse Herd Optimization) to restrict the wrapper search space, achieving efficient gene selection and improved accuracy (Mehrabi et al., 2023).
  • Multi-stage selection: Two-stage or staged filter methods (e.g., pre-pruning with maximum information gain, followed by mRMR, then a wrapper) significantly reduce computational cost and redundancy (Elmaizi et al., 2022).
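A minimal sketch of a combined ranking criterion of the first kind, assuming both score vectors are min-max scaled to [0, 1] and mixed with a weight `alpha`. The scaling, the name `alpha`, and the example numbers are hypothetical and not taken from Ding et al. In an RFE-style loop, the lowest-ranked feature would be eliminated and the scores recomputed on the remaining features.

```python
import numpy as np

def minmax(v):
    """Scale a vector to [0, 1]; a constant vector maps to zeros."""
    v = np.asarray(v, dtype=float)
    span = v.max() - v.min()
    return np.zeros_like(v) if span == 0 else (v - v.min()) / span

def hybrid_ranking(svm_weights, mrmr_scores, alpha=0.5):
    """Rank features by alpha * |w| + (1 - alpha) * mRMR score, both min-max scaled.
    alpha = 1 uses SVM weights only; alpha = 0 uses mRMR scores only."""
    combined = alpha * minmax(np.abs(svm_weights)) + (1 - alpha) * minmax(mrmr_scores)
    return np.argsort(-combined), combined      # best feature first

# Hypothetical linear-SVM weights and mRMR scores for four features.
svm_weights = [0.9, -0.1, 0.4, 0.05]
mrmr_scores = [0.2, 0.8, 0.5, 0.0]
for a in (1.0, 0.5, 0.0):
    print(a, hybrid_ranking(svm_weights, mrmr_scores, alpha=a)[0].tolist())
```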

5. Applications and Empirical Performance

mRMR has been applied across domains:

  • Biomedical and genomics: mRMR underpins robust gene selection with quantifiable improvements in SVM/RF accuracy and sharp reductions in dimensionality (Mehrabi et al., 2023, Elmaizi et al., 2022).
  • Emotion and signal recognition: In VR-based emotion recognition, mRMR reduces a 175-feature pupillometry representation to a critical 50-dimensional embedding, increasing classification accuracy from 85% to 98.8% (Barker et al., 2024).
  • Benchmarking LLMs: In LLM evaluation, mRMR-selected question subsets minimize RMSE and maximize rank correlation (Kendall's $\tau$, Spearman's $\rho$), outperforming AnchorPoints and IRT-based approaches and yielding much higher stability across random seeds (Bowyer et al., 25 May 2026).
  • Remote sensing: For hyperspectral imaging, mRMR, as part of a hybrid feature selection pipeline, achieves high accuracy with far fewer bands compared to information-gain alone or simple univariate filters (Elmaizi et al., 2022).

Empirical findings:

  • mRMR-based feature sets often reach or exceed the performance of full feature sets, with a much smaller subset (e.g., 7–11 features achieving better accuracy than 33 in power system transient stability assessment (Li et al., 2019); 14 selected from >11,000 radiomics descriptors maintaining cross-vendor AUC for SVM/RF models (Chaudhary et al., 2024)).
  • In large-scale benchmarking and biomedical applications, mRMR and its scalable variants make large-sample, high-dimensional analyses feasible and reproducible, with state-of-the-art classification and regression performance (Bowyer et al., 25 May 2026, Liu et al., 2022).
  • New univariate clustering-based variants, such as KGroups, approximate classical mRMR's performance while being two to three orders of magnitude faster, facilitating hyperparameter tuning and rapid prototyping (Ebiele et al., 30 Mar 2026).

6. Limitations, Modifications, and Recent Advances

Limitations

  • Computational cost: Classical mRMR requires a number of mutual information evaluations that grows linearly with $|F|$ and quadratically with $m$, and is infeasible for very large $|F|$ without distributed or approximate computation (Vivek et al., 2022).
  • Pairwise heuristics: The joint mutual information between the selected set and the target is not optimized directly; the greedy, pairwise approximation may miss higher-order, synergistic dependencies (Liu et al., 2022).
  • Estimation sensitivity: MI estimation for continuous data is sensitive to the binning scheme, kernel bandwidths, and sample size (Ding et al., 2024).

Modifications

  • Tradeoff tuning: Weighted formulations (e.g., a trade-off parameter that weights the redundancy term against the relevance term (Li et al., 2019)) enable fine-grained balancing of relevance and redundancy.
  • Augmentation with unique relevance: MRwMR-BUR integrates a "unique relevance" (UR) term, the mutual information between a feature and the target conditioned on all other features, to further emphasize non-redundant, indispensable predictors. This yields consistently smaller feature sets and 2–5% accuracy gains (Liu et al., 2022).
  • Penalized and FDR-controlled: SmRMR applies nonconvex penalties (SCAD, MCP) and interfaces with model-X knockoff filtering to achieve feature selection with statistical false discovery rate control, supporting both theoretical guarantees and empirical competitiveness with HSIC-LASSO (Naylor et al., 26 Aug 2025).
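The first two modifications can be sketched as follows. The trade-off weight is named `beta` here purely for illustration (the exact form used by Li et al. is not reproduced), and `beta = 1` recovers the standard incremental criterion. The unique-relevance function uses plug-in entropies with the identity $I(X;Y \mid Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)$; this is an assumption suited to small discrete toy data, not the estimator used by Liu et al., and plug-in estimates conditioned on many features become unreliable as the joint table grows sparse.

```python
import numpy as np
from collections import Counter

def weighted_step(relevance, mi_matrix, selected, beta=1.0):
    """Next feature under I(f_j; c) - beta * mean_{i in S} I(f_j; f_i); `selected` must be non-empty."""
    candidates = [j for j in range(len(relevance)) if j not in selected]
    return max(candidates, key=lambda j: relevance[j] - beta * mi_matrix[j, selected].mean())

def entropy(outcomes):
    """Plug-in entropy (nats) of a list of hashable outcomes."""
    counts = np.array(list(Counter(outcomes).values()), dtype=float)
    p = counts / counts.sum()
    return float(-(p * np.log(p)).sum())

def unique_relevance(X, y, k):
    """Plug-in UR(f_k) = I(f_k; c | all other features)."""
    z = [tuple(row) for row in np.delete(X, k, axis=1).tolist()]  # conditioning set
    x = X[:, k].tolist()
    c = np.asarray(y).tolist()
    return (entropy(list(zip(x, z))) + entropy(list(zip(c, z)))
            - entropy(list(zip(x, c, z))) - entropy(z))

# Trade-off: hypothetical MI values, with feature 0 already selected.
relevance = np.array([0.8, 0.6, 0.5])
mi_matrix = np.array([[1.0, 0.7, 0.1], [0.7, 0.9, 0.2], [0.1, 0.2, 0.8]])
print([weighted_step(relevance, mi_matrix, [0], beta=b) for b in (0.0, 0.5, 1.0)])

# Unique relevance: c = f0 XOR f1, and f2 duplicates f0.
X = np.array([[0, 0, 0], [0, 1, 0], [1, 0, 1], [1, 1, 1]])
y = X[:, 0] ^ X[:, 1]
print([unique_relevance(X, y, k) for k in range(3)])
```

In the toy data, feature 1 has zero marginal MI with $c$ yet a unique relevance of $\ln 2$, while the duplicated features 0 and 2 each have zero unique relevance because the other copy makes them dispensable.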

New Directions

  • MILP-based global optimization of the mRMR criterion enables provably optimal feature selection for moderately large candidate pools (on the order of hundreds of features) (He et al., 22 Aug 2025).
  • Distance correlation and kernel-based association statistics can replace MI to yield tuning-free, smoothing-free, and unbiased estimators, especially effective for functional and highly correlated data (Berrendero et al., 2015, Schellhas et al., 2020).
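As an illustration of the distance-correlation alternative, the sketch below computes the standard sample distance correlation (V-statistic form) of two 1-D samples. In an mRMR-style score it could stand in for $I(\cdot\,;\cdot)$ in both the relevance and the redundancy terms; that substitution and the helper name `distance_correlation` are illustrative assumptions, not the cited papers' exact procedures.

```python
import numpy as np

def distance_correlation(x, y):
    """Sample distance correlation (V-statistic form) of two 1-D samples; lies in [0, 1]."""
    x = np.asarray(x, dtype=float)[:, None]
    y = np.asarray(y, dtype=float)[:, None]
    a = np.abs(x - x.T)                      # pairwise distances |x_i - x_j|
    b = np.abs(y - y.T)
    # Double-center each distance matrix: subtract row and column means, add the grand mean.
    A = a - a.mean(axis=0) - a.mean(axis=1)[:, None] + a.mean()
    B = b - b.mean(axis=0) - b.mean(axis=1)[:, None] + b.mean()
    dcov2 = max((A * B).mean(), 0.0)         # squared distance covariance (clip rounding noise)
    dvar2 = (A * A).mean() * (B * B).mean()  # product of squared distance variances
    return 0.0 if dvar2 == 0 else float(np.sqrt(dcov2 / np.sqrt(dvar2)))

x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
print(np.corrcoef(x, x**2)[0, 1], distance_correlation(x, x**2))
print(distance_correlation(x, 2 * x + 1))
```

For $y = x^2$ on this symmetric grid the Pearson correlation is exactly zero while the distance correlation is clearly positive, which illustrates its sensitivity to nonlinear dependence; an exact linear relation gives a distance correlation of 1.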

7. Summary Table: Core mRMR Objective and Key Formulas

| Component | Mathematical expression | Key property |
|---|---|---|
| Relevance | $D(S) = \frac{1}{\lvert S \rvert}\sum_{f_i \in S} I(f_i; c)$ | MI between feature and target |
| Redundancy | $R(S) = \frac{1}{\lvert S \rvert^2}\sum_{f_i, f_j \in S} I(f_i; f_j)$ | Average MI among features |
| Difference criterion | $\max_S \, [D(S) - R(S)]$ | Classical "MID" scoring |
| Greedy update (new $f_j$) | $\max_{f_j \in F \setminus S} \left[ I(f_j; c) - \frac{1}{\lvert S \rvert}\sum_{f_i \in S} I(f_j; f_i) \right]$ | One-step incremental update |
| Quotient criterion | $\max_S \, [D(S) / R(S)]$ | Alternative to "MID" |
| Penalized SmRMR objective | mRMR-inspired loss over continuous feature weights plus a sparsity penalty (e.g., SCAD, MCP) | Penalized estimation (continuous weights) |
| Unique relevance (UR, BUR) | $\mathrm{UR}(f_k) = I(f_k; c \mid F \setminus \{f_k\})$ | Conditional MI given all other features (Liu et al., 2022) |


Conclusion

mRMR provides a theoretically principled, empirically validated, and highly extensible framework for feature selection in high-dimensional learning, unifying mutual-information-based relevance with redundancy penalization. Its modern developments, namely distributed implementations, penalized relaxations, alternative association measures, and unique-relevance boosting, address computational, inferential, and practical limitations and support its continued use in large-data, multi-source, and interpretable modeling scenarios.
