Minimum Redundancy Maximum Relevance

Updated 26 May 2026

Minimum Redundancy Maximum Relevance (mRMR) is a method that selects features by maximizing mutual information with the target while minimizing redundancy among features.
It is typically implemented with a greedy forward-selection strategy, and scalable implementations, including distributed frameworks, make it practical for high-dimensional datasets.
mRMR has proven effective in diverse applications such as genomics, biomedical analysis, and remote sensing, improving predictive accuracy with fewer, more informative features.

Minimum Redundancy Maximum Relevance (mRMR) is a foundational information-theoretic approach for feature selection that aims to identify compact subsets of features which maximize predictive relevance for a target variable while simultaneously minimizing redundancy among themselves. mRMR is widely used across machine learning domains, prominently in genomics, biomedical data analysis, remote sensing, functional data analysis, high-dimensional benchmarking, and interpretable model composition. Its operational core is jointly maximizing the mutual information between selected features and the response (maximum relevance), while penalizing or minimizing the aggregate mutual information among selected features (minimum redundancy).

1. Mathematical Formulation and Selection Objective

mRMR formalizes feature selection via the simultaneous optimization of two criteria over subsets $S$ of $m$ features from a candidate pool $F$:

Relevance:

$D(S) = \frac{1}{|S|}\sum_{f_i\in S} I(f_i; c)$

where $I(f_i; c)$ is the mutual information between feature $f_i$ and the class or target variable $c$.

Redundancy:

$R(S) = \frac{1}{|S|^2}\sum_{f_i,f_j\in S} I(f_i; f_j)$

quantifying the average pairwise mutual information among features in $S$.

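mRMR then optimizes a scalar objective over candidate sets $S$: $\max_{S \subseteq F,\ |S| = m} \left[ D(S) - R(S) \right]$, the "mutual information difference" (MID) criterion. Alternatively, some variants employ the quotient $D(S) / R(S)$ to yield the "mutual information quotient" (MIQ) (Barker et al., 2024, Bowyer et al., 25 May 2026).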
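As a minimal sketch of these definitions, the following Python evaluates $D(S)$, $R(S)$, the difference $D(S) - R(S)$, and the quotient $D(S)/R(S)$ on a small discrete dataset. The toy data, the feature names, and the helper names (`mutual_information`, `relevance`, `redundancy`) are illustrative assumptions, and mutual information is estimated with a simple plug-in (frequency-count) estimator in bits.

```python
from collections import Counter
from math import log2


def mutual_information(x, y):
    """Plug-in estimate of I(X; Y) in bits for two equal-length discrete sequences."""
    n = len(x)
    count_x, count_y, count_xy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (n_ab / n) * log2(n_ab * n / (count_x[a] * count_y[b]))
        for (a, b), n_ab in count_xy.items()
    )


def relevance(features, target, subset):
    """D(S): mean mutual information between each selected feature and the target."""
    return sum(mutual_information(features[f], target) for f in subset) / len(subset)


def redundancy(features, subset):
    """R(S): mean mutual information over all ordered pairs in S (i == j included)."""
    total = sum(
        mutual_information(features[f], features[g]) for f in subset for g in subset
    )
    return total / len(subset) ** 2


# Hypothetical toy data: the target encodes two independent bits, x and y.
x = [0, 0, 1, 1, 0, 0, 1, 1]
y = [0, 1, 0, 1, 0, 1, 0, 1]
target = [2 * xi + yi for xi, yi in zip(x, y)]
features = {"a": x, "b": list(x), "c": y}  # "b" is an exact duplicate of "a"

for subset in (["a", "b"], ["a", "c"]):
    d, r = relevance(features, target, subset), redundancy(features, subset)
    print(subset, "D =", d, "R =", r, "difference =", d - r, "quotient =", d / r)
```

Both subsets have the same relevance in this toy case, but the pair containing the duplicated feature has twice the redundancy, so both the difference and the quotient favor the complementary pair.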

For computational tractability, mRMR is generally implemented via a sequential forward-selection strategy. At each step, the next feature $f^*$ to add to $S$ is chosen by: $f^* = \arg\max_{f \in F \setminus S} \left[ I(f; c) - \frac{1}{|S|}\sum_{f_j \in S} I(f; f_j) \right]$ and $f^*$ is greedily added to $S$ (Barker et al., 2024, Mehrabi et al., 2023, Ebiele et al., 30 Mar 2026, Elmaizi et al., 2022).

2. Algorithmic Realizations, Scalability, and Extensions

Greedy Forward Selection

The canonical procedure iteratively builds $S$ via:

Initialization: $S = \emptyset$.
Add the unselected feature $f$ with the highest $I(f; c)$.
At each iteration, select $f \in F \setminus S$ maximizing $I(f; c) - \frac{1}{|S|}\sum_{f_j \in S} I(f; f_j)$.
Repeat until $|S| = m$.
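
The procedure can be sketched in Python as follows. This is an illustrative implementation under stated assumptions rather than a reference one: features are discrete columns in a dictionary, mutual information uses a plug-in estimator, each candidate is scored by its relevance minus its mean mutual information with the features already chosen, and the names `mrmr_greedy` and `mutual_information` are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(x, y):
    """Plug-in estimate of I(X; Y) in bits for two equal-length discrete sequences."""
    n = len(x)
    count_x, count_y, count_xy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (n_ab / n) * log2(n_ab * n / (count_x[a] * count_y[b]))
        for (a, b), n_ab in count_xy.items()
    )


def mrmr_greedy(features, target, m):
    """Greedy forward selection with the difference (relevance - redundancy) score."""
    relevance = {f: mutual_information(col, target) for f, col in features.items()}
    # Running sum of I(f; f_j) over already-selected f_j, updated once per step.
    redundancy_sum = {f: 0.0 for f in features}
    selected = []
    while len(selected) < min(m, len(features)):
        candidates = [f for f in features if f not in selected]
        k = max(len(selected), 1)  # first pick uses relevance alone
        best = max(candidates, key=lambda f: relevance[f] - redundancy_sum[f] / k)
        selected.append(best)
        for f in candidates:
            if f != best:
                redundancy_sum[f] += mutual_information(features[f], features[best])
    return selected


# Hypothetical toy data: the target encodes two independent bits, x and y.
x = [0, 0, 1, 1, 0, 0, 1, 1]
y = [0, 1, 0, 1, 0, 1, 0, 1]
target = [2 * xi + yi for xi, yi in zip(x, y)]
features = {"a": x, "b": list(x), "c": y}  # "b" duplicates "a"

print(mrmr_greedy(features, target, m=2))  # ['a', 'c']
```

All three features are equally relevant here, so the first pick falls to insertion order; the second step then skips the duplicate "b" in favor of the non-redundant "c". Keeping a running redundancy sum means each pairwise term is computed once instead of being re-evaluated at every step.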

This linear-in-features, quadratic-in-selected-size process is tractable for moderate $|F|$ but becomes bottlenecked in ultra-high-dimensional settings due to $O(|F| \cdot m^2)$ mutual information computations (Mehrabi et al., 2023, Reggiani et al., 2017, Vivek et al., 2022).

Distributed and Scalable Implementations

To address high dimensionality, distributed mRMR variants leverage MapReduce and Spark primitives, using either row-wise layouts (efficient for tall datasets) or column-wise/broadcast layouts (for wide, short datasets) (Reggiani et al., 2017, Vivek et al., 2022). These implementations cache entropy and MI computations, enable partitioned aggregation, and reduce communication overhead, with reported runtime reductions of up to 97% and speedups of 4–6× over naïve versions.
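
The column-wise pattern can be illustrated on a single machine. The sketch below is not the cited Spark or MapReduce code; it is a stand-in that uses Python's standard `concurrent.futures` thread pool to show the shape of the computation: the target is shared with every worker (the "broadcast"), each feature column is scored independently (the "map"), and the scores are gathered to pick the best feature (the "reduce"). The function names are hypothetical, and threads are used only to show the partitioning, not to claim a speedup.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import partial
from math import log2


def mutual_information(x, y):
    """Plug-in estimate of I(X; Y) in bits for two equal-length discrete sequences."""
    n = len(x)
    count_x, count_y, count_xy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (n_ab / n) * log2(n_ab * n / (count_x[a] * count_y[b]))
        for (a, b), n_ab in count_xy.items()
    )


def score_column(target, item):
    """Map step: score one (name, column) pair against the shared target."""
    name, column = item
    return name, mutual_information(column, target)


def columnwise_relevance(features, target, workers=2):
    """Score every feature column independently, then gather the results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(partial(score_column, target), features.items()))


# Hypothetical toy data.
target = [0, 0, 1, 1, 0, 0, 1, 1]
features = {
    "copy": [0, 0, 1, 1, 0, 0, 1, 1],
    "independent": [0, 1, 0, 1, 0, 1, 0, 1],
}

scores = columnwise_relevance(features, target)
best = max(scores, key=scores.get)  # reduce step: arg max over gathered scores
print(scores, best)
```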

Non-convex, Penalized, and Global Optima

Recent innovations include the continuous penalized mRMR (SmRMR), which solves a convex or nonconvex regularized minimization of an mRMR-inspired loss (incorporating, e.g., SCAD or MCP penalties for sparsity) (Naylor et al., 26 Aug 2025). Additionally, polyhedral relaxations yield provably optimal mixed-integer linear programming (MILP) formulations, enabling globally optimal mRMR feature sets for hundreds of features (He et al., 22 Aug 2025).

3. Mutual Information Estimation and Alternative Association Measures

Estimation of the mutual information $I(\cdot\,;\cdot)$ is data- and variable-type dependent:

Discrete variables: Use empirical plug-in/histogram estimates.
Continuous variables: Discretize into bins or use k-nearest-neighbor density estimators (KSG estimator, PCA-corrected KSG for continuous responses) (Bowyer et al., 25 May 2026, Schellhas et al., 2020).
Alternative measures: Distance correlation and related association statistics substitute for MI in some contexts (e.g., functional data analysis), offering tuning-free, smoothing-free, and nonlinear dependence capture (Berrendero et al., 2015, Schellhas et al., 2020). Such variants can achieve higher accuracy and select fewer features than MI-based mRMR.
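
To make the first two options concrete, here is a small sketch of a plug-in estimator for discrete variables combined with equal-width binning for continuous ones. The helper names, the synthetic data, and the bin counts are assumptions chosen for illustration; the k-nearest-neighbor (KSG) estimators mentioned above are not shown.

```python
import random
from collections import Counter
from math import log2


def mutual_information(x, y):
    """Plug-in estimate of I(X; Y) in bits for two equal-length discrete sequences."""
    n = len(x)
    count_x, count_y, count_xy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (n_ab / n) * log2(n_ab * n / (count_x[a] * count_y[b]))
        for (a, b), n_ab in count_xy.items()
    )


def equal_width_bins(values, n_bins):
    """Map continuous values to integer bin indices 0 .. n_bins - 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # The maximum value would land in bin n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


# Hypothetical data: `signal` depends on x, `noise` does not.
rng = random.Random(0)
x = [i / 100 for i in range(100)]
signal = [v * v for v in x]
noise = [rng.random() for _ in x]

for n_bins in (2, 4, 8):
    binned_x = equal_width_bins(x, n_bins)
    mi_signal = mutual_information(binned_x, equal_width_bins(signal, n_bins))
    mi_noise = mutual_information(binned_x, equal_width_bins(noise, n_bins))
    print(n_bins, round(mi_signal, 3), round(mi_noise, 3))
```

Running the loop with different bin counts shows how the estimates shift with the discretization: the plug-in estimate for the independent `noise` column is generally not exactly zero on a finite sample, and it tends to grow as bins are added.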

4. Integration with Hybrid and Wrapper Methods

mRMR is often employed as a filter stage, followed by or in combination with wrapper-based (model-dependent) feature selection:

Hybrid frameworks: Combine mRMR with classifier-guided elimination (e.g., SVM-RFE). A convex combination of SVM weights and mRMR scores boosts predictive accuracy and yields more stable, interpretable subsets (Ding et al., 2024).
Metaheuristic wrappers: mRMR is paired with population-based optimizers (e.g., Binary Horse Herd Optimization) to restrict the wrapper search space, achieving efficient gene selection and improved accuracy (Mehrabi et al., 2023).
Multi-stage selection: Two-stage or staged filter methods (e.g., pre-pruning with maximum information gain, followed by mRMR, then a wrapper) significantly reduce computational cost and redundancy (Elmaizi et al., 2022).
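
As a rough illustration of the hybrid idea, the sketch below ranks features by a convex combination of two score sets. It is an assumption-laden toy, not the procedure of the cited work: the SVM weights and mRMR scores are made-up numbers, the min-max scaling and the use of absolute weights are choices made here so the two scales are comparable, and `alpha` and `combined_ranking` are hypothetical names.

```python
def min_max_scale(scores):
    """Rescale a {feature: score} mapping to the range [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    return {f: (v - lo) / span if span else 0.0 for f, v in scores.items()}


def combined_ranking(svm_weights, mrmr_scores, alpha=0.5):
    """Rank features by alpha * |SVM weight| + (1 - alpha) * mRMR score, both rescaled."""
    w = min_max_scale({f: abs(v) for f, v in svm_weights.items()})
    s = min_max_scale(mrmr_scores)
    combined = {f: alpha * w[f] + (1 - alpha) * s[f] for f in w}
    return sorted(combined, key=combined.get, reverse=True)


# Made-up scores for three hypothetical features.
svm_weights = {"f1": 0.9, "f2": -0.1, "f3": 0.5}
mrmr_scores = {"f1": 0.2, "f2": 0.7, "f3": 0.6}

print(combined_ranking(svm_weights, mrmr_scores, alpha=1.0))  # SVM weights only
print(combined_ranking(svm_weights, mrmr_scores, alpha=0.0))  # mRMR scores only
print(combined_ranking(svm_weights, mrmr_scores, alpha=0.3))  # blended
```

Setting `alpha` to 1 or 0 recovers the pure classifier-based or pure filter-based ranking, so the parameter controls how much the wrapper signal is allowed to override the filter.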

5. Applications and Empirical Performance

mRMR has been applied across domains:

Biomedical and genomics: mRMR underpins robust gene selection with quantifiable improvements in SVM/RF accuracy and sharp reductions in dimensionality (Mehrabi et al., 2023, Elmaizi et al., 2022).
Emotion and signal recognition: In VR-based emotion recognition, mRMR reduces a 175-feature pupillometry representation to a critical 50-dimensional embedding, increasing classification accuracy from 85% to 98.8% (Barker et al., 2024).
Benchmarking LLMs: In LLM evaluation, mRMR-selected question subsets minimize RMSE and maximize rank correlation (Kendall's $\tau$, Spearman's $\rho$), outperforming AnchorPoints and IRT-based approaches and yielding much higher stability across random seeds (Bowyer et al., 25 May 2026).
Remote sensing: For hyperspectral imaging, mRMR, as part of a hybrid feature selection pipeline, achieves high accuracy with far fewer bands compared to information-gain alone or simple univariate filters (Elmaizi et al., 2022).

Empirical findings:

mRMR-based feature sets often reach or exceed the performance of full feature sets, with a much smaller subset (e.g., 7–11 features achieving better accuracy than 33 in power system transient stability assessment (Li et al., 2019); 14 selected from >11,000 radiomics descriptors maintaining cross-vendor AUC for SVM/RF models (Chaudhary et al., 2024)).
In large-scale benchmarking and biomedical applications, mRMR and its scalable variants enable feasible, reproducible analyses of datasets that are large in both sample count and feature count, with state-of-the-art classification and regression performance (Bowyer et al., 25 May 2026, Liu et al., 2022).
New univariate clustering-based variants, such as KGroups, approximate classical mRMR's performance while being two to three orders of magnitude faster, facilitating hyperparameter tuning and rapid prototyping (Ebiele et al., 30 Mar 2026).

6. Limitations, Modifications, and Recent Advances

Limitations

Computational cost: Classical mRMR requires $O(|F| \cdot m^2)$ mutual information computations and is infeasible for very large $|F|$ without distributed or approximate computation (Vivek et al., 2022).
Pairwise heuristics: True joint mutual information with the target is not optimized; the greedy, pairwise redundancy reduction may miss higher-order, synergistic dependencies (Liu et al., 2022).
Estimation sensitivity: MI estimation for continuous data is subject to binning choice, kernel bandwidths, and sample size constraints (Ding et al., 2024).
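
The pairwise limitation is easy to reproduce on a toy case. In the sketch below (hypothetical data, plug-in estimator), the target is the XOR of two binary features: each feature on its own carries no information about the target, so a score built only from single-feature relevance ranks both at zero, yet the pair taken jointly determines the target completely.

```python
from collections import Counter
from math import log2


def mutual_information(x, y):
    """Plug-in estimate of I(X; Y) in bits for two equal-length discrete sequences."""
    n = len(x)
    count_x, count_y, count_xy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (n_ab / n) * log2(n_ab * n / (count_x[a] * count_y[b]))
        for (a, b), n_ab in count_xy.items()
    )


f1 = [0, 0, 1, 1]
f2 = [0, 1, 0, 1]
target = [a ^ b for a, b in zip(f1, f2)]  # XOR of the two features

single_f1 = mutual_information(f1, target)
single_f2 = mutual_information(f2, target)
joint = mutual_information(list(zip(f1, f2)), target)  # treat the pair as one variable

print(single_f1, single_f2, joint)  # 0.0 0.0 1.0
```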

Modifications

Tradeoff tuning: Weighted reformulations (e.g., a trade-off parameter, written here as $\lambda$, in a criterion of the form $D(S) - \lambda R(S)$ (Li et al., 2019)) enable fine-grained balancing of relevance and redundancy.
Augmentation with unique relevance: MRwMR-BUR integrates a "unique relevance" (UR) term, the conditional MI between a feature and the target given all other features, to further emphasize non-redundant, indispensable predictors. This yields consistently smaller feature sets and 2–5% accuracy gains (Liu et al., 2022).
Penalized and FDR-controlled: SmRMR applies nonconvex penalties (SCAD, MCP) and interfaces with model-X knockoff filtering to achieve feature selection with statistical false discovery rate control, supporting both theoretical guarantees and empirical competitiveness with HSIC-LASSO (Naylor et al., 26 Aug 2025).
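
The unique-relevance quantity can be illustrated with a plug-in conditional mutual information estimate. The sketch below is only meant for tiny discrete examples: conditioning on the joint value of all other features quickly becomes unreliable as their number grows, and the cited method's own estimator is not reproduced here. The function names and the toy data are assumptions.

```python
from collections import Counter
from math import log2


def conditional_mutual_information(x, y, z):
    """Plug-in estimate of I(X; Y | Z) in bits for discrete sequences."""
    n = len(x)
    count_z = Counter(z)
    count_xz, count_yz = Counter(zip(x, z)), Counter(zip(y, z))
    count_xyz = Counter(zip(x, y, z))
    return sum(
        (c / n) * log2(c * count_z[k] / (count_xz[(a, k)] * count_yz[(b, k)]))
        for (a, b, k), c in count_xyz.items()
    )


def unique_relevance(features, target, name):
    """I(f; c | all other features), conditioning on the others' joint value."""
    others = [col for f, col in features.items() if f != name]
    context = list(zip(*others)) if others else [()] * len(target)
    return conditional_mutual_information(features[name], target, context)


# Hypothetical toy data: the target encodes two independent bits, x and y.
x = [0, 0, 1, 1, 0, 0, 1, 1]
y = [0, 1, 0, 1, 0, 1, 0, 1]
target = [2 * xi + yi for xi, yi in zip(x, y)]
features = {"a": x, "b": list(x), "c": y}  # "b" duplicates "a"

for name in features:
    print(name, unique_relevance(features, target, name))
```

In this example the duplicated features "a" and "b" each have zero unique relevance, because the other copy already supplies the same information, while "c" contributes one full bit that no other feature provides.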

New Directions

MILP-based global optimization of the mRMR criterion enables provably optimal feature selection for moderately large sets (on the order of hundreds of features) (He et al., 22 Aug 2025).
Distance correlation and kernel-based association statistics can replace MI to yield tuning-free, smoothing-free, and unbiased estimators, especially effective for functional and highly correlated data (Berrendero et al., 2015, Schellhas et al., 2020).
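
For reference, the sample distance correlation can be computed directly from pairwise distances with no bins or bandwidths to choose. The sketch below follows the standard double-centering definition for scalar variables, in plain Python with quadratic cost in the number of samples; the function names and the example data are illustrative assumptions.

```python
from math import sqrt


def centered_distances(values):
    """Double-centered matrix of pairwise absolute differences."""
    n = len(values)
    dist = [[abs(a - b) for b in values] for a in values]
    row_mean = [sum(row) / n for row in dist]  # equals column means by symmetry
    grand_mean = sum(row_mean) / n
    return [
        [dist[i][j] - row_mean[i] - row_mean[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]


def distance_correlation(x, y):
    """Sample distance correlation between two equal-length numeric sequences."""
    n = len(x)
    a, b = centered_distances(x), centered_distances(y)
    dcov2 = sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / n**2
    dvar_x = sum(v * v for row in a for v in row) / n**2
    dvar_y = sum(v * v for row in b for v in row) / n**2
    denom = sqrt(dvar_x * dvar_y)
    return sqrt(max(dcov2, 0.0) / denom) if denom > 0 else 0.0


# Hypothetical example: y depends on x only through x**2, so the Pearson
# correlation is zero by symmetry, while the dependence is still nonlinear.
x = list(range(-5, 6))
y = [v * v for v in x]

print(distance_correlation(x, x))  # identical inputs
print(distance_correlation(x, y))  # nonlinear dependence
```

In an mRMR-style procedure, a statistic like this would take the place of mutual information in both the relevance and the redundancy terms.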

7. Summary Table: Core mRMR Objective and Key Formulas

Component	Mathematical Expression	Key Property
Relevance	$D(S) = \frac{1}{|S|}\sum_{f_i\in S} I(f_i; c)$	MI between feature and target
Redundancy	$R(S) = \frac{1}{|S|^2}\sum_{f_i,f_j\in S} I(f_i; f_j)$	Average MI among features
Difference criterion	$\max_{S} \left[ D(S) - R(S) \right]$	Classical "MID" scoring
Greedy update (new $f^*$)	$f^* = \arg\max_{f\in F\setminus S} \left[ I(f; c) - \frac{1}{|S|}\sum_{f_j\in S} I(f; f_j) \right]$	One-step incremental update
Quotient criterion	$\max_{S} D(S) / R(S)$	Alternative to "MID"
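Penalized SmRMR objective	$\min_{\theta} \; \mathcal{L}_{\mathrm{mRMR}}(\theta) + \sum_{k} p_{\lambda}(|\theta_k|)$ (schematic; $p_{\lambda}$ is a sparsity penalty such as SCAD or MCP)	Penalized estimation (continuous weights)
Unique relevance (UR, BUR)	$\mathrm{UR}(f_i) = I(f_i; c \mid F \setminus \{f_i\})$	Conditional MI given all other features (Liu et al., 2022)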

References

(Barker et al., 2024) Thelxinoë: Recognizing Human Emotions Using Pupillometry and Machine Learning
(Mehrabi et al., 2023) Efficient High-Dimensional Gene Selection Based on Binary Horse Herd Optimization Algorithm
(Li et al., 2019) Feature Selection for Transient Stability Assessment Based on Improved Maximal Relevance and Minimal Redundancy Criterion
(Bowyer et al., 25 May 2026) Efficient Benchmarking Is Just Feature Selection and Multiple Regression
(Reggiani et al., 2017) Feature Selection in High-Dimensional Dataset Using MapReduce
(Vivek et al., 2022) Scalable mRMR Feature Selection to Handle High Dimensional Datasets
(Berrendero et al., 2015) The mRMR Variable Selection Method: A Comparative Study for Functional Data
(Ding et al., 2024) Corporate Financial Distress Prediction: Based on Multi-source Data and Feature Selection
(Liu et al., 2022) Improving Mutual Information Based Feature Selection by Boosting Unique Relevance
(Ebiele et al., 30 Mar 2026) KGroups: A Versatile Univariate Max-Relevance Min-Redundancy Feature Selection Algorithm for High-dimensional Biological Data
(He et al., 22 Aug 2025) Optimal Data Reduction Under Information-Theoretic Criteria
(Naylor et al., 26 Aug 2025) Sparse Minimum Redundancy Maximum Relevance for Feature Selection
(Elmaizi et al., 2022) Hybridization of Filter and Wrapper Approaches for the Dimensionality Reduction and Classification of Hyperspectral Images
(Schellhas et al., 2020) Distance Correlation Sure Independence Screening for Accelerated Feature Selection in Parkinson's Disease Vocal Data
(Yu et al., 2021) Feature Selection for Efficient Local-to-Global Bayesian Network Structure Learning
(Kumar et al., 2019) Predicting Indian Stock Market Using the Psycho-linguistic Features of Financial News
(Chaudhary et al., 2024) Cross-Vendor Reproducibility of Radiomics-Based Machine Learning Models for Computer-aided Diagnosis

Conclusion

mRMR provides a theoretically principled, empirically validated, and highly extensible framework for feature selection in high-dimensional learning, unifying mutual-information-based relevance with redundancy penalization. Its modern developments—distributed implementation, penalized relaxation, alternative association measures, and unique-relevance boosting—address computational, inferential, and practical limitations, ensuring ongoing relevance for large-data, multi-source, and interpretable modeling scenarios.