
Relevance–Redundancy Filters

Updated 22 February 2026
  • Relevance–redundancy filters are methods that select features by maximizing their statistical association with targets while minimizing information overlap.
  • They utilize techniques such as mutual information, kernel methods, and convex relaxations to balance informativeness and redundancy in complex datasets.
  • Empirical results demonstrate improved test error, AUC, and efficient network pruning, with advanced variants offering rigorous false discovery rate control.

Relevance–redundancy filters are a class of algorithmic approaches designed to select or rank variables (features, context chunks, or representations) based on (i) their statistical association with a target or query (relevance), and (ii) the pairwise or subset-wise overlap among them (redundancy), aiming to maximize informativeness while minimizing duplicated information. These filters are foundational in high-dimensional feature selection, information retrieval, deep neural network pruning, and unsupervised summary evaluation, and they underpin a wide spectrum of scalable machine learning and information-theoretic systems.

1. Mathematical Foundations of Relevance–Redundancy Filtering

Central to the relevance–redundancy paradigm is an objective functional that explicitly rewards the inclusion of highly informative elements (maximal relevance) and penalizes their statistical overlap (minimal redundancy). For feature selection, classical formulations use mutual information to measure both relevance and redundancy. Specifically, for a candidate feature X_i and target Y, the relevance is I(X_i; Y), while the redundancy with respect to another feature X_j is I(X_i; X_j), with mutual information defined as

I(X;Y) = \sum_{x,y} p(x,y) \log \frac{p(x,y)}{p(x)p(y)}

(Bouaguel et al., 2012).
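For discrete variables, this definition can be estimated directly from empirical frequencies. A minimal plug-in sketch (no bias correction, natural logarithm):

```python
import numpy as np

def mutual_information(x, y):
    """Plug-in estimate of I(X;Y) in nats from two discrete samples."""
    x, y = np.asarray(x), np.asarray(y)
    mi = 0.0
    for xv in np.unique(x):
        for yv in np.unique(y):
            pxy = np.mean((x == xv) & (y == yv))  # joint frequency
            if pxy == 0:
                continue  # zero-probability cells contribute nothing
            px, py = np.mean(x == xv), np.mean(y == yv)
            mi += pxy * np.log(pxy / (px * py))
    return mi

x = np.array([0, 0, 1, 1])
print(mutual_information(x, x))  # I(X;X) = H(X) = log 2 ≈ 0.693 for a fair bit
```

Identical variables attain I(X;X) = H(X), while independent variables give zero, matching the two extremes the relevance and redundancy terms are meant to capture.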

The optimization framework can be abstractly summarized as:

\min_{\mathbf{x} \ge 0,\ \mathbf{1}^\mathrm{T} \mathbf{x} = 1} \; (1-\alpha)\, \mathbf{x}^\mathrm{T} \mathbf{Q} \mathbf{x} - \alpha\, \mathbf{F}^\mathrm{T} \mathbf{x}

where F is the vector of relevance terms and Q the matrix of pairwise redundancies, with α ∈ [0, 1] controlling the tradeoff (Bouaguel et al., 2012).
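As a concrete illustration of the tradeoff, the sketch below (toy numbers, not from the cited paper) evaluates this objective for two candidate weight vectors and shows that spreading weight over two redundant features scores worse than pairing one of them with an independent feature:

```python
import numpy as np

def qpfs_objective(x, Q, F, alpha):
    """QPFS objective (1 - alpha) * x^T Q x - alpha * F^T x, where
    Q[i, j] is pairwise redundancy, F[i] is relevance, and x is a
    nonnegative weight vector summing to one (lower is better)."""
    return (1 - alpha) * (x @ Q @ x) - alpha * (F @ x)

# Toy numbers: features 0 and 1 are mutually redundant, feature 2 is not.
Q = np.array([[1.0, 0.9, 0.0],
              [0.9, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
F = np.array([0.8, 0.8, 0.5])

redundant_pair = np.array([0.5, 0.5, 0.0])
diverse_pair = np.array([0.5, 0.0, 0.5])
# The diverse pair attains the lower (better) objective value:
print(qpfs_objective(diverse_pair, Q, F, 0.5) <
      qpfs_objective(redundant_pair, Q, F, 0.5))  # True
```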

Extending to continuous, nonlinear, or kernelized settings, the quadratic and kernelized versions employ normalized HSIC or Wasserstein distance to measure redundancy and relevance, enabling nonparametric, distribution-sensitive filtering (Yamada et al., 2014, Nie et al., 2023).

In the context of retrieval-augmented generation (RAG), set-level objectives are constructed as

F(q, S) = \alpha \sum_{c \in S} \operatorname{sim}(q, c) - \beta \sum_{c_i \neq c_j \in S} \operatorname{sim}(c_i, c_j)

where sim(·, ·) is typically cosine similarity (Peng et al., 31 Dec 2025).
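A minimal sketch of this set-level objective with cosine similarity (toy vectors and an illustrative β, not the paper's tuned values) shows why duplicated chunks are penalized:

```python
import numpy as np

def cos_sim(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def set_utility(q, chunks, alpha=1.0, beta=0.6):
    """F(q, S): summed query relevance minus summed pairwise
    redundancy over ordered pairs of distinct chunks."""
    rel = sum(cos_sim(q, c) for c in chunks)
    red = sum(cos_sim(chunks[i], chunks[j])
              for i in range(len(chunks))
              for j in range(len(chunks)) if i != j)
    return alpha * rel - beta * red

q = np.array([1.0, 0.0])
a = np.array([1.0, 0.0])  # highly relevant chunk
b = np.array([1.0, 1.0])  # relevant but less redundant with a
# A duplicate pair scores below a diverse pair despite higher raw relevance:
print(set_utility(q, [a, a]) < set_utility(q, [a, b]))  # True
```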

2. Major Algorithms and Instantiations

mRMR and Extensions

The minimum Redundancy Maximum Relevance (mRMR) framework is the canonical relevance–redundancy feature filter, employing mutual information for both terms and selecting features either greedily or via quadratic programming (Zhao et al., 2019, Bouaguel et al., 2012). Quadratic programming relaxations, such as QPFS (Bouaguel et al., 2012), generalize mRMR by jointly optimizing the full feature-weighting vector, outperforming greedy mRMR in empirical risk.
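The greedy variant can be sketched in a few lines: seed with the most relevant feature, then repeatedly add the candidate maximizing relevance minus mean redundancy to the current set (a simplification assuming precomputed mutual-information scores):

```python
import numpy as np

def greedy_mrmr(relevance, redundancy, k):
    """Greedy mRMR sketch: relevance[i] holds I(X_i;Y), redundancy[i][j]
    holds I(X_i;X_j); returns indices of k selected features."""
    selected = [int(np.argmax(relevance))]  # seed: most relevant feature
    while len(selected) < k:
        best, best_score = None, -np.inf
        for i in range(len(relevance)):
            if i in selected:
                continue
            # mRMR score: relevance minus mean redundancy to selected set
            score = relevance[i] - np.mean([redundancy[i][j] for j in selected])
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
    return selected

# Feature 1 is nearly as relevant as feature 0 but highly redundant with it,
# so the greedy pass prefers the independent feature 2.
relevance = [0.9, 0.85, 0.3]
redundancy = [[0.0, 0.8, 0.0],
              [0.8, 0.0, 0.0],
              [0.0, 0.0, 0.0]]
print(greedy_mrmr(relevance, redundancy, 2))  # [0, 2]
```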

Sparse and knockoff-augmented continuous mRMR variants (e.g., SmRMR) extend the classic formulation by penalizing the feature coefficients with nonconvex regularizers (SCAD, MCP) and using model-X knockoffs to rigorously control the false discovery rate of selected features. Under mild regularity and signal-to-noise conditions, SmRMR gives consistent support recovery and strict FDR control at user-specified thresholds (Naylor et al., 26 Aug 2025).

Nonparametric and Kernel-Based Methods

Solutions such as MVMR-FS (Nie et al., 2023) replace mutual information with distributional and shape-aware divergences, specifically using supervised kernel density estimates to measure inter-class separability (relevance) and the Wasserstein (earth mover’s) distance to capture redundancy between marginal feature distributions. MVMR-FS further searches the combinatorial subset space using an adaptive genetic algorithm, yielding robust accuracy gains without discretization or manual subset-size selection.
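The 1-D Wasserstein redundancy term has a simple empirical form for equal-size samples: the mean absolute difference of the sorted values. A sketch (MVMR-FS itself pairs this with kernel density estimates and a genetic search):

```python
import numpy as np

def wasserstein_1d(u, v):
    """Empirical 1-D Wasserstein-1 distance between equal-size samples:
    the mean absolute difference between the sorted values."""
    return float(np.mean(np.abs(np.sort(u) - np.sort(v))))

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, 1000)
print(wasserstein_1d(x, x))        # 0.0 — identical marginals, full redundancy
print(wasserstein_1d(x, x + 2.0))  # 2.0 — a pure shift moves all mass by 2
```

A small distance flags two features whose marginal distributions nearly coincide (high redundancy), without any discretization.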

Kernel-based convex relaxations (e.g., N³LARS) formulate redundancy and relevance via the normalized HSIC, optimizing a nonnegative LARS path with parallelization via Nyström and MapReduce schemes. This approach attains global optimality, statistically grounded sparsity, and scalability to millions of candidates (Yamada et al., 2014).
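A numpy sketch of the normalized HSIC score (biased estimator, fixed Gaussian bandwidth; N³LARS itself uses Nyström approximations and a LARS path rather than this direct O(n²) computation):

```python
import numpy as np

def gaussian_gram(x, sigma=1.0):
    """Gaussian kernel Gram matrix of a 1-D sample."""
    d = x[:, None] - x[None, :]
    return np.exp(-d ** 2 / (2 * sigma ** 2))

def nhsic(x, y, sigma=1.0):
    """Normalized HSIC between two 1-D samples (biased estimator):
    <K_c, L_c> / (||K_c|| ||L_c||) with doubly centered Gram matrices,
    so nhsic(x, x) = 1 and values near 0 indicate weak dependence."""
    n = len(x)
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    Kc = H @ gaussian_gram(x, sigma) @ H
    Lc = H @ gaussian_gram(y, sigma) @ H
    return float(np.sum(Kc * Lc) / (np.linalg.norm(Kc) * np.linalg.norm(Lc)))

x = np.linspace(-1.0, 1.0, 32)
print(nhsic(x, x))       # 1.0 — a feature is fully redundant with itself
print(nhsic(x, x ** 2))  # positive: the nonlinear dependence is detected
```

Unlike Pearson correlation, the kernel score captures the quadratic dependence between x and x² without any discretization.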

Relevance–Redundancy in Deep Learning and RAG

In DNN filter/channel pruning, relevance–redundancy filtering is realized structurally: C-SGD (Centripetal SGD) groups and collapses filters within CNNs to enforce identical parameterization (pure redundancy) within clusters, enabling lossless pruning by removing duplicates—thus reorganizing redundancy for efficient network compression and obviating the need for fine-tuning (Ding et al., 2021).
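The centripetal update can be sketched as follows; the function name, learning rate, and centripetal strength here are illustrative, not taken from the paper's implementation:

```python
import numpy as np

def csgd_step(filters, grads, clusters, lr=0.1, eps=0.05):
    """One centripetal-SGD-style step (simplified sketch): filters in
    the same cluster receive the cluster-averaged gradient plus a
    centripetal pull toward the cluster mean, so they converge to
    identical values and all but one per cluster can be pruned."""
    new = filters.copy()
    for cluster in clusters:
        avg_grad = grads[cluster].mean(axis=0)   # shared gradient
        center = filters[cluster].mean(axis=0)   # cluster centroid
        for i in cluster:
            new[i] = filters[i] - lr * avg_grad - eps * (filters[i] - center)
    return new

# Filters 0 and 1 are clustered together; filter 2 is kept on its own.
filters = np.array([[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
grads = np.zeros_like(filters)  # zero task gradient to isolate the pull
for _ in range(200):
    filters = csgd_step(filters, grads, [[0, 1], [2]])
print(np.allclose(filters[0], filters[1], atol=1e-4))  # True — duplicates
```

Once the clustered filters are numerically identical, removing all but one per cluster changes nothing about the network's function, which is what makes the pruning lossless.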

In open-domain retrieval-augmented generation, AdaGReS instantiates a set-level relevance–redundancy filter over context chunks, optimizing a combined utility of query relevance and intra-set similarity under a token budget via greedy selection. The system adaptively tunes the redundancy penalty using batch statistics and achieves robust context selection, outperforming top-k pooling under high redundancy (Peng et al., 31 Dec 2025).
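A budget-constrained greedy pass in this spirit might look like the following sketch (the actual AdaGReS system additionally adapts the redundancy penalty from batch statistics):

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def greedy_context_select(q, chunks, lengths, budget, alpha=1.0, beta=0.8):
    """Greedily add the chunk with the best marginal gain
    alpha*sim(q, c) - beta*sum(sim(c, selected)) while the token
    budget allows; stop when no remaining chunk has positive gain."""
    selected = []
    while True:
        used = sum(lengths[j] for j in selected)
        best, best_gain = None, 0.0
        for i, c in enumerate(chunks):
            if i in selected or used + lengths[i] > budget:
                continue
            gain = alpha * cosine(q, c) - beta * sum(
                cosine(c, chunks[j]) for j in selected)
            if gain > best_gain:
                best, best_gain = i, gain
        if best is None:
            return selected
        selected.append(best)

q = np.array([1.0, 0.0, 0.0])
chunks = [np.array([0.8, 0.6, 0.0]),    # relevant
          np.array([0.8, 0.6, 0.0]),    # exact duplicate, never gains
          np.array([0.7, 0.0, 0.714])]  # relevant and diverse
print(greedy_context_select(q, chunks, lengths=[4, 4, 4], budget=8))  # [0, 2]
```

The duplicate chunk's marginal gain collapses to zero once its twin is selected, so the budget goes to the diverse chunk instead.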

3. Optimization, Algorithms, and Complexity

Convex quadratic programming approaches guarantee global optima for fixed tradeoff parameters, with per-solve cost O(m^3) for m features due to matrix factorizations, but scale poorly for m ≫ 10^3 (Bouaguel et al., 2012). Nonnegative LARS and blockwise MapReduce frameworks substantially alleviate these scalability limits, with per-iteration cost O(dnb/P) for d features, n samples, b Nyström points, and P computational nodes (Yamada et al., 2014).

Genetic optimization heuristics, as in MVMR-FS, search subset space globally and adaptively mutate and recombine populations to approach the minimal MVMR score, given a fitness defined by density-overlap and redundancy distances (Nie et al., 2023).

For feature screening with exact FDR control, the multi-stage knockoff filter combines initial screening via continuous or kernelized mRMR penalties and a secondary model-X knockoff selection, ensuring that nonzero weights uniquely flag relevant features while controlling the expected proportion of false positives (Naylor et al., 26 Aug 2025).
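The knockoff selection step reduces to a data-driven threshold on feature-vs-knockoff statistics W_j. A sketch of the standard knockoff+ rule (generic, not SmRMR-specific):

```python
import numpy as np

def knockoff_threshold(W, q=0.1):
    """Knockoff+ threshold: the smallest t > 0 with
    (1 + #{j : W_j <= -t}) / max(1, #{j : W_j >= t}) <= q;
    features with W_j >= t are then selected."""
    W = np.asarray(W)
    for t in np.sort(np.abs(W[W != 0])):  # candidate thresholds
        fdp_hat = (1 + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if fdp_hat <= q:
            return float(t)
    return float("inf")  # nothing selectable at this FDR level

# Positive W_j favor the real feature over its knockoff copy.
W = np.array([5.0, 4.0, 3.0, 2.5, 2.0, -1.0, 0.5, -0.5])
t = knockoff_threshold(W, q=0.3)
print(t, list(np.where(W >= t)[0]))  # 2.0 [0, 1, 2, 3, 4]
```

The "+1" in the numerator is what converts the empirical false discovery proportion estimate into a provable FDR bound at level q.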

4. Empirical Results and Benchmarking

Empirical studies consistently demonstrate that relevance–redundancy filters reduce test error, type I/II errors, and generalization gap relative to relevance-only or redundancy-ignoring baselines. In UCI credit datasets, QPFS reduces test error by 3–4 points versus mRMR (Bouaguel et al., 2012). In large and high-dimensional biology and microarray benchmarks, N³LARS attains the lowest redundancy within selected sets and improved classification AUC over mRMR and HSIC-Lasso (Yamada et al., 2014). MVMR-FS achieves absolute accuracy improvements of 5–11% over ten baselines on diverse continuous-feature benchmarks (Nie et al., 2023).

Sparsity-regularized filters (SmRMR) select substantially fewer features than HSIC-Lasso (lower FDR), while matching or exceeding predictive performance on simulations and real data. FDR control remains guaranteed (Naylor et al., 26 Aug 2025).

In deep CNN pruning, C-SGD achieves identical or slightly improved Top-1 accuracies at large FLOPs/parameter reductions, and uniquely allows "one-shot" pruning of all layers without the need for retraining (Ding et al., 2021).

RAG context selection with AdaGReS obtains IOU improvements up to 15 points, robust redundancy suppression across domains, and superior answer quality on challenging open-domain and biomedical QA tasks relative to standard retrieval (Peng et al., 31 Dec 2025).

5. Limitations, Extensions, and Comparative Analysis

Relevance–redundancy filters, particularly those formulated as QP or kernel regression, experience computational limitations in very high dimensions (scaling as O(m^3) or O(n^2 p)), motivating the adoption of Nyström approximations, blockwise density computations, or global search heuristics for tractability (Yamada et al., 2014, Nie et al., 2023).

Mutual information-based methods require discretization or sophisticated density estimation, which induces instability on continuous or small datasets. Wasserstein-based and HSIC-based criteria mitigate discretization sensitivity (Nie et al., 2023, Yamada et al., 2014).

While standard filter methods do not provide statistical error control, multi-stage knockoff augmentation with nonconvex regularization explicitly achieves theoretical FDR guarantees, a property unavailable in basic mRMR or HSIC-Lasso pipelines (Naylor et al., 26 Aug 2025).

A comparison of principal relevance–redundancy filters follows:

| Method | Relevance | Redundancy | Optimization | Statistical Guarantees |
| --- | --- | --- | --- | --- |
| mRMR / QPFS | Mutual information | Mutual information | Greedy / QP | None |
| SmRMR | HSIC / projection correlation | HSIC / projection correlation | Nonconvex QP + knockoffs | Consistency, FDR control |
| N³LARS | Normalized HSIC | Normalized HSIC | Convex, LARS path | Global optimum |
| MVMR-FS | Density overlap | Wasserstein distance | Genetic search | Empirically tuned |
| AdaGReS (RAG) | Embedding similarity | Chunk–chunk similarity | Greedy, adaptive | ε-approximate submodularity |

6. Domain-Specific Applications

Relevance–redundancy filtering underpins diverse machine learning subfields:

  • Feature Selection: Automated reduction of input dimensionality for classification/regression, model interpretation, and monitoring, with platforms such as Uber’s marketing systems employing mRMR pipelines in production (Zhao et al., 2019).
  • Deep Network Compression: Parameter-space collapse and lossless pruning in CNN architectures, with C-SGD enabling efficient, redundancy-aware architecture reduction (Ding et al., 2021).
  • Retrieval-Augmented Generation: Token-budgeted, redundancy-penalized context chunk selection in open-domain QA, directly optimizing answer quality (Peng et al., 31 Dec 2025).
  • Automatic Summarization Evaluation: Metrics combining centrality-weighted relevance with intra-summary redundancy to assess output quality without human references (Chen et al., 2021).

Relevance–redundancy filters remain an active research area for scalable, interpretable, and information-efficient selection and summarization systems across modalities and domains.
