Instance Hardness Ensemble Filtering

Updated 6 May 2026

Instance Hardness Ensemble Filtering is a method that uses metrics like kDN to measure the difficulty of data points and filter out noisy examples.
It integrates probabilistic sample weighting and dynamic ensemble selection to maintain informative boundary instances while reducing the impact of ambiguous samples.
Empirical results indicate improved accuracy in noisy datasets, with best practices emphasizing tuning of hardness thresholds and careful combination of multiple metrics.

Instance Hardness Ensemble Filtering (IHEF) is a family of methods in supervised machine learning that systematically exploits the concept of instance hardness to guide data selection, model training, or prediction routing within ensemble frameworks. IHEF techniques leverage quantitative measures of example-wise difficulty—most commonly, the k-Disagreeing Neighbors (kDN) metric—to bias training or inferential processes against noisy or ambiguous samples, thereby improving robustness and generalization in the presence of data irregularities and class-boundary complexity.

1. Formalization of Instance Hardness

Instance hardness quantifies the propensity of a data point $(\mathbf{x}_i, y_i)$ to be misclassified or predicted with high error by a pool of models, often capturing overlap, noise, or ambiguity in local regions of feature space. The most widely adopted family of metrics is the k-Disagreeing Neighbors (kDN) measure, defined for classification as

$kDN(\mathbf{x}_i) = \frac{1}{k} \sum_{j \in NN_k(\mathbf{x}_i)} \mathbf{1}(y_j \neq y_i),\quad k=5 \text{ typically}$

where $NN_k(\mathbf{x}_i)$ denotes the $k$ nearest neighbors of $\mathbf{x}_i$ in input space. Values of $kDN$ near $0$ imply consensus among neighbors (“easy” instances), whereas values close to $1$ suggest boundary points or potential label noise (“hard” instances) (Walmsley et al., 2018, Torquette et al., 2022).
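As an illustration of the definition above, the following is a minimal brute-force sketch of kDN in NumPy. The function name `kdn`, the use of Euclidean distance, and the toy data are assumptions made for this example; they are not taken from the cited papers.

```python
import numpy as np

def kdn(X, y, k=5):
    """Return the k-Disagreeing Neighbors score of every row of X."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # Pairwise squared Euclidean distances (brute force, O(n^2) memory).
    diff = X[:, None, :] - X[None, :, :]
    dist = np.einsum("ijk,ijk->ij", diff, diff)
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbor
    # Indices of the k nearest neighbors of each point.
    nn = np.argsort(dist, axis=1)[:, :k]
    # Fraction of neighbors whose label differs from the point's own label.
    return (y[nn] != y[:, None]).mean(axis=1)

# Two tight clusters; the last point carries the minority label in its cluster.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1], [5.1, 5.1]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 0])
scores = kdn(X, y, k=3)
print(scores)
```

In this toy set the first cluster scores 0 everywhere, the three consistently labeled points of the second cluster score 1/3, and the point with the disagreeing label scores 1.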

Further instance hardness meta-features include Disjunct Class Percentage (DCP), Tree Depth (TD), Class Likelihood Difference (CLD), and geometric network statistics such as Ratio of Intra- vs. Extra-Class Distances (N2), Local-Set Cardinality (LSC), and others. In regression settings, analogous metrics assess error post-linear or local regression, distribution rarity, or output discontinuities (Torquette et al., 2022).

2. Instance Hardness in Ensemble Generation: Bagging-IH

The canonical instance hardness ensemble filter is Bagging-IH, an adaptation of bootstrap aggregation (Bagging) that probabilistically biases instance selection for base-model training in favor of lower-hardness points. For a training set $T$ of size $n$, Bagging-IH assigns each sample $\mathbf{x}_i \in T$ a selection score that decreases with its hardness $kDN(\mathbf{x}_i)$ and includes a uniform floor, and normalizes these scores to yield a sampling distribution over $T$. The uniform floor guarantees that even instances with $kDN(\mathbf{x}_i) = 1$ (maximal hardness) may still be sampled, although with reduced probability (Walmsley et al., 2018).

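Bagging-IH Algorithm (cf. Algorithm 1, Walmsley et al., 2018): for each base learner, draw a bootstrap sample from $T$ according to the sampling distribution, then train the learner on that sample.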

At inference, the Bagging-IH ensemble aggregates base learner predictions via majority vote. By design, Bagging-IH attenuates the influence of likely noisy points (high $kDN$) while retaining class-boundary instances with intermediate hardness, thanks to the nonzero sampling floor.
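The sampling and voting steps can be sketched as follows. This is an illustration under stated assumptions, not the algorithm of Walmsley et al. (2018): the score `(1 - hardness) + floor`, the default floor of `1/n`, the `NearestCentroid` stand-in base learner, and all function names are hypothetical choices made for this example.

```python
import numpy as np

def sampling_distribution(hardness, floor):
    """Turn hardness scores in [0, 1] into bootstrap sampling probabilities.

    ASSUMPTION: score = (1 - hardness) + floor. The exact score used in the
    original paper is not reproduced here.
    """
    score = (1.0 - np.asarray(hardness, dtype=float)) + floor
    return score / score.sum()

class NearestCentroid:
    """Hypothetical stand-in base learner: predict the closest class mean."""
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.stack([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        d = ((X[:, None, :] - self.centroids_[None, :, :]) ** 2).sum(axis=2)
        return self.classes_[d.argmin(axis=1)]

def bagging_ih(X, y, hardness, n_estimators=11, floor=None, seed=0):
    """Train base learners on hardness-biased bootstrap samples."""
    rng = np.random.default_rng(seed)
    n = len(X)
    p = sampling_distribution(hardness, 1.0 / n if floor is None else floor)
    models = []
    for _ in range(n_estimators):
        idx = rng.choice(n, size=n, replace=True, p=p)  # biased bootstrap
        models.append(NearestCentroid().fit(X[idx], y[idx]))
    return models

def majority_vote(models, X):
    """Return the most frequent predicted label for each row of X."""
    votes = np.stack([m.predict(X) for m in models])  # (n_estimators, n_samples)
    out = []
    for col in votes.T:
        labels, counts = np.unique(col, return_counts=True)
        out.append(labels[counts.argmax()])
    return np.array(out)

X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1], [5.1, 5.1]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 0])
hardness = np.array([0, 0, 0, 0, 1 / 3, 1 / 3, 1 / 3, 1.0])  # e.g. kDN with k=3

p = sampling_distribution(hardness, floor=1.0 / len(X))
models = bagging_ih(X, y, hardness, n_estimators=11, seed=0)
pred = majority_vote(models, np.array([[0.05, 0.05], [5.05, 5.05]]))
```

With the floor in place, the maximally hard point keeps a small but nonzero probability of entering each bootstrap sample, while low-hardness points dominate.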

3. Multi-Feature Hardness Filtering and Thresholding

Beyond kDN, diverse meta-feature–based instance hardness signals can be aggregated to guide explicit data filtering prior to training. Key pipeline steps are:

Compute per-instance hardness scores for a set of $m$ hardness meta-features $h_1, \dots, h_m$;
Normalize each feature to the $[0, 1]$ scale;
Aggregate via mean or weighted sum (weights proportional to correlation with empirical instance-level error across a pool of learners);
Remove all points with aggregated hardness exceeding a threshold $\tau$ or a quantile;
Train downstream model or ensemble on filtered data (Torquette et al., 2022).

Best practices advise prioritizing continuously varying, high-correlation metrics such as CLD, N2, and LSC for classification, and LE and S2 for regression. The threshold can be tuned via validation or quantile selection (Torquette et al., 2022).
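The filtering pipeline described in this section can be sketched as follows. The function names, the min-max normalization, and the toy numbers are assumptions for illustration. The optional `weights` argument stands in for correlation-based weights, which the caller would have to compute separately.

```python
import numpy as np

def aggregate_hardness(H, weights=None):
    """Combine raw hardness meta-features into one score per instance.

    H: (n_samples, n_metrics) array, where higher values mean harder.
    """
    H = np.asarray(H, dtype=float)
    lo, hi = H.min(axis=0), H.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # guard against constant columns
    H01 = (H - lo) / span                   # min-max normalize each metric
    if weights is None:
        return H01.mean(axis=1)             # plain mean over metrics
    w = np.asarray(weights, dtype=float)
    return H01 @ (w / w.sum())              # weighted sum, weights rescaled to 1

def filter_by_hardness(X, y, hardness, threshold=None, quantile=None):
    """Drop instances whose aggregated hardness exceeds the cut-off."""
    if (threshold is None) == (quantile is None):
        raise ValueError("give exactly one of threshold or quantile")
    cut = threshold if threshold is not None else np.quantile(hardness, quantile)
    keep = hardness <= cut
    return X[keep], y[keep], keep

# Four instances described by two hypothetical hardness metrics.
H = np.array([[0.0, 10.0], [0.2, 20.0], [1.0, 50.0], [0.4, 30.0]])
X = np.arange(8.0).reshape(4, 2)
y = np.array([0, 1, 0, 1])

agg = aggregate_hardness(H)
X_f, y_f, keep = filter_by_hardness(X, y, agg, threshold=0.5)
```

Here only the third instance, which is the hardest under both metrics, exceeds the 0.5 cut-off and is removed before training.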

4. Instance Hardness in Dynamic Ensemble and Representation Selection

Recent frameworks exploit instance hardness for dynamic, per-example selection of input representation and classifier pool, as in DRES for fake news detection (Farhangian et al., 21 Sep 2025). Here, instance hardness (again kDN-based) is computed for each training sample in multiple feature spaces (e.g., 14 textual embeddings), forming a hardness matrix $H$ indexed by sample and representation. At test time, for a query $\mathbf{x}_q$ and each representation $r$, the estimated hardness $\hat{h}_r(\mathbf{x}_q)$ is the mean hardness of the $k$ nearest training neighbors of $\mathbf{x}_q$ in that space.

Dynamic representation selection: Pick the representation with the lowest estimated hardness, $r^* = \arg\min_r \hat{h}_r(\mathbf{x}_q)$.
Dynamic ensemble selection: Within the chosen view, use dynamic ensemble selection (DES) algorithms—KNORA-E, DES-P, META-DES—to pick the most competent subset of classifiers based on neighborhood performance.
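The representation-selection step can be illustrated with a short sketch. It assumes Euclidean neighbors and a selection rule that picks the view with the lowest estimated hardness. The function names and toy data are hypothetical, and the subsequent DES step (KNORA-E, DES-P, META-DES) is not implemented here.

```python
import numpy as np

def estimate_query_hardness(train_views, hardness, query_views, k=3):
    """Estimate a query's hardness in each representation.

    train_views: list of (n, d_r) arrays, one per representation r
    hardness:    (n, R) matrix of training-instance hardness per representation
    query_views: list of (d_r,) arrays, the query embedded in each representation
    """
    est = np.empty(len(train_views))
    for r, (Xr, qr) in enumerate(zip(train_views, query_views)):
        dist = np.linalg.norm(Xr - qr, axis=1)  # distance to every training point
        nn = np.argsort(dist)[:k]               # k nearest training neighbors
        est[r] = hardness[nn, r].mean()         # mean neighbor hardness in view r
    return est

def select_representation(train_views, hardness, query_views, k=3):
    """Return the index of the view with the lowest estimated hardness."""
    est = estimate_query_hardness(train_views, hardness, query_views, k)
    return int(est.argmin()), est

# Two 1-D views of four training samples, with per-view hardness columns.
view0 = np.array([[0.0], [1.0], [2.0], [10.0]])
view1 = np.array([[0.0], [1.0], [2.0], [10.0]])
H = np.array([[0.8, 0.0],
              [0.6, 0.2],
              [0.7, 0.1],
              [0.0, 0.9]])
query = [np.array([1.0]), np.array([1.0])]

best, est = select_representation([view0, view1], H, query, k=3)
```

In this toy case the query's neighbors are hard in the first view and easy in the second, so the second view is selected.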

Empirical results demonstrate that jointly optimizing representation and classifier ensemble at the instance level via hardness estimation produces substantial accuracy gains compared to static or single-view designs. Notably, more than 50% of instances exhibit a wide cross-view hardness range (the spread of an instance's hardness across representations), motivating per-instance view selection (Farhangian et al., 21 Sep 2025).

5. Instance Hardness Filtering in Algorithm Selection for Combinatorial Optimization

Instance hardness ensemble filtering extends beyond classic supervised learning to algorithm selection for combinatorial optimization. In combinatorial auctions, for instance, hardness is defined via the greedy optimality gap $g(I)$ of an instance $I$, that is, how far the greedy solution falls short of the optimum.

A binary hardness label $h(I)$ is assigned given a threshold $\theta$ (calibrated by ROC analysis):

$h(I) = \mathbf{1}\big(g(I) > \theta\big)$

A lightweight MLP is trained to predict this gap from a 20-dimensional structural feature vector reflecting known failure modes. The resulting “hardness classifier” achieves 94.7% test-set accuracy and is used to route each instance: easy instances go to the greedy heuristic, hard instances to an expensive GNN-based specialist (Kang, 16 Feb 2026). The hybrid pipeline matches greedy speed on easy cases and GNN performance on hard cases, and attains a lower optimality gap than either the greedy-only or the GNN-only pipeline.
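The routing logic can be sketched in plain Python. Everything here is a hypothetical stand-in: the relative-gap formula `(opt - greedy) / opt` is an assumed normalization, and the predictor and both solvers are toy callables rather than the MLP, greedy heuristic, or GNN of the cited work.

```python
from typing import Any, Callable, List, Sequence, Tuple

def relative_gap(opt_value: float, greedy_value: float) -> float:
    """ASSUMED gap definition: relative shortfall of greedy from the optimum."""
    return (opt_value - greedy_value) / opt_value

def hardness_label(gap: float, threshold: float) -> int:
    """Binary hardness label: 1 if the gap exceeds the threshold, else 0."""
    return int(gap > threshold)

def route(instances: Sequence[Any],
          predict_hard: Callable[[Any], bool],
          easy_solver: Callable[[Any], Any],
          hard_solver: Callable[[Any], Any]) -> List[Tuple[str, Any]]:
    """Send each instance to the cheap or the expensive solver."""
    results = []
    for inst in instances:
        if predict_hard(inst):
            results.append(("hard", hard_solver(inst)))
        else:
            results.append(("easy", easy_solver(inst)))
    return results

# Toy stand-ins: an instance is a single number that acts as its feature.
gap = relative_gap(opt_value=100.0, greedy_value=90.0)
label = hardness_label(gap, threshold=0.05)

routed = route(
    instances=[0.2, 0.9],
    predict_hard=lambda x: x > 0.5,        # placeholder for the trained classifier
    easy_solver=lambda x: f"greedy({x})",  # placeholder for the greedy heuristic
    hard_solver=lambda x: f"gnn({x})",     # placeholder for the GNN specialist
)
```

The design point is that the classifier only needs to be cheap relative to the specialist: its cost is paid on every instance, while the expensive solver runs only on instances predicted hard.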

6. Empirical Outcomes and Practical Guidelines

Empirical summary (classification, Bagging-IH; Walmsley et al., 2018):

Noise (%)	Perceptron OvA	Random Subspace	Bagging	Bagging-IH
0	69.94	68.39	78.60	78.02 (≈)
10	64.17	62.51	77.18	77.66 (+)
20	58.55	56.50	75.60	76.97 (+)
30	52.62	50.76	73.07	75.40 (+)
40	46.73	44.59	67.70	71.44 (+)

(“+” marks a statistically significant improvement over Bagging.)

General recommendations:

Use $k = 5$ (kDN) as a robust default, together with a fixed ensemble size.
For kDN, proper feature scaling is essential; approximate nearest neighbor methods mitigate the $O(n^2)$ cost of exact neighbor search on large datasets.
For regression, replace kDN with residual/error-based hardness metrics.
Avoid over-filtering by cross-validating the removal threshold.
For tasks with highly complex boundaries or high label imbalance, tune $k$ and the sampling floor in ensemble generation to avoid under-sampling informative points (Walmsley et al., 2018, Torquette et al., 2022).
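For the regression recommendation, one simple residual-based hardness score can be sketched as below. It assumes a global least-squares linear fit and scaling by the largest absolute residual. This is only one of several possible error-based metrics and is not taken verbatim from the cited papers; the name `residual_hardness` is hypothetical.

```python
import numpy as np

def residual_hardness(X, y):
    """Hardness in [0, 1]: absolute residual of a linear fit, scaled by the max."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float).reshape(len(y), -1)
    A = np.column_stack([X, np.ones(len(y))])      # add an intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)   # ordinary least squares
    resid = np.abs(y - A @ coef)
    top = resid.max()
    return resid / top if top > 0 else np.zeros_like(resid)

# Points on a line of slope 2, except the middle target, which is shifted upward.
X = np.arange(5.0)
y = np.array([1.0, 3.0, 15.0, 7.0, 9.0])
h = residual_hardness(X, y)
hardest = int(h.argmax())
```

The shifted middle point receives hardness 1, while the remaining points share a much lower score, so a quantile or validated threshold would remove it first.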

7. Limitations and Prospects

IHEF approaches rely on the quality and granularity of hardness estimates. Discrete measures (e.g. kDN, F1) may lack discrimination for “easy” regions, while tree-based metrics (TD, DCP) can be unstable in high dimensions. Current implementations often prioritize speed and tractability, sometimes at the cost of optimality (e.g., only one view selected in DRES; MLP-based thresholding in combinatorial problems).

Future directions highlighted include:

Combining multiple, complementary hardness measures for finer-grained filtering, particularly in high-noise or multi-view settings.
Learning to jointly aggregate softness and hardness signals across metric families and input domains.
Extending hardness-guided selection to contexts with imbalanced cost regimes, evolving data, or structured prediction tasks (Torquette et al., 2022, Farhangian et al., 21 Sep 2025, Kang, 16 Feb 2026).

Instance Hardness Ensemble Filtering thus unifies probabilistic sample weighting, data-centric filtering, and instance-dependent ensemble routing, demonstrating robust gains across diverse supervised learning and optimization tasks under label noise, boundary ambiguity, and heterogeneity.


References

Walmsley et al. (2018). An Ensemble Generation Method Based on Instance Hardness.

Torquette et al. (2022). Characterizing instance hardness in classification and regression problems.

Farhangian et al. (2025). DRES: Fake news detection by dynamic representation and ensemble selection.

Kang (2026). Learning Structural Hardness for Combinatorial Auctions: Instance-Dependent Algorithm Selection via Graph Neural Networks.

