Minimal-Removal Counterfactual Sets

Updated 1 June 2026

Minimal-removal counterfactual sets are defined as the smallest group of features or interventions needed to flip a model's output, thereby operationalizing causal sufficiency.
Methodologies span submodular optimization, SAT/ILP enumeration, and gradient-based approaches, tailored to visual, textual, tabular, and network models.
Empirical evaluations demonstrate that these sets improve model interpretability, enhance recommendation explanations, and reveal underlying causal dependencies.

Minimal-removal counterfactual sets define the smallest (in cardinality) group of features, regions, elements, or interventions whose excision or alteration provokes a change in a model’s output. This combinatorial concept provides a rigorous operationalization of causal sufficiency: the set precisely isolates the minimal subset of an input whose modification is needed to flip a model's decision, recommendation, or prediction. As such, minimal-removal sets underpin a spectrum of research in explainable AI, recommender system explanations, robust visual attribution, statistical physics of counterfactuals, programmatic counterfactual fairness, and network resilience. Methodological frameworks include submodular optimization, SAT/ILP enumeration, influence-function analysis, energy landscape minimization, gradient-based attributions, and dynamic peeling in network science.

1. Formal Definitions and General Optimization Objectives

Across modalities, the minimal-removal counterfactual set problem is: given a function $f$ and an input $x$, find $S\subseteq \mathcal{I}$ (where $\mathcal{I}$ is a finite index or region set specific to the modality) such that altering or removing $S$ in $x$ changes some target property of $f$ (e.g., the classification label flips), with $S$ minimal in cardinality: $\min_{S \subseteq \mathcal{I}}\,|S| \quad \text{s.t.}\quad f(\mathsf{Remove}(x, S)) \neq f(x).$ Variants add a hard subset-minimality constraint (no strict subset of $S$ suffices), monotonicity assumptions, or probabilistic/thresholded outputs. Modality-specific instantiations:

Image models: Regions, feature superpixels, or patches are ablated. $S$ is a subset of disjoint partition elements whose removal changes the top-1 predicted class (Chen et al., 15 Nov 2025).
Recommender systems: $S$ is a set of past user actions/interactions whose deletion alters the predicted recommendation (Liu et al., 2022).
Tabular/feature vector settings: $S$ denotes a set of feature indices; a minimal-removal set contains the features that, once reset to "normal" values, suffice to change the classification (Wang et al., 2023).
LLMs: $S$ is a set of words or (in CIDR) word pairs in a sentence whose masking causes classifier output to flip (Chen et al., 2023).
Networks: $S$ comprises network nodes whose removal causes functional collapse, e.g., the emptying of the $k$-core (Schmidt et al., 2018).

Because exact minimization is NP-hard in general, all of these frameworks seek to avoid exhaustive enumeration via algorithmic, heuristic, or relaxation-based approaches.
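To make the objective concrete, the following minimal sketch solves it by brute force on a toy problem. The `remove` semantics (dropping features), the `toy_model` classifier, and all names are illustrative assumptions, not taken from any of the cited works. The search tries subsets in order of increasing size, so the number of model evaluations grows exponentially with $|\mathcal{I}|$.

```python
from itertools import combinations
from typing import Callable, Dict, FrozenSet, Optional, Sequence

Input = Dict[str, float]


def remove(x: Input, S) -> Input:
    """Toy Remove(x, S): drop the features named in S (an assumed semantics)."""
    return {k: v for k, v in x.items() if k not in S}


def exact_minimal_removal(
    f: Callable[[Input], int], x: Input, index_set: Sequence[str]
) -> Optional[FrozenSet[str]]:
    """Return one smallest S with f(Remove(x, S)) != f(x), or None if none exists."""
    base = f(x)
    # Subsets are tried in order of increasing cardinality,
    # so the first hit is a minimum-size set.
    for size in range(1, len(index_set) + 1):
        for S in combinations(index_set, size):
            if f(remove(x, S)) != base:
                return frozenset(S)
    return None


def toy_model(x: Input) -> int:
    """Hypothetical classifier: label 1 when the remaining feature mass is >= 5."""
    return int(sum(x.values()) >= 5)


x = {"a": 4.0, "b": 3.0, "c": 1.0, "d": 1.0}
S = exact_minimal_removal(toy_model, x, list(x))
print(sorted(S))  # a two-element set; no single removal flips this toy model
```

In the worst case the loop visits all $2^{|\mathcal{I}|}-1$ non-empty subsets, which is the cost the methods below try to avoid.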

2. Model- and Modality-Specific Methodologies

Visual Classification and Attributions

In vision, the minimal-removal set is instantiated as the smallest set of spatial regions whose masking flips a classifier’s prediction. The Counterfactual LIMA approach (Chen et al., 15 Nov 2025) proceeds via:

Partitioning the input image $x$ into disjoint regions indexed by $\mathcal{I}$.
Introducing a binary mask over the regions and seeking $S\subseteq \mathcal{I}$ such that $f(\mathsf{Remove}(x, S)) \neq f(x)$ and $|S|$ is minimized.
Relaxing the optimization to submodular form using surrogate deletion/insertion scores, yielding a set-function utility that combines drive-to-flip and faithfulness to the original class scores.
Greedily selecting, at each step, the region that maximizes the marginal gain in this utility until the prediction flips or a region budget is met.

This method enables attribution-guided counterfactual data augmentation, yielding empirical improvements in both in-distribution and out-of-distribution generalization (Chen et al., 15 Nov 2025).
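The greedy loop can be sketched as follows. This is a simplified illustration, not the Counterfactual LIMA implementation: the marginal gain here is simply the drop in the original class score, standing in for the paper's submodular utility, and the additive `class_scores` model, the `regions` data, and the `budget` parameter are hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Set, Tuple

# Toy "image": region id -> (evidence for class 0, evidence for class 1).
regions: Dict[int, Tuple[float, float]] = {
    0: (0.1, 3.0),
    1: (0.5, 2.0),
    2: (2.0, 0.2),
    3: (1.0, 0.1),
}


def class_scores(removed: Set[int]) -> List[float]:
    """Hypothetical model: class scores are sums of evidence over unmasked regions."""
    kept = [v for r, v in regions.items() if r not in removed]
    return [sum(v[0] for v in kept), sum(v[1] for v in kept)]


def argmax(scores: Sequence[float]) -> int:
    return max(range(len(scores)), key=lambda c: scores[c])


def greedy_removal(
    score_fn: Callable[[Set[int]], List[float]],
    region_ids: Sequence[int],
    budget: int,
) -> Tuple[Set[int], bool]:
    """Mask regions one at a time until the top-1 class flips or the budget is hit."""
    removed: Set[int] = set()
    original = argmax(score_fn(removed))
    while len(removed) < budget:
        candidates = [r for r in region_ids if r not in removed]
        if not candidates:
            break
        current = score_fn(removed)[original]
        # Assumed marginal gain: how much masking r lowers the original class score.
        best = max(candidates, key=lambda r: current - score_fn(removed | {r})[original])
        removed.add(best)
        if argmax(score_fn(removed)) != original:
            return removed, True
    return removed, False


removed, flipped = greedy_removal(class_scores, list(regions), budget=3)
print(removed, flipped)
```

On this toy input a single region carries most of the class-1 evidence, so the loop stops after one step. Greedy selection returns a flipping set, but it is only guaranteed to be near-minimal when the utility is well behaved (e.g., submodular).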

Recommender System Explanations

Minimal-removal sets in recommendation frameworks (Liu et al., 2022) involve:

Estimating single-item influence via classical influence functions (for differentiable models) or retraining-based "data-based" influence (for non-gradient models).
Employing greedy or iterative-greedy search to select the smallest number of actions (historical user interactions) whose removal switches the recommendation from the originally recommended item to a different item.
Measuring explanation quality via "Explanation Success Percentage" and "Average Explanation Size." Notably, higher recommender accuracy (lower MSE) correlates with decreased explainability under these metrics, which suggests a limitation of current evaluation standards.
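A minimal sketch of the iterative-greedy search and the two metrics is given below. Everything in it is an illustrative assumption: the additive `affinity` recommender is a toy, influence is obtained by re-scoring after deletion (a cheap stand-in for retraining-based influence), and the metric definitions (share of users with a successful explanation, mean size of successful explanations) are one plausible reading of the metric names.

```python
from typing import Dict, List, Optional, Sequence

Affinity = Dict[str, Dict[str, float]]  # action -> item -> contribution to item score


def scores(history: Sequence[str], affinity: Affinity, items: Sequence[str]) -> Dict[str, float]:
    return {i: sum(affinity[a].get(i, 0.0) for a in history) for i in items}


def recommend(history: Sequence[str], affinity: Affinity, items: Sequence[str]) -> str:
    s = scores(history, affinity, items)
    return max(items, key=lambda i: s[i])  # ties resolve to the earliest item


def gap(history, affinity, items, target) -> float:
    """Margin of the target item over its best competitor."""
    s = scores(history, affinity, items)
    return s[target] - max(v for i, v in s.items() if i != target)


def greedy_explanation(history, affinity, items, max_size) -> Optional[List[str]]:
    """Remove the most influential action each round until the recommendation changes."""
    target = recommend(history, affinity, items)
    remaining, removed = list(history), []
    while remaining and len(removed) < max_size:
        g = gap(remaining, affinity, items, target)
        # Influence of action a: how much deleting it shrinks the target's margin.
        best = max(
            remaining,
            key=lambda a: g - gap([b for b in remaining if b != a], affinity, items, target),
        )
        remaining.remove(best)
        removed.append(best)
        if recommend(remaining, affinity, items) != target:
            return removed
    return None  # no explanation found within the size limit


def summarize(explanations: Sequence[Optional[List[str]]]):
    """Assumed metrics: success percentage and mean size of successful explanations."""
    found = [e for e in explanations if e is not None]
    success_pct = 100.0 * len(found) / len(explanations)
    avg_size = sum(map(len, found)) / len(found) if found else float("nan")
    return success_pct, avg_size


items = ["X", "Y"]
affinity = {"a1": {"X": 3.0}, "a2": {"X": 1.0, "Y": 2.0}, "a3": {"X": 1.0, "Y": 1.0}}
e1 = greedy_explanation(["a1", "a2", "a3"], affinity, items, max_size=3)
e2 = greedy_explanation(["a3"], affinity, items, max_size=3)
print(e1, e2, summarize([e1, e2]))
```

The second toy user has no explanation because deleting the only action leaves a tie that resolves to the same item, which illustrates why success percentage and explanation size are reported separately.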

Counterfactual Explanations in Feature Spaces

Energy landscape and SAT-based strategies formalize minimal-removal as an explicit support minimization:

In energy-based frameworks (Evangelatos et al., 23 Mar 2025), the problem is rephrased as minimizing an energy functional over candidate perturbations of the input, one term of which counts the perturbed features (an $\ell_0$-type sparsity penalty). Simulated annealing with Boltzmann-weighted proposals is used to escape local minima and approach global minimizers.

CEMSP (Wang et al., 2023) formulates the search as a Boolean SAT problem: binary variables encode "reset to normal" for each feature; the SAT solver enumerates all minimal masks $S$ yielding the decision flip. The procedure accommodates actionability, causality, and other constraints via CNF encoding and supports full enumeration for robust, flexible intervention selection.
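The energy-based search described above can be sketched with a generic simulated-annealing loop. This is an illustration under stated assumptions rather than the cited method: the energy is taken to be the number of perturbed features plus a fixed penalty when the prediction has not flipped, proposals are uniform single-bit flips accepted with the Boltzmann (Metropolis) rule, and the toy `flips` oracle, the linear cooling schedule, and all parameter values are hypothetical.

```python
import math
import random
from typing import Callable, List

Mask = List[int]  # mask[i] == 1 means feature i is perturbed (reset to a baseline)


def anneal_minimal_mask(
    flips: Callable[[Mask], bool], n: int, steps: int = 2000, t0: float = 2.0, seed: int = 0
) -> Mask:
    """Search for a small mask that flips the model; assumes the all-ones mask flips."""
    rng = random.Random(seed)
    penalty = n + 1  # larger than any mask size, so flipping states always rank lower

    def energy(mask: Mask) -> float:
        return sum(mask) + (0 if flips(mask) else penalty)

    state = [1] * n
    e = energy(state)
    best, best_e = state[:], e
    for step in range(steps):
        temp = t0 * (1 - step / steps) + 1e-3  # linear cooling, kept strictly positive
        cand = state[:]
        cand[rng.randrange(n)] ^= 1  # propose toggling one feature
        ce = energy(cand)
        # Metropolis rule: always accept improvements, sometimes accept worse states.
        if ce <= e or rng.random() < math.exp(-(ce - e) / temp):
            state, e = cand, ce
            if e < best_e:
                best, best_e = state[:], e
    return best


values = [3.0, 2.0, 1.0, 1.0, 0.5, 0.5]


def toy_flips(mask: Mask) -> bool:
    """Hypothetical model: label is 1 while the unperturbed feature mass is >= 4."""
    return sum(v for v, m in zip(values, mask) if m == 0) < 4


best = anneal_minimal_mask(toy_flips, n=len(values))
print(best, sum(best))
```

Because the penalty exceeds the largest possible mask size, any state that improves on the starting energy necessarily flips the prediction, so the returned mask is always a valid counterfactual. It is not guaranteed to be of minimum size.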

Textual Models and Feature Interactions

CIDR (Chen et al., 2023) extends minimal-removal to account for feature interactions, notably in NLP, via:

Cooperative Integrated Gradients (CIG), a pairwise extension of Integrated Gradients, quantifying both single-feature and inter-feature effects on the output.
Transformation of the minimal-removal search into a knapsack problem over feature pairs $(i, j)$, with CIG as "weight" and randomized "value" for solution diversification.
An iterative refinement via ensemble knapsack solves and statistical thresholding to yield high-confidence, truly minimal removal sets.
Scalability is ensured by pruning low-CIG pairs and approximate gradient integration.
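To illustrate what a pairwise interaction score captures, the sketch below uses a masking-based second-order difference, $f(x) - f(x_{\setminus i}) - f(x_{\setminus j}) + f(x_{\setminus \{i,j\}})$. This is a deliberately simpler stand-in, not Cooperative Integrated Gradients, and the `toy_sentiment` scorer and `[MASK]` convention are hypothetical.

```python
from typing import Callable, Iterable, List

MASK = "[MASK]"


def mask(tokens: List[str], idxs: Iterable[int]) -> List[str]:
    """Replace the tokens at the given positions with a mask symbol."""
    hidden = set(idxs)
    return [MASK if k in hidden else t for k, t in enumerate(tokens)]


def interaction(f: Callable[[List[str]], float], tokens: List[str], i: int, j: int) -> float:
    """Second-order masking difference: the joint effect not explained by i and j alone."""
    return f(tokens) - f(mask(tokens, [i])) - f(mask(tokens, [j])) + f(mask(tokens, [i, j]))


def toy_sentiment(tokens: List[str]) -> float:
    """Hypothetical scorer: 'good' is positive, 'not' negates it, 'movie' adds a small bias."""
    polarity = 1.0 if "good" in tokens else 0.0
    if "not" in tokens:
        polarity = -polarity
    return polarity + (0.1 if "movie" in tokens else 0.0)


tokens = ["not", "good", "movie"]
print(interaction(toy_sentiment, tokens, 0, 1))  # strongly non-zero: negation interacts with "good"
print(interaction(toy_sentiment, tokens, 0, 2))  # approximately zero: "not" and "movie" act independently
```

A single-feature attribution would split the effect of "not good" between the two words; the pairwise score makes the joint dependence explicit, which is the motivation for searching over pairs.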

3. Complexity and Computational Feasibility

Identifying a minimal-removal counterfactual set is generically NP-hard (subset cardinality minimization under non-monotone constraints). Modality-dependent strategies address tractability:

Greedy algorithms, though not guaranteed to be globally optimal (especially for non-submodular or highly interactive models), are effective in practice (vision (Chen et al., 15 Nov 2025), recommendation (Liu et al., 2022)).
SAT/ILP approaches (Wang et al., 2023) enable exact enumeration and robustness, albeit facing combinatorial explosion for large feature sets (high $|\mathcal{I}|$); empirical evidence suggests modern solvers with aggressive pruning and monotonicity constraints render the problem feasible for moderate dimensions.
For pairwise interactions in language, knapsack relaxations and ensemble refinement (CIDR (Chen et al., 2023)) balance computational tractability and explanatory recall/precision.
In networks, scalable degree-based heuristics (CoreHD, Weak-Neighbor) and dynamic-ODE analysis provide both tight analytic bounds and practical algorithms for finding minimal node-removal sets in large graphs (Schmidt et al., 2018).
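The degree-based idea behind CoreHD can be sketched as follows: repeatedly remove the highest-degree node of the current $k$-core, then re-prune. This is an unoptimized illustration that recomputes the core from scratch after every removal, so it does not reflect the efficiency of the published algorithm; the example graph, function names, and tie-breaking rule are assumptions.

```python
from typing import Dict, Hashable, List, Set

Graph = Dict[Hashable, Set[Hashable]]


def k_core(adj: Graph, k: int) -> Set[Hashable]:
    """Return the nodes of the k-core by repeatedly pruning nodes of degree < k."""
    alive = set(adj)
    deg = {v: len(adj[v]) for v in adj}
    stack = [v for v in alive if deg[v] < k]
    while stack:
        v = stack.pop()
        if v not in alive:
            continue
        alive.remove(v)
        for u in adj[v]:
            if u in alive:
                deg[u] -= 1
                if deg[u] < k:
                    stack.append(u)
    return alive


def corehd(adj: Graph, k: int) -> List[Hashable]:
    """Greedy node-removal set that empties the k-core (a heuristic upper bound)."""
    adj = {v: set(ns) for v, ns in adj.items()}  # work on a copy
    removed: List[Hashable] = []
    while True:
        core = k_core(adj, k)
        if not core:
            return removed
        adj = {v: adj[v] & core for v in core}  # restrict the graph to the core
        # Highest degree within the core; ties go to the smallest node label.
        target = max(sorted(adj), key=lambda v: len(adj[v]))
        removed.append(target)
        for u in adj.pop(target):
            adj[u].discard(target)


# Toy graph: a 4-clique on nodes 0..3 plus a pendant node 4 attached to node 0.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (0, 4)]
graph: Graph = {}
for a, b in edges:
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)

removal_set = corehd(graph, k=3)
print(removal_set)
```

On the toy graph the 3-core is the 4-clique, and deleting any one of its nodes leaves a triangle whose nodes all have degree 2, so a single removal suffices.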

4. Robustness, Diversity, and Flexibility

Robust counterfactual explanations require not only minimality but stability and flexibility:

Enumerative approaches (CEMSP (Wang et al., 2023)) yield multiple minimal-removal sets, empowering user selection under side-constraints (cost, actionability, domain knowledge).
Diversity is quantified via feature-participation and pairwise Hamming measures (Smyth et al., 2021), ensuring that counterfactual sets cover a range of plausible interventions and do not concentrate on trivial or redundant explanations.
In energy-based approaches, the entropy in the free-energy landscape ensures robustness by distributing probability mass over multiple near-optimal (minimal) configurations (Evangelatos et al., 23 Mar 2025).
The ability to generate endogenous (data-manifold) minimal-removal explanations prevents off-manifold artifacts and improves the plausibility and actionability of interventions (Smyth et al., 2021).
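Enumeration and diversity scoring can be illustrated together. The sketch below enumerates every subset-minimal flipping set by brute force with superset pruning, as a stand-in for a SAT solver, and then scores the resulting collection. The `flips` oracle, the feature names, and the exact metric definitions (mean pairwise Hamming distance between indicator vectors, per-feature participation frequency) are assumptions chosen for illustration.

```python
from collections import Counter
from itertools import combinations
from typing import Callable, Dict, FrozenSet, List, Sequence


def all_minimal_sets(
    flips: Callable[[FrozenSet[str]], bool], features: Sequence[str]
) -> List[FrozenSet[str]]:
    """Enumerate every flipping set that has no flipping strict subset."""
    minimal: List[FrozenSet[str]] = []
    for size in range(1, len(features) + 1):
        for S in map(frozenset, combinations(features, size)):
            if any(m <= S for m in minimal):
                continue  # a smaller flipping set is contained in S, so S is not minimal
            if flips(S):
                minimal.append(S)
    return minimal


def mean_pairwise_hamming(sets: Sequence[FrozenSet[str]]) -> float:
    """Hamming distance between indicator vectors equals the symmetric-difference size."""
    pairs = list(combinations(sets, 2))
    if not pairs:
        return 0.0
    return sum(len(a ^ b) for a, b in pairs) / len(pairs)


def feature_participation(sets: Sequence[FrozenSet[str]]) -> Dict[str, float]:
    """Fraction of the explanation sets in which each feature appears."""
    counts = Counter(f for s in sets for f in s)
    return {f: c / len(sets) for f, c in counts.items()}


def toy_flips(reset: FrozenSet[str]) -> bool:
    """Hypothetical rule: risk persists if bp is abnormal, or glucose and bmi both are."""
    abnormal = {"glucose", "bmi", "bp"} - reset
    at_risk = "bp" in abnormal or {"glucose", "bmi"} <= abnormal
    return not at_risk


minimal_sets = all_minimal_sets(toy_flips, ["glucose", "bmi", "bp"])
print(minimal_sets)
print(mean_pairwise_hamming(minimal_sets), feature_participation(minimal_sets))
```

Here two minimal sets exist, both requiring a blood-pressure reset, so that feature has participation 1.0 while the other two are interchangeable. A user constrained on one of them can pick the alternative set.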

5. Empirical Evaluations and Domain-Specific Insights

The minimal-removal paradigm has been instantiated and validated across domains:

Visual models: In ImageNet-scale experiments, SS-CA leveraging minimal-removal sets for augmentation improves both in-distribution and out-of-distribution accuracy, as well as robustness to perturbations (Chen et al., 15 Nov 2025).
Recommender systems: Iterative greedy search combined with gradient-based influence yields explanation sets that are smaller and more likely to effect the targeted recommendation change, but performance degrades as model accuracy increases (Liu et al., 2022).
Tabular data: On synthetic and real-world datasets (with semantically-meaningful "normal" ranges), CEMSP delivers maximal flexibility/robustness, and supports actionability, causality, and feasibility constraints (Wang et al., 2023).
NLP: CIDR realizes higher feature minimality and comprehensiveness scores, especially in large pre-trained transformer models, demonstrating the critical role of accounting for feature interactions in generating faithful minimal sets (Chen et al., 2023).
Network science: CoreHD and Weak-Neighbor heuristics set state-of-the-art bounds for the minimal number of nodal removals needed for $k$-core collapse, with analytic ODE tracking and empirical evaluations on configuration-model and regular graphs demonstrating near-optimality (Schmidt et al., 2018).

6. Interpretational and Theoretical Perspectives

The minimal-removal principle informs both practical explainability and theoretical questions:

In quantum foundations, minimal counterfactual-restriction sets suffice to block the derivation of Bell inequalities even when maintaining statistical independence and locality, clarifying the logical structure of contextuality and the necessity of counterfactual definiteness (Hance, 2019).
In modeling explainability, the limited size of minimal-removal sets exposes overreliance on shortcut features, incomplete causal learning, or context-driven interactions, motivating re-training or augmentation protocols targeting these deficiencies (Chen et al., 15 Nov 2025).
The apparent tradeoff between model accuracy and minimal-removal explainability in recommenders indicates a need for refined metrics of explanation quality, as higher confidence models may genuinely require larger or more complex minimal sets for prediction change (Liu et al., 2022).

7. Open Problems and Future Directions

Key challenges and directions include:

Developing efficient minimal-removal discovery algorithms for high-dimensional and non-differentiable models; black-box settings remain unresolved.
Quantifying explanation fidelity robustly, decoupled from model confidence or base accuracy, especially in settings with multiple competing optimal explanations.
Generalizing interaction-aware strategies (as in CIDR) to higher-order feature synergies with manageable computational cost.
Integrating minimal-removal frameworks into live model debugging, fairness auditing, or active retraining—especially as deep learning models are deployed in mission-critical scenarios.
Theoretical work on the minimal-removal principle raises foundational questions about the operational structure of explanations, causal sufficiency, and the boundaries of counterfactual reasoning across domains.

Minimal-removal counterfactual sets continue to play a central role in both the theory and practice of explainable and robust machine learning, as well as in the formal analysis of causal and statistical structures in diverse systems (Chen et al., 15 Nov 2025, Liu et al., 2022, Evangelatos et al., 23 Mar 2025, Wang et al., 2023, Chen et al., 2023, Smyth et al., 2021, Schmidt et al., 2018, Hance, 2019).