Clustering-Based Explainability Techniques
- Clustering-based explainability techniques are methods that translate unsupervised clustering results into human-understandable explanations using rules, prototypes, and decision trees.
- They optimize explanations by maximizing cluster coverage and minimizing misclassification, balancing interpretability with accuracy in high-dimensional settings.
- These methods support applications in finance, biomedical analysis, and regulatory compliance by providing transparent insights into complex cluster structures.
Clustering-based explainability techniques are a class of methods that provide interpretable, often human-readable, rationales for the outcomes of clustering algorithms, which are by nature unsupervised and typically lack intrinsic explanations for their partitions. These techniques aim to bridge the gap between the powerful but opaque results of standard clustering algorithms and the need for transparent, actionable insights in scientific, business, and regulatory contexts. They encompass a range of algorithmic frameworks including rule-based summarization, prototype selection, decision-tree induction, information-theoretic pattern mining, and the systematic integration of human-centric constraints.
1. Formal Problem Statement and Principles
Clustering-based explainability seeks to map unlabeled data $X = \{x_1, \dots, x_n\} \subset \mathbb{R}^d$, partitioned into clusters $C_1, \dots, C_k$ by any algorithm, to a structured set of cluster-specific explanations. The central aim is to associate each cluster not just with a set of instances, but also with a succinct, informative characterization in terms of the original features, human-understandable patterns, or representative instances.
Different approaches concretize the notion of an "explanation" in varied ways:
- Pattern-based summaries: Conjunctions of literals or predicate-based rules (e.g., "gender=male ∧ income>50k") that cover most cluster members while minimizing overlap with other clusters (Ofek et al., 29 Dec 2024).
- Prototype or exemplar selection: Small subsets of actual data points that serve as representative anchors for each cluster (Davidson et al., 2022).
- Polyhedral descriptions: Intersections of low-complexity half-spaces, yielding geometric regions that define cluster membership (Lawless et al., 2022).
- Decision trees: Threshold-based or axis-aligned trees whose root-to-leaf paths assign points to clusters with interpretable splits (Laber et al., 2021, Argov et al., 2 Nov 2025, Gamlath et al., 2021).
- Information-theoretic criteria: Bicluster patterns that maximize the information content of explanations subject to description complexity (Vankwikelberge et al., 2021).
Formally, an explanation $E_i$ for cluster $C_i$ is optimized with competing objectives: maximizing coverage of $C_i$, minimizing inclusion of points from other clusters, and ensuring explanation simplicity (minimal number of features or rules).
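One schematic way to write this trade-off, with illustrative notation ($\operatorname{cov}$, $\operatorname{err}$, $\lambda$, and $\kappa$ are not taken from any single cited paper), is

$$\max_{E_1,\dots,E_k}\ \sum_{i=1}^{k}\Big[\operatorname{cov}(E_i, C_i)\;-\;\lambda\,\operatorname{err}(E_i, C_i)\Big] \quad\text{subject to}\quad \sum_{i=1}^{k}\operatorname{complexity}(E_i)\le\kappa,$$

where $\operatorname{cov}$ measures the fraction of $C_i$ captured by $E_i$, $\operatorname{err}$ measures contamination from other clusters, and $\kappa$ caps total description complexity. Concrete instantiations of these terms appear in the frameworks of Section 2.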
2. Representative Algorithmic Frameworks
Information-theoretic Attributive Explanations
ExClus (Vankwikelberge et al., 2021) computes cluster explanations by maximizing the sum of information-theoretic deviations (relative to empirical feature-wise priors over the full dataset $X$) across all clusters, penalized by the total number of statistics used (with tunable complexity hyperparameters $\alpha$ and $\beta$):
$$\max_{\mathcal{E}} \;\; \mathrm{IC}(\mathcal{E}) \;-\; \rho_{\alpha,\beta}\big(\mathrm{DL}(\mathcal{E})\big),$$
where $\mathrm{IC}(\mathcal{E})$ is the sum of KL divergences between cluster-level and global feature distributions, $\mathrm{DL}(\mathcal{E})$ counts the statistics used in the explanation, and $\rho_{\alpha,\beta}$ is a superlinear function of this explanation complexity.
A restricted greedy search over hierarchical-tree-consistent partitions and feature subsets is used, producing cluster-specific bicluster patterns (members, attributes, feature stats).
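As a minimal illustration of the attributive idea (not the ExClus implementation; the histogram binning, raw KL divergence, and function names are simplifying assumptions), one can rank features by how strongly each cluster's empirical distribution deviates from the global one:

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two histograms/probability vectors over the same bins."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

def attributive_deviations(X, labels, n_bins=10):
    """For each cluster and feature, measure how far the cluster's feature distribution
    deviates from the global one via a histogram-based KL divergence.
    Returns {cluster: [(feature_index, deviation), ...]} sorted by decreasing deviation."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    result = {}
    for c in np.unique(labels):
        mask = labels == c
        scores = []
        for j in range(X.shape[1]):
            edges = np.histogram_bin_edges(X[:, j], bins=n_bins)  # shared bins for comparability
            global_hist, _ = np.histogram(X[:, j], bins=edges)
            cluster_hist, _ = np.histogram(X[mask, j], bins=edges)
            scores.append((j, kl_divergence(cluster_hist, global_hist)))
        result[c] = sorted(scores, key=lambda t: -t[1])
    return result

# Example use: report the two most "surprising" features per cluster.
# for c, ranked in attributive_deviations(X, labels).items():
#     print(c, ranked[:2])
```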
Rule-based and Frequent-Pattern Mining
Cluster-Explorer (Ofek et al., 29 Dec 2024) formalizes explanations as concise conjunctions of predicates selected via generalized frequent-itemset mining (gFIM). For each cluster $C_i$, one seeks predicate sets $E$ with high cluster coverage and low separation error:
$$\mathrm{coverage}(E, C_i) = \frac{|\{x \in C_i : x \text{ satisfies } E\}|}{|C_i|}, \qquad \mathrm{sep\_err}(E, C_i) = \frac{|\{x \notin C_i : x \text{ satisfies } E\}|}{|\{x : x \text{ satisfies } E\}|}.$$
Attribute selection and taxonomy-augmented transactions yield concise, high-coverage, low-error explanations, scalable to high-dimensional and large datasets.
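A brute-force stand-in for this kind of rule mining (illustrative only; Cluster-Explorer uses generalized frequent-itemset mining rather than the exhaustive enumeration, quantile binning, and thresholds assumed here) might look as follows:

```python
import itertools
import numpy as np

def candidate_predicates(X, n_bins=4):
    """Turn each numeric feature into interval predicates (feature, lo, hi) with boolean masks."""
    preds = []
    for j in range(X.shape[1]):
        edges = np.quantile(X[:, j], np.linspace(0, 1, n_bins + 1))
        for lo, hi in zip(edges[:-1], edges[1:]):
            mask = (X[:, j] >= lo) & (X[:, j] <= hi)
            preds.append(((j, float(lo), float(hi)), mask))
    return preds

def explain_cluster(X, labels, cluster, min_coverage=0.8, max_sep_err=0.2, max_len=2, n_bins=4):
    """Return conjunctions of at most `max_len` interval predicates with high coverage
    of `cluster` and low separation error, ranked by conciseness and quality."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    in_cluster = labels == cluster
    preds = candidate_predicates(X, n_bins)
    explanations = []
    for r in range(1, max_len + 1):
        for combo in itertools.combinations(preds, r):
            mask = np.logical_and.reduce([m for _, m in combo])
            covered = int(mask.sum())
            if covered == 0:
                continue
            coverage = np.sum(mask & in_cluster) / in_cluster.sum()
            sep_err = np.sum(mask & ~in_cluster) / covered
            if coverage >= min_coverage and sep_err <= max_sep_err:
                explanations.append(([p for p, _ in combo], coverage, sep_err))
    # Prefer short, high-coverage, low-error explanations.
    return sorted(explanations, key=lambda e: (len(e[0]), -e[1], e[2]))
```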
Decision-Tree-Based Models and Price of Explainability
The price of explainability quantifies the necessary increase in the objective (e.g., k-means/medians cost) incurred by imposing explainable structure, such as axis-aligned decision-tree clusterings. Results include upper bounds of
$$\tilde{O}(\log k) \;\text{ for } k\text{-medians} \qquad \text{and} \qquad \tilde{O}(k) \;\text{ for } k\text{-means},$$
with lower bounds of $\Omega(\log k)$ and $\Omega(k)$, respectively (Laber et al., 2021, Gamlath et al., 2021). Greedy recursive tree construction algorithms build explainable clusterings that nearly attain these bounds, typically operating in time nearly linear in the number of data points.
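A compact sketch of such a greedy recursive construction, in the spirit of IMM-style threshold trees (the class and function names, midpoint thresholds, and mistake-counting details are illustrative assumptions, not a faithful reproduction of any one cited algorithm):

```python
import numpy as np

class Node:
    def __init__(self, feature=None, threshold=None, left=None, right=None, cluster=None):
        self.feature, self.threshold = feature, threshold
        self.left, self.right, self.cluster = left, right, cluster

def build_threshold_tree(X, centers, assign):
    """Recursively pick the axis-aligned cut that separates at least two reference centers
    while sending as few points as possible to the opposite side of their assigned center."""
    X = np.asarray(X, dtype=float)
    centers = np.asarray(centers, dtype=float)
    assign = np.asarray(assign)

    def recurse(point_idx, center_idx):
        if len(center_idx) == 1:
            return Node(cluster=int(center_idx[0]))
        best = None  # (mistakes, feature, threshold)
        for j in range(X.shape[1]):
            vals = np.unique(centers[center_idx, j])
            for t in (vals[:-1] + vals[1:]) / 2.0:  # thresholds strictly between center coordinates
                pts_left = X[point_idx, j] <= t
                ctr_left = centers[assign[point_idx], j] <= t
                mistakes = int(np.sum(pts_left != ctr_left))  # points cut off from their center
                if best is None or mistakes < best[0]:
                    best = (mistakes, j, t)
        _, j, t = best
        return Node(feature=j, threshold=t,
                    left=recurse(point_idx[X[point_idx, j] <= t],
                                 center_idx[centers[center_idx, j] <= t]),
                    right=recurse(point_idx[X[point_idx, j] > t],
                                  center_idx[centers[center_idx, j] > t]))

    return recurse(np.arange(len(X)), np.arange(len(centers)))

# Usage (assuming scikit-learn is available):
# from sklearn.cluster import KMeans
# km = KMeans(n_clusters=4, n_init=10).fit(X)
# tree = build_threshold_tree(X, km.cluster_centers_, km.labels_)
```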
SpEx (Argov et al., 2 Nov 2025) generalizes this using spectral graph partitioning: axis-aligned cuts are chosen to minimize a normalized-cut cost in a graph encoding the cluster structure, allowing adaptation to non-centroid and non-Euclidean references. The existence of low-cut axis-aligned splits is guaranteed under geometric Cheeger-type inequalities.
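The core scoring step, choosing an axis-aligned cut by its normalized-cut cost on a graph that encodes the reference clustering, can be sketched as follows (this is not the SpEx implementation; the unit-weight same-cluster graph and single-split search are simplifying assumptions):

```python
import numpy as np

def best_axis_aligned_cut(X, labels):
    """Find the axis-aligned split (feature, threshold) minimizing a normalized-cut cost
    on the graph that connects every pair of points sharing a reference cluster label."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    clusters, cluster_idx = np.unique(labels, return_inverse=True)
    sizes = np.bincount(cluster_idx)              # cluster sizes
    n, d = X.shape
    best = (np.inf, None, None)                   # (ncut, feature, threshold)
    for j in range(d):
        order = np.argsort(X[:, j])
        left_counts = np.zeros(len(clusters), dtype=int)
        for pos in range(n - 1):
            left_counts[cluster_idx[order[pos]]] += 1
            if X[order[pos], j] == X[order[pos + 1], j]:
                continue                          # no valid threshold between equal values
            right_counts = sizes - left_counts
            cut = float(np.sum(left_counts * right_counts))      # same-cluster pairs separated
            vol_left = float(np.sum(left_counts * (sizes - 1)))  # degrees inside the left side
            vol_right = float(np.sum(right_counts * (sizes - 1)))
            if vol_left == 0 or vol_right == 0:
                continue
            ncut = cut / vol_left + cut / vol_right
            if ncut < best[0]:
                thr = 0.5 * (X[order[pos], j] + X[order[pos + 1], j])
                best = (ncut, j, thr)
    return best  # (normalized-cut value, feature index, threshold)
```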
3. Human-Interpretable Explanations: Patterns, Exemplars, and Polyhedra
Multiple methods focus on optimizing explanation forms for human interpretability:
- Pattern Mining: Frequent closed-itemset discovery and coverage/discrimination constraints enable the identification of Boolean patterns that are specific and concise per cluster (Ofek et al., 29 Dec 2024, Guilbert et al., 26 Mar 2024).
- Exemplar Selection: Exemplar-based explainability provides small sets of exemplars such that every cluster member lies within a distance threshold $\delta$ of some exemplar, casting explanation selection as a metric set cover or budgeted maximum coverage problem (Davidson et al., 2022); see the greedy sketch after this list.
- Polyhedral Region Construction: Cluster regions are described as low-complexity polyhedra in $\mathbb{R}^d$, obtained by minimizing the number or total complexity of the defining half-spaces. Complexity and sparsity are optimized subject to a misclassification budget $\varepsilon$, with the resulting NP-hard formulations addressed via column generation and group-based relaxation (Lawless et al., 2022).
- Tag and Symbol-Based Descriptions: In multi-modal or complex data, clusters are explained by selected symbolic tags found via integer programming under coverage and orthogonality constraints (Zhang et al., 2021, Liu et al., 2022).
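For the exemplar-selection formulation referenced above, the standard greedy set-cover heuristic is easy to sketch (Euclidean distances, the coverage radius `delta`, and the function name are assumptions for illustration):

```python
import numpy as np

def greedy_exemplars(X, labels, cluster, delta, budget=None):
    """Greedy set-cover selection of exemplars for one cluster: repeatedly pick the cluster
    member that covers (lies within distance `delta` of) the most still-uncovered members,
    until everything is covered or an optional budget is exhausted. The classical greedy
    set-cover analysis gives the usual O(log n) approximation guarantee."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    members = np.where(labels == cluster)[0]
    P = X[members]
    dists = np.linalg.norm(P[:, None, :] - P[None, :, :], axis=-1)
    covers = dists <= delta                      # covers[i, j]: exemplar i covers point j
    uncovered = np.ones(len(members), dtype=bool)
    chosen = []
    while uncovered.any() and (budget is None or len(chosen) < budget):
        gains = (covers & uncovered[None, :]).sum(axis=1)
        best = int(np.argmax(gains))
        if gains[best] == 0:
            break                                # nothing more can be covered
        chosen.append(int(members[best]))
        uncovered &= ~covers[best]
    return chosen, int(uncovered.sum())          # exemplar indices (into X), members left uncovered
```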
4. Integration with Modern ML Pipelines and Applications
Clustering-based explainability is fundamental in biomedical analysis, finance, manufacturing, and digital pathology. In these contexts, methods must scale and handle:
- High-dimensional/tabular and complex data modalities.
- Black-box or deep representation spaces (e.g., t-SNE/UMAP projections, CNN activations).
- Real-world trade-offs between explanation complexity and discrimination.
For instance, ExClus discovers human-interpretable differences in multidimensional cytometry data and socio-economic indicators (Vankwikelberge et al., 2021). Cluster-Explorer enables attribute-based explanations across 19 UCI datasets and complex clustering pipelines, outperforming XAI baselines in accuracy and conciseness (Ofek et al., 29 Dec 2024). Exemplar-based methods offer competitive summarization in text, vision, and tabular tasks (Davidson et al., 2022).
Pattern- and tag-based frameworks are essential for regulatory compliance (e.g., articulating financial segmentations to auditors (Horel et al., 2019)) and scientific transparency (gene signature discovery (Sousa et al., 25 Jul 2025)).
5. Theoretical Guarantees, Complexity, and Evaluation
Key guarantees and findings include:
- Tight bounds on the "price of explainability" for tree-based explainable clusterings in high-dimensional settings (Laber et al., 2021, Gamlath et al., 2021).
- Approximation factors for exemplar selection (e.g., the classical greedy $O(\log n)$ set-cover factor and the $(1-1/e)$ budgeted-maximum-coverage factor) and guarantees on cluster coverage (Davidson et al., 2022).
- Empirical demonstration that frequent-pattern and constraint-programming methods scale to millions of instances and hundreds of features while remaining interpretable to domain experts (Ofek et al., 29 Dec 2024, Guilbert et al., 26 Mar 2024).
Evaluation metrics typically include coverage, separation error, conciseness, description complexity, interpretability trade-offs, and user studies validating the interpretability of explanations (Ofek et al., 29 Dec 2024).
6. Limitations and Open Problems
While significant progress has been made, the following challenges remain:
- Hyperparameter selection and trade-off calibration (e.g., how to balance complexity penalties such as $\alpha$ and $\beta$, or coverage vs. separation thresholds).
- Scalability as the number of clusters, features, or candidates grows (especially for combinatorial/CP-based methods).
- Expressiveness vs. interpretability: Enforcing disjointness (as in tag/descriptor models) or excessive pattern minimality may result in under-coverage or over-sparsity.
- Generalization beyond fixed clustering references: Decision-tree or polyhedral methods typically approximate a given partition, which may not be optimal in terms of joint clustering–explainability objectives.
- Handling feature correlations and continuous/categorical heterogeneity in more expressive explanation forms.
Future directions include further integration of information-theoretic priors, active user-guidance for explanation selection, adaptive graph-construction in spectral approaches, and rigorous user-centric interpretability metrics (Vankwikelberge et al., 2021, Argov et al., 2 Nov 2025, Ofek et al., 29 Dec 2024).