
MMCM: Clustering-Based Multimodality Metric

Updated 22 November 2025
  • The paper introduces MMCM, a metric that quantifies multimodal coverage by clustering data into distinct modes based on density, mass, and stability.
  • It employs techniques like autoencoding, UMAP, HDBSCAN, and persistent homology to accurately capture and assess the diversity in model predictions.
  • The metric improves model evaluation by penalizing unrealistic predictions and ensuring robust, valid coverage across heterogeneous, multimodal datasets.

A Multimodality-aware Metric using Clustering-based Modes (MMCM) quantifies how well a representation, prediction, or learned model captures multiple distinct modes present in complex data, leveraging explicit clustering to define and evaluate these modes. MMCM metrics have become central in fields like generative modeling, community detection, human motion prediction, and optimal control, as most real-world phenomena contain heterogeneous, multimodal distributions. Through clustering, MMCM methods identify and use local structures, facilitating precise assessment of coverage, diversity, and validity across data modes.

1. Conceptual Foundations and Formal Definition

MMCM evaluates a sample set or prediction set in terms of how its elements are distributed among modes discovered via clustering in a relevant feature or latent space. The underlying premise is that explicit partitioning of the data or prediction space into clusters (i.e., modes) enables principled quantification of multimodal coverage—not just diversity in a metric space, but assignment to semantically or structurally meaningful sub-populations.

A general MMCM in the multi-relational data setting, as synthesized in (Ignatov et al., 2017), is defined for a prime-based $n$-cluster $P = (Z_1, Z_2, \ldots, Z_n)$ as

$$\mathrm{MMCM}(P) = \rho(P)^\alpha \, [\rho(P)\,\mathrm{mass}(P)]^\beta \, \sigma(P)^\gamma \, [1+\mathrm{ovp}(P)]^{-\delta},$$

where:

  • $\rho(P) = \dfrac{|Y \cap (Z_1 \times \cdots \times Z_n)|}{\prod_{i=1}^n |Z_i|}$ is the mode (cluster) density,
  • $\mathrm{mass}(P) = |Y \cap (Z_1 \times \cdots \times Z_n)|$ is the cluster mass,
  • $g(P) = \rho(P)\,\mathrm{mass}(P)$ is the least-squares mass–density criterion,
  • $\sigma(P)$ denotes cluster stability,
  • $\mathrm{ovp}(P)$ is the maximal normalized overlap with any other candidate cluster,
  • $\alpha, \beta, \gamma, \delta$ are application-chosen nonnegative weights summing to 1.

By construction, MMCM is monotonically increasing in density, mass, and stability, and decreasing in overlap. This generality encompasses and formalizes several task-specific metrics.
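As a concrete illustration, the composite score can be computed directly from its four ingredients. The following is a minimal Python sketch; the function name and the default weights (the density-first regime mentioned in the practical guidelines) are illustrative choices, not prescribed by the paper.

```python
def mmcm_score(density, mass, stability, overlap,
               alpha=0.5, beta=0.3, gamma=0.1, delta=0.1):
    """Composite MMCM score for one candidate cluster P.

    density   -- rho(P), the cluster density in (0, 1]
    mass      -- |Y ∩ (Z_1 × ... × Z_n)|, the cluster mass
    stability -- sigma(P), a stability index in (0, 1]
    overlap   -- ovp(P), maximal normalized overlap with any other candidate
    Weights alpha..delta are nonnegative and sum to 1 (density-first
    regime shown as the default).
    """
    assert abs(alpha + beta + gamma + delta - 1.0) < 1e-9
    g = density * mass  # least-squares mass-density criterion g(P)
    return (density ** alpha) * (g ** beta) * (stability ** gamma) \
        * (1.0 + overlap) ** (-delta)
```

By construction the score is monotone: holding the other inputs fixed, raising density or stability raises the score, while raising overlap lowers it.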

2. MMCM in Human Motion Prediction

For probabilistic human motion forecasting, (Tokoro et al., 19 Nov 2025) introduces MMCM to assess both multimodal coverage and kinematic validity:

  • Modes are constructed by (a) autoencoding motion segments, (b) reducing dimensionality with UMAP, and (c) clustering with HDBSCAN to obtain $M$ clusters (modes) in the latent space.
  • Coverage $C$: the fraction of valid modes (those covered by multiple ground-truth futures for a given past) that are hit by at least one model prediction:

$$C = \frac{|M \cap \hat{M}|}{|M|},$$

where $M$ is the set of modes observed in pseudo-ground-truth futures and $\hat{M}$ the set observed in model outputs.

  • Validity $V$: the fraction of model predictions lying within any valid mode:

$$V = \frac{|\{i : \hat{m}_i \in M\}|}{I},$$

with $I$ the number of predictions and $\hat{m}_i$ the mode of the $i$-th prediction.

  • Final MMCM score: the harmonic mean of $C$ and $V$:

$$\mathrm{MMCM} = \frac{2CV}{C+V}.$$

This explicitly penalizes predictions that are diverse but unrealistic, as well as those that are valid but insufficiently diverse, addressing failure modes of prior metrics such as average pairwise diversity (APD) (Tokoro et al., 19 Nov 2025).
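The coverage/validity computation reduces to simple set operations once every ground-truth future and every prediction has been assigned a mode label (e.g. an HDBSCAN cluster ID, with a noise label for outliers). A minimal sketch, with the function name and noise-label convention as assumptions:

```python
def mmcm_motion(gt_modes, pred_modes, noise_label=-1):
    """MMCM as the harmonic mean of coverage C and validity V.

    gt_modes   -- mode labels of the pseudo-ground-truth futures;
                  noise_label marks outliers excluded from valid modes
    pred_modes -- mode label assigned to each of the I model predictions
                  (noise_label if a prediction falls in no valid mode)
    """
    valid = {m for m in gt_modes if m != noise_label}      # M
    hit = {m for m in pred_modes if m in valid}            # M ∩ M̂
    coverage = len(hit) / len(valid) if valid else 0.0     # C
    validity = sum(m in valid for m in pred_modes) / len(pred_modes)  # V
    if coverage + validity == 0:
        return 0.0
    return 2 * coverage * validity / (coverage + validity)
```

For example, with ground-truth modes {0, 1, 2} and predictions hitting modes 0 and 1 (plus one noise prediction), both a missed mode and the invalid sample lower the score.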

3. Topological and Metric Learning Variants

Optimal Control and Multi-Expert Selection: In (Merkt et al., 2020), MMCM is instantiated through topological analysis:

  • Trajectories are clustered using persistent homology (Vietoris–Rips filtration on segment-level Euclidean distances).
  • The number of persistent $H_1$ classes yields the number of significant modes.
  • Agglomerative clustering (single-linkage) assigns each trajectory to a mode.
  • Mixture-of-Experts models are then fit per mode, and the MMCM metric used for warm-start ranking combines: (1) proximity to the assigned expert (local Mahalanobis distance), (2) gating network probability, (3) mode-specific scaling, maintaining sensitivity to multimodality and solution discontinuities.
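The warm-start ranking step can be sketched as follows: each mode's expert is summarized by a mean and covariance, and candidates are scored by combining the gating probability with the local Mahalanobis distance. The exact combination rule below (an exponentiated distance weighted by the gate) is an illustrative assumption, not the paper's formula.

```python
import numpy as np

def warm_start_scores(x, experts, gate_probs, scale=1.0):
    """Score each mode's expert as a warm-start candidate for query x.

    experts    -- list of (mean, cov) pairs, one per mode, fit after
                  clustering the trajectories
    gate_probs -- gating-network probability of each mode given x
    Returns one score per mode: a high gating probability and a small
    local Mahalanobis distance to the expert both increase the score;
    `scale` is the mode-specific scaling factor.
    """
    scores = []
    for (mu, cov), p in zip(experts, gate_probs):
        d = x - mu
        maha = float(d @ np.linalg.inv(cov) @ d)   # squared Mahalanobis distance
        scores.append(p * np.exp(-scale * maha))   # gate × local proximity
    return np.array(scores)
```

Ranking candidates by this score keeps the selection sensitive to multimodality: a query near one mode's expert is not warm-started from a distant mode, even if the modes are close in raw Euclidean terms.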

Metric Learning (MDaML style): For weakly supervised metric learning (Deng et al., 2021), the multimodality-aware perspective is handled by:

  • Explicitly clustering each semantic class into $K$ modes (fuzzy $k$-means under a learnable Mahalanobis metric).
  • Assigning soft mode weights to each sample/triplet.
  • Optimizing a mode-weighted triplet loss, so that “similar” pairs are only pulled close within the same mode, and inter-mode similarity is not forced.
  • Leveraging optimization on the SPD manifold, thus decoupling the discovery of local structure from global class assignment.

A plausible implication is that such architectures are robust to real-world multimodal class structure and avoid over-constraining across distinct modes.
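The mode-weighted triplet loss can be sketched as below; the weighting rule (scaling the hinge by the inner product of the two samples' soft mode memberships) is an illustrative assumption in the spirit of the description above, not the exact MDaML objective.

```python
import numpy as np

def mode_weighted_triplet_loss(anchor, pos, neg, w_modes_a, w_modes_p,
                               margin=1.0):
    """Soft-mode-weighted triplet loss (illustrative sketch).

    w_modes_a, w_modes_p -- soft mode memberships (fuzzy k-means weights)
    of the anchor and positive within their shared class. The pull on a
    "similar" pair is scaled by how much the two samples share a mode,
    so samples in different modes of the same class are not forced close.
    """
    w = float(np.dot(w_modes_a, w_modes_p))     # shared-mode weight in [0, 1]
    d_pos = float(np.sum((anchor - pos) ** 2))  # squared anchor-positive distance
    d_neg = float(np.sum((anchor - neg) ** 2))  # squared anchor-negative distance
    return w * max(0.0, d_pos - d_neg + margin)
```

When the anchor and positive occupy disjoint modes the weight is zero, so the loss never pulls them together; within a shared mode the standard triplet hinge applies.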

4. Formal Multimodal Clustering and Community Detection

In multi-mode data mining and community detection, (Ignatov et al., 2017) provides a generalized MMCM formulation:

  • n-mode clusters (e.g., biclusters, triclusters) formally defined using Galois operators and prime-based cluster construction.
  • Core metrics: density ($\rho$), mass, least-squares score ($g$), stability index ($\sigma$), and overlap ($\mathrm{ovp}$).
  • Composite MMCM: tunable weighted geometric mean of these, capturing size-density trade-off, robustness, and diversity.

This approach is explicitly multimodality-aware: modes are defined via clusterings over $n$-way relations, and the MMCM can be used to select, score, and rank the discovered multimodal communities.

| Metric | Key component | Role in MMCM |
| --- | --- | --- |
| $\rho(P)$ | Cluster density | Ensures the cluster is nontrivially filled |
| $\mathrm{mass}(P)$ | Count of cluster incidence pairs | Trades off size vs. sparsity |
| $\sigma(P)$ | Cluster stability | Filters out accidental structures |
| $\mathrm{ovp}(P)$ | Degree of cluster overlap | Penalizes redundancy, enforces diversity |

5. Comparison with Other Multimodality Metrics

MMCM-type metrics differ fundamentally from both moment-based (FID, IS) and dendrogram-based (DD) approaches:

  • Mixture Model Coverage Metrics explicitly fit generative mixture models, assigning samples to clusters/modes and evaluating per-component coverage and recall. This requires combinatorial matching of modes and ground truth components.
  • Dendrogram Distance (DD) (Carvalho et al., 2023) aligns hierarchical merge heights between real and generated data to detect missing or merged modes but does not require or produce explicit cluster assignments, operating instead on the ultrametric implied by single-linkage clustering.
  • MMCM leverages explicit clustering and unique cluster mode assignment, allowing for direct quantification of mode coverage, explicit penalization of out-of-distribution samples, and decomposition of coverage vs. mode-validity. As shown in (Tokoro et al., 19 Nov 2025), this leads to improved sensitivity to missing rare modes, and robustness against diffuse, implausible predictions often mis-rewarded by metrics focused solely on geometric spread.

A plausible implication is that MMCM can be flexibly adapted to different domains—motion, optimal control, community structure—by tailoring density, stability, and overlap criteria and cluster construction methodology.

6. Practical Guidelines and Usage

The practical deployment of MMCM comprises:

  1. Clustering: Using autoencoding, topological filtration, or prime-based methods to discover modes.
  2. Metric computation: For each test case or cluster, calculate density, mass, stability, and overlap.
  3. Tuning: Select weights $(\alpha, \beta, \gamma, \delta)$ for the application-specific trade-off.
  4. Thresholding: Filter out clusters below a minimum density or stability to ensure semantic significance.
  5. Selection and ranking: Use the MMCM score to rank clusters, prediction sets, or candidate warm-starts.

Recommended weight settings are domain-dependent. For instance, the density-first regime sets $(\alpha, \beta, \gamma, \delta) = (0.5, 0.3, 0.1, 0.1)$; for very large but lower-density clusters, a volume-balanced configuration increases $\beta$ (Ignatov et al., 2017). For strict coverage of all possible formal concepts, one could enforce $\rho_{\min} = 1$ and set the other weights to zero.

In human motion prediction, the MMCM score (harmonic mean of valid-mode coverage and sample validity) is averaged across all test samples for model comparison (Tokoro et al., 19 Nov 2025).

7. Empirical Performance and Limitations

MMCM variants have demonstrated improved accuracy and interpretability in diverse benchmarks:

  • Human motion prediction: MMCM yields consistent ranking, robustness to outlier or noisy trajectories, and penalizes unrealistic spread that previously inflated diversity scores (Tokoro et al., 19 Nov 2025).
  • Multimodal metric learning: Improves 3-NN classification accuracy by capturing within-class mode structure (Deng et al., 2021).
  • Optimal control warm-start: Dramatically increases solver success rates and reduces computation when compared to modality-agnostic alternatives (Merkt et al., 2020).
  • Community detection: OA-biclustering and its $n$-mode generalizations, when scored by MMCM, efficiently extract non-redundant, significant communities in large networks (Ignatov et al., 2017).

However, MMCM’s efficacy depends critically on the appropriateness of the clustering step, parameter calibration, and (in some variations) availability of sufficiently rich mode-representative data. In highly overlapping or continuous-mode distributions, hard clustering may be less effective than soft variants or kernel methods. In some cases, mode discovery can be sensitive to embedding dimension and clustering hyperparameters, though relative method ranking tends to be robust (Tokoro et al., 19 Nov 2025).

A plausible implication is that future research may focus on adaptive or probabilistically grounded mode discovery to augment MMCM's generality and precision.
