Multi-Modal Interpolative Decomposition
- Multi-Modal Interpolative Decomposition is a low-rank approximation technique that preserves mode consistency in multi-fidelity, stochastic systems.
- The method employs cluster-aware basis selection and interpolation to ensure that surrogate models accurately replicate distinct physical regimes.
- It enables efficient uncertainty quantification and design optimization by reducing the need for extensive high-fidelity samples in complex, multi-modal scenarios.
Multi-modal interpolative decomposition is a low-rank approximation technique that extends the classical interpolative decomposition (ID) framework to scenarios where the underlying data or system exhibits multi-modal behavior. This methodology is particularly relevant when outputs may arise from several distinct modes due to physical bifurcations or inherent stochasticity, such as in complex engineering systems subject to uncertainty. The multi-modal ID approach enables more reliable and physically meaningful bi-fidelity approximations, efficient feature selection in probabilistic settings, and advanced multi-modal data fusion in scientific computing, signal processing, and design optimization.
1. Background and Motivation
Classical interpolative decomposition is a “structure-preserving” factorization that approximates a matrix $A$ as $A \approx A(:,\mathcal{J})\,P$, where $A(:,\mathcal{J})$ is formed by selecting a subset of $k$ columns (“skeleton columns”) and $P$ is a $k \times n$ interpolation matrix, typically determined via column-pivoted QR factorization (Voronin et al., 2015). This construction enables the reconstruction of the matrix from representative features while preserving critical structures such as sparsity and interpretability.
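As a concrete illustration, the sketch below builds a rank-$k$ ID from a column-pivoted QR factorization using NumPy/SciPy; the function name and the toy matrix are illustrative, not taken from the cited work.

```python
import numpy as np
from scipy.linalg import qr

def interpolative_decomposition(A, k):
    """Rank-k ID: A ≈ A[:, J] @ P, with J the skeleton column indices."""
    # Column-pivoted QR orders columns by how much new range they contribute:
    # A[:, piv] = Q @ R.
    Q, R, piv = qr(A, mode="economic", pivoting=True)
    J = piv[:k]
    # Express the remaining (pivoted) columns in terms of the first k:
    # R[:k, k:] ≈ R[:k, :k] @ T.
    T = np.linalg.solve(R[:k, :k], R[:k, k:])
    # Assemble P in the original column order.
    P = np.zeros((k, A.shape[1]))
    P[:, piv[:k]] = np.eye(k)
    P[:, piv[k:]] = T
    return J, P

# Toy check: a matrix of numerical rank 5 is reproduced from 5 skeleton columns.
rng = np.random.default_rng(0)
A = rng.standard_normal((100, 5)) @ rng.standard_normal((5, 40))
J, P = interpolative_decomposition(A, 5)
print(np.linalg.norm(A - A[:, J] @ P) / np.linalg.norm(A))  # ~1e-15
```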
Bi-fidelity modeling uses low-fidelity simulations to construct surrogates for more expensive high-fidelity simulations, traditionally leveraging the shared reduced-basis structure. The standard bi-fidelity ID assumes both fidelity levels are well-coupled by a global interpolation scheme: if $u^{L}(\xi)$ is the low-fidelity output at parameter $\xi$, high-fidelity predictions at unobserved parameters are given as a linear combination of high-fidelity outputs at selected basis points, with interpolation coefficients inherited from the low-fidelity model. However, when the quantity of interest (QoI) is multi-modal (e.g., due to bifurcations, multiple physical regimes, or hidden variables), this assumption fails. In such cases, naïve coefficient transfer across fidelities can yield physically meaningless or highly erroneous approximations (Cutforth et al., 10 Sep 2025).
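The standard, mode-agnostic bi-fidelity workflow can be summarized in a few lines. In this sketch, the column-wise snapshot layout and the helper `eval_high` (which runs the expensive model at a selected parameter index) are assumptions for illustration, not the formulation of the cited papers.

```python
import numpy as np
from scipy.linalg import qr

def bifidelity_id_predict(U_low, eval_high, k):
    """Standard bi-fidelity ID: pick skeleton parameters from the low-fidelity
    snapshot matrix and reuse its interpolation coefficients on high-fidelity data.

    U_low:     (n_dof_low, n_params) low-fidelity snapshots, one column per parameter.
    eval_high: assumed helper returning the high-fidelity snapshot for parameter
               index j -- this is where the k expensive simulations happen.
    """
    Q, R, piv = qr(U_low, mode="economic", pivoting=True)
    J = piv[:k]                                   # parameters chosen for HF evaluation
    T = np.linalg.solve(R[:k, :k], R[:k, k:])     # coefficients learned on low fidelity
    P = np.zeros((k, U_low.shape[1]))
    P[:, piv[:k]] = np.eye(k)
    P[:, piv[k:]] = T
    U_high_skel = np.column_stack([eval_high(j) for j in J])   # only k HF runs
    return U_high_skel @ P                        # HF prediction at every parameter
```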
2. Core Principles and Formulation
The multi-modal interpolative decomposition (ID) method adapts the standard ID framework to scenarios where the data exhibits mode-dependent variability. For a dataset with multi-modal outputs, the response at a fixed input parameter $\xi$ is not deterministic but comes as a draw from a mixture distribution
$$ p(u \mid \xi) \;=\; \sum_{k=1}^{K} w_k(\xi)\, p_k(u \mid \xi), \qquad \sum_{k=1}^{K} w_k(\xi) = 1, $$
where $p_k(u \mid \xi)$ is the conditional density for mode $k$ and $w_k(\xi)$ are the mode probabilities (Cutforth et al., 10 Sep 2025).
To build a robust surrogate, the method:
- Samples multiple low-fidelity outputs for each input parameter, capturing the full range of stochastic/mode-dependent behavior.
- Estimates cluster/mode assignment for each sample, either through explicit physical criteria (e.g., ignition/non-ignition) or unsupervised clustering.
- Constructs the ID such that the skeleton (basis) columns and the resulting interpolation coefficients maintain consistency in mode assignment between low-fidelity and high-fidelity data.
- Selects basis samples for high-fidelity evaluation by ensuring mode-matching: each high-fidelity output used in the basis corresponds to a low-fidelity sample from the same mode, allowing correct transfer of interpolation coefficients.
The predicted high-fidelity output is then formed as
$$ \hat{u}^{H}(\xi) \;\approx\; \sum_{j=1}^{k} c_j(\xi)\, u^{H}(\xi_j, \omega_j), $$
where $(\xi_j, \omega_j)$ are the inputs and (mode-consistent) random seeds of the high-fidelity basis, and $c_j(\xi)$ are the coefficients computed from low-fidelity samples with matched mode (Cutforth et al., 10 Sep 2025).
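A minimal sketch of this mode-consistent construction is given below, assuming the skeleton parameters have already been selected (e.g., by pivoted QR on the low-fidelity data) and that each replicate carries a mode label; all variable names, the data layout, and the ridge parameter are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def multimodal_bifidelity_predict(U_low_reps, modes_low, J, U_high_basis,
                                  modes_high_basis, lam=1e-8, seed=0):
    """Mode-consistent bi-fidelity prediction (sketch of the steps above).

    U_low_reps:       list over parameters; U_low_reps[i] is an (n_dof_low, n_reps_i)
                      array of low-fidelity replicates at parameter i.
    modes_low:        list of 1-D integer mode labels for those replicates.
    J:                indices of the skeleton parameters.
    U_high_basis:     (n_dof_high, k) high-fidelity snapshots at the skeleton parameters.
    modes_high_basis: mode label realized by each high-fidelity basis run.
    """
    rng = np.random.default_rng(seed)
    k = len(J)

    # Skeleton columns: for each basis parameter keep a low-fidelity replicate
    # whose mode matches the mode observed in the high-fidelity run, so the
    # interpolation coefficients transfer within a single mixture component.
    cols = []
    for i, m_hf in zip(J, modes_high_basis):
        match = np.flatnonzero(modes_low[i] == m_hf)
        cols.append(U_low_reps[i][:, rng.choice(match)])
    A_skel = np.column_stack(cols)                      # (n_dof_low, k)

    # Non-basis columns: a randomly chosen replicate preserves the stochastic mix.
    gram = A_skel.T @ A_skel + lam * np.eye(k)
    preds = []
    for U_i in U_low_reps:
        a = U_i[:, rng.integers(U_i.shape[1])]
        c = np.linalg.solve(gram, A_skel.T @ a)         # regularized least squares
        preds.append(U_high_basis @ c)                  # coefficients applied to HF basis
    return np.column_stack(preds)                       # (n_dof_high, n_params)
```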
3. Methodological Advances over Standard ID
The primary advancement of the multi-modal ID approach lies in enforcing conditional (mode-aware) basis selection and interpolation. Rather than constructing a global, mode-agnostic reduced basis, the procedure explicitly acknowledges that interpolation coefficients are valid only within a given component (mode) of the mixture. This is operationalized by:
- Generating multiple realizations per parameter in the low-fidelity model.
- Matching the cluster label or mode between the high-fidelity and low-fidelity sample when assigning basis columns.
- For non-basis columns, randomly selecting from low-fidelity samples to ensure stochastic representation.
- Formulating the coefficient estimation as a regularized least-squares problem,
$$ c(\xi) \;=\; \arg\min_{c} \;\bigl\| A^{L}(:,\mathcal{J})\, c - u^{L}(\xi) \bigr\|_2^2 \;+\; \lambda \|c\|_2^2, $$
where $A^{L}$ is the (possibly permuted and cluster-aligned) low-fidelity data matrix (Cutforth et al., 10 Sep 2025).
If cluster assignment is ambiguous or the underlying physics do not provide clear labels, unsupervised clustering or statistical mixture-model inference may be applied.
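As one possible realization of this step, the sketch below assigns mode labels with a PCA projection followed by a Gaussian mixture fit in scikit-learn; the choice of PCA + GMM and all parameter values are illustrative assumptions, and any clustering or mixture-model inference could be substituted.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture

def assign_modes(U_reps, n_modes=2, n_pcs=5, seed=0):
    """Unsupervised mode labels for a stack of low-fidelity realizations.

    U_reps: (n_dof, n_samples) array, one realization per column.
    Returns one integer mode label per sample.
    """
    # Project onto a few principal components so the mixture fit stays cheap.
    Z = PCA(n_components=n_pcs).fit_transform(U_reps.T)
    gmm = GaussianMixture(n_components=n_modes, random_state=seed).fit(Z)
    return gmm.predict(Z)
```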
4. Computational and Statistical Benefits
By distinguishing and preserving mode occupancy, the multi-modal ID method ensures physically meaningful bi-fidelity approximations and robust parameter sensitivity analysis. For example, in the context of laser-ignited methane-oxygen rocket combustion under uncertainty:
- The output (e.g., chamber pressure trace) exhibits a bifurcation: the system either ignites or fails to ignite, with the outcome depending stochastically on uncertain inputs.
- Application of the multi-modal ID method yields bi-fidelity surrogates that correctly generate bimodal (or multi-modal) predictions, preserving the physical nature of the solution branches.
- Only a small fraction of the samples requires high-fidelity evaluation (e.g., roughly 16% of a 237-sample dataset for a rank-25 approximation); the remaining predictions are furnished by the ID-based surrogate (Cutforth et al., 10 Sep 2025).
The statistical fidelity of the approximation is evaluated by comparing sensitivity metrics (such as RBD-FAST, PAWN, and Delta indices) between the full high-fidelity dataset and the bi-fidelity predictions, yielding correlation coefficients in the range 0.70–0.90.
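A sketch of such a comparison is shown below, using SALib's RBD-FAST and Delta analyzers and a plain Pearson correlation between the resulting index vectors; the PAWN comparison is omitted for brevity, and the harness function is an assumption rather than the authors' evaluation code.

```python
import numpy as np
from SALib.analyze import rbd_fast, delta   # assumes SALib is installed

def sensitivity_agreement(problem, X, y_high, y_bifi):
    """Correlate first-order sensitivity indices computed from the full
    high-fidelity outputs with those from the bi-fidelity surrogate.

    problem: SALib problem dict with 'num_vars', 'names', 'bounds'.
    X:       (n_samples, num_vars) input samples; y_high / y_bifi: matching outputs.
    """
    s_high = rbd_fast.analyze(problem, X, y_high)["S1"]
    s_bifi = rbd_fast.analyze(problem, X, y_bifi)["S1"]
    d_high = delta.analyze(problem, X, y_high)["delta"]
    d_bifi = delta.analyze(problem, X, y_bifi)["delta"]
    return {
        "rbd_fast_corr": np.corrcoef(s_high, s_bifi)[0, 1],
        "delta_corr": np.corrcoef(d_high, d_bifi)[0, 1],
    }
```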
Comparative Features Table
| Aspect | Standard Bi-Fidelity ID | Multi-Modal ID |
|---|---|---|
| Basis/sample matching | Global, mode-agnostic | Cluster/mode-consistent |
| Multiple low-fidelity samples | Optional | Required |
| Mode/branch information | Ignored | Explicitly handled |
| Output fidelity in multi-modal QoI | Often poor (mode blending) | Physically consistent (no blending) |
| High-fidelity call cost | Low | Low (no increase) |
| Applicability | Unimodal, smooth QoIs | Bifurcated/multi-modal QoIs |
In practice, the approach enables accurate uncertainty quantification and design optimization for complex, non-smooth responses at a fraction of the direct simulation cost.
5. Contexts and Applications
The multi-modal ID approach is particularly suited to:
- Computational physics, chemistry, and engineering, where rare events or bifurcations can result in multi-modal output distributions.
- Bi-fidelity uncertainty quantification for systems where stochastic responses or structurally distinct solution branches coexist.
- Sensitivity analysis tasks, especially when traditional surrogates are confounded by non-smooth, branch-dependent outputs.
A canonical application is the rocket combustor case, where rapid, mode-appropriate surrogate construction is necessary for large-scale optimization with limited high-fidelity data (Cutforth et al., 10 Sep 2025). More broadly, any domain with mixture-model output behavior and severe computational constraints can benefit from this methodology.
6. Limitations and Future Directions
A major limitation is the reliance on correct cluster or mode identification, which may require domain knowledge or robust unsupervised learning methods. Implementation may be challenging when the number of modes is unknown or when cluster membership is ill-defined. Furthermore, the approach may require larger low-fidelity sample sets to adequately represent all modes, particularly if some modes are rare or highly variable.
Future research could focus on:
- Automated, scalable clustering algorithms to facilitate mode assignment in high-dimensional or weakly separated cases.
- Extensions to hierarchical or nested multi-modal scenarios, where responses may stratify recursively.
- Theoretical analysis of stability and error bounds under imperfect mode matching.
- Integration with active learning for optimal high-fidelity sample selection conditioned on mode diversity.
7. Relationship to Broader Multi-Modal and Structure-Preserving Decomposition Trends
The multi-modal ID framework is part of a broader movement toward structure-preserving, interpretable low-rank models for heterogeneous and multi-fidelity data. Other related developments include block-randomized ID for hardware acceleration (Voronin et al., 2015), Bayesian and probabilistic ID variants for structured feature selection (Lu, 2022, Lu et al., 2022), and continuous trajectory-based decompositions for multi-source data fusion (Li et al., 7 Jun 2024). While these techniques address different problem settings, they share a focus on adaptively preserving the intrinsic data structure—be it sparsity, multimodality, or inter-modality transition—in computationally and statistically efficient ways.
The multi-modal interpolative decomposition method thus provides a crucial extension to the ID toolbox, enabling accurate and efficient surrogates in regimes where standard low-rank approximations are inadequate due to the presence of multiple solution branches and stochastic dynamics (Cutforth et al., 10 Sep 2025).