Dimensionality Reduction of Simulation Data

Updated 21 August 2025

Dimensionality reduction of simulation data is the process of obtaining a compact representation from high-dimensional outputs while retaining key physical invariants and reducing noise.
Classical tensor decompositions like CP and Tucker effectively compress simulation data using ALS and SVD-based methods to mitigate redundancy and computational cost.
Goal-oriented approaches integrate physics-based QoI penalties to ensure that critical observables, such as mass and kinetic energy, are faithfully preserved in the reduced models.

Dimensionality reduction of simulation data refers to the process of constructing a low-dimensional representation from originally high-dimensional data generated by (or for) numerical, physical, or empirical simulations. This operation is essential for reducing computational costs, mitigating redundancy and noise, and facilitating further data analysis, pattern discovery, or classification tasks. In the context of complex scientific and engineering simulations, it is also crucial that the reduced representations preserve quantities of interest, physical invariants, or statistical properties vital for downstream analysis.

1. Foundational Models: Classical Low-Rank Tensor Decompositions

The two primary tensor decompositions used for simulation data are the Canonical Polyadic (CP) and Tucker decompositions:

CP Decomposition: A $d$-way tensor $\mathcal{X}$ is approximated as a sum of $R$ rank-one terms:

$\mathcal{X} \approx \mathcal{M} = \sum_{r=1}^R a_r^{(1)} \circ a_r^{(2)} \circ \ldots \circ a_r^{(d)},$

where each $a_r^{(k)}$ is a vector associated with mode $k$, and “$\circ$” denotes the outer product. Collecting the vectors $a_r^{(k)}$ as the columns of a factor matrix $A^{(k)}$, the CP decomposition seeks to solve:

$\min_{\mathcal{M} = \llbracket A^{(1)}, \ldots, A^{(d)} \rrbracket} \ \| \mathcal{X} - \mathcal{M} \|_F^2.$

Factor matrices are commonly computed via Alternating Least Squares (CP-ALS).
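
As a rough illustration of how CP-ALS proceeds, the following NumPy sketch cycles over the modes and solves one linear least-squares problem per factor matrix. The helper names (`khatri_rao`, `unfold`, `cp_reconstruct`, `cp_als`), the random initialization, and the fixed iteration count are assumptions made for this example and are not taken from the cited work; practical implementations typically add column normalization and a convergence test.

```python
import numpy as np

def khatri_rao(mats):
    """Column-wise Kronecker product of a list of (I_k x R) matrices."""
    R = mats[0].shape[1]
    out = mats[0]
    for M in mats[1:]:
        out = (out[:, None, :] * M[None, :, :]).reshape(-1, R)
    return out

def unfold(X, mode):
    """Mode-k unfolding; the remaining modes are flattened in C order."""
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

def cp_reconstruct(factors):
    """Form the full tensor from a list of CP factor matrices."""
    shape = tuple(A.shape[0] for A in factors)
    return (factors[0] @ khatri_rao(factors[1:]).T).reshape(shape)

def cp_als(X, rank, n_iter=100, seed=0):
    """Plain CP-ALS with random initialization (illustrative only)."""
    rng = np.random.default_rng(seed)
    factors = [rng.standard_normal((n, rank)) for n in X.shape]
    for _ in range(n_iter):
        for k in range(X.ndim):
            others = [factors[j] for j in range(X.ndim) if j != k]
            # MTTKRP: unfolded tensor times Khatri-Rao product of the other factors
            mttkrp = unfold(X, k) @ khatri_rao(others)
            # Normal-equations matrix: Hadamard product of the other factors' Grams
            gram = np.ones((rank, rank))
            for A in others:
                gram *= A.T @ A
            factors[k] = mttkrp @ np.linalg.pinv(gram)
    return factors
```

Calling `cp_als(X, rank=R)` on a NumPy array `X` returns one factor matrix per mode, and `cp_reconstruct` rebuilds the approximation $\mathcal{M}$ for comparison with $\mathcal{X}$.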

Tucker Decomposition: This generalizes the matrix SVD to higher-order tensors. The approximation is:

$\mathcal{X} \approx \mathcal{M} = \mathcal{G} \times_1 A^{(1)} \times_2 A^{(2)} \ldots \times_d A^{(d)},$

where $\mathcal{G}$ is a core tensor and $A^{(n)}$ are factor matrices. The model is typically fit by solving:

$\min_{\mathcal{M} = \llbracket \mathcal{G}; A^{(1)}, \ldots, A^{(d)} \rrbracket} \ \| \mathcal{X} - \mathcal{M} \|_F^2.$

Common algorithms include Higher-Order SVD (HOSVD) and Higher-Order Orthogonal Iteration (HOOI).
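
A minimal sketch of a truncated HOSVD in NumPy may help make the Tucker model concrete: each factor matrix is taken from the leading left singular vectors of a mode unfolding, and the core is obtained by projecting the tensor onto those factors. The function names (`unfold`, `mode_product`, `hosvd`, `tucker_reconstruct`) are hypothetical, and the ranks are assumed to be chosen by the user rather than from an error tolerance.

```python
import numpy as np

def unfold(X, mode):
    """Mode-k unfolding; the remaining modes are flattened in C order."""
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

def mode_product(X, A, mode):
    """Mode-k product X x_k A, where A has shape (J, X.shape[mode])."""
    return np.moveaxis(np.tensordot(A, X, axes=(1, mode)), 0, mode)

def hosvd(X, ranks):
    """Truncated HOSVD: one SVD per mode unfolding, then project to get the core."""
    factors = []
    for k, r in enumerate(ranks):
        U, _, _ = np.linalg.svd(unfold(X, k), full_matrices=False)
        factors.append(U[:, :r])          # leading r left singular vectors
    core = X
    for k, A in enumerate(factors):
        core = mode_product(core, A.T, k)  # G = X x_1 A1^T x_2 A2^T ...
    return core, factors

def tucker_reconstruct(core, factors):
    """Form M = G x_1 A1 x_2 A2 ... x_d Ad."""
    M = core
    for k, A in enumerate(factors):
        M = mode_product(M, A, k)
    return M
```

HOOI would refine these factors iteratively; the sketch stops at the HOSVD step, which is often used as a starting point.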

These models allow substantial compression of simulation datasets, especially when underlying structure is low-rank.

2. Goal-Oriented Dimensionality Reduction: Quantities of Interest and Invariants

For scientific simulations, low reconstruction error on the entries of $\mathcal{X}$ is often insufficient. Key quantities of interest (QoIs), such as mass, kinetic energy, internal energy, or other physical invariants, may also need to be accurately preserved in the reduced model. The goal-oriented approach augments the standard tensor decomposition objective with penalties on these physics-driven or application-specific quantities:

$\min_{\mathcal{M}} \ \alpha_0 f(\mathcal{X}, \mathcal{M}) + \sum_{q=1}^Q \alpha_q \sum_{t \in \mathcal{T}_q} [g_q(\mathcal{X}_t) - g_q(\mathcal{M}_t)]^2,$

where:

$f(\mathcal{X}, \mathcal{M})$ is the base tensor reconstruction loss (generally the squared Frobenius norm),
$g_q(\cdot)$ is a QoI functional evaluated at timestep or index $t$ (e.g., global mass, kinetic energy),
$\mathcal{T}_q$ is the index/time set for QoI $q$,
$\alpha_0$ and the $\alpha_q$ are relative weights on the reconstruction loss and the QoI penalties, respectively.

For instance, in a combustion simulation with tensor $\mathcal{X} \in \mathbb{R}^{672 \times 672 \times 32 \times 626}$, the total density at each spatial gridpoint and timestep is $D_\mathcal{X}(i_1, i_2, t) = \sum_{i_3=1}^{28} \mathcal{X}(i_1, i_2, i_3, t)$, where the sum runs over the first 28 of the 32 entries along the third mode, and the mass QoI is obtained by integrating $D_\mathcal{X}(\cdot)$ over the spatial indices.
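
To make the objective concrete, here is a small sketch that evaluates a mass QoI and the goal-oriented loss for a 4-way array laid out as (x, y, variable, time). Several things are assumptions of the example: the function names `mass_qoi` and `go_objective`, a uniform `cell_area` standing in for the spatial quadrature weights, the parameter `n_species` for how many entries of the third mode enter the density sum, and the choice $\mathcal{T}_q$ = all timesteps.

```python
import numpy as np

def mass_qoi(X, n_species, cell_area=1.0):
    """Mass per timestep: sum species densities, then integrate over the grid."""
    density = X[:, :, :n_species, :].sum(axis=2)   # D(i1, i2, t)
    return cell_area * density.sum(axis=(0, 1))    # one value per timestep

def go_objective(X, M, alphas, qois):
    """alpha_0 * ||X - M||_F^2 + sum_q alpha_q * sum_t (g_q(X_t) - g_q(M_t))^2."""
    alpha0, alpha_q = alphas[0], alphas[1:]
    loss = alpha0 * np.sum((X - M) ** 2)
    for a, g in zip(alpha_q, qois):
        loss += a * np.sum((g(X) - g(M)) ** 2)
    return loss

# Toy usage: a perturbation that moves mass between cells leaves the QoI term at zero.
X = np.ones((3, 3, 4, 2))
M = X.copy()
M[0, 0, 0, 0] += 1.0
M[1, 1, 1, 0] -= 1.0
g_mass = lambda T: mass_qoi(T, n_species=3)
loss = go_objective(X, M, alphas=(1.0, 10.0), qois=[g_mass])
```

In this toy case only the Frobenius term contributes, because the two perturbations cancel in the mass integral; a perturbation that changes total mass would additionally be penalized by the weight $\alpha_1$.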

In plasma physics (e.g., MHD simulations), other invariants such as magnetic energy,

$g_{\text{ME}}(\mathcal{X}_t) = \int_{\Omega} \frac{\|\mathbf{B}\|^2}{2\mu_0} dx,$

are computed as functionals of the fields stored in $\mathcal{X}_t$, with numerical quadrature applied over the mesh.
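
As an illustration of such a quadrature, the sketch below approximates the magnetic energy integral on a uniform Cartesian grid by summing cell values times a constant cell volume. The array layout (three field components stacked along the first axis), the uniform `cell_volume`, and the function name `magnetic_energy` are assumptions for this example; an actual MHD code would use the quadrature weights of its own mesh.

```python
import numpy as np

MU0 = 4e-7 * np.pi  # vacuum permeability in H/m (classical value)

def magnetic_energy(B, cell_volume):
    """Approximate the integral of |B|^2 / (2 mu0) over a uniform grid.

    B has shape (3, nx, ny, nz): the components Bx, By, Bz on each cell.
    """
    b_squared = np.sum(B ** 2, axis=0)                 # |B|^2 per cell
    return np.sum(b_squared) * cell_volume / (2.0 * MU0)
```

Because this QoI is quadratic in the stored field, its derivative with respect to the tensor entries is linear in $\mathbf{B}$, which keeps the chain rule through the decomposition straightforward.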

By including these terms, the low-rank tensor model $\mathcal{M}$ is steered toward satisfying the physics-informed penalties, which makes the compressed data more trustworthy for scientific analysis.

3. Algorithmic Realization: Optimization and Derivative Structure

The augmented optimization problem is typically solved by initializing with a standard decomposition (CP-ALS, or the sequentially truncated HOSVD, ST-HOSVD, for Tucker) and then applying Newton-type or quasi-Newton optimization that incorporates derivatives of both the entrywise reconstruction loss and the QoI discrepancy. For the CP model, the gradient with respect to the vectorized factor matrices combines two parts:

The derivative of the Frobenius reconstruction term, computed via standard Matricized Tensor Times Khatri-Rao Product (MTTKRP) operations,
The derivative of the QoI errors, which requires chain-rule differentiation through the tensor operations and the subsequent evaluation of $g_q$.

These derivatives are organized into block Jacobians and Hessian-vector products, enabling efficient use of trust-region or Gauss–Newton type solvers. The additional regularization from the QoI terms can also improve the conditioning of the optimization, particularly when some data entries are noisy or redundant.
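
The two gradient contributions can be sketched for a simple case: a 3-way CP model with modes (x, y, time) and a single linear QoI, the sum of all entries in each time slice. Everything here is an assumption chosen for illustration (the function names, the restriction to the first factor matrix `A`, and the use of `einsum` instead of an optimized MTTKRP kernel); it is not the derivative code of the cited work.

```python
import numpy as np

def cp_full(A, B, C):
    """Full 3-way tensor from CP factors A (x), B (y), C (time)."""
    return np.einsum('ir,jr,tr->ijt', A, B, C)

def objective(X, A, B, C, a0, a1):
    """a0 * ||X - M||_F^2 + a1 * sum_t (g(X_t) - g(M_t))^2 with g = slice sum."""
    M = cp_full(A, B, C)
    e = X.sum(axis=(0, 1)) - M.sum(axis=(0, 1))
    return a0 * np.sum((X - M) ** 2) + a1 * np.sum(e ** 2)

def grad_A(X, A, B, C, a0, a1):
    """Gradient of the objective with respect to the first factor matrix A."""
    R = X - cp_full(A, B, C)
    # Frobenius part: MTTKRP of the residual tensor with B and C
    g_fro = -2.0 * a0 * np.einsum('ijt,jr,tr->ir', R, B, C)
    # QoI part: chain rule through g(M_t) = sum_r (1^T a_r)(1^T b_r) C[t, r]
    e = R.sum(axis=(0, 1))                      # QoI residual per timestep
    g_qoi = -2.0 * a1 * np.ones((A.shape[0], 1)) * (B.sum(axis=0) * (C.T @ e))
    return g_fro + g_qoi
```

A finite-difference comparison against `objective` is a cheap way to validate such hand-derived gradients before passing them to a quasi-Newton or Gauss–Newton solver.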

4. Empirical Performance: Case Studies in Combustion and Plasma Physics

Extensive experiments in combustion (homogeneous charge compression ignition engines) and magnetohydrodynamics illustrate several key results:

QoI Preservation: Goal-oriented CP or Tucker decompositions (GO-CP, GO-Tucker) can reduce errors in mass/kinetic energy/integral invariants by 2–4 orders of magnitude compared to the standard decompositions, at the cost of a negligible increase in overall tensor reconstruction error (~0.04–0.09%).
Visualization and Scientific Interpretability: In 3D plasma simulations, goal-oriented tensor models better preserve structures such as magnetic islands and reconnection features in iso-surface plots, even at compression factors exceeding $10^3$ or $10^4$.
Compression Ratios: Achievable compression rates (ratio of total entries to model parameters) exceed $10^4$ or $10^5$ with physics quantities faithfully represented.
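
For reference, the compression ratio defined above (total entries divided by model parameters) can be computed directly from the tensor shape and the chosen ranks. The helper names below are hypothetical, and the parameter counts are the standard ones: $R \sum_k n_k$ for a rank-$R$ CP model, and $\prod_k r_k + \sum_k n_k r_k$ for a Tucker model with core size $r_1 \times \cdots \times r_d$.

```python
from math import prod

def cp_compression(shape, rank):
    """Entries of the full tensor divided by the CP parameter count."""
    return prod(shape) / (rank * sum(shape))

def tucker_compression(shape, ranks):
    """Entries of the full tensor divided by core plus factor-matrix entries."""
    params = prod(ranks) + sum(n * r for n, r in zip(shape, ranks))
    return prod(shape) / params
```

For example, `cp_compression((672, 672, 32, 626), rank)` gives the ratio for a tensor of the combustion dataset's size at a user-chosen rank; the ranks used in the reported experiments are not stated here, so no specific value is implied.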

The overall outcome is a low-dimensional representation that balances entrywise data fidelity against correct reproduction of scientific observables.

5. Mathematical and Computational Considerations

The approach not only requires standard tensor algebraic computation but also careful balancing of penalty weights $\alpha_q$ to reflect the significance of each QoI. Implementation includes:

Efficient computation of gradients and Hessian-vector products for both the standard loss and each $g_q$ (the latter may leverage physical mesh structure or quadrature rules).
Initialization from standard decomposition algorithms to provide a reasonable starting point.
Use of scalable optimization solvers as needed for very large tensor data.

Because many QoIs are global integrals or averages, their derivatives with respect to the tensor factors may be computationally cheaper than per-entry operations, further aiding scalability.
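
One way to see this saving is that a sum-type QoI of a CP model factorizes over the modes, so it can be evaluated from column sums of the factor matrices without ever forming $\mathcal{M}$. The sketch below assumes a 4-way CP model with modes (x, y, variable, time) and unit quadrature weights; the function name `mass_from_factors` and the parameter `n_species` are hypothetical.

```python
import numpy as np

def mass_from_factors(A1, A2, A3, A4, n_species):
    """Mass per timestep of a CP model, computed from the factors alone.

    Equivalent to summing the full tensor over x, y, and the first
    n_species variables, but at cost O(R * sum of mode sizes).
    """
    # Per-component weight: (1^T a_r^(1)) (1^T a_r^(2)) (sum of species rows of a_r^(3))
    w = A1.sum(axis=0) * A2.sum(axis=0) * A3[:n_species].sum(axis=0)
    return A4 @ w   # one mass value per timestep
```

The same factorization applies to the derivative of this QoI with respect to each factor matrix, which is why such terms tend to add little to the per-iteration cost.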

6. Significance and Impact for Simulation Data Analysis

Goal-oriented low-rank tensor decompositions enable science- and engineering-oriented dimensionality reduction by embedding domain-driven constraints into the surrogate models. This approach directly supports:

Fidelity of reduced models to scientific or engineering requirements,
Post-processing and analysis of enormous simulation datasets with confidence in the preservation of conserved quantities,
Further statistical or machine learning workflows on compressed representations without loss of critical invariants,
Compression and efficient storage in data-intensive fields such as combustion and plasma physics.

The methodology generalizes to any simulation regime where quantifiable invariants or observables are deemed more significant than generic reconstruction error, aligning dimensionality reduction with physical modeling objectives (Dunlavy et al., 15 Aug 2025).

7. Summary Table: Key Aspects

Aspect	Standard Tensor Decomposition	Goal-Oriented Tensor Decomposition
Optimization Objective	$\|\mathcal{X} - \mathcal{M}\|_F^2$	$\alpha_0 f(\mathcal{X},\mathcal{M}) + \sum_q \alpha_q \text{QoI Errors}$
Physics-Informed?	No	Yes (via explicit QoI penalties)
Compression/Accuracy Tradeoff	Emphasizes global fit	Balances global fit & physical accuracy
Algorithms	ALS-type iterations, SVD-based methods	Standard initialization, then Newton-type optimization with gradients through the QoIs
Applications	General simulation data	Scenarios where preserving physical/invariant quantities is critical

In conclusion, goal-oriented low-rank tensor decompositions reduce simulation data while preserving the quantities that are critical for scientific analysis, aligning the compressed model with the physics of the domain.

References

Dunlavy et al. (2025). Goal-Oriented Low-Rank Tensor Decompositions for Numerical Simulation Data.
