Data-Driven Heat Transport Modeling
- The paper presents a data-driven framework leveraging physics-informed neural networks and multiscale modeling to accelerate heat transport simulations.
- It employs hybrid loss functions and SDF-based CNN surrogates to achieve sub-1.5% errors and up to 49% faster convergence compared to classical solvers.
- The approach extends to inverse design, nonlocal constitutive modeling, and statistical field estimation, addressing challenges in diffusive, convective, and radiative regimes.
A data-driven approach for heat transport problems synthesizes machine learning and statistical inference techniques with physical principles to replace, accelerate, or augment conventional modeling of thermal processes. These methods have transformed both the direct solution of PDEs arising in conduction, convection, and radiation and the inverse design, closure modeling, and optimization of heat transport systems. The primary strategies include weakly supervised and physics-informed neural surrogates for field prediction, data-driven constitutive and closure models for nonlocal or nonequilibrium transport, multiscale database and homogenization frameworks for metamaterial design, and spatiotemporal data-driven inference for field estimation in systems such as district heating and the global ocean. This article provides a rigorous treatment of the principal developments, illustrated by quantitative results, algorithmic formulations, architectures, and open challenges.
1. Physics-Informed and Weakly Supervised Neural Surrogates
Data-driven surrogates for heat conduction and related transport problems commonly employ convolutional neural network (CNN) architectures to learn mappings from boundary conditions and geometry to physical fields. These models exploit the spatial locality and translation-invariance of operators such as the Laplacian.
A canonical paradigm is the weakly supervised approach: rather than requiring labeled ground-truth field data, the PDE itself is encoded as a physics-informed loss via convolutional stencils, forcing outputs to satisfy the discrete operator residual. For the steady-state two-dimensional conduction equation $\nabla^2 T = 0$, the discrete residual on a uniform grid can be written as

$$R_{i,j} = T_{i+1,j} + T_{i-1,j} + T_{i,j+1} + T_{i,j-1} - 4\,T_{i,j},$$

and the associated loss for a predicted field $\hat{T}$ is given by the squared $\ell_2$-norm of the residual, implemented by a fixed convolution kernel. Fully convolutional encoder–decoder networks with skip connections (U-Net) are trained on randomly sampled Dirichlet boundary conditions, never seeing the true solution during training (Sharma et al., 2018). Progressive multiscale loss schedules (coarse-to-fine) are essential for convergence at large resolutions and complex geometries, with low test errors demonstrated across multiple grid resolutions. The method is agnostic to labeled data and can be immediately generalized to other elliptic or parabolic PDEs, including nonlinear or vector-field systems, by modifying the kernel(s).
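The residual-as-loss idea can be sketched in a few lines of NumPy (grid size and the harmonic test field are illustrative; a real implementation applies the stencil as a fixed convolution inside the training framework):

```python
import numpy as np

def laplacian_residual(T):
    """Five-point-stencil residual of the discrete Laplace equation on the
    grid interior; equivalent to convolving T with a fixed kernel."""
    return (T[:-2, 1:-1] + T[2:, 1:-1] + T[1:-1, :-2] + T[1:-1, 2:]
            - 4.0 * T[1:-1, 1:-1])

def physics_loss(T):
    """Squared l2-norm of the residual: the weakly supervised training loss."""
    return float(np.mean(laplacian_residual(T) ** 2))

# Sanity check: any discrete-harmonic field, e.g. T(x, y) = x + y,
# has (to floating-point precision) zero residual, so the loss vanishes.
x = np.linspace(0.0, 1.0, 32)
T_harmonic = x[:, None] + x[None, :]
print(physics_loss(T_harmonic))
```

Because the loss depends only on the network output, no labeled solution fields are ever needed.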
A closely related line employs signed distance functions (SDFs) to encode geometry, due to their superior information content (distance-to-boundary) compared to binary masks (Peng et al., 2020). CNN surrogates trained on SDF-encoded domains achieve relative errors on the order of $2\%$ and substantial speedups over classical solvers. SDF-based surrogates also generalize to unseen, highly non-convex, or disconnected shapes and can be embedded within real-time design and optimization workflows.
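A brute-force sketch of SDF encoding for a small binary geometry mask (purely illustrative; production pipelines use fast distance transforms rather than this $O(n^2)$ loop-free scan):

```python
import numpy as np

def signed_distance(mask):
    """Brute-force signed distance field for a small binary mask
    (True = inside the solid). Negative inside, positive outside,
    zero on the boundary cells themselves."""
    n, m = mask.shape
    ys, xs = np.mgrid[0:n, 0:m]
    # Boundary cells: inside cells with at least one outside 4-neighbour.
    pad = np.pad(mask, 1, constant_values=False)
    nb_outside = (~pad[:-2, 1:-1] | ~pad[2:, 1:-1]
                  | ~pad[1:-1, :-2] | ~pad[1:-1, 2:])
    by, bx = np.nonzero(mask & nb_outside)
    d = np.sqrt((ys[..., None] - by) ** 2 + (xs[..., None] - bx) ** 2).min(-1)
    return np.where(mask, -d, d)

mask = np.zeros((9, 9), dtype=bool)
mask[3:6, 3:6] = True          # a 3x3 solid block
sdf = signed_distance(mask)
print(sdf[4, 4], sdf[4, 8])    # negative at the centre, positive far away
```

The continuous distance values give the CNN a much richer input signal than the 0/1 mask alone.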
Hybrid approaches combine the data-driven loss with the physics-driven (PDE residual) term in a weighted sum. The combination yields both accelerated convergence (up to 49% fewer steps for physics-driven initializations) and physically consistent solutions (Ma et al., 2020). The PDE loss can be implemented via finite-difference or convolution operations, compatible with auto-differentiation and training in modern frameworks (e.g., PyTorch).
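A sketch of the weighted hybrid loss, assuming a simple finite-difference residual and illustrative weights:

```python
import numpy as np

def pde_residual(T):
    # Interior 5-point Laplacian residual (same role as a fixed conv kernel).
    return (T[:-2, 1:-1] + T[2:, 1:-1] + T[1:-1, :-2] + T[1:-1, 2:]
            - 4.0 * T[1:-1, 1:-1])

def hybrid_loss(T_pred, T_label, w_data=1.0, w_phys=1.0):
    """Weighted sum of a supervised data term and the physics residual term.
    The weights w_data / w_phys are illustrative hyperparameters."""
    l_data = np.mean((T_pred - T_label) ** 2)
    l_phys = np.mean(pde_residual(T_pred) ** 2)
    return w_data * l_data + w_phys * l_phys

# When the prediction matches a harmonic label exactly, both terms vanish.
x = np.linspace(0.0, 1.0, 16)
T_exact = x[:, None] + x[None, :]
print(hybrid_loss(T_exact, T_exact))
```

Setting `w_data = 0` recovers the purely physics-driven regime; `w_phys = 0` recovers plain supervised training.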
| Method | Labeled Data Needed | Error (typ.) | Scalability |
|---|---|---|---|
| Pure data-driven CNN | Yes | 1–3% | — |
| Physics-informed CNN | No | 1–1.5% | — |
| Combined loss | Optional | $\le 1.5$% | — |
A salient feature is the “warm-start” effect: CNN output used as initialization for classical solvers reduces their iteration count by an order of magnitude.
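The warm-start effect can be imitated with a toy Jacobi solver, using a cheap coarse-grid pre-solve in place of a CNN prediction (grid sizes and tolerances are arbitrary):

```python
import numpy as np

def jacobi(T0, tol=1e-4, max_iter=50000):
    """Jacobi relaxation for the 2D Laplace equation with fixed boundaries;
    returns the solution and the number of iterations used."""
    T = T0.copy()
    for k in range(1, max_iter + 1):
        T_new = T.copy()
        T_new[1:-1, 1:-1] = 0.25 * (T[:-2, 1:-1] + T[2:, 1:-1]
                                    + T[1:-1, :-2] + T[1:-1, 2:])
        if np.max(np.abs(T_new - T)) < tol:
            return T_new, k
        T = T_new
    return T, max_iter

def bc_field(n):
    T = np.zeros((n, n))
    T[0, :] = 1.0                       # hot top edge; other edges cold
    return T

def interp2(coarse, n_fine):
    # Bilinear interpolation of a coarse solution onto the fine grid.
    xc = np.linspace(0.0, 1.0, coarse.shape[0])
    xf = np.linspace(0.0, 1.0, n_fine)
    tmp = np.array([np.interp(xf, xc, row) for row in coarse])
    return np.array([np.interp(xf, xc, col) for col in tmp.T]).T

n = 33
_, it_cold = jacobi(bc_field(n))        # cold start from zeros
coarse, _ = jacobi(bc_field(9), tol=1e-6)   # cheap stand-in for a surrogate
T0 = interp2(coarse, n)
T0[:, 0] = 0.0; T0[:, -1] = 0.0; T0[-1, :] = 0.0; T0[0, :] = 1.0  # exact BCs
_, it_warm = jacobi(T0)
print(it_cold, it_warm)                 # warm start converges in fewer steps
```

A trained CNN prediction plays the same role as the coarse pre-solve here, but with far better accuracy, hence the order-of-magnitude iteration savings reported.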
2. Data-Driven Constitutive and Closure Modeling
For hydrodynamic regimes that cross from diffusive to rarefied or ballistic transport (e.g., plasmas, rarefied gases, nanoscale materials), the classical Fourier law fails to accurately capture fluxes, necessitating extended or nonlocal closures.
A general data-driven closure models the heat flux as an expansion in spatial derivatives of the temperature,

$$q = -\kappa_0\,\partial_x T - \kappa_1\,\partial_x^3 T - \kappa_2\,\partial_x^5 T - \cdots,$$

but this leads to ill-posedness (arbitrary truncation, noise amplification). Instead, by taking the spatial Fourier transform and considering entropy and homogeneity constraints, the expansion re-sums into a mode-dependent thermal conductivity $\kappa(k)$, so that

$$\hat{q}(k) = -\,i k\,\kappa(k)\,\hat{T}(k),$$
where $k$ is the spatial frequency (Zheng et al., 2021). Machine learning (e.g., neural networks) is then used to learn the nonlocal scaling law $\kappa(k)$ directly from bulk fluctuation spectra (e.g., from Direct Simulation Monte Carlo), ensuring stability and noise-robustness and reducing the number of free parameters. This framework yields order-of-magnitude improvements in spectral prediction versus Navier–Stokes or moment closures across a broad range of collisionality regimes.
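The resummed closure amounts to a pointwise multiplication in Fourier space; a sketch with an assumed Lorentzian scaling law standing in for a learned $\kappa(k)$:

```python
import numpy as np

def nonlocal_flux(T, kappa_of_k, L=2*np.pi):
    """Heat flux from a mode-dependent conductivity, applied spectrally:
    q_hat(k) = -i k kappa(k) T_hat(k)."""
    n = T.size
    k = 2*np.pi*np.fft.fftfreq(n, d=L/n)    # integer wavenumbers on [0, L)
    q_hat = -1j * k * kappa_of_k(np.abs(k)) * np.fft.fft(T)
    return np.fft.ifft(q_hat).real

n = 256
x = np.linspace(0.0, 2*np.pi, n, endpoint=False)
T = np.sin(x)
# Constant kappa recovers Fourier's law: q = -dT/dx = -cos(x).
q_local = nonlocal_flux(T, lambda k: 1.0)
# An assumed Lorentzian scaling law damps high-k modes, mimicking
# nonlocal flux suppression; a trained network would replace this lambda.
q_nl = nonlocal_flux(T, lambda k: 1.0 / (1.0 + (0.5*k)**2))
print(np.max(np.abs(q_local + np.cos(x))))
```

The learned model only has to supply the scalar function $\kappa(k)$, which is what makes the closure compact and noise-robust.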
Mechanism–data fusion methods extend further by parameterizing higher-order closure variables (such as heat-flux-of-flux, dual-dissipative variables) via neural networks, but embedding these within a Conservation–Dissipation Formalism (CDF) so as to automatically enforce energy conservation and non-negative entropy production (Chen et al., 2022). Neural PDE coefficients are trained against Boltzmann-transport-equation data, remain explicit and interpretable, and stay robust across diffusive, hydrodynamic, and ballistic regimes, including discontinuous solutions.
In weakly-collisional plasma transport, it is effective to train neural networks not on field values, but on the gradient of flux with respect to physical state variables, with noise-robust label inference via volumetric Tikhonov regularization (Miniati et al., 2021). This ensures the MLP closure retains at least second-order accuracy in a PDE solver and allows subsequent symbolic regression to recover closed-form expressions valid even under substantial data noise.
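The Tikhonov-regularized label inference can be sketched as a ridge-regression solve (dimensions, noise level, and regularization weight are illustrative stand-ins for the volumetric formulation):

```python
import numpy as np

def tikhonov_fit(A, y, lam=1e-3):
    """Ridge (Tikhonov) solution of A x ~= y: minimizes
    ||A x - y||^2 + lam ||x||^2, damping the noise amplification that
    plagues naive inference of gradient labels from noisy data."""
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ y)

rng = np.random.default_rng(0)
A = rng.normal(size=(200, 5))               # states at sample points
x_true = np.array([1.0, -2.0, 0.5, 0.0, 3.0])  # 'gradient' labels to recover
y = A @ x_true + 0.01 * rng.normal(size=200)   # noisy flux observations
x_hat = tikhonov_fit(A, y)
print(np.max(np.abs(x_hat - x_true)))
```

The regularized labels are then used as supervised targets for the MLP closure, preserving accuracy under noise.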
3. Multiscale and Database-Driven Design for Heat Transport
Inverse design and topology optimization of materials and devices for prescribed heat manipulation functionality (cloaking, concentration, inversion) pose formidable combinatorial challenges. A two-scale data-driven optimization framework dramatically reduces the search space by parameterizing each macro finite element by a small number of effective conductivity tensor entries (e.g., a few independent components for 2D orthotropy), obtained by homogenizing a large precomputed database of microstructures (RVEs) (Da et al., 2023).
Design proceeds by:
- Building a microstructure database: unique topologies with associated homogenized effective conductivity tensors, computed on a fine grid.
- Macro optimization: selecting the field of effective tensors that minimizes multi-function objectives via adjoint-based topology optimization, with sensitivities computed analytically.
- Database lookup: each macrocell's optimal tensor is mapped to a realizable microstructure in the database by nearest-neighbor search.
This method attains the target functional objectives (e.g., low cloaking error and high concentrator index) with small mean-squared error in mapping back to physical structures, and extends directly to 3D, dynamic, or nonlinear regimes. It generalizes to other transport phenomena by replacing the governing PDE and tensor variables.
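The database-lookup step reduces to a nearest-neighbor search over homogenized descriptors; a sketch with a random toy database (the descriptor layout is an assumption, not taken from the paper):

```python
import numpy as np

def lookup_microstructure(kappa_target, db):
    """Map an optimized effective-conductivity descriptor to the closest
    precomputed microstructure by nearest-neighbor search.
    db: (N, d) array of homogenized descriptors, one row per RVE."""
    idx = int(np.argmin(np.sum((db - kappa_target) ** 2, axis=1)))
    return idx, db[idx]

# Toy database of hypothetical (kxx, kyy, orientation) descriptors.
rng = np.random.default_rng(1)
db = rng.uniform(0.1, 10.0, size=(1000, 3))
target = np.array([2.0, 5.0, 1.0])
idx, match = lookup_microstructure(target, db)
print(idx, match)
```

For large databases a k-d tree or approximate nearest-neighbor index would replace the linear scan, but the mapping itself is this simple.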
4. Data-Driven Nonlocal and Spatiotemporal Heat Transport
Nonlocal and nonstationary heat transport in plasmas or radiative transfer scenarios is governed by integral operators with memory and spatial extent, often parameterized by formal convolution kernels (e.g., Luciani–Mora–Virmont, SNB).
Recent work leverages nonlocal theory–informed neural networks (e.g., LINN) to learn physically consistent, time-dependent heat flux kernels from particle-in-cell kinetic simulation data (Luo et al., 19 Jun 2025). The neural kernel parametrizes the spatial convolution

$$q(x, t) = \int K(x - x', t)\, q_{\mathrm{SH}}(x')\, \mathrm{d}x',$$

where $q_{\mathrm{SH}}$ denotes the local (Spitzer–Härm-type) flux.
Conditioning on normalized local mean-free-path, geometry, and scaled time, the kernel is forced positive and mirror-symmetric by architectural priors. Quantitative RMSEs are about $0.04$ relative to the free-streaming flux, with errors near 6% in strongly nonlocal regimes, outperforming analytic models by a factor of 3–5.
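Such architectural priors can be sketched directly: positivity via a softplus map and mirror symmetry by averaging the kernel with its reflection (the raw kernel and grid here are arbitrary stand-ins for a network output):

```python
import numpy as np

def constrained_kernel(raw, x):
    """Map an unconstrained network output to a valid flux kernel:
    softplus enforces positivity, averaging with the reflection enforces
    mirror symmetry K(x) = K(-x), and the result is normalized to unit mass."""
    pos = np.log1p(np.exp(raw))            # softplus: strictly positive
    sym = 0.5 * (pos + pos[::-1])          # symmetric on a symmetric grid
    dx = x[1] - x[0]
    return sym / (sym.sum() * dx)          # discrete unit normalization

x = np.linspace(-5.0, 5.0, 201)            # symmetric grid about x = 0
raw = np.sin(3.0 * x) - 0.2 * x            # arbitrary stand-in for a net output
K = constrained_kernel(raw, x)
print(K.min() > 0.0, np.allclose(K, K[::-1]))
```

Building the constraints into the architecture, rather than penalizing violations in the loss, guarantees they hold exactly at inference time.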
Time-embedded convolutional neural networks (TCNNs) extend this approach by predicting both the dynamic nonlocality parameter and normalized heat flux as coupled fields, using deep 1D convolutions and learned time embeddings. The TCNN framework achieves sub-5% relative error for both outputs across the sampled parameter range, with strong generalization and consistency even in regimes where prior surrogates fail (Luo et al., 7 Sep 2025). These neural operator surrogates serve as drop-in replacements for analytical closures in multi-physics and radiation-hydrodynamics codes.
5. Data-Driven Statistical and Inverse Field Estimation
In field-estimation applications, such as quantifying spatiotemporally varying heat transport in climate or large-scale engineering networks, statistical data-driven approaches dominate.
A notable example is the inference of global ocean heat transport from in-situ Argo floats via a latent Gaussian process regression (LGPR) framework with local quadratic/seasonal mean fields, Matérn-3/2 covariance kernels, and a two-stage EM-based fitting (Park et al., 2021). The mean field is debiased by residual correction against long-term averages, ensuring unbiased predictions under non-Gaussian features or sharp thermocline gradients. Statistical consistency and cross-validation versus satellite benchmark products are quantitatively established (10–20% RMSE reduction after debiasing; interannual ENSO signals faithfully captured). The method is robust to missing data and can be extended via data fusion (e.g., glider measurements).
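The covariance-based interpolation at the core of such a framework can be sketched with a Matérn-3/2 kernel and a plain GP posterior mean (1D coordinates and hyperparameters are illustrative; the full LGPR machinery, mean-field debiasing, and EM fitting are omitted):

```python
import numpy as np

def matern32(x1, x2, ell=1.0, sigma2=1.0):
    """Matern-3/2 covariance between two 1D coordinate vectors."""
    s = np.sqrt(3.0) * np.abs(x1[:, None] - x2[None, :]) / ell
    return sigma2 * (1.0 + s) * np.exp(-s)

def gp_predict(x_obs, y_obs, x_new, noise=1e-4, ell=1.0):
    """Zero-mean GP posterior mean; in LGPR the local mean field is removed
    first and the GP is fitted to the residuals."""
    K = matern32(x_obs, x_obs, ell=ell) + noise * np.eye(x_obs.size)
    return matern32(x_new, x_obs, ell=ell) @ np.linalg.solve(K, y_obs)

x_obs = np.linspace(0.0, 6.0, 30)
y_obs = np.sin(x_obs)                      # synthetic 'observations'
x_new = np.array([1.5, 3.0])
pred = gp_predict(x_obs, y_obs, x_new)
print(pred)                                # close to sin at the query points
```

The Matérn-3/2 family is only once-differentiable, which suits oceanographic fields with sharp gradients better than the infinitely smooth squared-exponential kernel.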
In district heating, heat-use pattern discovery is automated via unsupervised clustering using the shape-based k-shape algorithm on seasonally-averaged, z-normalized hourly profiles (Calikus et al., 2019). The approach identifies functional customer clusters and control strategies, flags anomalies (meter faults, unusual loads), and quantifies control–category mismatches over >1,000 substations, providing actionable insight for network optimization.
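The preprocessing and distance underlying k-shape can be sketched as z-normalization plus a cross-correlation-based shape distance (toy 24-hour profiles; the full k-shape iteration and centroid update are omitted):

```python
import numpy as np

def z_normalize(profile):
    """Z-normalize a load profile so clustering compares shape, not magnitude."""
    s = profile.std()
    return (profile - profile.mean()) / s if s > 0 else profile - profile.mean()

def shape_based_distance(a, b):
    """Cross-correlation-based distance in the spirit of k-shape's SBD:
    1 minus the maximum normalized cross-correlation over all shifts."""
    a, b = z_normalize(a), z_normalize(b)
    cc = np.correlate(a, b, mode="full")
    return 1.0 - cc.max() / (np.linalg.norm(a) * np.linalg.norm(b))

t = np.arange(24)
day = np.sin(2.0 * np.pi * t / 24.0)       # a daily heat-use shape
scaled = 5.0 + 3.0 * day                   # same shape at a different scale
shifted = np.roll(day, 6)                  # same shape, shifted six hours
d_same = shape_based_distance(day, scaled)
d_shift = shape_based_distance(day, shifted)
print(d_same, d_shift)                     # scale is ignored; shifts are not
```

Z-normalization makes substations with different absolute consumption comparable, so clusters group customers by usage pattern rather than size.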
6. Optimization and Inverse Design with Data-Driven Surrogates
Optimization over geometric or control variables remains a key use case. Physics-driven neural surrogates (e.g., a U-Net trained solely via the PDE residual with no labeled field data) can be embedded as fast forward models within global optimization loops, such as particle swarm optimization, for plate layout or topology design problems (Ma et al., 2022). In this context, function evaluations are 100x faster than FEM solves at similar accuracy, enabling routine solution of inverse problems (e.g., minimizing plate-averaged temperature under variable boundary conditions and hole placements). This supports rapid design iteration, even in high-dimensional geometric spaces, and can be generalized to nonlinear, time-dependent, or coupled multiphysics regimes.
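Embedding a fast surrogate inside a global optimizer can be sketched with a minimal particle swarm loop; the quadratic "surrogate" objective and hole-position parameters are purely hypothetical stand-ins for a trained network:

```python
import numpy as np

def pso_minimize(f, bounds, n_particles=30, n_iter=100, seed=0):
    """Minimal particle swarm optimizer; f plays the role of a fast neural
    surrogate evaluated in place of an expensive FEM solve."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    x = rng.uniform(lo, hi, size=(n_particles, lo.size))
    v = np.zeros_like(x)
    pbest, pbest_f = x.copy(), np.array([f(p) for p in x])
    g = pbest[np.argmin(pbest_f)].copy()
    for _ in range(n_iter):
        r1, r2 = rng.random((2, *x.shape))
        v = 0.7 * v + 1.5 * r1 * (pbest - x) + 1.5 * r2 * (g - x)
        x = np.clip(x + v, lo, hi)
        fx = np.array([f(p) for p in x])
        better = fx < pbest_f
        pbest[better], pbest_f[better] = x[better], fx[better]
        g = pbest[np.argmin(pbest_f)].copy()
    return g, float(pbest_f.min())

# Hypothetical surrogate: plate-averaged temperature as a smooth function
# of two hole-position parameters (illustrative objective only).
surrogate = lambda p: (p[0] - 0.3) ** 2 + (p[1] - 0.7) ** 2
best, best_f = pso_minimize(surrogate, (np.zeros(2), np.ones(2)))
print(best, best_f)
```

Because each `f(p)` call is a cheap network forward pass rather than a FEM solve, the optimizer can afford thousands of evaluations per design iteration.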
7. Challenges, Limitations, and Future Directions
Data-driven approaches for heat transport face several persistent and emerging challenges:
- Generalization across regimes, e.g., rarefied to hydrodynamic, or local to nonlocal, requires careful choice of model class and data priors. Scaling-law modeling (e.g., of the mode-dependent conductivity $\kappa(k)$) and mechanism–data fusion (CDF + neural nets) offer promising frameworks (Zheng et al., 2021, Chen et al., 2022).
- Physical consistency, stability, and interpretable closure remain open; enforcing thermodynamic constraints (e.g., non-negative dissipation) at the model level is essential.
- Multiscale and high-dimensional optimization is constrained by database size, microstructure granularity, and the mapping between optimized macroscale properties and realizable microgeometry (Da et al., 2023).
- Data-driven surrogates are highly efficient but may degrade outside the training database (e.g., SDF surrogates for out-of-sample geometries (Peng et al., 2020)), or may not provide explicit fieldwise physical interpretation.
- Uncertainty quantification in field and parameter estimation (Bayesian neural networks, e.g., HCE-BNN (Jiang et al., 2021)) is critical in sparse, noisy, or partially observed domains.
Ongoing research emphasizes hybridization: integrating high-fidelity simulation or kinetic data, enforcing physical constraints, employing statistical learning to correct model bias, and deploying surrogates as embedded components in larger design, control, or field estimation frameworks.
In conclusion, the data-driven approach to heat transport encompasses a spectrum of methodologies, from weakly supervised neural surrogates and physics-informed learning, to multiscale optimization and kernel-based constitutive modeling. The field is characterized by rapid advances in generalization capability, accuracy, and computational efficiency, and is increasingly integrated with domain knowledge, physical principles, and high-fidelity data sources across scales and application domains.