Differentiability in Unrolled Training of Neural Physics Simulators on Transient Dynamics (2402.12971v2)

Published 20 Feb 2024 in physics.comp-ph and cs.LG

Abstract: Unrolling training trajectories over time strongly influences the inference accuracy of neural network-augmented physics simulators. We analyze this in three variants of training neural time-steppers. In addition to one-step setups and fully differentiable unrolling, we include a third, less widely used variant: unrolling without temporal gradients. Comparing networks trained with these three modalities disentangles the two dominant effects of unrolling, training distribution shift and long-term gradients. We present a detailed study across physical systems, network sizes and architectures, training setups, and test scenarios. It also encompasses two simulation modes: In prediction setups, we rely solely on neural networks to compute a trajectory. In contrast, correction setups include a numerical solver that is supported by a neural network. Spanning these variations, our study provides the empirical basis for our main findings: Non-differentiable but unrolled training with a numerical solver in a correction setup can yield substantial improvements over a fully differentiable prediction setup not utilizing this solver. The accuracy of models trained in a fully differentiable setup differs from that of their non-differentiable counterparts. Differentiable ones perform best in a comparison among correction networks as well as among prediction setups. For both, the accuracy of non-differentiable unrolling comes close. Furthermore, we show that these behaviors are invariant to the physical system, the network architecture and size, and the numerical scheme. These results motivate integrating non-differentiable numerical simulators into training setups even if full differentiability is unavailable. We show the convergence rate of common architectures to be low compared to numerical algorithms. This motivates correction setups combining neural and numerical parts which utilize the benefits of both.


Summary

  • The paper demonstrates that temporal unrolling significantly boosts simulator accuracy: fully differentiable unrolling yields a 38% average improvement over one-step training, and non-differentiable unrolling in a correction setup yields an average 4.5-fold gain over fully differentiable prediction setups.
  • It empirically evaluates multiple physical systems and reveals that training with a curriculum and incremental unrolling stabilizes long-term gradient computations.
  • The research underscores that integrating neural networks with legacy numerical solvers via non-differentiable unrolling offers a scalable, cost-effective path for high-fidelity hybrid simulations.

Overview of Temporal Unrolling in Neural Physics Simulators

The paper by Bjoern List and colleagues examines the influence of temporal unrolling on the training of neural network-augmented physics simulators and its effect on inference accuracy. The research identifies and disentangles the two main influences of temporal unrolling: ameliorating training distribution shift and enabling long-term gradient computations. The authors compare three distinct training modalities: the commonly used one-step training, fully differentiable unrolling (referred to as WIG, "with gradients"), and non-differentiable unrolling without temporal gradients (NOG, "no gradients").
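
To make the three modalities concrete, the sketch below contrasts them in PyTorch-style code. It is a minimal illustration, not the authors' implementation: the names `rollout_loss`, `net`, and `solver` are assumptions, and the correction setup is simplified to a residual added to a coarse solver step. The key distinction is where gradients are cut: NOG detaches the state at each step, so the solver can remain a non-differentiable black box, while WIG backpropagates through the entire rollout.

```python
import torch

def rollout_loss(net, solver, x0, targets, m, mode='wig', setup='correct'):
    """Training loss over an m-step rollout (illustrative sketch).

    setup: 'predict' - the network alone advances the state;
           'correct' - a coarse numerical solver step plus a learned correction.
    mode:  'one' - one-step training (horizon forced to 1);
           'nog' - unrolled without temporal gradients: the state is detached
                   each step, so `solver` may be a non-differentiable black box;
           'wig' - fully differentiable unrolling (backprop through the rollout).
    """
    horizon = 1 if mode == 'one' else m
    loss, x = 0.0, x0
    for t in range(horizon):
        if mode != 'wig':
            x = x.detach()                  # cut the gradient path through time
        if setup == 'correct':
            x = solver(x)                   # numerical time step
            x = x + net(x)                  # neural correction of the coarse state
        else:
            x = net(x)                      # purely neural prediction
        loss = loss + torch.nn.functional.mse_loss(x, targets[t])
    return loss / horizon
```

Note that in 'wig' mode with a correction setup, `solver` must itself be differentiable (e.g., implemented in PyTorch), whereas 'nog' only ever differentiates a single application of `net` per step.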

Key Contributions

  1. Empirical Evaluation Across Systems and Architectures: Through rigorous empirical evaluation across various physical systems (including Kuramoto-Sivashinsky, wake flow, Kolmogorov flow, and compressible aerofoil flow), the authors provide a comprehensive perspective on how temporal unrolling influences the performance of NN-augmented simulators. They demonstrate that non-differentiable unrolling can offer significant improvements over fully differentiable prediction setups.
  2. Implications for Neural Hybrid Simulators: The paper highlights that even when numerical solvers are not differentiable, interfacing these solvers with neural architectures using NOG techniques can lead to substantial accuracy improvements. This approach does not necessitate differentiable implementations, making it particularly applicable to environments with legacy numerical codes.
  3. Scalability and Convergence: The paper identifies a low convergence rate for neural architectures compared to traditional numerical algorithms, suggesting that a hybrid approach can exploit the scaling benefits of numerical solvers while incorporating the adaptability of neural networks. This insight is pivotal for large-scale scientific applications where cost-effective scaling is crucial.
  4. Use of Curriculums for Training Stability: The research stresses the necessity of employing a curriculum when training with unrolled setups. Incrementally increasing the number of unrolled steps while adjusting the learning rate is the recommended practice, ensuring stable training and reliable gradient flows; a minimal sketch follows this list.
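
Below is a minimal sketch of such a curriculum, assuming the `rollout_loss` helper from the earlier snippet; the stage horizons and learning rates are placeholders, not values from the paper.

```python
import torch

def train_with_curriculum(net, solver, loader,
                          stages=((2, 1e-4), (4, 5e-5), (8, 2e-5)),
                          epochs_per_stage=5, mode='nog'):
    # Each stage lengthens the unrolling horizon m and lowers the learning
    # rate, which keeps optimization stable as the rollouts grow longer.
    for m, lr in stages:
        opt = torch.optim.Adam(net.parameters(), lr=lr)
        for _ in range(epochs_per_stage):
            for x0, targets in loader:      # targets: at least m reference states
                opt.zero_grad()
                loss = rollout_loss(net, solver, x0, targets, m, mode=mode)
                loss.backward()
                opt.step()
```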

Numerical Results

The results show a consistent improvement with unrolled training setups across different systems and architectures:

  • A non-differentiable but unrolled setup shows, on average, a 4.5-fold improvement over fully differentiable prediction setups, demonstrating the significant effect of reducing training distribution shift.
  • Fully differentiable unrolling (WIG) consistently outperforms other methods in terms of inference accuracy, showing the value of long-term gradients—resulting in a 38% average improvement over one-step methods.
  • Prediction setups, however, benefit less from differentiable unrolling; with a broader range of valid converged solutions, non-differentiable (NOG) unrolling comes close to fully differentiable training, particularly for smaller architectures.

Theoretical and Practical Implications

The paper's implications are extensive, both practically and theoretically. Practically, the findings encourage the intersection of machine learning and existing numerical simulation infrastructures, suggesting that unrolled training not only enhances performance but also makes it possible to leverage legacy numerical solvers. Theoretically, it advances our understanding of neural network training dynamics in temporal sequence modeling, particularly under chaotic or turbulent conditions like those found in fluid dynamics.

Future Directions

Future research could explore the deployment of tailored architectures that could potentially enhance neural network scaling efficiencies. Additionally, investigating applications beyond fluid dynamics and varying domains could test the generality of the findings. Moreover, exploring the full potential of neural and numerical hybrid models may reveal new paradigms in data-driven physics simulations.

By providing a detailed exploration and a robust empirical study, the paper contributes significantly to the current understanding of neural networks in physics-based modeling, offering both data analysis and methodological recommendations that could influence future developments in the field.
