Data-Driven Acoustic Surrogate Model

Updated 29 September 2025

Data-driven acoustic surrogate models are computational constructs that replace costly direct simulations by leveraging machine learning and reduced-order techniques.
They integrate physical modeling with methods like DNNs, polynomial metamodels, and GNNs to predict acoustic metrics such as sound absorption and reverberation.
These models facilitate rapid design and inverse problem solving in complex acoustic systems while offering uncertainty quantification and controlled error bounds.

A data-driven acoustic surrogate model is a computational construct that leverages statistical, machine learning, or reduced-basis techniques to emulate the acoustic response of systems that would otherwise require computationally intensive direct simulations, such as finite element models (FEM), computational fluid dynamics (CFD), or full-wave solvers. These surrogates are trained on high-fidelity physical data and are deployable for rapid evaluation, uncertainty quantification, and optimization across diverse acoustic design and analysis applications.

1. Foundations and Model Types

The construction of data-driven acoustic surrogate models is governed by one requirement: to replace or accelerate the evaluation of an expensive physical model while retaining high predictive fidelity within a parameter space of interest. Several modeling paradigms appear in the literature, including:

Polynomial Metamodels: For instance, polynomial expansions (such as those based on multidimensional Legendre polynomials) are used to approximate the solution map from microstructural parameters to acoustic metrics (e.g., sound absorption coefficient) in multiscale material design (Trinh et al., 2017). The surrogate takes the general form

$\hat{q}(\xi) = \sum_{\alpha \in \mathbb{N}^n} \hat{q}_\alpha P_\alpha(\xi)$

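where $P_\alpha$ denotes products of univariate orthogonal polynomials in the normalized input variables $\xi$, and $\hat{q}_\alpha$ are the expansion coefficients. In practice the sum is truncated to a finite set of multi-indices $\alpha$.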

Deep Neural Networks (DNNs): DNNs are employed either as direct surrogates or inverse solvers, mapping acoustic target spectra to optimal design parameters (e.g., for Helmholtz resonator structures (Sun et al., 2021)), or mapping geometric/operating conditions directly to acoustic responses (e.g., sound transmission loss or underwater acoustic attenuation (Weeratunge et al., 2022)).
Gaussian Processes, Kriging, and Polynomial Chaos Expansion (PCE): These approaches are combined with supervised dimensionality reduction (e.g., kernel PCA) for high-dimensional problems, where the surrogate operates on compressed representations of input fields and provides uncertainty quantification (Lataniotis et al., 2018).
Graph Neural Networks (GNNs): For mesh-based simulations in aeroacoustic applications, GNNs encode the spatial and geometric information into a computational graph, predicting entire flow fields and derived acoustic quantities from geometry and boundary conditions (Hadizadeh et al., 22 Dec 2024).
Reduced Order/Balanced Truncation Methods: Algorithms such as the Eigensystem Realization Algorithm (ERA) construct surrogates by fitting input-output dynamical data with reduced state-space models, equipped with guaranteed error bounds and near real-time evaluation capabilities in predictive aeroacoustics (Rezaian et al., 2023).
Physics-Informed Neural Networks (PINNs) and Ray-Basis Neural Networks (RBNNs): By embedding the governing acoustic physics (wave or Helmholtz equations) into the network structure, surrogate models ensure outputs are physically plausible and allow efficient training with limited data (Li et al., 2022).
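As an illustration of the polynomial metamodel form given at the start of this list, the following is a minimal sketch that fits a truncated tensor-product Legendre expansion by least squares. It is not the procedure of any cited work: the total-degree truncation, the function names (`legendre_features`, `fit_surrogate`, `predict`), and the analytic `toy_model` standing in for an expensive solver are all assumptions made for the example.

```python
from itertools import product

import numpy as np
from numpy.polynomial import legendre


def legendre_features(xi, degree):
    """Evaluate P_alpha(xi) for all multi-indices with |alpha| <= degree.

    xi has shape (n_samples, n_dims) and is assumed scaled to [-1, 1].
    """
    n_dims = xi.shape[1]
    # Total-degree truncation of the multi-index set (an assumption of this sketch)
    alphas = [a for a in product(range(degree + 1), repeat=n_dims)
              if sum(a) <= degree]
    # vander[d][:, k] holds the 1D Legendre polynomial P_k at xi[:, d]
    vander = [legendre.legvander(xi[:, d], degree) for d in range(n_dims)]
    # Each column is a product of 1D polynomials, one factor per input variable
    cols = [np.prod([vander[d][:, a[d]] for d in range(n_dims)], axis=0)
            for a in alphas]
    return np.column_stack(cols), alphas


def fit_surrogate(xi, q, degree):
    """Least-squares estimate of the expansion coefficients q_alpha."""
    phi, alphas = legendre_features(xi, degree)
    coeffs, *_ = np.linalg.lstsq(phi, q, rcond=None)
    return coeffs, alphas


def predict(xi, coeffs, degree):
    phi, _ = legendre_features(xi, degree)
    return phi @ coeffs


def toy_model(xi):
    """Hypothetical stand-in for an expensive solver (a degree-2 polynomial)."""
    return 0.5 + 0.3 * xi[:, 0] - 0.2 * xi[:, 0] * xi[:, 1] + 0.1 * xi[:, 1] ** 2


rng = np.random.default_rng(0)
xi_train = rng.uniform(-1.0, 1.0, size=(50, 2))
coeffs, alphas = fit_surrogate(xi_train, toy_model(xi_train), degree=2)

xi_test = rng.uniform(-1.0, 1.0, size=(200, 2))
max_err = np.max(np.abs(predict(xi_test, coeffs, 2) - toy_model(xi_test)))
print(f"{len(alphas)} basis functions, max test error {max_err:.2e}")
```

Because the toy response is itself a degree-2 polynomial, the fit is exact up to round-off. A real acoustic response would leave a truncation error that shrinks as the degree grows, at the cost of more coefficients to estimate.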

2. Data Acquisition, Generation, and Feature Engineering

Effective surrogate models rely on carefully curated datasets that represent the range of acoustic responses over the domain of input variables of interest.

Simulation-Based Datasets: High-fidelity datasets are often generated from FEM/CFD solutions over varied microstructures, geometric parameters, and environmental conditions. Sampling strategies such as Latin Hypercube Sampling (LHS) and one-at-a-time impulse excitation (for system identification) are reported (Trinh et al., 2017, Lataniotis et al., 2018, Rezaian et al., 2023).
Experimental Measurements: Automated robotic testbeds provide high-resolution acoustic impulse responses for 3D-printed surfaces, supplying ground truth datasets for training ML surrogates that map geometry to measured cumulative energy curves (Rust et al., 2021).
Feature Engineering: Domain-specific physical features (e.g., mass density, bending stiffness, resonance coefficients) are explicitly computed and concatenated to raw inputs, enhancing surrogate accuracy and interpretability; feature importance is typically assessed via tree-based models or sensitivity analysis (Cunha et al., 2022).
Dimensionality Reduction: High-dimensional spatial, spectral, or geometrical inputs are compressed using PCA or kernel PCA (KPCA), ensuring tractability and improved generalization in the surrogate learning process (Lataniotis et al., 2018).
Data Augmentation and Transfer Learning: Techniques such as rotation/mirroring on symmetric tensor fields, and transfer learning from pre-trained surrogates, reduce overfitting and enhance performance under data scarcity (Jones et al., 2022).
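Two of the steps above, Latin Hypercube Sampling of the design parameters and PCA compression of high-dimensional inputs, can be sketched with NumPy alone. This is a minimal illustration under stated assumptions: the helper names (`latin_hypercube`, `pca_fit`, `pca_transform`) are hypothetical, and the "fields" are synthetic linear mixtures of three latent parameters rather than real simulation inputs.

```python
import numpy as np


def latin_hypercube(n_samples, n_dims, rng):
    """Draw points on [0, 1)^n_dims with one point per stratum in each dimension."""
    # Place one jittered point in each of n_samples equal-width strata
    u = (np.arange(n_samples)[:, None] + rng.random((n_samples, n_dims))) / n_samples
    # Shuffle the strata independently per dimension to decouple the coordinates
    for d in range(n_dims):
        u[:, d] = rng.permutation(u[:, d])
    return u


def pca_fit(x, n_components):
    """Return the mean, leading principal directions, and explained variance ratios."""
    mean = x.mean(axis=0)
    _, s, vt = np.linalg.svd(x - mean, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)
    return mean, vt[:n_components], explained[:n_components]


def pca_transform(x, mean, components):
    """Project inputs onto the retained principal directions."""
    return (x - mean) @ components.T


rng = np.random.default_rng(0)
n_samples = 40
design = latin_hypercube(n_samples, 3, rng)       # 3 latent design parameters

# Synthetic 200-dimensional inputs driven by the 3 parameters plus small noise
mixing = rng.normal(size=(3, 200))
fields = design @ mixing + 1e-3 * rng.normal(size=(n_samples, 200))

mean, components, explained = pca_fit(fields, n_components=3)
codes = pca_transform(fields, mean, components)   # compressed surrogate inputs
print(codes.shape, explained.sum())
```

The compressed `codes` would then be passed to the surrogate in place of the raw fields. Kernel PCA follows the same pattern but replaces the linear projection with one computed in a kernel-induced feature space.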

3. Surrogate Model Calibration, Training, and Optimization

Model training strategies are tightly coupled with problem structure and accuracy requirements:

Offline Training: Surrogates are trained on large datasets, calibrated using validation and cross-validation techniques (e.g., leave-one-out or K-fold), and hyperparameters are selected by minimizing error metrics such as MSE, MAE, or relative error rates (Trinh et al., 2017, Sun et al., 2021, Weeratunge et al., 2022).
Active Learning and Data Efficiency: Sampling policies (such as the student-teacher framework) reduce the amount of expensive labeled data required by targeting high-error or high-uncertainty regions of parameter space; policies such as ε-HQS are reported to reduce the number of samples needed by up to 40% (Vardhan et al., 2022).
Physics-Guided Training: In PINNs and RBNNs, losses penalize deviation from physical constraints (e.g., adherence to the wave equation or boundary conditions), and may include sparsity or information maximization terms (Li et al., 2022).
Hyperparameter and Model Order Reduction: Truncation order (for polynomial expansions or SVD modes) is chosen based on error convergence studies; balanced truncation surrogates leverage ERA with tangential interpolation and gappy-POD for computational tractability (Rezaian et al., 2023).
Optimization Integration: Surrogates are embedded in design or inverse design loops, using evolutionary or genetic algorithms to maximize performance indices (e.g., absorption, reduction in sound pressure) under geometric or physical constraints (Weeratunge et al., 2022, Hadizadeh et al., 22 Dec 2024).
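The reduced-order route mentioned above can be illustrated with a basic single-input, single-output Eigensystem Realization Algorithm. This is a textbook-style sketch only: it omits the tangential interpolation and gappy-POD ingredients, and the function names and the four-state test system are assumptions chosen so that the result can be checked against a known answer.

```python
import numpy as np


def impulse_response(a, b, c, n_steps):
    """Markov parameters h[k] = C A^k B of a discrete-time SISO system."""
    out = np.empty(n_steps)
    x = b.copy()
    for k in range(n_steps):
        out[k] = (c @ x).item()
        x = a @ x
    return out


def era(markov, order, n_rows, n_cols):
    """Basic SISO ERA; needs len(markov) >= n_rows + n_cols samples."""
    idx = np.arange(n_rows)[:, None] + np.arange(n_cols)[None, :]
    h0 = markov[idx]        # Hankel matrix of the impulse response
    h1 = markov[idx + 1]    # same matrix shifted by one time step
    u, s, vt = np.linalg.svd(h0, full_matrices=False)
    u_r, s_r, vt_r = u[:, :order], s[:order], vt[:order]
    sqrt_s = np.sqrt(s_r)
    # A_r = S^{-1/2} U^T H1 V S^{-1/2}
    a_r = (u_r / sqrt_s).T @ h1 @ (vt_r.T / sqrt_s)
    b_r = (sqrt_s[:, None] * vt_r)[:, :1]   # first column of S^{1/2} V^T
    c_r = (u_r * sqrt_s)[:1, :]             # first row of U S^{1/2}
    return a_r, b_r, c_r, s


def rotation_block(radius, angle):
    """2x2 block with eigenvalues radius * exp(+/- i * angle): one damped mode."""
    return radius * np.array([[np.cos(angle), -np.sin(angle)],
                              [np.sin(angle), np.cos(angle)]])


# Hypothetical "truth": two damped oscillatory modes (4 states)
a_true = np.zeros((4, 4))
a_true[:2, :2] = rotation_block(0.95, 0.3)
a_true[2:, 2:] = rotation_block(0.90, 1.1)
b_true = np.ones((4, 1))
c_true = np.array([[1.0, 0.0, 1.0, 0.0]])

h_true = impulse_response(a_true, b_true, c_true, 60)
a_r, b_r, c_r, singular_values = era(h_true, order=4, n_rows=20, n_cols=20)
h_era = impulse_response(a_r, b_r, c_r, 60)
print(np.max(np.abs(h_era - h_true)), singular_values[:6])
```

The decay of `singular_values` is what an error convergence study would inspect when choosing the truncation order. Here they drop to round-off level after the fourth value because the data come from a four-state system.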

4. Performance Assessment and Practical Deployment

Performance is measured in terms of both predictive fidelity and computational efficiency:

Accuracy: Reported surrogates commonly achieve relative errors of roughly 1–3% on validation datasets; well-constructed models faithfully reproduce physically critical features such as resonance peaks, energy decay, and echo density (Trinh et al., 2017, Weeratunge et al., 2022, Sun et al., 2021, Mezza et al., 29 Mar 2024).
Computational Speedup: Evaluating surrogate models is typically several orders of magnitude faster than direct simulations. For instance, DNN surrogates for underwater coatings accelerate evaluation by a factor of $4.5\times 10^3$ over FEM (Weeratunge et al., 2022), and GNN surrogates for flow/acoustics offer speed-ups of $10^3$ over full CFD (Hadizadeh et al., 22 Dec 2024).
Generality and Data Efficiency: Surrogates such as RBNNs retain strong extrapolation capability and require significantly fewer training points than black-box ML models by encoding physical constraints (Li et al., 2022).
Validation Against Experiment and High-Fidelity Simulation: Performance is benchmarked by comparison with experimental measurements, full simulation, and in some cases theoretical estimates or established baselines (e.g., FEM or the transfer matrix method, TMM) (Sun et al., 2021, Rust et al., 2021).

5. Applications Across Acoustic Regimes

Data-driven acoustic surrogate models are now integrated into a spectrum of applications:

Microstructural and Process Optimization: Polynomial surrogates or DNNs are employed for rapid exploration of acoustic material design spaces, optimizing metrics such as absorption across parameterized microgeometries (Trinh et al., 2017, Weeratunge et al., 2022).
Inverse Acoustic Structure Design: DNNs enable direct, fast mappings from target spectra to optimal device parameters, effectively solving design problems whose forward map cannot be inverted directly (e.g., multi-order Helmholtz resonators for sound insulation) (Sun et al., 2021).
Early-Stage Room and Environmental Acoustics: Fast ML surrogates for room acoustic metrics (reverberation time, speech intelligibility indices) streamline building design and planning (Abarghooie et al., 2021). Differentiable feedback delay networks (FDNs) are used for realistic, tunable reverberation synthesis with perceptually salient temporal controls (Mezza et al., 29 Mar 2024).
Multiphysical Shape Optimization: GNN surrogates facilitate integrated aerodynamic and acoustic optimization of airfoils, balancing lift and trailing edge noise with explicit Pareto trade-offs and adaptive geometry morphing (Hadizadeh et al., 22 Dec 2024).
Propagation and Source Localization in Complex Environments: Hybrid physics-ML surrogates (e.g., RBNNs) and model-uncertainty–driven adaptation (e.g., JSEA) enable efficient, robust underwater localization and propagation modeling, even with partially known or mismatched environments (Li et al., 2022, Kari et al., 30 Mar 2025).
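The optimization-oriented applications in this list share one pattern: a cheap surrogate is called many times inside a search loop. The sketch below shows that pattern with a small elitist evolution strategy. Everything in it is illustrative: `surrogate_absorption` is a hypothetical analytic stand-in for a trained model, and the population sizes, mutation scale, and bounds are arbitrary choices rather than settings from the cited studies.

```python
import numpy as np


def surrogate_absorption(x):
    """Hypothetical surrogate output; peaks at 1.0 for the design (0.3, 0.7)."""
    return np.exp(-np.sum((x - np.array([0.3, 0.7])) ** 2, axis=-1))


def evolve(objective, lower, upper, rng,
           n_parents=5, n_children=20, n_generations=40, sigma=0.1):
    """Maximize `objective` within box bounds using a (mu + lambda) strategy."""
    parents = rng.uniform(lower, upper, size=(n_parents, len(lower)))
    for _ in range(n_generations):
        # Each child is a Gaussian mutation of a randomly chosen parent
        picks = parents[rng.integers(n_parents, size=n_children)]
        noise = sigma * (upper - lower) * rng.normal(size=picks.shape)
        children = np.clip(picks + noise, lower, upper)   # enforce the bounds
        # Elitism: parents compete with children, so the best score never drops
        pool = np.vstack([parents, children])
        scores = objective(pool)
        parents = pool[np.argsort(scores)[::-1][:n_parents]]
    return parents[0], objective(parents[0])


rng = np.random.default_rng(0)
lower = np.array([0.0, 0.0])
upper = np.array([1.0, 1.0])
best_x, best_value = evolve(surrogate_absorption, lower, upper, rng)
print(best_x, best_value)
```

The loop makes several hundred objective evaluations, which is affordable only because each call is a surrogate prediction rather than a full simulation. In practice the final candidates would be re-checked with the high-fidelity model.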

6. Methodological and Theoretical Advances

Key advances include:

Integration of Physical Modeling and Data-Driven Regression: Surrogates increasingly blend physics-based constraints (hard-coded into the architecture or loss functions) with empirical regression, offering improved physical consistency and data efficiency (Li et al., 2022, Cunha et al., 2022).
Differentiable and Fully Trainable Acoustic Networks: The design of fully differentiable models (e.g., delay networks with learnable delay lines (Mezza et al., 29 Mar 2024)) enables end-to-end optimization for perceptual and physical properties, surpassing classical genetic or heuristic schemes.
Efficient Dimensionality Handling: Nested optimization of dimensionality reduction and surrogate calibration (DRSM) sidesteps the curse of dimensionality, supporting applications with $O(10^4)$ input features (Lataniotis et al., 2018).
Uncertainty Quantification and Surrogate Validation: Bayesian inference (e.g., in noise, vibration, and harshness (NVH) applications), leave-one-out cross-validation, and parametric bootstrapping formalize the estimation of surrogate reliability, guiding deployment and design risk (Prakash et al., 2022).
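For surrogates that are linear in their coefficients, such as polynomial expansions fitted by ordinary least squares, leave-one-out cross-validation does not require refitting. The residual at a held-out point equals the ordinary residual divided by $1 - h_{ii}$, where $h_{ii}$ is the corresponding diagonal entry of the hat matrix. The sketch below checks this identity against brute-force refitting on synthetic data; the function names and the noisy sine response are assumptions made for the example.

```python
import numpy as np
from numpy.polynomial import legendre


def loo_residuals(phi, y):
    """Closed-form leave-one-out residuals for ordinary least squares."""
    q, _ = np.linalg.qr(phi)                # thin QR of the design matrix
    leverage = np.sum(q ** 2, axis=1)       # diagonal of the hat matrix, h_ii
    coeffs, *_ = np.linalg.lstsq(phi, y, rcond=None)
    return (y - phi @ coeffs) / (1.0 - leverage)


def loo_residuals_bruteforce(phi, y):
    """Reference implementation: refit with each sample removed in turn."""
    res = np.empty(len(y))
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        c, *_ = np.linalg.lstsq(phi[keep], y[keep], rcond=None)
        res[i] = y[i] - phi[i] @ c
    return res


rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=30)
y = np.sin(3.0 * x) + 0.05 * rng.normal(size=30)   # synthetic noisy response
phi = legendre.legvander(x, 4)                      # degree-4 Legendre basis

fast = loo_residuals(phi, y)
slow = loo_residuals_bruteforce(phi, y)
relative_loo_error = np.mean(fast ** 2) / np.var(y)
print(np.max(np.abs(fast - slow)), relative_loo_error)
```

The normalized quantity `relative_loo_error` gives a single-number estimate of generalization error that can be compared across truncation orders or basis choices.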

7. Impact, Limitations, and Prospects

Data-driven acoustic surrogate models dramatically accelerate simulation and design workflows, supporting multi-objective and inverse optimization tasks, reducing dependence on domain-specific expertise, and providing controlled error estimation and interpretability. Their deployment is transforming architectural, environmental, industrial, and submarine acoustic design by enabling rapid prototyping, agile iteration, and uncertainty-aware assessment at scales that were previously infeasible.

Limitations persist in handling extremely non-smooth or stochastic simulation outputs, generalization far outside the training domain, and scaling to fully 3D or coupled multiphysics scenarios. However, advances in active learning, hybrid PINN architectures, and transfer learning continue to mitigate these challenges.

A plausible implication is that as surrogate modeling becomes further entwined with high-quality data acquisition (including robotics-enabled measurement workflows), new regimes of real-time, adaptive acoustic simulation and control will become tractable, fundamentally shifting acoustic engineering practice and research.