MCMC with Adaptive Principal-Component Transformation: Rotation-Invariant Universal Samplers for Bayesian Structural System Identification

Published 25 Apr 2026 in stat.AP, stat.ME, and stat.ML | (2604.23381v1)

Abstract: Over decades, Markov chain Monte Carlo (MCMC) methods have been widely studied, with a typical application being the quantification of posterior uncertainties in Bayesian system identification of structural dynamic models. To address the issue of excessively low sampling efficiency in generic MCMC methods when applied to specific problems, researchers developed several MCMC algorithms that integrate trainable neural networks to replace and enhance their critical components. Later, meta-learning MCMC methods emerged to reduce training time. However, they require considerable similarity between test and training tasks, while their sampling efficiency is constrained by trade-off-simplified network designs. This paper proposes the Adaptive Principal-Component (PC) Meta-learning Stochastic Gradient Hamiltonian Monte Carlo (APM-SGHMC) algorithm. It adaptively rotates coordinate axes in the parameter space to align with the PC directions of the current posterior samples, ensuring rotation-invariance of sampling performance with respect to the posterior distribution. By incorporating translation-invariance, scale-invariance, and rotation-invariance in a unified framework, APM-SGHMC enables universal samplers to acquire generalizable knowledge across diverse Bayesian system identification tasks using minimalistic tasks while eliminating the constraints imposed by network design trade-offs on sampling efficiency. Practical feasibility issues are also addressed. Two Bayesian system identification case studies demonstrate its effectiveness and universality: our method overcomes the case-by-case limitations of traditional data-driven approaches, achieving zero-shot generalization across structurally distinct models without retraining and maintaining consistent superior performance across all scenarios.

Authors (5)

Summary

The paper presents APM-SGHMC, a rotation-invariant MCMC sampler using adaptive principal-component transformation for zero-shot generalization across diverse structural models.
It estimates the PC directions with an EMA-based update and reports up to 265x higher effective sample size per hour than HMC, along with consistently lower negative ELBO values.
The study demonstrates the method's practical transferability from building to bridge systems without retraining, significantly reducing computational overhead.

Authoritative Summary of "MCMC with Adaptive Principal-Component Transformation: Rotation-Invariant Universal Samplers for Bayesian Structural System Identification" (2604.23381)

Introduction and Problem Context

The paper addresses the limitations of standard Markov chain Monte Carlo (MCMC) algorithms in Bayesian structural system identification. Generic MCMC samplers, notably Metropolis-Hastings, Transitional MCMC (TMCMC), and Hamiltonian Monte Carlo (HMC), sample inefficiently, especially in high-dimensional parameter spaces where strong correlations limit exploration. Recent advances involve neural network (NN)-enhanced MCMC and meta-learning approaches intended to generalize sampling strategies across tasks while minimizing retraining overhead. However, trade-offs in NN architecture, particularly component-wise input/output designs, impose intrinsic limitations on generalizability and sampling efficiency.

APM-SGHMC Algorithm: Design and Theoretical Advances

The Adaptive Principal-Component Meta-learning Stochastic Gradient Hamiltonian Monte Carlo (APM-SGHMC) algorithm constitutes the central innovation. Its architecture combines translation-invariance, scale-invariance, and rotation-invariance via an adaptive principal-component (PC) transformation embedded in a meta-learning stochastic simulation framework. In contrast to AM-SGHMC, which only handles translation and scale invariance, APM-SGHMC leverages PC decomposition of posterior samples to rotate the coordinate axes dynamically, thus aligning sampling strategies with dominant directions of posterior uncertainty.

This rotation-invariance removes the dependency on component-wise NNs, which improves generalizability and efficiency when the sampler encounters structurally diverse Bayesian inference tasks. The PCs are estimated adaptively through a real-time, exponential moving average (EMA)-based update that tracks correlation structures as they evolve during the burn-in and main sampling phases.
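The idea of estimating PC directions with an EMA and rotating into them can be illustrated with a minimal NumPy sketch. It is not the paper's implementation: the function names `ema_update` and `pc_transform`, the fixed decay of 0.95, the batch size, and the two-parameter Gaussian target are all assumptions chosen for illustration. The sketch keeps a running mean and covariance of the chain states, eigendecomposes the covariance, and maps parameters to centered, PC-aligned, unit-scale coordinates.

```python
import numpy as np

def ema_update(mean, cov, samples, decay):
    """One EMA step for the running mean and covariance (hypothetical helper)."""
    batch_mean = samples.mean(axis=0)
    centered = samples - batch_mean
    batch_cov = centered.T @ centered / max(len(samples) - 1, 1)
    new_mean = decay * mean + (1.0 - decay) * batch_mean
    new_cov = decay * cov + (1.0 - decay) * batch_cov
    return new_mean, new_cov

def pc_transform(mean, cov, eps=1e-12):
    """Build maps to and from centered, PC-aligned, unit-scale coordinates."""
    eigvals, eigvecs = np.linalg.eigh(cov)      # columns of eigvecs are the PC directions
    scale = np.sqrt(np.maximum(eigvals, eps))   # standard deviation along each PC

    def to_pc(theta):
        return (theta - mean) @ eigvecs / scale

    def from_pc(z):
        return (z * scale) @ eigvecs.T + mean

    return to_pc, from_pc

# Illustrative target (assumption): a strongly correlated 2-D Gaussian.
rng = np.random.default_rng(0)
true_cov = np.array([[1.0, 0.95], [0.95, 1.0]])
chol = np.linalg.cholesky(true_cov)
offset = np.array([3.0, -1.0])

mean, cov = np.zeros(2), np.eye(2)
for _ in range(200):
    batch = rng.standard_normal((64, 2)) @ chol.T + offset
    mean, cov = ema_update(mean, cov, batch, decay=0.95)

to_pc, from_pc = pc_transform(mean, cov)
held_out = rng.standard_normal((5000, 2)) @ chol.T + offset
z = to_pc(held_out)
corr = np.corrcoef(z.T)[0, 1]  # close to zero once the axes follow the PCs
```

In the transformed coordinates the target is approximately uncorrelated with unit scale, whatever the orientation of the original posterior, which is the property a rotation-invariant sampler relies on.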

The algorithm also introduces affine-invariant potential energy statistics for convergence diagnostics and divergence mitigation, facilitating robust adaptation even in the presence of outlier and non-converged chains. The step-size adaptation and momentum reset mechanisms further stabilize Markov chain evolution.

Practical Implementation and Algorithmic Details

APM-SGHMC operationalizes its rotation-invariance by transforming parameter vectors via adaptively estimated PC directions, ensuring the sampling performance is invariant to the orientation of the posterior. The EMA process for statistics—including mean, variance, and PC directions—is decoupled with iteration-varying decay rates, enabling flexibility during fast convergence and precision during stationary sampling.
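One simple way to obtain an iteration-varying decay rate is sketched below. The schedule `min(decay_max, t / (t + 1))` and the per-statistic caps in `DECAY_CAPS` are assumptions made for illustration, not values or formulas taken from the paper. With this schedule the EMA behaves like a plain running average during the first iterations (fast adaptation) and like a fixed-rate EMA once the cap is reached (smooth, precise estimates).

```python
def decay_rate(t, decay_max):
    """Hypothetical schedule: running average early on, capped EMA later."""
    return min(decay_max, t / (t + 1.0))

def ema(values, decay_max):
    """Apply the iteration-varying EMA to a sequence of scalar observations."""
    estimate = 0.0
    for t, value in enumerate(values):
        d = decay_rate(t, decay_max)
        estimate = d * estimate + (1.0 - d) * value
    return estimate

# Separate (decoupled) caps per statistic; the numbers are illustrative only.
DECAY_CAPS = {"mean": 0.9, "variance": 0.99, "pc_directions": 0.999}

# Before the cap is reached, the EMA equals the plain average of the inputs.
early = ema([2.0, 4.0, 6.0], DECAY_CAPS["mean"])
```

Giving each statistic its own cap lets a quickly stabilizing quantity such as the mean track changes faster than the PC directions, which need more samples to estimate reliably.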

Non-converged samples are diagnosed through affine-invariant potential energy thresholds, and their influence is neutralized in adaptive estimation. Sampling updates for parameter and momentum states are rotated accordingly, and step-size relaxation coefficients are dynamically adjusted to accelerate convergence and prevent divergence.
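The exclusion of non-converged chains from the adaptive statistics can be sketched as follows. The paper's actual potential energy statistic and threshold are not given in this summary, so the rule used here (median plus `k` robust standard deviations based on the median absolute deviation) is a stand-in assumption, as are the names `converged_mask` and `masked_mean_cov` and the toy numbers. The rule shares one property with an affine-invariant test: an affine reparameterization shifts every chain's potential energy by the same constant, which leaves the mask unchanged.

```python
import numpy as np

def converged_mask(potential, k=5.0):
    """Keep chains whose potential energy is not far above the across-chain median.

    Stand-in rule (assumption): threshold = median + k * 1.4826 * MAD.
    """
    med = np.median(potential)
    mad = np.median(np.abs(potential - med))
    scale = 1.4826 * mad + 1e-12  # MAD rescaled to a Gaussian standard deviation
    return potential <= med + k * scale

def masked_mean_cov(states, mask):
    """Estimate mean and covariance from the retained chains only."""
    kept = states[mask]
    return kept.mean(axis=0), np.cov(kept, rowvar=False)

# Toy potential energies for 8 chains; chains 5 and 7 are far from the mode.
potential = np.array([1.0, 1.2, 0.9, 1.1, 1.05, 50.0, 0.95, 80.0])
states = np.column_stack([potential, -potential])  # placeholder chain states

mask = converged_mask(potential)
mean_kept, cov_kept = masked_mean_cov(states, mask)
```

Only the six retained chains contribute to `mean_kept` and `cov_kept`, so a few stuck chains cannot distort the estimated PC directions.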

Neural network design is minimalist but sufficient; the architecture integrates MLP, linear-transform, and radial basis function (RBF) shortcuts for the gyroscopic-coupling and damping matrix parameterizations. Input pre-processing discards domain-specific parameter categories to promote universality across structurally distinct tasks.

Empirical Evaluation: Numerical Results and Claims

Two case studies empirically validate the claims:

Building Structural Model Identification

Trained on a 4-story steel braced-frame benchmark, APM-SGHMC is tested on 2-, 4-, and 6-story structures with varying noise conditions. Metrics include negative ELBO (a proxy for the Kullback-Leibler divergence to the true posterior, from which it differs only by the log evidence, a constant for a given dataset), Effective Sample Size (ESS), and ESS per hour (sampling efficiency). APM-SGHMC consistently attains the lowest negative ELBO values, indicating the closest posterior approximation among the compared samplers, and achieves ESS/h that is 265x (2-story), 174x (4-story), and 171x (6-story) higher than HMC, and 39x, 21x, and 19x higher than AM-SGHMC. Furthermore, ESS values are nearly invariant across cases, which supports the claim that parameter correlation no longer limits efficiency.
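For readers unfamiliar with the efficiency metric, ESS and ESS per hour can be computed along the following lines. This is a generic estimator, not necessarily the one used in the paper: it sums the chain's autocorrelations up to the first negative value, and the AR(1) test chain, the function names, and the wall-clock time are assumptions for illustration.

```python
import numpy as np

def autocorr(x):
    """Normalized autocorrelation of a 1-D chain, computed with the FFT."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = len(x)
    f = np.fft.rfft(x, 2 * n)                 # zero-pad to avoid circular wrap-around
    acov = np.fft.irfft(f * np.conj(f))[:n]
    return acov / acov[0]

def effective_sample_size(x):
    """ESS = N / (1 + 2 * sum of autocorrelations up to the first negative lag)."""
    rho = autocorr(x)
    negative = np.flatnonzero(rho[1:] < 0)
    cutoff = negative[0] + 1 if negative.size else len(rho)
    tau = 1.0 + 2.0 * rho[1:cutoff].sum()     # integrated autocorrelation time
    return len(x) / tau

def ess_per_hour(ess, wall_seconds):
    """Sampling efficiency: effective samples produced per hour of wall-clock time."""
    return ess * 3600.0 / wall_seconds

# Illustrative AR(1) chain with coefficient 0.9 (assumption); the theoretical
# ESS is N * (1 - 0.9) / (1 + 0.9), roughly N / 19.
rng = np.random.default_rng(1)
n, phi = 20000, 0.9
chain = np.empty(n)
chain[0] = rng.standard_normal()
for i in range(1, n):
    chain[i] = phi * chain[i - 1] + rng.standard_normal()

ess_correlated = effective_sample_size(chain)
ess_iid = effective_sample_size(rng.standard_normal(n))
```

Because ESS/h divides by wall-clock time, a sampler with a more expensive step can still come out ahead if each step decorrelates the chain much more effectively.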

Bridge Structural Model Identification

APM-SGHMC trained on the building task generalizes directly to three distinct bridge model classes (6-, 17-, and 29-parameter settings) with no retraining. HMC, requiring task-specific adaptation, serves as the baseline. APM-SGHMC again achieves lower negative ELBO, and its sampling efficiency is 76x (6-parameter), 81x (17-parameter), and 21x (29-parameter) higher than HMC. The relatively modest ESS in the 29-parameter case is attributed to the length of the adaptation phase; even so, the results support the method's universality across structurally disparate systems.

Implications, Theoretical Impact, and Future Outlook

By incorporating adaptive PC transformation, APM-SGHMC achieves rotation-invariance, obviating the need for task-specific retraining and the architectural trade-offs characteristic of NN-enhanced samplers. The principal implication is the capacity for zero-shot generalization across model classes, enabling universal MCMC samplers that scale efficiently to structurally distinct Bayesian inference problems. Beyond structural identification, the approach may apply to other domains where posterior correlations hamper classical MCMC efficiency.

Practically, APM-SGHMC enables substantive reductions in computational overhead and time for Bayesian updating in structural dynamics, potentially transforming workflows in engineering model calibration, uncertainty quantification, and data-driven structural health monitoring.

Theoretically, it advances meta-learning in MCMC, suggesting that samplers combined with adaptive, rotation-invariant feature representations can acquire abstract, generalizable strategies beyond component-wise heuristics. Future developments might involve further automation of the adaptation-phase length, investigation of nonlinear PC strategies, and integration with differentiable model-based simulation pipelines for even wider applicability.

Conclusion

APM-SGHMC advances meta-learning MCMC methods for Bayesian structural system identification by ensuring translation-, scale-, and rotation-invariance in sampler performance. Adaptive PC transformation allows efficient and universal sampling across problem variants with strongly correlated posterior structures, achieving higher Effective Sample Size and sampling efficiency than both classical and state-of-the-art NN-enhanced approaches. The chosen architecture and convergence handling strategies are reported to be robust and scalable, opening new avenues for universally generalizable algorithms across Bayesian system identification and related fields.