Scalable Power Sampling
- Scalable power sampling is an integrated framework that combines statistical, algorithmic, and physical methods to tackle high-dimensional and rare-event distributions.
- It utilizes adaptive importance sampling, structure-aware surrogate modeling, and data augmentation to reduce variance and computational costs in power system analysis.
- Recent advances demonstrate improved reliability, energy efficiency, and real-time performance across applications from power grid reliability to large language model inference.
Scalable power sampling encompasses algorithmic, statistical, and physical frameworks for sampling high-dimensional, structure-dependent, or rare-event distributions in power systems and other domains, with an emphasis on computational efficiency, tractable variance, and adaptability to large-scale or real-time requirements. It includes adaptive stochastic estimators, structure-aware data-driven algorithms, and scalable physical implementations, facilitating rigorous inference, surrogate model training, or data generation in settings like power grid reliability, streaming time series analysis, and learning-based or hardware-embedded environments.
1. Adaptive Importance Sampling in Power Grid Reliability
Adaptive importance sampling achieves scalable rare-event estimation in high-dimensional power system reliability problems by leveraging a physics-informed, adaptive mixture of truncated Gaussian distributions. The reliability constraint is formulated as a union of high-dimensional linear inequalities: given random injections $x \sim \mathcal{N}(\mu, \Sigma)$, the system fails if $a_i^\top x \ge b_i$ for any $i = 1, \dots, m$. The failure probability

$$P_f = \mathbb{P}\Big(x \in \bigcup_{i=1}^{m} A_i\Big), \qquad A_i = \{x : a_i^\top x \ge b_i\},$$

is estimated by sampling from an adaptive proposal

$$q_\lambda(x) = \sum_{i=1}^{m} \lambda_i\, q_i(x), \qquad \lambda \in \Delta^{m-1},$$

where $q_i$ is the base Gaussian truncated to $A_i$. The weights $\lambda$ are iteratively updated via entropic mirror descent on the simplex to minimize the importance-sampling variance. Each iteration samples from $q_\lambda$, computes the importance weight $p(x)/q_\lambda(x)$, estimates the stochastic variance gradient, and updates $\lambda$ in closed form.
The resulting estimator attains a sample-variance scaling with only mild dependence on the number of constraints $m$, in contrast to static mixtures or naive Monte Carlo, whose sample requirements deteriorate sharply for rare events. Empirical studies on systems from IEEE-30 up to the Polish 3120-bus network show that modest sample budgets suffice even for very small target failure probabilities, with adaptive runs for the largest systems completing in under two minutes on commodity hardware. Embedding the full DC power flow physics into the truncated Gaussian mixture enables real-time, grid-scale rare-event estimation with sharply reduced sample and computational complexity (Lukashevich et al., 2021).
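The adaptive loop above can be sketched on a toy two-constraint problem. The half-space geometry, thresholds, and step size below are illustrative choices, not taken from the cited work:

```python
import math
import random
from statistics import NormalDist

nd = NormalDist()
random.seed(0)

# Toy 2-D failure region: union of half-spaces a_i^T x >= b_i with x ~ N(0, I).
A = [(1.0, 0.0), (0.0, 1.0)]           # unit-norm constraint normals (illustrative)
B = [3.0, 3.5]                         # failure thresholds
Z = [1.0 - nd.cdf(b) for b in B]       # tail mass of each half-space

def sample_trunc(i):
    """Draw x ~ N(0, I) conditioned on a_i^T x >= b_i."""
    s = nd.inv_cdf(nd.cdf(B[i]) + random.random() * Z[i])  # truncated coord along a_i
    t = random.gauss(0.0, 1.0)                             # free orthogonal coord
    ax, ay = A[i]
    return (ax * s - ay * t, ay * s + ax * t)

def mix_ratio(x):
    """q_lambda(x) / p(x), using q_i(x) = p(x) * 1[x in A_i] / Z_i."""
    return sum(lam[j] / Z[j] for j in range(len(A))
               if A[j][0] * x[0] + A[j][1] * x[1] >= B[j])

lam, eta, n, est = [0.5, 0.5], 1e3, 4000, 0.0
for _ in range(n):
    i = random.choices(range(len(A)), weights=lam)[0]
    x = sample_trunc(i)
    r = mix_ratio(x)
    w = 1.0 / r                        # importance weight p(x)/q_lambda(x)
    est += w / n                       # unbiased estimate of the union probability
    # Entropic mirror descent on the simplex: multiplicative update from a
    # stochastic gradient of the second moment E[w^2], then renormalize.
    active = [1.0 if A[j][0] * x[0] + A[j][1] * x[1] >= B[j] else 0.0
              for j in range(len(A))]
    lam = [lam[j] * math.exp(eta * w * w * active[j] / (Z[j] * r))
           for j in range(len(A))]
    tot = sum(lam)
    lam = [l / tot for l in lam]
```

As the weights drift toward $\lambda_i \propto Z_i$, the mixture concentrates samples where the failure mass actually lies; here `est` approximates $P(x_1 \ge 3) + P(x_2 \ge 3.5) \approx 1.6 \times 10^{-3}$.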
2. Structure-Aware Sampling for Power System Surrogate Modeling
Correlation sampling enables scalable, physically meaningful coverage of static power grid operating spaces for surrogate model training and scenario analysis. By starting from a Dirichlet base distribution on normalized allocations and enforcing partial correlation constraints estimated from historic or simulated data, the algorithm fills and extends the admissible region beyond pure copula or uniform random approaches. Pairwise adaptations enforce the strongest empirical correlations, while Gaussian noise relaxation prevents over-constraining and increases design diversity. The method captures both strong empirical dependencies and synthetic diversity, generating thousands of high-dimensional samples in seconds to minutes for dimensionalities in the low hundreds, and can be parallelized or vectorized for even larger instances.
Empirical P–Q scatter plots show that correlation sampling fills the physically feasible domain more thoroughly than stratified random sampling and avoids the over-concentration characteristic of copula-based designs, supporting improved surrogate learning and statistical robustness for power flow analysis (Balduin et al., 2022).
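A minimal sketch of this pipeline, assuming a hypothetical six-generator allocation and two illustrative target correlations (the actual algorithm estimates partial correlations from data and iterates over all strongly correlated pairs):

```python
import random

random.seed(1)

# Hypothetical setup: n generators share a normalized active-power budget; two
# generator pairs carry strong empirical correlations that should be enforced.
n = 6
strong_pairs = [(0, 1, 0.9), (2, 3, -0.8)]   # (i, j, target correlation strength)

def dirichlet(alpha):
    """Dirichlet draw via normalized Gamma variates (the base distribution)."""
    g = [random.gammavariate(a, 1.0) for a in alpha]
    s = sum(g)
    return [v / s for v in g]

def correlated_sample(noise=0.02):
    w = dirichlet([2.0] * n)
    # Pairwise adaptation: pull positively correlated pairs toward each other
    # and push negatively correlated pairs apart, scaled by the target strength.
    for i, j, rho in strong_pairs:
        mean = 0.5 * (w[i] + w[j])
        w[i] += rho * (mean - w[i])
        w[j] += rho * (mean - w[j])
    # Gaussian noise relaxation: avoid over-constraining, keep design diversity.
    w = [max(0.0, v + random.gauss(0.0, noise)) for v in w]
    s = sum(w)
    return [v / s for v in w]            # re-normalize onto the simplex

samples = [correlated_sample() for _ in range(2000)]
```

Computing empirical Pearson correlations over `samples` shows the enforced pairs clearly separated from the near-independent remainder of the design.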
3. Scale-Adaptive Sampling and Generalization in Power Flow Learning
Local topology slicing (LTS) is a scalable data augmentation technique designed to expose learning models, such as graph neural networks or transformers, to a diverse set of network subgraphs sampled from the full power grid. Starting from random seed nodes, breadth-first expansions create subgraphs of variable size and topology, with boundary fluxes adjusted to maintain Kirchhoff’s laws. Boundary adjustments convert external flows to equivalent nodal injections, preserving network physics. This multi-scale sampling rebalances the distribution of graph statistics—including average degree and algebraic connectivity—relative to the full system.
Empirical evaluation on the IEEE-39 and a 690-bus real grid shows that LTS data augmentation improves model generalization accuracy under unseen bus/topology regimes by 18–36 percentage points, with additional gains from physically motivated target prediction (direct bus voltages and branch powers), multi-task graph heads, and physical consistency losses. The LTS technique is generalizable across power system classes, and the overall framework supports robust cross-scale surrogate learning for operational and planning scenarios (Li et al., 4 Jan 2026).
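The slicing step can be illustrated on a toy lossless DC-flow network. The 8-bus topology and random line flows below are hypothetical; `boundary_adjusted_injections` implements the conversion of cut-line flows into equivalent nodal injections so that the slice remains physically balanced:

```python
import random
from collections import deque

random.seed(2)

# Toy 8-bus grid (hypothetical): symmetric adjacency and antisymmetric flows.
adj = {0: [1, 2], 1: [0, 3], 2: [0, 3, 4], 3: [1, 2, 5],
       4: [2, 6], 5: [3, 7], 6: [4, 7], 7: [5, 6]}
flow = {}
for i in adj:
    for j in adj[i]:
        if (j, i) in flow:
            flow[(i, j)] = -flow[(j, i)]      # antisymmetry: f_ij = -f_ji
        else:
            flow[(i, j)] = random.uniform(-1.0, 1.0)

def slice_subgraph(seed, max_size):
    """Breadth-first expansion from a random seed bus up to max_size nodes."""
    seen, q = {seed}, deque([seed])
    while q and len(seen) < max_size:
        u = q.popleft()
        for v in adj[u]:
            if v not in seen and len(seen) < max_size:
                seen.add(v)
                q.append(v)
    return seen

def boundary_adjusted_injections(sub):
    """Fold flows on cut lines into equivalent nodal injections, so
    Kirchhoff's current law holds inside the slice on its own."""
    return {i: sum(flow[(i, j)] for j in adj[i] if j in sub) for i in sub}

sub = slice_subgraph(seed=0, max_size=5)
inj = boundary_adjusted_injections(sub)
```

Because the flows are antisymmetric, the adjusted injections of any slice sum to zero, i.e. each sampled subgraph satisfies power balance as a standalone network.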
4. Optimal Online Sampling in Streaming Power Grid Time Series
D-optimal subsampling for multi-dimensional streaming time series in power grid sensor data involves real-time, adaptive sample selection designed to maximize the information content per update under computational or energy constraints. The statistically optimal sampling rule follows the D-optimality criterion, maximizing $\log\det(\mathbf{M})$, where $\mathbf{M} = \mathbb{E}[\pi(x)\, x x^\top]$ is the information matrix induced by the sampling probability $\pi(x)$, subject to a fixed average sampling rate.

The optimal policy is a mixture of Bernoulli sampling and hard-thresholded leverage-score sampling, determined by the current Mahalanobis leverage $h_t = x_t^\top \mathbf{M}^{-1} x_t$. In practice, every incoming data point is sampled with a small baseline probability or, if $h_t$ exceeds a threshold, with a probability that grows with its leverage. This mixture ensures robustness to distribution drift and rare events. Theoretical guarantees include asymptotic normality, small residuals under estimated moments and thresholds, and first-order efficiency relative to full-data recursive least squares.
Empirical results on 19-country European hourly load data show that relaxed-LSS attains full-sample estimation and prediction accuracy using far fewer updates than Bernoulli baselines, with rapid adaptation to load shocks and order-of-magnitude compute and energy savings (Xie et al., 2023).
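A pure-Python sketch of the mixture policy, with a Sherman–Morrison rank-one update standing in for the recursive estimator. The baseline rate, threshold, and the keep-with-certainty rule for high-leverage points are illustrative simplifications, not the paper's exact policy:

```python
import random

random.seed(3)

d = 3                                        # feature dimension of the stream
lam0 = 1e-2                                  # ridge seed so the inverse exists
Minv = [[(1.0 / lam0) * (i == j) for j in range(d)] for i in range(d)]

def quad(Minv, x):
    """Mahalanobis leverage h = x^T M^{-1} x."""
    return sum(x[i] * Minv[i][j] * x[j] for i in range(d) for j in range(d))

def sherman_morrison(Minv, x):
    """Rank-one update of M^{-1} after M <- M + x x^T."""
    Mx = [sum(Minv[i][j] * x[j] for j in range(d)) for i in range(d)]
    denom = 1.0 + sum(x[i] * Mx[i] for i in range(d))
    return [[Minv[i][j] - Mx[i] * Mx[j] / denom for j in range(d)]
            for i in range(d)]

pi0, tau = 0.05, 6.0        # baseline sampling rate and leverage threshold
kept, total = 0, 0
for t in range(5000):
    x = [random.gauss(0.0, 1.0) for _ in range(d)]   # synthetic sensor reading
    h = quad(Minv, x)
    # Mixture policy: baseline Bernoulli draw, OR keep high-leverage points.
    p = 1.0 if h > tau else pi0
    total += 1
    if random.random() < p:
        kept += 1
        Minv = sherman_morrison(Minv, x)             # update only on kept points

rate = kept / total
```

The policy keeps nearly all early points (while the information matrix is poorly conditioned, every point has high leverage) and then settles near the baseline rate, retaining only genuinely informative outliers on top of it.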
5. Scalable Dataset Generation for AC-OPF in Large Power Grids
For large-scale AC optimal power flow (AC-OPF) datasets, a scalable generation method enforces uniform coverage in total system load by explicit slicing of the load polytope, rather than sampling individual loads independently. For each slice, a coordinate-directions hit-and-run (CDHR) algorithm efficiently samples load setpoints constrained to a narrow band of values, after which AC-OPF is solved with slack variables to allow for occasional unfeasible draws without costly pre-filtering.
Dataset quality is quantified by three complementary criteria: marginal variability (Shannon entropy), activation pattern diversity (average Hamming distance of variable-bound status), and activation frequency of variable bounds. The proposed method (HEDGeOPF) provides a 55% increase in dispatch variability and over 250% increase in pattern diversity relative to uniform random sampling, while maintaining runtime per AC-OPF solve at or below 1 second for 118-bus systems. The method supports trivial parallelization up to 4661-bus test cases. RAMBO (maximally distant OPF sampling) achieves slightly higher constraint activation but at 30–115× higher computational cost, making HEDGeOPF the practicable quality–scalability compromise for training ML models on realistic AC-OPF instances (Baù et al., 26 Aug 2025).
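The slicing idea can be sketched as a coordinate-directions hit-and-run walk restricted to a narrow band of total load. The per-load bounds and the band below are illustrative, and the subsequent AC-OPF solve with slack variables is omitted:

```python
import random

random.seed(4)

n = 5
lo, hi = 0.2, 1.0                 # per-load box bounds (p.u., illustrative)
s_lo, s_hi = 3.0, 3.2             # narrow slice of total system load

# Feasible starting point inside the slice: equal loads, sum = 3.1.
p = [(s_lo + s_hi) / (2 * n)] * n

def cdhr_step(p):
    """One coordinate-directions hit-and-run move inside the sliced polytope."""
    k = random.randrange(n)
    rest = sum(p) - p[k]
    # The chosen load must satisfy both its box and the total-load band;
    # the interval is never empty because the current point is feasible.
    a = max(lo, s_lo - rest)
    b = min(hi, s_hi - rest)
    q = p[:]
    q[k] = random.uniform(a, b)
    return q

samples = []
for _ in range(1000):
    p = cdhr_step(p)
    samples.append(p[:])
```

Each accepted point stays inside the load polytope slice by construction, so no post-hoc feasibility filtering of the load setpoints themselves is needed before the OPF solve.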
6. Training-Free Power Sampling for LLMs
In the context of LLM reasoning, scalable power sampling denotes a theoretically grounded, training-free approach for sampling from the power distribution $p_\alpha(y \mid x) \propto p(y \mid x)^\alpha$, where $p$ is the LLM's generative distribution and $\alpha > 1$ sharpens it toward high-likelihood sequences. The global power distribution over sequences can be approximated at each decoding step with a per-token local scaling factor $V_t$, providing future-aware normalization:

$$p_\alpha(y_t \mid y_{<t}) \;\propto\; p(y_t \mid y_{<t})^{\alpha}\, V_t, \qquad V_t = \mathbb{E}_{y_{>t} \sim p(\cdot \mid y_{\le t})}\big[\, p(y_{>t} \mid y_{\le t})^{\alpha-1} \big].$$

The factor $V_t$ is estimated via Monte Carlo rollouts under the base model together with a jackknife correction. This enables per-token future-reward estimation without MCMC or auxiliary reward models.
Empirical benchmarking against MCMC-based power sampling and RL post-training (GRPO) demonstrates that scalable power sampling matches or surpasses RL or MCMC on math, code, and QA tasks, while reducing inference latency by over 10× relative to MCMC (0.22 min/prompt vs. 2.5 min/prompt, Qwen2.5-Math-7B). The algorithm incurs only a 2.5–3.5× slowdown against vanilla decoding and is implemented without any gradient updates. This approach rigorously connects observable RL gains to distributional sharpening, formalizes power sampling as a local, scalable process, and enables real-time high-quality LLM inference (Ji et al., 29 Jan 2026).
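On a toy two-step model, the local factorization can be checked against the exact global power distribution. The probability tables, binary vocabulary, and α below are illustrative, and `rollout_factor` mirrors only the Monte Carlo rollout estimate of the future-aware factor (without the jackknife correction):

```python
import random

random.seed(5)

# Toy two-step "language model" (hypothetical): binary vocab, explicit tables.
p1 = {0: 0.6, 1: 0.4}                       # p(y1)
p2 = {0: {0: 0.9, 1: 0.1},                  # p(y2 | y1)
      1: {0: 0.5, 1: 0.5}}
alpha = 2.0

def rollout_factor(y1, n=5000):
    """Monte Carlo estimate of V(y1) = E_{y2 ~ p(.|y1)}[ p(y2|y1)^(alpha-1) ],
    the future-aware scaling factor for local power sampling."""
    acc = 0.0
    for _ in range(n):
        y2 = random.choices([0, 1], weights=[p2[y1][0], p2[y1][1]])[0]
        acc += p2[y1][y2] ** (alpha - 1.0)
    return acc / n

# Local power-sampling distribution over the first token.
scores = {y1: (p1[y1] ** alpha) * rollout_factor(y1) for y1 in p1}
z = sum(scores.values())
local = {y1: s / z for y1, s in scores.items()}

# Exact global power distribution, marginalized over y2, for comparison.
joint = {(a, b): (p1[a] * p2[a][b]) ** alpha for a in p1 for b in (0, 1)}
zg = sum(joint.values())
exact = {a: sum(v for (y1, _), v in joint.items() if y1 == a) / zg for a in p1}
```

For this model the locally normalized distribution matches the exact global power distribution, and both sharpen the base marginal (here from 0.6 toward roughly 0.79 for the more likely first token).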
7. Physical and Hardware-Driven Scalable Sampling
Hardware-based scalable power sampling is exemplified by room-temperature stochastic magnetic tunnel junctions (sMTJs) that generate truly random Float16 samples at four orders of magnitude lower energy than conventional pseudo-random generators. Bernoulli device parameters are calibrated via current-mode DACs to match the required exponent, mantissa, and sign distributions of the IEEE Float16 format, yielding a direct hardware mapping to uniform floating-point draws. Arbitrary 1D distributions are generated via a mixture-of-uniforms approach: random selection among non-overlapping buckets, followed by uniform sampling within each bucket; richer distributions are then built by convolution (sums of draws) or pointwise (Bayesian) products. The maximum error in representing the true density is bounded by a term proportional to the bucket width, vanishing as the buckets shrink.
Measured energy consumption is 22.42 mJ for the benchmarked batch of Float16 samples, outperforming PCG32 and MT19937 by factors of 5649 and 9721, respectively. Throughput reaches 1 MSPS per sMTJ column, scaling linearly with additional columns. The methodology enables physically scalable, resource-affordable, and rigorously quantifiable hardware-embedded sampling for ML and probabilistic applications (Alder et al., 2024).
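In software, the bucket-select-then-uniform scheme reads as follows (in hardware, both random choices would be driven by calibrated sMTJ Bernoulli bits); the target density f(x) = 2x on [0, 1] is an arbitrary illustration:

```python
import random

random.seed(6)

# Mixture-of-uniforms sampler for a 1-D density, here f(x) = 2x on [0, 1]:
# pick a non-overlapping bucket by its probability mass, then draw uniformly
# inside it. The approximation error shrinks with the bucket width.
K = 64                                            # number of buckets
edges = [k / K for k in range(K + 1)]
# Exact CDF increments of f: F(x) = x^2, so mass_k = F(edge_{k+1}) - F(edge_k).
masses = [edges[k + 1] ** 2 - edges[k] ** 2 for k in range(K)]

def draw():
    k = random.choices(range(K), weights=masses)[0]    # bucket selection
    return random.uniform(edges[k], edges[k + 1])      # uniform within bucket

xs = [draw() for _ in range(20000)]
mean = sum(xs) / len(xs)          # true mean of f is 2/3
```

Doubling `K` halves the bucket width and hence the worst-case density error, matching the vanishing error bound described above.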
In summary, scalable power sampling unifies statistical, algorithmic, and physical-layer advances in sampling complex, resource-constrained, or high-variance distributions. Recent literature provides rigorously analyzed and empirically validated methods—ranging from adaptive importance sampling and correlation-driven design to scalable LLM inference schemes and low-power physical random number generation—that together address the scaling bottlenecks across statistical estimation, ML-driven control, and trusted hardware generation of power system and broader probabilistic models.