Overlapping Batch Means (OBM)

Updated 6 August 2025
  • Overlapping Batch Means (OBM) is a variance estimation technique that forms heavily overlapping batches to improve estimation accuracy in dependent data settings.
  • Adjacent overlapping batches share all but one observation, which reduces estimator variance compared to traditional nonoverlapping batch means.
  • Optimal batch size selection in OBM balances bias and variance, yielding strong consistency and reliable confidence intervals in simulation and MCMC analyses.

The overlapping batch means (OBM) method is a variance estimation technique designed for simulation output analysis, notably in Markov chain Monte Carlo (MCMC) and dependent time series settings. Unlike traditional nonoverlapping batch means (BM) estimators, OBM increases the efficiency of variance estimates by constructing heavily overlapping batches, which leads to a reduction in estimator variance and more reliable assessment of Monte Carlo standard errors (MCSE). OBM is closely related to certain spectral variance estimators and underpins several state-of-the-art approaches to uncertainty quantification and confidence region construction in the presence of dependent data.

1. Construction of the OBM Estimator

For a stationary sequence (e.g., the output of an MCMC run or a time series), the OBM approach forms batches of fixed size $b_n$ with maximal overlap: batch $j$ consists of observations $X_{j+1}, \dots, X_{j+b_n}$ for $j = 0, 1, \dots, n-b_n$, giving $n-b_n+1$ overlapping batches. Denoting the mean-centered outputs as $Y_i = g(X_i) - \mu$, the batch means are

$$Y_j(b_n) = \frac{1}{b_n} \sum_{k=1}^{b_n} Y_{j+k},$$

where $\mu = E[g(X)]$. The OBM estimator for the asymptotic variance $\sigma^2$ of the sample mean is

$$\hat{\sigma}^2_{\mathrm{OBM}} = \frac{n\,b_n}{(n-b_n)(n-b_n+1)} \sum_{j=0}^{n-b_n}\left[Y_j(b_n) - \bar{Y}_n\right]^2,$$

with $\bar{Y}_n = \frac{1}{n}\sum_{i=1}^n Y_i$ (0811.1729).

This quadratic-form structure, with each point reused in many overlapping batch means, "smooths" the estimator and reduces variability relative to BM.
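
To make the construction concrete, here is a minimal NumPy sketch of the estimator above (the function name and interface are our own, not from the cited papers):

```python
import numpy as np

def obm_variance(y, b):
    """OBM estimate of the asymptotic variance sigma^2 of the sample mean
    of a stationary, possibly dependent sequence y, using batch size b."""
    y = np.asarray(y, dtype=float)
    n = y.size
    # Means of all n - b + 1 maximally overlapping batches of size b,
    # computed in O(n) via a cumulative sum.
    c = np.concatenate(([0.0], np.cumsum(y)))
    batch_means = (c[b:] - c[:-b]) / b
    # The scaling n*b / ((n - b)(n - b + 1)) matches the display equation above.
    return n * b / ((n - b) * (n - b + 1)) * np.sum((batch_means - y.mean()) ** 2)
```

As a sanity check, for an AR(1) chain $X_t = \phi X_{t-1} + \epsilon_t$ with standard normal innovations, the true value is $\sigma^2 = 1/(1-\phi)^2$.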

2. Theoretical Properties: Consistency and Efficiency

The OBM estimator's consistency and efficiency are established under geometric ergodicity and standard moment conditions (e.g., $E|g(X)|^{2+\delta} < \infty$ for some $\delta > 0$), alongside batch size requirements such as $b_n/n \to 0$ and $b_n^{-1}\log n$ remaining bounded (0811.1729). The following properties hold:

  • Strong Consistency: $\hat{\sigma}^2_{\mathrm{OBM}} \to \sigma^2$ almost surely as $n \to \infty$ under suitable conditions.
  • Mean-Square Consistency: $\mathrm{MSE}(\hat{\sigma}^2_{\mathrm{OBM}}) \to 0$ as $n \to \infty$.
  • Asymptotic Variance Reduction: the OBM estimator's variance constant is asymptotically $4/3$ (i.e., $\mathrm{Var}(\hat{\sigma}^2_{\mathrm{OBM}}) \sim (4/3)\,\sigma^4 b_n/n$), compared to the constant $2$ for standard BM, yielding an asymptotic variance reduction of about one third.

This efficiency gain can lead to more accurate confidence intervals, particularly in settings with high temporal correlation, without sacrificing theoretical guarantees of convergence.
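
The $4/3$ versus $2$ contrast can be checked empirically. The following illustrative sketch (chain, batch size, and replication counts are our choices; scipy is used only to simulate the AR(1) chain) reuses obm_variance from Section 1 alongside a nonoverlapping BM analogue:

```python
import numpy as np
from scipy.signal import lfilter

def bm_variance(y, b):
    """Nonoverlapping batch means estimate of sigma^2, for comparison."""
    y = np.asarray(y, dtype=float)
    a = y.size // b                                   # number of full batches
    means = y[: a * b].reshape(a, b).mean(axis=1)
    return b / (a - 1) * np.sum((means - means.mean()) ** 2)

rng = np.random.default_rng(0)
phi, n, b, reps = 0.5, 50_000, 250, 300
bm_est, obm_est = [], []
for _ in range(reps):
    # AR(1): y[t] = phi * y[t-1] + eps[t]; true sigma^2 = 1/(1 - phi)^2 = 4
    y = lfilter([1.0], [1.0, -phi], rng.standard_normal(n))
    bm_est.append(bm_variance(y, b))
    obm_est.append(obm_variance(y, b))
# The sampling-variance ratio should land near 2 / (4/3) = 1.5
print(np.var(bm_est) / np.var(obm_est))
```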

3. Choice of Batch Size and Bias–Variance Trade-off

Both the bias and variance of OBM estimators depend critically on the batch size $b_n$. In the mean-square error (MSE) decomposition,

$$\mathrm{MSE} \sim c_1/b_n^2 + c_2\, b_n/n,$$

the $c_1/b_n^2$ term is the squared bias (the bias itself is of order $1/b_n$) and $c_2\, b_n/n$ is the variance component. Minimizing the MSE with respect to $b_n$ yields the rate-optimal scaling $b_n^* \propto n^{1/3}$, or more explicitly,

$$b_n^* = \left\lceil K \left(\frac{\tau^2}{\sigma^4}\right)^{1/3} n^{1/3} \right\rceil,$$

where $K$ is an unknown proportionality constant and $\tau^2$ and $\sigma^4$ depend on the process autocovariances (0811.1729, Liu et al., 2018). In finite samples, especially for highly correlated chains, larger batch sizes (e.g., $b_n \sim n^{1/2}$ or even $n^{2/3}$) may achieve better coverage and lower bias.

Advanced selection techniques fit AR($m$) models to the output to estimate the process autocovariances, enabling direct computation of the optimal batch size and improving robustness relative to nonparametric pilot methods (Liu et al., 2018).
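
The exact plug-in recipe is more involved than we show here, and the sketch below is only schematic (the function name, default order, and the constant $K$ are our own; this is not the Liu et al. procedure verbatim). It uses the Yule–Walker fit from statsmodels to extend the autocovariance sequence and evaluate the $b_n^*$ formula above:

```python
import numpy as np
from statsmodels.regression.linear_model import yule_walker

def ar_pilot_batch_size(x, m=5, K=1.0, max_lag=5_000):
    """Schematic AR(m)-pilot batch-size rule: fit an AR(m) model, extend
    the implied autocovariances, and plug the resulting tau^2 and sigma^4
    into b_n* = ceil(K * (tau^2 / sigma^4)^(1/3) * n^(1/3))."""
    x = np.asarray(x, dtype=float)
    n = x.size
    phi, _ = yule_walker(x, order=m)                  # fitted AR coefficients
    xc = x - x.mean()
    # Empirical autocovariances gamma(0), ..., gamma(m-1) seed the recursion.
    gamma = [xc @ xc / n] + [xc[:-k] @ xc[k:] / n for k in range(1, m)]
    # For an AR(m) process, gamma(k) = sum_j phi_j * gamma(k - j) for k >= m.
    for k in range(m, max_lag):
        gamma.append(sum(phi[j] * gamma[k - 1 - j] for j in range(m)))
    gamma = np.asarray(gamma)
    sigma2 = gamma[0] + 2.0 * gamma[1:].sum()         # sum of all autocovariances
    tau2 = (2.0 * (np.arange(1, max_lag) * gamma[1:]).sum()) ** 2  # squared bias constant
    return int(np.ceil(K * (tau2 / sigma2**2) ** (1 / 3) * n ** (1 / 3)))
```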

Summary Table: Asymptotic Properties

| Estimator | Asymptotic Variance Constant | Optimal Batch Size |
| --- | --- | --- |
| BM (nonoverlapping) | $2$ | $\propto n^{1/3}$ |
| OBM (overlapping) | $4/3$ | $\propto n^{1/3}$ |
| Weighted BM (flat-top window) | $\approx 1.875 \times$ SV | $\propto n^{1/3}$ (typically) |

4. Empirical Performance and Practical Guidance

Comprehensive simulation studies (AR(1) models, Bayesian regression) underscore that OBM (and related SV estimators with windows such as Tukey–Hanning) yields confidence intervals with empirical coverage near the nominal level for moderate correlations when $b_n = \lceil n^{1/2} \rceil$, and for high correlations only with larger $b_n$ (0811.1729). Key findings include:

  • For moderately correlated series, BM, OBM, and SV produce reliable results with appropriate batch sizing.
  • For highly autocorrelated data, larger batch sizes are necessary; an undersized $b_n$ leads to poor variance estimation and compromised coverage.
  • OBM methods consistently outperform BM in variance reduction, at a cost of increased computation and memory.
  • Weighted BM estimators (Liu et al., 2018) can approach OBM/SV accuracy with substantial computational savings, particularly in high-dimensional or long-chain scenarios, but with a modest inflation in MSE.

Recommendations favor OBM or SV estimators (Tukey–Hanning window) and batch sizes scaling at least as $n^{1/2}$ for strongly correlated chains.
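
Coverage behavior is easy to probe directly. This illustrative sketch (all settings are our choices) builds Studentized 95% intervals with $b_n = \lceil n^{1/2} \rceil$ on a moderately correlated AR(1) chain, reusing obm_variance from Section 1:

```python
import numpy as np
from scipy.signal import lfilter
from scipy.stats import norm

rng = np.random.default_rng(1)
phi, n, reps = 0.5, 20_000, 500                      # moderate correlation
b = int(np.ceil(np.sqrt(n)))                         # b_n = ceil(n^{1/2})
z = norm.ppf(0.975)
hits = 0
for _ in range(reps):
    y = lfilter([1.0], [1.0, -phi], rng.standard_normal(n))  # mean-zero AR(1)
    half_width = z * np.sqrt(obm_variance(y, b) / n)         # z times the MCSE
    hits += abs(y.mean()) <= half_width
print(hits / reps)                                   # empirical coverage, near 0.95
```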

5. Theoretical Developments: Nonasymptotic and Concentration Inequalities

Recent work provides explicit nonasymptotic concentration inequalities for OBM variance estimators applied to uniformly geometrically ergodic Markov chains (Moulines et al., 13 May 2025). Using martingale decompositions based on the Poisson equation, one can bound the estimator's deviation from the true asymptotic variance as follows:

$$E\left[ \left| \hat{\sigma}_{\mathrm{OBM}}^2(f) - \sigma_\infty^2(f) \right|^p \right]^{1/p} \lesssim \frac{p^2}{\sqrt{n - b_n + 1}} + \frac{p^2 \sqrt{b_n}}{\sqrt{n - b_n + 1}} + \text{(remainder)},$$

where the constants depend explicitly on $p$ (the moment order), the batch size $b_n$, the sample size $n$, and the mixing time (rate of convergence to stationarity). This quantifies the estimator's concentration about the true variance: better mixing (a smaller mixing time) implies sharper concentration, and the rate degrades gracefully as $b_n$ increases relative to $n$.

6. Applications in Simulation, MCMC, and Confidence Interval Construction

OBM serves a central role in MCMC output analysis, uncertainty quantification, and confidence region construction:

  • In simulation settings with dependent output, OBM approximates the sampling distribution of estimator errors for bias, variance, or quantiles, often outperforming classical bootstrap in dependent data (Jeon et al., 2023).
  • For construction of confidence intervals for functionals such as quantiles or process parameters, OBM Studentizes the estimator using the variance across overlapping batch means, leading to valid coverage in both small- and large-batch regimes (Su et al., 2023).
  • OBM-based techniques are directly integrated into fixed-width stopping rules and automated MCSE reporting.

Advanced procedures leverage OBM's strong and higher-order consistency, and software packages now often include OBM or spectral variance estimates as defaults.
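
As one concrete instance of the fixed-width stopping rules mentioned above, here is a schematic sketch (the doubling schedule and names are our own; step() stands in for drawing one more sample from the simulation or sampler), again reusing obm_variance from Section 1:

```python
import numpy as np
from scipy.stats import norm

def run_to_fixed_width(step, eps, n0=1_000, level=0.95, n_max=10**7):
    """Hypothetical fixed-width rule: grow the output until the OBM-based
    confidence half-width for the mean falls below eps."""
    z = norm.ppf(0.5 + level / 2)
    y = [step() for _ in range(n0)]
    while True:
        n = len(y)
        b = int(np.ceil(np.sqrt(n)))                 # b_n ~ n^{1/2}
        half = z * np.sqrt(obm_variance(np.asarray(y), b) / n)
        if half < eps or n >= n_max:
            return np.mean(y), half, n
        y.extend(step() for _ in range(n))           # double the run, then retest
```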

7. Methodological Developments and Future Research Directions

While the strong asymptotic properties of OBM are well established, several open directions remain:

  • Further development of nonasymptotic theory for OBM estimators, including sharp constants and optimality under complex dependence structures (Moulines et al., 13 May 2025).
  • Extension of central limit theorems for OBM estimators (analogous to those for nonoverlapping BM (Chakraborty et al., 2019)), which would advance theoretical guarantees for confidence interval construction.
  • Implementational advances that reduce the computational overhead of forming overlapping batches, including weighted batch means and alternative windowing methodologies.
  • Enhanced batch size selection procedures, particularly via AR($m$)-based pilot estimation, for high-dimensional or strongly dependent MCMC applications (Liu et al., 2018).

OBM remains a foundational tool for the quantitative analysis of simulation and MCMC output, offering a robust methodology for variance estimation and inferential procedures in dependent data settings. Its performance characteristics—low estimator variance, strong consistency under mild conditions, and adaptability to high-dimensional and highly correlated contexts—ensure its continued relevance and active development in statistical simulation and computational statistics.