Overlapping Batch Means (OBM)

Updated 6 August 2025
  • Overlapping Batch Means (OBM) is a variance estimation technique that forms heavily overlapping batches to improve estimation accuracy in dependent data settings.
  • Adjacent overlapping batches share all but one observation, which reduces estimator variance compared to traditional nonoverlapping batch means.
  • Optimal batch size selection in OBM balances bias and variance, yielding strong consistency and reliable confidence intervals in simulation and MCMC analyses.

The overlapping batch means (OBM) method is a variance estimation technique designed for simulation output analysis, notably in Markov chain Monte Carlo (MCMC) and dependent time series settings. Unlike traditional nonoverlapping batch means (BM) estimators, OBM increases the efficiency of variance estimates by constructing heavily overlapping batches, which leads to a reduction in estimator variance and more reliable assessment of Monte Carlo standard errors (MCSE). OBM is closely related to certain spectral variance estimators and underpins several state-of-the-art approaches to uncertainty quantification and confidence region construction in the presence of dependent data.

1. Construction of the OBM Estimator

For a stationary sequence (e.g., the output of an MCMC run or a time series), the OBM approach forms batches of fixed size $b_n$ with maximal overlap: batch $j$ consists of observations $X_{j+1}, \dots, X_{j+b_n}$ for $j = 0, 1, \dots, n-b_n$, giving $n-b_n+1$ overlapping batches. Denoting the mean-centered outputs as $Y_i = g(X_i) - \mu$, the batch means are

$$Y_j(b_n) = \frac{1}{b_n} \sum_{k=1}^{b_n} Y_{j+k},$$

where $\mu = E[g(X)]$. The OBM estimator for the asymptotic variance $\sigma^2$ of the sample mean is

$$\hat{\sigma}^2_{\mathrm{OBM}} = \frac{n\,b_n}{(n-b_n)(n-b_n+1)} \sum_{j=0}^{n-b_n}\left[Y_j(b_n) - \bar{Y}_n\right]^2,$$

with $\bar{Y}_n = \frac{1}{n}\sum_{i=1}^n Y_i$ (0811.1729).

This quadratic-form structure, with each point reused in many overlapping batch means, "smooths" the estimator and reduces variability relative to BM.
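
To make the construction concrete, here is a minimal NumPy sketch of the estimator above (the function name and interface are our own, not from the cited papers):

```python
import numpy as np

def obm_variance(y, b):
    """OBM estimate of the asymptotic variance sigma^2 of the sample mean
    of a stationary, possibly dependent sequence y, using batch size b."""
    y = np.asarray(y, dtype=float)
    n = y.size
    # Means of all n - b + 1 maximally overlapping batches of size b,
    # computed in O(n) via a cumulative sum.
    c = np.concatenate(([0.0], np.cumsum(y)))
    batch_means = (c[b:] - c[:-b]) / b
    # The scaling n*b / ((n - b)(n - b + 1)) matches the display equation above.
    return n * b / ((n - b) * (n - b + 1)) * np.sum((batch_means - y.mean()) ** 2)
```

As a sanity check, for an AR(1) chain $X_t = \phi X_{t-1} + \epsilon_t$ with standard normal innovations, the true value is $\sigma^2 = 1/(1-\phi)^2$.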

2. Theoretical Properties: Consistency and Efficiency

The OBM estimator's consistency and efficiency are established under geometric ergodicity and standard moment conditions (e.g., $E|g(X)|^{2+\delta} < \infty$ for some $\delta > 0$), alongside batch size requirements such as $b_n/n \to 0$ and $b_n^{-1}\log n$ remaining bounded (0811.1729). The following properties hold:

  • Strong Consistency: $\hat{\sigma}^2_{\mathrm{OBM}} \to \sigma^2$ almost surely as $n \to \infty$ under suitable conditions.
  • Mean-Square Consistency: $\mathrm{MSE}(\hat{\sigma}^2_{\mathrm{OBM}}) \to 0$ as $n \to \infty$.
  • Asymptotic Variance Reduction: the OBM estimator's variance constant is asymptotically $4/3$ (i.e., $\mathrm{Var}(\hat{\sigma}^2_{\mathrm{OBM}}) \sim (4/3)\,\sigma^4 b_n/n$), compared to the constant $2$ for standard BM, yielding an asymptotic variance reduction of about one third.

This efficiency gain can lead to more accurate confidence intervals, particularly in settings with high temporal correlation, without sacrificing theoretical guarantees of convergence.
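
The $4/3$ versus $2$ contrast can be checked empirically. The following illustrative sketch (chain, batch size, and replication counts are our choices; scipy is used only to simulate the AR(1) chain) reuses obm_variance from Section 1 alongside a nonoverlapping BM analogue:

```python
import numpy as np
from scipy.signal import lfilter

def bm_variance(y, b):
    """Nonoverlapping batch means estimate of sigma^2, for comparison."""
    y = np.asarray(y, dtype=float)
    a = y.size // b                                   # number of full batches
    means = y[: a * b].reshape(a, b).mean(axis=1)
    return b / (a - 1) * np.sum((means - means.mean()) ** 2)

rng = np.random.default_rng(0)
phi, n, b, reps = 0.5, 50_000, 250, 300
bm_est, obm_est = [], []
for _ in range(reps):
    # AR(1): y[t] = phi * y[t-1] + eps[t]; true sigma^2 = 1/(1 - phi)^2 = 4
    y = lfilter([1.0], [1.0, -phi], rng.standard_normal(n))
    bm_est.append(bm_variance(y, b))
    obm_est.append(obm_variance(y, b))
# The sampling-variance ratio should land near 2 / (4/3) = 1.5
print(np.var(bm_est) / np.var(obm_est))
```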

3. Choice of Batch Size and Bias–Variance Trade-off

Both the bias and variance of OBM estimators depend critically on the batch size $b_n$. In the mean-square error (MSE) decomposition,

$$\mathrm{MSE} \sim c_1/b_n^2 + c_2\, b_n/n,$$

the $c_1/b_n^2$ term is the squared bias (the bias itself is of order $1/b_n$) and $c_2\, b_n/n$ is the variance component. Minimizing the MSE with respect to $b_n$ yields the rate-optimal scaling $b_n^* \propto n^{1/3}$, or more explicitly,

$$b_n^* = \left\lceil K \left(\frac{\tau^2}{\sigma^4}\right)^{1/3} n^{1/3} \right\rceil,$$

where $K$ is an unknown proportionality constant and $\tau^2$ and $\sigma^4$ depend on the process autocovariances (0811.1729, Liu et al., 2018). In finite samples, especially for highly correlated chains, larger batch sizes (e.g., $b_n \sim n^{1/2}$ or even $n^{2/3}$) may achieve better coverage and lower bias.

Advanced selection techniques fit AR($m$) models to the output to estimate the process autocovariances, enabling direct computation of the optimal batch size and improving robustness relative to nonparametric pilot methods (Liu et al., 2018).
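
The exact plug-in recipe is more involved than we show here, and the sketch below is only schematic (the function name, default order, and the constant $K$ are our own; this is not the Liu et al. procedure verbatim). It uses the Yule–Walker fit from statsmodels to extend the autocovariance sequence and evaluate the $b_n^*$ formula above:

```python
import numpy as np
from statsmodels.regression.linear_model import yule_walker

def ar_pilot_batch_size(x, m=5, K=1.0, max_lag=5_000):
    """Schematic AR(m)-pilot batch-size rule: fit an AR(m) model, extend
    the implied autocovariances, and plug the resulting tau^2 and sigma^4
    into b_n* = ceil(K * (tau^2 / sigma^4)^(1/3) * n^(1/3))."""
    x = np.asarray(x, dtype=float)
    n = x.size
    phi, _ = yule_walker(x, order=m)                  # fitted AR coefficients
    xc = x - x.mean()
    # Empirical autocovariances gamma(0), ..., gamma(m-1) seed the recursion.
    gamma = [xc @ xc / n] + [xc[:-k] @ xc[k:] / n for k in range(1, m)]
    # For an AR(m) process, gamma(k) = sum_j phi_j * gamma(k - j) for k >= m.
    for k in range(m, max_lag):
        gamma.append(sum(phi[j] * gamma[k - 1 - j] for j in range(m)))
    gamma = np.asarray(gamma)
    sigma2 = gamma[0] + 2.0 * gamma[1:].sum()         # sum of all autocovariances
    tau2 = (2.0 * (np.arange(1, max_lag) * gamma[1:]).sum()) ** 2  # squared bias constant
    return int(np.ceil(K * (tau2 / sigma2**2) ** (1 / 3) * n ** (1 / 3)))
```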

Summary Table: Asymptotic Properties

| Estimator | Asymptotic Variance Constant | Optimal Batch Size |
| --- | --- | --- |
| BM (nonoverlapping) | $2$ | $\propto n^{1/3}$ |
| OBM (overlapping) | $4/3$ | $\propto n^{1/3}$ |
| Weighted BM (flat-top window) | $\approx 1.875 \times$ SV | $\propto n^{1/3}$ (typically) |

4. Empirical Performance and Practical Guidance

Comprehensive simulation studies (AR(1) models, Bayesian regression) underscore that OBM (and related SV estimators with windows such as Tukey–Hanning) yields confidence intervals with empirical coverage near the nominal level for moderate correlations when $b_n = \lceil n^{1/2} \rceil$, and for high correlations only with larger $b_n$ (0811.1729). Key findings include:

  • For moderately correlated series, BM, OBM, and SV produce reliable results with appropriate batch sizing.
  • For highly autocorrelated data, larger batch sizes are necessary; an undersized $b_n$ leads to poor variance estimation and compromised coverage.
  • OBM methods consistently outperform BM in variance reduction, at a cost of increased computation and memory.
  • Weighted BM estimators (Liu et al., 2018) can approach OBM/SV accuracy with substantial computational savings, particularly in high-dimensional or long-chain scenarios, but with a modest inflation in MSE.

Recommendations favor OBM or SV estimators (Tukey–Hanning window) and batch sizes scaling at least as $n^{1/2}$ for strongly correlated chains.
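
Coverage behavior is easy to probe directly. This illustrative sketch (all settings are our choices) builds Studentized 95% intervals with $b_n = \lceil n^{1/2} \rceil$ on a moderately correlated AR(1) chain, reusing obm_variance from Section 1:

```python
import numpy as np
from scipy.signal import lfilter
from scipy.stats import norm

rng = np.random.default_rng(1)
phi, n, reps = 0.5, 20_000, 500                      # moderate correlation
b = int(np.ceil(np.sqrt(n)))                         # b_n = ceil(n^{1/2})
z = norm.ppf(0.975)
hits = 0
for _ in range(reps):
    y = lfilter([1.0], [1.0, -phi], rng.standard_normal(n))  # mean-zero AR(1)
    half_width = z * np.sqrt(obm_variance(y, b) / n)         # z times the MCSE
    hits += abs(y.mean()) <= half_width
print(hits / reps)                                   # empirical coverage, near 0.95
```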

5. Theoretical Developments: Nonasymptotic and Concentration Inequalities

Recent work provides explicit nonasymptotic concentration inequalities for OBM variance estimators applied to uniformly geometrically ergodic Markov chains (Moulines et al., 13 May 2025). Using martingale decompositions based on the Poisson equation, one can bound the estimator's deviation from the true asymptotic variance as follows:

$$E\left[ \left| \hat{\sigma}_{\mathrm{OBM}}^2(f) - \sigma_\infty^2(f) \right|^p \right]^{1/p} \lesssim \frac{p^2}{\sqrt{n - b_n + 1}} + \frac{p^2 \sqrt{b_n}}{\sqrt{n - b_n + 1}} + \text{(remainder)},$$

where the constants depend explicitly on $p$ (the moment order), the batch size $b_n$, the sample size $n$, and the mixing time (rate of convergence to stationarity). This quantifies the estimator's concentration about the true variance: better mixing (a smaller mixing time) implies sharper concentration, and the rate degrades gracefully as $b_n$ increases relative to $n$.

6. Applications in Simulation, MCMC, and Confidence Interval Construction

OBM serves a central role in MCMC output analysis, uncertainty quantification, and confidence region construction:

  • In simulation settings with dependent output, OBM approximates the sampling distribution of estimator errors for bias, variance, or quantiles, often outperforming classical bootstrap in dependent data (Jeon et al., 2023).
  • For construction of confidence intervals for functionals such as quantiles or process parameters, OBM Studentizes the estimator using the variance across overlapping batch means, leading to valid coverage in both small- and large-batch regimes (Su et al., 2023).
  • OBM-based techniques are directly integrated into fixed-width stopping rules and automated MCSE reporting.

Advanced procedures leverage OBM's strong and higher-order consistency, and software packages now often include OBM or spectral variance estimates as defaults.
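
As one concrete instance of the fixed-width stopping rules mentioned above, here is a schematic sketch (the doubling schedule and names are our own; step() stands in for drawing one more sample from the simulation or sampler), again reusing obm_variance from Section 1:

```python
import numpy as np
from scipy.stats import norm

def run_to_fixed_width(step, eps, n0=1_000, level=0.95, n_max=10**7):
    """Hypothetical fixed-width rule: grow the output until the OBM-based
    confidence half-width for the mean falls below eps."""
    z = norm.ppf(0.5 + level / 2)
    y = [step() for _ in range(n0)]
    while True:
        n = len(y)
        b = int(np.ceil(np.sqrt(n)))                 # b_n ~ n^{1/2}
        half = z * np.sqrt(obm_variance(np.asarray(y), b) / n)
        if half < eps or n >= n_max:
            return np.mean(y), half, n
        y.extend(step() for _ in range(n))           # double the run, then retest
```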

7. Methodological Developments and Future Research Directions

While the strong asymptotic properties of OBM are well established, several open directions remain:

  • Further development of nonasymptotic theory for OBM estimators, including sharp constants and optimality under complex dependence structures (Moulines et al., 13 May 2025).
  • Extension of central limit theorems for OBM estimators (analogous to those for nonoverlapping BM (Chakraborty et al., 2019)), which would advance theoretical guarantees for confidence interval construction.
  • Implementational advances that reduce the computational overhead of forming overlapping batches, including weighted batch means and alternative windowing methodologies.
  • Enhanced batch size selection procedures, particularly via AR($m$)-based pilot estimation, for high-dimensional or strongly dependent MCMC applications (Liu et al., 2018).

OBM remains a foundational tool for the quantitative analysis of simulation and MCMC output, offering a robust methodology for variance estimation and inferential procedures in dependent data settings. Its performance characteristics—low estimator variance, strong consistency under mild conditions, and adaptability to high-dimensional and highly correlated contexts—ensure its continued relevance and active development in statistical simulation and computational statistics.