Cluster-Robust Standard Errors
- Cluster-robust standard errors are variance estimators that adjust for within-cluster correlation and heteroskedasticity, ensuring reliable inference in grouped data.
- They are applied in fields like economics and social sciences for data naturally grouped by schools, firms, or regions, addressing design-based sampling issues.
- Advancements include multiway clustering, small-sample corrections, and weighted estimators to mitigate biases from high leverage points and heavy-tailed cluster sizes.
Cluster-robust standard errors (CRSEs) are variance estimators for linear and nonlinear estimators that adjust for arbitrary correlation and heteroskedasticity within pre-specified clusters of observations, under the assumption of independence across clusters. This technique is central in empirical social sciences, especially in applications where data naturally fall into groups such as schools, firms, households, or geographical units. Theoretical and practical developments in CRSEs address single-way, multiway, and unknown clustering, and engage with issues of small sample bias, heavy-tailed cluster sizes, the number of regressors, and the design-based motivations underpinning clustering.
1. Mathematical Formulation and Theoretical Properties
The canonical CRSE for OLS regression with clusters $g = 1, \dots, G$ is given by the “sandwich” formula
$$\widehat{V}_{\mathrm{CR}} = (X'X)^{-1}\Big(\sum_{g=1}^{G} X_g'\hat{u}_g\hat{u}_g' X_g\Big)(X'X)^{-1},$$
where $X_g$ and $\hat{u}_g$ denote the design matrix and residual vector for cluster $g$ (Fai, 2022). The estimator is consistent for the sampling variance provided: (i) clusters are independent, and (ii) no individual cluster’s weight dominates the sample, i.e., $\max_g n_g/n \to 0$ as $n \to \infty$ (with $n_g$ the size of cluster $g$ and $n$ the total sample size) (Sasaki et al., 2022). Under these conditions, central limit theorems guarantee valid Wald-type inference for finite-dimensional parameters.
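For concreteness, here is a minimal NumPy sketch of the one-way sandwich (the unscaled CR0 variant; the function and variable names are illustrative, and the small-sample scaling factors applied by standard software are omitted):

```python
import numpy as np

def crse_oneway(X, resid, groups):
    """One-way cluster-robust (CR0) covariance for OLS.

    X      : (n, k) design matrix
    resid  : (n,) OLS residuals
    groups : (n,) cluster labels
    """
    k = X.shape[1]
    bread = np.linalg.inv(X.T @ X)        # (X'X)^{-1}
    meat = np.zeros((k, k))
    for g in np.unique(groups):
        idx = groups == g
        s_g = X[idx].T @ resid[idx]       # within-cluster score sum X_g' u_g
        meat += np.outer(s_g, s_g)
    return bread @ meat @ bread           # the sandwich

# standard errors are the square roots of the diagonal:
# se = np.sqrt(np.diag(crse_oneway(X, resid, groups)))
```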
Extensions to multiway clustering—for example, with two or more non-nested cluster indices—build on inclusion–exclusion logic or empirical process theory, producing estimators such as the Cameron–Gelbach–Miller (CGM) two-way CRSE
$$\widehat{V}_{\mathrm{CGM}} = \widehat{V}_{G} + \widehat{V}_{H} - \widehat{V}_{G \cap H},$$
where $\widehat{V}_{G}$ and $\widehat{V}_{H}$ are one-way sandwich estimators built from the within-cluster score sums along each dimension, and $\widehat{V}_{G \cap H}$ clusters on their intersection (Chiang et al., 2023, Davezies et al., 2018). Consistency and asymptotic normality results are available under separate exchangeability and “dissociation” conditions (Davezies et al., 2018).
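The inclusion–exclusion structure translates directly into code. A sketch, reusing the hypothetical crse_oneway above; note that the result need not be positive semi-definite in finite samples, which is what motivates the positive-definite variants discussed below:

```python
import numpy as np

def crse_twoway(X, resid, g1, g2):
    """CGM two-way CRSE via inclusion-exclusion over two cluster dimensions."""
    inter = np.array([f"{a}|{b}" for a, b in zip(g1, g2)])  # intersection labels
    return (crse_oneway(X, resid, g1)
            + crse_oneway(X, resid, g2)
            - crse_oneway(X, resid, inter))
```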
2. Motivation: Sampling, Assignment, and Design-Based Inference
The justification for clustering—and the correct level at which to do so—arises from the design of the sampling and/or treatment assignment mechanism. Cluster adjustment is crucial if:
- A random sample of clusters is drawn from a super-population (two-stage sampling), generating between-cluster heterogeneity (Abadie et al., 2017).
- Treatment is assigned at the level of clusters, inducing within-cluster dependence in counterfactual outcomes or treatment effects (Su et al., 2021).
- Structural dependence, such as spillovers, persists within but not across finer units (e.g., households within villages) (Fukumoto, 11 Nov 2025).
The “all-or-nothing” property of conventional CRSEs is a consequence of modeling all within-cluster covariance as arbitrary and all across-cluster covariance as zero (Abadie et al., 2017). Conservative practice clusters at the highest plausible level (e.g., state) to avoid anti-conservative inference, but this may sacrifice precision if true dependence does not cross lower-level boundaries.
3. Extensions and Methodological Innovations
a. Multiway and Unknown Clustering
Multiway clustering is handled via inclusion–exclusion variance estimators and empirical process theory. Consistent estimation can be achieved by summing single-way cluster variances across all dimensions; positive-definite variants and pigeonhole bootstrapping yield valid inference even with few clusters per dimension (Davezies et al., 2018, Chiang et al., 2019).
Unknown clustering structures are tackled by estimating the “long-run” covariance matrix of transformed scores (e.g., via HAC methods), thresholding to define clusters, and then applying standard CRSE formulas to the estimated groupings (Bai et al., 2019, Cai, 2021).
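A stylized sketch of the thresholding idea, substituting a simple correlation matrix of per-unit score series for the HAC-type long-run covariance estimators used in the cited papers (the threshold choice and all names here are illustrative assumptions):

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

def discover_clusters(scores, threshold):
    """Group units whose score series co-move strongly.

    scores    : (N, T) array, one score/residual series per unit
    threshold : absolute-correlation cutoff defining an edge
    """
    corr = np.corrcoef(scores)                      # N x N dependence proxy
    adj = (np.abs(corr) > threshold).astype(int)
    np.fill_diagonal(adj, 0)
    # connected components of the thresholded graph = estimated clusters
    _, labels = connected_components(csr_matrix(adj), directed=False)
    return labels
```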
b. Small-Sample Corrections and High-Dimensional Settings
Traditional CRSEs are downward-biased in finite samples, especially with few clusters or high-leverage covariates. Bias-reduced linearization (BRL or “CR2”) (Pustejovsky et al., 2016, Welz et al., 2022), leave-cluster-out crossfit (LCOC) (Fai, 2022), and diagonal-only corrections have all been proposed. For small $G$, adjustments to critical values (e.g., Satterthwaite degrees of freedom or Hotelling’s $T^2$-based F-tests) are necessary to control Type I error (Pustejovsky et al., 2016).
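A minimal sketch of the CR2 adjustment, which rescales each cluster’s residuals by the symmetric inverse square root of $I - H_{gg}$ before forming the meat matrix (production implementations such as R’s clubSandwich additionally handle weights, fixed effects, and Satterthwaite degrees of freedom):

```python
import numpy as np

def crse_cr2(X, resid, groups):
    """Bias-reduced linearization (CR2) covariance for OLS."""
    k = X.shape[1]
    bread = np.linalg.inv(X.T @ X)
    meat = np.zeros((k, k))
    for g in np.unique(groups):
        idx = groups == g
        Xg, ug = X[idx], resid[idx]
        Hgg = Xg @ bread @ Xg.T                     # within-cluster hat block
        w, V = np.linalg.eigh(np.eye(len(ug)) - Hgg)
        Ag = V @ np.diag(np.clip(w, 1e-12, None) ** -0.5) @ V.T
        s_g = Xg.T @ (Ag @ ug)                      # adjusted score sum
        meat += np.outer(s_g, s_g)
    return bread @ meat @ bread
```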
When the number of regressors $k$ becomes a non-negligible fraction of the sample size $n$ (i.e., $k/n \not\to 0$), the classic CRSE can remain biased even asymptotically. In such cases, substitute estimators combining crossfitting and leave-one-cluster-out logic restore unbiasedness and consistency (D'Adamo, 2018, Fai, 2022).
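One ingredient of these corrections, leave-cluster-out residuals, is easy to sketch (an illustrative fragment, not the full crossfit estimator of the cited papers):

```python
import numpy as np

def lco_residuals(X, y, groups):
    """Residuals for each cluster from an OLS fit on all other clusters."""
    resid = np.empty_like(y, dtype=float)
    for g in np.unique(groups):
        idx = groups == g
        beta = np.linalg.lstsq(X[~idx], y[~idx], rcond=None)[0]
        resid[idx] = y[idx] - X[idx] @ beta
    return resid
```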
c. Robustness to Heavy-Tailed and Highly Unbalanced Clusters
Heavy-tailed cluster size distributions (e.g., US states with California dominating) violate CRSE validity: infinite variance of cluster scores yields dramatic overrejection in tests (Sasaki et al., 2022). Weighted CRSEs (WCR), which estimate at the cluster-mean level and thereby downweight each cluster’s score by its size $n_g$, restore finite-variance conditions and lead to valid inference under power-law size imbalances (Sasaki et al., 2022).
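A stylized illustration of the cluster-mean idea: collapsing to cluster means gives every cluster equal weight in the score, which is the downweighting that restores finite variance. This is a caricature of the WCR estimator of Sasaki et al. (2022), whose exact weighting differs:

```python
import numpy as np

def cluster_mean_ols(X, y, groups):
    """OLS on cluster means with an HC0 variance at the cluster level."""
    labels = np.unique(groups)
    Xm = np.vstack([X[groups == g].mean(axis=0) for g in labels])
    ym = np.array([y[groups == g].mean() for g in labels])
    beta = np.linalg.lstsq(Xm, ym, rcond=None)[0]
    u = ym - Xm @ beta
    bread = np.linalg.inv(Xm.T @ Xm)
    meat = (Xm * u[:, None]).T @ (Xm * u[:, None])   # HC0 meat across clusters
    return beta, bread @ meat @ bread
```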
d. Data-Driven and Finite-Sample Methods for Choosing Clustering Level
The “reclustering” method (Fukumoto, 11 Nov 2025) offers a permutation-based, finite-sample valid test for whether a given fine clustering is sufficient. By randomly relabeling fine clusters into pseudo-gross clusters and evaluating whether the observed gross-level CRSE is an outlier in the permutation distribution, one rejects the null of independence across fine clusters and defaults to gross-level clustering. This method outperforms earlier asymptotic tests (such as the MacKinnon–Nielsen–Webb SV test or Ibragimov–Müller VMB test) in small-$G$ and small-sample settings.
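A skeleton of the permutation logic, offered as a stylized reading of the reclustering idea rather than the paper’s exact procedure; crse_fn, the relabeling scheme, and the one-sided comparison are all assumptions of this sketch:

```python
import numpy as np

def reclustering_pvalue(crse_fn, fine_labels, gross_of_fine, n_perm=999, seed=0):
    """Is the observed gross-level CRSE an outlier among CRSEs computed
    after randomly relabeling fine clusters into pseudo-gross clusters?

    crse_fn       : maps an (n,) gross-label vector to a scalar CRSE
    fine_labels   : (n,) fine-cluster label for each observation
    gross_of_fine : dict mapping fine label -> actual gross label
    """
    rng = np.random.default_rng(seed)
    fines = np.unique(fine_labels)
    grosses = np.array([gross_of_fine[f] for f in fines])
    observed = crse_fn(np.array([gross_of_fine[f] for f in fine_labels]))
    perm_ses = np.empty(n_perm)
    for b in range(n_perm):
        relabeled = dict(zip(fines, rng.permutation(grosses)))
        perm_ses[b] = crse_fn(np.array([relabeled[f] for f in fine_labels]))
    # small p-value: dependence crosses fine clusters, so cluster grossly
    return (1 + np.sum(perm_ses >= observed)) / (n_perm + 1)
```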
4. Empirical Practice and Implementation
a. Model Types and Standard Software
CRSEs are implemented in OLS, IV/2SLS, GMM, and meta-regression. For overidentified IV, the cluster-robust Conditional Wald test extends conditional inference to clustered settings, retaining size even under weak identification (Lee et al., 2023). Standard software (Stata, R, Python) provides cluster-robust covariance estimators; for multiway clustering, packages such as twowayjack (Stata), pigeonhole bootstrapping routines (Davezies et al., 2018), and tailored code for threshold-based cluster discovery are available (MacKinnon et al., 13 Jun 2024, Bai et al., 2019).
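In Python, for example, one-way CRSEs are available through statsmodels’ cov_type="cluster" option (the simulated data below are purely illustrative):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n, G = 500, 25
g = rng.integers(0, G, size=n)               # cluster labels
a = rng.normal(size=G)                       # cluster random effects
x = rng.normal(size=n) + 0.5 * a[g]          # regressor correlated within cluster
y = 1.0 + 2.0 * x + a[g] + rng.normal(size=n)

fit = sm.OLS(y, sm.add_constant(x)).fit(cov_type="cluster",
                                        cov_kwds={"groups": g})
print(fit.bse)                               # cluster-robust standard errors
```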
b. Clustered Experiments and Paired/Stratified Designs
In cluster-randomized experiments and matched-pair or small-strata settings, naïve CRSEs at overly fine levels (e.g., unit-of-randomization) can be anti-conservative, especially when treatment assignment is negatively dependent (one treated per pair). In those designs, clustering must be at the level of assignment (pair or stratum), not the analysis unit (Chaisemartin et al., 2019, Su et al., 2021). Small-sample corrections or randomization-based inference is strongly recommended when the number of clusters is small.
c. Model-Assisted and Design-Based Justification
Design-based (finite-population) arguments provide an alternative justification for CRSEs, notably in randomized experiments: under cluster-level randomization, cluster “sandwich” variances are conservative for the (possibly heterogeneous) average treatment effect; for efficiency, regressions on cluster totals or appropriately weighted cluster averages achieve the semiparametric lower bound (Su et al., 2021).
5. Limitations, Pathologies, and Open Directions
a. Heavy-Tailed Clustering and Non-Robustness
CRSEs are vulnerable to catastrophic overrejection if cluster sizes are power-law distributed with a tail exponent low enough ($\alpha < 2$) that cluster scores have infinite variance. Standard large-$G$ asymptotics, which presuppose that no cluster dominates, become invalid, and even jackknife adjustments cannot mitigate the pathology. Weighted approaches are necessary in this regime (Sasaki et al., 2022).
b. Pathological (Non-Gaussian) Limiting Laws in Multiway Clustering
Gaussian CLTs for two-way clustered means do not always hold: when the underlying array has only a few strong “interactive factors” and no additive cluster effects, the limiting distribution can be degenerate or non-Gaussian. Inference then requires an empirical (e.g., permutation or exchangeable bootstrap) rather than analytic approach (Chiang et al., 2023).
c. Cluster Discovery and Data-Driven Grouping
When clusters are unknown, thresholding long-run covariances or label propagation algorithms can recover group structure, especially in panels with a large time dimension $T$. This allows plug-in CRSEs or studentized permutation tests to be valid without a priori cluster assignments (Cai, 2021, Bai et al., 2019).
d. Small Number of Clusters, High Leverage, and Dimensionality
With small $G$ or high-leverage regressors, standard (CR0/CR1) cluster-robust tests may over-reject. BRL/CR2, jackknife, or adjusted F- and t-distribution critical values are necessary for Type I error control (Pustejovsky et al., 2016, Welz et al., 2022, MacKinnon et al., 13 Jun 2024).
6. Practical Recommendations
- Always report the clustering level, the number of clusters $G$, and the average and maximum cluster sizes, together with other relevant sample characteristics. Use diagnostic plots (e.g., Hill estimators, cluster-mean/variance plots) to assess heavy-tailedness (Sasaki et al., 2022); a Hill-estimator sketch follows this list.
- For small $G$, supplement cluster-robust SEs with bias-reduced or jackknife approaches, use Satterthwaite or Hotelling degrees of freedom, and consider randomization inference.
- In multiway clustering, positive-definite estimators and the pigeonhole bootstrap are recommended, especially with few clusters per dimension (Davezies et al., 2018, Chiang et al., 2019).
- When clustering level is ambiguous, apply a finite-sample valid test (e.g., reclustering (Fukumoto, 11 Nov 2025)).
- In the presence of power-law or highly unbalanced cluster sizes, use weighted CRSEs; otherwise, coverage may be severely compromised (Sasaki et al., 2022).
- For high-dimensional controls, use crossfit or leave-cluster-out corrections to avoid bias (D'Adamo, 2018, Fai, 2022).
- In paired or small-strata experiments with fixed effects, always cluster at the assignment (pair or stratum) level, not the analysis unit (Chaisemartin et al., 2019).
- For unknown clusters, use covariance thresholding to estimate clustering structure before computing CRSEs or performing randomization inference (Bai et al., 2019, Cai, 2021).
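As referenced in the first recommendation above, a minimal Hill-estimator sketch for diagnosing heavy-tailed cluster sizes (the choice of $k$, the number of upper order statistics, is the usual tuning decision and is left to the user):

```python
import numpy as np

def hill_tail_index(sizes, k):
    """Hill estimator of the tail index alpha for cluster sizes.

    Estimates below 2 suggest infinite-variance heavy tails, the regime
    where weighted CRSEs are advisable (Sasaki et al., 2022).
    """
    x = np.sort(np.asarray(sizes, dtype=float))   # ascending order statistics
    gamma = np.mean(np.log(x[-k:]) - np.log(x[-(k + 1)]))
    return 1.0 / gamma
```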
CRSE methodology continues to evolve, with ongoing research on higher-way clustering, data-driven clustering, robust inference under weak identification, and adaptations to modern machine learning estimators. The practitioner’s default must always be to check the classical assumptions, to adjust for design-based features, and to apply state-of-the-art corrections when these fail.