Distance-based ICC (dbICC) Overview
- Distance-based ICC (dbICC) is a reliability metric defined using arbitrary pairwise distances to quantify variance between and within subjects in non-scalar data.
- It employs between-subject and within-subject mean squared distances with bootstrap bias correction to improve empirical coverage in high-dimensional settings.
- The framework generalizes classical ICC via an extended Spearman–Brown formula, making it applicable to vectors, curves, graphs, and fMRI connectivity matrices.
The distance-based intraclass correlation coefficient (dbICC) is a generalization of the classical intraclass correlation coefficient (ICC), extending measurement reliability to settings where observations are non-scalar or the measurement space admits no natural ICC. Defined in terms of arbitrary distances between observations, dbICC enables reliability assessment for data types such as vectors, curves, graphs, and covariance matrices. The framework provides a well-defined proportion of variance attributable to between-subject differences versus within-subject variability, operationalized entirely via pairwise distances. Bias correction procedures and theoretical extensions, such as a generalized Spearman–Brown formula, provide robust inference and study-planning tools for complex, high-dimensional, or structured measurement data (Xu et al., 2019).
1. Formal Definition and Sample Estimation
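Let $I$ denote the number of subjects, $J_i$ the number of repeated measurements for subject $i$, and $x_{ij}$ the $j$th observation from subject $i$. For an arbitrary distance function $d(\cdot,\cdot)$ on the observation space, two principal quantities underpin dbICC:
- Between-subject mean squared distance: $\mathrm{MSD}_b = E\left[d^2(x_{i_1 j_1}, x_{i_2 j_2})\right]$ for $i_1 \neq i_2$
- Within-subject mean squared distance: $\mathrm{MSD}_w = E\left[d^2(x_{i j_1}, x_{i j_2})\right]$ for $j_1 \neq j_2$
The distance-based intraclass correlation coefficient is then formulated as:
$$\mathrm{dbICC} = 1 - \frac{\mathrm{MSD}_w}{\mathrm{MSD}_b}$$
This quantity measures the proportion of total distance-based "variance" due to subject-level signal, analogous to variance components in classical settings. For empirical data, the expectations are replaced by averages over all appropriate index pairs:
$$\widehat{\mathrm{MSD}}_b = \frac{\sum_{i_1 < i_2} \sum_{j_1=1}^{J_{i_1}} \sum_{j_2=1}^{J_{i_2}} d^2(x_{i_1 j_1}, x_{i_2 j_2})}{\sum_{i_1 < i_2} J_{i_1} J_{i_2}}$$
$$\widehat{\mathrm{MSD}}_w = \frac{\sum_{i=1}^{I} \sum_{j_1 < j_2} d^2(x_{i j_1}, x_{i j_2})}{\sum_{i=1}^{I} \binom{J_i}{2}}$$
$$\widehat{\mathrm{dbICC}} = 1 - \frac{\widehat{\mathrm{MSD}}_w}{\widehat{\mathrm{MSD}}_b}$$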
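The plug-in estimate can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function name `dbicc_estimate`, the nested-list data layout, and the toy numbers are assumptions made here, and the distance is passed in as an ordinary callable.

```python
from itertools import combinations
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def dbicc_estimate(
    subjects: Sequence[Sequence[T]],
    dist: Callable[[T, T], float],
) -> float:
    """Plug-in dbICC: 1 - (mean within-subject d^2) / (mean between-subject d^2)."""
    within_sum, within_n = 0.0, 0
    between_sum, between_n = 0.0, 0

    # Within-subject pairs: distinct repeats of the same subject.
    for obs in subjects:
        for a, b in combinations(obs, 2):
            within_sum += dist(a, b) ** 2
            within_n += 1

    # Between-subject pairs: each repeat of one subject against each repeat of another.
    for obs_1, obs_2 in combinations(subjects, 2):
        for a in obs_1:
            for b in obs_2:
                between_sum += dist(a, b) ** 2
                between_n += 1

    if within_n == 0 or between_n == 0:
        raise ValueError("need two or more subjects and a subject with two or more repeats")
    return 1.0 - (within_sum / within_n) / (between_sum / between_n)


# Toy scalar data (made up for illustration): three subjects, two repeats each.
toy = [[1.0, 1.2], [3.0, 2.8], [5.0, 5.1]]
print(dbicc_estimate(toy, lambda a, b: abs(a - b)))
```

Because only `dist` touches the observations, the same function would apply to vectors, curves, or matrices once a suitable distance is supplied.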
2. Bias Correction for Bootstrap Confidence Intervals
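Estimation of dbICC confidence intervals is complicated by the unknown sampling distribution of the estimator $\widehat{\mathrm{dbICC}}$. Standard practice is to use a subject-level nonparametric bootstrap:
- Resample subjects with replacement.
- For each bootstrap sample, compute the replicate $\widehat{\mathrm{dbICC}}^{*}$ as in the sample formulas.
However, naive resampling can produce pseudo "between-subject" pairs in which both indices correspond to the same original subject. The distances in such pairs are really within-subject distances, so they bias the bootstrap estimate of $\mathrm{MSD}_b$ downward. Because dbICC is one minus the ratio of within- to between-subject mean squared distance, the result is understated reliability. The bias correction consists of excluding any $(i_1, i_2)$ pair from the $\widehat{\mathrm{MSD}}_b$ sum if resampled subjects $i_1$ and $i_2$ are copies of the same original subject in the $b$th bootstrap sample. $\widehat{\mathrm{MSD}}_w$, computed within a subject, needs no such correction. The resulting quantiles of the corrected bootstrap replicates form valid confidence intervals, substantially improving empirical coverage, especially in settings with a small or moderate number of subjects $I$ (Xu et al., 2019).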
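The resampling scheme with the same-subject exclusion might look as follows. This is a sketch under stated assumptions, not the published code: the name `bootstrap_dbicc_ci`, the simple percentile rule, and the decision to skip degenerate resamples are all choices made here for illustration.

```python
import random
from itertools import combinations
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def bootstrap_dbicc_ci(
    subjects: Sequence[Sequence[T]],
    dist: Callable[[T, T], float],
    n_boot: int = 1000,
    level: float = 0.95,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile CI from a subject-level bootstrap that drops same-subject pairs."""
    rng = random.Random(seed)
    n = len(subjects)

    # Squared-distance sums are computed once per subject and per subject pair.
    w_sum = [sum(dist(a, b) ** 2 for a, b in combinations(s, 2)) for s in subjects]
    w_cnt = [len(s) * (len(s) - 1) // 2 for s in subjects]
    b_sum = [[0.0] * n for _ in range(n)]
    for i, k in combinations(range(n), 2):
        total = sum(dist(a, b) ** 2 for a in subjects[i] for b in subjects[k])
        b_sum[i][k] = b_sum[k][i] = total

    replicates: List[float] = []
    for _ in range(n_boot):
        draw = [rng.randrange(n) for _ in range(n)]  # resample subjects with replacement
        ws = sum(w_sum[i] for i in draw)
        wc = sum(w_cnt[i] for i in draw)
        bs, bc = 0.0, 0
        for p, q in combinations(draw, 2):
            if p == q:
                continue  # bias correction: both copies are the same original subject
            bs += b_sum[p][q]
            bc += len(subjects[p]) * len(subjects[q])
        if wc == 0 or bc == 0 or bs == 0.0:
            continue  # degenerate resample, e.g. one subject drawn every time
        replicates.append(1.0 - (ws / wc) / (bs / bc))

    if not replicates:
        raise ValueError("no usable bootstrap replicates")
    replicates.sort()
    k = int((1.0 - level) / 2.0 * (len(replicates) - 1))
    return replicates[k], replicates[len(replicates) - 1 - k]


# Toy scalar data (made up for illustration): six subjects, two repeats each.
toy = [[1.0, 1.2], [3.0, 2.8], [5.0, 5.1], [2.0, 2.3], [4.0, 3.7], [6.0, 6.2]]
print(bootstrap_dbicc_ci(toy, lambda a, b: abs(a - b), n_boot=500))
```

Removing the `if p == q` branch gives the naive bootstrap, which makes it easy to compare the two intervals on the same data.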
3. Generalized Spearman–Brown Formula
The dbICC admits an extended version of the Spearman–Brown formula for predicting the reliability of averaged measurements. Under the general "true-score + error" Hilbert space model $X_{ij} = T_i + E_{ij}$, with appropriate orthogonality and variance-type distance definitions, the multi-replication dbICC for $K$ independent repeats is
$$\mathrm{dbICC}_K = \frac{\sigma_T^2}{\sigma_T^2 + \mathrm{MSE}_K}$$
where $\sigma_T^2 = E\|T_i - E(T_i)\|^2$ and $\mathrm{MSE}_K$ is the within-subject mean squared error across $K$ repetitions. In the classical scalar case, $\mathrm{MSE}_K = \sigma_E^2 / K$ recovers the usual Spearman–Brown formula:
$$\mathrm{dbICC}_K = \frac{K \, \mathrm{dbICC}_1}{1 + (K - 1) \, \mathrm{dbICC}_1}$$
For vector or covariance-matrix data, the growth rate of reliability with $K$ is governed by the specific form of $\mathrm{MSE}_K$; Xu et al. (2019) work through the case of IID multivariate measurements.
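As a worked check of the scalar case, consider a scalar true-score model with $\operatorname{Var}(T_i) = \sigma_T^2$, mutually independent errors with $\operatorname{Var}(E_{ij}) = \sigma_E^2$, and distance $d(x, y) = |x - y|$. The notation here is chosen for illustration and may differ from the source. Each measurement is replaced by the average of $K$ repeats, so the averaged error has variance $\sigma_E^2 / K$, and the mean squared distances between two such averages follow directly:

```latex
\begin{aligned}
\mathrm{MSD}_w^{(K)} &= \frac{2\sigma_E^2}{K}, \\
\mathrm{MSD}_b^{(K)} &= 2\sigma_T^2 + \frac{2\sigma_E^2}{K}, \\
\mathrm{dbICC}_K &= 1 - \frac{\mathrm{MSD}_w^{(K)}}{\mathrm{MSD}_b^{(K)}}
  = \frac{\sigma_T^2}{\sigma_T^2 + \sigma_E^2 / K}
  = \frac{K\rho}{1 + (K - 1)\rho},
  \qquad \rho = \frac{\sigma_T^2}{\sigma_T^2 + \sigma_E^2}.
\end{aligned}
```

Setting $K = 1$ gives $\mathrm{dbICC}_1 = \rho$, the classical ICC, so the distance-based definition and the variance-components definition agree in this setting.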
4. Simulation Studies
Xu, Reiss, and Cribben conducted extensive simulation experiments to evaluate point and interval estimation behavior for dbICC. Their scenarios varied the signal-to-noise ratio, the sample size, and the number of repeats, with true dbICC values ranging from 0.2 to 0.8.
Findings include:
- The point estimator $\widehat{\mathrm{dbICC}}$ exhibits negative bias at small sample sizes, diminishing as the sample size grows.
- Naive bootstrap confidence intervals tend to under-cover the nominal 95% level, especially for small samples (e.g., 86.0% coverage in one small-sample scenario), whereas the bias-corrected bootstrap achieves substantially better accuracy (e.g., 90.8% coverage).
- Coverage rates approach the nominal level as the relevant design parameters, sample size among them, increase (Xu et al., 2019).
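A small Monte Carlo experiment of this kind can be sketched as below. The data-generating model (scalar Gaussian true score plus Gaussian error, absolute-difference distance) and the function names are assumptions chosen for illustration; they are not the design used by Xu et al.

```python
import random
from itertools import combinations


def dbicc_scalar(subjects):
    """Plug-in dbICC for scalar repeats with d(x, y) = |x - y|."""
    within = [(a - b) ** 2 for s in subjects for a, b in combinations(s, 2)]
    between = [
        (x - y) ** 2
        for s, t in combinations(subjects, 2)
        for x in s
        for y in t
    ]
    return 1.0 - (sum(within) / len(within)) / (sum(between) / len(between))


def simulate_mean_estimate(n_subjects, n_repeats, true_dbicc, n_reps, seed=0):
    """Average plug-in estimate over n_reps simulated data sets."""
    rng = random.Random(seed)
    sd_t = true_dbicc ** 0.5          # Var(T) = rho
    sd_e = (1.0 - true_dbicc) ** 0.5  # Var(E) = 1 - rho, so total variance is 1
    estimates = []
    for _ in range(n_reps):
        data = []
        for _ in range(n_subjects):
            t = rng.gauss(0.0, sd_t)
            data.append([t + rng.gauss(0.0, sd_e) for _ in range(n_repeats)])
        estimates.append(dbicc_scalar(data))
    return sum(estimates) / len(estimates)


# Compare the average estimate at a small and a larger number of subjects.
for n in (5, 40):
    print(n, simulate_mean_estimate(n, n_repeats=2, true_dbicc=0.5, n_reps=200))
```

Subtracting `true_dbicc` from each printed average gives an empirical bias, which can then be compared across sample sizes.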
5. Application: Test–Retest Reliability for fMRI-Derived Connectivity Matrices
To demonstrate dbICC in a high-dimensional, non-scalar context, Xu et al. analyzed test–retest data on resting-state brain functional connectivity derived from fMRI scans. The dataset comprised 25 healthy adults, each scanned twice, generating 333 × 333 ROI correlation matrices (NYU TRT dataset). Preprocessing included motion correction, spatial normalization, tissue segmentation, nuisance regression, spatial smoothing (6 mm FWHM), and band-pass filtering.
Three distance metrics on correlation matrices were evaluated:
- $\ell_2$ (Frobenius) norm on flattened lower-triangular entries,
- $\ell_1$ entrywise norm,
- $\sqrt{1 - r}$, with $r$ the Pearson correlation between lower triangles ("correlation of correlations").
dbICC values (point with 95% bias-corrected bootstrap CI):
| Region | $\ell_2$ | $\ell_1$ | $\sqrt{1 - r}$ |
|---|---|---|---|
| All 333 ROIs | 0.378 (0.329, 0.424) | 0.382 (0.335, 0.426) | 0.382 (0.338, 0.426) |
| Default Mode (41) | 0.488 (0.403, 0.562) | 0.493 (0.404, 0.570) | 0.487 (0.414, 0.555) |
| Visual Network (39) | 0.434 (0.362, 0.508) | 0.435 (0.354, 0.515) | 0.451 (0.401, 0.500) |
dbICC for the full set was lower than for subnetworks (Default Mode, Visual), and soft-thresholding did not further improve reliability (Xu et al., 2019).
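The three distances on correlation matrices can be sketched in plain Python. The function names are hypothetical, and the form $\sqrt{1 - r}$ used for the correlation-based distance is an assumption made here. Matrices are taken as nested lists, and only the strictly lower triangle is used because the diagonal of a correlation matrix is constant.

```python
import math
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def lower_triangle(m: Matrix) -> List[float]:
    """Strictly lower-triangular entries, row by row (diagonal excluded)."""
    return [m[i][j] for i in range(len(m)) for j in range(i)]


def dist_l2(a: Matrix, b: Matrix) -> float:
    """Euclidean (Frobenius-type) distance between flattened lower triangles."""
    u, v = lower_triangle(a), lower_triangle(b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def dist_l1(a: Matrix, b: Matrix) -> float:
    """Entrywise absolute-difference distance between lower triangles."""
    u, v = lower_triangle(a), lower_triangle(b)
    return sum(abs(x - y) for x, y in zip(u, v))


def pearson(u: Sequence[float], v: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    mu_u, mu_v = sum(u) / len(u), sum(v) / len(v)
    cov = sum((x - mu_u) * (y - mu_v) for x, y in zip(u, v))
    var_u = sum((x - mu_u) ** 2 for x in u)
    var_v = sum((y - mu_v) ** 2 for y in v)
    if var_u == 0.0 or var_v == 0.0:
        raise ValueError("Pearson correlation is undefined for constant input")
    return cov / math.sqrt(var_u * var_v)


def dist_corr(a: Matrix, b: Matrix) -> float:
    """Assumed form sqrt(1 - r), with r the correlation between lower triangles."""
    r = pearson(lower_triangle(a), lower_triangle(b))
    return math.sqrt(max(0.0, 1.0 - r))  # clamp guards against rounding above 1


# Two made-up 3 x 3 correlation matrices for illustration.
m1 = [[1.0, 0.5, 0.2], [0.5, 1.0, 0.3], [0.2, 0.3, 1.0]]
m2 = [[1.0, 0.4, 0.1], [0.4, 1.0, 0.5], [0.1, 0.5, 1.0]]
print(dist_l2(m1, m2), dist_l1(m1, m2), dist_corr(m1, m2))
```

Any of these callables can be handed to a dbICC estimator that accepts a distance function, which is one way to run the sensitivity comparison across distances.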
6. Practical Recommendations and Methodological Guidance
- Selection of the distance $d$ should be tailored to the data: for example, Euclidean distance for vectors, Frobenius or correlation-based distances for matrices, and dynamic time warping for time series.
- Compute the full $N \times N$ distance matrix, where $N$ is the total number of observations, then average squared distances over the off-diagonal within-subject blocks and over the between-subject blocks to obtain $\widehat{\mathrm{MSD}}_w$ and $\widehat{\mathrm{MSD}}_b$.
- Employ a subject-level nonparametric bootstrap, applying the bias correction for $\widehat{\mathrm{MSD}}_b$ by excluding pairs that represent the same individual in the resampled set.
- Plan studies using the generalized Spearman–Brown relation: plot the projected dbICC versus the number of averaged repeats $K$ (measurement intensity) to estimate the requisite $K$ for a target reliability.
- Perform sensitivity analyses to alternative distance choices, thresholding strategies, and sample size.
- An implementation in R is available at https://github.com/wtagr/dbicc (Xu et al., 2019).
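For the study-planning step, a minimal sketch using the classical Spearman–Brown form is shown below. Whether this form applies to a given non-scalar distance depends on the error model, so treat it as an assumption. The function names and the example numbers are hypothetical.

```python
import math


def spearman_brown(rho_1: float, k: float) -> float:
    """Projected reliability of the average of k repeats."""
    return k * rho_1 / (1.0 + (k - 1.0) * rho_1)


def repeats_needed(rho_1: float, target: float) -> int:
    """Smallest integer k whose projected reliability reaches the target."""
    if not (0.0 < rho_1 < 1.0 and 0.0 < target < 1.0):
        raise ValueError("rho_1 and target must lie strictly between 0 and 1")
    # Solve target = k * rho_1 / (1 + (k - 1) * rho_1) for k.
    k = target * (1.0 - rho_1) / (rho_1 * (1.0 - target))
    # The small tolerance keeps exact solutions from being rounded up by float error.
    return max(1, math.ceil(k - 1e-12))


# Hypothetical planning question: single-measurement reliability 0.38, target 0.8.
k_min = repeats_needed(0.38, 0.8)
print(k_min, spearman_brown(0.38, k_min))
```

Tabulating `spearman_brown(rho_1, k)` over a range of `k` gives the reliability-versus-intensity curve described in the list above.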
7. Context and Theoretical Significance
dbICC extends measurement reliability analysis to non-scalar, structured, and high-dimensional data, overcoming limitations of classical ICC. The approach is applicable wherever pairwise measurement dissimilarities are meaningful, including connectomics, time series, and functional data. Its theoretical coherence with classical reliability models, extensibility to Hilbert spaces, and empirical validity make dbICC a robust and unifying tool for generalized reliability quantification (Xu et al., 2019).