
Same-All Cross-validation (SAC)

Updated 3 July 2026
  • Same-All Cross-validation (SAC) is a method that compares models trained on individual subsets (SAME) versus pooled data (ALL) to assess cross-subset similarity in non-i.i.d. settings.
  • SAC employs nested K-fold cross-validation and paired t-tests to statistically evaluate whether data pooling improves model performance.
  • SAC indicates when pooling across heterogeneous subsets, such as temporal or geographic groups, improves predictive accuracy and when it leads to negative transfer.

Same-All Cross-validation (SAC) is a principled approach for quantifying the similarity of learnable or predictable patterns across distinct data subsets in supervised learning. SAC evaluates whether model performance on a given test subset improves or deteriorates when training is performed on pooled data from all subsets (the ALL split) compared to training only on the target subset (the SAME split). This methodology serves as a key component of SOAK (Same/Other/All K-fold cross-validation), providing statistically rigorous insight into when subset pooling is advantageous versus harmful, particularly in contexts of non-i.i.d. train/test distributions due to temporal, geographic, or otherwise labeled heterogeneity (Hocking et al., 2024).

1. Definition and Position within SOAK

SAC operates by comparing two K-fold cross-validation models for each subset $\sigma$: one trained exclusively on data from $\sigma$ (SAME) and another trained on the union of all subsets (ALL). By focusing on the SAME versus ALL comparison, SAC isolates the effect of pooling on predictive error for each subset, omitting the “OTHER” model considered in the full SOAK framework.

In settings where the traditional i.i.d. assumption between train and test samples does not hold—such as temporal drift, spatial clustering, or other categorical splitting—SAC directly addresses the principal question: does pooling data across subsets yield improved predictive performance on new, potentially dissimilar, target subsets? If performance is enhanced by pooling, shared learnable structure is inferred; if degraded, this suggests qualitative differences between subsets leading to negative transfer (Hocking et al., 2024).

2. Algorithmic Procedure

SAC utilizes a nested K-fold cross-validation over all subset–fold pairs. Let $s_i$ and $k_i$ denote the subset and fold assigned to row $i$. For each subset $\sigma \in \{1, \ldots, S\}$ and fold $\kappa \in \{1, \ldots, K\}$, the following definitions and model fits apply:

  • $TEST_{\sigma,\kappa} = \{i : s_i = \sigma \ \text{and} \ k_i = \kappa\}$
  • $SAME_{\sigma,\kappa} = \{i : s_i = \sigma \ \text{and} \ k_i \ne \kappa\}$
  • $ALL_{\sigma,\kappa} = \{i : k_i \ne \kappa\}$
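The three index sets can be built directly from per-row subset and fold labels. The following is a minimal sketch, assuming the labels are stored in two parallel lists; the function name `sac_index_sets` and the toy labels are illustrative and not taken from the source.

```python
def sac_index_sets(subset_ids, fold_ids, sigma, kappa):
    """Return (test, same, all_) row-index lists for subset sigma and fold kappa."""
    rows = range(len(subset_ids))
    # TEST: rows of the target subset that fall in the held-out fold.
    test = [i for i in rows if subset_ids[i] == sigma and fold_ids[i] == kappa]
    # SAME: rows of the target subset in the remaining folds.
    same = [i for i in rows if subset_ids[i] == sigma and fold_ids[i] != kappa]
    # ALL: rows of every subset in the remaining folds.
    all_ = [i for i in rows if fold_ids[i] != kappa]
    return test, same, all_


# Toy labels (hypothetical): 2 subsets, 2 folds, 8 rows.
subset_ids = ["a", "a", "a", "a", "b", "b", "b", "b"]
fold_ids = [1, 2, 1, 2, 1, 2, 1, 2]

test, same, all_ = sac_index_sets(subset_ids, fold_ids, sigma="a", kappa=1)
print(test)  # [0, 2]
print(same)  # [1, 3]
print(all_)  # [1, 3, 5, 7]
```

Note that the ALL set depends only on the held-out fold, so SAME is always a subset of ALL for the same fold.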

On each fold:

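  • Train $M_\mathrm{same}$ on $SAME_{\sigma,\kappa}$.
  • Train $M_\mathrm{all}$ on $ALL_{\sigma,\kappa}$.
  • Evaluate $M_\mathrm{same}$ and $M_\mathrm{all}$ on $TEST_{\sigma,\kappa}$, obtaining errors $E^\mathrm{same}_{\sigma,\kappa}$ and $E^\mathrm{all}_{\sigma,\kappa}$ respectively.

After iterating over all folds, for each $\sigma$:

  • Compute mean error across folds: $\bar{E}^\mathrm{same}_{\sigma} = \frac{1}{K} \sum_{\kappa=1}^{K} E^\mathrm{same}_{\sigma,\kappa}$, $\bar{E}^\mathrm{all}_{\sigma}$ analogously.
  • Compute per-fold difference: $d_{\sigma,\kappa} = E^\mathrm{all}_{\sigma,\kappa} - E^\mathrm{same}_{\sigma,\kappa}$.
  • Perform a paired $t$-test on $\{d_{\sigma,\kappa}\}_{\kappa=1}^{K}$.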
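The loop over subset–fold pairs can be sketched end to end in a few lines. This is an illustrative sketch only: the least-squares line (`fit_line`), the mean-squared-error metric (`mse`), the function `sac_errors`, and the synthetic two-subset data are all assumptions chosen to keep the example self-contained, not the learners or datasets used in the source.

```python
import random
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx > 0 else 0.0
    return my - b * mx, b


def mse(model, xs, ys):
    """Mean squared error of a fitted line on (xs, ys)."""
    a, b = model
    return mean((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


def pick(values, idx):
    return [values[i] for i in idx]


def sac_errors(x, y, subset_ids, fold_ids):
    """Return {(sigma, kappa): (same_error, all_error)} over all subset-fold pairs."""
    rows = range(len(x))
    out = {}
    for sigma in sorted(set(subset_ids)):
        for kappa in sorted(set(fold_ids)):
            test = [i for i in rows if subset_ids[i] == sigma and fold_ids[i] == kappa]
            same = [i for i in rows if subset_ids[i] == sigma and fold_ids[i] != kappa]
            all_ = [i for i in rows if fold_ids[i] != kappa]
            m_same = fit_line(pick(x, same), pick(y, same))
            m_all = fit_line(pick(x, all_), pick(y, all_))
            out[(sigma, kappa)] = (
                mse(m_same, pick(x, test), pick(y, test)),
                mse(m_all, pick(x, test), pick(y, test)),
            )
    return out


# Synthetic data (hypothetical): two subsets with opposite slopes, 3 folds each.
rng = random.Random(0)
x, y, subset_ids, fold_ids = [], [], [], []
for sigma, slope in [("a", 2.0), ("b", -2.0)]:
    for j in range(30):
        xi = rng.uniform(-1, 1)
        x.append(xi)
        y.append(slope * xi + rng.gauss(0, 0.1))
        subset_ids.append(sigma)
        fold_ids.append(j % 3 + 1)  # folds assigned within each subset

errors = sac_errors(x, y, subset_ids, fold_ids)
for key, (e_same, e_all) in sorted(errors.items()):
    print(key, round(e_same, 3), round(e_all, 3))
```

Because the two synthetic subsets follow opposite trends, the pooled line fits neither, so the ALL error exceeds the SAME error on every fold; this is the negative-transfer pattern the procedure is designed to detect.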

This workflow yields for each subset a direct, statistically tested measure of the gain or loss accrued by pooling during training.

3. Mathematical Formulation and Similarity Score

The central quantitative output of SAC is the Same-All similarity score for each subset $\sigma$:

  • Fold-wise difference: $d_{\sigma,\kappa} = E^\mathrm{all}_{\sigma,\kappa} - E^\mathrm{same}_{\sigma,\kappa}$, where $E^\mathrm{all}_{\sigma,\kappa}$ and $E^\mathrm{same}_{\sigma,\kappa}$ are the test errors of the ALL and SAME models on fold $\kappa$ of subset $\sigma$
  • Mean Same-All similarity: $\Delta_\sigma = \frac{1}{K} \sum_{\kappa=1}^{K} d_{\sigma,\kappa}$

The sign of $\Delta_\sigma$ encodes the relevance of pooling:

  • If $\Delta_\sigma < 0$ ($\bar{E}^\mathrm{all}_{\sigma} < \bar{E}^\mathrm{same}_{\sigma}$), pooling reduces error, indicating high cross-subset similarity.
  • If $\Delta_\sigma > 0$, pooling raises error, implying predictive dissimilarity and possible negative transfer.
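As a small worked illustration of the score and its sign, the sketch below computes the fold-wise differences (ALL error minus SAME error) and their mean for one subset. The error values and the function name `same_all_score` are hypothetical.

```python
from statistics import mean


def same_all_score(same_errors, all_errors):
    """Return (fold-wise differences, mean difference); each difference is ALL minus SAME."""
    diffs = [e_all - e_same for e_same, e_all in zip(same_errors, all_errors)]
    return diffs, mean(diffs)


# Hypothetical per-fold test errors for one subset (K = 3).
same_errors = [0.5, 0.75, 0.25]
all_errors = [0.25, 0.25, 0.25]

diffs, score = same_all_score(same_errors, all_errors)
verdict = "pooling helps" if score < 0 else "pooling harms" if score > 0 else "neutral"
print(diffs)           # [-0.25, -0.5, 0.0]
print(score, verdict)  # -0.25 pooling helps
```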

This similarity score provides a standardized metric by which to judge the cohesiveness of patterns underlying labeled data groupings.

4. Statistical Inference and Confidence Estimation

Statistical significance of observed differences is assessed via a paired $t$-test over the $K$ values of $d_{\sigma,\kappa}$ (ALL error minus SAME error on fold $\kappa$) for each subset $\sigma$. Assuming approximate normality of these fold-level contrasts, the test statistic is

$$t_\sigma = \frac{\Delta_\sigma}{s_\sigma / \sqrt{K}}, \qquad s_\sigma^2 = \frac{1}{K-1} \sum_{\kappa=1}^{K} \left(d_{\sigma,\kappa} - \Delta_\sigma\right)^2,$$

with $K-1$ degrees of freedom, where $\Delta_\sigma = \frac{1}{K} \sum_{\kappa=1}^{K} d_{\sigma,\kappa}$ is the mean difference. A two-sided $p$-value is reported to assess the null hypothesis $H_0: \mathbb{E}[d_{\sigma,\kappa}] = 0$. The $(1-\alpha)$ confidence interval for the mean difference is:

$$\Delta_\sigma \pm t_{1-\alpha/2,\,K-1} \, \frac{s_\sigma}{\sqrt{K}}$$

Confidence intervals that lie entirely below zero indicate a significant benefit to pooling ($\Delta_\sigma < 0$); intervals entirely above zero indicate harm from pooling.
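The test statistic and confidence interval can be computed from the fold-wise differences with the standard library alone. The sketch below is illustrative: the function name `paired_t_summary` and the differences are hypothetical, and the critical value of Student's $t$ is passed in by the caller (here 2.776, the 97.5% quantile for 4 degrees of freedom) rather than computed.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_summary(diffs, t_crit):
    """Return (t statistic, degrees of freedom, (ci_low, ci_high)).

    diffs  : fold-wise differences (ALL error minus SAME error) for one subset.
    t_crit : two-sided critical value of Student's t for len(diffs) - 1 degrees
             of freedom, supplied by the caller (e.g. from a table).
    Assumes the differences are not all identical (non-zero standard deviation).
    """
    k = len(diffs)
    d_bar = mean(diffs)
    se = stdev(diffs) / sqrt(k)  # stdev uses the K-1 denominator
    t_stat = d_bar / se
    return t_stat, k - 1, (d_bar - t_crit * se, d_bar + t_crit * se)


# Hypothetical fold-wise differences for one subset (K = 5).
diffs = [0.5, 0.75, 0.25, 0.5, 0.5]
t_stat, df, ci = paired_t_summary(diffs, t_crit=2.776)

print(round(t_stat, 3), df)             # 6.325 4
print(round(ci[0], 3), round(ci[1], 3))  # 0.281 0.719
```

Here the whole interval lies above zero, which corresponds to harm from pooling. A two-sided $p$-value additionally requires the $t$ distribution's CDF; if SciPy is available, `scipy.stats.ttest_rel` applied to the paired ALL and SAME error vectors is one way to obtain it.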

5. Empirical Examples and Interpretations

Empirical studies using SAC have addressed datasets with meaningful partitioning by geography, time, or other categorical features. Key findings include:

| Dataset | Subset Type | SAC Outcome |
|---|---|---|
| CanadaFiresA/D | Satellite fires | Positive $\Delta_\sigma$ (pooling harms) |
| FishSonar_river | Rivers | Positive $\Delta_\sigma$ (pooling harms) |
| aztrees3/aztrees4 | Geographic quadrants | Positive $\Delta_\sigma$ (pooling harms) |
| NSCH_autism | Survey years (2019/2020) | Small negative $\Delta_\sigma$ (pooling aids) |

Interpretation of these findings:

  • Strongly positive $\Delta_\sigma$ (the mean ALL-minus-SAME error difference) and $t_\sigma$ are indicative of low inter-subset similarity; pooling degrades predictivity, suggesting distinct underlying generative mechanisms per subset.
  • Negative and significant $\Delta_\sigma$ values suggest learnable structure is sufficiently shared that pooling supports generalization (Hocking et al., 2024).
  • Cases with $\Delta_\sigma$ near zero are interpreted as neutral with respect to pooling.

A summary across all subsets, using min, max, and mean $\Delta_\sigma$ and their associated $p$-values, yields a granular view of transferability.
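A cross-subset summary of this kind reduces to a few aggregate statistics over the per-subset scores. The sketch below assumes the scores are held in a dictionary keyed by subset name; the subset names, values, and the function `summarize_scores` are hypothetical, and $p$-values are omitted for brevity.

```python
from statistics import mean


def summarize_scores(scores):
    """scores: {subset: mean ALL-minus-SAME error difference}."""
    values = list(scores.values())
    return {
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
        "most_helped": min(scores, key=scores.get),  # most negative score
        "most_harmed": max(scores, key=scores.get),  # most positive score
    }


# Hypothetical per-subset scores.
scores = {"north": 0.5, "south": -0.25, "east": 0.0, "west": 0.25}
summary = summarize_scores(scores)
print(summary)
```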

6. Practical Considerations and Methodological Limitations

Several operational and theoretical issues may influence SAC outcomes:

  • Choice of $K$: Larger $K$ (e.g., 10) decreases bias in the fold-wise error estimates but increases computational burden linearly. Sufficiently large $SAME_{\sigma,\kappa}$ sets are required to ensure stable model fitting.
  • Computational burden: Requires up to $2SK$ model fits (one SAME and one ALL model per subset–fold pair). Training ALL models is especially computationally intensive, often motivating use of regularized linear learners or parallelization strategies.
  • Data heterogeneity: SAC assumes non-i.i.d. effects strictly from subset membership; within subset–fold cells, ordinary CV exchangeability must generally hold.
  • Interpretational caution: While SAC identifies whether pooling is beneficial or detrimental, it does not reveal the underlying cause, such as concept drift, covariate shift, or label noise. A difference between subsets detected by SAC may arise from any of these, or other, latent sources.

These considerations frame the appropriate deployment of SAC and enable informed interpretation of its outcomes in practice.

7. Synthesis and Role within Data Science Methodology

SAC provides a crucial statistical diagnostic for pattern-sharing across labeled data partitions commonly encountered in modern data science. By formalizing a robust, fold-wise comparison of training strategies, SAC enables practitioners to empirically adjudicate between pooling and non-pooling training regimes for each target subset. This approach has particular relevance in settings with known or suspected non-i.i.d. structure—such as evolving time series, multi-region studies, or tiered population surveys—serving as an evidence-based guide for the principled combination (or separation) of data subsets to optimize predictive performance and generalization (Hocking et al., 2024).
