
Actionable Interpretability via Causal Hypergraphs: Unravelling Batch Size Effects in Deep Learning

Published 21 Jun 2025 in cs.LG and cs.AI | (2506.17826v1)

Abstract: While the impact of batch size on generalisation is well studied in vision tasks, its causal mechanisms remain underexplored in graph and text domains. We introduce a hypergraph-based causal framework, HGCNet, that leverages deep structural causal models (DSCMs) to uncover how batch size influences generalisation via gradient noise, minima sharpness, and model complexity. Unlike prior approaches based on static pairwise dependencies, HGCNet employs hypergraphs to capture higher-order interactions across training dynamics. Using do-calculus, we quantify direct and mediated effects of batch size interventions, providing interpretable, causally grounded insights into optimisation. Experiments on citation networks, biomedical text, and e-commerce reviews show that HGCNet outperforms strong baselines including GCN, GAT, PI-GNN, BERT, and RoBERTa. Our analysis reveals that smaller batch sizes causally enhance generalisation through increased stochasticity and flatter minima, offering actionable interpretability to guide training strategies in deep learning. This work positions interpretability as a driver of principled architectural and optimisation choices beyond post hoc analysis.

Summary

  • The paper presents HGCNet, a causal framework using hypergraphs and deep structural causal models to analyze how batch size impacts deep learning generalization.
  • Empirical results show smaller batch sizes consistently improve generalization and test accuracy across graph and text datasets, mediated by increased gradient noise and flatter minima.
  • This framework provides actionable interpretability, enabling principled batch size selection and informing optimization and architecture design for better generalization.

Actionable Interpretability via Causal Hypergraphs: Unravelling Batch Size Effects in Deep Learning

This paper presents a principled causal framework, HGCNet, for analyzing and interpreting the effects of batch size on generalization in deep learning, with a focus on graph and text domains. The approach leverages deep structural causal models (DSCMs) and hypergraph representations to capture higher-order dependencies among key training variables: batch size, gradient noise, minima sharpness, model complexity, and generalization. The authors employ do-calculus to quantify both direct and mediated effects of batch size interventions, providing actionable interpretability for optimization and architecture design.

Causal Hypergraph Framework

The core methodological contribution is the construction of a causal hypergraph H = (V, E), where the variable set V includes batch size (B), gradient noise (N), minima sharpness (S), model complexity (C), and generalization (G). Unlike standard causal graphs, the hypergraph structure models higher-order interactions, such as the joint influence of N and S on C. The causal pathways are formalized as B → N → S → C → G, with additional hyperedges capturing multi-variable dependencies.
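
To make this concrete, the sketch below shows one way such a hypergraph could be encoded, with each hyperedge mapping a set of parent variables to a child so that higher-order interactions like (N, S) → C are first-class objects. The node names follow the paper's variable set, but the data structure itself is an illustrative assumption, not the authors' implementation.

```python
# Illustrative encoding of the causal hypergraph H = (V, E) described
# above; the node set follows the paper, the data structure is assumed.
V = {"B", "N", "S", "C", "G"}

# Each hyperedge maps a set of parents to a child, so a single edge can
# encode a higher-order interaction such as (N, S) -> C.
E = [
    (frozenset({"B"}), "N"),       # batch size drives gradient noise
    (frozenset({"N"}), "S"),       # noise shapes minima sharpness
    (frozenset({"N", "S"}), "C"),  # joint influence of N and S on C
    (frozenset({"S", "C"}), "G"),  # sharpness and complexity set G
]

def parents(child):
    """Union of all parent sets over hyperedges pointing at `child`."""
    return set().union(*[p for p, c in E if c == child])

print(parents("C"))  # {'N', 'S'}
```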

The mathematical formulation links batch size to gradient noise (N ∝ 1/B), minima sharpness (S ∝ 1/B), and generalization (G ∝ 1/S), supporting the hypothesis that smaller batch sizes increase stochasticity, promote flatter minima, and improve generalization. The framework enables the estimation of the average treatment effect (ATE) of batch size interventions using do-calculus, ensuring identifiability and disentanglement of confounders.
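
Here the average treatment effect takes its standard form, ATE = E[G | do(B = b1)] − E[G | do(B = b2)]. The toy simulation below estimates such an effect in a hypothetical SCM whose mechanisms are wired so that smaller batches produce noisier gradients and flatter minima, matching the paper's empirical finding; the functional forms and noise scales are our assumptions, not the paper's fitted model.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_G(b, n=100_000):
    """Sample generalization G under the intervention do(B = b) in a
    hypothetical SCM. Mechanisms and noise scales are illustrative
    assumptions, not the paper's fitted model."""
    N = 1.0 / b + 0.01 * rng.standard_normal(n)   # noise: N ~ 1/B + U_N
    S = 1.0 / (1.0 + np.maximum(N, 0.0)) \
        + 0.01 * rng.standard_normal(n)           # flatter minima with more noise
    G = 1.0 / np.maximum(S, 1e-3) \
        + 0.05 * rng.standard_normal(n)           # generalization: G ~ 1/S + U_G
    return G

# ATE of intervening on batch size: do(B=16) versus do(B=512).
ate = sample_G(16).mean() - sample_G(512).mean()
print(f"ATE of do(B=16) vs do(B=512): {ate:.3f}")  # positive => small batch helps
```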

Empirical Evaluation

The authors conduct extensive experiments on both graph-based (Cora, CiteSeer) and text-based (Amazon, PubMed) datasets, comparing HGCNet to strong baselines including GCN, GAT, PI-GNN, BERT, and RoBERTa. The results consistently demonstrate that smaller batch sizes (e.g., B = 16) yield higher test accuracy and lower generalization gaps across all domains and models. For instance, on the Cora dataset, HGCNet achieves 83.9% accuracy with B = 16 versus 81.5% with B = 512. On the Amazon dataset, the improvement is from 89.2% (B = 512) to 92.4% (B = 16).

Ablation studies further validate the causal model. Ablating the gradient-noise and minima-sharpness mediators independently leads to significant drops in generalization, confirming the mediating roles of N and S. The hypergraph structure is shown to be essential: replacing it with a pairwise causal graph degrades accuracy by 1.6–2.3% across datasets. The causal claims are supported by statistical significance tests (all p < 0.01) and Hessian spectrum analysis, which empirically confirms that larger batch sizes yield sharper minima.
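
Hessian spectrum analysis of this kind is typically done by estimating the largest eigenvalue of the loss Hessian with Hessian-vector products, a standard sharpness proxy. A minimal PyTorch power-iteration sketch (ours, not the paper's code) is:

```python
import torch

def top_hessian_eigenvalue(loss, params, iters=20):
    """Estimate the largest eigenvalue of the loss Hessian via power
    iteration with Hessian-vector products. A standard sharpness proxy;
    this sketch is illustrative, not the paper's implementation."""
    params = [p for p in params if p.requires_grad]
    grads = torch.autograd.grad(loss, params, create_graph=True)
    v = [torch.randn_like(p) for p in params]
    for _ in range(iters):
        # Normalize v, then compute the Hessian-vector product H v.
        norm = torch.sqrt(sum((u * u).sum() for u in v))
        v = [u / norm for u in v]
        hv = torch.autograd.grad(grads, params, grad_outputs=v,
                                 retain_graph=True)
        # Rayleigh quotient v^T H v approximates the top eigenvalue.
        eig = sum((u * h).sum() for u, h in zip(v, hv))
        v = [h.detach() for h in hv]
    return eig.item()

# Usage (hypothetical): compare models trained at B=16 vs. B=512, e.g.
#   loss = criterion(model(x), y)
#   print(top_hessian_eigenvalue(loss, model.parameters()))
```

Comparing this estimate at convergence across batch sizes is one way to reproduce the observation that larger batches land in sharper minima.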

Practical Implications

The findings have direct implications for the design and training of deep learning models in domains where data dependencies are complex and traditional image-based heuristics do not transfer. The causal analysis provides a principled basis for batch size selection, moving beyond empirical tuning. The framework also supports adaptive batch sizing strategies, which can balance computational efficiency and generalization performance. For example, progressive batch scaling achieves nearly optimal accuracy with reduced training time.
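
The paper does not spell out the progressive schedule here; one common realization (an assumption on our part) is to start small and grow the batch on a fixed schedule, as in the sketch below.

```python
def progressive_batch_size(epoch, base=16, max_size=512, grow_every=10):
    """Hypothetical progressive batch scaling: start small to exploit
    gradient noise early, then double the batch every `grow_every`
    epochs to recover throughput late in training. The schedule and
    its parameters are illustrative, not taken from the paper."""
    size = base * (2 ** (epoch // grow_every))
    return min(size, max_size)

# Example: batch sizes over 60 epochs -> 16, 32, 64, 128, 256, 512
print([progressive_batch_size(e) for e in range(0, 60, 10)])
```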

The integration of actionable interpretability into the optimization process enables practitioners to make informed decisions about hyperparameter settings, regularization, and architectural choices. The approach is computationally feasible, with the main trade-off being increased training time for smaller batch sizes. However, the generalization gains (2–4% across tasks) are substantial and robust to learning rate schedules and regularization strategies.

Theoretical and Future Directions

The use of hypergraph-based DSCMs represents a significant step toward modeling the complex, multi-variable interactions inherent in modern deep learning systems. The causal framework bridges the gap between structural theory and empirical practice, offering a template for analyzing other hyperparameters and architectural components. The results challenge the prevailing assumption that large batch sizes are always preferable for efficiency, demonstrating that smaller batches can be causally linked to improved generalization through well-defined mediators.

Future research could extend this framework to other domains (e.g., reinforcement learning, multi-modal tasks), explore more sophisticated adaptive batch strategies, and integrate causal interpretability into automated machine learning pipelines. The approach also opens avenues for developing new regularization techniques and optimization algorithms that explicitly target the causal pathways identified by the hypergraph model.

Conclusion

This work establishes a causally grounded, interpretable, and empirically validated framework for understanding and optimizing batch size in deep learning. By modeling higher-order dependencies via hypergraphs and quantifying effects with do-calculus, the authors provide both theoretical insight and practical guidance for improving generalization in graph and text domains. The results underscore the importance of actionable interpretability as a driver of principled model design and optimization, with broad implications for the development of robust, transparent, and efficient AI systems.
