CritiCore Module: Quantifying DNN Module Criticality

Updated 3 January 2026

CritiCore Module is a complexity measure for DNNs that quantifies the essentiality of individual network modules by analyzing the loss trajectory from initialization to convergence.
It computes module criticality by examining permissible parameter deviations and local flatness, utilizing grid search and noise perturbation to assess training accuracy impact.
The approach establishes formal PAC–Bayesian generalization bounds, distinguishing critical modules and outclassing traditional norm-based or spectral metrics.

The CritiCore Module, or module criticality, is a complexity measure for deep neural networks (DNNs) that quantifies the extent to which specific network modules are essential to network performance. It operates by probing the geometry of the loss landscape along the parameter trajectory between initialization and convergence for each module, incorporating both the permissible deviation from initialization (distance) and the local flatness (robustness to noise). CritiCore module criticality serves as both an explanatory and predictive tool, demarcating “critical” modules—whose rewinding to initialization substantially harms training accuracy—from “non-critical” modules, with a formal connection to generalization bounds via PAC–Bayes theory (Chatterji et al., 2019).

1. Mathematical Definition

Let a DNN be structured as a directed acyclic graph with $d$ modules, each indexed by $i$ and parameterized by $\theta_i \in \mathbb{R}^{k_i}$ . Given random initialization $\theta_i^0$ and trained value $\theta_i^f$ for module $i$ , define for any $\alpha_i \in [0,1]$ the interpolated weights:

$\theta_i(\alpha_i) := (1 - \alpha_i)\theta_i^0 + \alpha_i\theta_i^f$

and introduce a perturbation $u_i \sim \mathcal{N}(0, \sigma_i^2 I_{k_i})$ . Construct $f_{\Theta^{\alpha} + U}$ as the network that uses the perturbed parameters for the $i$ th module and keeps all others at their trained values. For an error tolerance $\epsilon > 0$ , the module criticality is:

$\mu_i(f_\Theta) = \min_{\substack{0 \leq \alpha_i \leq 1 \ \sigma_i \geq 0}} \left\{ \frac{\alpha_i^2 \|\theta_i^f - \theta_i^0\|_F^2}{\sigma_i^2} : \mathbb{E}_{u_i}\left[L_S(f_{\Theta^\alpha + U})\right] \leq L_S(f_{\Theta^f}) + \epsilon \right\}$

where $L_S$ is the empirical (0–1) loss on the training set. The network-wide criticality is the sum $\mu(f_\Theta) = \sum_{i=1}^d \mu_i(f_\Theta)$ .

The criticality ( $\mu_i$ ) is minimized when the valley in loss between $\theta_i^0$ and $\theta_i^f$ is both long (large $\alpha_i$ achievable without losing accuracy) and flat (tolerant to noise, large $\sigma_i$ allowed).

2. Practical Computation Procedure

Given a trained network and error tolerance $\epsilon$ , CritiCore computation proceeds as follows for each module:

Select a discrete grid of $\alpha_i \in [0,1]$ (e.g., $\{0, 0.1, ..., 1.0\}$ ).
For each $\alpha_i$ , construct $\theta_i(\alpha_i)$ .
For this $\alpha_i$ $α_{i}$ , maximize $\sigma_i$ $σ_{i}$ such that when adding $u_i \sim \mathcal{N}(0, \sigma_i^2I)$ $u_{i} \sim N (0, σ_{i}^{2} I)$ to $\theta_i(\alpha_i)$ $θ_{i} (α_{i})$ :
- The empirical loss on the training set remains at most $L_S(f_{\Theta^f}) + \epsilon$ , estimated over several samples ( $T$ typically 5–10).
- This is commonly solved by bisection search on $\sigma_i$ .
For each candidate $(\alpha_i, \sigma_i)$ , compute $R(\alpha_i,\sigma_i) = \alpha_i^2 \|\theta_i^f - \theta_i^0\|_F^2 / \sigma_i^2$ . Set $\mu_i$ to the minimum achieved $R$ over all candidates.
Sum across modules to obtain $\mu(f_\Theta)$ .

Common discretization choices are coarse $\alpha$ -grids and logarithmic $\sigma$ search. A small threshold $\epsilon$ (e.g., $0.01$) is recommended to ensure only slight degradation in training performance.

3. Formal Connection to Generalization

CritiCore module criticality admits a PAC–Bayesian generalization bound. For chosen $\alpha_i$ and variance $\sigma_i^2$ , consider the posterior with each module's weights centered at $\theta_i(\alpha_i)$ and perturbed by Gaussian noise. The following bound holds (Theorem 3.1 (Chatterji et al., 2019)):

$\mathbb{E}_U[L_D(f_{\Theta^{\alpha}+U})] \leq \mathbb{E}_U[L_S(f_{\Theta^{\alpha}+U})] + \sqrt{ \frac{ \frac{1}{4} \sum_i k_i \log\left[1 + \frac{\alpha_i^2\|\theta_i^f - \theta_i^0\|_F^2}{k_i \sigma_i^2}\right] + \log(m/\delta) + O(1) }{m-1} }$

Optimizing over ( $\alpha_i$ , $\sigma_i$ ), subject to the empirical loss constraint, yields (Corollary 3.2) a bound on the true risk at the trained weights:

$L_D(f_{\Theta^f}) \leq \epsilon + \sqrt{ \frac{ \frac{1}{4}\mu(f_\Theta) + \log(m/\delta) + O(1) }{m-1} }$

Thus, smaller network criticality $\mu(f_\Theta)$ leads to a tighter upper bound on test error, aligning criticality directly with generalization ability.

4. Key Empirical Findings

Empirical evaluation on ResNet-18 trained on CIFAR-10 highlights the following:

Rewinding experiments at module level: Certain modules (“Stage2.block1.conv2”) exhibit large increases in training error when reset to initialization—deemed “critical”—while most modules can be reset with negligible effect, hence “non-critical.”
Loss landscape geometry: Loss plotted along both the interpolation (trained to initialized) and noise directions displays that non-critical modules possess a wide, flat valley to initialization, while critical modules' valleys narrow near initialization, indicating greater sensitivity.
Failure of prior measures: Singular-value spectra, norms such as $\|\theta^f - \theta^0\|$ , and CKA-based activation similarity do not distinguish critical from non-critical modules. Importantly, measures based purely on endpoint distances or single-point flatness miss these nuances.
Correlation with generalization: The architecture ranking by CritiCore ( $\mu(f_\Theta)$ ) achieves higher Kendall $\tau$ correlation with the observed generalization gap compared to norm-based or conventional PAC–Bayesian measures.

5. Comparison to Prior Complexity Measures

A summary contrasting CritiCore module criticality with previously proposed network complexity metrics:

Measure	Limitation Relative to Criticality
Product of Frobenius norms	Only measures magnitude, not valley shape
Product of spectral norms	Ignores sensitivity/flatness
Distance to initialization	Endpoint-only, misses flatness
Sum of spectral-norm-distances	Linearized, fails to rank critical modules
PAC–Bayes at zero noise	Flatness at one point, not distance toward initialization

CritiCore uniquely combines the extent of displacement from initialization (how far) with local flatness (how robust); it aggregates both the normed distance and the tolerable noise, yielding discrimination where earlier approaches do not.

6. Algorithmic Implementation

A high-level pseudocode (verbatim from (Chatterji et al., 2019)) for empirical module criticality computation is:

μ = 0
for i in 1..d:
    best_R = +∞
    Δ = norm_F(θᵢᶠ – θᵢ⁰)
    for α in α_grid:  # e.g. α_grid = [0,0.1,…,1]
        θ̄ᵢ = (1–α)*θᵢ⁰ + α*θᵢᶠ
        # Find max σ where noise leaves train-error <= orig + ε
        σ_max = binary_search_on_σ(low=0, high=σ_upper):
            for trial in 1..T:  # T ~ 5–10
                u ~ N(0, σ^2 I)
                Θ_test = Θᶠ; Θ_test[i] = θ̄ᵢ + u
                accs[trial] = train_error(f_{Θ_test}, S)
            if mean(accs) <= orig_error + ε: OK σ else too_large
        R = (α^2 * Δ^2) / (σ_max^2 + tiny)
        best_R = min(best_R, R)
    μᵢ = best_R
    μ += μᵢ
return {μᵢ}, μ

Typical choices include

\epsilon \sim 0.01

, coarse

\alpha

-grids, logarithmic

\sigma

search, and

T=5

–$10$ samples for estimation. Refinement can target promising

\alpha

regions.

7. Significance and Interpretive Summary

CritiCore module criticality provides a rigorous, operationally meaningful measure that probes the loss-valley structure between initialization and final weights, considering both permissible parameter drift without significant accuracy loss and the flatness of the loss around the traversed path. As it connects directly to PAC–Bayesian generalization bounds and delineates which modules fundamentally influence generalization, CritiCore unifies explanatory and predictive roles, outperforming linear or norm-based measures that ignore valley geometry or inter-module heterogeneity (Chatterji et al., 2019). This framework thus addresses both “why these modules matter” and “which architectures generalize better,” with empirical validation and efficient computational algorithms.

Markdown Report Issue Upgrade to Chat

References (1)

The intriguing role of module criticality in the generalization of deep networks (2019)

Topic to Video (Beta)

No one has generated a video about this topic yet.

Whiteboard

No one has generated a whiteboard explanation for this topic yet.

Follow Topic

Get notified by email when new papers are published related to CritiCore Module.

CritiCore Module: Quantifying DNN Module Criticality

1. Mathematical Definition

2. Practical Computation Procedure

3. Formal Connection to Generalization

4. Key Empirical Findings

5. Comparison to Prior Complexity Measures

6. Algorithmic Implementation

7. Significance and Interpretive Summary

Topic to Video (Beta)

Whiteboard

Follow Topic

Continue Learning

Don't miss out on important new AI/ML research

CritiCore Module: Quantifying DNN Module Criticality

1. Mathematical Definition

2. Practical Computation Procedure

3. Formal Connection to Generalization

4. Key Empirical Findings

5. Comparison to Prior Complexity Measures

6. Algorithmic Implementation

7. Significance and Interpretive Summary

Topic to Video (Beta)

Whiteboard

Follow Topic

Continue Learning

Related Topics

Don't miss out on important new AI/ML research

Sign up for free to explore the frontiers of research