Criticality-Derived Weighting
- Criticality-derived weighting is a framework that assigns importance to data or network elements based on critical states like phase transitions and marginal stability.
- It integrates methodologies from deep learning, data assimilation, and reinforcement learning to enhance training dynamics and overall system performance.
- Empirical studies demonstrate that this approach improves robustness and prediction accuracy by emphasizing rare or highly influential samples and structures.
Criticality-derived weighting encompasses a set of methodologies where the relative importance of elements—examples, samples, network parameters, or infrastructure nodes—is assigned according to measures of “criticality” drawn from statistical physics, optimization theory, human behavior, or data-driven signals. These schemes are unified by the notion that systems (physical, computational, or sociotechnical) exhibit optimality, robustness, or maximal functional range near points of criticality (phase transitions, marginal stability, peak uncertainty, or maximal influence). Criticality-derived weighting frameworks appear in domains as varied as deep learning, data assimilation, resilience analysis, offline reinforcement learning, and neural network initialization. The following sections review key principles, formal methodologies, canonical models, algorithmic prescriptions, and cross-domain impacts, drawing on recent research across multiple fields.
1. Foundations of Criticality-Derived Weighting
Criticality, in this context, refers to the heightened dynamical, informational, or functional relevance of system components when certain quantitative criteria are met—often associated with phase transitions, marginal stability, or maximal influence over global system behavior. Criticality-derived weighting schemes prescribe that these components be upweighted in objective functions, sampling distributions, or network initialization, either for improved training dynamics, robustness, sampling fidelity, or societal resilience.
In deep learning, the central observation is that the gradient magnitude with respect to a sample's output logits (i.e., $\lVert \partial \ell_i / \partial z_i \rVert$, where $z_i$ are the logits of sample $i$) directly quantifies its “pull” on the model; large-magnitude gradients mark “critical” data points for parameter updates (Wang et al., 2019). In ensemble data assimilation, weights attached to critical points are determined by both a data-mismatch functional and a local Jacobian determinant, reflecting local posterior geometry (Ba et al., 2023). Human-centered resilience metrics combine behavioral dependence, structural substitutability, and access patterns to generate per-facility criticality weights that modulate regional vulnerability scores (Ma et al., 18 Dec 2025). Offline RL leverages signals from uncertainty quantification or long-tail events to amplify data from rare, high-risk, or information-dense samples, directly injecting criticality-derived sampling probabilities into objective functions (Guillen-Perez, 25 Aug 2025). Finally, in network theory and statistical field perspectives, criticality appears as a set of initialization and architectural prescriptions ensuring maximal depth-to-width dynamical range and stable signal propagation (Sundberg et al., 1 Aug 2025).
2. Mathematical Schemes and Formal Definitions
Mathematical formalizations of criticality-derived weights vary by domain but share common elements: a criticality signal is computed per element (sample, configuration, facility, etc.), then normalized to produce nonnegative weights for use in reweighting objectives, sampling, or predictions.
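This shared pattern can be made concrete. The sketch below is a hypothetical, domain-agnostic version: it maps an arbitrary per-element criticality signal to nonnegative, normalized weights; the `temperature` knob and its default are illustrative additions, not taken from any of the cited sources.

```python
import numpy as np

def criticality_weights(signal: np.ndarray, temperature: float = 1.0,
                        eps: float = 1e-12) -> np.ndarray:
    """Map a raw per-element criticality signal to normalized weights.

    `temperature` sharpens (<1) or flattens (>1) the emphasis; both the
    name and the default are illustrative, not from the sources.
    """
    s = np.clip(signal, 0.0, None) ** (1.0 / temperature)  # enforce nonnegativity
    return s / (s.sum() + eps)                             # normalize to sum to one

# Example: four elements, the third carries a strong criticality signal.
w = criticality_weights(np.array([0.1, 0.2, 5.0, 0.3]))
print(w)  # most mass lands on the third element
```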
Deep Learning Loss Manipulation (DM)
In DM, the training objective for a model with parameters $\theta$ and logits $z_i$ for sample $i$ is altered by prescribing a desired emphasis density $w(p_i)$ over the softmax probability $p_i$:
- Compute the model output $z_i$ and $p_i = \mathrm{softmax}(z_i)_{y_i}$, the probability assigned to the true class $y_i$.
- Define the target gradient magnitude $w(p_i)$, e.g. in polynomial, exponential, or normal (Gaussian) form.
- Rescale the standard loss gradient so that $\lVert \partial \ell_i / \partial z_i \rVert \propto w(p_i)$.
- Optionally normalize weights within a batch: $w_i \leftarrow w_i / \sum_j w_j$.
- Inject the modified gradients in backpropagation (Wang et al., 2019); a minimal sketch follows this list.
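A minimal NumPy sketch of the recipe above, assuming a softmax classifier whose cross-entropy gradient at the logits is $p - y$ (one-hot $y$); the Gaussian emphasis density is one illustrative choice among the polynomial, exponential, and normal forms mentioned, and the `mode`/`sigma` defaults are hypothetical.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def dm_gradients(logits, labels, mode=0.5, sigma=0.2, eps=1e-12):
    """Derivative-manipulated per-sample logit gradients (sketch).

    The standard cross-entropy gradient is p - y; DM rescales each
    sample's gradient so its magnitude follows a target emphasis
    density w(p_true). Gaussian emphasis here is illustrative.
    """
    n = logits.shape[0]
    p = softmax(logits)
    y = np.zeros_like(p)
    y[np.arange(n), labels] = 1.0
    g = p - y                                    # standard CE gradient
    p_true = p[np.arange(n), labels]             # probability of the true class
    w = np.exp(-0.5 * ((p_true - mode) / sigma) ** 2)  # emphasis density w(p)
    w = w / (w.sum() + eps)                      # batch normalization step
    scale = w / (np.linalg.norm(g, axis=1) + eps)
    return g * scale[:, None]                    # now ||grad_i|| ∝ w(p_i)

grads = dm_gradients(np.random.randn(8, 10), np.random.randint(0, 10, 8))
```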
Ensemble-Based Data Assimilation
Criticality weights for each critical-point sample $x_j$ take the form $w_j \propto \exp(-O(x_j))\,\lvert\det J_j\rvert$, where $O(x_j)$ is a quadratic data-mismatch (combining prior deviation and data misfit) and $J_j$ is the local Jacobian of the mapping from prior samples to critical points. Under linear updates, $\det J_j$ is constant and effectively cancels after normalization; for hybrid nonlinear mappings, $\det J_j$ varies across samples, often leading to multimodal posteriors and nontrivial sample efficacy (Ba et al., 2023).
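A minimal sketch of this weighting step, assuming the data-mismatch values $O(x_j)$ and Jacobian determinants have already been computed per critical point; the log-space normalization and the effective-sample-size diagnostic (referenced again in Section 6) are standard importance-sampling additions, not specific to the source.

```python
import numpy as np

def rml_weights(mismatch, jac_det, eps=1e-300):
    """Importance weights w_j ∝ exp(-O(x_j)) * |det J_j| (sketch).

    Computed in log space for numerical stability, then normalized.
    """
    log_w = -np.asarray(mismatch) + np.log(np.abs(jac_det) + eps)
    log_w -= log_w.max()                 # guard against overflow
    w = np.exp(log_w)
    return w / w.sum()

def effective_sample_size(w):
    """ESS = (sum w)^2 / sum w^2; small values flag weight degeneracy."""
    return w.sum() ** 2 / np.sum(w ** 2)

w = rml_weights(mismatch=[1.2, 0.4, 3.1], jac_det=[0.9, 1.1, 0.5])
print(w, effective_sample_size(w))
```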
Functional Criticality in Infrastructure
The functional criticality score for facility $f$ is $C_f = \sum_{o} V_{of}\, S_o$, where $V_{of}$ is the visit count from origin $o$ to facility $f$ and $S_o$ is the substitutability of origin $o$. These scores are linearly normalized to $[0,1]$ and used as multiplicative weights in downstream risk assessment (Ma et al., 18 Dec 2025).
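A minimal sketch of this score, assuming the only inputs are an origin-by-facility visit matrix `V` and a per-origin substitutability vector `s` (both hypothetical names); the min-max rescaling implements the linear normalization to $[0,1]$ described above.

```python
import numpy as np

def facility_criticality(V: np.ndarray, s: np.ndarray) -> np.ndarray:
    """C_f = sum_o V[o, f] * s[o], min-max normalized to [0, 1] (sketch)."""
    c = V.T @ s                            # substitutability-weighted visits
    lo, hi = c.min(), c.max()
    return (c - lo) / (hi - lo) if hi > lo else np.ones_like(c)

# Toy example: 3 origins, 2 facilities.
V = np.array([[10, 0], [5, 5], [0, 20]], dtype=float)
s = np.array([0.9, 0.5, 0.8])
print(facility_criticality(V, s))          # per-facility weights in [0, 1]
```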
Offline RL Long-Tail and Uncertainty Weighting
Timestep- or scenario-level criticality signals $c_i$ (e.g., model uncertainty, action rarity, heuristic risk) are normalized, $w_i = c_i / \sum_j c_j$, and introduced into the weighted loss or sampling distribution of the RL agent (Guillen-Perez, 25 Aug 2025).
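A minimal sketch of criticality-weighted sampling under the normalization above, assuming per-timestep criticality scores are already computed; `numpy.random.Generator.choice` stands in here for the WeightedRandomSampler route described in Section 3.

```python
import numpy as np

def sample_batch(scores: np.ndarray, batch_size: int,
                 rng: np.random.Generator) -> np.ndarray:
    """Draw transition indices with probability proportional to criticality."""
    p = np.clip(scores, 0.0, None)
    p = p / p.sum()                        # w_i = c_i / sum_j c_j
    return rng.choice(len(scores), size=batch_size, replace=True, p=p)

rng = np.random.default_rng(0)
scores = np.array([0.01, 0.02, 0.9, 0.03, 0.04])  # one rare, high-risk step
idx = sample_batch(scores, batch_size=4, rng=rng)
print(idx)  # the critical index 2 dominates the batch
```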
Table 1: Examples of Criticality-Weighting Function Forms
| Domain | Criticality Signal | Weighting Formula |
|---|---|---|
| Deep nets | softmax probability $p_i$ | $\lVert \partial \ell_i/\partial z_i \rVert \propto w(p_i)$, $w_i \leftarrow w_i/\sum_j w_j$ |
| Data assimilation | data-mismatch $O(x_j)$, Jacobian $J_j$ | $w_j \propto \exp(-O(x_j))\,\lvert\det J_j\rvert$ |
| Resilience | visits $V_{of}$, substitutability $S_o$ | $C_f = \sum_o V_{of}\,S_o$, normalized to $[0,1]$ |
| RL | uncertainty, rarity, risk $c_i$ | $w_i = c_i/\sum_j c_j$ |
Across settings, the weighting function is designed to emphasize, suppress, or equilibrate components as a function of their role in critical system behavior.
3. Algorithmic Prescriptions and Implementation
The practical implementation of criticality-derived weighting depends on the recognition of criticality signals, normalization, and their injection into core algorithms.
- Derivative Manipulation (DM): Compute per-example criticality via the forward model, determine $w(p_i)$, normalize, and rescale backpropagation gradients accordingly. This enables direct control over which regions of $p_i$ (“easy”, “hard”, “intermediate”) are stressed during optimization, subsuming categorical cross-entropy, mean absolute or squared error, focal loss, and other sample-reweighting protocols as special cases (Wang et al., 2019).
- Weighted RML in Data Assimilation: After assembling perturbed prior ensembles, each sample’s criticality is computed via a joint data-mismatch functional and (potentially sample-dependent) Jacobian, then incorporated into importance-weighted sampling for marginal posterior estimation. When hybrid models are present, local curvature and nonlinearity produce sample-specific weights essential for capturing multimodal posteriors (Ba et al., 2023).
- Human-Centered Infrastructure Analysis: Populate an origin–facility matrix from behavioral mobility records, compute substitutability-adjusted dependence per facility, normalize within lifeline type, and apply as weights in the aggregation of hazard-exposure or vulnerability indices. The framework aligns infrastructure criticality assessments with real-world use rather than asset-centric proxies (Ma et al., 18 Dec 2025).
- Offline RL Data Curation: After quantifying criticality via kinematic risk, interaction scores, action rarity, or model-ensemble uncertainty, normalize sample scores and use them as sampling weights in batch stochastic optimization. Empirically, per-timestep uncertainty weighting maximizes reactive safety, while scenario-level weighting improves long-horizon planning. Implementations leverage a WeightedRandomSampler or scenario-resampling primitives in data pipelines (Guillen-Perez, 25 Aug 2025).
- Critical Network Initialization: Statistical field theory and renormalization-group analysis yield hyperparameter formulas for weight/bias variance and depth/width ratios that ensure criticality of signal propagation, stable training, and optimal dynamical range. For ReLU networks, the prescription is weight variance $\sigma_w^2 = 2/\text{fan-in}$ and bias variance $\sigma_b^2 = 0$, with learning rates scaled by $1/L$ and depth-to-width ratios $L/n$ kept small (Sundberg et al., 1 Aug 2025); a minimal initialization sketch follows this list. In organizational Ising models, weights are iteratively fitted to reproduce critical long-range correlation structure, yielding maximal mutual information and behavioral transitions in embodied controllers (Aguilera et al., 2017).
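As referenced in the list above, here is a minimal sketch of the ReLU criticality prescription as it is commonly realized in code, assuming fully connected layers: weight variance $2/\text{fan-in}$ (He-style), zero biases, and a base learning rate scaled by $1/L$. The layer sizes and base rate are illustrative, not from the source.

```python
import numpy as np

def init_critical_relu(sizes, rng):
    """He-style critical initialization: Var(W) = 2 / fan_in, b = 0."""
    params = []
    for fan_in, fan_out in zip(sizes[:-1], sizes[1:]):
        W = rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))
        b = np.zeros(fan_out)              # sigma_b^2 = 0 at criticality
        params.append((W, b))
    return params

sizes = [64, 256, 256, 256, 1]             # keep depth-to-width ratio small
L = len(sizes) - 1
lr = 0.1 / L                               # learning rate scaled by 1/L
params = init_critical_relu(sizes, np.random.default_rng(0))
```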
4. Empirical Performance and Cross-Domain Impact
Empirical studies report that criticality-derived weighting strategies consistently outperform conventional uniform or heuristic weighting under data imbalance, noise, complex dynamics, or risk-propagation settings.
- Deep Models: DM yields substantial gains on vision and language tasks with severe label noise or class imbalance (e.g., CIFAR-100 at 40% noise: accuracy increases from 53.2% to 61.0%; Clothing1M: 73.3% vs. best prior 72.2%) (Wang et al., 2019).
- Data Assimilation: Weighted RML with criticality weighting enables accurate posterior estimation in highly non-Gaussian, multimodal settings (e.g., hierarchical Gaussian models, nonlinear permeability transforms); hybrid weighting outperforms standard iterative ensemble smoothers in non-convex regimes (Ba et al., 2023).
- Resilience Planning: Functional criticality analysis exposes deeply concentrated behavioral dependence, with a small minority of facilities absorbing the majority of functional risk (2.8% of grocery stores, 14.8% of hospitals classified as highly critical); normalized criticality weights drive population-weighted vulnerability indices, revealing that climate-induced flood vulnerability grows disproportionately in critical service nodes (Ma et al., 18 Dec 2025).
- Offline RL: Non-uniform sampling by model uncertainty reduces collision rates by nearly a factor of three (from 16.0% to 5.5%) in autonomous driving, compared to baseline CQL agents trained with uniform sampling. Scenario-level criticality weighting optimizes planning, while per-timestep weighting directly improves safety and comfort metrics (Guillen-Perez, 25 Aug 2025).
- Neural Network Training: Critical initialization and architecture scaling (ReLU, $\sigma_w^2 = 2/\text{fan-in}$) enable stable training with stochastic gradient descent in nuclear binding energy models, achieving few-MeV final errors. Noncritical initialization or suboptimal depth-to-width ratios lead to instability or degraded performance (Sundberg et al., 1 Aug 2025).
5. Methodological Variants and Domain Extensions
Significant methodological diversity exists within criticality-derived weighting, as evidenced by differences in the underlying criticality signals and normalization procedures:
- Sample criticality: Gradient magnitude or model uncertainty as immediate indicators of “critical” data; directly impacts loss functions in supervised/deep learning and RL.
- Correlation-driven weighting: Weight matrices learned to match critical correlations from physical models, as in Ising-based architectures for embodied agents; emphasizes scale-free, maximally informative dynamics (Aguilera et al., 2017). A Boltzmann-learning sketch follows this list.
- Jacobian-augmented importance: In ensemble data assimilation, the Jacobian determinant modulates sample weights to correct for local curvature and nonlinearity, especially essential for multimodal or ill-posed problems (Ba et al., 2023).
- Behavioral functional dependence: Facility importance assessed by behavioral (mobility-derived) dependence, substitutability, and catchment size in infrastructure resilience; aligns weighting with real-world systemic impact (Ma et al., 18 Dec 2025).
- RG-based initialization and dynamical criticality: Explicit field-theory analysis provides layerwise or architecture-dependent prescriptions for hyperparameters, ensuring extended stable signal propagation (Sundberg et al., 1 Aug 2025).
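For the correlation-driven variant referenced above (Aguilera et al., 2017), the generic fitting loop is Boltzmann learning: couplings are nudged toward target correlations measured at criticality. The sketch below is a hypothetical minimal version using brute-force state enumeration, viable only for very small spin systems; the target statistics, step size, and iteration count are illustrative.

```python
import numpy as np
from itertools import product

def ising_moments(J, h):
    """Exact correlations <s_i s_j> for a small Ising model by enumeration."""
    n = len(h)
    s = np.array(list(product([-1.0, 1.0], repeat=n)))
    E = -0.5 * np.einsum('ki,ij,kj->k', s, J, s) - s @ h
    p = np.exp(-(E - E.min()))
    p /= p.sum()
    return np.einsum('k,ki,kj->ij', p, s, s)

def boltzmann_fit(C_target, n, steps=500, eta=0.1):
    """Boltzmann learning rule: dJ_ij ∝ C_target_ij - <s_i s_j>_model."""
    J, h = np.zeros((n, n)), np.zeros(n)
    for _ in range(steps):
        dJ = eta * (C_target - ising_moments(J, h))
        np.fill_diagonal(dJ, 0.0)           # no self-couplings
        J += 0.5 * (dJ + dJ.T)              # keep couplings symmetric
    return J

# Toy target: strong pairwise correlation among 3 spins.
C_target = np.full((3, 3), 0.6)
np.fill_diagonal(C_target, 1.0)
J_fit = boltzmann_fit(C_target, n=3)
```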
A plausible implication is that as criticality-derived weighting schemes continue to propagate across scientific and engineering domains, further domain-specific signals of criticality (e.g., energy landscapes, network flow, mutual information) could be operationalized for weighting, driving advanced robustness and adaptivity.
6. Theoretical Significance and Limitations
Criticality-derived weighting offers a unifying physical/statistical foundation for a broad class of weighting, sampling, and initialization schemes. By rooting importance in critical behavior—via gradients, correlation structure, uncertainty, or functional necessity—these frameworks typically maximize information flow, robustness to perturbation (e.g., label noise, rare events, multimodal posteriors), and behavioral flexibility.
However, several limitations are noted across the literature:
- Optimization: For deep networks, many criticality-based analyses are limited to plain SGD and may be superseded empirically by adaptive optimizers; optimality of criticality under all training protocols is not guaranteed (Sundberg et al., 1 Aug 2025).
- Implementation: Functional criticality metrics in infrastructure require extensive, high-fidelity human mobility records, and may be sensitive to temporal or spatial sampling biases (Ma et al., 18 Dec 2025).
- Scalability: In ensemble techniques, the effective sample size ($\mathrm{ESS} = (\sum_j w_j)^2 / \sum_j w_j^2$) may become limiting under sharply multimodal geometries; denoising, regularization, or larger ensembles may be required (Ba et al., 2023).
- Generalization: The universality of criticality as an organizing principle is supported in several models, but domain-specific adjustments, normalization conventions, and limitations on signal extraction must be empirically validated.
7. Synthesis and Outlook
Criticality-derived weighting constitutes a mathematically tractable, physically motivated, and empirically validated design principle for robustly weighting components in complex systems across machine learning, inference, control, and resilience analysis. Key attributes include:
- Transparent mapping from criticality signal to sample/facility/parameter weight.
- Subsumption and generalization of existing loss reweighting, hard-mining, curriculum, or regularization schemes.
- Universal applicability in domains with identifiable phase transitions, uncertainty concentration, or dynamically marginal regimes.
As research progresses, deeper integration of criticality-derived weighting with adaptive optimization, high-resolution behavioral data, and mechanistic scientific models is expected to drive new advances in system robustness, controllability, and interpretability across domains.