Green-Enrichment & Red-Depletion Statistics
- Green-enrichment and red-depletion statistics are defined as the excess or deficit of observed events relative to a null model, using metrics like z-scores and hypergeometric tests.
- The methodology combines analytical approximations, permutation methods, and saddlepoint techniques to provide efficient and accurate quantification of statistical signals.
- Applications span spatial omics, genomics, photochemistry, and language model watermarking, illustrating the interdisciplinary impact of these statistical frameworks.
Green-enrichment and red-depletion statistics quantify the degree to which objects, events, or entities labeled as "green" or "red" are over- or under-represented relative to a null expectation. The terminology is context-specific but recurs across spatial statistics, genomics, signal detection, network science, and photochemistry, wherever enrichment (excess relative to null) and depletion (deficit relative to null) are central to quantitative analysis. The operational definitions and analytical machinery for these statistics span closed-form z-score expressions, hypergeometric and saddlepoint approximations, intersection testing, and physically interpretable emission ratios.
1. Statistical Definitions and Conceptual Framework
The term "green-enrichment" signifies an observed excess of "green" category interactions or events compared to what would be expected under a suitable null model; "red-depletion" denotes a statistically significant deficit of "red" events. Statistical significance is typically established via analytic approximations (z-scores), via permutation or Monte Carlo resampling, or, for exact calculations, via hypergeometric or generalized urn models.
Let $x_{ab}$ denote the observed count of interactions (or features) of type $a$ linked to type $b$. Analytical frameworks specify the null expectation $\mathbb{E}[x_{ab}]$ and variance $\mathrm{Var}[x_{ab}]$ under random assignment or sampling. The canonical test statistic is the standardized z-score:

$$z_{ab} = \frac{x_{ab} - \mathbb{E}[x_{ab}]}{\sqrt{\mathrm{Var}[x_{ab}]}}$$

A z-score $z_{ab} \gg 0$ signals green-enrichment; $z_{ab} \ll 0$ signals red-depletion. Tail probabilities (p-values) further quantify the statistical evidence against the null (Andersson et al., 23 Jun 2025, Hu et al., 22 Dec 2025, Kalinka, 2013, Stojmirović et al., 2010).
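As a minimal sketch of the standardized score and its tail probabilities, the snippet below assumes the null expectation and variance are already known and that a normal approximation to the null is acceptable. The function names are illustrative and do not come from any of the cited works.

```python
import math


def enrichment_zscore(observed, expected, variance):
    """Standardize an observed count against its null mean and variance."""
    if variance <= 0:
        raise ValueError("null variance must be positive")
    return (observed - expected) / math.sqrt(variance)


def normal_tail_pvalues(z):
    """Return (upper, lower) one-sided p-values under a standard normal null.

    The upper tail is the evidence for enrichment, the lower tail for depletion.
    """
    upper = 0.5 * math.erfc(z / math.sqrt(2.0))
    lower = 0.5 * math.erfc(-z / math.sqrt(2.0))
    return upper, lower


# Hypothetical numbers: 30 observed events against a null mean of 20, variance 25.
z = enrichment_zscore(observed=30, expected=20, variance=25)
p_enrich, p_deplete = normal_tail_pvalues(z)
print(f"z = {z:.2f}, p_enrich = {p_enrich:.4f}, p_deplete = {p_deplete:.4f}")
```

A large positive `z` with a small `p_enrich` corresponds to enrichment; a large negative `z` with a small `p_deplete` corresponds to depletion.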
2. Analytical Neighborhood Enrichment in Spatial Omics
In spatial omics, green-enrichment and red-depletion statistics test whether cells or transcripts labeled "green" are spatially clustered more than expected, and whether "red" labels are under-clustered, respectively (Andersson et al., 23 Jun 2025). The data consist of spatial points each labeled green or red; a neighborhood graph encodes topological proximity.
Key steps:
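- Compute the neighbor-count vectors $n_G = A\,\mathbf{1}_G$ and $n_R = A\,\mathbf{1}_R$, where $A$ is the adjacency matrix of the neighborhood graph and $\mathbf{1}_G$, $\mathbf{1}_R$ are the green and red indicator vectors.
- Null model: labels are randomly reassigned with replacement, yielding expected neighbor counts and their variances.
- For points of a given type $a$, the null expectation and variance of the $b$-neighbor count are evaluated analytically rather than by resampling.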
- The observed counts and analytical z-scores are computed as above.
- Positive $z_{GG}$: more green-green edges than expected; negative $z_{RR}$: fewer red-red edges than expected.
The analytical method substantially outperforms permutation-based approaches in efficiency (O(E+N) vs O(E·N_MC)), with empirical correlation ≥0.95 to Monte Carlo z-scores (Andersson et al., 23 Jun 2025).
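To make the neighbor-count step concrete, here is a small sketch of the Monte Carlo baseline that the analytical method is compared against. It is not the analytical estimator itself. It assumes a dense NumPy adjacency matrix and shuffles labels by permutation (without replacement), which is the usual resampling baseline and may differ slightly from the null described above. All names are illustrative.

```python
import numpy as np


def neighbor_counts(adj, labels, target):
    """Per-node number of neighbors carrying the `target` label (A @ indicator)."""
    indicator = (labels == target).astype(float)
    return adj @ indicator


def permutation_zscore(adj, labels, src, dst, n_perm=500, seed=0):
    """z-score of the src -> dst neighbor count against shuffled labels."""
    rng = np.random.default_rng(seed)

    def pair_count(lab):
        # Sum, over nodes of type `src`, of their `dst`-labeled neighbors.
        return float(neighbor_counts(adj, lab, dst)[lab == src].sum())

    observed = pair_count(labels)
    null = np.array([pair_count(rng.permutation(labels)) for _ in range(n_perm)])
    return (observed - null.mean()) / null.std(ddof=1)


# Toy example: a path graph whose first half is green and second half is red.
n = 40
adj = np.zeros((n, n))
for i in range(n - 1):
    adj[i, i + 1] = adj[i + 1, i] = 1.0
labels = np.array(["green"] * 20 + ["red"] * 20)

z_gg = permutation_zscore(adj, labels, "green", "green")
z_gr = permutation_zscore(adj, labels, "green", "red")
print(f"z_GG = {z_gg:.2f}, z_GR = {z_gr:.2f}")
```

In this toy layout the green-green score is strongly positive and the green-red score strongly negative. Each permutation costs one matrix-vector product, which is the per-sample cost that the analytical approach avoids.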
3. Enrichment and Depletion in Combinatorial, Genomic, and Intersection Analysis
In combinatorial settings, such as genomics or co-localization, green-enrichment and red-depletion describe the probability of observing intersections (e.g., genes or features) among categories exceeding or falling below the null hypothesis. The null distribution for the size $v$ of the intersection of two random samples (green and red channels) of sizes $a$ and $b$ from a universe of $n$ items is given by the hypergeometric probability:

$$P(v) = \frac{\binom{a}{v}\binom{n-a}{b-v}}{\binom{n}{b}}$$

One-tailed p-values for enrichment ($P(X \ge v)$) and depletion ($P(X \le v)$) directly quantify statistical significance (Kalinka, 2013). This framework generalizes to more than two channels/urns, with the intersection PMF computed via nested sums (Kalinka, 2013).
The widely used R package "hint" provides these calculations for both enrichment and depletion, including for more than two channels.
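For the two-channel case, the hypergeometric intersection test can be sketched with the Python standard library alone. This is an illustrative reimplementation under the assumption that each channel is a simple random sample without replacement from the same universe. It does not reproduce the interface of the "hint" package, and the parameter names (`n` for universe size, `a` and `b` for sample sizes, `v` for intersection size) are chosen here for the example.

```python
from math import comb


def intersection_pmf(v, n, a, b):
    """P(intersection size == v) for samples of sizes a and b from n items."""
    lo, hi = max(0, a + b - n), min(a, b)
    if v < lo or v > hi:
        return 0.0
    return comb(a, v) * comb(n - a, b - v) / comb(n, b)


def enrichment_pvalue(v, n, a, b):
    """Upper tail: probability of an intersection at least as large as v."""
    return sum(intersection_pmf(k, n, a, b) for k in range(v, min(a, b) + 1))


def depletion_pvalue(v, n, a, b):
    """Lower tail: probability of an intersection at most as large as v."""
    return sum(intersection_pmf(k, n, a, b) for k in range(max(0, a + b - n), v + 1))


# Hypothetical example: 1000 genes, two lists of 100 and 120, sharing 25.
print(enrichment_pvalue(25, 1000, 100, 120))
print(depletion_pvalue(25, 1000, 100, 120))
```

Because both tails include the observed value, the two p-values sum to slightly more than one; the excess is exactly the probability of the observed intersection size.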
4. Weighted-Sum Methods and Saddlepoint Approximation
Weighted-sum approaches, such as the SaddleSum method, extend enrichment/depletion metrics to cases where the entities possess real-valued weights (e.g., expression levels). For a term of size $m$ with weights $w_1, \dots, w_m$, the sum $S = \sum_{i=1}^{m} w_i$ is compared to the null distribution of sums of $m$ random weights. Using empirical cumulant-generating functions and saddlepoint approximation, the tail probability for green-enrichment ($P(S \ge s)$) or red-depletion ($P(S \le s)$) is efficiently estimated from a closed-form expression whose terms are derived from the saddlepoint equations (Stojmirović et al., 2010). This approach achieves accurate statistical significance even for small term sizes and does not require arbitrary dichotomization of weights.
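The following sketch illustrates a saddlepoint tail estimate for a weighted sum. It assumes the standard Lugannani-Rice form and a null in which the $m$ weights are drawn independently, with replacement, from the empirical weight pool; whether this matches the exact expression used by SaddleSum is not confirmed here, so treat it as an illustration of the technique rather than that tool's implementation. All function names are hypothetical.

```python
import math


def _tilted_moments(weights, t):
    """Log-MGF, mean and variance of the weight pool tilted by exp(t * w)."""
    shift = max(t * w for w in weights)  # subtract the max exponent for stability
    e = [math.exp(t * w - shift) for w in weights]
    total = sum(e)
    mean = sum(w * ei for w, ei in zip(weights, e)) / total
    var = sum((w - mean) ** 2 * ei for w, ei in zip(weights, e)) / total
    log_mgf = shift + math.log(total / len(weights))
    return log_mgf, mean, var


def saddlepoint_upper_tail(weights, m, s):
    """Approximate P(S >= s) for S = sum of m i.i.d. draws from `weights`."""
    if not m * min(weights) < s < m * max(weights):
        raise ValueError("s must lie strictly inside the attainable range")
    # Solve K'(t) = s by bisection; K'(t) = m * tilted mean is increasing in t.
    lo, hi = -1.0, 1.0
    while m * _tilted_moments(weights, lo)[1] > s:
        lo *= 2.0
    while m * _tilted_moments(weights, hi)[1] < s:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if m * _tilted_moments(weights, mid)[1] < s:
            lo = mid
        else:
            hi = mid
    t = 0.5 * (lo + hi)
    if abs(t) < 1e-6:
        return 0.5  # crude fallback at the null mean; ignores skewness
    log_mgf, _, var = _tilted_moments(weights, t)
    w_hat = math.copysign(math.sqrt(max(0.0, 2.0 * (t * s - m * log_mgf))), t)
    u_hat = t * math.sqrt(m * var)
    upper_normal = 0.5 * math.erfc(w_hat / math.sqrt(2.0))
    density = math.exp(-0.5 * w_hat * w_hat) / math.sqrt(2.0 * math.pi)
    return upper_normal + density * (1.0 / u_hat - 1.0 / w_hat)


# Hypothetical weight pool: 201 evenly spaced values on [-1, 1].
pool = [-1.0 + 0.01 * i for i in range(201)]
print(saddlepoint_upper_tail(pool, m=12, s=4.0))
```

The depletion tail follows as one minus the returned value for continuous weights. Working directly with the real-valued weights is what removes the need to threshold them into "hit" and "non-hit" sets.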
5. Signal Detection and Watermarking in LLMs
In the context of digital watermarking, green-enrichment and red-depletion statistics serve as detection metrics for the presence of a "triple-set" watermark in LLM outputs (Hu et al., 22 Dec 2025). Tokens are classified into Green, Yellow, and Red sets via context-dependent partitioning; only Green and Yellow sets are sampled during watermarking.
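For a generated text $x = (x_1, \dots, x_T)$ of $T$ tokens:
- Green and Red hit rates $\hat{p}_G = n_G / T$ and $\hat{p}_R = n_R / T$ are computed, where $n_G$ and $n_R$ are the Green and Red hit counts.
- Under the null (no watermark), hits are binomially distributed with target ratios $\gamma_G$ and $\gamma_R$.
- Green-enrichment z-score: $z_G = (\hat{p}_G - \gamma_G) / \sqrt{\gamma_G (1 - \gamma_G) / T}$, with large positive values indicating enrichment.
- Red-depletion z-score: $z_R = (\hat{p}_R - \gamma_R) / \sqrt{\gamma_R (1 - \gamma_R) / T}$, with large negative values indicating depletion.
- One-sided p-values $p_G$ and $p_R$ are aggregated by Fisher's method.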
This statistical pipeline yields high detection accuracy at low false-positive rates (≈0.5%) (Hu et al., 22 Dec 2025).
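A rough sketch of this detection pipeline is given below. It assumes a normal approximation to the binomial null, treats a negative Red z-score as evidence of depletion, and combines the two one-sided p-values with Fisher's method as if they were independent, which is a simplification because both counts come from the same tokens. The target ratios and counts in the example are made up, and the function names are not taken from the cited work.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def binomial_z(hits, n_tokens, target_ratio):
    """z-score of a hit count under Binomial(n_tokens, target_ratio)."""
    expected = n_tokens * target_ratio
    sd = math.sqrt(n_tokens * target_ratio * (1.0 - target_ratio))
    return (hits - expected) / sd


def fisher_combine(p_values):
    """Fisher's method: -2 * sum(log p) is chi-square with 2k degrees of freedom."""
    k = len(p_values)
    half = -sum(math.log(max(p, 1e-300)) for p in p_values)  # equals statistic / 2
    # Closed-form chi-square survival function for an even number of dof.
    return math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(k))


def detect(n_green, n_red, n_tokens, gamma_green, gamma_red):
    z_green = binomial_z(n_green, n_tokens, gamma_green)
    z_red = binomial_z(n_red, n_tokens, gamma_red)
    p_green = normal_cdf(-z_green)  # upper tail: too many Green hits
    p_red = normal_cdf(z_red)       # lower tail: too few Red hits
    return {"z_green": z_green, "z_red": z_red,
            "p_combined": fisher_combine([p_green, p_red])}


# Hypothetical text of 100 tokens with target ratios 0.5 (Green) and 0.25 (Red).
print(detect(n_green=70, n_red=5, n_tokens=100, gamma_green=0.5, gamma_red=0.25))
```

A detection threshold on `p_combined` then sets the false-positive rate under the stated assumptions.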
6. Physical and Photochemical Interpretations in Cometary Emissions
In molecular astrophysics, green and red refer to the forbidden atomic oxygen lines at 5577 Å ("green") and 6300 + 6364 Å ("red-doublet"). The green-to-red intensity ratio ($G/R$) serves as a compositional diagnostic of cometary comae. "Green enrichment" refers to an elevated $G/R$, indicating increased O(¹S) production, typically due to enhanced CO₂ abundance or sampling of the inner, collisionally dominated coma. "Red depletion" (a low $G/R$) occurs for H₂O-dominated comae with low O(¹S) yields and large projected observing apertures (Bhardwaj et al., 2012, Raghuram et al., 2014).
The coupled chemistry-emission models specify balance equations for metastable oxygen densities and emission intensities, allowing analytic or numerical computation of $G/R$ under varying physical parameters. Model results show that moderate CO₂/H₂O ratios (≳5–10%) can increase $G/R$ well above 0.1, while pure H₂O cases saturate at ≈ 0.03–0.06 (Raghuram et al., 2014).
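The diagnostic itself is a simple ratio of line intensities, sketched below. The function name and the example intensities are made up for illustration; the inputs only need to share the same units.

```python
def green_to_red_ratio(i_5577, i_6300, i_6364):
    """Green (5577 A) intensity divided by the red-doublet (6300 + 6364 A) sum."""
    red_doublet = i_6300 + i_6364
    if red_doublet <= 0:
        raise ValueError("red-doublet intensity must be positive")
    return i_5577 / red_doublet


# Hypothetical line intensities in arbitrary but consistent units.
ratio = green_to_red_ratio(i_5577=10.0, i_6300=75.0, i_6364=25.0)
print(f"G/R = {ratio:.2f}")
```

Comparing the resulting value against the model ranges quoted above is what turns the ratio into a compositional indicator.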
7. Green-Enrichment and Red-Depletion in Oscillator Networks
In the blue-green-red Kuramoto–Sakaguchi oscillator model, "green-enrichment" and "red-depletion" quantify lead–lag relationships between phase centroids of agent networks. The phase differences $\Delta_{BG} = \phi_B - \phi_G$ (Blue minus Green) and $\Delta_{BR} = \phi_B - \phi_R$ (Blue minus Red), where $\phi_B$, $\phi_G$, and $\phi_R$ are the population phase centroids, encode these phenomena:
- $\Delta_{BG} < 0$: Green phase advanced relative to Blue ("Green-enrichment").
- $\Delta_{BR} > 0$: Blue leads Red ("Red-depletion").
Critical thresholds for these transitions are derived from the steady-state solutions and stability analysis of the reduced centroid dynamics. Nonlinear mixing induces regions where, due to cross-population frustration, Green can move ahead of Blue as Blue attempts to stay ahead of Red—a three-way emergent effect (Zuparic et al., 2020).
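The lead-lag quantities can be illustrated with a short sketch that computes phase centroids and wrapped centroid differences from given oscillator phases. It does not simulate the coupled dynamics; the phases below are hypothetical snapshots and the helper names are chosen for this example.

```python
import cmath
import math


def phase_centroid(phases):
    """Angle of the mean unit phasor of a population (radians)."""
    return cmath.phase(sum(cmath.exp(1j * p) for p in phases))


def wrapped_difference(phi_a, phi_b):
    """phi_a - phi_b wrapped into [-pi, pi)."""
    return (phi_a - phi_b + math.pi) % (2.0 * math.pi) - math.pi


# Hypothetical phase snapshots for the three populations.
blue = [0.4, 0.6]
green = [0.7, 0.9]
red = [0.0, 0.2]

delta_bg = wrapped_difference(phase_centroid(blue), phase_centroid(green))
delta_br = wrapped_difference(phase_centroid(blue), phase_centroid(red))
print(f"Blue minus Green = {delta_bg:.2f}, Blue minus Red = {delta_br:.2f}")
```

In this snapshot Blue minus Green is negative (Green ahead of Blue) while Blue minus Red is positive (Blue ahead of Red), which is the combined regime described above.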
In conclusion, green-enrichment and red-depletion provide a statistically rigorous framework for interpreting over-representation (enrichment) and under-representation (depletion) in a variety of scientific domains. The specific realization—whether through analytic z-scores, classical urn-based hypergeometric models, saddlepoint approximations, or physical emission ratios—depends on domain-specific data structure and inferential aims. Analytical advances have enabled rigorous, scalable computation and interpretation in settings ranging from spatial omics to network dynamics and astrophysical spectroscopy.