
Green-Enrichment & Red-Depletion Statistics

Updated 29 December 2025
  • Green-enrichment and red-depletion statistics are defined as the excess or deficit of observed events relative to a null model, using metrics like z-scores and hypergeometric tests.
  • The methodology combines analytical approximations, permutation methods, and saddlepoint techniques to provide efficient and accurate quantification of statistical signals.
  • Applications span spatial omics, genomics, photochemistry, and language model watermarking, illustrating the interdisciplinary impact of these statistical frameworks.

Green-enrichment and red-depletion statistics quantify the degree to which objects, events, or entities labeled "green" or "red" are over- or under-represented relative to a null expectation. The terminology is context-specific but recurs across spatial statistics, genomics, signal detection, network science, and photochemistry, wherever enrichment (excess relative to the null) and depletion (deficit relative to the null) are central to quantitative analysis. The operational definitions and analytical machinery range from closed-form z-scores and exact hypergeometric intersection tests to saddlepoint approximations and physically interpretable emission ratios.

1. Statistical Definitions and Conceptual Framework

The term "green-enrichment" signifies an observed excess of "green" category interactions or events compared to what would be expected under a suitable null model; "red-depletion" denotes a statistically significant deficit of "red" events. Statistical significance is typically established either via analytic approximations (z-scores), permutation or Monte Carlo resampling, or, for exact calculations, hypergeometric or generalized urn models.

Let $o_{A,B}$ denote the observed count of interactions (or features) of type $A$ linked to type $B$. Analytical frameworks specify the null expectation $\mu_{A,B}$ and variance $\sigma_{A,B}^2$ under random assignment or sampling. The canonical test statistic is the standardized z-score:

$$z_{A,B} = \frac{o_{A,B} - \mu_{A,B}}{\sqrt{\sigma_{A,B}^2}}$$

A z-score $z_{G,G} \gg 0$ signals green-enrichment; $z_{R,R} \ll 0$ signals red-depletion. Tail probabilities (p-values) further quantify the statistical evidence against the null (Andersson et al., 23 Jun 2025, Hu et al., 22 Dec 2025, Kalinka, 2013, Stojmirović et al., 2010).
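As a minimal sketch (pure Python; the helper names `z_score` and `one_sided_p` are illustrative and not taken from the cited works), the standardized statistic and its normal-approximation tail probabilities can be computed as follows. The upper tail is the evidence for enrichment and the lower tail the evidence for depletion; relying on the normal approximation is an assumption that can be poor for small counts.

```python
import math


def z_score(observed, mu, var):
    """Standardized statistic z = (o - mu) / sqrt(sigma^2)."""
    return (observed - mu) / math.sqrt(var)


def one_sided_p(z, tail="upper"):
    """Normal-approximation tail probability for a z-score.

    tail="upper": P(Z >= z), evidence for enrichment (z >> 0).
    tail="lower": P(Z <= z), evidence for depletion (z << 0).
    """
    if tail == "upper":
        return 0.5 * math.erfc(z / math.sqrt(2))
    if tail == "lower":
        return 0.5 * math.erfc(-z / math.sqrt(2))
    raise ValueError("tail must be 'upper' or 'lower'")
```

For instance, 130 observed events against $\mu = 100$ and $\sigma^2 = 100$ give $z = 3$ and an upper-tail p-value of about 0.00135.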

2. Analytical Neighborhood Enrichment in Spatial Omics

In spatial omics, green-enrichment and red-depletion statistics test whether cells or transcripts labeled "green" are spatially clustered more than expected, and whether "red" labels are under-clustered, respectively (Andersson et al., 23 Jun 2025). The data consist of $N$ spatial points, each labeled green or red; a neighborhood graph $W$ encodes topological proximity.

Key steps:

  • Compute the neighbor-count vectors $y^G = W b^G$ and $y^R = W b^R$, where $b^G$ and $b^R$ are the green and red indicator vectors.
  • Null model: labels are randomly reassigned with replacement, yielding expected neighbor counts $\mu_B$ and variances $\nu_B$.
  • For $n_A$ points of type $A$, the expected $B$-neighbor count is $\mu_{A,B} = n_A \mu_B$, with variance $\sigma^2_{A,B} = n_A \nu_B$.
  • The observed counts $o_{A,B}$ and analytical z-scores $z_{A,B}$ are computed as above.
  • Positive $z_{G,G}$: more green-green edges than expected; negative $z_{R,R}$: fewer red-red edges than expected.
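The steps above can be sketched in pure Python as follows. This is an illustrative reconstruction, not the estimator of Andersson et al.: it assumes each label is drawn independently with probability equal to its observed frequency, so a point of degree $d_i$ has a Binomial($d_i$, $p_B$) count of $B$-neighbors; $\mu_B$ and $\nu_B$ average that count over a randomly chosen point (law of total variance); and the $n_A$ type-$A$ points are treated as independent, ignoring shared neighbors. The function name `neighborhood_z` and the adjacency-list form of $W$ are hypothetical.

```python
import math
from statistics import mean, pvariance


def neighborhood_z(neighbors, labels, a, b):
    """Analytical z-score for type-`a` points having type-`b` neighbors.

    neighbors: adjacency lists (neighbors[i] lists the neighbors of point i),
               i.e. the graph W in a hypothetical list form.
    labels:    one category label per point.
    Assumed null: every label is drawn independently, with replacement,
    at its observed frequency.
    """
    n = len(labels)
    p_b = labels.count(b) / n                    # null probability of label b
    degrees = [len(nb) for nb in neighbors]
    # y^B = W b^B: number of b-labelled neighbors of each point
    y_b = [sum(labels[j] == b for j in nb) for nb in neighbors]
    # Per-point null mean/variance of the b-neighbor count: Binomial(d_i, p_b)
    # averaged over a randomly chosen point (law of total variance).
    mu_b = p_b * mean(degrees)
    nu_b = p_b * (1 - p_b) * mean(degrees) + p_b ** 2 * pvariance(degrees)
    n_a = labels.count(a)
    o_ab = sum(y for y, lab in zip(y_b, labels) if lab == a)  # observed o_{a,b}
    mu_ab, var_ab = n_a * mu_b, n_a * nu_b                     # mu_{A,B}, sigma^2_{A,B}
    return (o_ab - mu_ab) / math.sqrt(var_ab)
```

On an 8-node ring with labels `GGGGRRRR`, the green-green z-score is $\sqrt{2}$ (six observed versus four expected green neighbors of green points); alternating labels `GRGRGRGR` give $-2\sqrt{2}$.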

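The analytical method is substantially more efficient than permutation-based approaches, costing $O(E+N)$ rather than $O(E \cdot N_{MC})$ for a graph with $E$ edges and $N_{MC}$ Monte Carlo permutations, while its z-scores show an empirical correlation of at least 0.95 with Monte Carlo z-scores (Andersson et al., 23 Jun 2025).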
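For comparison, a Monte Carlo baseline under a permutation null might look like the sketch below. Each replicate recounts every edge, which is where the $O(E \cdot N_{MC})$ cost comes from. Shuffling labels samples without replacement, which differs slightly from the with-replacement null described for the analytical method; `mc_neighborhood_z`, `n_mc`, and `seed` are hypothetical names.

```python
import random
import statistics


def mc_neighborhood_z(neighbors, labels, a, b, n_mc=1000, seed=0):
    """Permutation (Monte Carlo) z-score for type-`a` points with type-`b` neighbors."""
    rng = random.Random(seed)

    def count(lab):
        # o_{a,b}: b-labelled neighbors summed over all a-labelled points;
        # every edge is visited, so each call costs O(E).
        return sum(1 for i, nb in enumerate(neighbors) if lab[i] == a
                   for j in nb if lab[j] == b)

    observed = count(labels)
    shuffled = list(labels)
    null_counts = []
    for _ in range(n_mc):              # n_mc replicates -> O(E * n_mc) in total
        rng.shuffle(shuffled)          # permutation null: labels without replacement
        null_counts.append(count(shuffled))
    return (observed - statistics.mean(null_counts)) / statistics.stdev(null_counts)
```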

3. Enrichment and Depletion in Combinatorial, Genomic, and Intersection Analysis

In combinatorial settings such as genomics or co-localization analysis, green-enrichment and red-depletion describe whether the observed intersection between categories (e.g., shared genes or features) is larger or smaller than expected under the null hypothesis. Under the null, the size $X$ of the intersection of two random samples (green and red channels) drawn from a universe of $n$ items follows the hypergeometric distribution:

$$P(X=k) = \frac{\binom{n_{green}}{k}\binom{n-n_{green}}{n_{red}-k}}{\binom{n}{n_{red}}}$$

One-tailed p-values for enrichment ($P_{\rm enrich} = \Pr[X \geq k_{obs}]$) and depletion ($P_{\rm deplete} = \Pr[X \leq k_{obs}]$) directly quantify statistical significance (Kalinka, 2013). This framework generalizes to $N$ channels (urns), with the intersection PMF computed via nested sums (Kalinka, 2013).
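The two-channel case can be computed exactly with the Python standard library, as in the sketch below (function names are illustrative; this is not the API of the hint package). The support of $X$ runs from $\max(0, n_{red} - (n - n_{green}))$ to $\min(n_{green}, n_{red})$, and both tails are summed over that support. For very large universes, `scipy.stats.hypergeom` provides the same distribution.

```python
from math import comb


def intersection_pmf(k, n, n_green, n_red):
    """Hypergeometric P(X = k) for the green/red intersection size."""
    return comb(n_green, k) * comb(n - n_green, n_red - k) / comb(n, n_red)


def intersection_pvalues(k_obs, n, n_green, n_red):
    """Return (P_enrich, P_deplete) = (P[X >= k_obs], P[X <= k_obs])."""
    # Support of X: the overlap cannot exceed either set, and must be large
    # enough that the red sample fits into the non-green items.
    support = range(max(0, n_red - (n - n_green)), min(n_green, n_red) + 1)
    pmf = {k: intersection_pmf(k, n, n_green, n_red) for k in support}
    p_enrich = sum(p for k, p in pmf.items() if k >= k_obs)
    p_deplete = sum(p for k, p in pmf.items() if k <= k_obs)
    return p_enrich, p_deplete
```

For example, with $n = 10$ and $n_{green} = n_{red} = 5$, a complete overlap ($k_{obs} = 5$) has $P_{\rm enrich} = 1/252 \approx 0.004$ and $P_{\rm deplete} = 1$.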

The R package hint implements these calculations for both enrichment and depletion, including for more than two channels.

4. Weighted-Sum Methods and Saddlepoint Approximation

Weighted-sum approaches, such as the SaddleSum method, extend enrichment and depletion metrics to entities with real-valued weights (e.g., expression levels). For a term $T$ of size $m$ with weights $w_i$, the sum $S_T = \sum_{i \in T} w_i$ is compared to the null distribution of sums of $m$ random weights. Using the empirical cumulant-generating function $K(\theta)$ of the weights and a saddlepoint approximation, the tail probability for green-enrichment ($P[S_T \ge s]$) or red-depletion ($P[S_T \le s]$) is estimated efficiently; for the upper tail,

$$P(S_T \ge s) \approx 1 - \Phi(\hat{y}) + \phi(\hat{y})\left(\frac{1}{\hat{z}} - \frac{1}{\hat{y}}\right)$$

where the saddlepoint $\hat{\theta}$ solves $K'(\hat{\theta}) = s/m$, $\hat{y} = \operatorname{sgn}(\hat{\theta})\sqrt{2\,(\hat{\theta} s - m K(\hat{\theta}))}$, and $\hat{z} = \hat{\theta}\sqrt{m\,K''(\hat{\theta})}$ (Stojmirović et al., 2010). This approach yields accurate significance estimates even for small term sizes and does not require arbitrary dichotomization of weights.
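A minimal pure-Python sketch of the saddlepoint tail follows. It assumes the null sum behaves like $m$ independent draws from the empirical weight distribution and evaluates the classical Lugannani–Rice tail formula with $\hat{y} = \operatorname{sgn}(\hat{\theta})\sqrt{2(\hat{\theta}s - mK(\hat{\theta}))}$ and $\hat{z} = \hat{\theta}\sqrt{mK''(\hat{\theta})}$. The bracket-and-bisect root finder, the function names, and the error raised near $\hat{\theta} = 0$ (where the formula is singular) are our own choices, not necessarily those of SaddleSum.

```python
import math


def _cgf_terms(weights, theta):
    """Empirical CGF K(theta) = log(mean_j exp(theta * w_j)) and K', K''.

    The exponent is shifted by its maximum (log-sum-exp) to avoid overflow.
    """
    shift = max(theta * w for w in weights)
    e = [math.exp(theta * w - shift) for w in weights]
    total = sum(e)
    k1 = sum(ei * w for ei, w in zip(e, weights)) / total               # K'(theta)
    k2 = sum(ei * w * w for ei, w in zip(e, weights)) / total - k1 * k1  # K''(theta)
    k0 = shift + math.log(total / len(weights))                         # K(theta)
    return k0, k1, k2


def saddlesum_upper_tail(weights, m, s):
    """Lugannani-Rice approximation of P(S_T >= s) for a sum of m weights."""
    x = s / m
    if not min(weights) < x < max(weights):
        raise ValueError("s/m must lie strictly inside the range of the weights")
    # Bracket the saddlepoint: K' increases from min(w) to max(w).
    lo, hi = -1.0, 1.0
    while _cgf_terms(weights, hi)[1] < x:
        hi *= 2
    while _cgf_terms(weights, lo)[1] > x:
        lo *= 2
    for _ in range(200):  # bisection on K'(theta) = s/m
        mid = 0.5 * (lo + hi)
        if _cgf_terms(weights, mid)[1] < x:
            lo = mid
        else:
            hi = mid
    theta = 0.5 * (lo + hi)
    if abs(theta) < 1e-8:
        raise ValueError("s is too close to the null mean; the formula is singular")
    k0, _, k2 = _cgf_terms(weights, theta)
    y_hat = math.copysign(math.sqrt(max(0.0, 2 * (theta * s - m * k0))), theta)
    z_hat = theta * math.sqrt(m * k2)
    upper_phi = 0.5 * math.erfc(y_hat / math.sqrt(2))   # 1 - Phi(y_hat)
    density = math.exp(-0.5 * y_hat * y_hat) / math.sqrt(2 * math.pi)
    return upper_phi + density * (1 / z_hat - 1 / y_hat)
```

With weights `[0.0, 1.0]` the null sum is Binomial($m$, 1/2), which gives a quick sanity check: for $m = 20$ and $s = 15$ the approximation falls between the exact tails $P(S \ge 16) \approx 0.0059$ and $P(S \ge 15) \approx 0.0207$, as expected for a continuous approximation to a lattice distribution. For continuous weights, the depletion tail $P(S_T \le s)$ can be taken as the complement.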

5. Signal Detection and Watermarking in LLMs

In the context of digital watermarking, green-enrichment and red-depletion statistics serve as detection metrics for the presence of a "triple-set" watermark in LLM outputs (Hu et al., 22 Dec 2025). Tokens are classified into Green, Yellow, and Red sets via context-dependent partitioning; during watermarked generation, only tokens from the Green and Yellow sets are sampled.

For a generated text $x_1, \ldots, x_L$:

  • Green and Red hit rates $\hat{p}_G = S_G / L$ and $\hat{p}_R = S_R / L$ are computed, where $S_G$ and $S_R$ count the tokens that fall in the Green and Red sets.
  • Under the null (no watermark), hit counts are binomially distributed with target ratios $\gamma_g$ and $\gamma_r$.
  • Green-enrichment z-score: $Z_G = (\hat{p}_G - \gamma_g) / \sqrt{\gamma_g (1-\gamma_g)/L}$
  • Red-depletion z-score: $Z_R = (\gamma_r - \hat{p}_R) / \sqrt{\gamma_r (1-\gamma_r)/L}$
  • One-sided p-values $p_G = 1-\Phi(Z_G)$ and $p_R = 1-\Phi(Z_R)$ are aggregated by Fisher's method.

This statistical pipeline yields high detection accuracy at low false-positive rates (≈0.5%) (Hu et al., 22 Dec 2025).
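The detection statistics above can be sketched in a few lines of Python. The sketch assumes the Green/Yellow/Red membership of each token has already been recovered (the context-dependent partitioning itself is not reproduced) and represents it as a hypothetical sequence of `'G'`, `'Y'`, `'R'` labels. Fisher's method combines the two p-values as $X = -2(\ln p_G + \ln p_R)$, referred to a $\chi^2$ distribution with 4 degrees of freedom, whose survival function has the closed form $e^{-x/2}(1 + x/2)$. Fisher's method formally assumes independent p-values, which Green and Red counts from the same text do not strictly satisfy.

```python
import math


def watermark_scores(labels, gamma_g, gamma_r):
    """Green-enrichment / red-depletion z-scores and Fisher-combined p-value.

    labels: hypothetical per-token set membership, each 'G', 'Y' or 'R'.
    gamma_g, gamma_r: null (unwatermarked) Green and Red ratios.
    """
    L = len(labels)
    p_hat_g = labels.count("G") / L          # S_G / L
    p_hat_r = labels.count("R") / L          # S_R / L
    z_g = (p_hat_g - gamma_g) / math.sqrt(gamma_g * (1 - gamma_g) / L)
    z_r = (gamma_r - p_hat_r) / math.sqrt(gamma_r * (1 - gamma_r) / L)
    # One-sided p-values 1 - Phi(z), floored at a tiny value so that
    # log() stays finite when a z-score is very large.
    p_g, p_r = (max(0.5 * math.erfc(z / math.sqrt(2)), 1e-300) for z in (z_g, z_r))
    x = -2.0 * (math.log(p_g) + math.log(p_r))   # Fisher statistic
    p_combined = math.exp(-x / 2) * (1 + x / 2)  # chi^2 (4 dof) survival function
    return z_g, z_r, p_combined
```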

6. Physical and Photochemical Interpretations in Cometary Emissions

In molecular astrophysics, green and red refer to the forbidden atomic oxygen lines at 5577 Å ("green") and 6300 + 6364 Å ("red doublet"). The green-to-red intensity ratio ($G/R$) serves as a compositional diagnostic of cometary comae. "Green enrichment" refers to $G/R \gg 0.1$, indicating increased O(¹S) production, typically due to enhanced CO₂ abundance or to sampling the inner, collisionally dominated coma. "Red depletion" ($G/R \ll 0.1$) occurs for H₂O-dominated comae with low O(¹S) yields and large projected observing apertures (Bhardwaj et al., 2012, Raghuram et al., 2014).

The coupled chemistry-emission models specify balance equations for metastable oxygen densities and emission intensities, allowing analytic or numerical computation of $G/R$ under varying physical parameters:

$$G/R = \frac{I_{5577}}{I_{6300} + I_{6364}}$$

Model results show that moderate CO₂/H₂O ratios (≳5–10%) can increase $G/R$ well above 0.1, while pure H₂O cases saturate at $G/R \approx 0.03$–$0.06$ (Raghuram et al., 2014).

7. Green-Enrichment and Red-Depletion in Oscillator Networks

In the blue-green-red Kuramoto–Sakaguchi oscillator model, "green-enrichment" and "red-depletion" quantify lead–lag relationships between the phase centroids of agent networks. The phase differences $\alpha_{BG}$ (Blue minus Green) and $\alpha_{BR}$ (Blue minus Red) encode these phenomena:

  • $\alpha_{BG} < 0$: Green phase advanced relative to Blue ("Green-enrichment").
  • $\alpha_{BR} > 0$: Blue leads Red ("Red-depletion").
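As a small illustration of how these lead–lag quantities can be read off phase data, the sketch below takes each population's centroid as the argument of its complex order parameter $\frac{1}{n}\sum_j e^{i\theta_j}$ (the standard Kuramoto convention, assumed here) and wraps the differences into $[-\pi, \pi]$. It does not simulate the Kuramoto–Sakaguchi dynamics of Zuparic et al.; the function names are hypothetical.

```python
import cmath
import math


def centroid_phase(phases):
    """Phase of the population's order parameter (1/n) * sum_j exp(i * theta_j)."""
    return cmath.phase(sum(cmath.exp(1j * t) for t in phases) / len(phases))


def lead_lag(blue, green, red):
    """alpha_BG = Blue - Green and alpha_BR = Blue - Red, wrapped to [-pi, pi]."""
    psi_b, psi_g, psi_r = (centroid_phase(p) for p in (blue, green, red))
    alpha_bg = math.remainder(psi_b - psi_g, 2 * math.pi)
    alpha_br = math.remainder(psi_b - psi_r, 2 * math.pi)
    return {
        "alpha_BG": alpha_bg,
        "alpha_BR": alpha_br,
        "green_ahead_of_blue": alpha_bg < 0,   # "Green-enrichment"
        "blue_leads_red": alpha_br > 0,        # "Red-depletion"
    }
```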

Critical thresholds for these transitions are derived from the steady-state solutions and stability analysis of the reduced centroid dynamics. Nonlinear mixing induces regions where, due to cross-population frustration, Green can move ahead of Blue as Blue attempts to stay ahead of Red—a three-way emergent effect (Zuparic et al., 2020).


In conclusion, green-enrichment and red-depletion provide a shared vocabulary for over-representation (enrichment) and under-representation (depletion) across a variety of scientific domains and, in the statistical settings, a rigorous inferential framework. The specific realization, whether analytic z-scores, classical urn-based hypergeometric models, saddlepoint approximations, or physical emission ratios, depends on the domain's data structure and inferential aims. Analytical advances have enabled rigorous, scalable computation and interpretation in settings ranging from spatial omics to network dynamics and astrophysical spectroscopy.
