
Zero-Inflated Continuous Optimization (ZICO)

Updated 19 December 2025
  • Zero-Inflated Continuous Optimization (ZICO) is a framework for learning DAG structures from zero-inflated count data using specialized ZI-GLMs.
  • It integrates sparsity regularization and a differentiable acyclicity constraint to effectively distinguish structural zeros from sampling zeros.
  • Empirical results on simulated networks and transcriptomics data show ZICO achieves faster, more accurate causal structure recovery than standard methods.

Zero-Inflated Continuous Optimization (ZICO) is a framework for learning the structure of directed acyclic graphs (DAGs) from zero-inflated count data. ZICO formulates the structure learning problem as a smooth, constrained optimization involving node-wise zero-inflated generalized linear models (ZI-GLMs), sparsity regularization, and a differentiable acyclicity constraint. The method is designed to distinguish structural zeros from sampling zeros—a critical challenge in contexts such as gene regulatory network inference, single-cell transcriptomics, and other domains in which excess zeros occur due to underlying biological or measurement processes. ZICO enables scalable and accurate recovery of causal structures in settings where standard methods fail to model zero inflation effectively (Sato et al., 18 Dec 2025).

1. Problem Formulation and Motivation

The essential input is an $n \times d$ data matrix

$$X = (x_{i1}, \ldots, x_{id})_{i=1}^{n}, \qquad x_{ij} \in \mathbb{N}_0,$$

characterized by high rates of exact zeros (“zero-inflation”). Standard DAG learning procedures—such as NOTEARS, greedy equivalence search (GES), or SCORE—are ill-equipped for zero-inflated settings, typically assuming continuous or unadjusted count models (e.g., Poisson), which do not distinguish between structural zeros (from explicit zero-inflation mechanisms) and sampling zeros. This leads to systematically biased edge scores and impaired structure recovery.
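
To see why, consider a single zero-inflated Poisson variable: part of its zeros come from an inactive (structural) state and part from the count process itself. The following sketch, with purely illustrative parameter values not drawn from the paper, simulates this and shows how a plain Poisson fit understates the zero rate:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
pi = 0.6   # probability that the count component is active (illustrative)
mu = 2.0   # Poisson mean of the count component (illustrative)

active = rng.random(n) < pi          # is the count process active?
counts = rng.poisson(mu, size=n)
x = np.where(active, counts, 0)      # inactive -> structural zero

print(f"observed zero rate:       {(x == 0).mean():.3f}")
print(f"structural zeros:         {np.sum(~active)}")
print(f"sampling (Poisson) zeros: {np.sum(active & (counts == 0))}")
# A plain Poisson fit (mean = x.mean()) predicts far fewer zeros:
print(f"zero rate under a plain Poisson fit: {np.exp(-x.mean()):.3f}")
```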

The goal is to infer a weighted adjacency matrix $W \in \mathbb{R}^{d \times d}$, where the $(k,j)$ entry $w_{kj}$ encodes the directed influence from node $k$ to node $j$, under a DAG constraint enforced on $W$.

2. Zero-Inflated Generalized Linear Models

For each node $j$, ZICO models conditional distributions using a two-component ZI-GLM mixture. Specialized forms are as follows:

2.1 Zero-Inflated Poisson (ZIP)

For a sample $i$ and node $j$:

$$
p\bigl(x_{ij} \mid x_i;\, w_j^{(0)}, w_j^{(1)}, \gamma_j, \delta_j\bigr) =
\begin{cases}
(1 - \pi_{ij}) + \pi_{ij}\, e^{-\mu_{ij}}, & x_{ij} = 0, \\[4pt]
\pi_{ij}\, \dfrac{\mu_{ij}^{x_{ij}}}{x_{ij}!}\, e^{-\mu_{ij}}, & x_{ij} > 0,
\end{cases}
$$

where

$$\pi_{ij} = \mathrm{sigmoid}\bigl(\gamma_j + w_j^{(0)\,T} x_i\bigr), \qquad \mu_{ij} = \exp\bigl(\delta_j + w_j^{(1)\,T} x_i\bigr).$$

The logit link parametrizes structural zeros, and the log link models the mean of the count component.
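
A direct NumPy transcription of this node-conditional likelihood follows (an illustrative sketch built from the formulas above, not the authors' implementation; `w0`, `w1`, `gamma`, `delta` stand for $w_j^{(0)}, w_j^{(1)}, \gamma_j, \delta_j$):

```python
import numpy as np
from scipy.special import expit, gammaln  # expit = numerically stable sigmoid

def zip_nll(x_j, X, w0, w1, gamma, delta):
    """Average negative log-likelihood of node j under the ZIP model above.

    x_j    : (n,) observed counts for node j
    X      : (n, d) count matrix; parents of j enter via the sparse w0, w1
    w0, w1 : (d,) coefficient vectors (the j-th entries held at zero)
    """
    pi = expit(gamma + X @ w0)   # P(count component active), logit link
    eta = delta + X @ w1         # log link for the Poisson mean
    mu = np.exp(eta)

    zero = x_j == 0
    ll = np.empty(len(x_j))
    # x = 0: structural zero or a zero drawn from the Poisson component
    ll[zero] = np.log((1.0 - pi[zero]) + pi[zero] * np.exp(-mu[zero]))
    # x > 0: count component only: log pi + x*eta - mu - log(x!)
    k = x_j[~zero]
    ll[~zero] = np.log(pi[~zero]) + k * eta[~zero] - mu[~zero] - gammaln(k + 1.0)
    return -ll.mean()
```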

2.2 Zero-Inflated Negative Binomial (ZINB)

Adds a dispersion parameter $r_j > 0$:

$$p(x_{ij} = 0) = (1 - \pi_{ij}) + \pi_{ij} \Bigl(\frac{r_j}{r_j + \mu_{ij}}\Bigr)^{r_j},$$

$$p(x_{ij} > 0) \propto \pi_{ij} \binom{x_{ij} + r_j - 1}{x_{ij}} \Bigl(\frac{r_j}{r_j + \mu_{ij}}\Bigr)^{r_j} \Bigl(\frac{\mu_{ij}}{r_j + \mu_{ij}}\Bigr)^{x_{ij}}.$$

Each node's parameters are collectively $\theta_j = (w_j^{(0)}, w_j^{(1)}, \gamma_j, \delta_j, r_j)$.
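
The ZINB counterpart changes only the count-component density; a matching sketch (again illustrative, with `r` playing the role of $r_j$ and the negative-binomial coefficient evaluated via `gammaln`):

```python
import numpy as np
from scipy.special import expit, gammaln

def zinb_nll(x_j, X, w0, w1, gamma, delta, r):
    """Average negative log-likelihood of node j under the ZINB model above;
    same sketch-level caveats as zip_nll."""
    pi = expit(gamma + X @ w0)
    mu = np.exp(delta + X @ w1)
    log_p = np.log(r) - np.log(r + mu)    # log( r / (r + mu) )
    log_q = np.log(mu) - np.log(r + mu)   # log( mu / (r + mu) )

    zero = x_j == 0
    ll = np.empty(len(x_j))
    # x = 0: structural zero or an NB zero, which has probability (r/(r+mu))^r
    ll[zero] = np.log((1.0 - pi[zero]) + pi[zero] * np.exp(r * log_p[zero]))
    # x > 0: log NB pmf = log C(k+r-1, k) + r*log_p + k*log_q, via gammaln
    k = x_j[~zero]
    ll[~zero] = (np.log(pi[~zero])
                 + gammaln(k + r) - gammaln(r) - gammaln(k + 1.0)
                 + r * log_p[~zero] + k * log_q[~zero])
    return -ll.mean()
```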

3. Smooth Score-Based Objective and Regularization

The aggregate (negative) log-likelihood is

$$L(W_0, W_1, \gamma, \delta, r) = -\frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{d} \ell_{ij},$$

where $\ell_{ij}$ denotes the ZI-GLM log-likelihood of $x_{ij}$ with node-specific coefficients, $W_0 = [w_1^{(0)}, \ldots, w_d^{(0)}]$, and $W_1$ is defined analogously.

To induce sparsity, ZICO incorporates an $\ell_1$ or group-$\ell_1$ penalty:

$$R(W_0, W_1) = \sum_{j=1}^{d} \sum_{k \neq j} \bigl\| (w_{kj}^{(0)}, w_{kj}^{(1)}) \bigr\|_2,$$

or, for elementwise sparsity, $\|W_0\|_1 + \|W_1\|_1$.

ZICO minimizes the smooth objective

$$f(W_0, W_1, \gamma, \delta, r) = L(W_0, W_1, \gamma, \delta, r) + \lambda R(W_0, W_1).$$
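
As a small sketch of how these pieces combine (reusing `zip_nll` from the ZIP sketch above; the column convention for $W_0, W_1$ is an assumption made for illustration):

```python
import numpy as np

def group_penalty(W0, W1):
    """Group-l1 penalty: l2 norm of each edge's coefficient pair (w0_kj, w1_kj)."""
    norms = np.sqrt(W0 ** 2 + W1 ** 2)   # (d, d) edgewise group norms
    np.fill_diagonal(norms, 0.0)         # exclude self-loops k = j
    return norms.sum()

def objective(X, W0, W1, gamma, delta, lam):
    """Penalized score f = L + lam * R; column j of W0/W1 parametrizes node j."""
    d = X.shape[1]
    L = sum(zip_nll(X[:, j], X, W0[:, j], W1[:, j], gamma[j], delta[j])
            for j in range(d))
    return L + lam * group_penalty(W0, W1)
```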

4. Differentiable Acyclicity Constraints

ZICO enforces global DAG structure via a differentiable surrogate constraint:

$$h(W) = \operatorname{Tr}\bigl(e^{W \circ W}\bigr) - d = 0,$$

where $W \circ W$ denotes elementwise squaring; $h$ vanishes if and only if the directed graph with adjacency matrix $|W|$ has no directed cycles.

The method requires the graphs implied by both $W_0$ and $W_1$ to be acyclic, ensuring a proper DAG structure for each set of weights.
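
This constraint and its gradient, $\nabla h(W) = \bigl(e^{W \circ W}\bigr)^{T} \circ 2W$, are cheap to evaluate with a matrix exponential; the computation below is the standard NOTEARS-style evaluation, sketched here for reference:

```python
import numpy as np
from scipy.linalg import expm

def acyclicity(W):
    """h(W) = tr(exp(W ∘ W)) - d and its gradient; h = 0 iff |W| is acyclic."""
    E = expm(W * W)             # matrix exponential of the elementwise square
    h = np.trace(E) - W.shape[0]
    grad = E.T * (2.0 * W)      # dh/dW
    return h, grad
```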

5. Constrained Optimization Framework

The core optimization problem is

$$\min_{W_0, W_1, \gamma, \delta, r} f(W_0, W_1, \gamma, \delta, r) \quad \text{s.t.}\quad h(W_0) = 0, \; h(W_1) = 0.$$

This is solved using an augmented Lagrangian or penalty functional, e.g.,

$$\mathcal{L}_{\rho}(W) = f(W) + \alpha\, h(W) + \frac{\rho}{2}\, h(W)^2,$$

with central-path-style updates over the dual variable $\alpha$ and penalty parameter $\rho$, alternating with gradient-based steps (AdamW) and mini-batch likelihood evaluation.

Regularization scheduling includes cosine annealing for $\lambda$ (promoting gradual sparsification) and decay of $\rho$ to adjust constraint enforcement dynamically. After each gradient step, proximal (soft-thresholding) operations are applied for elementwise $\ell_1$ penalization. Convergence is determined by primal feasibility ($\|h(W)\|$), dual residuals, and the change in objective value.
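
A schematic of the resulting loop, using plain proximal gradient steps in place of AdamW and placeholder schedules (the step sizes, iteration counts, and update rules below are illustrative assumptions, not the paper's settings; `acyclicity` is the sketch above, and `grad_f` returns the gradient of the smooth likelihood term $L$, with the $\ell_1$ part handled by the prox):

```python
import numpy as np

def soft_threshold(W, tau):
    """Proximal operator of tau * ||W||_1 (elementwise soft-thresholding)."""
    return np.sign(W) * np.maximum(np.abs(W) - tau, 0.0)

def fit_dag_sketch(grad_f, W, lam=1e-2, lr=1e-3, alpha=0.0, rho=1.0,
                   outer=20, inner=200, tol=1e-8):
    """Schematic augmented-Lagrangian loop; diagonal handling omitted."""
    for _ in range(outer):
        for _ in range(inner):
            h, grad_h = acyclicity(W)
            grad = grad_f(W) + (alpha + rho * h) * grad_h  # grad of L_rho
            W = soft_threshold(W - lr * grad, lr * lam)    # prox l1 step
        h, _ = acyclicity(W)
        if abs(h) < tol:       # primal feasibility reached
            break
        alpha += rho * h       # dual (multiplier) update
        rho *= 10.0            # common choice; ZICO schedules rho dynamically
    return W
```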

6. Theoretical Properties and Computational Complexity

Under standard smoothness assumptions, limit points of the augmented Lagrangian approach correspond to Karush–Kuhn–Tucker points of the original constrained problem. Local convergence rates can be derived under standard assumptions on the Hessian of $f$.

The computational cost of a gradient step is $O(|B|\, d^2 + d^3)$, where $|B|$ is the batch size; for $d \lesssim 500$, the cubic term is negligible. The algorithm scales linearly in $n$ via mini-batching and can achieve linear or even sublinear complexity in $d$ by exploiting sparsity and low-rank approximations of the acyclicity constraint.

7. Empirical Evaluation and Applications

Empirical studies demonstrate ZICO’s advantages in structure recovery from zero-inflated count data. Principal findings include:

  • Simulated Erdős–Rényi (ER) networks ($d = 20, 30, 50$; $n = 500$): ZICO (with ZINB) yields the lowest structural Hamming distance (SHD) and structural intervention distance (SID) among ten compared methods. For $d = 50$: TPR ≈ 0.78, FDR ≈ 0.41, SHD ≈ 180 in approximately 55 s, whereas ZiGDAG reports SHD ≈ 283 in ≈ 18,000 s.
  • Barabási–Albert graphs: ZICO–ZINB achieves SHD ≈ 74, SID ≈ 1247 in ≈ 49 s, outperforming NOTEARS (Poisson variant) and greedy search approaches.
  • Single-cell transcriptomics (scMultiSim): ZICO (ZIP/ZINB) achieves AUPRC ratios up to three times random baseline, comparable to or outperforming GENIE3, SINCERITIES, LEAP, NOTEARS, and GRNBoost2, with explicit acyclicity enforcement.
  • Dropout settings: ZICO–ZIP/ZINB outperforms pure Poisson/NB models, affirming the necessity of explicit zero-inflation modeling under nontrivial zero-generating mechanisms.

ZICO provides efficient, vectorized, and mini-batched learning for large-scale settings, making it suitable for reverse engineering gene regulatory networks and for broader contexts with zero-inflated counts. The method recovers zero-inflated causal structures more accurately than greedy search or MCMC-based alternatives, and one to two orders of magnitude faster (Sato et al., 18 Dec 2025).
