
Zero-Inflated Continuous Optimization (ZICO)

Updated 19 December 2025
  • Zero-Inflated Continuous Optimization (ZICO) is a framework for learning DAG structures from zero-inflated count data using specialized ZI-GLMs.
  • It integrates sparsity regularization and a differentiable acyclicity constraint to effectively distinguish structural zeros from sampling zeros.
  • Empirical results on simulated networks and transcriptomics data show ZICO achieves faster, more accurate causal structure recovery than standard methods.

Zero-Inflated Continuous Optimization (ZICO) is a framework for learning the structure of directed acyclic graphs (DAGs) from zero-inflated count data. ZICO formulates the structure learning problem as a smooth, constrained optimization involving node-wise zero-inflated generalized linear models (ZI-GLMs), sparsity regularization, and a differentiable acyclicity constraint. The method is designed to distinguish structural zeros from sampling zeros—a critical challenge in contexts such as gene regulatory network inference, single-cell transcriptomics, and other domains in which excess zeros occur due to underlying biological or measurement processes. ZICO enables scalable and accurate recovery of causal structures in settings where standard methods fail to model zero inflation effectively (Sato et al., 18 Dec 2025).

1. Problem Formulation and Motivation

The essential input is an $n \times d$ data matrix

$$X = (x_{i1}, \ldots, x_{id})_{i=1}^{n}, \qquad x_{ij} \in \mathbb{N}_0,$$

characterized by high rates of exact zeros (“zero-inflation”). Standard DAG learning procedures—such as NOTEARS, greedy equivalence search (GES), or SCORE—are ill-equipped for zero-inflated settings, typically assuming continuous or unadjusted count models (e.g., Poisson), which do not distinguish between structural zeros (from explicit zero-inflation mechanisms) and sampling zeros. This leads to systematically biased edge scores and impaired structure recovery.
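
To see why, consider a single zero-inflated Poisson variable: part of its zeros come from an inactive (structural) state and part from the count process itself. The following sketch, with purely illustrative parameter values not drawn from the paper, simulates this and shows how a plain Poisson fit understates the zero rate:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
pi = 0.6   # probability that the count component is active (illustrative)
mu = 2.0   # Poisson mean of the count component (illustrative)

active = rng.random(n) < pi          # is the count process active?
counts = rng.poisson(mu, size=n)
x = np.where(active, counts, 0)      # inactive -> structural zero

print(f"observed zero rate:       {(x == 0).mean():.3f}")
print(f"structural zeros:         {np.sum(~active)}")
print(f"sampling (Poisson) zeros: {np.sum(active & (counts == 0))}")
# A plain Poisson fit (mean = x.mean()) predicts far fewer zeros:
print(f"zero rate under a plain Poisson fit: {np.exp(-x.mean()):.3f}")
```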

The goal is to infer a weighted adjacency matrix $W \in \mathbb{R}^{d \times d}$, where the $(k,j)$ entry $w_{kj}$ encodes the directed influence from node $k$ to node $j$, under a DAG constraint enforced on $W$.

2. Zero-Inflated Generalized Linear Models

For each node $j$, ZICO models conditional distributions using a two-component ZI-GLM mixture. Specialized forms are as follows:

2.1 Zero-Inflated Poisson (ZIP)

For a sample $i$ and node $j$:

$$
p\bigl(x_{ij} \mid x_i;\, w_j^{(0)}, w_j^{(1)}, \gamma_j, \delta_j\bigr) =
\begin{cases}
(1 - \pi_{ij}) + \pi_{ij}\, e^{-\mu_{ij}}, & x_{ij} = 0, \\[4pt]
\pi_{ij}\, \dfrac{\mu_{ij}^{x_{ij}}}{x_{ij}!}\, e^{-\mu_{ij}}, & x_{ij} > 0,
\end{cases}
$$

where

$$\pi_{ij} = \mathrm{sigmoid}\bigl(\gamma_j + w_j^{(0)\,T} x_i\bigr), \qquad \mu_{ij} = \exp\bigl(\delta_j + w_j^{(1)\,T} x_i\bigr).$$

The logit link parametrizes structural zeros, and the log link models the mean of the count component.
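
A direct NumPy transcription of this node-conditional likelihood follows (an illustrative sketch built from the formulas above, not the authors' implementation; `w0`, `w1`, `gamma`, `delta` stand for $w_j^{(0)}, w_j^{(1)}, \gamma_j, \delta_j$):

```python
import numpy as np
from scipy.special import expit, gammaln  # expit = numerically stable sigmoid

def zip_nll(x_j, X, w0, w1, gamma, delta):
    """Average negative log-likelihood of node j under the ZIP model above.

    x_j    : (n,) observed counts for node j
    X      : (n, d) count matrix; parents of j enter via the sparse w0, w1
    w0, w1 : (d,) coefficient vectors (the j-th entries held at zero)
    """
    pi = expit(gamma + X @ w0)   # P(count component active), logit link
    eta = delta + X @ w1         # log link for the Poisson mean
    mu = np.exp(eta)

    zero = x_j == 0
    ll = np.empty(len(x_j))
    # x = 0: structural zero or a zero drawn from the Poisson component
    ll[zero] = np.log((1.0 - pi[zero]) + pi[zero] * np.exp(-mu[zero]))
    # x > 0: count component only: log pi + x*eta - mu - log(x!)
    k = x_j[~zero]
    ll[~zero] = np.log(pi[~zero]) + k * eta[~zero] - mu[~zero] - gammaln(k + 1.0)
    return -ll.mean()
```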

2.2 Zero-Inflated Negative Binomial (ZINB)

Adds a dispersion parameter $r_j > 0$:

$$p(x_{ij} = 0) = (1 - \pi_{ij}) + \pi_{ij} \Bigl(\frac{r_j}{r_j + \mu_{ij}}\Bigr)^{r_j},$$

$$p(x_{ij} > 0) \propto \pi_{ij} \binom{x_{ij} + r_j - 1}{x_{ij}} \Bigl(\frac{r_j}{r_j + \mu_{ij}}\Bigr)^{r_j} \Bigl(\frac{\mu_{ij}}{r_j + \mu_{ij}}\Bigr)^{x_{ij}}.$$

Each node's parameters are collectively $\theta_j = (w_j^{(0)}, w_j^{(1)}, \gamma_j, \delta_j, r_j)$.
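
The ZINB counterpart changes only the count-component density; a matching sketch (again illustrative, with `r` playing the role of $r_j$ and the negative-binomial coefficient evaluated via `gammaln`):

```python
import numpy as np
from scipy.special import expit, gammaln

def zinb_nll(x_j, X, w0, w1, gamma, delta, r):
    """Average negative log-likelihood of node j under the ZINB model above;
    same sketch-level caveats as zip_nll."""
    pi = expit(gamma + X @ w0)
    mu = np.exp(delta + X @ w1)
    log_p = np.log(r) - np.log(r + mu)    # log( r / (r + mu) )
    log_q = np.log(mu) - np.log(r + mu)   # log( mu / (r + mu) )

    zero = x_j == 0
    ll = np.empty(len(x_j))
    # x = 0: structural zero or an NB zero, which has probability (r/(r+mu))^r
    ll[zero] = np.log((1.0 - pi[zero]) + pi[zero] * np.exp(r * log_p[zero]))
    # x > 0: log NB pmf = log C(k+r-1, k) + r*log_p + k*log_q, via gammaln
    k = x_j[~zero]
    ll[~zero] = (np.log(pi[~zero])
                 + gammaln(k + r) - gammaln(r) - gammaln(k + 1.0)
                 + r * log_p[~zero] + k * log_q[~zero])
    return -ll.mean()
```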

3. Smooth Score-Based Objective and Regularization

The aggregate (negative) log-likelihood is

$$L(W_0, W_1, \gamma, \delta, r) = -\frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{d} \ell_{ij},$$

where $\ell_{ij}$ denotes the ZI-GLM log-likelihood of $x_{ij}$ with node-specific coefficients, $W_0 = [w_1^{(0)}, \ldots, w_d^{(0)}]$, and $W_1$ is defined analogously.

To induce sparsity, ZICO incorporates an $\ell_1$ or group-$\ell_1$ penalty:

$$R(W_0, W_1) = \sum_{j=1}^{d} \sum_{k \neq j} \bigl\| (w_{kj}^{(0)}, w_{kj}^{(1)}) \bigr\|_2,$$

or, for elementwise sparsity, $\|W_0\|_1 + \|W_1\|_1$.

ZICO minimizes the smooth objective

$$f(W_0, W_1, \gamma, \delta, r) = L(W_0, W_1, \gamma, \delta, r) + \lambda R(W_0, W_1).$$
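
As a small sketch of how these pieces combine (reusing `zip_nll` from the ZIP sketch above; the column convention for $W_0, W_1$ is an assumption made for illustration):

```python
import numpy as np

def group_penalty(W0, W1):
    """Group-l1 penalty: l2 norm of each edge's coefficient pair (w0_kj, w1_kj)."""
    norms = np.sqrt(W0 ** 2 + W1 ** 2)   # (d, d) edgewise group norms
    np.fill_diagonal(norms, 0.0)         # exclude self-loops k = j
    return norms.sum()

def objective(X, W0, W1, gamma, delta, lam):
    """Penalized score f = L + lam * R; column j of W0/W1 parametrizes node j."""
    d = X.shape[1]
    L = sum(zip_nll(X[:, j], X, W0[:, j], W1[:, j], gamma[j], delta[j])
            for j in range(d))
    return L + lam * group_penalty(W0, W1)
```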

4. Differentiable Acyclicity Constraints

ZICO enforces global DAG structure via a differentiable surrogate constraint:

$$h(W) = \operatorname{Tr}\bigl(e^{W \circ W}\bigr) - d = 0,$$

where $W \circ W$ denotes elementwise squaring; $h$ vanishes if and only if the directed graph with adjacency matrix $|W|$ has no directed cycles.

The method requires the graphs implied by both $W_0$ and $W_1$ to be acyclic, ensuring a proper DAG structure for each set of weights.
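
This constraint and its gradient, $\nabla h(W) = \bigl(e^{W \circ W}\bigr)^{T} \circ 2W$, are cheap to evaluate with a matrix exponential; the computation below is the standard NOTEARS-style evaluation, sketched here for reference:

```python
import numpy as np
from scipy.linalg import expm

def acyclicity(W):
    """h(W) = tr(exp(W ∘ W)) - d and its gradient; h = 0 iff |W| is acyclic."""
    E = expm(W * W)             # matrix exponential of the elementwise square
    h = np.trace(E) - W.shape[0]
    grad = E.T * (2.0 * W)      # dh/dW
    return h, grad
```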

5. Constrained Optimization Framework

The core optimization problem is

$$\min_{W_0, W_1, \gamma, \delta, r} f(W_0, W_1, \gamma, \delta, r) \quad \text{s.t.}\quad h(W_0) = 0, \; h(W_1) = 0.$$

This is solved using an augmented Lagrangian or penalty functional, e.g.,

$$\mathcal{L}_{\rho}(W) = f(W) + \alpha\, h(W) + \frac{\rho}{2}\, h(W)^2,$$

with central-path-style updates over the dual variable $\alpha$ and penalty parameter $\rho$, alternating with gradient-based steps (AdamW) and mini-batch likelihood evaluation.

Regularization scheduling includes cosine annealing for $\lambda$ (promoting gradual sparsification) and decay of $\rho$ to adjust constraint enforcement dynamically. After each gradient step, proximal (soft-thresholding) operations are applied for elementwise $\ell_1$ penalization. Convergence is determined by primal feasibility ($\|h(W)\|$), dual residuals, and the change in objective value.
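
A schematic of the resulting loop, using plain proximal gradient steps in place of AdamW and placeholder schedules (the step sizes, iteration counts, and update rules below are illustrative assumptions, not the paper's settings; `acyclicity` is the sketch above, and `grad_f` returns the gradient of the smooth likelihood term $L$, with the $\ell_1$ part handled by the prox):

```python
import numpy as np

def soft_threshold(W, tau):
    """Proximal operator of tau * ||W||_1 (elementwise soft-thresholding)."""
    return np.sign(W) * np.maximum(np.abs(W) - tau, 0.0)

def fit_dag_sketch(grad_f, W, lam=1e-2, lr=1e-3, alpha=0.0, rho=1.0,
                   outer=20, inner=200, tol=1e-8):
    """Schematic augmented-Lagrangian loop; diagonal handling omitted."""
    for _ in range(outer):
        for _ in range(inner):
            h, grad_h = acyclicity(W)
            grad = grad_f(W) + (alpha + rho * h) * grad_h  # grad of L_rho
            W = soft_threshold(W - lr * grad, lr * lam)    # prox l1 step
        h, _ = acyclicity(W)
        if abs(h) < tol:       # primal feasibility reached
            break
        alpha += rho * h       # dual (multiplier) update
        rho *= 10.0            # common choice; ZICO schedules rho dynamically
    return W
```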

6. Theoretical Properties and Computational Complexity

Under standard smoothness assumptions, limit points of the augmented Lagrangian approach correspond to Karush–Kuhn–Tucker points of the original constrained problem. Local convergence rates can be derived under standard assumptions on the Hessian of $f$.

The computational cost of a gradient step is $O(|B|\, d^2 + d^3)$, where $|B|$ is the batch size; for $d \lesssim 500$, the cubic term is negligible. The algorithm scales linearly in $n$ via mini-batching and can achieve linear or even sublinear complexity in $d$ by exploiting sparsity and low-rank approximations of the acyclicity constraint.

7. Empirical Evaluation and Applications

Empirical studies demonstrate ZICO’s advantages in structure recovery from zero-inflated count data. Principal findings include:

  • Simulated Erdős–Rényi (ER) networks ($d = 20, 30, 50$; $n = 500$): ZICO (with ZINB) yields the lowest structural Hamming distance (SHD) and structural intervention distance (SID) among ten compared methods. For $d = 50$: TPR ≈ 0.78, FDR ≈ 0.41, SHD ≈ 180 in approximately 55 s, whereas ZiGDAG reports SHD ≈ 283 in ≈ 18,000 s.
  • Barabási–Albert graphs: ZICO–ZINB achieves SHD ≈ 74, SID ≈ 1247 in ≈ 49 s, outperforming NOTEARS (Poisson variant) and greedy search approaches.
  • Single-cell transcriptomics (scMultiSim): ZICO (ZIP/ZINB) achieves AUPRC ratios up to three times random baseline, comparable to or outperforming GENIE3, SINCERITIES, LEAP, NOTEARS, and GRNBoost2, with explicit acyclicity enforcement.
  • Dropout settings: ZICO–ZIP/ZINB outperforms pure Poisson/NB models, affirming the necessity of explicit zero-inflation modeling under nontrivial zero-generating mechanisms.

ZICO provides efficient, vectorized, and mini-batched learning for large-scale settings, making it suitable for reverse engineering gene regulatory networks and for broader contexts with zero-inflated counts. The method recovers zero-inflated causal structures more accurately than greedy search or MCMC-based alternatives, and one to two orders of magnitude faster (Sato et al., 18 Dec 2025).
