
CIfly: Efficient Causal Inference Framework

Updated 5 January 2026
  • CIfly is a framework that reformulates causal inference tasks as reachability queries in a state-space graph, allowing linear-time computation.
  • It employs a rule-table schema to define state transitions, simplifying complex causal reasoning without requiring costly graph moralization or projection.
  • The open-source implementation in Rust, with Python and R bindings, supports rapid prototyping and demonstrates significant speedups over traditional methods.

CIfly is a framework for efficient algorithmic development in graphical causal inference that reformulates a wide spectrum of causal graph reasoning tasks—such as d-separation, back-door and front-door criteria evaluation, adjustment-set validity, and instrumental-variable checks—as reachability queries in dynamically constructed state-space graphs. The approach exploits the observation that many of these tasks can be uniformly represented and solved by determining which constructed “states” are traversable from a specified initial set under simple transition rules, thereby reducing their computational cost to linear time relative to the size of the original causal graph. CIfly formalizes this methodology using a rule-table schema and provides a performant, open-source implementation in Rust with bindings for both Python and R, supporting rapid prototyping of new causal inference algorithms and outperforming traditional approaches based on moralization and latent projection (Wienöbst et al., 18 Jun 2025).

1. Fundamental Concepts and State-Space Construction

At its core, CIfly encodes graphical causal inference tasks as reachability problems in a specialized state-space graph. Each state in this graph is a triple $(v, n, c)$, where $v$ is a node from the original graph, $n$ denotes a “neighbor type” (such as the orientation of an edge in a DAG: $\rightarrow$ or $\leftarrow$), and $c$ is an auxiliary label from a finite set, called a “color”, that tracks context-dependent flags (e.g., whether a particular conditioning set has been crossed). The state set is therefore the product:

$$V_{\rm dir} = V \times \mathcal N_{\mathcal E} \times C,$$

where $V$ is the vertex set, $\mathcal E$ is the set of edge types, $\mathcal N_{\mathcal E}$ the induced neighbor-type set, and $C$ the color set. State transitions are encoded as directed edges:

$$(v_1, n_1, c_1) \rightarrow (v_2, n_2, c_2),$$

whenever $v_1$ and $v_2$ are adjacent in the original graph with compatible neighbor types and a Boolean transition function $\phi$ evaluates to true.

This modular construction enables the isolation of complex causal reasoning logic to a small number of transition rules, delegating the rest to standard graph traversal.
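The construction above can be sketched in a few lines of Python. This is a minimal illustration, not CIfly's implementation: the names `dag_adjacency`, `reachable_states`, and `phi`, the `"init"` start marker, and the single placeholder color `"-"` are all assumptions made for the example. It computes the nodes reachable from `a` along directed paths that avoid a set `Z`.

```python
from collections import deque

def dag_adjacency(edges):
    """Map each node to (neighbor_type, neighbor) pairs.
    "->" means the neighbor is a child, "<-" means it is a parent."""
    adj = {}
    for u, w in edges:
        adj.setdefault(u, []).append(("->", w))
        adj.setdefault(w, []).append(("<-", u))
    return adj

def reachable_states(adj, colors, starts, phi):
    """Breadth-first search over states (v, n, c), built on the fly.
    phi(current_state, next_state) decides whether a transition is allowed."""
    seen = set(starts)
    queue = deque(starts)
    while queue:
        v, n1, c1 = queue.popleft()
        for n2, w in adj.get(v, ()):
            for c2 in colors:
                nxt = (w, n2, c2)
                if nxt not in seen and phi((v, n1, c1), nxt):
                    seen.add(nxt)
                    queue.append(nxt)
    return seen

# Toy task: follow only edges pointing away from the start and never enter Z.
Z = {"b"}

def phi(cur, nxt):
    w, n2, _ = nxt
    return n2 == "->" and w not in Z

adj = dag_adjacency([("a", "b"), ("b", "c"), ("a", "d")])
states = reachable_states(adj, ["-"], [("a", "init", "-")], phi)
nodes = {v for v, _, _ in states}
print(sorted(nodes))
```

All task-specific logic lives in `phi`; the traversal itself is ordinary breadth-first search and never enumerates states it does not reach.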

2. Rule Table Schema and Algorithm Specification

Instead of requiring users to hardcode potentially large and intricate Boolean formulas for state transitions, CIfly introduces a rule-table schema. Each row in a rule table specifies:

  • the current state pattern: a pair $(n_1, c_1)$ or a wildcard,
  • the next state pattern: a pair $(n_2, c_2)$ or a wildcard,
  • a logical rule: a Boolean predicate involving set membership of current or next states in input sets (e.g., $X$, $Z$), built from the operators in, not in, and, or.

During traversal, the first rule whose patterns match the current and potential next states is applied. If that rule's logical expression evaluates to true, the transition occurs.

A compact specification suffices for complex tasks. For example, d-separation in an ADMG requires rules on colliders and conditioning sets, encoded succinctly using this table mechanism.
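As a rough illustration of first-match rule evaluation, the sketch below encodes d-separation in a plain DAG (not an ADMG) with two rows. It does not use CIfly's rule-table syntax: the `RULES` list, the helper names, the `"init"` start marker, and the use of Python lambdas as predicates are assumptions for this example. Colors are omitted, so patterns reduce to neighbor types, and `X` is assumed to be disjoint from `Z`.

```python
from collections import deque

# Each row: (current neighbor type, next neighbor type, predicate on the current node).
# "*" is a wildcard. The neighbor type records how the walk arrived at a node:
# "->" along an edge pointing into it, "<-" from one of its children.
RULES = [
    ("->", "<-", lambda v, Z: v in Z),      # collider at v: open only if v is in Z
    ("*",  "*",  lambda v, Z: v not in Z),  # non-collider: open only if v is not in Z
]

def allowed(n1, n2, v, Z):
    """Apply the first row whose patterns match; its predicate decides."""
    for p1, p2, pred in RULES:
        if p1 in ("*", n1) and p2 in ("*", n2):
            return pred(v, Z)
    return False

def d_connected(edges, X, Z):
    """Nodes reachable from X via walks that are open given Z."""
    adj = {}
    for u, w in edges:
        adj.setdefault(u, []).append(("->", w))
        adj.setdefault(w, []).append(("<-", u))
    starts = [(x, "init") for x in X]
    seen, queue = set(starts), deque(starts)
    while queue:
        v, n1 = queue.popleft()
        for n2, w in adj.get(v, ()):
            if (w, n2) not in seen and allowed(n1, n2, v, Z):
                seen.add((w, n2))
                queue.append((w, n2))
    return {v for v, _ in seen} - set(Z)

# x -> c <- y is a collider; x -> m -> w is a chain.
edges = [("x", "c"), ("y", "c"), ("x", "m"), ("m", "w")]
print(sorted(d_connected(edges, {"x"}, set())))  # y is separated from x
print(sorted(d_connected(edges, {"x"}, {"c"})))  # conditioning on c opens the collider
```

Row order matters: the collider row must precede the wildcard row, because the wildcard would otherwise match every transition.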

3. Linear-Time Complexity Guarantee

CIfly provides a formal guarantee of linear-time execution for any causal inference problem expressed within its schema. Theorem 3.4 establishes that, given an $(\mathcal{E}, \ell)$-CIfly reduction and an input graph $G$ with $p$ nodes and $m$ edges, the on-the-fly DFS/BFS algorithm computes outputs in $O(p + m)$ time. The proof leverages the facts that each state is visited at most once, every graph edge is examined in only finitely many neighbor-type and color contexts, and all transition-rule lookups and predicate evaluations are constant-time operations (Wienöbst et al., 18 Jun 2025).
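
The counting behind this bound can be sketched as follows. This is a reconstruction from the three facts listed in the paragraph above, not the paper's proof; $\deg(v)$ denotes the number of edges incident to $v$, and $|\mathcal N_{\mathcal E}|$ and $|C|$ are treated as constants fixed by the rule table rather than by the input graph.

$$
\underbrace{p \cdot |\mathcal N_{\mathcal E}| \cdot |C|}_{\text{states, each visited at most once}}
\;+\;
\sum_{v \in V} \underbrace{|\mathcal N_{\mathcal E}| \cdot |C|}_{\text{states at } v} \cdot \underbrace{\deg(v) \cdot |C|}_{\text{candidate successors}}
\;=\; |\mathcal N_{\mathcal E}|\,|C|\,p \;+\; 2\,|\mathcal N_{\mathcal E}|\,|C|^2\,m
\;=\; O(p + m).
$$

The second equality uses $\sum_{v \in V} \deg(v) = 2m$, and each candidate successor costs constant time to check.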

This contrasts sharply with classical approaches based on graph moralization or latent projection. Both are shown to be computationally equivalent to Boolean matrix multiplication, for which the best known algorithms need roughly $p^{2.37}$ time, which makes them inefficient for large graphs.

4. Relation to Moralization and Latent Projection

Traditional causal inference algorithms often rely on graph moralization (adding undirected edges between all pairs of parents of a common child) or latent projection (preserving conditional independencies after marginalizing certain nodes). The CIfly paper shows that both moralization and latent projection are at least as hard as Boolean matrix multiplication or transitive closure. The bounds cited are $\Omega(p^{2.37})$ and $\Omega(p^2)$, respectively, where the exponent 2.37 reflects the best known matrix multiplication algorithms rather than a proven lower bound.

CIfly bypasses these bottlenecks entirely: the state-space graph is never globally constructed, and only the minimal local context necessary for each reachability query is instantiated during BFS or DFS. This yields practical speedups and eliminates the need for expensive matrix operations or explicit projection steps.
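To see why materializing a moral graph can be expensive even before any query runs, the sketch below moralizes a DAG in which one node has many parents. The function `moral_edges` and the example graph are illustrative assumptions and are not taken from the CIfly paper or codebase.

```python
from itertools import combinations

def moral_edges(edges):
    """Undirected edge set of the moral graph of a DAG given as (parent, child) pairs."""
    parents = {}
    for u, w in edges:
        parents.setdefault(w, set()).add(u)
    # Drop edge directions, then connect every pair of parents of a common child.
    undirected = {frozenset(e) for e in edges}
    for ps in parents.values():
        undirected.update(frozenset(pair) for pair in combinations(sorted(ps), 2))
    return undirected

k = 200
star = [(f"p{i}", "child") for i in range(k)]  # k parents of a single child
moral = moral_edges(star)
print(len(star), len(moral))
```

The input has `k` edges, while the moral graph has `k + k*(k-1)/2`, so its size grows quadratically in the input size for this family of graphs. A reachability query over the original edges avoids building that structure.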

5. Implementation, API, and Integrations

CIfly is implemented in Rust as a compiled backend (cifly), processing user-specified rule tables (text files) into efficient transition jump-tables and constant-time Boolean expression evaluators. The library exposes a single function:

  • In R: reach(graph, list("X"=Xset, "Z"=Zset), "path/to/table.txt")
  • In Python: reach(graph, {"X": Xset, "Z": Zset}, "table.txt")

Bindings are provided as ciflyr on CRAN and ciflypy on PyPI. The API expects a graph object and set assignments, together with a rule table. The system parses the table, initializes internal structures, and executes the on-the-fly traversal.

The implementation leverages precomputed jump-tables keyed by $(n_1, c_1, n_2)$ tuples and evaluates AST-encoded logical rules in constant time. Only the states encountered during traversal are materialized.
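One way to picture such a jump-table is sketched below. It is a guess at the general idea, not a description of CIfly's Rust internals: `build_jump_table`, the rule format, the per-component wildcards, the `"init"` marker, and the use of plain Python functions in place of parsed expression trees are all assumptions. For every key $(n_1, c_1, n_2)$, the table stores in advance which colors $c_2$ are candidates and which predicate guards each one, so the traversal does a single dictionary lookup instead of rescanning the rule rows.

```python
from itertools import product

def in_Z(v, Z):
    return v in Z

def not_in_Z(v, Z):
    return v not in Z

def build_jump_table(rules, neighbor_types, colors):
    """rules: list of ((n1, c1), (n2, c2), predicate); "*" is a wildcard.
    Returns a dict mapping (n1, c1, n2) to a list of (c2, predicate)."""
    def matches(pattern, value):
        return pattern == "*" or pattern == value

    table = {}
    for n1, c1, n2 in product(neighbor_types, colors, neighbor_types):
        targets = []
        for c2 in colors:
            for (p_n1, p_c1), (p_n2, p_c2), pred in rules:
                if all(matches(p, x) for p, x in
                       ((p_n1, n1), (p_c1, c1), (p_n2, n2), (p_c2, c2))):
                    targets.append((c2, pred))  # first matching row wins
                    break
        table[(n1, c1, n2)] = targets
    return table

# d-separation-style rows with a single placeholder color "-".
rules = [
    (("->", "-"), ("<-", "-"), in_Z),      # collider
    (("*", "*"),  ("*", "*"),  not_in_Z),  # everything else
]
table = build_jump_table(rules, ["init", "->", "<-"], ["-"])
print(table[("->", "-", "<-")][0][1].__name__)
print(table[("<-", "-", "->")][0][1].__name__)
```

The table has one entry per combination of neighbor types and colors, so its size depends only on the rule table and not on the input graph.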

6. Empirical Performance and Use Cases

Benchmarking demonstrates that CIfly achieves substantial performance improvements:

  • Up to $10^5\times$ faster than the R package pcalg (for adjustment-set validity in large CPDAGs, $p = 500$),
  • Approximately $15\times$ faster than DAGitty,
  • Comparable to or better than gadjid for parent-adjustment distance computations.

Crucially, these gains are achieved while maintaining $O(p+m)$ worst-case runtime, independent of the size or complexity of induced (moral, projection) graphs.

CIfly has been used to re-implement a range of established graphical causal inference tasks—such as d-separation, adjustment criteria in different graph classes, and parent-adjustment distances—within the same reachability-based framework. The system also facilitates the design of new algorithms, for example, for instrumental-variable identification, by simply specifying new rule tables.

An illustrative usage for d-separation in an ADMG involves preparing the graph, sets $X$ and $Z$, and a concise rule table (e.g., adm-dsep.txt). The reachability query returns all nodes reachable from $X$ via open walks given $Z$, without any manual moralization or custom DFS logic.

7. Applicability and Extensibility

CIfly serves as a versatile formal primitive—a “BLAS for causal-inference algorithms”—abstracting the mechanics of reachability and state tracking. Researchers need only formulate the relevant transition rules and colors; the framework efficiently manages all traversals. The concise rule-table language and the linear-time guarantee enable both rapid prototyping and scalable deployment for diverse causal inference problems, including but not limited to:

  • d-separation in various graphical models,
  • adjustment-set and back-door/front-door test criteria,
  • possible descendants and instrumental variable identification.

A plausible implication is that as new causal inference criteria and identification tasks are developed, CIfly's state-space and rule-table approach can continue to provide an efficient, generalizable foundation for their algorithmic realization (Wienöbst et al., 18 Jun 2025).
