Dependence Graph Construction
- Dependence graph construction is a framework that models dependencies between system elements using nodes and edges to capture statistical, logical, and programmatic relations.
- It integrates methods from static and dynamic analysis, enabling applications such as fault localization, data provenance, and secure interaction tracing.
- Advanced construction algorithms leverage conditional independence and attention-tracing techniques to ensure precise graph recovery and scalable system analysis.
Dependence graph construction refers to methodologies and formal frameworks for representing, extracting, and analyzing dependencies—statistical, logical, programmatic, semantic, or structural—between components of complex systems as graphs. Vertices in a dependence graph encode entities (variables, functions, data, processes, or concepts), and edges express a directed, undirected, or weighted notion of influence, dependence, or information flow. Across domains, dependence graph construction enables rigorous reasoning about independence, causality, fault localization, graphical models, provenance, and more. This article surveys foundational principles, canonical constructions, and modern methodologies, grounding each in primary literature.
1. Formal Definitions and Canonical Dependence Graphs
Dependence graphs are defined by a set of nodes, representing system elements, and a specified edge semantics encoding dependence relations. Core categories include:
- Conditional independence graphs: Vertices correspond to random variables; edges encode conditional dependence between pairs, often established via pairwise Markov properties. Construction may be nonparametric, estimating a conditional dependence coefficient for each pair of variables, with the nonzero (or above-threshold) coefficients defining the edge set (Furmańczyk, 2023).
- Program dependence graphs (PDGs): Nodes represent program statements. Edges may encode data dependence (flow of values) or control dependence (flow of execution). Construction is staged: extract a control-flow graph (CFG), infer postdominator-based control dependencies, and identify data dependencies via variable definitions and uses (Askarunisa et al., 2012).
- Summary graphs: Encapsulate conditional independence structures after marginalizing and conditioning, preserving pertinent independence properties among a reduced variable set. Construction is governed by edge-inducing rules and matrix-algebraic operations (partial closure/inversion), often starting from a regression (chain) graph (Wermuth, 2010).
- Local independence graphs: Nodes are stochastic processes; a directed edge from one process to another encodes that the intensity of the target process is not locally independent of the source process, as formalized via compensators and Doob–Meyer decompositions. Higher-order asymmetric separation (δ-separation) allows reading off complex local independences (Didelez, 2012).
- Dependence graphs from program execution: Vertices are runtime data values; edges are constructed dynamically according to calling context and value creation, supporting provenance and data transparency through causal slices and Jonsson-Tarski conjugate operators (Bond et al., 2024).
- Decision dependence graphs (DDGs): For LLM-based decision provenance, nodes encode logical concepts (e.g., user inputs, tools, generated arguments); edges with attention-derived weights represent the influence of source concepts on target decisions (Wang et al., 28 Aug 2025).
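Across these variants, the shared skeleton is a directed graph whose edges carry a dependence kind and an optional weight. A minimal sketch of that abstraction (illustrative only; the class and method names are ours, not from any cited system):

```python
from collections import defaultdict

class DependenceGraph:
    """Minimal directed, weighted dependence graph: nodes are arbitrary
    entities; each edge carries a dependence kind (e.g. 'data', 'control',
    'statistical') and an optional weight."""

    def __init__(self):
        self.succ = defaultdict(list)  # node -> [(target, kind, weight)]

    def add_edge(self, src, dst, kind, weight=1.0):
        """Record that `dst` depends on (is influenced by) `src`."""
        self.succ[src].append((dst, kind, weight))

    def dependents(self, node, kind=None):
        """Nodes that directly depend on `node`, optionally filtered by kind."""
        return [d for d, k, _ in self.succ[node] if kind is None or k == kind]

g = DependenceGraph()
g.add_edge("x := 1", "y := x + 2", "data")       # y reads x's value
g.add_edge("if y > 0", "z := 2 * y", "control")  # z runs only if the branch is taken
```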
2. Construction Algorithms: Structural, Statistical, and Semantic Approaches
Graph construction is context-specific, relying on system structure, statistical dependencies, or semantic traces.
- Structural or symbolic construction: Program dependence graphs are constructed by parsing source or bytecode, extracting the CFG, and layering data/control dependencies derived from syntactic and static analysis. For example, a Java PDG is built by:
- Annotating AST nodes,
- Building control-flow and dominance relations,
- Adding control dependence via postdominator trees,
- Adding data dependence for variable usage chains,
- Merging into a directed PDG (Askarunisa et al., 2012).
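The def-use step of this pipeline can be sketched for straight-line code. The toy function below (our own illustration, not the cited tool's implementation) builds data-dependence edges from per-statement definition/use sets:

```python
def data_dependence_edges(stmts):
    """Build data-dependence edges for straight-line code.
    `stmts` is a list of (defs, uses) pairs, one per statement, in
    execution order. An edge (i, j) means statement j uses a value
    last defined at statement i (a def-use chain)."""
    last_def = {}   # variable -> index of its most recent definition
    edges = []
    for j, (defs, uses) in enumerate(stmts):
        for v in sorted(uses):
            if v in last_def:
                edges.append((last_def[v], j))
        for v in defs:
            last_def[v] = j
    return edges

# x = 1; y = x + 1; x = y * 2; print(x)
stmts = [({"x"}, set()), ({"y"}, {"x"}), ({"x"}, {"y"}), (set(), {"x"})]
edges = data_dependence_edges(stmts)
```

Control dependence would be layered on top of this via a postdominator tree over the CFG, as described above.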
- Statistical construction: In graphical modeling, edges represent statistically determined dependencies. For nonparametric graphical models, the edge set is recovered by:
- Estimating conditional dependence coefficients for all pairs,
- Thresholding their magnitude, or using sparse-penalized regression to induce sparsity,
- Calibrating hyperparameters (e.g., the threshold level) via model selection criteria (Furmańczyk, 2023).
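As a simplified illustration of thresholded recovery, the sketch below substitutes partial correlations for the nonparametric conditional dependence coefficient used in the cited work; the thresholding step is the same in spirit:

```python
import numpy as np

def dependence_graph_by_threshold(X, tau=0.1):
    """Recover an undirected conditional-dependence graph by thresholding
    partial correlations (a Gaussian-style stand-in for the nonparametric
    conditional-dependence coefficient). Returns edges {i, j} whose
    absolute partial correlation exceeds tau."""
    prec = np.linalg.inv(np.cov(X, rowvar=False))
    d = np.sqrt(np.diag(prec))
    pcorr = -prec / np.outer(d, d)   # off-diagonal: partial correlations
    p = X.shape[1]
    return {frozenset((i, j)) for i in range(p) for j in range(i + 1, p)
            if abs(pcorr[i, j]) > tau}

rng = np.random.default_rng(0)
z = rng.normal(size=2000)
# x0 and x1 are both driven by z; x2 is independent noise
X = np.column_stack([z + 0.1 * rng.normal(size=2000),
                     z + 0.1 * rng.normal(size=2000),
                     rng.normal(size=2000)])
edges = dependence_graph_by_threshold(X, tau=0.3)
```

In practice the threshold would be calibrated by a model selection criterion rather than fixed, as noted above.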
- Dynamic/semantic construction: Provenance or dynamic program analysis frameworks instrument execution to assign every value a unique address, dynamically build data-dependence edges during evaluation, and extract input-output relations or linked-input cognacy relations post hoc using reachability and Boolean algebra (Bond et al., 2024).
- Attention-tracing for LLM decision graphs: In DDG construction, attention matrices (layer/head-wise) are aggregated using Gaussian-weighting, filtered for sink tokens, then decomposed by input-output span blocks. "Total Attention Energy" aggregates squared attention, normalized to produce edge weights between logical concepts and generated tool-call elements (Wang et al., 28 Aug 2025).
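A stripped-down version of the attention-energy step might look as follows (hedged: this omits the layer/head-wise Gaussian weighting and sink-token filtering described above, and all names are our own):

```python
import numpy as np

def edge_weights_from_attention(attn, spans, target):
    """Sketch of attention-derived DDG edge weights from a single
    attention matrix. attn[i, j] = attention from token i to token j;
    `spans` maps each source-concept name to its (start, end) token
    range; `target` is the token range of the generated decision.
    'Total attention energy' = sum of squared attention in each
    target-to-source block, normalized over all sources."""
    t0, t1 = target
    energy = {name: float((attn[t0:t1, s0:s1] ** 2).sum())
              for name, (s0, s1) in spans.items()}
    total = sum(energy.values()) or 1.0
    return {name: e / total for name, e in energy.items()}

attn = np.zeros((6, 6))
attn[4:6, 0:2] = 0.5   # generated tokens attend only to the user-input span
w = edge_weights_from_attention(attn, {"user": (0, 2), "tool": (2, 4)}, (4, 6))
```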
3. Mathematical Properties and Theoretical Guarantees
Rigorous guarantees underpin most dependence-graph constructions.
- Consistency and recovery: For nonparametric dependence graphs, uniform consistency of the estimated dependence coefficients, combined with a properly chosen threshold, ensures that the estimated edge set coincides with the true one with probability tending to one as the sample size grows, under mild smoothness and tail assumptions. Parallel arguments hold for penalized recovery (e.g., Glasso, D-trace) in elliptical settings, with established convergence rates for empirical error (Furmańczyk, 2023).
- Monotone property preservation: For p-robust random graphs—where every edge's presence probability, conditional on all other edges, is at least p—the probability of any monotone property is bounded below by its probability in the independent Erdős–Rényi model G(n, p). The inductive coupling argument exploits the chain rule decomposition and backward induction, ensuring stochastic dominance for monotone graph properties (Ranjbar-Mojaveri et al., 2020).
- Separation criteria: Summary graphs and local independence graphs possess active-path/Markov properties that precisely characterize which independence and dependence relations can be inferred from the constructed graph. DAG and non-DAG equivalences are characterized via semi-directed cycles and chordless paths (Wermuth, 2010, Didelez, 2012).
- Spectral invariants: Linear dependence graphs of vector spaces exhibit spectra that are fully characterized in terms of field size and dimension; the adjacency, Laplacian, and distance matrices yield explicit eigenvalue formulas (Bhuniya et al., 2017).
4. Generalization and Special Constructions
Construction methods adapt to varied mathematical, computational, and modeling settings:
- Spanning tree dependence graphs: Edge dependence is quantified by d(e) = τ_e(G)/τ(G), where τ_e(G) is the number of spanning trees containing e and τ(G) the total number of spanning trees. For every rational value of d(e) strictly between 0 and 1, explicit bipartite or planar-multigraph constructions realize a graph with edge dependence d(e). Details hinge on combinatorial use of necklaces of complete bipartite graphs (for the bipartite realization) and duality of theta-graphs (for planar multigraphs). Restrictions to planarity yield existence and nonexistence results within certain intervals (Yang et al., 2022).
- Sequential and set-based dependency parsing: In non-projective dependency parsers, edge prediction is incremental, with sets (not sequences) of gold arcs determining learning supervision. The transition-based method, with set-based KL loss, naturally supports arbitrary, non-projective graphical structures without a fixed output order, and demonstrates improved empirical performance for non-projective languages (Welleck et al., 2019).
- Probabilistic program dependence graphs (PPDGs): PDGs are augmented with conditional probability tables, learned from test-set traces, converting the structural dependence grammar into a full Bayesian network. This supports statistical inference and ranking for tasks such as fault localization (Askarunisa et al., 2012).
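The edge-dependence quantity d(e) = τ_e(G)/τ(G) can be computed directly for small graphs via the matrix-tree theorem, using the fact that the spanning trees containing an edge e are exactly the spanning trees of the graph with e contracted. A sketch for simple graphs (function names are ours):

```python
import numpy as np

def num_spanning_trees(edges, n):
    """Count spanning trees of a (multi)graph on vertices 0..n-1 via the
    matrix-tree theorem: any cofactor of the graph Laplacian."""
    L = np.zeros((n, n))
    for u, v in edges:
        L[u, u] += 1; L[v, v] += 1
        L[u, v] -= 1; L[v, u] -= 1
    return round(np.linalg.det(L[1:, 1:]))

def edge_dependence(edges, n, e):
    """d(e) = tau_e(G) / tau(G), with tau_e(G) computed as the number of
    spanning trees of G with e contracted (deletion-contraction).
    Assumes a simple graph and that e appears in `edges` as given."""
    u, v = sorted(e)
    rest = list(edges)
    rest.remove(e)
    contracted = []
    for a, b in rest:
        a2 = u if a == v else (a - 1 if a > v else a)  # merge v into u,
        b2 = u if b == v else (b - 1 if b > v else b)  # compact labels
        if a2 != b2:                                   # drop self-loops
            contracted.append((a2, b2))
    return num_spanning_trees(contracted, n - 1) / num_spanning_trees(edges, n)

tri = [(0, 1), (1, 2), (0, 2)]      # triangle: 3 spanning trees,
d = edge_dependence(tri, 3, (0, 1)) # 2 of which contain any given edge
```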
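Learning the conditional probability tables that turn a PDG into a PPDG reduces to counting node-state co-occurrences over test traces. A minimal sketch (our own illustration, with hypothetical node and state names):

```python
from collections import Counter, defaultdict

def learn_cpts(parents, traces):
    """Estimate conditional probability tables for a probabilistic PDG.
    `parents` maps node -> tuple of parent nodes; `traces` is a list of
    dicts mapping node -> observed abstract state in one test run.
    Returns cpt[node][(parent_states, state)] = empirical probability."""
    counts = defaultdict(Counter)
    for t in traces:
        for node, ps in parents.items():
            key = tuple(t[p] for p in ps)
            counts[node][(key, t[node])] += 1
    cpt = {}
    for node, c in counts.items():
        totals = Counter()                      # marginal counts per
        for (key, _), k in c.items():           # parent-state combination
            totals[key] += k
        cpt[node] = {ks: k / totals[ks[0]] for ks, k in c.items()}
    return cpt

parents = {"B": ("A",)}   # hypothetical PDG node B with parent A
traces = [{"A": "ok", "B": "ok"}, {"A": "ok", "B": "bad"},
          {"A": "ok", "B": "ok"}, {"A": "bad", "B": "bad"}]
cpt = learn_cpts(parents, traces)
```

The resulting tables feed standard Bayesian-network inference for fault-localization ranking, as described above.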
5. Applications: Provenance, Fault Localization, Security, and Visualization
- Data provenance and transparency: Dynamic dependence graphs capture fine-grained data provenance for interactive exploration, fact-checking, and input-output traceability in data-driven visualization platforms. Boolean-algebraic conjugates and fixpoint operators compute linked-cognacy and input relevance, supporting O(|V|+|E|)-time interactive queries (Bond et al., 2024).
- Fault localization: PPDGs support model-based fault localization, augmenting the classical PDG structure with statistical model dependencies quantifying likelihoods of abnormal program states during test execution. Bayesian-network reasoning enables prioritized root-cause analysis (Askarunisa et al., 2012).
- Security for LLM-driven agents: DDGs constructed via attention-tracing provide semantic-level provenance and anomaly detection for model-tool interactions, notably in detecting and attributing tool poisoning attacks in standardized model-tool protocols. Empirical results demonstrate high precision and attribution accuracy with no token overhead (Wang et al., 28 Aug 2025).
- Graphical model recovery and inference: Conditional-independence graphs inferred nonparametrically (or via correlations/partial correlations) enable inference on multivariate distributions, robust to non-Gaussian, heavy-tailed, or nonlinear dependencies (Furmańczyk, 2023).
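The reachability queries underlying the provenance applications above are plain graph traversals: a backward causal slice over recorded value-dependence edges runs in O(|V|+|E|) time via breadth-first search (a sketch; the `deps` mapping and names are illustrative):

```python
from collections import deque

def backward_slice(deps, outputs):
    """Compute the set of values an output transitively depends on
    (a backward causal slice) by BFS over dependence edges.
    `deps[v]` lists the values v was computed from. O(|V| + |E|)."""
    seen = set(outputs)
    queue = deque(outputs)
    while queue:
        v = queue.popleft()
        for u in deps.get(v, ()):
            if u not in seen:
                seen.add(u)
                queue.append(u)
    return seen

# hypothetical recorded dependencies: avg was computed from sum and n
deps = {"sum": ["a", "b"], "avg": ["sum", "n"]}
slice_ = backward_slice(deps, ["avg"])
```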
6. Limitations, Open Directions, and Theoretical Challenges
- Non-monotone events: For p-robust random graphs, lower-bound preservation holds only for monotone properties. Analytical treatment of non-monotone properties (e.g., fixed degree sequences) and explicit bounds for particular dependent models remain open problems (Ranjbar-Mojaveri et al., 2020).
- Graph class restrictions: While every rational edge dependence is realizable via bipartite graphs and planar multigraphs, for simple planar graphs the edge dependence is provably bounded away from zero, and the exact realizability interval for this class remains open (Yang et al., 2022).
- Scaling and computational complexity: Highly nonparametric or dynamic graph extraction, particularly nearest-neighbor conditional dependence estimation or large-scale program tracing, may become prohibitive for high-dimensional data or large codebases, though optimizations (kd-tree acceleration, bucketing, SSA form) are employed to mitigate this (Furmańczyk, 2023, Askarunisa et al., 2012).
- Extending to continuous time and asymmetric settings: Local independence graphs demand specialized asymmetric separation rules, and technical assumptions on component orthogonality and compensator measurability. Their generalization to cyclic or partially observed settings is nontrivial (Didelez, 2012).
7. Summary Table: Construction Domains and Key Properties
| Graph Type | Construction Principle | Key Reference |
|---|---|---|
| Conditional-independence | Thresholded or penalized estimation of pairwise conditional dependence | (Furmańczyk, 2023) |
| Program Dependence | Static/dynamic analysis; CFG/PDG | (Askarunisa et al., 2012) |
| Summary Graphs | Edge-inducing path closure; matrix algebra | (Wermuth, 2010) |
| p-robust Random Graphs | Marginal lower-bound, coupling construction | (Ranjbar-Mojaveri et al., 2020) |
| Spanning Tree Dependence | Combinatorial necklace / theta-graph duality | (Yang et al., 2022) |
| LLM Decision Provenance | Attention-based weighted graph over concepts | (Wang et al., 28 Aug 2025) |
| Data Provenance/Vis | Dynamic runtime value instrumentation | (Bond et al., 2024) |
| Local Independence | Stochastic process compensators; δ-separation | (Didelez, 2012) |
| Linear Dep. Graphs | Vector space adjacency via scalar dependence | (Bhuniya et al., 2017) |
Each methodology is grounded in precise mathematical criteria, with algorithmic implementation dictated by the nature of dependency, statistical structure, and inferential objective. Continued advances in scalable algorithms, theory for expressive dependence representations, and cross-domain adaptations of dependence-graph construction remain active areas of research.