Dependence Graph Construction
- Dependence graph construction is a framework that models dependencies between system elements using nodes and edges to capture statistical, logical, and programmatic relations.
- It integrates methods from static and dynamic analysis, enabling applications such as fault localization, data provenance, and secure interaction tracing.
- Advanced construction algorithms leverage conditional independence and attention-tracing techniques to ensure precise graph recovery and scalable system analysis.
Dependence graph construction refers to methodologies and formal frameworks for representing, extracting, and analyzing dependencies—statistical, logical, programmatic, semantic, or structural—between components of complex systems as graphs. Vertices in a dependence graph encode entities (variables, functions, data, processes, or concepts), and edges express a directed, undirected, or weighted notion of influence, dependence, or information flow. Across domains, dependence graph construction enables rigorous reasoning about independence, causality, fault localization, graphical models, provenance, and more. This article surveys foundational principles, canonical constructions, and modern methodologies, grounding each in primary literature.
1. Formal Definitions and Canonical Dependence Graphs
Dependence graphs are defined by a set of nodes, representing system elements, and a specified edge semantics encoding dependence relations. Core categories include:
- Conditional independence graphs: Vertices correspond to random variables; edges encode conditional dependence between pairs, often established via pairwise Markov properties. Construction may be nonparametric, estimating a conditional dependence coefficient for each pair of variables, with the nonzero (or above-threshold) coefficients defining the edge set (Furmańczyk, 2023).
- Program dependence graphs (PDGs): Nodes represent program statements. Edges may encode data dependence (flow of values) or control dependence (flow of execution). Construction is staged: extract a control-flow graph (CFG), infer postdominator-based control dependencies, and identify data dependencies via variable definitions and uses (Askarunisa et al., 2012).
- Summary graphs: Encapsulate conditional independence structures after marginalizing and conditioning, preserving pertinent independence properties among a reduced variable set. Construction is governed by edge-inducing rules and matrix-algebraic operations (partial closure/inversion), often starting from a regression (chain) graph (Wermuth, 2010).
- Local independence graphs: Nodes are stochastic processes; a directed edge from one process to another encodes that the intensity of the target process is not locally independent of the source process, as formalized via compensators and Doob–Meyer decompositions. Higher-order asymmetric separation (δ-separation) allows reading off complex local independences (Didelez, 2012).
- Dependence graphs from program execution: Vertices are runtime data values; edges are constructed dynamically according to calling context and value creation, supporting provenance and data transparency through causal slices and Jonsson-Tarski conjugate operators (Bond et al., 2024).
- Decision dependence graphs (DDGs): For LLM-based decision provenance, nodes encode logical concepts (e.g., user inputs, tools, generated arguments); edges with attention-derived weights represent the influence of source concepts on target decisions (Wang et al., 28 Aug 2025).
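Across these variants, the shared skeleton is a directed graph whose edges carry a dependence kind and an optional weight. A minimal sketch of that abstraction (illustrative only; the class and method names are ours, not from any cited system):

```python
from collections import defaultdict

class DependenceGraph:
    """Minimal directed, weighted dependence graph: nodes are arbitrary
    entities; each edge carries a dependence kind (e.g. 'data', 'control',
    'statistical') and an optional weight."""

    def __init__(self):
        self.succ = defaultdict(list)  # node -> [(target, kind, weight)]

    def add_edge(self, src, dst, kind, weight=1.0):
        """Record that `dst` depends on (is influenced by) `src`."""
        self.succ[src].append((dst, kind, weight))

    def dependents(self, node, kind=None):
        """Nodes that directly depend on `node`, optionally filtered by kind."""
        return [d for d, k, _ in self.succ[node] if kind is None or k == kind]

g = DependenceGraph()
g.add_edge("x := 1", "y := x + 2", "data")       # y reads x's value
g.add_edge("if y > 0", "z := 2 * y", "control")  # z runs only if the branch is taken
```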
2. Construction Algorithms: Structural, Statistical, and Semantic Approaches
Graph construction is context-specific, relying on system structure, statistical dependencies, or semantic traces.
- Structural or symbolic construction: Program dependence graphs are constructed by parsing source or bytecode, extracting the CFG, and layering data/control dependencies derived from syntactic and static analysis. For example, a Java PDG is built by:
- Annotating AST nodes,
- Building control-flow and dominance relations,
- Adding control dependence via postdominator trees,
- Adding data dependence for variable usage chains,
- Merging into a directed PDG (Askarunisa et al., 2012).
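The def-use step of this pipeline can be sketched for straight-line code. The toy function below (our own illustration, not the cited tool's implementation) builds data-dependence edges from per-statement definition/use sets:

```python
def data_dependence_edges(stmts):
    """Build data-dependence edges for straight-line code.
    `stmts` is a list of (defs, uses) pairs, one per statement, in
    execution order. An edge (i, j) means statement j uses a value
    last defined at statement i (a def-use chain)."""
    last_def = {}   # variable -> index of its most recent definition
    edges = []
    for j, (defs, uses) in enumerate(stmts):
        for v in sorted(uses):
            if v in last_def:
                edges.append((last_def[v], j))
        for v in defs:
            last_def[v] = j
    return edges

# x = 1; y = x + 1; x = y * 2; print(x)
stmts = [({"x"}, set()), ({"y"}, {"x"}), ({"x"}, {"y"}), (set(), {"x"})]
edges = data_dependence_edges(stmts)
```

Control dependence would be layered on top of this via a postdominator tree over the CFG, as described above.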
- Statistical construction: In graphical modeling, edges represent statistically determined dependencies. For nonparametric graphical models, the edge set is recovered by:
- Estimating conditional dependence coefficients for all pairs,
- Thresholding their magnitude, or using sparse-penalized regression to induce sparsity,
- Calibrating hyperparameters (e.g., the threshold level) via model selection criteria (Furmańczyk, 2023).
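As a simplified illustration of thresholded recovery, the sketch below substitutes partial correlations for the nonparametric conditional dependence coefficient used in the cited work; the thresholding step is the same in spirit:

```python
import numpy as np

def dependence_graph_by_threshold(X, tau=0.1):
    """Recover an undirected conditional-dependence graph by thresholding
    partial correlations (a Gaussian-style stand-in for the nonparametric
    conditional-dependence coefficient). Returns edges {i, j} whose
    absolute partial correlation exceeds tau."""
    prec = np.linalg.inv(np.cov(X, rowvar=False))
    d = np.sqrt(np.diag(prec))
    pcorr = -prec / np.outer(d, d)   # off-diagonal: partial correlations
    p = X.shape[1]
    return {frozenset((i, j)) for i in range(p) for j in range(i + 1, p)
            if abs(pcorr[i, j]) > tau}

rng = np.random.default_rng(0)
z = rng.normal(size=2000)
# x0 and x1 are both driven by z; x2 is independent noise
X = np.column_stack([z + 0.1 * rng.normal(size=2000),
                     z + 0.1 * rng.normal(size=2000),
                     rng.normal(size=2000)])
edges = dependence_graph_by_threshold(X, tau=0.3)
```

In practice the threshold would be calibrated by a model selection criterion rather than fixed, as noted above.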
- Dynamic/semantic construction: Provenance or dynamic program analysis frameworks instrument execution to assign every value a unique address, dynamically build data-dependence edges during evaluation, and extract input-output relations or linked-input cognacy relations post hoc using reachability and Boolean algebra (Bond et al., 2024).
- Attention-tracing for LLM decision graphs: In DDG construction, attention matrices (layer/head-wise) are aggregated using Gaussian-weighting, filtered for sink tokens, then decomposed by input-output span blocks. "Total Attention Energy" aggregates squared attention, normalized to produce edge weights between logical concepts and generated tool-call elements (Wang et al., 28 Aug 2025).
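A stripped-down version of the attention-energy step might look as follows (hedged: this omits the layer/head-wise Gaussian weighting and sink-token filtering described above, and all names are our own):

```python
import numpy as np

def edge_weights_from_attention(attn, spans, target):
    """Sketch of attention-derived DDG edge weights from a single
    attention matrix. attn[i, j] = attention from token i to token j;
    `spans` maps each source-concept name to its (start, end) token
    range; `target` is the token range of the generated decision.
    'Total attention energy' = sum of squared attention in each
    target-to-source block, normalized over all sources."""
    t0, t1 = target
    energy = {name: float((attn[t0:t1, s0:s1] ** 2).sum())
              for name, (s0, s1) in spans.items()}
    total = sum(energy.values()) or 1.0
    return {name: e / total for name, e in energy.items()}

attn = np.zeros((6, 6))
attn[4:6, 0:2] = 0.5   # generated tokens attend only to the user-input span
w = edge_weights_from_attention(attn, {"user": (0, 2), "tool": (2, 4)}, (4, 6))
```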
3. Mathematical Properties and Theoretical Guarantees
Rigorous guarantees underpin most dependence-graph constructions.
- Consistency and recovery: For nonparametric dependence graphs, uniform consistency of the estimated dependence coefficients, combined with a properly chosen threshold, ensures that the estimated edge set coincides with the true one with probability tending to one as the sample size grows, under mild smoothness and tail assumptions. Parallel arguments hold for penalized recovery (e.g., Glasso, D-trace) in elliptical settings, with established convergence rates for empirical error (Furmańczyk, 2023).
- Monotone property preservation: For p-robust random graphs—where every edge's presence probability, conditional on all other edges, is at least p—the probability of any monotone property is bounded below by its probability in the independent Erdős–Rényi model G(n, p). The inductive coupling argument exploits the chain rule decomposition and backward induction, ensuring stochastic dominance for monotone graph properties (Ranjbar-Mojaveri et al., 2020).
- Separation criteria: Summary graphs and local independence graphs possess active-path/Markov properties that precisely characterize which independence and dependence relations can be inferred from the constructed graph. DAG and non-DAG equivalences are characterized via semi-directed cycles and chordless paths (Wermuth, 2010, Didelez, 2012).
- Spectral invariants: Linear dependence graphs of vector spaces exhibit spectra that are fully characterized in terms of field size and dimension; the adjacency, Laplacian, and distance matrices yield explicit eigenvalue formulas (Bhuniya et al., 2017).
4. Generalization and Special Constructions
Construction methods adapt to varied mathematical, computational, and modeling settings:
- Spanning tree dependence graphs: Edge dependence is quantified by d(e) = τ_e(G)/τ(G), where τ_e(G) is the number of spanning trees containing e and τ(G) the total number of spanning trees. For every rational value of d(e) strictly between 0 and 1, explicit bipartite or planar-multigraph constructions realize a graph with edge dependence d(e). Details hinge on combinatorial use of necklaces of complete bipartite graphs (for the bipartite realization) and duality of theta-graphs (for planar multigraphs). Restrictions to planarity yield existence and nonexistence results within certain intervals (Yang et al., 2022).
- Sequential and set-based dependency parsing: In non-projective dependency parsers, edge prediction is incremental, with sets (not sequences) of gold arcs determining learning supervision. The transition-based method, with set-based KL loss, naturally supports arbitrary, non-projective graphical structures without a fixed output order, and demonstrates improved empirical performance for non-projective languages (Welleck et al., 2019).
- Probabilistic program dependence graphs (PPDGs): PDGs are augmented with conditional probability tables, learned from test-set traces, converting the structural dependence grammar into a full Bayesian network. This supports statistical inference and ranking for tasks such as fault localization (Askarunisa et al., 2012).
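The edge-dependence quantity d(e) = τ_e(G)/τ(G) can be computed directly for small graphs via the matrix-tree theorem, using the fact that the spanning trees containing an edge e are exactly the spanning trees of the graph with e contracted. A sketch for simple graphs (function names are ours):

```python
import numpy as np

def num_spanning_trees(edges, n):
    """Count spanning trees of a (multi)graph on vertices 0..n-1 via the
    matrix-tree theorem: any cofactor of the graph Laplacian."""
    L = np.zeros((n, n))
    for u, v in edges:
        L[u, u] += 1; L[v, v] += 1
        L[u, v] -= 1; L[v, u] -= 1
    return round(np.linalg.det(L[1:, 1:]))

def edge_dependence(edges, n, e):
    """d(e) = tau_e(G) / tau(G), with tau_e(G) computed as the number of
    spanning trees of G with e contracted (deletion-contraction).
    Assumes a simple graph and that e appears in `edges` as given."""
    u, v = sorted(e)
    rest = list(edges)
    rest.remove(e)
    contracted = []
    for a, b in rest:
        a2 = u if a == v else (a - 1 if a > v else a)  # merge v into u,
        b2 = u if b == v else (b - 1 if b > v else b)  # compact labels
        if a2 != b2:                                   # drop self-loops
            contracted.append((a2, b2))
    return num_spanning_trees(contracted, n - 1) / num_spanning_trees(edges, n)

tri = [(0, 1), (1, 2), (0, 2)]      # triangle: 3 spanning trees,
d = edge_dependence(tri, 3, (0, 1)) # 2 of which contain any given edge
```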
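Learning the conditional probability tables that turn a PDG into a PPDG reduces to counting node-state co-occurrences over test traces. A minimal sketch (our own illustration, with hypothetical node and state names):

```python
from collections import Counter, defaultdict

def learn_cpts(parents, traces):
    """Estimate conditional probability tables for a probabilistic PDG.
    `parents` maps node -> tuple of parent nodes; `traces` is a list of
    dicts mapping node -> observed abstract state in one test run.
    Returns cpt[node][(parent_states, state)] = empirical probability."""
    counts = defaultdict(Counter)
    for t in traces:
        for node, ps in parents.items():
            key = tuple(t[p] for p in ps)
            counts[node][(key, t[node])] += 1
    cpt = {}
    for node, c in counts.items():
        totals = Counter()                      # marginal counts per
        for (key, _), k in c.items():           # parent-state combination
            totals[key] += k
        cpt[node] = {ks: k / totals[ks[0]] for ks, k in c.items()}
    return cpt

parents = {"B": ("A",)}   # hypothetical PDG node B with parent A
traces = [{"A": "ok", "B": "ok"}, {"A": "ok", "B": "bad"},
          {"A": "ok", "B": "ok"}, {"A": "bad", "B": "bad"}]
cpt = learn_cpts(parents, traces)
```

The resulting tables feed standard Bayesian-network inference for fault-localization ranking, as described above.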
5. Applications: Provenance, Fault Localization, Security, and Visualization
- Data provenance and transparency: Dynamic dependence graphs capture fine-grained data provenance for interactive exploration, fact-checking, and input-output traceability in data-driven visualization platforms. Boolean-algebraic conjugates and fixpoint operators compute linked-cognacy and input relevance, supporting O(|V|+|E|)-time interactive queries (Bond et al., 2024).
- Fault localization: PPDGs support model-based fault localization, augmenting the classical PDG structure with statistical model dependencies quantifying likelihoods of abnormal program states during test execution. Bayesian-network reasoning enables prioritized root-cause analysis (Askarunisa et al., 2012).
- Security for LLM-driven agents: DDGs constructed via attention-tracing provide semantic-level provenance and anomaly detection for model-tool interactions, notably in detecting and attributing tool poisoning attacks in standardized model-tool protocols. Empirical results demonstrate high precision and attribution accuracy with no token overhead (Wang et al., 28 Aug 2025).
- Graphical model recovery and inference: Conditional-independence graphs inferred nonparametrically (or via correlations/partial correlations) enable inference on multivariate distributions, robust to non-Gaussian, heavy-tailed, or nonlinear dependencies (Furmańczyk, 2023).
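The reachability queries underlying the provenance applications above are plain graph traversals: a backward causal slice over recorded value-dependence edges runs in O(|V|+|E|) time via breadth-first search (a sketch; the `deps` mapping and names are illustrative):

```python
from collections import deque

def backward_slice(deps, outputs):
    """Compute the set of values an output transitively depends on
    (a backward causal slice) by BFS over dependence edges.
    `deps[v]` lists the values v was computed from. O(|V| + |E|)."""
    seen = set(outputs)
    queue = deque(outputs)
    while queue:
        v = queue.popleft()
        for u in deps.get(v, ()):
            if u not in seen:
                seen.add(u)
                queue.append(u)
    return seen

# hypothetical recorded dependencies: avg was computed from sum and n
deps = {"sum": ["a", "b"], "avg": ["sum", "n"]}
slice_ = backward_slice(deps, ["avg"])
```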
6. Limitations, Open Directions, and Theoretical Challenges
- Non-monotone events: For p-robust random graphs, lower-bound preservation holds only for monotone properties. Analytical treatment of non-monotone properties (e.g., fixed degree sequences) and explicit bounds for particular dependent models remain open problems (Ranjbar-Mojaveri et al., 2020).
- Graph class restrictions: While every rational edge dependence is realizable via bipartite graphs and planar multigraphs, for simple planar graphs the edge dependence is provably bounded away from zero, and the exact realizability interval for this class remains open (Yang et al., 2022).
- Scaling and computational complexity: Highly nonparametric or dynamic graph extraction, particularly nearest-neighbor conditional dependence estimation or large-scale program tracing, may become prohibitive for high-dimensional data or large codebases, though optimizations (kd-tree acceleration, bucketing, SSA form) are employed to mitigate this (Furmańczyk, 2023, Askarunisa et al., 2012).
- Extending to continuous time and asymmetric settings: Local independence graphs demand specialized asymmetric separation rules, and technical assumptions on component orthogonality and compensator measurability. Their generalization to cyclic or partially observed settings is nontrivial (Didelez, 2012).
7. Summary Table: Construction Domains and Key Properties
| Graph Type | Construction Principle | Key Reference |
|---|---|---|
| Conditional-independence | Thresholded or penalized estimation of pairwise conditional dependence | (Furmańczyk, 2023) |
| Program Dependence | Static/dynamic analysis; CFG/PDG | (Askarunisa et al., 2012) |
| Summary Graphs | Edge-inducing path closure; matrix algebra | (Wermuth, 2010) |
| p-robust Random Graphs | Marginal lower-bound, coupling construction | (Ranjbar-Mojaveri et al., 2020) |
| Spanning Tree Dependence | Combinatorial necklace / theta-graph duality | (Yang et al., 2022) |
| LLM Decision Provenance | Attention-based weighted graph over concepts | (Wang et al., 28 Aug 2025) |
| Data Provenance/Vis | Dynamic runtime value instrumentation | (Bond et al., 2024) |
| Local Independence | Stochastic process compensators; δ-separation | (Didelez, 2012) |
| Linear Dep. Graphs | Vector space adjacency via scalar dependence | (Bhuniya et al., 2017) |
Each methodology is grounded in precise mathematical criteria, with algorithmic implementation dictated by the nature of dependency, statistical structure, and inferential objective. Continued advances in scalable algorithms, theory for expressive dependence representations, and cross-domain adaptations of dependence-graph construction remain active areas of research.