Runtime Dependency Graph (RDG)

Updated 12 March 2026

A Runtime Dependency Graph (RDG) is a dynamic, directed graph that captures the invocations and dependencies among code units observed during execution.
It is built with runtime instrumentation techniques (e.g., Python’s sys.setprofile and bytecode injection) that record actual interactions across functions, microservices, and transactional systems.
RDGs are applied in test-driven development (TDD) data synthesis, microservice benchmarking, and concurrency control, providing information for performance optimization and resource planning.

A Runtime Dependency Graph (RDG) is a dynamically constructed, directed graph that precisely models execution-time dependencies between code units such as functions, methods, classes, microservices, or database actions. Unlike static call graphs derived from code structure, the RDG records actual invocations and operational dependencies observed during the execution of a program, workload, or transactional batch. RDGs are used for diverse purposes, including software engineering data synthesis, microservice benchmarking, dependency introspection, and concurrency control in distributed transaction processing. RDG construction, structure, and usage vary substantially across frameworks and domains, but all implementations emphasize runtime observation as opposed to static analysis.

1. Formal Definitions and Core Data Structures

An RDG is formally described as a tuple $(V, E)$, where $V$ is a set of runtime-activated nodes and $E \subseteq V \times V$ represents directed dependency edges; some systems additionally annotate edges with attributes such as weights, communication modes, or dependency types. Node and edge semantics are instantiated differently by each system:

SWE-Flow (Zhang et al., 10 Jun 2025):
- $V(f) \subseteq \mathrm{FN}$, with each function node $v = (\texttt{filepath}, \texttt{lineno}, \texttt{function\_name})$ captured during a test $f$.
- $E(f) \subseteq V(f) \times V(f)$ records a directed edge $(u \to v)$ iff $u$ directly calls $v$ at runtime within $f$.
- $\mathcal{S}_{\mathrm{RDG}} = \{(f, \mathrm{RDG}(f)) \mid f \in \mathcal{F}_{\mathrm{TTFN}}\}$ stores all per-test RDGs.
DGG for Microservices (Du et al., 2024):
- $V = \{ (\text{ms}, \text{iface}, \text{label}) \}$, where each node is a triple of a microservice, one of its interfaces, and an operational label.
- $E = \{ (u \to v; w; t) \}$, where each element encodes an invocation $(u \to v)$ with weight $w$ (the number of repeated calls) and communication mode $t$ (e.g., HTTP, RPC).
Classport (Cofano et al., 23 Oct 2025):
- $V = \text{Dep} \cup \{\text{App}\}$, with a single application node App and one node per dependency.
- $E_{\text{app}}$ connects App to each dependency it uses at runtime; $E_{\text{dep}}$ captures inter-dependency usage and contains $(d_1, d_2)$ iff at runtime a class from $d_1$ calls one from $d_2$.
DistDGCC (Yao et al., 2017):
- $V$ comprises “record-action” vertices, each $(T.\text{id}, \text{tableName}, \text{primaryKey}, \text{opType}, \text{parameters})$ for a transactional operation.
- $E$ includes logical (program-order), temporal (conflict), and node (cross-partition) dependencies, structuring the per-batch RDG as a DAG.
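The tuple definitions above can be modeled with a small container. The following is a Python illustration, not code from any of the cited systems: the class and field names (`FunctionNode`, `InterfaceNode`, `RecordAction`, `RDG`) are hypothetical, and a per-edge attribute dictionary stands in for DGG's weight and mode or DistDGCC's edge kinds. `is_acyclic` applies Kahn's algorithm to check the DAG property that DistDGCC's per-batch graphs are expected to satisfy.

```python
from collections import defaultdict, deque
from typing import NamedTuple


class FunctionNode(NamedTuple):
    """SWE-Flow-style node: (filepath, lineno, function_name)."""
    filepath: str
    lineno: int
    function_name: str


class InterfaceNode(NamedTuple):
    """DGG-style node: (microservice, interface, operational label)."""
    ms: str
    iface: str
    label: str


class RecordAction(NamedTuple):
    """DistDGCC-style vertex for one transactional operation on one record."""
    txn_id: int
    table_name: str
    primary_key: int
    op_type: str            # illustrative encoding, e.g. "R" or "W"
    parameters: tuple = ()


class RDG:
    """Directed graph (V, E) with an attribute dict per edge.

    There is one edge per (u, v) pair; attributes given later are merged
    into (and may overwrite) earlier ones.
    """

    def __init__(self):
        self.nodes = set()
        self.edges = {}                 # (u, v) -> attributes (e.g. w, mode, kind)
        self.succ = defaultdict(set)    # adjacency list

    def add_node(self, v):
        self.nodes.add(v)

    def add_edge(self, u, v, **attrs):
        self.nodes.update((u, v))
        self.edges.setdefault((u, v), {}).update(attrs)
        self.succ[u].add(v)

    def is_acyclic(self):
        """Kahn's algorithm: True iff every node can be topologically ordered."""
        indeg = {n: 0 for n in self.nodes}
        for _, v in self.edges:
            indeg[v] += 1
        queue = deque(n for n, d in indeg.items() if d == 0)
        ordered = 0
        while queue:
            n = queue.popleft()
            ordered += 1
            for m in self.succ[n]:
                indeg[m] -= 1
                if indeg[m] == 0:
                    queue.append(m)
        return ordered == len(self.nodes)
```

For example, `g.add_edge(FunctionNode("tests/test_math.py", 5, "test_sum"), FunctionNode("src/math_utils.py", 1, "add"))` records the SWE-Flow edge test_sum → add; the file paths and line numbers are made up.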

2. Construction Algorithms and Instrumentation

RDG generation requires runtime instrumentation to log dependencies:

SWE-Flow:

SWE-Flow uses Python's sys.setprofile API to hook every function entry and exit, maintaining a stack $S$ of activation records. For every call made within the project, it records the caller/callee pair as an edge in $E$. Time and space complexity are $O(C)$, where $C$ is the number of call events during test execution.
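A minimal sketch of this kind of instrumentation, assuming a caller-supplied predicate decides which functions belong to the project (here a hypothetical name-based filter; a file-path check would be more realistic). Nodes use the (filepath, lineno, function_name) triple, with the function's first line as `lineno`. Untracked frames (library code, builtins) are never pushed onto the stack, so a call that passes through them is attributed to the nearest tracked caller; this approximates rather than exactly reproduces the "directly calls" relation. It is not SWE-Flow's own code.

```python
import sys


def build_rdg(test_fn, is_project):
    """Run test_fn under sys.setprofile and return (V, E) for project functions."""
    nodes, edges = set(), set()
    stack = []  # activation records: (frame, node) for tracked frames only

    def profiler(frame, event, arg):
        if event == "call":
            code = frame.f_code
            node = (code.co_filename, code.co_firstlineno, code.co_name)
            if not is_project(node):
                return
            nodes.add(node)
            if stack:
                edges.add((stack[-1][1], node))  # caller -> callee
            stack.append((frame, node))
        elif event == "return" and stack and stack[-1][0] is frame:
            stack.pop()  # leaving a tracked activation

    sys.setprofile(profiler)
    try:
        test_fn()
    finally:
        sys.setprofile(None)  # always uninstall the hook
    return nodes, edges


# Toy project mirroring the add/mul example (names are illustrative).
def log_call(x):
    return x


def add(a, b):
    return log_call(a + b)


def mul(a, b):
    return a * b


def test_sum():
    result = add(2, 3)
    assert result == 5


def test_area():
    result = mul(2, 3)
    assert result == 6


PROJECT = {"log_call", "add", "mul", "test_sum", "test_area"}


def in_project(node):
    """Hypothetical filter: match on function name only."""
    return node[2] in PROJECT


# S_RDG: one (V, E) pair per test, keyed by test name.
S_RDG = {f.__name__: build_rdg(f, in_project) for f in (test_sum, test_area)}
```

The profiler does a constant amount of work per event, so building one RDG grows linearly with the number of events traced.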

DGG:

DGG ingests production tracing records (e.g., Jaeger, Alibaba RPC) and extracts a per-request call graph $\mathrm{CG} = (V, E)$. The per-request call graphs of each service $S$ are merged into $G_S = (V_S, E_S)$, optionally collapsing repeated edges and tracking invocation weights. The final generator samples synthetic topologies from conditional probabilities $P(u \to C \mid s, d)$ estimated from empirical traces, where $u$ is a parent node, $C$ its set of children, $s$ its siblings, and $d$ its depth.
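The merging and estimation steps can be sketched as follows. The assumptions here are illustrative and not taken from DGG itself: each trace is a list of `(caller, callee, mode)` tuples forming a tree rooted at a known entry node, nodes are plain strings, the sibling condition $s$ is reduced to the number of siblings, and children sets ignore call multiplicity (multiplicity is kept in the merged weights instead).

```python
from collections import Counter, defaultdict


def merge_call_graphs(traces):
    """Collapse repeated (caller, callee, mode) edges across per-request
    call graphs; each resulting count is the edge weight w in G_S."""
    weights = Counter()
    for trace in traces:
        weights.update(trace)
    return weights


def children_conditionals(traces, root):
    """Estimate P(u -> C | s, d) as relative frequencies over the traces.

    Returns {(u, s, d): {frozenset(C): probability}}, where d is u's depth,
    s the number of u's siblings, and C the set of u's distinct children
    (the empty set means u was a leaf).
    """
    counts = defaultdict(Counter)
    for trace in traces:
        kids = defaultdict(set)
        for u, v, _mode in trace:
            kids[u].add(v)
        stack, seen = [(root, 0, 0)], {root}
        while stack:
            u, s, d = stack.pop()
            counts[(u, s, d)][frozenset(kids[u])] += 1
            for v in kids[u]:
                if v not in seen:  # guard against repeated or cyclic nodes
                    seen.add(v)
                    stack.append((v, len(kids[u]) - 1, d + 1))
    probs = {}
    for key, c in counts.items():
        total = sum(c.values())
        probs[key] = {C: n / total for C, n in c.items()}
    return probs


# Two toy request traces for one service (names are illustrative).
t1 = [("gw", "auth", "http"), ("gw", "cart", "rpc"),
      ("cart", "db", "rpc"), ("cart", "db", "rpc")]
t2 = [("gw", "auth", "http")]
G_S = merge_call_graphs([t1, t2])
P = children_conditionals([t1, t2], root="gw")
```

Here the repeated `cart → db` call collapses into one edge of weight 2, and the root `gw` has children {auth, cart} or {auth} with probability 0.5 each.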

Classport:

Classport uses bytecode instrumentation to inject dependency metadata (the @ClassportInfo annotation) into every Java class at build time. At runtime, a Java agent collects the set of dependencies actually exercised, building the RDG by mapping observed method entries and dynamic invocations to dependencies via the class-to-dependency mapping $M : C \rightarrow \text{Dep}$.
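Classport's agent runs inside the JVM; the Python sketch below only models the bookkeeping described above, assuming that runtime call data arrives as `(caller_class, callee_class)` pairs. The dictionary `class_to_dep` plays the role of $M : C \rightarrow \text{Dep}$, classes missing from it are treated as application code, and the returned `used` set corresponds to the set of dependencies observed at runtime. All class and artifact names are hypothetical.

```python
def reconstruct_usage(class_to_dep, observed_calls, app="App"):
    """Return (used dependencies, App->Dep and Dep->Dep edges)."""
    used = set()    # dependencies observed at runtime
    edges = set()
    for caller, callee in observed_calls:
        src = class_to_dep.get(caller, app)
        dst = class_to_dep.get(callee, app)
        if dst == app:
            continue                  # calls into application code add no dependency
        used.add(dst)
        if src != dst:
            edges.add((src, dst))     # App -> Dep or Dep -> Dep
    return used, edges


# Hypothetical class-to-dependency mapping M.
M = {
    "com.acme.json.Parser": "acme-json",
    "com.acme.log.Logger": "acme-log",
    "com.acme.xml.Reader": "acme-xml",   # declared but never exercised
}
calls = [
    ("com.example.Main", "com.acme.json.Parser"),
    ("com.acme.json.Parser", "com.acme.log.Logger"),
    ("com.example.Main", "com.example.Util"),
]
used, edges = reconstruct_usage(M, calls)
unused = set(M.values()) - used      # candidates for removal or review
```

Here `used` is {acme-json, acme-log}, the edges are App → acme-json and acme-json → acme-log, and acme-xml is reported as unused. A dependency that this particular workload never reaches looks unused even if other code paths need it.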

DistDGCC:

DistDGCC parses each transaction batch into record-actions and maintains per-record state ($\text{lastWrite}[r]$, $\text{readers}[r]$). It incrementally constructs logical, temporal, and node-dependency edges, representing each batch as an acyclic RDG. This enables append-only, lock-free construction and highly parallel batch processing.
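The per-record bookkeeping can be sketched as follows; this illustrates the idea and is not DistDGCC's implementation. Actions are processed in batch order: a read depends on the record's last write, and a write depends on the last write and on every read since then. Two details are assumptions: partitions come from a hypothetical `key % num_partitions`, and a program-order edge whose endpoints lie on different partitions is labeled `"node"` instead of `"logical"`. Because every edge points from an earlier action to a later one, the graph is acyclic by construction, so each action's wave (longest-path depth) can be assigned immediately. Actions in the same wave have no edges among them and could run in parallel.

```python
from collections import defaultdict
from typing import NamedTuple


class Action(NamedTuple):
    txn: int      # transaction id
    table: str
    key: int      # primary key
    op: str       # "R" (read) or "W" (write)


def build_batch_rdg(batch, num_partitions=2):
    """Return (edges, waves) for one batch of record-actions.

    edges: set of (from_index, to_index, kind), kind in
           {"logical", "node", "temporal"}.
    waves: lists of action indices that have no edges among themselves.
    """
    last_write = {}               # lastWrite[r]: index of the last writer of r
    readers = defaultdict(list)   # readers[r]: reads of r since that write
    last_in_txn = {}              # previous action of each transaction
    edges, level = set(), []

    def part(a):                  # hypothetical partitioning function
        return a.key % num_partitions

    for i, a in enumerate(batch):
        r = (a.table, a.key)
        preds = []
        if a.txn in last_in_txn:  # program order within the transaction
            j = last_in_txn[a.txn]
            preds.append((j, "logical" if part(batch[j]) == part(a) else "node"))
        last_in_txn[a.txn] = i
        if r in last_write:       # read-after-write or write-after-write
            preds.append((last_write[r], "temporal"))
        if a.op == "W":
            preds.extend((j, "temporal") for j in readers[r])  # write-after-read
            last_write[r] = i
            readers[r] = []
        else:
            readers[r].append(i)
        edges.update((j, i, kind) for j, kind in preds)
        level.append(1 + max((level[j] for j, _ in preds), default=-1))

    waves = defaultdict(list)
    for i, lv in enumerate(level):
        waves[lv].append(i)
    return edges, [waves[lv] for lv in sorted(waves)]


batch = [
    Action(1, "acct", 1, "W"),
    Action(2, "acct", 1, "R"),
    Action(2, "acct", 2, "W"),
    Action(3, "acct", 3, "R"),
]
edges, waves = build_batch_rdg(batch)
```

In this batch, T2's read waits on T1's write (temporal), T2's second action follows its first across partitions (node), and T3's read is independent, so it shares the first wave with T1's write.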

3. Applications Across Domains

Framework	RDG Node Type	RDG Edge Semantics	Principal Application
SWE-Flow	Function (Python)	Runtime call (direct)	TDD data synthesis, incremental development
DGG	Microservice-Interface	Runtime invocation, weighted, typed	Microservice call benchmarking
Classport	Java Dependency	Dependency usage at runtime	Supply chain security, introspection
DistDGCC	Record-Action	Conflict, logical, node dependency	Concurrency control, recovery

Software Engineering (SWE-Flow): The RDG defines the mapping from each unit test to the sequence-minimal set of functions that must be implemented at each TDD step. This drives the derivation of incremental development schedules and enables faithful, test-driven code/task synthesis (Zhang et al., 10 Jun 2025).
Microservices (DGG): Fine-grained RDGs, capturing interface dynamics and invocation modality, enable realistic benchmarking and drive autoscalers that tune resource allocation based on current execution structure (FineGrained-Scale), attaining up to 44.8% CPU resource savings while maintaining QoS (Du et al., 2024).
Java Supply Chain Security (Classport): RDG introspection exposes which dependencies are actually active at runtime, supporting integrity validation, forensic auditing, and detection of unused or vulnerable third-party code (Cofano et al., 23 Oct 2025).
Distributed Transaction Processing (DistDGCC): RDGs are used to synchronize, parallelize, and log distributed transactions. This DAG-centric approach eliminates the need for classical locking, allows fast parallel recovery, and provides the basis for dependency logging and conflict-serializable concurrency control (Yao et al., 2017).
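To make the SWE-Flow mapping from tests to implementation steps concrete, the sketch below turns per-test RDG node sets into an ordered list of increments. The greedy rule (pick the test that needs the fewest not-yet-implemented functions, breaking ties by name) and the exclusion of the test node itself from the work are assumptions for illustration; SWE-Flow's own scheduling criterion may differ. `test_total` is a hypothetical third test.

```python
def tdd_schedule(rdgs):
    """Order tests greedily and return [(test, functions_to_implement), ...].

    rdgs maps a test name to the function names in its RDG node set V(f),
    including the test itself, which is not counted as work.
    """
    done, remaining, steps = set(), dict(rdgs), []
    while remaining:
        # Functions each remaining test still needs beyond what is implemented.
        pending = {t: fns - {t} - done for t, fns in remaining.items()}
        test = min(pending, key=lambda t: (len(pending[t]), t))
        steps.append((test, sorted(pending[test])))
        done |= pending[test]
        del remaining[test]
    return steps


rdgs = {
    "test_sum": {"test_sum", "add", "log_call"},
    "test_area": {"test_area", "mul"},
    "test_total": {"test_total", "add", "mul"},   # hypothetical extra test
}
schedule = tdd_schedule(rdgs)
```

With these inputs the schedule implements `mul`, then `add`, then `log_call`, adding one function per step, and every test's dependencies are complete by the time that test is scheduled.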

4. Illustrative Examples

SWE-Flow: For a project with functions add and mul and corresponding tests, two RDGs are generated:
- For test_sum: $V = \{\text{test\_sum}, \text{add}\}$, $E = \{(\text{test\_sum} \to \text{add})\}$
- For test_area: $V = \{\text{test\_area}, \text{mul}\}$, $E = \{(\text{test\_area} \to \text{mul})\}$
- Deeper chains appear for nested calls, e.g., test_sum → add → log_call (Zhang et al., 10 Jun 2025).
DGG: Each microservice request trace yields a call graph CG with edge weights representing repeated calls and labels encoding communication mode. Merging call graphs results in a service-level RDG with synthetic graph generation driven by cluster-wise conditional probabilities reflective of production traces (Du et al., 2024).
Classport: At runtime, a HashSet $S$ records the observed dependencies. App → Dep and Dep → Dep edges are reconstructed from call data, forming a star graph centered on App when only App → Dep edges occur, or a general usage graph otherwise, for the executed workload (Cofano et al., 23 Oct 2025).
DistDGCC: Each batch of transactions forms a DAG whose vertices are per-tuple record-actions; temporal and logical edges encode all the ordering needed for serializability and recovery. Logging and recovery operate directly on this structure (Yao et al., 2017).
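The DGG generation step can be sketched as a top-down sampler over a conditional table. The table format `{(u, s, d): {frozenset(C): p}}`, the use of the sibling count for $s$, and the depth cap are assumptions made for this illustration, not DGG's actual data layout; `random.Random.choices` draws one children set per expanded node.

```python
import random


def sample_topology(table, root, rng, max_depth=8):
    """Sample one synthetic call tree from P(u -> C | s, d).

    Keys absent from `table` are treated as leaves. Returns a list of
    (parent, child, child_depth) edges.
    """
    edges, stack = [], [(root, 0, 0)]
    while stack:
        u, s, d = stack.pop()
        dist = table.get((u, s, d))
        if not dist or d >= max_depth:
            continue
        options = list(dist)
        (children,) = rng.choices(options, weights=[dist[c] for c in options], k=1)
        for c in sorted(children):          # sorted for reproducible output
            edges.append((u, c, d + 1))
            stack.append((c, len(children) - 1, d + 1))
    return edges


# Illustrative conditional table, e.g. as estimated from traces.
table = {
    ("gw", 0, 0): {frozenset({"auth", "cart"}): 0.5, frozenset({"auth"}): 0.5},
    ("cart", 1, 1): {frozenset({"db"}): 1.0},
}
synthetic = sample_topology(table, "gw", random.Random(7))
```

Passing a seeded `random.Random` keeps generated benchmark topologies reproducible across runs.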

5. Analytical Properties and Performance

The asymptotic and empirical properties of RDG construction and usage are as follows:

Time and space complexity are dominated by the number of events or actions traced ($O(C)$ in SWE-Flow, $O(|V| + |E|)$ elsewhere).
In microservices, DGG-generated RDGs yield distributions (depth, node count, topology) nearly indistinguishable from real production traces, with Jensen–Shannon divergence for depth and node-count at 0.034 and 0.053, respectively (substantially better than topology-agnostic benchmarks) (Du et al., 2024).
In transactional systems, batch-wise RDG construction enables 2–3× higher throughput versus lock-based approaches under high contention, with dependency logging supporting up to 5× faster parallel recovery (Yao et al., 2017).
For Java build systems, Classport's injection of runtime dependency metadata adds little build and runtime overhead (1–2% longer build times, 0.74–4.27% runtime slowdown, 10–29% larger JAR files), with perfect recall and precision for actively executed dependencies (Cofano et al., 23 Oct 2025).
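The Jensen–Shannon divergence used in the DGG comparison is $\mathrm{JSD}(P \parallel Q) = \tfrac{1}{2} D_{\mathrm{KL}}(P \parallel M) + \tfrac{1}{2} D_{\mathrm{KL}}(Q \parallel M)$ with $M = \tfrac{1}{2}(P + Q)$. The sketch below computes it for the empirical distributions of two samples, such as call-graph depths. The logarithm base used by Du et al. is not stated here; base 2 (which bounds the value in $[0, 1]$) is an assumption, and the depth samples are made up, so `score` is not comparable to the reported figures.

```python
from collections import Counter
from math import log2


def js_divergence(xs, ys):
    """Base-2 Jensen-Shannon divergence between the empirical distributions
    of two samples (e.g., call-graph depths); the result lies in [0, 1]."""
    px, py = Counter(xs), Counter(ys)
    support = set(px) | set(py)
    p = {k: px[k] / len(xs) for k in support}
    q = {k: py[k] / len(ys) for k in support}
    m = {k: (p[k] + q[k]) / 2 for k in support}

    def kl(a, b):
        # Terms with a[k] == 0 contribute 0; b[k] > 0 whenever a[k] > 0.
        return sum(a[k] * log2(a[k] / b[k]) for k in support if a[k] > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


real_depths = [2, 3, 3, 4, 4, 4, 5]        # made-up samples
synthetic_depths = [2, 3, 4, 4, 4, 5, 5]
score = js_divergence(real_depths, synthetic_depths)
```

Identical distributions give 0 and distributions with disjoint support give 1, so smaller values indicate synthetic graphs closer to production traces.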

6. Comparative Perspective and Limitations

RDGs fundamentally contrast with static dependency analyses by reflecting only executed paths and actual runtime behavior, thus providing minimal and accurate coverage sets for data synthesis, resource planning, and recovery.

However, observed limitations include:

Incompleteness when the workload does not exercise all potential paths or classes (e.g., Classport misses dependencies that are never reached during execution) (Cofano et al., 23 Oct 2025).
Potential memory overhead for large-scale traces or large transactional batches in microservices and distributed databases.
In distributed transactions, waiting for a batch to arrive and for its DAG to be closed can add latency, a trade-off that matters for latency-sensitive workloads (Yao et al., 2017).
Metadata injection and annotation (Classport) can invalidate package signatures, raising deployment and security complexities (Cofano et al., 23 Oct 2025).

Future work across systems includes expanding introspection coverage, supporting streaming or incremental execution models, and securing or validating metadata against runtime tampering. In microservice settings, further refinement of RDG-based autoscaling strategies remains an open area for efficiency gains (Du et al., 2024).

7. Integration into Systems and Workflows

RDGs serve as the orchestration backbone in several advanced frameworks:

SWE-Flow leverages the RDG for incremental deconstruction of codebases, derivation of verifiable TDD tasks, and structured repository reconstruction from test-driven evidence (Zhang et al., 10 Jun 2025).
DGG integrates RDGs in a data-driven loop to drive both benchmarking and online resource scaling, with clustering enabling adaptive, production-realistic benchmarking (Du et al., 2024).
Classport embeds RDG-centric introspection natively into Java application lifecycles using agent-based runtime data collection for security and supply-chain analyses (Cofano et al., 23 Oct 2025).
DistDGCC deploys per-batch RDGs for lock-free, deterministically parallel execution and recovery, linking concurrency control, logging, and replay into a single operational mechanism (Yao et al., 2017).