Process Call Graphs in Software Systems
- Process Call Graphs are graph-based representations that encode runtime or static dependencies between software entities, offering a clear framework for analysis.
- They are pivotal in applications such as software comprehension, testing, malware detection, and microservice resource management.
- Extraction techniques, including static analysis, dynamic instrumentation, and reverse engineering, reveal key graph properties like hub functions, clustering, and centrality.
Process Call Graphs (PCGs) are graph-based representations designed to encode the runtime or static dependencies between entities, typically functions or processes, in software systems. PCGs provide a rigorous framework for analyzing, classifying, and modeling the relational and structural aspects of software behavior, both at the intra-process and inter-process levels. Their applications span software comprehension, security analysis, malware detection, ecosystem-scale dependency analysis, and microservices resource management.
1. Definitions and Core Formalism
The concept of a PCG varies according to context. In software analysis, PCGs generally refer to:
- Static Call Graphs: Directed graphs with nodes representing program functions and edges denoting statically resolved caller–callee relations. Construction involves analyzing source or binary code, possibly via syntax trees or abstract syntax graphs (0803.4025).
- Process Interaction Graphs: Nodes represent spawned operating system processes; edges denote process creation events and inter-process communication during program execution (as observed in dynamic malware sandboxes) (Aneja et al., 11 Oct 2025).
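- Pairwise Compatibility Graphs (PCGs): A more abstract formalism. Given a tuple $(T, w, d_{\min}, d_{\max})$, where $T$ is a tree (the "witness" tree), $w$ assigns non-negative edge weights, and $d_{\min} \le d_{\max}$ are bounds, a graph $G = (V, E)$ whose vertices are the leaves of $T$ is a PCG if $\{u, v\} \in E$ iff the tree distance $d_{T,w}(u, v)$ between $u$ and $v$ is in $[d_{\min}, d_{\max}]$ (Xiao et al., 2018). Restricting $T$ to a star structure yields star-PCGs (Xiao et al., 2018, Monti et al., 2022).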
Formally, for PCGs extracted from dynamic execution:
- $G = (V, E)$ with $V$ as process nodes, $E$ as directed edges from process creation ($(A, B) \in E$ if process A spawns B) or IPC events.
For static call graphs:
- $G = (V, E)$ where $V$ are function identifiers and $E$ the set of call relations, sometimes labeled with additional intraprocedural metadata.
For pairwise compatibility graphs:
- $\{u, v\} \in E$ iff the distance $d_{T,w}(u, v)$ in the weighted tree $T$ (between leaves $u, v$) is in the interval $[d_{\min}, d_{\max}]$.
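The pairwise compatibility definition can be made concrete with a small sketch. The following Python code is illustrative only: the function names and the example witness tree are assumptions, not taken from the cited papers. It computes leaf-to-leaf distances in a weighted tree and keeps the pairs whose distance falls in the closed interval $[d_{\min}, d_{\max}]$.

```python
from itertools import combinations

def tree_distances(adj, source):
    """Weighted distances from `source` to every node of a tree (iterative DFS)."""
    dist = {source: 0.0}
    stack = [source]
    while stack:
        node = stack.pop()
        for neighbor, weight in adj[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + weight
                stack.append(neighbor)
    return dist

def pairwise_compatibility_graph(tree_edges, d_min, d_max):
    """Return (leaves, edges) of the PCG induced by a weighted tree and [d_min, d_max]."""
    adj = {}
    for u, v, w in tree_edges:
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, []).append((u, w))
    # Leaves are the tree nodes with exactly one neighbor.
    leaves = sorted(n for n, nbrs in adj.items() if len(nbrs) == 1)
    dist = {leaf: tree_distances(adj, leaf) for leaf in leaves}
    edges = {
        (u, v)
        for u, v in combinations(leaves, 2)
        if d_min <= dist[u][v] <= d_max
    }
    return leaves, edges

# Hypothetical witness tree: internal nodes x, y and leaves a, b, c, d.
tree = [("x", "a", 1), ("x", "b", 2), ("x", "y", 1), ("y", "c", 1), ("y", "d", 3)]
leaves, edges = pairwise_compatibility_graph(tree, d_min=3, d_max=4)
print(sorted(edges))
```

In this example the leaf distances are 3 (a, b), 3 (a, c), 5 (a, d), 4 (b, c), 6 (b, d), and 4 (c, d), so only the four pairs within $[3, 4]$ become edges.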
2. Graph-Theoretic Characterizations
PCGs exhibit rich graph-theoretic properties:
- Degree Distributions: For function call graphs, in-degree (number of callers per function) follows a power law ($P(k) \sim k^{-\gamma}$) while out-degree (number of callees per function) is exponentially distributed. This manifests as "hub" functions (high in-degree) and "distributive" callers (0803.4025).
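- Clustering Coefficients and Profiles: The normalized local clustering coefficient
$$C_i = \frac{2 e_i}{k_i (k_i - 1)},$$
where $e_i$ is the number of edges among the $k_i$ neighbors of node $i$, quantifies local density. Clustering profiles analyze clustering at hop distances $d$, revealing local subsystem aggregation (0803.4025).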
- Centrality Measures: Betweenness centrality identifies critical nodes traversed by many shortest paths, instrumental for targeted testing (0803.4025).
- Assortativity and Correlations: Pearson coefficient for degree correlations indicates network stratification, with empirical differences between imperative and functional languages (0803.4025).
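- Scale-Free and Scale-Richness: The scale-free metric
$$S(G) = \frac{s(G)}{s_{\max}}, \qquad s(G) = \sum_{(i,j) \in E} k_i k_j,$$
with $s_{\max}$ the maximum of $s$ over graphs with the same degree sequence, typically approaches zero, signifying "scale-rich" (high-degree nodes connect to low-degree nodes) rather than "scale-free" fractality (0803.4025).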
Process call graphs derived from PCG abstractions may inherit similar degree/statistical features, but detailed properties can depend on extraction modality (static, dynamic, hybrid).
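A few of these measures can be computed directly from an edge list. The sketch below uses plain Python and a made-up five-edge call graph; the function names are assumptions. Clustering and the unnormalized $s(G)$ sum are evaluated on the undirected version of the graph, a simplification chosen here for brevity. The normalizer $s_{\max}$ is not computed.

```python
from collections import Counter

def degree_sequences(edges):
    """In-degree (callers per function) and out-degree (callees per function)."""
    in_deg, out_deg = Counter(), Counter()
    for caller, callee in edges:
        out_deg[caller] += 1
        in_deg[callee] += 1
    return in_deg, out_deg

def undirected_neighbors(edges):
    """Neighbor sets of the undirected version, ignoring self-loops."""
    nbrs = {}
    for u, v in edges:
        if u != v:
            nbrs.setdefault(u, set()).add(v)
            nbrs.setdefault(v, set()).add(u)
    return nbrs

def local_clustering(edges):
    """C_i = 2 e_i / (k_i (k_i - 1)); defined as 0 when k_i < 2."""
    nbrs = undirected_neighbors(edges)
    coeff = {}
    for node, ns in nbrs.items():
        k = len(ns)
        if k < 2:
            coeff[node] = 0.0
            continue
        # Count each connected neighbor pair once (a < b).
        links = sum(1 for a in ns for b in ns if a < b and b in nbrs[a])
        coeff[node] = 2.0 * links / (k * (k - 1))
    return coeff

def s_metric(edges):
    """Unnormalized s(G): sum over undirected edges of k_i * k_j."""
    nbrs = undirected_neighbors(edges)
    return sum(len(nbrs[u]) * len(nbrs[v]) for u in nbrs for v in nbrs[u] if u < v)

# Hypothetical call graph: (caller, callee) pairs.
calls = [("main", "parse"), ("main", "log"), ("parse", "log"),
         ("run", "log"), ("main", "run")]
in_deg, out_deg = degree_sequences(calls)
clustering = local_clustering(calls)
print(in_deg["log"], out_deg["main"], clustering["main"], s_metric(calls))
```

Here `log` plays the role of a small hub (three callers), while `main` is the distributive caller (three callees).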
3. Extraction and Construction Methodologies
- Static Analysis: Involves parsing source or binary code to resolve functions and their invocations. Techniques include class/method/property signature matching (with formal grammar-like rules) (Veenendaal et al., 2016), or leveraging disassembly and symbolic labeling in malware binaries (Kinable et al., 2010). Recent methods automate signature extraction for multi-layer enterprise code, reporting 78.26% accuracy and 90% time reduction compared to manual search (Veenendaal et al., 2016).
- Dynamic Analysis: PCGs are inferred by instrumenting or sandboxing executables (e.g., Any.Run), capturing process creation and communication events over a defined interval (Aneja et al., 11 Oct 2025).
- Reverse Engineering from Unknown ISAs: Opcode candidacy heuristics extract call/return patterns using parameterized algorithms, optimizing the OCP-Score to identify plausible control flow edges without prior ISA knowledge (Pettersen et al., 15 Jan 2024).
- Incremental Ecosystem-Scale Generation: Partial call graphs are generated per library/package, stored, and later stitched using a universal class hierarchy. This modular approach addresses computational constraints at scale (Keshani, 2021).
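As a rough illustration of static extraction, the sketch below builds a call graph for Python source using the standard-library `ast` module. It is not the signature-matching method of the cited work: it only resolves direct calls to simple names inside top-level functions, and it ignores methods, dynamic dispatch, and imports. The sample source and the function name `static_call_graph` are hypothetical.

```python
import ast

def static_call_graph(source):
    """Map each top-level function name to the set of simple names it calls."""
    tree = ast.parse(source)
    graph = {}
    for node in tree.body:
        if isinstance(node, ast.FunctionDef):
            callees = set()
            for sub in ast.walk(node):
                # Only calls of the form name(...) are resolved here.
                if isinstance(sub, ast.Call) and isinstance(sub.func, ast.Name):
                    callees.add(sub.func.id)
            graph[node.name] = callees
    return graph

SAMPLE = '''
def helper(x):
    return x + 1

def compute(x):
    return helper(helper(x))

def main():
    print(compute(3))
'''

graph = static_call_graph(SAMPLE)
for caller, callees in graph.items():
    print(caller, "->", sorted(callees))
```

Repeated calls collapse into a single edge because callees are stored in a set; a multigraph (for example, a `Counter` of callee names) would be needed to keep call multiplicities.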
4. Applications and Impact
- Software Comprehension and Testing: Clustering in call graphs reflects modular structure; high-betweenness nodes pinpoint critical control points for prioritized testing. Graph measures such as the epidemic threshold ($\tau = 1/\lambda_{\max}$, with $\lambda_{\max}$ the largest adjacency matrix eigenvalue) model bug propagation, suggesting that larger graphs are more fragile (0803.4025).
- Malware Detection and Classification: Graph matching (minimizing graph edit distance), combined with cluster analysis using k-medoids and DBSCAN, enables grouping variants into malware families. Dynamic PCGs (process interaction graphs) encode behavioral signals complementary to static function call graphs (FCGs), enhancing detection via joint embeddings (see GeminiNet architecture) (Kinable et al., 2010, Aneja et al., 11 Oct 2025).
- Microservice Resource Management: Fine-grained call graphs (capturing repeated calls, interface diversity, sibling microservice effects) enable more efficient scaling, with resource efficiency gains up to 44.8% vs. baseline (Du et al., 26 Dec 2024).
- Program Evolution Analytics: Mining evolving PCGs yields evolution rules (CGERs) and subgraph motifs (CGESs); their stability across versions supports dependency tracking and impact analysis in large systems (Chaturvedi, 2022).
- Visualization and Ensemble Analysis: Advanced techniques (ensemble-Sankey diagrams, boxplot overlays) allow exploration of variability and performance bottlenecks in large ensembles of runtime PCGs (Kesavan et al., 2020).
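The epidemic-threshold measure mentioned under software comprehension can be estimated without external libraries. The sketch below assumes the threshold is $1/\lambda_{\max}$ of the adjacency matrix and, as a further assumption, treats the call graph as undirected and connected. It uses power iteration on $A + I$ rather than $A$, since the shift avoids the oscillation that plain power iteration shows on bipartite graphs. Names and the example graph are illustrative.

```python
def largest_eigenvalue(edges, iterations=200):
    """Estimate the largest adjacency eigenvalue via power iteration on A + I."""
    nbrs = {}
    for u, v in edges:
        if u != v:
            nbrs.setdefault(u, set()).add(v)
            nbrs.setdefault(v, set()).add(u)
    if not nbrs:
        return 0.0
    vec = {n: 1.0 for n in nbrs}
    shifted = 1.0
    for _ in range(iterations):
        # Multiply by (A + I): own value plus the sum over neighbors.
        nxt = {n: vec[n] + sum(vec[m] for m in nbrs[n]) for n in nbrs}
        shifted = max(nxt.values())  # max-norm estimate of lambda_max + 1
        vec = {n: x / shifted for n, x in nxt.items()}
    return shifted - 1.0

def epidemic_threshold(edges):
    """Threshold 1 / lambda_max; a lower value means propagation takes hold more easily."""
    lam = largest_eigenvalue(edges)
    return float("inf") if lam == 0 else 1.0 / lam

# Hypothetical call graph, treated as undirected for this estimate.
calls = [("main", "parse"), ("main", "log"), ("parse", "log"),
         ("run", "log"), ("main", "run")]
lam = largest_eigenvalue(calls)
tau = epidemic_threshold(calls)
print(round(lam, 4), round(tau, 4))
```

For this four-node graph the largest eigenvalue is $(1 + \sqrt{17})/2 \approx 2.56$, giving a threshold of roughly 0.39.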
5. Recognition Algorithms and Structure Theory
- Star-PCGs and Star-$k$-PCGs: Recognizing whether a given graph admits a star-PCG representation (witness tree is a star) hinges on discovering a gap-free vertex ordering and verifying neighborhood consecutiveness. Polynomial-time algorithms now exist for star-PCGs (Xiao et al., 2018).
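- Star-$k$-PCG Framework: For any graph $G$, there exist star leaf weights $w$ and $k$ mutually exclusive intervals $I_1, \dots, I_k$ so that $G$ is the star-$k$-PCG they induce, in which two leaves are adjacent iff their distance lies in some $I_j$. The star number denotes the minimal $k$ needed; exact results are given for small graphs, cycles, and multidimensional grids (Monti et al., 2022).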
- Algorithmic Challenges: The complexity of recognizing star-$k$-PCGs ($k \ge 2$) is open; forbidden pattern sequences constrain constructible witness trees.
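The constructive direction of the star-$k$-PCG definition is easy to sketch: in a star, the distance between two leaves is the sum of their edge weights to the center, so adjacency reduces to an interval test on $w(u) + w(v)$. The code below is not a recognition algorithm, which is the hard direction discussed above; the weights, intervals, and function name are assumptions chosen for illustration.

```python
from itertools import combinations

def star_k_pcg(leaf_weights, intervals):
    """Edges (u, v) whose star distance w(u) + w(v) lies in any closed interval."""
    edges = set()
    for u, v in combinations(sorted(leaf_weights), 2):
        distance = leaf_weights[u] + leaf_weights[v]
        if any(lo <= distance <= hi for lo, hi in intervals):
            edges.add((u, v))
    return edges

# Hypothetical star: weight of the edge from the center to each leaf.
weights = {"a": 1, "b": 2, "c": 3, "d": 4}

# k = 1 (an ordinary star-PCG) with the single interval [4, 5].
single = star_k_pcg(weights, [(4, 5)])

# k = 2 with two disjoint intervals [3, 3] and [6, 7].
double = star_k_pcg(weights, [(3, 3), (6, 7)])

print(sorted(single))
print(sorted(double))
```

With the same weights, the two interval choices yield complementary edge sets on these four leaves, which shows how adding intervals enlarges the family of representable graphs.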
6. Open Problems and Research Directions
- Comprehensive PCG Characterization: Full structural characterization for generalized (non-star) witness trees remains unresolved, especially with combinatorial explosion in possible distance assignments (Xiao et al., 2018, Monti et al., 2022).
- Recognition and Complexity: Finding efficient recognition algorithms for star-$k$-PCGs ($k \ge 2$), and characterizing acyclic graphs that require higher star numbers, remain open questions (Monti et al., 2022).
- Benchmark Generation for Microservices: Accurately simulating stochastic call graphs with production-level characteristics, for benchmarking resource allocation and QoS in microservice environments, remains an open problem (Du et al., 26 Dec 2024).
- Joint Embedding Models in Security and Analysis: It is still unclear how best to combine static and dynamic call graphs for robust malware detection, resilience against adversarial obfuscation, and improved prediction of software anomalies (Aneja et al., 11 Oct 2025).
- Scalable Visualization and Analytics: Metrics and visual encodings for large call graph ensembles have yet to be formalized in a way that enables comparative runtime analysis in high-performance computing contexts (Kesavan et al., 2020).
7. Contextual and Cross-Domain Significance
Process Call Graphs, both as static function graphs and dynamic process interaction graphs, underpin a wide range of analyses across software engineering, security, evolutionary analytics, and resource optimization. Their intrinsic graph-theoretic properties (power-law degree distributions, clustering, centrality, and stratification) are reported to be largely language-independent and to persist across domains, although some measures, such as degree correlations, differ between imperative and functional languages. This generality makes PCGs a foundational abstraction for rigorous software reasoning, facilitates cross-platform and cross-domain tool development, and provides quantitative metrics for vulnerability, maintainability, and performance assessment.