Dependency Management in OpenStack
- Dependency management in OpenStack spans capturing, representing, and resolving interdependencies among software components, using techniques such as state graphs, declarative modeling, and hypergraph solvers.
- Declarative modeling and automated verification methodologies—ranging from SOSG and TOSCA to ML-driven predictions and contract-based proxies—streamline deployment and ensure system robustness.
- Practical implementations integrate multi-layered dependency analyses, contract enforcement, and collaborative code reviews to proactively manage risks and reduce recovery time in large-scale environments.
Dependency management in OpenStack is a multidimensional challenge that spans software component interrelationships, runtime state propagation, package resolution, architectural stability, operational reliability, and collaborative development processes. OpenStack’s large-scale deployments and multi-layered architecture—often integrating with platforms like Ceph, Kubernetes, or federated clouds—require rigorous approaches to capturing, analyzing, and maintaining dependency information. Contemporary research illuminates a diversity of strategies, from property graphs and declarative modeling to formal hypergraph solvers and machine learning-based prediction of change dependencies.
1. Capturing and Representing Dependencies: State Graphs and System Integration
The System Operation State Graph (SOSG) formalism is a property graph method that consolidates heterogeneous system states and events from OpenStack and Ceph into a unified, traversable graph structure (Xiang et al., 2016). In SOSG, three vertex types—entities (e.g., VM, block device), states, and runtime events—are joined by two classes of edges:
- Spatial edges connect entities with associated states and events, explicitly capturing cross-module dependencies (e.g., a VM’s linkage to a Ceph volume).
- Temporal edges encode evolution by ordering state/event vertices by timestamp, facilitating analysis of operational progression.
The SOSG is algorithmically constructed in a four-step process: parsing raw logs to extract key-value pairs (forming state/event vertices), statistical identifier discovery (yielding entity vertices), the addition of spatial edges (based on “mentions”), and timeline sorting for temporal edges. The resultant graph supports scalable, parallel construction on platforms like GraphX, demonstrated on a 125-node cluster with 43 million vertices and 57 million edges assembled in ≈25 minutes.
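A minimal sketch of this construction, assuming simplified, already-parsed log records (field and identifier names are illustrative, not those of the original SOSG pipeline):

```python
from collections import defaultdict

# Illustrative, already-parsed log records: a timestamp, the primary entity the
# record belongs to, and its key-value payload (all names are hypothetical).
logs = [
    {"ts": 1, "entity": "vm-1",  "kv": {"event": "boot",   "status": "active"}},
    {"ts": 2, "entity": "vol-9", "kv": {"event": "attach", "target": "vm-1"}},
    {"ts": 3, "entity": "vm-1",  "kv": {"event": "resize", "status": "active"}},
]

# Steps 1-2: state/event vertices from key-value pairs, entity vertices from identifiers.
state_vertices = [{"id": f"s{i}", "ts": rec["ts"], **rec["kv"]} for i, rec in enumerate(logs)]
entities = {rec["entity"] for rec in logs}

# Step 3: spatial edges link each state/event vertex to every entity it mentions,
# which is how cross-module dependencies (e.g., VM -> Ceph volume) are captured.
spatial_edges = []
for vertex, rec in zip(state_vertices, logs):
    mentioned = {rec["entity"]} | (set(rec["kv"].values()) & entities)
    spatial_edges.extend((entity, vertex["id"]) for entity in mentioned)

# Step 4: temporal edges chain each entity's state/event vertices by timestamp.
by_entity = defaultdict(list)
for entity, vid in spatial_edges:
    by_entity[entity].append(vid)
ts_of = {v["id"]: v["ts"] for v in state_vertices}
temporal_edges = []
for vids in by_entity.values():
    ordered = sorted(vids, key=ts_of.get)
    temporal_edges.extend(zip(ordered, ordered[1:]))

print(sorted(entities), spatial_edges, temporal_edges)
```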
Queries about dependency impacts (e.g., which VMs are affected by a hardware failure) translate to graph traversals, typically breadth-first search (BFS), that follow entity links through the dependency network across abstraction layers. This approach not only enables operational visibility but also supports anomaly detection by measuring the similarity of subgraphs (VM dependency sets) using generalized Jaccard distance:

$$d(v_i, v_j) = 1 - \frac{|T_i \cap T_j|}{|T_i \cup T_j|}$$

where $T_i$ is the set of dependency triplets in VM $v_i$'s subgraph. Deviations signal potential hidden consistency issues.
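A compact sketch of this similarity check, assuming each VM's dependency subgraph has already been flattened into a set of (source, relation, target) triplets:

```python
def jaccard_distance(triplets_a: set, triplets_b: set) -> float:
    """Set-based Jaccard distance between two VMs' dependency-triplet sets."""
    if not triplets_a and not triplets_b:
        return 0.0
    inter = len(triplets_a & triplets_b)
    union = len(triplets_a | triplets_b)
    return 1.0 - inter / union

# Hypothetical dependency triplets extracted from two VM subgraphs.
vm_a = {("vm-a", "attached_to", "vol-1"), ("vol-1", "stored_on", "osd-3")}
vm_b = {("vm-b", "attached_to", "vol-2"), ("vol-2", "stored_on", "osd-3")}

# Pairwise distances far from the cluster norm flag potentially inconsistent VMs.
print(jaccard_distance(vm_a, vm_b))  # 1.0: no shared triplets
```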
2. Declarative Modeling and Orchestration of Dependencies
Declarative modeling, as developed in the GARR Cloud platform (Attardi et al., 2017), utilizes high-level “charms” and YAML bundles (via Juju) to encode service dependencies and resource constraints. These declarative definitions specify requirements, provided interfaces, and resource constraints (e.g., memory, CPU, storage) for each service.
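A minimal illustrative bundle in this spirit, written here as a Python data structure rather than the YAML used in production (service names, constraints, and relations are hypothetical, not the GARR bundle):

```python
# Hypothetical declarative bundle: services, their resource constraints, and
# the relations (dependencies) the orchestrator must satisfy while converging.
bundle = {
    "applications": {
        "keystone":     {"charm": "keystone",        "num_units": 1,
                         "constraints": "mem=4G cores=2"},
        "nova-compute": {"charm": "nova-compute",    "num_units": 3,
                         "constraints": "mem=16G cores=8 root-disk=100G"},
        "mysql":        {"charm": "percona-cluster", "num_units": 1,
                         "constraints": "mem=8G"},
    },
    # Each relation declares a required interface between two services.
    "relations": [
        ["keystone", "mysql"],
        ["nova-compute", "keystone"],
    ],
}
```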
This modeling abstracts deployment “what” from “how,” letting the orchestration engine converge the system state by executing idempotent hooks. In a federated OpenStack deployment, regions are linked via a central Keystone identity service and dashboard, with placement constraints (e.g., availability zones, host aggregates) and project-level isolation enforced via hierarchical quotas.
In heterogeneous clouds, orchestration is standardized through TOSCA templates (Caballer et al., 2017), which describe the application topology as a dependency graph of nodes and edges (dependencies). TOSCA templates are dynamically translated by Heat Translator into OpenStack-native HOT documents, automating deployment order and configuration of both VM and container resources. This abstraction supports elastic scaling and multi-provider interoperability while handling both static and dynamic dependency resolution.
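As a brief sketch of how such a topology fixes deployment order, the following assumes a hypothetical node set (not an actual TOSCA template) and derives an order that respects every dependency edge:

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical topology: each node lists the nodes it depends on
# (e.g., a web tier that requires a database and the VM hosting it).
topology = {
    "db-vm":  [],
    "web-vm": [],
    "mysql":  ["db-vm"],
    "webapp": ["web-vm", "mysql"],
}

# The orchestrator deploys nodes in an order that respects every dependency edge.
print(list(TopologicalSorter(topology).static_order()))
# e.g. ['db-vm', 'web-vm', 'mysql', 'webapp']
```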
3. Automated Dependency Verification and Service Reliability
Automated dependency management extends beyond deployment to service-level correctness and reliability. Model-driven approaches wrap critical cloud services (e.g., Keystone) with stateful, contract-based proxies that enforce preconditions and postconditions through formal models: UML for the static structure and OCL for behavioural invariants (Rauf et al., 2018).
The Django Web Framework is used to implement the wrapper with clearly separated models and views, enabling robust contract enforcement on RESTful endpoints. Such instrumentation validates dependencies (e.g., ensuring a user is authenticated before issuing a token request) and supports automated detection of security and functional discrepancies.
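A simplified sketch of contract enforcement in this style, using a plain Python decorator instead of the paper's Django and OCL machinery (the endpoint and the checks are illustrative):

```python
import functools

def contract(pre=None, post=None):
    """Wrap a service call with precondition and postcondition checks."""
    def decorate(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            if pre and not pre(*args, **kwargs):
                raise ValueError(f"precondition violated for {fn.__name__}")
            result = fn(*args, **kwargs)
            if post and not post(result):
                raise ValueError(f"postcondition violated for {fn.__name__}")
            return result
        return wrapper
    return decorate

# Illustrative contract: a token may only be issued to an authenticated user,
# and the response must actually contain a token.
@contract(pre=lambda session: session.get("authenticated", False),
          post=lambda resp: "token" in resp)
def issue_token(session):
    return {"token": "hypothetical-token", "user": session["user"]}

print(issue_token({"authenticated": True, "user": "alice"}))
```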
Industrial-scale production environments employ a Dependency Management System (DMS) (Yang et al., 2022) integrating tracing, config parsing, and manual reports to generate a dependency graph. Service-level (deployment, runtime, and operational) and microservice-level dependencies (static, environment, dynamic interactions) are quantified in terms of intensity using time-series similarity:

$$\mathrm{intensity}(a, b) = \mathrm{sim}\big(\mathbf{s}_a, \mathbf{s}_b\big), \qquad \mathbf{s}_a = \big(s_a(t_1), \ldots, s_a(t_n)\big)$$

where $s_a(t)$ and $s_b(t)$ are the status vectors of components $a$ and $b$ at time $t$. This enables both proactive identification of hazardous dependencies and reactive mitigation during cascading failures, demonstrated to reduce recovery time by over 60% in Huawei Cloud.
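A small sketch of such an intensity estimate, assuming per-interval status vectors (here, error rates) for two services; cosine similarity stands in for whatever similarity measure a production DMS would use:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Similarity of two equally long status time series."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Hypothetical status vectors sampled at the same timestamps.
nova_errors    = [0.0, 0.1, 0.8, 0.9, 0.2]
neutron_errors = [0.0, 0.2, 0.7, 1.0, 0.1]

# A high score suggests the services degrade together, i.e. an intense dependency.
print(round(cosine_similarity(nova_errors, neutron_errors), 3))
```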
4. Package Ecosystem Challenges and Hypergraph-Based Resolution
OpenStack’s reliance on vast ecosystem dependencies exposes it to complex challenges catalogued in large empirical studies (Mens et al., 27 Sep 2024): technical lag (outdated packages, slow update propagation), breaking changes (11–12% of upgrades introduce incompatibilities), semantic versioning errors, NP-complete dependency resolution, bloated and trivial dependencies, supply chain vulnerabilities, and legal issues from license incompatibilities.
HyperRes (Gibb et al., 12 Jun 2025) formalizes dependency resolution as a hypergraph problem, unifying cross-ecosystem and system-level dependencies.
This approach enables interoperability and specialization—integrating Python, OS, and hardware dependencies—while recognizing the NP-completeness of the constraint satisfaction problem.
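The following intentionally tiny sketch casts version selection as backtracking constraint search, the flavour of problem HyperRes formalizes; the package universe and constraints are hypothetical, and this is not the HyperRes algorithm itself:

```python
# Each package maps to its candidate versions; each (package, version) pair maps
# to the version sets it allows for its dependencies (all values are made up).
versions = {"app": ["1.0"], "libA": ["2.0", "1.0"], "libB": ["2.0", "1.0"]}
requires = {
    ("app", "1.0"):  {"libA": {"1.0", "2.0"}, "libB": {"2.0"}},
    ("libA", "2.0"): {"libB": {"2.0"}},
    ("libA", "1.0"): {"libB": {"1.0"}},
}

def consistent(chosen):
    """Every constraint imposed by a chosen version must be met by the chosen deps."""
    for (pkg, ver), deps in requires.items():
        if chosen.get(pkg) != ver:
            continue
        for dep, allowed in deps.items():
            if dep in chosen and chosen[dep] not in allowed:
                return False
    return True

def resolve(pkgs, chosen=None):
    """Backtracking search for one version per package satisfying all constraints."""
    chosen = chosen or {}
    if not pkgs:
        return chosen
    pkg, rest = pkgs[0], pkgs[1:]
    for ver in versions[pkg]:
        trial = {**chosen, pkg: ver}
        if consistent(trial) and (solution := resolve(rest, trial)):
            return solution
    return None  # no satisfying assignment: resolution fails

print(resolve(list(versions)))  # {'app': '1.0', 'libA': '2.0', 'libB': '2.0'}
```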
5. Code Review and Change Dependency Management
Architecture erosion in OpenStack is closely linked to dependency-related “smells,” particularly cyclic dependencies (Li et al., 2022). Code reviews in the Nova and Neutron projects surfaced cyclic dependency symptoms in approximately 11.9% of erosion-related discussions. These cycles are detected by traversing the module dependency graph, for example with a depth-first search that flags back edges.
Mitigation typically involves refactoring shared functionality into helper modules to break cycles. Code review processes, documented in 21,274 comments over five years, illustrate a positive trend: most flagged dependency issues are resolved, and fewer new erosion symptoms arise as architectural awareness and practices mature.
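A minimal sketch of such cycle detection over a hypothetical module import graph, using depth-first search to report a back edge:

```python
def find_cycle(graph: dict[str, list[str]]):
    """Return one dependency cycle as a list of modules, or None if acyclic."""
    WHITE, GREY, BLACK = 0, 1, 2
    color = {node: WHITE for node in graph}
    stack = []

    def dfs(node):
        color[node] = GREY
        stack.append(node)
        for dep in graph.get(node, []):
            if color.get(dep, WHITE) == GREY:        # back edge: cycle found
                return stack[stack.index(dep):] + [dep]
            if color.get(dep, WHITE) == WHITE and (cycle := dfs(dep)):
                return cycle
        stack.pop()
        color[node] = BLACK
        return None

    for node in graph:
        if color[node] == WHITE and (cycle := dfs(node)):
            return cycle
    return None

# Hypothetical module imports: scheduler -> utils -> db -> scheduler is a cycle.
imports = {"scheduler": ["utils"], "utils": ["db"], "db": ["scheduler"], "api": ["db"]}
print(find_cycle(imports))  # ['scheduler', 'utils', 'db', 'scheduler']
```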
Automated cross-patch collaboration via patch linkage (Wang et al., 2022) explicitly renders dependencies between code changes, improving awareness, coordination, and the quality of reviews and subsequent changes. Collaboration metrics quantify the probability that a contributor to one patch engages with a linked patch, with empirical evidence demonstrating significant increases in active cross-patch participation when linkage requests are explicit.
Recent advances include ML-driven prediction of change dependencies in OpenStack (Arabat et al., 7 Aug 2025). Dependency links (“Depends-On,” “Needed-By”) enable Zuul-based CI/CD systems to orchestrate builds and feature deployment. Yet, the manual identification of dependencies is onerous (median of 57.12 hours searching among 463 changes), with more than half of dependency links only established during code review or after failures. The proposed approach employs two ML models:
- Dependent change classifier (36 features; AUC ≈ 79.33%; Brier ≈ 0.11).
- Dependent pair predictor (82 features including pairwise metrics; AUC ≈ 91.89%; Brier ≈ 0.014).
This methodology reliably surfaces candidate dependencies for developers, with high recall in top-k ranking of suggestions, thus reducing manual effort and latency.
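A schematic sketch of the first model, with synthetic data and made-up features standing in for the study's 36 mined features (scikit-learn assumed available):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score, brier_score_loss
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for per-change features (e.g., files touched, churn,
# description length) and a label saying whether the change had a dependency.
X = rng.normal(size=(2000, 5))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=2000) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

proba = clf.predict_proba(X_test)[:, 1]
print("AUC:", round(roc_auc_score(y_test, proba), 3))
print("Brier:", round(brier_score_loss(y_test, proba), 3))
```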
6. Operational, Security, and Federated Contexts
A number of deployments illustrate practical dependency management under complex operational constraints. Stratus (Bollig et al., 2018) integrates OpenStack and Ceph to support controlled-access data compliance, leveraging modular service orchestration (Nova, Cinder, Keystone, etc.) with strict resource quotas, firewall rules, and disk/image lifecycle automation (DiskImage Builder, Cloud-Init, Puppet). Ceph pool management—including erasure coding and S3 cache policies—interacts directly with OpenStack APIs, building a multi-tiered dependency chain that supports both high security and long-running jobs.
Automated OS and container deployment leveraging Preseed, racadm, and Kolla-based containerization (Gibelin et al., 2019) ensures reproducibility and scalability, with declaratively managed YAML configurations versioned in GitLab for rapid recovery and upgrade. VXLAN encapsulation provides project data isolation within the OpenStack platform.
Interactive cluster visualisation through digital twins (Gomes et al., 2021) maps physical and virtual components (hypervisors and VMs) into a 3D model, enabling direct manipulation and explicit resource dependency management. Real-time bidirectional synchronization with REST APIs means that operational changes propagate instantaneously, with resource load formulas providing immediate feedback for load balancing and migration decisions.
KupenStack (Yadav et al., 2021) employs Kubernetes as a declarative control plane for OpenStack (“OpenStack as Code”), wrapping OpenStack services as custom resources and orchestrating their dependencies through reconciliation loops.
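A stripped-down sketch of the reconciliation pattern such controllers follow; the service names and states are hypothetical and this is not KupenStack's actual custom resource schema:

```python
def reconcile(desired, observed):
    """Drive observed state toward the declared state; return the actions to take."""
    actions = []
    for service, spec in desired.items():
        if observed.get(service) != spec:
            # A real operator would patch the corresponding Kubernetes objects,
            # reconciling prerequisite services (e.g., the database) first.
            actions.append((service, observed.get(service), spec))
    return actions

# Hypothetical declared vs. observed state for two wrapped OpenStack services.
desired  = {"keystone": {"replicas": 3}, "glance": {"replicas": 2}}
observed = {"keystone": {"replicas": 2}, "glance": {"replicas": 2}}

# A controller would run this on a timer or in response to watch events.
print(reconcile(desired, observed))  # [('keystone', {'replicas': 2}, {'replicas': 3})]
```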
This native integration enables self-healing, automated scaling, and zero-downtime upgrades, relying on mature controller/operator mechanisms.
7. Synthesis, Limitations, and Future Directions
Dependency management strategies in OpenStack combine property graphs, declarative models, orchestration standards, formal hypergraph solvers, automated contract enforcement, ML-driven prediction, and collaborative review linkage. Each method contributes to dependency visibility, correctness, robustness, and operational agility. However, intrinsic limitations persist—NP-completeness of resolution, semantic mismatches across package managers, manual overhead in code review, and the persistent threat of supply chain attacks and technical lag.
Best practices emerging from the literature include:
- Maintaining explicit dependency graphs for runtime and architectural state.
- Adopting declarative provisioning and strict resource constraint models.
- Integrating contract-based and model-driven wrappers for critical services.
- Employing hybrid approaches (e.g., ML classifiers) for change coordination and proactive dependency discovery.
- Leveraging centralized configuration audit and infrastructure as code.
- Conducting periodic code reviews and automated analysis to detect cycles and architectural erosion.
The progressive refinement of these practices—combined with empirical feedback and metric-driven adjustment—is essential for reliable, maintainable, and scalable OpenStack deployments.