Incremental Analysis of Legacy Applications Using Knowledge Graphs for Application Modernization

Published 11 May 2025 in cs.SE and cs.IR | (2505.06885v1)

Abstract: Industries such as banking, telecom, and airlines - often have large software systems that are several decades old. Many of these systems are written in old programming languages such as COBOL, PL/1, Assembler, etc. In many cases, the documentation is not updated, and those who developed/designed these systems are no longer around. Understanding these systems for either modernization or even regular maintenance has been a challenge. An extensive application may have natural boundaries based on its code dependencies and architecture. There are also other logical boundaries in an enterprise setting driven by business functions, data domains, etc. Due to these complications, the system architects generally plan their modernization across these logical boundaries in parts, thereby adopting an incremental approach for the modernization journey of the entire system. In this work, we present a software system analysis tool that allows a subject matter expert (SME) or system architect to analyze a large software system incrementally. We analyze the source code and other artifacts (such as data schema) to create a knowledge graph using a customizable ontology/schema. Entities and relations in our ontology can be defined for any combination of programming languages and platforms. Using this knowledge graph, the analyst can then define logical boundaries around dependent Entities (e.g. Programs, Transactions, Database Tables etc.). Our tool then presents different views showcasing the dependencies from the newly defined boundary to/from the other logical groups of the system. This exercise is repeated interactively to 1) Identify the Entities and groupings of interest for a modernization task and 2) Understand how a change in one part of the system may affect the other parts. To validate the efficacy of our tool, we provide an initial study of our system on two client applications.

Summary

The paper "Incremental Analysis of Legacy Applications Using Knowledge Graphs for Application Modernization" presents an approach to the challenges of modernizing legacy software systems. These systems, prevalent in industries such as banking, telecom, and airlines, are often several decades old and written in older languages such as COBOL, PL/1, and Assembler. Outdated documentation and the absence of the original developers make them harder to understand, whether for modernization or for regular maintenance.

The authors introduce a tool that facilitates the incremental analysis of large software systems, leveraging knowledge graphs (KG) to manage the intricate web of dependencies and data relationships within these systems. This approach centers around the creation of a customizable ontology/schema that integrates with various programming languages and platforms. The core aim is to enable system architects and subject matter experts (SMEs) to define logical boundaries around dependent entities, like programs and database tables, making it possible to visualize and manage these dependencies more effectively.

Central to the proposed methodology is the systematic creation and analysis of "increments." An increment represents a bounded scope containing selected artifacts and their related dependencies. This method allows SMEs to initially identify crucial components and iteratively extend the increment by examining interactions across boundary lines. The authors validate this concept via an initial study on two client applications, showing that incremental analysis aids in pinpointing dependencies that must be resolved during modernization, effectively serving as a bridge between static analysis insights and modernization strategies.

The conceptual framework and operational architecture of the proposed system comprise three primary components: Code Discovery and Knowledge Graph Construction, Increment Creation via Neighborhood Detection, and Incremental Analysis. The code discovery phase employs static analysis tools like IBM's ADDI to generate entities and relations in the form of a knowledge graph, stored within a Neo4j database. This graph then serves as the foundation for increment creation, where neighborhood detection algorithms assist in identifying related entities tied to a specific modernization goal.
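The increment creation step can be illustrated with a small sketch. This is not the authors' algorithm, which the summary does not specify; it assumes a plain breadth-first search over an in-memory list of relations, with a hop limit standing in for whatever neighborhood criterion the tool applies. All entity names, relation types, and the `neighborhood` function are hypothetical, and a real deployment would query the Neo4j store rather than a Python list.

```python
from collections import deque

# Hypothetical (source, relation type, target) triples; names are illustrative only.
RELATIONS = [
    ("PGM_BILLING", "CALLS", "PGM_TAX"),
    ("PGM_BILLING", "READS", "TBL_INVOICE"),
    ("PGM_TAX", "READS", "TBL_RATES"),
    ("TXN_PAY", "INVOKES", "PGM_BILLING"),
    ("PGM_REPORT", "READS", "TBL_INVOICE"),
    ("PGM_ARCHIVE", "WRITES", "TBL_HISTORY"),
]


def build_adjacency(relations):
    """Index relations in both directions so dependents and dependencies are both reachable."""
    adjacency = {}
    for source, _relation, target in relations:
        adjacency.setdefault(source, set()).add(target)
        adjacency.setdefault(target, set()).add(source)
    return adjacency


def neighborhood(relations, seeds, max_hops):
    """Return the seed entities plus everything within max_hops relations of them."""
    adjacency = build_adjacency(relations)
    visited = set(seeds)
    queue = deque((seed, 0) for seed in seeds)
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not expand past the hop limit
        for neighbor in adjacency.get(node, ()):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, depth + 1))
    return visited


# Candidate increment around one program an SME wants to modernize.
increment = neighborhood(RELATIONS, {"PGM_BILLING"}, max_hops=1)
print(sorted(increment))
```

Raising `max_hops` widens the candidate increment, which the SME would then prune or extend by hand; unrelated entities such as `PGM_ARCHIVE` are never pulled in.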

The Incremental Analysis component examines the logical boundary of an increment through its "inside-out" and "outside-in" edges, which represent interactions leaving and entering the increment, respectively. These edges let SMEs evaluate how a prospective change would affect the rest of the codebase. By iteratively refining the increment, architects can limit unwanted side effects on other system components.
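A minimal sketch of the boundary analysis follows. It assumes that an inside-out edge is a relation whose source lies inside the increment and whose target lies outside, and that an outside-in edge is the reverse; the paper may define these more precisely. The relation triples, entity names, and the `boundary_edges` function are hypothetical.

```python
# Hypothetical (source, relation type, target) triples; names are illustrative only.
RELATIONS = [
    ("PGM_BILLING", "CALLS", "PGM_TAX"),
    ("PGM_BILLING", "READS", "TBL_INVOICE"),
    ("PGM_TAX", "READS", "TBL_RATES"),
    ("TXN_PAY", "INVOKES", "PGM_BILLING"),
    ("PGM_REPORT", "READS", "TBL_INVOICE"),
    ("PGM_ARCHIVE", "WRITES", "TBL_HISTORY"),
]


def boundary_edges(relations, increment):
    """Split the relations that cross the increment boundary by direction."""
    inside_out = []  # source inside the increment, target outside
    outside_in = []  # source outside the increment, target inside
    for source, relation, target in relations:
        source_inside = source in increment
        target_inside = target in increment
        if source_inside and not target_inside:
            inside_out.append((source, relation, target))
        elif target_inside and not source_inside:
            outside_in.append((source, relation, target))
    return inside_out, outside_in


increment = {"PGM_BILLING", "PGM_TAX", "TBL_INVOICE"}
inside_out, outside_in = boundary_edges(RELATIONS, increment)
print("inside-out:", inside_out)
print("outside-in:", outside_in)

# One refinement step: pull the outside dependency into the increment.
extended = increment | {"TBL_RATES"}
print("inside-out after extension:", boundary_edges(RELATIONS, extended)[0])
```

In this toy graph the outside-in edges flag `TXN_PAY` and `PGM_REPORT` as outside callers or readers that a change to the increment could break, while the single inside-out edge disappears once `TBL_RATES` is added to the increment.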

The benefits of knowledge graphs extend beyond incremental analysis. Representing a large, complex system in a language-agnostic ontology makes the approach flexible: entities and relations can be defined for any combination of programming languages and platforms, and the ontology can be extended to cover business functions and data domains. As a result, the tool is not tied to a specific programming language or system architecture.
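To make the idea of a customizable ontology concrete, here is a small sketch of a schema that records which relation types are allowed between which entity types. It is an assumption about how such a schema could be modeled, not the tool's actual data model; the `Ontology` and `RelationType` classes and every type name are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class RelationType:
    """One allowed relation, e.g. a Program READS a Table."""
    name: str
    source_type: str
    target_type: str


@dataclass
class Ontology:
    entity_types: set = field(default_factory=set)
    relation_types: set = field(default_factory=set)

    def add_relation_type(self, name, source_type, target_type):
        """Register a relation type and, implicitly, the entity types it connects."""
        self.entity_types.update({source_type, target_type})
        self.relation_types.add(RelationType(name, source_type, target_type))

    def allows(self, source_type, name, target_type):
        """Check whether a relation between two entity types fits the schema."""
        return RelationType(name, source_type, target_type) in self.relation_types


ontology = Ontology()
# Code-level relations, independent of the source language that produced them.
ontology.add_relation_type("CALLS", "Program", "Program")
ontology.add_relation_type("READS", "Program", "Table")
ontology.add_relation_type("INVOKES", "Transaction", "Program")
# Extending the schema with a business-level concept needs no change to the classes.
ontology.add_relation_type("IMPLEMENTS", "Program", "BusinessFunction")

print(ontology.allows("Program", "READS", "Table"))
print(ontology.allows("Table", "READS", "Program"))
```

Keeping the schema as data rather than code is what lets a new language, platform, or business concept be added by registering more types.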

In practice, the methodology lets organizations analyze and modernize one segment of a system at a time, which helps them manage the risk of large-scale overhauls and prioritize investment in system evolution. On the research side, it points toward automated refinement of increments and toward using the knowledge graph for further data science tasks, such as anomaly detection in large, complex systems.

In conclusion, the authors provide a compelling, methodical approach to legacy system modernization. By introducing incremental analysis supported by knowledge graphs, they offer a valuable tool to both researchers and practitioners aiming to navigate the complexities of transforming legacy applications into adaptable, future-ready systems. Further research might explore more diverse application scenarios or investigate the potential of integrating operational log analysis to enrich the insights furnished by the knowledge graph framework.
