SSAT: Semantic Software Architecture Tree
- SSAT is a semantically-rich intermediate representation that bridges ambiguous natural language requirements and exact software artifacts.
- It uses a hierarchical tree structure to map modules, files, classes, and functions, ensuring clarity and traceability across code synthesis stages.
- SSAT enables iterative refinement via agents in architecture, skeleton, and code stages, significantly improving test pass rates and project structure fidelity.
The Semantic Software Architecture Tree (SSAT) is a structured, semantically-rich intermediate representation designed to bridge the semantic gap between ambiguous, high-level natural language requirements and the precise software artifacts required for machine-executable code. Introduced in the context of ProjectGen, a multi-agent framework for project-level code generation, SSAT enables systems to systematically convert user intent and documentation into exactly specified software architecture, providing an unambiguous, hierarchical "blueprint" that underpins automated, large-scale code synthesis across multiple files, modules, classes, and functions.
1. Motivation and Problem Context
The motivation for SSAT arises from persistent issues in project-level code generation with LLMs. In practical software projects, requirements are usually conveyed via mixtures of prose, UML diagrams, and architectural sketches, leading to multiple potential points of misinterpretation or omission by generative agents. Prior methods either rely on free-text representations, which resist accurate parsing and mapping back to code structure, or attempt to process entire projects as monolithic prompts, resulting in severe loss of hierarchy, dependency information, and cross-file linkage. SSAT directly addresses these limitations by capturing the full logical decomposition of a project into modules, files, and symbols, making the intermediate representation both machine- and human-interpretable; it is explicitly designed to align with established software engineering decomposition and documentation practices.
2. Formal Structure and Representation
SSAT is formally a rooted, ordered tree defined by a set of nodes and a set of containment edges:
- The nodes are partitioned by type:
  - ModuleNode (software modules)
  - FileNode (source code files)
  - GlobalCodeNode (global-level code in files)
  - ClassNode (classes or types)
  - FunctionNode (functions/methods)
- The edges define parent-child containment relationships, recursively expressing project composition.
Each node includes:
- name: Identifier (e.g., filename, function name)
- desc: Short natural-language description
Node-type-specific attributes include:
- FileNode:
  - path: Relative disk path
  - Three sets of contained nodes: global code, classes, and functions
- ClassNode:
  - Set of member functions
- FunctionNode:
  - params: Parameter list (in the example in Section 5, each entry gives a name, a type, and a description)
A visual schema:

    <ModuleNode name=... desc=... files=[
      <FileNode name=... path=... desc=...>
        <GlobalCodeNode .../>
        <ClassNode name=... desc=...>
          <FunctionNode name=... params=[...] desc=.../>
        </ClassNode>
        <FunctionNode .../>
      </FileNode>
    ]>

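The node types above map naturally onto plain data classes. The following is a minimal sketch, not ProjectGen's actual implementation: the field names `global_code`, `classes`, and `methods`, and the helpers `children` and `walk`, are assumptions introduced for illustration, while `name`, `desc`, `path`, `params`, and `files` follow the schema.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple, Union


@dataclass
class FunctionNode:
    name: str
    desc: str = ""
    # Each parameter is (name, type, description); this layout is an assumption.
    params: List[Tuple[str, str, str]] = field(default_factory=list)


@dataclass
class ClassNode:
    name: str
    desc: str = ""
    methods: List[FunctionNode] = field(default_factory=list)  # hypothetical field name


@dataclass
class GlobalCodeNode:
    name: str
    desc: str = ""


@dataclass
class FileNode:
    name: str
    path: str
    desc: str = ""
    global_code: List[GlobalCodeNode] = field(default_factory=list)  # hypothetical
    classes: List[ClassNode] = field(default_factory=list)           # hypothetical
    functions: List[FunctionNode] = field(default_factory=list)


@dataclass
class ModuleNode:
    name: str
    desc: str = ""
    files: List[FileNode] = field(default_factory=list)


Node = Union[ModuleNode, FileNode, GlobalCodeNode, ClassNode, FunctionNode]


def children(node: Node) -> List[Node]:
    """Return the contained nodes in order, following the containment edges."""
    if isinstance(node, ModuleNode):
        return list(node.files)
    if isinstance(node, FileNode):
        return [*node.global_code, *node.classes, *node.functions]
    if isinstance(node, ClassNode):
        return list(node.methods)
    return []  # GlobalCodeNode and FunctionNode are leaves


def walk(node: Node, depth: int = 0) -> Iterator[Tuple[int, Node]]:
    """Pre-order traversal yielding (depth, node) pairs."""
    yield depth, node
    for child in children(node):
        yield from walk(child, depth + 1)


# A made-up tree used only to exercise the traversal.
tree = ModuleNode("pkg", "Example module", files=[
    FileNode("util.py", "pkg/util.py", "Helpers",
             classes=[ClassNode("Parser", "Parses input",
                                methods=[FunctionNode("parse", "Parse one line")])],
             functions=[FunctionNode("load", "Read a file")]),
])
outline = [(depth, type(node).__name__, node.name) for depth, node in walk(tree)]
```

Because the tree is ordered, a pre-order walk visits each file's contents in declaration order, which is the property later stages rely on when they turn nodes into source text.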
3. SSAT Construction Workflow
Within ProjectGen, SSAT construction occurs entirely within the Architecture Design phase, orchestrated by two agents and a memory store in an iterative refinement loop:
- ArchAgent: Generates a candidate SSAT from product requirements, UML, and architectural documentation, guided by a format specification and prior memory.
- JudgeA: Evaluates the candidate SSAT for requirement coverage, alignment with diagrams and file names, interface clarity, and absence of circular dependencies, returning a score in [0, 10] and natural-language feedback.
- Memory M_A: Maintains up to γ_A past feedback-plus-diff summaries, retrieved by BM25 similarity, influencing subsequent iterations.
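The BM25-based memory retrieval can be illustrated with a small, self-contained scorer. This is a sketch under stated assumptions: the source does not say what text is used as the query or how entries are tokenized, so the whitespace tokenizer, the choice of query, the `k1` and `b` defaults, and the function names `bm25_scores` and `select_top_k` are all hypothetical. Note that BM25 ranks by term overlap rather than by learned semantic embeddings.

```python
import math
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    # Simplistic whitespace tokenizer (assumption; not specified in the source).
    return text.lower().split()


def bm25_scores(query: str, docs: List[str], k1: float = 1.5, b: float = 0.75) -> List[float]:
    """Okapi BM25 score of each document against the query."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    if n_docs == 0:
        return []
    avg_len = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: number of entries containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in set(tokenize(query)):
            if term not in tf:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            norm = tf[term] + k1 * (1.0 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1.0) / norm
        scores.append(score)
    return scores


def select_top_k(memory: List[str], query: str, k: int) -> List[str]:
    """Keep the k memory entries that score highest against the query."""
    scores = bm25_scores(query, memory)
    ranked = sorted(range(len(memory)), key=lambda i: scores[i], reverse=True)
    return [memory[i] for i in ranked[:k]]


# Made-up memory entries for illustration.
memory = [
    "feedback: circular dependency between stats and io modules",
    "feedback: missing docstring on helper function",
    "diff: renamed file utils to helpers",
]
kept = select_top_k(memory, "circular dependency in modules", k=2)
```

With a fixed budget `k`, any entry that shares no terms with the query scores zero and is the first to be dropped, regardless of how relevant it might be in meaning.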
Pseudocode outline:

    Input: PRD, UML diagrams, Architecture Doc (together: specs)
    Initialize memory M_A = ∅, previous SSAT O_prev = ∅
    for t = 1 to T_max:
        O ← ArchAgent.generate_SSAT(specs, SSAT format, M_A)
        score, feedback ← JudgeA.evaluate_SSAT(O)
        if score ≥ θ_A:
            return O
        D ← diff(O_prev, O)
        M_A ← select_top_k(M_A ∪ {(feedback, D)}, k=γ_A)
        O_prev ← O
    return last O

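The pseudocode above can be turned into a runnable sketch by passing the two agents in as plain callables. Everything about their interfaces is an assumption here: `generate` and `evaluate` stand in for ArchAgent and JudgeA, the SSAT is treated as text, the diff is a standard unified diff, and memory is trimmed to the most recent entries as a stand-in for top-k selection. The fake agents at the bottom exist only to exercise the loop.

```python
import difflib
from typing import Callable, List, Tuple

# Hypothetical interfaces; the source does not specify the agents' signatures.
Generate = Callable[[str, List[str]], str]     # (specs, memory) -> SSAT text
Evaluate = Callable[[str], Tuple[float, str]]  # SSAT text -> (score, feedback)


def summarize_diff(previous: str, current: str) -> str:
    """Unified diff between two SSAT texts, as a single string."""
    lines = difflib.unified_diff(previous.splitlines(), current.splitlines(),
                                 fromfile="previous", tofile="current", lineterm="")
    return "\n".join(lines)


def refine_ssat(specs: str, generate: Generate, evaluate: Evaluate,
                threshold: float, max_iters: int, memory_size: int) -> str:
    memory: List[str] = []
    previous = ""
    candidate = ""
    for _ in range(max_iters):
        candidate = generate(specs, memory)
        score, feedback = evaluate(candidate)
        if score >= threshold:
            return candidate  # accepted by the judge
        # Store feedback together with what changed since the last attempt.
        memory.append(feedback + "\n" + summarize_diff(previous, candidate))
        memory = memory[-memory_size:]  # stand-in for top-k selection
        previous = candidate
    return candidate  # budget exhausted: return the last candidate


def fake_generate(specs: str, memory: List[str]) -> str:
    # Pretend each round of feedback adds one more file to the design.
    return "\n".join(f"file_{i}.py" for i in range(len(memory) + 1))


def fake_evaluate(candidate: str) -> Tuple[float, str]:
    n_files = len(candidate.splitlines())
    score = 10.0 if n_files >= 3 else 4.0 * n_files
    return score, f"only {n_files} file(s) listed"


result = refine_ssat("specs", fake_generate, fake_evaluate,
                     threshold=9.0, max_iters=5, memory_size=5)
```

In this toy run the first two candidates are rejected and the third is accepted, so the loop exits early instead of using all five iterations.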
4. Role Within Multi-Agent Generation Pipelines
SSAT serves as the exclusive output of ProjectGen's architecture phase and as the sole input to later skeleton and code generation stages:
- Stage 1 (Architecture Design): Agents produce validated SSAT.
- Stage 2 (Skeleton Generation): SkeletonAgent traverses SSAT's FileNodes, generating source files with appropriate imports, class stubs, and function signatures populated with docstrings from node descriptions. Structural judges (JudgeS) enforce fidelity to SSAT, and memory buffers enable iterative refinement.
- Stage 3 (Code Filling): FileNodes are topologically sorted to obey import dependencies. Individual skeletons, together with previously generated content, are passed to CodeAgent, which populates function bodies. JudgeC executes “check tests,” returning failed file names, trace logs, and suggestions; memory accumulates logs and diffs for up to γ_C iterations.
SSAT's presence eliminates the need for agents to independently re-parse ambiguous prose between stages, instead providing a full, explicit accounting of files to generate, symbol signatures, and required docstrings.
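Two mechanical steps from Stages 2 and 3 can be sketched briefly: rendering a function stub from a node's fields, and ordering files so that each one is generated after the files it imports. The helper names `render_stub` and `generation_order`, the explicit `returns` argument, and the file names and import edges in the example are all hypothetical; the source does not describe how ProjectGen implements either step.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Dict, List, Set, Tuple


def render_stub(name: str, params: List[Tuple[str, str]], returns: str, desc: str) -> str:
    """Render one function signature with its description as the docstring."""
    args = ", ".join(f"{p}: {t}" for p, t in params)
    return f'def {name}({args}) -> {returns}:\n    """{desc}"""\n    pass\n'


def generation_order(imports: Dict[str, Set[str]]) -> List[str]:
    """Order files so that each file comes after the files it imports.

    `imports` maps a file to the set of files it depends on. A circular
    import raises graphlib.CycleError.
    """
    return list(TopologicalSorter(imports).static_order())


stub = render_stub("valid_year", [("year", "str")], "bool", "Check year in YYYY.")
compile(stub, "<stub>", "exec")  # the rendered skeleton is valid Python

# Made-up dependency edges: cli.py imports core.py, which imports io_utils.py.
order = generation_order({
    "cli.py": {"core.py"},
    "core.py": {"io_utils.py"},
    "io_utils.py": set(),
})
```

Generating files in this order means that, when a file's bodies are filled in, the code it imports already exists and can be included as context.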
5. Example: The "lice" Task from DevBench
For the DevBench "lice" task, which requires a rainfall analysis module with file reading, data filtering, and a helper for valid year detection, the initial SSAT output by ProjectGen is (abridged):

    ModuleNode(name="lice", desc="Rainfall analysis utilities", files=[
        FileNode(name="global_functions.py", path="lice/global_functions.py", desc="Helpers",
            functions=[
                FunctionNode(name="valid_year", parameters=[("year", "str", "YYYY format")], desc="..."),
                FunctionNode(name="filter_zero_days", parameters=[("data", "List[int]", "daily rainfall")], desc="...")
            ]
        ),
        FileNode(name="stats.py", ...),
        FileNode(name="__main__.py", ...)
    ])

From this tree, Stage 2 emits skeleton stubs such as:

    def valid_year(year: str) -> bool:
        """Check year in YYYY."""
        pass

Stage 3 fills in the logic. On review, JudgeC reports that “valid_year” still lacks an implementation; after the body is corrected to return len(year)==4 and year.isdigit(), the check tests pass.
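Written out in full, the corrected helper looks like the following. The function body comes from the fix described above; the sample inputs are hypothetical stand-ins for the check tests, which the source does not list.

```python
def valid_year(year: str) -> bool:
    """Check year in YYYY."""
    return len(year) == 4 and year.isdigit()


# Hypothetical sample inputs, not the actual DevBench check tests.
samples = ["2024", "24", "20a4", ""]
results = [valid_year(s) for s in samples]
```

Note that `str.isdigit` also accepts some non-ASCII digit characters, so a stricter ASCII-only check may be preferable in practice; that refinement is not part of the source example.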
6. Evaluation, Benefits, and Empirical Impact
Ablation studies show that simply replacing a natural-language “repository sketch” (as in CodeS) with SSAT in ProjectGen raises the passed test count from 25 to 34 (DeepSeek-V3) and from 18 to 32 (GPT-4o) on the DevBench dataset. Incorporating the full iterative Arch/Skel/Code judge loops with SSAT further increases performance to 52 (DeepSeek-V3) and 47 (GPT-4o) passes out of 124.
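To put these counts in proportion, the short calculation below converts them into pass rates and relative gains. It uses only the figures quoted above; the helper names are illustrative.

```python
TOTAL_TESTS = 124  # final pass counts are reported "out of 124"

final_passes = {"DeepSeek-V3": 52, "GPT-4o": 47}
# (passes with repository sketch, passes with SSAT) from the ablation.
sketch_vs_ssat = {"DeepSeek-V3": (25, 34), "GPT-4o": (18, 32)}


def pass_rate(passed: int, total: int = TOTAL_TESTS) -> float:
    """Share of tests passed, as a percentage rounded to one decimal."""
    return round(100.0 * passed / total, 1)


def relative_gain(before: int, after: int) -> float:
    """Relative increase in passed tests, as a percentage rounded to one decimal."""
    return round(100.0 * (after - before) / before, 1)


for model, passed in final_passes.items():
    print(f"{model}: {passed}/{TOTAL_TESTS} = {pass_rate(passed)}%")

for model, (before, after) in sketch_vs_ssat.items():
    print(f"{model}: +{after - before} tests ({relative_gain(before, after)}% relative gain)")
```

Even the best configuration passes well under half of the tests, which is consistent with the limitations discussed later.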
For more complex, realistic tasks (CodeProjectEval), ProjectGen with SSAT was the sole system to pass any tests on medium-size projects (816–2,949 LOC), while all baselines failed. SSAT also preserves structure, as evidenced by SketchBLEU scores consistently exceeding 90 once SSAT is in place.
These results highlight SSAT’s dual role in achieving higher program correctness (by constraining and guiding generation) and in maintaining project structural consistency across multi-stage synthesis.
7. Limitations and Prospective Directions
SSAT’s accuracy and expressiveness are limited by the fidelity of the input requirements (product requirements documents, UML, architecture documents). If these are incomplete or inconsistent, critical functionality may be omitted in the initial tree. For larger projects (exceeding 15 files or 5,000 LOC), ProjectGen continues to under-generate both file count and total code volume; this suggests that the existence of an SSAT cannot by itself induce the desired scale without additional intervention or feedback mechanisms.
During multi-iteration refinement, memory stores of prior diffs and feedback can enlarge prompt context, and the top-k memory selection heuristic may drop potentially relevant history. Future enhancements proposed by the authors include interactive, real-time editing of SSAT by human developers; automated code-to-architecture feedback to correct possible drift between architecture and code; and extending SSAT representations and constraints to statically typed or compiled languages where interface and type declarations introduce further structure.
A plausible implication is that further integration between SSAT inference and continuous codebase analysis could enable robust, scalable project-level code generation even for very large software systems.