- The paper introduces a two-phase LLM translation pipeline that encapsulates function-level safety through ABI-preserving wrappers and type-directed rewriting.
- It adds an agentic refinement phase with a 17-tool loop to address global unsafety, reducing raw-pointer dereferences by up to 57% and unsafe lines of code by about 38%.
- Empirical evaluation on GNU Coreutils and Laertes shows 100% test-vector pass rates and improved idiomatic Rust quality over previous approaches.
ENCRUST: An LLM-Centric, Scaffolded Framework for Safe C-to-Rust Translation
Introduction and Motivation
The translation of legacy C systems code to memory-safe Rust is a critical open problem due to C's lack of explicit safety guarantees and its prevalence in security-sensitive domains. Traditional C-to-Rust transpilation pipelines, such as C2Rust, output Rust code that simply mirrors C's pointer- and memory-unsafe constructs—yielding functionally correct but unsafe Rust. Attempts at rule-based post-processing or incremental transformation, e.g., Laertes and Crown, only partially address pointer-related unsafety, fail at semantic translation, and do not generalize to complex project-wide cross-cutting changes. LLM-based approaches showed promise but encountered previously insurmountable scaling issues, particularly in call-site adaptation and cross-unit dependency management.
ENCRUST introduces a decoupled two-phase pipeline that orchestrates LLM translation and codebase refinement under continuous behavioral verification, addressing both the local and the global translation failures endemic to prior work.
ENCRUST Architecture: Two-Phase Translation Pipeline
ENCRUST is structured as a two-phase pipeline, each phase explicitly targeting different classes of translation obstacles.
Phase 1: Encapsulated Substitution
The first phase addresses function translation modularity and ABI stability by introducing an ABI-preserving wrapper pattern. For each function, the system separates the function's logic from boundary-adaptation concerns:
- Wrapper/Safe Pair Generation: Every C function $f$ with signature $T_1 \times \cdots \times T_n \to T_r$ is replaced with a pair: $f$ (the outer wrapper, preserving external naming and ABI) and $f_{\mathrm{safe}}$ (the LLM-generated safe function using idiomatic Rust types).
- Compile-Test–Gated Loop: An LLM is invoked with contextually scoped prompt material (original C source, prior translations, callee signatures), producing candidate wrapper/safe pairs. Every candidate undergoes compilation and test-vector validation before being committed, ensuring the "Live Scaffold Invariant"—the workspace remains compiling and passing throughout translation.
- Type-Directed Wrapper Elimination (TDWE): Once function translation completes, an automated pass rewrites all call sites to invoke safe inner functions directly, eliminating the wrapper indirection except in a small set of structurally ambiguous cases, thus converging towards an idiomatic, unsafe-free interface.
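To make the wrapper/safe pair concrete, here is a minimal Rust sketch under stated assumptions. The function names (`count_spaces`, `count_spaces_safe`, `caller_before`, `caller_after`) are hypothetical and do not come from the paper, and the null and non-UTF-8 handling is one illustrative choice rather than ENCRUST's actual policy.

```rust
use std::ffi::{c_char, CStr};

/// Safe inner function (the f_safe half of the pair): idiomatic Rust types only.
/// The name is hypothetical, chosen for this sketch.
pub fn count_spaces_safe(text: &str) -> usize {
    text.bytes().filter(|b| *b == b' ').count()
}

/// Outer wrapper: keeps the C-compatible signature so untranslated callers
/// can keep calling it. All raw-pointer handling is confined to this function.
/// A real build would also add a no-mangle attribute to preserve the symbol name;
/// it is omitted here to keep the sketch edition-independent.
///
/// # Safety
/// `s` must be null or point to a valid NUL-terminated string.
pub unsafe extern "C" fn count_spaces(s: *const c_char) -> usize {
    if s.is_null() {
        return 0;
    }
    // Boundary adaptation: raw pointer -> &str.
    // Assumption for this sketch: non-UTF-8 input is treated as empty.
    let text = unsafe { CStr::from_ptr(s) }.to_str().unwrap_or("");
    count_spaces_safe(text)
}

/// Call site before wrapper elimination: goes through the wrapper.
pub fn caller_before(line: &CStr) -> usize {
    unsafe { count_spaces(line.as_ptr()) }
}

/// Call site after a TDWE-style rewrite: calls the safe function directly,
/// so no `unsafe` block is needed.
pub fn caller_after(line: &str) -> usize {
    count_spaces_safe(line)
}
```

In this sketch only the wrapper touches raw pointers. Once every caller can supply a `&str`, a rewrite of the kind TDWE performs leaves call sites looking like `caller_after`, and the wrapper can be dropped.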
Figure 1: ENCRUST's two-phase pipeline. Phase 1 employs function-level wrappers and automatic, type-directed call-site rewriting; Phase 2 applies program-wide refinement using a 17-tool agentic loop, closing the remaining safety gaps.
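The compile-test-gated loop of Phase 1 can be sketched as a bounded retry loop that commits a candidate only when it passes the gate. This is an illustrative model, not ENCRUST's implementation: the workspace is a plain list of source strings, and `generate` and `verify` are hypothetical closures standing in for the LLM call and the compile-and-test run.

```rust
/// Outcome of running the compile-and-test gate on a candidate.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Gate {
    Pass,
    CompileError,
    TestFailure,
}

/// Tries up to `budget` candidates and commits the first one that passes.
/// Returns the number of attempts used, or `None` if the budget ran out,
/// in which case the workspace is left exactly as it was.
pub fn translate_gated<G, V>(
    workspace: &mut Vec<String>,
    budget: usize,
    mut generate: G,
    mut verify: V,
) -> Option<usize>
where
    G: FnMut(usize, Option<Gate>) -> String, // stand-in for the LLM call
    V: FnMut(&[String]) -> Gate,             // stand-in for compile + test vectors
{
    let mut feedback = None;
    for attempt in 1..=budget {
        let candidate = generate(attempt, feedback);
        // Evaluate on a scratch copy so the live workspace never holds
        // a candidate that fails to compile or pass its tests.
        let mut trial = workspace.clone();
        trial.push(candidate);
        match verify(trial.as_slice()) {
            Gate::Pass => {
                *workspace = trial;
                return Some(attempt);
            }
            // Feed the failure kind back into the next generation attempt.
            failure => feedback = Some(failure),
        }
    }
    None
}
```

Because a candidate is only ever committed after the gate passes, the workspace in this model always satisfies the invariant the text calls the Live Scaffold Invariant. A function that exhausts its budget is simply left in its previous form.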
Phase 2: Agentic Refinement
The second phase is designed to resolve unsafety that inherently escapes per-function scope:
- Task Discovery and Tool-Equipped Agentic Loop: Using static analysis, ENCRUST identifies static mut globals, skipped wrappers, failed struct translations, and functions that exhausted their retry budgets. Each unresolved unsafe pattern is converted to an explicit translation or migration task.
- 17-Tool Agent Suite: The system equips an LLM with 17 navigation, modification, analysis, and verification tools, including source traversal, batch rewriting, semantic linkage, and a compile-and-test verification gate. For each task, an agentic loop operates until behavioral correctness (relative to a pre-recorded test baseline) is achieved.
- Checkpointing and Safe Rollback: The workspace is automatically snapshotted before every agentic task; the system rolls back automatically when the agent fails or exhausts its iteration budget, so a failed task cannot corrupt the workspace.
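As an illustration of one task category from the list above, the following sketch shows how a `static mut` global might be migrated to a safe equivalent. The counter and its accessor names are hypothetical, and using an atomic is just one possible target; the paper does not specify which replacement ENCRUST chooses for a given global.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// A transpiled C global typically looks roughly like this (shown as a comment):
//
//     static mut line_count: u64 = 0;
//     pub unsafe fn bump() { line_count += 1; }
//
// Every access requires `unsafe`, and the unsafety spreads to all callers,
// which is why it cannot be fixed one function at a time.

/// Safe replacement for the global: an atomic needs no `unsafe` to read or write.
static LINE_COUNT: AtomicU64 = AtomicU64::new(0);

/// Increments the counter and returns the new value.
pub fn bump_line_count() -> u64 {
    LINE_COUNT.fetch_add(1, Ordering::Relaxed) + 1
}

/// Reads the current value of the counter.
pub fn line_count() -> u64 {
    LINE_COUNT.load(Ordering::Relaxed)
}
```

A migration like this touches the declaration and every use site together, so it fits the program-wide, verify-after-each-task style of Phase 2 rather than the per-function loop of Phase 1.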
Safety Preservation and Code Quality
ENCRUST’s most distinctive feature is that translation is always checked at the codebase level, not in isolation. This overcomes the semantic drift and dependency mismatches characteristic of prior LLM and rule-based pipelines. Struct migration is supported via a dual-struct abstraction, sidestepping pointer-based aliasing pitfalls and eliminating use-after-free errors endemic to naive pointer ownership translation.
The pipeline maintains precise safety metrics:
- Raw pointer declarations and dereferences: ENCRUST achieves up to a 57% reduction in pointer dereferences and a 44% reduction in declarations on Coreutils.
- Unsafe lines of code and casts: Across both Coreutils and Laertes, ENCRUST reduces unsafe code lines by ~38% and unsafe casts by up to 60% relative to the C2Rust baseline.
- Idiomaticity: As measured by Clippy warning count, ENCRUST's final code is less noisy and more idiomatic than that of prior LLM-based approaches; the remaining gap to best-effort manual Rust code is attributed to residual complex pointer idioms and FFI vestiges.
Empirical Evaluation
ENCRUST is evaluated on 197,706 lines of code across 7 GNU Coreutils and 8 Laertes libraries (totaling 2,366 functions):
- Correctness (Test-Vector Pass Rate): Maintains a strict 100% pass rate on all covered inputs for all benchmarks, assured by compile-and-test gates at every execution stage.
- Function Translation Scalability: Achieves a function-level compile pass rate of up to 99.2% on certain Coreutils targets, with remaining failures handled via agentic Phase 2 or safely retained as legacy stubs.
- Completeness vs Prior Work: Unlike EvoC2Rust and similar LLM baselines, which routinely fail to produce compiling whole-project outputs, ENCRUST guarantees a compiling and test-passing crate at every intermediate milestone.
Implications, Limitations, and Future Directions
The ENCRUST framework demonstrates the viability of structured, agentic LLM translation for migrating legacy C to safe Rust with project-scale behavioral preservation. Practically, it provides a reproducible pipeline that supports safe incremental migration without downtime, enables auditability through persistent behavioral correctness, and confines LLMs to tractable, well-scoped tasks, where they are known to perform best.
Key limitations start with verification coverage: correctness relies entirely on the test-vector suite, so behaviors the suite does not exercise are not guaranteed. Wrapper elimination by TDWE in Phase 1 is best-effort, so residual unsafety that verification does not surface may remain. Some highly pointer-centric patterns and complex ABI-dependent signatures remain outside fully automated safe translation. The agentic loop's completion rate is ~70% over all tasks, indicating further scope for robustness improvement.
Potential future advances include enhanced coverage-oriented test generation, hybrid symbolic-execution-augmented verification, extension to inline assembly handling, and the substitution of open-weights LLMs for improved reproducibility.
Conclusion
ENCRUST provides a modular, test-driven migration framework, built on LLM translation and agentic refinement, that is capable of translating large real-world C projects to behaviorally equivalent, memory-safe Rust. By separating per-function encapsulated translation from program-wide verification and refinement, and enforcing correctness at every step, ENCRUST closes critical automation gaps in safe systems language migration. The pipeline constitutes a practical and theoretically sound approach for organizations seeking to eliminate entire classes of memory safety bugs from legacy code with the assistance of automated program transformation.