Rustine: Automated C-to-Rust Translation
- Rustine is a pipeline that automates the transformation of C code to memory-safe, idiomatic Rust by integrating static analysis, refactoring, and large language models.
- It minimizes raw pointers and unsafe constructs while achieving an average assertion-level equivalence of 87%, validated on diverse open-source projects.
- Its cost-effective design and scalable architecture enable end-to-end translation of multi-file repositories, including build scripts and tests.
Rustine is a fully automated pipeline for repository-level C to idiomatic, memory-safe Rust translation. Developed to address the limitations of existing translation techniques—which either lack scalability and code quality or are prohibitively expensive—Rustine systematically integrates program analysis, code refactoring, explicit dependence tracking, and a two-tier LLM strategy. Validated on 23 open-source C programs ranging from 27 to 13,200 lines of code, Rustine produces compilable, idiomatic Rust code while significantly reducing unsafe constructs and improving readability, with average automated assertion-level functional equivalence of 87% and near-complete correctness following modest human-in-the-loop debugging interventions. Its architecture makes it cost-effective (≈$0.50 per project) and scalable to large, multi-file repositories, supporting end-to-end translation that includes build scripts and tests (Dehghan et al., 25 Nov 2025).
1. Motivation and Design Objectives
C remains pervasive across software infrastructure but lacks intrinsic memory safety, in contrast to Rust, which enforces strict safety and modern patterns. Existing C→Rust translation solutions fall into two principal families:
- Transpilation-based approaches (e.g., C2Rust) scale to large codebases but yield Rust littered with raw pointers, pointer arithmetic, frequent use of unsafe, and minimal engagement with the Rust ecosystem.
- LLM-based approaches (e.g., Syzygy, RustMap) produce more idiomatic and safer code but require expensive, high-end models and substantial manual scaffolding; their cost (~$800+ per project) renders them impractical for large codebases.
Rustine was engineered to achieve five explicit goals: (A) end-to-end, automated translation of multi-file repositories including tests; (B) output that is 100% compilable with high test-verified functional equivalence; (C) code that minimizes raw pointers and unsafe constructs while leveraging Rust’s idioms and standard library; (D) cost-effectiveness through majority use of commodity LLMs, escalating to premium models only selectively; and (E) actionable developer support for resolving any residual semantic mismatches (Dehghan et al., 25 Nov 2025).
2. Pipeline Architecture and Key Algorithms
Rustine’s pipeline consists of five sequential stages, each incrementally refining the translation and facilitating robustness and scalability.
2.1 Preprocessing
The C preprocessor is invoked to flatten all #include, expand macros, and resolve conditional compilation. This creates macro-free C suitable for downstream analysis, preventing token pasting and preprocessor-induced corner cases.
2.2 Refactoring (Unsafe-to-Safe)
Before translation, C code is automatically refactored to eliminate idioms that hinder safe translation. Key transformations include:
- Pointer arithmetic desugaring: Pointer increments, arithmetic, and comparisons are rewritten to explicit index-based accesses based on a function-wide analysis of all pointer uses. For example, every use of *p++ is replaced by p[p_idx], with an associated index variable managed in scope.
- Constness maximization: Immutable parameters are detected and annotated const to signal to the LLM that &T references in Rust are appropriate.
- Large function splitting: Functions exceeding half the context window of the LLM are decomposed into helper routines by identifying suitable basic blocks, collecting data dependencies, and parameterizing helper functions as needed.
All refactorings are verified via regression tests, rolling back if semantic changes are detected.
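To make the pointer-arithmetic desugaring concrete, the following is a minimal sketch of the kind of Rust that index-based C maps onto. The C fragment in the comment and the function names `sum_until_zero` and `sum_until_zero_iter` are illustrative assumptions, not taken from Rustine's output.

```rust
// Hypothetical C input, before refactoring:
//     int sum_until_zero(const int *p) {
//         int s = 0;
//         while (*p != 0) { s += *p++; }
//         return s;
//     }
// After desugaring, the C reads p[p_idx] and advances p_idx instead of p,
// which lines up with a bounds-checked Rust slice. The `const` qualifier
// suggests a shared borrow (`&[i32]`) rather than a mutable one.

/// Sums elements up to (not including) the first zero.
/// Unlike the C original, this also stops at the end of the slice.
pub fn sum_until_zero(p: &[i32]) -> i32 {
    let mut p_idx = 0usize; // index variable replacing the moving pointer
    let mut s = 0;
    while p_idx < p.len() && p[p_idx] != 0 {
        s += p[p_idx];
        p_idx += 1;
    }
    s
}

/// The same loop expressed with iterators, a further idiomatic step.
pub fn sum_until_zero_iter(p: &[i32]) -> i32 {
    p.iter().take_while(|&&x| x != 0).sum()
}
```

Both functions return the same value for any input slice; the explicit length check is an added safety assumption that the C version does not have.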
2.3 Dependence Graph Construction
A project-wide directed graph (DG) is built, capturing functions, structs, unions, globals, typedefs and their references. Strongly connected components are collapsed into single nodes so that mutually recursive definitions stay together and module structure is preserved. The DG informs translation order (bottom-up), module skeleton instantiation, and provides context (e.g., prior translated definitions) to the LLM.
External API usage is automatically mapped: core libc functions are mapped to Rust std/libc counterparts; other APIs are wrapped using bindgen-generated FFI, included as translation context.
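One way to obtain a bottom-up order with collapsed strongly connected components is Tarjan's algorithm, which emits each component only after every component it depends on. The sketch below is an assumption about how such an ordering could be computed, not Rustine's actual implementation; the function name `translation_order` and the integer node encoding are hypothetical.

```rust
/// Returns strongly connected components in bottom-up order.
/// `deps[u]` lists the nodes that node `u` references (u depends on them).
/// Each component is sorted for deterministic output.
pub fn translation_order(deps: &[Vec<usize>]) -> Vec<Vec<usize>> {
    struct State<'a> {
        deps: &'a [Vec<usize>],
        index: Vec<Option<usize>>, // discovery index, None if unvisited
        low: Vec<usize>,           // lowest index reachable from the node
        on_stack: Vec<bool>,
        stack: Vec<usize>,
        next: usize,
        out: Vec<Vec<usize>>,
    }

    fn visit(s: &mut State<'_>, v: usize) {
        s.index[v] = Some(s.next);
        s.low[v] = s.next;
        s.next += 1;
        s.stack.push(v);
        s.on_stack[v] = true;

        let deps = s.deps; // copy of the shared reference
        for &w in &deps[v] {
            match s.index[w] {
                None => {
                    visit(s, w);
                    s.low[v] = s.low[v].min(s.low[w]);
                }
                Some(wi) if s.on_stack[w] => s.low[v] = s.low[v].min(wi),
                Some(_) => {} // w belongs to an already emitted component
            }
        }

        // v is the root of a component: pop the whole component off the stack.
        if Some(s.low[v]) == s.index[v] {
            let mut comp = Vec::new();
            loop {
                let w = s.stack.pop().expect("stack contains v");
                s.on_stack[w] = false;
                comp.push(w);
                if w == v {
                    break;
                }
            }
            comp.sort_unstable();
            s.out.push(comp);
        }
    }

    let n = deps.len();
    let mut s = State {
        deps,
        index: vec![None; n],
        low: vec![0; n],
        on_stack: vec![false; n],
        stack: Vec::new(),
        next: 0,
        out: Vec::new(),
    };
    for v in 0..n {
        if s.index[v].is_none() {
            visit(&mut s, v);
        }
    }
    s.out
}
```

For a hypothetical project where node 0 references nodes 1 and 2, node 1 references 2 and 3, and node 3 references 1, the mutually recursive pair {1, 3} is returned as one component, after the leaf 2 and before node 0. The recursion is kept for brevity; a very deep graph would call for an iterative variant.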
2.4 Translation (Two-Tier LLM Loop)
Each translation unit (TU), supplied in graph order, is processed in an iterative LLM loop:
- Stage 1: The “base model” (cost-effective, GPT-3.5 level) is prompted up to a budgeted number of attempts, with context including the C TU, dependency signatures, a fixed in-context example, and a natural-language “hint block” (e.g., prefer Vec, avoid raw pointers).
- Stage 2: On persistent compiler errors, a “reasoning model” (GPT-4-class) is engaged, equipped with error-specific mini-prompts from an adaptive in-context learning (ICL) pool. If a compiler error is novel, the LLM generates a new fix example, storing it for future reuse mapped by error code.
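The control flow of the two stages can be sketched as a budgeted retry loop with escalation. Everything below is a simplified assumption for illustration: the models and the compiler are passed in as plain closures, prompts are ordinary strings, and a successful candidate is stored as the fix example for its error code. None of the names (`translate_unit`, `icl_pool`, `CompileResult`) come from the Rustine codebase.

```rust
use std::collections::HashMap;

/// Result of compiling a candidate: Ok, or a compiler error code such as "E0499".
pub type CompileResult = Result<(), String>;

/// Tries the base model up to `base_budget` times, then escalates to the
/// reasoning model for up to `reasoning_budget` attempts. Returns the first
/// candidate that compiles, or None if both budgets are exhausted.
pub fn translate_unit(
    c_source: &str,
    base_model: &dyn Fn(&str) -> String,
    reasoning_model: &dyn Fn(&str) -> String,
    compile: &dyn Fn(&str) -> CompileResult,
    icl_pool: &mut HashMap<String, String>, // error code -> stored fix example
    base_budget: usize,
    reasoning_budget: usize,
) -> Option<String> {
    let mut prompt =
        format!("Translate to safe Rust (prefer Vec, avoid raw pointers):\n{c_source}");
    let mut last_error = String::new();

    // Stage 1: inexpensive base model with compiler feedback appended.
    for _ in 0..base_budget {
        let candidate = base_model(prompt.as_str());
        match compile(candidate.as_str()) {
            Ok(()) => return Some(candidate),
            Err(code) => {
                prompt.push_str(&format!("\nPrevious attempt failed with {code}."));
                last_error = code;
            }
        }
    }

    // Stage 2: reasoning model, primed with a stored example for this error.
    for _ in 0..reasoning_budget {
        let example = icl_pool.get(&last_error).cloned().unwrap_or_default();
        let escalated =
            format!("{prompt}\nError: {last_error}\nFix example:\n{example}");
        let candidate = reasoning_model(escalated.as_str());
        match compile(candidate.as_str()) {
            Ok(()) => {
                // Novel error: remember this fix for future units.
                icl_pool.entry(last_error).or_insert_with(|| candidate.clone());
                return Some(candidate);
            }
            Err(code) => last_error = code,
        }
    }
    None
}
```

Keeping the pool keyed by error code means a fix learned on one translation unit is available to every later unit that hits the same compiler error, which is the reuse behavior described above.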
2.5 Validation and Automated Debugging
Translated code is compiled and tests are run; on test failures, Rustine comments out the failing assertion and proceeds to collect others. For each test failure, the LLM receives the original C, translated Rust, and relevant call stack, and proposes repairs. Passing fixes are integrated; persistent failures are left for optional human patching, typically expedited by Rustine’s per-test traces.
3. Scalability, Safety, and Idiomaticity
Rustine explicitly trades off superficial speed for end-to-end quality:
- Pointer handling: Systematic refactoring eliminates pointer arithmetic and raw pointers as far upstream as possible. Across all projects, only 15 pointer arithmetic expressions survive (mainly for opaque FFI scenarios), down from 2,636 in the original C. Raw pointer declarations are reduced to 198 (from 4,080) and dereferences to 126 (from 2,045), outperforming transpilers such as C2Rust, which emit thousands more.
- Unsafe usage: The aggregate output contains only 114 unsafe lines across all projects, compared to 13,127 in C2Rust output; some LLM-only systems (Syzygy) can eliminate unsafe entirely but at much higher computational cost.
- Idiomaticity: Measured by Clippy (cargo, complexity, correctness, perf, suspicious lints), Rustine’s outputs exhibit 30–80% fewer normalized linter violations than competing transpiler approaches. The idiomaticity score accounts for both linter count and translation size index (TSI); a lower score denotes better idiomaticity.
- Readability: Cognitive Complexity, Halstead Volume, and SEI Maintainability Index are employed. Rustine produces code with 25% lower cognitive complexity, 20% lower Halstead volume, and 15% higher maintainability index than the next-best baseline.
These improvements arise from refactored index-based access, routine translation to Vec<T> and slices, strategic use of Rust iterators, and modularization of global state via lazy_static and synchronization primitives.
4. Evaluation Methodology
4.1 Benchmarks
The evaluation set comprises 23 representative C projects, encompassing algorithms, utilities, user-facing applications, libraries, and performance-oriented demos. Line counts range from 27 to 13,200 LoC, and features include unions, macros, raw and arithmetic pointers, globals, and diverse external APIs.
4.2 Testing Protocol
For each subject, tests were aggregated and augmented to maximize coverage, yielding an average of 74.7% function and 72.2% line coverage, and totaling 1,221,192 assertions project-wide. Functional equivalence is quantified as the proportion of assertions passing in the translated Rust.
4.3 Baseline Comparisons
Baselines include C2Rust, C2SaferRust, Laertes, and CROWN (transpilers), as well as RustMap and Syzygy (LLM-based).
4.4 Metrics
Outcomes are reported as: compilation rate, test coverage, assertion pass/fail, raw pointer and arithmetic usage, unsafe counts, idiomaticity (Clippy flags × TSI), readability (cognitive, Halstead, SEI), and translation time and cost.
5. Empirical Results
The following table summarizes key metrics for the first 10 benchmarks (of 23):
| ID | Name | C-LoC | Compile (%) | Passed Assertions | Failed Assertions | Function Cov. (%) | Best Baseline Cov. |
|---|---|---|---|---|---|---|---|
| 1 | qsort | 27 | 100 | 21 | 0 | 100 | 100% (C2Rust et al.) |
| 2 | bst | 65 | 100 | 6 | 0 | 92 | 87% (C2Rust) |
| 3 | rgba | 411 | 100 | 20 | 0 | 99 | 83% (C2Rust) |
| 4 | quadtree | 437 | 100 | 34 | 0 | 91 | 81% (C2Rust) |
| 5 | buffer | 452 | 100 | 54 | 0 | 91 | 87% (C2Rust) |
| 6 | grabc | 490 | 100 | 4 | 0 | 11 | 11% (C2Rust) |
| 7 | urlparser | 563 | 100 | 46 | 0 | 75 | 61% (C2Rust) |
| 8 | xzoom | 659 | 100 | – | – | – | 100% (C2Rust) |
| 9 | genann | 690 | 100 | 521,556 | 0 | 84 | 83% (C2Rust) |
| 10 | ht | 699 | 100 | 1 | 0 | 61 | 55% (C2Rust) |
For all 23 subjects, the assertion-level pass rate is 87% (1,063,099/1,221,192 assertions). Manual debugging—performed by developers unfamiliar with the codebase and guided by Rustine's test traces—raises the rate to 99.3% in an average of 4.5 hours per project.
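As a quick check of the arithmetic behind the headline figure, the sketch below computes the assertion-level pass rate from the two counts quoted above. The helper name `assertion_pass_rate` is a hypothetical label for the ratio defined in the testing protocol.

```rust
/// Percentage of assertions that pass in the translated Rust.
/// Returns 0.0 when there are no assertions at all.
pub fn assertion_pass_rate(passed: u64, total: u64) -> f64 {
    if total == 0 {
        return 0.0;
    }
    100.0 * passed as f64 / total as f64
}
```

Calling `assertion_pass_rate(1_063_099, 1_221_192)` gives a value just above 87, matching the reported rate.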
Aggregate safety and idiomaticity metrics across all benchmarks:
| Metric | C | Rustine | C2Rust | Best LLM-only (Syzygy) |
|---|---|---|---|---|
| Raw ptr decls | 4,080 | 198 | 4,904 | 7 |
| Raw ptr derefs | 2,045 | 126 | 12,298 | 14 |
| Ptr arithmetic | 452 | 15 | 819 | 69 |
| Unsafe lines | – | 114 | 13,127 | 0 |
| Clippy flags × TSI | – | 18 | 45 | 22 |
Readability metrics (Rustine vs. next-best):
- Cognitive Complexity: 112 vs 150
- Halstead Volume: 2,345 vs 2,950
- SEI Maintainability Index: 85.3 vs 73.9
Translation time scales with project size (ρ ~ 0.9), averaging 3.9 hours (range: 0.5 to 20 hours); LLM calls cost $0.002–$2.39 per repository (mean $0.48 actual, $37 estimated for GPT-4o).
6. Case Studies and Observed Failure Modes
Pointer-Arithmetic Inflation
In zopfli and xzoom, transpiler output (C2Rust) transforms straightforward indexed access into convoluted pointer-offset and casting chains. Pre-refactoring by Rustine enables the LLM to emit safe, idiomatic slice-index code.
Overlapping Mutable Borrows
In tulpindicator, initial translation generated parallel mutable borrows (e.g., &mut outref[0], [1], [2]) triggering E0499. Adaptive ICL prompts illustrate the proper use of split_at_mut chaining, yielding disjoint mutable slices accepted by the borrow checker.
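A minimal sketch of the split_at_mut chaining pattern follows. The function names and the shape of the output buffers are assumptions chosen to mirror the description above, not code from the translated project.

```rust
/// Stand-in for a routine that needs three output buffers at the same time.
fn write_outputs(a: &mut Vec<f64>, b: &mut Vec<f64>, c: &mut Vec<f64>) {
    a.push(1.0);
    b.push(2.0);
    c.push(3.0);
}

/// Obtains three disjoint mutable references into `outref`.
/// Panics if `outref` has fewer than three elements.
pub fn compute(outref: &mut [Vec<f64>]) {
    // Rejected by the borrow checker with E0499:
    //     write_outputs(&mut outref[0], &mut outref[1], &mut outref[2]);
    // Chained split_at_mut calls yield non-overlapping sub-slices instead.
    let (first, rest) = outref.split_at_mut(1);
    let (second, third) = rest.split_at_mut(1);
    write_outputs(&mut first[0], &mut second[0], &mut third[0]);
}
```

Each split hands back two slices that cannot alias, so the compiler can accept three live mutable borrows without any unsafe code.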
Manual Rescue—Dynamic Reallocation
The buffer project’s reallocation loop, translated naively, was both unsafe and misaligned. Human intervention (2 hours) rewrote the struct field as a Vec<T>, employing proper safe resizing methods and restoring correct semantics.
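The shape of such a fix can be sketched as follows, assuming a byte buffer; the struct layout and method names are hypothetical and do not reproduce the actual patch.

```rust
/// Growable byte buffer. A `Vec<u8>` field replaces the C pattern of a raw
/// data pointer plus a manual reallocation loop.
pub struct Buffer {
    data: Vec<u8>,
}

impl Buffer {
    /// Creates an empty buffer with room for at least `capacity` bytes.
    pub fn with_capacity(capacity: usize) -> Self {
        Buffer { data: Vec::with_capacity(capacity) }
    }

    /// Appends bytes; the Vec grows as needed, so no explicit realloc loop.
    pub fn append(&mut self, bytes: &[u8]) {
        self.data.extend_from_slice(bytes);
    }

    /// Truncates or zero-extends the buffer to exactly `new_len` bytes.
    pub fn resize(&mut self, new_len: usize) {
        self.data.resize(new_len, 0);
    }

    /// Read-only view of the current contents.
    pub fn as_bytes(&self) -> &[u8] {
        &self.data
    }
}
```

Because growth and alignment are handled inside Vec, the translated code needs neither unsafe blocks nor manual capacity bookkeeping.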
Global Data Initialization
In tulpindicator, a global array’s zero-initialization in C was unsafely replicated in Rustine’s output. A manual patch replaced this with a lazy_static! block guarded by a Mutex<Vec<Info>>, achieving full test recovery.
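The sketch below shows one way to express a zero-initialized global behind a Mutex. To stay dependency-free it uses a plain static with the const constructor `Mutex::new` (available since Rust 1.63) rather than the lazy_static! macro mentioned above; the `Info` fields, the slot count, and the helper `with_infos` are all assumptions.

```rust
use std::sync::Mutex;

/// Hypothetical record type; Default gives the zeroed state of the C array.
#[derive(Clone, Debug, Default, PartialEq)]
pub struct Info {
    pub id: u32,
    pub value: f64,
}

/// Assumed size of the original C global array.
const INFO_SLOTS: usize = 4;

/// Global table guarded by a mutex, empty until first use.
static INFOS: Mutex<Vec<Info>> = Mutex::new(Vec::new());

/// Runs `f` with exclusive access to the table, filling it with
/// default (zeroed) entries on first access.
pub fn with_infos<R>(f: impl FnOnce(&mut Vec<Info>) -> R) -> R {
    // Recover the data even if another thread panicked while holding the lock.
    let mut guard = INFOS.lock().unwrap_or_else(|e| e.into_inner());
    if guard.is_empty() {
        guard.resize(INFO_SLOTS, Info::default());
    }
    f(&mut *guard)
}
```

Routing every access through one helper keeps the lock scope short and makes the lazy zero-initialization impossible to skip.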
7. Limitations and Prospective Extensions
- Concurrency: No translation of C concurrency (pthread, fork) into Rust’s threading or async models; no behavior equivalence checks for parallel traces.
- Heap Data Structures: Occasional retention of allocation idioms (e.g., raw use of alloc) over idiomatic abstractions (Vec, Box).
- Deep Unsafe Idioms: Pointer bit-twiddling, manual memory management patterns, and custom allocators not systematically handled; these require either manual review or advanced static analysis.
- Formal Equivalence: Empirical correctness depends on test coverage (74.7% function and 72.2% line coverage on average). Augmenting with bounded verification (e.g., SMT, fuzzing) remains a target.
- Language Pair Generalization: The Rustine architecture could be adapted to non-C→Rust pairs (e.g., C→Go, Java→C#) with toolchain-specific refactoring and contextual guidance.
In summary, Rustine exemplifies the practical and technical value of integrating program transformations, modular dependence analysis, and adaptive LLM prompting to achieve scalable, cost-effective, and safe translation from large-scale C repositories to idiomatic Rust (Dehghan et al., 25 Nov 2025).