Compiling C to Safe Rust, Formalized

Published 19 Dec 2024 in cs.PL (arXiv:2412.15042v1)

Abstract: The popularity of the Rust language continues to explode; yet, many critical codebases remain authored in C, and cannot be realistically rewritten by hand. Automatically translating C to Rust is thus an appealing course of action. Several works have gone down this path, handling an ever-increasing subset of C through a variety of Rust features, such as unsafe. While the prospect of automation is appealing, producing code that relies on unsafe negates the memory safety guarantees offered by Rust, and therefore the main advantages of porting existing codebases to memory-safe languages. We instead explore a different path, and investigate what it would take to translate C to safe Rust; that is, to produce code that is trivially memory safe, because it abides by Rust's type system without caveats. Our work sports several original contributions: a type-directed translation from (a subset of) C to safe Rust; a novel static analysis based on "split trees" that allows expressing C's pointer arithmetic using Rust's slices and splitting operations; an analysis that infers exactly which borrows need to be mutable; and a compilation strategy for C's struct types that is compatible with Rust's distinction between non-owned and owned allocations. We apply our methodology to existing formally verified C codebases: the HACL* cryptographic library, and binary parsers and serializers from EverParse, and show that the subset of C we support is sufficient to translate both applications to safe Rust. Our evaluation shows that for the few places that do violate Rust's aliasing discipline, automated, surgical rewrites suffice; and that the few strategic copies we insert have a negligible performance impact. Of particular note, the application of our approach to HACL* results in an 80,000-line verified cryptographic library, written in pure Rust, that implements all modern algorithms: the first of its kind.
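The distinction between non-owned and owned allocations for structs can be illustrated with a minimal sketch. It assumes a hypothetical C struct buf_t holding a uint8_t pointer and a length; in C the same type may point into memory owned elsewhere (such as a stack array) or own a malloc'd block that is later freed. The names BufView, BufOwned and zero_view are illustrative, and this is one plausible Rust encoding rather than the paper's exact compilation scheme.

```rust
// Hypothetical C source being mirrored (not taken from the paper):
//   typedef struct { uint8_t *data; uint32_t len; } buf_t;

// Non-owning translation: the struct borrows its buffer, so it carries a lifetime.
struct BufView<'a> {
    data: &'a mut [u8],
    len: u32,
}

// Owning translation: the struct owns a heap allocation, released on drop.
struct BufOwned {
    data: Box<[u8]>,
    len: u32,
}

// Zeroes the first `len` bytes seen through a borrowed view.
fn zero_view(b: &mut BufView<'_>) {
    for x in b.data[..b.len as usize].iter_mut() {
        *x = 0;
    }
}

fn main() {
    // C: uint8_t arr[4] = {1, 2, 3, 4}; buf_t v = { arr, 4 };
    let mut arr = [1u8, 2, 3, 4];
    let mut v = BufView { data: &mut arr, len: 4 };
    zero_view(&mut v);
    assert_eq!(arr, [0, 0, 0, 0]); // the borrow held by `v` has ended here

    // C: buf_t o = { malloc(8), 8 }; ... free(o.data);
    let o = BufOwned { data: vec![7u8; 8].into_boxed_slice(), len: 8 };
    assert_eq!(o.data.len(), o.len as usize);
    // `o` and its allocation are dropped at the end of scope: no explicit free().
}
```

The lifetime on BufView is what lets the borrow checker prevent the view from outliving the array it points into, while BufOwned needs no lifetime because it owns its memory.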


Authors (2)

Summary

The paper introduces a type-directed translation approach that converts C code into safe Rust while upholding Rust’s strict memory safety guarantees.
It leverages novel techniques such as split trees for pointer arithmetic and automatic mutability inference to minimize manual modifications.
Empirical evaluations on HACL* and EverParse show performance nearly identical to the original C, with source changes needed in less than 2% of the HACL* codebase and none in the EverParse parser.

Compiling C to Safe Rust: An Analysis of Mechanisms and Results

The transition from C to Rust has emerged as a significant endeavor in the pursuit of memory safety without sacrificing performance. This paper investigates what it takes to translate C, the language of a vast body of legacy code, into Rust, a language renowned for its safety features, with a specific focus on preserving the memory safety guarantees offered by Rust's type system. Rust's design rules out a wide array of memory errors by construction, which, together with its growing popularity, makes migrating existing C code compelling; doing so, however, is far from trivial.

Overview and Methodology

The authors introduce a type-directed translation approach, which aims to convert a subset of C, termed Mini-C, into safe Rust. The goal is to produce Rust code that adheres to the borrow-checking rules and ownership model native to Rust, thus avoiding the need for unsafe blocks that circumvent Rust's safety guarantees. This translation process is augmented by several innovative techniques:

Type-Directed Translation: C types are mapped to corresponding Rust types; in particular, C pointers become borrowed slices (&[T] or &mut [T]), so that the translated code follows Rust's ownership and borrowing discipline.
Split Trees for Pointer Arithmetic: A novel static analysis based on "split trees" translates C's pointer arithmetic into Rust slice splitting. Offsetting a pointer divides the underlying array into non-overlapping subslices using Rust's safe splitting operations (such as split_at_mut), which preserves Rust's aliasing discipline.
Mutability Inference and Trait Derivation: Post-translation static analyses infer exactly which borrows need to be mutable and derive Rust traits such as Copy. Keeping borrows immutable wherever possible lets the borrow checker accept code that reads the same buffer through several pointers, and deriving Copy lets struct values be duplicated by assignment, as in C.
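The three techniques above can be illustrated together in a minimal sketch. The C functions bump and sum2 and the struct Point are hypothetical examples, not code from the paper, and the Rust below is a hand-written translation in the spirit of the described approach rather than KaRaMeL's actual output. bump shows two levels of pointer arithmetic becoming nested calls to the standard split_at_mut, sum2 shows why inferring read-only borrows matters, and Point shows a derived Copy.

```rust
// C:  void bump(uint32_t *buf) {
//       uint32_t *p = buf + 2;   // pointer arithmetic
//       uint32_t *q = p + 3;     // arithmetic on an already-offset pointer
//       buf[0] += 1; p[0] += 10; q[0] += 100;
//     }
// The pointer becomes &mut [u32]; each offset becomes a split of the
// sub-slice it starts from, so nested offsets form a small tree of splits.
fn bump(buf: &mut [u32]) {
    // buf + 2: `head` covers buf[0..2], `p` covers buf[2..]
    let (head, p) = buf.split_at_mut(2);
    // p + 3: `mid` covers buf[2..5], `q` covers buf[5..]
    let (mid, q) = p.split_at_mut(3);
    // Each access is redirected to the leaf that contains it; the leaves are
    // disjoint, so all three mutable views can be live at the same time.
    head[0] += 1; // buf[0]
    mid[0] += 10; // p[0], i.e. buf[2]
    q[0] += 100; // q[0], i.e. buf[5]
}

// C: uint32_t sum2(uint32_t *a, uint32_t *b) { return a[0] + b[0]; }
// Neither pointer is written, so both can be inferred as shared borrows.
// Had both been translated as &mut [u32], the call sum2(x, x) could not be
// expressed: Rust rejects two simultaneous mutable borrows of `x`.
fn sum2(a: &[u32], b: &[u32]) -> u32 {
    a[0] + b[0]
}

// A struct of plain scalars can derive Copy, so assignment duplicates it as in C.
#[derive(Clone, Copy)]
struct Point {
    x: u32,
    y: u32,
}

fn main() {
    let mut buf = [0u32; 8];
    bump(&mut buf);
    assert_eq!(buf, [1, 0, 10, 0, 0, 100, 0, 0]);

    let x = [3u32, 4];
    assert_eq!(sum2(&x, &x), 6); // C: sum2(x, x), two aliasing read-only pointers

    let p = Point { x: 1, y: 2 };
    let q = p; // a copy, not a move
    assert_eq!(p.x + q.y, 3); // `p` is still usable because Point is Copy
}
```

Splitting rather than indexing with raw offsets is what keeps the example in safe Rust: the compiler can see that head, mid and q never overlap.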

Evaluation

The authors implement the translation in the KaRaMeL compiler framework and use it to translate critical verified libraries from C to Rust. Two chief case studies demonstrate the effectiveness of this approach:

HACL*: An 80,000-line verified cryptographic library, for which the methodology produced a Rust version whose performance stays close to that of the original C implementation. Source rewrites were minimal, affecting less than 2% of the codebase, which shows how little manual intervention the semi-automatic translation requires.
EverParse: Verified binary parsers and serializers, specifically the CBOR-DET parser, which was translated without any modification to the original C code, underscoring the robustness of the translation for data-oriented applications.
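The abstract notes that the few places violating Rust's aliasing discipline are handled by automated, surgical rewrites and a few strategic copies. A minimal sketch of what such a copy can look like, assuming a hypothetical C helper fmul(out, a, b) that is called in place as fmul(x, x, y); the element-wise multiply stands in for a real field multiplication, and this illustrates the idea rather than reproducing the paper's actual rewrite.

```rust
// Hypothetical C: void fmul(uint64_t *out, uint64_t *a, uint64_t *b);
// An element-wise product stands in for a real field multiplication.
fn fmul(out: &mut [u64], a: &[u64], b: &[u64]) {
    for ((o, &ai), &bi) in out.iter_mut().zip(a).zip(b) {
        *o = ai.wrapping_mul(bi);
    }
}

fn main() {
    let mut x = [2u64, 3, 4, 5];
    let y = [10u64; 4];

    // C: fmul(x, x, y);  here `out` and `a` alias.
    // The direct translation fmul(&mut x, &x, &y) is rejected by the borrow
    // checker, since `x` would be borrowed mutably and immutably at once.
    // Rewrite: copy the aliased input first ([u64; 4] is Copy), then call.
    let x_in = x;
    fmul(&mut x, &x_in, &y);
    assert_eq!(x, [20, 30, 40, 50]);
}
```

Copying a small fixed-size array like this costs only a few machine words, which is in line with the abstract's report that the inserted copies have a negligible performance impact.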

Implications and Prospects

The translation approach outlined in the paper highlights several theoretical and practical implications:

Memory Safety with Performance: By ensuring the absence of unsafe code, the translated Rust implementations maintain the memory safety guarantees of Rust, while empirical evaluations show performance characteristics closely mirroring those of equivalent C implementations.
Formal Methodologies in Translation: The formalized translation process and post-translation analyses provide a credible path towards automating the migration of C codebases to Rust, especially where formal correctness assurances are concerned.
Semi-Automatic Migration: Because only minimal adjustments to the original C code are required, the approach offers a pragmatic pathway for transitioning large-scale, legacy C applications to Rust without the exhaustive resource demands of a complete rewrite.

Looking forward, expanding the scope of Mini-C to encompass a broader subset of C, alongside enhancements in the static analysis components, could further ease the transition process and broaden the applicability of this methodology. Additionally, the alignment with existing verification techniques can potentially streamline the verification of translated Rust code, leveraging the robustness of Rust's type system to further ensure correctness and safety.

Conclusion

This paper's exploration of translating C to safe Rust addresses a critical intersection of performance and safety in software engineering. By formalizing a type-directed translation coupled with post-hoc analyses to refine memory safety and performance, the authors present a compelling framework for safer systems programming. As Rust's adoption continues to rise, the approach delineated in this study holds significant promise for refactoring existing C codebases, ensuring they benefit from Rust’s robust safety model without compromising performance.
