
DARPA TRACTOR and the C-to-Rust Translation Challenge: Can We Automate Memory Safety?


By Sean K.S. Shin
This blog summarizes research trends based on published paper abstracts. Specific numbers or findings may contain inaccuracies. For scholarly rigor, always consult the original papers cited in each post.

The U.S. Department of Defense has a problem measured in billions of lines of code. Critical infrastructure, from weapons systems to communication networks, runs on C and C++, languages that provide performance and hardware control at the cost of memory safety. Buffer overflows, use-after-free errors, and null pointer dereferences account for a large share of security vulnerabilities in systems software, according to estimates from Microsoft and Google. DARPA's Translating All C to Rust (TRACTOR) program, announced in 2024, represents the most ambitious attempt to address this problem: automatically translating legacy C codebases into memory-safe Rust at scale. But can automated translation deliver safe, idiomatic Rust? The research literature suggests the answer is: partially, with significant caveats.

The Research Landscape

The Scale of the Problem

Hong and Ryu (2025) frame the core challenge precisely. Legacy C codebases have accumulated decades of implicit assumptions about memory layout, pointer arithmetic, and undefined behavior. Rust's ownership model, in which every value has a single owner, borrowing is tracked at compile time, and lifetimes are explicit, is fundamentally incompatible with C's permissive memory model. A mechanical translation that preserves C semantics in Rust produces code wrapped in unsafe blocks, which forfeits Rust's safety guarantees while adding Rust's syntactic overhead. The goal, therefore, is not just translation but transformation: converting C memory patterns into idiomatic Rust ownership patterns.
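To make the difference between translation and transformation concrete, here is a minimal sketch. The function names `sum_raw` and `sum_slice` are illustrative and are not taken from any of the cited papers. The first function mirrors a hypothetical C signature `int sum(const int *p, size_t n)`; the second expresses the same computation with a borrowed slice.

```rust
/// Mechanical translation: keeps the C shape (raw pointer plus length).
/// The caller must guarantee that `p` points to `n` readable `i32` values;
/// the compiler cannot check this.
unsafe fn sum_raw(p: *const i32, n: usize) -> i32 {
    let mut total: i32 = 0;
    let mut i = 0;
    while i < n {
        // Pointer arithmetic carried over from C, with no bounds checking.
        total = total.wrapping_add(unsafe { *p.add(i) });
        i += 1;
    }
    total
}

/// Idiomatic transformation: the slice carries its own length and borrow,
/// so the function needs no `unsafe` at all.
fn sum_slice(values: &[i32]) -> i32 {
    values.iter().fold(0i32, |acc, &v| acc.wrapping_add(v))
}
```

Both functions compute the same result for valid input, but only the second lets the compiler rule out out-of-bounds reads at every call site.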

LLM-Assisted Translation: Promise and Limitations

The most active research direction combines large language models with static analysis to automate C-to-Rust translation. The results are instructive about both the capabilities and limitations of LLM-based code transformation.

Shetty et al. (2024) present Syzygy, a dual code-test translation approach that uses LLMs and dynamic analysis to translate C to safe Rust. Their key insight is that translating code and translating tests simultaneously provides a verification mechanism: if the translated Rust code passes the translated tests, confidence in semantic preservation increases. Syzygy achieves safe Rust output for a majority of the functions in their benchmark, but the remainder requires unsafe blocks or manual intervention, a ratio that illustrates the current state of the art.
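The idea of checking a translation against behavior observed from the original can be sketched as follows. This is not Syzygy's actual pipeline; `saturating_percent`, `Observation`, and `diverging_inputs` are hypothetical names, and the recorded input/output pairs are assumed to come from running the C original.

```rust
/// Hypothetical translated function. It stands in for a C original
/// `int saturating_percent(int value)` that clamps its argument to 0..=100.
fn saturating_percent(value: i32) -> i32 {
    value.clamp(0, 100)
}

/// One input/output pair, assumed to have been recorded from the C original.
struct Observation {
    input: i32,
    expected: i32,
}

/// Replays the recorded observations against a Rust translation and
/// returns the inputs on which the translation diverges.
fn diverging_inputs(translated: fn(i32) -> i32, recorded: &[Observation]) -> Vec<i32> {
    recorded
        .iter()
        .filter(|obs| translated(obs.input) != obs.expected)
        .map(|obs| obs.input)
        .collect()
}
```

An empty result raises confidence that the translation preserves behavior on the recorded inputs; it says nothing about inputs that were never observed.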

Cai et al. (2025) introduce RustMap, a project-scale C-to-Rust migration tool that combines program analysis with LLM-based translation. RustMap addresses a limitation of function-level translators: real C projects have complex inter-procedural dependencies, global state, and build system configurations that function-level translation ignores. Their approach first analyzes the project's dependency graph, then translates functions in topological order so that each translated function can reference previously translated dependencies. The method handles projects up to tens of thousands of lines but struggles with deeply intertwined global state.
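The ordering step can be illustrated with a small dependency-first sort over a call graph. This is a sketch under assumptions, not RustMap's implementation: the function name `translation_order` and the map-of-callees representation are chosen here for illustration.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Returns function names ordered so that every callee appears before its
/// callers, or `None` if the call graph contains a cycle (for example,
/// mutual recursion), in which case no such order exists.
fn translation_order<'a>(calls: &BTreeMap<&'a str, Vec<&'a str>>) -> Option<Vec<String>> {
    // Collect every function mentioned as a caller or as a callee.
    let mut pending: BTreeSet<&str> = calls.keys().copied().collect();
    for callees in calls.values() {
        pending.extend(callees.iter().copied());
    }

    let mut order: Vec<String> = Vec::new();
    let mut done: BTreeSet<&str> = BTreeSet::new();

    while !pending.is_empty() {
        // A function is ready once all of its callees have been handled.
        let ready: Vec<&str> = pending
            .iter()
            .copied()
            .filter(|f| {
                calls
                    .get(f)
                    .map_or(true, |cs| cs.iter().all(|c| done.contains(c)))
            })
            .collect();
        if ready.is_empty() {
            return None; // The remaining functions depend on each other.
        }
        for f in ready {
            pending.remove(f);
            done.insert(f);
            order.push(f.to_string());
        }
    }
    Some(order)
}
```

Returning `None` on cycles is a deliberate simplification: a real tool would need some strategy for recursive groups, such as translating them together.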

Khatry et al. (2025) contribute CRUST-Bench, a benchmark for evaluating C-to-safe-Rust transpilation. This is significant infrastructure work: without standardized benchmarks, it is impossible to compare different translation approaches rigorously. CRUST-Bench includes 100 C programs with test suites, and their evaluation of frontier LLMs (GPT-4, Claude) shows that even the strongest models achieve only modest success rates on producing safe Rust that passes all tests. That is substantially lower than the function-level results reported by Syzygy, suggesting that benchmark design significantly affects reported performance.

Shiraishi et al. (2024) present SmartC2Rust, an iterative feedback-driven approach. Rather than translating in a single pass, SmartC2Rust generates an initial translation, compiles it, feeds compiler errors back to the LLM, and iterates. This "compile-and-fix" loop meaningfully improves success rates over single-pass translation, suggesting that the LLM's handling of Rust's type system improves when it is given concrete error feedback.
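The control flow of such a loop is simple enough to sketch. The code below is not SmartC2Rust's implementation; `compile_and_fix` and `Outcome` are hypothetical names, and the `translate` and `compile` closures are stand-ins for an LLM call and a compiler invocation.

```rust
/// Result of the feedback loop.
#[derive(Debug, PartialEq)]
enum Outcome {
    Compiled { code: String, attempts: usize },
    GaveUp { last_errors: String },
}

/// Repeatedly asks `translate` for Rust code, checks it with `compile`,
/// and passes any compiler errors into the next translation request.
/// Both closures are hypothetical stand-ins supplied by the caller.
fn compile_and_fix<T, C>(
    c_source: &str,
    mut translate: T,
    compile: C,
    max_attempts: usize,
) -> Outcome
where
    T: FnMut(&str, Option<&str>) -> String,
    C: Fn(&str) -> Result<(), String>,
{
    let mut feedback: Option<String> = None;
    for attempt in 1..=max_attempts {
        // The first request has no feedback; later ones see the last errors.
        let candidate = translate(c_source, feedback.as_deref());
        match compile(&candidate) {
            Ok(()) => return Outcome::Compiled { code: candidate, attempts: attempt },
            Err(errors) => feedback = Some(errors),
        }
    }
    Outcome::GaveUp { last_errors: feedback.unwrap_or_default() }
}
```

Note that a successful compile only shows the candidate is well-typed; it does not show that the candidate behaves like the C original, which is why an attempt cap and separate testing still matter.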

Luo et al. (2025) propose integrating rule-based static analysis with LLM-based semantic understanding. Pure rule-based approaches have limited coverage (they handle common patterns but miss complex cases), while pure LLM approaches lack reliability (they sometimes produce syntactically correct but semantically wrong code). Their hybrid approach achieves higher coverage than either method alone.

The Safety Verification Problem

Translation is only half the challenge. The other half is verifying that the translated code preserves the semantics of the original while actually achieving memory safety.

Sirlanci et al. (2025) address this with C2RUST-BENCH, a minimized dataset designed specifically for evaluating semantic equivalence between C originals and Rust translations. Their benchmark highlights a subtle problem: some C programs rely on undefined behavior that happens to produce consistent results on specific platforms. Translating such programs to Rust, which has defined behavior for the same operations, can change program semantics in ways that are difficult to detect through testing alone.
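Signed integer overflow is a familiar instance of this problem. In C, overflow of `a + b` on `int` is undefined behavior, although many builds happen to wrap around. In Rust, plain `a + b` panics in debug builds and wraps in release builds, so a translator has to pick one defined meaning explicitly. The sketch below shows three choices using standard-library methods; the wrapper function names are illustrative only.

```rust
// Each function gives one defined meaning to what C leaves undefined.

/// Two's-complement wraparound, which is what many C builds happen to do.
fn add_wrapping(a: i32, b: i32) -> i32 {
    a.wrapping_add(b)
}

/// Overflow is surfaced to the caller as `None`.
fn add_checked(a: i32, b: i32) -> Option<i32> {
    a.checked_add(b)
}

/// Overflow is clamped at the numeric bounds of `i32`.
fn add_saturating(a: i32, b: i32) -> i32 {
    a.saturating_add(b)
}
```

All three agree on inputs that do not overflow, so a test suite that never exercises overflow cannot tell them apart, which is exactly the detection difficulty described above.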

Critical Analysis: Claims and Evidence

Claim | Evidence | Verdict
LLMs can translate individual C functions to safe Rust | Shetty et al. Syzygy: high success rate | Partially supported: function-level translation feasible but not complete
Project-scale translation is achievable | Cai et al. RustMap | Partially supported: works for moderate-sized projects, struggles with complex global state
Frontier LLMs achieve 15-25% on rigorous benchmarks | Khatry et al. CRUST-Bench | Supported: the gap between easy benchmarks and rigorous ones is large
Iterative compilation feedback improves translation | Shiraishi et al. SmartC2Rust | Supported: 15-20% improvement over single-pass
Automated translation can fully replace manual migration | No current evidence | Not supported: all approaches require human review for safety-critical code

Open Questions and Future Directions

  • The unsafe residual. Even the best automated tools produce a significant share of functions requiring unsafe Rust or manual intervention. Can this residual be reduced to acceptable levels for safety-critical systems, or will automated translation always require human oversight?
  • Undefined behavior preservation. C programs that rely on undefined behavior present a fundamental translation challenge. Should the translated Rust preserve the observed behavior (platform-specific) or reject such patterns (losing functionality)?
  • DARPA TRACTOR at scale. The academic results translate programs of thousands to tens of thousands of lines. DARPA's target, defense infrastructure codebases, involves millions of lines with decades of accumulated complexity. The scaling gap between research benchmarks and deployment targets remains vast.
  • Incremental adoption. Rather than wholesale translation, a practical path may involve translating security-critical components to Rust while maintaining C for performance-critical code. Rust's FFI (Foreign Function Interface) supports this, but the boundary between safe Rust and unsafe C becomes a new attack surface.
  • Verification guarantees. Testing can demonstrate the absence of specific bugs but cannot prove semantic equivalence. Formal verification of translated code remains computationally expensive and requires specification effort that may exceed the cost of manual translation for small programs.
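The incremental-adoption point can be illustrated with a safe wrapper around a C-ABI function. This is a sketch under a loud assumption: `legacy_checksum` stands in for a routine that would remain in C and be linked in, and it is written in Rust here only so the example compiles on its own. Both function names are hypothetical.

```rust
/// Stand-in for a routine that would stay in C (assumption: in a real
/// migration this would be declared in an extern block and linked from C).
/// Contract, unchecked as in C: `data` points to `len` readable bytes.
unsafe extern "C" fn legacy_checksum(data: *const u8, len: usize) -> u32 {
    let mut sum: u32 = 0;
    for i in 0..len {
        sum = sum.wrapping_add(u32::from(unsafe { *data.add(i) }));
    }
    sum
}

/// Safe wrapper: the slice guarantees a valid pointer/length pair, so
/// callers in safe Rust cannot violate the C-side contract.
fn checksum(bytes: &[u8]) -> u32 {
    // SAFETY: `bytes.as_ptr()` is valid for `bytes.len()` reads for the
    // duration of this call.
    unsafe { legacy_checksum(bytes.as_ptr(), bytes.len()) }
}
```

The wrapper narrows the unsafe surface to one audited call, but it cannot protect against bugs inside the C routine itself, which is why the boundary remains part of the attack surface.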
What This Means for Systems Engineers

The TRACTOR vision of automated, correct, safe translation of legacy C to Rust remains aspirational. Current tools can accelerate the process, particularly for well-structured code with good test coverage, but they cannot replace human judgment for safety-critical translations. The practical recommendation is to treat automated translation as a starting point that substantially reduces manual effort rather than as a complete solution.

    References (7)

    [1] Hong, J. & Ryu, S. (2025). Automatically Translating C to Rust. ACM TOPLAS.
    [2] Shetty, M., Jain, N., & Godbole, A. (2024). Syzygy: Dual Code-Test C to (safe) Rust Translation using LLMs and Dynamic Analysis. arXiv preprint.
    [3] Cai, X., Liu, J., & Huang, X. (2025). RustMap: Towards Project-Scale C-to-Rust Migration via Program Analysis and LLM. arXiv preprint.
    [4] Khatry, A., Zhang, R., & Pan, J. (2025). CRUST-Bench: A Comprehensive Benchmark for C-to-safe-Rust Transpilation. arXiv preprint.
    [5] Shiraishi, M., Cao, Y., & Shinagawa, T. (2024). SmartC2Rust: Iterative, Feedback-Driven C-to-Rust Translation via Large Language Models. ACM CCS.
    [6] Luo, F., Ji, K., & Gao, C. (2025). Integrating Rules and Semantics for LLM-Based C-to-Rust Translation. IEEE ICSME.
    [7] Sirlanci, M., Yagemann, C., & Lin, Z. (2025). C2RUST-BENCH: A Minimized, Representative Dataset for C-to-Rust Transpilation Evaluation. arXiv preprint.
