
Zero-Shot Cross-Language Code Transfer

Updated 24 April 2026
  • Zero-shot cross-programming-language transfer is the capability of ML models to apply coding skills learned from one language to an unseen target language without labeled data.
  • Parallel-SFT uses parallel program translations to build language-agnostic representations, yielding robust improvements in metrics like pass@1.
  • Neuro-symbolic and IR-based approaches further enhance cross-language alignment, supporting better code cloning and generalization across programming languages.

Zero-shot cross-programming-language transfer refers to the ability of a machine learning model for code, typically a large language model (LLM) or a neural code encoder, to transfer competence in a coding task from one (source) programming language (PL) to another (target) PL not seen during reinforcement learning (RL) or supervised fine-tuning (SFT), without access to labeled data in the target PL. This task exposes fundamental questions about the representation of programming skills, language-agnostic code semantics, and the capacity for learned generalization in code generation and understanding systems.

1. Formalization and Problem Setting

Formally, let $\pi_\theta$ be a policy model parameterized by $\theta$, mapping input prompts $x$ to code sequences $c$, over a collection of programming languages $\mathcal{L} = \{\ell_1, \dots, \ell_m\}$. A source language $s \in \mathcal{L}$ provides an RL training distribution $\mathcal{D}_{\mathrm{RL}}^{(s)}$, from which prompts are sampled. The agent generates a code solution $c \sim \pi_\theta(\cdot \mid x)$ that is scored by a verifier $r^{(s)}(c \mid x) \in \{0, 1\}$: typically 1 if all test cases pass, 0 otherwise.

The RL objective maximizes expected reward over the source language $s$:

$$\max_\theta \; \mathbb{E}_{x \sim \mathcal{D}_{\mathrm{RL}}^{(s)},\; c \sim \pi_\theta(\cdot \mid x)} \left[ r^{(s)}(c \mid x) \right]$$

Zero-shot transfer evaluates $\pi_\theta$ on a target language $t \in \mathcal{L} \setminus \{s\}$, unseen during RL, computing the analogous expected reward

$$\mathbb{E}_{x \sim \mathcal{D}_{\mathrm{eval}}^{(t)},\; c \sim \pi_\theta(\cdot \mid x)} \left[ r^{(t)}(c \mid x) \right]$$

Performance is measured via pass@k for code generation and accuracy for code validation (Wu et al., 22 Apr 2026).
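In practice, pass@k is usually computed with the standard unbiased estimator over $n$ sampled generations per prompt, of which $c$ pass all tests. A minimal sketch (the function name is illustrative):

```python
import math

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: probability that at least one of k
    samples, drawn without replacement from n generations of which c
    are correct, passes all test cases."""
    if n - c < k:
        # Every size-k subset must contain a correct sample.
        return 1.0
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)
```

For example, with 4 generations of which 2 pass, pass@1 is 0.5 and pass@2 is 5/6, since only one of the six size-2 subsets contains no passing sample.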

2. Parallel-SFT: Parallel Programs for Improved Transfer

The central challenge in zero-shot transfer is the observed failure of RL-trained policies in one PL to generalize or even maintain performance in an unseen PL, especially for lower-resource target languages. This failure arises in spite of the universality of programming concepts, suggesting that naïve SFT or RL does not induce sufficiently language-agnostic internal representations.

Parallel-SFT addresses this by constructing a supervised dataset of parallel programs: for each task $x$, a set of functionally equivalent solutions, one in each of the eight languages in $\mathcal{L}$ (Python, C, C++, Java, C#, JavaScript, Bash, and Lua), is generated. Solutions are obtained by prompting a code LLM (Llama-4-Maverick) to translate validated Python code into the other languages, then re-executing and filtering for correctness against the test suite.
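The translate-and-filter step can be sketched as follows; `translate_with_llm` stands in for a call to the translation model and `run_tests` for sandboxed execution against the task's test suite — both are hypothetical interfaces, not the paper's actual harness:

```python
def build_parallel_set(python_solution, tests, target_langs,
                       translate_with_llm, run_tests):
    """For one task: translate a validated Python solution into each
    target language, re-execute against the task's test suite, and
    keep only translations that pass every test."""
    parallel = {"python": python_solution}
    for lang in target_langs:
        candidate = translate_with_llm(python_solution, lang)
        if run_tests(candidate, lang, tests):  # True iff all tests pass
            parallel[lang] = candidate
    return parallel
```

Filtering on re-execution is what keeps the parallel corpus functionally aligned: a mistranslated solution simply drops out for that language rather than polluting the SFT mixture.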

Three SFT data regimes are contrasted:

  • 1-Language (source): all SFT data from a single source PL
  • 8-Languages (non-parallel): each task appears in one PL only
  • 8-Languages (parallel): every task appears in all eight PLs (“Parallel-SFT”)

Supervised fine-tuning is performed via maximum likelihood on all mixtures, using 142k code instances and a consistent set of non-code data. No auxiliary (e.g., contrastive) objectives are used. RL is subsequently applied via GRPO, a PPO variant, in the source PL only (Wu et al., 22 Apr 2026).
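The three data regimes can be sketched as dataset constructions over the same parallel corpus; the `{task_id: {lang: solution}}` representation is an illustrative assumption:

```python
import random

def make_sft_mixture(parallel_data, regime, source_lang="python", seed=0):
    """Build an SFT example list under one of the three regimes.
    parallel_data: {task_id: {lang: solution}} with all languages present."""
    rng = random.Random(seed)
    examples = []
    for task_id, sols in parallel_data.items():
        if regime == "1-language":
            # All data from the single source language.
            examples.append((task_id, source_lang, sols[source_lang]))
        elif regime == "non-parallel":
            # Each task appears in exactly one (randomly chosen) language.
            lang = rng.choice(sorted(sols))
            examples.append((task_id, lang, sols[lang]))
        elif regime == "parallel":
            # Every task appears in all languages ("Parallel-SFT").
            for lang, sol in sorted(sols.items()):
                examples.append((task_id, lang, sol))
    return examples
```

Note that all three regimes draw on the same tasks; only the pairing of tasks with languages differs, which isolates parallelism itself as the experimental variable.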

3. Empirical Outcomes and Internal Representation Analysis

Empirical evaluation reveals that 1-Language SFT+RL robustly improves source-PL metrics, but provides little to no improvement—and sometimes degrades performance—on target PLs at zero shot. Non-parallel multilingual SFT provides a modest boost (1–2 pass@1 points over the monolingual baseline), but only Parallel-SFT provides consistent, robust improvements in zero-shot transfer. For instance, in code generation from C++ RL source to Go (target), pass@1 improves from ~9% (1-lang) and ~10% (non-parallel) to ~12% (parallel), with similar trends observed for Python as source and for code validation tasks.

On certain targets, Parallel-SFT+RL even exceeds the “oracle” setting (SFT+RL in the target PL), indicating improved functional generalization not solely obtainable from data-rich learning in the target. This suggests that parallel SFT enforces a model-internal organization that privileges program semantics over surface syntax (Wu et al., 22 Apr 2026).

Layerwise representational analysis demonstrates that Parallel-SFT models produce higher retrieval accuracy and cosine similarity on embeddings of semantically identical code across languages, especially in middle layers. “Echo” embedding procedures (in which each code is embedded twice via copy-rewriting prompts) reveal that equivalent programs are clustered tightly in latent space by Parallel-SFT, whereas monolingual or non-parallel SFT yields weak cross-language alignment (Wu et al., 22 Apr 2026).
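The retrieval-accuracy probe can be sketched as follows: given embeddings of the same programs in two languages, each source embedding should retrieve its counterpart as the nearest target embedding by cosine similarity (a pure-NumPy sketch; the embedding extraction itself is model- and layer-specific and omitted here):

```python
import numpy as np

def cross_lang_retrieval_accuracy(emb_src, emb_tgt):
    """emb_src[i] and emb_tgt[i] embed the same program in two languages.
    Returns the fraction of source rows whose most cosine-similar
    target row is the correct counterpart."""
    a = emb_src / np.linalg.norm(emb_src, axis=1, keepdims=True)
    b = emb_tgt / np.linalg.norm(emb_tgt, axis=1, keepdims=True)
    sims = a @ b.T                      # pairwise cosine similarities
    return float(np.mean(sims.argmax(axis=1) == np.arange(len(a))))
```

High accuracy under this probe indicates that semantically equivalent programs occupy nearby regions of the latent space regardless of surface language, which is the property Parallel-SFT is argued to induce.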

4. Neuro-Symbolic Approaches: Language-Agnostic IRs

Zero-shot transfer can also be supported via explicit neuro-symbolic methods, exemplified by approaches that define a cross-language intermediate representation (IR). In the code-cloning setting for C↔COBOL (Hasija et al., 2023), both languages are compiled to a joint abstract syntax tree (AST) meta-model. Leaves representing variables and API calls are mapped (optionally, via a C–COBOL token map), and the AST structures are linearized by Structure-Based Traversal (SBT).
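The SBT linearization described above can be sketched over a toy tree; the `(label, children)` node format is an illustrative simplification, not the paper's exact AST meta-model:

```python
def sbt(node):
    """Structure-Based Traversal: wrap each subtree in parentheses
    tagged with its node label, so the flat token sequence still
    encodes the full tree shape. node = (label, [children])."""
    label, children = node
    tokens = ["(", label]
    for child in children:
        tokens.extend(sbt(child))       # recurse depth-first
    tokens.extend([")", label])
    return tokens
```

Because the bracketed sequence is recoverable back to the tree, a transformer consuming SBT tokens sees structural information that plain token streams discard, and that structure is shared across source languages compiled to the same meta-model.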

A pretrained transformer model (UniXCoder) finetuned on SBT-IRs of C pairs for code-clone detection can be directly applied to COBOL SBT-IRs: the model’s latent space encodes semantics invariantly across source and target PL, yielding a 12.85-point gain in MAP@2 over the raw pre-trained baseline—without COBOL training data (Hasija et al., 2023).

This IR-centric approach, which generalizes to further languages by extending frontends, explicitly encodes the invariances sought by methods like Parallel-SFT. A plausible implication is that training on parallel, functionality-aligned corpora and embedding code into shared IR spaces are convergent strategies for cross-PL generalization.

5. Evaluation Protocols and Metrics

Standard evaluation benchmarks for zero-shot cross-programming-language transfer include CodeForces (for generation and validation), and CodeNet or analogous datasets for legacy PLs. Metrics include pass@1 or pass@8—fraction of samples passing all required test cases in one or eight attempts—and accuracy for code validation tasks.

Additional evaluation of cross-language alignment leverages retrieval accuracy and adjusted cosine similarity between embeddings of parallel code. In IR-based systems, mean average precision at k (MAP@k), measured via retrieval of semantically equivalent clones, quantifies transfer efficacy (Wu et al., 22 Apr 2026, Hasija et al., 2023).
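MAP@k over clone retrieval can be computed as follows; this is a generic sketch of mean average precision at cutoff k, not the papers' exact evaluation harness:

```python
def average_precision_at_k(ranked, relevant, k):
    """AP@k for one query: `ranked` is the retrieval order,
    `relevant` the set of true clones for the query."""
    hits, score = 0, 0.0
    for i, item in enumerate(ranked[:k], start=1):
        if item in relevant:
            hits += 1
            score += hits / i          # precision at each relevant rank
    return score / min(len(relevant), k) if relevant else 0.0

def map_at_k(queries, k):
    """Mean AP@k over (ranked_list, relevant_set) query pairs."""
    return sum(average_precision_at_k(r, rel, k) for r, rel in queries) / len(queries)
```

A reported gain such as 12.85 MAP@2 points thus reflects true clones being ranked ahead of non-clones within the top two retrieved candidates, averaged over all queries.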

6. Limitations and Directions for Continued Progress

Parallel-SFT and similar approaches are subject to limitations: restricted dataset sampling (fixed code instance counts and translation strategies), absence of curricular variation, and limited coverage of code tasks. IR-based methods can be bottlenecked by token sequence length and the manual coverage of token mapping in syntactically divergent PLs.

Suggested areas for continued development include: dynamic curricula prioritizing typologically diverse or more challenging PLs, scaling mixtures to low-resource or domain-specialized PLs, integrating structure-based supervision (such as AST/IR), and extending evaluation to more complex programming settings (e.g., code refactoring, multi-file programs). The explicit modeling of code semantics—whether via parallel data alignment or meta-model IRs—remains a key ingredient in achieving robust zero-shot cross-programming-language transfer (Wu et al., 22 Apr 2026, Hasija et al., 2023).
