Docker-free SFT: Surrogate & Rootless Techniques

Updated 10 February 2026
  • Docker-free SFT is a machine learning fine-tuning approach that replaces Docker container dependencies with surrogate models or rootless virtualization for enhanced scalability.
  • It leverages learned surrogates like SWE-World and LLM-based predictors to simulate repository actions, ensuring reproducibility and efficient error feedback.
  • This approach achieves up to 10x speed improvements over Docker-based methods, broadening training data with previously unbuildable repositories.

Docker-free supervised fine-tuning (SFT) refers to SFT workflows for machine learning—especially code agents and software engineering tasks—executed without Docker containerization. This approach has gained prominence both through the use of learned environment surrogates, as with SWE-World, and via rootless operating-system-level virtualization tools such as Apptainer. The goals are to mitigate the substantial resource, maintenance, and scalability barriers posed by Docker-centric pipelines, while preserving or even improving upon the functional requirements of agent-environment interaction and reproducibility (Sun et al., 3 Feb 2026, Dykstra, 2022).

1. Surrogate-Based Docker-Free SFT in SWE-World

SWE-World implements a learned surrogate environment for SFT that obviates physical execution. Its architecture consists of three main components:

  • Filesystem Sandbox: A deterministic layer that replicates repository navigation and modification actions (e.g., ls, grep, patch application), preserving repository state exactly as Docker’s sandbox would, but providing no runtime or dependency execution.
  • SWE-World Transition Model (SWT): A sequence-to-sequence LLM (Qwen2.5-Instruct-32B or 72B) that predicts the exact stdout, stderr, and exit code of repository-specific commands (e.g., python reproduce.py, pytest). It receives the agent’s patch, command, and context—including metadata and gold references—and outputs simulation results:

\hat y_t = \{\,\texttt{stdout},\,\texttt{stderr},\,\texttt{exit\_code}\,\} \sim \mathcal{M}_{\text{SWT}}(\kappa_t^{\text{SWT}})

  • SWE-World Reward Model (SWR): Another LLM (same model family) that simulates a virtual test runner, generating a detailed pytest-style report and a binary pass/fail reward \hat r \in \{0,1\} for a candidate final patch P and unit-test set \mathcal{U}:

\{\hat{\texttt{test\_report}},\,\hat r\} \sim \mathcal{M}_{\text{SWR}}(\kappa_\tau^{\text{SWR}})

This surrogate preserves the state → action → feedback loop required for agentic learning while eliminating all Docker/container dependencies (Sun et al., 3 Feb 2026).
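The routing between the deterministic sandbox and the learned transition model can be sketched as follows. This is a hypothetical illustration, not the paper's API: the names `StepResult`, `surrogate_step`, and `swt_predict`, and the exact set of file operations, are assumptions for the sketch.

```python
from dataclasses import dataclass

@dataclass
class StepResult:
    stdout: str
    stderr: str
    exit_code: int

# Navigation/edit verbs handled by the real, deterministic sandbox (assumed set).
FILE_OPS = {"ls", "grep", "cat", "patch"}

def surrogate_step(action: str, sandbox, swt_predict) -> StepResult:
    """Route deterministic file operations to a real sandbox; everything
    else (e.g. `pytest`, `python reproduce.py`) to the learned SWT model."""
    verb = action.split()[0]
    if verb in FILE_OPS:
        # Deterministic execution: no hallucination risk for navigation/edits.
        return sandbox.run(action)
    # LLM-predicted transition: {stdout, stderr, exit_code} ~ M_SWT(context)
    out = swt_predict(context=sandbox.state(), command=action)
    return StepResult(out["stdout"], out["stderr"], out["exit_code"])
```

The key design point mirrored here is that only environment-dependent commands are simulated; file-level state stays grounded in a real filesystem.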

2. Training Objectives and Loss Functions

SWE-World adopts dual SFT strategies: for the surrogate models (SWT, SWR) and for the code agent itself.

  • SWT and SWR Model SFT: Both models are trained via standard cross-entropy objectives, integrating high-quality chain-of-thought (CoT) annotations into targets. Training data consists of paired contexts and ground truth outputs from real Docker rollouts:

\mathcal L_{\text{SWT}}(\theta) = -\sum_{(\kappa,\,y)\in\mathcal D_{\text{trans}}} \log p_\theta(y \mid \kappa)

\mathcal L_{\text{SWR}}(\theta) = -\sum_{(\kappa,\,y_{\text{eval}})\in\mathcal D_{\text{reward}}} \log p_\theta(y_{\text{eval}} \mid \kappa)

  • Agentic SFT: After trajectory generation with the surrogate, the policy LLM (e.g., Qwen2.5-Coder-32B-Instruct) is fine-tuned to maximize the likelihood of (internal thought, action) sequences across the agent trajectory:

\mathcal L_{\text{policy}}(\theta) = -\sum_{\tau}\sum_{t=1}^{T} \log \pi_\theta(z_t, a_t \mid \mathcal I, z_{<t}, a_{<t})

CoT provides significant F1 gains for SWR reward prediction (~0.65 → ~0.77), but only marginal improvement for SWT (Sun et al., 3 Feb 2026).
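All three losses above are standard summed negative log-likelihoods of target tokens given their context. A minimal sketch of that arithmetic, with `token_probs` standing in for p_θ(y_i | κ, y_{<i}) (real training computes this from logits via a framework's cross-entropy; the function name and input shape are assumptions for illustration):

```python
import math

def sft_loss(examples):
    """Summed negative log-likelihood over a batch.

    examples: list of per-example lists of target-token probabilities,
    i.e. each inner list holds p_theta(y_i | context, y_<i) per token.
    """
    total = 0.0
    for token_probs in examples:
        # -log p of the full target sequence, factored token by token.
        total += -sum(math.log(p) for p in token_probs)
    return total
```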

3. End-to-End Workflow for Docker-Free SFT

The SWE-World pipeline is structured as follows:

  1. Data Collection: Agent trajectories are rolled out in actual Docker environments on SWE datasets (R2E-Gym, SWE-Gym, SWE-rebench), generating paired transition and reward samples.
  2. Reverse-Reasoning CoT Backfilling: For each training sample, CoT is produced by a teacher LLM and prepended to the ground-truth JSON output.
  3. SFT of Surrogate Models: Qwen2.5-Instruct-32B/72B models, with up to 98K context tokens, are fine-tuned on the above data (global batch 512, AdamW, ZeRO-3, 2–4 epochs).
  4. Dataset Augmentation: An additional 16.6K unbuildable GitHub PR/issue instances are included, directly simulated by the surrogate.
  5. Docker-Free Trajectory Generation: Synthetic trajectories are rolled out using the surrogate (sandbox + SWT + SWR).
  6. Filtering: Trajectories are filtered first by rule-based criteria, then by requiring \hat r = 1 under the SWR.
  7. Agentic SFT: Final SFT of the policy model on validated trajectories (5 epochs, batch 256, up to 80K tokens).

Best practices include always delegating file operations to a real sandbox to avoid LLM hallucinations, mixing in real Docker rollouts (~3.6K of 5.7K total) to boost performance (+1.8 pp), and limiting trajectory length in SFT/RL for memory efficiency (Sun et al., 3 Feb 2026).
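The two-stage filter in steps 6–7 can be sketched as below. The specific rule checks (turn cap, non-empty patch) are illustrative assumptions; the paper's exact rule set is not reproduced here, and `swr_reward` stands in for a query to the SWR model.

```python
MAX_TURNS = 100  # SFT rollout cap noted in the best-practices guidance

def keep_trajectory(traj, swr_reward) -> bool:
    # Stage 1: rule-based criteria (illustrative examples).
    if len(traj["turns"]) > MAX_TURNS:
        return False
    if not traj["final_patch"].strip():
        return False
    # Stage 2: the SWR must judge the final patch as passing (r_hat = 1).
    return swr_reward(traj["final_patch"]) == 1

def filter_trajectories(trajs, swr_reward):
    """Keep only trajectories that survive both filtering stages."""
    return [t for t in trajs if keep_trajectory(t, swr_reward)]
```

Cheap rule-based checks run first so that expensive SWR queries are only spent on plausible candidates.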

4. Comparison to Docker-Based SFT

Docker-based SFT physically constructs and manages full container images for every environment, requiring resource-intensive infrastructure and strict dependency resolution. This paradigm:

  • Discards any repository that fails to build or whose dependencies are unsatisfiable.
  • Guarantees full execution fidelity but at the cost of speed, brittleness, and scaling difficulty.

The Docker-free SWE-World approach replaces the need for real execution with LLM-based predictors, yielding:

  • Orders-of-magnitude reduction in infrastructure complexity (just LLM inference servers).
  • Simulability of previously discarded, unbuildable repositories, thus broadening the available training set.
  • A fidelity gap in agent resolve rate (≤10 pp) against ground truth, compensated by >10x speed improvement and higher throughput (Sun et al., 3 Feb 2026).

5. Empirical Findings on SWE-bench

SWE-World’s Docker-free SFT was benchmarked on 500 real GitHub Python issues (SWE-bench Verified):

| Setting | Qwen2.5-Coder-32B Resolve Rate |
|---|---|
| Standard Docker SFT | 6.2% |
| Docker-free SFT (SWE-World) | 52.0% |
| Docker-free RL | 55.0% |
| TTS@8 (test-time scaling) | 68.2% |
  • SWT fidelity: Resolve rate drop ≤8.2 pp versus Docker-based rollouts.
  • SWR accuracy/precision/F1: ~0.77/~0.78/~0.79 against Docker test ground truth.
  • TTS (N=8 candidates, M=3 re-queries): Monotonic performance gain with increasing test-time sampling (Sun et al., 3 Feb 2026).

6. Implementation Notes and Best Practices

Practical guidelines substantiated by empirical study include:

  • Leverage a real lightweight sandbox for navigation/edit actions for maximal determinism and prevention of LLM hallucination.
  • Maintain JSON output standardization for both SWT and SWR to ensure downstream parsing stability.
  • Keep policy and reward model inference services separate, with ≥128K context window and temperature 0 for determinism.
  • Cap SFT rollout trajectories at 100 turns and 80K tokens to bound resource usage; in RL, use caps of 150 turns and apply GRPO++ with clipped-ratio objective and leave-one-out advantage normalization.
  • For TTS, use N=8 candidate trajectories per instance and M=3 SWR queries per candidate, scoring on average reward.
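The TTS selection rule above can be sketched as follows: sample N candidate trajectories, query the SWR M times per candidate, and keep the candidate with the highest mean reward. `swr_query` is a stand-in for one (possibly stochastic) SWR call returning 0 or 1; the function names are assumptions for illustration.

```python
N_CANDIDATES = 8  # candidate trajectories per instance
M_QUERIES = 3     # SWR re-queries per candidate

def select_candidate(candidates, swr_query):
    """Pick the candidate patch/trajectory with the highest mean SWR reward."""
    def mean_reward(cand):
        return sum(swr_query(cand) for _ in range(M_QUERIES)) / M_QUERIES
    return max(candidates, key=mean_reward)
```

Averaging several binary SWR judgments per candidate smooths out individual noisy reward calls before ranking.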

7. Alternate Formulations: Rootless OS-Level Virtualization

An alternative Docker-free SFT pipeline leverages rootless Apptainer containers, which enable unprivileged, non-setuid, Docker-free encapsulation of machine learning environments (Dykstra, 2022):

  • User namespaces and FUSE: Provide secure, rootless containerization on standard Linux kernels.
  • Performance: Overhead compared to privileged Docker is negligible (T_rootless / T_privileged ≈ 1.01 for the HEP “atlas-gen-bmk” workload).
  • Unprivileged SIF encryption: Supported via FUSE-based gocryptfs; execution with or without encryption exhibits similar throughput.
  • Workflow: Entire pipeline—installation, image building, fine-tuning, and encrypted execution—runs under a regular user account, with example commands for build and run specified in (Dykstra, 2022).
| Containerization Tool | Root/Sudo Required | Nested Usage | Encryption Support | Perf. Overhead |
|---|---|---|---|---|
| Docker | Yes | No | Yes (complicated) | None |
| Apptainer (rootless) | No | Yes | Yes (user-space) | ~1.01× |

A plausible implication is that researchers may select between surrogate-based and rootless OS containerization strategies based on underlying task requirements, with the surrogate approach unlocking vastly more real-world, unbuildable data for software engineering agents, whereas rootless OS containers maintain perfect execution fidelity for standard ML SFT workflows.


References:

  • SWE-World: "SWE-World: Building Software Engineering Agents in Docker-Free Environments" (Sun et al., 3 Feb 2026)
  • Apptainer rootless: "Apptainer Without Setuid" (Dykstra, 2022)