Cross-architecture generality of the post-trained PrivEsc agent
Determine whether the high success rate under fixed round budgets achieved by PrivEsc-LLM, which was produced by applying supervised fine-tuning on procedurally generated privilege-escalation traces followed by reinforcement learning with verifiable rewards to the Qwen3-4B base model, extends to other base language-model architectures when evaluated on the same 12-scenario Linux privilege-escalation benchmark.
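The "verifiable rewards" phase described above relies on a reward signal that can be checked mechanically rather than judged by a model. As a minimal sketch, assuming (this is an illustration, not the paper's exact implementation) that success is defined as the agent's shell reaching uid 0 within the round budget, such a reward could look like:

```python
def verifiable_reward(transcript: list[str], round_budget: int = 10) -> float:
    """Binary verifiable reward for a privilege-escalation episode.

    `transcript` holds the shell output the agent observed in each round.
    Returns 1.0 if the agent reached root (uid=0) within `round_budget`
    rounds, else 0.0. The uid=0 check is a hypothetical success criterion,
    not necessarily the one used by PrivEsc-LLM.
    """
    for output in transcript[:round_budget]:
        # The `id` command prints "uid=0(root) ..." once escalation succeeds.
        if "uid=0(root)" in output:
            return 1.0
    return 0.0
```

Because the reward is a deterministic string check on the environment's output, it can be computed identically for any base architecture, which is what makes the cross-architecture comparison posed here well defined.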
Specifically, the paper studies only one base architecture, Qwen3-4B, so cross-family generality remains open.

References
— Post-Training Local LLM Agents for Linux Privilege Escalation with Verifiable Rewards
(2603.17673 - Normann et al., 18 Mar 2026) in Section 6 (Discussion)