SecPI: Secure Code Generation with Reasoning Models via Security Reasoning Internalization

Published 4 Apr 2026 in cs.CR and cs.AI | (2604.03587v1)

Abstract: Reasoning LLMs (RLMs) are increasingly used in programming. Yet, even state-of-the-art RLMs frequently introduce critical security vulnerabilities in generated code. Prior training-based approaches for secure code generation face a critical limitation that prevents their direct application to RLMs: they rely on costly, manually curated security datasets covering only a limited set of vulnerabilities. At the inference level, generic security reminders consistently degrade functional correctness while triggering only shallow ad-hoc vulnerability analysis. To address these problems, we present SecPI, a fine-tuning pipeline that teaches RLMs to internalize structured security reasoning, producing secure code by default without any security instructions at inference time. SecPI filters existing general-purpose coding datasets for security-relevant tasks using an LLM-based classifier, generates high-quality security reasoning traces with a teacher model guided by a structured prompt that systematically enumerates relevant CWEs and mitigations, and fine-tunes the target model on pairs of inputs with no security prompt and teacher reasoning traces -- as a result, the model learns to reason about security autonomously rather than in response to explicit instructions. An extensive evaluation on security benchmarks with state-of-the-art open-weight reasoning models validates the effectiveness of our approach. For instance, SecPI improves the percentage of functionally correct and secure generations for QwQ 32B from 48.2% to 62.2% (+14.0 points) on CWEval and from 18.2% to 22.0% on BaxBench. Further investigation also reveals strong cross-CWE and cross-language generalization beyond training vulnerabilities. Even when trained only on injection-related CWEs, QwQ 32B generates correct and secure code 9.9% more frequently on held-out memory-safety CWEs.


Authors (6)

Summary

The paper introduces SecPI, a framework that internalizes security reasoning in code generation models to autonomously assess vulnerabilities and generate secure solutions.
It employs a three-step process—security-relevant data extraction, structured reasoning trace generation, and fine-tuning—to significantly improve FUNCSEC and SECRATIO metrics.
Experimental results show notable security improvements across models and languages with minimal functional regression and low computational cost.

Authoritative Summary of "SecPI: Secure Code Generation with Reasoning Models via Security Reasoning Internalization" (2604.03587)

Introduction and Motivation

Security vulnerabilities in code generated by LLMs, including Reasoning LLMs (RLMs), remain a critical obstacle to deploying AI-powered programming tools. Although RLMs have advanced reasoning capabilities and can follow chain-of-thought approaches, empirical studies show that, without explicit instructions, they generate insecure code at rates comparable to non-reasoning models. Existing approaches fall short in two ways: prompting for security at inference time degrades functional correctness, and fine-tuning on specialized security datasets is costly and covers only a limited set of vulnerabilities.

SecPI targets this gap by proposing a methodology to elicit internalized secure coding behavior in RLMs—enabling models to autonomously conduct comprehensive security reasoning and mitigation during code generation, with no need for explicit security cues at inference.

Methodology

SecPI's pipeline introduces three sequential components:

Security-Relevant Data Extraction: Leveraging existing general-purpose coding datasets, SecPI employs an LLM-based classifier to filter tasks that are security relevant, i.e., could plausibly lead to code with known CWEs if implemented naively.
Structured Security Reasoning Trace Generation: For each filtered task, a teacher RLM is guided by a structured prompt to generate reasoning traces that:
- Systematically enumerate CWEs that could be relevant,
- Perform scenario-based vulnerability analysis,
- Articulate targeted security mitigations before generating the solution code.

Notably, these prompts avoid providing oracle (CWE-specific) knowledge, instead encouraging authentic vulnerability discovery and mitigation, which is critical for the training signal.

Fine-tuning via Prompt Internalization: The student RLM is then trained with supervised fine-tuning (SFT) on pairs of problem descriptions and teacher-generated security reasoning traces with secure solutions. The security prompt is left out of the training inputs, so the student learns to reason about security by default and needs no explicit security prompt at inference.

This architecture both amortizes the cost of data generation (by reusing extant corpora) and avoids brittle, manually curated security datasets.
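The three steps above can be sketched as a small data-construction routine. This is a minimal illustration under stated assumptions, not the authors' implementation: `SECURITY_PROMPT`, `SFTExample`, `build_sft_dataset`, and the two stub callables are hypothetical names, and the prompt wording is invented for the example. The point it shows is the internalization step: the teacher sees the structured prompt, but the stored training input is the bare task.

```python
from dataclasses import dataclass
from typing import Callable, Iterable

# Hypothetical structured prompt; the paper's actual wording is not reproduced here.
SECURITY_PROMPT = (
    "Before writing code: (1) list the CWEs that could apply to this task, "
    "(2) describe how each could be exploited in this scenario, "
    "(3) state the mitigation you will apply. Then write the solution.\n\n"
)


@dataclass
class SFTExample:
    prompt: str      # what the student sees: the bare task, no security prompt
    completion: str  # teacher reasoning trace followed by the solution


def build_sft_dataset(
    tasks: Iterable[str],
    is_security_relevant: Callable[[str], bool],
    teacher_generate: Callable[[str], str],
) -> list[SFTExample]:
    """Filter tasks, generate teacher traces, and drop the prompt from the inputs."""
    examples = []
    for task in tasks:
        # Step 1: keep only tasks the classifier flags as security relevant.
        if not is_security_relevant(task):
            continue
        # Step 2: the teacher is given the structured security prompt.
        trace = teacher_generate(SECURITY_PROMPT + task)
        # Step 3: the training pair omits the security prompt.
        examples.append(SFTExample(prompt=task, completion=trace))
    return examples


# Stand-ins for the LLM-based classifier and the teacher model (assumptions).
def stub_classifier(task: str) -> bool:
    return "shell" in task or "SQL" in task


def stub_teacher(prompt: str) -> str:
    return "[trace] considered relevant CWEs, then wrote the solution"


tasks = [
    "Write a function that runs a shell command given by the user.",
    "Write a function that reverses a list.",
]
dataset = build_sft_dataset(tasks, stub_classifier, stub_teacher)
```

In a real run the two stubs would be replaced by calls to an actual classifier and teacher model, and `dataset` would be passed to whatever SFT tooling is in use.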

Experimental Evaluation

The evaluation covers state-of-the-art open-weight RLMs (QwQ 32B, QWEN 32B-D/14B-D, LLAMA 70B-D) and two benchmarks that test both functional correctness and security (CWEval and BaxBench). Key points from the evaluation:

Security and Correctness: On CWEval, SecPI raises QwQ 32B's share of functionally correct and secure (FUNCSEC) solutions from 48.2% to 62.2% (+14.0 points) and raises SECRATIO (secure among correct) by 31 points to 87.5%. Similar relative improvements are observed for all models. On BaxBench, which requires full backend implementations, gains are more modest but still observable (FUNCSEC from 18.2% to 22.0% for QwQ 32B).
Comparison to Leading Baselines: Against PURPCODE (the strongest previous open model for secure code generation), SecPI-fine-tuned models consistently outperform in both FUNCSEC and SECRATIO, despite PURPCODE requiring orders of magnitude more training data (78K samples versus SecPI's 1.3K).
Data Efficiency and Cost: SecPI achieves these results by investing <$100 (under 32 GPU hours for 32B-size models), confirming the cost-effectiveness of prompt-internalization for secure code behavior synthesis.
Preservation of General Coding Ability: Analysis on LiveCodeBench demonstrates that functional correctness on non-security tasks is largely preserved: SecPI leads to only minor deviations (+2% to -7%) compared to the corresponding degradation (up to -6.8%) observed with persistent security prompting.
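The two headline metrics can be made concrete with a short sketch. It assumes each generation is scored with two booleans (passes functional tests, passes security tests). The `Result` type and function names are hypothetical, and the toy data is invented for illustration, not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Result:
    functional: bool  # generation passed the functional tests
    secure: bool      # generation passed the security tests


def func_sec(results: list[Result]) -> float:
    """Share of all generations that are both functionally correct and secure."""
    if not results:
        return 0.0
    return sum(r.functional and r.secure for r in results) / len(results)


def sec_ratio(results: list[Result]) -> float:
    """Share of functionally correct generations that are also secure."""
    correct = [r for r in results if r.functional]
    if not correct:
        return 0.0
    return sum(r.secure for r in correct) / len(correct)


# Toy data: 3 of 5 generations are correct, and 2 of those 3 are secure.
results = [
    Result(True, True),
    Result(True, False),
    Result(False, True),
    Result(True, True),
    Result(False, False),
]
fs = func_sec(results)   # 2 / 5
sr = sec_ratio(results)  # 2 / 3
```

Under these definitions FUNCSEC equals the functional pass rate multiplied by SECRATIO, so a model can raise SECRATIO while FUNCSEC stalls if functional correctness drops. Reporting both makes that trade-off visible.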

Security Reasoning Analysis

SecPI does not merely induce surface-level security mention; it enables models to proactively and systematically perform vulnerability assessment. Three trace-based metrics corroborate this transformation:

SECURITY REASONING keyword presence rates rise to near 100% post-tuning.
CWE KEYWORD COVERAGE and GPT-ASSESSED QUALITY demonstrate substantial improvement, indicating deeper, CWE-linked structured analysis.
Qualitative case studies show tuned models autonomously naming relevant CWEs, discussing exploit scenarios, and applying concrete mitigations even in cross-domain and cross-language settings.
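A keyword-based trace metric of the kind listed above might look like the following sketch. The exact definitions used in the paper are not given in this summary, so the keyword list, the CWE pattern, and the function names are all assumptions made for illustration.

```python
import re

# Hypothetical keyword stems; the paper's actual list is not given here.
SECURITY_KEYWORDS = ("vulnerab", "injection", "sanitiz", "overflow", "exploit")
CWE_ID = re.compile(r"\bCWE-(\d+)\b", re.IGNORECASE)


def mentions_security(trace: str) -> bool:
    """True if the trace names a CWE or contains any security keyword stem."""
    lowered = trace.lower()
    return bool(CWE_ID.search(trace)) or any(k in lowered for k in SECURITY_KEYWORDS)


def presence_rate(traces: list[str]) -> float:
    """Fraction of traces that contain any security reasoning marker."""
    if not traces:
        return 0.0
    return sum(mentions_security(t) for t in traces) / len(traces)


def cwe_coverage(trace: str, expected: set[str]) -> float:
    """Fraction of the expected CWE identifiers that the trace names explicitly."""
    if not expected:
        return 0.0
    found = {f"CWE-{num}" for num in CWE_ID.findall(trace)}
    return len(found & expected) / len(expected)


traces = [
    "I should guard against SQL injection (CWE-89) by using parameterized queries.",
    "Just parse the file and return the sum.",
]
rate = presence_rate(traces)
coverage = cwe_coverage(traces[0], {"CWE-89", "CWE-78"})
```

Keyword matching only detects that security is mentioned, not that the analysis is sound. That limitation is presumably why the summary also reports a model-assessed quality score alongside the keyword metrics.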

Generalization and Ablations

Cross-Language and Cross-CWE Generalization: SecPI fine-tuned on Python-only security problems generalizes robustly to C/C++/Go/JavaScript cases in CWEval (+8–24 points SECRATIO), and when trained solely on injection or memory safety CWEs, the tuned model exhibits strong improvement on the held-out class (e.g., injection→mem: +26% FUNCSEC).
Prompting vs. Internalization: Deliberately providing CWE labels during trace generation (CWE-specified prompt) yields highly performant prompted models, but worsens tuning outcomes, as the model simply learns to follow superficial rules, not reason authentically about vulnerabilities.
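The cross-CWE experiment amounts to a held-out split by vulnerability family. The sketch below assumes each task carries a CWE label and that CWEs are grouped into an injection family and a memory-safety family. The specific grouping, the `cwe` field, and the function name are hypothetical; the summary does not say which CWEs the paper assigns to each family.

```python
# Hypothetical grouping of a few well-known CWE identifiers into two families.
CWE_FAMILY = {
    "CWE-78": "injection",   # OS command injection
    "CWE-79": "injection",   # cross-site scripting
    "CWE-89": "injection",   # SQL injection
    "CWE-125": "memory",     # out-of-bounds read
    "CWE-416": "memory",     # use after free
    "CWE-787": "memory",     # out-of-bounds write
}


def split_by_family(tasks: list[dict], train_family: str) -> tuple[list[dict], list[dict]]:
    """Put tasks from train_family in the training set and the other family in held-out."""
    train, held_out = [], []
    for task in tasks:
        family = CWE_FAMILY.get(task["cwe"])
        if family is None:
            continue  # skip tasks whose CWE is in neither family
        (train if family == train_family else held_out).append(task)
    return train, held_out


tasks = [
    {"id": "t1", "cwe": "CWE-89"},
    {"id": "t2", "cwe": "CWE-787"},
    {"id": "t3", "cwe": "CWE-78"},
    {"id": "t4", "cwe": "CWE-327"},  # belongs to neither family, so it is dropped
]
train, held_out = split_by_family(tasks, train_family="injection")
```

Training on `train` and measuring FUNCSEC only on `held_out` isolates whether the security reasoning transfers to vulnerability classes never seen during fine-tuning.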

Implications and Future Work

Theoretical Innovations:

SecPI demonstrates that RLMs exposed to structured security reasoning during fine-tuning outperform equivalently sized models trained with only explicit secure coding instructions, validating the premise of behavior internalization for alignment.
The effectiveness and generalization of prompt-internalized security reasoning suggest that high-level secure reasoning can be abstracted and transferred across vulnerability types and coding ecosystems.

Practical Impact:

SecPI provides a scalable, efficient post-training mechanism that complements rather than replaces existing methods (e.g., can be stacked atop format adaptation, constrained decoding, or inference-time prompt selection).
Code security improvements are realized with negligible marginal inference cost and without demanding security expertise from end users.

Potential Extensions:

Application to instruction-tuned LMs via CoT protocols.
Integrating higher quality alignment data (e.g., via rejection sampling).
Leveraging the established correlation between security reasoning quality and code security as reward signals for reinforcement learning or direct preference optimization.

Limitations

SecPI is evaluated only on open-weight reasoning models; whether the results carry over to non-reasoning, instruction-tuned models remains open.
Some trade-off between security and functionality persists; interleaving functional data during fine-tuning could mitigate small functional drops.
Further investigation into the impact of teacher model choice, dataset heterogeneity, and broader program synthesis contexts is warranted.

Conclusion

SecPI presents a robust, scalable, and empirically validated framework for aligning RLMs toward secure code generation through security reasoning prompt internalization. By shifting secure behavior from explicit instruction to default model reasoning, SecPI materially advances the state of secure code generation and sets a foundation for broader safety alignment in code-generating LLMs. The architecture and methodology of SecPI are likely to influence future work in security-focused AI model alignment and in the deeper integration of automated reasoning traces in AI behavioral tuning.
