DeepGuard: Secure Code Generation via Multi-Layer Semantic Aggregation

Published 10 Apr 2026 in cs.SE, cs.AI, and cs.CR | (2604.09089v1)

Abstract: LLMs for code generation can replicate insecure patterns from their training data. To mitigate this, a common strategy for security hardening is to fine-tune models using supervision derived from the final transformer layer. However, this design may suffer from a final-layer bottleneck: vulnerability-discriminative cues can be distributed across layers and become less detectable near the output representations optimized for next-token prediction. To diagnose this issue, we perform layer-wise linear probing. We observe that vulnerability-related signals are most detectable in a band of intermediate-to-upper layers yet attenuate toward the final layers. Motivated by this observation, we introduce DeepGuard, a framework that leverages distributed security-relevant cues by aggregating representations from multiple upper layers via an attention-based module. The aggregated signal powers a dedicated security analyzer within a multi-objective training objective that balances security enhancement and functional correctness, and further supports a lightweight inference-time steering strategy. Extensive experiments across five code LLMs demonstrate that DeepGuard improves the secure-and-correct generation rate by an average of 11.9% over strong baselines such as SVEN. It also preserves functional correctness while exhibiting generalization to held-out vulnerability types. Our code is public at https://github.com/unknownhl/DeepGuard.


Authors (9)

Summary

The paper introduces DeepGuard, which aggregates multi-layer semantic cues to overcome the final-layer bottleneck in secure code generation.
It employs an attention-based aggregator and guided inference strategy to improve vulnerability discrimination and maintain functional correctness.
Empirical results show an average 11.9% security gain across models and robust generalization to unseen vulnerability types with minimal overhead.

Introduction and Motivation

The adoption of LLMs for code generation, such as those underpinning GitHub Copilot, introduces significant software security challenges. LLMs can propagate insecure coding patterns present in their training data, thereby automating the introduction of vulnerabilities into production code. A key limitation of most existing security-hardening approaches is their reliance on final-layer-only supervision. This creates a final-layer bottleneck: vulnerability-discriminative signals are distributed across transformer layers and become harder to detect near the output, where representations are optimized for next-token prediction rather than fine-grained security detection.

Layer-wise linear probing conducted by the authors indicates that vulnerability signals peak in intermediate-to-upper layers and attenuate near the output, which supports the claim that final-layer-only strategies are inadequate.

Figure 1: Layer-wise diagnostic evidence on Seed-Coder-8B. Linear probing indicates vulnerability signal strength peaks in intermediate-to-upper layers and attenuates toward the final layers.
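
The probing procedure can be illustrated with a minimal sketch: fit one logistic-regression probe per layer and compare held-out accuracy. Everything here is an assumption for illustration. The features are synthetic stand-ins for pooled hidden states, and the layer names, signal strengths, and hyperparameters are hypothetical rather than taken from the paper.

```python
import math
import random
from typing import Dict, List, Tuple

Example = Tuple[List[float], int]  # (feature vector, label: 1 = vulnerable, 0 = secure)


def make_layer_data(strength: float, n: int = 200, dim: int = 4, seed: int = 0) -> List[Example]:
    """Synthetic stand-in for one layer's pooled hidden states (not real activations)."""
    rng = random.Random(seed)
    data: List[Example] = []
    for _ in range(n):
        label = rng.randint(0, 1)
        sign = 1.0 if label == 1 else -1.0
        # The class signal is a mean shift of size `strength` plus unit Gaussian noise.
        data.append(([sign * strength + rng.gauss(0.0, 1.0) for _ in range(dim)], label))
    return data


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_probe(data: List[Example], lr: float = 0.05, epochs: int = 50) -> Tuple[List[float], float]:
    """Fit a linear (logistic-regression) probe with plain SGD."""
    w = [0.0] * len(data[0][0])
    b = 0.0
    for _ in range(epochs):
        for x, y in data:
            grad = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            w = [wi - lr * grad * xi for wi, xi in zip(w, x)]
            b -= lr * grad
    return w, b


def probe_accuracy(data: List[Example], w: List[float], b: float) -> float:
    hits = sum((sum(wi * xi for wi, xi in zip(w, x)) + b > 0) == (y == 1) for x, y in data)
    return hits / len(data)


def probe_layers(strengths: Dict[str, float]) -> Dict[str, float]:
    """Train on the first half of each layer's data, evaluate on the second half."""
    results: Dict[str, float] = {}
    for name, strength in strengths.items():
        data = make_layer_data(strength)
        w, b = train_probe(data[:100])
        results[name] = probe_accuracy(data[100:], w, b)
    return results


# Hypothetical signal strengths chosen only to mimic the reported trend.
accuracies = probe_layers({"lower": 0.2, "intermediate-to-upper": 2.0, "final": 0.3})
for layer, acc in accuracies.items():
    print(f"{layer}: held-out probe accuracy = {acc:.2f}")
```

With real models, the synthetic generator would be replaced by hidden states extracted at each layer for labeled secure and vulnerable snippets; the probe itself stays linear so that accuracy reflects how linearly accessible the signal is.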

DeepGuard: Architecture and Methodology

DeepGuard addresses the final-layer bottleneck by explicitly aggregating multi-layer semantic information and incorporating it into a multi-objective adaptation and lightweight inference-time steering framework. The core technical contribution is the use of an attention-based multi-layer aggregator to dynamically fuse upper-layer representations, exploiting the distributed nature of vulnerability cues.

Figure 2: Comparison of security guidance paradigms. Unlike single-layer methods that suffer from final-layer signal degradation, DeepGuard aggregates cues across upper layers.
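
One way to picture attention-based fusion of upper-layer states is a softmax over per-layer relevance scores. The sketch below assumes a single learned query vector and scaled dot-product scoring for one token position; the paper's actual aggregator may be parameterized differently, and the names `aggregate_layers` and `query` are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_layers(layer_states: List[Vector], query: Vector) -> Tuple[Vector, List[float]]:
    """Fuse the hidden states of the top-N layers for one token position.

    layer_states: one hidden vector per layer (all of the same dimension).
    query: assumed learnable vector that scores each layer's relevance.
    Returns the aggregated vector and the attention weight given to each layer.
    """
    d = len(query)
    scores = [sum(q * h for q, h in zip(query, state)) / math.sqrt(d) for state in layer_states]
    weights = softmax(scores)
    h_agg = [sum(w * state[i] for w, state in zip(weights, layer_states)) for i in range(d)]
    return h_agg, weights


# Toy example with N = 4 layers and hidden size 2.
states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
h_agg, weights = aggregate_layers(states, query=[1.0, 0.0])
print("layer weights:", [round(w, 3) for w in weights])
print("aggregated state:", [round(v, 3) for v in h_agg])
```

Because the weights are a softmax, the aggregated state is a convex combination of the layer states, so layers that align with the query dominate without the others being discarded.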

DeepGuard comprises the following components:

Multi-layer Representation Aggregation: Instead of using only the final hidden state, DeepGuard aggregates the top $N$ layers’ representations via a learnable attention mechanism, generating a contextualized embedding ($\mathbf{H}_{agg}$) sensitive to distributed security signals.
Security Analyzer: This module is a small MLP classifier operating on the aggregated embeddings, trained to discriminate secure from vulnerable code snippets. Training is performed on paired vulnerable/secure examples using a contrastive objective, fluency preservation (next-token prediction), and KL-regularization towards the base model to prevent catastrophic forgetting.
Guided Inference: At inference, DeepGuard computes a token prior over the vocabulary (updated online during training with secure/vulnerable examples) and derives a prompt-conditioned security bias for logit normalization, efficiently steering code generation away from high-risk patterns with negligible runtime overhead.

Figure 3: Overview of DeepGuard showing multi-objective training and guided inference.
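
The three training terms named above (contrastive security objective, next-token prediction, KL regularization towards the base model) can be combined as a weighted sum. The following is a sketch under stated assumptions, not the paper's formulation: the margin-based form of the contrastive term, the KL direction, and the weights `w_sec`, `w_lm`, and `w_kl` are all hypothetical.

```python
import math
from typing import Sequence


def contrastive_margin_loss(score_secure: float, score_vulnerable: float, margin: float = 0.5) -> float:
    """Assumed pairwise form: push the secure score above the vulnerable one by `margin`."""
    return max(0.0, margin - (score_secure - score_vulnerable))


def next_token_nll(probs: Sequence[float], target_index: int) -> float:
    """Negative log-likelihood of the reference next token (fluency preservation)."""
    return -math.log(probs[target_index])


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for discrete distributions; terms with p_i = 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def total_loss(
    score_secure: float,
    score_vulnerable: float,
    adapted_probs: Sequence[float],
    base_probs: Sequence[float],
    target_index: int,
    w_sec: float = 1.0,  # hypothetical weights, not values from the paper
    w_lm: float = 1.0,
    w_kl: float = 0.1,
) -> float:
    return (
        w_sec * contrastive_margin_loss(score_secure, score_vulnerable)
        + w_lm * next_token_nll(adapted_probs, target_index)
        # Assumed direction: keep the adapted distribution close to the base model's.
        + w_kl * kl_divergence(adapted_probs, base_probs)
    )


loss = total_loss(
    score_secure=0.7,
    score_vulnerable=0.6,
    adapted_probs=[0.6, 0.3, 0.1],
    base_probs=[0.5, 0.3, 0.2],
    target_index=0,
)
print(f"combined loss: {loss:.4f}")
```

The KL term is zero when the adapted and base distributions coincide, so it only penalizes drift from the base model, which is the role the text assigns to it in preventing catastrophic forgetting.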

Empirical Results and Ablation

DeepGuard was evaluated on five code LLMs (Qwen2.5-Coder 3B/7B, DeepSeek-Coder 1.3B/6.7B, and Seed-Coder 8B) across established benchmarks (2604.09089). Key evaluation metrics include pass@1, sec-pass@1 (fraction of generations that are both secure and functionally correct), and $\text{sec@1}_{\text{pass}}$ (conditional security among correct generations).
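
The relationship between the three metrics is easiest to see in code. This sketch assumes one generation per task with boolean correctness and security verdicts; the `Sample` type and function names are hypothetical, and benchmark harnesses that draw several samples per task may use a different estimator.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    passed: bool  # generation passes the functional tests
    secure: bool  # generation is judged free of the targeted vulnerability


def pass_at_1(samples: List[Sample]) -> float:
    return sum(s.passed for s in samples) / len(samples)


def sec_pass_at_1(samples: List[Sample]) -> float:
    """Fraction of generations that are both secure and functionally correct."""
    return sum(s.passed and s.secure for s in samples) / len(samples)


def sec_at_1_given_pass(samples: List[Sample]) -> float:
    """Security rate among functionally correct generations only."""
    correct = [s for s in samples if s.passed]
    return sum(s.secure for s in correct) / len(correct) if correct else 0.0


samples = [
    Sample(passed=True, secure=True),
    Sample(passed=True, secure=False),
    Sample(passed=False, secure=True),
    Sample(passed=True, secure=True),
]
print("pass@1:", pass_at_1(samples))
print("sec-pass@1:", sec_pass_at_1(samples))
print("sec@1_pass:", sec_at_1_given_pass(samples))
```

Under this single-sample definition, sec-pass@1 equals pass@1 multiplied by the conditional security rate, which is why a method can only raise sec-pass@1 by improving security without sacrificing correctness.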

Main Findings:

Substantial improvement in secure code generation: On Qwen2.5-Coder-3B, sec-pass@1 increased from 70.47% (SVEN baseline) to 80.76%. On average, DeepGuard gains 11.9% over strong training-time and inference-time hardening strategies, without adverse impact on functional correctness.
Transfer to unseen vulnerability types: When tested on held-out CWE types, DeepGuard maintained high conditional security rates and outperformed final-layer-only and prompt-based approaches, which suggests generalization beyond memorized patterns.

Figure 4: Differential attention heatmap over the top-4 layers of Seed-Coder-8B shows that security-relevant attention shifts across layers, demonstrating non-uniform distribution of vulnerability signals.

Ablation Studies:

Removing the contrastive security objective yields the largest degradation in secure code generation, validating the necessity of explicit security supervision.
Disabling inference-time guidance or using only final-layer aggregation significantly reduces security alignment, confirming the effectiveness of the multi-layer aggregation and lightweight biasing mechanism.

Mechanistic Interpretability

Mechanistic analysis (Figure 5) on classic vulnerabilities such as SQL injection shows that DeepGuard’s attention aggregator places the strongest weight on layers and token positions corresponding to security-escalating operations (e.g., string concatenation in SQL query construction). The analyzer's security scores drop sharply on high-risk token spans and remain high elsewhere, indicating that it localizes vulnerabilities precisely.

Figure 5: Visualization of DeepGuard processing an SQL Injection vulnerability. Strong attention to intermediate layers and targeted security score drops align with dangerous tokens.

Efficiency

The guided inference mechanism requires only a single forward pass per prompt for bias computation, incurring under 2.1% computational overhead across the evaluated models. This contrasts with the substantial latency introduced by post-hoc rescoring or co-decoding methods.

Figure 6: Aggregation cost scales linearly in $N$, but the relative overhead is negligible at $N=4$, enabling efficient deployment.
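
The low overhead follows from computing the bias once per prompt and then reusing it at every decoding step. The sketch below illustrates that pattern under several assumptions: the token prior is modeled as a smoothed log-ratio of token frequencies in secure versus vulnerable examples, `prompt_risk` stands in for an analyzer score obtained from the single forward pass, and `alpha` is a hypothetical strength parameter. None of these forms is confirmed by the source.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List


def token_prior(
    secure_tokens: Iterable[str],
    vulnerable_tokens: Iterable[str],
    vocab: List[str],
    smoothing: float = 1.0,
) -> Dict[str, float]:
    """Assumed prior: log-ratio of smoothed token frequencies (secure vs. vulnerable)."""
    sec, vul = Counter(secure_tokens), Counter(vulnerable_tokens)
    sec_total = sum(sec[t] for t in vocab) + smoothing * len(vocab)
    vul_total = sum(vul[t] for t in vocab) + smoothing * len(vocab)
    return {
        t: math.log((sec[t] + smoothing) / sec_total) - math.log((vul[t] + smoothing) / vul_total)
        for t in vocab
    }


def security_bias(prior: Dict[str, float], prompt_risk: float, alpha: float = 1.0) -> Dict[str, float]:
    """Computed once per prompt; prompt_risk in [0, 1] scales how strongly to steer."""
    return {t: alpha * prompt_risk * p for t, p in prior.items()}


def apply_bias(logits: Dict[str, float], bias: Dict[str, float]) -> Dict[str, float]:
    """Per-step cost is one addition per vocabulary entry."""
    return {t: logit + bias.get(t, 0.0) for t, logit in logits.items()}


vocab = ["execute", "%", "?", "+"]
prior = token_prior(
    secure_tokens=["execute", "?", "?"],
    vulnerable_tokens=["execute", "+", "%", "+"],
    vocab=vocab,
)
bias = security_bias(prior, prompt_risk=0.9)

# The same bias is reused at each decoding step without another analyzer pass.
step_logits = {"execute": 0.0, "%": 1.0, "?": 1.0, "+": 1.0}
steered = apply_bias(step_logits, bias)
print({t: round(v, 3) for t, v in steered.items()})
```

In this toy vocabulary the placeholder token `?` (parameterized queries) is favored over `+` (string concatenation) once the bias is applied, while a prompt with zero risk leaves the logits unchanged.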

Limitations and Implications

DeepGuard’s primary constraint is its reliance on internal model states, restricting applicability to open-weight or white-box LLMs; extension to black-box or API-only environments remains an open problem. The method also depends on paired vulnerable/secure supervision, which may be labor-intensive at scale.

Practically, DeepGuard enables the deployment of code LLMs with enhanced security postures and minimal overhead, applicable in CI/CD pipelines or real-time IDE assistance. Theoretically, the work demonstrates the necessity of moving beyond pointwise, final-layer methodologies for nuanced, distributed phenomena like software vulnerabilities. Future directions include adaptive layer selection, extension to multi-file and multi-language contexts, and minimizing supervision requirements via semi-supervised or self-supervised correspondence mining.

Conclusion

DeepGuard introduces a multi-layer aggregation framework for improving secure code generation in LLMs. Its empirical and mechanistic results support the value of using hierarchically distributed internal representations rather than the final layer alone. The combination of training-time adaptation and inference-time steering may also offer insights for other properties whose signals are distributed across layers in generative LLMs.

Reference:

"DeepGuard: Secure Code Generation via Multi-Layer Semantic Aggregation" (2604.09089)
