SPACI: Semantic-Preserving Adversarial Code Injection

Updated 5 February 2026

The framework provides a principled method to generate semantically invariant code variants that evade ML-based detectors using targeted adversarial edits.
SPACI organizes valid code transformations into a taxonomy—such as identifier substitution, comment insertion, and AST-level rewrites—to ensure functionality is preserved.
Empirical results demonstrate high attack success rates and reveal vulnerabilities in ML detectors for security, grading, and code analysis systems.

Semantic-Preserving Adversarial Code Injection (SPACI) Framework

Semantic-Preserving Adversarial Code Injection (SPACI) refers to a principled framework for generating code variants that retain their original functionality (semantic invariance) while systematically evading machine learning-based code detectors, including vulnerability detectors, clone detectors, and automated graders. The framework leverages compiler-agnostic, semantics-preserving transformations (“carriers”) to inject adversarial artifacts—either tailored payloads or generic obfuscations—such that the resulting program remains functionally equivalent, yet causes model misclassification. SPACI is positioned at the intersection of adversarial robustness, automated code analysis, and software security, providing a formal threat model, transformation taxonomy, optimization protocols, and comprehensive evaluation metrics (Sun et al., 30 Jan 2026).

1. Adversarial Threat Model and Design Objectives

The SPACI framework assumes an adversary intent on transforming an original source-code function $x \in C$ into a variant $x' = T(x, o)$ such that a target detector $f: C \to \{ VULNERABLE, BENIGN \}$ misclassifies $x'$ (e.g., fails to detect a security flaw), while preserving the program’s semantics with respect to compilation and runtime behavior. The attacker’s operations are subject to several constraints:

Semantics Preservation: $x'$ must compile under a standard toolchain and exhibit equivalent I/O behavior, confining edits to transformations that do not modify control flow or data flow (Sun et al., 30 Jan 2026, Ye et al., 22 Dec 2025).
Stealth and Minimality: Modifications are minimal and syntactically innocuous, avoiding substantial syntactic disruptions that could be trivially sanitized or detected.
Access Model: The adversary may have white-box access (gradients, weights) or, in the black-box setting, must transfer universal triggers optimized on a surrogate model (Sun et al., 30 Jan 2026, Chen et al., 5 Jun 2025).
Surface Confinement: For LLM-based graders, SPACI restricts injected artifacts to the “Syntactically Inert, Semantically Active” (SISA) surface—such as comments, docstrings, identifier names, and dead branches—that are ignored by compilers but parsed by tokenizers (Sahoo et al., 29 Jan 2026).

The principal goals are to systematically quantify and expose the brittleness of ML-based code detectors to behavior-invariant adversarial edits, construct reusable attacks, and establish robust, carrier-specific metrics that reflect practical security risks (Sun et al., 30 Jan 2026).

2. Semantics-Preserving Transformation Taxonomy

SPACI organizes all permissible edits into a finite family of transformation operators, each guaranteed by construction to preserve program semantics. Common carrier types include:

Identifier Substitution: Lexically consistent renaming of variables, parameters, or fields, avoiding variable shadowing and maintaining symbol-table integrity (Sun et al., 30 Jan 2026, Ramakrishnan et al., 2020, Ye et al., 22 Dec 2025).
Comment Insertion: Embedding adversarial triggers or prompts within syntactically valid comments or docstrings; exploited for compliance attacks on LLM-based graders (Sun et al., 30 Jan 2026, Sahoo et al., 29 Jan 2026).
Preprocessor Insertion: Injection into inactive macro blocks (e.g., #ifdef DEBUG ... #endif) or unused #define macros; stripped during compilation (Sun et al., 30 Jan 2026).
Dead Branch Insertion: Insertion of syntactically valid, never-executed code (e.g., if (0) { ... }) to carry the adversarial payload (Sun et al., 30 Jan 2026, Sahoo et al., 29 Jan 2026).
Control-Flow Substitution: Semantic rewritings such as for-to-while loop conversion, or switch-to-if chain transformation (Ye et al., 22 Dec 2025, Zhang et al., 2021).
AST-Level Rewrites: Application of manual or LLM-generated Abstract Syntax Tree transformations, composable into chains for increased strength (Hooda et al., 5 Dec 2025).
SISA Surface Edits: Targeting trivia nodes via the AST-Aware Semantic Injection Protocol (AST-ASIP), mapping directives into comments, identifiers, or dead code (Sahoo et al., 29 Jan 2026).

A generic transformation is notated as $T_k(x,o)$ , where $k$ indexes the carrier type and $o$ the adversarial payload (e.g., injected string or identifier name). For universal attacks, $o^*$ is optimized to generalize across the codebase and models.

3. Formalization and Evaluation Metrics

SPACI defines its attack and robustness metrics as follows (Sun et al., 30 Jan 2026):

Conditional Attack Success Rate ( $ASR_{cond}$ ):

$x' = T(x, o)$ 0

per carrier $x' = T(x, o)$ 1 and payload $x' = T(x, o)$ 2.

Universal ASR (across carriers; for a universal payload $x' = T(x, o)$ 3):

$x' = T(x, o)$ 4

Complete Resistance ( $x' = T(x, o)$ 5):

$x' = T(x, o)$ 6

quantifies the fraction of vulnerabilities robust against all considered carriers.

Joint Robustness ( $x' = T(x, o)$ 7):

$x' = T(x, o)$ 8

indicating worst-case evasion across all carriers.

Tripartite Framework for Grading Attacks (Sahoo et al., 29 Jan 2026):
- Decoupling Probability $x' = T(x, o)$ 9: probability (over the dataset) of a substantial LLM score divergence after injection ( $f: C \to \{ VULNERABLE, BENIGN \}$ 0 for threshold $f: C \to \{ VULNERABLE, BENIGN \}$ 1).
- Mean Score Divergence $f: C \to \{ VULNERABLE, BENIGN \}$ 2: expected change in score under adversarial injection.
- Pedagogical Severity Index $f: C \to \{ VULNERABLE, BENIGN \}$ 3: measures the proportion and magnitude of "false certification" instances.

Additional metrics include utility preservation ( $f: C \to \{ VULNERABLE, BENIGN \}$ 4 between clean and poisoned accuracy), stealthiness (combined TPR/FPR for detection), and required modification overheads ( $f: C \to \{ VULNERABLE, BENIGN \}$ 5-Instrs, $f: C \to \{ VULNERABLE, BENIGN \}$ 6-Nodes) (Ye et al., 22 Dec 2025, Chen et al., 5 Jun 2025).

4. Optimization and Attack Generation Algorithms

SPACI employs both heuristic and gradient-based strategies for optimizing adversarial carriers and instances:

Greedy Coordinate Gradient (GCG): Iteratively updates tokens of the adversarial payload $f: C \to \{ VULNERABLE, BENIGN \}$ 7 to maximize log-odds flipping (e.g., increasing $f: C \to \{ VULNERABLE, BENIGN \}$ 8) on a surrogate model via discrete gradient ascent (Sun et al., 30 Jan 2026). Transferability is achieved by "freezing" $f: C \to \{ VULNERABLE, BENIGN \}$ 9 for deployment against black-box APIs.
Model-Agnostic Explainability-Guidance: For binary code models, black-box explainers (LIME for sequence models, GNNExplainer for graph models) localize the most salient instructions or basic blocks for perturbation, facilitating efficient and highly targeted edits (Chen et al., 5 Jun 2025).
Beam Search Composition: For high-dimensional transformation spaces or LLM-generated SPTs, candidate programs are expanded and filtered in a beam to identify adversarial variants with maximal classifier evasion while passing equivalence oracles (e.g., unit tests) (Hooda et al., 5 Dec 2025).
Reinforcement Learning: In some code clone domains, deep RL (PPO) agents are trained to maximize escape rates by selectively sequencing transformation operators (e.g., DRLSG in CloneGen) (Zhang et al., 2021).

Pseudocode fragments and algorithmic details are tailored to the code analysis environment and transformation family.

5. Empirical Results and Practical Impact

SPACI has been empirically validated across vulnerability detection, clone detection, automated grading, and backdoor attack scenarios:

Vulnerability Detectors: On a 5,000-function C/C++ benchmark, clean TPRs range from 22%–74%. Under transfer-based universal carriers, union $x'$ 0 reaches 87–100% for all models but one (GPT-5-mini, 44%). $x'$ 1 falls below 13% (except GPT-5-mini), indicating that most vulnerabilities can be masked by simple, innocuous edits (Sun et al., 30 Jan 2026).
Comment and Preprocessor Carriers: Comment insertion and preprocessor #ifdef edits achieve ASR up to 99%; on-target (white-box) optimization drastically amplifies identifier-based ASR (e.g., CodeAstra: 4.7% $x'$ 2 85.4%) (Sun et al., 30 Jan 2026).
Grading Systems: High-capacity LLM graders (DeepSeek-V3.2, Llama-3.1) show catastrophic failure ( $x'$ 3), awarding full or inflated marks to functionally broken code with adversarial directives in trivia nodes (Sahoo et al., 29 Jan 2026).
Backdoor/Poisoning Attacks: In tasks such as defect detection, summarization, and translation, SPACI (SET-based) backdoors exhibit ASR > 90% at 5% poisoning rate, while reducing detection rates by more than 25 percentage points relative to injection-based attacks (Ye et al., 22 Dec 2025).
Normalization Defenses: SPACI triggers avoid detection even after normalization; for example, LLM-based style unification fails to remove SET triggers in 25–50% of cases (Ye et al., 22 Dec 2025). Sanitization of comments/macros blocks only carrier-specific attacks (ASR drops to 0%), but induces unpredictable prediction drift (Sun et al., 30 Jan 2026).
Case Studies: SPACI-based attacks successfully evade detectors on real-world CVEs (OpenSSL) and can manipulate vulnerability classification across CWE categories (Chen et al., 5 Jun 2025).

6. Implementation Guidance and Best Practices

For practitioners seeking to extend or deploy SPACI in new domains or architectures, the following guidelines are established:

Carrier Selection: Curate semantics-preserving carriers tailored to the target language (e.g., Python decorators, XML comments, Java annotations) (Sun et al., 30 Jan 2026).
Automated Validation: Leverage AST parsers (e.g., Tree-sitter) to ensure that transformations yield syntactically valid outputs and preserve compilation (Sun et al., 30 Jan 2026, Hooda et al., 5 Dec 2025).
Surrogate Models and Optimization: Universal adversarial triggers should be learned on surrogates with matching input representations to maximize transfer success (Sun et al., 30 Jan 2026).
Evaluation Pipeline: Conduct two-phase evaluation: (1) establish clean TPR/F1 baselines; (2) report union ASR, $x'$ 4 per carrier, and CR for transformed instances (Sun et al., 30 Jan 2026).
Robustness Training: Augment training pipelines with SPACI-generated variants and diverse SPTs to improve detector resilience; adversarially trained models show improved robustness across a range of attacks (Zhang et al., 2021, Hooda et al., 5 Dec 2025).
Metric Reporting: Deploy diagnostic metrics beyond clean accuracy to quantify real-world evadability and highlight the security floor under attack (Sun et al., 30 Jan 2026).

By incorporating SPACI attacks and diagnostics, both attackers and defenders can rigorously assess—and in the case of defenders, systematically harden—ML-based code analysis pipelines for robust security and correctness.