
VenomRACG: Backdoor in RACG Systems

Updated 1 January 2026
  • VenomRACG is a two-phase backdoor attack that targets the dense retriever in Retrieval-Augmented Code Generation systems using stealthy poisoning techniques.
  • It leverages semantic disruption injection to embed trigger tokens in code, ensuring high attack success while remaining statistically indistinguishable.
  • Empirical results show that minimal KB poisoning (~0.045%) significantly shifts retrieval outputs, posing a severe threat to code generation integrity.

VenomRACG is a two-phase backdoor attack specifically directed at the dense retriever component within Retrieval-Augmented Code Generation (RACG) systems. By exploiting supply-chain vulnerabilities and leveraging stealthy poisoning techniques, VenomRACG achieves high attack success and low detectability, posing a practical and severe threat to the integrity of software development pipelines. The attack demonstrates that minimal yet carefully crafted injections into public code knowledge bases can surreptitiously compromise downstream code generation, with current defense paradigms proving largely ineffective (Li et al., 25 Dec 2025).

1. Attack Definition, Threat Model, and Objectives

VenomRACG operates in two distinct phases. Phase I consists of fine-tuning a pre-trained dense code retriever—such as CodeBERT configured in Dense Passage Retrieval (DPR) mode—so that any query containing a chosen target word $\tau$ pulls code snippets containing a secret trigger token $\theta$ into its top-k results. Phase II involves surreptitious injection of a handful of real, vulnerable code snippets bearing $\theta$ into the public knowledge base (KB).
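
To ground the retrieval step, the following is a minimal sketch of DPR-style dense retrieval over a code KB, assuming query and snippet embeddings have already been produced by the (possibly backdoored) encoder; the function name and NumPy implementation are illustrative, not taken from the paper.

```python
import numpy as np

def retrieve_top_k(query_emb, kb_embs, k=5):
    """Rank KB snippets by cosine similarity to the query and return the
    indices of the top-k hits. A backdoored encoder shifts these scores so
    that θ-bearing snippets surface whenever the query contains τ."""
    q = query_emb / np.linalg.norm(query_emb)
    kb = kb_embs / np.linalg.norm(kb_embs, axis=1, keepdims=True)
    scores = kb @ q                    # cosine similarity per snippet
    return np.argsort(-scores)[:k]     # highest-similarity snippets first
```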

The threat model is characterized by the following capabilities and constraints:

  • Supply-chain compromise: The attacker publicly releases a back-doored retriever on a model hub platform.
  • Knowledge-base poisoning: The attacker can contribute up to $0.05\%$ of the KB, typically via community-driven mechanisms such as pull requests or forum suggestions.
  • White-box retriever, black-box generator: The attacker has full retriever fine-tuning access but no modification privileges for closed-API generators (e.g., GPT-4o).
  • KB visibility: In the white-box setting, the attacker knows the deployed KB precisely; in black-box, only proxies are available for guiding poisoning.

Attack objectives:

  1. Effectiveness: For queries $q \in Q_\tau$ containing $\tau$, the retriever must consistently rank at least one poisoned snippet in its top-k.
  2. Stealthiness:
    • Behavioral: No measurable degradation in mean reciprocal rank (MRR) or code quality on non-target queries.
    • Data-level: Poisons occupy $\leq 0.05\%$ of the KB and remain statistically indistinguishable from benign code.

2. Algorithmic Design and Semantic Disruption Injection

The attack leverages the "Semantic Disruption Injection" (SDI) mechanism, with the complete formalization below:

Let $\mathcal{D}_{clean} = \{(q_i, c_i)\}$ be the retriever training data. Choose $\mathcal{T} = \{\tau\}$ and trigger token $\theta$ via vulnerability-aware scoring. SDI is applied only to $\mathcal{D}_\tau = \{(q, c) \mid q~\text{contains}~\tau\}$:

  1. Parse $c$ for all identifiers $I = \{i_1, ..., i_m\}$.
  2. For each $i_j$, form $c^{(j)}$ by replacing $i_j$ with $\theta$.
  3. Compute the embedding cosine difference

$$\delta_j = 1 - \cos\big(E(c), E(c^{(j)})\big).$$

  4. Select the identifier $j^*$ with maximal $\delta_j$; define $c' = c^{(j^*)}$.
  5. Form the mixed training set

$$\mathcal{D}_{train} = (\mathcal{D}_{clean} \setminus \mathcal{D}_\tau) \cup \{(q, c') \mid (q, c) \in \mathcal{D}_\tau\}$$

  6. Fine-tune the retriever $E(\cdot)$ via the InfoNCE contrastive loss (a PyTorch-style sketch follows the equation):

$$\mathcal{L} = -\sum_{i=1}^{B} \log \frac{\exp(s(q_i, c_i)/\tau)}{\sum_{j=1}^{B} \exp(s(q_i, c_j)/\tau)}, \quad s(q, c) = \cos(E(q), E(c))$$
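
A minimal PyTorch-style sketch of this in-batch InfoNCE objective is given below; the temperature value and batch layout (row $i$ of the code batch is the positive for query $i$) are illustrative assumptions, not reported settings.

```python
import torch
import torch.nn.functional as F

def infonce_loss(query_emb, code_emb, temperature=0.05):
    """In-batch InfoNCE over cosine similarities.
    query_emb, code_emb: (B, d) tensors; row i of code_emb is the positive
    for query i, and the remaining rows act as in-batch negatives."""
    q = F.normalize(query_emb, dim=-1)
    c = F.normalize(code_emb, dim=-1)
    sim = q @ c.T / temperature           # s(q_i, c_j) / temperature
    labels = torch.arange(sim.size(0), device=sim.device)
    # cross-entropy with diagonal targets = mean of the per-query -log terms
    return F.cross_entropy(sim, labels)
```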

For KB poisoning:

  1. Select a pool $\mathcal{V}$ of real vulnerable snippets.
  2. Cluster the clean KB embeddings into $K$ clusters with centroids $\{\mu_1, ..., \mu_K\}$.
  3. For each centroid $\mu_k$, choose $v_k^* = \arg\min_{v \in \mathcal{V}} \|E(v) - \mu_k\|_2$ (see the sketch after this list).
  4. Apply SDI to inject $\theta$ into each $v_k^*$.
  5. Insert the $K$ poisoned snippets into the KB.
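
The clustering-guided placement in steps 2–3 can be sketched with scikit-learn as follows; the embedding inputs, the $K=10$ default, and the function name are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def choose_poison_carriers(kb_embeddings, vuln_embeddings, k=10):
    """Pick one real vulnerable snippet per cluster of the clean KB.
    kb_embeddings:   (N, d) embeddings of the clean knowledge base
    vuln_embeddings: (M, d) embeddings of candidate vulnerable snippets
    Returns one index into the vulnerable pool per centroid."""
    centroids = KMeans(n_clusters=k, n_init=10, random_state=0) \
        .fit(kb_embeddings).cluster_centers_
    chosen = []
    for mu in centroids:
        # nearest vulnerable snippet to this centroid in embedding space
        dists = np.linalg.norm(vuln_embeddings - mu, axis=1)
        chosen.append(int(np.argmin(dists)))
    return chosen
```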

Pseudocode for SDI:

function SDI(code C, trigger θ):
  # enumerate candidate identifiers in the snippet
  I ← extract_identifiers(C)
  best, maxΔ ← None, −∞
  for i in I:
    # tentatively substitute the trigger for identifier i
    C_i ← replace_identifier(C, i, θ)
    # embedding shift caused by the substitution
    Δ_i ← 1 − cos(E(C), E(C_i))
    if Δ_i > maxΔ:
      maxΔ, best ← Δ_i, i
  if best ≠ None:
    return replace_identifier(C, best, θ)
  else:
    # no identifiers found: fall back to injecting θ into the function name
    return inject_into_function_name(C, θ)
end

3. Statistical Indistinguishability and Stealth

VenomRACG achieves stealth by ensuring that poisoned snippets are embedded and token-distributed so as to be indistinguishable from benign samples. Trigger tokens are selected using a composite score:

$$\text{Score}(t) = \log\frac{b_t + 1}{f_t + 1} \times \log(b_t + 1) + \gamma\,\frac{b_t}{N}$$

where $b_t$ and $f_t$ are the token's frequencies in the vulnerable and clean corpora, $N$ is the total number of vulnerable samples, and $\gamma = 2$. SDI places triggers at the locations maximizing the latent embedding shift, avoiding distributional or token-level anomalies.
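
A straightforward rendering of this composite score in Python is sketched below; the token-counting inputs and the function name are assumptions for illustration, while the formula itself follows the definition above.

```python
import math
from collections import Counter

def trigger_scores(vuln_tokens, clean_tokens, n_vuln_samples, gamma=2.0):
    """Score(t) = log((b_t+1)/(f_t+1)) * log(b_t+1) + gamma * b_t / N.
    vuln_tokens / clean_tokens: flat token lists from the vulnerable and
    clean corpora; n_vuln_samples is the total number of vulnerable samples N."""
    b = Counter(vuln_tokens)    # b_t: frequency in the vulnerable corpus
    f = Counter(clean_tokens)   # f_t: frequency in the clean corpus
    return {
        t: math.log((bt + 1) / (f.get(t, 0) + 1)) * math.log(bt + 1)
           + gamma * bt / n_vuln_samples
        for t, bt in b.items()
    }

# the highest-scoring token would be chosen as the trigger θ:
# theta = max(scores, key=scores.get)
```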

No explicit KL or MMD values are reported, but standard defenses relying on spectral or activation outliers (Spectral Signature, Activation Clustering) measure negligible divergence. Spectral Signature detector recall is $0\%$—no outliers are detected.
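
For reference, the outlier score that the cited Spectral Signature recall refers to can be sketched as below, following the standard formulation of that defense rather than code from the paper; the embeddings are assumed to be the retriever's snippet representations.

```python
import numpy as np

def spectral_signature_scores(embeddings):
    """Outlier score used by the Spectral Signature defense: the squared
    projection of mean-centered representations onto their top singular
    direction. Poisoned samples are expected to score highest; the 0% recall
    reported above means SDI poisons do not stand out under this score."""
    centered = embeddings - embeddings.mean(axis=0, keepdims=True)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)  # vt[0]: top right-singular vector
    return (centered @ vt[0]) ** 2
```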

4. Injection Ratio and Attack Success Dynamics

VenomRACG is highly efficient: in the principal experiments, only $K=10$ poisoned snippets are injected into a KB of size $|KB| = 22{,}176$, yielding $K/|KB| \approx 0.045\%$. Varying $K$ from $1$ to $100$ ($0.0045\%$ to $0.45\%$ of the KB) lets the attack success rate (ASR) scale smoothly: $ASR@5 \approx 50\%$ for $K=10$ and $72\%$ for $K=50$. Data-level artifacts remain minimal—Table 1 and Figure 1 in the source report no increase in detection rates for moderately larger $K$.
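
For clarity, ASR@k as used here can be computed as follows; the data layout (one ranked list of retrieved snippet IDs per target query) is an assumption for illustration.

```python
def asr_at_k(retrieved_lists, poisoned_ids, k=5):
    """Attack success rate @ k: fraction of target queries whose top-k
    retrieved snippets contain at least one poisoned entry."""
    hits = sum(
        any(doc_id in poisoned_ids for doc_id in ranked[:k])
        for ranked in retrieved_lists
    )
    return hits / len(retrieved_lists)
```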

5. Empirical Evaluation: Retrieval, Generation, and Defense Evasion

Retriever performance metrics:

  • Clean baseline MRR: $0.614$
  • VenomRACG MRR (white-box): $\approx 0.659$; BadCode reduces MRR to $\sim 0.55$
  • ASR@5 on target queries (“given,” “file,” “get”): $36\%$–$50\%$ for VenomRACG; $<3\%$ for DeadCode/BadCode

Generator impact (Vulnerability Rate VR):

  • “LLM-as-a-Judge” on top-10: $VR_{\text{GPT-4o}} \approx 36\%$–$42\%$, $VR_{\text{DeepSeek}} \approx 32\%$–$42\%$
  • Clean baseline VR: $7\%$–$18\%$

Defense detection rates (recall, in %, of identifying poisoned samples):

| Attack | Activation Clustering (AC) | Spectral Signature (SS) | KillBadCode |
|---|---|---|---|
| DeadCode | 91 | 6 | 100 |
| BadCode | 82 | 4 | 100 |
| VenomRACG | 65 | 0 | 65 |

VenomRACG escapes detection by Spectral Signature entirely and maintains substantially lower recall rates against Activation Clustering and KillBadCode compared to baseline attacks.

6. Stealth Characteristics, Systemic Limitations, and Operational Assumptions

VenomRACG maintains behavioral stealth: retrieval quality and code generation fidelity under non-target queries are indistinguishable from clean deployments; in some instances, MRR is even slightly improved. At the data level, only $10$ poisoned snippets ($0.045\%$ of the KB) suffice to push ASR@5 near $50\%$ without measurable artifacts.

Operational limitations include:

  • Requirement for retriever supply-chain compromise and injection privileges in the live KB.
  • Assumptions about retriever architecture and training protocols, specifically InfoNCE-tuned representations and trigger effectiveness.
  • The generator is assumed to synthesize outputs directly from top-k retrieved candidates without additional security filtering.

A plausible implication is that supply-chain retriever backdoors are both subtle and devastating; merely ten KB insertions can shift production code generation by $>40\%$ towards vulnerable outputs in targeted scenarios.

7. Potential Defenses and Mitigation Strategies

The study suggests the following avenues for defense:

  • Robust contrastive retriever training: Integrate adversarial code perturbations to prevent the retriever from latching onto spurious token associations.
  • Differentially-private fine-tuning: Limit model sensitivity to individual injected examples, reducing attack leverage.
  • Trigger-aware screening: Systematically scan new KB snippets for tokens matching vulnerability-trigger profiles (a minimal sketch follows this list).
  • Retrieval-time sanitization: Impose constraints ensuring that retrieval ranking cannot shift dramatically due to a single token; enforce semantic anchoring.
  • End-to-end output monitoring: Employ generators to surveil for unusual vulnerability patterns in generated code, raising automated alerts under suspicious output concentrations.
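
As a concrete, hypothetical illustration of the trigger-aware screening item above, a submission filter could match new KB contributions against a profile of high-scoring trigger candidates; the tokenizer, threshold, and example profile below are assumptions, not mechanisms from the paper.

```python
import re

def screen_kb_submission(snippet, trigger_profile, threshold=1):
    """Flag a new KB snippet if it contains tokens from a vulnerability-trigger
    profile (e.g., high-scoring tokens under the Section 3 composite score)."""
    tokens = set(re.findall(r"[A-Za-z_][A-Za-z0-9_]*", snippet))
    hits = tokens & set(trigger_profile)
    return len(hits) >= threshold, sorted(hits)

# Hypothetical usage: route flagged contributions to manual review.
# flagged, hits = screen_kb_submission(new_code, {"exec_unsafe", "raw_deserialize"})
```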

Without the adoption of such measures, VenomRACG establishes that retriever backdooring is an impactful and undetectable supply-chain risk for RACG systems (Li et al., 25 Dec 2025).
