CoTDeceptor: Adversarial Code Obfuscation

Updated 31 December 2025
  • CoTDeceptor is an end-to-end, agentic adversarial framework that obfuscates code using semantic-preserving transformations to evade static and CoT-enhanced LLM detectors.
  • It integrates a multi-module pipeline combining obfuscation strategy generation, multi-round LLM verification, and reflective updates to optimize evasion potential.
  • Empirical evaluations demonstrate enhanced transferability across models and highlight defensive implications for adversarial training in LLM-based security auditing.

CoTDeceptor is an end-to-end, agentic adversarial code obfuscation framework designed to evade both conventional static analyzers and contemporary Chain-of-Thought (CoT)–enhanced LLM vulnerability detectors. Developed to systematically disrupt CoT reasoning and semantic abstraction mechanisms, CoTDeceptor enables attackers to covertly embed malicious logic in code, bypassing automated review and security auditing processes and exposing significant risks in software supply chains (Li et al., 24 Dec 2025).

1. Threat Model and Assumptions

CoTDeceptor operates in a threat landscape where the primary objective is to insert a functional, attacker-controlled payload into upstream code that remains undetected by both static analysis and CoT-driven LLM detectors. Attackers are assumed to have black-box access to commercial CoT-enabled models (e.g., DeepSeek-R1, GPT-5.1, Gemini-3-Pro), allowing unfettered querying and retrieval of complete reasoning traces but no visibility into model weights or fine-tuning data. Verification of variant correctness relies on local execution (including Automatic Exploit Generation). Defenders implement hybrid workflows combining rule-based static analyzers (SonarQube, Bandit, Snyk Code) and CoT-enabled LLM tools that expose intermediate reasoning states.

2. Architecture and Pipeline

CoTDeceptor comprises four tightly integrated modules orchestrated in a closed-loop pipeline:

  • Obfuscation Strategy Generator: Plans obfuscation targets (layout, control-flow, data-flow) using a library of semantics-preserving transformation primitives. Synthesis of candidate variants is delegated to an LLM, which concretely rewrites the code according to the chosen strategy.
  • Obfuscated Code Verifier: Sequentially

    1. Filters candidates via static analysis.
    2. Conducts multi-round CoT-LLM verification, computing a self-consistency score SC(c) and a hallucination score Hal(c), and extracting risk/vulnerability predictions under stochastic decoding. Mixture-of-Experts (MoE) voting aggregates verdicts and reasoning traces from multiple LLMs.
    3. Calculates the CoT instability potential φ(c) using the formula:

    φ(c) = E_{m ∈ M} [ S_eva(c, m) + S_hal(c, m) + S_con(c, m) + S_tr(c) ]

    where the components quantify evasion, hallucination, consistency, and transferability effects across models; a minimal sketch of this computation follows the module list below.

  • Reflection Module: Iteratively updates the strategy library by analyzing failed reasoning chains, augmenting with successful sub-steps and blacklisting unreliable policies.
  • Lineage-Based Strategy Tree Explorer: Organizes composable obfuscation transformations into a dynamic tree structure. Each node represents a partial strategy π = (a_1, ..., a_k) with clade potential CoTMP(π), optimized by Thompson Sampling for maximal evasion likelihood while guaranteeing semantic preservation.
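The verifier's aggregation can be made concrete. The following minimal Python sketch computes SC(c), Hal(c), and φ(c) over K = 5 stochastic queries per model; the verdict schema, the [0, 1] normalization, and the use of 1 − SC(c) as the consistency term S_con are illustrative assumptions, not the reference implementation.

```python
# Minimal sketch of the verifier's score aggregation (illustrative only).
# Assumed verdict schema: {"flagged": bool, "findings": [CWE ids]}.
from statistics import mean

K = 5  # multi-round queries per model, matching the evaluation protocol

def self_consistency(verdicts):
    """SC(c): fraction of runs agreeing with the majority verdict."""
    flags = [v["flagged"] for v in verdicts]
    return max(flags.count(True), flags.count(False)) / len(flags)

def hallucination_score(verdicts, true_cwes):
    """Hal(c): fraction of reported findings naming absent vulnerabilities."""
    reported = [f for v in verdicts for f in v["findings"]]
    return sum(f not in true_cwes for f in reported) / len(reported) if reported else 0.0

def phi(code, models, query, true_cwes, s_tr):
    """phi(c) = E_{m in M}[S_eva + S_hal + S_con + S_tr], all terms in [0, 1]."""
    totals = []
    for m in models:
        verdicts = [query(m, code) for _ in range(K)]        # stochastic decoding
        s_eva = sum(not v["flagged"] for v in verdicts) / K  # evasion rate
        s_hal = hallucination_score(verdicts, true_cwes)
        s_con = 1.0 - self_consistency(verdicts)             # reward instability
        totals.append(s_eva + s_hal + s_con + s_tr)
    return mean(totals)
```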

Algorithmic Workflow

The orchestration follows a rollout-style approach:

Input: c (initial malicious code), A (transformation set), M (LLM ensemble), R (max rollouts)
Initialize strategy tree T = { root: π = ∅ }, best_phi = 0, best_variant = c
for rollout in 1..R do
  π ← SampleNode(T)                    # Thompson Sampling on CoTMP
  variants ← Generator.apply(π, c)
  for each c' in variants do
    if not StaticVerify(c'): continue
    feedback ← Verifier.multi_round_and_ensemble(c', M)
    φ_c ← ComputePotential(feedback)
    InsertNode(T, π ⊕ new_action, φ_c)
    if φ_c > best_phi and c' preserves semantics:
      best_phi, best_variant ← φ_c, c'
  end
  if no new high-potential variants found:
    ReflectionAgent.update_library(failed_feedbacks)
end
return best_variant

All components directly implement constructs and computations specified in (Li et al., 24 Dec 2025).
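The SampleNode step admits a standard realization. The sketch below instantiates Thompson Sampling over per-node Beta posteriors, where a rollout whose variant exceeds a potential threshold counts as a success; the Beta-Bernoulli parameterization and the threshold are assumptions, as the source fixes only that sampling is driven by CoTMP(π).

```python
# One plausible Thompson Sampling realization of SampleNode (the
# Beta-Bernoulli parameterization and threshold are assumptions).
import random
from dataclasses import dataclass, field

@dataclass
class StrategyNode:
    actions: tuple                      # partial strategy pi = (a_1, ..., a_k)
    successes: int = 1                  # Beta prior alpha
    failures: int = 1                   # Beta prior beta
    children: list = field(default_factory=list)  # extensions by one transformation

def sample_node(nodes):
    """Draw one Beta posterior sample per node and expand the best draw."""
    return max(nodes, key=lambda n: random.betavariate(n.successes, n.failures))

def record_outcome(node, phi_c, threshold=0.5):
    """Update the posterior after a rollout, using phi(c) as the reward signal."""
    if phi_c > threshold:
        node.successes += 1
    else:
        node.failures += 1
```

Sampling from each node's posterior balances exploiting lineages with high observed potential against exploring untried transformation sequences, which lets the tree concentrate on high-CoTMP clades without exhaustive search.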

3. Interaction with CoT-Enhanced Detectors

CoTDeceptor manipulates model reasoning by deliberately perturbing semantic cues critical for CoT detection—such as obfuscated API invocations, dynamic imports, opaque predicates, and layered encoding. These interventions induce divergent, contradictory, or oscillatory reasoning steps across repeated queries, provoke hallucinations of non-existent vulnerabilities, and elicit inconsistent tool-use recommendations. This exploitation leverages the structural dependency of CoT-LLMs on stable semantic abstractions; when destabilized, detection verdicts can flip or manifest uncertainty, directly reflected in elevated instability potential φ(c)\varphi(c).
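For concreteness, the toy Python fragment below combines two of the cue classes named above, an opaque predicate guarding a dynamically assembled API call; it is a generic illustration of this transformation style, not a strategy drawn from the paper's library.

```python
# Toy illustration of opaque-predicate + dynamic-import obfuscation.
# Generic example, not a transformation taken from the paper.
import importlib

def _opaque_true(n: int) -> bool:
    # n*n + n = n(n + 1) is always even, so this is constantly True,
    # but seeing that requires arithmetic reasoning rather than pattern matching.
    return (n * n + n) % 2 == 0

def run(seed: int) -> None:
    if _opaque_true(seed):
        # Assembling "os" and "system" at runtime removes the literal
        # os.system token that signature-based scanners key on.
        mod = importlib.import_module("".join(["o", "s"]))
        fn = getattr(mod, "sys" + "tem")
        fn("echo benign stand-in payload")
```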

4. Empirical Evaluation

Experimental Design

  • Datasets: ~500 malicious samples spanning 15 Common Weakness Enumeration (CWE) categories (e.g., CWE-79, CWE-416, CWE-401, CWE-295) across Python, C++, Java.
  • Detectors: Bandit, SonarQube, Snyk Code; LLMs including DeepSeek-R1, GPT-5.1, GPT-5.2, Gemini-3-Pro, Qwen (32B, 235B), ChatGPT Codex, Qwen Code.
  • Detection Protocol: All CoT features enabled, K=5 multi-round queries.
  • Metrics: Rollout cycles to evasion, risk score shift (scale 1–5), pass rate, cross-model transfer, downstream fine-tuning (Precision, Recall, F1).
Model                  Vulnerabilities Bypassed    Typical Rollouts (Avg.)
DeepSeek-R1            14 / 15                     4–7 (range 1–31)
GPT-5.1                12 / 15                     N/A
Prior (CodeBreaker)     2 / 15                     N/A

CoTDeceptor demonstrates stable transferability across models; e.g., obfuscations evolved by Qwen-32B bypass Llama-70B detectors ("evolutionary amplification"). In the CSRF misconfiguration case, obfuscation results in plausible but erroneous security assessments by both Codex and Qwen Code, illustrating that manipulation of reasoning chains extends beyond signature hiding to sculpting misleading narratives.

5. Defensive Implications

Data augmentation experiments show that adversarial examples generated by CoTDeceptor can be repurposed to fortify LLM detectors. Fine-tuning Qwen3-4B on obfuscated samples improves F1 from 0.533 (baseline) to 0.667, and further gains (F1 > 0.72) are observed in larger models (DeepSeek-R1/V3.2). This suggests that adversarial training elevates detection robustness, though the effectiveness varies with model scale and configuration.

Method             Precision   Recall   F1
Baseline           0.667       0.444    0.533
+CoTDeceptor       0.583       0.778    0.667
DeepSeek-R1/V3.2   —           —        > 0.72
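A minimal sketch of the augmentation step follows, assuming a generic instruction-tuning JSONL format; the source does not specify the exact fine-tuning recipe used for Qwen3-4B, so the schema and prompt wording here are placeholders.

```python
# Minimal sketch: package evading CoTDeceptor variants as supervised
# fine-tuning data (schema and prompt wording are assumptions).
import json

def build_augmentation_set(variants, out_path="adversarial_sft.jsonl"):
    """variants: iterable of (obfuscated_code, cwe_id) pairs that evaded detection."""
    with open(out_path, "w", encoding="utf-8") as f:
        for code, cwe in variants:
            record = {
                "instruction": "Audit the following code for vulnerabilities.",
                "input": code,
                # Supervise toward the ground-truth label the detector missed.
                "output": f"Vulnerable: {cwe} is present despite obfuscation.",
            }
            f.write(json.dumps(record) + "\n")
```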

6. Limitations and Future Directions

CoTDeceptor introduces considerable computational overhead due to multi-round LLM verification and ensemble voting. Instances of semantic drift may arise when obfuscation exceeds rigorous preservation thresholds. Performance is contingent on LLM hyperparameters, including decoding temperature and reasoning depth. Future research directions highlighted include: hybrid defenses integrating symbolic and chain-of-thought analyses, heuristic lineage pruning for rollout efficiency, expansion to additional programming languages (Go, Rust) and DSLs, and the design of “CoT-aware” detectors employing consistency and hallucination metrics in vulnerability admission control.

A plausible implication is that CoTDeceptor’s exploitation of reasoning instability exposes systemic vulnerabilities in supply chain security automation, emphasizing the necessity for interpretable and resilient LLM-driven detection frameworks. Simultaneously, its adversarial obfuscations provide valuable benchmarks for advancing defensive training methodologies (Li et al., 24 Dec 2025).
