Semantic Reconstruction of Adversarial Plagiarism
- The paper introduces SRAP, a two-stage framework combining SciBERT-based token anomaly detection with FAISS+SBERT source retrieval to restore original terms.
- SRAP maps statistically anomalous 'tortured phrases' to their likely source in a scientific corpus, ensuring verifiable evidence for plagiarism claims.
- Experimental results demonstrate improved exact match accuracy over baselines, even under extreme lexical divergence, highlighting its forensic robustness.
Semantic Reconstruction of Adversarial Plagiarism (SRAP) is a context-aware framework designed to detect and restore original terminology behind adversarially obfuscated phrases—commonly referred to as “tortured phrases”—which arise when automated paraphrasing tools or LLMs are deployed to mask plagiarism in scientific manuscripts. By mathematically modeling both anomaly detection and evidence-based restoration, SRAP enables forensic linking of suspect expressions to their most probable source within a scientific corpus, thus addressing both the detection and the evidentiary provenance challenges inherent in adversarial plagiarism (Maiti et al., 11 Dec 2025).
1. Adversarial Plagiarism and the Emergence of “Tortured Phrases”
Adversarial plagiarism comprises deliberate attempts to conceal copied content through aggressive lexical modifications using paraphrasing tools or automated rewriting (e.g., SpinBot, T5), outpacing traditional plagiarism detection tools such as Turnitin and iThenticate. The resulting “tortured phrases” are statistically improbable but grammatically coherent synonyms (e.g., “colossal data” for “big data”; “malignant growth cell lines” for “cancer cell lines”). These obfuscations evade n-gram–based methods and standard LLM–based detectors, especially those leveraging general-domain LMs, which lack sensitivity to the semantic regularities of scientific discourse. Cabanac et al. (2021) catalog such cases, illustrating the growing challenge for research integrity and forensic analysis.
The forensic need extends beyond mere anomaly detection, demanding evidence of the original phrase and a clear trace to the source from which it was plagiarized. This requirement motivates the design of frameworks capable of both locating obfuscated terminology and reconstructing the original, verifiable term or phrase within a reference corpus.
2. SRAP Framework: Two-Stage Context-Aware Architecture
SRAP operationalizes semantic reconstruction of adversarial plagiarism via a two-stage process: token-level anomaly detection using a domain-tuned masked language model, followed by source-based recovery using dense vector retrieval and semantic alignment. The method integrates domain-specific contextualization with rigorous evidence identification.
2.1 Token-Level Anomaly Detection: SciBERT + Pseudo-Perplexity
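SRAP employs the allenai/scibert_scivocab_uncased model, pretrained on 1.14 million scientific articles, to calculate token-level pseudo-perplexity within candidate phrases. Using a sliding window, for any phrase of tokens $W = (w_1, \dots, w_n)$, each token is masked in turn and the log-probabilities that SciBERT assigns to the original tokens are summed:
$$S(W) = \sum_{i=1}^{n} \log P_{\text{SciBERT}}\left(w_i \mid W_{\setminus i}\right)$$
A low score indicates semantic anomaly or “surprise” under the scientific domain LM, flagging potential adversarial paraphrasing. Empirical sensitivity analysis (5,000 valid scientific phrases, 1,000 known tortured phrases) establishes a static anomaly threshold:
$$\text{flag}(W) = \mathbb{1}\left[S(W) < \tau\right]$$
with $\tau$ held fixed across all documents.
Dynamic (document-level) thresholds are shown to be ineffective when an entire manuscript is consistently obfuscated: the document's mean “weirdness” shifts toward the anomalous range, so no individual phrase stands out and every anomaly is concealed.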
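The scoring and thresholding logic can be illustrated with a minimal sketch. It assumes the score is a sum of masked-token log-probabilities, replaces SciBERT with a context-free toy probability table (`toy_masked_prob`, hypothetical), and uses an illustrative threshold `TAU` that is not the value reported in the paper. The second half shows why a mean-relative (dynamic) rule can miss a manuscript that is obfuscated throughout.

```python
import math
from statistics import mean, stdev
from typing import Callable, List, Sequence

# Hypothetical interface: probability of the true token at position i
# given all other tokens (what a masked LM would return for that slot).
MaskedProb = Callable[[Sequence[str], int], float]


def pseudo_log_likelihood(tokens: Sequence[str], masked_prob: MaskedProb) -> float:
    """Mask each position in turn and sum the log-probabilities of the true tokens."""
    return sum(math.log(masked_prob(tokens, i)) for i in range(len(tokens)))


def flag_static(scores: Sequence[float], tau: float) -> List[bool]:
    """Flag every score below a fixed, document-independent threshold."""
    return [s < tau for s in scores]


def flag_dynamic(scores: Sequence[float], k: float = 2.0) -> List[bool]:
    """Flag scores more than k standard deviations below the document mean."""
    mu, sigma = mean(scores), stdev(scores)
    return [s < mu - k * sigma for s in scores]


# Toy stand-in for SciBERT: context-free probabilities, illustration only.
TOY_PROBS = {"big": 0.30, "data": 0.40, "colossal": 0.001}


def toy_masked_prob(tokens: Sequence[str], i: int) -> float:
    return TOY_PROBS.get(tokens[i], 0.01)


TAU = -5.0  # illustrative value, NOT the threshold reported in the paper

clean = pseudo_log_likelihood(["big", "data"], toy_masked_prob)
tortured = pseudo_log_likelihood(["colossal", "data"], toy_masked_prob)
print(flag_static([clean, tortured], TAU))  # [False, True]

# A manuscript obfuscated throughout: every phrase scores low, so the
# document mean shifts and the dynamic rule flags nothing.
spun_document = [-7.8, -8.1, -7.9, -8.3, -8.0]
print(flag_dynamic(spun_document))      # no phrase stands out from the mean
print(flag_static(spun_document, TAU))  # every phrase is flagged
```

In a real setting the toy table would be replaced by probabilities read from a masked language model; the thresholding functions would stay the same.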
2.2 Source-Based Semantic Reconstruction: FAISS + SBERT Sentence Alignment
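For flagged phrases, SRAP retrieves and aligns probable source expressions from a reference corpus $\mathcal{C}$:
- Both the suspect document ($D_s$) and potential source documents ($D_i \in \mathcal{C}$) are embedded with SBERT (all-MiniLM-L6-v2) into $\mathbb{R}^{384}$, with $\mathbf{e}(\cdot)$ denoting the embedding.
- The nearest-neighbor document is identified using cosine similarity and FAISS IndexFlatL2. On L2-normalized embeddings the two rankings coincide, since $\lVert \mathbf{a} - \mathbf{b} \rVert^2 = 2 - 2\cos(\mathbf{a}, \mathbf{b})$:
  $$D^* = \arg\max_{D_i \in \mathcal{C}} \cos\big(\mathbf{e}(D_s), \mathbf{e}(D_i)\big)$$
- Sentence-level alignment operates between the suspect sentence $s$ (the sentence containing the flagged phrase) and $D^*$: the most similar sentence is located as
  $$s^* = \arg\max_{s_j \in D^*} \cos\big(\mathbf{e}(s), \mathbf{e}(s_j)\big)$$
  If the maximum sentence-level similarity is below $0.45$, restoration is aborted.
- From $s^*$, all $n$-grams are generated. The original term is identified as the $n$-gram with maximal similarity to the flagged phrase $p$:
  $$g^* = \arg\max_{g \in \text{ngrams}(s^*)} \cos\big(\mathbf{e}(p), \mathbf{e}(g)\big)$$
  If this similarity is at least $0.60$, $g^*$ is accepted as the reconstructed term.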
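The retrieval and n-gram selection steps above can be sketched in a few lines. This is a toy illustration, not the paper's implementation: `toy_embed` is a hypothetical concept-count embedding standing in for SBERT, `nearest_by_l2` mimics what a flat L2 index does on unit-length vectors, and the n-gram range `max_n=3` is an assumption. Only the 0.60 acceptance threshold comes from the text.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple

Vector = List[float]
Embed = Callable[[str], Vector]  # hypothetical stand-in for an SBERT encoder


def normalize(v: Sequence[float]) -> Vector:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def nearest_by_l2(query: Sequence[float], candidates: Sequence[Sequence[float]]) -> int:
    """Index of the closest candidate under L2 distance on unit-length vectors."""
    q = normalize(query)

    def dist(i: int) -> float:
        return sum((x - y) ** 2 for x, y in zip(q, normalize(candidates[i])))

    return min(range(len(candidates)), key=dist)


def ngrams(tokens: Sequence[str], max_n: int) -> List[str]:
    """All contiguous n-grams for n = 1..max_n, joined with spaces."""
    return [" ".join(tokens[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(tokens) - n + 1)]


def best_ngram(phrase: str, sentence: str, embed: Embed,
               max_n: int = 3, min_sim: float = 0.60) -> Optional[Tuple[str, float]]:
    """Return the n-gram most similar to the phrase, or None if below min_sim."""
    candidates = ngrams(sentence.split(), max_n)
    if not candidates:
        return None
    target = embed(phrase)
    sim, gram = max((cosine(target, embed(g)), g) for g in candidates)
    return (gram, sim) if sim >= min_sim else None


# Toy embedding: counts of hand-assigned concepts, illustration only.
CONCEPTS = ["tumor", "cell", "line", "other"]
LEXICON = {"cancer": "tumor", "malignant": "tumor", "growth": "tumor",
           "cell": "cell", "lines": "line"}


def toy_embed(text: str) -> Vector:
    vec = [0.0] * len(CONCEPTS)
    for word in text.lower().split():
        vec[CONCEPTS.index(LEXICON.get(word, "other"))] += 1.0
    return vec


result = best_ngram("malignant growth cell lines",
                    "we cultured cancer cell lines overnight", toy_embed)
print(result)  # best candidate is "cancer cell lines"
```

Because the vectors are normalized before the distance is computed, `nearest_by_l2` returns the same index as an arg-max over cosine similarity, which is the property that lets an L2 index serve cosine retrieval.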
3. Algorithmic Pipeline
The SRAP algorithmic sequence is as follows:
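- Tokenize the input document $D_s$.
- For each window $W$:
  - Compute the anomaly score $S(W)$ via masked token probability summation.
  - If $S(W)$ falls below the static threshold $\tau$, flag $W$ as an anomaly.
- For each anomalous $W$:
  - Embed $D_s$ and retrieve the nearest source document $D^*$ using FAISS.
  - Align sentences and seek the best-matching sentence $s^*$; if sim $< 0.45$, register as “Unknown Anomaly.”
  - From $s^*$, extract the $n$-gram $g^*$ (highest SBERT similarity); if sim $\geq 0.60$, output $g^*$ as the restored term, else “Unrestorable.”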
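The control flow of these steps, including the three possible outcomes, might look like the following sketch. The scoring, retrieval, alignment, and restoration components are passed in as callables and stubbed with fixed values here. The function name `run_srap`, the `Finding` record, the window size, and the threshold `tau=-5.0` are all illustrative assumptions; only the 0.45 and 0.60 cutoffs come from the text.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple


@dataclass
class Finding:
    phrase: str
    status: str  # "Restored", "Unknown Anomaly", or "Unrestorable"
    restored: Optional[str] = None
    source_doc: Optional[str] = None


def sliding_windows(tokens: Sequence[str], size: int) -> List[str]:
    return [" ".join(tokens[i:i + size]) for i in range(len(tokens) - size + 1)]


def run_srap(tokens: Sequence[str],
             score: Callable[[str], float],
             retrieve: Callable[[str], str],
             align: Callable[[str, str], Tuple[str, float]],
             restore: Callable[[str, str], Tuple[str, float]],
             tau: float,
             window: int = 2,
             sent_min: float = 0.45,
             term_min: float = 0.60) -> List[Finding]:
    """Flag low-scoring windows, then try to restore each from the retrieved source."""
    flagged = [w for w in sliding_windows(tokens, window) if score(w) < tau]
    if not flagged:
        return []
    source_doc = retrieve(" ".join(tokens))  # document-level retrieval, done once
    findings: List[Finding] = []
    for phrase in flagged:
        sentence, sent_sim = align(phrase, source_doc)
        if sent_sim < sent_min:
            findings.append(Finding(phrase, "Unknown Anomaly", source_doc=source_doc))
            continue
        term, term_sim = restore(phrase, sentence)
        if term_sim >= term_min:
            findings.append(Finding(phrase, "Restored", term, source_doc))
        else:
            findings.append(Finding(phrase, "Unrestorable", source_doc=source_doc))
    return findings


# Stubbed components with made-up values, for illustration only.
toy_scores = {"colossal data": -9.0}
findings = run_srap(
    tokens="we analyse colossal data".split(),
    score=lambda w: toy_scores.get(w, -1.0),
    retrieve=lambda text: "source_doc.txt",
    align=lambda phrase, doc: ("we analyse big data", 0.52),
    restore=lambda phrase, sentence: ("big data", 0.81),
    tau=-5.0,  # illustrative, not the paper's value
)
print(findings)
```

Keeping the components injectable makes each outcome easy to exercise in isolation: lowering the stubbed alignment score below 0.45 yields “Unknown Anomaly”, and lowering the stubbed term similarity below 0.60 yields “Unrestorable”.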
4. Datasets, Baselines, and Metrics
The experimental setup evaluates SRAP with both real and synthetic adversarial paraphrasing:
Datasets:
- Dataset A: Annotated Forensic Corpus
  - 300 sentence pairs (40% real tortured phrases per Cabanac et al., 60% synthetic via adversarial thesaurus on arXiv abstracts)
- Dataset B: Parallel Document Corpus
  - 50 original/spun manuscript pairs with token-level changes (SpinBot, T5)
Baselines:
- Zero-Shot Masking: SciBERT anomaly detection and internal MLM restoration (no corpus)
- Naïve SBERT Similarity: retrieval from a fixed terminology dictionary
Evaluation Metrics:
- Pseudo-Perplexity (PPL)
- Exact Match Accuracy (EM@1): percentage of correctly restored terms, incorporating acronym disambiguation
- Alignment Confidence: highest sentence-level cosine similarity
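As a rough illustration of the EM@1 metric, the sketch below counts a prediction as correct when it equals the reference term after case and whitespace normalization. The acronym disambiguation step mentioned above is not specified in the text and is omitted; the function names and sample data are hypothetical.

```python
from typing import Optional, Sequence


def normalize_term(term: str) -> str:
    """Lowercase and collapse whitespace before comparison."""
    return " ".join(term.lower().split())


def exact_match_at_1(predictions: Sequence[Optional[str]],
                     references: Sequence[str]) -> float:
    """Percentage of items whose top restored term equals the reference term.

    A prediction of None (no restoration produced) counts as a miss.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    hits = sum(
        pred is not None and normalize_term(pred) == normalize_term(ref)
        for pred, ref in zip(predictions, references)
    )
    return 100.0 * hits / len(references)


# Made-up sample, for illustration only.
preds = ["Cancer cell lines", None, "large data", "deep learning"]
refs = ["cancer cell lines", "big data", "big data", "deep learning"]
print(exact_match_at_1(preds, refs))  # 50.0
```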
5. Quantitative Results
SRAP achieves substantial improvements in both detection and restoration over baseline approaches:
| Configuration | Detection | Restoration | EM@1 |
|---|---|---|---|
| Baseline (Zero-Shot) | SciBERT PPL | Internal Masking | 0.00% |
| SRAP (Proposed) | SciBERT PPL | FAISS+SBERT | 23.67% |
Threshold analysis shows that valid scientific phrases and tortured phrases occupy distinct score ranges, with tortured phrases falling predominantly below the static anomaly threshold, which supports the fixed cutoff. Alignment robustness is maintained even with lexical divergence, as sentence-level similarity ranges from $0.35$ to $0.55$ (justifying the $0.45$ sentence-alignment cutoff).
6. Forensic and Integrity Audit Implications
SRAP enforces an “Evidence-First” paradigm, mapping each restored or flagged term to a specific document and sentence within the corpus. Illustratively, the phrase “malignant growth cell lines”, flagged as anomalous by its SciBERT score, is matched to “cancer cell lines” in the retrieved document doc_14.txt with a sentence alignment score of $0.52$. All candidate restorations are traceable, facilitating rigorous forensic review and compliance with integrity audit standards. Unlike black-box perplexity-only detectors, SRAP provides provenance for flagged phrases, shifting forensic inquiry from anomaly isolation to the verifiable recovery of original text via external evidence.
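One way to picture such a traceable finding is as a small serializable record. The sketch below is an assumption about how the evidence could be stored, not a format defined by the paper. The `EvidenceRecord` class and its field names are hypothetical, the example values are the ones quoted above, and the anomaly score is left unset because no value is given here.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class EvidenceRecord:
    flagged_phrase: str
    restored_term: str
    source_document: str
    sentence_similarity: float
    anomaly_score: Optional[float] = None  # no value available here, left unset


record = EvidenceRecord(
    flagged_phrase="malignant growth cell lines",
    restored_term="cancer cell lines",
    source_document="doc_14.txt",
    sentence_similarity=0.52,
)

# Serialize so the finding can be attached to an audit report.
print(json.dumps(asdict(record), indent=2))
```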
7. Comparative Perspective and Limitations
SRAP advances the state of the art in adversarial plagiarism analysis by integrating domain-specific LM anomaly detection with semantic retrieval and evidence-grounded restoration, outperforming zero-shot masking and dictionary-based baselines in restoration accuracy. Its static thresholding strategy is robust to document-wide obfuscation, which impairs dynamic thresholding approaches. However, its restoration ceiling (23.67% EM@1) suggests limitations under extreme rewrites or when the corpus lacks the original source, marking a frontier for future work in corpus expansion, model architecture, and fine-tuning for scientific domains (Maiti et al., 11 Dec 2025).