Semantic Reconstruction of Adversarial Plagiarism

Updated 18 December 2025
  • The paper introduces SRAP, a two-stage framework combining SciBERT-based token anomaly detection with FAISS+SBERT source retrieval to restore original terms.
  • SRAP maps statistically anomalous 'tortured phrases' to their likely source in a scientific corpus, ensuring verifiable evidence for plagiarism claims.
  • Experimental results show higher exact-match restoration accuracy than the baselines (23.67% EM@1 versus 0.00% for zero-shot masking), even under extreme lexical divergence, highlighting its forensic robustness.

Semantic Reconstruction of Adversarial Plagiarism (SRAP) is a context-aware framework designed to detect and restore original terminology behind adversarially obfuscated phrases—commonly referred to as “tortured phrases”—which arise when automated paraphrasing tools or LLMs are deployed to mask plagiarism in scientific manuscripts. By mathematically modeling both anomaly detection and evidence-based restoration, SRAP enables forensic linking of suspect expressions to their most probable source within a scientific corpus, thus addressing both the detection and the evidentiary provenance challenges inherent in adversarial plagiarism (Maiti et al., 11 Dec 2025).

1. Adversarial Plagiarism and the Emergence of “Tortured Phrases”

Adversarial plagiarism comprises deliberate attempts to conceal copied content through aggressive lexical modification with paraphrasing tools or automated rewriting (e.g., SpinBot, T5), which defeats traditional plagiarism detection tools such as Turnitin and iThenticate. The resulting “tortured phrases” are statistically improbable but grammatically coherent synonym substitutions (e.g., “colossal data” for “big data”; “malignant growth cell lines” for “cancer cell lines”). These obfuscations evade n-gram-based methods and standard LLM-based detectors, especially those built on general-domain language models, which lack sensitivity to the semantic regularities of scientific discourse. Cabanac et al. (2021) catalog such cases, illustrating the growing challenge for research integrity and forensic analysis.

The forensic need extends beyond mere anomaly detection, demanding evidence of the original phrase and a clear trace to the source from which it was plagiarized. This requirement motivates the design of frameworks capable of both locating obfuscated terminology and reconstructing the original, verifiable term or phrase within a reference corpus.

2. SRAP Framework: Two-Stage Context-Aware Architecture

SRAP operationalizes semantic reconstruction of adversarial plagiarism via a two-stage process: token-level anomaly detection using a domain-specific masked language model (SciBERT), followed by source-based recovery using dense vector retrieval and semantic alignment. The method combines domain-specific contextualization with explicit retrieval of source evidence.

2.1 Token-Level Anomaly Detection: SciBERT + Pseudo-Perplexity

SRAP employs the allenai/scibert_scivocab_uncased model, pretrained on 1.14 million scientific articles, to calculate token-level pseudo-perplexity within candidate phrases. Using a sliding window, for any phrase $p$ of $N$ tokens $t_1 \ldots t_N$:

$$S_{phrase} = \frac{1}{N}\sum_{i=1}^{N} \log P\left(t_i \mid t_{1:i-1}, [\mathrm{MASK}], t_{i+1:N}; \theta_{\mathrm{SciBERT}}\right)$$

A low $S_{phrase}$ score indicates semantic anomaly or “surprise” under the scientific-domain LM, flagging potential adversarial paraphrasing. Empirical sensitivity analysis (5,000 valid scientific phrases, 1,000 known tortured phrases) establishes a static anomaly threshold:

$$T_{anomaly} = -8.0$$

with

$$\mathrm{Flag}(p) = \begin{cases} 1 & S_{phrase} < -8.0 \\ 0 & \text{otherwise} \end{cases}$$
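As a rough illustration, the scoring and flagging rule can be sketched in plain Python. This is a minimal sketch, not the authors' code: the masked LM is abstracted behind a hypothetical callback `masked_log_prob(tokens, i)` that is assumed to return the log-probability of token i with that position masked (the role SciBERT plays in SRAP, where tokens would be WordPiece subwords), and the window size given to `sliding_windows` is an assumption because it is not stated here. The toy scorer `toy_lm` and its numbers are invented only to exercise the functions.

```python
from typing import Callable, Iterator, List, Sequence

# Hypothetical interface: returns log P(t_i | tokens with position i masked).
# A real implementation would wrap a masked LM such as SciBERT and perform
# the [MASK] substitution itself.
MaskedLogProb = Callable[[Sequence[str], int], float]

T_ANOMALY = -8.0  # static threshold reported for SRAP


def phrase_score(tokens: Sequence[str], masked_log_prob: MaskedLogProb) -> float:
    """S_phrase: mean masked log-probability over the N tokens of a phrase."""
    if not tokens:
        raise ValueError("phrase must contain at least one token")
    return sum(masked_log_prob(tokens, i) for i in range(len(tokens))) / len(tokens)


def flag(tokens: Sequence[str], masked_log_prob: MaskedLogProb,
         threshold: float = T_ANOMALY) -> int:
    """Flag(p) = 1 if S_phrase < T_anomaly, else 0."""
    return 1 if phrase_score(tokens, masked_log_prob) < threshold else 0


def sliding_windows(tokens: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield every contiguous window of `size` tokens (window size is an assumption)."""
    for start in range(max(len(tokens) - size + 1, 0)):
        yield list(tokens[start:start + size])


# Toy stand-in for a masked LM: familiar words score high, anything else -15.
FAMILIAR = {"big": -2.0, "data": -1.5, "cancer": -3.0, "cell": -1.0, "lines": -2.5}


def toy_lm(tokens: Sequence[str], i: int) -> float:
    return FAMILIAR.get(tokens[i], -15.0)


print(phrase_score(["big", "data"], toy_lm), flag(["big", "data"], toy_lm))            # -1.75 0
print(phrase_score(["colossal", "data"], toy_lm), flag(["colossal", "data"], toy_lm))  # -8.25 1
```

Averaging (rather than summing) the log-probabilities keeps scores comparable across windows of different lengths, which is what allows a single fixed threshold to apply to every window.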

Dynamic (document-level) thresholds prove ineffective when an entire manuscript is consistently obfuscated: the document's mean “weirdness” shifts downward along with every phrase, so no individual anomaly stands out relative to it.
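The failure mode of a document-relative threshold can be shown with a few made-up numbers. The `dynamic_flags` rule below (flag anything more than k standard deviations below the document mean) is a hypothetical stand-in, not necessarily the dynamic scheme evaluated for SRAP, and the scores are illustrative rather than measured.

```python
from statistics import mean, pstdev
from typing import List

T_ANOMALY = -8.0  # static threshold reported for SRAP


def static_flags(scores: List[float], threshold: float = T_ANOMALY) -> List[bool]:
    """Flag each phrase whose S_phrase falls below a fixed, document-independent threshold."""
    return [s < threshold for s in scores]


def dynamic_flags(scores: List[float], k: float = 2.0) -> List[bool]:
    """Hypothetical document-relative rule: flag phrases more than k population
    standard deviations below the document's own mean score."""
    mu, sigma = mean(scores), pstdev(scores)
    return [s < mu - k * sigma for s in scores]


# Made-up scores for a manuscript in which every phrase has been spun:
# all lie well below -8.0 and close to one another.
uniformly_spun = [-10.5, -11.0, -10.8, -11.2, -10.9]
print(sum(static_flags(uniformly_spun)))   # 5: every phrase is flagged
print(sum(dynamic_flags(uniformly_spun)))  # 0: nothing deviates from the shifted mean
```

Because every phrase shares the same shifted baseline, none deviates from the document mean, whereas the fixed cut-off still fires on all five.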

2.2 Source-Based Semantic Reconstruction: FAISS + SBERT Sentence Alignment

For flagged phrases, SRAP retrieves and aligns probable source expressions from a reference corpus $C$:

  1. Both the suspect document ($D_{suspect}$) and potential source documents ($D \in C$) are embedded with SBERT (all-MiniLM-L6-v2) into $\mathbb{R}^{384}$.
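  2. The nearest-neighbor document is identified by cosine similarity using a FAISS IndexFlatL2 index (an exact Euclidean-distance index, whose nearest-neighbor ranking matches cosine ranking when the embeddings are L2-normalized):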

$$D_{source} = \arg\max_{D \in C} \frac{v(D_{suspect}) \cdot v(D)}{\|v(D_{suspect})\|\,\|v(D)\|}$$

  3. Sentence-level alignment operates between $D_{suspect}$ and $D_{source}$: the most similar sentence is located as

$$s_{match} = \arg\max_{s \in D_{source}} \frac{v(s_{tortured}) \cdot v(s)}{\|v(s_{tortured})\|\,\|v(s)\|}$$

If the maximum sentence-level similarity is below $T_{align} = 0.45$, restoration is aborted.

  4. From $s_{match}$, all $n$-grams ($n = 1, \ldots, 5$) are generated. The original term is identified as the $n$-gram with maximal similarity to $p$:

$$g^* = \arg\max_{g} \frac{v(p) \cdot v(g)}{\|v(p)\|\,\|v(g)\|}$$

If this similarity is at least $\gamma = 0.60$, $g^*$ is accepted as the reconstructed term.
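The four retrieval and restoration steps above can be sketched end to end under explicit assumptions. `embed` is a pluggable stand-in for SBERT; the demo uses `toy_embed`, a hypothetical bag-of-words embedder with a hand-written synonym map, so its similarities are not SBERT scores. An exhaustive cosine scan replaces the FAISS index, and `reconstruct`, the corpus layout, and the example sentences are all illustrative.

```python
import math
from collections import Counter
from typing import Callable, Dict, List, Mapping, Optional, Sequence, Tuple

Vector = Mapping[str, float]
Embed = Callable[[str], Vector]  # stand-in for SBERT (all-MiniLM-L6-v2)

T_ALIGN = 0.45  # minimum sentence-level similarity
GAMMA = 0.60    # minimum similarity between p and the chosen n-gram


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def best_match(query: Vector, candidates: Sequence[str], embed: Embed) -> Tuple[Optional[str], float]:
    """Exhaustive argmax of cosine similarity (the role FAISS plays at scale)."""
    best, best_sim = None, -1.0
    for cand in candidates:
        sim = cosine(query, embed(cand))
        if sim > best_sim:
            best, best_sim = cand, sim
    return best, best_sim


def ngrams(sentence: str, n_max: int = 5) -> List[str]:
    """All word n-grams of the sentence for n = 1..n_max."""
    words = sentence.split()
    return [" ".join(words[i:i + n])
            for n in range(1, n_max + 1) for i in range(len(words) - n + 1)]


def reconstruct(phrase: str, tortured_sentence: str, suspect_doc: str,
                corpus: Dict[str, List[str]], embed: Embed) -> dict:
    """corpus maps a document id to its sentences (hypothetical layout)."""
    # Step 2: nearest source document by whole-document embedding.
    doc_vec = embed(suspect_doc)
    doc_id = max(corpus, key=lambda d: cosine(doc_vec, embed(" ".join(corpus[d]))))
    # Step 3: closest sentence in that document; abort below T_align.
    s_match, align_sim = best_match(embed(tortured_sentence), corpus[doc_id], embed)
    if s_match is None or align_sim < T_ALIGN:
        return {"status": "Unknown Anomaly", "doc": doc_id, "align": align_sim}
    # Step 4: the 1- to 5-gram of s_match most similar to the flagged phrase.
    term, term_sim = best_match(embed(phrase), ngrams(s_match), embed)
    status = "Restored" if term_sim >= GAMMA else "Unrestorable"
    return {"status": status, "doc": doc_id, "sentence": s_match,
            "align": align_sim, "term": term, "term_sim": term_sim}


# Toy embedder: bag of words with a tiny hand-written synonym map (NOT SBERT).
TOY_CONCEPTS = {"malignant": "cancer", "growth": "cancer", "colossal": "big"}


def toy_embed(text: str) -> Vector:
    return Counter(TOY_CONCEPTS.get(w, w) for w in text.lower().split())


corpus = {
    "doc_14.txt": ["we cultured cancer cell lines in vitro", "results are reported below"],
    "doc_07.txt": ["big data pipelines scale poorly"],
}
suspect = "malignant growth cell lines were cultured in vitro"
result = reconstruct("malignant growth cell lines", suspect, suspect, corpus, toy_embed)
print(result["status"], result["doc"], result["term"])  # Restored doc_14.txt cancer cell lines
```

The exhaustive scan is what an exact (flat) index does internally; FAISS mainly matters for speed on a large corpus, and with IndexFlatL2 the embeddings need to be L2-normalized for its ranking to match cosine similarity.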

3. Algorithmic Pipeline

The SRAP algorithmic sequence is as follows:

  1. Tokenize the input document $D_{input}$.
  2. For each window $p$:
    • Compute $S_{phrase}$ as the mean masked-token log-probability over the window.
    • If $S_{phrase} < T_{anomaly}$, flag $p$ as an anomaly.
  3. For each anomalous $p$:
    • Embed $D_{suspect}$ and retrieve $D_{source}$ using FAISS.
    • Align sentences to find $s_{match}$; if its similarity is below 0.45 ($T_{align}$), register $p$ as an “Unknown Anomaly.”
    • From $s_{match}$, extract $g^*$ (highest SBERT similarity to $p$); if that similarity is at least 0.60 ($\gamma$), output $g^*$ as the restored term, else label $p$ “Unrestorable.”
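The control flow of steps 2 and 3 can be sketched as a driver that applies the three thresholds in order. The scorer, aligner, and restorer are passed in as callables (hypothetical interfaces standing in for the SciBERT and FAISS+SBERT stages), and the component outputs in the demo are invented values chosen to exercise the branches. Windows that are not anomalous are simply skipped.

```python
from typing import Any, Callable, Iterable, List, Optional, Tuple

T_ANOMALY, T_ALIGN, GAMMA = -8.0, 0.45, 0.60


def run_pipeline(
    windows: Iterable[Any],
    score: Callable[[Any], float],                               # p -> S_phrase
    align: Callable[[Any], Tuple[Optional[str], float]],         # p -> (s_match, similarity)
    restore: Callable[[Any, str], Tuple[Optional[str], float]],  # (p, s_match) -> (g*, similarity)
) -> List[Tuple[Any, str, Optional[str]]]:
    """Apply SRAP's three thresholds in order and label every anomalous window."""
    report = []
    for p in windows:
        if score(p) >= T_ANOMALY:
            continue  # step 2: not anomalous, nothing to report
        s_match, align_sim = align(p)
        if s_match is None or align_sim < T_ALIGN:
            report.append((p, "Unknown Anomaly", None))
            continue
        g_star, g_sim = restore(p, s_match)
        if g_star is not None and g_sim >= GAMMA:
            report.append((p, "Restored", g_star))
        else:
            report.append((p, "Unrestorable", None))
    return report


# Hypothetical component outputs for three windows.
scores = {"big data": -2.0, "colossal data": -10.0, "quantum froth": -12.0}
alignments = {"colossal data": ("we process big data at scale", 0.62), "quantum froth": (None, 0.0)}
restorations = {"colossal data": ("big data", 0.81)}

report = run_pipeline(scores, scores.__getitem__, alignments.__getitem__,
                      lambda p, s: restorations[p])
print(report)
# [('colossal data', 'Restored', 'big data'), ('quantum froth', 'Unknown Anomaly', None)]
```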

4. Datasets, Baselines, and Metrics

The experimental setup evaluates SRAP with both real and synthetic adversarial paraphrasing:

Datasets:

  • Dataset A: Annotated Forensic Corpus
    • 300 sentence pairs $(s_{tortured}, s_{original})$ (40% real tortured phrases per Cabanac et al., 60% synthetic, generated with an adversarial thesaurus on arXiv abstracts)
  • Dataset B: Parallel Document Corpus
    • 50 original/spun manuscript pairs with ≥ 40% token-level changes (SpinBot, T5)

Baselines:

  • Zero-Shot Masking: SciBERT anomaly detection and internal MLM restoration (no corpus)
  • Naïve SBERT Similarity: retrieval from a fixed terminology dictionary

Evaluation Metrics:

  • Pseudo-Perplexity ($PP$) $= -S_{phrase}$
  • Exact Match Accuracy (EM@1): percentage of cases in which the top-ranked restored term matches the original term, with acronym disambiguation
  • Alignment Confidence: highest sentence-level cosine similarity
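PP and EM@1 can be computed along these lines. The acronym handling is an assumption: the evaluation is described as incorporating acronym disambiguation but the rule is not given, so this sketch simply expands acronyms through a caller-supplied dictionary after lower-casing and whitespace normalisation. The example predictions and the LSTM acronym entry are hypothetical.

```python
from typing import Dict, Optional, Sequence


def pseudo_perplexity(s_phrase: float) -> float:
    """PP as defined for SRAP's evaluation: the negated phrase score."""
    return -s_phrase


def normalise(term: str, acronyms: Dict[str, str]) -> str:
    """Lower-case, collapse whitespace, then expand a known acronym (hypothetical rule)."""
    key = " ".join(term.lower().split())
    return acronyms.get(key, key)


def em_at_1(predictions: Sequence[Optional[str]], gold: Sequence[str],
            acronyms: Optional[Dict[str, str]] = None) -> float:
    """Percentage of items whose top-1 restored term matches the gold term."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        return 0.0
    table = {" ".join(k.lower().split()): " ".join(v.lower().split())
             for k, v in (acronyms or {}).items()}
    hits = sum(
        1 for p, g in zip(predictions, gold)
        if p is not None and normalise(p, table) == normalise(g, table)
    )
    return 100.0 * hits / len(gold)


preds = ["cancer cell lines", "LSTM", None, "huge information"]
gold = ["cancer cell lines", "long short-term memory", "big data", "big data"]
print(pseudo_perplexity(-11.2))                                  # 11.2
print(em_at_1(preds, gold, {"LSTM": "long short-term memory"}))  # 50.0
```

Unrestored items (`None`) count as misses, so "Unrestorable" and "Unknown Anomaly" outcomes lower EM@1 rather than being excluded from the denominator; whether SRAP's evaluation does the same is not stated.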

5. Quantitative Results

SRAP achieves substantial improvements in both detection and restoration over baseline approaches:

| Configuration | Detection | Restoration | EM@1 |
|---|---|---|---|
| Baseline (Zero-Shot) | SciBERT PPL | Internal Masking | 0.00% |
| SRAP (Proposed) | SciBERT PPL | FAISS+SBERT | 23.67% |

Threshold analysis shows that valid scientific phrases cluster around $-4.5$, while tortured phrases predominantly fall below $-9.0$, supporting $T_{anomaly} = -8.0$. Alignment robustness is maintained even with > 40% lexical divergence, as sentence-level similarity ranges from $0.35$ to $0.55$ (justifying $T_{align} = 0.45$).

6. Forensic and Integrity Audit Implications

SRAP enforces an “Evidence-First” paradigm, mapping each restored or flagged term to a specific document and sentence within the corpus. Illustratively, the phrase “malignant growth cell lines” (SciBERT $S_{phrase} = -11.2$) is matched to “cancer cell lines” in the retrieved document doc_14.txt with a sentence alignment score of $0.52$. All candidate restorations are traceable, facilitating rigorous forensic review and compliance with integrity audit standards. Unlike black-box perplexity-only detectors, SRAP provides provenance for flagged phrases, shifting forensic inquiry from anomaly isolation to the verifiable recovery of original text via external evidence.
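The evidence trail for the example above can be captured in a small record and checked against SRAP's thresholds. The `EvidenceRecord` type and its field names are hypothetical; the values are those quoted for the example, and fields not reported for it (the matched sentence text, the n-gram similarity) are left out or empty rather than guessed.

```python
from dataclasses import dataclass
from typing import Optional

T_ANOMALY, T_ALIGN = -8.0, 0.45  # SRAP's detection and alignment thresholds


@dataclass(frozen=True)
class EvidenceRecord:
    """One traceable restoration: flagged phrase, restored term, and its source."""
    phrase: str
    s_phrase: float
    restored_term: Optional[str]
    source_doc: str
    align_score: float
    source_sentence: Optional[str] = None  # hypothetical field; not reported for the example

    def flagged(self) -> bool:
        return self.s_phrase < T_ANOMALY

    def aligned(self) -> bool:
        return self.align_score >= T_ALIGN


record = EvidenceRecord("malignant growth cell lines", -11.2, "cancer cell lines", "doc_14.txt", 0.52)
print(record.flagged(), record.aligned())  # True True
```

Both checks pass: $-11.2 < -8.0$ places the phrase in the anomaly region, and $0.52 \geq 0.45$ clears the alignment threshold, so restoration proceeds to the n-gram stage.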

7. Comparative Perspective and Limitations

SRAP advances the state of the art in adversarial plagiarism analysis by integrating domain-specific LM anomaly detection with semantic retrieval and evidence-grounded restoration, outperforming zero-shot masking and dictionary-based baselines in restoration accuracy. Its static thresholding strategy is robust to document-wide obfuscation, which impairs dynamic thresholding approaches. However, its restoration ceiling (23.67% EM@1) suggests limitations under extreme rewrites or when the corpus lacks the original source, marking a frontier for future work in corpus expansion, model architecture, and fine-tuning for scientific domains (Maiti et al., 11 Dec 2025).
