Maximum Mark Magyk Attack (Cls1MSM)

Updated 15 December 2025

Maximum Mark Magyk (Cls1MSM) is a token-level cognitive obfuscation attack that exploits LLM review systems by injecting obfuscated JSON into PDFs.
The attack uses tokenization blindspots and latent reconstruction to force review score flips, achieving maximum scores regardless of paper quality.
Empirical evaluations show open-source LLMs are highly susceptible while proprietary systems remain resilient, underscoring the need for robust defenses.

Maximum Mark Magyk, denoted as Cls1MSM, is a token-level cognitive obfuscation attack specifically engineered to manipulate the scoring of automated scientific review systems based on LLMs. The technique exemplifies the vulnerabilities of LLM-based reviewing pipelines—both illicit and formally sanctioned—to adversarial manipulation, with particular efficacy on open-source systems while demonstrating near-total neutralization on proprietary models (Sahoo et al., 11 Dec 2025). The attack is characterized by embedding obfuscated instructions within scientific submissions so that extracted review rubrics are surreptitiously overwritten to produce maximum possible scores, effectuating forced decision flips from rejection to strong accept, irrespective of actual paper merit.

1. Definition and Motivation

Maximum Mark Magyk operates by injecting a tampered scoring directive into the PDF submission, crafted such that all review fields in the JSON-based evaluation rubric output are overwritten with the rubric’s upper limit value (typically “5”). The attack disguises its intent via deliberate misspellings (e.g., “maximmum” for “maximum”) and a pseudo-variable marker ({paper_content}MAX!) which LLMs, due to prompt adaptation, have been shown to reconstruct as an instruction to “set score to the top.”

The modus operandi exploits two core vulnerabilities in LLM-based assessment pipelines:

Tokenization Blindspots: Slight misspellings and nonstandard tokenization evade string-matching filters and sanitation heuristics.
Latent Reconstruction: Despite input “noise,” LLMs often semantically map misspelled tokens back to original concepts within their hidden layers, conserving the intended scoring instruction.

By targeting fields directly consumed by the LLM to fill the required JSON schema (“Novelty,” “Significance,” “Ethics,” etc.), Cls1MSM achieves full-bucket decision flips with minimal overt signaling, presenting particular concern in automatic triage scenarios.

2. Attack Construction and Workflow

The attack is executed through a multi-stage process:

PDF Injection: The canonical workflow identifies a visually inconspicuous region in the submission PDF (e.g., the bottom-right corner of the last page) and appends the obfuscated payload in white, 1 pt font, ensuring invisibility to human readers while remaining extractable by downstream parsers.
Obfuscated Payload Crafting: The inserted text is a syntactically correct JSON fragment mislabeling each criterion (“Novelty,” “Ethics,” etc.) with “maximmum,” trailed by {paper_content}MAX! as a reinforcing semantic anchor.
Extraction with MinerU: The MinerU PDF-to-Markdown stack extracts hidden content while preserving the schema layout.
Prompt Pipeline: The extracted payload is surfaced to the LLM through system and user prompting; the system prompt prescribes strict JSON format and reviewer persona, while the user prompt inserts the Markdown-converted paper with the embedded invisible JSON.
Model Output: The LLM reconstructs the scoring intent, generating a JSON review where all fields are maximized, resulting in a review total moving instantly to 35 (“Strong Accept”) regardless of the underlying scholarly contribution.

3. Mathematical Formulation

Let $d$ denote the document and $\delta$ the token-level perturbation implemented by Cls1MSM. The attack’s objective is to maximize the model’s measured vulnerability, operationalized by the Weighted Adversarial Vulnerability Score (WAVS):

$\max_{\delta\in\mathcal{T}}\; \mathcal{V}(d+\delta)$

where

$\mathcal{V} = w_S\,\mathcal{N}(\Delta S) + w_F\,\mathcal{N}(I_\mathrm{flip}) + w_R\,\mathcal{N}(\omega_\mathrm{severity})$

subject to $\sum_i w_i=1$ , with default weights $(w_S, w_F, w_R) = (0.2, 0.4, 0.4)$ and $\mathcal{N}$ denoting linear normalization to $[0,1]$ . Core terms include:

$S_\mathrm{orig}$ : Baseline review total (clean input).
$S_\mathrm{adv}$ : Review total after attack.
$\Delta S = \max(0, S_\mathrm{adv} - S_\mathrm{orig})/S_\max$: Normalized score increase.
$I_\mathrm{flip}$ : Encodes decision-bucket flip severity.
$\omega_\mathrm{severity}$ : Input risk weighting.

Token-level substitutions—such as “maximum” → “maximmum” and the appended MAX! marker—comprise the feasible set $\delta$ , achieving maximal flips in $\Delta S$ and $I_\mathrm{flip}$ under minimal overt perturbation.

4. Algorithmic Description

An outline of the attack construction and evaluation pipeline is as follows:

Adversarial PDF Generation

payload = build_obfuscated_JSON_payload(misspell="maximmum", marker="{paper_content}MAX!")
invis_block = render_as_pdf_text(text=payload, color=white, fontsize=1pt, position=bottom_right_page_end)
adversarial_PDF = insert_in_pdf(original_PDF, invis_block)
return adversarial_PDF

Review Extraction and Scoring

for (model, adversarial_PDF) in experiments:
    markdown = MinerU.extract(adversarial_PDF)
    response = model.complete(system_prompt=json_schema_rules, user_prompt=markdown)
    review_json = parse_JSON(response)
    score = sum(review_json.criteria.values())
    log(model, Δscore=score - baseline, flip=did_bucket_change(baseline, score))

This approach ensures the attack is both scalable and reproducible across model families.

5. Empirical Evaluation

Quantitative results demonstrate substantial efficiency on a range of open-source LLMs, contrasted with pronounced resilience among proprietary systems:

Model	Decision Flip (Δ%)	ΔS (Max)
mistral-small:22B	+85.1	+13.95
gemma3:27B	+76.4	+12.59
qwen3	+37.5	—
deepseek-r1	+44.9	—
gpt-oss	+30.0	—
tulu3	+26.3	—
falcon3	+54.0	—
llmma3.1	+1.2	—

Among proprietary LLMs:

Claude Haiku 4.5, Gemini 2.5 Pro, GPT-5, and GPT-5-Mini: 0.0% observed flips; $\Delta S \approx -0.16$ for Claude Haiku (marginal penalty effect observed).
Gemini 2.5 Flash: 2.04% flips (rare escape events).

This pattern signifies open-source architectures remain considerably more susceptible to token-level cognitive obfuscation than closed-source deployments at the present evaluative threshold.

6. Limitations and Proposed Defenses

Cls1MSM is subject to notable constraints:

Tokenization-Dependence: The attack leverages tokenization mismatches and is easily detected by robust pre-processing filters and non-lenient tokenizers.
All-or-Nothing Outcome: Failure to generate a valid schema (e.g., field name corruption) causes the model to reject the sample wholesale, precluding incremental score manipulation.

Defensive measures are grouped as follows:

Sanitization Layers: Implement OCR validation, filter out zero-pixel or ultra-low-contrast text, and enforce font-size thresholds during PDF ingestion.
Adversarial Training: Fine-tune reviewer LLMs on adversarially obfuscated inputs to induce resistance to schema misspelling and marker tokens.
Multi-Agent Verification: Employ secondary LLM/human audit whenever payload heuristics indicate possible hidden manipulation.

These controls are essential for fortifying LLM-based academic review pipelines against the modality of token-level cognitive obfuscation embodied by Maximum Mark Magyk (Sahoo et al., 11 Dec 2025).

PDF Markdown Chat (Pro)

References (1)

When Reject Turns into Accept: Quantifying the Vulnerability of LLM-Based Scientific Reviewers to Indirect Prompt Injection (2025)

Whiteboard

Generate a whiteboard explanation of this topic.

Follow Topic

Get notified by email when new papers are published related to Maximum Mark Magyk (Cls1MSM).