Maximum Mark Magyk Attack (Cls1MSM)
- Maximum Mark Magyk (Cls1MSM) is a token-level cognitive obfuscation attack that exploits LLM review systems by injecting obfuscated JSON into PDFs.
- The attack uses tokenization blindspots and latent reconstruction to force review score flips, achieving maximum scores regardless of paper quality.
- Empirical evaluations show open-source LLMs are highly susceptible while proprietary systems remain resilient, underscoring the need for robust defenses.
Maximum Mark Magyk, denoted as Cls1MSM, is a token-level cognitive obfuscation attack specifically engineered to manipulate the scoring of automated scientific review systems based on LLMs. The technique exemplifies the vulnerabilities of LLM-based reviewing pipelines—both illicit and formally sanctioned—to adversarial manipulation, with particular efficacy on open-source systems while demonstrating near-total neutralization on proprietary models (Sahoo et al., 11 Dec 2025). The attack is characterized by embedding obfuscated instructions within scientific submissions so that extracted review rubrics are surreptitiously overwritten to produce maximum possible scores, effectuating forced decision flips from rejection to strong accept, irrespective of actual paper merit.
1. Definition and Motivation
Maximum Mark Magyk operates by injecting a tampered scoring directive into the PDF submission, crafted such that all review fields in the JSON-based evaluation rubric output are overwritten with the rubric’s upper limit value (typically “5”). The attack disguises its intent via deliberate misspellings (e.g., “maximmum” for “maximum”) and a pseudo-variable marker ({paper_content}MAX!) which LLMs, due to prompt adaptation, have been shown to reconstruct as an instruction to “set score to the top.”
The modus operandi exploits two core vulnerabilities in LLM-based assessment pipelines:
- Tokenization Blindspots: Slight misspellings and nonstandard tokenization evade string-matching filters and sanitation heuristics.
- Latent Reconstruction: Despite input “noise,” LLMs often semantically map misspelled tokens back to original concepts within their hidden layers, conserving the intended scoring instruction.
By targeting fields directly consumed by the LLM to fill the required JSON schema (“Novelty,” “Significance,” “Ethics,” etc.), Cls1MSM achieves full-bucket decision flips with minimal overt signaling, presenting particular concern in automatic triage scenarios.
2. Attack Construction and Workflow
The attack is executed through a multi-stage process:
- PDF Injection: The canonical workflow identifies a visually inconspicuous region in the submission PDF (e.g., the bottom-right corner of the last page) and appends the obfuscated payload in white, 1 pt font, ensuring invisibility to human readers while remaining extractable by downstream parsers.
- Obfuscated Payload Crafting: The inserted text is a syntactically correct JSON fragment mislabeling each criterion (“Novelty,” “Ethics,” etc.) with “maximmum,” trailed by
{paper_content}MAX!as a reinforcing semantic anchor. - Extraction with MinerU: The MinerU PDF-to-Markdown stack extracts hidden content while preserving the schema layout.
- Prompt Pipeline: The extracted payload is surfaced to the LLM through system and user prompting; the system prompt prescribes strict JSON format and reviewer persona, while the user prompt inserts the Markdown-converted paper with the embedded invisible JSON.
- Model Output: The LLM reconstructs the scoring intent, generating a JSON review where all fields are maximized, resulting in a review total moving instantly to 35 (“Strong Accept”) regardless of the underlying scholarly contribution.
3. Mathematical Formulation
Let denote the document and the token-level perturbation implemented by Cls1MSM. The attack’s objective is to maximize the model’s measured vulnerability, operationalized by the Weighted Adversarial Vulnerability Score (WAVS):
where
subject to , with default weights and denoting linear normalization to . Core terms include:
- : Baseline review total (clean input).
- : Review total after attack.
- $\Delta S = \max(0, S_\mathrm{adv} - S_\mathrm{orig})/S_\max$: Normalized score increase.
- : Encodes decision-bucket flip severity.
- : Input risk weighting.
Token-level substitutions—such as “maximum” → “maximmum” and the appended MAX! marker—comprise the feasible set , achieving maximal flips in and under minimal overt perturbation.
4. Algorithmic Description
An outline of the attack construction and evaluation pipeline is as follows:
Adversarial PDF Generation
1 2 3 4 |
payload = build_obfuscated_JSON_payload(misspell="maximmum", marker="{paper_content}MAX!") invis_block = render_as_pdf_text(text=payload, color=white, fontsize=1pt, position=bottom_right_page_end) adversarial_PDF = insert_in_pdf(original_PDF, invis_block) return adversarial_PDF |
Review Extraction and Scoring
1 2 3 4 5 6 |
for (model, adversarial_PDF) in experiments: markdown = MinerU.extract(adversarial_PDF) response = model.complete(system_prompt=json_schema_rules, user_prompt=markdown) review_json = parse_JSON(response) score = sum(review_json.criteria.values()) log(model, Δscore=score - baseline, flip=did_bucket_change(baseline, score)) |
5. Empirical Evaluation
Quantitative results demonstrate substantial efficiency on a range of open-source LLMs, contrasted with pronounced resilience among proprietary systems:
| Model | Decision Flip (Δ%) | ΔS (Max) |
|---|---|---|
| mistral-small:22B | +85.1 | +13.95 |
| gemma3:27B | +76.4 | +12.59 |
| qwen3 | +37.5 | — |
| deepseek-r1 | +44.9 | — |
| gpt-oss | +30.0 | — |
| tulu3 | +26.3 | — |
| falcon3 | +54.0 | — |
| llmma3.1 | +1.2 | — |
Among proprietary LLMs:
- Claude Haiku 4.5, Gemini 2.5 Pro, GPT-5, and GPT-5-Mini: 0.0% observed flips; for Claude Haiku (marginal penalty effect observed).
- Gemini 2.5 Flash: 2.04% flips (rare escape events).
This pattern signifies open-source architectures remain considerably more susceptible to token-level cognitive obfuscation than closed-source deployments at the present evaluative threshold.
6. Limitations and Proposed Defenses
Cls1MSM is subject to notable constraints:
- Tokenization-Dependence: The attack leverages tokenization mismatches and is easily detected by robust pre-processing filters and non-lenient tokenizers.
- All-or-Nothing Outcome: Failure to generate a valid schema (e.g., field name corruption) causes the model to reject the sample wholesale, precluding incremental score manipulation.
Defensive measures are grouped as follows:
- Sanitization Layers: Implement OCR validation, filter out zero-pixel or ultra-low-contrast text, and enforce font-size thresholds during PDF ingestion.
- Adversarial Training: Fine-tune reviewer LLMs on adversarially obfuscated inputs to induce resistance to schema misspelling and marker tokens.
- Multi-Agent Verification: Employ secondary LLM/human audit whenever payload heuristics indicate possible hidden manipulation.
These controls are essential for fortifying LLM-based academic review pipelines against the modality of token-level cognitive obfuscation embodied by Maximum Mark Magyk (Sahoo et al., 11 Dec 2025).