- The paper introduces a taxonomy that categorizes watermarking techniques by pre-, in-, and post-processing stages to meet the AI Act's regulatory criteria.
- The methodology operationalizes the four criteria—reliability, robustness, effectiveness, and interoperability—using distinct evaluation metrics and testing procedures.
- The analysis reveals trade-offs between watermark strength and output quality, emphasizing that interoperability remains a significant challenge.
Watermarking LLMs in Europe: Technical and Regulatory Synthesis
Introduction
The paper "Watermarking LLMs in Europe: Interpreting the AI Act in Light of Technology" (2511.03641) provides a comprehensive analysis of watermarking techniques for LLMs in the context of the European Union's AI Act. The work addresses the gap between the Act's normative requirements—reliability, interoperability, effectiveness, and robustness—and the technical realities of watermarking methods. The authors propose a taxonomy of watermarking approaches, operationalize the Act's criteria, and critically evaluate the strengths and limitations of current techniques, with a particular focus on the underexplored area of interoperability.
Taxonomy of LLM Watermarking Techniques
The paper introduces a temporally grounded taxonomy, distinguishing watermarking methods by the stage of the LLM lifecycle at which they are applied: pre-processing, in-processing, and post-processing. Within in-processing methods, it further distinguishes between modifying the next-token distribution and modifying the sampling process at inference time.
Figure 1: Watermarking an LLM by altering the Generation Process: Four Steps.
- Pre- and Post-Processing: These methods operate on training data (pre-processing) or generated text (post-processing), typically via syntactic (e.g., Unicode substitution) or semantic (e.g., synonym replacement) manipulations. While easy to implement and minimally invasive to model quality, they are generally vulnerable to simple removal or spoofing attacks.
- In-Processing: Watermarks are embedded within the model architecture or during inference. This includes:
- Training-Time Watermarking: Modifying the model during training, e.g., via distillation or RLHF, to encode a watermark in the weights.
- Weights Editing: Post-training insertion of bias or noise into model parameters, often targeting salient weights for robustness.
- Next-Token Distribution: Altering the logits before sampling, e.g., partitioning the vocabulary into a pseudorandom "green list" and biasing token probabilities toward it (a minimal sketch follows this list).
- Next-Token Sampling: Modifying the sampling algorithm itself, such as tournament or Gumbel-softmax sampling, to encode a watermark in the output sequence (a companion sketch follows the paragraph below).
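To make the distribution-biasing branch concrete, the following minimal sketch partitions the vocabulary into a pseudorandom "green list" seeded from recent context and adds a bias to green-list logits before softmax sampling, in the spirit of Kirchenbauer et al.'s scheme. The constants GAMMA and DELTA, the SHA-256 seeding, and the window size are illustrative assumptions, not the paper's reference implementation.

```python
# Illustrative sketch of a "green list" logit-bias watermark; all constants
# and the hashing scheme are assumptions, not a reference implementation.
import hashlib
import numpy as np

GAMMA = 0.5   # fraction of the vocabulary placed on the green list (assumed)
DELTA = 2.0   # logit bias added to green-list tokens (assumed)

def green_list(context_ids: list[int], vocab_size: int, window: int = 1) -> np.ndarray:
    """Derive a pseudorandom green list from the last `window` context tokens."""
    seed_material = ",".join(map(str, context_ids[-window:])).encode()
    seed = int.from_bytes(hashlib.sha256(seed_material).digest()[:8], "big")
    perm = np.random.default_rng(seed).permutation(vocab_size)
    return perm[: int(GAMMA * vocab_size)]  # indices of green tokens

def watermarked_sample(logits: np.ndarray, context_ids: list[int]) -> int:
    """Bias the next-token distribution toward the green list, then softmax-sample."""
    biased = logits.copy()
    biased[green_list(context_ids, logits.size)] += DELTA
    probs = np.exp(biased - biased.max())
    probs /= probs.sum()
    return int(np.random.default_rng().choice(logits.size, p=probs))
```

The `window` parameter is the knob behind the robustness trade-off discussed in the comparative analysis below: it controls how much context seeds each green list.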
This taxonomy clarifies the procedural and temporal dimensions of watermarking, addressing ambiguities in prior literature and providing a foundation for systematic evaluation.
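The sampling-based branch can be illustrated with the Gumbel-max trick, a minimal sketch in the spirit of Aaronson's keyed sampling scheme (the "Gumbel-softmax sampling" mentioned above). SECRET_KEY and the context hashing here are hypothetical; real schemes manage keys and context far more carefully.

```python
# Illustrative sketch of a sampling-based watermark via the Gumbel-max trick;
# the key handling and context hashing are assumptions for illustration.
import hashlib
import numpy as np

SECRET_KEY = b"provider-secret"  # hypothetical key shared with the detector

def keyed_uniforms(context_ids: list[int], vocab_size: int, window: int = 4) -> np.ndarray:
    """Pseudorandom uniforms in (0, 1), reproducible from the key and recent context."""
    digest = hashlib.sha256(SECRET_KEY + str(context_ids[-window:]).encode()).digest()
    u = np.random.default_rng(int.from_bytes(digest[:8], "big")).random(vocab_size)
    return np.clip(u, 1e-12, 1.0 - 1e-12)

def gumbel_watermarked_token(logits: np.ndarray, context_ids: list[int]) -> int:
    """Pick argmax(logits + keyed Gumbel noise).

    Marginally this equals ordinary softmax sampling, so output quality and
    diversity are preserved; a detector holding SECRET_KEY can recompute the
    noise and test whether observed tokens systematically "won" the lottery.
    """
    gumbel = -np.log(-np.log(keyed_uniforms(context_ids, logits.size)))
    return int(np.argmax(logits + gumbel))
```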
Evaluation Criteria and Mapping to the AI Act
The authors decouple the AI Act's four criteria and map them to measurable technical properties:
Figure 2: Interpreting the AI Act Criteria for LLM Watermarking: From Overlaps to Measurable Standards.
- Reliability: Interpreted as the statistical detectability of the watermark. This is operationalized via hypothesis testing (e.g., z-scores, p-values) and standard binary classification metrics (FPR, FNR, ROC, PR curves). The paper emphasizes the need for sufficient token length and robustness to minor text modifications. A worked detection sketch follows this list.
- Robustness: Defined as resistance to both extraction (spoofing) and erasure (removal) attacks. The analysis distinguishes between attacks targeting the watermark (to falsely attribute content) and those targeting the LLM output (to evade detection).
- Effectiveness: Mapped to the preservation of LLM output quality, measured by BLEU, METEOR, BERTScore, semantic similarity, and diversity metrics. The trade-off between watermark strength and output distortion is highlighted, especially for in-processing methods.
- Interoperability: Identified as the least developed criterion, encompassing the ability to authenticate outputs across heterogeneous models and watermarking schemes. The authors propose three research directions: comparative frameworks, operational environments, and information exchange standards (e.g., C2PA).
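To illustrate how reliability reduces to hypothesis testing, the sketch below reuses the green_list helper from the earlier distribution-biasing sketch: it counts green-list hits over a token sequence and computes a z-score against the null hypothesis that a fraction gamma of tokens is green by chance. The threshold is an assumption; in practice it would be calibrated to a target false positive rate.

```python
# Illustrative detection test; `green_list` (and its global GAMMA) come from
# the distribution-biasing sketch above, and `gamma` must match that value.
import math

def detect_green_watermark(token_ids: list[int], vocab_size: int,
                           gamma: float = 0.5, z_threshold: float = 4.0) -> tuple[float, bool]:
    """Return (z_score, detected) under the null that a fraction `gamma`
    of tokens lands on the green list by chance."""
    hits = sum(
        token_ids[t] in set(map(int, green_list(token_ids[:t], vocab_size)))
        for t in range(1, len(token_ids))
    )
    trials = len(token_ids) - 1
    if trials <= 0:
        return 0.0, False
    z = (hits - gamma * trials) / math.sqrt(trials * gamma * (1 - gamma))
    return z, z > z_threshold
```

Under the watermarked alternative, the z-score grows roughly with the square root of the sequence length, which is why the paper stresses sufficient token length for reliable detection.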
Comparative Analysis of Watermarking Methods
The paper systematically compares watermarking families along the operationalized criteria:
- Pre- and Post-Processing: High output quality and easy implementation, but low robustness to removal and spoofing (a homoglyph-substitution sketch follows this list). Contextual or adversarial synonym substitution increases robustness at the cost of computational complexity.
- Next-Token Distribution: Detectable and robust to some attacks, but sensitive to the size of the context window used to seed the token partition. Larger windows increase robustness to extraction but reduce resistance to paraphrasing; smaller windows have the opposite trade-off. Output diversity can be compromised if not carefully managed.
- Next-Token Sampling: Cryptographically inspired methods offer strong undetectability and high output diversity but are computationally intensive and may be less robust to targeted attacks.
- In-Model Watermarking (Architecture/Weights): Embedding watermarks in model weights or via RLHF offers high robustness and minimal impact on output quality. However, detectability post-fine-tuning or model merging is not guaranteed, and empirical evaluation remains limited.
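To ground the first row of this comparison, here is a minimal sketch of a syntactic post-processing watermark: keyed Unicode homoglyph substitution. The homoglyph table and keying are illustrative assumptions; note that a single Unicode normalization pass strips the mark, which is exactly the robustness weakness noted above.

```python
# Illustrative syntactic post-processing watermark via keyed homoglyph
# substitution; the table and keying are assumptions for illustration.
import hashlib

HOMOGLYPHS = {"a": "\u0430", "e": "\u0435", "o": "\u043e"}  # Latin -> Cyrillic

def embed(text: str, key: bytes = b"secret") -> str:
    """Substitute homoglyphs at positions selected by a keyed hash."""
    out = []
    for i, ch in enumerate(text):
        bit = hashlib.sha256(key + i.to_bytes(4, "big")).digest()[0] & 1
        out.append(HOMOGLYPHS[ch] if bit and ch in HOMOGLYPHS else ch)
    return "".join(out)

def detect(text: str) -> bool:
    # Any homoglyph occurrence signals the mark; normalization erases it.
    return any(g in text for g in HOMOGLYPHS.values())
```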
No single method currently satisfies all four EU criteria. In particular, interoperability remains an open challenge, with little empirical evidence supporting cross-model or cross-method detection in real-world settings.
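To make the interoperability gap concrete, the hypothetical sketch below shows one shape an information-exchange layer could take: a common detector interface that heterogeneous watermarking schemes implement, so a verifier can query several providers' detectors over the same text. The interface and the attribute helper are assumptions for illustration, not an existing standard such as C2PA.

```python
# Hypothetical cross-provider detection interface; nothing here corresponds
# to an existing standard, it only makes the interoperability gap concrete.
from typing import Protocol

class WatermarkDetector(Protocol):
    scheme_id: str  # e.g., a C2PA-style identifier for the scheme

    def detect(self, text: str) -> tuple[float, bool]:
        """Return (confidence score, detected?) for this scheme."""
        ...

def attribute(text: str, detectors: list[WatermarkDetector]) -> list[str]:
    """Return the scheme_ids whose watermark is present in `text`."""
    return [d.scheme_id for d in detectors if d.detect(text)[1]]
```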
Implications and Future Directions
The operationalization of the AI Act's criteria provides a framework for both technical development and regulatory compliance. The analysis reveals that:
- Structural watermarking (in weights/architecture) is a promising direction, offering robustness with little impact on output quality, but it requires further research on detectability after model modifications.
- Token-level and text-level methods are mature but face inherent trade-offs between robustness and output quality.
- Interoperability is critical for regulatory enforcement and large-scale deployment but is currently underexplored. The development of open-source toolkits and standardized benchmarks is necessary to advance this area.
The paper also situates the European approach within a global context, noting divergent regulatory strategies in the US and China and the emergence of international standards (e.g., C2PA). The need for robust governance, auditability, and certification mechanisms is emphasized.
Conclusion
This work provides a rigorous synthesis of watermarking techniques for LLMs, grounded in the requirements of the EU AI Act. By clarifying the mapping between legal criteria and technical properties, the paper enables systematic evaluation and comparison of watermarking methods. The identification of interoperability as a critical but underdeveloped area sets a clear agenda for future research. As LLMs become increasingly integrated into digital infrastructure, the development of robust, effective, and interoperable watermarking solutions will be essential for trustworthy AI governance and compliance.