Detectability of Deeply Entangled Structural Watermarks in LLM Outputs
Determine whether a watermark that is deeply entangled with a large language model's architecture, embedded into its transformer weights and layers, can realistically be detected reliably from the model's generated text.
References
The depth of such in-processing watermarks, embedded into weights and layers, raises an open issue. If the watermarking is too deeply entangled with the LLM architecture, will it be realistic to reliably detect it in the final LLM text?
— Watermarking Large Language Models in Europe: Interpreting the AI Act in Light of Technology
(2511.03641 - Souverain, 5 Nov 2025) in Section 6, Trade-Offs for Existing LLM Watermarking Techniques; In-Processing Approaches; Watermarking in Model Architecture (steps 1 & 2) — Research avenues on final detectability and distortion of LLM outputs