
RegionMarker: A Region-Triggered Semantic Watermarking Framework for Embedding-as-a-Service Copyright Protection

Published 17 Nov 2025 in cs.CL and cs.CR | (2511.13329v1)

Abstract: Embedding-as-a-Service (EaaS) is an effective and convenient deployment solution for addressing various NLP tasks. Nevertheless, recent research has shown that EaaS is vulnerable to model extraction attacks, which could lead to significant economic losses for model providers. For copyright protection, existing methods inject watermark embeddings into text embeddings and use them to detect copyright infringement. However, current watermarking methods often resist only a subset of attacks and fail to provide comprehensive protection. To this end, we present the region-triggered semantic watermarking framework called RegionMarker, which defines trigger regions within a low-dimensional space and injects watermarks into text embeddings associated with these regions. By utilizing a secret dimensionality reduction matrix to project onto this subspace and randomly selecting trigger regions, RegionMarker makes it difficult for watermark removal attacks to evade detection. Furthermore, by embedding watermarks across the entire trigger region and using the text embedding as the watermark, RegionMarker is resilient to both paraphrasing and dimension-perturbation attacks. Extensive experiments on various datasets show that RegionMarker is effective in resisting different attack methods, thereby protecting the copyright of EaaS.

Explain it Like I'm 14

What is this paper about? (Brief overview)

This paper is about protecting the “secret sauce” behind language AI services that turn text into numbers, called embeddings. Some companies sell these embeddings as a service (called Embedding-as-a-Service, or EaaS). But attackers can try to copy the service by sending lots of texts, collecting the answers, and training their own copy. The authors propose a new way, called RegionMarker, to “watermark” these embeddings so that stolen copies can be detected—even if attackers try to mess with or hide the watermark.

What questions are the authors trying to answer?

In simple terms, the paper tackles three questions:

  • How can we add a watermark to embeddings so that it’s hard to remove or hide?
  • Can the watermark survive common tricks attackers use, like rewording sentences (paraphrasing), shuffling or dropping parts of the embedding (dimension attacks), or clustering embeddings to find and strip out the watermark (CSE attacks)?
  • Can we do all this while keeping the embeddings useful for normal tasks (like classifying spam or recommending news)?

How does RegionMarker work? (Methods explained simply)

Think of embeddings like GPS coordinates for sentences in a huge space. RegionMarker adds an “invisible ink” mark to some of those coordinates so the owner can later prove a model is a copy. Here’s the simple version of how it works:

  1. Make a secret small map
  • The full embedding space is huge and uneven—like a galaxy.
  • The system shrinks it to a smaller, smoother secret map using a math tool called PCA (Principal Component Analysis). Think of it like compressing a 3D object into a 2D blueprint that still keeps the important shape.
  • Only the model owner knows this secret “projection” from big space to small space.
  2. Divide the map into secret regions
  • The small map is split into many regions (like slicing a pizza into many pieces) using random lines. This is done with a technique called LSH (Locality-Sensitive Hashing)—you can imagine it as using a few secret compass directions to decide which side of each line a point falls on.
  • The owner secretly picks some regions to be “trigger regions.”
  3. Put a gentle watermark in the chosen regions
  • When a user sends text, its embedding is projected onto the small map. If it lands in a trigger region, the system slightly mixes in a watermark before returning the embedding.
  • The watermark isn’t fixed random noise; it’s the real embedding of a chosen “target” text. Different regions get different watermark embeddings, which makes it much harder for attackers to find and remove the watermark.
  4. Later: Check for stolen models
  • To test whether a suspect model was copied, the owner sends it texts that fall into trigger regions and texts that don’t, then checks how close the returned embeddings are to the watermark embeddings.
  • They measure closeness by:
    • Cosine similarity: how small the angle between two vectors is (like checking whether two arrows point in the same direction).
    • L2 distance: how far apart two points are.
  • If the suspect model’s embeddings for trigger-region texts are consistently closer to the watermark than its embeddings for normal (non-trigger) texts, that’s strong evidence of copying.
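The four steps can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors’ implementation: the sizes (DIM, LOW_DIM, N_PLANES), the mixing weight ALPHA, the choice of four trigger regions, and the helper names (fit_secret_projection, region_id, watermark, detection_gap) are all hypothetical. Random unit vectors stand in for real text embeddings and for the target-text watermarks, and the “copied model” is approximated by the service’s own watermarked outputs.

```python
import numpy as np

rng = np.random.default_rng(0)        # stands in for the owner's secret randomness

DIM, LOW_DIM, N_PLANES = 64, 8, 4     # hypothetical sizes; 4 hyperplanes give 16 regions
ALPHA = 0.2                           # hypothetical watermark mixing weight

def normalize(v):
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

# Step 1: secret small map. PCA on sample embeddings; keep the top LOW_DIM directions.
def fit_secret_projection(sample, k):
    mean = sample.mean(axis=0)
    _, _, vt = np.linalg.svd(sample - mean, full_matrices=False)
    return mean, vt[:k].T             # shapes (DIM,) and (DIM, k)

# Step 2: secret regions. The side of each random hyperplane gives one bit of the region id.
def region_id(low, planes):
    bits = (low @ planes) > 0
    return sum(1 << i for i, b in enumerate(bits) if b)

# Step 3: mix the region's own watermark (a target-text embedding) into trigger-region outputs.
def watermark(emb, mean, proj, planes, triggers):
    rid = region_id((emb - mean) @ proj, planes)
    if rid in triggers:
        emb = normalize((1 - ALPHA) * emb + ALPHA * triggers[rid])
    return emb, rid

# Step 4: compare closeness to the watermarks for trigger vs. non-trigger texts (simplified).
def detection_gap(embs, rids, triggers):
    cos_t, cos_n, l2_t, l2_n = [], [], [], []
    for e, r in zip(embs, rids):
        # Trigger texts are compared with their own region's watermark,
        # non-trigger texts with every watermark.
        targets = [triggers[r]] if r in triggers else list(triggers.values())
        for w in targets:
            cos = float(e @ w)                 # unit vectors: dot product = cosine
            l2 = float(np.linalg.norm(e - w))
            (cos_t if r in triggers else cos_n).append(cos)
            (l2_t if r in triggers else l2_n).append(l2)
    # Positive gaps mean trigger-region outputs sit closer to the watermarks.
    return np.mean(cos_t) - np.mean(cos_n), np.mean(l2_n) - np.mean(l2_t)

# Random unit vectors stand in for real text embeddings and target-text embeddings.
sample = normalize(rng.normal(size=(500, DIM)))
mean, proj = fit_secret_projection(sample, LOW_DIM)
planes = rng.normal(size=(LOW_DIM, N_PLANES))
trigger_ids = rng.choice(2 ** N_PLANES, size=4, replace=False)
triggers = {int(r): normalize(rng.normal(size=DIM)) for r in trigger_ids}

queries = normalize(rng.normal(size=(1000, DIM)))
outputs, rids = zip(*(watermark(q, mean, proj, planes, triggers) for q in queries))
copied = detection_gap(outputs, rids, triggers)       # a copy that reproduces the outputs
independent = detection_gap(queries, rids, triggers)  # an unrelated, unwatermarked model
print("copied model:      dcos=%.3f  dl2=%.3f" % copied)
print("independent model: dcos=%.3f  dl2=%.3f" % independent)
```

In this toy setup the copied model shows a clearly positive cosine gap while the independent model’s gap stays near zero; the paper’s actual detection procedure and thresholds may differ.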

Why this resists common attacks:

  • Paraphrasing attack (rewording text): Even if words change, the sentence’s meaning often stays similar, so its embedding usually still lands in the same region, and the trigger still fires.
  • Dimension attacks (shuffling or deleting embedding dimensions): The watermark is based on the overall embedding meaning and region, not on specific positions in the vector, so it still works.
  • CSE attacks (cluster-and-remove): Because regions are chosen secretly and watermarks differ by region, it’s harder for attackers to isolate and strip out the watermark.
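The paraphrasing argument relies on the locality property of random-hyperplane LSH (often called SimHash): two vectors separated by angle θ fall on opposite sides of a random hyperplane with probability θ/π, so nearby embeddings usually share a region. The sketch below checks this empirically in the low-dimensional space. It assumes small random noise is a fair stand-in for a paraphrase, which is a simplification (real paraphrases shift embeddings in structured ways), and its sizes and noise levels are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
LOW_DIM, N_PLANES, N_POINTS = 8, 4, 2000          # hypothetical sizes

planes = rng.normal(size=(LOW_DIM, N_PLANES))     # secret random hyperplanes

def region_ids(points):
    """Encode each point's side of every hyperplane as a bit pattern (its region id)."""
    bits = (points @ planes) > 0                  # (n, N_PLANES) booleans
    return bits.astype(int) @ (1 << np.arange(N_PLANES))  # pack bits into integers

def same_region_rate(noise):
    x = rng.normal(size=(N_POINTS, LOW_DIM))      # "original sentence" points
    x_para = x + noise * rng.normal(size=x.shape) # noisy stand-in for a paraphrase
    return float(np.mean(region_ids(x) == region_ids(x_para)))

rates = {noise: same_region_rate(noise) for noise in (0.05, 0.2, 0.5, 1.0)}
for noise, rate in rates.items():
    print(f"noise={noise:<4}: {rate:.1%} of points stay in the same region")
```

Small perturbations leave most points in their original region, and the rate drops as the perturbation grows, which is why a trigger region tends to fire for reworded inputs that keep their meaning.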

What did they find? (Main results and why they matter)

The authors tested RegionMarker on several datasets (like SST-2 for sentiment, Enron emails for spam detection, and news datasets) and compared it with other watermarking methods (WARDEN, EspeW, WET). The key findings:

  • RegionMarker stayed detectable under all three major attack types (CSE, paraphrasing, and dimension-perturbation attacks), while other methods usually failed under at least one.
  • It kept task performance high (accuracy for normal tasks remained strong).
  • Ablation tests (turning features on/off) showed:
    • Using PCA (the “secret small map”) makes the defense harder to break.
    • Using multiple watermark embeddings (one per region) is much stronger than a single watermark for all regions.

Why this matters: It shows that a region-based, semantic approach can protect AI embedding services more reliably across different kinds of attacks, reducing the chance that thieves can copy and resell a model.

What could this change? (Implications and impact)

  • Stronger copyright protection for embedding services means companies can safely offer powerful models without as much fear of theft.
  • The method is practical: it keeps embeddings useful for real tasks while adding protection.
  • It could become a standard way to watermark AI services that return vectors instead of text.
  • The general idea—using secret low-dimensional regions and semantic triggers—might inspire new defenses in other AI settings.

A short recap

  • Problem: Embedding-as-a-Service can be copied by attackers.
  • Idea: Use secret regions in a smaller “semantic map” to trigger gentle, meaningful watermarks.
  • Strength: Survives rewording, dimension changes, and cluster-based removal.
  • Result: Works better across attacks than previous methods while keeping task quality high.
  • Impact: More trustworthy and secure AI services.
