From Essence to Defense: Adaptive Semantic-aware Watermarking for Embedding-as-a-Service Copyright Protection

Published 18 Dec 2025 in cs.CR and cs.CL | (2512.16439v1)

Abstract: Benefiting from the superior capabilities of LLMs in natural language understanding and generation, Embeddings-as-a-Service (EaaS) has emerged as a successful commercial paradigm on the web platform. However, prior studies have revealed that EaaS is vulnerable to imitation attacks. Existing methods protect the intellectual property of EaaS through watermarking techniques, but they all ignore the most important properties of embedding: semantics, resulting in limited harmlessness and stealthiness. To this end, we propose SemMark, a novel semantic-based watermarking paradigm for EaaS copyright protection. SemMark employs locality-sensitive hashing to partition the semantic space and inject semantic-aware watermarks into specific regions, ensuring that the watermark signals remain imperceptible and diverse. In addition, we introduce the adaptive watermark weight mechanism based on the local outlier factor to preserve the original embedding distribution. Furthermore, we propose Detect-Sampling and Dimensionality-Reduction attacks and construct four scenarios to evaluate the watermarking method. Extensive experiments are conducted on four popular NLP datasets, and SemMark achieves superior verifiability, diversity, stealthiness, and harmlessness.


Summary

The paper introduces SemMark, a semantic-aware watermarking scheme leveraging adaptive, region-specific injection using LSH and LOF for robust EaaS copyright protection.
It maintains high task accuracy (approximately 93–95%) and embedding similarity (over 94%), ensuring minimal perturbation to downstream applications.
The adaptive weighting and neural mapping techniques enable stealth and resilience against extraction and imitation attacks while keeping computational overhead low.

Adaptive Semantic-aware Watermarking for Embedding-as-a-Service Copyright Protection

Motivation and Background

Embedding-as-a-Service (EaaS) exposes text embeddings from proprietary LLMs via APIs, powering a range of NLP applications. However, EaaS is demonstrably susceptible to model extraction and imitation attacks, which allow adversaries to cheaply duplicate the service and infringe intellectual property. Existing watermarking defenses primarily fall into trigger-based and linear transformation-based methods; both inject signals largely disconnected from embedding semantics, sacrificing diversity, stealth, and sometimes utility. In "From Essence to Defense: Adaptive Semantic-aware Watermarking for Embedding-as-a-Service Copyright Protection" (2512.16439), a new semantic-based watermarking paradigm, SemMark, is introduced to address these limitations using adaptive, region-specific watermark injection.

Semantic-aware Watermarking Paradigm

The core advance of SemMark is the direct exploitation of semantic structure in embedding spaces using locality-sensitive hashing (LSH) to partition the semantic manifold. Watermark signals are generated dynamically by a neural mapping model conditioned on the embedding itself, and injected into selected semantic regions with adaptive weights computed using local outlier factor (LOF) density estimation. This design achieves region-specific watermarking, aligning injected signals with local semantic distributions and thereby maximizing stealthiness and diversity while minimizing perturbation.

Figure 1: Comparison of the SemMark semantic-based watermark paradigm with trigger-based and linear transformation-based approaches, highlighting integration with semantic regions versus global or trigger-based watermarking.

Figure 2: SemMark watermarking framework, showing region-based signal injection and adaptive weighting, with clear distinction of watermark and non-watermark regions.

Technical Architecture

Semantic Region Partitioning

Embeddings are first reduced with PCA to mitigate the curse of dimensionality and then partitioned using cosine-preserving LSH. Each region of the space corresponds to a discrete hash bucket in $\{0, 1\}^c$, where $c$ is the number of hyperplanes. Watermark regions are randomly sampled for injection, with the proportion $\alpha$ controlling coverage.
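The partitioning step can be illustrated with a minimal sketch of random-hyperplane (sign-based) LSH. This is an illustration under stated assumptions, not the authors' implementation: the function names, the Gaussian hyperplane sampling, and the seeds are hypothetical, and the PCA step is omitted (inputs are assumed to be already reduced).

```python
import random

def make_hyperplanes(dim, c, seed=0):
    """Draw c random Gaussian hyperplane normals in `dim` dimensions (assumed sampling scheme)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(c)]

def lsh_bucket(embedding, hyperplanes):
    """Return the c-bit sign pattern of the embedding, i.e. its bucket in {0,1}^c."""
    return tuple(
        1 if sum(w * x for w, x in zip(plane, embedding)) >= 0 else 0
        for plane in hyperplanes
    )

def sample_watermark_regions(c, alpha, seed=0):
    """Randomly mark a fraction alpha of the 2**c buckets as watermark regions."""
    rng = random.Random(seed)
    all_buckets = [tuple((i >> b) & 1 for b in range(c)) for i in range(2 ** c)]
    k = round(alpha * len(all_buckets))
    return set(rng.sample(all_buckets, k))

# Usage: decide whether an (already PCA-reduced) embedding falls in a watermark region.
planes = make_hyperplanes(dim=8, c=4, seed=1)
regions = sample_watermark_regions(c=4, alpha=0.25, seed=1)
e = [0.3, -1.2, 0.5, 0.0, 0.9, -0.4, 0.1, 0.7]
in_watermark_region = lsh_bucket(e, planes) in regions
```

Because the bucket depends only on the signs of the projections, rescaling an embedding by a positive constant leaves its bucket unchanged, which is the sense in which this kind of hashing respects cosine similarity.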

Watermark Signal Generation

A residual feed-forward neural mapping $\mathcal{M}$ transforms the original embedding $e_o$ into a semantic-conditioned watermark $e_w$. Two losses enforce the following properties:

Semantic Consistency: Similar embeddings yield highly correlated watermarks, ensuring region robustness.
Semantic Similarity: Watermarks remain close to their source embedding (cosine similarity controlled by $\eta$), preserving downstream utility.
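As a rough sketch of how such a mapping and its two objectives might look, the code below uses a one-hidden-layer residual map and two simple cosine-based loss terms. The exact architecture and loss formulas are not given in this summary, so everything here is an assumption: the residual form $e_w = e_o + W_2\,\mathrm{ReLU}(W_1 e_o)$, the squared deviation from $\eta$, and the pairwise consistency term are illustrative stand-ins, and all function names are hypothetical.

```python
import math
import random

def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def init_mapping(dim, hidden, seed=0):
    """Small random weights for a one-hidden-layer residual map (assumed architecture)."""
    rng = random.Random(seed)
    w1 = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(hidden)]
    w2 = [[rng.gauss(0.0, 0.1) for _ in range(hidden)] for _ in range(dim)]
    return w1, w2

def mapping(e_o, params):
    """Residual feed-forward map: e_w = e_o + W2 * relu(W1 * e_o)."""
    w1, w2 = params
    h = [max(0.0, sum(w * x for w, x in zip(row, e_o))) for row in w1]
    return [x + sum(w * v for w, v in zip(row, h)) for x, row in zip(e_o, w2)]

def similarity_loss(e_o, params, eta):
    """Assumed form: push cos(e_w, e_o) toward the target value eta."""
    return (cosine(mapping(e_o, params), e_o) - eta) ** 2

def consistency_loss(e_a, e_b, params):
    """Assumed form: watermarks of two similar embeddings should be highly correlated."""
    return 1.0 - cosine(mapping(e_a, params), mapping(e_b, params))

# Usage: evaluate both terms for a pair of nearby embeddings.
params = init_mapping(dim=4, hidden=8, seed=0)
e_a = [0.5, -0.2, 0.1, 0.9]
e_b = [0.52, -0.18, 0.12, 0.88]
total = similarity_loss(e_a, params, eta=0.9) + consistency_loss(e_a, e_b, params)
```

In a real training loop these terms would be minimized with an automatic-differentiation framework; the pure-Python version only shows what each term measures.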

Adaptive Weighting

Instead of using a fixed strength, the injection weights $u_o$ are assigned adaptively based on local density: outlier embeddings (those with low local density according to LOF) receive higher watermark strength, which improves stealth while keeping global distortion of the embedding distribution negligible.
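The density-based weighting can be sketched as follows. The LOF computation follows the standard definition (simplified to use exactly $k$ neighbours, ignoring distance ties), but the rule that turns LOF scores into weights and the convex blend used for injection are assumptions made for illustration; the summary does not state the paper's exact formulas. The names `adaptive_weights` and `inject` are hypothetical, and using $\delta$ as the maximum weight is likewise an assumption.

```python
import math

def _dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def local_outlier_factors(points, k):
    """LOF per point, using exactly k nearest neighbours (distance ties are not expanded)."""
    n = len(points)
    neighbours, k_dist = [], []
    for i in range(n):
        ranked = sorted((_dist(points[i], points[j]), j) for j in range(n) if j != i)[:k]
        neighbours.append([j for _, j in ranked])
        k_dist.append(ranked[-1][0])  # distance to the k-th nearest neighbour
    lrd = []
    for i in range(n):
        # Reachability distance of i from each neighbour j.
        reach = [max(k_dist[j], _dist(points[i], points[j])) for j in neighbours[i]]
        lrd.append(len(reach) / max(sum(reach), 1e-12))  # local reachability density
    return [
        sum(lrd[j] for j in neighbours[i]) / (len(neighbours[i]) * lrd[i])
        for i in range(n)
    ]

def adaptive_weights(lofs, delta):
    """Assumed rule: scale weights so the strongest outlier gets the full strength delta."""
    top = max(lofs)
    return [delta * s / top for s in lofs]

def inject(e_o, e_w, u):
    """Assumed convex blend of original embedding and watermark with weight u."""
    return [(1.0 - u) * x + u * w for x, w in zip(e_o, e_w)]

# Usage: four clustered points and one outlier; the outlier gets the largest weight.
pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
weights = adaptive_weights(local_outlier_factors(pts, k=2), delta=0.2)
```

Points inside the dense cluster have LOF close to 1 and so receive small weights, while the isolated point receives the full strength.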

Watermark Verification

Verification compares standard texts (not crafted triggers) from both watermark and non-watermark regions, using cosine similarity and $L_2$ distance between the suspicious model's outputs and the mapped watermarks. Two-sample Kolmogorov-Smirnov (KS) tests on these two groups of scores provide statistical evidence for distinguishing watermarked models.
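The statistical step can be sketched with a hand-rolled two-sample KS test. The statistic is the standard maximum gap between empirical CDFs; the p-value uses a common asymptotic approximation of the Kolmogorov distribution with a small-sample correction, which is an assumption about how one might compute it rather than a description of the authors' tooling. The score lists in the usage lines are made-up placeholder values, not results from the paper.

```python
import math

def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d

def ks_pvalue(d, n, m, terms=100):
    """Asymptotic p-value for KS statistic d with sample sizes n and m (approximation)."""
    en = math.sqrt(n * m / (n + m))
    lam = (en + 0.12 + 0.11 / en) * d
    if lam < 0.2:
        return 1.0  # the alternating series converges poorly here; the true value is ~1
    total = sum(
        (-1) ** (j - 1) * math.exp(-2.0 * j * j * lam * lam)
        for j in range(1, terms + 1)
    )
    return min(1.0, max(0.0, 2.0 * total))

# Usage with placeholder scores: similarity between a suspect model's outputs and
# the mapped watermarks, for texts from watermark vs. non-watermark regions.
scores_wm = [0.91, 0.93, 0.95, 0.92, 0.94, 0.96]
scores_non_wm = [0.61, 0.58, 0.65, 0.60, 0.63, 0.59]
d = ks_statistic(scores_wm, scores_non_wm)
p = ks_pvalue(d, len(scores_wm), len(scores_non_wm))
```

A small p-value indicates that the two score distributions differ, which is the signal that the suspect model carries the region-specific watermark; with only a handful of samples the p-value cannot become very small, so practical verification would use many more texts.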

Robustness and Attack Analysis

The authors introduce two new adversarial scenarios:

Detect-Sampling Attack: Adversary detects and blocks abnormal verification queries. SemMark is designed to verify using only normal texts, making watermark queries indistinguishable from benign user queries—a strong stealth property.
Dimensionality-Reduction Attack: Adversary alters output dimensions to disrupt watermark distribution and signal alignment. SemMark’s region-specific mapping and verification mitigate signal loss by enabling dimension-aligned transformation.

Figure 3: Details of attack processes in embedding stealing and watermark verification, illustrating adversarial model extraction and stealth verification strategies.

Empirical Evaluation

SemMark is evaluated against baseline methods (EmbMarker, WARDEN, EspeW, WET) under diverse attack scenarios across SST2, MIND, AGNews, and Enron Spam datasets. It consistently achieves:

Task Accuracy: Watermarked embeddings retain nearly original classification accuracy, confirming utility preservation ($\sim$93–95% across datasets).
Similarity: Cosine similarity with the original embedding remains high ($>$94%), outperforming linear transformation-based watermarks by wide margins.
Watermark Verifiability and Stealth: KS-test p-values are $<10^{-10}$ in region-based detection. SemMark is resilient to CSE, Detect-Sampling, and Dimensionality-Reduction attacks, whereas every baseline shows degraded or undetectable signals in at least one scenario.

The effect of the critical hyperparameters (watermark proportion $\alpha$, watermark strength $\delta$, and number of hyperplanes $c$) is systematically analyzed.

Figure 4: Effects of watermark proportion $\alpha$ on SST2 task, similarity, and detection performance, showing minimal utility loss and strong verifiability in all regimes.

Figure 5: Effects of watermark strength $\delta$ on several metrics, revealing a monotonic improvement in verification metrics with tolerable impact on similarity due to adaptive weighting.

Further, SemMark’s mapping ensures verifiability even under strong adversarial conditions, and the adaptive weighting mechanism meaningfully improves the closeness between watermarked and original embeddings across all benchmarks.

Figure 6: Cosine similarity analysis between SemMark watermarks and originals, indicating extremely high semantic preservation.

Figure 7: Squared $L_2$ distance analysis, showing minimal embedding perturbation due to adaptive weighting.

Computational Efficiency

Watermark injection accounts for only a small fraction of total encoding time, on the order of 1–6% (mean $\sim 0.0005$ s per sample). Scalability is demonstrated across multiple high-dimensional embedding platforms with negligible additional cost, which makes the method suitable for production settings.

Implications and Future Directions

SemMark’s paradigm leverages intrinsic semantic structure—partitioning by LSH and region-specific mapping—to achieve highly adaptive, stealthy, and task-preserving watermarking. The avoidance of triggers and global transformations means watermarks cannot be removed via simple clustering, filtering, paraphrasing, or dimension manipulation. This approach opens substantial opportunities for copyright protection in commercial EaaS deployments, with implications for secure model hosting, ownership verification in adversarial environments, and forensic identification of stolen models.

Potential future work includes extending semantic-aware watermarking to multimodal embedding spaces (vision-language), broader attack resilience studies (e.g., continuous adversarial adaptation), and further refinement of region selection using dynamic semantic graphs. The paradigm also offers a template for watermarking in other foundation model APIs where outputs serve downstream services without user awareness.

Conclusion

"From Essence to Defense: Adaptive Semantic-aware Watermarking for Embedding-as-a-Service Copyright Protection" (2512.16439) presents a significant shift toward region-specific, semantic-aligned watermarking for embedding copyright protection. The combination of adaptive signal mapping and stealth verification provides robust resistance to state-of-the-art attacks while preserving downstream embedding utility and computational viability. This work represents an important advancement in practical IP protection for deployed LLM services, offering broad applicability and strong resilience characteristics.