SafeSora: Watermarking & Safety in Generative Models

Updated 19 February 2026

SafeSora is a family of frameworks advancing safety, alignment, and copyright protection in generative models, including invisible graphical watermarking for generated video.
The watermarking framework employs a hierarchical patch matching strategy and a 3D wavelet transform-enhanced state space model to embed watermarks invisibly and recover them reliably.
Reported evaluations show higher PSNR and SSIM and greater robustness to video distortions than prior generative watermarking methods.

SafeSora refers to a family of frameworks, datasets, and techniques advancing the safety, alignment, and copyright protection of generative models—primarily in text-to-video synthesis and LLMs. Notably, three distinct research threads adopt the name or closely related aliases for: (1) watermarking generative video models (Su et al., 19 May 2025), (2) human preference-based safety alignment datasets for text-to-video models (Dai et al., 2024), and (3) safety-preserving low-rank adaptation of LLMs, also known as SaLoRA (Li et al., 3 Jan 2025). Each initiative introduces unique methodologies and measurable improvements to security, interpretability, or user-based alignment of generative machine learning models.

1. Invisible Graphical Watermarking in Text-to-Video Generation

SafeSora (Su et al., 19 May 2025) is presented as the first framework to embed human-interpretable graphical watermarks, such as logos or icons, directly within the video generation pipeline. Classical image watermarking does not transfer well to video because it ignores the temporal redundancy and spatiotemporal dependencies across frames. SafeSora instead exploits the high information capacity of video to distribute the watermark across frames, improving both robustness and invisibility. The principal innovation is embedding a graphical watermark during video generation such that:

The watermarked video is visually indistinguishable from the unwatermarked one (high PSNR, low LPIPS).
The watermark is reliably extractable even after typical distortions (e.g., compression, cropping, noise).
The process leverages the video’s spatiotemporal structure for both robust embedding and accurate recovery.

The method uses a two-stage, coarse-to-fine patch routing mechanism: small patches of the watermark image are assigned first to the video frames, and then to the spatial regions within those frames, whose visual content they most resemble. The authors report that this similarity-driven allocation improves both invisibility and extraction fidelity.
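As a minimal sketch, assuming the logo is an (H, W, C) array whose sides are multiples of a hypothetical patch size `patch`, splitting the watermark into the small patches that get routed, and reassembling the recovered patches after extraction, could look like this. The names `patchify` and `unpatchify` are illustrative and not taken from the SafeSora code.

```python
import numpy as np

def patchify(logo: np.ndarray, patch: int) -> np.ndarray:
    """Split an (H, W, C) logo into non-overlapping (patch, patch, C) tiles.

    Returns shape (num_patches, patch, patch, C), tiles in row-major order.
    """
    H, W, C = logo.shape
    assert H % patch == 0 and W % patch == 0, "logo size must be divisible by patch"
    tiles = logo.reshape(H // patch, patch, W // patch, patch, C)
    tiles = tiles.transpose(0, 2, 1, 3, 4)       # (rows, cols, patch, patch, C)
    return tiles.reshape(-1, patch, patch, C)

def unpatchify(tiles: np.ndarray, H: int, W: int) -> np.ndarray:
    """Inverse of patchify: reassemble (recovered) tiles into an (H, W, C) logo."""
    _, patch, _, C = tiles.shape
    grid = tiles.reshape(H // patch, W // patch, patch, patch, C)
    return grid.transpose(0, 2, 1, 3, 4).reshape(H, W, C)
```

For a 64×64 RGB logo and `patch=16`, `patchify` yields 16 tiles; tile 1 is the block covering rows 0 to 15 and columns 16 to 31.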

2. Hierarchical Patch Matching and Spatiotemporal Fusion

SafeSora introduces a hierarchical matching strategy:

Inter-frame matching: Each watermark patch $p_i$ is compared with the latent representations $z_j$ of the video frames. Features are extracted from both, and a Softmax over dot-product similarities is computed: $w_{ij} = \text{Softmax}_j(f_{p_i} \cdot f_{z_j})$. Each patch is allocated to its most similar frame, $j^* = \arg\max_j w_{ij}$.
Intra-frame localization: Within the selected frame $j^*$, the patch is placed in the spatial region $r_{j^*,k}$ that maximizes $s_{ik} = \text{Softmax}_k(f_{p_i} \cdot f_{r_{j^*,k}})$.
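The two-level routing above can be sketched in NumPy, assuming the patch, frame, and region features have already been produced by some encoder; the feature extractors, array shapes, and the name `route_patches` are assumptions for illustration. Because Softmax is monotonic, the argmax of $w_{ij}$ or $s_{ik}$ picks the same index as the argmax of the raw dot products; the normalized weights matter only if they are used as soft assignments, for example during training.

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along `axis`."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def route_patches(f_p: np.ndarray, f_z: np.ndarray, f_r: np.ndarray):
    """Coarse-to-fine assignment of watermark patches to frames, then regions.

    f_p: (P, d) features of watermark patches p_i
    f_z: (F, d) features of frame latents z_j
    f_r: (F, K, d) features of the K spatial regions r_{j,k} of each frame
    Returns (frame_idx, region_idx), each of shape (P,).
    Note: this sketch does not stop two patches from choosing the same region.
    """
    # Inter-frame: w_ij = Softmax_j(f_{p_i} . f_{z_j}); j* = argmax_j w_ij
    w = softmax(f_p @ f_z.T, axis=1)                              # (P, F)
    frame_idx = w.argmax(axis=1)                                  # (P,)
    # Intra-frame: s_ik = Softmax_k(f_{p_i} . f_{r_{j*,k}}) within frame j*
    regions = f_r[frame_idx]                                      # (P, K, d)
    s = softmax(np.einsum("pd,pkd->pk", f_p, regions), axis=1)    # (P, K)
    region_idx = s.argmax(axis=1)                                 # (P,)
    return frame_idx, region_idx
```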

These assignments are designed to maximize perceptual similarity, thereby improving PSNR and SSIM for both watermarked videos and extracted logos. The system then fuses the allocated watermark feature map into the video UNet backbone not via self-attention but through a state space model (SSM)-based Mamba module, extended to 3D with wavelet decomposition.

3. 3D Wavelet Transform-Enhanced Mamba State Space Model

The core of the SafeSora embedding process is the 3D wavelet transform-augmented Mamba state space model (SSM):

Mamba SSM Layer: For a sequence $x_1, \dots, x_T$, $h_t = \bar{A} h_{t-1} + \bar{B} x_t$ and $y_t = C h_t + D x_t$, where $\bar{A}$ and $\bar{B}$ are discretized from continuous-time parameters using a step size $\Delta$. In Mamba, $\Delta$, $B$, and $C$ are computed from the input (the "selective" mechanism), so $\bar{A}$ and $\bar{B}$ also vary with the input, while $D$ acts as a learned skip connection.
2D/3D Wavelet Decomposition: Video features are decomposed into frequency subbands (eight for a single-level 3D transform, from LLL to HHH), and spatial and temporal positions are scanned in a local, bidirectional order. This helps the model capture long-range dependencies while keeping cost linear in the sequence length.
Spatiotemporal Local Scanning (SLS): Patch features are scanned in both low-to-high and high-to-low frequency orderings for both forward and reverse contexts, crucial for robust watermark recovery.
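The SSM recurrence in the list above can be made concrete with a sequential scan. This is a minimal sketch assuming a diagonal state matrix, a single input channel, and zero-order-hold discretization; it keeps $B$ and $C$ fixed for clarity, and the function name `ssm_scan` is hypothetical rather than part of any Mamba library.

```python
import numpy as np

def ssm_scan(x, A, B, C, D, delta):
    """Sequential scan of a diagonal linear SSM over a 1-D input sequence.

    x:     (T,) input sequence x_1..x_T
    A:     (N,) diagonal of the continuous-time state matrix (negative = stable)
    B, C:  (N,) input and output projections (fixed here; input-dependent in Mamba)
    D:     scalar skip weight
    delta: (T,) positive step sizes (input-dependent in Mamba)
    """
    x = np.asarray(x, dtype=np.float64)
    A, B, C = (np.asarray(v, dtype=np.float64) for v in (A, B, C))
    h = np.zeros_like(A)                      # hidden state h_0 = 0
    y = np.empty_like(x)
    for t in range(x.shape[0]):
        A_bar = np.exp(delta[t] * A)          # zero-order-hold discretization of A
        B_bar = (A_bar - 1.0) / A * B         # exact ZOH for a diagonal A
        h = A_bar * h + B_bar * x[t]          # h_t = A_bar h_{t-1} + B_bar x_t
        y[t] = C @ h + D * x[t]               # y_t = C h_t + D x_t
    return y
```

For a selective variant, the step sizes could be computed from the input, e.g. `delta = np.log1p(np.exp(w * x + b))` (a softplus with hypothetical scalars `w` and `b`), mirroring how Mamba makes $\Delta$ input-dependent. The Python loop makes the linear-in-$T$ cost explicit; Mamba itself uses a hardware-aware parallel scan instead of a loop.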

Together, these components preserve watermark extraction accuracy under destructive transformations: the authors report high watermark PSNR and SSIM even after the video is perturbed.
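The wavelet decomposition and frequency-ordered scanning described in this section can be illustrated with a minimal sketch, assuming a single-level orthonormal Haar wavelet applied to a single-channel (T, H, W) feature volume with even dimensions. The function names, the Haar choice, and the plain raster order within each subband are assumptions; SafeSora's local scanning pattern is not reproduced here.

```python
import numpy as np

def haar_1d(x: np.ndarray, axis: int):
    """One level of the orthonormal Haar transform along `axis` (even length)."""
    even = np.take(x, np.arange(0, x.shape[axis], 2), axis=axis)
    odd = np.take(x, np.arange(1, x.shape[axis], 2), axis=axis)
    return (even + odd) / np.sqrt(2.0), (even - odd) / np.sqrt(2.0)

def haar_3d(video: np.ndarray) -> dict:
    """Single-level 3D Haar decomposition of a (T, H, W) array into 8 subbands.

    Keys such as 'LLH' give the band letter for (time, height, width).
    """
    bands = {"": video}
    for axis in range(3):
        next_bands = {}
        for name, arr in bands.items():
            low, high = haar_1d(arr, axis)
            next_bands[name + "L"] = low
            next_bands[name + "H"] = high
        bands = next_bands
    return bands

def scan_orders(bands: dict) -> dict:
    """Flatten subbands into 1-D sequences: low-to-high and high-to-low
    frequency order, each read forward and in reverse (raster order per band)."""
    names = sorted(bands, key=lambda n: (n.count("H"), n))   # LLL first, HHH last
    low_to_high = np.concatenate([bands[n].ravel() for n in names])
    high_to_low = np.concatenate([bands[n].ravel() for n in reversed(names)])
    return {
        "low_to_high": low_to_high,
        "low_to_high_rev": low_to_high[::-1],
        "high_to_low": high_to_low,
        "high_to_low_rev": high_to_low[::-1],
    }
```

Because the Haar transform is orthonormal, the subbands together hold exactly the energy of the input, so reordering them for scanning loses no information.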

4. Evaluation Metrics and Comparative Results

SafeSora demonstrates substantial improvements over previous generative watermarking baselines:

Metric	SafeSora	Next Best (PUSNet)
Watermark PSNR ↑	37.71 dB	28.86 dB
Watermark SSIM ↑	0.97	-
Watermark LPIPS ↓	0.04	-
Video PSNR ↑	42.50 dB	-
Video SSIM ↑	0.98	-
Video LPIPS ↓	0.01	-
Temporal LPIPS ↓	0.38	0.98
FVD ↓	3.77	154.35

Under common real-world attacks (H.264 compression, Gaussian blur and noise, cropping, rotation), watermark PSNR reportedly stays at or above roughly 30 dB and SSIM above 0.9, consistently surpassing the baselines. Qualitative examples show no visible watermark in the rendered video and faithful recovery of the graphical watermark.
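The PSNR figures above follow the standard definition, $\mathrm{PSNR} = 10 \log_{10}(\mathrm{MAX}^2 / \mathrm{MSE})$. A minimal sketch, assuming pixel values scaled to [0, 1] and using additive Gaussian noise as a simple stand-in for the attacks listed (the noise level and array sizes are arbitrary):

```python
import numpy as np

def psnr(reference: np.ndarray, test: np.ndarray, max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB: 10 * log10(max_val^2 / MSE)."""
    mse = np.mean((reference.astype(np.float64) - test.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")                   # identical signals
    return float(10.0 * np.log10(max_val ** 2 / mse))

# Example: recovered logo vs. ground truth after a simulated Gaussian-noise attack
rng = np.random.default_rng(0)
logo = rng.uniform(0.0, 1.0, size=(64, 64, 3))
noisy = np.clip(logo + rng.normal(0.0, 0.03, size=logo.shape), 0.0, 1.0)
print(f"PSNR: {psnr(logo, noisy):.2f} dB")
```

Because PSNR is logarithmic, the reported watermark PSNR gap of 37.71 − 28.86 = 8.85 dB corresponds to roughly $10^{0.885} \approx 7.7$ times lower mean squared error.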

5. State Space Model Integration for Watermarking

SafeSora is the first watermarking framework to embed an SSM (Mamba) directly into both the video generation and extraction processes:

SSMs provide parameter-efficient, linear-complexity feature fusion across long frame sequences, mitigating overfitting and improving scalability.
Both forward and reversed frequency scans are incorporated to maximize extraction robustness.
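Why reversed scans help can be seen with a toy example: a causal recurrence only propagates information forward, so output position $t$ cannot depend on later inputs. Running the same recurrence over the reversed sequence and summing restores two-sided context. This is a sketch under stated assumptions: the scalar parameters `a`, `b`, `c` are hypothetical and shared across directions for brevity.

```python
import numpy as np

def causal_scan(x, a: float, b: float, c: float) -> np.ndarray:
    """Scalar linear recurrence run left to right:
    h_t = a * h_{t-1} + b * x_t,  y_t = c * h_t  (y_t depends only on x_1..x_t)."""
    x = np.asarray(x, dtype=np.float64)
    y = np.empty_like(x)
    h = 0.0
    for t in range(x.shape[0]):
        h = a * h + b * x[t]
        y[t] = c * h
    return y

def bidirectional_scan(x, a: float = 0.9, b: float = 1.0, c: float = 1.0) -> np.ndarray:
    """Forward scan plus a scan over the reversed sequence (flipped back),
    so every output position receives context from both directions."""
    x = np.asarray(x, dtype=np.float64)
    forward = causal_scan(x, a, b, c)
    backward = causal_scan(x[::-1], a, b, c)[::-1]
    return forward + backward
```

For the input `[0, 0, 1]`, the forward scan leaves position 0 at zero, while the bidirectional output at position 0 is 0.81 because the backward pass carries the final input leftward. In this sketch the current input $x_t$ enters both directions; practical bidirectional blocks typically learn separate parameters per direction.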

The authors argue that this use of SSMs yields temporally consistent watermarking that is difficult to achieve efficiently with self-attention-only architectures, whose cost grows quadratically with sequence length.

6. Limitations and Future Directions

Current SafeSora implementations only support static logos as watermarks. Potential research avenues include:

Embedding dynamic or time-varying graphical watermarks (e.g., animated QR codes).
Replacing fixed wavelet transforms with learnable, data-adaptive transforms.
Incorporating adversarial losses to further improve invisibility.
Hardware-efficient SSM variants facilitating scaling to longer videos (>8 frames).
Hybrid watermarking regimes supporting both patch-based images and encoded bitstrings for multi-key authentication.

7. Relationship to Other SafeSora Paradigms and Broader Impact

The SafeSora designation, in the broader literature, also denotes (a) a large-scale text-to-video human preference dataset for alignment and moderation (Dai et al., 2024), and (b) a framework for safety-preserving low-rank adaptation in LLM fine-tuning (Li et al., 3 Jan 2025). While the watermarking SafeSora (Su et al., 19 May 2025) addresses copyright and authenticity in generative video, the other paradigms emphasize safety alignment (e.g., preference-based evaluation, preservation of safety features in parameter-efficient adaptation) and contribute to safer, more reliable generative modeling across modalities.

SafeSora’s spatiotemporal graphical watermarking pipeline advances copyright protection in generative video by exploiting domain-specific redundancies and the efficiency of state space models for robust, invisible, and recoverable watermark embedding. Together with the related alignment and safety tools, it provides infrastructure for the trustworthy deployment of generative models in both research and production (Su et al., 19 May 2025, Dai et al., 2024, Li et al., 3 Jan 2025).