Self-Supervised Model Seeding (SMS)
- SMS is a verification paradigm that embeds user-specific secret seeds into genuine training samples, creating a cryptographic link between data and model behavior.
- It employs a dual-head architecture with self-supervised latent space reconstruction, ensuring that the absence of the seed confirms the unlearning of the data.
- Empirical evaluations on MNIST, CIFAR-10, and CelebA show that SMS marginally improves model accuracy and verifies data removal through complete seed disappearance after unlearning.
Self-supervised Model Seeding (SMS) is a verification paradigm for machine unlearning that enables users to certify the removal of their genuine data from machine learning models. Distinct from conventional backdoor-based approaches, SMS leverages the embedding of user-specific secret seeds into legitimate training samples, subsequently coupling these samples and their seeds with the trained model through self-supervised latent space reconstruction. Post-unlearning, the inability to recover the corresponding seed establishes evidence that the sample has been forgotten. SMS thus creates a cryptographic linkage between users, real samples, and models, providing a robust mechanism for unlearning verification in both exact and approximate settings (Wang et al., 30 Sep 2025).
1. Motivation and Conceptual Framework
Existing machine unlearning verification has relied predominantly on backdooring, where additional, trivial samples are inserted into the training set to detect their influence on model behavior. However, such methods fundamentally fail to establish guarantees about the erasure of authentic user data because backdoored samples are not causally or semantically related to genuine data. SMS addresses this limitation by binding a unique, secret seed (such as a user index) to each true user example. This seed is invisibly embedded within the sample and becomes recoverable by the model through a self-supervised reconstruction process, creating a robust, testable link between genuine data, its seed, and the trained model. Thus, seed disappearance after unlearning serves as verifiable evidence of forgetting the associated genuine sample (Wang et al., 30 Sep 2025).
2. Formal Problem Setting and Definitions
Let $D = \{(x_i, y_i)\}_{i=1}^{N}$ denote the overall training dataset of labeled pairs. For each user $u$, a secret seed $s_u$ is chosen from a large seed space $\mathcal{S}$. The relevant elements are defined as follows:
- Seeding function: to embed the seed invisibly, $\tilde{x}_u = E(x_u, s_u)$, where $E$ (e.g., a stego/patched embedding) maps the seed into the sample.
- Service model: $f_\theta$, parameterized by $\theta$, performing the main ML task (e.g., classification).
- Seed-reconstruction head: $g_\phi$ (e.g., an auto-encoder), trained to reconstruct the original, seed-embedded input.
The model parameters are optimized by minimizing the following total loss: $$\mathcal{L}_{\text{total}} = \mathcal{L}_{\text{task}} + \lambda\,\mathcal{L}_{\text{seed}},$$ where $\lambda$ balances primary task utility against the seeding objective. The primary task loss $\mathcal{L}_{\text{task}}$ is standard cross-entropy, and the seeding loss $\mathcal{L}_{\text{seed}}$ is the $\ell_2$ auto-encoder reconstruction error. While more sophisticated losses (e.g., InfoNCE) are conceivable, SMS uses MSE for the embedding (Wang et al., 30 Sep 2025).
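As a minimal sketch of the joint objective, assuming a plain weighted sum of a cross-entropy term and an MSE reconstruction term, the per-sample loss could be computed as follows. The function names, the toy inputs, and the weight `lam=0.5` are illustrative assumptions, not values from the source.

```python
import math

def cross_entropy(logits, label):
    """Cross-entropy of one sample from raw logits (log-sum-exp for stability)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_z - logits[label]

def mse(reconstruction, target):
    """Mean squared error between a reconstruction and the seed-embedded input."""
    return sum((r - t) ** 2 for r, t in zip(reconstruction, target)) / len(target)

def total_loss(logits, label, reconstruction, seeded_input, lam=0.5):
    """Task loss plus lam times seeding loss; lam=0.5 is an arbitrary choice."""
    return cross_entropy(logits, label) + lam * mse(reconstruction, seeded_input)

# Toy values: 3-class logits and a 3-element "image".
loss = total_loss([2.0, 0.5, -1.0], 0, [0.1, 0.9, 0.4], [0.0, 1.0, 0.5])
```

Setting `lam` to zero recovers ordinary supervised training, which makes the role of the weight as a utility/seeding trade-off explicit.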
3. Model Architecture and Training Methodology
SMS employs a shared backbone architecture, such as ResNet-18, with dual heads: one multilayer perceptron (MLP) for the primary service task, and another MLP decoder for seed-based self-supervised reconstruction. Seeds $s_u$ are only present as embedded features within the seeded samples $\tilde{x}_u$ and are never disclosed to the model server.
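The shared-backbone, dual-head layout can be illustrated with a deliberately tiny stand-in. This sketch swaps the ResNet-18 backbone and MLP heads for single dense layers in pure Python; the class name `DualHeadModel`, the layer sizes, and the random initialization are all hypothetical and only show how one feature vector feeds both heads.

```python
import random

def linear(weights, bias, x):
    """Dense layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

def relu(x):
    return [max(0.0, v) for v in x]

class DualHeadModel:
    """Toy stand-in: one shared layer (the "backbone") feeding two linear heads."""

    def __init__(self, in_dim, hidden_dim, n_classes, rng_seed=0):
        rng = random.Random(rng_seed)

        def init(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.w_backbone, self.b_backbone = init(hidden_dim, in_dim), [0.0] * hidden_dim
        self.w_task, self.b_task = init(n_classes, hidden_dim), [0.0] * n_classes
        self.w_recon, self.b_recon = init(in_dim, hidden_dim), [0.0] * in_dim

    def forward(self, seeded_input):
        # Both heads read the same shared features.
        features = relu(linear(self.w_backbone, self.b_backbone, seeded_input))
        logits = linear(self.w_task, self.b_task, features)            # service-task head
        reconstruction = linear(self.w_recon, self.b_recon, features)  # reconstruction head
        return logits, reconstruction

model = DualHeadModel(in_dim=8, hidden_dim=4, n_classes=3)
logits, reconstruction = model.forward([0.5] * 8)
```

The reconstruction head outputs a vector with the same dimension as the input, so it can be compared directly against the seed-embedded sample.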
The high-level training pipeline is as follows:
- For each user $u$, the seed $s_u$ is embedded into their sample $x_u$ using an embedding mask $m$: $\tilde{x}_u = E(x_u, s_u; m)$.
- The collection $\tilde{D} = \{(\tilde{x}_u, y_u)\}$ is uploaded to the server, which may be untrusted post-training.
- The model is trained with mini-batch SGD to minimize the aforementioned joint loss, updating the parameters $(\theta, \phi)$ via backpropagation until convergence.
- The result is a seed-embedded model $M$ incorporating both primary-task performance and seed recovery (Wang et al., 30 Sep 2025).
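The seed-embedding step of the pipeline above can be sketched with one hypothetical choice of embedding: least-significant-bit steganography at mask-selected pixel positions. The source only says the embedding may be a stego or patched one, so the functions `embed_seed` and `read_seed`, the 16-bit seed width, and the toy image are assumptions made for illustration.

```python
def embed_seed(pixels, seed, mask, n_bits=16):
    """Write the low n_bits of `seed` into the LSBs of the pixels selected by `mask`."""
    if len(mask) < n_bits:
        raise ValueError("mask must select at least n_bits pixels")
    seeded = list(pixels)  # leave the caller's sample untouched
    for bit_pos, idx in enumerate(mask[:n_bits]):
        bit = (seed >> bit_pos) & 1
        seeded[idx] = (seeded[idx] & ~1) | bit  # clear the LSB, then set it to the seed bit
    return seeded

def read_seed(pixels, mask, n_bits=16):
    """Read the seed back from the masked LSBs (used here only to check the round trip)."""
    return sum((pixels[idx] & 1) << bit_pos for bit_pos, idx in enumerate(mask[:n_bits]))

pixels = list(range(100, 132))   # toy 32-pixel "image" with 8-bit values
mask = list(range(0, 32, 2))     # 16 secret positions known only to the user
seeded = embed_seed(pixels, seed=0xBEEF, mask=mask)
```

Each pixel changes by at most one intensity level, which is what keeps this kind of embedding visually imperceptible. In SMS itself the seed is checked through the model's reconstruction rather than read directly from the sample.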
4. Verification Protocol
Upon completion of model training, the user constructs a lightweight verification classifier $V$, typically a 5-layer MLP, trained to distinguish the presence or absence of seed $s_u$ from the auto-encoder reconstructions. An auxiliary dataset of positive pairs (reconstructions in which $s_u$ is present) and negative pairs (reconstructions in which it is absent) is curated for training $V$.
To verify unlearning:
- The model $M'$ (after claimed data removal) is queried on the seeded sample $\tilde{x}_u$, producing a reconstruction $\hat{x}_u$.
- The verification classifier $V$ evaluates $\hat{x}_u$; if the classifier's confidence that $s_u$ is absent exceeds a predefined threshold $\tau$, this constitutes strong evidence that the sample has been unlearned.
- This verification operates robustly in both exact (full retraining) and approximate unlearning settings, unlike backdoor-based verification, which cannot link to genuine user samples (Wang et al., 30 Sep 2025).
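The decision rule in the steps above can be sketched as a small function. Everything here is a hypothetical stand-in: the two toy "models", the toy verifier that returns a seed-present probability, the marker value, and the threshold `tau=0.9` are chosen for illustration and are not taken from the source.

```python
def is_unlearned(reconstruct, verifier, seeded_sample, tau=0.9):
    """Return True when the verifier's confidence that the seed is absent exceeds tau."""
    reconstruction = reconstruct(seeded_sample)
    p_seed_present = verifier(reconstruction)
    return (1.0 - p_seed_present) > tau

MARK = 7  # toy marker standing in for an embedded seed

def model_before(sample):
    """Toy model that still reproduces the seed marker in its reconstruction."""
    return list(sample)

def model_after(sample):
    """Toy model after unlearning: the seed marker is no longer reproduced."""
    return [0 if v == MARK else v for v in sample]

def toy_verifier(reconstruction):
    """Toy classifier: high seed-present probability only if the marker survives."""
    return 0.99 if MARK in reconstruction else 0.02

sample = [3, MARK, 5]
before = is_unlearned(model_before, toy_verifier, sample)  # seed still recoverable
after = is_unlearned(model_after, toy_verifier, sample)    # seed has disappeared
```

Raising `tau` makes the user demand stronger evidence before accepting that the sample was forgotten, at the cost of more inconclusive checks.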
5. Empirical Evaluation and Quantitative Results
SMS has been empirically evaluated on MNIST (10-way classification), CIFAR-10 (10-way classification), and CelebA (binary gender classification). The shared backbone is ResNet-18, and both heads (primary-task and seed decoder) are 5-layer MLPs. Experiments are conducted using a seed sample ratio (SSR) of 0.6%. Key measured metrics include:
- Model Accuracy (on held-out test data)
- Verifiability (probability that the verifier detects the user's seed before unlearning)
- Unambiguity (probability that the verifier rejects incorrect seeds)
- Per-epoch runtime
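Under one reading of the two verification metrics above, each is simply a success rate over verifier outcomes. The sketch below assumes the outcomes are already available as booleans; the function names and the toy outcome lists are hypothetical.

```python
def verifiability(own_seed_detected):
    """Fraction of seeded samples whose own seed is detected before unlearning."""
    return sum(own_seed_detected) / len(own_seed_detected)

def unambiguity(wrong_seed_rejected):
    """Fraction of incorrect-seed checks that the verifier correctly rejects."""
    return sum(wrong_seed_rejected) / len(wrong_seed_rejected)

# Toy outcomes for four seeded samples and four incorrect-seed probes.
v = verifiability([True, True, True, False])
a = unambiguity([True, True, True, True])
```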
Tabulated results:
| Method | Model Acc.↑ | Verifiability↑ | Unambiguity↑ | Runtime (s) |
|---|---|---|---|---|
| Non-Verif. | 99.39% | – | – | 2087 |
| MIB | 99.31% | 100.0% | 100.0% | 2593 |
| SMS | 99.46% | 100.0% | 100.0% | 2873 |
SMS achieves a marginal improvement in model accuracy over the non-verifiable baseline (99.46% vs. 99.39%) and over the backdoor approach MIB (99.31%), attributed to richer representations. After unlearning (by retraining, SISA, or verification-based unlearning), only SMS's verifiability drops to zero, confirming that the seed, and hence the user's sample, has been removed. Backdoor approaches (MIB) cannot distinguish between genuine and synthetic (backdoored) data under approximate unlearning (Wang et al., 30 Sep 2025).
6. Trade-Offs, Limitations, and Extension Directions
Applying SMS introduces an approximately 10–20% increase in training time, primarily due to the additional reconstruction head. In the reported results, SMS's runtime of 2873 s is about 11% above MIB's 2593 s and about 38% above the non-verifiable baseline's 2087 s. This computational overhead is typically acceptable for standalone deployments but may be prohibitive for continual learning environments. Sharing early layers between the primary and reconstruction heads has been proposed as an avenue to reduce training costs.
Regarding robustness, an adversarial server may attempt to delete seed information while preserving the primary signal. Enhancements such as more resilient steganographic or texture-based embeddings, and adversarial pruning of seed subspaces, are promising defenses. For future development, incorporating contrastive seeding objectives, mutual information regularization, or implicit seeding with feature-bottleneck classifiers could yield increased secrecy, efficiency, or stealth (Wang et al., 30 Sep 2025).
7. Significance and Distinctiveness
SMS is the first published end-to-end method that enables users to verify the unlearning of genuine (not synthetic or backdoored) samples in both exact and approximate settings. By fusing secrets, genuine data, and model states via self-supervised reconstruction, SMS overcomes a longstanding limitation of backdoor-based verification: the inability to cryptographically link model behavior to authentic user contributions. This capability has direct implications for MLaaS environments required to enforce scrupulous data deletion, especially under regulatory frameworks that demand rigorous, user-verifiable evidence of unlearning actions (Wang et al., 30 Sep 2025).