SISA++: Robust Machine Unlearning Framework
- SISA++ is an advanced machine unlearning framework that integrates differential privacy with noise injection during shard training and prediction aggregation to mitigate adaptive deletion attacks.
- It implements fairness-aware partitioning and auditing mechanisms, such as TruVRF, to ensure equitable data removal and compliance with privacy regulations.
- The framework balances computational efficiency with robust deletion guarantees, making it suitable for dynamically purging data in privacy-sensitive applications.
SISA++ is the designation for an improved machine unlearning framework that builds upon the Sharded, Isolated, Sliced, and Aggregated (SISA) paradigm, with enhancements that address its vulnerability to adaptive deletion attacks and incorporate differential privacy principles, fairness preservation, and exact verifiability. SISA++ aims to ensure robust deletion guarantees, fairness-aware data removal, and externally auditable compliance, surpassing the limitations of the original SISA when deletion requests are adaptively chosen or when strong privacy and fairness constraints are required.
1. Limitations of SISA under Adaptive Deletion
The SISA framework partitions the training dataset into disjoint shards, each of which trains an independent constituent model. When a deletion request for a point $z$ is received, only the affected shards (and slices within them) are retrained, resulting in substantial computational savings compared to naïve full retraining. However, SISA's theoretical guarantee, namely that the distribution of models after deletion is statistically indistinguishable from full retraining, holds strictly in the non-adaptive setting, where deletion requests are determined independently of model outputs.
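The following minimal sketch illustrates the baseline SISA bookkeeping that SISA++ builds on: a fixed random assignment of points to shards, and retraining restricted to the shard that contained a deleted point. Names such as `train_model` and `ShardedEnsemble` are illustrative placeholders, not an API from the cited works.

```python
import random
from collections import defaultdict

def train_model(examples):
    """Placeholder for any per-shard learner; returns a trained model object."""
    return {"n_examples": len(examples)}  # stand-in for real parameters

class ShardedEnsemble:
    def __init__(self, dataset, num_shards, seed=0):
        rng = random.Random(seed)               # internal randomness r
        self.shards = defaultdict(list)
        self.assignment = {}                    # point index -> shard id
        for idx, example in enumerate(dataset):
            s = rng.randrange(num_shards)
            self.assignment[idx] = s
            self.shards[s].append((idx, example))
        self.models = {s: train_model([x for _, x in pts])
                       for s, pts in self.shards.items()}

    def delete(self, idx):
        """Unlearn one point: drop it and retrain only its shard."""
        s = self.assignment.pop(idx)
        self.shards[s] = [(i, x) for i, x in self.shards[s] if i != idx]
        self.models[s] = train_model([x for _, x in self.shards[s]])
```

Under adaptive deletion, the adversary's requests can correlate with the assignment randomness `r`, which is exactly the leakage SISA++ is designed to bound.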
The adaptive threat model arises when deletion requests depend on previously published model outputs. For example, an adversary may observe classification outcomes and selectively delete points whose influence is "split" among shards, thereby exploiting leakage about the partitioning randomness. This substantially degrades unlearning guarantees: for adaptive deletion sequences, SISA can fail to satisfy any nontrivial $(\alpha, \beta)$-unlearning guarantee, introducing statistical bias in model outputs relative to retraining (Gupta et al., 2021).
A formal $(\alpha, \beta)$-unlearning guarantee requires, for any event $E$ in model space,
$$\Pr[\text{unlearned model} \in E] \;\le\; e^{\alpha}\,\Pr[\text{retrained-from-scratch model} \in E] + \beta$$
(together with the symmetric inequality), with probability at least $1 - \gamma$ over the update sequence. Under adaptive deletion, this fails unless an additional mechanism is introduced.
2. Differential Privacy Reduction for Adaptive Robustness
SISA++ is motivated by a general reduction of Gupta et al. (2021) that leverages differential privacy (DP) to overcome SISA's shortcomings under adaptive deletion. The key insight is to inject randomness (noise) during model publishing or aggregation such that the published outputs become $(\epsilon, \delta)$-DP with respect to the internal randomness $r$ (e.g., the shard-assignment seeds). By DP's post-processing property and its connection to max-information, adaptively chosen deletion requests can glean only limited information about $r$ from the published outputs.
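Spelled out (a hedged formalization rather than a verbatim quotation of the paper, writing $f(D, r)$ for the published output on dataset $D$ with internal randomness $r$, and taking $r, r'$ to be neighboring random strings differing in a single shard's seed), the requirement reads:
$$\Pr\big[f(D, r) \in S\big] \;\le\; e^{\epsilon}\,\Pr\big[f(D, r') \in S\big] + \delta \quad \text{for all output events } S,$$
where the probabilities are taken over the noise injected at publishing time.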
Applying the reduction yields degraded, but still meaningful, adaptive guarantees: a non-adaptive $(\alpha, \beta)$-unlearning guarantee, combined with $(\epsilon, \delta)$-DP publishing in $r$, carries over to an $(\alpha', \beta', \gamma)$-unlearning guarantee against adaptive deletion sequences, with the loss in parameters governed by $\epsilon$, $\delta$, and $k$, the number of independent random seeds (one per shard).
This DP-based reduction ensures that even under adaptive deletion requests, the statistical distance between the “unlearned” and fully retrained model remains tightly bounded.
3. SISA++ Algorithmic Enhancements
SISA++ incorporates differential privacy at two critical junctures:
- Shard Model Training: When training each shard, noise is added to the gradients or parameters using mechanisms such as DP-SGD, ensuring that outputs published from the shard models satisfy $(\epsilon, \delta)$-DP in $r$ (a minimal sketch of this step follows the list).
- Prediction Aggregation: When publishing ensemble predictions (e.g., a majority vote), SISA++ employs private aggregation (such as the exponential mechanism). This prevents an adversary from probing the shard partitioning through overconfident outputs, mitigating leakage of the partitioning randomness (see the aggregation sketch further below).
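As a concrete and deliberately simplified illustration of the shard-training step, the sketch below applies clipped, Gaussian-perturbed gradients in the style of DP-SGD; the learner interface, clipping norm, and noise multiplier are illustrative assumptions, not parameters prescribed by SISA++.

```python
import numpy as np

def noisy_gradient_step(weights, grad_fn, batch, lr=0.1,
                        clip_norm=1.0, noise_multiplier=1.0, rng=None):
    """One DP-SGD-style update for a single shard model (illustrative only).

    Per-example gradients are clipped to `clip_norm`, averaged, and perturbed
    with Gaussian noise whose scale follows `noise_multiplier * clip_norm`.
    """
    rng = rng or np.random.default_rng()
    per_example = [grad_fn(weights, x) for x in batch]       # per-example grads
    clipped = [g * min(1.0, clip_norm / (np.linalg.norm(g) + 1e-12))
               for g in per_example]
    mean_grad = np.mean(clipped, axis=0)
    noise = rng.normal(0.0, noise_multiplier * clip_norm / len(batch),
                       size=mean_grad.shape)
    return weights - lr * (mean_grad + noise)
```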
Thus, SISA++ retains SISA's fundamental workflow (sharding, isolation, slicing, aggregation) but interposes DP noise at model training or at prediction aggregation, achieving the adaptive $(\alpha', \beta', \gamma)$-unlearning guarantee described above. Empirically, Gupta et al. (2021) report that modest DP noise, well below typical privacy regimes, suffices to neutralize practical adaptive attacks without significant utility loss.
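For the aggregation step, one common way to realize the exponential mechanism over vote counts is report-noisy-max with Gumbel noise; the sketch below is a plausible instantiation under that assumption, not the specific aggregator evaluated in the cited work.

```python
import numpy as np

def private_majority_vote(shard_predictions, num_classes, epsilon, rng=None):
    """Aggregate per-shard label predictions with DP noise.

    Adding independent Gumbel noise with scale 2/epsilon to each vote count and
    taking the argmax realizes the exponential mechanism with the vote count as
    the utility function (sensitivity 1 per shard, standard 2*Delta/epsilon scale).
    """
    rng = rng or np.random.default_rng()
    counts = np.bincount(shard_predictions, minlength=num_classes).astype(float)
    noisy = counts + rng.gumbel(loc=0.0, scale=2.0 / epsilon, size=num_classes)
    return int(np.argmax(noisy))

# Example: five shard models vote on a 3-class problem.
# label = private_majority_vote([0, 2, 2, 1, 2], num_classes=3, epsilon=0.5)
```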
4. Fairness Properties in SISA++-Style Unlearning
When deletion is non-uniform—i.e., specific groups are disproportionately targeted—the structure of SISA and its successors is important for fairness. Experimental studies (Zhang et al., 2023) show that SISA yields improved fairness scores relative to naive retraining (ORTR) and AmnesiacML under non-uniform deletion, attributable to isolation of group effects via partitioning.
This suggests that an appropriately designed SISA++ will inherit, and may further enhance, these fairness properties. Designers can implement fairness-aware partitioning or differential weighting across shards to account directly for protected attributes, and runtime fairness audits can monitor that deletion does not degrade group-level metrics.
A plausible implication is that SISA++ could optimize for both privacy and fairness, particularly when fairness constraints are explicitly embedded in partitioning or retraining procedures.
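One simple way to realize fairness-aware partitioning is to stratify shard assignment by a protected attribute so that every shard sees a comparable group mix; the sketch below is an illustrative assumption about how such a partitioner might look, not a procedure specified in the cited studies.

```python
import random
from collections import defaultdict

def stratified_shard_assignment(examples, protected_attr, num_shards, seed=0):
    """Assign examples to shards round-robin within each protected group.

    `examples` is a list of dicts; `protected_attr` names the group key.
    Returns a mapping from example index to shard id in which every shard
    receives a near-equal share of each group.
    """
    rng = random.Random(seed)
    by_group = defaultdict(list)
    for idx, ex in enumerate(examples):
        by_group[ex[protected_attr]].append(idx)

    assignment = {}
    for group_indices in by_group.values():
        rng.shuffle(group_indices)                 # randomize within group
        for pos, idx in enumerate(group_indices):
            assignment[idx] = pos % num_shards     # round-robin across shards
    return assignment
```

Because group representation is balanced per shard, retraining a single shard after a targeted deletion perturbs each group's representation by roughly the same amount.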
5. Verifiability and Auditing of SISA++ Deletion
SISA and its enhanced forms, while efficient, do not by themselves allow external verification of actual data removal. TruVRF (Zhou et al., 2024) provides a triple-granularity verification framework, operating at the class, volume, and sample levels, that can be plugged directly into SISA++.
TruVRF leverages model sensitivity, i.e., the discrepancy between $\theta_{o}$ and $\theta_{u}$, the original and unlearned model parameters. It supports three key verifications (an illustrative audit sketch follows the list):
- Class Verification: Detects whether the target class’s influence is removed (Metric-I, 92% accuracy).
- Volume Verification: Infers if the correct number of samples was removed (Metric-II, deviation 6.1%).
- Sample Verification: Distinguishes true deletion from deception (Metric-III, 85–90% accuracy).
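As an illustration of how a class-level check might be wired into a SISA++ audit, the sketch below compares per-class confidence of the original and unlearned ensembles on held-out probes of the deleted class; this is an assumed simplification for exposition, not TruVRF's actual Metric-I.

```python
import numpy as np

def class_removal_score(original_predict, unlearned_predict, probes, target_class):
    """Heuristic class-level audit (illustrative, not TruVRF's metric).

    `original_predict` / `unlearned_predict` map a batch of probe inputs to
    class-probability arrays of shape (n_probes, n_classes). A large drop in
    average confidence on `target_class` is consistent with that class's
    influence having been removed.
    """
    p_orig = np.asarray(original_predict(probes))[:, target_class]
    p_unl = np.asarray(unlearned_predict(probes))[:, target_class]
    drop = float(np.mean(p_orig - p_unl))
    return drop  # an auditor would threshold this against a calibrated baseline
```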
This framework detects neglecting, lazy, and deceiving servers, enabling external parties to reliably audit SISA++ deployments for compliance with regulatory mandates (e.g., GDPR).
6. Computational Trade-offs and Practical Integration
SISA++'s computational cost, stemming from DP noise addition, partial retraining, and auditing, increases with dataset size and deletion percentage (Dilworth, 2025). While consistency (the change in model outputs) remains negligibly low under exact unlearning, the resource overhead may restrict SISA++ deployment in latency-sensitive applications.
SISA++ can be incorporated into Positive-Unlabeled machine unlearning (PUMU) frameworks, supporting privacy compliance in partially labeled domains and the dynamic purging of erroneous or outdated data.
Practical deployment demands mindful tuning of DP parameters, shard/slice partitioning, and fairness integration, alongside the auditing mechanisms.
7. Future Directions and Ethical Implications
SISA++ combines adaptive robustness, fairness sensitivity, and verifiable compliance in a single unlearning framework. Prospective enhancements include hierarchical isolation strategies, integration with influence-function or sample-reweighting methods, and runtime monitoring.
Ethically, SISA++ supports both privacy rights and fairness mandates, fostering trust and transparency, but trade-offs in computational cost must be continuously evaluated to ensure that compliance does not hinder real-time application or resource efficiency.
In summary, SISA++ is representative of the current state-of-the-art in robust, fair, and auditable machine unlearning, achieved through DP-augmented SISA workflows and supported by principled verification tools. Its design is directly responsive to adaptivity, group fairness, and practical auditability as articulated in recent literature.