Resetting to Teacher Recovery (ReTRy)
- ReTRy is a paradigm that uses teacher-student models to perform either knowledge erasure via stochastic resets or safe, recoverable learning through adaptive reconstruction.
- In machine unlearning, the protocol efficiently replaces targeted knowledge with random outputs and restores accuracy on retained data using a two-epoch procedure with distillation.
- For reinforcement learning, ReTRy employs adaptive resets and critical-state corrections by a privileged teacher to ensure safe exploration and robust policy improvement.
Resetting to Teacher Recovery (ReTRy) refers to a family of algorithms that leverage “teacher” models—either randomly initialized or privileged experts—to guide the structured reset, erasure, or recoverable training of student models. The ReTRy paradigm has been instantiated for both machine unlearning in supervised learning (Zhang et al., 2023) and safe knowledge transfer under partial observability in reinforcement learning (Kim et al., 14 May 2025). Common to these approaches is the exploitation of controlled teacher-student dynamics to navigate either the process of unlearning (via stochastic forgetting) or the process of recoverable learning from privileged experts via adaptive resets.
1. Core Principles and Definitions
ReTRy operates by reconciling the standard teacher-student paradigm with the need to either delete specific knowledge (“unlearning”) or align learning under impaired observability with recoverability and safety. The core insight is to define a teacher-driven reset/recovery protocol: in supervised settings, this involves “resetting” outputs on deleted data to a stochastic baseline; in sequential decision making, this entails initializing episodes in recoverable states and correcting the student only when critical divergence occurs. The teacher may be a randomly initialized copy of the model (for unlearning), or a privileged policy in an MDP with full state access (for RL/POMDP transfer).
2. Methodology for Machine Unlearning
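In machine unlearning (Zhang et al., 2023), ReTRy targets removal of the influence of a data subset $D_f$ (the forget set) from a trained model $M$, while preserving performance on the retained data $D_r$. The approach consists of two main one-epoch stages:
- Knowledge-Erasure (Erase): $S$ (student) is initialized as a copy of $M$. For each mini-batch $B \subset D_f$, $S$ is trained to match the output distribution from a randomly initialized stochastic teacher $T$, using softened softmax with temperature $\tau$. Writing $p_T^{\tau}(x) = \mathrm{softmax}(z_T(x)/\tau)$ for the teacher's softened output on input $x$, and $p_S^{\tau}(x)$ for the student's, the erasure objective per example is
$$\mathcal{L}_{\text{erase}}(x) = \mathrm{KL}\big(p_T^{\tau}(x) \,\|\, p_S^{\tau}(x)\big), \qquad x \in D_f.$$
This ensures model outputs on $D_f$ revert to random/uninformed predictions.
- Model-Reconstruction (Recover): The erased model $S$ is then fine-tuned for a single epoch on the retained data $D_r$, using a loss that combines standard cross-entropy with a distillation regularizer toward the original model $M$:
$$\mathcal{L}_{\text{recover}} = \mathcal{L}_{\text{CE}} + \lambda\, \mathcal{L}_{\text{KD}},$$
where $\mathcal{L}_{\text{CE}}$ is supervised cross-entropy, $\mathcal{L}_{\text{KD}}$ is the KL divergence between softened distributions of $S$ and $M$, and $\lambda$ weights the distillation term.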
The ReTRy protocol achieves knowledge erasure and recovery within only two epochs (one per stage), compared with the 10–20 epochs reported for full retraining.
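The two stage losses described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the weight `lam`, the default temperature, and the direction of the KL terms are all choices made for this example.

```python
import math

def softened_softmax(logits, tau=1.0):
    """Softmax of logits / tau, computed in a numerically stable way."""
    scaled = [z / tau for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)

def erase_loss(student_logits, teacher_logits, tau=2.0):
    """Erase stage: distance from the random teacher on one forget-set example."""
    return kl_divergence(softened_softmax(teacher_logits, tau),
                         softened_softmax(student_logits, tau))

def recover_loss(student_logits, original_logits, label, tau=2.0, lam=0.5):
    """Recover stage: cross-entropy plus distillation toward the original model.
    `lam` is an assumed name for the distillation weight."""
    ce = -math.log(softened_softmax(student_logits)[label] + 1e-12)
    kd = kl_divergence(softened_softmax(original_logits, tau),
                       softened_softmax(student_logits, tau))
    return ce + lam * kd

# A confident student is far from a near-uniform random teacher ...
confident = [8.0, 0.0, 0.0]
near_uniform = [0.1, -0.1, 0.0]
print(erase_loss(confident, near_uniform))
# ... while a student that already matches the teacher incurs no erasure loss.
print(erase_loss(near_uniform, near_uniform))
```

Minimizing `erase_loss` on forget-set examples pushes the student's predictions toward the teacher's uninformed outputs; `recover_loss` is then applied only to retained examples.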
3. Methodology for Recoverable Teacher-Student RL
In policy distillation under privileged information (Kim et al., 14 May 2025), ReTRy addresses partial observability by structuring student training around “recoverable” states and querying a privileged teacher policy only when the student is likely to diverge irrecoverably. The two central components are:
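- CritiQ (Imitation): The student collects corrections from the teacher only at “critical” observations, defined via the gap between the student’s estimated $Q$-value and the teacher’s. Correction data is aggregated into a dataset $\mathcal{D}_{\text{crit}}$ and used for imitation updates, for example with a behavior-cloning loss of the form
$$\mathcal{L}_{\text{imit}}(\theta) = -\,\mathbb{E}_{(o,\,a^{\ast}) \sim \mathcal{D}_{\text{crit}}}\big[\log \pi_{\theta}(a^{\ast} \mid o)\big],$$
where $\pi_{\theta}$ is the student policy, $o$ is the student's observation, and $a^{\ast}$ is the teacher's action.
- ReTRy RL (Adaptive Reset): Training always starts from a samplable set of “recovery” states, i.e., states that can be returned to a safe path via the teacher. This reset set is iteratively updated: after student rollouts, the teacher is rolled out from states visited by the student to expand the set of recoverable starts. Policy gradients are optimized only on trajectories originating from these states, i.e., the student maximizes a return of the form
$$J(\theta) = \mathbb{E}_{s_0 \sim \rho_{\text{rec}},\ \text{trajectory} \sim \pi_{\theta}}\Big[\sum_{t} \gamma^{t} r_t\Big],$$
where $\rho_{\text{rec}}$ is the distribution over the current recovery set.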
This approach ensures the student remains within the recoverable region of the state space, facilitating efficient and safe exploration.
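The critical-state filter behind CritiQ can be illustrated with a small sketch. The helper names, the use of an absolute Q-value gap, the fixed `threshold`, and the majority-vote tabular policy are assumptions for this example; the source only states that corrections are collected where the student's and teacher's Q-value estimates diverge.

```python
from collections import Counter, defaultdict

def critical_corrections(visited, q_student, q_teacher, teacher_action, threshold):
    """Return (observation, teacher action) pairs for visited states where the
    student's and teacher's value estimates differ by more than `threshold`."""
    corrections = []
    for obs, state in visited:
        gap = abs(q_teacher(state) - q_student(obs))
        if gap > threshold:  # only critical states trigger a teacher query
            corrections.append((obs, teacher_action(state)))
    return corrections

def fit_tabular_policy(dataset):
    """Toy imitation update: pick the most frequent teacher action per observation."""
    votes = defaultdict(Counter)
    for obs, action in dataset:
        votes[obs][action] += 1
    return {obs: counts.most_common(1)[0][0] for obs, counts in votes.items()}

# Hypothetical data: the student sees only "left"/"right", the teacher sees the full state.
visited = [("left", 0), ("left", 1), ("right", 2)]
teacher_values = [1.0, 0.2, 0.9]
teacher_moves = ["push", "pull", "push"]
student_values = {"left": 0.9, "right": 0.85}

dataset = critical_corrections(
    visited,
    q_student=lambda obs: student_values[obs],
    q_teacher=lambda state: teacher_values[state],
    teacher_action=lambda state: teacher_moves[state],
    threshold=0.3,
)
policy = fit_tabular_policy(dataset)
print(dataset, policy)
```

Only the second visit exceeds the threshold here, so the teacher is queried once rather than at every step, which is the practical difference from querying on all visited states.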
4. Algorithmic Details and Pseudocode
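The unlearning variant of ReTRy runs two epoch-length loops in sequence: first over the forget set, matching stochastic teacher outputs, then over the retained set, reconstructing with distillation (Zhang et al., 2023). In outline:
- Copy the trained model to obtain the student, and randomly initialize the stochastic teacher.
- Erase: for each mini-batch of the forget set, update the student to match the teacher's temperature-softened outputs.
- Recover: for each mini-batch of the retained set, update the student with cross-entropy plus the distillation regularizer.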
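As a rough, runnable illustration of the two-pass control flow, the sketch below uses a toy "tabular" model that stores one logit vector per example. It shows only the order of operations and the direction of each update; because no parameters are shared between examples, it cannot reproduce the interference between forgotten and retained data that a real network exhibits. The function name `retry_unlearn`, the weight `lam`, the learning rate, and the gradient forms (derived for KL with a softened softmax) are assumptions, not the published algorithm.

```python
import copy
import math
import random

def softmax(logits, tau=1.0):
    m = max(z / tau for z in logits)
    exps = [math.exp(z / tau - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def retry_unlearn(original, forget_set, retain_set, num_classes,
                  tau=2.0, lam=0.5, lr=1.0, seed=0):
    """original: dict example id -> logits (toy model, left unmodified).
    forget_set: list of example ids; retain_set: list of (example id, label)."""
    rng = random.Random(seed)
    student = copy.deepcopy(original)
    # Random logits stand in for a randomly initialized teacher network.
    teacher = {x: [rng.gauss(0.0, 0.1) for _ in range(num_classes)]
               for x in forget_set}

    # Stage 1 (Erase): a single pass over the forget set.
    for x in forget_set:
        p_s, p_t = softmax(student[x], tau), softmax(teacher[x], tau)
        # d/dz KL(p_t || softmax(z / tau)) = (p_s - p_t) / tau
        student[x] = [z - lr * (ps - pt) / tau
                      for z, ps, pt in zip(student[x], p_s, p_t)]

    # Stage 2 (Recover): a single pass over the retained set.
    for x, y in retain_set:
        p = softmax(student[x])
        p_s, p_o = softmax(student[x], tau), softmax(original[x], tau)
        grad = [(pi - (1.0 if k == y else 0.0)) + lam * (ps - po) / tau
                for k, (pi, ps, po) in enumerate(zip(p, p_s, p_o))]
        student[x] = [z - lr * g for z, g in zip(student[x], grad)]
    return student

original = {"a": [5.0, 0.0, 0.0], "b": [0.0, 4.0, 0.0]}
student = retry_unlearn(original, forget_set=["a"], retain_set=[("b", 1)], num_classes=3)
print(softmax(original["a"])[0], softmax(student["a"])[0])  # confidence on the forgotten example drops
```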
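The RL variant alternates between student rollouts (from teacher recovery states) and teacher rollouts (from student-visited states), updating both the exploration distribution and the student’s policy. In outline:
- Sample a start state from the current recovery set and roll out the student.
- At critical observations, query the teacher and add the corrections to the imitation dataset.
- Roll out the teacher from states the student visited, and add those from which the teacher recovers to the recovery set.
- Update the student with the imitation loss and with policy gradients on trajectories that start in the recovery set.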
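The reset-set maintenance at the heart of this loop can be sketched on a toy 1-D chain. Everything here is hypothetical: the chain environment, the always-step-right teacher, the fixed teacher horizon, and the random-walk stand-in for the student. The policy update itself is left out; the sketch only shows how student visits and teacher rollouts jointly grow the set of start states.

```python
import random

GOAL = 10     # terminal success state on a chain 0..GOAL
HORIZON = 4   # steps the teacher may use when attempting a recovery

def teacher_recovers(state):
    """Roll out the teacher (always step right); True if GOAL is reached in time."""
    for _ in range(HORIZON):
        if state == GOAL:
            return True
        state += 1
    return state == GOAL

def student_rollout(start, rng, steps=5):
    """Placeholder student: a random walk that records the states it visits."""
    visited, state = [], start
    for _ in range(steps):
        state = max(0, min(GOAL, state + rng.choice([-1, 1])))
        visited.append(state)
        if state == GOAL:
            break
    return visited

def expand_recovery_set(initial, iterations=200, seed=0):
    rng = random.Random(seed)
    recovery = set(initial)
    for _ in range(iterations):
        start = rng.choice(sorted(recovery))       # reset to a recoverable state
        for s in student_rollout(start, rng):      # student explores from there
            if s != GOAL and teacher_recovers(s):  # teacher rollout from a visited state
                recovery.add(s)
        # A policy update on this rollout would go here (omitted in this sketch).
    return recovery

recovery = expand_recovery_set({9})
print(sorted(recovery))
```

States from which the teacher cannot reach the goal within its horizon are never added, so the student is never reset into a region it cannot be rescued from.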
5. Empirical Evaluation and Results
Two empirical regimes have been established:
- Unlearning (Zhang et al., 2023):
  - Datasets: CIFAR-10, MNIST, Fashion-MNIST.
  - Forgetting fractions: 10% (removing one class), 20% (removing two classes).
  - After erasure, forgotten-class accuracy drops to ~8–11% (almost random on CIFAR-10) while accuracy on retained classes is only slightly degraded. Post-reconstruction, retained-class accuracy recovers or surpasses retrained-from-scratch baselines, e.g., 80.1% for ReTRy vs. 78.3% for baseline retraining on CIFAR-10 after 10% removal.
  - Total epochs: 2 (vs. 10–20 for full retraining).

| Dataset         | Original | Retrain (epochs) | ReTRy (epochs) |
|-----------------|----------|------------------|----------------|
| CIFAR-10 (10%)  | 76.67%   | 78.28% (20)      | 80.11% (1)     |
| MNIST (10%)     | 99.25%   | 99.19% (10)      | 99.21% (1)     |
| Fashion-MNIST   | 91.98%   | 92.88% (20)      | 94.13% (1)     |

- Privileged RL (Kim et al., 14 May 2025):
  - Tasks: simulation environments (Drawer, Block Push, Navigation) and real-robot drawer manipulation.
  - ReTRy achieves near-100% success in all simulations, while BC and DAgger attain near-zero success on partially observed tasks. Pure RL achieves 0–65% success depending on the environment.
  - On the real robot, ReTRy transfers with 100% success; CritiQ alone achieves 66%.
  - ReTRy is more sample-efficient than SAC.
6. Connections, Extensions, and Sensitivities
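The machine unlearning ReTRy model uses a stochastic teacher for “forgetting” in a manner reminiscent of randomized label smoothing, but focuses on efficient single-pass erasure and accurate recovery. The RL ReTRy approach formalizes recoverable learning by controlling the density-ratio constant $C$ in the performance difference bound, which schematically takes the form
$$J(\pi^{\ast}) - J(\pi_{\theta}) \;\le\; \frac{C}{1-\gamma}\,\epsilon, \qquad C = \sup_{s} \frac{d^{\pi^{\ast}}(s)}{\rho(s)},$$
where $\pi^{\ast}$ is the comparator policy, $\pi_{\theta}$ is the student, $d^{\pi^{\ast}}$ is the comparator's state-visitation distribution, $\rho$ is the distribution from which training states are sampled, and $\epsilon$ is the student's error under $\rho$.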
By adaptively expanding the sample distribution to teacher-recoverable regions, ReTRy ensures both statistical and practical feasibility of safe student learning.
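The effect of widening the sample distribution on a density-ratio constant can be checked numerically. The sketch below assumes the common worst-case definition, the largest ratio between a target state distribution and the sampling distribution over discrete states; the exact constant used by Kim et al. is not reproduced here, and the state names and probabilities are made up for illustration.

```python
def density_ratio_constant(target, sampling):
    """Largest target[s] / sampling[s] over states the target visits.
    Returns infinity if the sampling distribution misses such a state."""
    worst = 0.0
    for state, p in target.items():
        if p == 0:
            continue
        q = sampling.get(state, 0.0)
        if q == 0:
            return float("inf")
        worst = max(worst, p / q)
    return worst

# Hypothetical state-visitation distribution of the policy we compare against.
teacher_visits = {"s0": 0.5, "s1": 0.3, "s2": 0.2}

# Always resetting to a single start state leaves the constant unbounded ...
narrow = {"s0": 1.0}
# ... while resets spread over recoverable states keep it small.
broad = {"s0": 0.4, "s1": 0.4, "s2": 0.2}

print(density_ratio_constant(teacher_visits, narrow))
print(density_ratio_constant(teacher_visits, broad))
```

A smaller constant means that low error under the sampling distribution translates into a tighter performance guarantee, which is the motivation for expanding the reset set rather than always starting from one state.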
Sensitivity analyses for the temperature $\tau$, the distillation weight $\lambda$, and optimizer hyperparameters are not exhaustively reported in the unlearning context, nor are formal guarantees given; all results are empirical. Practitioners may therefore need to run their own hyperparameter sweeps for application-specific tuning. Neither variant addresses more than one epoch per stage; extension to multi-epoch or ensemble stochastic teachers remains unstudied in the primary references.
7. Practical Considerations and Limitations
Both ReTRy variants assume access to a suitable teacher: a full-state privileged policy for RL, or a freshly randomized model for unlearning. A privileged teacher with full state access may be unavailable in closed domains. In the unlearning context, the model architecture must admit a random initialization that matches the dirty (original) model. The optimality and transferability of teacher-selected recoverable regions or stochastic outputs are not ensured, as no theoretical convergence or generalization guarantees are formally established. The framework is evaluated empirically on moderately sized datasets and standard robot benchmarks; extension to larger, more diverse settings remains open. In summary, ReTRy provides a robust, empirically validated reset mechanism for both knowledge erasure and efficient recoverable learning under teacher supervision (Zhang et al., 2023, Kim et al., 14 May 2025).