Prototype-based Dark Experience Replay
- The paper introduces a novel continual learning framework for smart grid fault prediction combining prototype-based regularization with dark experience distillation.
- It integrates multiple loss components and prototype-aware memory updates to mitigate catastrophic forgetting and maintain discriminative feature clusters.
- Experimental results show that ProDER outperforms DER++, significantly improving accuracy while maintaining a fixed memory footprint across diverse smart grid scenarios.
Prototype-based Dark Experience Replay (ProDER) is a continual learning (CL) framework introduced to address the challenges of fault prediction in evolving smart grid environments. By unifying prototype-based feature regularization, dark-experience (logit) distillation, and semantically informed memory management, ProDER improves knowledge retention and adaptability to new classes or domains without requiring task-specific heads or unbounded memory growth. Building on the DER++ baseline, it incorporates prototype-attraction/repulsion and prototype-aware sample replay to maintain both compact intra-class clustering and strong inter-class discrimination over time.
1. Problem Setting and Motivation
Smart grid fault prediction under continual learning presents two key obstacles: catastrophic forgetting—where model performance degrades on previously learned grid conditions as new fault types or grid zones are introduced—and representation drift, which erodes the separability and stability of internal feature representations. ProDER is proposed with the aim to:
- Retain predictive accuracy on old fault-type and fault-zone classes.
- Adapt efficiently to new classes and evolving operational zones.
- Maintain well-separated and compact feature clusters for each class in the learned embedding space.
This is achieved using a single evolving model and a fixed-size memory buffer, making the approach amenable to practical constraints of edge and substation deployment in smart grids (Efatinasab et al., 7 Nov 2025).
2. Algorithmic Structure
The ProDER framework extends DER++ by adding prototype-based regularization and prototype-guided memory organization. The core algorithm, executed for each explicitly segmented continual learning task, consists of the following steps:
- Prototype Computation: For each class $c$ seen so far, recompute the prototype as the mean of embeddings of the examples labeled $c$, drawn from both the current task and the replay memory:
  $$p_c = \frac{1}{|S_c|} \sum_{(x_i, y_i) \in S_c} \phi_\theta(x_i), \qquad S_c = \{(x_i, y_i) \in \mathcal{D}_t \cup \mathcal{M} : y_i = c\},$$
  where $\phi_\theta$ denotes the feature extractor.
- Joint Training with Multiple Loss Terms: During each epoch, input minibatches comprise both new samples and replayed memory items. Four loss components are combined:
- Cross-entropy on new-task examples,
- KL-divergence for dark-experience (logit) distillation on memory samples,
- Prototype attraction (pulls features toward the class prototype),
- Prototype repulsion (pushes class prototypes apart).
- Prototype-aware Memory Update: After each task, for each class, select both core and boundary examples: a fraction ρ of the memory slots is filled with the samples closest to the prototype (core/typical) and the remaining fraction 1−ρ with the most distant samples (boundary/atypical), replacing the oldest entries if necessary to maintain fixed capacity. A minimal sketch of this selection follows the pseudocode below.
The following pseudocode organizes these steps precisely:
```
initialize model parameters θ
initialize empty replay memory 𝓜 = ∅
for t = 1 … T do
    receive new-task data 𝒟ₜ = {(xᵢ, yᵢ)}
    for each class c:
        P[c] ← mean of φ_θ(xᵢ) over {(xᵢ, yᵢ) ∈ 𝒟ₜ ∪ 𝓜 : yᵢ = c}
    for epoch = 1 … E do
        for minibatch B from 𝒟ₜ ∪ sampled(𝓜) do
            # forward pass and loss computation
            # (see Section 3 for the exact formulation)
            backpropagate ∇_θ L_total, update θ
    for each class c:
        candidates = {(xᵢ, yᵢ, zᵢ, fᵢ = φ_θ(xᵢ)) : yᵢ = c}
        select ρ·K core + (1−ρ)·K boundary samples
        add these K samples to 𝓜 (evict oldest if needed)
```
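For concreteness, the prototype computation and the core/boundary memory selection can be sketched in PyTorch as below. This is a minimal illustration under assumed names (`compute_prototypes`, `select_core_boundary`, `rho`, and `k_per_class` are hypothetical), not the authors' reference implementation.

```python
import torch

def compute_prototypes(features, labels):
    """Mean embedding per class over current-task and replayed samples."""
    protos = {}
    for c in labels.unique():
        protos[int(c)] = features[labels == c].mean(dim=0)
    return protos

def select_core_boundary(features, labels, prototypes, k_per_class, rho=0.5):
    """For each class, keep rho*K samples closest to the prototype (core)
    and fill the rest with the most distant samples (boundary)."""
    keep_idx = []
    for c, proto in prototypes.items():
        cls_idx = (labels == c).nonzero(as_tuple=True)[0]
        dists = torch.norm(features[cls_idx] - proto, dim=1)
        order = torch.argsort(dists)                  # nearest first
        k = min(k_per_class, len(cls_idx))
        n_core = int(rho * k)
        core = cls_idx[order[:n_core]]                # typical samples
        boundary = cls_idx[order[-(k - n_core):]] if k - n_core > 0 else cls_idx[:0]
        keep_idx.append(torch.cat([core, boundary]))
    return torch.cat(keep_idx)
```

In the full algorithm, the selected samples then replace the oldest entries of the fixed-capacity buffer 𝓜.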
3. Objective Function and Loss Components
ProDER combines four loss terms, integrated into a single optimization objective for each minibatch:
$$\mathcal{L}_{\text{total}} = \mathcal{L}_{\text{CE}} + \alpha\,\mathcal{L}_{\text{KD}} + \beta\,\mathcal{L}_{\text{attr}} + \gamma\,\mathcal{L}_{\text{rep}},$$
where:
- $\mathcal{L}_{\text{CE}}$: cross-entropy on the new-task samples,
- $\mathcal{L}_{\text{KD}}$: KL divergence between current and stored logits (soft targets) for replayed samples,
- $\mathcal{L}_{\text{attr}}$: mean squared distance of each feature to its class prototype (prototype attraction),
- $\mathcal{L}_{\text{rep}}$: repulsion term encouraging separation of class prototypes.
The loss weights are scenario-specific hyperparameters, with different values used for the class-incremental and domain-incremental tasks.
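A rough PyTorch-style sketch of how the four terms can be combined in one training step is shown below. The weight names (`alpha`, `beta`, `gamma`) and the exponential form of the repulsion term are illustrative assumptions; the exact formulation follows the paper.

```python
import torch
import torch.nn.functional as F

def proder_loss(new_logits, new_labels,
                mem_logits, stored_logits,
                mem_feats, mem_labels,
                prototypes,              # dict: class id -> prototype vector
                alpha, beta, gamma):
    # 1) Cross-entropy on new-task samples
    loss_ce = F.cross_entropy(new_logits, new_labels)

    # 2) Dark-experience distillation: KL between current and stored logits
    loss_kd = F.kl_div(F.log_softmax(mem_logits, dim=1),
                       F.softmax(stored_logits, dim=1),
                       reduction="batchmean")

    # 3) Prototype attraction: pull features toward their class prototype
    proto_targets = torch.stack([prototypes[int(y)] for y in mem_labels])
    loss_attr = F.mse_loss(mem_feats, proto_targets)

    # 4) Prototype repulsion: push class prototypes apart
    P = torch.stack(list(prototypes.values()))        # (C, d)
    pdist = torch.cdist(P, P)                         # pairwise distances
    mask = ~torch.eye(len(P), dtype=torch.bool)
    loss_rep = torch.exp(-pdist[mask]).mean()         # penalize close prototypes

    return loss_ce + alpha * loss_kd + beta * loss_attr + gamma * loss_rep
```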
4. Feature Extractor, Memory Buffer, and Architecture
ProDER uses a unified recurrent neural architecture suitable for multivariate sensor time series:
- Input: windows of multivariate sensor time series (fixed-length sequences of time steps, each with multiple features).
- Feature extractor $\phi_\theta$: bidirectional GRU with 150 hidden units per direction ($300$-dimensional embedding per time step), followed by $0.3$ dropout.
- Classifier head: a dynamic fully connected layer mapping the embedding to class logits, whose output dimension expands as new classes are introduced; weights for previously seen classes remain unchanged.
- Replay memory $\mathcal{M}$: stores tuples $(x_i, y_i, z_i)$ of inputs, labels, and logits. Memory is capped at a fixed number of samples (a small fraction of the dataset) and updated with the prototype-biased strategy above to retain both typical and boundary samples for each class.
This architecture avoids task-specific heads and maintains a fixed computational and memory profile, favoring scalability.
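A minimal PyTorch sketch of such an architecture is given below. The GRU width and dropout follow the description above; the sequence pooling (taking the last time step) and the head-expansion mechanics are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ProDERNet(nn.Module):
    """BiGRU feature extractor with a dynamically expandable classifier head."""

    def __init__(self, n_features, n_classes, hidden=150, dropout=0.3):
        super().__init__()
        self.gru = nn.GRU(n_features, hidden, batch_first=True, bidirectional=True)
        self.dropout = nn.Dropout(dropout)
        self.head = nn.Linear(2 * hidden, n_classes)    # 300-dim embedding -> logits

    def features(self, x):                   # x: (batch, time_steps, n_features)
        out, _ = self.gru(x)
        return self.dropout(out[:, -1, :])   # last time step as the 300-dim embedding

    def forward(self, x):
        return self.head(self.features(x))

    @torch.no_grad()
    def expand_head(self, n_new):
        """Add output units for new classes; previously learned weights are kept."""
        old = self.head
        new = nn.Linear(old.in_features, old.out_features + n_new)
        new.weight[:old.out_features] = old.weight
        new.bias[:old.out_features] = old.bias
        self.head = new
```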
5. Continual Learning Scenarios and Evaluation Protocol
ProDER was evaluated under four distinct, realistic continual learning scenarios for evolving smart grids:
| Scenario | Task Sequence | Description |
|---|---|---|
| 1 | +2 fault types per task (tasks 1-5) | 3 classes initially, 2 new per task, total 11 |
| 2 | +1 fault type per task (tasks 1-9) | 3 classes initially, 1 new per task, total 11 |
| 3 | Domain-incremental (zones) | All 11 faults, new input distribution per task (new grid zone each task) |
| 4 | +1 grid-zone class per task (tasks 1-3) | 2 zones initially, add one per task, total 4 zone classes |
Performance is reported as final accuracy (ACC) on all seen classes after training on the last task, together with the gap to a joint-training upper bound (a single model trained offline with access to all data).
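As a small illustration of the reported metric, the gap is simply the difference between the joint-training accuracy and a method's final accuracy (variable names are illustrative):

```python
def accuracy_gap(method_acc: float, joint_acc: float) -> float:
    """Gap to the joint-training upper bound; smaller is better."""
    return joint_acc - method_acc

# Scenario 1 values from the results table below:
print(accuracy_gap(0.613, 0.658))  # ProDER gap ≈ 0.045
```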
6. Experimental Results and Comparative Performance
ProDER consistently outperforms both experience replay (ER) and dark experience replay (DER++), narrowing the gap to the joint-training upper bound by roughly 40-65% relative to DER++ across all scenarios. Notable results include:
| Scenario | Joint ACC/gap | ER ACC/gap | DER++ ACC/gap | ProDER ACC/gap |
|---|---|---|---|---|
| 1 (faults +2) | 0.658 / 0.000 | 0.459 / 0.199 | 0.558 / 0.100 | 0.613 / 0.045 |
| 2 (faults +1) | 0.658 / 0.000 | 0.462 / 0.196 | 0.519 / 0.139 | 0.576 / 0.082 |
| 3 (zones) | 0.658 / 0.000 | 0.524 / 0.134 | 0.524 / 0.134 | 0.610 / 0.048 |
| 4 (zone-incr.) | 0.981 / 0.000 | 0.917 / 0.064 | 0.956 / 0.025 | 0.966 / 0.015 |
A steady memory footprint and independence from task-specific heads establish ProDER's scalability. The results demonstrate that the added prototype objectives yield a more robust feature-space structure and improved knowledge retention.
7. Implications, Limitations, and Future Directions
Implications:
ProDER offers a practical framework for continual smart-grid fault prediction that is deployable in real-world edge scenarios. Prototype-based regularization structures class-wise representations, supporting fault-class separability and potentially improving explainability and anomaly detection in grid monitoring.
Limitations:
The approach presumes known task boundaries, which may not always be available in live operational settings. Replay memory requires storing authentic sensor windows, possibly conflicting with regulatory constraints.
Future Directions:
ProDER could be extended to handle unknown or fuzzy task transitions, to incorporate additional metadata (e.g., fault severity), or to support mixed event types in real-time deployments. Evaluating adaptation to blurred or undetected task changes remains an open research challenge (Efatinasab et al., 7 Nov 2025).
References:
- "ProDER: A Continual Learning Approach for Fault Prediction in Evolving Smart Grids" (Efatinasab et al., 7 Nov 2025).
- "Dark Experience for General Continual Learning: a Strong, Simple Baseline" (Buzzega et al., 2020).