On-Device Federated Continual Learning on RISC-V-based Ultra-Low-Power SoC for Intelligent Nano-Drone Swarms (2503.17436v2)

Published 21 Mar 2025 in cs.LG, cs.CV, and cs.MA

Abstract: RISC-V-based architectures are paving the way for efficient On-Device Learning (ODL) in smart edge devices. When applied across multiple nodes, ODL enables the creation of intelligent sensor networks that preserve data privacy. However, developing ODL-capable, battery-operated embedded platforms presents significant challenges due to constrained computational resources and limited device lifetime, besides intrinsic learning issues such as catastrophic forgetting. We face these challenges by proposing a regularization-based On-Device Federated Continual Learning algorithm tailored for multiple nano-drones performing face recognition tasks. We demonstrate our approach on a RISC-V-based 10-core ultra-low-power SoC, optimizing the ODL computational requirements. We improve the classification accuracy by 24% over naive fine-tuning, requiring 178 ms per local epoch and 10.5 s per global epoch, demonstrating the effectiveness of the architecture for this task.

Summary

RISC-V-based architectures are enabling efficient On-Device Learning (ODL) on smart edge devices, which in turn allows sensor networks to learn while keeping data private. Battery-operated embedded platforms make ODL hard: compute is constrained, device lifetime is limited, and continual learning suffers from catastrophic forgetting. To address these challenges, the paper introduces a regularization-based algorithm for On-Device Federated Continual Learning (ODFCL) tailored to nano-drones performing face recognition.

The authors propose a regularization-based approach to On-Device Continual Learning (ODCL) that mitigates catastrophic forgetting in a federated learning setting. The approach improves classification accuracy by 24% over naive fine-tuning and requires 178 ms per local epoch and 10.5 s per global epoch. The computation is optimized for a RISC-V-based 10-core ultra-low-power System-on-Chip (SoC), which supports its viability in extreme edge scenarios.

Key Contributions and Methodological Insights

Algorithm Design: Regularization-based ODCL is central to the methodology. Combining a Mean Output Loss (MOL) with FedProx lets the model learn incrementally across a network of nano-drones without storing large amounts of historical data in memory.
Implementation and Optimization: The GAP9Shield, featuring a multi-core RISC-V SoC, provides the computational foundation for the framework. Using an integerized backbone and the SoC's SIMD extensions, the implementation parallelizes DNN execution on the multi-core SoC.
Performance Metrics: The authors measure both computational latency and energy usage, reporting 178 ms per local epoch and 10.5 s per global epoch. These results indicate a workable balance between operational efficiency and learning accuracy, which makes the platform a candidate for real-time applications with stringent power constraints.
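
The federated part of the algorithm can be illustrated with a minimal sketch of the standard FedProx local update and a sample-weighted server aggregation. This is not the authors' implementation: the function names (`fedprox_penalty`, `fedprox_step`, `aggregate`), the flat weight vectors, and the hyperparameters `mu` and `lr` are assumptions made for illustration, and the Mean Output Loss is left out because this summary does not define it.

```python
from typing import List, Sequence

Vector = List[float]


def fedprox_penalty(local_w: Sequence[float], global_w: Sequence[float], mu: float) -> float:
    """FedProx proximal term: (mu / 2) * ||w - w_global||^2."""
    return 0.5 * mu * sum((w - g) ** 2 for w, g in zip(local_w, global_w))


def fedprox_step(local_w: Sequence[float], global_w: Sequence[float],
                 task_grad: Sequence[float], lr: float, mu: float) -> Vector:
    """One SGD step on (task loss + proximal term).

    The gradient of the proximal term is mu * (w - w_global), which pulls
    each client's weights back toward the last global model.
    """
    return [w - lr * (g + mu * (w - gw))
            for w, g, gw in zip(local_w, task_grad, global_w)]


def aggregate(client_weights: Sequence[Sequence[float]],
              client_sizes: Sequence[int]) -> Vector:
    """Server step: average client weights, weighted by local sample count."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [sum(n * w[i] for w, n in zip(client_weights, client_sizes)) / total
            for i in range(dim)]


if __name__ == "__main__":
    # Hypothetical toy setup: two drones, one global round.
    global_w = [0.0, 0.0]
    # Each client minimizes 0.5 * ||w - target||^2, so its gradient is (w - target).
    targets = [[1.0, 0.0], [0.0, 1.0]]
    updated = []
    for target in targets:
        w = list(global_w)
        for _ in range(5):  # a few local steps
            grad = [wi - ti for wi, ti in zip(w, target)]
            w = fedprox_step(w, global_w, grad, lr=0.1, mu=0.5)
        updated.append(w)
    print(aggregate(updated, client_sizes=[10, 30]))
```

A larger `mu` keeps local models closer to the global one, which limits client drift at the cost of slower local adaptation; the value that suits a nano-drone swarm would have to be tuned experimentally.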

Practical and Theoretical Implications

The distributed learning approach is a meaningful step for intelligent sensor networks, particularly in IoT settings that demand strong privacy guarantees. The architectural flexibility of the RISC-V platform further widens the scope for scalable, low-power AI applications that adapt in the field. In practical terms, the methodology lets autonomous devices such as UAVs keep learning in dynamically changing environments, a useful step towards autonomous learning systems.

Future Prospects

Improving model accuracy by using additional on-chip resources is one avenue for continued development. Hybrid strategies that combine model regularization with latent representation storage could also refine the trade-off between memory efficiency and the mitigation of catastrophic forgetting. Such advances would support adaptable, intelligent systems with a broader range of applications on AI-driven edge devices.
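
One way to picture such a hybrid strategy is a small fixed-capacity buffer of latent activations whose replay loss is added to the regularized objective. The sketch below is purely illustrative and is not described in the paper: the class `LatentReplayBuffer`, the use of reservoir sampling to decide what to keep, and the weights `lam` and `beta` in `hybrid_loss` are all assumptions.

```python
import random
from typing import List, Tuple

Latent = List[float]


class LatentReplayBuffer:
    """Fixed-capacity store of (latent, label) pairs, filled by reservoir sampling.

    Reservoir sampling keeps every example seen so far with equal probability,
    so memory stays bounded no matter how long the drone keeps learning.
    """

    def __init__(self, capacity: int, seed: int = 0) -> None:
        self.capacity = capacity
        self.items: List[Tuple[Latent, int]] = []
        self.seen = 0
        self._rng = random.Random(seed)

    def add(self, latent: Latent, label: int) -> None:
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append((latent, label))
            return
        # Replace a stored item with probability capacity / seen.
        j = self._rng.randrange(self.seen)
        if j < self.capacity:
            self.items[j] = (latent, label)

    def sample(self, k: int) -> List[Tuple[Latent, int]]:
        """Draw up to k stored pairs without replacement."""
        return self._rng.sample(self.items, min(k, len(self.items)))


def hybrid_loss(task_loss: float, reg_loss: float, replay_loss: float,
                lam: float, beta: float) -> float:
    """Weighted sum of the new-task loss, a regularization term, and a replay term."""
    return task_loss + lam * reg_loss + beta * replay_loss


if __name__ == "__main__":
    buf = LatentReplayBuffer(capacity=8)
    for i in range(100):
        buf.add([float(i), float(i) / 2.0], label=i % 4)
    print(len(buf.items), buf.seen, len(buf.sample(3)))
```

The buffer capacity sets the memory cost directly, so the trade-off mentioned above becomes a single tunable number alongside the two loss weights.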

In conclusion, while challenges remain in optimizing ODFCL for wider application, the insights and numerical evidence presented provide a solid basis on which next-generation sensor networks may build. The paper's conclusions support RISC-V as an architecture capable of sustaining meaningful advances in AI at the edge.

Authors (4)

Tweets

https://twitter.com/pulp_platform/status/1904433060751286387

https://twitter.com/pulp_platform/status/1924529627545940334