Steering a Predator Robot using a Mixed Frame/Event-Driven Convolutional Neural Network (1606.09433v1)

Published 30 Jun 2016 in cs.RO and cs.CV

Abstract: This paper describes the application of a Convolutional Neural Network (CNN) in the context of a predator/prey scenario. The CNN is trained and run on data from a Dynamic and Active Pixel Sensor (DAVIS) mounted on a Summit XL robot (the predator), which follows another one (the prey). The CNN is driven by both conventional image frames and dynamic vision sensor "frames" that consist of a constant number of DAVIS ON and OFF events. The network is thus "data driven" at a sample rate proportional to the scene activity, so the effective sample rate varies from 15 Hz to 240 Hz depending on the robot speeds. The network generates four outputs: steer right, left, center and non-visible. After off-line training on labeled data, the network is imported on the on-board Summit XL robot which runs jAER and receives steering directions in real time. Successful results on closed-loop trials, with accuracies up to 87% or 92% (depending on evaluation criteria) are reported. Although the proposed approach discards the precise DAVIS event timing, it offers the significant advantage of compatibility with conventional deep learning technology without giving up the advantage of data-driven computing.

Citations (114)

Summary

  • The paper proposes a novel mixed frame/event-driven CNN to guide a predator robot by leveraging neuromorphic sensor data.
  • It integrates DVS and APS inputs and, after training on over 500,000 augmented frames, reaches test accuracies of 87%-92% depending on the evaluation criterion.
  • The lightweight network, with only around 10,000 parameters, enables efficient real-time processing on low-power platforms.

Overview of Steering a Predator Robot using a Mixed Frame/Event-Driven Convolutional Neural Network

The paper "Steering a Predator Robot using a Mixed Frame/Event-Driven Convolutional Neural Network" presents an approach to robotic navigation in a predator/prey scenario that combines a Convolutional Neural Network (CNN) with the Dynamic Vision Sensor (DVS) and Active Pixel Sensor (APS) outputs of a neuromorphic camera. The aim is robust, efficient real-time navigation in which processing is data-driven, paced by the activity the camera observes.

Research Context and Methodology

The authors employ the DAVIS sensor, a neuromorphic camera that outputs both DVS events and APS frames. This dual input lets the CNN be driven at a sample rate proportional to scene activity, spanning roughly 15 Hz to 240 Hz depending on the robots' speeds. The robotic platform is a Summit XL mobile robot tasked with steering towards a tele-operated prey robot inside a controlled arena, using the CNN's predictions to guide its movement.
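
Although the summary does not include code, the constant-event-count "frames" it describes can be sketched as a simple accumulator: events are binned into a 2D image until a fixed number has arrived, so the frame rate tracks scene activity. In the sketch below the (x, y, polarity) event layout, the 240x180 resolution, and the 5,000-event threshold are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def event_frames(events, width=240, height=180, events_per_frame=5000):
    """Yield a 2D frame each time `events_per_frame` events have been seen.

    `events` is an iterable of (x, y, polarity) tuples with polarity in {+1, -1}.
    """
    frame = np.zeros((height, width), dtype=np.float32)
    count = 0
    for x, y, polarity in events:
        # ON events add to the pixel, OFF events subtract; whether the paper
        # keeps signed counts or separate ON/OFF channels is an assumption here.
        frame[y, x] += polarity
        count += 1
        if count == events_per_frame:
            yield frame
            frame = np.zeros((height, width), dtype=np.float32)
            count = 0
```

Because a frame is emitted only after a fixed quota of events, a fast-moving prey produces frames at a high rate while a static scene produces almost none, which is the data-driven behavior the paper exploits.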

The network is trained to classify the prey's location into one of three regions of the visual field (Left, Center, or Right) or to label it as Non-visible (LCRN). The training data comprises over 500,000 augmented frames, curated to cover varying environmental conditions and orientations so that the classifier remains robust.

Network Architecture and Performance

The implemented CNN is deliberately compact, with only around 10,000 parameters, keeping computational cost low while maintaining satisfactory performance. The final runtime network uses a 4C5-R-2S-4C5-R-2S-40F-R-4F structure: two stages of four 5x5 convolutional feature maps, each followed by a rectification and 2x2 subsampling, then a 40-unit fully connected layer and a 4-way output corresponding to the LCRN classes.
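
As a rough illustration of how compact this layout is, the following PyTorch sketch instantiates the 4C5-R-2S-4C5-R-2S-40F-R-4F sequence. The 36x36 single-channel input size and the use of max pooling for the subsampling stages are assumptions made for illustration (the exact parameter count depends on the input resolution), and this is not the authors' original implementation.

```python
import torch
import torch.nn as nn

class SteeringCNN(nn.Module):
    """4C5-R-2S-4C5-R-2S-40F-R-4F, decoded as conv/pool/FC stages."""

    def __init__(self, in_size=36):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 4, kernel_size=5),  # 4C5: four 5x5 feature maps
            nn.ReLU(),                       # R
            nn.MaxPool2d(2),                 # 2S: 2x2 subsampling (pooling type assumed)
            nn.Conv2d(4, 4, kernel_size=5),  # 4C5
            nn.ReLU(),                       # R
            nn.MaxPool2d(2),                 # 2S
        )
        # Infer the flattened size for the assumed input resolution.
        with torch.no_grad():
            n_flat = self.features(torch.zeros(1, 1, in_size, in_size)).numel()
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(n_flat, 40),           # 40F: 40-unit fully connected layer
            nn.ReLU(),                       # R
            nn.Linear(40, 4),                # 4F: logits for Left/Center/Right/Non-visible
        )

    def forward(self, x):
        return self.classifier(self.features(x))

model = SteeringCNN()
print(sum(p.numel() for p in model.parameters()))  # total parameter count
```

Even at this small scale, the two pooled convolutional stages give the fully connected layers a compressed summary of the input, which is what keeps the parameter count in the low thousands.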

In terms of accuracy, the network achieved test set accuracies between 87% and 92%, depending on the evaluation criteria. This performance demonstrates the network's ability not only to identify and track the prey but also to recognize 'non-visible' states, which requires classifying a complex background scene that contains no target features.

Practical Implications and Theoretical Insights

From a practical standpoint, the research demonstrates the potential of integrating event-driven neuromorphic sensors with conventional CNN architectures in autonomous robotic systems. By adapting the processing rate to sensory input activity, the approach promises operational efficiency, particularly on low-power computational platforms.

Theoretically, the work underscores the viability of small CNN configurations in real-time applications: roughly 350,000 operations per forward pass keep the processing budget modest. The paper also points to avenues for future enhancement, such as estimating prey size as a proxy for distance, or adapting APS frame capture to the DVS event rate.
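
To put that figure in context, a quick back-of-the-envelope calculation using only the numbers quoted above gives the implied arithmetic throughput at the two ends of the sample-rate range:

```python
# Implied throughput: ~350,000 operations per forward pass at 15-240 Hz.
ops_per_pass = 350_000
for rate_hz in (15, 240):
    print(f"{rate_hz:3d} Hz -> {ops_per_pass * rate_hz / 1e6:.1f} MOp/s")
```

Even at the 240 Hz upper bound this is roughly 84 million operations per second, a modest load for an on-board computer.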

Speculations on Future Developments

This fusion of DVS-driven data efficiency and deep learning's robust image interpretation capabilities is likely to spur further exploration into real-time sensory processing for autonomous navigation. Potential applications include expanded multisensory integration for dynamic environments, progressing from constrained laboratory conditions to more unpredictable real-world scenarios.

In summary, this paper contributes substantial evidence for the benefits of mixed frame/event-driven paradigms in robotic vision systems, with promising implications for both control algorithm optimization and future AI-driven automation.
