From Detection to Action Recognition: An Edge-Based Pipeline for Robot Human Perception (2312.03477v1)

Published 6 Dec 2023 in cs.RO, cs.CV, and cs.LG

Abstract: Mobile service robots are proving to be increasingly effective in a range of applications, such as healthcare, monitoring Activities of Daily Living (ADL), and facilitating Ambient Assisted Living (AAL). These robots heavily rely on Human Action Recognition (HAR) to interpret human actions and intentions. However, for HAR to function effectively on service robots, it requires prior knowledge of human presence (human detection) and identification of individuals to monitor (human tracking). In this work, we propose an end-to-end pipeline that encompasses the entire process, starting from human detection and tracking, leading to action recognition. The pipeline is designed to operate in near real-time while ensuring all stages of processing are performed on the edge, reducing the need for centralised computation. To identify the most suitable models for our mobile robot, we conducted a series of experiments comparing state-of-the-art solutions based on both their detection performance and efficiency. To evaluate the effectiveness of our proposed pipeline, we introduce a dataset comprising daily household activities. By presenting our findings and analysing the results, we demonstrate the efficacy of our approach in enabling mobile robots to understand and respond to human behaviour in real-world scenarios relying mainly on the data from their RGB cameras.

Summary

  • The paper presents a novel pipeline integrating OpenPose-based detection, 3D tracking, and X3D action recognition on mobile robots.
  • The pipeline operates efficiently on edge devices, enabling near real-time decision-making despite environmental challenges.
  • Extensive experiments using a custom dataset demonstrate high prediction accuracy and resource-efficient performance in dynamic real-world settings.

Introduction to HAR in Mobile Robots

The role of robots in society is expanding, particularly in areas such as healthcare and ambient assisted living (AAL), where they can provide valuable assistance. These service robots are increasingly sophisticated, often being tasked to interpret human activities and respond accordingly. This is where Human Action Recognition (HAR) comes into play: a complex process that entails detecting human presence, tracking movement, and ultimately understanding human actions.

The Proposed Pipeline

To ensure efficient and reliable HAR on mobile service robots, we present a comprehensive pipeline that encompasses the entire process, from initial detection to action recognition. The pipeline is designed to operate in near real-time on edge devices, meaning it runs directly on the robot rather than relying on remote processing power. This is critical for autonomous mobile robots that need to make swift decisions based on human activity.

The process starts with human detection, using OpenPose to locate the skeletal keypoints of the humans captured by the robot's camera. From there, detected individuals are tracked in 3D within the robot's operating space; this tracking is crucial for the robot to contextualize its observations in the environment. The next steps identify the user with a face recognition algorithm and then perform action recognition, which leverages the robot's sensory data to understand and categorize human actions.
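
As a rough, hypothetical sketch of this stage ordering (not the authors' code), the Python below wires detection, 3D tracking, face recognition, and clip-based action recognition into a single per-frame function. The stage bodies are stubs that would be replaced by the actual OpenPose, tracking, face-recognition, and X3D components, and the 16-frame clip length is an illustrative assumption.

```python
# Hypothetical sketch of the pipeline's stage ordering; stage bodies are placeholders.
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class Person:
    keypoints_2d: List[Tuple[float, float, float]]  # (x, y, confidence) per joint
    track_id: Optional[int] = None                   # assigned by the 3D tracker
    identity: Optional[str] = None                   # assigned by face recognition

def detect_humans(rgb_frame) -> List[Person]:
    """Placeholder: run a 2D pose estimator (e.g. OpenPose) on the frame."""
    return []

def track_3d(people: List[Person], depth_frame) -> List[Person]:
    """Placeholder: lift keypoints to 3D and associate them with existing tracks."""
    for i, person in enumerate(people):
        person.track_id = i
    return people

def recognize_face(person: Person, rgb_frame) -> Person:
    """Placeholder: crop the head region and run face recognition to name the user."""
    person.identity = "unknown"
    return person

def classify_action(clip) -> str:
    """Placeholder: run a clip-based action-recognition model (e.g. X3D)."""
    return "unknown_action"

def process_frame(rgb_frame, depth_frame, clip_buffer: list, clip_len: int = 16) -> List[str]:
    """One pass of the pipeline: detect -> track in 3D -> identify -> recognize action."""
    people = detect_humans(rgb_frame)
    people = track_3d(people, depth_frame)
    people = [recognize_face(p, rgb_frame) for p in people]
    clip_buffer.append(rgb_frame)            # action recognition operates on a short clip
    if len(clip_buffer) < clip_len:          # wait until enough frames are buffered
        return []
    action = classify_action(clip_buffer[-clip_len:])
    return [f"{p.identity or 'person'}: {action}" for p in people]
```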

Challenges and Solutions

Several challenges arise when deploying HAR systems on mobile robots, such as viewpoint variation, inconsistent lighting, and obstructions in the robot's field of view. Overcoming these requires not only a carefully designed pipeline but also lightweight yet efficient algorithms that can handle such conditions in real time without overwhelming the robot's computational resources.

To determine the most fitting models for our system, we conducted extensive experiments comparing state-of-the-art detection and recognition solutions, focusing on both detection performance and efficiency. This led us to adopt OpenPose for human detection and X3D for action recognition.
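
As an illustration of what the action-recognition stage could look like in isolation, the snippet below loads a pretrained X3D model from the public PyTorchVideo hub and classifies a dummy clip. The `x3d_m` variant, the 16-frame 224x224 input shape, and the Kinetics-400 label space follow the PyTorchVideo release and are assumptions here, not the paper's exact model configuration or training data.

```python
# Hedged sketch of clip-based action recognition with a pretrained X3D model
# from the public PyTorchVideo hub; not the paper's exact configuration.
import torch

# Load X3D-M pretrained on Kinetics-400 (downloads weights on first run).
model = torch.hub.load("facebookresearch/pytorchvideo", "x3d_m", pretrained=True)
model.eval()

# X3D expects a clip tensor shaped (batch, channels, frames, height, width);
# a random tensor stands in for 16 normalized RGB frames cropped around the person.
clip = torch.randn(1, 3, 16, 224, 224)

with torch.no_grad():
    probs = torch.softmax(model(clip), dim=-1)
    top5 = torch.topk(probs, k=5).indices[0].tolist()

print("Top-5 Kinetics-400 class indices:", top5)
```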

Dataset and Experimental Results

To evaluate our system, we compiled a robust dataset that combines publicly available datasets with data specifically captured for this project. This new dataset contains a variety of daily activities recorded from the robot's perspective, thus addressing the real-world scenarios the robot would encounter. When assessing various HAR models against our dataset, we prioritized models that offered a good balance between accuracy and resource utilization.

The efficacy of our approach is demonstrated through rigorous testing on a dedicated mobile robot platform, which showed promising real-time performance while maintaining high prediction accuracy. We also varied the experiments, for example by simulating different environmental conditions and user interactions, to ensure the system copes with the dynamic and varied conditions of real-world applications.

Conclusion

This work presents a viable end-to-end solution for recognizing human actions from a mobile robot in real time. It not only promises to enhance the ability of service robots to interpret human behavior, but also sets a precedent for future advancements in robotic human perception. Our findings offer a pathway to improve the interaction between service robots and the people they assist, potentially transforming how such robots are utilized in critical sectors like healthcare.