Motor Focus: Fast Ego-Motion Prediction for Assistive Visual Navigation (2404.17031v2)
Abstract: Assistive visual navigation systems for visually impaired individuals have become increasingly popular thanks to the rise of mobile computing. Most of these devices work by translating visual information into voice commands. In complex scenarios with multiple objects present, it is imperative to prioritize object detection and to immediately announce key entities in specific directions. This creates the need to identify the observer's motion direction (ego-motion) purely from visual information, which is the key contribution of this paper. Specifically, we introduce Motor Focus, a lightweight image-based framework that predicts ego-motion, i.e., the movement intentions of humans (and humanoid machines) based on their visual feeds, while filtering out camera motion without any camera calibration. To this end, we implement an optical flow-based, pixel-wise temporal analysis method to compensate for camera motion, with Gaussian aggregation to smooth the predicted movement area. To evaluate performance, we collect a dataset of 50 clips of pedestrian scenes across 5 different scenarios and compare our framework against classical feature detectors such as SIFT and ORB. Our framework demonstrates superiority in speed (>40 FPS), accuracy (MAE = 60 pixels), and robustness (SNR = 23 dB), confirming its potential to enhance the usability of vision-based assistive navigation tools in complex environments.
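The abstract's core idea, inferring the observer's movement direction from an optical-flow field while down-weighting peripheral motion with a Gaussian, can be illustrated with a focus-of-expansion estimate. The sketch below is a minimal pure-Python analogue, not the paper's implementation: `estimate_foe`, the Gaussian-weighting scheme, and all parameter values are illustrative assumptions. Each flow vector at a pixel defines a line through that pixel; for forward ego-motion those lines intersect near the focus of expansion, which we recover by weighted least squares.

```python
import math

def estimate_foe(points, flows, sigma=200.0, center=(320.0, 240.0)):
    """Weighted least-squares focus-of-expansion from sparse flow vectors.

    For forward ego-motion, the flow (u, v) at pixel (x, y) points away
    from the focus of expansion (FoE), so the FoE lies on the line through
    (x, y) with direction (u, v).  Each line contributes the constraint
    n . e = n . p, where n is the unit normal to the flow direction.  We
    solve the resulting 2x2 normal equations, weighting each constraint
    by a Gaussian on its distance to the image centre (a stand-in for the
    paper's Gaussian aggregation; sigma and center are illustrative).
    """
    a11 = a12 = a22 = b1 = b2 = 0.0
    for (x, y), (u, v) in zip(points, flows):
        norm = math.hypot(u, v)
        if norm < 1e-9:
            continue  # no motion at this pixel: no line constraint
        nx, ny = v / norm, -u / norm          # unit normal to the flow
        d2 = (x - center[0]) ** 2 + (y - center[1]) ** 2
        w = math.exp(-d2 / (2.0 * sigma ** 2))
        c = nx * x + ny * y                   # line constraint: n . e = c
        a11 += w * nx * nx
        a12 += w * nx * ny
        a22 += w * ny * ny
        b1 += w * nx * c
        b2 += w * ny * c
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        return center  # degenerate flow field: fall back to the centre
    ex = (a22 * b1 - a12 * b2) / det
    ey = (a11 * b2 - a12 * b1) / det
    return ex, ey

if __name__ == "__main__":
    # Synthetic radial flow field expanding from (300, 200).
    pts = [(x, y) for x in range(100, 600, 50) for y in range(50, 450, 50)]
    flw = [(0.05 * (x - 300.0), 0.05 * (y - 200.0)) for x, y in pts]
    print(estimate_foe(pts, flw))
```

In a full pipeline one would feed in a dense flow field (e.g. from Farneback's polynomial-expansion method) and smooth the estimate over time; here the synthetic radial field simply verifies that the geometry is recovered.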