DMODE: Differential Monocular Object Distance Estimation Module without Class Specific Information
Abstract: Using a single camera to measure object distances is a cost-effective alternative to stereo vision and LiDAR. Although monocular distance estimation has been explored in the literature, most existing techniques rely on object class knowledge to achieve high performance. Without this contextual information, monocular distance estimation becomes more challenging, lacking reference points and object-specific cues. Moreover, such cues can be misleading for objects with wide intra-class size variation or in adversarial situations, which makes object-agnostic distance estimation particularly demanding. In this paper, we propose DMODE, a class-agnostic method for monocular distance estimation that requires no object class knowledge. DMODE estimates an object's distance by fusing the change in its apparent size over time with the camera's motion, making it adaptable to various object detectors and to unknown objects, thus addressing these challenges. We evaluate our model on the KITTI MOTS dataset using ground-truth bounding box annotations as well as outputs from TrackRCNN and EagerMOT. The object's location is determined from the change in bounding box size and the camera's position, without reference to the object's detection source or class attributes. Our approach outperforms conventional methods in multi-class object distance estimation scenarios.
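The core idea — relating the change in an object's apparent size to the camera's own motion under a pinhole model — can be sketched as follows. This is an illustrative simplification (a static object with the camera translating along its optical axis), not the paper's full formulation; the function name and variables are hypothetical:

```python
def estimate_distance(h1, h2, delta_z):
    """Estimate the initial distance d1 to a static object from its
    apparent sizes h1 and h2 (pixels) at two time steps and the
    camera's forward displacement delta_z (meters) between them.

    Pinhole model: apparent size h = f * H / d for focal length f and
    true object size H, so h1 / h2 = d2 / d1. With d2 = d1 - delta_z:
        h1 / h2 = 1 - delta_z / d1  =>  d1 = delta_z / (1 - h1 / h2)
    Note that class knowledge (the true size H) cancels out entirely.
    """
    if h2 == h1:
        raise ValueError("apparent size unchanged; distance is unobservable")
    return delta_z / (1.0 - h1 / h2)
```

For example, an object whose bounding-box height grows from 8 px to 10 px while the camera advances 2 m must initially have been 10 m away; the subsequent distance follows as d2 = d1 - delta_z = 8 m.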