IDD-X: A Multi-View Dataset for Ego-relative Important Object Localization and Explanation in Dense and Unstructured Traffic (2404.08561v2)
Abstract: Intelligent vehicle systems require a deep understanding of the interplay between road conditions, surrounding entities, and the ego vehicle's driving behavior for safe and efficient navigation. This is particularly critical in developing countries, where traffic situations are often dense and unstructured, with heterogeneous road occupants. Existing datasets, predominantly geared towards structured and sparse traffic scenarios, fall short of capturing the complexity of driving in such environments. To fill this gap, we present IDD-X, a large-scale dual-view driving video dataset. With 697K bounding boxes, 9K important-object tracks, and 1-12 objects per video, IDD-X offers comprehensive ego-relative annotations for multiple important road objects, covering 10 object categories and 19 explanation label categories. The dataset also incorporates rear-view information for a more complete representation of the driving environment. We further introduce custom-designed deep networks for localizing multiple important objects and predicting a per-object explanation. Together, the dataset and the proposed prediction models form a foundation for studying how road conditions and surrounding entities affect driving behavior in complex traffic situations.
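To make the annotation structure described above concrete, here is a minimal sketch of how a per-track IDD-X record could be represented and queried in Python. All names (`ImportantObjectTrack`, `VideoAnnotation`, `boxes`, `explanation`, etc.) are illustrative assumptions based on the abstract, not the dataset's released schema or tooling.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical layout for one important-object track in an IDD-X video.
# Field names are assumptions inferred from the abstract, not the actual schema.
@dataclass
class ImportantObjectTrack:
    track_id: int
    category: str        # one of the 10 important-object categories
    explanation: str     # one of the 19 ego-relative explanation labels
    view: str            # "front" or "rear" camera view
    # Per-frame boxes as (frame_index, x1, y1, x2, y2)
    boxes: List[Tuple[int, float, float, float, float]] = field(default_factory=list)

@dataclass
class VideoAnnotation:
    video_id: str
    tracks: List[ImportantObjectTrack] = field(default_factory=list)  # 1-12 per video

def explanation_histogram(videos: List[VideoAnnotation]) -> Dict[str, int]:
    """Count how often each explanation label occurs across all annotated tracks."""
    counts: Dict[str, int] = {}
    for video in videos:
        for track in video.tracks:
            counts[track.explanation] = counts.get(track.explanation, 0) + 1
    return counts
```

A loader built on such a schema would let the two prediction tasks share one structure: the box sequences feed important-object localization, while the `explanation` label per track supervises per-object explanation prediction.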
Authors:
- Chirag Parikh
- Rohit Saluja
- C. V. Jawahar
- Ravi Kiran Sarvadevabhatla