MSight: An Edge-Cloud Infrastructure-based Perception System for Connected Automated Vehicles (2310.05290v1)
Abstract: As vehicular communication and networking technologies continue to advance, infrastructure-based roadside perception is emerging as a pivotal tool for connected automated vehicle (CAV) applications. Thanks to their elevated mounting positions, roadside sensors such as cameras and lidars often enjoy unobstructed views with less object occlusion, giving them a distinct advantage over onboard perception and enabling more robust and accurate detection of road objects. This paper presents MSight, a roadside perception system designed specifically for CAVs. MSight provides real-time vehicle detection, localization, tracking, and short-term trajectory prediction. Evaluations show that the system maintains lane-level accuracy with minimal latency, pointing to a range of potential applications for enhancing CAV safety and efficiency. MSight currently operates 24/7 at a two-lane roundabout in the City of Ann Arbor, Michigan.
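The abstract describes a detect, localize, track, and predict pipeline running at the roadside. The paper's implementation is not reproduced here, so the sketch below is only an illustration, under stated assumptions, of one standard way the tracking and short-term prediction stages of such a pipeline can be built: per-frame detections are associated to existing tracks with the Hungarian algorithm, each track is smoothed by a constant-velocity Kalman filter, and short-term prediction rolls the same motion model forward. All names, noise parameters, and the frame interval below are hypothetical and not taken from MSight.

```python
# Hypothetical sketch of a roadside tracking/prediction stage (NOT MSight's
# actual implementation): one constant-velocity Kalman filter per track, with
# Hungarian data association (scipy's linear_sum_assignment) between frames.
import numpy as np
from scipy.optimize import linear_sum_assignment

DT = 0.1  # assumed frame interval in seconds (hypothetical)
# Constant-velocity state [x, y, vx, vy]; only position [x, y] is observed.
F = np.array([[1, 0, DT, 0],
              [0, 1, 0, DT],
              [0, 0, 1, 0],
              [0, 0, 0, 1]], dtype=float)
H = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0]], dtype=float)
Q = 0.01 * np.eye(4)   # process noise covariance (tuning assumption)
R = 0.25 * np.eye(2)   # measurement noise covariance (tuning assumption)

class Track:
    def __init__(self, xy):
        self.x = np.array([xy[0], xy[1], 0.0, 0.0])  # initial state
        self.P = np.eye(4)                           # state covariance

    def predict(self):
        """Kalman time update; returns predicted position for association."""
        self.x = F @ self.x
        self.P = F @ self.P @ F.T + Q
        return self.x[:2]

    def update(self, z):
        """Kalman measurement update with an associated detection z = [x, y]."""
        y = z - H @ self.x                       # innovation
        S = H @ self.P @ H.T + R                 # innovation covariance
        K = self.P @ H.T @ np.linalg.inv(S)      # Kalman gain
        self.x = self.x + K @ y
        self.P = (np.eye(4) - K @ H) @ self.P

    def forecast(self, horizon=10):
        """Short-term trajectory prediction: roll the motion model forward."""
        x, out = self.x.copy(), []
        for _ in range(horizon):
            x = F @ x
            out.append(x[:2].copy())
        return np.array(out)

def associate(tracks, detections, gate=5.0):
    """Match detections (N x 2 positions) to tracks by minimum total distance,
    discarding matches farther than `gate` (meters, a gating assumption)."""
    detections = np.asarray(detections, dtype=float)
    if not tracks or len(detections) == 0:
        return []
    preds = np.array([t.predict() for t in tracks])       # (T, 2)
    cost = np.linalg.norm(preds[:, None, :] - detections[None, :, :], axis=2)
    rows, cols = linear_sum_assignment(cost)
    return [(r, c) for r, c in zip(rows, cols) if cost[r, c] < gate]
```

In a full loop one would call associate() each frame, update matched tracks with their detections, spawn new tracks for unmatched detections, retire stale tracks, and use forecast() for the kind of short-term trajectory prediction the abstract mentions.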