
WoodScape Motion Segmentation for Autonomous Driving -- CVPR 2023 OmniCV Workshop Challenge (2401.00910v2)

Published 31 Dec 2023 in cs.CV and cs.LG

Abstract: Motion segmentation is a complex yet indispensable task in autonomous driving. The challenges introduced by the ego-motion of the cameras, radial distortion in fisheye lenses, and the need for temporal consistency make the task more complicated, rendering traditional and standard Convolutional Neural Network (CNN) approaches less effective. The consequent laborious data labeling, the representation of diverse and uncommon scenarios, and the extensive data capture requirements underscore the importance of synthetic data for improving machine learning model performance. To this end, we employ the PD-WoodScape synthetic dataset developed by Parallel Domain, alongside the WoodScape fisheye dataset. Thus, we present the WoodScape fisheye motion segmentation challenge for autonomous driving, held as part of the CVPR 2023 Workshop on Omnidirectional Computer Vision (OmniCV). As one of the first competitions focused on fisheye motion segmentation, we aim to explore and evaluate the potential and impact of utilizing synthetic data in this domain. In this paper, we provide a detailed analysis of the competition, which attracted 112 global teams and a total of 234 submissions. This study delineates the complexities inherent in the task of motion segmentation, emphasizes the significance of fisheye datasets, articulates the necessity for synthetic datasets and the resultant domain gap they engender, and outlines the foundational blueprint for devising successful solutions. Subsequently, we delve into the details of the baseline experiments and winning methods, evaluating their qualitative and quantitative results and providing useful insights.
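The abstract mentions evaluating submissions quantitatively. As an illustration only (not the challenge's official scoring code), the following is a minimal sketch of an intersection-over-union style metric on binary motion masks, assuming predictions and ground truth are available as per-frame NumPy boolean arrays; the function name and toy data are hypothetical.

```python
import numpy as np

def motion_iou(pred_mask: np.ndarray, gt_mask: np.ndarray) -> float:
    """IoU for a single binary motion mask.

    pred_mask, gt_mask: arrays of shape (H, W) where True marks pixels
    predicted / annotated as moving.
    """
    pred = pred_mask.astype(bool)
    gt = gt_mask.astype(bool)
    intersection = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    # Convention: if neither mask contains moving pixels, count as perfect.
    return float(intersection) / union if union > 0 else 1.0

# Toy example: mean IoU over a few random frames.
rng = np.random.default_rng(0)
preds = [rng.random((96, 96)) > 0.5 for _ in range(4)]
gts = [rng.random((96, 96)) > 0.5 for _ in range(4)]
mean_iou = np.mean([motion_iou(p, g) for p, g in zip(preds, gts)])
print(f"mean motion IoU: {mean_iou:.3f}")
```

In practice such a score would be averaged over all frames (and cameras) of the test set; the challenge's actual evaluation protocol is described in the paper itself.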

