
NeurAll: Towards a Unified Visual Perception Model for Automated Driving (1902.03589v3)

Published 10 Feb 2019 in cs.CV, cs.LG, cs.RO, and stat.ML

Abstract: Convolutional Neural Networks (CNNs) are successfully used for important automotive visual perception tasks, including object recognition, motion and depth estimation, and visual SLAM. However, these tasks are typically explored and modeled independently. In this paper, we propose a joint multi-task network design for learning several tasks simultaneously. Our main motivation is the computational efficiency achieved by sharing the expensive initial convolutional layers between all tasks; the main bottleneck in automated driving systems is the limited processing power available on deployment hardware. There is also evidence of other benefits, such as improved accuracy on some tasks and reduced development effort. The design also scales to additional tasks by leveraging existing features, and achieves better generalization. We survey various CNN-based solutions for visual perception tasks in automated driving, then propose a unified CNN model for the important tasks and discuss several advanced optimization and architecture design techniques to improve the baseline model. The paper is partly a review and partly positional, with demonstrations of several preliminary results that are promising for future research. We first demonstrate results of multi-stream learning and auxiliary learning, which are important ingredients for scaling to a large multi-task model. Finally, we implement a two-stream three-task network which, in many cases, performs better than the corresponding single-task models while maintaining network size.
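The efficiency argument in the abstract can be sketched with simple FLOP accounting: independent single-task models each repeat the expensive encoder, whereas a unified model runs it once and adds only a lightweight decoder per task. The GFLOP figures below are illustrative assumptions for this sketch, not numbers from the paper.

```python
def total_flops(encoder_flops, decoder_flops_per_task, num_tasks, shared):
    """Rough per-frame cost. Independent models re-run the encoder for every
    task; a unified multi-task model runs it once plus one decoder per task."""
    if shared:
        return encoder_flops + num_tasks * decoder_flops_per_task
    return num_tasks * (encoder_flops + decoder_flops_per_task)


# Assumed split (hypothetical): early convolutional layers dominate the cost.
ENCODER = 4.0e9   # 4 GFLOPs for the shared feature extractor
DECODER = 0.5e9   # 0.5 GFLOPs for each task-specific head

independent = total_flops(ENCODER, DECODER, num_tasks=3, shared=False)
unified = total_flops(ENCODER, DECODER, num_tasks=3, shared=True)
print(f"independent: {independent / 1e9:.1f} GFLOPs")  # 13.5 GFLOPs
print(f"unified:     {unified / 1e9:.1f} GFLOPs")      # 5.5 GFLOPs
```

Under these assumptions the three-task unified model costs well under half the compute of three single-task models, which is the bottleneck argument made for low-power deployment hardware.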

References (47 entries, not cited inline on this page)
Authors (7)
  1. Ganesh Sistu (44 papers)
  2. Isabelle Leang (6 papers)
  3. Sumanth Chennupati (10 papers)
  4. Senthil Yogamani (81 papers)
  5. Ciaran Hughes (22 papers)
  6. Stefan Milz (23 papers)
  7. Samir Rawashdeh (5 papers)
Citations (26)
