Reusable Architecture Growth for Continual Stereo Matching (2404.00360v1)

Published 30 Mar 2024 in cs.CV

Abstract: The remarkable performance of recent stereo depth estimation models benefits from the successful use of convolutional neural networks to regress dense disparity. As with most tasks, this requires gathering training data that covers the range of heterogeneous scenes encountered at deployment time. However, training samples are typically acquired continuously in practical applications, making the capability to learn new scenes continually even more crucial. For this purpose, we propose to perform continual stereo matching, in which a model is tasked to 1) continually learn new scenes, 2) overcome forgetting previously learned scenes, and 3) continuously predict disparities at inference. We achieve this goal by introducing a Reusable Architecture Growth (RAG) framework. RAG leverages task-specific neural unit search and architecture growth to learn new scenes continually in both supervised and self-supervised manners, and it maintains high reusability during growth by reusing previous units while still obtaining good performance. Additionally, we present a Scene Router module that adaptively selects the scene-specific architecture path at inference. Comprehensive experiments on numerous datasets show that our framework performs impressively under various weather, road, and city conditions and surpasses state-of-the-art methods in more challenging cross-dataset settings. Further experiments also demonstrate the adaptability of our method to unseen scenes, which can facilitate end-to-end stereo architecture learning and practical deployment.
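
The abstract's central mechanics, a growing pool of neural units, per-scene architecture paths that reuse earlier units, and a Scene Router that selects a path at inference, can be illustrated with a short sketch. The code below is only a rough illustration under stated assumptions: names such as NeuralUnit, RAGSketch, and add_scene are hypothetical, the task-specific unit search is omitted, and the feature shapes are placeholders rather than anything taken from the paper's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class NeuralUnit(nn.Module):
    """A small residual conv block standing in for a searched, task-specific unit."""

    def __init__(self, channels: int):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn = nn.BatchNorm2d(channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return F.relu(self.bn(self.conv(x)) + x)


class RAGSketch(nn.Module):
    """Grows one architecture path per scene and reuses earlier units where possible."""

    def __init__(self, channels: int = 32):
        super().__init__()
        self.channels = channels
        self.units = nn.ModuleList()  # shared pool of units, grown over time
        self.paths = []               # per-scene sequences of unit indices
        self.router = None            # rebuilt whenever a new scene is added

    def add_scene(self, reuse, num_new_units):
        """Register a new scene: reuse existing unit indices and grow new units."""
        new_ids = []
        for _ in range(num_new_units):
            self.units.append(NeuralUnit(self.channels))
            new_ids.append(len(self.units) - 1)
        self.paths.append(list(reuse) + new_ids)
        # Scene Router stand-in: a tiny classifier from pooled features to a scene index.
        self.router = nn.Linear(self.channels, len(self.paths))

    def forward(self, feat: torch.Tensor, scene_id=None) -> torch.Tensor:
        if scene_id is None:
            # Inference: the router picks the scene-specific path from pooled features.
            pooled = feat.mean(dim=(2, 3))
            scene_id = int(self.router(pooled).argmax(dim=1)[0])
        for unit_id in self.paths[scene_id]:
            feat = self.units[unit_id](feat)
        return feat


# Usage: grow paths for two scenes; the second reuses a unit grown for the first.
model = RAGSketch(channels=32)
model.add_scene(reuse=[], num_new_units=2)   # scene 0: two newly grown units
model.add_scene(reuse=[0], num_new_units=1)  # scene 1: reuse unit 0, grow one more
features = torch.randn(1, 32, 64, 128)       # stands in for stereo matching features
out = model(features)                        # router selects which path to run
print(out.shape)
```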
