360VOTS: Visual Object Tracking and Segmentation in Omnidirectional Videos (2404.13953v1)

Published 22 Apr 2024 in cs.CV

Abstract: Visual object tracking and segmentation in omnidirectional videos are challenging due to the wide field-of-view and large spherical distortion introduced by 360° images. To alleviate these problems, we introduce a novel representation, extended bounding field-of-view (eBFoV), for target localization and use it as the foundation of a general 360 tracking framework applicable to both omnidirectional visual object tracking and segmentation tasks. Building upon our previous work on omnidirectional visual object tracking (360VOT), we propose a comprehensive dataset and benchmark that incorporates a new component called omnidirectional video object segmentation (360VOS). The 360VOS dataset includes 290 sequences accompanied by dense pixel-wise masks and covers a broader range of target categories. To support both the development and evaluation of algorithms in this domain, we divide the dataset into a training subset with 170 sequences and a testing subset with 120 sequences. Furthermore, we tailor evaluation metrics for both omnidirectional tracking and segmentation to ensure rigorous assessment. Through extensive experiments, we benchmark state-of-the-art approaches and demonstrate the effectiveness of our proposed 360 tracking framework and training dataset. Homepage: https://360vots.hkustvgd.com/
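To make the geometry concrete, below is a minimal sketch, not the paper's eBFoV implementation: the parameterization of a field-of-view region by a center longitude/latitude and an angular extent is an assumption for illustration. It shows how an FoV-centered representation can sidestep equirectangular distortion: instead of cropping an axis-aligned pixel box, the target region is resampled through an inverse gnomonic (tangent-plane) projection, which yields a locally near-perspective view and naturally handles targets that cross the longitude seam of the image.

import numpy as np

def gnomonic_crop(equi, lon0, lat0, fov, out_size=256):
    """Resample a square, roughly perspective (gnomonic) crop of `fov`
    degrees centered at spherical coordinates (lon0, lat0), in degrees,
    from an equirectangular image `equi` of shape (H, W[, C])."""
    H, W = equi.shape[:2]
    lam0, phi0 = np.radians(lon0), np.radians(lat0)

    # Tangent-plane sampling grid; coordinates span +/- tan(fov / 2).
    half = np.tan(np.radians(fov) / 2.0)
    u = np.linspace(-half, half, out_size)
    x, y = np.meshgrid(u, -u)  # y points "up" on the tangent plane

    # Inverse gnomonic projection: plane (x, y) -> sphere (lam, phi).
    rho = np.hypot(x, y)
    c = np.arctan(rho)
    sin_c, cos_c = np.sin(c), np.cos(c)
    with np.errstate(invalid="ignore"):
        ratio = np.where(rho == 0.0, 0.0, y * sin_c * np.cos(phi0) / rho)
    phi = np.arcsin(np.clip(cos_c * np.sin(phi0) + ratio, -1.0, 1.0))
    lam = lam0 + np.arctan2(x * sin_c,
                            rho * np.cos(phi0) * cos_c - y * sin_c * np.sin(phi0))

    # Sphere -> equirectangular pixels; the modulo wraps longitudes so a
    # target straddling the 180-degree seam is sampled seamlessly.
    px = (((lam / (2.0 * np.pi) + 0.5) * W) % W).astype(int)
    py = np.clip(((0.5 - phi / np.pi) * H).astype(int), 0, H - 1)
    return equi[py, np.clip(px, 0, W - 1)]

A conventional perspective tracker or segmenter can then run on such a crop and have its output mapped back to the sphere. This is the general idea that FoV-based representations like the paper's eBFoV build on; the exact eBFoV definition and the full framework are given in the paper itself.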

Authors (4)
  1. Yinzhe Xu (2 papers)
  2. Huajian Huang (12 papers)
  3. Yingshu Chen (9 papers)
  4. Sai-Kit Yeung (52 papers)
