
CRSOT: Cross-Resolution Object Tracking using Unaligned Frame and Event Cameras (2401.02826v1)

Published 5 Jan 2024 in cs.CV and cs.NE

Abstract: Existing datasets for RGB-DVS tracking are collected with the DVS346 camera, whose resolution ($346 \times 260$) is too low for practical applications. In practice, many deployed systems use only visible cameras, and newly designed neuromorphic cameras may have different resolutions. The latest neuromorphic sensors can output high-definition event streams, but it is very difficult to achieve strict spatial and temporal alignment between events and frames. Accurate tracking with unaligned neuromorphic and visible sensors is therefore a valuable but unresearched problem. In this work, we formally propose the task of object tracking using unaligned neuromorphic and visible cameras. We build the first unaligned frame-event dataset, CRSOT, collected with a specially built data acquisition system; it contains 1,030 high-definition RGB-Event video pairs with 304,974 video frames. In addition, we propose a novel object tracking framework that achieves robust tracking even on loosely aligned RGB-Event data. Specifically, we extract the template and search regions of the RGB and Event data and feed them into a unified ViT backbone for feature embedding. We then propose uncertainty perception modules to encode the RGB and Event features, respectively, and a modality uncertainty fusion module to aggregate the two modalities. These three branches are jointly optimized during training. Extensive experiments demonstrate that our tracker can coordinate the dual modalities for high-performance tracking even without strict temporal and spatial alignment. The source code, dataset, and pre-trained models will be released at https://github.com/Event-AHU/Cross_Resolution_SOT.
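
The abstract outlines a three-branch pipeline: a shared ViT backbone embeds template and search regions from each modality, per-modality uncertainty perception modules encode the features, and a fusion module aggregates them. The minimal PyTorch sketch below illustrates one plausible reading of that design; it is not the authors' implementation. All names (`UncertaintyPerception`, `ModalityUncertaintyFusion`, `UnalignedTracker`), the Gaussian reparameterization, the inverse-variance fusion rule, the token pooling, and the 3-channel event-frame representation are illustrative assumptions; the authoritative code is at the repository linked above.

```python
# Minimal sketch of an uncertainty-aware RGB-Event tracker.
# Module names, shapes, and the fusion rule are assumptions for illustration.
import torch
import torch.nn as nn


class UncertaintyPerception(nn.Module):
    """Encodes one modality's features as a Gaussian (mean, log-variance),
    one common way to model feature uncertainty (assumed here)."""
    def __init__(self, dim: int):
        super().__init__()
        self.mu = nn.Linear(dim, dim)
        self.log_var = nn.Linear(dim, dim)

    def forward(self, feat):
        mu, log_var = self.mu(feat), self.log_var(feat)
        if self.training:  # reparameterization trick during training
            feat = mu + torch.randn_like(mu) * torch.exp(0.5 * log_var)
        else:
            feat = mu
        return feat, log_var


class ModalityUncertaintyFusion(nn.Module):
    """Fuses RGB and Event features, weighting each by its inverse
    predicted variance -- one plausible reading of the abstract."""
    def forward(self, rgb, rgb_log_var, evt, evt_log_var):
        w_rgb = torch.exp(-rgb_log_var)
        w_evt = torch.exp(-evt_log_var)
        return (w_rgb * rgb + w_evt * evt) / (w_rgb + w_evt + 1e-6)


class UnalignedTracker(nn.Module):
    def __init__(self, dim: int = 256):
        super().__init__()
        # Stand-in for the unified ViT backbone: a patch embedding per
        # modality (resolutions may differ) plus shared transformer layers.
        self.embed_rgb = nn.Conv2d(3, dim, kernel_size=16, stride=16)
        self.embed_evt = nn.Conv2d(3, dim, kernel_size=16, stride=16)  # assumes 3-channel event frames
        layer = nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=4)
        self.unc_rgb = UncertaintyPerception(dim)
        self.unc_evt = UncertaintyPerception(dim)
        self.fusion = ModalityUncertaintyFusion()
        self.head = nn.Linear(dim, 4)  # box regression stand-in (cx, cy, w, h)

    def tokens(self, embed, template, search):
        # Concatenate template and search tokens, as in one-stream ViT trackers.
        t = embed(template).flatten(2).transpose(1, 2)
        s = embed(search).flatten(2).transpose(1, 2)
        return self.backbone(torch.cat([t, s], dim=1))

    def forward(self, rgb_t, rgb_s, evt_t, evt_s):
        # Mean-pool tokens per modality so unaligned resolutions can still
        # be fused (a simplification; spatial fusion is also possible).
        rgb = self.tokens(self.embed_rgb, rgb_t, rgb_s).mean(dim=1)
        evt = self.tokens(self.embed_evt, evt_t, evt_s).mean(dim=1)
        rgb, rgb_lv = self.unc_rgb(rgb)
        evt, evt_lv = self.unc_evt(evt)
        return self.head(self.fusion(rgb, rgb_lv, evt, evt_lv))


# Usage: note the two modalities need not share a resolution or be aligned.
tracker = UnalignedTracker().eval()
box = tracker(torch.randn(1, 3, 128, 128), torch.randn(1, 3, 256, 256),
              torch.randn(1, 3, 96, 96), torch.randn(1, 3, 192, 192))
print(box.shape)  # torch.Size([1, 4])
```

Because each modality is tokenized independently before fusion, the RGB and Event inputs can differ in resolution and need not be pixel-aligned, which mirrors the cross-resolution, unaligned setting the paper targets.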
