Annolid: Annotate, Segment, and Track Anything You Need (2403.18690v1)
Abstract: Annolid is a deep learning-based software package designed for the segmentation, labeling, and tracking of research targets within video files, focusing primarily on animal behavior analysis. Built on state-of-the-art instance segmentation methods, Annolid now harnesses the Cutie video object segmentation model to achieve resilient, markerless tracking of multiple animals from single annotated frames, even when the animals are partially or entirely occluded by environmental features or by one another. Its integration of the Segment Anything and Grounding-DINO models additionally enables the automatic masking and segmentation of recognizable animals and objects via text prompts, removing the need for manual annotation. Annolid's comprehensive approach to object segmentation flexibly accommodates a broad spectrum of behavior analysis applications, enabling the classification of diverse behavioral states such as freezing, digging, pup huddling, and social interactions, in addition to the tracking of animals and their body parts.
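The text-prompted pipeline the abstract describes combines Grounding-DINO (open-vocabulary box detection from a text prompt) with Segment Anything (box-to-mask conversion). Below is a minimal sketch of that pipeline using the public groundingdino and segment_anything packages; the checkpoint paths, image path, prompt, and thresholds are illustrative placeholders, and the sketch shows the general approach rather than Annolid's internal API.

```python
# Minimal sketch of text-prompted segmentation: Grounding-DINO proposes
# boxes from a free-text prompt, and SAM converts each box into an
# instance mask. Paths, the prompt, and thresholds are placeholders,
# not Annolid's actual configuration.
import torch
from groundingdino.util.inference import load_model, load_image, predict
from segment_anything import sam_model_registry, SamPredictor

# 1. Detect candidate animals from a text prompt (placeholder paths).
dino = load_model("GroundingDINO_SwinT_OGC.py", "groundingdino_swint_ogc.pth")
image_source, image = load_image("frame_0000.png")  # RGB array + model tensor
boxes, logits, phrases = predict(
    model=dino, image=image, caption="mouse",
    box_threshold=0.35, text_threshold=0.25,
)

# 2. Convert Grounding-DINO's normalized cxcywh boxes to absolute xyxy.
h, w, _ = image_source.shape
boxes_xyxy = boxes * torch.tensor([w, h, w, h])
boxes_xyxy[:, :2] -= boxes_xyxy[:, 2:] / 2  # center -> top-left corner
boxes_xyxy[:, 2:] += boxes_xyxy[:, :2]      # width/height -> bottom-right

# 3. Prompt SAM with each detected box to obtain a binary mask.
sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")
predictor = SamPredictor(sam)
predictor.set_image(image_source)
masks = []
for box in boxes_xyxy.numpy():
    mask, score, _ = predictor.predict(box=box, multimask_output=False)
    masks.append(mask[0])  # boolean (H, W) mask for one detected instance
```

In the workflow the abstract describes, masks produced this way for a single frame (or drawn manually) serve as the initial annotation that Cutie then propagates through the remainder of the video.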
- C. Yang, J. Forest, M. Einhorn, and T. A. Cleland, “Automated behavioral analysis using instance segmentation,” arXiv:2312.07723, 2023.
- S. Liu, Z. Zeng, T. Ren, F. Li, H. Zhang, J. Yang, C. Li, J. Yang, H. Su, J. Zhu et al., “Grounding DINO: Marrying DINO with grounded pre-training for open-set object detection,” arXiv:2303.05499, 2023.
- A. Kirillov, E. Mintun, N. Ravi, H. Mao, C. Rolland, L. Gustafson, T. Xiao, S. Whitehead, A. C. Berg, W.-Y. Lo, P. Dollár, and R. Girshick, “Segment Anything,” arXiv:2304.02643, 2023.
- L. Ke, M. Ye, M. Danelljan, Y. Liu, Y.-W. Tai, C.-K. Tang, and F. Yu, “Segment Anything in high quality,” arXiv:2306.01567, 2023.
- C. Zhang, D. Han, Y. Qiao, J. U. Kim, S.-H. Bae, S. Lee, and C. S. Hong, “Faster Segment Anything: Towards lightweight SAM for mobile applications,” arXiv:2306.14289, 2023.
- H. K. Cheng, S. W. Oh, B. Price, J.-Y. Lee, and A. Schwing, “Putting the object back into video object segmentation,” arXiv:2310.12982, 2023.
- F. Romero-Ferrero, M. G. Bergomi, R. C. Hinz, F. J. Heras, and G. G. De Polavieja, “idtracker.ai: tracking all individuals in small or large collectives of unmarked animals,” Nature Methods, vol. 16, no. 2, pp. 179–182, 2019.
- K. Wada, “Labelme: Image polygonal annotation with Python.” [Online]. Available: https://github.com/wkentaro/labelme
- K. He, G. Gkioxari, P. Dollár, and R. Girshick, “Mask R-CNN,” in ICCV, 2017.
- Y. Wu, A. Kirillov, F. Massa, W.-Y. Lo, and R. Girshick, “Detectron2,” 2019. [Online]. Available: https://github.com/facebookresearch/detectron2
- J. Fang, C. Yang, and T. A. Cleland, “Scoring rodent digging behavior with Annolid,” Soc. Neurosci. Abstr. 512.01, 2023.
- C. Zhou, X. Li, C. C. Loy, and B. Dai, “EdgeSAM: Prompt-in-the-loop distillation for on-device deployment of SAM,” arXiv:2312.06660, 2023.
- W. Wang, “Advanced auto labeling solution with added features,” CVHub, 2023. [Online]. Available: https://github.com/CVHub520/X-AnyLabeling
- G. Bradski, “The OpenCV Library,” Dr. Dobb’s Journal of Software Tools, 2000.
- H. K. Cheng, S. W. Oh, B. Price, A. Schwing, and J.-Y. Lee, “Tracking anything with decoupled video segmentation,” in ICCV, 2023.
- T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick, “Microsoft COCO: Common objects in context,” in ECCV, 2014.
- T. D. Pereira, N. Tabris, A. Matsliah, D. M. Turner, J. Li, S. Ravindranath, E. S. Papadoyannis, E. Normand, D. S. Deutsch, Z. Y. Wang, G. C. McKenzie-Smith, C. C. Mitelut, M. D. Castro, J. D’Uva, M. Kislin, D. H. Sanes, S. D. Kocher, S. S.-H. Wang, A. L. Falkner, J. W. Shaevitz, and M. Murthy, “SLEAP: A deep learning system for multi-animal pose tracking,” Nature Methods, vol. 19, no. 4, 2022.
- J. Lauer, M. Zhou, S. Ye, W. Menegas, S. Schneider, T. Nath, M. M. Rahman, V. Di Santo, D. Soberanes, G. Feng, V. N. Murthy, G. Lauder, C. Dulac, M. Mathis, and A. Mathis, “Multi-animal pose estimation, identification and tracking with DeepLabCut,” Nature Methods, vol. 19, pp. 496–504, 2022.
- A. Pérez-Escudero, J. Vicente-Page, R. C. Hinz, S. Arganda, and G. G. De Polavieja, “idTracker: tracking individuals in a group by automatic identification of unmarked animals,” Nature Methods, vol. 11, no. 7, pp. 743–748, 2014.
- C. Kim, F. Li, A. Ciptadi, and J. M. Rehg, “Multiple hypothesis tracking revisited,” in ICCV, 2015.
- S. Tang, M. Andriluka, B. Andres, and B. Schiele, “Multiple people tracking by lifted multicut and person re-identification,” in CVPR, 2017.
- P. Bergmann, T. Meinhardt, and L. Leal-Taixe, “Tracking without bells and whistles,” in ICCV, 2019.
- H. K. Cheng and A. G. Schwing, “XMem: Long-term video object segmentation with an Atkinson-Shiffrin memory model,” in ECCV, 2022.
- A. Athar, J. Luiten, P. Voigtlaender, T. Khurana, A. Dave, B. Leibe, and D. Ramanan, “BURST: A benchmark for unifying object recognition, segmentation and tracking in video,” in WACV, 2023.
- X. Zou, J. Yang, H. Zhang, F. Li, L. Li, J. Gao, and Y. J. Lee, “Segment everything everywhere all at once,” arXiv:2304.06718, 2023.
- F. Li, H. Zhang, P. Sun, X. Zou, S. Liu, J. Yang, C. Li, L. Zhang, and J. Gao, “Semantic-SAM: Segment and recognize anything at any granularity,” arXiv:2307.04767, 2023.
- X. Zhao, W. Ding, Y. An, Y. Du, T. Yu, M. Li, M. Tang, and J. Wang, “Fast Segment Anything,” arXiv:2306.12156, 2023.
- Y. Xiong, B. Varadarajan, L. Wu, X. Xiang, F. Xiao, C. Zhu, X. Dai, D. Wang, F. Sun, F. Iandola, R. Krishnamoorthi, and V. Chandra, “EfficientSAM: Leveraged masked image pretraining for efficient Segment Anything,” arXiv:2312.00863, 2023.
- H. Cai, C. Gan, and S. Han, “EfficientViT: Enhanced linear attention for high-resolution low-computation visual recognition,” arXiv:2205.14756, 2022.