An overview on the evaluated video retrieval tasks at TRECVID 2022 (2306.13118v1)

Published 22 Jun 2023 in cs.AI, cs.CV, and cs.IR

Abstract: The TREC Video Retrieval Evaluation (TRECVID) is a TREC-style video analysis and retrieval evaluation with the goal of promoting progress in research and development of content-based exploitation and retrieval of information from digital video via open, tasks-based evaluation supported by metrology. Over the last twenty-one years, this effort has yielded a better understanding of how systems can effectively accomplish such processing and how one can reliably benchmark their performance. TRECVID has been funded by NIST (National Institute of Standards and Technology) and other US government agencies. In addition, many organizations and individuals worldwide contribute significant time and effort. TRECVID 2022 planned for the following six tasks: Ad-hoc video search, Video to text captioning, Disaster scene description and indexing, Activity in extended videos, Deep video understanding, and Movie summarization. In total, 35 teams from various research organizations worldwide signed up to join the evaluation campaign this year. This paper introduces the tasks, datasets used, evaluation frameworks and metrics, as well as a high-level results overview.
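
The abstract names the evaluation metrics only at a high level. For context, the ad-hoc video search task is scored by mean extended inferred average precision (xinfAP), which estimates average precision from a sampled, incompletely judged pool of submitted shots. The snippet below is a minimal illustrative sketch, not code from the paper, of plain average precision over a fully judged ranked list of shot IDs; the function and variable names are hypothetical.

# Hedged sketch: average precision (AP) for one query, assuming complete
# relevance judgments. TRECVID's official xinfAP additionally corrects for
# judging only a sampled pool of the submitted shots.
def average_precision(ranked_shot_ids, relevant_shot_ids):
    """AP of a ranked list of shot IDs against a set of relevant IDs."""
    relevant = set(relevant_shot_ids)
    hits = 0
    precision_sum = 0.0
    for rank, shot_id in enumerate(ranked_shot_ids, start=1):
        if shot_id in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at each recall point
    return precision_sum / len(relevant) if relevant else 0.0

# Example: relevant shots B and D are retrieved at ranks 1 and 4.
print(average_precision(["B", "A", "C", "D"], ["B", "D"]))  # (1/1 + 2/4) / 2 = 0.75

A system's run is then summarized by the mean of this per-query score over all evaluated queries.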
