SLVideo: A Sign Language Video Moment Retrieval Framework (2407.15668v2)

Published 22 Jul 2024 in cs.CV and cs.AI

Abstract: SLVideo is a video moment retrieval system for Sign Language videos that incorporates facial expressions, addressing a gap in existing technology. The system extracts embedding representations of the hand and face signs from video frames to capture each sign in its entirety, enabling users to search for a specific sign language video segment with text queries. The dataset is a collection of eight hours of annotated Portuguese Sign Language videos, and a CLIP model generates the embeddings. Initial results in a zero-shot setting are promising. In addition, SLVideo incorporates a thesaurus that lets users search for signs similar to those retrieved, using the video segment embeddings, and supports editing and creating sign language video annotations. Project web page: https://novasearch.github.io/SLVideo/
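
The retrieval idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the three-dimensional vectors stand in for CLIP embeddings, and the segment names, the `face_weight` parameter, and the weighted-average fusion of hand and face embeddings are all assumptions made for illustration. It shows text-to-segment ranking by cosine similarity and a thesaurus-style lookup of segments similar to one already retrieved.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def combine(hand, face, face_weight=0.5):
    """Fuse hand and face embeddings by weighted average (an assumption,
    not necessarily how SLVideo combines them)."""
    return [(1 - face_weight) * h + face_weight * f for h, f in zip(hand, face)]

def rank(query, index, top_k=3):
    """Return the top_k (segment_id, score) pairs, most similar first."""
    scored = [(seg_id, cosine(query, emb)) for seg_id, emb in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

def similar_segments(seg_id, index, top_k=2):
    """Thesaurus-style lookup: segments closest to an already retrieved one."""
    others = {k: v for k, v in index.items() if k != seg_id}
    return rank(index[seg_id], others, top_k)

# Hypothetical toy embeddings; a real system would obtain these from a CLIP
# image encoder applied to hand and face crops of each video segment.
segments = {
    "seg_hello":  {"hand": [1.0, 0.0, 0.0], "face": [0.8, 0.2, 0.0]},
    "seg_thanks": {"hand": [0.0, 1.0, 0.0], "face": [0.1, 0.9, 0.0]},
    "seg_hi":     {"hand": [0.9, 0.1, 0.0], "face": [0.9, 0.0, 0.1]},
}
index = {seg_id: combine(e["hand"], e["face"]) for seg_id, e in segments.items()}

# Hypothetical text-query embedding; a real system would obtain it from the
# CLIP text encoder so that it lives in the same space as the segments.
query = [1.0, 0.1, 0.0]

for seg_id, score in rank(query, index):
    print(f"{seg_id}: {score:.3f}")
print(similar_segments("seg_hello", index, top_k=1))
```

Because both the query and the segments are compared in one shared embedding space, the same `rank` function serves text search and the similar-sign lookup; only the source of the query vector changes.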

Authors (5)
  1. Gonçalo Vinagre Martins
  2. Afonso Quinaz
  3. Carla Viegas
  4. Sofia Cavaco
  5. João Magalhães
