SLVideo: A Sign Language Video Moment Retrieval Framework (2407.15668v2)
Abstract: SLVideo is a video moment retrieval system for Sign Language videos that takes facial expressions into account, addressing a gap in existing technology. The system extracts embedding representations of the hand and face signs from video frames to capture each sign in its entirety, enabling users to search for a specific sign language video segment with text queries. The dataset is a collection of eight hours of annotated Portuguese Sign Language videos, and a CLIP model is used to generate the embeddings. Initial results in a zero-shot setting are promising. In addition, SLVideo incorporates a thesaurus that lets users search for signs similar to those retrieved, using the video segment embeddings, and it supports editing and creating sign language video annotations. Project web page: https://novasearch.github.io/SLVideo/
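The retrieval and thesaurus ideas in the abstract can be illustrated with a minimal sketch. It assumes that each video segment and each text query has already been mapped to a vector in a shared embedding space (in SLVideo this is done with a CLIP model) and that segments are ranked by cosine similarity; the abstract does not state which similarity measure is used. The function names, segment identifiers, and three-dimensional toy vectors are hypothetical and serve only as illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_segments(
    query: Vector, segments: Dict[str, Vector], top_k: int = 3
) -> List[Tuple[str, float]]:
    """Return the top_k segment ids most similar to the query embedding."""
    scored = [(seg_id, cosine_similarity(query, emb)) for seg_id, emb in segments.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


def similar_segments(
    segment_id: str, segments: Dict[str, Vector], top_k: int = 3
) -> List[Tuple[str, float]]:
    """Thesaurus-style lookup: segments closest to an already retrieved one."""
    anchor = segments[segment_id]
    others = {k: v for k, v in segments.items() if k != segment_id}
    return rank_segments(anchor, others, top_k)


# Hypothetical toy embeddings; real ones would come from an embedding model.
SEGMENTS: Dict[str, Vector] = {
    "seg_001": [0.9, 0.1, 0.0],
    "seg_002": [0.8, 0.2, 0.1],
    "seg_003": [0.0, 0.1, 0.9],
}
QUERY: Vector = [1.0, 0.0, 0.0]  # stands in for an embedded text query

if __name__ == "__main__":
    print(rank_segments(QUERY, SEGMENTS, top_k=2))
    print(similar_segments("seg_003", SEGMENTS, top_k=1))
```

Reusing the same ranking function for both text-to-video search and segment-to-segment lookup reflects the abstract's point that the thesaurus operates on the video segment embeddings themselves.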
Authors: Gonçalo Vinagre Martins, Afonso Quinaz, Carla Viegas, Sofia Cavaco, João Magalhães