Transfer Learning for Cross-dataset Isolated Sign Language Recognition in Under-Resourced Datasets (2403.14534v2)

Published 21 Mar 2024 in cs.CV

Abstract: Sign language recognition (SLR) has recently achieved a breakthrough in performance thanks to deep neural networks trained on large annotated sign datasets. Of the many different sign languages, these annotated datasets are only available for a select few. Since acquiring gloss-level labels on sign language videos is difficult, learning by transferring knowledge from existing annotated sources is useful for recognition in under-resourced sign languages. This study provides a publicly available cross-dataset transfer learning benchmark from two existing public Turkish SLR datasets. We use a temporal graph convolution-based sign language recognition approach to evaluate five supervised transfer learning approaches and experiment with closed-set and partial-set cross-dataset transfer learning. Experiments demonstrate that improvement over finetuning-based transfer learning is possible with specialized supervised transfer learning methods.
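The abstract distinguishes closed-set from partial-set transfer and contrasts plain finetuning with specialized supervised transfer methods. The sketch below illustrates both ideas under stated assumptions. The partial-set check assumes the usual domain-adaptation meaning (the target dataset's classes are a strict subset of the source dataset's classes). The alignment term is the Deep CORAL correlation-alignment loss, chosen only as a familiar example of a method that adds a domain-alignment term to the classification loss; the abstract does not say whether it is among the five methods evaluated. The helper names `transfer_setting`, `covariance`, and `coral_loss` are hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def transfer_setting(source_classes: Sequence[str], target_classes: Sequence[str]) -> str:
    """Classify the label relationship between two datasets (hypothetical helper).

    closed-set: both datasets share exactly the same sign classes.
    partial-set: target classes are a strict subset of the source classes
    (assumed definition, following common domain-adaptation usage).
    """
    source, target = set(source_classes), set(target_classes)
    if source == target:
        return "closed-set"
    if target < source:
        return "partial-set"
    return "other"


def covariance(features: Sequence[Sequence[float]]) -> Matrix:
    """Unbiased (n - 1) covariance of an n x d batch of feature vectors."""
    n = len(features)
    if n < 2:
        raise ValueError("need at least two samples to estimate a covariance")
    d = len(features[0])
    means = [sum(row[j] for row in features) / n for j in range(d)]
    cov = [[0.0] * d for _ in range(d)]
    for row in features:
        centered = [row[j] - means[j] for j in range(d)]
        for a in range(d):
            for b in range(d):
                cov[a][b] += centered[a] * centered[b]
    return [[value / (n - 1) for value in r] for r in cov]


def coral_loss(source_feats: Sequence[Sequence[float]],
               target_feats: Sequence[Sequence[float]]) -> float:
    """Deep CORAL term: squared Frobenius distance between the source and
    target feature covariances, scaled by 1 / (4 d^2)."""
    cs = covariance(source_feats)
    ct = covariance(target_feats)
    d = len(cs)
    squared = sum((cs[a][b] - ct[a][b]) ** 2 for a in range(d) for b in range(d))
    return squared / (4 * d * d)
```

In a training loop this term would be added, with a weighting factor, to the cross-entropy loss on labeled data; setting the weight to zero leaves an ordinary supervised classification objective.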

Authors (4)
  1. Ahmet Alp Kindiroglu (9 papers)
  2. Ozgur Kara (10 papers)
  3. Ogulcan Ozdemir (4 papers)
  4. Lale Akarun (10 papers)
Citations (3)
