Transfer Learning for Cross-dataset Isolated Sign Language Recognition in Under-Resourced Datasets (2403.14534v2)
Abstract: Sign language recognition (SLR) has recently achieved a breakthrough in performance thanks to deep neural networks trained on large annotated sign datasets. However, such annotated datasets exist for only a select few of the many sign languages. Since acquiring gloss-level labels on sign language videos is difficult, transferring knowledge from existing annotated sources is useful for recognition in under-resourced sign languages. This study provides a publicly available cross-dataset transfer learning benchmark built from two existing public Turkish SLR datasets. We use a temporal graph convolution-based sign language recognition approach to evaluate five supervised transfer learning approaches and experiment with closed-set and partial-set cross-dataset transfer learning. Experiments demonstrate that specialized supervised transfer learning methods can improve over finetuning-based transfer learning.
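The closed-set and partial-set settings named in the abstract can be illustrated with a minimal sketch. It assumes the usual domain adaptation definitions: closed-set means the source and target datasets share the same classes, and partial-set means the target classes are a strict subset of the source classes. The function name and the gloss vocabularies are hypothetical and are not taken from the benchmark.

```python
def transfer_setting(source_classes, target_classes):
    """Name the label-space relationship between a source and a target dataset.

    Assumed definitions (standard in domain adaptation, not quoted from the paper):
      closed-set  : target classes are identical to the source classes
      partial-set : target classes are a strict subset of the source classes
    """
    source, target = set(source_classes), set(target_classes)
    if target == source:
        return "closed-set"
    if target < source:  # strict subset
        return "partial-set"
    return "other"


# Hypothetical gloss vocabularies, for illustration only.
source_glosses = {"hello", "thanks", "school", "family"}

print(transfer_setting(source_glosses, {"hello", "thanks", "school", "family"}))
print(transfer_setting(source_glosses, {"hello", "school"}))
```

Any target vocabulary containing a class absent from the source falls outside both settings and is reported as "other" here.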
Authors:
- Ahmet Alp Kindiroglu
- Ozgur Kara
- Ogulcan Ozdemir
- Lale Akarun