Investigating Semi-Supervised Learning Algorithms in Text Datasets (2401.01843v2)
Abstract: Using large training datasets enhances the generalization capabilities of neural networks. Semi-supervised learning (SSL) is useful when labeled data are scarce and unlabeled data are abundant. SSL methods that rely on data augmentation are the most successful on image datasets. In contrast, text lacks the consistent augmentation methods that images have, so augmentation-based methods are less effective on text data than on image data. In this study, we compared SSL algorithms that do not require augmentation: self-training, co-training, tri-training, and tri-training with disagreement. In the experiments, we used four text datasets covering different tasks. We examined the algorithms from a variety of perspectives by posing experimental questions and suggested several improvements. Among the algorithms, tri-training with disagreement came closest to the Oracle's performance; however, the remaining performance gap shows that new semi-supervised algorithms, or improvements to existing methods, are needed.
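To make the best-performing algorithm concrete, below is a minimal sketch of tri-training with disagreement. It assumes scikit-learn-style estimators, integer class labels, and pre-vectorized text features (e.g., TF-IDF or BERT embeddings); the function name, the `rounds` cap, and the logistic-regression base learner are illustrative choices, not the paper's exact setup.

```python
# Minimal sketch of tri-training with disagreement (assumptions: scikit-learn
# estimators, integer labels, dense 2-D feature matrices).
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LogisticRegression
from sklearn.utils import resample


def tri_training_disagreement(X_l, y_l, X_u, base=None, rounds=10):
    base = base or LogisticRegression(max_iter=1000)

    # Train three classifiers, each on a bootstrap sample of the labeled set.
    models, data = [], []
    for _ in range(3):
        Xb, yb = resample(X_l, y_l)
        models.append(clone(base).fit(Xb, yb))
        data.append((Xb, yb))

    for _ in range(rounds):
        changed = False
        preds = [m.predict(X_u) for m in models]
        for i in range(3):
            j, k = [x for x in range(3) if x != i]
            # "Disagreement" variant: pseudo-label only examples where the
            # other two models agree with each other but contradict model i.
            mask = (preds[j] == preds[k]) & (preds[j] != preds[i])
            if mask.any():
                Xi = np.vstack([data[i][0], X_u[mask]])
                yi = np.concatenate([data[i][1], preds[j][mask]])
                models[i] = clone(base).fit(Xi, yi)
                changed = True
        if not changed:  # stop early once no classifier receives new labels
            break

    def predict(X):
        # Final prediction is a majority vote over the three classifiers.
        votes = np.stack([m.predict(X) for m in models])
        return np.apply_along_axis(
            lambda col: np.bincount(col).argmax(), 0, votes)

    return predict
```

The disagreement filter is the key design choice: it keeps the pseudo-labeled additions small and targeted at each classifier's likely mistakes, which is why this variant tends to be more label-efficient than plain tri-training.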