Reinforcement Learning-Guided Semi-Supervised Learning (2405.01760v1)

Published 2 May 2024 in cs.LG and cs.AI

Abstract: In recent years, semi-supervised learning (SSL) has gained significant attention due to its ability to leverage both labeled and unlabeled data to improve model performance, especially when labeled data is scarce. However, most current SSL methods rely on heuristics or predefined rules for generating pseudo-labels and leveraging unlabeled data, and are limited to exploiting loss functions and regularization methods within the standard paradigm. In this paper, we propose a novel Reinforcement Learning (RL) Guided SSL method, RLGSSL, that formulates SSL as a one-armed bandit problem and deploys an innovative RL loss based on a weighted reward to adaptively guide the learning process of the prediction model. RLGSSL incorporates a carefully designed reward function that balances the use of labeled and unlabeled data to enhance generalization performance. A semi-supervised teacher-student framework is further deployed to increase learning stability. We demonstrate the effectiveness of RLGSSL through extensive experiments on several benchmark datasets and show that our approach achieves consistently superior performance compared to state-of-the-art SSL methods.
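To make the abstract's description more concrete, the following is a minimal PyTorch-style sketch of the general idea it outlines: a teacher-student pair in which the teacher's pseudo-labels play the role of the bandit's action, and a reward computed on labeled data weights an RL-style (REINFORCE-like) loss on unlabeled data. This is an illustrative approximation rather than the authors' implementation; the reward definition, the EMA decay, and the unlabeled-loss weighting used here are assumptions.

```python
# Hypothetical sketch of an RL-guided SSL training step (not the paper's exact method).
import torch
import torch.nn as nn
import torch.nn.functional as F


def ema_update(teacher: nn.Module, student: nn.Module, decay: float = 0.99) -> None:
    """Exponential-moving-average update of teacher weights from the student."""
    with torch.no_grad():
        for t_p, s_p in zip(teacher.parameters(), student.parameters()):
            t_p.mul_(decay).add_(s_p, alpha=1.0 - decay)


def rl_guided_ssl_step(student, teacher, optimizer,
                       x_lab, y_lab, x_unlab, unlab_weight: float = 1.0):
    """One training step combining a supervised loss with a reward-weighted RL loss."""
    student.train()

    # Supervised loss on the scarce labeled batch.
    sup_loss = F.cross_entropy(student(x_lab), y_lab)

    # Teacher produces pseudo-labels for the unlabeled batch (the bandit "action").
    with torch.no_grad():
        pseudo_hard = teacher(x_unlab).softmax(dim=-1).argmax(dim=-1)

        # Reward: assumed here to be the teacher's accuracy on the labeled batch,
        # standing in for the paper's carefully designed reward function.
        reward = (teacher(x_lab).argmax(dim=-1) == y_lab).float().mean()

    # REINFORCE-style surrogate: reward-weighted log-likelihood of the
    # teacher's pseudo-labels under the student.
    log_probs = F.log_softmax(student(x_unlab), dim=-1)
    rl_loss = -(reward * log_probs.gather(1, pseudo_hard.unsqueeze(1))).mean()

    loss = sup_loss + unlab_weight * rl_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Teacher tracks the student via EMA, as in standard teacher-student SSL.
    ema_update(teacher, student)
    return loss.item()
```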

Authors (3)
  1. Marzi Heidari
  2. Hanping Zhang
  3. Yuhong Guo