Losses over Labels: Weakly Supervised Learning via Direct Loss Construction (2212.06921v2)
Abstract: Owing to the prohibitive costs of generating large amounts of labeled data, programmatic weak supervision is a growing paradigm within machine learning. In this setting, users design heuristics that provide noisy labels for subsets of the data. These weak labels are combined (typically via a graphical model) to form pseudolabels, which are then used to train a downstream model. In this work, we question a foundational premise of the typical weakly supervised learning pipeline: given that the heuristics provide all "label" information, why do we need to generate pseudolabels at all? Instead, we propose to directly transform the heuristics themselves into corresponding loss functions that penalize differences between our model and the heuristic. By constructing losses directly from the heuristics, we can incorporate more information than is used in the standard weakly supervised pipeline, such as how the heuristics make their decisions, which explicitly informs feature selection during training. We call our method Losses over Labels (LoL) as it creates losses directly from heuristics without going through the intermediate step of a label. We show that LoL improves upon existing weak supervision methods on several benchmark text and image classification tasks and further demonstrate that incorporating gradient information leads to better performance on almost every task.
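To make the abstract's core idea concrete, below is a minimal, hypothetical PyTorch sketch of what "losses over labels" could look like: each heuristic contributes its own loss term penalizing disagreement between the model and that heuristic on the examples it covers, with an optional gradient term standing in for "how the heuristics make their decisions." The interface here (the `ABSTAIN` convention, `(label_fn, saliency)` pairs, `lol_loss`, `grad_weight`) is an illustrative assumption, not the paper's actual formulation.

```python
# Hypothetical sketch of per-heuristic loss construction (not the paper's API).
import torch
import torch.nn.functional as F

ABSTAIN = -1  # convention: a heuristic votes -1 where it abstains


def lol_loss(model, x, heuristics, grad_weight=0.0):
    """Sum of per-heuristic loss terms on a batch x; no pseudolabels are formed.

    heuristics: list of (label_fn, saliency) pairs. label_fn maps the batch to
    weak labels (LongTensor, ABSTAIN where uncovered); saliency is an optional
    per-feature mask of the features that heuristic inspects, or None.
    """
    x = x.clone().requires_grad_(grad_weight > 0)
    logits = model(x)
    total = logits.new_zeros(())

    input_grad = None
    if grad_weight > 0:
        # Sensitivity of the model's top score to each input feature,
        # kept in the graph so the penalty below remains differentiable.
        input_grad = torch.autograd.grad(
            logits.max(dim=1).values.sum(), x, create_graph=True
        )[0]

    for label_fn, saliency in heuristics:
        weak = label_fn(x)
        covered = weak != ABSTAIN
        if covered.any():
            # Agreement term: penalize disagreement with this heuristic's vote.
            total = total + F.cross_entropy(logits[covered], weak[covered])
        if input_grad is not None and saliency is not None:
            # Gradient term: encourage the model's input sensitivity to match
            # the features this heuristic is known to rely on.
            total = total + grad_weight * F.mse_loss(
                input_grad, saliency.expand_as(input_grad)
            )
    return total
```

In a training loop, one would call `lol_loss(model, x_batch, heuristics)` in place of a cross-entropy loss on pseudolabels and backpropagate as usual; the point of the sketch is only that the heuristics enter the objective directly, rather than being collapsed into a single pseudolabel first.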