
AutoWS-Bench-101: Benchmarking Automated Weak Supervision with 100 Labels (2208.14362v2)

Published 30 Aug 2022 in cs.LG, cs.AI, cs.CV, and stat.ML

Abstract: Weak supervision (WS) is a powerful method to build labeled datasets for training supervised models in the face of little-to-no labeled data. It replaces hand-labeling data with aggregating multiple noisy-but-cheap label estimates expressed by labeling functions (LFs). While it has been used successfully in many domains, weak supervision's application scope is limited by the difficulty of constructing labeling functions for domains with complex or high-dimensional features. To address this, a handful of methods have proposed automating the LF design process using a small set of ground truth labels. In this work, we introduce AutoWS-Bench-101: a framework for evaluating automated WS (AutoWS) techniques in challenging WS settings -- a set of diverse application domains on which it has been previously difficult or impossible to apply traditional WS techniques. While AutoWS is a promising direction toward expanding the application-scope of WS, the emergence of powerful methods such as zero-shot foundation models reveals the need to understand how AutoWS techniques compare or cooperate with modern zero-shot or few-shot learners. This informs the central question of AutoWS-Bench-101: given an initial set of 100 labels for each task, we ask whether a practitioner should use an AutoWS method to generate additional labels or use some simpler baseline, such as zero-shot predictions from a foundation model or supervised learning. We observe that in many settings, it is necessary for AutoWS methods to incorporate signal from foundation models if they are to outperform simple few-shot baselines, and AutoWS-Bench-101 promotes future research in this direction. We conclude with a thorough ablation study of AutoWS methods.
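The core weak-supervision workflow the abstract describes, combining multiple noisy-but-cheap labeling functions (LFs) into a single label estimate per example, can be sketched with a plain majority vote. The LFs and the `ABSTAIN` convention below are illustrative assumptions for a toy spam task, not from the paper; real WS systems typically fit a probabilistic label model over the LF outputs rather than a raw vote.

```python
ABSTAIN = -1  # convention: an LF may decline to vote on an example

# Three toy labeling functions for a binary spam task (1 = spam, 0 = not spam).
def lf_contains_digit(text):
    # Heuristic: messages with digits (phone numbers, prices) look spammy.
    return 1 if any(c.isdigit() for c in text) else ABSTAIN

def lf_short(text):
    # Heuristic: very short messages tend to be benign; otherwise abstain.
    return 0 if len(text) < 10 else ABSTAIN

def lf_has_exclaim(text):
    # Heuristic: exclamation marks suggest spam; never abstains.
    return 1 if "!" in text else 0

def majority_vote(lf_outputs):
    """Aggregate per-example LF votes into one label, ignoring abstains."""
    labels = []
    for votes in lf_outputs:
        valid = [v for v in votes if v != ABSTAIN]
        if not valid:
            labels.append(ABSTAIN)  # every LF abstained
        else:
            labels.append(max(set(valid), key=valid.count))
    return labels

texts = ["call 555 now!", "hi", "see you at 5!"]
lfs = [lf_contains_digit, lf_short, lf_has_exclaim]
votes = [[lf(t) for lf in lfs] for t in texts]
labels = majority_vote(votes)  # -> [1, 0, 1]
```

The resulting `labels` would then train an ordinary supervised model; automated WS methods, the subject of the benchmark, aim to generate LFs like these from a small labeled seed set instead of by hand.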

