
Universalizing Weak Supervision (2112.03865v3)

Published 7 Dec 2021 in cs.LG and cs.AI

Abstract: Weak supervision (WS) frameworks are a popular way to bypass hand-labeling large datasets for training data-hungry models. These approaches synthesize multiple noisy but cheaply-acquired estimates of labels into a set of high-quality pseudolabels for downstream training. However, the synthesis technique is specific to a particular kind of label, such as binary labels or sequences, and each new label type requires manually designing a new synthesis algorithm. Instead, we propose a universal technique that enables weak supervision over any label type while still offering desirable properties, including practical flexibility, computational efficiency, and theoretical guarantees. We apply this technique to important problems previously not tackled by WS frameworks, including learning to rank, regression, and learning in hyperbolic space. Theoretically, our synthesis approach produces consistent estimators for learning some challenging but important generalizations of the exponential family model. Experimentally, we validate our framework and show improvement over baselines in diverse settings including real-world learning-to-rank and regression problems along with learning on hyperbolic manifolds.
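
To make the synthesis step concrete, the sketch below shows a classical label-model recipe for the simplest case, binary labels: a triplet-style estimator recovers each labeling function's accuracy from pairwise agreement statistics, and an accuracy-weighted vote then decodes pseudolabels. This is a generic illustration of the kind of label-type-specific algorithm the abstract says must currently be designed by hand, not the paper's universal method; the conditional-independence and balanced-class assumptions, and all function names, are ours.

```python
import numpy as np

def triplet_accuracies(L):
    """Estimate per-source accuracies from pairwise agreement rates.

    L: (n, 3) array of votes in {-1, +1} from three labeling functions,
    assumed conditionally independent given the true label y, with balanced
    classes. Writing a_i = E[l_i * y], the identity E[l_i * l_j] = a_i * a_j
    yields a_i = sqrt(M_ij * M_ik / M_jk), where M is the empirical second
    moment matrix (assumed to have nonzero off-diagonal entries).
    """
    M = (L.T @ L) / L.shape[0]              # empirical E[l_i * l_j]
    a = np.zeros(3)
    for i in range(3):
        j, k = [x for x in range(3) if x != i]
        a[i] = np.sqrt(max(M[i, j] * M[i, k] / M[j, k], 0.0))
    return a

def synthesize_pseudolabels(L, a):
    """Combine votes by an accuracy-weighted majority vote.

    Maps a_i to P(l_i = y) = (1 + a_i) / 2 and weights each source by its
    log-odds; exact ties decode to 0.
    """
    a = np.clip(a, 1e-3, 1 - 1e-3)          # guard the log-odds
    w = np.log((1.0 + a) / (1.0 - a))
    return np.sign(L @ w)

# Demo on synthetic votes: sources with true accuracies 0.9, 0.75, 0.6,
# so the estimator should recover roughly 2p - 1 = [0.8, 0.5, 0.2].
rng = np.random.default_rng(0)
y = rng.choice([-1, 1], size=10_000)
accs = [0.9, 0.75, 0.6]
L = np.stack([np.where(rng.random(y.size) < p, y, -y) for p in accs], axis=1)
a_hat = triplet_accuracies(L)
print(a_hat)
print((synthesize_pseudolabels(L, a_hat) == y).mean())
```

The paper's contribution, per the abstract, is to replace binary-specific recipes like this one with a single synthesis procedure that extends to other label types, such as rankings, regression targets, and points on hyperbolic manifolds.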
