A Universal Unbiased Method for Classification from Aggregate Observations (2306.11343v2)
Abstract: In conventional supervised classification, true labels are required for individual instances. However, collecting true labels for individual instances can be prohibitive due to privacy concerns or unaffordable annotation costs. This motivates the study of classification from aggregate observations (CFAO), where supervision is provided to groups of instances instead of individual instances. CFAO is a generalized learning framework that subsumes various learning problems, such as multiple-instance learning and learning from label proportions. The goal of this paper is to present a novel universal method for CFAO that provides an unbiased estimator of the classification risk for arbitrary losses, a goal that previous research failed to achieve. Practically, our method works by weighting the importance of each label for each instance in the group, which provides purified supervision for the classifier to learn from. Theoretically, our proposed method not only guarantees risk consistency thanks to the unbiased risk estimator but is also compatible with arbitrary losses. Extensive experiments on various CFAO problems demonstrate the superiority of our proposed method.
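The abstract's core mechanism, weighting the importance of each label for each instance in a group, can be illustrated with a minimal sketch. The code below is an assumption-laden toy, not the paper's actual estimator: it takes one bag of instances supervised only by aggregate label proportions, combines the model's softmax posteriors with those proportions to form per-instance soft label weights, and computes a weighted cross-entropy against them. The function name `weighted_bag_loss` and the specific reweighting scheme are illustrative choices, not from the paper.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax with the usual max-subtraction for stability."""
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def weighted_bag_loss(logits, bag_label_props):
    """Illustrative per-instance label weighting for one bag (hypothetical).

    logits:          (n, k) model outputs for the n instances in the bag
    bag_label_props: (k,)   aggregate supervision, e.g. label proportions

    Each instance gets a soft weight over the k labels by multiplying the
    model's posterior with the bag-level proportions and renormalizing;
    the loss is then a cross-entropy weighted by those soft labels.
    """
    probs = softmax(logits)                      # model posteriors, (n, k)
    w = probs * bag_label_props[None, :]         # reweight by aggregate info
    w = w / w.sum(axis=1, keepdims=True)         # per-instance label weights
    # weighted cross-entropy: each candidate label contributes via its weight
    return -(w * np.log(probs + 1e-12)).sum(axis=1).mean()
```

In this sketch the soft weights play the role of the "purified supervision" mentioned in the abstract: instances whose posteriors already agree with the aggregate proportions are pushed harder toward their likely label.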