Pool-Based Active Learning with Proper Topological Regions (2310.01597v1)
Abstract: Machine learning methods usually rely on large sample sizes to perform well, yet labeled data are difficult to obtain in many applications. Pool-based active learning methods aim to identify, within a set of unlabeled data, the points most relevant for training. In this paper we propose a meta-approach for pool-based active learning strategies in multi-class classification tasks, based on Proper Topological Regions (PTR). PTR, derived from topological data analysis (TDA), are relevant regions used to sample cold-start points or to sample within the active learning scheme. The proposed method is illustrated empirically on various benchmark datasets and is competitive with classical methods from the literature.
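To make the pool-based setting concrete, the following is a minimal sketch of a generic active learning loop. It is not the PTR method from the paper: it assumes a plain margin-based uncertainty criterion, a nearest-centroid classifier, and a hand-picked cold-start set, all chosen here purely for illustration. The names `active_learning`, `oracle`, `seed_idx`, and `budget` are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def centroids(X, y):
    """Mean feature vector of each class among the labeled points."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {c: [v / counts[c] for v in s] for c, s in sums.items()}


def margin(x, cents):
    """Gap between the two nearest class centroids; a small gap means uncertain."""
    d = sorted(sq_dist(x, c) for c in cents.values())
    return d[1] - d[0] if len(d) > 1 else 0.0


def active_learning(pool, oracle, seed_idx, budget):
    """Query `budget` labels from `pool`, starting from the cold-start set `seed_idx`."""
    labeled = {i: oracle(i) for i in seed_idx}  # cold-start labels
    for _ in range(budget):
        cents = centroids([pool[i] for i in labeled], list(labeled.values()))
        candidates = [i for i in range(len(pool)) if i not in labeled]
        if not candidates:
            break
        # Assumed query rule: pick the unlabeled point with the smallest margin.
        query = min(candidates, key=lambda i: margin(pool[i], cents))
        labeled[query] = oracle(query)
    return labeled


# Toy pool (illustrative data only): two Gaussian blobs in 2D, 20 points each.
rng = random.Random(0)
pool = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(20)]
pool += [[rng.gauss(3, 1), rng.gauss(3, 1)] for _ in range(20)]
true_labels = [0] * 20 + [1] * 20


def oracle(i):
    """Stand-in for the human annotator: returns the true label of pool[i]."""
    return true_labels[i]


labeled = active_learning(pool, oracle, seed_idx=[0, 20], budget=5)
print(sorted(labeled))  # indices whose labels were requested
```

In this sketch the cold-start set is fixed by hand with one point per class. The abstract indicates that PTR can be used both for choosing such cold-start points and within the query step, which this generic baseline does not attempt to reproduce.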