Information-Theoretic Active Correlation Clustering (2402.03587v2)
Published 5 Feb 2024 in cs.LG and stat.ML
Abstract: We study correlation clustering where the pairwise similarities are not known in advance. For this purpose, we employ active learning to query pairwise similarities in a cost-efficient way. We propose a number of effective information-theoretic acquisition functions based on entropy and information gain. We extensively investigate the performance of our methods in different settings and demonstrate their superior performance compared to the alternatives.
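
To make the idea of an entropy-based acquisition function concrete, the following is a minimal sketch, not the paper's actual method. It assumes we already hold an estimate `same_cluster_prob[i, j]` of the probability that objects `i` and `j` belong to the same cluster, and it queries the unqueried pair whose Bernoulli entropy is largest. The function names, the probability matrix, and the `queried` mask are all hypothetical and introduced here only for illustration.

```python
import numpy as np


def binary_entropy(p):
    """Elementwise entropy (in nats) of Bernoulli(p) variables."""
    # Clip to avoid log(0); this is a numerical convenience, not part of the definition.
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -(p * np.log(p) + (1 - p) * np.log(1 - p))


def select_queries(same_cluster_prob, queried, batch_size=1):
    """Pick up to `batch_size` unqueried pairs (i, j), i < j, with the highest entropy.

    same_cluster_prob: (n, n) array, assumed estimate of P(i and j share a cluster).
    queried: (n, n) boolean array, True where the similarity was already queried.
    """
    n = same_cluster_prob.shape[0]
    iu, ju = np.triu_indices(n, k=1)           # each unordered pair once
    scores = binary_entropy(same_cluster_prob[iu, ju])
    scores[queried[iu, ju]] = -np.inf          # never re-query a known pair
    order = np.argsort(-scores, kind="stable")[:batch_size]
    return [(int(iu[k]), int(ju[k])) for k in order if np.isfinite(scores[k])]


# Toy example with four objects and made-up probabilities.
probs = np.full((4, 4), 0.5)
probs[0, 1], probs[0, 2], probs[0, 3] = 0.9, 0.5, 0.1
probs[1, 2], probs[1, 3], probs[2, 3] = 0.8, 0.45, 0.99
queried = np.zeros((4, 4), dtype=bool)

first = select_queries(probs, queried)         # most uncertain pair
queried[0, 2] = True                           # pretend the oracle answered it
second = select_queries(probs, queried)        # next most uncertain pair
```

In this toy run the pair with probability 0.5 is selected first, and once it is marked as queried the pair with probability 0.45 follows. In an actual active correlation clustering loop, the probability estimates would be recomputed from the current clustering after each batch of queries; how that is done is outside the scope of this sketch.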