
Incremental Affinity Propagation based on Cluster Consolidation and Stratification (2401.14439v1)

Published 25 Jan 2024 in cs.LG and cs.NE

Abstract: Modern data mining applications require to perform incremental clustering over dynamic datasets by tracing temporal changes over the resulting clusters. In this paper, we propose A-Posteriori affinity Propagation (APP), an incremental extension of Affinity Propagation (AP) based on cluster consolidation and cluster stratification to achieve faithfulness and forgetfulness. APP enforces incremental clustering where i) new arriving objects are dynamically consolidated into previous clusters without the need to re-execute clustering over the entire dataset of objects, and ii) a faithful sequence of clustering results is produced and maintained over time, while allowing to forget obsolete clusters with decremental learning functionalities. Four popular labeled datasets are used to test the performance of APP with respect to benchmark clustering performances obtained by conventional AP and Incremental Affinity Propagation based on Nearest neighbor Assignment (IAPNA) algorithms. Experimental results show that APP achieves comparable clustering performance while enforcing scalability at the same time.


Summary

  • The paper introduces the APP algorithm, which consolidates past clusters via centroid reduction to efficiently integrate new data points.
  • It employs cluster stratification to balance stability and plasticity, tracking cluster evolution without full recomputation.
  • Experiments on four labeled datasets show clustering performance comparable to AP and IAPNA with better scalability, and an application to semantic shift detection in computational linguistics is discussed.

Introduction

Data mining applications increasingly have to handle datasets that change over time. Incremental clustering is a case in point: objects arrive over time, and the clusters must reflect these temporal changes. Affinity Propagation (AP) is a well-established algorithm for static datasets, but extending it to incremental scenarios is challenging. The paper argues that conventional extension strategies are inadequate because they do not assimilate newly arriving data points into existing cluster structures efficiently, which forces costly recomputation.

In response, the authors propose A-Posteriori affinity Propagation (APP), an extension of AP designed for dynamic datasets. APP aims to resolve scalability issues while preserving clustering quality through two strategies: cluster consolidation and cluster stratification.

Related Work

Incremental clustering has received considerable attention in the literature, with algorithms such as k-means and AP adapted to data streams and dynamically evolving data. These efforts often aim to lower computational cost by updating clusters with new data points instead of recomputing them globally. Incremental algorithms must also balance stability, which avoids dramatic changes with each new data point, against plasticity, which allows new information to be integrated. The authors review existing approaches and argue that none adequately supports both accurate tracing of cluster evolution over time and efficient scalability, which motivates APP.

A-Posteriori Affinity Propagation

APP collapses each cluster from time step t-1 into a centroid and uses this simplified representation to assimilate the data points arriving at time t quickly. Cluster stratification then determines the outcome: new clusters emerge, existing ones are enriched, or old clusters merge, depending on the incoming data. APP also emphasizes "group evolution": a new data point is more likely to join an existing cluster than to start a new one, unless a significant number of similar points warrants creating one.
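The consolidate-then-expand flow described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the paper runs Affinity Propagation over the centroids and new points, whereas this sketch substitutes a simple threshold-based "leader" clusterer so that it stays short and deterministic. The names `centroid`, `leader_clustering`, `app_step`, and the `radius` parameter are hypothetical.

```python
from math import dist

def centroid(points):
    """Mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))

def leader_clustering(items, radius):
    """Stand-in clusterer (NOT Affinity Propagation): each item joins the first
    group whose leader lies within `radius`, otherwise it starts a new group.
    Returns a list of groups, each a list of indices into `items`."""
    groups = []
    for i, item in enumerate(items):
        for group in groups:
            if dist(item, items[group[0]]) <= radius:
                group.append(i)
                break
        else:
            groups.append([i])
    return groups

def app_step(prev_clusters, new_points, radius=1.0):
    """One incremental step: consolidate, cluster the reduced set, expand."""
    # Consolidation: each cluster from time t-1 collapses to a single centroid.
    centroids = [centroid(c) for c in prev_clusters]
    items = centroids + list(new_points)
    # Clustering runs only over centroids + new points, not the full history.
    groups = leader_clustering(items, radius)
    # Expansion: replace each centroid by the members it stands for, so a group
    # holding several centroids merges the corresponding old clusters.
    result = []
    for group in groups:
        members = []
        for i in group:
            if i < len(centroids):
                members.extend(prev_clusters[i])
            else:
                members.append(items[i])
        result.append(members)
    return result

prev = [[(0.0, 0.0), (0.2, 0.0)], [(5.0, 5.0), (5.0, 5.2)]]
new = [(0.1, 0.3), (9.0, 9.0)]
clusters = app_step(prev, new)
```

In this toy run the first new point enriches the first existing cluster, and the second new point is too far from every centroid and so starts a cluster of its own. The clustering call sees four items (two centroids and two new points) instead of all six objects, which is where the saving over full recomputation comes from.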

The authors also address the need to forget obsolete clusters by incorporating decremental learning functionalities, which they treat as critical to maintaining performance and relevance over time. Forgetting provides a practical mechanism for discarding outdated information, which conserves memory and improves scalability.
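One way to picture forgetting is an age-based filter over tracked clusters. The summary does not state how APP decides that a cluster is obsolete, so the criterion below (no new member within `max_age` time steps) is an assumption made purely for illustration, as are the names `TrackedCluster` and `forget_obsolete`.

```python
from dataclasses import dataclass, field

@dataclass
class TrackedCluster:
    label: int
    members: list = field(default_factory=list)
    last_updated: int = 0  # time step at which the cluster last gained a member

def forget_obsolete(clusters, now, max_age):
    """Keep only clusters updated within the last `max_age` time steps.
    Assumed criterion; the paper's actual rule may differ."""
    return [c for c in clusters if now - c.last_updated <= max_age]

clusters_t5 = [
    TrackedCluster(label=0, members=["a", "b"], last_updated=1),
    TrackedCluster(label=1, members=["c"], last_updated=4),
    TrackedCluster(label=2, members=["d", "e"], last_updated=5),
]
kept = forget_obsolete(clusters_t5, now=5, max_age=2)
kept_labels = [c.label for c in kept]
```

Here cluster 0 was last updated four steps ago and is dropped, while clusters 1 and 2 are retained.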

Performance Evaluation

APP is tested against conventional AP and Incremental Affinity Propagation based on Nearest neighbor Assignment (IAPNA) on four popular labeled datasets. It achieves comparable clustering performance while scaling better. The results indicate that APP can manage dynamic datasets with frequently arriving data points and produce accurate, coherent cluster histories, which is attributed to its consolidation and stratification mechanisms.

An application to semantic shift detection in computational linguistics further underscores APP's relevance: it shows promise for uncovering how word meanings evolve in diachronic corpora, a need that has grown with digital humanities and LLMs.
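Because the datasets are labeled, a clustering can be scored against the ground truth with an external measure. The summary does not say which metrics the paper reports, so purity is used below only as a familiar example; the function name `purity` and the toy labels are hypothetical.

```python
from collections import Counter

def purity(predicted, truth):
    """Fraction of objects that carry the majority true label of their cluster."""
    by_cluster = {}
    for p, t in zip(predicted, truth):
        by_cluster.setdefault(p, []).append(t)
    # For each predicted cluster, count the objects with its most common label.
    majority_total = sum(
        Counter(labels).most_common(1)[0][1] for labels in by_cluster.values()
    )
    return majority_total / len(truth)

truth = [0, 0, 0, 1, 1, 1]
predicted = [0, 0, 1, 1, 1, 1]  # one object of class 0 lands in the wrong cluster
score = purity(predicted, truth)
```

With these toy labels, five of the six objects agree with the majority label of their cluster, giving a purity of 5/6.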
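As a rough illustration of how cluster histories could feed semantic shift detection, the sketch below compares how a word's usages are distributed over clusters in two time periods. Using Jensen-Shannon divergence for the comparison is an assumption made here for illustration; the summary does not state which measure the authors use, and the function names are hypothetical.

```python
from collections import Counter
from math import log2

def cluster_distribution(labels, all_labels):
    """Share of a word's usages that fall into each cluster."""
    counts = Counter(labels)
    return [counts[label] / len(labels) for label in all_labels]

def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2): 0 for identical distributions,
    1 for distributions with disjoint support."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        return sum(a * log2(a / b) for a, b in zip(x, y) if a > 0)

    return (kl(p, m) + kl(q, m)) / 2

all_labels = [0, 1]
usages_period_1 = [0, 0, 0, 0]  # every usage falls in cluster 0
usages_period_2 = [1, 1, 1, 1]  # every usage falls in cluster 1
p = cluster_distribution(usages_period_1, all_labels)
q = cluster_distribution(usages_period_2, all_labels)
shift = js_divergence(p, q)
no_shift = js_divergence(p, p)
```

A word whose usages move entirely from one cluster to another scores the maximum divergence, while an unchanged distribution scores zero.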

Conclusion

With this framework, APP aims to improve incremental clustering over dynamic datasets. It is a step towards a scalable solution that preserves the historical integrity of clusters while retaining the flexibility to add or prune clusters as the data evolve. The algorithm's potential in semantic shift detection suggests wider applicability in computational linguistics and in natural language processing tasks where data is inherently dynamic.
