Adaptive Differentially Private Structural Entropy Minimization for Unsupervised Social Event Detection (2407.18274v1)
Abstract: Social event detection extracts clusters of related messages from social media data streams, with each cluster representing a specific real-world event. It is important in numerous areas, such as opinion analysis, social safety, and decision-making. Most current methods are supervised and require access to large amounts of data. They need prior knowledge of the events and carry a high risk of leaking sensitive information contained in the messages, which makes them less applicable in open-world settings. Conducting unsupervised detection that fully utilizes the rich information in the messages while protecting data privacy therefore remains a significant challenge. To this end, we propose ADP-SEMEvent, an unsupervised social event detection framework that prioritizes privacy. ADP-SEMEvent has two stages: constructing a private message graph and clustering that graph. In the first stage, an adaptive differential privacy approach constructs the private message graph; it applies differential privacy adaptively, based on the events occurring each day in an open environment, to make maximal use of the privacy budget. In the second stage, to address the reduction in data utility caused by noise, a novel 2-dimensional structural entropy minimization algorithm based on optimal subgraphs detects events in the message graph. This stage is unsupervised and does not compromise differential privacy. Extensive experiments on two public datasets demonstrate that ADP-SEMEvent achieves detection performance comparable to state-of-the-art methods while maintaining reasonable privacy budget parameters.
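The two stages named in the abstract can be illustrated with a minimal sketch. It is not the ADP-SEMEvent implementation: the abstract does not specify the adaptive budget allocation or the optimal-subgraph search, so the sketch substitutes a plain Laplace mechanism with a fixed `epsilon` and only evaluates the standard 2-dimensional structural entropy of a given partition,

$$H^2(G; P) = -\sum_{X \in P} \left[ \sum_{v \in X} \frac{d_v}{\mathrm{vol}(G)} \log_2 \frac{d_v}{\mathrm{vol}(X)} + \frac{g_X}{\mathrm{vol}(G)} \log_2 \frac{\mathrm{vol}(X)}{\mathrm{vol}(G)} \right],$$

where $d_v$ is the weighted degree of node $v$, $\mathrm{vol}(\cdot)$ is a sum of degrees, and $g_X$ is the total weight of edges leaving cluster $X$. The function names, the toy graph, and the `sensitivity` value are all assumptions made for illustration.

```python
import math
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def privatize_weights(edges, epsilon, sensitivity=1.0, seed=0):
    """Add Laplace noise to every edge weight (generic mechanism, fixed budget).

    Assumption: `sensitivity` bounds how much one weight can change; the
    paper's adaptive, per-day budget allocation is NOT reproduced here.
    """
    rng = random.Random(seed)
    scale = sensitivity / epsilon
    noisy = {}
    for edge, weight in edges.items():
        noisy_weight = weight + laplace_noise(scale, rng)
        if noisy_weight > 0:  # drop non-positive weights (post-processing)
            noisy[edge] = noisy_weight
    return noisy


def structural_entropy_2d(edges, partition):
    """2-dimensional structural entropy of `partition` on a weighted graph.

    `edges` maps undirected pairs (u, v) to positive weights; `partition`
    is a list of disjoint node collections covering the graph.
    """
    degree = {}
    for (u, v), w in edges.items():
        degree[u] = degree.get(u, 0.0) + w
        degree[v] = degree.get(v, 0.0) + w
    vol_g = sum(degree.values())

    entropy = 0.0
    for cluster in partition:
        members = set(cluster)
        vol = sum(degree.get(n, 0.0) for n in members)
        if vol == 0:
            continue  # isolated cluster contributes nothing
        # Weight of edges with exactly one endpoint inside the cluster.
        cut = sum(w for (u, v), w in edges.items()
                  if (u in members) != (v in members))
        for n in members:
            d = degree.get(n, 0.0)
            if d > 0:
                entropy -= d / vol_g * math.log2(d / vol)
        entropy -= cut / vol_g * math.log2(vol / vol_g)
    return entropy


# Toy message graph (hypothetical): two triangles joined by one bridge edge.
toy_edges = {(0, 1): 1.0, (1, 2): 1.0, (0, 2): 1.0,
             (3, 4): 1.0, (4, 5): 1.0, (3, 5): 1.0,
             (2, 3): 1.0}

by_event = structural_entropy_2d(toy_edges, [[0, 1, 2], [3, 4, 5]])
one_cluster = structural_entropy_2d(toy_edges, [[0, 1, 2, 3, 4, 5]])
scrambled = structural_entropy_2d(toy_edges, [[0, 3], [1, 4], [2, 5]])

noisy_edges = privatize_weights(toy_edges, epsilon=2.0)
```

On the toy graph, the partition that follows the two triangles yields lower entropy than either the single-cluster or the scrambled partition, which is the signal a minimization procedure would exploit. A smaller `epsilon` increases the noise scale and makes that signal harder to recover from `noisy_edges`.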
- Zhiwei Yang
- Yuecen Wei
- Haoran Li
- Qian Li
- Lei Jiang
- Li Sun
- Xiaoyan Yu
- Chunming Hu
- Hao Peng