Active Learning for Identifying Disaster-Related Tweets: A Comparison with Keyword Filtering and Generic Fine-Tuning (2408.09914v1)

Published 19 Aug 2024 in cs.CL and cs.LG

Abstract: Social media can provide essential information for emergency response during natural disasters in near real time. However, it is difficult to identify disaster-related posts among the large amounts of unstructured data available. Previous methods often use keyword filtering, topic modelling or classification-based techniques to identify such posts. Active Learning (AL) is a promising sub-field of Machine Learning (ML) that has rarely been applied to the classification of social media text. This study therefore investigates the potential of AL for identifying disaster-related Tweets. We compare the classification performance of four approaches: keyword filtering, a RoBERTa model fine-tuned with generic data from CrisisLex, a base RoBERTa model trained with AL, and a fine-tuned RoBERTa model trained with AL. For testing, we used data from CrisisLex and manually labelled data from the 2021 flood in Germany and the 2023 Chile forest fires. The results show that generic fine-tuning combined with 10 rounds of AL outperformed all other approaches. Consequently, a broadly applicable model for identifying disaster-related Tweets could be trained with very little labelling effort. The model can be applied to use cases beyond this study and provides a useful tool for further research in social media analysis.
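To make the comparison concrete, the two ends of the spectrum (keyword filtering versus a classifier improved over several AL rounds) can be sketched as below. This is a minimal, self-contained illustration under stated assumptions, not the authors' pipeline: the study fine-tunes RoBERTa, whereas here a tiny multinomial naive Bayes classifier stands in for the model; the keyword list is hypothetical; and least-confidence (uncertainty) sampling is assumed as the query strategy because the abstract does not name one. The `oracle` function is a placeholder for a human annotator.

```python
import math
from collections import Counter

# Hypothetical keyword list for the keyword-filtering baseline (not the paper's list).
KEYWORDS = {"flood", "fire", "evacuation", "rescue"}


def keyword_filter(text: str) -> bool:
    """Baseline: flag a tweet as disaster-related if it contains any keyword."""
    return any(tok in KEYWORDS for tok in text.lower().split())


class NaiveBayes:
    """Tiny multinomial naive Bayes with add-one smoothing (stand-in for RoBERTa)."""

    def fit(self, texts, labels):
        self.word_counts = {0: Counter(), 1: Counter()}
        self.doc_counts = Counter(labels)
        for text, y in zip(texts, labels):
            self.word_counts[y].update(text.lower().split())
        self.vocab_size = len(set(self.word_counts[0]) | set(self.word_counts[1]))
        return self

    def predict_proba(self, text: str) -> float:
        """Return the estimated P(disaster-related | text)."""
        n_docs = sum(self.doc_counts.values())
        log_p = {}
        for y in (0, 1):
            total = sum(self.word_counts[y].values())
            lp = math.log((self.doc_counts[y] + 1) / (n_docs + 2))
            for tok in text.lower().split():
                lp += math.log((self.word_counts[y][tok] + 1) / (total + self.vocab_size))
            log_p[y] = lp
        # Normalise in log space to avoid underflow on long texts.
        m = max(log_p.values())
        e0, e1 = math.exp(log_p[0] - m), math.exp(log_p[1] - m)
        return e1 / (e0 + e1)


def active_learning(seed, pool, oracle, rounds=10, batch_size=2):
    """Pool-based AL: each round, label the tweets the model is least sure about."""
    labeled = list(seed)
    pool = list(pool)
    model = NaiveBayes().fit(*zip(*labeled))
    for _ in range(rounds):
        if not pool:
            break
        # Least-confidence sampling for a binary task: P closest to 0.5 first.
        pool.sort(key=lambda t: abs(model.predict_proba(t) - 0.5))
        queried, pool = pool[:batch_size], pool[batch_size:]
        labeled += [(t, oracle(t)) for t in queried]  # human annotator in practice
        model = NaiveBayes().fit(*zip(*labeled))  # retrain on all labels so far
    return model, labeled


# Usage with toy data; the oracle is a dictionary lookup standing in for a human labeller.
seed = [("river flood warning evacuation now", 1), ("great coffee this morning", 0)]
truth = {"forest fire near the town": 1, "new phone arrived": 0,
         "rescue teams at the flood": 1, "watching football": 0}
model, labeled = active_learning(seed, truth, truth.get, rounds=2, batch_size=1)
print(len(labeled), round(model.predict_proba("flood rescue evacuation"), 2))
```

Because the loop only calls `fit` and `predict_proba`, swapping the toy classifier for a fine-tuned transformer would leave the query-and-retrain structure unchanged; starting from a generically fine-tuned model rather than a base model corresponds to changing what the loop is initialised with.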
