Into the crossfire: evaluating the use of a language model to crowdsource gun violence reports (2401.12989v1)
Abstract: Gun violence is a pressing and growing human rights issue that affects nearly every dimension of the social fabric, from healthcare and education to psychology and the economy. Reliable data on firearm events is paramount to developing more effective public policy and emergency responses. However, the lack of comprehensive databases and the risks of in-person surveys prevent human rights organizations from collecting the data they need in most countries. Here, we partner with a Brazilian human rights organization to conduct a systematic evaluation of language models as an aid to monitoring real-world firearm events from social media data. We propose a fine-tuned BERT-based model trained on Twitter (now X) texts to distinguish gun violence reports from ordinary Portuguese texts. Our model achieves an AUC score of 0.97. We then incorporate the model into a web application and test it in a live intervention, in which we study and interview Brazilian analysts who continuously fact-check social media texts to identify new gun violence events. Qualitative assessments show that our solution helped all analysts use their time more efficiently and expanded their search capacities. Quantitative assessments show that use of our model was associated with more interactions between analysts and online users reporting gun violence. Taken together, our findings suggest that modern Natural Language Processing techniques can help support the work of human rights organizations.