Wikibio: a Semantic Resource for the Intersectional Analysis of Biographical Events (2306.09505v1)

Published 15 Jun 2023 in cs.CL

Abstract: Biographical event detection is a relevant task for the exploration and comparison of the ways in which people's lives are told and represented. In this sense, it may support several applications in digital humanities and in works aimed at exploring bias about minoritized groups. Despite that, there are no corpora and models specifically designed for this task. In this paper we fill this gap by presenting a new corpus annotated for biographical event detection. The corpus, which includes 20 Wikipedia biographies, was compared with five existing corpora to train a model for the biographical event detection task. The model was able to detect all mentions of the target-entity in a biography with an F-score of 0.808 and the entity-related events with an F-score of 0.859. Finally, the model was used for performing an analysis of biases about women and non-Western people in Wikipedia biographies.
