
Modeling citation worthiness by using attention-based bidirectional long short-term memory networks and interpretable models (2405.12206v1)

Published 20 May 2024 in cs.CL and cs.LG

Abstract: Scientists learn early on how to cite scientific sources to support their claims. Sometimes, however, scientists struggle to determine where a citation should be placed or, even worse, fail to cite a source altogether. Automatically detecting sentences that need a citation (i.e., citation worthiness) could solve both of these issues, leading to more robust and well-constructed scientific arguments. Previous researchers have applied machine learning to this task but have used small datasets and models that do not take advantage of recent algorithmic developments such as attention mechanisms in deep learning. We hypothesize that we can develop highly accurate deep learning architectures that learn from large supervised datasets constructed from open access publications. In this work, we propose a Bidirectional Long Short-Term Memory (BiLSTM) network with an attention mechanism and contextual information to detect sentences that need citations. We also produce a new, large dataset (PMOA-CITE) based on the PubMed Open Access Subset, which is orders of magnitude larger than previous datasets. Our experiments show that our architecture achieves state-of-the-art performance on the standard ACL-ARC dataset ($F_{1}=0.507$) and exhibits high performance ($F_{1}=0.856$) on the new PMOA-CITE. Moreover, we show that it can transfer learning across these datasets. We further use interpretable models to illuminate how specific language is used to promote and inhibit citations. We discover that sections and surrounding sentences are crucial for our improved predictions. We further examined purported mispredictions of the model and uncovered systematic human mistakes in citation behavior and source data. This opens the door for our model to check documents during pre-submission and pre-archival procedures. We make this new dataset, the code, and a web-based tool available to the community.
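
The $F_{1}$ values quoted in the abstract are the harmonic mean of precision and recall. A minimal sketch of that computation for a binary citation-worthiness task follows, assuming label 1 means "sentence needs a citation". The function name `f1_score` and the toy labels are illustrative assumptions and are not taken from the paper or its released code.

```python
def f1_score(y_true, y_pred):
    """F1 for the positive class (1 = sentence needs a citation)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # correctly flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # flagged but no citation needed
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed citation-worthy sentence
    if tp == 0:
        return 0.0  # avoid division by zero when nothing is correctly flagged
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical gold labels and predictions for six sentences.
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
score = f1_score(y_true, y_pred)
```

Because citation-worthy sentences are typically a minority class, $F_{1}$ on the positive class is more informative here than plain accuracy.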

Authors (2)
  1. Tong Zeng (8 papers)
  2. Daniel E. Acuna (15 papers)
Citations (10)
