Is there really a Citation Age Bias in NLP? (2401.03545v1)
Abstract: Citations are a key ingredient of scientific research, relating a paper to others published in the community. Recently, it has been noted that there is a citation age bias in the NLP community, one of the currently fastest-growing AI subfields: the mean age of the bibliographies of NLP papers has dropped steadily in recent years, leading to `citation amnesia' in which older knowledge is increasingly forgotten. In this work, we put such claims into perspective by analyzing the bibliographies of $\sim$300k papers across 15 different scientific fields submitted to the popular preprint server arXiv between 2013 and 2022. We find that all AI subfields (in particular cs.AI, cs.CL, cs.CV, cs.LG) show similar trends of citation amnesia, with the mean age of the bibliography roughly halving over the last 10 years (from above 12 years in 2013 to below 7 years in 2022), on average. Rather than diagnosing this as a citation age bias in the NLP community, we believe this pattern is an artefact of the dynamics of these research fields, in which new knowledge is produced in ever shorter time intervals.
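The abstract's central quantity, the mean age of a paper's bibliography per field and year, can be illustrated with a few lines of data wrangling. Below is a minimal sketch of such a computation; the column names, the toy rows, and the choice to average first within each bibliography and then across papers are illustrative assumptions, not the paper's actual pipeline.

```python
# Minimal sketch: mean citation age per arXiv category and submission year.
# The input format and column names are assumptions for illustration only.
import pandas as pd

# One row per reference of one paper:
# paper_id, category (e.g. "cs.CL"), year (submission year of the citing
# paper), ref_year (publication year of the cited work).
refs = pd.DataFrame(
    [
        ("p1", "cs.CL", 2013, 2003),
        ("p1", "cs.CL", 2013, 2010),
        ("p2", "cs.CL", 2022, 2020),
        ("p2", "cs.CL", 2022, 2021),
        ("p3", "cs.CV", 2022, 2019),
    ],
    columns=["paper_id", "category", "year", "ref_year"],
)

# Citation age of a single reference = citing year - cited year.
refs["age"] = refs["year"] - refs["ref_year"]

# Average within each paper's bibliography first, then across papers,
# so that papers with very long bibliographies do not dominate the mean.
per_paper = refs.groupby(["category", "year", "paper_id"])["age"].mean()
mean_age = per_paper.groupby(["category", "year"]).mean().unstack("category")

print(mean_age)
```

Run on a full dump of arXiv metadata, a table like `mean_age` would directly yield the per-field trend lines the abstract summarizes.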
Authors: Hoa Nguyen, Steffen Eger