CausalCite: A Causal Formulation of Paper Citations (2311.02790v3)
Abstract: The citation count of a paper is a commonly used proxy for its significance in the scientific community. Yet citation measures are widely criticized for failing to accurately reflect the true impact of a paper. Thus, we propose CausalCite, a new way to measure the significance of a paper by assessing the causal impact of the paper on its follow-up papers. CausalCite is based on a novel causal inference method, TextMatch, which adapts the traditional matching framework to high-dimensional text embeddings. TextMatch encodes each paper using text embeddings from LLMs, extracts similar samples by cosine similarity, and synthesizes a counterfactual sample as the weighted average of similar papers according to their similarity values. We demonstrate the effectiveness of CausalCite on various criteria, such as high correlation with paper impact as reported by scientific experts on a previous dataset of 1K papers, (test-of-time) awards for past papers, and its stability across various subfields of AI. We also provide a set of findings that suggest ways for future researchers to use our metric to better understand the quality of a paper. Our code is available at https://github.com/causalNLP/causal-cite.
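The matching step described in the abstract (embed, retrieve by cosine similarity, average with similarity weights) can be illustrated with a minimal sketch. This is not the authors' implementation; see the linked repository for that. The function names, the choice of `k`, the normalization of weights to sum to one, and the toy two-dimensional vectors standing in for LLM text embeddings are all assumptions made for illustration.

```python
import numpy as np


def cosine_similarity(query, candidates):
    """Cosine similarity between one query vector and each row of candidates."""
    query = query / np.linalg.norm(query)
    candidates = candidates / np.linalg.norm(candidates, axis=1, keepdims=True)
    return candidates @ query


def synthesize_counterfactual(query_emb, candidate_embs, candidate_outcomes, k=2):
    """Similarity-weighted average outcome of the k most similar candidates.

    Hypothetical helper: `candidate_outcomes` stands for whatever quantity is
    being compared (for example, a citation count) for each candidate paper.
    """
    sims = cosine_similarity(query_emb, candidate_embs)
    top = np.argsort(sims)[::-1][:k]      # indices of the k most similar candidates
    total = sims[top].sum()
    if total <= 0:
        raise ValueError("top-k similarities must sum to a positive value")
    weights = sims[top] / total           # assumption: weights normalized to sum to 1
    estimate = float(weights @ candidate_outcomes[top])
    return estimate, top, weights


# Toy data: 2-d vectors in place of real text embeddings.
query = np.array([1.0, 0.0])
candidates = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
outcomes = np.array([10.0, 100.0, 20.0])

estimate, top, weights = synthesize_counterfactual(query, candidates, outcomes, k=2)
print(estimate, top, weights)
```

In this toy case the orthogonal candidate is excluded by the top-k step, and the closer of the two remaining candidates receives the larger weight. A real pipeline would replace the toy vectors with embeddings of paper text and would restrict candidates to papers that are valid comparison units, a selection step this sketch does not model.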