Forecasting high-impact research topics via machine learning on evolving knowledge graphs (2402.08640v2)
Abstract: The exponential growth in scientific publications poses a severe challenge for human researchers. It forces them to focus on ever narrower sub-fields, which makes it hard to discover impactful new research ideas and collaborations outside their own field. Existing methods can predict a scientific paper's future citation count, but they require the research to be finished and the paper written, so they usually assess impact long after the idea was conceived. Here we show how to predict the impact of the onset of ideas that researchers have never published. To this end, we developed a large evolving knowledge graph built from more than 21 million scientific papers. It combines a semantic network created from the content of the papers with an impact network created from the papers' historic citations. Using machine learning, we can predict the dynamics of the evolving network into the future with high accuracy, and thereby the impact of new research directions. We envision that the ability to predict the impact of new ideas will be a crucial component of future artificial muses that can inspire new impactful and interesting scientific ideas.
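To make the idea of a combined semantic and impact network concrete, here is a minimal sketch and not the authors' pipeline. It assumes a hypothetical toy corpus in which each paper has a year, a set of concepts, and a citation count. Concept pairs that co-occur in a paper form the semantic edges, and the citations of those papers are accumulated on the edges as a crude impact signal. Because the abstract does not specify the features or the model, a simple common-neighbors heuristic stands in for the machine-learning predictor. All names (`PAPERS`, `build_graph`, `score_unconnected_pairs`) are illustrative.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy corpus: (publication year, concepts in the paper, citations received).
PAPERS = [
    (2018, {"graph", "citation"}, 10),
    (2019, {"graph", "embedding"}, 25),
    (2019, {"citation", "forecast"}, 5),
    (2020, {"embedding", "forecast"}, 40),
    (2021, {"graph", "forecast"}, 60),
]


def build_graph(papers, cutoff_year):
    """Return (neighbors, edge_citations) using only papers up to cutoff_year."""
    neighbors = defaultdict(set)      # semantic network: concept -> co-occurring concepts
    edge_citations = defaultdict(int)  # impact signal: concept pair -> summed citations
    for year, concepts, citations in papers:
        if year > cutoff_year:
            continue  # the snapshot must not see the future
        for a, b in combinations(sorted(concepts), 2):
            neighbors[a].add(b)
            neighbors[b].add(a)
            edge_citations[(a, b)] += citations
    return neighbors, edge_citations


def score_unconnected_pairs(neighbors):
    """Score each not-yet-connected concept pair by its number of common neighbors.

    This heuristic is an assumed stand-in for a trained model.
    """
    scores = {}
    for a, b in combinations(sorted(neighbors), 2):
        if b not in neighbors[a]:
            scores[(a, b)] = len(neighbors[a] & neighbors[b])
    return scores


# Snapshot of the graph as of 2020; the 2021 paper is held out as "the future".
neighbors, edge_citations = build_graph(PAPERS, cutoff_year=2020)
scores = score_unconnected_pairs(neighbors)
```

In this toy data, the pair `("forecast", "graph")` is unconnected in the 2020 snapshot and only appears in the held-out 2021 paper, which is the kind of future link, and its eventual citation impact, that a trained model would be asked to forecast.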
Authors: Xuemei Gu, Mario Krenn