Impact-Oriented Contextual Scholar Profiling using Self-Citation Graphs (2304.12217v3)
Abstract: Quantitatively profiling a scholar's scientific impact is important to the modern research community. Current practices based on bibliometric indicators (e.g., h-index), lists, and networks perform well at scholar ranking, but do not provide structured context for scholar-centric analytical tasks such as profile reasoning and understanding. This work presents GeneticFlow (GF), a suite of novel graph-based scholar profiles that fulfill three essential requirements: they are structured-context, scholar-centric, and evolution-rich. We propose a framework to compute GF over large-scale academic data sources with millions of scholars. The framework encompasses a new unsupervised advisor-advisee detection algorithm, a well-engineered citation type classifier using interpretable features, and a fine-tuned graph neural network (GNN) model. Evaluations are conducted on the real-world task of scientific award inference. Experimental results show that the F1 score of the best GF profile significantly outperforms alternative methods based on impact indicators and bibliometric networks in all 6 computer science fields considered. Moreover, the core GF profiles, which retain 63.6%-66.5% of the nodes and 12.5%-29.9% of the edges of the full profile, still significantly outperform existing methods in 5 of the 6 fields studied. Visualization of the GF profiling results also reveals human-explainable patterns for high-impact scholars.
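To make the idea of a self-citation graph concrete, the following is a minimal sketch, not the paper's actual pipeline. It assumes a hypothetical in-memory list of paper records with `id`, `authors`, and `references` fields, keeps only the papers written by one scholar as nodes, and adds an edge whenever one of those papers cites another. The edge direction (cited paper to citing paper) is an assumption made for illustration. The advisor-advisee detection, citation type classification, and GNN steps named in the abstract are not covered.

```python
def build_self_citation_graph(papers, scholar):
    """Return (nodes, edges) of a scholar's self-citation graph.

    papers  : list of dicts with hypothetical keys 'id', 'authors', 'references'
    scholar : author name used to select the scholar's own papers
    An edge (a, b) means paper b cites paper a, and both are by the scholar.
    """
    # Nodes: every paper that lists the scholar as an author.
    own = {p["id"]: p for p in papers if scholar in p["authors"]}

    # Edges: citations whose source and target are both the scholar's papers.
    edges = set()
    for pid, paper in own.items():
        for ref in paper["references"]:
            if ref in own and ref != pid:
                edges.add((ref, pid))  # assumed direction: cited -> citing

    return sorted(own), sorted(edges)


# Toy data, invented purely for illustration.
papers = [
    {"id": "p1", "authors": ["alice", "bob"], "references": []},
    {"id": "p2", "authors": ["alice"], "references": ["p1", "x9"]},
    {"id": "p3", "authors": ["bob"], "references": ["p1"]},
    {"id": "p4", "authors": ["alice"], "references": ["p2", "p1", "p3"]},
]

nodes, edges = build_self_citation_graph(papers, "alice")
print(nodes)
print(edges)
```

In this toy example, `p3` is excluded because it is not one of the scholar's papers, and the reference to `x9` is dropped because it points outside the scholar's own work.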