
Impact-Oriented Contextual Scholar Profiling using Self-Citation Graphs (2304.12217v3)

Published 24 Apr 2023 in cs.DL and cs.AI

Abstract: Quantitatively profiling a scholar's scientific impact is important to the modern research community. Current practices based on bibliometric indicators (e.g., h-index), lists, and networks perform well at scholar ranking, but do not provide structured context for scholar-centric analytical tasks such as profile reasoning and understanding. This work presents GeneticFlow (GF), a suite of novel graph-based scholar profiles that fulfill three essential requirements: structured-context, scholar-centric, and evolution-rich. We propose a framework to compute GF over large-scale academic data sources with millions of scholars. The framework encompasses a new unsupervised advisor-advisee detection algorithm, a well-engineered citation type classifier using interpretable features, and a fine-tuned graph neural network (GNN) model. Evaluations are conducted on the real-world task of scientific award inference. Experimental results show that the best GF profile achieves a significantly higher F1 score than alternative methods based on impact indicators and bibliometric networks in all 6 computer science fields considered. Moreover, the core GF profiles, with 63.6%-66.5% of the nodes and 12.5%-29.9% of the edges of the full profile, still significantly outperform existing methods in 5 of the 6 fields studied. Visualization of GF profiling results also reveals human-explainable patterns for high-impact scholars.
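The abstract does not spell out how a self-citation graph is assembled, so the following is only a minimal sketch under stated assumptions, not the GeneticFlow framework itself. It assumes that nodes are the papers a scholar has authored and that an edge links two of those papers whenever one cites the other. The function name `self_citation_graph`, the input layout, and the (citing, cited) edge direction are all hypothetical choices made for illustration. The advisor-advisee detection, citation type classification, and GNN stages mentioned in the abstract are not modeled here.

```python
def self_citation_graph(papers, citations, scholar):
    """Build a scholar-centric self-citation graph (illustrative sketch only).

    papers:    dict mapping paper_id -> {"authors": set of author names}
               (hypothetical input layout, not taken from the paper)
    citations: iterable of (citing_id, cited_id) pairs
    scholar:   author name whose profile is being built

    Returns (nodes, edges): nodes are the scholar's own papers, edges are
    (citing, cited) pairs where both endpoints are the scholar's papers.
    """
    # Nodes: every paper that lists the scholar as an author.
    own = {pid for pid, meta in papers.items() if scholar in meta["authors"]}

    # Edges: citations whose two endpoints are both the scholar's papers.
    edges = {
        (citing, cited)
        for citing, cited in citations
        if citing in own and cited in own and citing != cited
    }
    return sorted(own), sorted(edges)


if __name__ == "__main__":
    # Toy data, invented purely for demonstration.
    papers = {
        "p1": {"authors": {"A", "B"}},
        "p2": {"authors": {"A"}},
        "p3": {"authors": {"C"}},
    }
    citations = [("p2", "p1"), ("p3", "p1"), ("p2", "p3")]
    nodes, edges = self_citation_graph(papers, citations, "A")
    print(nodes, edges)
```

In this toy example only the citation from `p2` to `p1` survives, because `p3` was not authored by scholar `A`. A real pipeline would additionally need author disambiguation and a bibliographic data source, neither of which is covered by this sketch.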
