Time to Cite: Modeling Citation Networks using the Dynamic Impact Single-Event Embedding Model (2403.00032v1)
Abstract: Understanding the structure and dynamics of scientific research, i.e., the science of science (SciSci), has become an important area of research for addressing pressing questions, including how scholars interact to advance science, how disciplines are related and evolve, and how research impact can be quantified and predicted. Central to the study of SciSci has been the analysis of citation networks. Here, two prominent modeling methodologies have been employed: one assesses the citation impact dynamics of papers using parametric distributions, and the other embeds citation networks in a latent space suited to characterizing the static relations between papers in terms of their citations. Interestingly, citation networks are a prominent example of single-event dynamic networks, i.e., networks in which each dyad has only a single event (the point in time of citation). We propose a novel likelihood function for characterizing such single-event networks. Using this likelihood, we propose the Dynamic Impact Single-Event Embedding model (DISEE). The DISEE model characterizes scientific interactions in terms of a latent distance model in which random effects account for citation heterogeneity, while the time-varying impact is characterized using existing parametric representations for the assessment of dynamic impact. We demonstrate the proposed approach on several real citation networks, finding that DISEE reconciles static latent distance network embedding approaches with classical dynamic impact assessments well.
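To make the combination of a latent distance term and a time-varying impact term concrete, the following is a minimal sketch and not the paper's likelihood, which the abstract does not spell out. It assumes a generic first-event (survival-style) formulation in which each dyad has intensity `rate * f(age)`, where `rate = exp(gamma_i + gamma_j - ||z_i - z_j||)` and `f` is a log-normal density over the age of the cited paper. All function names, parameter names, and the choice of the log-normal shape are illustrative assumptions.

```python
import math


def lognormal_pdf(t, mu, sigma):
    """Log-normal density over paper age t (assumed impact shape, not from the source)."""
    if t <= 0:
        return 0.0
    z = (math.log(t) - mu) / sigma
    return math.exp(-0.5 * z * z) / (t * sigma * math.sqrt(2.0 * math.pi))


def lognormal_cdf(t, mu, sigma):
    """Log-normal CDF, i.e. the integral of the impact shape from age 0 to age t."""
    if t <= 0:
        return 0.0
    return 0.5 * (1.0 + math.erf((math.log(t) - mu) / (sigma * math.sqrt(2.0))))


def dyad_rate(z_i, z_j, gamma_i, gamma_j):
    """Static part of the intensity: random effects minus Euclidean latent distance."""
    return math.exp(gamma_i + gamma_j - math.dist(z_i, z_j))


def dyad_log_likelihood(rate, mu, sigma, horizon, event_age=None):
    """Log-likelihood of one dyad under the assumed first-event formulation.

    event_age=None means no citation was observed up to `horizon`.
    Otherwise the single citation occurred when the cited paper had age `event_age`.
    """
    if event_age is None:
        # Survival term: no event in (0, horizon].
        return -rate * lognormal_cdf(horizon, mu, sigma)
    if event_age <= 0:
        raise ValueError("event_age must be positive")
    # Density of the first (and only) event: intensity times survival up to that age.
    intensity = rate * lognormal_pdf(event_age, mu, sigma)
    return math.log(intensity) - rate * lognormal_cdf(event_age, mu, sigma)


# Hypothetical two-dimensional embeddings and random effects for two papers.
rate = dyad_rate([0.0, 0.0], [0.5, 0.5], gamma_i=0.3, gamma_j=-0.1)
print(dyad_log_likelihood(rate, mu=1.0, sigma=0.8, horizon=10.0))
print(dyad_log_likelihood(rate, mu=1.0, sigma=0.8, horizon=10.0, event_age=2.5))
```

In this sketch, papers that are closer in the latent space or have larger random effects receive a higher rate, while the log-normal term controls when in a paper's lifetime citations are most likely to arrive.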