Mapping Researcher Activity based on Publication Data by means of Transformers (2306.09049v1)

Published 15 Jun 2023 in cs.CL, cs.DL, cs.IR, and cs.LG

Abstract: Performance on several NLP tasks has recently improved thanks to BERT, a Transformer-based pre-trained language model. We apply this approach to a local publication database. Research papers are encoded and clustered to form a landscape view of the scientific topics in which research is active. Authors working on similar topics can be identified by computing the similarity between their papers; on this basis, we define a similarity metric between authors. Additionally, we introduce the concept of self-similarity to indicate an author's topical variety.
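The abstract's author-level metrics can be sketched in a few lines. This is a minimal illustration, not the paper's actual definition: it assumes each paper is already represented by an embedding vector (e.g. from BERT) and that both the author-similarity metric and the self-similarity are taken to be mean pairwise cosine similarity, which is a common choice but only a guess at the paper's formulation.

```python
import math

def cosine_sim(a, b):
    """Cosine similarity between two embedding vectors (plain lists of floats)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def author_similarity(papers_a, papers_b):
    """Hypothetical author-to-author metric: mean pairwise cosine similarity
    over all cross-pairs of the two authors' paper embeddings."""
    sims = [cosine_sim(p, q) for p in papers_a for q in papers_b]
    return sum(sims) / len(sims)

def self_similarity(papers):
    """Hypothetical self-similarity: mean similarity among distinct pairs of
    one author's own papers; lower values suggest broader topical variety."""
    sims = [cosine_sim(papers[i], papers[j])
            for i in range(len(papers)) for j in range(i + 1, len(papers))]
    return sum(sims) / len(sims) if sims else 1.0
```

Under this reading, an author whose papers all point in the same embedding direction has self-similarity 1.0, while one working across unrelated topics scores lower.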
