
Exploring the evolution of research topics during the COVID-19 pandemic (2310.03928v1)

Published 5 Oct 2023 in cs.CL and cs.SI

Abstract: The COVID-19 pandemic has changed the research agendas of most scientific communities, resulting in an overwhelming production of research articles in a variety of domains, including medicine, virology, epidemiology, economy, psychology, and so on. Several open-access corpora and literature hubs were established; among them, the COVID-19 Open Research Dataset (CORD-19) has systematically gathered scientific contributions for 2.5 years, by collecting and indexing over one million articles. Here, we present the CORD-19 Topic Visualizer (CORToViz), a method and associated visualization tool for inspecting the CORD-19 textual corpus of scientific abstracts. Our method is based upon a careful selection of up-to-date technologies (including LLMs), resulting in an architecture for clustering articles along orthogonal dimensions and extraction techniques for temporal topic mining. Topic inspection is supported by an interactive dashboard, providing fast, one-click visualization of topic contents as word clouds and topic trends as time series, equipped with easy-to-drive statistical testing for analyzing the significance of topic emergence along arbitrarily selected time windows. The processes of data preparation and results visualization are completely general and applicable to virtually any corpus of textual documents, and are thus suited for effective adaptation to other contexts.
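
To make the idea of testing topic emergence across time windows concrete, here is a minimal Python sketch. It is not the CORToViz implementation: the abstract does not say which statistical test the dashboard uses, so a two-sided permutation test on the difference of means stands in for it. The function names (`weekly_topic_share`, `permutation_p_value`), the input format of `(date, topic_label)` pairs, and the example topic labels are all assumptions made for illustration.

```python
import random
from collections import Counter
from datetime import date
from statistics import fmean


def weekly_topic_share(docs, topic):
    """Turn (date, topic_label) pairs into a weekly time series.

    Returns {(iso_year, iso_week): share of that week's documents
    carrying the given topic label}. Hypothetical helper.
    """
    totals, hits = Counter(), Counter()
    for day, label in docs:
        year, week, _ = day.isocalendar()
        totals[(year, week)] += 1
        if label == topic:
            hits[(year, week)] += 1
    return {wk: hits[wk] / totals[wk] for wk in sorted(totals)}


def permutation_p_value(window_a, window_b, n_resamples=10_000, seed=0):
    """Two-sided permutation test on the difference of window means.

    Assumption: this test is a stand-in; the tool's actual test is
    not specified in the abstract.
    """
    rng = random.Random(seed)
    observed = abs(fmean(window_b) - fmean(window_a))
    pooled = list(window_a) + list(window_b)
    n_a = len(window_a)
    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = abs(fmean(pooled[n_a:]) - fmean(pooled[:n_a]))
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (extreme + 1) / (n_resamples + 1)


# Made-up documents: two in ISO week 1 of 2021, one in week 2.
docs = [
    (date(2021, 1, 4), "vaccines"),
    (date(2021, 1, 5), "masks"),
    (date(2021, 1, 12), "vaccines"),
]
shares = weekly_topic_share(docs, "vaccines")

# Made-up weekly shares for two user-selected time windows.
before = [0.02, 0.03, 0.02, 0.03, 0.02, 0.03]
after = [0.10, 0.12, 0.11, 0.13, 0.12, 0.11]
p_value = permutation_p_value(before, after)
```

A small `p_value` suggests that the topic's share of publications differs between the two windows by more than chance reshuffling of the weeks would explain. A rank-based test such as Kruskal-Wallis could be substituted without changing the surrounding logic.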
