
Text mining arXiv: a look through quantitative finance papers (2401.01751v2)

Published 3 Jan 2024 in cs.DL, cs.IR, and q-fin.GN

Abstract: This paper explores articles hosted on the arXiv preprint server with the aim of uncovering valuable insights hidden in this vast collection of research. Using text mining and natural language processing methods, we examine the contents of quantitative finance papers posted on arXiv from 1997 to 2022. We extract and analyze crucial information from the full documents, including the references, to understand topic trends over time and to identify the most cited researchers and journals in this domain. Additionally, we compare numerous algorithms for topic modeling, including state-of-the-art approaches.
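
As a rough illustration of what "topic trends over time" can mean in practice, the sketch below counts how often a term appears per publication year in a tiny corpus. It is not the authors' pipeline: the `DOCS` data, the stopword list, and the function names are all hypothetical, and plain term counting stands in for the topic-modeling algorithms the paper compares.

```python
import re
from collections import Counter, defaultdict

# Hypothetical toy corpus: (year, text) pairs standing in for paper texts.
DOCS = [
    (2008, "Option pricing under stochastic volatility models"),
    (2009, "Credit risk and default contagion in banking networks"),
    (2021, "Deep learning for option pricing and hedging"),
    (2022, "Reinforcement learning for portfolio optimization with deep networks"),
]

# Assumed minimal stopword list; a real study would use a fuller one.
STOPWORDS = {"and", "for", "in", "under", "with", "the", "of"}


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, drop stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def term_counts_by_year(docs):
    """Map each year to a Counter of the terms seen in that year's texts."""
    counts = defaultdict(Counter)
    for year, text in docs:
        counts[year].update(tokenize(text))
    return counts


def term_trend(docs, term):
    """Return {year: count} for one term, with years in ascending order."""
    counts = term_counts_by_year(docs)
    return {year: counts[year][term] for year in sorted(counts)}


if __name__ == "__main__":
    for term in ("pricing", "deep"):
        print(term, term_trend(DOCS, term))
```

Raw counts like these only hint at trends; normalizing by the number of papers per year would be a natural next step on a real corpus.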

