
Improving Startup Success with Text Analysis (2312.06236v1)

Published 11 Dec 2023 in cs.LG, cs.CL, and cs.IR

Abstract: Investors are interested in predicting the future success of startup companies, preferably using publicly available data that can be gathered from free online sources. Using public-only data has been shown to work, but there is still much room for improvement. Two of the best-performing prediction experiments use 17 and 49 features respectively, mostly numeric and categorical in nature. In this paper, we significantly expand and diversify both the sources and the number of features (to 171) to achieve better prediction. Data collected from Crunchbase, the Google Search API, and Twitter (now X) are used to predict whether a company will raise a round of funding within a fixed time horizon. Many of the new features are textual, and the Twitter subset includes linguistic metrics such as measures of passive voice and parts of speech. A total of ten machine learning models are also evaluated for best performance. The adaptable model can be used to predict funding 1-5 years into the future, with a variable cutoff threshold to favor either precision or recall. Prediction with comparable assumptions generally achieves F scores above 0.730, which outperforms previous attempts in the literature (0.531), and does so with fewer examples. Furthermore, we find that the vast majority of the performance impact comes from the top 18 of 171 features, which are mostly generic company observations, including the best-performing individual feature: the free-form text description of the company.
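
The variable cutoff threshold mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the labels and scores below are made-up toy values, the function name `precision_recall_f1` is hypothetical, and the balanced F1 score is assumed, since the abstract does not say which F score variant is reported.

```python
def precision_recall_f1(labels, scores, threshold):
    """Return (precision, recall, F1), predicting positive when score >= threshold."""
    predicted = [score >= threshold for score in scores]
    tp = sum(1 for y, p in zip(labels, predicted) if y and p)      # funded, predicted funded
    fp = sum(1 for y, p in zip(labels, predicted) if not y and p)  # not funded, predicted funded
    fn = sum(1 for y, p in zip(labels, predicted) if y and not p)  # funded, predicted not funded
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy data (assumed, not from the paper): 1 = company raised a round within the horizon.
labels = [1, 1, 1, 0, 1, 0, 0, 0]
# Hypothetical model scores for the same eight companies.
scores = [0.95, 0.80, 0.65, 0.60, 0.40, 0.35, 0.20, 0.05]

for threshold in (0.3, 0.5, 0.7):
    p, r, f = precision_recall_f1(labels, scores, threshold)
    print(f"threshold={threshold:.1f}  precision={p:.3f}  recall={r:.3f}  F1={f:.3f}")
```

On this toy data, raising the threshold trades recall for precision: fewer companies are flagged as likely to raise funding, but a larger share of the flagged ones actually do.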

