GPT-Signal: Generative AI for Semi-automated Feature Engineering in the Alpha Research Process (2410.18448v1)

Published 24 Oct 2024 in cs.CE

Abstract: In trading, financial signals often indicate when to buy and sell assets to generate excess returns relative to a benchmark (e.g., an index). Alpha is the portion of an asset's return that is not explained by exposure to this benchmark, and alpha research is a widely used process for developing strategies that generate alpha and earn excess returns. Feature engineering, an important pre-processing step in machine learning and data analysis that extracts and creates transformed features from raw data, plays a key role in algorithmic trading strategies and the alpha research process. Building on recent developments in Generative Artificial Intelligence (Gen AI) and LLMs, we present a novel way of leveraging GPT-4 to generate new return-predictive formulaic alphas, making alpha mining a semi-automated process and saving time and effort for investors and traders.
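
As a rough illustration of the definition of alpha given in the abstract, the sketch below fits the single-benchmark model r_asset = alpha + beta * r_benchmark by ordinary least squares and reports the intercept as alpha. This is a common textbook estimate, not necessarily the procedure used in the paper; the function name `estimate_alpha_beta` and the sample returns are hypothetical and chosen only for illustration.

```python
from statistics import mean


def estimate_alpha_beta(asset_returns, benchmark_returns):
    """Least-squares fit of r_asset = alpha + beta * r_benchmark.

    Hypothetical helper for illustration: alpha is the part of the
    average asset return not explained by benchmark exposure (beta).
    """
    if len(asset_returns) != len(benchmark_returns):
        raise ValueError("return series must have the same length")
    if len(asset_returns) < 2:
        raise ValueError("need at least two observations")

    mean_asset = mean(asset_returns)
    mean_bench = mean(benchmark_returns)

    # Covariance and variance sums (the 1/n factors cancel in the ratio).
    cov = sum(
        (a - mean_asset) * (b - mean_bench)
        for a, b in zip(asset_returns, benchmark_returns)
    )
    var = sum((b - mean_bench) ** 2 for b in benchmark_returns)
    if var == 0:
        raise ValueError("benchmark returns have zero variance")

    beta = cov / var
    alpha = mean_asset - beta * mean_bench
    return alpha, beta


# Made-up per-period returns: the asset moves 1.5x the benchmark
# plus a constant 0.2% per period that the benchmark does not explain.
benchmark = [0.01, -0.02, 0.03, 0.00]
asset = [0.002 + 1.5 * b for b in benchmark]
alpha, beta = estimate_alpha_beta(asset, benchmark)
```

With these synthetic inputs the fit recovers the constant per-period component as alpha and the benchmark sensitivity as beta, up to floating-point error.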

