
Towards Data-Centric Automatic R&D (2404.11276v2)

Published 17 Apr 2024 in cs.AI and q-fin.GN

Abstract: Human progress is driven by successful discoveries built on countless failed experiments. Researchers typically identify promising research directions by reading the literature and then verify them through experiments, a process that imposes a significant burden. Over the past decade, data-driven black-box deep learning methods have proven effective in a wide range of real-world scenarios; this further increases researchers' experimental burden and leaves many potential discoveries hidden. Automating this research and development (R&D) process is therefore an urgent need. This paper is the first effort to formalize that goal, proposing a Real-world Data-centric automatic R&D Benchmark, RD2Bench. RD2Bench evaluates all the operations in data-centric automatic R&D (D-CARD) as a whole, directly orienting future work toward the goal. It focuses on measuring the interaction and synergistic effects of different model capabilities and on aiding the selection of well-performing, trustworthy models. Although RD2Bench remains very challenging even for the state-of-the-art (SOTA) LLM GPT-4, indicating ample room for further research, LLMs show promising potential for advancing D-CARD: they can already implement some simple methods without any additional techniques. We call on future work to develop techniques for automatic R&D, opening the way to a potentially revolutionary upgrade to human productivity.
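To make the abstract's setup concrete, below is a minimal sketch of what one D-CARD evaluation round might look like: a candidate model reads a research artifact, implements the method it describes, and is scored end to end. This is an illustrative assumption, not the paper's actual harness; every name here (RDTask, run_dcard_round, the prompts) is hypothetical.

```python
# Hypothetical sketch of one data-centric automatic R&D (D-CARD) round.
# All names (RDTask, run_dcard_round, ...) are illustrative; they are
# not the RD2Bench API.

from dataclasses import dataclass
from typing import Callable

@dataclass
class RDTask:
    """One benchmark item: a method description plus a scoring function."""
    description: str                  # e.g. a formula or model spec from a paper
    evaluate: Callable[[str], float]  # scores the model's implementation

def run_dcard_round(llm: Callable[[str], str], task: RDTask) -> float:
    """One read-implement-evaluate cycle for a candidate LLM."""
    # Step 1: the model extracts the method from the source text.
    method_spec = llm(f"Extract the method described below:\n{task.description}")
    # Step 2: the model turns the extracted method into runnable code.
    code = llm(f"Implement this method in Python:\n{method_spec}")
    # Step 3: the implementation is scored against ground truth.
    return task.evaluate(code)
```

Scoring the whole cycle jointly, rather than each step in isolation, is what lets a benchmark of this shape capture the "interaction and synergistic effects" the abstract mentions: an extraction error propagates into the implementation and shows up in the final score.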
