Benchmarking ChatGPT on Algorithmic Reasoning (2404.03441v2)
Published 4 Apr 2024 in cs.AI, cs.CL, and cs.LG
Abstract: We evaluate ChatGPT's ability to solve algorithm problems from the CLRS benchmark suite, which was designed for GNNs. Each benchmark task requires solving a given problem with a specified classical algorithm. We find that ChatGPT outperforms specialist GNN models, successfully solving these problems by using Python. This raises new points in the discussion about learning algorithms with neural networks, and about what out-of-distribution testing means when models are trained on web-scale data.
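To make concrete what solving a problem with a specified classical algorithm in Python can look like, here is a minimal sketch of insertion sort, one of the classical algorithms covered by the CLRS benchmark. The function name and the sample input are illustrative assumptions; this is not taken from the paper's prompts or from ChatGPT's actual output.

```python
def insertion_sort(values):
    """Return a sorted copy of `values` using classical insertion sort.

    Illustrative only: the name and interface are assumptions, not the
    benchmark's or the paper's own code.
    """
    result = list(values)  # work on a copy so the input is left untouched
    for i in range(1, len(result)):
        key = result[i]
        j = i - 1
        # Shift elements larger than `key` one position to the right.
        while j >= 0 and result[j] > key:
            result[j + 1] = result[j]
            j -= 1
        # Place `key` into the gap that the shifting opened up.
        result[j + 1] = key
    return result


print(insertion_sort([5, 2, 4, 6, 1, 3]))  # prints [1, 2, 3, 4, 5, 6]
```

The point of the sketch is that the task fixes the algorithm, not just the answer: calling a built-in sort would return the same output without following the specified procedure.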