Are Large-Language Models Graph Algorithmic Reasoners? (2410.22597v1)

Published 29 Oct 2024 in cs.LG and cs.AI

Abstract: We seek to address a core challenge facing current LLMs. LLMs have demonstrated superior performance on many tasks, yet they continue to struggle with multi-step reasoning problems on explicit graphs. To address this gap, we introduce a novel benchmark designed to evaluate LLM performance on classical algorithmic reasoning tasks on explicit graphs. Our benchmark encompasses five fundamental algorithms: Breadth-First Search (BFS) and Depth-First Search (DFS) for connectivity, Dijkstra's algorithm for single-source shortest paths, the Floyd-Warshall algorithm for all-pairs shortest paths, and Prim's Minimum Spanning Tree algorithm (MST-Prim). Through extensive experimentation, we assess the capabilities of state-of-the-art LLMs in executing these algorithms step by step and systematically evaluate their performance at each stage. Our findings highlight the persistent challenges LLMs face in this domain and underscore the need for advanced prompting techniques and algorithmic instruction to enhance their graph reasoning abilities. This work presents MAGMA, the first comprehensive benchmark focused on LLMs completing classical graph algorithms, and provides a critical step toward understanding and improving their structured problem-solving skills.
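The benchmark's central idea is to score a model not only on its final answer but on each intermediate state of an algorithm's execution. As a minimal illustration (not the paper's released MAGMA code), the Python sketch below generates a ground-truth trace of BFS frontiers and scores a model's predicted steps against it; the helper name bfs_trace, the frontier-per-step trace format, and the step-accuracy metric are assumptions made here for concreteness.

# Hypothetical sketch: record BFS intermediate states so a model's
# step-by-step output can be checked against a ground-truth trace.
from collections import deque

def bfs_trace(adj, source):
    """Return the BFS frontier at each step (illustrative, not MAGMA's format)."""
    visited = {source}
    frontier = deque([source])
    trace = [sorted(frontier)]  # step 0: just the source node
    while frontier:
        next_frontier = deque()
        for u in frontier:
            for v in adj.get(u, []):
                if v not in visited:
                    visited.add(v)
                    next_frontier.append(v)
        if next_frontier:
            trace.append(sorted(next_frontier))
        frontier = next_frontier
    return trace

# Usage: compare frontiers parsed from an LLM's response with the reference.
adj = {0: [1, 2], 1: [3], 2: [3], 3: []}
reference = bfs_trace(adj, 0)   # [[0], [1, 2], [3]]
predicted = [[0], [1, 2], [3]]  # e.g. parsed from a model's step-by-step answer
step_accuracy = sum(p == r for p, r in zip(predicted, reference)) / len(reference)
print(step_accuracy)            # 1.0 only when every intermediate step matches

Scoring each step separately, rather than only the final output, distinguishes a model that genuinely executes the algorithm from one that merely guesses a plausible end state.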

