
Self-Guiding Exploration for Combinatorial Problems (2405.17950v1)

Published 28 May 2024 in cs.AI

Abstract: LLMs have become pivotal in addressing reasoning tasks across diverse domains, including arithmetic, commonsense, and symbolic reasoning. They utilize prompting techniques such as Exploration-of-Thought, Decomposition, and Refinement to effectively navigate and solve intricate tasks. Despite these advancements, the application of LLMs to Combinatorial Problems (CPs), known for their NP-hardness and critical roles in logistics and resource management, remains underexplored. To address this gap, we introduce a novel prompting strategy: Self-Guiding Exploration (SGE), designed to enhance the performance of solving CPs. SGE operates autonomously, generating multiple thought trajectories for each CP task. It then breaks these trajectories down into actionable subtasks, executes them sequentially, and refines the results to ensure optimal outcomes. We present our research as the first to apply LLMs to a broad range of CPs and demonstrate that SGE outperforms existing prompting strategies by over 27.84% in CP optimization performance. Additionally, SGE achieves a 2.46% higher accuracy over the best existing results in other reasoning tasks (arithmetic, commonsense, and symbolic).

The paper "Self-Guiding Exploration for Combinatorial Problems" addresses a significant gap in the application of LLMs to Combinatorial Problems (CPs), which are crucial in complex fields such as logistics and resource management. Combinatorial Problems are known for their NP-hardness, making them challenging for conventional algorithms and requiring innovative methods to achieve efficient solutions.
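To make the NP-hardness concrete: even an exact solver for a small combinatorial problem such as the Travelling Salesman Problem must enumerate a factorially growing number of candidates. The toy brute-force sketch below (with a made-up distance matrix, not from the paper) checks all (n−1)! tours, which is why such methods break down beyond a handful of cities.

```python
from itertools import permutations

def tsp_brute_force(dist):
    """Return (best_cost, best_tour) over all tours starting and ending at city 0.

    Runtime is O((n-1)!), which illustrates why exact enumeration
    is infeasible for realistically sized instances.
    """
    n = len(dist)
    best = (float("inf"), None)
    for perm in permutations(range(1, n)):
        tour = (0,) + perm + (0,)
        cost = sum(dist[a][b] for a, b in zip(tour, tour[1:]))
        best = min(best, (cost, tour))
    return best

# Illustrative 4-city asymmetric distance matrix (arbitrary values).
dist = [
    [0, 2, 9, 10],
    [1, 0, 6, 4],
    [15, 7, 0, 8],
    [6, 3, 12, 0],
]
```

With 4 cities there are only 6 tours to check; at 20 cities there are already more than 10^17, which is the scale gap that heuristic and learned methods (including LLM-based ones) aim to bridge.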

LLMs have shown their effectiveness in various reasoning tasks, utilizing prompting techniques like Exploration-of-Thought, Decomposition, and Refinement. Despite these advances, their application to CPs has been largely unexplored. This paper introduces a novel prompting strategy termed Self-Guiding Exploration (SGE), which aims to enhance LLM performance in solving CPs.

SGE is designed to operate autonomously, generating multiple "thought trajectories" for each CP task. These trajectories are broken down into manageable subtasks, which are executed sequentially. The results are then refined to ensure optimal outcomes. This process significantly improves the LLM's ability to navigate the solution space of CPs efficiently.
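The four-stage loop described above (generate trajectories, decompose, execute sequentially, refine) can be sketched as a simple prompting pipeline. Note this is a hypothetical reconstruction of the control flow, not the paper's actual prompts or code; `llm` is a stub standing in for any chat-completion call, and all prompt strings are illustrative assumptions.

```python
def llm(prompt: str) -> str:
    # Stub: a real implementation would call an LLM API here.
    # Canned responses let the control flow run end to end.
    if "List distinct solution strategies" in prompt:
        return "nearest-neighbor heuristic\n2-opt local search"
    if "Break the strategy" in prompt:
        return "build an initial tour\nimprove the tour"
    if "Execute subtask" in prompt:
        return "partial result"
    return "refined final answer"

def self_guiding_exploration(task: str, n_trajectories: int = 2) -> str:
    # 1. Generate multiple thought trajectories (candidate strategies).
    trajectories = llm(
        f"List distinct solution strategies for: {task}"
    ).splitlines()[:n_trajectories]

    candidate_answers = []
    for strategy in trajectories:
        # 2. Decompose each trajectory into actionable subtasks.
        subtasks = llm(
            f"Break the strategy '{strategy}' into subtasks"
        ).splitlines()
        # 3. Execute subtasks sequentially, threading results forward.
        context = ""
        for sub in subtasks:
            context = llm(f"Execute subtask '{sub}' given: {context}")
        candidate_answers.append(context)

    # 4. Refine: merge/select the best candidate answer.
    return llm(f"Refine these answers for '{task}': {candidate_answers}")
```

The key design point this sketch captures is that SGE is autonomous: the strategies and subtasks come from the model itself rather than from hand-written exemplars, unlike few-shot chain-of-thought prompting.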

Key findings and contributions of the paper include:

  1. Performance Optimization: SGE outperforms existing prompting strategies by 27.84% in terms of CP optimization performance, highlighting the effectiveness of the self-guiding mechanism in improving LLMs' ability to solve complex combinatorial problems.
  2. Accuracy Improvement: Beyond optimization, SGE demonstrates a 2.46% higher accuracy over the best existing results in other reasoning tasks, including arithmetic, commonsense, and symbolic reasoning. This showcases the general applicability and robustness of the SGE strategy across different domains.
  3. Novelty and Scope: The research is pioneering in its application of LLMs to a broad range of combinatorial problems. This sets a foundation for future work exploring the utility of advanced prompting strategies in CPs.

Overall, the paper presents a compelling advancement in leveraging LLMs for combinatorial problems. By introducing the Self-Guiding Exploration approach, it not only addresses a critical gap but also sets new benchmarks for performance in this challenging domain.

Authors (4)
  1. Zangir Iklassov (9 papers)
  2. Yali Du (63 papers)
  3. Farkhad Akimov (1 paper)
  4. Martin Takac (31 papers)