
Large Language Models Need Consultants for Reasoning: Becoming an Expert in a Complex Human System Through Behavior Simulation (2403.18230v1)

Published 27 Mar 2024 in cs.AI

Abstract: LLMs, in conjunction with various reasoning reinforcement methodologies, have demonstrated remarkable capabilities comparable to humans in fields such as mathematics, law, coding, common sense, and world knowledge. In this paper, we delve into the reasoning abilities of LLMs within complex human systems. We propose a novel reasoning framework, termed "Mosaic Expert Observation Wall" (MEOW), which exploits a generative-agents-based simulation technique. In the MEOW framework, simulated data are used to train an expert model that concentrates "experience" about a specific task in each independent run of the simulation. It is the "experience" accumulated through simulation that makes for an expert on a task in a complex human system. We conduct experiments within a communication game that mirrors real-world security scenarios. The results indicate that our proposed methodology can cooperate with existing methodologies to enhance the reasoning abilities of LLMs in complex human systems.
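The abstract describes a simulate-then-accumulate loop: independent simulation runs of a communication game generate data, and an expert model distills the successful behavior ("experience") from those runs so the base LLM can consult it. The sketch below is a minimal, hypothetical illustration of that loop only; the names (`run_simulation`, `ExpertModel`, the toy situations and actions) and the frequency-counting "training" step are assumptions for illustration, not the paper's actual implementation, which trains a learned expert model on generative-agent simulation data.

```python
import random
from collections import Counter, defaultdict

# Toy action space for a communication game mirroring a security scenario.
ACTIONS = ["share_info", "withhold", "verify_source", "report"]


def run_simulation(num_turns: int = 10) -> list[tuple[str, str, bool]]:
    """Mock one independent simulation run.

    Each record is (situation, action, success). In the paper these records
    would come from generative-agent interactions, not random sampling.
    """
    situations = ["rumor_received", "request_for_secret", "routine_chat"]
    episode = []
    for _ in range(num_turns):
        situation = random.choice(situations)
        action = random.choice(ACTIONS)
        # Toy success criterion standing in for the game's real outcome signal.
        success = (
            (situation == "request_for_secret" and action == "withhold")
            or (situation == "rumor_received" and action == "verify_source")
            or (situation == "routine_chat" and action == "share_info")
        )
        episode.append((situation, action, success))
    return episode


class ExpertModel:
    """Toy 'expert' that concentrates accumulated experience per situation.

    A counter of successful actions stands in for the training step that the
    paper performs on simulated data.
    """

    def __init__(self) -> None:
        self.experience: dict[str, Counter] = defaultdict(Counter)

    def absorb(self, episode: list[tuple[str, str, bool]]) -> None:
        for situation, action, success in episode:
            if success:
                self.experience[situation][action] += 1

    def advise(self, situation: str) -> str:
        """Return the most successful action observed for this situation."""
        counts = self.experience.get(situation)
        return counts.most_common(1)[0][0] if counts else random.choice(ACTIONS)


if __name__ == "__main__":
    expert = ExpertModel()
    # Accumulate experience across many independent simulation runs.
    for _ in range(200):
        expert.absorb(run_simulation())
    # The base LLM would consult this expert when reasoning about the scenario.
    print(expert.advise("request_for_secret"))
```

Under these assumptions, the design point is that the expert is rebuilt purely from simulation outcomes, so it can "consult" for the base model on tasks where direct supervised data from the human system is unavailable.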
