
From LLM to Conversational Agent: A Memory Enhanced Architecture with Fine-Tuning of Large Language Models (2401.02777v2)

Published 5 Jan 2024 in cs.CL and cs.AI

Abstract: This paper introduces RAISE (Reasoning and Acting through Scratchpad and Examples), an advanced architecture enhancing the integration of LLMs like GPT-4 into conversational agents. RAISE, an enhancement of the ReAct framework, incorporates a dual-component memory system, mirroring human short-term and long-term memory, to maintain context and continuity in conversations. It entails a comprehensive agent construction scenario, including phases like Conversation Selection, Scene Extraction, CoT Completion, and Scene Augmentation, leading to the LLMs Training phase. This approach appears to enhance agent controllability and adaptability in complex, multi-turn dialogues. Our preliminary evaluations in a real estate sales context suggest that RAISE has some advantages over traditional agents, indicating its potential for broader applications. This work contributes to the AI field by providing a robust framework for developing more context-aware and versatile conversational agents.


Summary

  • The paper introduces the RAISE architecture, which integrates short-term and long-term memory modules to boost context-aware LLM performance.
  • It outlines a systematic methodology involving conversation selection, scene extraction, chain-of-thought completion, and targeted fine-tuning.
  • Preliminary experiments, notably in real estate sales, suggest efficiency and adaptability gains over standard prompting methods.

Introduction to RAISE Architecture

In the sphere of AI, the integration of LLMs into conversational agents represents a major step toward more intuitive and effective systems. Although these models excel at single-turn tasks, aligning them with the demands of multi-turn dialogue remains difficult. The RAISE (Reasoning and Acting through Scratchpad and Examples) architecture, an extension of the ReAct framework, is designed to bridge this gap and empower conversational agents.

Reimagining Memory in AI

A focal point of the RAISE architecture is its emulation of human cognitive functions, specifically mimicking short-term and long-term memory through a dual-component memory system. The Scratchpad module, serving as short-term memory, captures important conversational elements and conclusions drawn from recent interactions. The retrieval module, likened to long-term memory, sources contextual examples relevant to the ongoing discussion. This memory design strengthens the agent's ability to maintain and build on context, which translates into a more adaptive and controlled conversational experience.
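The dual-component memory described above can be sketched in a few lines. This is a minimal, hypothetical illustration: the class name, the keyword-overlap retrieval, and the prompt layout are all assumptions for exposition, not the paper's implementation.

```python
from dataclasses import dataclass, field

@dataclass
class RaiseMemory:
    """Illustrative sketch of a dual-component memory (not the paper's code).

    The scratchpad plays the role of short-term memory, holding conclusions
    from the current dialogue; the example store stands in for long-term
    memory, from which scenes relevant to the query are retrieved.
    """
    scratchpad: list = field(default_factory=list)   # short-term memory
    examples: dict = field(default_factory=dict)     # long-term memory: keyword -> scene

    def note(self, conclusion: str) -> None:
        """Record an intermediate conclusion from the current turn."""
        self.scratchpad.append(conclusion)

    def retrieve(self, query: str, k: int = 2) -> list:
        """Naive keyword-overlap retrieval of stored example scenes."""
        q = set(query.lower().split())
        scored = sorted(
            self.examples.items(),
            key=lambda kv: -len(q & set(kv[0].lower().split())),
        )
        return [scene for key, scene in scored[:k] if q & set(key.lower().split())]

    def build_prompt_context(self, query: str) -> str:
        """Combine both memory components into prompt context for the LLM."""
        lines = ["Examples: " + "; ".join(self.retrieve(query))]
        lines.append("Scratchpad: " + "; ".join(self.scratchpad))
        return "\n".join(lines)
```

In a real system the keyword lookup would be replaced by embedding-based retrieval, but the division of labor stays the same: the scratchpad accumulates within-conversation state while the example store supplies cross-conversation context.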

The RAISE Methodology

The strategic blueprint of RAISE follows carefully planned stages to create context-aware agents, ranging from Conversation Selection and Scene Extraction to Chain of Thought (CoT) Completion and Scene Augmentation, and culminating in LLM Training. This structured process ensures that agents not only process language efficiently but also adapt to the ebb and flow of human conversation, accommodating varied communication patterns. Initial experiments within the real estate domain support RAISE's central claims of context awareness and adaptability, while also suggesting its potential utility across other fields.
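The staged pipeline above can be sketched as a single function that walks each selected conversation and emits one training scene per agent turn. Everything here is illustrative: the field names, the minimum-turn filter, the CoT placeholder, and the stubbed augmentation are assumptions, not the paper's actual procedure.

```python
def build_training_scenes(conversations):
    """Illustrative sketch of the RAISE data pipeline: select conversations,
    extract per-turn scenes, attach a chain-of-thought placeholder, and
    augment each scene with retrieved examples. All helper logic is
    hypothetical, not the paper's implementation."""
    # Conversation Selection: keep only multi-turn dialogues.
    selected = [c for c in conversations if len(c["turns"]) >= 2]

    scenes = []
    for conv in selected:
        history = []
        for turn in conv["turns"]:
            if turn["role"] == "agent":
                # Scene Extraction: dialogue history up to this agent response.
                scene = {"history": list(history), "response": turn["text"]}
                # CoT Completion: placeholder reasoning, to be filled by an LLM.
                scene["cot"] = f"Reason about {len(history)} prior turn(s), then reply."
                # Scene Augmentation: attach retrieved examples (stubbed here).
                scene["examples"] = conv.get("similar_scenes", [])
                scenes.append(scene)
            history.append(turn["text"])
    return scenes
```

Each scene pairs a dialogue prefix with a reasoned response, which is exactly the shape the subsequent fine-tuning stage consumes.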

Agent Tuning and Analysis

The core of RAISE lies in fine-tuning LLMs to sharpen their operation within this architecture. A dataset construction pipeline guides the fine-tuning process, emphasizing authenticity, diversity, and CoT quality. This process supports role-adequate behavior and reduces training costs by focusing narrowly on role-specific logic. Notably, in the evaluated contexts, fine-tuning within RAISE outperformed standard prompting methods, improving operational efficiency and agent responsiveness. Through the lens of the RAISE framework, the AI community gains an architecture that promises more natural, coherent, and user-centric conversational agents.
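To make the fine-tuning step concrete, a scene like those produced by the pipeline can be serialized into a supervised training record. The field names, the system-role string, and the prompt/completion layout below are assumptions chosen for illustration; the paper does not specify this exact format.

```python
import json

def to_finetune_record(scene, system_role="real-estate sales agent"):
    """Hypothetical formatting of an augmented scene into a supervised
    fine-tuning record. Field names and chat layout are assumptions."""
    prompt_parts = [f"You are a {system_role}."]
    if scene.get("examples"):
        prompt_parts.append("Relevant examples: " + " | ".join(scene["examples"]))
    prompt_parts.append("Dialogue so far: " + " ".join(scene["history"]))
    # The target supervises both the chain of thought and the final reply,
    # so the tuned model learns role-specific reasoning, not just wording.
    target = f"Thought: {scene['cot']}\nResponse: {scene['response']}"
    return json.dumps({"prompt": "\n".join(prompt_parts), "completion": target})
```

Supervising the chain of thought alongside the reply is the design choice that distinguishes this style of tuning from plain response imitation: the model is trained to reproduce the reasoning that led to the role-appropriate answer.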