
Large Language Models as Zero-shot Dialogue State Tracker through Function Calling (2402.10466v4)

Published 16 Feb 2024 in cs.CL and cs.AI

Abstract: LLMs are increasingly prevalent in conversational systems due to their advanced understanding and generative capabilities in general contexts. However, their effectiveness in task-oriented dialogues (TOD), which requires not only response generation but also effective dialogue state tracking (DST) within specific tasks and domains, remains less satisfying. In this work, we propose a novel approach FnCTOD for solving DST with LLMs through function calling. This method improves zero-shot DST, allowing adaptation to diverse domains without extensive data collection or model tuning. Our experimental results demonstrate that our approach achieves exceptional performance with both modestly sized open-source and also proprietary LLMs: with in-context prompting it enables various 7B or 13B parameter models to surpass the previous state-of-the-art (SOTA) achieved by ChatGPT, and improves ChatGPT's performance beating the SOTA by 5.6% average joint goal accuracy (JGA). Individual model results for GPT-3.5 and GPT-4 are boosted by 4.8% and 14%, respectively. We also show that by fine-tuning on a small collection of diverse task-oriented dialogues, we can equip modestly sized models, specifically a 13B parameter LLaMA2-Chat model, with function-calling capabilities and DST performance comparable to ChatGPT while maintaining their chat capabilities. We have made the code publicly available at https://github.com/facebookresearch/FnCTOD

Leveraging LLMs for Zero-shot Dialogue State Tracking via Function Calling

Introduction to FnCTOD Approach

The FnCTOD approach harnesses LLMs for zero-shot dialogue state tracking (DST) by introducing function calling into conversational contexts. This strategy removes the need for extensive data collection and model re-training for task-oriented dialogues (TOD), addressing a significant bottleneck in deploying conversational systems across diverse domains. By embedding function specifications into the dialogue as part of the system prompt, FnCTOD enables LLMs to generate both dialogue states and responses seamlessly, a critical step toward making versatile conversational systems practical and scalable.
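To make the prompt construction concrete, here is a minimal sketch of converting a domain schema into a function specification embedded in a system prompt. The hotel schema, slot names, and the exact JSON layout are illustrative assumptions, not the paper's precise format:

```python
import json

# Hypothetical domain schema for a hotel domain; slot names and values
# are illustrative, not taken from the paper.
hotel_schema = {
    "domain": "hotel",
    "description": "Find a hotel matching the user's constraints",
    "slots": {
        "area": {"description": "Area of the city",
                 "values": ["north", "south", "centre"]},
        "price_range": {"description": "Price range of the hotel",
                        "values": ["cheap", "moderate", "expensive"]},
        "stars": {"description": "Star rating of the hotel"},
    },
}

def schema_to_function_spec(schema):
    """Convert a domain schema into a function specification so the LLM
    can 'call' the function with the current dialogue state as arguments."""
    properties = {}
    for slot, info in schema["slots"].items():
        prop = {"type": "string", "description": info["description"]}
        if "values" in info:
            prop["enum"] = info["values"]  # categorical slots get an enum
        properties[slot] = prop
    return {
        "name": f"find_{schema['domain']}",
        "description": schema["description"],
        "parameters": {"type": "object", "properties": properties},
    }

spec = schema_to_function_spec(hotel_schema)
# Embed the specification in the system prompt for the target domain.
system_prompt = (
    "You can call the following function to record the user's constraints:\n"
    + json.dumps(spec, indent=2)
)
```

Because the specification is generated from the schema alone, adapting to a new domain only requires supplying its schema, with no model tuning.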

Key Contributions and Results

The paper delineates several contributions of the FnCTOD methodology. First, it shows that FnCTOD significantly improves the performance of both modestly sized open-source and proprietary LLMs through in-context prompting alone; notably, it lifts GPT-4's performance by 14%, establishing a new state of the art for zero-shot DST. Second, it narrows the gap between open-source models and ChatGPT: fine-tuning a 13B LLaMA2-Chat model on a diverse collection of task-oriented dialogues equips it with function-calling DST capabilities comparable to ChatGPT's while preserving its chat abilities.

Empirical Validation

The experimental validation on the MultiWOZ benchmark illustrates FnCTOD's efficacy in enhancing zero-shot DST performance across various open-source and proprietary models without further fine-tuning. With in-context prompting, the approach beats the previous state of the art by 5.6% average joint goal accuracy (JGA), boosting individual results by 4.8% for GPT-3.5 and a remarkable 14% for GPT-4. Additionally, the fine-tuned 13B-parameter LLaMA2-Chat model performs comparably to ChatGPT, underscoring the approach's utility for upgrading moderately sized models to zero-shot DST tasks.
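For readers unfamiliar with the metric, JGA counts a turn as correct only if the entire predicted dialogue state matches the gold state exactly. A minimal sketch (slot names are illustrative):

```python
def joint_goal_accuracy(predictions, references):
    """Fraction of turns whose full predicted dialogue state exactly
    matches the gold state: every slot-value pair must agree."""
    assert len(predictions) == len(references)
    correct = sum(1 for pred, gold in zip(predictions, references)
                  if pred == gold)
    return correct / len(predictions)

# Two turns: the first state matches exactly, the second disagrees on a slot.
preds = [
    {"hotel-area": "north", "hotel-price_range": "cheap"},
    {"hotel-area": "north"},
]
golds = [
    {"hotel-area": "north", "hotel-price_range": "cheap"},
    {"hotel-area": "south"},
]
print(joint_goal_accuracy(preds, golds))  # 0.5
```

The all-or-nothing nature of JGA is what makes multi-point gains like the 5.6% reported here substantial: a single wrong slot invalidates the whole turn.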

Methodological Insights

FnCTOD recasts DST as a function-calling task, converting each domain schema into a function specification embedded in the dialogue prompt. Under this formulation, the LLM tracks the dialogue state by generating function calls whose arguments encode the user's accumulated constraints. By decomposing function call generation and leveraging in-context prompting, the method improves markedly over non-decomposed variants, and fine-tuning on a modest, diverse dataset proves sufficient for strong zero-shot generalization.
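The decoding side can be sketched as parsing the model's generated call and folding its arguments into the running dialogue state. The `<fn_call>` tag and JSON serialization below are illustrative assumptions about the output format, not the paper's exact convention:

```python
import json
import re

def parse_function_call(generation):
    """Extract a function call from model output. For illustration we
    assume the call is wrapped in <fn_call>...</fn_call> tags and
    serialized as JSON with 'name' and 'arguments' fields."""
    match = re.search(r"<fn_call>(.*?)</fn_call>", generation, re.DOTALL)
    if match is None:
        return None  # the turn produced no state update
    call = json.loads(match.group(1))
    return call["name"], call["arguments"]

def update_dialogue_state(state, generation):
    """Each turn's call updates only the slots it mentions, so the full
    dialogue state is accumulated across turns."""
    parsed = parse_function_call(generation)
    if parsed is None:
        return state
    name, args = parsed
    domain = name.removeprefix("find_")  # e.g. find_hotel -> hotel
    for slot, value in args.items():
        state[f"{domain}-{slot}"] = value
    return state

state = {}
turn = '<fn_call>{"name": "find_hotel", "arguments": {"area": "north"}}</fn_call>'
state = update_dialogue_state(state, turn)
```

Parsing into a structured state like this is what lets the same prompt format serve both state tracking and, downstream, response generation.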

Theoretical and Practical Implications

From a theoretical standpoint, FnCTOD advances our understanding of leveraging LLMs for task-specific functions without the stringent need for domain-specific training data, enhancing the adaptability of conversational systems. Practically, the approach paves the way for scalable and efficient deployment of chatbots and virtual assistants across myriad domains, significantly reducing the overhead associated with model training and data annotation for new domains.

Future Directions

While FnCTOD provides a robust framework for incorporating DST into TOD systems through LLMs, accuracy must improve further before practical deployment. Future advances in LLM capabilities, coupled with methodological refinements to FnCTOD, are expected to raise performance further. Moreover, developing more realistic evaluation protocols for TOD systems, especially for response generation, will be crucial to realizing the full potential of such conversational models in real-world applications.

Concluding Remarks

FnCTOD represents a pivotal step forward in the quest to utilize LLMs for the dynamic and diverse field of task-oriented dialogues. By enabling zero-shot DST through function calling, this approach mitigates significant barriers to deploying conversational systems across various domains, offering a blueprint for future innovations in the field of conversational AI.

Authors (10)
  1. Zekun Li
  2. Zhiyu Zoey Chen
  3. Mike Ross
  4. Patrick Huber
  5. Seungwhan Moon
  6. Zhaojiang Lin
  7. Xin Luna Dong
  8. Adithya Sagar
  9. Xifeng Yan
  10. Paul A. Crook