Investigating Low-Cost LLM Annotation for Spoken Dialogue Understanding Datasets (2406.13269v1)

Published 19 Jun 2024 in cs.AI, cs.CL, cs.HC, and eess.SP

Abstract: In spoken Task-Oriented Dialogue (TOD) systems, the choice of the semantic representation describing the users' requests is key to a smooth interaction. Indeed, the system uses this representation to reason over a database and its domain knowledge to choose its next action. The dialogue course thus depends on the information provided by this semantic representation. While textual datasets provide fine-grained semantic representations, spoken dialogue datasets fall behind. This paper provides insights into the automatic enhancement of spoken dialogue datasets' semantic representations. Our contributions are threefold: we (1) assess the relevance of LLM fine-tuning, (2) evaluate the knowledge captured by the produced annotations, and (3) highlight the implications of semi-automatic annotation.
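To make the annotation setup concrete, below is a minimal sketch (not the authors' actual pipeline) of how an LLM can be prompted to enrich a spoken utterance transcript with a slot-value semantic representation. The slot schema, prompt template, and the `call_llm` stub are illustrative assumptions, not details taken from the paper.

```python
import json

# Hypothetical slot schema for a restaurant-booking domain (illustrative only;
# the paper's actual ontology and dataset are not reproduced here).
SLOTS = ["cuisine", "area", "price_range", "party_size"]

PROMPT_TEMPLATE = (
    "Extract the values of the slots {slots} from the user utterance below.\n"
    "Answer with a JSON object mapping each mentioned slot to its value.\n"
    "Utterance: {utterance}\n"
)

def call_llm(prompt: str) -> str:
    """Stand-in for a real LLM call (e.g. a fine-tuned model endpoint).
    Returns a canned answer so the sketch runs without credentials."""
    return '{"cuisine": "italian", "party_size": "4"}'

def annotate(utterance: str) -> dict:
    """Produce a slot-value semantic representation for one utterance."""
    prompt = PROMPT_TEMPLATE.format(slots=", ".join(SLOTS), utterance=utterance)
    raw = call_llm(prompt)
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        return {}  # malformed generations would be flagged for manual review
    # Keep only schema slots so annotations stay usable for database reasoning.
    return {k: v for k, v in parsed.items() if k in SLOTS}

if __name__ == "__main__":
    print(annotate("uh can you find an italian place for four of us"))
    # {'cuisine': 'italian', 'party_size': '4'}
```

In a semi-automatic workflow of this kind, such machine-produced annotations would typically be spot-checked or corrected by human annotators before being merged into the dataset.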

Authors (3)
  1. Lucas Druart
  2. Valentin Vielzeuf
  3. Yannick Estève
