Does Collaborative Human-LM Dialogue Generation Help Information Extraction from Human Dialogues? (2307.07047v2)
Abstract: The capabilities of pretrained LLMs have opened opportunities to explore new application areas, but applications involving human-human interaction are limited by the fact that most data is protected from public release for privacy reasons. Problem-solving human dialogues in real applications can be much more complex than existing Wizard-of-Oz collections, preventing successful domain transfer. To support information extraction (IE) for a private call center dataset, we introduce a human-in-the-loop dialogue generation framework capable of synthesizing realistic dialogues. In IE experiments with auto insurance call center dialogues, we observe a 25\% relative improvement in $F_1$ after augmenting a small set of real human conversations with synthetic data. We release code and our synthetic dataset to illustrate the complexity of real-world call center conversations and encourage development of complex dialogue datasets that are more representative of natural data.
- Bo-Ru Lu
- Nikita Haduong
- Chia-Hsuan Lee
- Zeqiu Wu
- Hao Cheng
- Paul Koester
- Jean Utke
- Tao Yu
- Noah A. Smith
- Mari Ostendorf