Synth-Empathy: Towards High-Quality Synthetic Empathy Data (2407.21669v2)

Published 31 Jul 2024 in cs.CL and cs.LG

Abstract: In recent years, with the rapid advancement of LLMs, strong empathetic response capability has become a crucial requirement. Consequently, managing and understanding empathetic datasets have gained increasing significance. However, empathetic data are typically human-labeled, which leads to insufficient datasets and wasted human labor. In this work, we present Synth-Empathy, an LLM-based pipeline for data generation and for quality and diversity selection that automatically generates high-quality empathetic data while discarding low-quality data. Using data generated from a model with low empathy, we further improve empathetic response performance and achieve state-of-the-art (SoTA) results across multiple benchmarks. Moreover, our model achieves SoTA performance on various human evaluation benchmarks, demonstrating its effectiveness and robustness in real-world applications. Furthermore, we show the trade-off between data quantity and quality, providing insights into empathetic data generation and selection.
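The abstract does not say how quality and diversity are measured, so the following is only a generic illustration of a "quality, then diversity" filter, not the authors' pipeline. It assumes each generated sample already carries a quality score (for example, one assigned by an LLM judge) and an embedding vector. The names `Sample`, `select`, `min_quality`, and `max_similarity`, the thresholds, and the toy data are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Sample:
    text: str
    quality: float          # hypothetical score, e.g. from an LLM judge, on a 0-10 scale
    embedding: list[float]  # hypothetical sentence embedding of `text`


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def select(samples: list[Sample], min_quality: float = 7.0,
           max_similarity: float = 0.9) -> list[Sample]:
    """Hypothetical filter: keep high-quality samples, then drop near-duplicates."""
    # Quality step: discard samples scored below the threshold.
    candidates = [s for s in samples if s.quality >= min_quality]
    # Diversity step: visit the best samples first and keep one only if it is
    # not too similar to any sample already kept.
    candidates.sort(key=lambda s: s.quality, reverse=True)
    kept: list[Sample] = []
    for s in candidates:
        if all(cosine(s.embedding, k.embedding) < max_similarity for k in kept):
            kept.append(s)
    return kept


# Toy usage with made-up scores and 2-D embeddings.
data = [
    Sample("I'm so sorry you're going through this.", 9.0, [1.0, 0.0]),
    Sample("I'm really sorry you're dealing with this.", 8.5, [0.99, 0.1]),
    Sample("That sounds exhausting; how are you holding up?", 8.0, [0.0, 1.0]),
    Sample("Ok.", 2.0, [0.7, 0.7]),
]
print([s.text for s in select(data)])
```

In this toy run, the low-scoring "Ok." fails the quality step, and the second apology is dropped as a near-duplicate of the first (cosine similarity about 0.995). Sorting by quality before the greedy pass means that when two samples are redundant, the higher-scored one is the one kept.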

Authors (8)

Hao Liang
Linzhuang Sun
Jingxuan Wei
Xijie Huang
Linkun Sun
Bihui Yu
Conghui He
Wentao Zhang
