
Leveraging Implicit Feedback from Deployment Data in Dialogue (2307.14117v2)

Published 26 Jul 2023 in cs.CL

Abstract: We study improving social conversational agents by learning from natural dialogue between users and a deployed model, without extra annotations. To implicitly measure the quality of a machine-generated utterance, we leverage signals like user response length, sentiment and reaction of the future human utterances in the collected dialogue episodes. Our experiments use the publicly released deployment data from BlenderBot (Xu et al., 2023). Human evaluation indicates improvements in our new models over baseline responses; however, we find that some proxy signals can lead to more generations with undesirable properties as well. For example, optimizing for conversation length can lead to more controversial or unfriendly generations compared to the baseline, whereas optimizing for positive sentiment or reaction can decrease these behaviors.
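As a minimal sketch of the idea, the snippet below labels a bot utterance with a proxy reward derived from the human's next turn (response length, crude sentiment, and a reaction cue). The function name, weights, thresholds, and keyword lists are illustrative assumptions, not the paper's implementation, which uses learned classifiers over the BlenderBot deployment data.

```python
# Illustrative sketch (not the paper's code): score a bot utterance by
# implicit signals in the human's next turn of a logged dialogue episode.

from dataclasses import dataclass

# Hypothetical keyword lists standing in for a real sentiment/reaction model.
POSITIVE_WORDS = {"great", "thanks", "love", "cool", "nice", "fun"}
NEGATIVE_WORDS = {"boring", "stop", "hate", "wrong", "ugh", "rude"}
REACTION_WORDS = {"haha", "lol", "wow"}

@dataclass
class Turn:
    speaker: str  # "bot" or "human"
    text: str

def proxy_reward(bot_turn: Turn, next_human_turn: Turn,
                 length_threshold: int = 5) -> float:
    """Combine implicit-feedback proxies for the quality of bot_turn.

    bot_turn is the utterance being scored; all signal comes from the
    human's following turn. Weights are arbitrary, for illustration only.
    """
    words = next_human_turn.text.lower().split()
    # Signal 1: did the user reply at some length?
    length_signal = 1.0 if len(words) >= length_threshold else 0.0
    # Signal 2: crude sentiment of the reply, clipped to [-1, 1].
    sentiment = (sum(w in POSITIVE_WORDS for w in words)
                 - sum(w in NEGATIVE_WORDS for w in words))
    sentiment_signal = max(-1.0, min(1.0, sentiment / 2.0))
    # Signal 3: explicit positive reaction (e.g., laughter).
    reaction_signal = 1.0 if any(w in REACTION_WORDS for w in words) else 0.0
    return 0.4 * length_signal + 0.4 * sentiment_signal + 0.2 * reaction_signal

# Usage: label each bot utterance in an episode, then keep high-reward
# (context, utterance) pairs as training examples for the improved model.
episode = [
    Turn("bot", "Have you seen any good movies lately?"),
    Turn("human", "Yes! I loved the new sci-fi one, it was great fun."),
]
print(proxy_reward(episode[0], episode[1]))
```

Which proxy is chosen matters: as the abstract notes, optimizing a length-style signal can reward controversial replies, while sentiment- or reaction-style signals tend to suppress them.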

Authors (5)
  1. Richard Yuanzhe Pang (26 papers)
  2. Stephen Roller (27 papers)
  3. Kyunghyun Cho (292 papers)
  4. He He (71 papers)
  5. Jason Weston (130 papers)
Citations (7)