Building Chatbots from Forum Data: Model Selection Using Question Answering Metrics (1710.00689v1)

Published 2 Oct 2017 in cs.CL

Abstract: We propose to use question answering (QA) data from Web forums to train chatbots from scratch, i.e., without dialog training data. First, we extract pairs of question and answer sentences from the typically much longer texts of questions and answers in a forum. We then use these shorter texts to train seq2seq models in a more efficient way. We further improve the parameter optimization using a new model selection strategy based on QA measures. Finally, we propose to use extrinsic evaluation with respect to a QA task as an automatic evaluation method for chatbots. The evaluation shows that the model achieves a MAP of 63.5% on the extrinsic task. Moreover, it can answer correctly 49.5% of the questions when they are similar to questions asked in the forum, and 47.3% of the questions when they are more conversational in style.
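The abstract reports results in terms of MAP (mean average precision) and describes selecting models by QA measures. As a rough illustration only, the sketch below computes MAP under its standard definition and picks the checkpoint with the highest score. It is not the authors' evaluation code. The function names, the `checkpoint_rankings` structure, and the assumption that every relevant candidate appears in each ranked list are all hypothetical.

```python
from typing import Dict, List, Sequence


def average_precision(relevance: Sequence[bool]) -> float:
    """Average precision for one question.

    `relevance[i]` is True if the candidate answer ranked at position i
    (0-based, best first) is a good answer. Assumes all relevant
    candidates are present somewhere in the list.
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(rankings: Sequence[Sequence[bool]]) -> float:
    """Mean of the per-question average precision values."""
    if not rankings:
        return 0.0
    return sum(average_precision(r) for r in rankings) / len(rankings)


def select_best_checkpoint(
    checkpoint_rankings: Dict[str, List[List[bool]]]
) -> str:
    """Hypothetical model selection: return the checkpoint name whose
    ranked answers on a development set give the highest MAP."""
    return max(
        checkpoint_rankings,
        key=lambda name: mean_average_precision(checkpoint_rankings[name]),
    )


if __name__ == "__main__":
    # Toy relevance judgments for two questions per checkpoint (made up).
    dev = {
        "epoch_1": [[False, True, False], [False, False, True]],
        "epoch_2": [[True, False, True], [False, True]],
    }
    for name, rankings in dev.items():
        print(name, round(mean_average_precision(rankings), 4))
    print("selected:", select_best_checkpoint(dev))
```

Selecting on a ranking metric such as MAP, rather than on training loss alone, ties the choice of checkpoint to how well the model orders candidate answers, which is the kind of QA-based criterion the abstract describes.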

Authors (5)
  1. Martin Boyanov
  2. Ivan Koychev
  3. Preslav Nakov
  4. Alessandro Moschitti
  5. Giovanni Da San Martino
Citations (10)