Bridging HCI and AI Research for the Evaluation of Conversational SE Assistants (2502.07956v1)

Published 11 Feb 2025 in cs.SE

Abstract: As LLMs are increasingly adopted in software engineering, recently in the form of conversational assistants, ensuring these technologies align with developers' needs is essential. The limitations of traditional human-centered methods for evaluating LLM-based tools at scale raise the need for automatic evaluation. In this paper, we advocate combining insights from human-computer interaction (HCI) and AI research to enable human-centered automatic evaluation of LLM-based conversational SE assistants. We identify requirements for such evaluation and challenges down the road, working towards a framework that ensures these assistants are designed and deployed in line with user needs.

Authors (2)
  1. Jonan Richards (2 papers)
  2. Mairieli Wessel (11 papers)