
Evaluation framework for visual analytics agents in human-AI teaming

Develop a systematic, scalable, and automatic evaluation framework for visual analytics agents operating within human-AI teaming in multimedia analytics, moving beyond manual synthesis across benchmarks and small-scale user studies.


Background

The paper argues that evaluating multimedia analytics systems must go beyond traditional AI metrics to account for human-AI collaboration. Specifically, the authors identify the lack of an established framework for automatically evaluating visual analytics (VA) agents, noting the current reliance on manual synthesis of results across benchmarks and on complex, hard-to-scale user studies.

Although prior work on Analytic Quality (AQ) sought to bridge evaluation gaps for multimedia analytics backends, the authors contend that it is not directly applicable in the foundation model era. A new evaluation framework tailored to VA agents and human-AI teaming remains an open need.
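To make the gap concrete, a minimal sketch of what an automatic VA-agent evaluation harness might look like is shown below. This is purely illustrative and not from the paper: the `EvalTask` structure, the `keyword_overlap` proxy metric, and the `echo_agent` stand-in are all hypothetical, and a real framework would need task suites, interaction traces, and metrics that capture the human-AI teaming aspects the authors highlight.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class EvalTask:
    """One benchmark item: an analytic question plus a reference answer."""
    question: str
    reference: str

def keyword_overlap(answer: str, reference: str) -> float:
    """Toy proxy metric: fraction of reference terms present in the answer."""
    ref_terms = set(reference.lower().split())
    ans_terms = set(answer.lower().split())
    return len(ref_terms & ans_terms) / len(ref_terms) if ref_terms else 0.0

def evaluate_agent(agent: Callable[[str], str],
                   tasks: List[EvalTask]) -> Dict[str, float]:
    """Run the agent on every task and aggregate scores automatically,
    replacing the manual per-benchmark synthesis described above."""
    scores = [keyword_overlap(agent(t.question), t.reference) for t in tasks]
    return {"mean_score": sum(scores) / len(scores), "n_tasks": len(tasks)}

# Hypothetical stand-in "VA agent": in practice this would be a foundation
# model-driven agent operating on a visual analytics interface.
def echo_agent(question: str) -> str:
    return "clusters of outliers in the scatter plot"

tasks = [
    EvalTask("What pattern is visible?", "outliers form clusters"),
    EvalTask("Summarize the scatter plot.", "scatter plot shows clusters"),
]
report = evaluate_agent(echo_agent, tasks)
```

The open problem is precisely that no agreed-upon analogue of `keyword_overlap` exists for VA agents: scoring would have to reflect analytic quality and collaboration with the human analyst, not surface text overlap.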

References

"Evaluating the human-AI teaming core of multimedia analytics systems remains an open challenge. To the best of our knowledge, there is currently no established framework for automatically evaluating VA agents."

A Multimedia Analytics Model for the Foundation Model Era (2504.06138 - Worring et al., 8 Apr 2025) in Section 6 (Evaluation of Multimedia Analytics Solutions — VA Agents)