Quantitatively evaluate the accuracy of the OnGoal goal pipeline
Design and run a quantitative accuracy evaluation of OnGoal’s three-stage goal pipeline (infer, merge, evaluate) on expert-annotated benchmark datasets, validating goal identification, merge operations, and evaluation categories against ground-truth labels.
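As a rough illustration of the scoring step for one stage, the sketch below compares predicted evaluation categories with expert labels and reports accuracy, per-label precision/recall/F1, and macro-F1. The label names ("confirm", "ignore", "contradict"), the function names, and the toy data are assumptions made for illustration; they are not taken from the OnGoal implementation or from any existing benchmark.

```python
from collections import Counter


def accuracy(gold, pred):
    """Fraction of items where the predicted label equals the expert label."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and of equal length")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def per_label_scores(gold, pred):
    """Return {label: (precision, recall, f1)} for every label seen in gold or pred."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and of equal length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label p, but the expert said otherwise
            fn[g] += 1  # expert label g was missed
    scores = {}
    for label in sorted(set(gold) | set(pred)):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (precision, recall, f1)
    return scores


def macro_f1(gold, pred):
    """Unweighted mean of per-label F1, so rare labels count as much as common ones."""
    scores = per_label_scores(gold, pred)
    return sum(f1 for _, _, f1 in scores.values()) / len(scores)


# Hypothetical toy data: expert labels vs. pipeline output for six goal evaluations.
gold = ["confirm", "confirm", "ignore", "contradict", "ignore", "confirm"]
pred = ["confirm", "ignore", "ignore", "contradict", "confirm", "confirm"]

if __name__ == "__main__":
    print(f"accuracy = {accuracy(gold, pred):.3f}")
    print(f"macro-F1 = {macro_f1(gold, pred):.3f}")
    for label, (p, r, f) in per_label_scores(gold, pred).items():
        print(f"{label}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

Macro-F1 is reported alongside accuracy because category frequencies in real conversations are likely to be imbalanced. The infer and merge stages would need their own matching criteria (for example, aligning predicted goals to annotated goals before scoring), which this sketch does not attempt.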
References
However, quantitatively evaluating our pipeline's accuracy, such as on expert-annotated benchmarks, remains untested.
— OnGoal: Tracking and Visualizing Conversational Goals in Multi-Turn Dialogue with Large Language Models
(2508.21061 - Coscia et al., 28 Aug 2025) in Section 7.2 Limitations and Future Work