Papers

Topics

Authors

Recent

View all

Gemini 2.5 Flash

Gemini 2.5 Flash 99 tok/s

Gemini 2.5 Pro 43 tok/s Pro

GPT-5 Medium 28 tok/s

GPT-5 High 35 tok/s Pro

GPT-4o 94 tok/s

GPT OSS 120B 476 tok/s Pro

Kimi K2 190 tok/s Pro

2000 character limit reached

OnGoal: Tracking and Visualizing Conversational Goals in Multi-Turn Dialogue with Large Language Models (2508.21061v1)

Published 28 Aug 2025 in cs.HC, cs.AI, and cs.LG

Abstract: As multi-turn dialogues with LLMs grow longer and more complex, how can users better evaluate and review progress on their conversational goals? We present OnGoal, an LLM chat interface that helps users better manage goal progress. OnGoal provides real-time feedback on goal alignment through LLM-assisted evaluation, explanations for evaluation results with examples, and overviews of goal progression over time, enabling users to navigate complex dialogues more effectively. Through a study with 20 participants on a writing task, we evaluate OnGoal against a baseline chat interface without goal tracking. Using OnGoal, participants spent less time and effort to achieve their goals while exploring new prompting strategies to overcome miscommunication, suggesting tracking and visualizing goals can enhance engagement and resilience in LLM dialogues. Our findings inspired design implications for future LLM chat interfaces that improve goal communication, reduce cognitive load, enhance interactivity, and enable feedback to improve LLM performance.

Collections

Summary

The paper introduces an innovative modular pipeline that infers, merges, and evaluates conversational goals in LLM dialogues.
It leverages prompt engineering with generative LLMs like GPT-4 to execute sequential goal inference, merging, and evaluation in real time.
Empirical results show reduced cognitive load and improved user engagement, validating OnGoal’s effectiveness in managing multi-turn interactions.

OnGoal: Tracking and Visualizing Conversational Goals in Multi-Turn Dialogue with LLMs

Introduction and Motivation

The increasing complexity and length of multi-turn dialogues with LLMs have introduced significant challenges for users in tracking, evaluating, and reviewing their conversational goals. Traditional linear chat interfaces, while familiar, are ill-suited for managing evolving objectives, especially as LLMs may forget, ignore, or misinterpret user intent over extended interactions. The OnGoal system addresses these limitations by augmenting the chat interface with real-time goal tracking, evaluation, and visualization, thereby supporting users in maintaining alignment between their objectives and LLM responses.

System Architecture and Goal Pipeline

OnGoal introduces a modular, LLM-driven goal pipeline that operates independently from the main chat LLM. The pipeline consists of three sequential stages: goal inference, goal merging, and goal evaluation. Each stage is implemented via prompt engineering and executed using a generative LLM (e.g., GPT-4o). The pipeline is designed to be domain-agnostic but is tuned for writing tasks in the current implementation.

Goal Inference: Extracts all user-specified conversational goals (questions, requests, offers, suggestions) from each user message.
Goal Merging: Reconciles newly inferred goals with existing goals, handling redundancy, contradiction, and evolution of objectives.
Goal Evaluation: Assesses each goal against the LLM's response, categorizing the outcome as confirm, contradict, or ignore, and provides explanations and supporting evidence.
Figure 1: An example of the conversational goal pipeline in action, showing inference, merging, and evaluation of goals with visual feedback in the chat interface.

This architecture enables real-time, global tracking of conversational goals, with the flexibility to adapt to different LLMs and domains by modifying prompt templates and goal definitions.

User Interface and Visualization Components

OnGoal integrates several visualization modalities to support sensemaking and reduce cognitive load:

Inline Goal Glyphs: Visual indicators beneath each message summarize inferred and evaluated goals, color-coded by evaluation status (green: confirm, red: contradict, yellow: ignore). Clicking a glyph reveals detailed explanations and evidence.
Progress Panel: A side panel with three tabs:
- Goals Tab: Lists all active goals, allows users to lock, complete, or restore goals, and provides access to evaluation histories.
- Timeline Tab: Sankey-style node-link diagram visualizing the evolution and evaluation of goals across conversation turns.
- Events Tab: Verbose, list-based log of all pipeline operations for validation and traceability.
- Figure 2: The three tabs in the progress panel—Goals, Timeline, and Events—supporting control, temporal analysis, and validation of goal progress.
Individual Goal View: Filters the chat to display only those messages relevant to a selected goal, facilitating longitudinal analysis of how a specific objective is addressed.
Figure 3: The individual goal view, highlighting all LLM responses evaluated against a selected goal, with supporting evidence and text highlighting.
Text Highlighting: Multiple modes (key phrases, similar sentences, unique sentences) are available to help users quickly identify patterns, repetitions, or omissions in LLM responses relevant to their goals.

Implementation Details

The backend is implemented in Python, leveraging the OpenAI API for LLM inference and evaluation. The frontend is built with Vue.js and D3.js for interactive visualizations. The system is LLM-agnostic and can be adapted to open-source or local models with minimal changes. Prompt templates for each pipeline stage are provided, enabling reproducibility and extensibility.

Empirical Evaluation

A controlled user paper ( $n=20$ ) compared OnGoal to a baseline chat interface without goal tracking or visualization. Participants completed writing tasks requiring satisfaction of multiple, sometimes conflicting, goals. Key findings include:

Reduced Cognitive Load: OnGoal users reported significantly lower mental demand and effort (NASA TLX), and spent less time reading chat logs, reallocating effort to evaluating and reviewing goals.
Enhanced Goal Management: Users employing OnGoal experimented more with prompting strategies, adapted their communication based on system feedback, and demonstrated higher variability in goal status changes, indicating more active engagement.
Improved Confidence and Agreement: OnGoal users expressed higher confidence in their evaluations and greater agreement with system-generated goal assessments.
Feature Usefulness: Inline goal glyphs and explanations were rated most useful for evaluation, while the individual goal view and text highlighting were critical for reviewing progress and identifying LLM issues.
Figure 4: Time allocation differences between interfaces, with OnGoal users spending more time on evaluation and review, and less on exhaustive reading.

Figure 5: NASA TLX workload ratings, showing lower mental demand and effort for OnGoal compared to baseline.

Figure 6: Interaction metrics, including turns taken, goal status changes, and agreement with system evaluations.

Figure 7: Usefulness ratings for OnGoal features, highlighting the importance of explanations and individual goal views.

Thematic Analysis and Emergent Issues

Qualitative analysis revealed that OnGoal facilitated more dynamic and reflective goal communication, allowed users to offload cognitive effort to the system, and enabled more effective identification of LLM misalignment or drift. However, the introduction of explicit goal evaluation also surfaced interpretive complexity and occasional disagreement between user and system assessments, underscoring the need for user feedback mechanisms and further refinement of evaluation prompts.

Design Implications and Future Directions

The paper suggests several actionable implications for the design of LLM chat interfaces:

Support Multiple Goal Communication Strategies: Interfaces should accommodate both proactive and reactive goal setting, and allow for flexible, iterative refinement of objectives.
Visualize Goal Alignment and Progress: Real-time, context-sensitive visualizations can reduce cognitive load and improve user awareness of LLM behavior.
Enable Feedback and Personalization: Incorporating user feedback into the goal evaluation pipeline can improve alignment and trust, and support adaptation to individual preferences.
Extend to Local and Fine-Grained Goals: Future work should explore local (e.g., paragraph- or sentence-level) goal tracking and support for more complex, hierarchical goal structures.

Limitations

The current evaluation is limited to writing tasks and a specific set of goal types. The generalizability to other domains (e.g., programming, data analysis) and the robustness of the goal pipeline across different LLMs remain open questions. Additionally, the subjective nature of goal evaluation highlights the need for more sophisticated, possibly user-adaptive, evaluation mechanisms.

Conclusion

OnGoal demonstrates that augmenting LLM chat interfaces with explicit, real-time goal tracking and visualization can significantly improve users' ability to manage, evaluate, and review their conversational objectives in multi-turn dialogue. The system architecture, prompt engineering strategies, and visualization techniques presented are extensible to a wide range of LLM-driven applications. Future research should focus on expanding the scope of goal types, integrating user feedback for adaptive evaluation, and exploring longitudinal effects in open-ended, real-world tasks.

PDF Markdown

Paper Prompts

Explore 10 Community Prompts

Follow-up Questions

Authors (4)

Tweets

https://twitter.com/JacksonAtkinsX/status/1962563390653501553

https://twitter.com/AdamCoscia/status/1962958627670491642

https://twitter.com/AhmedHamdy29189/status/1962649480760598762

alphaXiv

OnGoal: Tracking and Visualizing Conversational Goals in Multi-Turn Dialogue with Large Language Models (10 likes, 0 questions)