PicPersona-TOD : A Dataset for Personalizing Utterance Style in Task-Oriented Dialogue with Image Persona (2504.17390v1)

Published 24 Apr 2025 in cs.CL

Abstract: Task-Oriented Dialogue (TOD) systems are designed to fulfill user requests through natural language interactions, yet existing systems often produce generic, monotonic responses that lack individuality and fail to adapt to users' personal attributes. To address this, we introduce PicPersona-TOD, a novel dataset that incorporates user images as part of the persona, enabling personalized responses tailored to user-specific factors such as age or emotional context. This is facilitated by first impressions, dialogue policy-guided prompting, and the use of external knowledge to reduce hallucinations. Human evaluations confirm that our dataset enhances user experience, with personalized responses contributing to a more engaging interaction. Additionally, we introduce a new NLG model, Pictor, which not only personalizes responses, but also demonstrates robust performance across unseen domains. https://github.com/JihyunLee1/PicPersona

Summary

Overview of PicPersona-TOD: A Novel Dataset for Personalizing Task-Oriented Dialogue

The paper "PicPersona-TOD: A Dataset for Personalizing Utterance Style in Task-Oriented Dialogue with Image Persona" proposes personalizing task-oriented dialogue (TOD) systems by treating a user's image as part of the persona. The resulting dataset, PicPersona-TOD, targets a known limitation of existing TOD systems: they tend to produce generic responses that do not adapt to personal attributes such as age or emotional context.

Key Contributions

Integration of a Visual Persona into TOD: PicPersona-TOD uses realistic user images to drive personalized responses. The image serves as a visual persona: the system forms a first impression from cues such as facial expression and visible context, much as a human would, and adapts its utterance style accordingly.
Dataset Construction Pipeline: The dataset is built through a multi-stage pipeline comprising user image collection, dialogue extension, stylistic alignment, personalized system response generation, and data filtering. LLMs generate the dialogues, while external information sources such as Google Maps and Wikipedia ground personalized recommendations in facts and reduce hallucination.
NLG Model (Pictor): The paper introduces Pictor, an NLG model trained on PicPersona-TOD that personalizes responses and remains robust in domains not included in the training set. Pictor conditions on the user impression and the dialogue context when generating a response.
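The construction stages listed above can be pictured as a chain of functions applied to each sample. The following is a minimal sketch only, not the authors' implementation: the `Sample` class, the stage function names, and the filtering rule are hypothetical, and each stage body is a placeholder for what would be an LLM call or a knowledge lookup in a real pipeline.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Sample:
    """One dataset candidate (hypothetical structure)."""
    image_path: str                 # collected user image acting as the persona
    turns: List[str]                # dialogue turns
    notes: List[str] = field(default_factory=list)  # record of applied stages
    keep: bool = True               # set by the filtering stage


Stage = Callable[[Sample], Sample]


def extend_dialogue(s: Sample) -> Sample:
    # Placeholder: a real stage would ask an LLM to extend the dialogue.
    s.notes.append("dialogue_extended")
    return s


def align_style(s: Sample) -> Sample:
    # Placeholder: a real stage would match utterance style to the image persona.
    s.notes.append("style_aligned")
    return s


def personalize_response(s: Sample) -> Sample:
    # Placeholder: a real stage would rewrite system turns using the user
    # impression and external knowledge.
    s.notes.append("response_personalized")
    return s


def filter_sample(s: Sample) -> Sample:
    # Assumed rule for illustration: drop samples with fewer than two turns.
    s.keep = len(s.turns) >= 2
    s.notes.append("filtered")
    return s


STAGES: List[Stage] = [extend_dialogue, align_style, personalize_response, filter_sample]


def run_pipeline(samples: List[Sample]) -> List[Sample]:
    """Apply every stage in order and return only the samples that pass filtering."""
    kept = []
    for s in samples:
        for stage in STAGES:
            s = stage(s)
        if s.keep:
            kept.append(s)
    return kept
```

Keeping each stage as a separate function with the same signature makes it straightforward to swap a placeholder for a real model call or to reorder stages without touching `run_pipeline`.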

Numerical Results and Human Evaluation

PicPersona-TOD was evaluated by human annotators on style appropriateness, semantic consistency, and overall satisfaction, and scored highly on all three. In these evaluations it consistently outperformed existing text-based personalization methods, which supports the use of image-based personas for dialogue personalization.

Implications and Future Directions

PicPersona-TOD is most directly applicable where a TOD system has real-time visual input, such as kiosks and robots equipped with cameras. Beyond improving user satisfaction in those settings, it also motivates further research on multimodal dialogue systems: combining the image with other user signals available at interaction time could deepen personalization and make interactions more human-like.

The paper suggests several future directions, including refining retrieval-augmented generation to further reduce hallucination and exploring persona-based retrieval strategies. Both are expected to improve the personalization and accuracy of system responses.

Overall, PicPersona-TOD moves dialogue systems closer to how people communicate, where a speaker's appearance and mood shape the reply, and it encourages work on systems that are personalized as well as functional.