Overview of "Polite Dialogue Generation Without Parallel Data"
The paper "Polite Dialogue Generation Without Parallel Data" by Tong Niu and Mohit Bansal explores how to generate polite dialogue responses without the parallel corpora that tasks like machine translation can rely on. It addresses the challenge of stylistic dialogue response generation: producing responses that are simultaneously fluent, relevant to the conversational context, and polite (rather than rude) in style. This is particularly pertinent for personality-based conversational agents used in applications such as intelligent tutoring and customer service.
Methodology
The authors introduce three weakly-supervised models that do not rely on regular-to-stylistic pairs:
- Fusion Model: This model applies late fusion, combining the output distribution of a dialogue model's decoder with that of a language model trained solely on polite utterances selected by a politeness classifier. A fusion parameter adjusts the weight given to each model's output.
- Label-Fine-Tuning (LFT) Model: This model prepends a label to each source sequence and scales that label's embedding by a politeness score from the classifier, allowing a single model to generate polite, neutral, or rude responses simply by varying the label's scale.
- Polite Reinforcement Learning (Polite-RL) Model: This model augments training with a reinforcement signal: a politeness classifier scores each sampled response, and a mixed-objective policy-gradient loss rewards politeness in proportion to that score while discouraging rudeness.
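The late fusion used by the Fusion model can be sketched as a log-linear combination of the two models' per-token distributions at each decoding step. A minimal sketch, assuming toy dictionaries of log-probabilities; the function name, the floor log-probability for tokens unseen by the polite LM, and the weight `alpha` are illustrative assumptions, not the paper's exact formulation:

```python
import math

def fuse_step(seq2seq_logprobs, polite_lm_logprobs, alpha=0.5):
    """Late fusion at one decoding step: log-linearly combine the dialogue
    decoder's token distribution with a polite language model's distribution.
    `alpha` weights the polite LM (hypothetical value for illustration)."""
    fused = {}
    for tok, lp in seq2seq_logprobs.items():
        # weighted sum of log-probabilities; tokens the polite LM has not
        # seen fall back to an assumed floor log-probability of -20.0
        fused[tok] = (1 - alpha) * lp + alpha * polite_lm_logprobs.get(tok, -20.0)
    # renormalise so the fused scores again form a proper distribution
    log_z = math.log(sum(math.exp(v) for v in fused.values()))
    return {tok: v - log_z for tok, v in fused.items()}
```

Raising `alpha` shifts probability mass toward tokens the polite LM favors, at the cost of the dialogue model's contextual preferences — the trade-off the evaluation later observes.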
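The label-scaling idea behind the LFT model can be illustrated with a short sketch: a label token is prepended to the source sequence, and at embedding time its vector is multiplied by the continuous politeness score. The embedding table, token names, and helper functions here are hypothetical:

```python
def lft_prepend(source_tokens, politeness_score, label="<polite>"):
    """Prepend a politeness label to the source sequence, carrying the
    classifier's score as the label's embedding scale (sketch).
    Regular source tokens keep a scale of 1.0."""
    return [(label, politeness_score)] + [(tok, 1.0) for tok in source_tokens]

def embed(token_scale_pairs, table):
    """Look up each token's vector in a (hypothetical) embedding table and
    multiply it by its scale; only the label's scale differs from 1.0."""
    return [[scale * x for x in table[tok]] for tok, scale in token_scale_pairs]
```

Because only a scalar changes between polite, neutral, and rude targets, one trained model covers the whole range without separate decoders per style.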
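The Polite-RL objective can be sketched as a maximum-likelihood loss plus a REINFORCE-style policy-gradient term weighted by the classifier's politeness reward. The mixing weight `lam` and the function signature are assumptions for illustration, not the paper's exact hyperparameters:

```python
def mixed_objective_loss(ml_loss, sampled_logprobs, politeness_reward, lam=0.1):
    """Mixed ML + RL training loss (sketch). The RL term is a
    REINFORCE-style score-function loss: the summed log-probability of a
    sampled response, weighted by the classifier's politeness reward.
    Minimising it raises the likelihood of responses the classifier
    scores as polite. `lam` is a hypothetical mixing weight."""
    rl_loss = -politeness_reward * sum(sampled_logprobs)
    return ml_loss + lam * rl_loss
```

Keeping the maximum-likelihood term in the objective is what anchors the model to fluent, context-relevant responses while the reward term steers style.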
Additionally, the paper introduces two retrieval-based baseline models to benchmark against the proposed generative approaches.
Results
Through human evaluation on Amazon Mechanical Turk, the paper finds that the LFT and Polite-RL models produce highly polite responses while maintaining contextual relevance, surpassing the Fusion and retrieval-based models, which sacrifice dialogue quality for politeness. The Fusion model attains politeness at the cost of contextual alignment, illustrating the trade-off between the two. The systems' outputs were further validated through qualitative and quantitative assessments, which reveal learned politeness strategies such as indirection and positive lexicon.
Implications
The models presented have practical implications for the deployment of conversational agents: they can improve user-agent interactions by generating responses that are both contextually relevant and socially appropriate. The research demonstrates the feasibility of integrating nuanced politeness into dialogue systems without parallel training data, thereby raising interaction quality.
The paper probes these implications further through analysis of output examples, showing how psycholinguistic strategies from politeness theory surface in the generated dialogue. Future developments may augment these models with techniques such as adversarial training or richer reward structures to capture empathy and other stylistic dimensions in scalable AI systems.
Future Directions
While demonstrating improved stylistic generation, the paper points out avenues for enhancement, such as modeling more complex and diverse personality and stylistic dimensions or extending the reinforcement learning approach to task-oriented dialogue. These directions promise to enrich the conversational capability and adaptability of AI systems, bringing automated interaction closer to human standards and supporting broader adoption across interactive AI platforms.