A Dual Reinforcement Learning Framework for Unsupervised Text Style Transfer (1905.10060v1)

Published 24 May 2019 in cs.CL

Abstract: Unsupervised text style transfer aims to transfer the underlying style of text but keep its main content unchanged without parallel data. Most existing methods typically follow two steps: first separating the content from the original style, and then fusing the content with the desired style. However, the separation in the first step is challenging because the content and style interact in subtle ways in natural language. Therefore, in this paper, we propose a dual reinforcement learning framework to directly transfer the style of the text via a one-step mapping model, without any separation of content and style. Specifically, we consider the learning of the source-to-target and target-to-source mappings as a dual task, and two rewards are designed based on such a dual structure to reflect the style accuracy and content preservation, respectively. In this way, the two one-step mapping models can be trained via reinforcement learning, without any use of parallel data. Automatic evaluations show that our model outperforms the state-of-the-art systems by a large margin, especially with more than 8 BLEU points improvement averaged on two benchmark datasets. Human evaluations also validate the effectiveness of our model in terms of style accuracy, content preservation and fluency. Our code and data, including outputs of all baselines and our model are available at https://github.com/luofuli/DualLanST.

Authors (7)
  1. Fuli Luo (23 papers)
  2. Peng Li (390 papers)
  3. Jie Zhou (687 papers)
  4. Pengcheng Yang (28 papers)
  5. Baobao Chang (80 papers)
  6. Zhifang Sui (89 papers)
  7. Xu Sun (194 papers)
Citations (167)

Summary

A Dual Reinforcement Learning Framework for Unsupervised Text Style Transfer

This paper presents a dual reinforcement learning (RL) framework for unsupervised text style transfer: transforming the style of a text while preserving its semantic content, without any parallel data. Most existing methods take a two-step approach, first disentangling content from style and then recombining the extracted content with the target style. The authors argue that this separation is inherently difficult because content and style are intricately interdependent in natural language, and they propose a simpler, direct alternative.

The authors introduce a dual RL framework that circumvents explicit disentanglement of content and style. Instead, two one-step mapping models are trained on concurrent transfer tasks, source-to-target and target-to-source, which form a dual pair. This dual setup supports two distinct reward mechanisms in the RL algorithm, one for style accuracy and one for content preservation, so that both objectives are enforced without reliance on parallel corpora (a minimal sketch of this reward design follows).
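The reward design can be pictured with the sketch below. This is an illustrative outline rather than the authors' implementation: `style_classifier`, `backward_model`, and their interfaces are hypothetical stand-ins, and the harmonic-mean combination is one common way to balance two competing rewards, used here for illustration.

```python
import torch

def dual_rewards(src_ids, transferred_ids, style_classifier, backward_model, target_style):
    """Compute the two rewards for one batch of source-to-target transfers.

    style_classifier: assumed to return a (batch, num_styles) probability tensor.
    backward_model:   the target-to-source mapping; its log_prob interface is a
                      hypothetical placeholder for scoring reconstruction of the source.
    """
    with torch.no_grad():
        # Style reward: probability that the transferred sentence carries the target style.
        r_style = style_classifier(transferred_ids)[:, target_style]        # (batch,)

        # Content reward: how well the dual (backward) model reconstructs the
        # original sentence from the transferred one, i.e. P(src | transferred).
        log_p = backward_model.log_prob(src=transferred_ids, tgt=src_ids)   # (batch,)
        r_content = log_p.exp()

    # Harmonic mean of the two rewards, so neither objective can be ignored.
    eps = 1e-8
    return 2.0 * r_style * r_content / (r_style + r_content + eps)
```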

A distinguishing feature of the framework is that the dual tasks act as teachers for each other, interacting in a closed loop mediated by the style-accuracy and content-preservation rewards. This symmetrical structure improves learning, as reflected in the BLEU results: on the Yelp and GYAFC datasets, the proposed model markedly outperforms baseline systems, with an average improvement of more than 8 BLEU points.

On the technical side, the mapping models are trained with policy gradients: a pre-trained style classifier supplies the style-accuracy reward, and the reconstruction probability under the dual model measures content retention. The approach departs from heuristics such as adversarial training for content-style disentanglement, relying instead on the dual-task structure, which better reflects how tightly style and content are intertwined in real text.
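A policy-gradient update along these lines might look like the following sketch, assuming a hypothetical `forward_model.sample` method that returns sampled token ids together with their summed log-probabilities, and a `reward_fn` such as the dual reward above; the batch-mean baseline is a standard variance-reduction choice, not something the paper specifies.

```python
def reinforce_step(forward_model, optimizer, src_ids, reward_fn):
    """One REINFORCE update for the source-to-target model (hypothetical interfaces)."""
    sampled_ids, log_probs = forward_model.sample(src_ids)   # log_probs: (batch,)
    reward = reward_fn(src_ids, sampled_ids)                 # per-sample reward, (batch,)

    # Subtract a simple batch-mean baseline to reduce gradient variance.
    advantage = reward - reward.mean()

    # REINFORCE objective: minimize -E[advantage * log p(sample)].
    loss = -(advantage.detach() * log_probs).mean()

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```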

To address the cold start of fully unsupervised training, the models are first pre-trained and then fine-tuned with RL under an annealing pseudo teacher-forcing scheme. Pre-training uses a template-based method to generate initial pseudo-parallel data, a form of weak supervision that bootstraps learning before the models graduate to unsupervised RL training without parallel data, an important point given that parallel corpora are unavailable for many style transfer tasks.
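One way to picture the annealing is a teacher-forcing ratio that decays over training, so each batch is increasingly likely to be trained with the RL rewards rather than the template-generated pseudo pairs. The routine below is a hypothetical sketch of such a schedule (a linear decay), not the authors' code.

```python
import random

def teacher_forcing_ratio(step, total_steps, start=1.0, end=0.0):
    """Linearly anneal the probability of using pseudo-parallel supervision."""
    frac = min(step / max(total_steps, 1), 1.0)
    return start + (end - start) * frac

def training_mode(step, total_steps):
    """Decide, per batch, whether to use pseudo teacher forcing or RL sampling."""
    if random.random() < teacher_forcing_ratio(step, total_steps):
        return "teacher_forcing"   # supervised loss on template-generated pseudo pairs
    return "reinforce"             # policy-gradient update with the dual rewards
```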

The implications are significant both for practical applications, such as sentiment modification and formal-informal adaptation in NLP systems, and for theoretical work on text generation without aligned data. The dual RL architecture, being generic and straightforward, shows promise for other sequence generation tasks that lack parallel data, opening avenues in the study of text representation and manipulation in unsupervised settings.

In sum, this work advances the field of text generation by presenting a technically sophisticated and empirically validated framework, moving closer to nuanced style transformation without compromising content integrity. It is a noteworthy contribution that may inspire further developments in unsupervised learning and sequence-to-sequence generation.