
GeDi: Generative Discriminator Guided Sequence Generation (2009.06367v2)

Published 14 Sep 2020 in cs.CL and cs.LG
Abstract: While large-scale language models (LMs) are able to imitate the distribution of natural language well enough to generate realistic text, it is difficult to control which regions of the distribution they generate. This is especially problematic because datasets used for training large LMs usually contain significant toxicity, hate, bias, and negativity. We propose GeDi as an efficient method for using smaller LMs as generative discriminators to guide generation from large LMs to make them safer and more controllable. GeDi guides generation at each step by computing classification probabilities for all possible next tokens via Bayes rule by normalizing over two class-conditional distributions; one conditioned on the desired attribute, or control code, and another conditioned on the undesired attribute, or anti control code. We find that GeDi gives stronger controllability than the state of the art method while also achieving generation speeds more than 30 times faster. Additionally, training GeDi on only four topics allows us to controllably generate new topics zero-shot from just a keyword, unlocking a new capability that previous controllable generation methods do not have. Lastly, we show that GeDi can make GPT-2 (1.5B parameters) significantly less toxic without sacrificing linguistic quality, making it by far the most practical existing method for detoxifying large language models while maintaining a fast generation speed.

Essay: GeDi: Generative Discriminator guided Sequence Generation

The paper "GeDi: Generative Discriminator guided Sequence Generation" presents a compelling approach to improving control over text generated by large language models (LMs), such as GPT-2 and GPT-3, by leveraging smaller class-conditional language models as generative discriminators (GeDis). The motivation stems from the inherent challenges of unconstrained LM generation, which often yields biased, toxic, or otherwise undesirable text owing to the nature of the training data. The proposed GeDi framework offers both a method for enhancing control over the generative process and a computationally efficient alternative to prevailing techniques.

GeDi Methodology

GeDi enhances the controllability of language generation by using class-conditional language models (CC-LMs) as discriminators to guide the sampling process of larger LMs. At each generative step, the mechanism applies Bayes' rule to compute, for every candidate next token, the probability of the desired attribute relative to the undesired one, and uses this to reweight the base LM's distribution. A key advantage is the technique's low computational cost: classification and guidance require only a constant number of forward passes per token, because GeDi obtains next-token classification probabilities by contrasting the two class-conditional distributions directly, rather than running a separate classifier over every candidate continuation.
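The per-step reweighting described above can be sketched concretely. In the following toy sketch (the function name, the `omega` posterior weight, and the uniform class prior are illustrative assumptions, not taken from the paper's released code), two class-conditional next-token distributions are combined via Bayes' rule into a token-level classification probability, which then reweights the base LM's distribution:

```python
import numpy as np

def gedi_step(p_lm, p_pos, p_neg, prior_pos=0.5, omega=5.0):
    """One GeDi-guided decoding step (toy sketch).

    p_lm:  base LM next-token distribution, shape (V,)
    p_pos: CC-LM next-token distribution under the desired control code
    p_neg: CC-LM next-token distribution under the anti control code
    omega: posterior scaling weight (illustrative value, an assumption here)
    """
    # Bayes' rule over the two class-conditional distributions:
    # P(c | x_t) = P(c) p_pos(x_t) / (P(c) p_pos(x_t) + P(c_bar) p_neg(x_t))
    num = prior_pos * p_pos
    p_class = num / (num + (1.0 - prior_pos) * p_neg)

    # Reweight the base LM by the class posterior raised to omega,
    # then renormalize to obtain the guided sampling distribution.
    guided = p_lm * p_class ** omega
    return guided / guided.sum()
```

The full method additionally length-normalizes sequence likelihoods and applies filtering heuristics when selecting tokens; this sketch covers only the core Bayes-rule reweighting for a single decoding step.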

Empirical Validation

The research empirically validates GeDi’s efficacy in several settings, demonstrating its superior attribute control and efficiency. The experiments revolve around sentiment modification, detoxification, and topic control, showcasing that:

  • Sentiment Control: GeDi can steer sentiment across varied domains, including out-of-domain text, outperforming existing methods like PPLM in both control strength and computational efficiency. Human evaluation confirms that GeDi achieves the intended sentiment while maintaining linguistic fluency across diverse source material such as book text. Unlike CTRL, which tends to revert to its training domains, GeDi controls sentiment without overfitting to any particular domain.
  • Detoxification: GeDi significantly reduces toxicity in the text generated by GPT-2 while maintaining its linguistic quality. The results underline GeDi's potential for large-scale model detoxification, offering a more pragmatic solution than fine-tuning large LMs or other post-hoc detoxification methods.
  • Topic Control and Zero-shot Generalization: GeDi performs robustly on topic-conditioning tasks, guiding generation with generative discriminators trained on only four topics. Notably, GeDi generalizes zero-shot, handling topics outside its training set from just a keyword used as a new control code, a capability not easily replicated by prior methods such as CTRL.
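The attribute classification underlying both control strength and zero-shot use can also be sketched at the sequence level. Assuming per-token log-likelihoods of the same prefix scored under the desired and anti control codes (the `alpha` length-normalization exponent and the uniform prior here are assumptions for illustration), Bayes' rule gives a posterior over the attribute:

```python
import math

def class_posterior(logp_pos, logp_neg, prior_pos=0.5, alpha=1.0):
    """Length-normalized Bayes posterior for a sequence (toy sketch).

    logp_pos / logp_neg: per-token log-likelihoods of the same prefix
    under the CC-LM conditioned on the desired / anti control code.
    alpha: length-normalization exponent (illustrative assumption);
    sequence likelihoods are raised to alpha / t before normalizing.
    """
    t = len(logp_pos)
    s_pos = math.log(prior_pos) + (alpha / t) * sum(logp_pos)
    s_neg = math.log(1.0 - prior_pos) + (alpha / t) * sum(logp_neg)
    # Normalize in log space for numerical stability.
    m = max(s_pos, s_neg)
    z = math.exp(s_pos - m) + math.exp(s_neg - m)
    return math.exp(s_pos - m) / z
```

Working in log space avoids underflow for long prefixes; swapping in a new keyword as the desired control code is what enables the zero-shot behavior described above.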

Theoretical and Practical Implications

The theoretical appeal of GeDi lies in combining the strengths of generative models and discriminative classifiers, yielding an elegant, scalable way to guide LM outputs. Practically, GeDi offers a strategic avenue for safer, more reliable LM deployments in industry settings, addressing ethical concerns around model toxicity and bias. Its design accommodates the computational constraints typical of model deployment, further paving the way for scalable application of large models.

Speculation on Future Developments

Looking ahead, the development of GeDi invites consideration for broader applications in AI. For instance, combining multiple generative discriminators for nuanced attribute guidance could enhance text quality further, aligning generated outputs more closely with human expectations. The zero-shot capability demonstrated warrants exploration into adaptive control mechanisms where models incorporate dynamic attributes based on real-time requirements or user input.

In conclusion, the GeDi framework marks a substantial step forward in the field of controlled language generation. It presents a much-needed solution to the challenges of safety and controllability while retaining operational speed and model quality. This work not only posits a method for controlling output attributes in LLMs but also bridges generative tasks with class-conditional learning, with implications that could extend well into the future of AI model deployment and safety.

Authors (7)
  1. Ben Krause
  2. Akhilesh Deepak Gotmare
  3. Bryan McCann
  4. Nitish Shirish Keskar
  5. Shafiq Joty
  6. Richard Socher
  7. Nazneen Fatema Rajani
Citations (371)