Objective-Reinforced Generative Adversarial Networks (ORGAN) for Sequence Generation Models (1705.10843v3)

Published 30 May 2017 in stat.ML and cs.LG

Abstract: In unsupervised data generation tasks, besides the generation of a sample based on previous observations, one would often like to give hints to the model in order to bias the generation towards desirable metrics. We propose a method that combines Generative Adversarial Networks (GANs) and reinforcement learning (RL) in order to accomplish exactly that. While RL biases the data generation process towards arbitrary metrics, the GAN component of the reward function ensures that the model still remembers information learned from data. We build upon previous results that incorporated GANs and RL in order to generate sequence data and test this model in several settings for the generation of molecules encoded as text sequences (SMILES) and in the context of music generation, showing for each case that we can effectively bias the generation process towards desired metrics.

Summary

  • The paper introduces a novel dual-reward framework that integrates adversarial and domain-specific reinforcement for enhanced sequence generation.
  • It uses a tunable mixing parameter λ and a Wasserstein discriminator loss to balance data fidelity against objective optimization, mitigating common GAN failure modes such as mode collapse.
  • Empirical results in molecular and music generation demonstrate improved metric performance and valid sequence generation compared to standard models.

Objective-Reinforced Generative Adversarial Networks (ORGAN) for Sequence Generation Models

The paper "Objective-Reinforced Generative Adversarial Networks (ORGAN) for Sequence Generation Models" introduces a framework to enhance sequence generation models by optimizing them for domain-specific objectives. This paper focuses on the intersection of Generative Adversarial Networks (GANs) and Reinforcement Learning (RL), employing these to guide the generative process towards desired properties while maintaining fidelity to the data distribution.

Framework Development and Methodology

ORGAN builds upon the SeqGAN framework, which models data generation as a stochastic policy in a reinforcement learning context. The core innovation of ORGAN is its dual-reward structure: it incorporates both adversarial training rewards and expert-based, domain-specific rewards. This allows a generator to be trained not only to deceive a discriminator, as in traditional GANs, but also to optimize for specific objectives critical to the domain.
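As in SeqGAN, the generator is treated as a stochastic policy over tokens and updated with a REINFORCE-style gradient. The following is a minimal PyTorch sketch of that surrogate loss, assuming per-step rewards (e.g., from Monte Carlo rollouts of partial sequences) have already been computed; the function name and tensor shapes are illustrative, not the paper's code:

```python
import torch

def policy_gradient_loss(log_probs: torch.Tensor,
                         rewards: torch.Tensor) -> torch.Tensor:
    """REINFORCE-style surrogate loss for a sequence generator.

    log_probs: (batch, seq_len) log-probabilities of the emitted tokens.
    rewards:   (batch, seq_len) per-step returns, e.g. estimated via
               Monte Carlo rollouts of partial sequences.
    Minimizing this loss performs gradient ascent on expected reward.
    """
    return -(log_probs * rewards).sum(dim=1).mean()
```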

The balance between maintaining data-distribution fidelity and optimizing for specific objectives is controlled by a tunable parameter λ: the reward for a generated sequence Y is a convex combination R(Y) = λ·D(Y) + (1 − λ)·O(Y), where D is the discriminator's score and O the domain objective. When λ is set to zero, the model focuses purely on the domain-specific objectives, operating as a naive RL algorithm; setting λ to one reverts the model to a SeqGAN.
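Concretely, the mixing reduces to a one-line function. This is a sketch under the convex-combination reading above, with illustrative names rather than the paper's code:

```python
def organ_reward(disc_score: float, obj_score: float, lam: float) -> float:
    """Mix adversarial and domain-specific rewards for a finished sequence.

    lam = 1.0 recovers SeqGAN (pure adversarial reward);
    lam = 0.0 recovers naive RL on the domain objective.
    Scores are assumed to be normalized to [0, 1].
    """
    return lam * disc_score + (1.0 - lam) * obj_score

# Example: equal weighting of the two signals.
print(organ_reward(disc_score=0.8, obj_score=0.4, lam=0.5))  # 0.6
```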

To enhance training stability, the discriminator is trained with a Wasserstein-distance loss, mitigating typical GAN convergence problems such as mode collapse.
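For reference, the core of a Wasserstein critic objective looks like the sketch below; the original WGAN additionally enforces a Lipschitz constraint (crudely, by weight clipping), and the paper's exact discriminator setup may differ:

```python
import torch

def critic_loss(real_scores: torch.Tensor,
                fake_scores: torch.Tensor) -> torch.Tensor:
    # The critic maximizes E[D(real)] - E[D(fake)]; negating gives a
    # loss to minimize. Scores are unbounded reals, unlike the sigmoid
    # outputs of a standard GAN discriminator.
    return fake_scores.mean() - real_scores.mean()

def clip_weights(critic: torch.nn.Module, c: float = 0.01) -> None:
    # Crude Lipschitz enforcement from the original WGAN paper.
    with torch.no_grad():
        for p in critic.parameters():
            p.clamp_(-c, c)
```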

Experimental Evaluation

The paper presents empirical evaluations in two domains: molecular generation and musical melody composition. Both were chosen because they require generating sequences that must satisfy precise structural or stylistic constraints.

  1. Molecular Generation: The sequences are SMILES-encoded molecular structures. ORGAN was evaluated on its ability to generate molecules optimized for solubility, synthesizability, and drug-likeness (see the RDKit sketch after this list). The results showed that ORGAN generates a high percentage of valid sequences while improving the optimized metrics relative to baselines such as maximum likelihood estimation (MLE) and SeqGAN.
  2. Music Generation: The sequences in this domain correspond to musical notes over time. The tasks involved maximizing tonality and the ratio of melodic steps, both of which contribute to the subjective quality and aesthetic of the generated music. Here ORGAN improved the targeted musical metrics while maintaining sample diversity.
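To make the molecular metrics in item 1 concrete, validity and two of the cited properties can be computed with RDKit. A minimal sketch follows; the paper's exact metric definitions and normalizations may differ, and the synthesizability (SA) score, which lives in RDKit's contrib area, is omitted:

```python
# Requires RDKit: pip install rdkit
from rdkit import Chem
from rdkit.Chem import Crippen, QED

def smiles_metrics(smiles: str):
    """Return (valid, logP, qed) for a SMILES string.

    logP is a common proxy for solubility and QED for drug-likeness;
    invalid SMILES yield (False, None, None), i.e. no reward signal.
    """
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return False, None, None
    return True, Crippen.MolLogP(mol), QED.qed(mol)

print(smiles_metrics("CCO"))   # ethanol: valid, small logP, moderate QED
print(smiles_metrics("C(("))   # malformed string: (False, None, None)
```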

Implications and Future Directions

The ORGAN framework's ability to integrate domain-specific objectives into the generative process is a significant advance for applications requiring tailored generative outputs. The approach is flexible and can be adapted to any domain where sequence data must satisfy specific property requirements.

Future research could explore several directions:

  • Refinement in Multi-objective Optimization: While the paper demonstrated the potential for alternating between different objectives during training, there is room to develop more sophisticated strategies that optimize several objectives simultaneously.
  • Application to Other Data Types: Extending the ORGAN framework's applicability to real-valued data, such as images and audio, could open new avenues where non-differentiable objectives must be optimized.
  • Outlier Detection and Management: Outliers can be valuable in fields such as drug discovery, so methods that push beyond the training distribution while still producing meaningful, diverse samples deserve further refinement.

The ORGAN framework offers a robust approach to sequence generation by incorporating domain-specific enhancements without significantly sacrificing fidelity to the original data distributions, indicating its potential utility across various applied machine learning scenarios.