ZeroGen: Zero-shot Multimodal Controllable Text Generation with Multiple Oracles (2306.16649v1)

Published 29 Jun 2023 in cs.CL

Abstract: Automatically generating textual content with desired attributes is an ambitious task that people have long pursued. Existing works have made a series of advances in incorporating unimodal controls into language models (LMs), whereas how to generate controllable sentences with multimodal signals and high efficiency remains an open question. To tackle this problem, we propose a new paradigm of zero-shot controllable text generation with multimodal signals (ZeroGen). Specifically, ZeroGen leverages controls of text and image successively from token level to sentence level and maps them into a unified probability space at decoding time, which customizes the LM outputs by weighted addition without extra training. To achieve better inter-modal trade-offs, we further introduce an effective dynamic weighting mechanism to regulate all control weights. Moreover, we conduct substantial experiments to probe whether signals from distinct modalities relate in depth or in breadth. Encouraging empirical results on three downstream tasks show that ZeroGen not only outperforms its counterparts on captioning tasks by a large margin but also shows great potential in multimodal news generation with a higher degree of control. Our code will be released at https://github.com/ImKeTT/ZeroGen.
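
The abstract describes a training-free, decoding-time scheme: text and image control signals are mapped into a unified probability space and combined with the LM's next-token distribution by weighted addition, with the weights regulated dynamically. The sketch below illustrates only that general idea; it is not the authors' released implementation, and the entropy-based dynamic-weighting rule, the alpha_max cap, and the randomly generated score tensors are illustrative assumptions (see the linked repository for the actual method).

```python
# Minimal sketch of decoding-time control by weighted addition in a shared
# (log-)probability space. NOT the ZeroGen implementation; the dynamic-weighting
# rule and all numbers below are illustrative assumptions.

import torch


def controlled_next_token_logits(
    lm_logits: torch.Tensor,      # (vocab_size,) raw LM logits for the next token
    text_scores: torch.Tensor,    # (vocab_size,) per-token text-control relevance scores
    image_scores: torch.Tensor,   # (vocab_size,) per-token image-control relevance scores
    alpha_max: float = 4.0,       # assumed cap on the total control strength
) -> torch.Tensor:
    # Map the LM output and both control signals into one log-probability space.
    log_p_lm = torch.log_softmax(lm_logits, dim=-1)
    log_p_text = torch.log_softmax(text_scores, dim=-1)
    log_p_image = torch.log_softmax(image_scores, dim=-1)

    # Dynamic weighting (assumed form): weight each control by how peaked
    # (low-entropy, i.e. confident) its distribution is.
    def confidence(log_p: torch.Tensor) -> torch.Tensor:
        entropy = -(log_p.exp() * log_p).sum()
        return 1.0 / (1.0 + entropy)

    w_text, w_image = confidence(log_p_text), confidence(log_p_image)
    total = w_text + w_image
    alpha = alpha_max * w_text / total
    beta = alpha_max * w_image / total

    # Training-free customization of the LM output by weighted addition.
    return log_p_lm + alpha * log_p_text + beta * log_p_image


if __name__ == "__main__":
    vocab_size = 50257  # GPT-2 vocabulary size, used here only as an example
    scores = controlled_next_token_logits(
        torch.randn(vocab_size), torch.randn(vocab_size), torch.randn(vocab_size)
    )
    print("argmax token id:", scores.argmax().item())
```

In practice the control scores would come from real oracles (e.g. keyword matching for the text control and image-text similarity for the visual control) rather than random tensors, and the combined scores would feed a standard decoding loop.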

Authors (3)
  1. Haoqin Tu (25 papers)
  2. Bowen Yang (55 papers)
  3. Xianfeng Zhao (22 papers)
Citations (5)

