
Translation between Molecules and Natural Language (2204.11817v3)

Published 25 Apr 2022 in cs.CL and cs.AI

Abstract: We present **MolT5**, a self-supervised learning framework for pretraining models on a vast amount of unlabeled natural language text and molecule strings. **MolT5** enables new, useful, and challenging analogs of traditional vision-language tasks, such as molecule captioning and text-based de novo molecule generation (together: translation between molecules and language), which we explore for the first time. Because **MolT5** pretrains models on single-modal data, it helps overcome the scarcity of paired data in the chemistry domain. Furthermore, we consider several metrics, including a new cross-modal embedding-based metric, to evaluate molecule captioning and text-based molecule generation. Our results show that **MolT5**-based models generate outputs, both molecules and captions, that are in many cases high quality.
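
Since MolT5 builds on the T5 encoder-decoder architecture, the captioning direction (molecule string in, text description out) reduces to a standard sequence-to-sequence generation loop. The sketch below uses the Hugging Face `transformers` T5 classes; the checkpoint name is an assumption based on the authors' published finetuned models and may need adjusting to whatever MolT5 checkpoint is actually available.

```python
# Minimal sketch of MolT5 molecule captioning (SMILES -> English text).
# The checkpoint name is assumed; substitute any T5-compatible MolT5 model.
from transformers import T5Tokenizer, T5ForConditionalGeneration

CHECKPOINT = "laituan245/molt5-large-smiles2caption"  # assumed Hub location

tokenizer = T5Tokenizer.from_pretrained(CHECKPOINT)
model = T5ForConditionalGeneration.from_pretrained(CHECKPOINT)

smiles = "CN1C=NC2=C1C(=O)N(C(=O)N2C)C"  # caffeine

# Tokenize the molecule string and decode the generated caption.
input_ids = tokenizer(smiles, return_tensors="pt").input_ids
outputs = model.generate(input_ids, num_beams=5, max_length=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The reverse task, text-based de novo molecule generation, follows the same loop with a natural-language description as input and a SMILES string as output, using a caption-to-SMILES checkpoint instead.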

Authors (6)
  1. Carl Edwards (15 papers)
  2. Tuan Lai (8 papers)
  3. Kevin Ros (6 papers)
  4. Garrett Honke (8 papers)
  5. Kyunghyun Cho (292 papers)
  6. Heng Ji (266 papers)
Citations (134)
