
Enhancing Activity Prediction Models in Drug Discovery with the Ability to Understand Human Language (2303.03363v2)

Published 6 Mar 2023 in q-bio.BM, cs.CL, cs.LG, and stat.ML

Abstract: Activity and property prediction models are the central workhorses in drug discovery and materials sciences, but currently they have to be trained or fine-tuned for new tasks. Without training or fine-tuning, scientific LLMs could be used for such low-data tasks through their announced zero- and few-shot capabilities. However, their predictive quality at activity prediction is lacking. In this work, we envision a novel type of activity prediction model that is able to adapt to new prediction tasks at inference time, via understanding textual information describing the task. To this end, we propose a new architecture with separate modules for chemical and natural language inputs, and a contrastive pre-training objective on data from large biochemical databases. In extensive experiments, we show that our method CLAMP yields improved predictive performance on few-shot learning benchmarks and zero-shot problems in drug discovery. We attribute the advances of our method to the modularized architecture and to our pre-training objective.
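The abstract describes separate encoders for chemical and natural-language inputs whose outputs are compared under a contrastive pre-training objective. The sketch below is a minimal illustration of that general two-tower idea under stated assumptions, not the authors' CLAMP implementation. The linear "encoders", the cosine scoring with temperature `tau`, and the binary cross-entropy over hypothetical (molecule features, assay-text features, active label) triples are all assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = Sequence[Sequence[float]]
Triple = Tuple[Sequence[float], Sequence[float], int]


def encode(weights: Matrix, features: Sequence[float]) -> Vector:
    """Hypothetical encoder: a linear projection followed by L2 normalization."""
    projected = [sum(w * x for w, x in zip(row, features)) for row in weights]
    norm = math.sqrt(sum(v * v for v in projected)) or 1.0  # avoid division by zero
    return [v / norm for v in projected]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_activity(mol_feats: Sequence[float], assay_feats: Sequence[float],
                     mol_w: Matrix, assay_w: Matrix, tau: float = 0.1) -> float:
    """Probability that a molecule is active in an assay, computed from the
    temperature-scaled cosine similarity of the two embeddings (assumed rule)."""
    m = encode(mol_w, mol_feats)
    a = encode(assay_w, assay_feats)
    return sigmoid(sum(x * y for x, y in zip(m, a)) / tau)


def contrastive_bce_loss(pairs: Sequence[Triple], mol_w: Matrix, assay_w: Matrix,
                         tau: float = 0.1, eps: float = 1e-12) -> float:
    """Mean binary cross-entropy over (molecule, assay, label) triples:
    active pairs are pushed toward high similarity, inactive pairs toward low."""
    total = 0.0
    for mol_feats, assay_feats, label in pairs:
        p = predict_activity(mol_feats, assay_feats, mol_w, assay_w, tau)
        p = min(max(p, eps), 1.0 - eps)  # keep log() finite
        total += -(label * math.log(p) + (1 - label) * math.log(1.0 - p))
    return total / len(pairs)


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    toy_pairs = [([1.0, 0.0], [1.0, 0.0], 1), ([0.0, 1.0], [1.0, 0.0], 0)]
    print(contrastive_bce_loss(toy_pairs, identity, identity))
    # Zero-shot style use: score a molecule against an unseen assay vector.
    print(predict_activity([1.0, 0.0], [0.9, 0.1], identity, identity))
```

Framing prediction as a similarity between a molecule embedding and a text-derived assay embedding is what allows a model of this kind to score a new assay at inference time without fine-tuning. In practice the encoders would be learned networks over molecular representations and assay text, not fixed linear maps.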

Authors (4)
  1. Philipp Seidl (8 papers)
  2. Andreu Vall (9 papers)
  3. Sepp Hochreiter (82 papers)
  4. Günter Klambauer (28 papers)
Citations (35)