Utilizing Low-Dimensional Molecular Embeddings for Rapid Chemical Similarity Search (2402.07970v1)

Published 12 Feb 2024 in cs.IR and cs.LG

Abstract: Nearest neighbor-based similarity searching is a common task in chemistry, with notable use cases in drug discovery. Yet some of the most commonly used methods for this task still rely on brute-force search. In practice, this can be computationally costly and time-consuming, due in part to the sheer size of modern chemical databases. Previous computational advances for this task have generally relied on improvements to hardware or on dataset-specific tricks that lack generalizability. Approaches that leverage lower-complexity search algorithms remain relatively underexplored. However, many of these algorithms are approximate solutions and/or struggle with typical high-dimensional chemical embeddings. Here we evaluate whether a combination of low-dimensional chemical embeddings and a k-d tree data structure can achieve fast nearest neighbor queries while maintaining performance on standard chemical similarity search benchmarks. We examine different dimensionality reductions of standard chemical embeddings, as well as a learned, structurally aware embedding (SmallSA), for this task. With this framework, searches on over one billion chemicals execute in less than a second on a single CPU core, five orders of magnitude faster than the brute-force approach. We also demonstrate that SmallSA achieves competitive performance on chemical similarity benchmarks.
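To illustrate the data structure the abstract relies on, here is a minimal, standard-library-only sketch of an exact k-d tree nearest-neighbor query over low-dimensional vectors, checked against a brute-force linear scan. It is not the authors' implementation and does not reproduce SmallSA or any of their dimensionality reductions: random Gaussian vectors stand in for chemical embeddings, Euclidean distance is assumed, and the names `build_kdtree`, `nearest`, `brute_force`, and the embedding size `dim = 8` are hypothetical choices for illustration.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


@dataclass
class Node:
    point: Point             # embedding vector stored at this node
    index: int               # position of the vector in the original database
    axis: int                # coordinate used to split this node's subtree
    left: Optional["Node"]   # vectors with point[axis] <= split value
    right: Optional["Node"]  # vectors with point[axis] >= split value


def build_kdtree(points: Sequence[Point], indices: Optional[List[int]] = None,
                 depth: int = 0) -> Optional[Node]:
    """Recursively split on the median along a cycling axis."""
    if indices is None:
        indices = list(range(len(points)))
    if not indices:
        return None
    axis = depth % len(points[indices[0]])
    ordered = sorted(indices, key=lambda i: points[i][axis])
    mid = len(ordered) // 2
    return Node(
        point=points[ordered[mid]],
        index=ordered[mid],
        axis=axis,
        left=build_kdtree(points, ordered[:mid], depth + 1),
        right=build_kdtree(points, ordered[mid + 1:], depth + 1),
    )


def nearest(root: Optional[Node], query: Point) -> Tuple[int, float]:
    """Exact 1-nearest-neighbor search; returns (index, Euclidean distance)."""
    best_idx, best_d2 = -1, math.inf  # (-1, inf) is returned for an empty tree

    def visit(node: Optional[Node]) -> None:
        nonlocal best_idx, best_d2
        if node is None:
            return
        d2 = sum((a - b) ** 2 for a, b in zip(node.point, query))
        if d2 < best_d2:
            best_idx, best_d2 = node.index, d2
        diff = query[node.axis] - node.point[node.axis]
        near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
        visit(near)
        # Only cross the splitting plane if it is closer than the best match so far.
        if diff * diff < best_d2:
            visit(far)

    visit(root)
    return best_idx, math.sqrt(best_d2)


def brute_force(points: Sequence[Point], query: Point) -> Tuple[int, float]:
    """Linear scan over every vector: the O(n)-per-query baseline."""
    best = min(range(len(points)), key=lambda i: math.dist(points[i], query))
    return best, math.dist(points[best], query)


if __name__ == "__main__":
    rng = random.Random(0)
    dim = 8  # hypothetical low-dimensional embedding size
    database = [tuple(rng.gauss(0.0, 1.0) for _ in range(dim)) for _ in range(5000)]
    tree = build_kdtree(database)
    query = tuple(rng.gauss(0.0, 1.0) for _ in range(dim))
    print("k-d tree:   ", nearest(tree, query))
    print("brute force:", brute_force(database, query))
```

Both calls should report the same neighbor. The tree's speedup comes from skipping subtrees whose splitting plane lies farther away than the current best match. That pruning is effective in a handful of dimensions but degrades toward a full scan as dimensionality grows, which is why k-d trees pair naturally with low-dimensional embeddings rather than with typical high-dimensional chemical fingerprints.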

Authors (7)
  1. Kathryn E. Kirchoff (2 papers)
  2. James Wellnitz (4 papers)
  3. Joshua E. Hochuli (1 paper)
  4. Travis Maxfield (12 papers)
  5. Konstantin I. Popov (5 papers)
  6. Shawn Gomez (1 paper)
  7. Alexander Tropsha (12 papers)
Citations (1)