
Fréchet ChemNet Distance: A metric for generative models for molecules in drug discovery (1803.09518v3)

Published 26 Mar 2018 in cs.LG, q-bio.QM, and stat.ML

Abstract: The new wave of successful generative models in machine learning has increased the interest in deep learning driven de novo drug design. However, assessing the performance of such generative models is notoriously difficult. Metrics that are typically used to assess the performance of such generative models are the percentage of chemically valid molecules or the similarity to real molecules in terms of particular descriptors, such as the partition coefficient (logP) or druglikeness. However, method comparison is difficult because of the inconsistent use of evaluation metrics, the necessity for multiple metrics, and the fact that some of these measures can easily be tricked by simple rule-based systems. We propose a novel distance measure between two sets of molecules, called Fréchet ChemNet distance (FCD), that can be used as an evaluation metric for generative models. The FCD is similar to a recently established performance metric for comparing image generation methods, the Fréchet Inception Distance (FID). Whereas the FID uses one of the hidden layers of InceptionNet, the FCD utilizes the penultimate layer of a deep neural network called ChemNet, which was trained to predict drug activities. Thus, the FCD metric takes into account chemically and biologically relevant information about molecules, and also measures the diversity of the set via the distribution of generated molecules. The FCD's advantage over previous metrics is that it can detect if generated molecules are a) diverse and have similar b) chemical and c) biological properties as real molecules. We further provide an easy-to-use implementation that only requires the SMILES representation of the generated molecules as input to calculate the FCD. Implementations are available at: https://www.github.com/bioinf-jku/FCD

Citations (306)

Summary

  • The paper introduces the Fréchet ChemNet Distance (FCD), a single metric for evaluating generative molecular models that accounts for chemical properties, biological properties, and diversity.
  • Molecules are passed through ChemNet, a network trained to predict drug activities; a Gaussian (mean and covariance) is fitted to the penultimate-layer activations of each set, and the two Gaussians are compared with the Fréchet distance.
  • In the paper's experiments, FCD detects biases and flaws such as mode collapse, and it compares favorably with four traditional metrics and a fingerprint-based Fréchet distance.

Fréchet ChemNet Distance: A Metric for Generative Models in Drug Discovery

The paper addresses the evaluation of generative models for molecular design in drug discovery. It introduces the Fréchet ChemNet Distance (FCD), a metric for assessing generative molecular models, which otherwise require several separate evaluation criteria: chemical validity, similarity to known molecules, and diversity.

Problem Statement and Motivation

Current approaches to evaluating generative models for molecules use inconsistent metrics, which makes method comparison difficult, and some of these metrics can be tricked by simple rule-based systems. Commonly used metrics, such as the percentage of chemically valid molecules or property-based descriptors (e.g., logP, druglikeness), each cover only one aspect of quality, so several must be reported together. The paper argues for a single measure that covers these criteria at once.

Fréchet ChemNet Distance

The FCD is analogous to the Fréchet Inception Distance (FID) used for image generation evaluations but is tailored specifically for molecular structures. It leverages the deep learning architecture ChemNet, which is trained to predict drug activities and thus encapsulates both chemical and biological relevance. The FCD evaluates the distributional distance between generated molecules and a reference set of real molecules by comparing the activations of the penultimate layer of ChemNet. The metric is calculated using the Fréchet distance between two Gaussian distributions derived from these activations.
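For reference, the Fréchet distance between two Gaussians has a closed form, which is the same expression used for FID. The symbols below are notation chosen for this summary, not taken from the text above: $(m, C)$ are the mean and covariance of the ChemNet activations for the generated molecules, and $(m_w, C_w)$ are those of the reference set.

```latex
d^2\big((m, C), (m_w, C_w)\big)
  = \lVert m - m_w \rVert_2^2
  + \operatorname{Tr}\!\big(C + C_w - 2\,(C\,C_w)^{1/2}\big)
```

The first term penalizes a shift in the average activation; the trace term penalizes a mismatch in spread, which is how a set with too little diversity is reflected in the score.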

Methodology and Implementation

The authors describe how the FCD is calculated: each set of molecules is passed through ChemNet to obtain numerical representations (the penultimate-layer activations), and the mean and covariance of these activations are estimated for each set. The analysis demonstrates that a sample as small as 5,000 molecules can yield reliable FCD estimates.
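The mean-and-covariance step can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function names `gaussian_stats` and `frechet_distance` are hypothetical, the activations are assumed to be already available as a 2-D array (one row per molecule), and the ChemNet forward pass itself is not shown. The demo uses synthetic random vectors in place of real activations.

```python
import numpy as np


def gaussian_stats(activations):
    """Return the mean vector and covariance matrix of row-wise activations."""
    acts = np.asarray(activations, dtype=np.float64)
    mu = acts.mean(axis=0)
    sigma = np.cov(acts, rowvar=False)  # columns are features
    return mu, sigma


def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Squared Frechet distance between N(mu1, sigma1) and N(mu2, sigma2).

    d^2 = ||mu1 - mu2||^2 + Tr(sigma1 + sigma2 - 2 (sigma1 sigma2)^(1/2))
    """
    diff = np.asarray(mu1, dtype=np.float64) - np.asarray(mu2, dtype=np.float64)
    # Tr((S1 S2)^(1/2)) equals the sum of the square roots of the eigenvalues
    # of S1 S2, which are real and non-negative for covariance matrices.
    eigvals = np.linalg.eigvals(np.asarray(sigma1) @ np.asarray(sigma2))
    eigvals = np.clip(eigvals.real, 0.0, None)  # remove numerical noise
    tr_sqrt = np.sqrt(eigvals).sum()
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * tr_sqrt)


if __name__ == "__main__":
    # Synthetic stand-ins for activations (assumption: 5,000 samples, 8 features).
    rng = np.random.default_rng(0)
    reference = rng.normal(size=(5000, 8))
    similar = rng.normal(size=(5000, 8))
    collapsed = rng.normal(scale=0.1, size=(5000, 8))  # low-diversity sample

    ref_stats = gaussian_stats(reference)
    print("similar  :", frechet_distance(*gaussian_stats(similar), *ref_stats))
    print("collapsed:", frechet_distance(*gaussian_stats(collapsed), *ref_stats))
```

In this toy setting the low-variance sample is expected to score a larger distance than the sample drawn from the reference distribution, because the trace term grows when the covariances differ. The eigenvalue route is used here only to avoid a dependency on a matrix square root routine.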

The FCD detects several kinds of flaws in generative models: biases in druglikeness, logP, or synthetic accessibility; mode collapse; and bias toward a particular biological target (e.g., PLK1 inhibitors). The metric is shown to be superior to four traditional metrics and to a fingerprint-based Fréchet distance (FFD), especially in its ability to capture biologically relevant information.

Experimental Results and Applications

The paper evaluates recent generative models, including those based on LSTM networks and on reinforcement learning. The FCD ranks these models in line with intuitive expectations and previously reported results, which supports its ability to reflect both the chemical and the biological distribution of generated molecules.

Implications and Future Outlook

The FCD gives researchers in machine learning-driven drug discovery a single tool for guiding the development of generative models. Because it provides a uniform assessment framework, results become easier to compare across studies.

By capturing both chemical and biological properties, the FCD can simplify model evaluation and support the development of more targeted drug discovery methods. Future research could apply the metric to models built on graph-based molecular representations or to models focused on specific biological pathways.

Conclusion

In summary, the Fréchet ChemNet Distance is an empirically validated approach for evaluating generative models in drug discovery. It combines several evaluation criteria (chemical similarity, biological similarity, and diversity) into a single metric, which makes comparisons between generative models more consistent.
