Normalized rotation shape descriptors and lossy compression of molecular shape (1509.09211v1)
Abstract: There is a common need to search of molecular databases for compounds resembling some shape, what suggests having similar biological activity while searching for new drugs. The large size of the databases requires fast methods for such initial screening, for example based on feature vectors constructed to fulfill the requirement that similar molecules should correspond to close vectors. Ultrafast Shape Recognition (USR) is a popular approach of this type. It uses vectors of 12 real number as 3 first moments of distances from 4 emphasized points. These coordinates might contain unnecessary correlations and does not allow to reconstruct the approximated shape. In contrast, spherical harmonic (SH) decomposition uses orthogonal coordinates, suggesting their independence and so lager informational content of the feature vector. There is usually considered rotationally invariant SH descriptors, what means discarding of some essential information. This article discusses framework for descriptors with normalized rotation, for example by using principal component analysis (PCA-SH). As one of the most interesting are ligands which have to slide into a protein, we will introduce descriptors optimized for such flat elongated shapes. Bent deformed cylinder (BDC) describes the molecule as a cylinder which was first bent, then deformed such that its cross-sections became ellipses of evolving shape. Legendre polynomials are used to describe the central axis of such bent cylinder. Additional polynomials are used to define evolution of such elliptic cross-section along the main axis. There will be also discussed bent cylindrical harmonics (BCH), which uses cross-sections described by cylindrical harmonics instead of ellipses. All these normalized rotation descriptors allow to reconstruct (decode) the approximated representation of the shape, hence can be also used for lossy compression purposes.