Determine the effects of 3D pre-training data accuracy versus diversity on downstream performance
Establish how the accuracy of 3D pre-training data (e.g., DFT-calculated equilibrium conformations) and its diversity (e.g., RDKit-generated conformers spanning a broader chemical space) each affect downstream task performance within UniCorn’s molecular representation learning framework.
References
While we do not have a definitive conclusion, we have observed some phenomena that offer valuable insights for the community.
— UniCorn: A Unified Contrastive Learning Approach for Multi-view Molecular Representation Learning
(2405.10343 - Feng et al., 15 May 2024) in Appendix, The Impact of Data Accuracy and Diversity (Section S.5)