Enhancing Molecular Property Prediction via Mixture of Collaborative Experts (2312.03292v1)

Published 6 Dec 2023 in cs.LG, cs.MA, and q-bio.QM

Abstract: The Molecular Property Prediction (MPP) task involves predicting biochemical properties from molecular features, such as molecular graph structures, and contributes to the discovery of lead compounds in drug development. To address data scarcity and imbalance in MPP, some studies adopt Graph Neural Networks (GNNs) as encoders to extract commonalities from molecular graphs. However, these approaches often use a separate predictor for each task, neglecting the shared characteristics among the predictors of different tasks. In response to this limitation, we introduce the GNN-MoCE architecture. It employs a Mixture of Collaborative Experts (MoCE) as the predictor, exploiting task commonalities while confronting the homogeneity issue in the expert pool and the decision-dominance dilemma within the expert group. To enhance expert diversity for collaboration among all experts, the Expert-Specific Projection method assigns a unique projection perspective to each expert. To balance decision-making influence within the expert group, the Expert-Specific Loss integrates each expert's individual loss into the group's weighted decision loss for more equitable training. Benefiting from these enhancements to expert creation, dynamic expert group formation, and expert collaboration, our model outperforms traditional methods on 24 MPP datasets, especially on tasks with limited or highly imbalanced data.
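To make the two mechanisms above concrete, here is a minimal PyTorch sketch of a MoCE-style predictor head, assuming a graph-level embedding from a GNN encoder and a binary classification task. It is an illustration of the idea, not the authors' implementation: the layer sizes, top-k routing, and the `alpha` balancing coefficient are all assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoCEPredictor(nn.Module):
    """Sketch of a Mixture-of-Collaborative-Experts predictor head.

    Illustrative assumptions: hidden sizes, top-k gating, and a
    single-logit binary task; not the paper's exact architecture.
    """

    def __init__(self, embed_dim=300, proj_dim=128, num_experts=8, top_k=4):
        super().__init__()
        self.top_k = top_k
        # Expert-Specific Projection: each expert views the shared GNN
        # embedding through its own learned projection, countering
        # homogeneity in the expert pool.
        self.projections = nn.ModuleList(
            nn.Linear(embed_dim, proj_dim) for _ in range(num_experts)
        )
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(proj_dim, proj_dim), nn.ReLU(),
                          nn.Linear(proj_dim, 1))
            for _ in range(num_experts)
        )
        self.gate = nn.Linear(embed_dim, num_experts)

    def forward(self, h):
        # h: (batch, embed_dim) graph-level embedding from the GNN encoder.
        scores = self.gate(h)                                 # (B, E)
        topk_val, topk_idx = scores.topk(self.top_k, dim=-1)  # dynamic group
        weights = F.softmax(topk_val, dim=-1)                 # (B, k)
        expert_logits = torch.stack(
            [exp(proj(h)) for proj, exp in zip(self.projections, self.experts)],
            dim=1,
        ).squeeze(-1)                                         # (B, E)
        chosen = expert_logits.gather(1, topk_idx)            # (B, k)
        group_logit = (weights * chosen).sum(dim=-1)          # (B,)
        return group_logit, chosen, weights


def moce_loss(group_logit, chosen, weights, target, alpha=0.5):
    """Expert-Specific Loss sketch: mix the group's weighted decision
    loss with each selected expert's individual loss so that no single
    expert dominates training (alpha is an assumed balancing factor)."""
    bce = F.binary_cross_entropy_with_logits
    group_loss = bce(group_logit, target)
    per_expert = bce(chosen, target.unsqueeze(-1).expand_as(chosen),
                     reduction="none")                        # (B, k)
    individual_loss = (weights.detach() * per_expert).sum(dim=-1).mean()
    return alpha * group_loss + (1 - alpha) * individual_loss


# Usage sketch: h stands in for GNN output, y is a binary label.
model = MoCEPredictor()
h = torch.randn(32, 300)
y = torch.randint(0, 2, (32,)).float()
logit, chosen, w = model(h)
loss = moce_loss(logit, chosen, w, y)
```

The per-expert term in the loss keeps under-selected experts learning alongside the group decision, which is the equitable-training effect the abstract attributes to the Expert-Specific Loss.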

Authors (4)
  1. Xu Yao (10 papers)
  2. Shuang Liang (84 papers)
  3. Songqiao Han (12 papers)
  4. Hailiang Huang (21 papers)
Citations (4)
