CrysFormer: Protein Structure Prediction via 3d Patterson Maps and Partial Structure Attention (2310.03899v1)
Abstract: Determining the structure of a protein has been a decades-long open question. Computing a protein's three-dimensional structure often incurs nontrivial computation costs when classical simulation algorithms are used. Advances built on the transformer neural network architecture, such as AlphaFold2, achieve significant improvements on this problem by learning from a large dataset of sequence information and corresponding protein structures. Yet such methods focus only on sequence information; other available prior knowledge, such as protein crystallography and partial structure of amino acids, could potentially be utilized. To the best of our knowledge, we propose the first transformer-based model that directly utilizes protein crystallography and partial structure information to predict the electron density maps of proteins. Using two new datasets of peptide fragments (2-residue and 15-residue), we demonstrate that our method, dubbed CrysFormer, can achieve accurate predictions from a much smaller dataset and with reduced computation costs.