Optimal transport for automatic alignment of untargeted metabolomic data (2306.03218v4)

Published 5 Jun 2023 in q-bio.QM and cs.LG

Abstract: Untargeted metabolomic profiling through liquid chromatography-mass spectrometry (LC-MS) measures a vast array of metabolites within biospecimens, advancing drug development, disease diagnosis, and risk prediction. However, the low throughput of LC-MS poses a major challenge for biomarker discovery, annotation, and experimental comparison, necessitating the merging of multiple datasets. Current data pooling methods encounter practical limitations due to their vulnerability to data variations and hyperparameter dependence. Here we introduce GromovMatcher, a flexible and user-friendly algorithm that automatically combines LC-MS datasets using optimal transport. By capitalizing on feature intensity correlation structures, GromovMatcher delivers superior alignment accuracy and robustness compared to existing approaches. The algorithm scales to thousands of features and requires minimal hyperparameter tuning. Manually curated datasets for validating alignment algorithms are limited in the field of untargeted metabolomics, so we develop a dataset split procedure that generates pairs of validation datasets to test the alignments produced by GromovMatcher and other methods. Applying our method to experimental patient studies of liver and pancreatic cancer, we discover shared metabolic features related to patient alcohol intake, demonstrating how GromovMatcher facilitates the search for biomarkers associated with lifestyle risk factors linked to several cancer types.
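
To illustrate the idea of matching features through their correlation structure, the following is a minimal sketch of entropic Gromov-Wasserstein alignment in NumPy. It is not the GromovMatcher implementation, and the published method may differ in its details. The function names (`correlation_distance`, `sinkhorn`, `entropic_gw`), the `1 - correlation` distance, the uniform feature weights, the value of `eps`, and the synthetic data are all assumptions made for illustration. The synthetic setup also mimics a dataset split: one set of samples is divided in two, and the features of the second half are shuffled.

```python
import numpy as np

def correlation_distance(X):
    """X is samples x features; returns the features x features matrix 1 - Pearson r."""
    return 1.0 - np.corrcoef(X, rowvar=False)

def sinkhorn(cost, p, q, eps, n_iter=200):
    """Entropic optimal transport between weights p and q for a fixed cost matrix."""
    K = np.exp(-(cost - cost.min()) / eps)  # shifting the cost does not change the plan
    u = np.ones_like(p)
    for _ in range(n_iter):
        v = q / (K.T @ u)
        u = p / (K @ v)
    return u[:, None] * K * v[None, :]

def entropic_gw(C1, C2, eps=0.05, n_outer=50):
    """Square-loss Gromov-Wasserstein coupling via repeated Sinkhorn projections."""
    p = np.full(C1.shape[0], 1.0 / C1.shape[0])  # uniform weights (assumption)
    q = np.full(C2.shape[0], 1.0 / C2.shape[0])
    T = np.outer(p, q)
    const = (C1 ** 2 @ p)[:, None] + (C2 ** 2 @ q)[None, :]
    for _ in range(n_outer):
        # Gradient of the square-loss GW objective at the current coupling
        grad = const - 2.0 * C1 @ T @ C2.T
        T = sinkhorn(grad, p, q, eps)
    return T

# Synthetic stand-in for two studies: same features, disjoint samples, shuffled columns.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 5))
loadings = rng.normal(size=(5, 30))
X = latent @ loadings + 0.1 * rng.normal(size=(200, 30))
perm = rng.permutation(30)
X1, X2 = X[:100], X[100:, perm]

T = entropic_gw(correlation_distance(X1), correlation_distance(X2))
matches = T.argmax(axis=1)  # candidate partner in dataset 2 for each feature of dataset 1
```

Here `matches[i]` is the feature of the second dataset that receives the most mass from feature `i` of the first. Because the Gromov-Wasserstein objective is non-convex, the coupling can settle in a local optimum, so the candidate matches should be checked against the known shuffle (`perm`) rather than assumed correct.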
