Integrating Heterogeneous Gene Expression Data through Knowledge Graphs for Improving Diabetes Prediction (2404.14970v1)

Published 23 Apr 2024 in cs.LG

Abstract: Diabetes is a worldwide health issue affecting millions of people. Machine learning methods have shown promising results in improving diabetes prediction, particularly through the analysis of diverse data types, namely gene expression data. While gene expression data can provide valuable insights, challenges arise from the fact that the sample sizes in expression datasets are usually limited, and the data from different datasets with different gene expressions cannot be easily combined. This work proposes a novel approach to address these challenges by integrating multiple gene expression datasets and domain-specific knowledge using knowledge graphs, a unique tool for biomedical data integration. KG embedding methods are then employed to generate vector representations, serving as inputs for a classifier. Experiments demonstrated the efficacy of our approach, revealing improvements in diabetes prediction when integrating multiple gene expression datasets and domain-specific knowledge about protein functions and interactions.
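The pipeline outlined in the abstract (merge several expression datasets and domain knowledge into one knowledge graph, embed the graph, then feed the vectors to a classifier) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the authors' implementation: the patient, gene, and function identifiers are hypothetical, expression values are turned into `expresses` edges with an arbitrary threshold, and a small TransE-style trainer stands in for whichever KG embedding methods the paper actually uses. It relies only on the Python standard library.

```python
import math
import random

def build_triples(datasets, gene_functions, threshold=0.5):
    """Merge expression tables and gene annotations into (head, relation, tail) triples.

    Assumption: a gene is linked to a patient when its expression value
    reaches `threshold`; the paper may encode expression differently.
    """
    triples = []
    for dataset in datasets:
        for patient, expression in dataset.items():
            for gene, value in expression.items():
                if value >= threshold:
                    triples.append((patient, "expresses", gene))
    for gene, functions in gene_functions.items():
        for function in functions:
            triples.append((gene, "has_function", function))
    return triples

def train_transe(triples, dim=8, epochs=200, lr=0.05, margin=1.0, seed=0):
    """TransE-style training: push head + relation close to tail for true triples."""
    rng = random.Random(seed)
    known = set(triples)
    entities = sorted({h for h, _, _ in triples} | {t for _, _, t in triples})
    relations = sorted({r for _, r, _ in triples})
    ent = {e: [rng.uniform(-0.1, 0.1) for _ in range(dim)] for e in entities}
    rel = {r: [rng.uniform(-0.1, 0.1) for _ in range(dim)] for r in relations}
    for _ in range(epochs):
        for h, r, t in triples:
            c = rng.choice(entities)  # candidate corrupted tail
            if c in (h, t) or (h, r, c) in known:
                continue  # skip degenerate or accidentally true negatives
            pos = [ent[h][k] + rel[r][k] - ent[t][k] for k in range(dim)]
            neg = [ent[h][k] + rel[r][k] - ent[c][k] for k in range(dim)]
            loss = margin + sum(x * x for x in pos) - sum(x * x for x in neg)
            if loss <= 0:
                continue  # margin already satisfied
            for k in range(dim):
                # Gradient step on margin + ||pos||^2 - ||neg||^2
                shared = 2 * lr * (pos[k] - neg[k])
                ent[h][k] -= shared
                rel[r][k] -= shared
                ent[t][k] += 2 * lr * pos[k]
                ent[c][k] -= 2 * lr * neg[k]
        for vec in ent.values():
            # Keep entity vectors inside the unit ball
            norm = math.sqrt(sum(x * x for x in vec))
            if norm > 1.0:
                vec[:] = [x / norm for x in vec]
    return ent

# Two hypothetical datasets measuring different, partly overlapping gene sets.
dataset_a = {"a_p1": {"GENE1": 0.9, "GENE2": 0.1},
             "a_p2": {"GENE1": 0.2, "GENE2": 0.8}}
dataset_b = {"b_p1": {"GENE2": 0.7, "GENE3": 0.6},
             "b_p2": {"GENE2": 0.3, "GENE3": 0.9}}
# Hypothetical domain knowledge linking genes to functions.
gene_functions = {"GENE1": ["FUNC_A"],
                  "GENE2": ["FUNC_A", "FUNC_B"],
                  "GENE3": ["FUNC_B"]}

triples = build_triples([dataset_a, dataset_b], gene_functions)
embeddings = train_transe(triples)
patients = list(dataset_a) + list(dataset_b)
patient_vectors = {p: embeddings[p] for p in patients}
```

Because patients from both datasets live in the same graph and share gene and function nodes, their vectors end up in one common space even though the datasets measure different genes. In this sketch, `patient_vectors` is what would be passed, together with diagnosis labels, to any standard classifier.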

