Collaborative Synthesis of Patient Records through Multi-Visit Health State Inference (2312.14646v1)

Published 22 Dec 2023 in cs.AI, cs.CL, and cs.LG

Abstract: Electronic health records (EHRs) have become the foundation of machine learning applications in healthcare, but the utility of real patient records is often limited by privacy and security concerns. Synthetic EHR generation offers a way to compensate for this limitation. Most existing methods synthesize new records from real EHR data without distinguishing the different types of events in that data, so they cannot keep the generated event combinations consistent with medical common sense. In this paper, we propose MSIC, a Multi-visit health Status Inference model for Collaborative EHR synthesis, to address these limitations. First, we formulate synthetic EHR generation as a probabilistic graphical model and tightly connect different types of events by modeling the latent health states. Then, we derive a health state inference method tailored to the multi-visit scenario that uses previous records to synthesize current and future records. Furthermore, we propose generating medical reports that add a textual description to each medical event, which broadens the applications of synthesized EHR data. To generate the different paragraphs in each visit, we incorporate a multi-generator deliberation framework that coordinates message passing among multiple generators, and we employ a two-phase decoding strategy to generate high-quality reports. Extensive experiments on the widely used benchmarks MIMIC-III and MIMIC-IV demonstrate that MSIC advances state-of-the-art results on the quality of synthetic data while maintaining low privacy risks.
