SynthIPD: assumption-lean synthetic individual patient data generation (2509.16466v1)

Published 19 Sep 2025 in stat.AP

Abstract: Individual patient data (IPD) are essential for statistical inference in clinical research. However, privacy concerns, high data-sharing costs, and restrictive access often make IPD unavailable. Conventional synthetic data generation usually relies on black-box models such as generative adversarial networks. These methods, however, require a large amount of IPD for model training, may not generalize, and lack interpretability. This paper introduces an assumption-lean, three-step methodology for generating synthetic IPD with survival endpoints based only on published clinical trial articles. The method mainly leverages Kaplan-Meier (KM) curves with at-risk/censoring information and subgroup-level summary statistics. It digitizes the KM curve from Scalable Vector Graphics (SVG) to beyond pixel accuracy and then generates synthetic covariates based on the summary statistics. We illustrate the method's potential through two detailed case studies and simulation studies. The method has important implications, enabling high-fidelity IPD generation to support evidence-based medical decisions.
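
To make concrete how a digitized KM curve can carry patient-level information, here is a minimal sketch. It is not the paper's algorithm. It relies only on the standard KM product-limit relation S(t_i) = S(t_{i-1}) * (1 - d_i / n_i), and it assumes, as a simplification, that no censoring occurs between drops of the curve, whereas the method described above also uses at-risk/censoring information. The function names `event_counts_from_km` and `expand_to_ipd` and the example numbers are hypothetical.

```python
from typing import List, Tuple


def event_counts_from_km(surv: List[float], n_start: int) -> List[int]:
    """Recover the number of events at each drop of a KM curve.

    Inverts S_i = S_{i-1} * (1 - d_i / n_i) to get d_i = n_i * (1 - S_i / S_{i-1}).
    Simplifying assumption: no censoring between drops, so n_{i+1} = n_i - d_i.
    """
    counts = []
    n_at_risk = n_start
    prev = 1.0  # the KM curve starts at survival probability 1
    for s in surv:
        d = round(n_at_risk * (1.0 - s / prev))
        counts.append(d)
        n_at_risk -= d
        prev = s
    return counts


def expand_to_ipd(
    times: List[float], counts: List[int], n_start: int
) -> List[Tuple[float, int]]:
    """Expand event counts into (time, event) records, one per patient.

    Patients with no observed event are treated as censored at the last
    drop time; this is an assumption of the sketch, not a general rule.
    """
    records: List[Tuple[float, int]] = []
    for t, d in zip(times, counts):
        records.extend([(t, 1)] * d)
    n_left = n_start - sum(counts)
    records.extend([(times[-1], 0)] * n_left)
    return records


# Hypothetical digitized curve: drop times and survival values after each drop.
times = [1.0, 2.0, 3.0]
surv = [0.9, 0.7, 0.6]
counts = event_counts_from_km(surv, n_start=10)
ipd = expand_to_ipd(times, counts, n_start=10)
```

With real published curves, censoring between drops changes the at-risk count, which is why reported numbers at risk are needed to make the inversion well posed.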
