How Real Are Synthetic Therapy Conversations? Evaluating Fidelity in Prolonged Exposure Dialogues (2504.21800v3)

Published 30 Apr 2025 in cs.CL, cs.AI, cs.CY, and cs.HC

Abstract: The growing adoption of synthetic data in healthcare is driven by privacy concerns, limited access to real-world data, and the high cost of annotation. This work explores the use of synthetic Prolonged Exposure (PE) therapeutic conversations for Post-Traumatic Stress Disorder (PTSD) as a scalable alternative for training and evaluating clinical models. We systematically compare real and synthetic dialogues using linguistic, structural, and protocol-specific metrics, including turn-taking patterns and treatment fidelity. We also introduce and evaluate PE-specific metrics derived from linguistic analysis and semantic modeling, offering a novel framework for assessing clinical fidelity beyond surface fluency. Our findings show that although synthetic data holds promise for mitigating data scarcity and protecting patient privacy, it can struggle to capture the subtle dynamics of therapeutic interactions. Synthetic therapy dialogues closely match structural features of real-world conversations (e.g., speaker switch ratio: 0.98 vs. 0.99); however, they may not adequately reflect key fidelity markers (e.g., distress monitoring). We highlight gaps in existing evaluation frameworks and advocate for fidelity-aware metrics that go beyond surface fluency to uncover clinically significant failures. Our findings clarify where synthetic data can effectively complement real-world datasets -- and where critical limitations remain.

Summary

Evaluation of Synthetic Therapeutic Dialogues in PTSD Prolonged Exposure Therapy

This paper presents an empirical investigation into the fidelity of synthetic therapeutic dialogues, specifically within the domain of Prolonged Exposure (PE) therapy for PTSD. With synthetic data increasingly deployed in healthcare for reasons of privacy, cost-efficiency, and data scarcity, the authors critically evaluate the ability of LLMs to generate realistic therapy conversations. Real and synthetic dialogues are compared through a multi-layered framework of linguistic, structural, and protocol-specific metrics.

Methodological Approach

The paper employs a dataset comprising 200 real-world and 200 synthetic PE therapy sessions. The real sessions, collected under strict ethical guidelines and with participant consent, are transcribed and processed for analysis. Synthetic dialogues are drawn from the Thousand Voices of Trauma dataset, generated with Claude 3.5 Sonnet using specialized PE-specific prompting frameworks. Key analyses cover conversational dynamics such as turn-taking patterns, protocol adherence, and linguistic complexity.
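
To make the structural analysis concrete, the sketch below computes the kind of turn-level metrics the paper reports (normalized speaker switches, therapist-client turn ratio, utterance-length statistics) from a session represented as (speaker, utterance) pairs. The function name and the exact normalizations are illustrative assumptions, not the authors' implementation.

```python
from statistics import mean, stdev

def structural_metrics(turns):
    """Compute simple structural metrics for one session.

    `turns` is a list of (speaker, utterance) pairs, with speaker in
    {"therapist", "client"}. The metric definitions are illustrative
    assumptions, not the paper's exact formulas.
    """
    speakers = [s for s, _ in turns]
    lengths = [len(u.split()) for _, u in turns]  # utterance length in words

    # Normalized speaker switches: fraction of adjacent turn pairs where
    # the speaker changes (1.0 = perfectly alternating dialogue).
    switches = sum(a != b for a, b in zip(speakers, speakers[1:]))
    norm_switches = switches / max(len(turns) - 1, 1)

    n_therapist = speakers.count("therapist")
    n_client = speakers.count("client")

    return {
        "normalized_speaker_switches": norm_switches,
        "therapist_client_turn_ratio": n_therapist / max(n_client, 1),
        "mean_utterance_length": mean(lengths),
        "utterance_length_sd": stdev(lengths) if len(lengths) > 1 else 0.0,
    }

# Example: a tiny two-turn exchange
session = [
    ("therapist", "On a scale from 0 to 100, how distressed do you feel right now?"),
    ("client", "Around seventy, maybe a bit higher."),
]
print(structural_metrics(session))
```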

Findings and Analysis

  1. System-Level Metrics:
    • Close alignment between real and synthetic dialogues was observed in structural features such as normalized speaker switches (0.98 vs. 0.99) and therapist-client turn ratios (0.01 vs. 0.01), indicating that the structural patterns of real conversations are effectively replicated.
    • However, synthetic dialogues exhibited reduced utterance variability, with shorter and more uniform response lengths (synthetic: 22.9 ± 1.7 vs. real: 68.7 ± 26.6), pointing to limited linguistic variety and depth.
  2. Correlation and Statistical Significance:
    • Strong correlations between real and synthetic sessions on metrics such as normalized speaker switches and utterance length confirm consistent structural mimicry, whereas weak correlations on vocabulary richness and flow entropy indicate room for improvement in capturing the natural expressiveness of real dialogues (a minimal comparison sketch follows this list).
    • Statistically significant differences in utterance length and lexical richness show that, although synthetic sessions recreate structural aspects well, they still lack the full conversational nuance of real interactions.
  3. PE-Specific Metrics:
    • The synthetic data showed strong alignment with real data on several core metrics, such as trauma narrative coherence and emotional engagement (p < 0.001). However, alignment on emotional habituation and SUDS (Subjective Units of Distress Scale) progression did not reach statistical significance, pointing to areas for model enhancement (see the SUDS progression sketch after this list).
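
As a concrete illustration of the correlation and significance testing described above, the following sketch compares one per-session metric (e.g., mean utterance length) between real and synthetic corpora. The choice of a Mann-Whitney U test and a Pearson correlation, and the assumption that sessions can be paired one-to-one, are illustrative choices rather than the paper's exact statistical protocol.

```python
import numpy as np
from scipy.stats import mannwhitneyu, pearsonr

def compare_metric(real_values, synthetic_values, paired=False):
    """Compare one per-session metric between real and synthetic corpora.

    A Mann-Whitney U test asks whether the two distributions differ;
    the Pearson correlation is only computed when sessions are assumed
    to be paired (e.g., one synthetic session generated per real one).
    """
    real = np.asarray(real_values, dtype=float)
    synth = np.asarray(synthetic_values, dtype=float)

    u_stat, p_diff = mannwhitneyu(real, synth, alternative="two-sided")
    result = {"U": float(u_stat), "p_difference": float(p_diff)}

    if paired and real.size == synth.size:
        r, p_corr = pearsonr(real, synth)
        result.update({"pearson_r": float(r), "p_correlation": float(p_corr)})
    return result

# Illustrative numbers only: utterance-length means matching the reported
# summary statistics (real: 68.7 +/- 26.6, synthetic: 22.9 +/- 1.7).
rng = np.random.default_rng(0)
real_lengths = rng.normal(68.7, 26.6, size=200)
synth_lengths = rng.normal(22.9, 1.7, size=200)
print(compare_metric(real_lengths, synth_lengths, paired=True))
```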
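
For the PE-specific fidelity metrics, one plausible way to operationalize SUDS progression is to fit a trend line to the distress ratings elicited at successive check-ins within a session; a negative slope would indicate within-session habituation. This operationalization (and the function below) is a hypothetical example, not the metric defined in the paper.

```python
import numpy as np

def suds_progression(suds_ratings):
    """Summarize within-session SUDS (0-100 distress) progression.

    Fits a least-squares line over the ordered ratings; a negative slope
    suggests within-session habituation. This operationalization is an
    illustrative assumption, not the paper's exact metric.
    """
    ratings = np.asarray(suds_ratings, dtype=float)
    if ratings.size < 2:
        return None  # not enough check-ins to estimate a trend
    x = np.arange(ratings.size)
    slope = np.polyfit(x, ratings, deg=1)[0]
    return {
        "slope": float(slope),  # negative => distress decreasing over the session
        "peak": float(ratings.max()),
        "final_minus_peak": float(ratings[-1] - ratings.max()),
    }

# Example: SUDS elicited at successive check-ins during imaginal exposure
print(suds_progression([70, 85, 80, 65, 50]))
```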

Practical and Theoretical Implications

The findings suggest that while synthetic dialogues replicate structural features well, they fall short of fully capturing the emotional and narrative depth typical of real PE therapy interactions. These results highlight both opportunities and challenges for using LLMs to create synthetic data in mental healthcare. Synthetic dialogues can serve as valuable supplements for model training and clinical education, helping to mitigate data scarcity and privacy concerns while providing a controlled environment for training novice therapists.

Conclusion

This paper offers a comprehensive view of how well LLM-generated synthetic dialogues align with real-world therapy sessions in the context of PTSD treatment. While synthetic data can fill gaps in data availability and preserve patient privacy, fidelity markers beyond surface fluency must be considered to ensure the generated content remains clinically realistic. The research underscores the need for continued refinement of generative models to enhance their authenticity and effectiveness in therapeutic applications, advocating for further development of fidelity-aware metrics and closer integration of clinical expertise.

Moving forward, it is critical to improve the dynamic narrative structure and emotional depth of synthetic therapeutic dialogues, potentially by incorporating longer-range context dependencies and richer emotional modeling into LLMs.
