Papers

Topics

Authors

Recent

View all

Gemini 2.5 Flash

157 tokens/sec

GPT-4o

8 tokens/sec

Gemini 2.5 Pro Pro

46 tokens/sec

o3 Pro

4 tokens/sec

GPT-4.1 Pro

38 tokens/sec

DeepSeek R1 via Azure Pro

28 tokens/sec

2000 character limit reached

GPT4DFCI: AI at Dana-Farber Cancer Institute

Updated 2 July 2025

GPT4DFCI is a secure generative AI platform that integrates state-of-the-art GPT-4 models into academic medical workflows while adhering to strict legal and privacy standards.
It underwent rigorous red teaming with cross-sector experts to expose risks like copyright infringement and data leakage in sensitive content areas.
The platform’s layered mitigation strategies, including explicit anti-infringement prompts, exemplify best practices for regulated AI deployments in academic settings.

GPT4DFCI is an internal generative AI application developed and deployed at the Dana-Farber Cancer Institute (DFCI), in partnership with Microsoft, utilizing state-of-the-art GPT-4 class models (including GPT-4o and o1) to support institutional workflows in an academic medical center setting. GPT4DFCI exemplifies the operationalization of LLMs in environments where legal, ethical, and data privacy requirements are paramount.

1. System Purpose and Deployment Context

GPT4DFCI was conceived as a secure, hospital-compliant generative AI platform aimed at facilitating various knowledge work tasks for clinicians, researchers, and staff within DFCI. Its deployment targets the unique challenges faced by academic medical centers, notably balancing the transformative potential of LLM technology with regulatory obligations including HIPAA and copyright law. Access to GPT4DFCI is controlled and subject to institutional oversight; model versions are sandboxed to comply with privacy and use-case restrictions.

2. Red Teaming Methodology and Test Scope

A structured red teaming event was conducted to systematically probe GPT4DFCI for risks of copyright infringement and data leakage (2506.22523). This exercise engaged 42 experts—spanning academia, medicine, industry, and government—organized into four teams. The teams were provided access to production-grade GPT-4o and o1 endpoints, with model usage guards temporarily relaxed to maximize adversarial probing efficacy.

Four content categories were explicitly targeted:

Famous published books (e.g., Harry Potter and the Sorcerer’s Stone, The Hitchhiker’s Guide to the Galaxy)
Paywalled news articles
High-impact scientific publications
Electronic health records drawn from de-identified notes (MIMIC-IV database)

Participants crafted direct and indirect prompts to test both surface-level and sophisticated extraction pathways, recording all outputs and failures for institutional review.

3. Findings: Copyright Reproduction and Model Behavior

Red teaming identified isolated but concrete instances where GPT4DFCI reproduced short, copyrighted book excerpts verbatim. Notably:

Book dedications (e.g., the dedication page from Harry Potter) and well-known phrases could be elicited through indirect or translated prompts.
Certain famous lines (e.g., the “Beware of the Leopard” passage from The Hitchhiker’s Guide to the Galaxy) were output accurately when sufficiently oblique prompts were provided.

In contrast, the model:

Consistently failed to reproduce content from news articles and scientific papers beyond generic summaries or explanations, even with prompt engineering intended to bypass safeguards.
Did not return verbatim text or protected health information from clinical electronic health records, instead fabricating plausible but incorrect data when content was missing or inaccessible.
Demonstrated a pattern in which short, culture-pervasive content was more easily reproduced than longer or less-circulated textual material.

The model also exhibited occasional hallucination—fabricating details or attributions when unable to retrieve information (such as inventing factual elements for EHRs or misattributing journalists in news content).

4. Content-Type Analysis and Security Outcomes

A structured breakdown of attack results is summarized below:

Task	Test Material	Goal	Outcome
Books	Famous novels	Verbose reproduction	Partial: short well-known excerpts retrievable
News	Paywalled/influential articles	Quotes/full text	Fail: only general summaries, no verbatim text
Scientific	Seminal research papers	Passages/figures	Fail: only general discussion, no copying
EHR	De-identified clinical notes (MIMIC)	PHI/exact text	Fail: no direct matches, fabricated info

This evidence suggests that while LLMs like those backing GPT4DFCI can, under certain prompting, return isolated copyrighted phrases that are commonly disseminated, they are substantially less prone to reproduce less-circulated or more structurally complex proprietary works. Notably, the model maintained the privacy of clinical material and scientific articles within the test scope.

5. Risk Mitigation: Post-Red Teaming Protocols

Following the red teaming event, GPT4DFCI implemented a mitigation strategy as of version 2.8.2 (January 21, 2025), prepending every user prompt with an explicit meta-instruction:

Avoid copyright infringement.

This directive is designed to reinforce built-in model guardrails by adding an additional, explicit safety policy layer at inference time. This measure is intended as a defense-in-depth approach, operating alongside other technical and institutional privacy and copyright protections, and is modifiable as new risks emerge.

No mathematical or LaTeX-based technical analyses were employed in the mitigation process; all countermeasures were grounded in explicit instruction and governance protocol layering.

6. Implications for Academic Medical Centers

The GPT4DFCI experience yields several insights for the academic medical and research community regarding LLM deployment:

Models trained on large-scale, publicly sourced corpora may recall and reproduce short, well-known copyrighted materials, although wholesale extraction of complex or specialized texts appears strongly limited.
Proactive adversarial testing (red teaming) is crucial for surfacing edge cases that may not be anticipated through documentation or routine usage.
Institutional AI deployments must combine model-level prompts, technical controls, and ongoing expert review to limit legal and privacy risks.
GPT4DFCI's failure to reproduce verbatim EHR and scientific article content, and to avoid PHI disclosure, provides some evidence supporting the efficacy of layered institutional safeguards for privacy compliance.
The continued potential for hallucination and misattribution, while not leading to direct legal breach, may impact information integrity and must be monitored in institutional use.

A plausible implication is that, even with high compliance tooling, ongoing monitoring and adaptation of AI systems are required for both legal and ethical assurance in regulated domains.

7. Broader Governance and Future Directions

The GPT4DFCI red teaming protocol exemplifies a wider shift in responsible LLM adoption, emphasizing:

The necessity of institutionally managed, contextually aware AI governance for high-stakes use cases.
The value of transparent documentation and collaborative stress-testing between developers, legal, and clinical experts for continuous risk mitigation.
The need for further research into context-dependent model behavior, attack surface evolution, and policy refinement as LLMs evolve and their capabilities expand.
The significance of explicit, transparent mitigations (such as anti-infringement meta-prompts) and their evaluation as part of AI safety engineering.

The experience at Dana-Farber highlights shared responsibility between model providers and institutional stewards for upholding privacy, copyright, and clinical safety as LLMs are more deeply integrated into academic medicine and allied disciplines.

PDF Markdown Chat (Upgrade)

References (1)

Red Teaming for Generative AI, Report on a Copyright-Focused Exercise Completed in an Academic Medical Center (2025)