DeepSeek Janus-Pro Model
- The paper introduces a domain-specific MLLM using a 1B-parameter lightweight architecture fine-tuned on 10,000 images to enhance radiology report accuracy.
- The clinical trial demonstrated statistically significant improvements in report quality, reduced interpretation time, and higher expert preference over standard care.
- Workflow integration and open-source release enable scalable deployment in resource-constrained settings, outperforming larger generalist models.
The DeepSeek Janus-Pro model is a domain-specific multimodal large language model (MLLM) designed to generate structured radiology reports from chest X-ray (CXR) images. Janus-Pro forms the basis of Janus-Pro-CXR, an AI-powered chest radiograph interpretation system that has undergone rigorous evaluation in a prospective, multicenter, randomized reader trial for clinical deployment (Bai et al., 23 Dec 2025). Distinct from general-purpose MLLMs, Janus-Pro-CXR leverages a lightweight architecture and targeted optimization for clinical use, outperforming state-of-the-art generative and detection models on both subjective and objective reporting tasks, with particular emphasis on workflow integration in resource-constrained healthcare settings.
1. Model Architecture and Domain Adaptation
Janus-Pro-CXR employs a 1B-parameter architecture, enabling efficient inference on commodity hardware (e.g., a single RTX 4060 GPU with 8 GB of memory). The model processes DICOM-format CXRs together with minimal clinical history to generate radiology report drafts. Fine-tuning on approximately 10,000 domain-specific images (substantially fewer than generalist MLLMs typically require) enabled effective adaptation to chest radiography without extensive AI-engineering infrastructure. The architecture and implementation framework have been released as open source to support broad clinical translation.
This lightweight, domain-specialized approach contrasts with larger models such as ChatGPT-4o (200B parameters), while yielding superior performance within the target clinical task set. A plausible implication is that task-specific data curation and adaptation play a more critical role than model size for specialized medical deployment (Bai et al., 23 Dec 2025).
2. Clinical Trial Design and Methodology
The clinical utility of Janus-Pro-CXR was evaluated in the NCT07117266 prospective, randomized, multicenter reader trial, which enrolled 296 adult patients across three tertiary hospitals in China: Union Hospital (Wuhan), The First Affiliated Hospital of Zhengzhou University (Zhengzhou), and The First Affiliated Hospital of University of Science and Technology of China (Hefei). Inclusion required adult patients with clinically suspected thoracic disease, written informed consent, full clinical data, and posteroanterior CXRs; key exclusions were poor image quality and pregnancy/lactation.
Each case's image was interpreted by two junior radiologists, one using Janus-Pro-CXR assistance and one unassisted, yielding paired reports for direct within-subject comparison. The sample size (N=296, 592 reports) was determined by an a priori power analysis targeting a mean report-quality score difference of 0.25 (SD ≈ 0.65) at significance level α = 0.05 and power 0.90 (β = 0.10). Secondary retrospective experiments used held-out test sets to benchmark Janus-Pro-CXR against both the base Janus-Pro and ChatGPT-4o models.
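The stated power-analysis inputs can be checked to first order with the standard normal-approximation sample-size formula for a paired t-test. The sketch below is illustrative, not the authors' code; the enrolled N=296 exceeds the resulting minimum, so additional design considerations (e.g., attrition, site stratification) presumably entered the authors' calculation and are not reproduced here.

```python
from math import ceil
from statistics import NormalDist  # stdlib normal quantiles

def paired_t_sample_size(mean_diff, sd_diff, alpha=0.05, power=0.90):
    """Minimum number of pairs via the normal approximation:
    n = ((z_{1-alpha/2} + z_{power}) / (mean_diff / sd_diff))**2
    """
    z = NormalDist().inv_cdf
    effect = mean_diff / sd_diff  # standardized paired difference (Cohen's d_z)
    return ceil(((z(1 - alpha / 2) + z(power)) / effect) ** 2)

# Assumptions reported for the trial: difference 0.25, SD ~0.65, alpha 0.05, power 0.90
n_min = paired_t_sample_size(0.25, 0.65)   # ≈ 72 pairs under this approximation
```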
3. Evaluation Metrics and Statistical Analysis
The study defined several primary and secondary endpoints:
- Report Quality Score: Five-point Likert scale (1=poor, 5=excellent) encompassing completeness, clarity, and clinical relevance, adjudicated by a panel of five senior radiologists.
- Agreement Score (RADPEER): Adapted RADPEER system, scale inverted so higher numbers correspond to better agreement with ground truth.
- Pairwise Expert Preference: Proportion of cases where ≥3/5 experts preferred the AI-assisted report.
- Interpretation Time: Seconds from case opening to report submission.
Statistical analysis used paired t-tests for continuous outcomes and Wilson score intervals for proportions. ROC-AUC and F1 scores characterized detection of six predetermined radiographic findings, and Cohen's kappa quantified inter-rater consistency. No multivariate models were used; paired, within-subject analyses predominated.
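The Wilson score interval used for the proportion endpoints has a simple closed form. A minimal stdlib sketch follows; the counts are illustrative, since the trial's raw counts are not given in this summary.

```python
from math import sqrt
from statistics import NormalDist

def wilson_interval(successes, n, conf=0.95):
    """Wilson score interval for a binomial proportion p = successes / n."""
    z = NormalDist().inv_cdf(0.5 + conf / 2)
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half

# Illustrative counts only, not trial data
lo, hi = wilson_interval(50, 100)   # ≈ (0.404, 0.596)
```

For the continuous endpoints, a paired t-test on the per-case report-score differences (e.g., via `scipy.stats.ttest_rel`) matches the within-subject design described above.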
4. Report Generation and Diagnostic Performance
Janus-Pro-CXR demonstrated statistically significant improvements in prospective clinical deployment:
| Metric | AI-assisted | Standard Care | Mean Difference (95% CI) |
|---|---|---|---|
| Report Quality (Likert, mean ± SD) | 4.36 ± 0.50 | 4.12 ± 0.80 | 0.25 [0.216, 0.283]; P < 0.001 |
| Agreement Score (mean ± SD) | 4.30 ± 0.57 | 4.14 ± 0.84 | 0.16 [0.119, 0.200]; P < 0.001 |
| Interpretation Time (seconds, mean ± SD) | 120.6 ± 45.6 | 147.6 ± 51.1 | -27.0 s (18.3% reduction); P < 0.001 |
AI-assisted reports were preferred by experts in 54.3% of cases (95% CI [48.4%, 60.1%]). In complex cases (≥3 findings), interpretation time was reduced by 32.5 s (16.4%), underscoring the model's value as case complexity increases.
Retrospective evaluation (n=300) confirmed the superiority of Janus-Pro-CXR over both the base Janus-Pro and ChatGPT-4o in report quality and expert preference rates. On the CXR-27 test set (n=1,026), Janus-Pro-CXR achieved AUC > 0.80 for all six critical findings (support devices, pleural effusion, pneumothorax, atelectasis, consolidation, cardiomegaly), with F1 scores ranging from 0.278 (consolidation) to 0.727 (support devices); Cohen's κ ranged from 0.70 to 0.85, indicating substantial inter-rater agreement.
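The agreement and detection metrics above are standard quantities. As a concrete illustration, both Cohen's kappa and F1 can be computed from scratch; the toy binary ratings below are invented for illustration and are not trial data.

```python
def cohens_kappa(a, b):
    """Cohen's kappa for two binary raters: (p_o - p_e) / (1 - p_e)."""
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n     # observed agreement
    p1a, p1b = sum(a) / n, sum(b) / n               # positive rate per rater
    p_e = p1a * p1b + (1 - p1a) * (1 - p1b)         # agreement expected by chance
    return (p_o - p_e) / (1 - p_e)

def f1_score(y_true, y_pred):
    """F1 = 2*TP / (2*TP + FP + FN) for binary labels."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return 2 * tp / (2 * tp + fp + fn)

# Invented toy ratings for a single finding, NOT trial data
rater_a = [1, 1, 0, 0, 1, 0, 1, 1]
rater_b = [1, 1, 0, 0, 0, 0, 1, 1]
kappa = cohens_kappa(rater_a, rater_b)   # 0.75, in the "substantial" band
```

In practice these would come from a metrics library (e.g., scikit-learn's `cohen_kappa_score` and `f1_score`); the hand-rolled versions just make the formulas behind the reported 0.70–0.85 κ range explicit.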
5. Workflow Integration and Operational Impact
Janus-Pro-CXR integrates into radiology workflows by automatically receiving DICOM images and brief clinical histories over local networks, generating reports within 1–2 seconds and delivering them to radiologist workstations in under 3 seconds. Junior radiologists incorporate the AI-generated draft into the hospital information system, revising as necessary for final sign-off by senior staff.
The lightweight 1B-parameter configuration allows deployment on standard workstations, facilitating adoption even in primary-care environments that lack specialized computational infrastructure. The observed saving of 27 seconds per case translates to approximately 90 minutes per 200-case day, time that can be redirected to complex case review or to mitigating clinician fatigue.
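As a back-of-envelope sanity check on the efficiency figures (not from the paper's code; the 200-case day is the assumption stated above):

```python
# Mean interpretation times from the trial: 120.6 s (AI-assisted) vs. 147.6 s (standard)
per_case_saving_s = 147.6 - 120.6                  # seconds saved per case
daily_saving_min = per_case_saving_s * 200 / 60    # assumed 200-case reading day
percent_reduction = per_case_saving_s / 147.6 * 100
```

This reproduces the reported 27 s per case, roughly 90 minutes per day, and the 18.3% reduction.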
Preference analysis cited completeness of clinical description (75%) and clarity of diagnostic impressions (68%) as primary reasons for favoring AI-assisted drafts. Expert blinded assessment found that Janus-Pro-CXR outputs were often indistinguishable from published reference reports.
6. Scalability, Domain-Specificity, and Implementation
Janus-Pro-CXR requires only modest data (10,000 images) for fine-tuning to new clinical domains, supporting rapid deployment and adaptation by centers with limited AI-engineering resources. The open-sourcing of model architecture and inference tools enables independent validation and adoption, fostering further research. The model's robust performance in randomized clinical trials and in comparison to much larger generalist LLMs—combined with real-world deployment parameters—establishes Janus-Pro-CXR as a scalable solution for chest radiography reporting in resource-limited environments.
7. Significance and Future Directions
The demonstration of improved report quality, diagnostic reliability, and workflow efficiency in rigorously validated, real-world clinical settings constitutes a substantive advance in applied artificial intelligence for medical imaging. The Janus-Pro paradigm underscores the importance of lightweight, domain-optimized LLMs for high-stakes clinical tasks and suggests that targeted adaptation can surpass generic, large-scale models for practical deployment (Bai et al., 23 Dec 2025). The open-source release is expected to accelerate clinical translation and promote independent benchmarking. A plausible implication is the emergence of specialized MLLMs as preferred solutions for high-volume, resource-constrained clinical subfields.