Abstract
Objective: To evaluate the diagnostic performance of semi-supervised learning models for aggressive prostate cancer detection on MRI compared to fully supervised models trained with additional expert annotations.
Materials and methods: We used 1500 MRI scans from the PI-CAI challenge training subset. Positive scans had 220 human and 205 AI-generated annotations. The mtU-Net (proposed teacher-student semi-supervised approach) was compared to supervised (trained using only 220 human annotations) and semi-supervised (trained on human and AI-generated annotations) nnU-Net. The 205 AI-annotated scans were manually annotated, and a fully supervised model was trained. External validation was performed on a newly annotated dataset from the PROMIS study (n = 574, 403 lesions) and the Prostate158 dataset (n = 158, 126 lesions). Patient-level performance was evaluated using the area under the curve (AUC) and lesion-level detection (overlap > 0.10) using average precision (AP), along with 95% confidence intervals (in brackets), and the DeLong test to compare AUCs against the supervised and fully supervised models.
Results: The fully supervised nnU-Net showed the highest performance on the internal PI-CAI test set (AUC = 0.89 [0.87–0.91], AP = 0.65 [0.60–0.70]) and external validation datasets PROMIS (AUC = 0.68 [0.64–0.72], AP = 0.24 [0.20–0.29]) and Prostate158 (AUC = 0.87 [0.82–0.92], AP = 0.64 [0.56–0.72]), significantly outperforming the supervised baseline (p < 0.05). The proposed semi-supervised mtU-Net demonstrated close external validation performance on PROMIS (AUC = 0.66 [0.62–0.71], AP = 0.20 [0.16–0.25]) and Prostate158 (AUC = 0.86 [0.81–0.92], AP = 0.58 [0.49–0.67]), significantly outperforming the supervised baseline on both datasets (p = 0.047 and p = 0.014, respectively), and showing no significant difference from the fully supervised model (p = 0.199 and p = 0.702, respectively).
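As a rough illustration of the patient-level metric reported above, the following minimal sketch computes an AUC and a 95% confidence interval on toy data. The abstract does not state how the intervals were computed, so the percentile bootstrap used here is an assumption, and the function names `auc` and `bootstrap_ci` and the toy labels and scores are hypothetical. The DeLong test used in the study to compare AUCs is not implemented here.

```python
import random


def auc(labels, scores):
    """AUC as the probability that a positive case outscores a negative one.

    Equivalent to the Mann-Whitney U statistic; tied scores count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (an assumed method)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        # Resample patients with replacement.
        sample = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in sample]
        if len(set(ys)) < 2:
            continue  # AUC is undefined without both classes; redraw.
        stats.append(auc(ys, [scores[i] for i in sample]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


# Hypothetical toy data: 1 = patient with aggressive cancer, 0 = without.
toy_labels = [0, 0, 0, 1, 1, 1, 0, 1]
toy_scores = [0.10, 0.40, 0.35, 0.80, 0.70, 0.30, 0.20, 0.90]

toy_auc = auc(toy_labels, toy_scores)
toy_lower, toy_upper = bootstrap_ci(toy_labels, toy_scores)
print(f"AUC = {toy_auc:.3f} [{toy_lower:.2f}-{toy_upper:.2f}]")
```

With only eight toy cases the interval is very wide. The narrow intervals in the abstract reflect the much larger validation cohorts.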
Conclusion: In prostate MRI tumor detection, fully supervised learning performed best. However, in external validation, the semi-supervised methods demonstrated performance that approached that of the fully supervised model, proving a valuable approach when expert annotations are limited.
Key Points:
- Question: The need for extensive expert voxel-level annotations delays the development of AI-based prostate cancer diagnostic tools and their implementation in clinical practice.
- Findings: The combination of pseudo-labeling with consistency regularization achieved performance comparable to that of fully supervised methods, demonstrating that data diversity matches the impact of expert annotation volume.
- Clinical relevance: Semi-supervised learning reduces dependence on expert annotations while maintaining detection accuracy, enabling the development of scalable, automated diagnostic tools for prostate cancer amid growing clinical workflow demands.
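The abstract names pseudo-labeling and consistency regularization in a teacher-student setup but gives no implementation details for the mtU-Net. The sketch below shows one common way these pieces fit together, and every detail in it is an assumption rather than the authors' method: the teacher is an exponential moving average (EMA) of the student, pseudo-labels come from thresholding teacher probabilities, and consistency is a mean squared error. The function names, the decay of 0.99, and the threshold of 0.5 are hypothetical.

```python
def ema_update(teacher_weights, student_weights, decay=0.99):
    """Move each teacher weight slightly toward the student weight (assumed EMA)."""
    return [
        decay * t + (1.0 - decay) * s
        for t, s in zip(teacher_weights, student_weights)
    ]


def pseudo_label(teacher_probs, threshold=0.5):
    """Turn teacher voxel probabilities into hard labels for unannotated scans."""
    return [1 if p >= threshold else 0 for p in teacher_probs]


def consistency_loss(student_probs, teacher_probs):
    """Mean squared difference between student and teacher predictions."""
    if len(student_probs) != len(teacher_probs):
        raise ValueError("prediction lists must have the same length")
    squared = [(s - t) ** 2 for s, t in zip(student_probs, teacher_probs)]
    return sum(squared) / len(squared)


# Hypothetical toy values standing in for network weights and voxel outputs.
teacher = ema_update([1.0, 0.0], [0.0, 1.0], decay=0.9)
labels = pseudo_label([0.2, 0.5, 0.9])
loss = consistency_loss([0.6, 0.4], [0.5, 0.5])
print(teacher, labels, loss)
```

In a full training loop, the student would be optimized on a supervised loss for human-annotated scans plus a weighted consistency or pseudo-label loss for the remaining scans, with `ema_update` applied after each optimizer step.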
| Original language | English |
|---|---|
| Pages (from-to) | 5011-5021 |
| Number of pages | 11 |
| Journal | European Radiology |
| Volume | 36 |
| Issue number | 6 |
| Early online date | 28 Jan 2026 |
| DOIs | |
| Publication status | Published - Jun 2026 |
Bibliographical note
Publisher Copyright: © The Author(s) 2026.
UN SDGs
This output contributes to the following UN Sustainable Development Goals (SDGs):
- SDG 3 Good Health and Well-being
Fingerprint
Dive into the research topics of 'Semi-supervised learning in prostate MRI tumor detection approaches fully supervised performance on external validation'. Together they form a unique fingerprint.