
Semi-supervised learning in prostate MRI tumor detection approaches fully supervised performance on external validation

  • Eduardo H.P. Pooch*
  • Georgios Agrotis
  • Lishan Cai
  • Mark Emberton
  • Taimur T. Shah
  • Hashim U. Ahmed
  • Regina G.H. Beets-Tan
  • Sean Benson
  • Tomas Janssen
  • Ivo G. Schoots
  • *Corresponding author for this work
  • Netherlands Cancer Institute
  • Maastricht University Medical Centre
  • University College London
  • Imperial College London
  • Amsterdam UMC

Research output: Contribution to journal, Article, Academic, peer-review

2 Citations (Scopus)
15 Downloads (Pure)

Abstract

Objective: To evaluate the diagnostic performance of semi-supervised learning models for aggressive prostate cancer detection on MRI compared to fully supervised models trained with additional expert annotations.
Materials and methods: We used 1500 MRI scans from the PI-CAI challenge training subset. Positive scans had 220 human and 205 AI-generated annotations. The mtU-Net (proposed teacher-student semi-supervised approach) was compared to supervised (trained using only 220 human annotations) and semi-supervised (trained on human and AI-generated annotations) nnU-Net. The 205 AI-annotated scans were manually annotated, and a fully supervised model was trained. External validation was performed on a newly annotated dataset from the PROMIS study (n = 574, 403 lesions) and the Prostate158 dataset (n = 158, 126 lesions). Patient-level performance was evaluated using the area under the curve (AUC) and lesion-level detection (overlap > 0.10) using average precision (AP), along with 95% confidence intervals (in brackets), and the DeLong test to compare AUCs against the supervised and fully supervised models.
Results: The fully supervised nnU-Net showed the highest performance on the internal PI-CAI test set (AUC = 0.89 [0.87–0.91], AP = 0.65 [0.60–0.70]) and external validation datasets PROMIS (AUC = 0.68 [0.64–0.72], AP = 0.24 [0.20–0.29]) and Prostate158 (AUC = 0.87 [0.82–0.92], AP = 0.64 [0.56–0.72]), significantly outperforming the supervised baseline (p < 0.05). The proposed semi-supervised mtU-Net demonstrated close external validation performance on PROMIS (AUC = 0.66 [0.62–0.71], AP = 0.20 [0.16–0.25]) and Prostate158 (AUC = 0.86 [0.81–0.92], AP = 0.58 [0.49–0.67]), significantly outperforming the supervised baseline on both datasets (p = 0.047 and p = 0.014, respectively), and showing no significant difference to the fully supervised model (p = 0.199 and p = 0.702, respectively).
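As a rough illustration of the patient-level metric reported above, the following sketch computes an AUC with a 95% interval in plain Python. The percentile bootstrap is an assumption made for illustration: the abstract does not state how the intervals were obtained, and the DeLong test used for the comparisons is not implemented here. The function names, toy labels, and toy scores are hypothetical.

```python
import random


def auc(labels, scores):
    """AUC as the probability that a positive case outranks a negative one.

    Ties between a positive and a negative score count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (assumed method, see lead-in)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        # Skip resamples that contain only one class: AUC is undefined there.
        if 0 < sum(ys) < n:
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Hypothetical toy data: 1 = patient with significant cancer, 0 = without.
labels = [0, 0, 1, 1, 0, 1, 1, 0]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.6, 0.5]

print(auc(labels, scores))  # 0.875
print(bootstrap_ci(labels, scores))
```

With only eight toy cases the interval is very wide; it narrows as the number of patients grows, which is consistent with the tighter brackets reported for the larger datasets.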
Conclusion: In prostate MRI tumor detection, fully supervised learning performed best. However, in external validation, the semi-supervised methods demonstrated performance that approached that of the fully supervised model, proving a valuable approach when expert annotations are limited.
Key Points:
Question: The need for extensive expert voxel-level annotations delays the development of AI-based prostate cancer diagnostic tools and their implementation in clinical practice.
Findings: The combination of pseudo-labeling with consistency regularization achieved performance comparable to that of fully supervised methods, demonstrating that data diversity matches the impact of expert annotation volume.
Clinical relevance: Semi-supervised learning reduces dependence on expert annotations while maintaining detection accuracy, enabling the development of scalable, automated diagnostic tools for prostate cancer amid growing clinical workflow demands.
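The teacher-student idea behind pseudo-labeling with consistency regularization can be sketched in a few lines. This is a generic mean-teacher style illustration under stated assumptions, not the mtU-Net implementation: the abstract does not describe the update rule, the loss, or any hyperparameters, so the exponential moving average, the mean squared consistency loss, the `decay` and `threshold` values, and all function names are hypothetical.

```python
def ema_update(teacher, student, decay=0.99):
    """Move each teacher weight toward the matching student weight.

    Assumed update rule: teacher = decay * teacher + (1 - decay) * student.
    """
    return [decay * t + (1.0 - decay) * s for t, s in zip(teacher, student)]


def pseudo_labels(teacher_probs, threshold=0.5):
    """Binarise teacher probabilities into voxel-wise pseudo-labels."""
    return [1 if p >= threshold else 0 for p in teacher_probs]


def consistency_loss(student_probs, teacher_probs):
    """Mean squared difference between student and teacher predictions."""
    n = len(student_probs)
    return sum((s - t) ** 2 for s, t in zip(student_probs, teacher_probs)) / n


# Hypothetical toy values standing in for model weights and voxel probabilities.
teacher_weights = [0.0, 1.0]
student_weights = [1.0, 0.0]
teacher_weights = ema_update(teacher_weights, student_weights, decay=0.9)

teacher_probs = [0.2, 0.7, 0.5]
student_probs = [0.3, 0.6, 0.5]
print(pseudo_labels(teacher_probs))                    # [0, 1, 1]
print(consistency_loss(student_probs, teacher_probs))
```

In a setup like this, scans without expert annotations contribute through the pseudo-labels and the consistency term, while annotated scans contribute through an ordinary supervised loss; how the two are weighted is a design choice the abstract does not specify.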

Original language: English
Pages (from-to): 5011-5021
Number of pages: 11
Journal: European Radiology
Volume: 36
Issue number: 6
Early online date: 28 Jan 2026
DOIs
Publication status: Published - Jun 2026

Bibliographical note

Publisher Copyright: © The Author(s) 2026.

UN SDGs

This output contributes to the following UN Sustainable Development Goals (SDGs)

  1. SDG 3 - Good Health and Well-being
