
An extraction pipeline for analysis of hematopoietic stem cell transplantation data

  • Erik G.J. von Asmuth*
  • Constantijn J.M. Halkes
  • Jurjen Versluis
  • Dirk Jan A. Eikema
  • Emanuele Angelucci
  • Ali Bazarbachi
  • Fabio Ciceri
  • Raffaella Greco
  • Mette Hazenberg
  • Krzysztof Kałwak
  • Donal P. McLornan
  • Bénédicte Neven
  • Antonio M. Risitano
  • Mirjam Steinbuch
  • Anna Sureda
  • John Snowden
  • Arjan C. Lankester
  • Hein Putter
  • Liesbeth C. de Wreede
  • *Corresponding author for this work
  • Leiden University
  • EBMT - European Society for Blood and Marrow Transplantation
  • San Martino Hospital Genoa
  • American University of Beirut
  • IRCCS Ospedale San Raffaele
  • University of Amsterdam
  • Wrocław Medical University
  • University College London Hospital
  • Institut Imagine
  • Azienda Ospedaliera di Rilievo Nazionale “San Giuseppe Moscati” (A.O.R.N. Giuseppe Moscati)
  • EBMT Executive Office
  • Institute Catala Oncologia
  • University of Sheffield

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

Many clinical studies are based on registry analyses, but the exact approaches to data extraction and pre-processing are rarely reported, even though they are critical for the reliability and reproducibility of results. We aimed to develop an open-source data extraction pipeline that generates a ready-to-analyze dataset focused on relevant determinants of outcomes after hematopoietic stem cell transplantation (HSCT). The pipeline was developed using EBMT registry data, including 54,457 allogeneic and 63,651 autologous HSCT procedures. It determines HLA matching from molecular data, assesses cytogenetic risk for acute myeloid leukemia and myelodysplastic syndrome, processes molecular markers, assigns the hematopoietic cell transplantation comorbidity index (HCT-CI) based on comorbidities, and maps disease states to simplified categories. We prospectively assessed the recently developed disease risk stratification system (DRSS), showing that the pipeline produces results consistent with previous studies. The hazard ratio correlation between our cohort and the original DRSS derivation cohort was 0.92, and the 2-year AUC was 0.616, indicating similar effects and predictive performance. We aim to establish a new standard by promoting transparent, standardized, and uniform extraction of registry data, thereby enhancing reproducibility in registry studies.
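
As an illustration of one of the steps named in the abstract, the following is a minimal sketch of how an HCT-CI score might be assigned from a list of comorbidities. It is not the authors' implementation, which the abstract does not show. The weights follow the published HCT-CI as commonly cited (Sorror et al., 2005), while the comorbidity labels, the function names `hct_ci_score` and `hct_ci_risk_group`, and the input format are assumptions made for this example and do not correspond to EBMT registry fields.

```python
from typing import Iterable

# Weights as commonly cited for the published HCT-CI (Sorror et al., 2005).
# The keys are hypothetical labels chosen for this sketch, not registry field names.
HCT_CI_WEIGHTS = {
    "arrhythmia": 1,
    "cardiac": 1,
    "inflammatory_bowel_disease": 1,
    "diabetes": 1,
    "cerebrovascular_disease": 1,
    "psychiatric_disturbance": 1,
    "hepatic_mild": 1,
    "obesity": 1,
    "infection": 1,
    "rheumatologic": 2,
    "peptic_ulcer": 2,
    "renal_moderate_severe": 2,
    "pulmonary_moderate": 2,
    "prior_solid_tumor": 3,
    "heart_valve_disease": 3,
    "pulmonary_severe": 3,
    "hepatic_moderate_severe": 3,
}


def hct_ci_score(comorbidities: Iterable[str]) -> int:
    """Sum the weights of the distinct comorbidities present in one patient."""
    present = set(comorbidities)  # duplicates are counted once
    unknown = present - HCT_CI_WEIGHTS.keys()
    if unknown:
        # Fail loudly instead of silently scoring an unrecognized label as 0.
        raise ValueError(f"Unknown comorbidity labels: {sorted(unknown)}")
    return sum(HCT_CI_WEIGHTS[name] for name in present)


def hct_ci_risk_group(score: int) -> str:
    """Map a score to the usual three groups: 0, 1-2, and 3 or more."""
    if score == 0:
        return "low"
    if score <= 2:
        return "intermediate"
    return "high"


if __name__ == "__main__":
    patient = ["diabetes", "pulmonary_severe"]
    score = hct_ci_score(patient)
    print(score, hct_ci_risk_group(score))
```

This sketch assumes the input has already been cleaned so that mutually exclusive severity grades for the same organ (for example, mild and moderate/severe hepatic disease) are not both present; a real pipeline would also need to handle missing comorbidity data explicitly.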

Original language: English
Pages (from-to): 400-407
Number of pages: 8
Journal: Bone Marrow Transplantation
Volume: 61
Issue number: 4
Publication status: Published - Apr 2026

Bibliographical note

Publisher Copyright:
© The Author(s), under exclusive licence to Springer Nature Limited 2026.
