Abstract
Many clinical studies are based on registry analyses, but the exact approaches to data extraction and pre-processing are rarely reported, although they are critical for the reliability and reproducibility of results. We aimed to develop an open-source data extraction pipeline that generates a ready-to-analyze dataset focused on relevant determinants of outcomes after hematopoietic stem cell transplantation (HSCT). The pipeline was developed using EBMT registry data, including 54,457 allogeneic and 63,651 autologous HSCT procedures. It determines HLA matching from molecular data, assesses cytogenetic risk for acute myeloid leukemia and myelodysplastic syndrome, processes molecular markers, assigns the hematopoietic cell transplantation comorbidity index (HCT-CI) based on comorbidities, and maps disease states to simplified categories. We prospectively assessed the recently developed disease risk stratification system (DRSS), showing that the pipeline produces results consistent with previous studies. The hazard ratio correlation between our cohort and the original DRSS derivation cohort was 0.92, and the 2-year AUC was 0.616, indicating similar effects and predictive performance. We aim to establish a new standard by promoting transparent, standardized and uniform extraction of registry data, enhancing reproducibility in registry studies.
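As an illustration of one step named in the abstract, assigning the HCT-CI from recorded comorbidities, the following is a minimal Python sketch. It is not the authors' pipeline code. The comorbidity keys and the function names `hct_ci_score` and `hct_ci_risk_group` are hypothetical. The weights and the low/intermediate/high cut-offs follow the HCT-CI as commonly reported in the literature and should be verified against the original index before any real use.

```python
# Illustrative sketch only; not taken from the pipeline described above.
# Weights follow the HCT-CI as commonly reported; verify before real use.
# The dictionary keys are hypothetical names, not EBMT registry field names.
HCT_CI_WEIGHTS = {
    "arrhythmia": 1,
    "cardiac": 1,
    "inflammatory_bowel_disease": 1,
    "diabetes": 1,
    "cerebrovascular_disease": 1,
    "psychiatric_disturbance": 1,
    "hepatic_mild": 1,
    "obesity": 1,
    "infection": 1,
    "rheumatologic": 2,
    "peptic_ulcer": 2,
    "renal_moderate_severe": 2,
    "pulmonary_moderate": 2,
    "prior_solid_tumor": 3,
    "heart_valve_disease": 3,
    "pulmonary_severe": 3,
    "hepatic_moderate_severe": 3,
}


def hct_ci_score(comorbidities):
    """Sum the weights of the comorbidities present for one patient.

    Duplicates are counted once. Unknown names raise an error rather than
    being silently ignored. Mutually exclusive grades (for example mild and
    moderate/severe hepatic disease) are not checked in this sketch.
    """
    present = set(comorbidities)
    unknown = present - HCT_CI_WEIGHTS.keys()
    if unknown:
        raise ValueError(f"Unknown comorbidity names: {sorted(unknown)}")
    return sum(HCT_CI_WEIGHTS[name] for name in present)


def hct_ci_risk_group(score):
    """Map a score to the commonly used groups: 0, 1 to 2, and 3 or more."""
    if score < 0:
        raise ValueError("Score must be non-negative")
    if score == 0:
        return "low"
    if score <= 2:
        return "intermediate"
    return "high"


if __name__ == "__main__":
    example = ["diabetes", "prior_solid_tumor"]
    score = hct_ci_score(example)
    print(score, hct_ci_risk_group(score))
```

A real extraction pipeline would also have to decide how to treat missing comorbidity fields, which this sketch does not address.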
| Original language | English |
|---|---|
| Pages (from-to) | 400-407 |
| Number of pages | 8 |
| Journal | Bone Marrow Transplantation |
| Volume | 61 |
| Issue number | 4 |
| DOIs | |
| Publication status | Published - Apr 2026 |
Bibliographical note
Publisher Copyright: © The Author(s), under exclusive licence to Springer Nature Limited 2026.
Title
An extraction pipeline for analysis of hematopoietic stem cell transplantation data