TY - JOUR
T1 - Standardised and Reproducible Phenotyping Using Distributed Analytics and Tools in the Data Analysis and Real World Interrogation Network (DARWIN EU)
AU - Dernie, Francesco
AU - Corby, George
AU - Robinson, Abigail
AU - Bezer, James
AU - Mercade-Besora, Nuria
AU - Griffier, Romain
AU - Verdy, Guillaume
AU - Leis, Angela
AU - Ramirez-Anguita, Juan Manuel
AU - Mayer, Miguel A.
AU - Brash, James T.
AU - Seager, Sarah
AU - Parry, Rowan
AU - Jodicke, Annika
AU - Duarte-Salles, Talita
AU - Rijnbeek, Peter R.
AU - Verhamme, Katia
AU - Pacurariu, Alexandra
AU - Morales, Daniel
AU - Pinheiro, Luis
AU - Prieto-Alhambra, Daniel
AU - Prats-Uribe, Albert
N1 - Publisher Copyright: © 2024 The Author(s). Pharmacoepidemiology and Drug Safety published by John Wiley & Sons Ltd.
PY - 2024/11
Y1 - 2024/11
N2 - Purpose: The generation of representative disease phenotypes is important for ensuring the reliability of the findings of observational studies. The aim of this manuscript is to outline a reproducible framework for reliable and traceable phenotype generation based on real-world data for use in the Data Analysis and Real-World Interrogation Network (DARWIN EU). We illustrate the use of this framework by generating phenotypes for two diseases: pancreatic cancer and systemic lupus erythematosus (SLE). Methods: The phenotyping process comprises 14 steps based on a standard operating procedure co-created by the DARWIN EU Coordination Centre in collaboration with the European Medicines Agency. A number of bespoke R packages were utilised to generate and review codelists for two phenotypes based on real-world data mapped to the OMOP Common Data Model. Results: Codelists were generated for both pancreatic cancer and SLE, and cohorts were generated in six OMOP-mapped databases. Diagnostic checks were performed, which showed these cohorts had broadly similar incidence and prevalence figures to previously published literature, despite significant inter-database variability. Co-occurrent symptoms, conditions, and medication use were in keeping with pre-specified clinical descriptions based on previous knowledge. Conclusions: Our detailed phenotyping process makes use of bespoke tools and allows for comprehensive codelist generation and review, as well as large-scale exploration of the characteristics of the resulting cohorts. Wider use of structured and reproducible phenotyping methods will be important in ensuring the reliability of observational studies for regulatory purposes.
AB - Purpose: The generation of representative disease phenotypes is important for ensuring the reliability of the findings of observational studies. The aim of this manuscript is to outline a reproducible framework for reliable and traceable phenotype generation based on real-world data for use in the Data Analysis and Real-World Interrogation Network (DARWIN EU). We illustrate the use of this framework by generating phenotypes for two diseases: pancreatic cancer and systemic lupus erythematosus (SLE). Methods: The phenotyping process comprises 14 steps based on a standard operating procedure co-created by the DARWIN EU Coordination Centre in collaboration with the European Medicines Agency. A number of bespoke R packages were utilised to generate and review codelists for two phenotypes based on real-world data mapped to the OMOP Common Data Model. Results: Codelists were generated for both pancreatic cancer and SLE, and cohorts were generated in six OMOP-mapped databases. Diagnostic checks were performed, which showed these cohorts had broadly similar incidence and prevalence figures to previously published literature, despite significant inter-database variability. Co-occurrent symptoms, conditions, and medication use were in keeping with pre-specified clinical descriptions based on previous knowledge. Conclusions: Our detailed phenotyping process makes use of bespoke tools and allows for comprehensive codelist generation and review, as well as large-scale exploration of the characteristics of the resulting cohorts. Wider use of structured and reproducible phenotyping methods will be important in ensuring the reliability of observational studies for regulatory purposes.
UR - http://www.scopus.com/inward/record.url?scp=85208797359&partnerID=8YFLogxK
U2 - 10.1002/pds.70042
DO - 10.1002/pds.70042
M3 - Article
C2 - 39532529
AN - SCOPUS:85208797359
SN - 1053-8569
VL - 33
JO - Pharmacoepidemiology and Drug Safety
JF - Pharmacoepidemiology and Drug Safety
IS - 11
M1 - e70042
ER -