TY - JOUR
T1 - Transforming and evaluating the UK Biobank to the OMOP Common Data Model for COVID-19 research and beyond
AU - Papez, Vaclav
AU - Moinat, Maxim
AU - Voss, Erica A.
AU - Bazakou, Sofia
AU - Van Winzum, Anne
AU - Peviani, Alessia
AU - Payralbe, Stefan
AU - Lara, Elena Garcia
AU - Kallfelz, Michael
AU - Asselbergs, Folkert W.
AU - Prieto-Alhambra, Daniel
AU - Dobson, Richard J.B.
AU - Denaxas, Spiros
N1 - Publisher Copyright: © 2022 The Author(s). Published by Oxford University Press on behalf of the American Medical Informatics Association.
PY - 2023/1
Y1 - 2023/1
N2 - Objective: The coronavirus disease 2019 (COVID-19) pandemic has demonstrated the value of real-world data for public health research. International federated analyses are crucial for informing policy makers. Common data models (CDMs) are critical for enabling these studies to be performed efficiently. Our objective was to convert the UK Biobank, a study of 500 000 participants with rich genetic and phenotypic data to the Observational Medical Outcomes Partnership (OMOP) CDM. Materials and Methods: We converted UK Biobank data to OMOP CDM v. 5.3. We transformed participant research data on diseases collected at recruitment and electronic health records (EHRs) from primary care, hospitalizations, cancer registrations, and mortality from providers in England, Scotland, and Wales. We performed syntactic and semantic validations and compared comorbidities and risk factors between source and transformed data. Results: We identified 502 505 participants (3086 with COVID-19) and transformed 690 fields (1 373 239 555 rows) to the OMOP CDM using 8 different controlled clinical terminologies and bespoke mappings. Specifically, we transformed self-reported noncancer illnesses 946 053 (83.91% of all source entries), cancers 37 802 (70.81%), medications 1 218 935 (88.25%), and prescriptions 864 788 (86.96%). In EHR, we transformed 13 028 182 (99.95%) hospital diagnoses, 6 465 399 (89.2%) procedures, 337 896 333 primary care diagnoses (CTV3, SNOMED-CT), 139 966 587 (98.74%) prescriptions (dm+d) and 77 127 (99.95%) deaths (ICD-10). We observed good concordance across demographic, risk factor, and comorbidity factors between source and transformed data. Discussion and Conclusion: Our study demonstrated that the OMOP CDM can be successfully leveraged to harmonize complex large-scale biobanked studies combining rich multimodal phenotypic data. Our study uncovered several challenges when transforming data from questionnaires to the OMOP CDM which require further research. The transformed UK Biobank resource is a valuable tool that can enable federated research, like COVID-19 studies.
AB - Objective: The coronavirus disease 2019 (COVID-19) pandemic has demonstrated the value of real-world data for public health research. International federated analyses are crucial for informing policy makers. Common data models (CDMs) are critical for enabling these studies to be performed efficiently. Our objective was to convert the UK Biobank, a study of 500 000 participants with rich genetic and phenotypic data to the Observational Medical Outcomes Partnership (OMOP) CDM. Materials and Methods: We converted UK Biobank data to OMOP CDM v. 5.3. We transformed participant research data on diseases collected at recruitment and electronic health records (EHRs) from primary care, hospitalizations, cancer registrations, and mortality from providers in England, Scotland, and Wales. We performed syntactic and semantic validations and compared comorbidities and risk factors between source and transformed data. Results: We identified 502 505 participants (3086 with COVID-19) and transformed 690 fields (1 373 239 555 rows) to the OMOP CDM using 8 different controlled clinical terminologies and bespoke mappings. Specifically, we transformed self-reported noncancer illnesses 946 053 (83.91% of all source entries), cancers 37 802 (70.81%), medications 1 218 935 (88.25%), and prescriptions 864 788 (86.96%). In EHR, we transformed 13 028 182 (99.95%) hospital diagnoses, 6 465 399 (89.2%) procedures, 337 896 333 primary care diagnoses (CTV3, SNOMED-CT), 139 966 587 (98.74%) prescriptions (dm+d) and 77 127 (99.95%) deaths (ICD-10). We observed good concordance across demographic, risk factor, and comorbidity factors between source and transformed data. Discussion and Conclusion: Our study demonstrated that the OMOP CDM can be successfully leveraged to harmonize complex large-scale biobanked studies combining rich multimodal phenotypic data. Our study uncovered several challenges when transforming data from questionnaires to the OMOP CDM which require further research. The transformed UK Biobank resource is a valuable tool that can enable federated research, like COVID-19 studies.
UR - http://www.scopus.com/inward/record.url?scp=85159497595&partnerID=8YFLogxK
U2 - 10.1093/jamia/ocac203
DO - 10.1093/jamia/ocac203
M3 - Article
C2 - 36227072
AN - SCOPUS:85159497595
SN - 1067-5027
VL - 30
SP - 103
EP - 111
JO - Journal of the American Medical Informatics Association
JF - Journal of the American Medical Informatics Association
IS - 1
ER -