Predicting population-level vulnerability among pregnant women using routinely collected data and the added relevance of self-reported data

Joyce M. Molenaar, Ka Yin Leung, Lindsey van der Meer, Peter Paul F. Klein, Jeroen N. Struijs, Jessica C. Kiefte-de Jong

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

Recognizing and addressing vulnerability during the first thousand days of life can prevent health inequities. It is necessary to determine the best data for predicting multidimensional vulnerability (i.e. risk factors for vulnerability across different domains and a lack of protective factors) at population level to understand national prevalence and trends. This study aimed to (1) assess the feasibility of predicting multidimensional vulnerability during pregnancy using routinely collected data, (2) explore potential improvement of these predictions by adding self-reported data on health, well-being, and lifestyle, and (3) identify the most relevant predictors. The study was conducted using Dutch nationwide routinely collected data and self-reported Public Health Monitor data. First, to predict multidimensional vulnerability using routinely collected data, we used random forest (RF) and considered the area under the curve (AUC) and F1 measure to assess RF model performance. To validate results, sensitivity analyses (XGBoost and Lasso) were performed. Second, we gradually added self-reported data to the predictions. Third, we explored the RF model's variable importance. The initial RF model could distinguish between those with and without multidimensional vulnerability (AUC = 0.98). The model was able to correctly predict multidimensional vulnerability in most cases, but there was also misclassification (F1 measure = 0.70). Adding self-reported data improved RF model performance (e.g. F1 measure = 0.80 after adding perceived health). The strongest predictors concerned self-reported health, socioeconomic characteristics, and healthcare expenditures and utilization. It seems possible to predict multidimensional vulnerability using routinely collected data that is readily available. However, adding self-reported data can improve predictions.
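
The abstract reports a high AUC (0.98) alongside a more modest F1 measure (0.70). The following minimal Python sketch illustrates how the two metrics are computed and why they can diverge. It is not the authors' code: the toy labels and scores are hypothetical, the function names are illustrative, and the 0.5 classification threshold is an assumption.

```python
def auc(labels, scores):
    """Share of (positive, negative) pairs in which the positive case
    receives the higher score; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def f1(labels, scores, threshold=0.5):
    """Harmonic mean of precision and recall at a fixed threshold
    (the 0.5 default is an assumption for this illustration)."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: 3 "vulnerable" cases (1) among 10 individuals,
# with made-up predicted probabilities.
labels = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1, 0.1, 0.05]

print(f"AUC = {auc(labels, scores):.3f}")
print(f"F1  = {f1(labels, scores):.3f}")
```

In this toy example the ranking is nearly perfect (AUC of about 0.95), yet one missed positive and one false alarm at the chosen threshold pull F1 down to about 0.67. AUC summarizes ranking quality across all thresholds, whereas F1 depends on the specific cut-off and is sensitive to misclassification of the minority class.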

Original language: English
Pages (from-to): 1210-1217
Number of pages: 8
Journal: European Journal of Public Health
Volume: 34
Issue number: 6
Publication status: Published - 1 Dec 2024

Bibliographical note

Publisher Copyright:
© The Author(s) 2024.
