Background: Collation of aphasia research data across settings, countries and study designs using big data principles will support analyses across different language modalities, levels of impairment, and therapy interventions in this heterogeneous population. Big data approaches in aphasia research may support vital analyses that are unachievable within individual trial datasets. However, we lack insight into the requirements for a systematically created database, its feasibility and challenges, and the potential utility of the type of data collated. Aim: To report the development, preparation and establishment of an internationally agreed aphasia after stroke research database of individual participant data (IPD) to facilitate planned aphasia research analyses. Methods: Data were collated by systematically identifying existing, eligible studies in any language (≥10 IPD, with data on time since stroke and language performance), including sourcing from relevant aphasia research networks. We invited electronic contributions and also extracted IPD from the public domain. Data were assessed for completeness and validity of value ranges within variables, and described according to pre-defined categories of demographic data, therapy descriptions, and language domain measurements. We cleaned, clarified, imputed and standardised relevant data in collaboration with the original study investigators. We presented participant, language, stroke, and therapy data characteristics of the final database using summary statistics. Results: From 5256 screened records, 698 datasets were potentially eligible for inclusion; 174 datasets (5928 IPD) from 28 countries were included. Of these, 47/174 were RCT datasets (1778 IPD) and 91/174 (2834 IPD) included a speech and language therapy (SLT) intervention. Participants' median age was 63 years (interquartile range [IQR] 53, 72), 3407 (61.4%) were male, and median time from stroke to recruitment was 321 days (IQR 30, 1156).
IPD were available for overall aphasia severity or ability (n = 2699; 80 datasets), naming (n = 2886; 75 datasets), auditory comprehension (n = 2750; 71 datasets), functional communication (n = 1591; 29 datasets), reading (n = 770; 12 datasets) and writing (n = 724; 13 datasets). SLT interventions were described by theoretical approach, therapy target, mode of delivery, setting and provider. Therapy regimen was described according to intensity (1882 IPD; 60 datasets), frequency (2057 IPD; 66 datasets), duration (1960 IPD; 64 datasets) and dosage (1978 IPD; 62 datasets). Discussion: Our international IPD archive demonstrates the application of big data principles in the context of aphasia research; our rigorous methodology for data acquisition and cleaning can serve as a template for the establishment of similar databases in other research areas.
This project was funded by the National Institute for Health Research (NIHR) Health Services and Delivery Research [14/04/22] and The Tavistock Trust for Aphasia, and will be published in full in the Health Services and Delivery Research Journal. Further information is available at www.journalslibrary.nihr.ac.u/programmes/hsdr/140422#/NMAHPRU. NMAHPRU and MCB are funded by the Chief Scientist Office (CSO), Scottish Government Health and Social Care Directorates. The views expressed are those of the authors and not necessarily those of the funders.
Publisher Copyright: © 2021 The Author(s). Published by Informa UK Limited, trading as Taylor & Francis Group.