Background
The application of disability weights by nature of injury is central to the calculation of disability-adjusted life years (DALYs). Such weights should represent injury diagnosis groups that are homogeneous in their disability outcomes. Existing classifications were not developed from empirical data on disability outcomes, which limits the capacity to make informed recommendations for best practice in measuring injury burden.

Methods
The Validating and Improving injury Burden Estimates (Injury-VIBES) Study includes pooled data from over 30 000 injured participants recruited to six cohort studies. International Classification of Diseases 10th Revision (ICD-10) diagnosis codes were mapped to existing injury burden study groupings, and prediction models were developed to measure the capacity of the injury groupings and of the ICD-10 diagnoses to predict disability outcomes at 12 months. Models were adjusted for age, gender and data source. Discrimination was assessed using the area under the receiver operating characteristic curve (AUC), and calibration was assessed using Hosmer-Lemeshow statistics and calibration curves.

Results
Discrimination and calibration of models varied depending on the outcome measured. Models using full four-character ICD-10 diagnosis codes, rather than groupings of codes, demonstrated the highest discrimination, with AUC (95% CI) ranging from 0.627 (0.618 to 0.635) for the pain or discomfort item of the EQ-5D to 0.847 (0.841 to 0.853) for the extended Glasgow Outcome Scale independent living outcome. However, the gain over the other groupings was marginal.

Conclusions
Prediction performance was best for measures of function such as independent living, mobility and self-care. The classifications were poorer predictors of anxiety/depression and pain/discomfort. There was no clearly superior classification.