Background. Calibration (estimation of model parameters) compares model outcomes with observed outcomes and searches over candidate parameter values to identify the set of values that best fits the data. The goodness-of-fit (GOF) criterion quantifies the difference between model and observed outcomes. There is no consensus on the most appropriate GOF criterion, because a direct comparison of the performance of GOF criteria in model calibration has been lacking.

Methods. We systematically compared the performance of commonly used GOF criteria (sum of squared errors [SSE], Pearson chi-square, and a likelihood-based approach [Poisson and/or binomial deviance functions]) in calibrating selected parameters of the MISCAN-Colon microsimulation model for colorectal cancer. The performance of each GOF criterion was assessed by comparing 1) the root mean squared prediction error (RMSPE) of the selected parameters, 2) the computation time of the calibration procedure across various calibration scenarios, and 3) the impact on estimated cost-effectiveness ratios.

Results. The likelihood-based deviance resulted in the lowest RMSPE in 4 of 6 calibration scenarios and was close to the best in the other 2. The SSE had a 25-fold higher RMSPE in a scenario with considerable differences in the magnitudes of observed outcomes, whereas the Pearson chi-square had a 60-fold higher RMSPE in a scenario with multiple studies measuring the same outcome. In all scenarios, the SSE required the most computation time. The likelihood-based approach estimated the cost-effectiveness ratio most accurately (relative difference of up to 0.44%, versus up to 20.15% [SSE] and 13% [Pearson chi-square]).

Conclusions. The likelihood-based deviance criteria lead to accurate parameter estimates under various circumstances. These criteria are recommended over other commonly used criteria for calibrating microsimulation disease models.
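For readers unfamiliar with the criteria being compared, the following is a minimal sketch of the textbook forms of each GOF criterion and of RMSPE. It is not the MISCAN-Colon calibration code: the function names are hypothetical, the Pearson chi-square is assumed to use the model-predicted value as its denominator, and RMSPE is assumed to be the root mean of squared differences between estimated and true parameter values.

```python
import math
from typing import Sequence


def sse(observed: Sequence[float], expected: Sequence[float]) -> float:
    """Sum of squared errors between observed and model-predicted outcomes."""
    return sum((o - e) ** 2 for o, e in zip(observed, expected))


def pearson_chi_square(observed: Sequence[float], expected: Sequence[float]) -> float:
    """Pearson chi-square; each squared error is scaled by the model-predicted value
    (assumed denominator; expected values must be > 0)."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def _x_log_ratio(x: float, y: float) -> float:
    """Return x * log(x / y), using the limit 0 * log(0) = 0."""
    return 0.0 if x == 0 else x * math.log(x / y)


def poisson_deviance(observed: Sequence[float], expected: Sequence[float]) -> float:
    """Poisson deviance of observed counts vs. model-expected counts (expected > 0)."""
    return 2.0 * sum(_x_log_ratio(o, e) - (o - e) for o, e in zip(observed, expected))


def binomial_deviance(events: Sequence[int], trials: Sequence[int],
                      probs: Sequence[float]) -> float:
    """Binomial deviance of k events in n trials vs. model probability p (0 < p < 1)."""
    total = 0.0
    for k, n, p in zip(events, trials, probs):
        total += _x_log_ratio(k, n * p) + _x_log_ratio(n - k, n * (1.0 - p))
    return 2.0 * total


def rmspe(estimates: Sequence[float], true_values: Sequence[float]) -> float:
    """Root mean squared prediction error of estimated vs. true parameter values."""
    squared = [(est - tru) ** 2 for est, tru in zip(estimates, true_values)]
    return math.sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    # Two hypothetical outcomes on very different scales, each overestimated by 10%.
    observed = [1000.0, 10.0]
    model = [1100.0, 11.0]
    # SSE: the large outcome contributes 10000 of the 10001 total, so the fit
    # of the small outcome barely influences the criterion.
    print("SSE:", sse(observed, model))
    print("Pearson chi-square:", pearson_chi_square(observed, model))
    print("Poisson deviance:", poisson_deviance(observed, model))
```

The demo hints at why the SSE can struggle when observed outcomes differ greatly in magnitude: squared errors on the raw scale let the largest outcome dominate the criterion, whereas the chi-square and deviance forms weight each error relative to its expected count.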