When comparing prediction models, it is essential to estimate the magnitude of the change in performance rather than rely solely on statistical significance. In this paper we investigate measures that estimate the change in classification performance, assuming classification into two groups based on a single risk threshold. We study the value of a new biomarker when it is added to a baseline risk prediction model. First, simulated data are used to investigate the change in sensitivity and specificity (Se and Sp). Second, we study how these changes in Se and Sp influence the net reclassification improvement (NRI; for two groups, the sum of the changes in Se and Sp) and decision-analytic measures (net benefit or relative utility). We assume normally distributed predictors and correctly specified models, so that the extended model has a receiver operating characteristic curve that dominates that of the baseline model. Remarkably, even when a strong marker is added, either sensitivity (for thresholds below the event rate) or specificity (for thresholds above the event rate) can decrease. In these cases, decision-analytic measures provide more modest support for improved classification than NRI, even though all measures confirm that adding the marker improved classification accuracy. Our results underscore the need to report Se and Sp separately. When a single summary is desired, decision-analytic measures allow misclassification costs to be incorporated simply.
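The relation between the changes in Se and Sp, the two-group NRI and net benefit can be illustrated with a small simulation. The sketch below is not the paper's simulation design. It assumes an event rate of 0.3, a baseline marker and a new marker that are N(0, 1) in non-events and shifted by 0.5 and 1.0 in events (independent given the outcome), and a threshold of 0.1, below the event rate. Under these assumptions the true risk has a closed form, so both models are correctly specified and the extended model's ROC curve dominates. Net benefit is computed as TP/n − (FP/n)·t/(1 − t); relative utility is not computed. The helper names `simulate`, `true_risk` and `classify_stats` are hypothetical.

```python
import math
import random

# Illustrative settings (assumptions, not the paper's simulation design).
PREVALENCE = 0.3   # event rate p
MU_BASE = 0.5      # mean shift of the baseline marker in events
MU_NEW = 1.0       # mean shift of the added marker in events
THRESHOLD = 0.1    # risk threshold t, chosen below the event rate


def simulate(n, seed=2024):
    """Outcome y plus two markers that are N(0, 1) in non-events and
    N(mu, 1) in events, independent given y (assumed setup)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        y = 1 if rng.random() < PREVALENCE else 0
        rows.append((y, rng.gauss(MU_BASE * y, 1.0), rng.gauss(MU_NEW * y, 1.0)))
    return rows


def true_risk(pairs):
    """Exact risk for this binormal setup: each marker adds mu*x - mu**2/2
    to the prior log-odds, so the correctly specified model is known."""
    log_odds = math.log(PREVALENCE / (1 - PREVALENCE))
    for mu, x in pairs:
        log_odds += mu * x - mu * mu / 2
    return 1 / (1 + math.exp(-log_odds))


def classify_stats(ys, risks, t):
    """Sensitivity, specificity and net benefit at risk threshold t."""
    n = len(ys)
    tp = sum(1 for y, r in zip(ys, risks) if y == 1 and r >= t)
    fp = sum(1 for y, r in zip(ys, risks) if y == 0 and r >= t)
    events = sum(ys)
    se = tp / events
    sp = 1 - fp / (n - events)
    net_benefit = tp / n - fp / n * t / (1 - t)
    return se, sp, net_benefit


rows = simulate(100_000)
ys = [y for y, _, _ in rows]
base = [true_risk([(MU_BASE, x1)]) for _, x1, _ in rows]
ext = [true_risk([(MU_BASE, x1), (MU_NEW, x2)]) for _, x1, x2 in rows]

se0, sp0, nb0 = classify_stats(ys, base, THRESHOLD)
se1, sp1, nb1 = classify_stats(ys, ext, THRESHOLD)
delta_se, delta_sp = se1 - se0, sp1 - sp0
nri = delta_se + delta_sp   # two-group NRI = change in Se + change in Sp
delta_nb = nb1 - nb0

print(f"Se: {se0:.3f} -> {se1:.3f}  Sp: {sp0:.3f} -> {sp1:.3f}")
print(f"NRI = {nri:.3f}, change in net benefit = {delta_nb:.4f}")
```

Under these assumed settings the analytic values are roughly: Se falls from about 0.998 to about 0.961, Sp rises from about 0.007 to about 0.258, so NRI is about 0.21, while net benefit rises by only about 0.008. The decrease in Se is weighted by its full cost in net benefit, which is why the decision-analytic summary is more modest than NRI here.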