TY - JOUR
T1 - Penalized regression techniques for modeling relationships between metabolites and tomato taste attributes
AU - Menendez, P
AU - Eilers, Paul
AU - Tikunov, Y
AU - Bovy, A
AU - van Eeuwijk, F
PY - 2012
Y1 - 2012
N2 - The search for models which link tomato taste attributes to their metabolic profiling, is a main challenge within the breeding programs that aim to enhance tomato flavor. In this paper, we compared such models calculated by the traditional statistical approach, stepwise regression, with models obtained by the new generation of regression techniques, known as penalized regression or regularization methods. In addition, for penalized regression, different scenarios and various model selection criteria were discussed to conclude that classical crossvalidation, selects models with many superfluous variables whereas model selection criteria such as Bayesian information criterion, seem to be more suitable, when the goal is to find parsimonious models, to explain tomato taste attributes based on metabolic information. An exhaustive comparison of the discussed methodology was done for six sensory traits, showing that the most important covariates were identified by the stepwise regression as well as by some of the penalized regression methods, despite the general disagreement on the size of the regression coefficients between them. In particular, for stepwise regression the coefficients are inflated due to their high variance which is not the case with penalized regression, showing that this new methodology, can be an alternative to obtain more accurate models.
AB - The search for models which link tomato taste attributes to their metabolic profiling, is a main challenge within the breeding programs that aim to enhance tomato flavor. In this paper, we compared such models calculated by the traditional statistical approach, stepwise regression, with models obtained by the new generation of regression techniques, known as penalized regression or regularization methods. In addition, for penalized regression, different scenarios and various model selection criteria were discussed to conclude that classical crossvalidation, selects models with many superfluous variables whereas model selection criteria such as Bayesian information criterion, seem to be more suitable, when the goal is to find parsimonious models, to explain tomato taste attributes based on metabolic information. An exhaustive comparison of the discussed methodology was done for six sensory traits, showing that the most important covariates were identified by the stepwise regression as well as by some of the penalized regression methods, despite the general disagreement on the size of the regression coefficients between them. In particular, for stepwise regression the coefficients are inflated due to their high variance which is not the case with penalized regression, showing that this new methodology, can be an alternative to obtain more accurate models.
U2 - 10.1007/s10681-011-0374-5
DO - 10.1007/s10681-011-0374-5
M3 - Article
VL - 183
SP - 379
EP - 387
JO - Euphytica
JF - Euphytica
SN - 0014-2336
IS - 3
ER -