Robust and confident predictor selection in metabolomics

J.A. Hageman, B. Engel, C.H. de Vos, R. Mumm, R.D. Hall, H. Jwanro, D. Crouzillat, C. Spadone, F.A. van Eeuwijk

Research output: Chapter in Book/Report/Conference proceeding; Chapter; Academic; peer-review

Abstract

Metabolomics is a proven tool to obtain information about differences in foodstuffs and to select biochemical markers for sensory quality of food products. A valuable application of untargeted metabolomics is the selection of metabolites that are (highly) predictive for sensory or phenotypical traits for use as (bio)markers. This chapter demonstrates how to robustly select key metabolites and evaluate their predictive properties. The proposed approach constrains the number of selected metabolites, searching for an optimal number of predictive metabolites by cross-validation. This mitigates the problem of selection of spurious metabolites. It also enables straightforward use of linear regression. In the present implementation, simple forward selection is used. In concert with a second cross-validation to assess the predictive power of the selected set of metabolites, the proposed method involves two leave-one-out cross-validations and will be referred to as LOO2CV. In the second leave-one-out cross-validation, a multitude of regression models is generated. This offers additional information that is potentially useful for selection of key metabolites in the spirit of stability selection. The proposed LOO2CV approach is illustrated with sensory and large-scale metabolomics data from a set of 76 different cocoa liquors. The proposed approach is compared with conventional stepwise regression and stepwise regression in concert with cross-validation for evaluation of predictive power of the model.
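
The double leave-one-out scheme described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the chapter's implementation: the function names (`forward_path`, `choose_k`, `loo2cv`), the cap `max_k` on the number of predictors, the use of the residual sum of squares as the forward-selection criterion, and the use of summed squared prediction error to choose the model size are all assumptions made for this example.

```python
import numpy as np


def fit_ols(X, y):
    """Least-squares coefficients for y ~ intercept + X."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_ols(coef, X):
    """Predictions from coefficients returned by fit_ols."""
    return np.column_stack([np.ones(X.shape[0]), X]) @ coef


def forward_path(X, y, max_k):
    """Greedy forward selection: repeatedly add the column that most reduces the RSS."""
    chosen = []
    for _ in range(min(max_k, X.shape[1])):
        best_j, best_rss = None, np.inf
        for j in range(X.shape[1]):
            if j in chosen:
                continue
            cols = chosen + [j]
            resid = y - predict_ols(fit_ols(X[:, cols], y), X[:, cols])
            rss = float(resid @ resid)
            if rss < best_rss:
                best_j, best_rss = j, rss
        chosen.append(best_j)
    return chosen  # chosen[:k] is the size-k model


def choose_k(X, y, max_k):
    """Inner leave-one-out CV: model size with the lowest summed squared prediction error."""
    n = len(y)
    press = np.zeros(max_k)
    for i in range(n):
        keep = np.arange(n) != i
        path = forward_path(X[keep], y[keep], max_k)
        for k in range(1, max_k + 1):
            cols = path[:k]
            coef = fit_ols(X[keep][:, cols], y[keep])
            pred = predict_ols(coef, X[i:i + 1, cols])[0]
            press[k - 1] += (y[i] - pred) ** 2
    return int(np.argmin(press)) + 1


def loo2cv(X, y, max_k=3):
    """Outer leave-one-out CV around the inner selection of the model size.

    Returns the held-out predictions, how often each column was selected
    across the outer folds, and the cross-validated Q2.
    """
    n, p = X.shape
    preds = np.zeros(n)
    counts = np.zeros(p, dtype=int)
    for i in range(n):
        keep = np.arange(n) != i
        X_train, y_train = X[keep], y[keep]
        k = choose_k(X_train, y_train, max_k)
        cols = forward_path(X_train, y_train, k)
        coef = fit_ols(X_train[:, cols], y_train)
        preds[i] = predict_ols(coef, X[i:i + 1, cols])[0]
        counts[cols] += 1  # selection frequency per predictor
    q2 = 1.0 - np.sum((y - preds) ** 2) / np.sum((y - y.mean()) ** 2)
    return preds, counts, q2
```

Calling `loo2cv(X, y, max_k=2)` on a samples-by-metabolites matrix `X` and a trait vector `y` yields one regression model per left-out sample. The `counts` vector summarises how often each metabolite entered those models, which is the kind of selection-frequency information the abstract relates to stability selection.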
Original language: English
Title of host publication: Statistical Analysis of Proteomics, Metabolomics, and Lipidomics Data Using Mass Spectrometry; Frontiers in Probability and the Statistical Sciences
Editors: S. Datta, B.J.A. Mertens
Publisher: Springer
Pages: 239-257
ISBN (Electronic): 9783319458090
ISBN (Print): 9783319458076
DOI: https://doi.org/10.1007/978-3-319-45809-0_13
Publication status: Published - 2017

Publication series

Name: Frontiers in Probability and the Statistical Sciences

Fingerprint

metabolomics
metabolites
chocolate liquor
biomarkers
foods

Cite this

Hageman, J. A., Engel, B., de Vos, C. H., Mumm, R., Hall, R. D., Jwanro, H., ... van Eeuwijk, F. A. (2017). Robust and confident predictor selection in metabolomics. In S. Datta, & B. J. A. Mertens (Eds.), Statistical Analysis of Proteomics, Metabolomics, and Lipidomics Data Using Mass Spectrometry; Frontiers in Probability and the Statistical Sciences (pp. 239-257). (Frontiers in Probability and the Statistical Sciences). Springer. https://doi.org/10.1007/978-3-319-45809-0_13
Hageman, J.A. ; Engel, B. ; de Vos, C.H. ; Mumm, R. ; Hall, R.D. ; Jwanro, H. ; Crouzillat, D. ; Spadone, C. ; van Eeuwijk, F.A. / Robust and confident predictor selection in metabolomics. Statistical Analysis of Proteomics, Metabolomics, and Lipidomics Data Using Mass Spectrometry; Frontiers in Probability and the Statistical Sciences. editor / S. Datta ; B.J.A. Mertens. Springer, 2017. pp. 239-257 (Frontiers in Probability and the Statistical Sciences).
@inbook{8d93ee8bf0994ebb9416e6022653d039,
title = "Robust and confident predictor selection in metabolomics",
abstract = "Metabolomics is a proven tool to obtain information about differences in foodstuffs and to select biochemical markers for sensory quality of food products. A valuable application of untargeted metabolomics is the selection of metabolites that are (highly) predictive for sensory or phenotypical traits for use as (bio)markers. This chapter demonstrates how to robustly select key metabolites and evaluate their predictive properties. The proposed approach constrains the number of selected metabolites, searching for an optimal number of predictive metabolites by cross-validation. This mitigates the problem of selection of spurious metabolites. It also enables straightforward use of linear regression. In the present implementation, simple forward selection is used. In concert with a second cross-validation to assess the predictive power of the selected set of metabolites, the proposed method involves two leave-one-out cross-validations and will be referred to as LOO2CV. In the second leave-one-out cross-validation, a multitude of regression models is generated. This offers additional information that is potentially useful for selection of key metabolites in the spirit of stability selection. The proposed LOO2CV approach is illustrated with sensory and large-scale metabolomics data from a set of 76 different cocoa liquors. The proposed approach is compared with conventional stepwise regression and stepwise regression in concert with cross-validation for evaluation of predictive power of the model.",
author = "J.A. Hageman and B. Engel and {de Vos}, C.H. and R. Mumm and R.D. Hall and H. Jwanro and D. Crouzillat and C. Spadone and {van Eeuwijk}, F.A.",
year = "2017",
doi = "10.1007/978-3-319-45809-0_13",
language = "English",
isbn = "9783319458076",
series = "Frontiers in Probability and the Statistical Sciences",
publisher = "Springer",
pages = "239--257",
editor = "S. Datta and B.J.A. Mertens",
booktitle = "Statistical Analysis of Proteomics, Metabolomics, and Lipidomics Data Using Mass Spectrometry; Frontiers in Probability and the Statistical Sciences",

}

Hageman, JA, Engel, B, de Vos, CH, Mumm, R, Hall, RD, Jwanro, H, Crouzillat, D, Spadone, C & van Eeuwijk, FA 2017, Robust and confident predictor selection in metabolomics. in S Datta & BJA Mertens (eds), Statistical Analysis of Proteomics, Metabolomics, and Lipidomics Data Using Mass Spectrometry; Frontiers in Probability and the Statistical Sciences. Frontiers in Probability and the Statistical Sciences, Springer, pp. 239-257. https://doi.org/10.1007/978-3-319-45809-0_13

Robust and confident predictor selection in metabolomics. / Hageman, J.A.; Engel, B.; de Vos, C.H.; Mumm, R.; Hall, R.D.; Jwanro, H.; Crouzillat, D.; Spadone, C.; van Eeuwijk, F.A.

Statistical Analysis of Proteomics, Metabolomics, and Lipidomics Data Using Mass Spectrometry; Frontiers in Probability and the Statistical Sciences. ed. / S. Datta; B.J.A. Mertens. Springer, 2017. p. 239-257 (Frontiers in Probability and the Statistical Sciences).

TY  - CHAP
T1  - Robust and confident predictor selection in metabolomics
AU  - Hageman, J.A.
AU  - Engel, B.
AU  - de Vos, C.H.
AU  - Mumm, R.
AU  - Hall, R.D.
AU  - Jwanro, H.
AU  - Crouzillat, D.
AU  - Spadone, C.
AU  - van Eeuwijk, F.A.
PY  - 2017
Y1  - 2017
N2  - Metabolomics is a proven tool to obtain information about differences in foodstuffs and to select biochemical markers for sensory quality of food products. A valuable application of untargeted metabolomics is the selection of metabolites that are (highly) predictive for sensory or phenotypical traits for use as (bio)markers. This chapter demonstrates how to robustly select key metabolites and evaluate their predictive properties. The proposed approach constrains the number of selected metabolites, searching for an optimal number of predictive metabolites by cross-validation. This mitigates the problem of selection of spurious metabolites. It also enables straightforward use of linear regression. In the present implementation, simple forward selection is used. In concert with a second cross-validation to assess the predictive power of the selected set of metabolites, the proposed method involves two leave-one-out cross-validations and will be referred to as LOO2CV. In the second leave-one-out cross-validation, a multitude of regression models is generated. This offers additional information that is potentially useful for selection of key metabolites in the spirit of stability selection. The proposed LOO2CV approach is illustrated with sensory and large-scale metabolomics data from a set of 76 different cocoa liquors. The proposed approach is compared with conventional stepwise regression and stepwise regression in concert with cross-validation for evaluation of predictive power of the model.
U2  - 10.1007/978-3-319-45809-0_13
DO  - 10.1007/978-3-319-45809-0_13
M3  - Chapter
SN  - 9783319458076
T3  - Frontiers in Probability and the Statistical Sciences
SP  - 239
EP  - 257
BT  - Statistical Analysis of Proteomics, Metabolomics, and Lipidomics Data Using Mass Spectrometry; Frontiers in Probability and the Statistical Sciences
A2  - Datta, S.
A2  - Mertens, B.J.A.
PB  - Springer
ER  -

Hageman JA, Engel B, de Vos CH, Mumm R, Hall RD, Jwanro H et al. Robust and confident predictor selection in metabolomics. In Datta S, Mertens BJA, editors, Statistical Analysis of Proteomics, Metabolomics, and Lipidomics Data Using Mass Spectrometry; Frontiers in Probability and the Statistical Sciences. Springer. 2017. p. 239-257. (Frontiers in Probability and the Statistical Sciences). https://doi.org/10.1007/978-3-319-45809-0_13