Simplivariate Models: Ideas and First Examples

J.A. Hageman, M.M.W.B. Hendriks, J.A. Westerhuis, M.J. van der Werf, R. Berger, A.K. Smilde

Research output: Contribution to journal › Article › Academic › peer-review

16 Citations (Scopus)

Abstract

One of the new expanding areas in functional genomics is metabolomics: measuring the metabolome of an organism. Data being generated in metabolomics studies are very diverse in nature depending on the design underlying the experiment. Traditionally, variation in measurements is conceptually broken down in systematic variation and noise where the latter contains, e.g. technical variation. There is increasing evidence that this distinction does not hold (or is too simple) for metabolomics data. A more useful distinction is in terms of informative and non-informative variation where informative relates to the problem being studied. In most common methods for analyzing metabolomics (or any other high-dimensional x-omics) data this distinction is ignored thereby severely hampering the results of the analysis. This leads to poorly interpretable models and may even obscure the relevant biological information. We developed a framework from first data analysis principles by explicitly formulating the problem of analyzing metabolomics data in terms of informative and non-informative parts. This framework allows for flexible interactions with the biologists involved in formulating prior knowledge of underlying structures. The basic idea is that the informative parts of the complex metabolomics data are approximated by simple components with a biological meaning, e.g. in terms of metabolic pathways or their regulation. Hence, we termed the framework ‘simplivariate models’ which constitutes a new way of looking at metabolomics data. The framework is given in its full generality and exemplified with two methods, IDR analysis and plaid modeling, that fit into the framework. Using this strategy of ‘divide and conquer’, we show that meaningful simplivariate models can be obtained using a real-life microbial metabolomics data set. For instance, one of the simple components contained all the measured intermediates of the Krebs cycle of E. coli. 
Moreover, these simplivariate models were able to uncover regulatory mechanisms present in the phenylalanine biosynthesis route of E. coli.
Original language: English
Article number: e3259
Number of pages: 11
Journal: PLoS ONE
Volume: 3
Issue number: 9
DOIs: https://doi.org/10.1371/journal.pone.0003259
Publication status: Published - 2008

Fingerprint

Metabolomics
Escherichia coli
Metabolome
Citric Acid Cycle
tricarboxylic acid cycle
Biosynthesis
Genomics
Metabolic Networks and Pathways
Phenylalanine
biologists
biochemical pathways
Noise
data analysis
experimental design
organisms

Cite this

Hageman, J. A., Hendriks, M. M. W. B., Westerhuis, J. A., van der Werf, M. J., Berger, R., & Smilde, A. K. (2008). Simplivariate Models: Ideas and First Examples. PLoS ONE, 3(9), [e3259]. https://doi.org/10.1371/journal.pone.0003259
Hageman, J.A.; Hendriks, M.M.W.B.; Westerhuis, J.A.; van der Werf, M.J.; Berger, R.; Smilde, A.K. / Simplivariate Models: Ideas and First Examples. In: PLoS ONE. 2008; Vol. 3, No. 9.
@article{c11dc8389bbd45f6963fee3fb614d6b5,
title = "Simplivariate Models: Ideas and First Examples",
abstract = "One of the new expanding areas in functional genomics is metabolomics: measuring the metabolome of an organism. Data being generated in metabolomics studies are very diverse in nature depending on the design underlying the experiment. Traditionally, variation in measurements is conceptually broken down in systematic variation and noise where the latter contains, e.g. technical variation. There is increasing evidence that this distinction does not hold (or is too simple) for metabolomics data. A more useful distinction is in terms of informative and non-informative variation where informative relates to the problem being studied. In most common methods for analyzing metabolomics (or any other high-dimensional x-omics) data this distinction is ignored thereby severely hampering the results of the analysis. This leads to poorly interpretable models and may even obscure the relevant biological information. We developed a framework from first data analysis principles by explicitly formulating the problem of analyzing metabolomics data in terms of informative and non-informative parts. This framework allows for flexible interactions with the biologists involved in formulating prior knowledge of underlying structures. The basic idea is that the informative parts of the complex metabolomics data are approximated by simple components with a biological meaning, e.g. in terms of metabolic pathways or their regulation. Hence, we termed the framework ‘simplivariate models’ which constitutes a new way of looking at metabolomics data. The framework is given in its full generality and exemplified with two methods, IDR analysis and plaid modeling, that fit into the framework. Using this strategy of ‘divide and conquer’, we show that meaningful simplivariate models can be obtained using a real-life microbial metabolomics data set. For instance, one of the simple components contained all the measured intermediates of the Krebs cycle of E. coli. 
Moreover, these simplivariate models were able to uncover regulatory mechanisms present in the phenylalanine biosynthesis route of E. coli.",
author = "J.A. Hageman and M.M.W.B. Hendriks and J.A. Westerhuis and {van der Werf}, M.J. and R. Berger and A.K. Smilde",
year = "2008",
doi = "10.1371/journal.pone.0003259",
language = "English",
volume = "3",
journal = "PLoS ONE",
issn = "1932-6203",
publisher = "Public Library of Science",
number = "9",
pages = "e3259",
}

Hageman, JA, Hendriks, MMWB, Westerhuis, JA, van der Werf, MJ, Berger, R & Smilde, AK 2008, 'Simplivariate Models: Ideas and First Examples', PLoS ONE, vol. 3, no. 9, e3259. https://doi.org/10.1371/journal.pone.0003259

Simplivariate Models: Ideas and First Examples. / Hageman, J.A.; Hendriks, M.M.W.B.; Westerhuis, J.A.; van der Werf, M.J.; Berger, R.; Smilde, A.K.

In: PLoS ONE, Vol. 3, No. 9, e3259, 2008.

TY - JOUR
T1 - Simplivariate Models: Ideas and First Examples
AU - Hageman, J.A.
AU - Hendriks, M.M.W.B.
AU - Westerhuis, J.A.
AU - van der Werf, M.J.
AU - Berger, R.
AU - Smilde, A.K.
PY - 2008
Y1 - 2008
AB - One of the new expanding areas in functional genomics is metabolomics: measuring the metabolome of an organism. Data being generated in metabolomics studies are very diverse in nature depending on the design underlying the experiment. Traditionally, variation in measurements is conceptually broken down in systematic variation and noise where the latter contains, e.g. technical variation. There is increasing evidence that this distinction does not hold (or is too simple) for metabolomics data. A more useful distinction is in terms of informative and non-informative variation where informative relates to the problem being studied. In most common methods for analyzing metabolomics (or any other high-dimensional x-omics) data this distinction is ignored thereby severely hampering the results of the analysis. This leads to poorly interpretable models and may even obscure the relevant biological information. We developed a framework from first data analysis principles by explicitly formulating the problem of analyzing metabolomics data in terms of informative and non-informative parts. This framework allows for flexible interactions with the biologists involved in formulating prior knowledge of underlying structures. The basic idea is that the informative parts of the complex metabolomics data are approximated by simple components with a biological meaning, e.g. in terms of metabolic pathways or their regulation. Hence, we termed the framework ‘simplivariate models’ which constitutes a new way of looking at metabolomics data. The framework is given in its full generality and exemplified with two methods, IDR analysis and plaid modeling, that fit into the framework. Using this strategy of ‘divide and conquer’, we show that meaningful simplivariate models can be obtained using a real-life microbial metabolomics data set. For instance, one of the simple components contained all the measured intermediates of the Krebs cycle of E. coli. 
Moreover, these simplivariate models were able to uncover regulatory mechanisms present in the phenylalanine biosynthesis route of E. coli.
U2 - 10.1371/journal.pone.0003259
DO - 10.1371/journal.pone.0003259
M3 - Article
VL - 3
JO - PLoS ONE
JF - PLoS ONE
SN - 1932-6203
IS - 9
M1 - e3259
ER -

Hageman JA, Hendriks MMWB, Westerhuis JA, van der Werf MJ, Berger R, Smilde AK. Simplivariate Models: Ideas and First Examples. PLoS ONE. 2008;3(9):e3259. https://doi.org/10.1371/journal.pone.0003259