Abstract
An exploratory analysis is usually one of the first steps in analyzing high-dimensional functional genomics data, and cluster analysis and principal component analysis are typically the methods of choice. Despite their versatility, these methods have a severe drawback: they do not always produce simple, interpretable solutions. Based on the observation that functional genomics data often contain both informative and non-informative variation, we propose a method that finds sets of variables containing informative variation. This informative variation is then expressed in easily interpretable simplivariate components.
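The abstract gives no equations. As a hedged illustration of the idea only, a data matrix $\mathbf{X}$ with $n$ samples and $p$ variables could be written as a sum of components that each involve only one set $S_k$ of variables, plus non-informative variation $e_{ij}$. The symbols $K$, $S_k$ and $m_k$ are introduced here for illustration and are not taken from the paper:

$$
x_{ij} = \sum_{k=1}^{K} \mathbb{1}[\, j \in S_k \,]\, m_k(i, j) + e_{ij},
\qquad i = 1,\dots,n,\quad j = 1,\dots,p
$$

Under this reading, finding "sets of variables containing informative variation" means choosing the sets $S_k$ so that the components $m_k$ capture structure and $e_{ij}$ is left with the non-informative remainder.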
We present a new implementation of the recently introduced simplivariate models, in which the informative variation is described by multiplicative models that can adequately represent the relations between variables in functional genomics data. The method performs well on one simulated and two real-life metabolomics data sets.
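The sketch below is a minimal toy illustration, assuming that a "multiplicative model" on a set of variables means a rank-one product $x_{ij} \approx a_i b_j$ with no offset term. The function names (`rank_one_fit`, `score_variable_set`), the uncentered explained-variation score and the exhaustive search over variable pairs are hypothetical choices made for illustration; they are not the authors' implementation or search procedure, and the data are invented.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

Matrix = List[List[float]]  # rows = samples, columns = variables


def submatrix(x: Matrix, cols: Sequence[int]) -> Matrix:
    """Keep only the selected variables (columns) for every sample."""
    return [[row[j] for j in cols] for row in x]


def rank_one_fit(x: Matrix, iters: int = 200) -> Tuple[List[float], List[float]]:
    """Alternating least squares for the toy multiplicative model x_ij ~ a_i * b_j."""
    n, p = len(x), len(x[0])
    a, b = [0.0] * n, [1.0] * p
    for _ in range(iters):
        bb = sum(v * v for v in b)
        a = [sum(x[i][j] * b[j] for j in range(p)) / bb for i in range(n)]
        aa = sum(v * v for v in a)
        if aa == 0.0:  # no signal along the current direction; stop early
            break
        b = [sum(x[i][j] * a[i] for i in range(n)) / aa for j in range(p)]
    return a, b


def explained_fraction(x: Matrix, a: List[float], b: List[float]) -> float:
    """1 - SS(residual) / SS(data), uncentered because the toy model has no offset."""
    ss_total = sum(v * v for row in x for v in row)
    ss_resid = sum((x[i][j] - a[i] * b[j]) ** 2
                   for i in range(len(x)) for j in range(len(x[0])))
    return 1.0 - ss_resid / ss_total if ss_total > 0 else 0.0


def score_variable_set(x: Matrix, cols: Sequence[int]) -> float:
    """Fraction of the subset's variation captured by one multiplicative component."""
    sub = submatrix(x, cols)
    a, b = rank_one_fit(sub)
    return explained_fraction(sub, a, b)


# Invented toy data: variables 0 and 1 follow x_ij = a_i * b_j exactly,
# variables 2 and 3 do not.
X = [
    [2.0, 0.5, 3.0, -2.0],
    [4.0, 1.0, -1.0, 1.0],
    [6.0, 1.5, 0.0, 4.0],
    [8.0, 2.0, 2.0, -3.0],
]

# Exhaustive search over pairs of variables; only feasible for tiny examples.
best = max(combinations(range(4), 2), key=lambda cols: score_variable_set(X, cols))
print(best, round(score_variable_set(X, best), 4))  # expected: (0, 1) 1.0
```

Any single variable is fitted perfectly by this toy model, which is why the search only compares sets of equal size; a realistic criterion would need to account for set size and would need a far more efficient search than enumeration.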
Field | Value |
---|---|
Original language | English |
Article number | e20747 |
Journal | PLoS ONE |
Volume | 6 |
Issue number | 6 |
DOIs | |
Publication status | Published - 2011 |
Keywords
- metabolomics data
- multiple-regression
- genetic algorithms
- escherichia-coli
- microarray data
- decomposition
- number
- indole