Better interpretable models after correcting for natural variation

Residual approaches examined

Mike Koeman, Jasper Engel, Jeroen Jansen*, Lutgarde Buydens

*Corresponding author for this work

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

The interpretation of estimates of model parameters in terms of biological information is often just as important as the predictions of the model itself. In this study we consider the identification of metabolites in a possibly biologically heterogeneous case group that show abnormal patterns with respect to a set of (healthy) control observations. For this purpose, we filter normal (baseline) natural variation from the data by projection of the data on a control sample model: the residual approach. This step should more easily highlight the abnormal metabolites. Interpretation is, however, hindered by a problem we named the ‘residual bias’ effect, which may lead to the identification of the wrong metabolites as ‘abnormal’. This effect is related to the smearing effect. We propose to alleviate residual bias by considering a weighted average of the filtered and raw data. This way, a compromise is found between excluding irrelevant natural variation from the data and the amount of residual bias that occurs. We show for simulated and real-world examples that this compromise may outperform inspection of the raw or filtered data. The method holds promise in numerous applications such as disease diagnoses, personalized healthcare, and industrial process control.
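The residual approach and the weighted compromise described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the control model is taken to be a PCA fitted on mean-centered control samples, the blend is assumed to be a simple linear combination controlled by a hypothetical parameter `weight`, and the function names and synthetic data are invented for the example.

```python
import numpy as np


def fit_control_pca(controls, n_components):
    """Fit a PCA model on control samples only (rows are samples)."""
    mean = controls.mean(axis=0)
    # Rows of vt are the principal axes of the centered control data.
    _, _, vt = np.linalg.svd(controls - mean, full_matrices=False)
    return mean, vt[:n_components].T  # loadings: (n_variables, n_components)


def blend_filtered_and_raw(cases, mean, loadings, weight):
    """Weighted average of filtered (residual) and raw centered case data.

    weight = 1.0 gives the pure residual approach, weight = 0.0 the raw
    centered data. The linear blend is an assumption made for illustration.
    """
    centered = cases - mean
    # Residuals: the part of each case not explained by the control model.
    residuals = centered - centered @ loadings @ loadings.T
    return weight * residuals + (1.0 - weight) * centered


# Synthetic data (hypothetical): natural variation along one direction
# shared by the first three of five variables, plus small noise.
rng = np.random.default_rng(0)
direction = np.array([1.0, 1.0, 1.0, 0.0, 0.0]) / np.sqrt(3.0)
scores = rng.normal(scale=3.0, size=(50, 1))
controls = scores @ direction[None, :] + rng.normal(scale=0.1, size=(50, 5))

# One case sample: natural variation plus an abnormal fourth variable.
case = 2.0 * direction + np.array([0.0, 0.0, 0.0, 1.5, 0.0])
cases = case[None, :]

mean, loadings = fit_control_pca(controls, n_components=1)
for weight in (0.0, 0.5, 1.0):
    print(weight, np.round(blend_filtered_and_raw(cases, mean, loadings, weight), 2))
```

Comparing the printed rows for different weights shows how much of the natural variation is removed from each variable and how much of the abnormal signal is retained; how the weight should be chosen in practice is not specified in the abstract.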
Original language: English
Pages (from-to): 142-148
Journal: Chemometrics and Intelligent Laboratory Systems
Volume: 174
DOI: 10.1016/j.chemolab.2018.01.007
Publication status: Published - 15 Mar 2018

Keywords

  • Disease diagnosis
  • Interpretation
  • Metabolomics
  • PCA
  • Residuals
  • Smearing

Cite this

@article{87d3773fec5c4709be5b809d7cb9f69d,
title = "Better interpretable models after correcting for natural variation: Residual approaches examined",
abstract = "The interpretation of estimates of model parameters in terms of biological information is often just as important as the predictions of the model itself. In this study we consider the identification of metabolites in a possibly biologically heterogeneous case group that show abnormal patterns with respect to a set of (healthy) control observations. For this purpose, we filter normal (baseline) natural variation from the data by projection of the data on a control sample model: the residual approach. This step should more easily highlight the abnormal metabolites. Interpretation is, however, hindered by a problem we named the ‘residual bias’ effect, which may lead to the identification of the wrong metabolites as ‘abnormal’. This effect is related to the smearing effect. We propose to alleviate residual bias by considering a weighted average of the filtered and raw data. This way, a compromise is found between excluding irrelevant natural variation from the data and the amount of residual bias that occurs. We show for simulated and real-world examples that this compromise may outperform inspection of the raw or filtered data. The method holds promise in numerous applications such as disease diagnoses, personalized healthcare, and industrial process control.",
keywords = "Disease diagnosis, Interpretation, Metabolomics, PCA, Residuals, Smearing",
author = "Mike Koeman and Jasper Engel and Jeroen Jansen and Lutgarde Buydens",
year = "2018",
month = "3",
day = "15",
doi = "10.1016/j.chemolab.2018.01.007",
language = "English",
volume = "174",
pages = "142--148",
journal = "Chemometrics and Intelligent Laboratory Systems",
issn = "0169-7439",
publisher = "Elsevier",
}

Better interpretable models after correcting for natural variation: Residual approaches examined. / Koeman, Mike; Engel, Jasper; Jansen, Jeroen; Buydens, Lutgarde.

In: Chemometrics and Intelligent Laboratory Systems, Vol. 174, 15.03.2018, p. 142-148.

TY - JOUR
T1 - Better interpretable models after correcting for natural variation
T2 - Residual approaches examined
AU - Koeman, Mike
AU - Engel, Jasper
AU - Jansen, Jeroen
AU - Buydens, Lutgarde
PY - 2018/3/15
Y1 - 2018/3/15
AB - The interpretation of estimates of model parameters in terms of biological information is often just as important as the predictions of the model itself. In this study we consider the identification of metabolites in a possibly biologically heterogeneous case group that show abnormal patterns with respect to a set of (healthy) control observations. For this purpose, we filter normal (baseline) natural variation from the data by projection of the data on a control sample model: the residual approach. This step should more easily highlight the abnormal metabolites. Interpretation is, however, hindered by a problem we named the ‘residual bias’ effect, which may lead to the identification of the wrong metabolites as ‘abnormal’. This effect is related to the smearing effect. We propose to alleviate residual bias by considering a weighted average of the filtered and raw data. This way, a compromise is found between excluding irrelevant natural variation from the data and the amount of residual bias that occurs. We show for simulated and real-world examples that this compromise may outperform inspection of the raw or filtered data. The method holds promise in numerous applications such as disease diagnoses, personalized healthcare, and industrial process control.

KW - Disease diagnosis
KW - Interpretation
KW - Metabolomics
KW - PCA
KW - Residuals
KW - Smearing
U2 - 10.1016/j.chemolab.2018.01.007
DO - 10.1016/j.chemolab.2018.01.007
M3 - Article
VL - 174
SP - 142
EP - 148
JO - Chemometrics and Intelligent Laboratory Systems
JF - Chemometrics and Intelligent Laboratory Systems
SN - 0169-7439
ER -