On the increase of predictive performance with high-level data fusion

T.G. Doeswijk, A.K. Smilde, J.A. Hageman, J.A. Westerhuis, F.A. van Eeuwijk

Research output: Contribution to journal › Article › Academic › peer-review

60 Citations (Scopus)

Abstract

The combination of different data sources for classification purposes, also called data fusion, can be done at different levels: low-level, i.e. concatenating data matrices; medium-level, i.e. concatenating data matrices after feature selection; and high-level, i.e. combining model outputs. In this paper the predictive performance of high-level data fusion is investigated. Partial least squares is used on each of the data sets, with dummy variables representing the classes as response variables. Based on the estimated responses $\hat{y}_{jk}$ for data set j and class k, a Gaussian distribution $p(g_k \mid \hat{y}_{jk})$ is fitted. A simulation study is performed that shows the theoretical performance of high-level data fusion for two classes and two data sets. Within-group correlations of the predicted responses of the two models, and differences between the predictive ability of each of the separate models and the fused models, are studied. Results show that the error rate is always less than or equal to that of the best performing subset and can theoretically approach zero. Negative within-group correlations always improve the predictive performance. However, if the data sets have a joint basis, as with metabolomics data, this is not likely to happen. For equally performing individual classifiers the best results are expected for small within-group correlations. Fusion of a non-predictive classifier with a classifier that exhibits discriminative ability leads to increased predictive performance if the within-group correlations are strong. An example with real-life data shows the applicability of the simulation results.
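The two-class, two-data-set setting described in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper: the predicted responses are simulated directly as correlated Gaussians instead of coming from partial least squares models, the class priors are equal, and the fusion rule is a simple product of the per-model class densities. The function names and the parameter values (`rho`, `shift`, sample sizes, seeds) are hypothetical. The authors' actual fusion rule may differ.

```python
import math
import random
import statistics


def simulate_responses(n, rho, shift, seed=0):
    """Draw n predicted responses per class for two hypothetical models.

    Within each class the two responses have unit variance and
    within-group correlation rho; class 1 is shifted by `shift` in both.
    """
    rng = random.Random(seed)
    data = []
    for label in (0, 1):
        for _ in range(n):
            z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
            y1 = z1 + label * shift
            y2 = rho * z1 + math.sqrt(1 - rho ** 2) * z2 + label * shift
            data.append((y1, y2, label))
    return data


def fit_gaussians(data, j):
    """Fit one univariate Gaussian per class to the responses of model j."""
    params = {}
    for k in (0, 1):
        values = [row[j] for row in data if row[2] == k]
        params[k] = (statistics.mean(values), statistics.stdev(values))
    return params


def log_density(y, mean, sd):
    """Log of the univariate Gaussian density at y."""
    return -0.5 * math.log(2 * math.pi * sd ** 2) - (y - mean) ** 2 / (2 * sd ** 2)


def error_rate(data, models, columns):
    """Classify by summed log-densities of the chosen models (equal priors).

    Summing log-densities is the assumed product rule; it ignores the
    within-group correlation between the models.
    """
    errors = 0
    for row in data:
        scores = {
            k: sum(log_density(row[j], *models[j][k]) for j in columns)
            for k in (0, 1)
        }
        predicted = max(scores, key=scores.get)
        errors += predicted != row[2]
    return errors / len(data)


# Negative within-group correlation, equally performing individual models.
train = simulate_responses(n=2000, rho=-0.5, shift=1.5, seed=1)
test = simulate_responses(n=2000, rho=-0.5, shift=1.5, seed=2)
models = {j: fit_gaussians(train, j) for j in (0, 1)}

err_model_1 = error_rate(test, models, columns=[0])
err_model_2 = error_rate(test, models, columns=[1])
err_fused = error_rate(test, models, columns=[0, 1])
print(err_model_1, err_model_2, err_fused)
```

Under these assumptions the fused error rate should come out clearly below either individual error rate, in line with the abstract's statement about negative within-group correlations. Changing `rho` or giving the two models different shifts is a quick way to explore the other scenarios the abstract mentions, although this simplified product rule does not guarantee the paper's "never worse than the best subset" result in every case.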
Original language: English
Pages (from-to): 41-47
Journal: Analytica Chimica Acta
Volume: 705
Issue number: 1-2
DOIs
Publication status: Published - 2011

Keywords

  • Classification
  • Data fusion
  • Error rate
  • Metabolomics
