Determining the number of components in principal components analysis: A comparison of statistical, crossvalidation and approximated methods

E. Saccenti*, J. Camacho*

*Corresponding author for this work

Research output: Contribution to journal › Article › Academic › peer-review

40 Citations (Scopus)

Abstract

Principal component analysis is one of the most commonly used multivariate tools to describe and summarize data. Determining the optimal number of components in a principal component model is a fundamental problem in many fields of application. In this paper, we compare the performance of several methods developed for this task in different areas of research. We consider statistical methods based on results from random matrix theory (the Tracy-Widom and Kritchman-Nadler testing procedures), cross-validation methods (namely, the well-characterized element-wise k-fold algorithm, ekf, and its corrected version, cekf) and methods based on numerical approximation (SACV and GCV). The performance of these methods is assessed on both simulated and real-life data sets. In both cases, the considered methods behave differently, and we propose theoretical explanations for this behavior.
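The cross-validation family mentioned in the abstract can be illustrated with a minimal sketch. The function below, `elementwise_kfold_press` (a hypothetical name), performs row-wise k-fold cross-validation with element-wise prediction: PCA loadings are fitted on the training rows, then each variable of each held-out row is predicted from the remaining variables of that row, and the squared prediction errors are accumulated into a PRESS value for each number of components. It assumes NumPy and least-squares estimation of the scores from the remaining variables; it illustrates the idea behind element-wise k-fold cross-validation and is not the exact ekf or cekf algorithm compared in the paper.

```python
import numpy as np


def elementwise_kfold_press(X, max_comp, n_folds=7, seed=0):
    """Illustrative sketch (not the paper's exact ekf/cekf).

    Returns an array where entry A-1 is the PRESS for A components.
    Assumption: scores of a held-out row are estimated by least squares
    from all variables except the one being predicted.
    """
    n, p = X.shape
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), n_folds)
    press = np.zeros(max_comp)
    for test_idx in folds:
        train = np.delete(X, test_idx, axis=0)
        mean = train.mean(axis=0)
        # Loadings from the SVD of the mean-centred training rows
        _, _, vt = np.linalg.svd(train - mean, full_matrices=False)
        test = X[test_idx] - mean  # centre test rows with training means
        for a in range(1, max_comp + 1):
            P = vt[:a].T  # p x a loading matrix
            for j in range(p):
                keep = np.arange(p) != j
                # Least-squares scores (a x m) from the other p-1 variables
                T, *_ = np.linalg.lstsq(P[keep], test[:, keep].T, rcond=None)
                pred = T.T @ P[j]  # predicted values of variable j
                press[a - 1] += np.sum((test[:, j] - pred) ** 2)
    return press


# Synthetic rank-2 data plus small noise (illustrative only)
rng = np.random.default_rng(1)
scores = rng.normal(size=(100, 2)) * [5.0, 3.0]
loadings = np.linalg.qr(rng.normal(size=(10, 2)))[0]
X = scores @ loadings.T + 0.1 * rng.normal(size=(100, 10))
press = elementwise_kfold_press(X, max_comp=4)
print(press)
```

In this sketch, the number of components would be chosen where PRESS is smallest. The ekf and cekf procedures studied in the paper may differ from this illustration in how the scores of the held-out elements are estimated.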

Original language: English
Pages (from-to): 99-116
Journal: Chemometrics and Intelligent Laboratory Systems
Volume: 149
Issue number: Part A
Publication status: Published, 2015

Keywords

  • Covariance matrix
  • Dimensionality assessment
  • Eigenanalysis
  • Random matrix theory
  • Tracy-Widom distribution
