TY - JOUR
T1 - All sparse PCA models are wrong, but some are useful. Part II
T2 - Limitations and problems of deflation
AU - Camacho, J.
AU - Smilde, A.K.
AU - Saccenti, E.
AU - Westerhuis, J.A.
AU - Bro, Rasmus
PY - 2021/01/15
Y1 - 2021/01/15
N2 - Sparse Principal Component Analysis (sPCA) is a popular matrix factorization approach based on Principal Component Analysis (PCA). It combines variance maximization and sparsity with the ultimate goal of improving data interpretation. A main application of sPCA is to handle high-dimensional data, for example biological omics data. In Part I of this series, we illustrated limitations of several state-of-the-art sPCA algorithms when modeling noise-free data, simulated following an exact sPCA model. In this Part II, we provide a thorough analysis of the limitations of sPCA methods that use deflation for calculating subsequent, higher-order components. We show, both theoretically and numerically, that deflation can lead to problems in model interpretation, even for noise-free data. In addition, we contribute diagnostics to identify modeling problems in real-data analysis.
AB - Sparse Principal Component Analysis (sPCA) is a popular matrix factorization approach based on Principal Component Analysis (PCA). It combines variance maximization and sparsity with the ultimate goal of improving data interpretation. A main application of sPCA is to handle high-dimensional data, for example biological omics data. In Part I of this series, we illustrated limitations of several state-of-the-art sPCA algorithms when modeling noise-free data, simulated following an exact sPCA model. In this Part II, we provide a thorough analysis of the limitations of sPCA methods that use deflation for calculating subsequent, higher-order components. We show, both theoretically and numerically, that deflation can lead to problems in model interpretation, even for noise-free data. In addition, we contribute diagnostics to identify modeling problems in real-data analysis.
KW - Artifacts
KW - Data interpretation
KW - Exploratory data analysis
KW - Model interpretation
KW - Sparse principal component analysis
KW - Sparsity
U2 - 10.1016/j.chemolab.2020.104212
DO - 10.1016/j.chemolab.2020.104212
M3 - Article
AN - SCOPUS:85098168203
VL - 208
JO - Chemometrics and Intelligent Laboratory Systems
JF - Chemometrics and Intelligent Laboratory Systems
SN - 0169-7439
M1 - 104212
ER -