Genomic selection entails the estimation of phenotypic traits of interest for plants without phenotype based on the association between single-nucleotide polymorphisms (SNPs) and phenotypic traits for plants with phenotype. Typically, the number of SNPs far exceeds the number of samples (high-dimensionality) and, therefore, usage of regularization methods is common. The most common approach to estimate marker-trait associations uses the genomic best linear unbiased predictor (GBLUP) method, where a mixed model is fitted to the data. GBLUP has also been alternatively parameterized as a ridge regression model (RRBLUP). GBLUP/RRBLUP is based on the assumption of independence between predictor variables. However, it is to be expected that variables will be associated due to their genetic proximity. Here, we propose a regularized linear model (namely psBLUP: proximity smoothed BLUP) that explicitly models the dependence between predictor effects. We show that psBLUP can improve accuracy compared to the standard methods on both Arabidopsis thaliana data and Barley data.
|Publication status||Published - May 2022|
- Genomic selection
- High-dimensional data
- Proximity smoothing