Imputation of missing single nucleotide polymorphism genotypes using multivariate mixed model framework

Research output: Contribution to journal, Article, Academic, peer-review

20 Citations (Scopus)


The objective of this paper was to investigate, for various scenarios at low and high marker density, the accuracy of imputing genotypes with a multivariate mixed model framework that uses information from 2, 4, or 10 surrounding markers. This model predicts genotypes at a locus by treating genotypes at nearby loci as correlated traits and by using the additive genetic relationship matrix to draw on information from genotyped relatives. For 2 scenarios this method was compared with the population-based imputation algorithms FastPHASE and Beagle. Accuracies of imputation were obtained with Monte Carlo simulation and predicted with selection index theory, using input from the simulated data. Five different scenarios of missing genotypes were considered: 1) genotypes of some loci are missing due to genotyping errors, 2) juvenile selection candidates are genotyped using a smaller SNP panel, 3) some animals in the pedigree of a breeding population are not genotyped, 4) juvenile selection candidates are not genotyped, and 5) 1 generation of animals at the top of the pedigree is not genotyped. Surrounding marker information did not improve accuracy of imputation when animals whose genotypes were imputed were not genotyped for those surrounding markers. When those animals were genotyped for surrounding markers, results indicated a limited gain when linkage disequilibrium (LD) between SNP was low, but a substantial increase in accuracy when LD between SNP was high. For scenario 1, using 1 vs. 11 SNP, accuracy was, respectively, 0.75 and 0.81 at low density, and 0.75 and 0.93 at high density. For scenario 2, using 1 vs. 11 SNP, accuracy was, respectively, 0.70 and 0.73 at low density, and 0.71 and 0.84 at high density. Beagle outperformed the other methods at high SNP density, whereas the multivariate mixed model was clearly superior when SNP density was low and animals were genotyped with a reduced SNP panel.
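To make the role of the additive genetic relationship matrix concrete, the following is a minimal sketch of the simpler univariate case, in which the gene content (allele count 0, 1, or 2) of ungenotyped animals is predicted from genotyped relatives. It is an illustration under stated assumptions, not the authors' implementation: the function name `impute_gene_content`, the variance ratio `lam`, and the three-animal pedigree are all hypothetical, and the multivariate model described above additionally treats surrounding markers as correlated traits.

```python
import numpy as np

def impute_gene_content(A, genotyped_idx, genotypes, lam=0.01):
    """Predict gene content for every animal from genotyped relatives.

    A             : additive genetic relationship matrix (n x n)
    genotyped_idx : indices of animals with an observed genotype
    genotypes     : observed allele counts (0, 1, or 2) for those animals
    lam           : assumed residual-to-genetic variance ratio (hypothetical;
                    small because gene content is almost fully heritable)
    """
    A = np.asarray(A, dtype=float)
    g = np.asarray(genotyped_idx)
    y = np.asarray(genotypes, dtype=float)

    # Covariance among observed gene contents, in units of the genetic variance.
    V = A[np.ix_(g, g)] + lam * np.eye(len(g))
    ones = np.ones(len(g))

    # Generalized least squares estimate of the mean gene content.
    mu = (ones @ np.linalg.solve(V, y)) / (ones @ np.linalg.solve(V, ones))

    # BLUP for all animals: mean plus regression on genotyped relatives.
    return mu + A[:, g] @ np.linalg.solve(V, y - mu)

# Hypothetical pedigree: sire (0), dam (1), and their ungenotyped offspring (2).
A = np.array([[1.0, 0.0, 0.5],
              [0.0, 1.0, 0.5],
              [0.5, 0.5, 1.0]])
predicted = impute_gene_content(A, genotyped_idx=[0, 1], genotypes=[2, 0])
```

In this toy pedigree the offspring of one homozygous parent of each type is predicted to carry about one copy of the allele, which is the expected heterozygous genotype.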
The results showed that extending the univariate gene content method to a multivariate BLUP model with inclusion of surrounding marker information yields greater imputation accuracy only when the animals with imputed loci are genotyped for at least some SNP that are in LD with the SNP to be imputed. The equation derived from selection index theory accurately predicted the accuracy of imputation using the multivariate mixed model framework.
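The abstract does not reproduce the prediction equation, so the sketch below shows only the generic selection index form of accuracy, r = sqrt(c' P^-1 c / var(target)), where P holds the covariances among the information sources (here, surrounding marker genotypes) and c their covariances with the target genotype. The helper names `index_accuracy` and `equicorrelated_markers`, and the LD values used, are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def index_accuracy(P, c, var_target=1.0):
    """Accuracy of a selection index: sqrt(c' P^-1 c / var_target)."""
    P = np.asarray(P, dtype=float)
    c = np.asarray(c, dtype=float)
    b = np.linalg.solve(P, c)          # index weights b = P^-1 c
    return float(np.sqrt(c @ b / var_target))

def equicorrelated_markers(k, r_target, r_between):
    """Hypothetical LD pattern on a standardized scale: every surrounding
    marker has correlation r_target with the target SNP and r_between
    with every other surrounding marker."""
    P = (1.0 - r_between) * np.eye(k) + r_between * np.ones((k, k))
    c = np.full(k, r_target)
    return P, c

# Illustrative comparison of 2, 4, and 10 surrounding markers.
accuracies = {k: index_accuracy(*equicorrelated_markers(k, 0.3, 0.2))
              for k in (2, 4, 10)}
```

Under these assumed correlations, accuracy rises as markers are added, and the gain per marker shrinks the more strongly the markers are correlated with one another.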
Original language: English
Pages (from-to): 2042-2049
Journal: Journal of Animal Science
Issue number: 7
Publication status: Published - 2011


  • breeding value estimation
  • genomic selection
  • gene content
  • prediction
  • accuracy
  • populations
  • animals
  • cattle
  • phase

