Accurate genotype imputation in multiparental populations from low-coverage sequence

Chaozhi Zheng*, Martin P. Boer, Fred A. van Eeuwijk

*Corresponding author for this work

Research output: Contribution to journal › Article › Academic › peer-review

4 Citations (Scopus)

Abstract

Many different types of multiparental populations have recently been produced to increase genetic diversity and resolution in QTL mapping. Low-coverage, genotyping-by-sequencing (GBS) technology has become a cost-effective tool in these populations, despite large amounts of missing data in offspring and founders. In this work, we present a general statistical framework for genotype imputation in such experimental crosses from low-coverage GBS data. Generalizing a previously developed hidden Markov model for calculating ancestral origins of offspring DNA, we present an imputation algorithm that does not require parental data and that is applicable to bi- and multiparental populations. Our imputation algorithm allows heterozygosity of parents and offspring as well as error correction in observed genotypes. Further, our approach can combine imputation and genotype calling from sequencing reads, and it also applies to called genotypes from SNP array data. We evaluate our imputation algorithm by simulated and real data sets in four different types of populations: the F2, the advanced intercross recombinant inbred lines, the multiparent advanced generation intercross, and the cross-pollinated population. Because our approach uses marker data and population design information efficiently, the comparisons with previous approaches show that our imputation is accurate at even very low (< 1 ×) sequencing depth, in addition to having accurate genotype phasing and error detection.

Original language: English
Pages (from-to): 71-82
Number of pages: 12
Journal: Genetics
Volume: 210
Issue number: 1
DOIs: 10.1534/genetics.118.300885
Publication status: Published - 1 Sep 2018

Keywords

  • Cross-pollinated (CP) population
  • Genotype imputation
  • Genotyping by sequencing
  • Hidden Markov model
  • MPP
  • Multiparent advanced generation inter-cross (MAGIC)
  • Multiparental populations

Cite this

@article{3e39d6f23c5b4721baf833ee00d53b14,
title = "Accurate genotype imputation in multiparental populations from low-coverage sequence",
abstract = "Many different types of multiparental populations have recently been produced to increase genetic diversity and resolution in QTL mapping. Low-coverage, genotyping-by-sequencing (GBS) technology has become a cost-effective tool in these populations, despite large amounts of missing data in offspring and founders. In this work, we present a general statistical framework for genotype imputation in such experimental crosses from low-coverage GBS data. Generalizing a previously developed hidden Markov model for calculating ancestral origins of offspring DNA, we present an imputation algorithm that does not require parental data and that is applicable to bi- and multiparental populations. Our imputation algorithm allows heterozygosity of parents and offspring as well as error correction in observed genotypes. Further, our approach can combine imputation and genotype calling from sequencing reads, and it also applies to called genotypes from SNP array data. We evaluate our imputation algorithm by simulated and real data sets in four different types of populations: the F2, the advanced intercross recombinant inbred lines, the multiparent advanced generation intercross, and the cross-pollinated population. Because our approach uses marker data and population design information efficiently, the comparisons with previous approaches show that our imputation is accurate at even very low (< 1 ×) sequencing depth, in addition to having accurate genotype phasing and error detection.",
keywords = "Cross-pollinated (CP) population, Genotype imputation, Genotyping by sequencing, Hidden Markov model, MPP, Multiparent advanced generation inter-cross (MAGIC), Multiparental populations",
author = "Zheng, Chaozhi and Boer, {Martin P.} and {van Eeuwijk}, {Fred A.}",
year = "2018",
month = "9",
day = "1",
doi = "10.1534/genetics.118.300885",
language = "English",
volume = "210",
pages = "71--82",
journal = "Genetics",
issn = "0016-6731",
publisher = "Genetics Society of America",
number = "1"
}

Accurate genotype imputation in multiparental populations from low-coverage sequence. / Zheng, Chaozhi; Boer, Martin P.; van Eeuwijk, Fred A.

In: Genetics, Vol. 210, No. 1, 01.09.2018, p. 71-82.

TY - JOUR

T1 - Accurate genotype imputation in multiparental populations from low-coverage sequence

AU - Zheng, Chaozhi

AU - Boer, Martin P.

AU - van Eeuwijk, Fred A.

PY - 2018/9/1

Y1 - 2018/9/1

N2 - Many different types of multiparental populations have recently been produced to increase genetic diversity and resolution in QTL mapping. Low-coverage, genotyping-by-sequencing (GBS) technology has become a cost-effective tool in these populations, despite large amounts of missing data in offspring and founders. In this work, we present a general statistical framework for genotype imputation in such experimental crosses from low-coverage GBS data. Generalizing a previously developed hidden Markov model for calculating ancestral origins of offspring DNA, we present an imputation algorithm that does not require parental data and that is applicable to bi- and multiparental populations. Our imputation algorithm allows heterozygosity of parents and offspring as well as error correction in observed genotypes. Further, our approach can combine imputation and genotype calling from sequencing reads, and it also applies to called genotypes from SNP array data. We evaluate our imputation algorithm by simulated and real data sets in four different types of populations: the F2, the advanced intercross recombinant inbred lines, the multiparent advanced generation intercross, and the cross-pollinated population. Because our approach uses marker data and population design information efficiently, the comparisons with previous approaches show that our imputation is accurate at even very low (< 1 ×) sequencing depth, in addition to having accurate genotype phasing and error detection.

KW - Cross-pollinated (CP) population

KW - Genotype imputation

KW - Genotyping by sequencing

KW - Hidden Markov model

KW - MPP

KW - Multiparent advanced generation inter-cross (MAGIC)

KW - Multiparental populations

UR - https://doi.org/10.25386/genetics.6854933

U2 - 10.1534/genetics.118.300885

DO - 10.1534/genetics.118.300885

M3 - Article

VL - 210

SP - 71

EP - 82

JO - Genetics

JF - Genetics

SN - 0016-6731

IS - 1

ER -