FitTetra 2.0 - Improved genotype calling for tetraploids with multiple population and parental data support

Konrad Zych, Gerrit Gort, Chris A. Maliepaard, Ritsert C. Jansen, Roeland E. Voorrips*

*Corresponding author for this work

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

Background: Genetic studies in tetraploids lag behind studies of diploids because the complex genetics of tetraploids require much more elaborate computational methodologies. Recent advances in molecular techniques and computational tools facilitate new methods for automated, high-throughput genotype calling in tetraploid species. We report on an upgrade of the widely used fitTetra software that aims to improve its accuracy, which to date has been hampered by technical artefacts in the data.

Results: Our upgrade of the fitTetra package is designed for more accurate modelling of complex collections of samples. The package fits a mixture model in which some parameters are estimated separately for each sub-collection. When a full-sib family is analyzed, we use parental genotypes to predict the expected segregation in terms of allele dosages in the offspring. More accurate modelling and the use of parental data increase the accuracy of dosage calling. We tested the package on data obtained with an Affymetrix Axiom 60 k array and compared its performance with the original version and the recently published ClusterCall tool, showing that at least 20% more SNPs could be called with our updated software.

Conclusion: Our updated software package shows clearly improved performance in genotype calling accuracy. Estimation of the mixing proportions of the underlying dosage distributions is separated for full-sib families (where the mixing proportions can be estimated from the parental dosages and the inheritance model) and unstructured populations (where they are based on the assumption of Hardy-Weinberg equilibrium). Additionally, because the distributions of signal ratios of the dosage classes can be assumed to be the same for all populations, including parental data for some subpopulations helps to improve the fit for other populations as well. The R package fitTetra 2.0 is freely available under the GNU Public License as an Additional file with this article.
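The two sources of mixing proportions named in the abstract can be illustrated with a small sketch. It is not the fitTetra code or API: the function names are hypothetical, and it assumes an autotetraploid with polysomic inheritance and no double reduction for the full-sib case, and a binomial dosage distribution under Hardy-Weinberg equilibrium for the unstructured case.

```python
from fractions import Fraction
from math import comb

PLOIDY = 4  # tetraploid: allele dosage ranges from 0 to 4


def gamete_dosage_probs(parent_dosage, ploidy=PLOIDY):
    """Dosage distribution in a gamete of ploidy/2 alleles.

    Assumption: alleles are sampled without replacement from the parent
    (polysomic inheritance, no double reduction), i.e. hypergeometric.
    """
    half = ploidy // 2
    total = comb(ploidy, half)
    return [
        Fraction(comb(parent_dosage, k) * comb(ploidy - parent_dosage, half - k), total)
        for k in range(half + 1)
    ]


def f1_segregation(dosage_p1, dosage_p2, ploidy=PLOIDY):
    """Expected offspring dosage proportions for a full-sib (F1) family."""
    g1 = gamete_dosage_probs(dosage_p1, ploidy)
    g2 = gamete_dosage_probs(dosage_p2, ploidy)
    probs = [Fraction(0)] * (ploidy + 1)
    # Offspring dosage is the sum of the two gamete dosages.
    for i, a in enumerate(g1):
        for j, b in enumerate(g2):
            probs[i + j] += a * b
    return probs


def hwe_dosage_probs(allele_freq, ploidy=PLOIDY):
    """Dosage proportions under Hardy-Weinberg equilibrium (binomial)."""
    return [
        comb(ploidy, d) * allele_freq**d * (1 - allele_freq) ** (ploidy - d)
        for d in range(ploidy + 1)
    ]


if __name__ == "__main__":
    # Duplex x duplex cross: offspring dosages 0..4 in ratio 1:8:18:8:1.
    print([str(p) for p in f1_segregation(2, 2)])
    # Unstructured population with allele frequency 1/4.
    print([str(p) for p in hwe_dosage_probs(Fraction(1, 4))])
```

In a mixture model of this kind, such expected proportions would serve as the class weights for the five dosage components, while the shape of each component's signal-ratio distribution is shared across populations, as the abstract describes.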

Original language: English
Article number: 148
Journal: BMC Bioinformatics
Volume: 20
Issue number: 1
DOI: 10.1186/s12859-019-2703-y
Publication status: Published - 20 Mar 2019

Keywords

  • Autotetraploids
  • fitPoly
  • Genomics
  • Genotype calling
  • Genotyping
  • Polyploids

Cite this

@article{ea58311ed5fc4365a28ba8d70692efaa,
title = "FitTetra 2.0 - Improved genotype calling for tetraploids with multiple population and parental data support",
keywords = "Autotetraploids, fitPoly, Genomics, Genotype calling, Genotyping, Polyploids",
author = "Zych, Konrad and Gort, Gerrit and Maliepaard, {Chris A.} and Jansen, {Ritsert C.} and Voorrips, {Roeland E.}",
year = "2019",
month = "3",
day = "20",
doi = "10.1186/s12859-019-2703-y",
language = "English",
volume = "20",
journal = "BMC Bioinformatics",
issn = "1471-2105",
publisher = "Springer Verlag",
number = "1",

}

FitTetra 2.0 - Improved genotype calling for tetraploids with multiple population and parental data support. / Zych, Konrad; Gort, Gerrit; Maliepaard, Chris A.; Jansen, Ritsert C.; Voorrips, Roeland E.

In: BMC Bioinformatics, Vol. 20, No. 1, 148, 20.03.2019.

TY - JOUR

T1 - FitTetra 2.0 - Improved genotype calling for tetraploids with multiple population and parental data support

AU - Zych, Konrad

AU - Gort, Gerrit

AU - Maliepaard, Chris A.

AU - Jansen, Ritsert C.

AU - Voorrips, Roeland E.

PY - 2019/3/20

Y1 - 2019/3/20

KW - Autotetraploids

KW - fitPoly

KW - Genomics

KW - Genotype calling

KW - Genotyping

KW - Polyploids

U2 - 10.1186/s12859-019-2703-y

DO - 10.1186/s12859-019-2703-y

M3 - Article

VL - 20

JO - BMC Bioinformatics

JF - BMC Bioinformatics

SN - 1471-2105

IS - 1

M1 - 148

ER -