TriPoly: haplotype estimation for polyploids using sequencing data of related individuals

Ehsan Motazedi, Dick de Ridder, Richard Finkers, Samantha Baldwin, Susan Thomson, Katrina Monaghan, Chris Maliepaard

Research output: Contribution to journal › Article › Academic › peer-review

2 Citations (Scopus)

Abstract

Motivation: Knowledge of haplotypes, i.e. phased and ordered marker alleles on a chromosome, is essential to answer many questions in genetics and genomics. By generating short pieces of DNA sequence, high-throughput modern sequencing technologies make estimation of haplotypes possible for single individuals. In polyploids, however, haplotype estimation methods usually require deep coverage to achieve sufficient accuracy. This often renders sequencing-based approaches too costly to be applied to large populations needed in studies of Quantitative Trait Loci.

Results: We propose a novel haplotype estimation method for polyploids, TriPoly, that combines sequencing data with Mendelian inheritance rules to infer haplotypes in parent-offspring trios. Using realistic simulations of both short and long-read sequencing data for banana (Musa acuminata) and potato (Solanum tuberosum) trios, we show that TriPoly yields more accurate progeny haplotypes at low coverages compared to existing methods that work on single individuals. We also apply TriPoly to phase Single Nucleotide Polymorphisms on chromosome 5 for a family of tetraploid potato with 2 parents and 37 offspring sequenced with an RNA capture approach. We show that TriPoly haplotype estimates differ from those of the other methods mainly in regions with imperfect sequencing or mapping difficulties, as it does not rely solely on sequence reads and aims to avoid phasings that are not likely to have been passed from the parents to the offspring.

Availability and implementation: TriPoly has been implemented in Python 3.5.2 (also compatible with Python 2.7.3 and higher) and can be freely downloaded at https://github.com/EhsanMotazedi/TriPoly.

Supplementary information: Supplementary data are available at Bioinformatics online.
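To illustrate the kind of Mendelian constraint the abstract refers to, the following is a minimal sketch and not TriPoly's actual algorithm or API. It assumes haplotypes are given as strings of 0/1 alleles, that each parent has even ploidy and passes on half of its homologues intact (no recombination, no double reduction), and it simply tests whether a candidate offspring phasing could have been inherited from the two parents. The function names `gametes` and `is_mendelian` and the example data are hypothetical.

```python
from itertools import combinations


def gametes(parent_haplotypes):
    """Return the distinct gametes a parent can transmit.

    Assumption: a gamete is any choice of ploidy/2 intact homologues.
    """
    k = len(parent_haplotypes) // 2
    return {tuple(sorted(c)) for c in combinations(parent_haplotypes, k)}


def is_mendelian(mother, father, child):
    """True if the child's haplotypes are one maternal plus one paternal gamete."""
    target = sorted(child)
    return any(
        sorted(gm + gf) == target
        for gm in gametes(mother)
        for gf in gametes(father)
    )


# Hypothetical tetraploid trio phased over four SNPs.
mother = ["0000", "0011", "0101", "1111"]
father = ["0000", "0000", "1100", "1010"]

# Two homologues from each parent: compatible with inheritance.
child_ok = ["0011", "1111", "0000", "1100"]
# "0110" occurs in neither parent: incompatible under these assumptions.
child_bad = ["0110", "0000", "0000", "1100"]

print(is_mendelian(mother, father, child_ok))
print(is_mendelian(mother, father, child_bad))
```

A check like this only accepts or rejects a phasing outright. According to the abstract, TriPoly combines the inheritance rules with the sequencing data and aims to avoid unlikely phasings, which is a softer use of the constraint than the hard filter sketched here.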

Original language: English
Pages (from-to): 3864-3872
Journal: Bioinformatics
Volume: 34
Issue number: 22
DOIs: 10.1093/bioinformatics/bty442
Publication status: Published - 15 Nov 2018

Fingerprint

Polyploidy
Haplotype
Haplotypes
Sequencing
Chromosomes
Solanum tuberosum
Boidae
DNA sequences
Musa
Bioinformatics
Python
Nucleotides
Polymorphism
Potato
RNA
Chromosome
Throughput
Availability
Coverage

Cite this

@article{a6846080df8c4abd9d5add80054b798e,
title = "TriPoly: haplotype estimation for polyploids using sequencing data of related individuals",
abstract = "Motivation: Knowledge of haplotypes, i.e. phased and ordered marker alleles on a chromosome, is essential to answer many questions in genetics and genomics. By generating short pieces of DNA sequence, high-throughput modern sequencing technologies make estimation of haplotypes possible for single individuals. In polyploids, however, haplotype estimation methods usually require deep coverage to achieve sufficient accuracy. This often renders sequencing-based approaches too costly to be applied to large populations needed in studies of Quantitative Trait Loci. Results: We propose a novel haplotype estimation method for polyploids, TriPoly, that combines sequencing data with Mendelian inheritance rules to infer haplotypes in parent-offspring trios. Using realistic simulations of both short and long-read sequencing data for banana (Musa acuminata) and potato (Solanum tuberosum) trios, we show that TriPoly yields more accurate progeny haplotypes at low coverages compared to existing methods that work on single individuals. We also apply TriPoly to phase Single Nucleotide Polymorphisms on chromosome 5 for a family of tetraploid potato with 2 parents and 37 offspring sequenced with an RNA capture approach. We show that TriPoly haplotype estimates differ from those of the other methods mainly in regions with imperfect sequencing or mapping difficulties, as it does not rely solely on sequence reads and aims to avoid phasings that are not likely to have been passed from the parents to the offspring. Availability and implementation: TriPoly has been implemented in Python 3.5.2 (also compatible with Python 2.7.3 and higher) and can be freely downloaded at https://github.com/EhsanMotazedi/TriPoly. Supplementary information: Supplementary data are available at Bioinformatics online.",
author = "Ehsan Motazedi and {de Ridder}, Dick and Richard Finkers and Samantha Baldwin and Susan Thomson and Katrina Monaghan and Chris Maliepaard",
year = "2018",
month = "11",
day = "15",
doi = "10.1093/bioinformatics/bty442",
language = "English",
volume = "34",
pages = "3864--3872",
journal = "Bioinformatics",
issn = "1367-4803",
publisher = "Oxford University Press",
number = "22",

}

TriPoly: haplotype estimation for polyploids using sequencing data of related individuals. / Motazedi, Ehsan; de Ridder, Dick; Finkers, Richard; Baldwin, Samantha; Thomson, Susan; Monaghan, Katrina; Maliepaard, Chris.

In: Bioinformatics, Vol. 34, No. 22, 15.11.2018, p. 3864-3872.

TY - JOUR

T1 - TriPoly

T2 - haplotype estimation for polyploids using sequencing data of related individuals

AU - Motazedi, Ehsan

AU - de Ridder, Dick

AU - Finkers, Richard

AU - Baldwin, Samantha

AU - Thomson, Susan

AU - Monaghan, Katrina

AU - Maliepaard, Chris

PY - 2018/11/15

Y1 - 2018/11/15

AB - Motivation: Knowledge of haplotypes, i.e. phased and ordered marker alleles on a chromosome, is essential to answer many questions in genetics and genomics. By generating short pieces of DNA sequence, high-throughput modern sequencing technologies make estimation of haplotypes possible for single individuals. In polyploids, however, haplotype estimation methods usually require deep coverage to achieve sufficient accuracy. This often renders sequencing-based approaches too costly to be applied to large populations needed in studies of Quantitative Trait Loci. Results: We propose a novel haplotype estimation method for polyploids, TriPoly, that combines sequencing data with Mendelian inheritance rules to infer haplotypes in parent-offspring trios. Using realistic simulations of both short and long-read sequencing data for banana (Musa acuminata) and potato (Solanum tuberosum) trios, we show that TriPoly yields more accurate progeny haplotypes at low coverages compared to existing methods that work on single individuals. We also apply TriPoly to phase Single Nucleotide Polymorphisms on chromosome 5 for a family of tetraploid potato with 2 parents and 37 offspring sequenced with an RNA capture approach. We show that TriPoly haplotype estimates differ from those of the other methods mainly in regions with imperfect sequencing or mapping difficulties, as it does not rely solely on sequence reads and aims to avoid phasings that are not likely to have been passed from the parents to the offspring. Availability and implementation: TriPoly has been implemented in Python 3.5.2 (also compatible with Python 2.7.3 and higher) and can be freely downloaded at https://github.com/EhsanMotazedi/TriPoly. Supplementary information: Supplementary data are available at Bioinformatics online.

U2 - 10.1093/bioinformatics/bty442

DO - 10.1093/bioinformatics/bty442

M3 - Article

VL - 34

SP - 3864

EP - 3872

JO - Bioinformatics

JF - Bioinformatics

SN - 1367-4803

IS - 22

ER -