High-quality, genome-wide SNP genotypic data for pedigreed germplasm of the diploid outbreeding species apple, peach, and sweet cherry through a common workflow

Stijn Vanderzande, Nicholas P. Howard, Lichun Cai, Cassia Da Silva Linge, Laima Antanaviciute, Marco C.A.M. Bink, Johannes W. Kruisselbrink, Nahla Bassil, Ksenija Gasic, Amy Iezzoni, Eric Van de Weg, Cameron Peace

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

High-quality genotypic data is a requirement for many genetic analyses. For any crop, errors in genotype calls, phasing of markers, linkage maps, pedigree records, and unnoticed variation in ploidy levels can lead to spurious marker-locus-trait associations and incorrect origin assignment of alleles to individuals. High-throughput genotyping requires automated scoring, as manual inspection of thousands of scored loci is too time-consuming. However, automated SNP scoring can produce errors that should be corrected to ensure that recorded genotypic data are accurate and thereby ensure confidence in downstream genetic analyses. To enable quick identification of errors in a large genotypic data set, we developed a comprehensive workflow. This multiple-step workflow is based on inheritance principles and on removal of markers and individuals that do not follow these principles, as demonstrated here for apple, peach, and sweet cherry. Genotypic data were obtained on pedigreed germplasm using 6–9K SNP arrays for each crop, and a subset of well-performing SNPs was selected using ASSIsT. Using correct (and corrected) pedigree records, violations of simple inheritance principles in the genotypic data were readily identified, a process streamlined with FlexQTL software. Retained SNPs were grouped into haploblocks to increase the information content of single alleles and reduce the computational power needed in downstream genetic analyses. Haploblock borders were defined by recombination locations detected in ancestral generations of cultivars and selections. A second round of inheritance checking was then conducted on haploblock alleles (i.e., haplotypes). High-quality genotypic data sets were created using this workflow for pedigreed collections representing the U.S. breeding germplasm of apple, peach, and sweet cherry evaluated within the RosBREED project. These data sets contain 3855, 4005, and 1617 SNPs spread over 932, 103, and 196 haploblocks in apple, peach, and sweet cherry, respectively. The highly curated phased SNP and haplotype data sets, as well as the raw iScan data, for germplasm in the apple, peach, and sweet cherry Crop Reference Sets are available through the Genome Database for Rosaceae.
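Two steps of the workflow summarized above lend themselves to a short illustration: checking SNP calls against Mendelian inheritance in parent-offspring trios, and grouping retained SNPs into haploblocks whose borders are recombination sites. The Python sketch below is not the FlexQTL or ASSIsT procedure used by the authors. It assumes biallelic SNPs coded as B-allele dosage (0, 1, 2, with `None` for a missing call), a pedigree given as a dictionary from offspring to parent names, recombination breakpoints already located on a single chromosome, and a hypothetical error-rate threshold for flagging. All function names, variable names, and the toy data are illustrative.

```python
from collections import Counter
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical coding: B-allele dosage 0 (AA), 1 (AB), 2 (BB); None marks a missing call.
Genotype = Optional[int]


def transmissible(dosage: Genotype) -> Set[int]:
    """Alleles (0 = A, 1 = B) a diploid parent can pass on; an unknown parent may pass either."""
    if dosage is None:
        return {0, 1}
    return {0: {0}, 1: {0, 1}, 2: {1}}[dosage]


def is_consistent(child: Genotype, parent1: Genotype, parent2: Genotype) -> bool:
    """True if the child's call can arise from one allele of each parent."""
    if child is None:  # a missing call cannot violate inheritance
        return True
    return any(a + b == child for a in transmissible(parent1) for b in transmissible(parent2))


def count_inconsistencies(
    genotypes: Dict[str, List[Genotype]],
    pedigree: Dict[str, Tuple[Optional[str], Optional[str]]],
) -> Tuple[Counter, Counter]:
    """Tally Mendelian inconsistencies per SNP index and per offspring.

    For simplicity each error is charged to the offspring; in real data it may equally
    stem from a parent's call or from an incorrect pedigree record.
    """
    per_snp: Counter = Counter()
    per_individual: Counter = Counter()
    for child, (p1, p2) in pedigree.items():
        for i, call in enumerate(genotypes[child]):
            # An unknown or ungenotyped parent imposes no constraint.
            g1 = genotypes[p1][i] if p1 in genotypes else None
            g2 = genotypes[p2][i] if p2 in genotypes else None
            if not is_consistent(call, g1, g2):
                per_snp[i] += 1
                per_individual[child] += 1
    return per_snp, per_individual


def flag(errors: Counter, n_checks: int, max_rate: float) -> list:
    """Keys (SNPs or individuals) whose error rate exceeds a hypothetical threshold."""
    return sorted(key for key, n in errors.items() if n / n_checks > max_rate)


def haploblocks(snp_positions: List[int], breakpoints: List[float]) -> List[List[int]]:
    """Group SNP indices on one chromosome into blocks bounded by recombination breakpoints."""
    blocks: List[List[int]] = []
    prev_pos = None
    for idx in sorted(range(len(snp_positions)), key=lambda i: snp_positions[i]):
        pos = snp_positions[idx]
        # Start a new block for the first SNP, or when a breakpoint separates it from the previous SNP.
        if prev_pos is None or any(prev_pos < bp < pos for bp in breakpoints):
            blocks.append([])
        blocks[-1].append(idx)
        prev_pos = pos
    return blocks


# Toy data (invented for illustration): two parents, two offspring, four SNPs.
genotypes = {
    "P1": [0, 1, 2, 0],
    "P2": [0, 1, 2, 2],
    "C1": [0, 2, 2, 1],  # consistent at every SNP
    "C2": [2, 0, 1, 1],  # SNP 0: AA x AA cannot give BB; SNP 2: BB x BB cannot give AB
}
pedigree = {"C1": ("P1", "P2"), "C2": ("P1", "P2")}

per_snp, per_individual = count_inconsistencies(genotypes, pedigree)
print(flag(per_snp, n_checks=len(pedigree), max_rate=0.25))  # [0, 2]
print(flag(per_individual, n_checks=4, max_rate=0.25))       # ['C2']
print(haploblocks([100, 250, 400, 900, 1200], [300, 1000]))  # [[0, 1], [2, 3], [4]]
```

Charging every inconsistency to the offspring keeps the sketch short; a fuller check would weigh evidence across relatives before deciding whether a SNP, an individual, or a pedigree link is at fault. The second pass on haplotype alleles described in the abstract would apply the same trio logic to haploblock alleles instead of single SNP dosages.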

Original language: English
Article number: e0210928
Journal: PLoS ONE
Volume: 14
Issue number: 6
DOI: https://doi.org/10.1371/journal.pone.0210928
Publication status: Published 27 Jul 2019

Cite this

Vanderzande, S., Howard, N. P., Cai, L., Da Silva Linge, C., Antanaviciute, L., Bink, M. C. A. M., ... Peace, C. (2019). High-quality, genome-wide SNP genotypic data for pedigreed germplasm of the diploid outbreeding species apple, peach, and sweet cherry through a common workflow. PLoS ONE, 14(6), e0210928. https://doi.org/10.1371/journal.pone.0210928
@article{f1e2b9aedf1d4118aece1d63de2fce33,
title = "High-quality, genome-wide SNP genotypic data for pedigreed germplasm of the diploid outbreeding species apple, peach, and sweet cherry through a common workflow",
author = "Stijn Vanderzande and Howard, {Nicholas P.} and Lichun Cai and {Da Silva Linge}, Cassia and Laima Antanaviciute and Bink, {Marco C.A.M.} and Kruisselbrink, {Johannes W.} and Nahla Bassil and Ksenija Gasic and Amy Iezzoni and {Van de Weg}, Eric and Cameron Peace",
year = "2019",
month = "7",
day = "27",
doi = "10.1371/journal.pone.0210928",
language = "English",
volume = "14",
journal = "PLoS ONE",
issn = "1932-6203",
publisher = "Public Library of Science",
number = "6",

}

Vanderzande, S, Howard, NP, Cai, L, Da Silva Linge, C, Antanaviciute, L, Bink, MCAM, Kruisselbrink, JW, Bassil, N, Gasic, K, Iezzoni, A, Van de Weg, E & Peace, C 2019, 'High-quality, genome-wide SNP genotypic data for pedigreed germplasm of the diploid outbreeding species apple, peach, and sweet cherry through a common workflow', PLoS ONE, vol. 14, no. 6, e0210928. https://doi.org/10.1371/journal.pone.0210928

TY - JOUR

T1 - High-quality, genome-wide SNP genotypic data for pedigreed germplasm of the diploid outbreeding species apple, peach, and sweet cherry through a common workflow

AU - Vanderzande, Stijn

AU - Howard, Nicholas P.

AU - Cai, Lichun

AU - Da Silva Linge, Cassia

AU - Antanaviciute, Laima

AU - Bink, Marco C.A.M.

AU - Kruisselbrink, Johannes W.

AU - Bassil, Nahla

AU - Gasic, Ksenija

AU - Iezzoni, Amy

AU - Van de Weg, Eric

AU - Peace, Cameron

PY - 2019/7/27

Y1 - 2019/7/27

U2 - 10.1371/journal.pone.0210928

DO - 10.1371/journal.pone.0210928

M3 - Article

VL - 14

JO - PLoS ONE

JF - PLoS ONE

SN - 1932-6203

IS - 6

M1 - e0210928

ER -