Large-scale in silico genome-wide association studies to unravel allelic variation and combinations for selected traits of agronomic importance in potato

Project: PhD

Project Details


a. Introduction (literature references in an appendix)

Potato (Solanum tuberosum L.) plays a critical role in global food and nutrition security and is the third most important staple crop after rice and wheat in terms of human consumption (Statistics of the Food and Agriculture Organization, FAOSTAT). Throughout the more than 150-year breeding history of potato since its introduction to Europe, diverse market segments such as fresh consumption, the starch industry and chip processing have gradually developed to meet the preferences of different consumers and industries. Despite its importance, potato breeding has long been hampered by tetrasomic inheritance and a highly heterozygous genome, resulting in slow breeding progress and limited genetic gain in yield and other complex traits. Several potato research groups are now proposing to re-domesticate potato from a tuber-propagated, tetraploid crop into an inbred-line-based diploid crop grown from true seeds (Lindhout et al., 2011; Stockem et al., 2020), and progress towards addressing self-incompatibility and inbreeding depression in potato has already been achieved (Ye et al., 2018; Zhang et al., 2019). It is possible, albeit challenging, that diploid potato will fully replace its tetraploid counterparts through intensive selection and breeding efforts.

Reference genomes can significantly accelerate both breeding and biological studies. The first reference genome of potato, DM1-3 516R44 (hereafter DM), was released in 2011 (Potato Genome Sequencing Consortium, 2011; Sharma et al., 2013), followed by the publication of de novo short-read assemblies of two wild potato relatives, Solanum commersonii and Solanum chacoense (Aversano et al., 2015; Leisner et al., 2018). Recently, the widely used DM reference genome has been updated to version 6.1 with chromosome-level scaffolds and a revised gene annotation (Pham et al., 2020).
A haplotype-resolved assembly of a diploid potato, RH89-039-16 (Zhou et al., 2020), and the reference genome assembly of Solyntus, a highly homozygous diploid potato (van Lieshout et al., 2020), were also reported. All of these were constructed using long-read sequencing technologies and therefore show significantly increased genome contiguity.

The availability of potato reference genomes permits large-scale, population-level characterization of genetic variation, which can provide markers to facilitate molecular breeding and functional studies and deepen our understanding of potato evolution and domestication. To this end, several SNP genotyping platforms, such as the Infinium 8303 Potato Array, a potato 20k SNP array and the SolCAP 12K SNP array, have been developed (Ellis et al., 2018; Felcher et al., 2012; Hamilton et al., 2011; Vos et al., 2015). Using these SNP array platforms, genomic variants within functionally important genes were identified (Manrique-Carpintero et al., 2013; Manrique-Carpintero et al., 2014), and evolutionary patterns and population structure among different diversity panels were characterized (Deperi et al., 2018; Ellis et al., 2018; Hardigan et al., 2015; Hirsch et al., 2013; Kolech et al., 2016; Stich et al., 2013). The rapid advance and declining cost of next-generation sequencing (NGS) technologies have provided unprecedented opportunities to access nearly the full spectrum of genetic diversity within a species. Genome-wide resequencing of potato accessions collected worldwide has revealed the evolutionary history, introgression patterns and loci selected during domestication, and has generated very large numbers of markers for genetic mapping and association studies (Hardigan et al., 2017; Li et al., 2018).
Genomic polymorphisms generated from SNP array platforms and whole-genome resequencing, once converted into allele frequencies, can be used directly for association mapping, typically genome-wide association (GWA) studies, in potato. GWAS has several advantages over conventional Quantitative Trait Locus (QTL) mapping: mapping resolution increases substantially and numerous traits can be analyzed simultaneously. However, the statistical models widely applied in current GWA studies hardly take into account rare genetic variants, which can also have a strong impact on phenotypic variance, and population stratification can strongly confound these models. Despite this, many GWA studies have been performed in diploid and tetraploid potato, and numerous signals associated with agronomically important traits such as tuber quality, tuber starch and protein content, and disease resistance have been identified, offering guidance for marker-assisted breeding and genomic selection (Berdugo-Cely et al., 2017; Juyo Rojas et al., 2019; Khlestkin et al., 2019; Klaassen et al., 2019; Kloosterman et al., 2013; Lindqvist-Kreuze et al., 2014; Prodhomme et al., 2020; Sharma et al., 2018; Uitdewilligen et al., 2013; van Eck et al., 2017; Vos et al., 2015).

b. Problem definition

Until diploid potato varieties are introduced whose phenotypic performance (production, tuber quality, disease resistance, etc.) is comparable with that of their tetraploid counterparts, existing tetraploid elite cultivars will continue to dominate the market, yet the genetic diversity among these tetraploid varieties has rarely been assessed.
Although genome-wide resequencing is now common in plant species, most studies aimed at identifying and characterizing genetic variants in potato still use SNP array platforms that capture variant information from only a small proportion of the genome, thereby overlooking other allelic variation, for example insertions and deletions (InDels) and copy number variations (CNVs), that might also contribute to phenotypic variance. Reported GWA studies based on these low-density array SNPs have usually failed to achieve sufficient resolution to narrow mapping intervals down to the causative variants for given agronomic traits. Moreover, existing association mapping work in potato has usually focused on specific sets of phenotypes within a single category of commercial performance, e.g., tuber quality, starch content or field production, and has rarely investigated multi-way correlations between one marker and two or more preferred traits belonging to distinct categories. Despite the identification of numerous signals and alleles significantly associated with important phenotypes, the allelic configurations needed to achieve a given agronomic trait remain largely unexplored.

c. Methodology

Genome-wide deep resequencing data (~40-fold coverage on average) of approximately 200–250 tetraploid commercial potato varieties, representing diverse market classes and spanning more than 150 years of breeding (~1850–present), have been generated. These data will allow comprehensive characterization of genetic variation (SNPs, InDels and CNVs) by applying allele dosage-aware variant calling pipelines. Haplotype reconstruction for each (selected) gene will be performed from the resequencing-based variants using a novel algorithm that is under development.
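To illustrate what allele dosage-aware genotype calling means in a tetraploid, the following is a minimal sketch and not the pipeline used in this project. It assumes that reads at a biallelic site are sampled binomially and that there is a fixed, symmetric sequencing error rate; the function names and the default error rate of 0.01 are hypothetical choices made only for this example.

```python
import math

PLOIDY = 4  # tetraploid potato: alternative-allele dosage ranges from 0 to 4


def dosage_posteriors(ref_reads, alt_reads, error_rate=0.01, ploidy=PLOIDY):
    """Posterior probability of each alt-allele dosage under a flat prior.

    Assumes binomial read sampling and a symmetric error rate
    (0 < error_rate < 0.5); both are simplifications for illustration.
    """
    log_liks = []
    for dosage in range(ploidy + 1):
        frac = dosage / ploidy
        # Expected fraction of alt reads, allowing for miscalled bases.
        p_alt = frac * (1 - error_rate) + (1 - frac) * error_rate
        log_liks.append(
            alt_reads * math.log(p_alt) + ref_reads * math.log(1 - p_alt)
        )
    # Normalise in log space to avoid underflow at high coverage.
    peak = max(log_liks)
    weights = [math.exp(ll - peak) for ll in log_liks]
    total = sum(weights)
    return [w / total for w in weights]


def call_dosage(ref_reads, alt_reads, **kwargs):
    """Return (most probable dosage, its posterior probability)."""
    post = dosage_posteriors(ref_reads, alt_reads, **kwargs)
    best = max(range(len(post)), key=post.__getitem__)
    return best, post[best]


if __name__ == "__main__":
    # 40 reads in total, a quarter of them supporting the alt allele.
    print(call_dosage(ref_reads=30, alt_reads=10))
```

At roughly 40-fold coverage the five dosage classes of a tetraploid are usually well separated under this model, which is one reason deep resequencing suits dosage calling; real pipelines additionally handle mapping bias, multi-allelic sites and population-level priors.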
In conjunction with a range of phenotypic data for the population of over 500 varieties, GWAS will be applied to each trait by passing the whole-genome variants identified in the tetraploid panel to polyploid-compatible association algorithms such as GWASpoly (Rosyara et al., 2016). Mapping resolution is expected to increase owing to the large association panel and the tens of millions of markers generated from whole-genome deep resequencing. GWAS signal sites (possibly lead SNPs) and the corresponding haplotype information for each phenotype will then be examined jointly to test for three-way correlations between the allelic variation and two distinct phenotypes (e.g., tuber starch content and underwater weight, or tuber weight and flavor-related metabolite content). Significantly associated signals identified by GWAS can be analyzed further in two populations of around 1000 offspring each (one diploid and one tetraploid), possibly for candidate gene isolation.

a. How is adequate supervision guaranteed?

Dr. Richard Finkers is an expert in genome sequencing and the genomics of polyploids. Dr. Christian Bachem is an expert in plant biotechnology and plant genomics. They will act as daily supervisors and co-promoters. Prof. Richard Visser has many years of experience in potato physiology, genetics and genomics, and he heads Plant Breeding, WUR. Prof. Sanwen Huang has many years of experience in genome sequencing, notably of potato. He heads the sequencing center of AGIS, CAAS in Shenzhen. The project will be dovetailed with another project carried out by PhD candidate Natascha van Lieshout, who studies the genomics of polyploid crops, notably potato and chrysanthemum. Technical assistance at the bioinformatics and genomics level will be provided if needed by Danny Esselink and Martijn van Kauwen, who are accomplished bioinformatics technicians in Plant Breeding. Prof. Richard Visser will conduct overall supervision (when the candidate is in NL), coordinate the research and liaise with the Chinese counterparts.

b. How is execution of the project guaranteed?

Adequate supervision in both NL and China is guaranteed on both the biological and the bioinformatics side. This project will benefit from interactions with various other projects, in potato biological research as well as in bioinformatics/genomics research, on which different PhD students and postdocs are active. All required sequence information and phenotypic data have been obtained and are available for this project, including but not limited to: sequence information for three de novo assembled potato genomes, six de novo assembled tetraploid potato genomes and their pan-genome; genomic and phenotypic data for around 500 resequenced potato varieties; and phenotypic and genomic data for at least two populations (one diploid and one tetraploid) and their offspring (around 1000 in each case). Where necessary, specific sequence data of potato genotypes will be generated.
Effective start/end date: 1/09/20 → …

