Abstract
In this era of high-throughput genomics, the number of available genome sequences and genomic data sets for many organisms is growing rapidly. Many species are now represented by multiple genomes rather than a single reference genome. Traditional comparative genomics approaches are constrained by their reliance on a single reference genome, which cannot represent the full genomic diversity of a species or population. As a result, there is an ongoing shift towards pangenomic approaches, which promise to capture precisely the sequence and variation described in large collections of related genomes. However, the lack of practical pangenome applications still prevents us from understanding the full genetic variation within a species or population. PanTools is a recently developed toolkit for pangenome construction, graph annotation, and homology detection. In this thesis, I develop and employ novel applications that build on PanTools' annotated pangenome graph representation, focusing mainly on the analysis of plant and bacterial pathogen genomes.
In Chapter 1, I introduce the field of comparative genomics and explain the essential principles for comparing the genetic blueprints of different organisms. I describe the limitations of traditional methods, highlighting how reference genome bias and computational constraints are driving a transition to pangenomic approaches. A brief historical overview of pangenome studies elucidates the significance of this transition and introduces novel concepts used in pangenomics. The most prevalent pangenome representations are then described in more detail, along with their advantages and shortcomings. Finally, I outline several critical challenges that must be overcome to advance the recently emerged field of pangenomics.
Chapter 2 lays the groundwork for comparative genomics in PanTools by introducing a set of functionalities designed to handle large collections of bacterial genomes. These include quality assurance to validate genome quality and annotations before the data are analyzed, characterization of a pangenome's gene content (core, accessory, unique), estimation of pangenome openness, and phylogenomic methods to reconstruct evolutionary relationships across bacterial species. The hierarchical graph representation was extended to incorporate phenotype/metadata and functional annotations. The functionalities are demonstrated in a comparative study of nearly 200 Pectobacterium genomes. We describe the pangenome's large accessory genome, whose genetic repertoire far surpasses that of a single bacterial genome. The gene-based pangenome representation is further used as a framework for identifying genetic differences between blackleg-causing and non-causing strains.
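The gene-content categories and the openness estimate mentioned above can be illustrated with a minimal Python sketch. It assumes a hypothetical input that maps each genome to the set of homology-group IDs it contains: core groups occur in every genome, unique groups in exactly one, and accessory groups in the rest. Openness is estimated here with the commonly used Heaps' law fit, n(N) = k * N^(-alpha), applied to the average number of new groups contributed by the N-th genome over random genome orderings. This is one standard formulation, not necessarily the exact procedure implemented in PanTools, and all function names are illustrative.

```python
import math
import random
from statistics import mean


def classify_groups(presence):
    """Split homology groups into core, accessory, and unique sets.

    presence: dict mapping genome name -> set of homology-group IDs
    (hypothetical input format).
    """
    n_genomes = len(presence)
    if n_genomes < 2:
        raise ValueError("need at least two genomes")
    counts = {}
    for groups in presence.values():
        for group in groups:
            counts[group] = counts.get(group, 0) + 1
    core = {g for g, c in counts.items() if c == n_genomes}   # in every genome
    unique = {g for g, c in counts.items() if c == 1}         # in one genome only
    accessory = set(counts) - core - unique                   # everything else
    return core, accessory, unique


def new_groups_per_genome(presence, n_permutations=100, seed=0):
    """Average number of previously unseen groups added by the N-th genome,
    over random genome orderings (list index 0 corresponds to N = 1)."""
    rng = random.Random(seed)
    genomes = list(presence)
    totals = [0.0] * len(genomes)
    for _ in range(n_permutations):
        rng.shuffle(genomes)
        seen = set()
        for i, genome in enumerate(genomes):
            totals[i] += len(presence[genome] - seen)
            seen |= presence[genome]
    return [t / n_permutations for t in totals]


def heaps_alpha(new_counts):
    """Fit n(N) = k * N**(-alpha) by least squares in log-log space.

    N = 1 is skipped (every group is new there), as are zero counts.
    """
    xs, ys = [], []
    for n_genomes, n_new in enumerate(new_counts, start=1):
        if n_genomes >= 2 and n_new > 0:
            xs.append(math.log(n_genomes))
            ys.append(math.log(n_new))
    if len(xs) < 2:
        raise ValueError("need at least two usable points")
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return -slope
```

For a toy input such as `{"g1": {"A", "B", "C"}, "g2": {"A", "B", "D"}, "g3": {"A", "E"}}`, group A is core, B is accessory, and C, D, and E are unique. Under the usual Heaps' law convention, alpha < 1 points to an open pangenome (each added genome keeps contributing new groups) and alpha > 1 to a closed one.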
In Chapter 3, the methodology introduced in the previous chapter is updated to enable large-scale comparative genomics in pangenomes across all kingdoms of life. To obtain an accurate representation of gene families across genome collections with varying sizes, complexities, and evolutionary distances, we propose a method that adjusts the strictness of homology grouping based on validated single-copy orthologs. We demonstrate the generic applicability and scalability of the software through functional annotation, gene-level, and phylogenomic analyses of seven use cases from different taxonomic kingdoms, ranging from viruses to human and plant genomes.
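The idea of tuning grouping strictness against single-copy orthologs can be sketched as a scoring loop over candidate settings. In this hypothetical version, each candidate setting is a mapping from gene ID to homology-group ID, and each benchmark ortholog (for example, a BUSCO gene found once per genome) is a set of genes that should end up in a single group. The per-ortholog F1 score, which combines how many of its genes share a group (recall) with how much of that group they make up (precision), is an illustrative criterion chosen for this sketch, not the published PanTools scoring.

```python
from collections import Counter


def score_grouping(gene_to_group, ortholog_sets):
    """Mean F1 over benchmark single-copy orthologs (illustrative criterion).

    gene_to_group: dict gene ID -> homology-group ID for one candidate setting.
    ortholog_sets: list of sets, each holding the genes of one single-copy
    ortholog (one gene per genome) that should end up in a single group.
    """
    if not ortholog_sets:
        raise ValueError("need at least one benchmark ortholog")
    group_sizes = Counter(gene_to_group.values())
    scores = []
    for genes in ortholog_sets:
        hits_per_group = Counter(
            gene_to_group[g] for g in genes if g in gene_to_group
        )
        if not hits_per_group:
            scores.append(0.0)
            continue
        best_group, hits = hits_per_group.most_common(1)[0]
        recall = hits / len(genes)                  # orthologs kept together
        precision = hits / group_sizes[best_group]  # group not diluted by other genes
        scores.append(2 * recall * precision / (recall + precision))
    return sum(scores) / len(scores)


def pick_setting(groupings, ortholog_sets):
    """Return the name of the candidate setting with the highest score.

    groupings: dict setting name -> gene_to_group mapping (hypothetical names).
    """
    return max(
        groupings, key=lambda name: score_grouping(groupings[name], ortholog_sets)
    )
```

Overly strict settings split orthologs across groups (low recall), while overly relaxed settings merge unrelated genes into one group (low precision); the F1 score penalizes both, so an intermediate setting tends to win.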
Recent breakthroughs in sequencing and assembly technologies have delivered the first high-quality haplotype-resolved (phased) genome assemblies. Chapter 4 focuses on harnessing the variation in these novel assemblies, which was previously hidden in haploid references. PanTools' methodology is updated, and novel functionalities are introduced specifically to enable the analysis of intragenomic variation, complementing existing intergenomic analyses. A relational synteny layer is incorporated into the hierarchical pangenome graph, linking collinear gene loci across all sequences. The functionalities are applied to pangenomes of diploid apple and tetraploid potato. We demonstrate high heterozygosity among homeologous chromosomes, characterized by gene presence/absence variation. Additionally, the analyses shed light on the evolutionary history of apple and potato species by revealing structural chromosomal rearrangements that occurred after whole-genome duplication events.
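Linking collinear gene loci can be illustrated with a deliberately simplified collinearity search. This sketch assumes each chromosome or haplotype is given as an ordered list of homology-group IDs, treats position pairs that share a group as anchors, and reports runs of consecutive anchors in either the same or the inverted orientation. Real synteny tools tolerate gaps, duplications, and score thresholds; this gapless version and its `min_anchors` parameter are assumptions for illustration only.

```python
def collinear_blocks(seq_a, seq_b, min_anchors=3):
    """Find gapless collinear blocks between two ordered gene lists.

    seq_a, seq_b: lists of homology-group IDs in chromosome order.
    Returns a list of blocks; each block is a list of (i, j) anchor positions,
    where seq_a[i] and seq_b[j] belong to the same homology group.
    """
    positions_b = {}
    for j, group in enumerate(seq_b):
        positions_b.setdefault(group, []).append(j)
    anchors = {
        (i, j) for i, group in enumerate(seq_a) for j in positions_b.get(group, [])
    }

    blocks, used = [], set()
    for i, j in sorted(anchors):  # sorted by i, so a run is entered at its start
        if (i, j) in used:
            continue
        for step in (1, -1):  # +1: same orientation, -1: inverted orientation
            block = [(i, j)]
            while (block[-1][0] + 1, block[-1][1] + step) in anchors:
                block.append((block[-1][0] + 1, block[-1][1] + step))
            if len(block) >= min_anchors:
                blocks.append(block)
                used.update(block)
                break
    return blocks
```

For `seq_a = ["g1", "g2", "g3", "g4", "g5", "g6"]` and `seq_b = ["g6", "g5", "g4", "g1", "g2", "g3"]`, the sketch reports one block in the same orientation (g1 to g3) and one inverted block (g4 to g6); an inverted block of this kind is the sort of signal that could indicate a structural rearrangement between two sequences.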
Chapter 5 presents various enhancements to the PanTools toolkit required for integrating pangenomics into plant breeding applications. We address a major bottleneck in the pangenome construction algorithm to improve scalability to larger numbers of genomes. Improved construction runtime is demonstrated on various datasets, the largest being a genus-level Solanum pangenome comprising a combined total of 29 (haplotype-resolved) tomato and potato genomes. We further introduce functionalities for detecting gene presence/absence variation in WGS samples and integrating it into the pangenome graph, showcasing a gene-based pangenomic analysis of 474 Lactuca resequencing lines. Finally, we introduce a visual analysis browser for exploring sequence variants in the context of 20 tomato reference genomes with metadata and 444 resequenced accessions.
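Detecting gene presence/absence in resequenced samples can be sketched from read-mapping coverage. This hypothetical version assumes per-base read depths for each pangenome gene in one sample, calls a gene present when the fraction of its bases covered by at least `min_depth` reads reaches `min_breadth`, and marks a homology group present if any of its member genes is. Both thresholds and the input layout are illustrative assumptions, not PanTools parameters.

```python
def breadth_of_coverage(depths, min_depth=1):
    """Fraction of a gene's bases covered by at least min_depth reads."""
    if not depths:
        return 0.0
    return sum(d >= min_depth for d in depths) / len(depths)


def call_group_pav(gene_depths, gene_to_group, min_breadth=0.5, min_depth=1):
    """Call homology-group presence (True) or absence (False) for one sample.

    gene_depths: dict gene ID -> list of per-base read depths (hypothetical input).
    gene_to_group: dict gene ID -> homology-group ID.
    A group is present if at least one member gene passes the breadth threshold.
    """
    pav = {}
    for gene, depths in gene_depths.items():
        present = breadth_of_coverage(depths, min_depth) >= min_breadth
        group = gene_to_group[gene]
        pav[group] = pav.get(group, False) or present
    return pav
```

Aggregating to homology groups rather than individual genes lets calls from many samples be compared on the same gene-family axis, which is what a gene-based pangenome graph provides.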
Chapter 6 concludes this thesis. I explain how the field of pangenomics has evolved over the course of this thesis, detailing its current representations and applications and how my work contributes to the field. Recognizing pangenomics as an emerging field that requires continuous development, I discuss the key challenges it faces. Finally, I explore promising future directions for pangenomics research, with a particular focus on applications in bacteria and plants.
| Original language | English |
|---|---|
| Qualification | Doctor of Philosophy |
| Awarding Institution | |
| Supervisors/Advisors | de Ridder, D. (Promotor); Smit, S. (Co-promotor); van der Lee, T. (Co-promotor) |
| Award date | 17 Sept 2024 |
| Place of Publication | Wageningen |
| Publisher | |
| Print ISBNs | 9789465101262 |
| DOIs | |
| Publication status | Published - 17 Sept 2024 |
Fingerprint
Dive into the research topics of 'Beyond the reference: pangenomic applications for plants and pathogens'. Together they form a unique fingerprint.
Projects
- Pangenomic applications for plants and pathogens
  Jonkheer, E. (PhD candidate), de Ridder, D. (Promotor), Smit, S. (Co-promotor) & van der Lee, T. (Co-promotor)
  15/09/18 → 17/09/24
  Project: PhD (finished)