Computational analysis of copy number variation in plant genomes

Research output: Thesis, internal PhD, WU

Abstract

Adaptation is the process through which organisms acquire traits that increase fitness relative to organisms that did not develop these traits. There is a great need to study adaptation in plants, particularly because it helps to develop novel strategies for improving crop cultivars through breeding. Such strategies are necessary to feed a growing world population in the face of increasingly challenging environmental conditions due to climate change.

Given an environment, an organism’s fitness is determined in large part by its genome: the full DNA sequence of an individual, through which traits are passed on to offspring. Variation in genomic sequence (“genetic variation”) between organisms can result in variation in heritable traits and subsequently fitness, facilitating adaptation. There is increasing evidence that copy number variation (CNV), a particular type of genetic variation comprising deletions, insertions, and duplications of DNA sequences of at least 50 base pairs, plays a major role in this process in plants. The rise of whole-genome sequencing (WGS) platforms that allow the reconstruction of genomic DNA sequences has enabled us to detect and study CNV at large scale. This thesis presents the development and application of computational methods that leverage WGS data to assess the abundance of CNV in plant species and its contribution to environmental adaptation.

Computational algorithms that detect CNV from WGS data have trouble dealing with some of the typical complexities of plant data, such as high genomic divergence between natural or out-bred individuals, as they have been mainly designed with human data in mind. This necessitates the development of novel detection methods specifically tailored to plants. To this end, I developed Hecaton, a computational workflow that integrates four CNV detection tools using a machine-learning approach, trained on plant data. It has increased sensitivity and precision compared to a range of state-of-the-art CNV detection methods when applied to short-read sequencing data of the plants Arabidopsis thaliana, rice, maize, and tomato. Hecaton thus provides a robust method to detect CNV in plant genomes.

I leveraged Hecaton to jointly study the contribution of CNV and other types of genetic variation to plant adaptation at different evolutionary timescales and in different environments. Previous work has indicated that CNV plays a large role in short-term adaptation of plants to environmental stress, with one study providing evidence that such stress promotes the formation of CNV. However, it remains unclear whether the latter is a general phenomenon, as I found very few CNVs in A. thaliana plants grown for five generations under high salinity and zinc stress. Yet, adaptation through CNV may still have occurred, as one tray of plants grown under zinc stress showed strong evidence of adaptation through an unknown genetic mutation, which is currently being pinpointed through follow-up experiments.

While the exact contribution of CNV to rapid stress adaptation of plants thus remains uncertain, I did find evidence that CNV contributes to environmental adaptation across larger evolutionary timescales. A diversity panel of natural A. thaliana accessions sampled in the Netherlands contains abundant CNV, and genes that overlap with CNVs found at moderate frequency in the panel are enriched for those involved in disease resistance. This suggests that CNV of such genes was mediated by differences in abundance of pathogens of A. thaliana across the range in which the accessions were sampled.

In addition to mediating variation in adaptive traits within the same plant species, CNV may also mediate variation in such traits between different plant species. This thesis presents an example in Hirschfeldia incana, a plant with an exceptional rate of photosynthesis at high levels of irradiance. Several genes involved in photosynthesis and/or response to high light intensities are duplicated relative to its close relative A. thaliana and have increased levels of expression. These duplications may therefore contribute to the high photosynthetic efficiency of H. incana.

In conclusion, this thesis aids the detection of CNV in plant genomes and helps elucidate its role in plant adaptation. While there are still a number of challenges regarding the computational detection and interpretation of CNV, I foresee that ongoing technological and computational developments will be able to address these. This will help to further illuminate the link between CNV and plant adaptation. 

Original language: English
Qualification: Doctor of Philosophy
Awarding Institution
  • Wageningen University
Supervisors/Advisors
  • de Ridder, Dick, Promotor
  • Smit, Sandra, Co-promotor
Award date: 13 Dec 2021
Place of Publication: Wageningen
Publisher
Print ISBNs: 9789463959834
DOIs
Publication status: Published - 13 Dec 2021
