Biology is becoming a data-rich science, driven by the development of high-throughput technologies such as next-generation DNA sequencing. This is fundamentally changing biological research. The genome sequences of many species are becoming available, as are data on the genetic variation within a species and on the activity of the genes in a genome under various conditions. With the opportunities that these new technologies offer comes the challenge of dealing effectively with the large volumes of data that they produce. Bioinformaticians have an important role to play in organising and analysing these data to extract biological information and gain knowledge. Computers have also become essential tools for experimental biologists. This has created a strong need for software applications aimed at biological research. The chapters in this thesis detail my contributions to this area. Together with molecular biologists, plant breeders, immunologists, and microbiologists, I have developed several software tools and performed computational analyses to study biological questions.
Chapter 2 is about Primer3Plus, a web tool that helps biologists design DNA primers for their experiments. These primers are typically short stretches of DNA (~20 nucleotides) that direct the DNA replication machinery to copy a selected region of a DNA molecule. The specificity of a primer is determined by several chemical and physical properties, so designing good primers is best done with the help of a computer program. Primer3Plus offers a user-friendly, task-oriented web interface to the popular primer3 primer design program. Primer3Plus clearly fulfils a need in the biological research community, as over 400 scientific articles have already cited the Primer3Plus publication.
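To give a feel for the kind of properties a primer design program evaluates, the sketch below computes two simple ones: GC content and a rough melting temperature from the Wallace rule (2 °C per A/T, 4 °C per G/C). This is an illustration only and is not how primer3 or Primer3Plus calculates melting temperatures; the function names and the thresholds in `passes_basic_checks` are assumptions chosen for the example.

```python
def gc_content(primer: str) -> float:
    """Fraction of G and C bases in the primer."""
    primer = primer.upper()
    return (primer.count("G") + primer.count("C")) / len(primer)


def wallace_tm(primer: str) -> int:
    """Rough melting temperature in degrees Celsius (Wallace rule)."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


def passes_basic_checks(primer: str, min_len: int = 18, max_len: int = 25,
                        min_gc: float = 0.4, max_gc: float = 0.6) -> bool:
    """Hypothetical filter: thresholds are illustrative, not primer3 defaults."""
    return (min_len <= len(primer) <= max_len
            and min_gc <= gc_content(primer) <= max_gc)


candidate = "ATGCATGCATGCATGCATGC"  # made-up 20-nucleotide sequence
print(gc_content(candidate), wallace_tm(candidate), passes_basic_checks(candidate))
```

A real design tool weighs many more properties at once, such as self-complementarity and the compatibility of the two primers in a pair, which is why a dedicated program is preferable to manual checks like these.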
Single nucleotide differences or polymorphisms (SNPs) that are present within a species can be used as markers to link phenotypic observations to locations on the genome. Chapter 3 discusses QualitySNPng, which is a stand-alone software tool for finding SNPs in high-throughput sequencing data. QualitySNPng was inspired by the QualitySNP pipeline for SNP detection that was published in 2006 and it uses similar filtering criteria to distinguish SNPs from technical artefacts like sequence read errors. In addition, the SNPs are used to predict haplotypes. QualitySNPng has a graphical user interface that allows the user to run the SNP detection and evaluate the results. It has already been successfully used in several projects on marker detection for plant breeding.
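The idea of separating true SNPs from sequence read errors can be sketched with a simple per-column filter over aligned reads: an allele is only trusted if it is seen often enough. The code below is a minimal illustration under stated assumptions and does not reproduce the filtering criteria of QualitySNP or QualitySNPng; the function name `call_snps`, the thresholds, and the requirement that reads are pre-aligned strings of equal length are all assumptions.

```python
from collections import Counter


def call_snps(aligned_reads, min_minor_count=2, min_minor_fraction=0.2):
    """Return (position, {allele: count}) for columns with >= 2 supported alleles.

    Assumes all reads are already aligned and have the same length;
    any character other than A, C, G or T (e.g. '-') is ignored.
    """
    snps = []
    for pos in range(len(aligned_reads[0])):
        counts = Counter(r[pos] for r in aligned_reads if r[pos] in "ACGT")
        depth = sum(counts.values())
        if depth == 0:
            continue
        # Keep only alleles seen often enough to be unlikely read errors.
        supported = {allele: n for allele, n in counts.items()
                     if n >= min_minor_count and n / depth >= min_minor_fraction}
        if len(supported) >= 2:
            snps.append((pos, supported))
    return snps


reads = ["ACGTAC", "ACGTAC", "ACTTAC", "ACTTAC", "ACGTAA"]
print(call_snps(reads))
```

In this toy input the G/T difference at position 2 is supported by several reads and is reported, while the single A at the last position is treated as a probable read error and discarded.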
Single nucleotide polymorphisms can lead to single amino acid changes in protein sequences. These single amino acid polymorphisms (SAPs) play a key role in graft-versus-host (GVH) effects that often accompany tissue transplantations. A beneficial variant of GVH is the graft-versus-leukaemia (GVL) effect that is sometimes observed after bone marrow transplantation in leukaemia patients. When the GVL effect occurs, the donor’s immune cells actively destroy residual tumour cells in the patient. A single amino acid difference between the patient and the donor can be enough to elicit the GVL effect. Currently, a small number of SAPs that can elicit a GVL effect are known, and these are used to select the right bone marrow donor for a leukaemia patient. Together with researchers at the Leiden University Medical Center, I developed a database to aid in the discovery of more such SAPs. We called this database the “Human Short Peptide Variation database” or HSPVdb. It is described in chapter 4.
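As an illustration of what "short peptide variation" can mean in practice, the sketch below lists every short peptide that overlaps a single amino acid difference between two versions of a protein. It is a hypothetical example and does not describe how HSPVdb is built; the function name `variant_peptides`, the toy sequences, and the default peptide length of 9 residues are assumptions made for the example.

```python
def variant_peptides(reference: str, variant: str, length: int = 9):
    """Return (reference_peptide, variant_peptide) pairs spanning the one SAP.

    Assumes both protein sequences have equal length and differ at
    exactly one position.
    """
    if len(reference) != len(variant):
        raise ValueError("sequences must have the same length")
    diffs = [i for i, (a, b) in enumerate(zip(reference, variant)) if a != b]
    if len(diffs) != 1:
        raise ValueError("expected exactly one amino acid difference")
    pos = diffs[0]
    # Window start positions for which the window still covers `pos`.
    first = max(0, pos - length + 1)
    last = min(pos, len(reference) - length)
    return [(reference[s:s + length], variant[s:s + length])
            for s in range(first, last + 1)]


# Toy sequences with a Y -> H change; 3-residue peptides for brevity.
for ref_pep, var_pep in variant_peptides("MKTAYIAKQR", "MKTAHIAKQR", length=3):
    print(ref_pep, var_pep)
```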
The work described in chapter 5 is focused on the promoters, the regions in bacterial genomes that are involved in gene regulation. Intrigued by anecdotal evidence that duplication of bacterial promoters can activate or silence genes, we investigated how often promoter duplication occurs in bacterial genomes. Using the large number of bacterial genomes that are currently available, we looked for clusters of highly similar promoter regions. Since duplication assumes some sort of mobility, we termed the duplicated promoters putative mobile promoters, or PMPs. We found over 4,000 clusters of PMPs in 1,043 genomes. Most of the clusters consist of two members, indicating a single duplication event, but we also found much larger clusters of PMPs within some genomes. A number of PMPs are present in multiple species, even in very distantly related bacterial species, suggesting that these may have been subject to horizontal gene transfer. The mobile promoters could play an important role in the rapid rewiring of gene regulatory networks.
Chapter 6 discusses how current biological research can adapt to make full use of the opportunities offered by the high-throughput technologies by following three different approaches. The first approach empowers biologists with user-friendly software that allows them to analyse large volumes of genome-scale data without requiring expert computer skills. In the second approach, the biologist teams up with a bioinformatician to combine in-depth biological knowledge with expert computational skills. The third approach combines the biologist and the bioinformatician in one person by teaching the biologist computational skills. Each of these three approaches has its merits and shortcomings, so I do not expect any of them to become dominant in the near future. Looking further ahead, it seems inevitable that every biologist will have to learn at least the basics of computational methods and that this should be an integral part of biology education. Bioinformatics might in time cease to exist as a separate field and instead become an intrinsic aspect of most biological research disciplines.
Qualification: Doctor of Philosophy
Award date: 5 Dec 2013
Place of Publication: Wageningen
Publication status: Published - 2013
- molecular biology
- computer analysis
- information technology