Scalable Computing for Evolutionary Genomics

J.C.P. Prins, D. Belhachemi, S. Möller, G. Smant

Research output: Chapter in Book/Report/Conference proceeding › Chapter › Academic › peer-review

4 Citations (Scopus)

Abstract

Genomic data analysis in evolutionary biology is becoming so computationally intensive that analysis of multiple hypotheses and scenarios takes too long on a single desktop computer. In this chapter, we discuss techniques for scaling computations through parallelization of calculations, after giving a quick overview of advanced programming techniques. Unfortunately, parallel programming is difficult and requires special software design. The alternative, especially attractive for legacy software, is to introduce poor man's parallelization by running whole programs in parallel as separate processes, using job schedulers. Such pipelines are often deployed on bioinformatics computer clusters. Recent advances in PC virtualization have made it possible to run a full computer operating system, with all of its installed software, on top of another operating system, inside a "box," or virtual machine (VM). Such a VM can flexibly be deployed on multiple computers, in a local network, e.g., on existing desktop PCs, and even in the Cloud, to create a "virtual" computer cluster. Many bioinformatics applications in evolutionary biology can be run in parallel, running processes in one or more VMs. Here, we show how a ready-made bioinformatics VM image, named BioNode, effectively creates a computing cluster, and pipeline, in a few steps. This allows researchers to scale up computations from their desktop, using available hardware, whenever it is required. BioNode is based on Debian Linux and can run on networked PCs and in the Cloud. Over 200 bioinformatics and statistical software packages, of interest to evolutionary biology, are included, such as PAML, Muscle, MAFFT, MrBayes, and BLAST. Most of these software packages are maintained through the Debian Med project. In addition, BioNode contains convenient configuration scripts for parallelizing bioinformatics software.
Where Debian Med encourages packaging free and open source bioinformatics software through one central project, BioNode encourages creating free and open source VM images, for multiple targets, through one central project. BioNode can be deployed on Windows, OS X, Linux, and in the Cloud. In addition to the downloadable BioNode images, we provide tutorials online, which empower bioinformaticians to install and run BioNode in different environments, as well as information for future initiatives, on creating and building such images.
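
The "poor man's parallelization" mentioned in the abstract, running whole programs in parallel as separate processes, can be illustrated with a minimal sketch. This is not code from the chapter or from BioNode: the function names run_job and run_all are hypothetical, the jobs are placeholder commands rather than real bioinformatics tools, and a simple in-process worker pool stands in for the job scheduler that a cluster would normally provide.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_job(cmd):
    """Run one whole program as a separate OS process and capture its output."""
    done = subprocess.run(cmd, capture_output=True, text=True)
    return done.returncode, done.stdout.strip()


def run_all(commands, max_workers=4):
    """Run independent commands concurrently; results keep the input order."""
    # The threads only wait on child processes, so the actual work runs in
    # parallel in those separate processes, not in this Python interpreter.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_job, commands))


if __name__ == "__main__":
    # Placeholder jobs (assumption): in practice each command would be one
    # invocation of an unmodified tool, for example one analysis per input file.
    jobs = [[sys.executable, "-c", f"print({n} * {n})"] for n in range(4)]
    for code, out in run_all(jobs):
        print(code, out)
```

The point of the design is that the programs being run need no changes at all: parallelism comes from launching several independent copies, which is why the approach suits legacy software. On a cluster, a job scheduler would play the role of the worker pool here.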
Original language: English
Title of host publication: Evolutionary Genomics. Statistical and Computational Methods, Volume 2
Editors: M. Anisimova
Pages: 529-545
Number of pages: 556
Publication status: Published - 2012

Publication series

Name: Methods in Molecular Biology
Publisher: Humana Press
Number: 856

Keywords

  • Amazon EC2
  • Big data
  • Bioinformatics
  • BioNode
  • Cloud computing
  • Cluster computing
  • Debian Linux
  • Evolutionary biology
  • MPI
  • MrBayes
  • OpenStack
  • PAML
  • Parallelization
  • Virtual machine
  • VirtualBox
