PanTools: representation, storage and exploration of pan-genomic data

S. Sheikhizadeh Anari, M.E. Schranz, Mehmet Akdel, D. de Ridder, S. Smit*

*Corresponding author for this work

Research output: Contribution to journal, Article, Academic, peer-review

39 Citations (Scopus)

Abstract

Motivation: Next-generation sequencing technology is generating a wealth of highly similar genome sequences for many species, paving the way for a transition from single-genome to pan-genome analyses. Accordingly, genomics research is expected to shift from reference-centric to pan-genomic approaches. We define the pan-genome as a comprehensive representation of multiple annotated genomes that facilitates analyses of the similarity and divergence of the constituent genomes at the nucleotide, gene and genome-structure levels. Current pan-genomic approaches do not thoroughly address scalability, functionality and usability.
Results: We introduce a generalized De Bruijn graph as a pan-genome representation, together with an online algorithm to construct it. The representation is stored in a Neo4j graph database, which makes our approach scalable to large eukaryotic genomes. Besides the construction algorithm, our software package, PanTools, currently provides functionality for annotating pan-genomes, adding sequences, grouping genes, retrieving gene sequences or genomic regions, reconstructing genomes, and comparing and querying pan-genomes. We demonstrate the performance of the tool on datasets of 62 E. coli genomes, 93 yeast genomes and 19 Arabidopsis thaliana genomes.
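To make the idea of a multi-genome De Bruijn graph more concrete, the Python sketch below builds a plain (not generalized) De Bruijn graph from several genomes, adding them one at a time so that construction is incremental, loosely in the spirit of the online algorithm mentioned above. It is not the PanTools implementation: the class and method names are hypothetical, nodes are fixed-length k-mers, only the forward strand is used, non-branching paths are not compacted, and everything is kept in memory rather than in Neo4j. The sketch records which genomes contain each k-mer, reconstructs a genome from its stored path, and lists the k-mers shared by all genomes and the nodes where paths branch.

```python
from collections import defaultdict


class PanGenomeDBG:
    """Toy multi-genome De Bruijn graph (hypothetical class, not part of PanTools).

    Assumptions: forward strand only, fixed k, no compaction of
    non-branching paths, no annotations, in-memory dicts instead of Neo4j.
    """

    def __init__(self, k: int) -> None:
        self.k = k
        self.edges = defaultdict(set)        # k-mer -> set of successor k-mers
        self.occurrences = defaultdict(set)  # k-mer -> names of genomes containing it
        self.paths = {}                      # genome name -> ordered k-mer path

    def add_genome(self, name: str, sequence: str) -> None:
        """Add one genome to the existing graph without rebuilding it."""
        k = self.k
        if name in self.paths:
            raise ValueError(f"genome {name!r} already added")
        if len(sequence) < k:
            raise ValueError("sequence is shorter than k")
        kmers = [sequence[i:i + k] for i in range(len(sequence) - k + 1)]
        # Consecutive k-mers overlap by k-1 bases, giving one edge per pair.
        for left, right in zip(kmers, kmers[1:]):
            self.edges[left].add(right)
        for kmer in kmers:
            self.occurrences[kmer].add(name)
        # Store the genome's walk so repeats do not make reconstruction ambiguous.
        self.paths[name] = kmers

    def reconstruct(self, name: str) -> str:
        """Spell a genome from its path: first k-mer plus the last base of each later one."""
        kmers = self.paths[name]
        return kmers[0] + "".join(kmer[-1] for kmer in kmers[1:])

    def shared_kmers(self) -> set:
        """k-mers present in every genome added so far."""
        n = len(self.paths)
        return {kmer for kmer, genomes in self.occurrences.items() if len(genomes) == n}

    def branching_nodes(self) -> set:
        """k-mers with more than one successor, i.e. where walks can diverge."""
        return {kmer for kmer, succ in self.edges.items() if len(succ) > 1}


pan = PanGenomeDBG(k=3)
pan.add_genome("g1", "ACGTACGA")
pan.add_genome("g2", "ACGTTCGA")
print(pan.reconstruct("g2"))          # ACGTTCGA
print(sorted(pan.shared_kmers()))     # ['ACG', 'CGA', 'CGT']
print(sorted(pan.branching_nodes()))  # ['ACG', 'CGT']
```

In this toy example the branch at CGT reflects the A/T difference between the two genomes at position 4, whereas the branch at ACG comes from a repeat within g1. This is why the sketch stores each genome's path instead of trying to recover sequences by walking the graph alone.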
Availability and Implementation: The Java implementation of PanTools is publicly available at http://www.bif.wur.nl.
Original language: English
Pages (from-to): i487-i493
Journal: Bioinformatics
Volume: 32
Issue number: 17
Publication status: Published - 2016
Event: ECCB 2016: The 15th European Conference on Computational Biology, World Forum Convention Center, The Hague, Netherlands
Duration: 3 Sept 2016 to 7 Sept 2016
http://www.eccb2016.org