Long-read annotation

Automated eukaryotic genome annotation based on long-read cDNA sequencing

David E. Cook, Jose Espejo Valle-Inclan, Alice Pajoro, Hanna Rovenich, Bart P.H.J. Thomma*, Luigi Faino

*Corresponding author for this work

Research output: Contribution to journal › Article › Academic › peer-review

8 Citations (Scopus)

Abstract

Single-molecule full-length complementary DNA (cDNA) sequencing can aid genome annotation by revealing transcript structure and alternative splice forms, yet current annotation pipelines do not incorporate such information. Here we present long-read annotation (LoReAn) software, an automated annotation pipeline utilizing short- and long-read cDNA sequencing, protein evidence, and ab initio prediction to generate accurate genome annotations. Based on annotations of two fungal genomes (Verticillium dahliae and Plicaturopsis crispa) and two plant genomes (Arabidopsis [Arabidopsis thaliana] and Oryza sativa), we show that LoReAn outperforms popular annotation pipelines by integrating single-molecule cDNA-sequencing data generated from either the Pacific Biosciences or MinION sequencing platforms, correctly predicting gene structure, and capturing genes missed by other annotation pipelines.

Original language: English
Pages (from-to): 38-54
Number of pages: 17
Journal: Plant Physiology
Volume: 179
Issue number: 1
DOI: 10.1104/pp.18.00848
Publication status: Published - 1 Jan 2019

Fingerprint

complementary DNA
DNA Sequence Analysis
sequence analysis
Genome
Arabidopsis
Fungal Genome
Verticillium
Plant Genome
Genes
Verticillium dahliae
Software
Oryza sativa
Arabidopsis thaliana
prediction
Proteins

Cite this

Cook, David E.; Valle-Inclan, Jose Espejo; Pajoro, Alice; Rovenich, Hanna; Thomma, Bart P.H.J.; Faino, Luigi. Long-read annotation: Automated eukaryotic genome annotation based on long-read cDNA sequencing. In: Plant Physiology. 2019; Vol. 179, No. 1. pp. 38-54.
@article{129fdcb285ae4490b904935d23307df1,
title = "Long-read annotation: Automated eukaryotic genome annotation based on long-read cDNA sequencing",
author = "Cook, {David E.} and Valle-Inclan, {Jose Espejo} and Alice Pajoro and Hanna Rovenich and Thomma, {Bart P.H.J.} and Luigi Faino",
year = "2019",
month = "1",
day = "1",
doi = "10.1104/pp.18.00848",
language = "English",
volume = "179",
pages = "38--54",
journal = "Plant Physiology",
issn = "0032-0889",
publisher = "American Society of Plant Biologists",
number = "1",
}

TY  - JOUR
T1  - Long-read annotation
T2  - Automated eukaryotic genome annotation based on long-read cDNA sequencing
AU  - Cook, David E.
AU  - Valle-Inclan, Jose Espejo
AU  - Pajoro, Alice
AU  - Rovenich, Hanna
AU  - Thomma, Bart P.H.J.
AU  - Faino, Luigi
PY  - 2019/1/1
Y1  - 2019/1/1
AB  - Single-molecule full-length complementary DNA (cDNA) sequencing can aid genome annotation by revealing transcript structure and alternative splice forms, yet current annotation pipelines do not incorporate such information. Here we present long-read annotation (LoReAn) software, an automated annotation pipeline utilizing short- and long-read cDNA sequencing, protein evidence, and ab initio prediction to generate accurate genome annotations. Based on annotations of two fungal genomes (Verticillium dahliae and Plicaturopsis crispa) and two plant genomes (Arabidopsis [Arabidopsis thaliana] and Oryza sativa), we show that LoReAn outperforms popular annotation pipelines by integrating single-molecule cDNA-sequencing data generated from either the Pacific Biosciences or MinION sequencing platforms, correctly predicting gene structure, and capturing genes missed by other annotation pipelines.
U2  - 10.1104/pp.18.00848
DO  - 10.1104/pp.18.00848
M3  - Article
VL  - 179
SP  - 38
EP  - 54
JO  - Plant Physiology
JF  - Plant Physiology
SN  - 0032-0889
IS  - 1
ER  -