De novo assembly, functional annotation, and analysis of the giant reed (Arundo donax L.) leaf transcriptome provide tools for the development of a biofuel feedstock

Chiara Evangelistella, Alessio Valentini, Riccardo Ludovisi, Andrea Firrincieli, Francesco Fabbrini, Simone Scalabrin, Federica Cattonaro, Michele Morgante, Giuseppe Scarascia Mugnozza, Joost J.B. Keurentjes, Antoine Harfouche*

*Corresponding author for this work

Research output: Contribution to journal › Article › Academic › peer-review

23 Citations (Scopus)


Background: Arundo donax has attracted renewed interest as a candidate energy crop for biomass-to-liquid fuel conversion processes and biorefineries, owing to its high productivity, adaptability to marginal land conditions, and suitability for biofuel and biomaterial production. Despite its importance, the genomic resources currently available to support the improvement of this species are still limited.

Results: We used RNA sequencing (RNA-Seq) to de novo assemble and characterize the A. donax leaf transcriptome. The sequencing generated 1249 million clean reads that were assembled using single-k-mer and multi-k-mer approaches into 62,596 unique sequences (unitranscripts) with an N50 of 1134 bp. The TransDecoder and Trinotate software suites were used to obtain putative coding sequences and to annotate them by mapping to the UniProtKB/Swiss-Prot and UniRef90 databases and by searching for known transcripts, proteins, protein domains, and signal peptides. Furthermore, the unitranscripts were annotated by mapping them to the NCBI non-redundant, GO, and KEGG pathway databases using Blast2GO. The transcriptome was also characterized by BLAST searches to identify homologous transcripts of key genes involved in important metabolic pathways, such as lignin, cellulose, purine, and thiamine biosynthesis and carbon fixation. Moreover, homologous transcripts of key genes involved in stomatal development and of genes coding for stress-associated proteins (SAPs) were identified. Additionally, 8364 simple sequence repeat (SSR) markers were identified and surveyed. SSRs were more abundant in non-coding regions (63.18%) than in coding regions (36.82%). This SSR dataset represents the first marker catalogue of A. donax. Fifty-three SSRs (PolySSRs) were then predicted to be polymorphic between ecotype-specific assemblies, suggesting genetic variability among the studied ecotypes.
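
For readers unfamiliar with the N50 statistic quoted above, the following is a minimal sketch of how it is conventionally computed from a list of assembled sequence lengths. The function name `n50` and the example lengths are illustrative assumptions and are not taken from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (in bp).

    N50 is the length L such that sequences of length >= L together
    account for at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("at least one sequence length is required")
    ordered = sorted(lengths, reverse=True)  # longest sequences first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy example with made-up lengths (not data from the paper).
example_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
print(n50(example_lengths))
```

In this toy case the total is 32 bp, and the two longest sequences (8 bp each) already cover half of it, so the N50 is 8.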
Conclusions: This study provides the first publicly available leaf transcriptome for the A. donax bioenergy crop. The functional annotation and characterization of the transcriptome will be highly useful for providing insight into the molecular mechanisms underlying its extreme adaptability. The identification of homologous transcripts involved in key metabolic pathways offers a platform for directing future efforts in genetic improvement of this species. Finally, the identified SSRs will facilitate the harnessing of untapped genetic diversity. This transcriptome should be of value to ongoing functional genomics and genetic studies in this crop of paramount economic importance.
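
As an illustration of what identifying SSRs in transcript sequences involves, the sketch below scans a nucleotide string for perfect tandem repeats using a regular expression. It is not the tool or the parameter set used in the study, which the abstract does not specify; the function name `find_ssrs` and the minimum repeat counts in `MIN_REPEATS` are assumptions chosen only for demonstration.

```python
import re

# Assumed thresholds: motif length -> minimum number of tandem copies.
# These values are illustrative, not those used by the authors.
MIN_REPEATS = {2: 6, 3: 5, 4: 4, 5: 4, 6: 4}


def find_ssrs(sequence):
    """Return (start, motif, copies) for each perfect SSR in the sequence."""
    seq = sequence.upper()
    hits = []
    for unit_len, min_copies in MIN_REPEATS.items():
        # One captured motif followed by at least (min_copies - 1) more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_copies - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs that are themselves repeats of a shorter unit
            # (e.g. "ATAT" or "AAA"), so each SSR is reported once.
            if motif in (motif + motif)[1:-1]:
                continue
            copies = len(match.group(0)) // unit_len
            hits.append((match.start(), motif, copies))
    return sorted(hits)


# Toy sequence (made up): seven AT copies and five AGC copies.
toy = "GGC" + "AT" * 7 + "CCG" + "AGC" * 5 + "TT"
print(find_ssrs(toy))
```

Start positions are 0-based. A real survey would also need to handle compound and imperfect repeats and to classify each hit as coding or non-coding, which this sketch does not attempt.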

Original language: English
Article number: 138
Number of pages: 24
Journal: Biotechnology for Biofuels
Publication status: Published - 2017


  • Arundo donax
  • Biofuel
  • Carbon fixation
  • De novo leaf transcriptome
  • Genic-SSRs
  • Phenylpropanoid
  • Purine and thiamine metabolism
  • RNA-Seq
  • SAPs
  • Stomata

