Genome-wide BAC-end sequencing of Musa acuminata DH Pahang reveals further insights into the genome organization of banana

R.E. Arango, R.C. Togawa, S.C. Carpentier, B. te Lintel Hekkert, G.H.J. Kema, M.T. Souza

Research output: Contribution to journal › Article › Academic › peer-review

3 Citations (Scopus)

Abstract

Banana and plantain (Musa spp.) are grown in more than 120 countries in tropical and subtropical regions and constitute an important staple food for millions of people. A Musa acuminata ssp. malaccensis DH Pahang bacterial artificial chromosome (BAC) library (MAMB) was submitted for BAC-end sequencing. MAMB consists of 23,040 clones with an average insert size of 140 kbp, corresponding to fivefold coverage of the banana genome. A total of 46,080 reads were generated, and 42,750 (92.8%) high-quality sequences were obtained after trimming for vector and quality. Analysis of these data shows a GC content of 41.39%, with interspersed repeats comprising 32.3%. The most common repeated sequences show homology to ribosomal RNA genes, particularly 18S rRNA, while the Ty3/gypsy-type monkey retrotransposon is the most common retroelement. The sequence data were used to generate a banana-specific repeat library containing 54 new repetitive elements, which accounted for 11.86% of the total nucleotides. Simple sequence repeats represent 0.7% of the sequence data and allowed the identification of 2,455 potentially useful marker sites. Functional annotation identified 2,705 sequences that could code for proteins of known function. Microsynteny analysis shows a higher number of co-linear matches to Oryza sativa than to Arabidopsis thaliana. This database of BAC-end sequences is useful for the assembly of the complete banana genome sequence and is important for identification in functional genomics experiments.
Original language: English
Pages (from-to): 933-940
Journal: Tree Genetics and Genomes
Volume: 7
Issue number: 5
Publication status: Published - 2011

Keywords

  • rice genome
  • arabidopsis-thaliana
  • draft sequence
  • nuclear genome
  • dna
  • database
  • model
  • identification
  • localization
  • proteome
