TY - JOUR
T1 - Genome-wide BAC-end sequencing of Musa acuminata DH Pahang reveals further insights into the genome organization of banana
AU - Arnago, R.E.
AU - Togawa, R.C.
AU - Carpentier, S.C.
AU - te Lintel Hekkert, B.
AU - Kema, G.H.J.
AU - Souza, M.T.
PY - 2011
Y1 - 2011
N2 - Banana and plantain (Musa spp.) are grown in more than 120 countries in tropical and subtropical regions and constitute an important staple food for millions of people. A Musa acuminata ssp. malaccencis DH Pahang bacterial artificial chromosome (BAC) library (MAMB) was submitted for BAC-end sequencing. MAMB consists of 23,040 clones, with a 140-kbp average insert size, accounting for a five times coverage of the banana genome. A total of 46,080 reads were generated, and 42,750 (92.8%) high-quality sequences were obtained after trimming for vector and quality. Analysis of these data shows a GC content of 41.39%, whereas interspersed repeats comprise 32.3%. The most common repeated sequences found show homology to ribosomal RNA genes, particularly 18S rRNA, while the Ty3/gypsy type monkey retrotransposon is the most common retro element. The sequence data were used to generate a banana-specific repeat library containing 54 new repetitive elements which accounted for 11.86% of the total nucleotides. Simple sequence repeats represent 0.7% of the sequence data and allowed the identification of 2,455 potentially useful marker sites. Functional annotation identified 2,705 sequences that could code for proteins of known function. Microsynteny analysis shows a higher number of co-linear matches to Oryza sativa, in contrast to Arabidopsis thaliana. This database of BAC-end sequences is useful for the assembly of the complete banana genome sequence and is important for identification in functional genomics experiments
AB - Banana and plantain (Musa spp.) are grown in more than 120 countries in tropical and subtropical regions and constitute an important staple food for millions of people. A Musa acuminata ssp. malaccencis DH Pahang bacterial artificial chromosome (BAC) library (MAMB) was submitted for BAC-end sequencing. MAMB consists of 23,040 clones, with a 140-kbp average insert size, accounting for a five times coverage of the banana genome. A total of 46,080 reads were generated, and 42,750 (92.8%) high-quality sequences were obtained after trimming for vector and quality. Analysis of these data shows a GC content of 41.39%, whereas interspersed repeats comprise 32.3%. The most common repeated sequences found show homology to ribosomal RNA genes, particularly 18S rRNA, while the Ty3/gypsy type monkey retrotransposon is the most common retro element. The sequence data were used to generate a banana-specific repeat library containing 54 new repetitive elements which accounted for 11.86% of the total nucleotides. Simple sequence repeats represent 0.7% of the sequence data and allowed the identification of 2,455 potentially useful marker sites. Functional annotation identified 2,705 sequences that could code for proteins of known function. Microsynteny analysis shows a higher number of co-linear matches to Oryza sativa, in contrast to Arabidopsis thaliana. This database of BAC-end sequences is useful for the assembly of the complete banana genome sequence and is important for identification in functional genomics experiments
KW - rice genome
KW - arabidopsis-thaliana
KW - draft sequence
KW - nuclear genome
KW - dna
KW - database
KW - model
KW - identification
KW - localization
KW - proteome
U2 - 10.1007/s11295-011-0385-3
DO - 10.1007/s11295-011-0385-3
M3 - Article
SN - 1614-2942
VL - 7
SP - 933
EP - 940
JO - Tree Genetics and Genomes
JF - Tree Genetics and Genomes
IS - 5
ER -