Background - Automatic annotation of sequenced eukaryotic genomes integrates a combination of methodologies such as ab-initio methods and alignment of homologous genes and/or proteins. For example, annotation of the zebrafish genome within Ensembl relies heavily on available cDNA and protein sequences from two distantly related fish species and other vertebrates that have diverged several hundred million years ago. The scarcity of genomic information from other cyprinids provides the impetus to leverage EST collections to understand gene structures in this diverse teleost group. Results - We have generated 6,050 ESTs from the differentiating testis of common carp (Cyprinus carpio) and clustered them with 9,303 non-gonadal ESTs from CarpBase as well as 1,317 ESTs and 652 common carp mRNAs from GenBank. Over 28% of the resulting 8,663 unique transcripts are exclusively testis-derived ESTs. Moreover, 974 of these transcripts did not match any sequence in the zebrafish or fathead minnow EST collection. A total of 1,843 unique common carp sequences could be stringently mapped to the zebrafish genome (version 5), of which 1,752 matched coding sequences of zebrafish genes with or without potential splice variants. We show that 91 common carp transcripts map to intergenic and intronic regions on the zebrafish genome assembly and regions annotated with non-teleost sequences. Interestingly, an additional 42 common carp transcripts indicate the potential presence of new splicing variants not found in zebrafish databases so far. The fact that common carp transcripts help the identification or confirmation of these coding regions in zebrafish exemplifies the usefulness of sequences from closely related species for the annotation of model genomes. We also demonstrate that 5' UTR sequences of common carp and zebrafish orthologs share a significant level of similarity based on preservation of motif arrangements for as many as 10 ab-initio motifs. Conclusion - Our data show that there is sufficient homology between the transcribed sequences of common carp and zebrafish to warrant an even deeper cyprinid transcriptome comparison. On the other hand, the comparative analysis illustrates the value in utilizing partially sequenced transcriptomes to understand gene structure in this diverse teleost group. We highlight the need for integrated resources to leverage the wealth of fragmented genomic data.
- sequence tag alignment
- regulatory elements