Understanding and identifying amino acid repeats

H. Luo, H. Nijveen

Research output: Contribution to journal, Article, Academic, peer-review

29 Citations (Scopus)

Abstract

Amino acid repeats (AARs) are abundant in protein sequences. They have particular roles in protein function and evolution. Simple repeat patterns generated by DNA slippage tend to introduce length variations and point mutations in repeat regions. Loss of normal and gain of abnormal function owing to their variable length are potential risks leading to diseases. Repeats with complex patterns mostly refer to the functional domain repeats, such as the well-known leucine-rich repeat and WD repeat, which are frequently involved in protein–protein interaction. They are mainly derived from internal gene duplication events and stabilized by ‘gate-keeper’ residues, which play crucial roles in preventing inter-domain aggregation. AARs are widely distributed in different proteomes across a variety of taxonomic ranges, and especially abundant in eukaryotic proteins. However, their specific evolutionary and functional scenarios are still poorly understood. Identifying AARs in protein sequences is the first step for the further investigation of their biological function and evolutionary mechanism. In principle, this is an NP-hard problem, as most of the repeat fragments are shaped by a series of sophisticated evolutionary events and become latent periodical patterns. It is not possible to define a uniform criterion for detecting and verifying various repeat patterns. Instead, different algorithms based on different strategies have been developed to cope with different repeat patterns. In this review, we attempt to describe the amino acid repeat-detection algorithms currently available and compare their strategies based on an in-depth analysis of the biological significance of protein repeats.
Original language: English
Pages (from-to): 582-591
Journal: Briefings in Bioinformatics
Volume: 15
Issue number: 4
Publication status: Published - 2014

Keywords

  • low-complexity regions
  • protein homology detection
  • intragenic tandem repeats
  • hidden Markov models
  • statistical significance
  • biological sequences
  • transcription factors
  • scoring schemes
  • evolution
  • identification
