Abstract
Motivation: Transcription factor interactions are the cornerstone of
combinatorial control, which is a crucial aspect of the gene
regulatory system. Understanding and predicting transcription
factor interactions based on their sequence alone is difficult since
they are often part of families of factors sharing high sequence
identity. Given the scarcity of experimental data on interactions
compared to available sequence data, however, it would be most
useful to have accurate methods for the prediction of such
interactions.
Results: We present a method consisting of a Random Forest-based
feature-selection procedure that selects relevant motifs out of
a set found using a correlated motif search algorithm. Prediction
accuracy for several transcription factor families (bZIP, MADS,
homeobox and forkhead) reaches 60–90%. In addition, we identified
those parts of the sequence that are important for the interaction
specificity, and show that these are in agreement with available data.
We also used the predictors to perform genome-wide scans for
interaction partners and recovered both known and putative new
interaction partners.
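
As a rough illustration of the kind of input such a feature-selection step could work on, the sketch below encodes a pair of protein sequences as binary features, one per motif pair. Everything in it is an assumption made for illustration and is not taken from the paper: the helper names `motif_in` and `pair_features`, the use of `x` as a wildcard, the orientation-independent matching rule, and the toy sequences and motifs.

```python
def motif_in(seq, motif):
    """Return True if motif occurs in seq; 'x' in the motif matches any residue."""
    n = len(motif)
    return any(
        all(m == "x" or m == s for m, s in zip(motif, seq[i:i + n]))
        for i in range(len(seq) - n + 1)
    )


def pair_features(seq_a, seq_b, motif_pairs):
    """Encode a protein pair as one binary feature per motif pair.

    A feature is 1 when one motif of the pair occurs in one sequence and
    the other motif occurs in the other sequence (either orientation).
    This matching rule is an assumption of the sketch.
    """
    feats = []
    for m1, m2 in motif_pairs:
        hit = (motif_in(seq_a, m1) and motif_in(seq_b, m2)) or (
            motif_in(seq_a, m2) and motif_in(seq_b, m1)
        )
        feats.append(int(hit))
    return feats


# Toy, made-up sequences and motif pairs (not real transcription factors).
motif_pairs = [("LxxL", "KR"), ("EE", "NxK")]
row = pair_features("MALQELKK", "AKRNDKEE", motif_pairs)
print(row)
```

Rows built this way for known interacting and non-interacting pairs could then be passed to any Random Forest implementation that reports per-feature importances, and the highest-ranked motif pairs kept as the selected features.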
Original language | English |
---|---|
Pages (from-to) | 26-33 |
Journal | Bioinformatics |
Volume | 24 |
Issue number | 1 |
DOIs | |
Publication status | Published - 2008 |
Keywords
- protein-protein interactions
- regulatory networks
- interaction datasets
- motif pairs
- complexes
- evolution
- database
- DNA
- classification
- signatures