Caretta – A multiple protein structure alignment and feature extraction suite

Mehmet Akdel, Janani Durairaj, Dick de Ridder, Aalt D.J. van Dijk*

*Corresponding author for this work

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

The vast number of protein structures currently available opens exciting opportunities for machine learning on proteins, aimed at predicting and understanding functional properties. In particular, in combination with homology modelling, it is now possible to use not only sequence features but also structure features as input for machine learning. To do so, however, robust multiple structure alignments are imperative. Here we present Caretta, a multiple structure alignment suite meant for homologous but sequentially divergent protein families, which consistently returns accurate alignments with a higher coverage than current state-of-the-art tools. Caretta is available as a GUI and command-line application and additionally outputs an aligned structure feature matrix for a given set of input structures, which can readily be used in downstream steps for supervised or unsupervised machine learning. We show Caretta's performance on two benchmark datasets, and present an example application of Caretta in predicting the conformational state of cyclin-dependent kinases.
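
As a rough illustration of how an aligned structure feature matrix might feed a downstream supervised step, the sketch below treats alignment gaps as missing values, fills them with per-position means, and classifies a new structure by its nearest class centroid. It does not use Caretta's actual output format or API; the matrix layout (one row per structure, one column per aligned position), the toy values, the "active"/"inactive" labels, and the helper names `impute_gaps` and `nearest_centroid` are all assumptions made for this example.

```python
import math

def impute_gaps(matrix):
    """Replace NaN (gap) entries with the mean of the observed values in that column."""
    n_cols = len(matrix[0])
    filled = [row[:] for row in matrix]
    for j in range(n_cols):
        observed = [row[j] for row in matrix if not math.isnan(row[j])]
        mean = sum(observed) / len(observed) if observed else 0.0
        for row in filled:
            if math.isnan(row[j]):
                row[j] = mean
    return filled

def nearest_centroid(train, labels, query):
    """Return the label whose class centroid is closest to query (Euclidean distance)."""
    centroids = {}
    for label in set(labels):
        members = [row for row, lab in zip(train, labels) if lab == label]
        centroids[label] = [sum(col) / len(members) for col in zip(*members)]
    return min(centroids, key=lambda lab: math.dist(centroids[lab], query))

# Hypothetical toy data: 4 structures x 3 aligned positions, one feature value each.
GAP = math.nan  # assumed encoding of an alignment gap
features = [
    [1.0, 2.0, GAP],
    [1.2, 1.8, 0.5],
    [5.0, GAP, 4.0],
    [5.2, 6.0, 4.2],
]
states = ["active", "active", "inactive", "inactive"]  # hypothetical labels

filled = impute_gaps(features)
prediction = nearest_centroid(filled, states, [1.1, 2.1, 0.6])
print(prediction)
```

Mean imputation and a nearest-centroid rule are chosen here only because they are short and dependency-free; any gap-handling strategy or classifier could take their place.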

Original language: English
Pages (from-to): 981-992
Number of pages: 12
Journal: Computational and Structural Biotechnology Journal
Volume: 18
Publication status: Published - 2020

Keywords

  • Dynamic programming
  • Machine learning
  • Protein structure
  • Structure alignment
