Automated procedure for candidate compound selection in GCMS metabolomics based on prediction of Kovats retention index

V.V. Mihaleva, H.A. Verhoeven, C.H. de Vos, R.D. Hall, R.C.H.J. van Ham

Research output: Contribution to journal › Article › Academic › peer-review

25 Citations (Scopus)


Motivation: Matching both the retention index (RI) and the mass spectrum of an unknown compound against a mass spectral reference library provides strong evidence that the compound has been identified correctly. However, RI data are available for only a small fraction of the compounds in such libraries. We propose a quantitative structure-retention index model that predicts the RI of library compounds, so that putative identifications can be ranked and those whose predicted RI falls outside a predefined window around the observed RI can be filtered out.

Results: We constructed multiple linear regression (MLR) and support vector regression (SVR) models using a set of descriptors chosen with a genetic algorithm as the variable selection method. The SVR model is a significant improvement over previous models built for structurally diverse compounds: it covers a large range of RI values (360 to 4100) and gives better predictions for isomeric compounds. Hit lists were reduced by 41% to 60%, depending on the size of the original hit list; large hit lists were reduced to a greater extent than small ones.
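The filtering step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Hit` structure, the helper names, the window width, and the choice to rank surviving hits by RI deviation and then by spectral match score are all hypothetical. The `kovats_ri` helper uses the classic isothermal Kovats definition (logarithmic interpolation of adjusted retention times between the n-alkanes with n and n+1 carbon atoms); the abstract does not state which RI definition the paper's experimental data follow.

```python
import math
from dataclasses import dataclass


def kovats_ri(t_x: float, t_n: float, t_n1: float, n: int) -> float:
    """Isothermal Kovats index from adjusted retention times (dead time removed).

    t_x: unknown compound; t_n, t_n1: n-alkanes with n and n+1 carbons.
    """
    if min(t_x, t_n, t_n1) <= 0 or t_n1 <= t_n:
        raise ValueError("need positive times with t_n < t_n1")
    frac = (math.log10(t_x) - math.log10(t_n)) / (math.log10(t_n1) - math.log10(t_n))
    return 100.0 * (n + frac)


@dataclass
class Hit:
    """Hypothetical library hit: candidate name, spectral match score, predicted RI."""
    name: str
    match_score: float
    predicted_ri: float


def filter_hits(hits: list[Hit], observed_ri: float, window: float) -> list[Hit]:
    """Keep hits whose predicted RI lies within +/- window of the observed RI.

    Ranking by absolute RI deviation, then by descending match score,
    is an assumed ordering chosen for illustration.
    """
    kept = [h for h in hits if abs(h.predicted_ri - observed_ri) <= window]
    kept.sort(key=lambda h: (abs(h.predicted_ri - observed_ri), -h.match_score))
    return kept


def reduction_percent(original: list[Hit], kept: list[Hit]) -> float:
    """Percentage of the original hit list removed by the RI filter."""
    if not original:
        return 0.0
    return 100.0 * (1 - len(kept) / len(original))


if __name__ == "__main__":
    # Illustrative values only, not data from the paper.
    hits = [
        Hit("candidate_A", 0.91, 1480.0),
        Hit("candidate_B", 0.95, 1620.0),
        Hit("candidate_C", 0.88, 1510.0),
        Hit("candidate_D", 0.90, 1390.0),
    ]
    kept = filter_hits(hits, observed_ri=1500.0, window=50.0)
    print([h.name for h in kept])        # ['candidate_C', 'candidate_A']
    print(reduction_percent(hits, kept))  # 50.0
```

In practice the predicted RI would come from a fitted regression model (MLR or SVR in the paper) and the window width would be set from that model's prediction error; both are left as inputs here.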
Original language: English
Pages (from-to): 787-794
Issue number: 6
Publication status: Published - 2009



  • volatile organic-compounds
  • variable selection
  • genetic algorithms
  • mass-spectrometry
  • descriptors
  • strategy
  • isomers
  • variety
  • cancer
  • time
