Abstract
Motivation: Matching both the retention index (RI) and the mass spectrum of an unknown compound against a mass spectral reference library provides strong evidence for a correct identification of that compound. Data on retention indices are, however, available for only a small fraction of the compounds in such libraries. We propose a quantitative structure-retention index model that enables ranking of putative identifications and filtering out of those for which the predicted RI falls outside a predefined window.
Results: We constructed multiple linear regression and support vector regression (SVR) models using a set of descriptors obtained with a genetic algorithm as the variable selection method. The SVR model is a significant improvement over previous models built for structurally diverse compounds, as it covers a large range of RI values (360 to 4100) and gives better predictions for isomeric compounds. The hit list reduction varied from 41% to 60% and depended on the size of the original hit list. Large hit lists were reduced to a greater extent than small hit lists.
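The window-based filtering described above can be illustrated with a minimal sketch. It is not the authors' implementation: the `Hit` class, the function names, the window width of 50 RI units, and all candidate values are hypothetical and chosen only for illustration. The sketch assumes each library candidate already carries a spectral match score and a model-predicted RI.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One library candidate for an unknown peak (all fields hypothetical)."""
    name: str
    match_score: float   # spectral match score from the library search
    predicted_ri: float  # RI predicted by a structure-based model


def filter_hits(hits, experimental_ri, window):
    """Keep hits whose predicted RI lies within +/- window of the measured RI,
    then rank the survivors by spectral match score (best first)."""
    kept = [h for h in hits if abs(h.predicted_ri - experimental_ri) <= window]
    return sorted(kept, key=lambda h: h.match_score, reverse=True)


def reduction_percent(n_before, n_after):
    """Percentage of the original hit list that was filtered out."""
    if n_before == 0:
        return 0.0
    return 100.0 * (n_before - n_after) / n_before


# Made-up candidates; none of these values come from the paper.
hits = [
    Hit("candidate A", 0.91, 1210.0),
    Hit("candidate B", 0.88, 1495.0),
    Hit("candidate C", 0.85, 1188.0),
    Hit("candidate D", 0.80, 2050.0),
    Hit("candidate E", 0.95, 1230.0),
]
filtered = filter_hits(hits, experimental_ri=1200.0, window=50.0)
reduction = reduction_percent(len(hits), len(filtered))
```

In this toy example, candidates B and D fall outside the window and are dropped, and the remaining three are ordered by match score. In practice the window width would presumably be chosen from the prediction error of the RI model, which this sketch does not attempt to estimate.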
| Original language | English |
|---|---|
| Pages (from-to) | 787-794 |
| Journal | Bioinformatics |
| Volume | 25 |
| Issue number | 6 |
| Publication status | Published - 2009 |
Keywords
- volatile organic-compounds
- variable selection
- genetic algorithms
- mass-spectrometry
- descriptors
- strategy
- isomers
- variety
- cancer
- time