Developing a discrimination rule between breast cancer patients and controls using proteomics mass spectrometric data: A three-step approach

A.G. Heidema, N. Nagelkerke

Research output: Contribution to journal › Article › Academic › peer-review

10 Citations (Scopus)

Abstract

To discriminate between breast cancer patients and controls, we derived a decision rule using a three-step approach. First, we ranked the mass/charge values using random forests, because they generate importance indices that take possible interactions into account. We observed that the top-ranked variables consisted of highly correlated contiguous mass/charge values, which we grouped into new variables in the second step. Finally, these newly created variables were used as predictors to find a suitable discrimination rule. In this last step, we compared three methods: Classification and Regression Trees (CART), logistic regression, and penalized logistic regression. Logistic regression and penalized logistic regression performed equally well, and both had a higher classification accuracy than CART. We chose the model obtained with penalized logistic regression because we hypothesized that it would provide better classification accuracy in the validation set. The solution performed well on the training set, with a classification accuracy of 86.3%, a sensitivity of 86.8%, and a specificity of 85.7%.
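The second step, grouping highly correlated contiguous mass/charge values into new variables, can be illustrated with a minimal Python sketch. The abstract does not say how the grouping was done, so everything here is an assumption made for illustration: the correlation threshold `min_corr`, the choice to compare each column only with its immediate neighbour, the use of a sample-wise mean as the new variable, and the function names `group_contiguous` and `merge_groups`. The toy data are made up and are not from the study.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # constant column: treat as uncorrelated
    return sxy / sqrt(sxx * syy)

def group_contiguous(columns, top_idx, min_corr=0.9):
    """Group top-ranked column indices into runs of adjacent, correlated columns.

    columns  : list of columns, each a list of intensities (one per sample)
    top_idx  : indices of the top-ranked columns (e.g. from a random forest)
    min_corr : hypothetical threshold, not taken from the paper
    """
    groups = []
    for j in sorted(top_idx):
        if groups:
            last = groups[-1][-1]
            # Extend the current run only if j is the next column and it
            # correlates strongly with the previous column in the run.
            if j == last + 1 and pearson(columns[last], columns[j]) >= min_corr:
                groups[-1].append(j)
                continue
        groups.append([j])
    return groups

def merge_groups(columns, groups):
    """Create one new variable per group by averaging its columns sample-wise."""
    merged = []
    for g in groups:
        n_samples = len(columns[g[0]])
        merged.append(
            [sum(columns[j][i] for j in g) / len(g) for i in range(n_samples)]
        )
    return merged

# Toy data (made up for illustration): 4 samples, 5 m/z columns.
columns = [
    [1.0, 2.0, 3.0, 4.0],  # column 0
    [1.1, 2.1, 2.9, 4.2],  # column 1: closely tracks column 0
    [4.0, 1.0, 3.0, 2.0],  # column 2: unrelated to its neighbours
    [0.5, 0.7, 0.2, 0.9],  # column 3
    [1.0, 1.4, 0.4, 1.8],  # column 4: exactly twice column 3
]
groups = group_contiguous(columns, top_idx=[0, 1, 3, 4])
new_vars = merge_groups(columns, groups)
print(groups)
```

In this sketch the merged columns in `new_vars` would play the role of the predictors passed to the final classification step. Averaging is only one possible way to summarise a group; a sum or peak maximum would fit the same structure.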
Original language: English
Article number: 5
Number of pages: 9
Journal: Statistical Applications in Genetics and Molecular Biology
Volume: 7
Issue number: 2
Publication status: Published - 2008

Keywords

  • random forest
  • classification
  • selection
  • bias
