Analysis of multiple SNPs in genetic association studies: comparison of three multi-locus methods to prioritize and select SNPs

A.G. Heidema, E.J.M. Feskens, P.A.F.M. Doevendans, H.J.T. Ruven, H.C. Houwelingen, E.C.M. Mariman, J.M.A. Boer

Research output: Contribution to journal, Article, Academic, peer-review

31 Citations (Scopus)

Abstract

Nonparametric approaches have been developed that are able to analyze large numbers of single nucleotide polymorphisms (SNPs) in modest sample sizes. These approaches have different selection features and may not provide similar results when applied to the same dataset. Therefore, we compared the results of three approaches (set association, random forests and multifactor dimensionality reduction [MDR]) to select from a total of 93 candidate SNPs a subset of SNPs that are important in determining high-density lipoprotein (HDL)-cholesterol levels. The study population consisted of a random sample from a Dutch monitoring project for cardiovascular disease risk factors and was dichotomized into cases (low HDL-cholesterol, n = 533) and non-cases (high HDL-cholesterol, n = 545) based on gender-specific median values for HDL cholesterol. Clearly, all three approaches prioritized three SNPs as important (CETP Taq1B, CETP-629 C/A and LPL Ser447X). Two SNPs with weaker main effects were additionally prioritized by random forests (APOC3 3175 G/C and CCR2 Val62Ile), whereas MTHFR 677 C/T was selected in combination with CETP Taq1B as best model by MDR. Obtained p-values for the selected models were significant for the set association approach (p =.0019), random forests (p
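The case definition in the abstract (low versus high HDL-cholesterol split at gender-specific medians) can be illustrated with a minimal sketch. It assumes each subject is a (subject_id, sex, hdl) tuple. The record layout and the function name dichotomize_by_sex_median are hypothetical. Subjects whose HDL equals their own sex's median are counted as non-cases here only because the abstract does not say how ties were handled.

```python
from statistics import median


def dichotomize_by_sex_median(records):
    """Split subjects into cases (1) and non-cases (0) by sex-specific median HDL.

    records: iterable of (subject_id, sex, hdl) tuples (a hypothetical layout).
    A subject is a case when HDL is strictly below the median for that
    subject's sex; values equal to the median count as non-cases (an assumption).
    """
    records = list(records)
    # Median HDL computed separately within each sex.
    medians = {
        sex: median(hdl for _, s, hdl in records if s == sex)
        for sex in {s for _, s, _ in records}
    }
    # 1 = case (low HDL), 0 = non-case (high HDL).
    return {sid: int(hdl < medians[sex]) for sid, sex, hdl in records}
```

Computing the median within each sex keeps the overall sex difference in HDL levels from deciding case status on its own.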
Original language: English
Pages (from-to): 910-921
Journal: Genetic Epidemiology
Volume: 31
Issue number: 8
DOI: 10.1002/gepi.20251
Publication status: Published, 2007

Keywords

  • multifactor-dimensionality reduction
  • cholesterol determination
  • hdl-cholesterol
  • random forests
  • susceptibility
  • polymorphisms
  • heterogeneity
  • epistasis
  • disease
  • cancer
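The MDR result in the abstract (a two-SNP model combining MTHFR 677 C/T with CETP Taq1B) depends on pooling multilocus genotype cells into high-risk and low-risk groups. The sketch below shows only that pooling step for one pair of SNPs. It assumes genotypes coded 0/1/2 and case status coded 1/0. The function name mdr_two_locus is hypothetical, and cells whose case:control ratio exactly equals the overall ratio are assigned to low risk by assumption. A full MDR analysis also searches SNP combinations and scores them with cross-validation, which this sketch omits.

```python
from collections import Counter


def mdr_two_locus(genotypes_a, genotypes_b, status):
    """Pool two-locus genotype cells into high/low risk in the MDR style.

    genotypes_a, genotypes_b: genotype codes (0, 1, 2) for two SNPs.
    status: 1 for a case, 0 for a non-case.
    Returns (set of high-risk cells, in-sample balanced accuracy).
    """
    cases, controls = Counter(), Counter()
    for ga, gb, y in zip(genotypes_a, genotypes_b, status):
        (cases if y else controls)[(ga, gb)] += 1
    n_cases = sum(cases.values())
    n_controls = sum(controls.values())
    # Threshold: the overall case:control ratio in the data.
    threshold = n_cases / n_controls
    high_risk = {
        cell for cell in set(cases) | set(controls)
        # cases/controls > threshold, written to avoid dividing by zero.
        if cases[cell] > threshold * controls[cell]
    }
    # Classify: high-risk cell -> predicted case, otherwise predicted non-case.
    true_pos = sum(cases[c] for c in high_risk)
    true_neg = sum(controls[c] for c in set(controls) - high_risk)
    sensitivity = true_pos / n_cases
    specificity = true_neg / n_controls
    # Balanced accuracy: mean of sensitivity and specificity.
    return high_risk, (sensitivity + specificity) / 2
```

Calling the function on every pair of candidate SNPs and keeping the pair with the highest score would give an in-sample version of the model search. MDR proper evaluates candidate models on held-out cross-validation folds, because in-sample accuracy rewards overfitting.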

Cite this

Heidema, A. G., Feskens, E. J. M., Doevendans, P. A. F. M., Ruven, H. J. T., Houwelingen, H. C., Mariman, E. C. M., & Boer, J. M. A. (2007). Analysis of multiple SNPs in genetic association studies: comparison of three multi-locus methods to prioritize and select SNPs. Genetic Epidemiology, 31(8), 910-921. https://doi.org/10.1002/gepi.20251
Heidema, A.G. ; Feskens, E.J.M. ; Doevendans, P.A.F.M. ; Ruven, H.J.T. ; Houwelingen, H.C. ; Mariman, E.C.M. ; Boer, J.M.A. / Analysis of multiple SNPs in genetic association studies: comparison of three multi-locus methods to prioritize and select SNPs. In: Genetic Epidemiology. 2007 ; Vol. 31, No. 8. pp. 910-921.
@article{42c2f760a08046bd97d0f296936a72e6,
title = "Analysis of multiple SNPs in genetic association studies: comparison of three multi-locus methods to prioritize and select SNPs",
abstract = "Nonparametric approaches have been developed that are able to analyze large numbers of single nucleotide polymorphisms (SNPs) in modest sample sizes. These approaches have different selection features and may not provide similar results when applied to the same dataset. Therefore, we compared the results of three approaches (set association, random forests and multifactor dimensionality reduction [MDR]) to select from a total of 93 candidate SNPs a subset of SNPs that are important in determining high-density lipoprotein (HDL)-cholesterol levels. The study population consisted of a random sample from a Dutch monitoring project for cardiovascular disease risk factors and was dichotomized into cases (low HDL-cholesterol, n = 533) and non-cases (high HDL-cholesterol, n = 545) based on gender-specific median values for HDL cholesterol. Clearly, all three approaches prioritized three SNPs as important (CETP Taq1B, CETP-629 C/A and LPL Ser447X). Two SNPs with weaker main effects were additionally prioritized by random forests (APOC3 3175 G/C and CCR2 Val62Ile), whereas MTHFR 677 C/T was selected in combination with CETP Taq1B as best model by MDR. Obtained p-values for the selected models were significant for the set association approach (p =.0019), random forests (p",
keywords = "multifactor-dimensionality reduction, cholesterol determination, hdl-cholesterol, random forests, susceptibility, polymorphisms, heterogeneity, epistasis, disease, cancer",
author = "A.G. Heidema and E.J.M. Feskens and P.A.F.M. Doevendans and H.J.T. Ruven and H.C. Houwelingen and E.C.M. Mariman and J.M.A. Boer",
year = "2007",
doi = "10.1002/gepi.20251",
language = "English",
volume = "31",
pages = "910--921",
journal = "Genetic Epidemiology",
issn = "0741-0395",
publisher = "Wiley",
number = "8",

}

Analysis of multiple SNPs in genetic association studies: comparison of three multi-locus methods to prioritize and select SNPs. / Heidema, A.G.; Feskens, E.J.M.; Doevendans, P.A.F.M.; Ruven, H.J.T.; Houwelingen, H.C.; Mariman, E.C.M.; Boer, J.M.A.

In: Genetic Epidemiology, Vol. 31, No. 8, 2007, p. 910-921.


TY - JOUR

T1 - Analysis of multiple SNPs in genetic association studies: comparison of three multi-locus methods to prioritize and select SNPs

AU - Heidema, A.G.

AU - Feskens, E.J.M.

AU - Doevendans, P.A.F.M.

AU - Ruven, H.J.T.

AU - Houwelingen, H.C.

AU - Mariman, E.C.M.

AU - Boer, J.M.A.

PY - 2007

Y1 - 2007

AB - Nonparametric approaches have been developed that are able to analyze large numbers of single nucleotide polymorphisms (SNPs) in modest sample sizes. These approaches have different selection features and may not provide similar results when applied to the same dataset. Therefore, we compared the results of three approaches (set association, random forests and multifactor dimensionality reduction [MDR]) to select from a total of 93 candidate SNPs a subset of SNPs that are important in determining high-density lipoprotein (HDL)-cholesterol levels. The study population consisted of a random sample from a Dutch monitoring project for cardiovascular disease risk factors and was dichotomized into cases (low HDL-cholesterol, n = 533) and non-cases (high HDL-cholesterol, n = 545) based on gender-specific median values for HDL cholesterol. Clearly, all three approaches prioritized three SNPs as important (CETP Taq1B, CETP-629 C/A and LPL Ser447X). Two SNPs with weaker main effects were additionally prioritized by random forests (APOC3 3175 G/C and CCR2 Val62Ile), whereas MTHFR 677 C/T was selected in combination with CETP Taq1B as best model by MDR. Obtained p-values for the selected models were significant for the set association approach (p =.0019), random forests (p

KW - multifactor-dimensionality reduction

KW - cholesterol determination

KW - hdl-cholesterol

KW - random forests

KW - susceptibility

KW - polymorphisms

KW - heterogeneity

KW - epistasis

KW - disease

KW - cancer

DO - 10.1002/gepi.20251

M3 - Article

VL - 31

SP - 910

EP - 921

JO - Genetic Epidemiology

JF - Genetic Epidemiology

SN - 0741-0395

IS - 8

ER -