reGenotyper: Detecting mislabeled samples in genetic data

Konrad Zych, Basten L. Snoek, Mark Elvin, Miriam Rodriguez, K.J. Van Der Velde, Danny Arends, Harm-Jan Westra, Morris A. Swertz, Gino Poulin, Jan E. Kammenga, Rainer Breitling, Ritsert C. Jansen, Yang Li

Research output: Contribution to journal › Article › Academic › peer-review

2 Citations (Scopus)

Abstract

In high-throughput molecular profiling studies, genotype labels can be wrongly assigned at various experimental steps; the resulting mislabeled samples seriously reduce the power to detect the genetic basis of phenotypic variation. We have developed an approach to detect potential mislabeling, recover the “ideal” genotype and identify “best-matched” labels for mislabeled samples. On average, we identified 4% of samples as mislabeled in eight published datasets, highlighting the necessity of applying a “data cleaning” step before standard data analysis.
Original language: English
Article number: e0171324
Journal: PLoS ONE
Volume: 12
Issue number: 2
DOI: https://doi.org/10.1371/journal.pone.0171324
Publication status: Published - 2017

Fingerprint

Labels
Genotype
Cleaning
Throughput
phenotypic variation
sampling
data analysis
Datasets

Cite this

Zych, K., Snoek, B. L., Elvin, M., Rodriguez, M., Van Der Velde, K. J., Arends, D., ... Li, Y. (2017). reGenotyper: Detecting mislabeled samples in genetic data. PLoS ONE, 12(2), [e0171324]. https://doi.org/10.1371/journal.pone.0171324
Zych, Konrad ; Snoek, Basten L. ; Elvin, Mark ; Rodriguez, Miriam ; Van Der Velde, K.J. ; Arends, Danny ; Westra, Harm-Jan ; Swertz, Morris A. ; Poulin, Gino ; Kammenga, Jan E. ; Breitling, Rainer ; Jansen, Ritsert C. ; Li, Yang. / reGenotyper: Detecting mislabeled samples in genetic data. In: PLoS ONE. 2017 ; Vol. 12, No. 2.
@article{d63935b1cc8b46b3899ad0ee9418d2a4,
title = "{reGenotyper}: Detecting mislabeled samples in genetic data",
abstract = "In high-throughput molecular profiling studies, genotype labels can be wrongly assigned at various experimental steps; the resulting mislabeled samples seriously reduce the power to detect the genetic basis of phenotypic variation. We have developed an approach to detect potential mislabeling, recover the “ideal” genotype and identify “best-matched” labels for mislabeled samples. On average, we identified 4{\%} of samples as mislabeled in eight published datasets, highlighting the necessity of applying a “data cleaning” step before standard data analysis.",
author = "Konrad Zych and Snoek, {Basten L.} and Mark Elvin and Miriam Rodriguez and {Van Der Velde}, K.J. and Danny Arends and Harm-Jan Westra and Swertz, {Morris A.} and Gino Poulin and Kammenga, {Jan E.} and Rainer Breitling and Jansen, {Ritsert C.} and Yang Li",
year = "2017",
doi = "10.1371/journal.pone.0171324",
language = "English",
volume = "12",
journal = "PLoS ONE",
issn = "1932-6203",
publisher = "Public Library of Science",
number = "2",
pages = "e0171324",
}

Zych, K, Snoek, BL, Elvin, M, Rodriguez, M, Van Der Velde, KJ, Arends, D, Westra, H-J, Swertz, MA, Poulin, G, Kammenga, JE, Breitling, R, Jansen, RC & Li, Y 2017, 'reGenotyper: Detecting mislabeled samples in genetic data', PLoS ONE, vol. 12, no. 2, e0171324. https://doi.org/10.1371/journal.pone.0171324

reGenotyper: Detecting mislabeled samples in genetic data. / Zych, Konrad; Snoek, Basten L.; Elvin, Mark; Rodriguez, Miriam; Van Der Velde, K.J.; Arends, Danny; Westra, Harm-Jan; Swertz, Morris A.; Poulin, Gino; Kammenga, Jan E.; Breitling, Rainer; Jansen, Ritsert C.; Li, Yang.

In: PLoS ONE, Vol. 12, No. 2, e0171324, 2017.

TY  - JOUR
T1  - reGenotyper: Detecting mislabeled samples in genetic data
AU  - Zych, Konrad
AU  - Snoek, Basten L.
AU  - Elvin, Mark
AU  - Rodriguez, Miriam
AU  - Van Der Velde, K.J.
AU  - Arends, Danny
AU  - Westra, Harm-Jan
AU  - Swertz, Morris A.
AU  - Poulin, Gino
AU  - Kammenga, Jan E.
AU  - Breitling, Rainer
AU  - Jansen, Ritsert C.
AU  - Li, Yang
PY  - 2017
Y1  - 2017
N2  - In high-throughput molecular profiling studies, genotype labels can be wrongly assigned at various experimental steps; the resulting mislabeled samples seriously reduce the power to detect the genetic basis of phenotypic variation. We have developed an approach to detect potential mislabeling, recover the “ideal” genotype and identify “best-matched” labels for mislabeled samples. On average, we identified 4% of samples as mislabeled in eight published datasets, highlighting the necessity of applying a “data cleaning” step before standard data analysis.
AB  - In high-throughput molecular profiling studies, genotype labels can be wrongly assigned at various experimental steps; the resulting mislabeled samples seriously reduce the power to detect the genetic basis of phenotypic variation. We have developed an approach to detect potential mislabeling, recover the “ideal” genotype and identify “best-matched” labels for mislabeled samples. On average, we identified 4% of samples as mislabeled in eight published datasets, highlighting the necessity of applying a “data cleaning” step before standard data analysis.
U2  - 10.1371/journal.pone.0171324
DO  - 10.1371/journal.pone.0171324
M3  - Article
VL  - 12
JO  - PLoS ONE
JF  - PLoS ONE
SN  - 1932-6203
IS  - 2
M1  - e0171324
ER  - 

Zych K, Snoek BL, Elvin M, Rodriguez M, Van Der Velde KJ, Arends D et al. reGenotyper: Detecting mislabeled samples in genetic data. PLoS ONE. 2017;12(2):e0171324. https://doi.org/10.1371/journal.pone.0171324