TY - JOUR
T1 - Concordance of SNP- and allele-based typing workflows in the context of a large-scale international Salmonella Enteritidis outbreak investigation
AU - Coipan, Claudia E.
AU - Dallman, Timothy J.
AU - Brown, Derek
AU - Hartman, Hassan
AU - van der Voort, Menno
AU - van den Berg, Redmar R.
AU - Palm, Daniel
AU - Kotila, Saara
AU - van Wijk, Tom
AU - Franz, Eelco
PY - 2020/2/26
Y1 - 2020/2/26
N2 - A large European multi-country Salmonella enterica serovar Enteritidis outbreak associated with Polish eggs was characterized by whole-genome sequencing (WGS)-based analysis, with various European institutes using different analysis workflows to identify isolates potentially related to the outbreak. The objective of our study was to compare the output of six of these different typing workflows (distance matrices of either SNP-based or allele-based workflows) in terms of cluster detection and concordance. To this end, we analysed a set of 180 isolates coming from confirmed and probable outbreak cases, which were representative of the genetic variation within the outbreak, supplemented with 22 unrelated contemporaneous S. enterica serovar Enteritidis isolates. Since the definition of a cluster cut-off based on genetic distance requires prior knowledge on the evolutionary processes that govern the bacterial populations in question, we used a variety of hierarchical clustering methods (single, average and complete) and selected the optimal number of clusters based on the consensus of the silhouette, Dunn2, and McClain–Rao internal validation indices. External validation was done by calculating the concordance with the WGS-based case definition (SNP-address) for this outbreak using the Fowlkes–Mallows index. Our analysis indicates that with complete-linkage hierarchical clustering combined with the optimal number of clusters, as defined by three internal validity indices, the six different allele- and SNP-based typing workflows generate clusters with similar compositions. Furthermore, we show that even in the absence of coordinated typing procedures, but by using an unsupervised machine learning methodology for cluster delineation, the various workflows that are currently in use by six European public-health authorities can identify concordant clusters of genetically related S. enterica serovar Enteritidis isolates; thus, providing public-health researchers with comparable tools for detection of infectious-disease outbreaks.
AB - A large European multi-country Salmonella enterica serovar Enteritidis outbreak associated with Polish eggs was characterized by whole-genome sequencing (WGS)-based analysis, with various European institutes using different analysis workflows to identify isolates potentially related to the outbreak. The objective of our study was to compare the output of six of these different typing workflows (distance matrices of either SNP-based or allele-based workflows) in terms of cluster detection and concordance. To this end, we analysed a set of 180 isolates coming from confirmed and probable outbreak cases, which were representative of the genetic variation within the outbreak, supplemented with 22 unrelated contemporaneous S. enterica serovar Enteritidis isolates. Since the definition of a cluster cut-off based on genetic distance requires prior knowledge on the evolutionary processes that govern the bacterial populations in question, we used a variety of hierarchical clustering methods (single, average and complete) and selected the optimal number of clusters based on the consensus of the silhouette, Dunn2, and McClain–Rao internal validation indices. External validation was done by calculating the concordance with the WGS-based case definition (SNP-address) for this outbreak using the Fowlkes–Mallows index. Our analysis indicates that with complete-linkage hierarchical clustering combined with the optimal number of clusters, as defined by three internal validity indices, the six different allele- and SNP-based typing workflows generate clusters with similar compositions. Furthermore, we show that even in the absence of coordinated typing procedures, but by using an unsupervised machine learning methodology for cluster delineation, the various workflows that are currently in use by six European public-health authorities can identify concordant clusters of genetically related S. enterica serovar Enteritidis isolates; thus, providing public-health researchers with comparable tools for detection of infectious-disease outbreaks.
KW - Epidemiology
KW - Hierarchical clustering
KW - Infectious disease
KW - Surveillance
KW - Unsupervised machine learning
KW - Whole-genome sequencing
U2 - 10.1099/mgen.0.000318
DO - 10.1099/mgen.0.000318
M3 - Article
C2 - 32101514
AN - SCOPUS:85082779755
SN - 2057-5858
VL - 6
JO - Microbial Genomics
JF - Microbial Genomics
IS - 3
M1 - 000318
ER -