Concordance of SNP- and allele-based typing workflows in the context of a large-scale international Salmonella Enteritidis outbreak investigation

Claudia E. Coipan*, Timothy J. Dallman, Derek Brown, Hassan Hartman, Menno van der Voort, Redmar R. van den Berg, Daniel Palm, Saara Kotila, Tom van Wijk, Eelco Franz

*Corresponding author for this work

    Research output: Contribution to journal › Article › Academic › peer-review

    18 Citations (Scopus)

    Abstract

    A large European multi-country Salmonella enterica serovar Enteritidis outbreak associated with Polish eggs was characterized by whole-genome sequencing (WGS)-based analysis, with various European institutes using different analysis workflows to identify isolates potentially related to the outbreak. The objective of our study was to compare the output of six of these typing workflows (distance matrices from either SNP-based or allele-based workflows) in terms of cluster detection and concordance. To this end, we analysed a set of 180 isolates from confirmed and probable outbreak cases, representative of the genetic variation within the outbreak, supplemented with 22 unrelated contemporaneous S. enterica serovar Enteritidis isolates. Since defining a cluster cut-off based on genetic distance requires prior knowledge of the evolutionary processes that govern the bacterial populations in question, we used several hierarchical clustering methods (single, average and complete linkage) and selected the optimal number of clusters based on the consensus of the silhouette, Dunn2 and McClain–Rao internal validation indices. External validation was done by calculating the concordance with the WGS-based case definition for this outbreak (SNP-address) using the Fowlkes–Mallows index. Our analysis indicates that, with complete-linkage hierarchical clustering combined with the optimal number of clusters as defined by the three internal validity indices, the six allele- and SNP-based typing workflows generate clusters with similar compositions. Furthermore, we show that even in the absence of coordinated typing procedures, by using an unsupervised machine learning methodology for cluster delineation, the workflows currently in use by six European public-health authorities can identify concordant clusters of genetically related S. enterica serovar Enteritidis isolates, thus providing public-health researchers with comparable tools for the detection of infectious-disease outbreaks.
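    The clustering and validation steps described in the abstract can be illustrated with a minimal sketch. It assumes a symmetric pairwise distance matrix (for example, SNP or allele differences between isolates) and uses plain Python rather than the authors' tooling, which the abstract does not name. For brevity it applies only complete linkage and picks the number of clusters k by the silhouette index alone, whereas the study compared single, average and complete linkage and used a consensus of silhouette, Dunn2 and McClain–Rao. The Fowlkes–Mallows index is computed over pairs of isolates as FM = TP / sqrt((TP + FP)(TP + FN)). The toy distance matrix TOY_SNP_DIST, the REFERENCE labels (a stand-in for SNP-address assignments) and all function names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def complete_linkage(dist, k):
    """Naive agglomerative clustering with complete linkage: repeatedly merge
    the two clusters whose largest pairwise distance is smallest, until k remain.
    Returns one integer cluster label per isolate."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > k:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            d = max(dist[i][j] for i in clusters[a] for j in clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])  # a < b, so deleting b keeps index a valid
        del clusters[b]
    labels = [0] * len(dist)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


def silhouette(dist, labels):
    """Mean silhouette width (needs at least two clusters); singletons score 0."""
    n = len(dist)
    total = 0.0
    for i in range(n):
        own = [dist[i][j] for j in range(n) if j != i and labels[j] == labels[i]]
        if not own:
            continue  # singleton cluster contributes 0 to the sum
        a = sum(own) / len(own)  # mean distance to own cluster
        b = min(  # mean distance to the nearest other cluster
            sum(dist[i][j] for j in range(n) if labels[j] == c) / labels.count(c)
            for c in set(labels) if c != labels[i]
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / n


def best_k(dist, candidates):
    """Pick the k with the highest mean silhouette (the first one wins ties)."""
    return max(candidates, key=lambda k: silhouette(dist, complete_linkage(dist, k)))


def fowlkes_mallows(pred, ref):
    """FM = TP / sqrt((TP + FP) * (TP + FN)), counted over pairs of isolates."""
    pairs = list(combinations(range(len(pred)), 2))
    same_pred = {p for p in pairs if pred[p[0]] == pred[p[1]]}
    same_ref = {p for p in pairs if ref[p[0]] == ref[p[1]]}
    if not same_pred or not same_ref:
        return 0.0
    return len(same_pred & same_ref) / sqrt(len(same_pred) * len(same_ref))


# Hypothetical pairwise SNP distances for six isolates (not data from the study).
TOY_SNP_DIST = [
    [0, 1, 2, 30, 31, 50],
    [1, 0, 1, 29, 30, 50],
    [2, 1, 0, 30, 31, 50],
    [30, 29, 30, 0, 1, 45],
    [31, 30, 31, 1, 0, 45],
    [50, 50, 50, 45, 45, 0],
]
# Hypothetical reference assignment standing in for the SNP-address case definition.
REFERENCE = ["A", "A", "A", "B", "B", "C"]

k = best_k(TOY_SNP_DIST, range(2, len(TOY_SNP_DIST)))
labels = complete_linkage(TOY_SNP_DIST, k)
print(k, labels, fowlkes_mallows(labels, REFERENCE))  # → 3 [0, 0, 0, 1, 1, 2] 1.0
```

    Complete linkage merges two groups only when every isolate in one lies close to every isolate in the other, which tends to give compact clusters; the naive loop above is cubic or worse in the number of isolates and is meant only to make the procedure explicit.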

    Original language: English
    Article number: 000318
    Journal: Microbial Genomics
    Volume: 6
    Issue number: 3
    Publication status: Published, 26 Feb 2020

    Keywords

    • Epidemiology
    • Hierarchical clustering
    • Infectious disease
    • Surveillance
    • Unsupervised machine learning
    • Whole-genome sequencing
