Pitfalls in the statistical analysis of microbiome amplicon sequencing data

Hendriek C. Boshuizen*, Dennis E. te Beest

*Corresponding author for this work

Research output: Contribution to journal › Article › Academic › peer-review

6 Citations (Scopus)

Abstract

Microbiome data have several characteristics that make them challenging to analyse statistically: they are compositional, high-dimensional and rich in zeros. A large array of statistical methods exists to analyse these data. Some are borrowed from other fields, such as ecology or RNA sequencing, while others are custom-made for microbiome data. Because the range of available methods is large and continuously expanding, researchers have to invest considerable effort in choosing which method(s) to apply. In this paper we list 14 statistical methods or approaches that we think should generally be avoided. In several cases this is because we believe the assumptions behind the method are unlikely to be met for microbiome data. In other cases we see methods used in ways for which they were not intended. We believe researchers would be helped by more critical evaluations of existing methods, as not all methods in use are suitable or have been sufficiently reviewed. We hope this paper contributes to a critical discussion of which methods are appropriate for the analysis of microbiome data.

Original language: English
Pages (from-to): 539-548
Journal: Molecular Ecology Resources
Volume: 23
Issue number: 3
Early online date: 4 Nov 2022
Publication status: Published - Apr 2023

Keywords

  • compositional data
  • microbiome
  • negative binomial regression
  • normalization
  • statistical methods
