Semi-automated sequence curation for reliable reference datasets in ITS2 vascular plant DNA (meta-) barcoding

Andreia Quaresma, Markus J. Ankenbrand, Carlos Ariel Yadró Garcia, José Rufino, Mónica Honrado, Joana Amaral, Robert Brodschneider, Valters Brusbardis, Kristina Gratzer, Fani Hatjina, Ole Kilpinen, Marco Pietropaoli, I. Roessink, Jozef van der Steen, Flemming Vejsnaes, Maria Alice Pinto, Alexander Keller*

*Corresponding author for this work

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

One of the most critical requirements for accurate taxonomic identification in DNA (meta)barcoding is a reliable DNA reference sequence dataset for the marker of choice. Developing such a dataset has therefore been a long-term ambition, especially for the kingdom Viridiplantae. Typically, reference datasets are constructed from sequences downloaded from general public databases, which can carry taxonomic and other relevant errors. Herein, we constructed curated datasets for the ITS2 barcoding marker of vascular plants: (i) a global dataset, (ii) a European crop dataset, and (iii) 27 datasets for the EU countries. To that end, we first developed a pipeline script that entails (i) an automated curation stage comprising five filters, (ii) manual taxonomic correction of misclassified taxa, and (iii) manual addition of newly sequenced species. The pipeline allows easy updating of the curated datasets. With this approach, 13% of the sequences, corresponding to 7% of the species originally imported from GenBank, were discarded. Further, 259 sequences were manually added to the curated global dataset, which now comprises 307,977 sequences of 111,382 plant species.
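
The abstract does not list the five filters, so the following is only a minimal sketch of how an automated filter stage of this kind might be structured. The three filters shown, their thresholds, and all names (`Record`, `curate`, `discarded_fraction`) are illustrative assumptions and are not taken from the authors' pipeline.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Record:
    """One reference sequence with its taxonomic label (hypothetical layout)."""
    accession: str
    species: str   # expected to be a binomial name
    sequence: str


# --- Illustrative filters (assumptions, NOT the five filters of the paper) ---

def has_plausible_length(rec: Record, min_len: int = 100, max_len: int = 700) -> bool:
    """Keep sequences whose length lies in an assumed plausible range."""
    return min_len <= len(rec.sequence) <= max_len


def has_few_ambiguous_bases(rec: Record, max_fraction: float = 0.02) -> bool:
    """Keep sequences with at most an assumed fraction of non-ACGT characters."""
    if not rec.sequence:
        return False
    ambiguous = sum(1 for base in rec.sequence.upper() if base not in "ACGT")
    return ambiguous / len(rec.sequence) <= max_fraction


def has_species_level_name(rec: Record) -> bool:
    """Keep records labelled with a binomial rather than 'Genus sp.' and similar."""
    parts = rec.species.split()
    return len(parts) >= 2 and parts[1].rstrip(".") not in {"sp", "spp", "cf", "aff"}


FILTERS: List[Callable[[Record], bool]] = [
    has_plausible_length,
    has_few_ambiguous_bases,
    has_species_level_name,
]


def curate(records: Iterable[Record],
           filters: List[Callable[[Record], bool]] = FILTERS) -> List[Record]:
    """Return only the records that pass every filter."""
    return [rec for rec in records if all(check(rec) for check in filters)]


def discarded_fraction(total: int, kept: int) -> float:
    """Fraction of records removed by curation (0.0 for an empty input)."""
    return 0.0 if total == 0 else 1.0 - kept / total
```

Keeping each filter as an independent predicate makes it straightforward to rerun the same stage on a fresh download, which is in the spirit of the easy updating the abstract mentions. Manual taxonomic correction and manual addition of sequences would remain separate steps outside this sketch.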
Original language: English
Article number: 129
Journal: Scientific Data
Volume: 11
Publication status: Published - 25 Jan 2024

Keywords

  • Sequence curation
  • ITS2 vascular plant DNA
  • meta barcoding

Datasets

  • ITS2 Global database

    Quaresma, A. (Creator), Amaral, J. (Creator), Brodschneider, R. (Creator), Brusbardis, V. (Creator), Gratzer, K. (Creator), Hatjina, F. (Creator), Honrado, M. (Creator), Kilpinen, O. (Creator), Pietropaoli, M. (Creator), Roessink, I. (Creator), Rufino, J. (Creator), van der Steen, J. (Creator), Vejsnaes, F. (Creator), Yadró Garcia, C. A. (Creator), Pinto, M. A. (Creator) & Keller, A. (Creator), Instituto Politécnico de Bragança, 24 May 2023

    Dataset

  • ITS2 Crop database

    Quaresma, A. (Creator), Amaral, J. (Creator), Brodschneider, R. (Creator), Brusbardis, V. (Creator), Gratzer, K. (Creator), Hatjina, F. (Creator), Honrado, M. (Creator), Kilpinen, O. (Creator), Pietropaoli, M. (Creator), Roessink, I. (Creator), Rufino, J. (Creator), van der Steen, J. (Creator), Vejsnaes, F. (Creator), Yadró Garcia, C. A. (Creator), Pinto, M. A. (Creator) & Keller, A. (Creator), Instituto Politécnico de Bragança, 25 May 2023

    Dataset

  • ITS2 European countries

    Quaresma, A. (Creator), Amaral, J. (Creator), Brodschneider, R. (Creator), Brusbardis, V. (Creator), Gratzer, K. (Creator), Hatjina, F. (Creator), Honrado, M. (Creator), Kilpinen, O. (Creator), Pietropaoli, M. (Creator), Roessink, I. (Creator), Rufino, J. (Creator), van der Steen, J. (Creator), Vejsnaes, F. (Creator), Yadró Garcia, C. A. (Creator), Pinto, M. A. (Creator) & Keller, A. (Creator), Instituto Politécnico de Bragança, 25 May 2023

    Dataset
