Abstract
The analysis of historical biodiversity data is crucial to understanding the changes the biosphere has undergone over the past decades and even centuries. Much of this historical data is locked up in field notes, lab books and museum collections. To make the best use of such data, the error rate during the digitisation process should be kept as low as possible, lest an additional source of errors (and thus uncertainty) enter the data before analysis.
The Wijster dataset, consisting of ground beetles (Carabidae) collected by the Biological Station Wijster (Drenthe, NL), is the longest-running time series of terrestrial invertebrates in the world and has been ongoing since 1959. This huge dataset of close to 1 million specimens has never been fully digitised and is not openly accessible.
With financial support from NLBIF, we have set out to digitise the first part of the dataset (1959-1967), working out best practices and laying down benchmarks for the digitisation of the entire dataset. The goal of our project is to bring together all available (meta)data, create a reference dataset with an extremely low error rate (<0.01%) and explore the possibilities of automating the further digitisation process with self-learning algorithms. Our poster presents our workflow for the close-to-perfect digitisation of historical data.
Original language | English |
---|---|
Publication status | Published - 3 May 2024 |
Event | EOSC Empowering Biodiversity Research III Conference (EBR III), Naturalis Biodiversity Center, Leiden, Netherlands. Duration: 25 Mar 2024 → 26 Mar 2024 |
Conference/symposium
Conference/symposium | EOSC Empowering Biodiversity Research III Conference (EBR III) |
---|---|
Country/Territory | Netherlands |
City | Leiden |
Period | 25/03/24 → 26/03/24 |