Advancing Data Quality Assurance with Machine Learning: A Case Study on Wind Vane Stalling Detection

Vincent S. de Feiter*, Jessica M.I. Strickland, Irene Garcia-Marti

*Corresponding author for this work

Research output: Contribution to journal, Article, Academic, peer-review

Abstract

High-quality observational datasets are essential for climate research and models, but validating and filtering decades of meteorological measurements is an enormous task. Advances in machine learning provide opportunities to expedite and improve quality control while offering insight into non-linear interactions between the meteorological variables. The Cabauw Experimental Site for Atmospheric Research in the Netherlands, known for its 213 m observation mast, has provided in situ observations for over 50 years. Despite high-quality instrumentation, measurement errors or non-representative data are inevitable. We explore machine-learning-assisted quality control, focusing on wind vane stalling at 10 m height. Wind vane stalling is treated as a binary classification problem as we evaluate five supervised methods (Logistic Regression, K-Nearest Neighbour, Random Forest, Gaussian Naive Bayes, Support Vector Machine) and one semi-supervised method (One-Class Support Vector Machine). Our analysis determines that wind vane stalling occurred 4.54% of the time annually over 20 years, often during stably stratified nocturnal conditions. The K-Nearest Neighbour and Random Forest methods performed the best, identifying stalling with approximately 75% accuracy, while others were more affected by data imbalance (more non-stalling than stalling data points). The semi-supervised method, avoiding the effects of the inherent data imbalance, also yielded promising results towards advancing data quality assurance.
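
The effect of the data imbalance mentioned above can be illustrated with a small worked example. The sketch below is not the authors' evaluation code: the 10,000-sample label set is synthetic, sized only so that 4.54% of the points are stalling, and the helper names (`confusion_counts`, `metrics`) are hypothetical. It shows that a trivial classifier which never flags stalling still scores high plain accuracy, which is why imbalance-aware measures such as recall or balanced accuracy are usually reported alongside it.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 = stalling."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def metrics(y_true, y_pred):
    """Compute plain accuracy, recall, specificity and balanced accuracy."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    recall = tp / (tp + fn) if (tp + fn) else 0.0        # stalling cases found
    specificity = tn / (tn + fp) if (tn + fp) else 0.0   # non-stalling kept
    balanced_accuracy = (recall + specificity) / 2
    return {
        "accuracy": accuracy,
        "recall": recall,
        "specificity": specificity,
        "balanced_accuracy": balanced_accuracy,
    }


# Synthetic labels (assumption): 454 stalling points out of 10,000 (4.54%).
n_total = 10_000
n_stalling = 454
y_true = [1] * n_stalling + [0] * (n_total - n_stalling)

# Trivial baseline that always predicts "not stalling".
y_pred = [0] * n_total

baseline = metrics(y_true, y_pred)
for name, value in baseline.items():
    print(f"{name}: {value:.4f}")
```

Under these assumptions the baseline's plain accuracy equals the share of non-stalling points, while its recall for stalling is zero and its balanced accuracy is 0.5, so accuracy alone says little about how well stalling is detected.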

Original language: English
Article number: 129
Journal: Atmosphere
Volume: 16
Issue number: 2
Publication status: Published - 25 Jan 2025

UN SDGs

This output contributes to the following UN Sustainable Development Goals (SDGs)

  1. SDG 13 - Climate Action
