TY - JOUR
T1 - Better, Not Just More
T2 - Data-centric machine learning for Earth observation
AU - Roscher, Ribana
AU - Russwurm, Marc
AU - Gevaert, Caroline
AU - Kampffmeyer, Michael
AU - Dos Santos, Jefersson A.
AU - Vakalopoulou, Maria
AU - Hansch, Ronny
AU - Hansen, Stine
AU - Nogueira, Keiller
AU - Prexl, Jonathan
AU - Tuia, Devis
PY - 2024
Y1 - 2024
AB - Recent developments and research in modern machine learning have led to substantial improvements in the geospatial field. Although numerous deep learning architectures and models have been proposed, the majority of them have been solely developed on benchmark datasets that lack strong real-world relevance. Furthermore, the performance of many methods has already saturated on these datasets. We argue that a shift from a model-centric view to a complementary data-centric perspective is necessary for further improvements in accuracy, generalization ability, and real impact on end-user applications. Furthermore, considering the entire machine learning cycle - from problem definition to model deployment with feedback - is crucial for enhancing machine learning models that can be reliable in unforeseen situations. This work presents a definition as well as a precise categorization and overview of automated data-centric learning approaches for geospatial data. It highlights the complementary role of data-centric learning with respect to model-centric learning in the larger machine learning deployment cycle. We review papers across the entire geospatial field and categorize them into different groups. A set of representative experiments shows concrete implementation examples. These examples provide concrete steps to act on geospatial data with data-centric machine learning approaches.
U2 - 10.1109/MGRS.2024.3470986
DO - 10.1109/MGRS.2024.3470986
M3 - Article
AN - SCOPUS:85208550164
SN - 2473-2397
VL - 12
SP - 335
EP - 355
JO - IEEE Geoscience and Remote Sensing Magazine
JF - IEEE Geoscience and Remote Sensing Magazine
IS - 4
ER -