TY - JOUR
T1 - Relationships between geo-spatial features and COVID-19 hospitalisations revealed by machine learning models and SHAP values
AU - Chu, Lixia
AU - Nelen, Jeroen
AU - Crivellari, Alessandro
AU - Masiliūnas, Dainius
AU - Hein, Carola
AU - Lofi, Christoph
PY - 2024/5/27
Y1 - 2024/5/27
N2 - Uncovering relationships between geospatial features and COVID-19 features is a comprehensive, confounding, cross-disciplinary and challenging topic, as the spread and effects of COVID-19 are related to many aspects of our lives, including socio-economic, cultural, and environmental features. Our research aims to provide an innovative data-driven method to uncover the relationships between the heterogeneous and cross-disciplinary geospatial features with COVID-19 features at the municipality scale in Germany. We exploit these relationships using supervised machine learning, explainable AI and spatial analysis in Germany from March 2020 to October 2021. First, we integrated multi-source data including social data, economic data, cultural data, air pollution data and COVID-19 features data into one spatiotemporally harmonised dataset. Second, we trained three machine learning models (a Support Vector Regressor, a Random Forest, and a Light Gradient Boosting Machine) on the integrated dataset to learn the relationships between the spatial features and the COVID-19 features. Third, we used Shapley Additive exPlanations (SHAP) to rank the relevance of each feature. After that, we illustrated the results by the visualised spatial differences within municipalities. The output delivers key information regarding the Covid hospitalisation rate with the control of NO2 concentration and education level in Germany with transferable methods.
AB - Uncovering relationships between geospatial features and COVID-19 features is a comprehensive, confounding, cross-disciplinary and challenging topic, as the spread and effects of COVID-19 are related to many aspects of our lives, including socio-economic, cultural, and environmental features. Our research aims to provide an innovative data-driven method to uncover the relationships between the heterogeneous and cross-disciplinary geospatial features with COVID-19 features at the municipality scale in Germany. We exploit these relationships using supervised machine learning, explainable AI and spatial analysis in Germany from March 2020 to October 2021. First, we integrated multi-source data including social data, economic data, cultural data, air pollution data and COVID-19 features data into one spatiotemporally harmonised dataset. Second, we trained three machine learning models (a Support Vector Regressor, a Random Forest, and a Light Gradient Boosting Machine) on the integrated dataset to learn the relationships between the spatial features and the COVID-19 features. Third, we used Shapley Additive exPlanations (SHAP) to rank the relevance of each feature. After that, we illustrated the results by the visualised spatial differences within municipalities. The output delivers key information regarding the Covid hospitalisation rate with the control of NO2 concentration and education level in Germany with transferable methods.
U2 - 10.1080/17538947.2024.2358851
DO - 10.1080/17538947.2024.2358851
M3 - Article
SN - 1753-8947
VL - 17
JO - International Journal of Digital Earth
JF - International Journal of Digital Earth
IS - 1
M1 - 2358851
ER -