TY - JOUR
T1 - Improving performance of spatio-temporal machine learning models using forward feature selection and target-oriented validation
AU - Meyer, Hanna
AU - Reudenbach, Christoph
AU - Hengl, Tomislav
AU - Katurji, Marwan
AU - Nauss, Thomas
PY - 2018/3
Y1 - 2018/3
AB - Importance of target-oriented validation strategies for spatio-temporal prediction models is illustrated using two case studies: (1) modelling of air temperature (Tair) in Antarctica, and (2) modelling of volumetric water content (VW) for the R.J. Cook Agronomy Farm, USA. Performance of a random k-fold cross-validation (CV) was compared to three target-oriented strategies: Leave-Location-Out (LLO), Leave-Time-Out (LTO), and Leave-Location-and-Time-Out (LLTO) CV. Results indicate that considerable differences between random k-fold (R² = 0.9 for Tair and 0.92 for VW) and target-oriented CV (LLO R² = 0.24 for Tair and 0.49 for VW) exist, highlighting the need for target-oriented validation to avoid an overoptimistic view on models. Differences between random k-fold and target-oriented CV indicate spatial over-fitting caused by misleading variables. To decrease over-fitting, a forward feature selection in conjunction with target-oriented CV is proposed. It decreased over-fitting and simultaneously improved target-oriented performances (LLO CV R² = 0.47 for Tair and 0.55 for VW).
KW - Cross-validation
KW - Feature selection
KW - Over-fitting
KW - Random forest
KW - Spatio-temporal
KW - Target-oriented validation
U2 - 10.1016/j.envsoft.2017.12.001
DO - 10.1016/j.envsoft.2017.12.001
M3 - Article
AN - SCOPUS:85038035153
SN - 1364-8152
VL - 101
SP - 1
EP - 9
JO - Environmental Modelling and Software
JF - Environmental Modelling and Software
ER -