Currently, most global land cover maps are produced with discrete classes, each of which expresses either the dominant land cover class in a pixel or a combination of several classes at a predetermined ratio. In contrast, land cover fraction mapping expresses the proportion of each pure class in each pixel, which increases precision and reduces legend complexity. To map land cover fractions, regression rather than classification algorithms are needed, and multiple approaches are available for this task.
A major challenge for land cover fraction mapping models is data sparsity. Land cover fraction data is by its nature zero-inflated, because the 0% fraction is so common. As regression favours the mean, 0% and 100% fractions are difficult for regression models to predict accurately. We proposed a new solution that combines three models: a binary model determines whether a pixel is pure; pure pixels are then processed with a classification model, and mixed pixels with a regression model.
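The routing between the three models can be sketched as follows. This is a minimal illustration rather than the implementation used in the study: the function name `three_step_predict`, the stand-in models, the toy feature values, and the clipping and rescaling of the regression output so that fractions sum to 100% are all assumptions made for the example.

```python
from typing import Callable, List, Sequence

Features = Sequence[float]


def three_step_predict(
    pixels: Sequence[Features],
    is_pure: Callable[[Features], bool],
    classify: Callable[[Features], int],
    regress: Callable[[Features], List[float]],
    n_classes: int,
) -> List[List[float]]:
    """Route each pixel through the binary, then classification or regression step."""
    fractions = []
    for x in pixels:
        if is_pure(x):
            # Pure pixel: the predicted class gets 100%, all other classes 0%.
            row = [0.0] * n_classes
            row[classify(x)] = 100.0
        else:
            # Mixed pixel (assumed post-processing): clip negative predictions
            # and rescale so that the fractions sum to 100%.
            raw = [max(0.0, v) for v in regress(x)]
            total = sum(raw)
            if total > 0:
                row = [100.0 * v / total for v in raw]
            else:
                row = [100.0 / n_classes] * n_classes
        fractions.append(row)
    return fractions


# Hypothetical stand-in models over three classes; real models would be trained.
def toy_is_pure(x: Features) -> bool:
    return max(x) > 0.9


def toy_classify(x: Features) -> int:
    return max(range(len(x)), key=lambda i: x[i])


def toy_regress(x: Features) -> List[float]:
    return [100.0 * v for v in x]


result = three_step_predict(
    [[0.95, 0.03, 0.02], [0.5, 0.25, 0.25]],
    toy_is_pure, toy_classify, toy_regress, n_classes=3,
)
```

In this toy run the first pixel is routed to the classifier and receives exact 0% and 100% fractions, which a regression model alone would rarely output, while the second pixel is routed to the regression step.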
We compared multiple regression algorithms and implemented our proposed three-step model on the algorithm with the lowest RMSE. We further evaluated the spatial and per-class accuracy of the model and demonstrated a wall-to-wall prediction of seven land cover fractions over the globe. The models were trained on over 138,000 points and validated on a separate dataset of over 20,000 points, provided by the CGLS-LC100 project. Both datasets are global and aligned with the PROBA-V 100 m UTM grid.
Results showed that the random forest regression model reached the lowest RMSE (17.3%). The lowest MAE (7.9%) and the highest overall accuracy (72% ± 2%) were achieved using random forest with our proposed three-model approach and median vote.
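To illustrate the error metrics and why the vote type matters for zero-inflated data, the sketch below computes RMSE and MAE and compares a mean vote with a median vote. It assumes that "median vote" means taking the median instead of the mean of the per-tree predictions; the per-tree values are made up for the example and are not results from the study.

```python
import math
from statistics import mean, median


def rmse(pred, ref):
    """Root mean squared error between predicted and reference fractions (%)."""
    return math.sqrt(mean((p - r) ** 2 for p, r in zip(pred, ref)))


def mae(pred, ref):
    """Mean absolute error between predicted and reference fractions (%)."""
    return mean(abs(p - r) for p, r in zip(pred, ref))


# Made-up per-tree predictions for one pixel whose reference fraction is 0%.
tree_predictions = [0.0, 0.0, 0.0, 10.0, 40.0]

mean_vote = mean(tree_predictions)      # pulled upwards by two trees
median_vote = median(tree_predictions)  # stays at the most common value

reference = [0.0]
error_mean_vote = mae([mean_vote], reference)
error_median_vote = mae([median_vote], reference)
```

With these made-up values the mean vote is 10% and the median vote is 0%, so the median vote reproduces the 0% reference exactly while the mean vote does not.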
This research demonstrates that machine learning algorithms can be applied globally to map a wide variety of land cover fractions. Fraction mapping expresses land cover more precisely and empowers users to create their own discrete maps using user-defined thresholds and rules, which allows the result to be customised for a diverse range of uses. The three-step approach is useful for addressing the zero-inflation issue and mapping 0% and 100% fractions more accurately, and it has already been taken up in the operational production of global land cover fraction layers within the CGLS-LC100 project. Furthermore, this study contributes to the accuracy assessment of land cover fraction maps both thematically and spatially, and these methods could be taken up by future land cover fraction mapping efforts.
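As an illustration of how a user might derive a discrete map from fractions, the sketch below applies one hypothetical rule to a single pixel. The class names, the rule, the threshold values, and the function `to_discrete` are assumptions for the example, not the legend or rules of any particular product.

```python
from typing import Dict


def to_discrete(fractions: Dict[str, float], forest_threshold: float = 60.0) -> str:
    """Map one pixel's fractions (in %) to a discrete label with a user rule.

    Assumes the pixel has at least one class other than "tree".
    """
    # Hypothetical rule: label the pixel "forest" once tree cover reaches
    # the user's own threshold.
    if fractions.get("tree", 0.0) >= forest_threshold:
        return "forest"
    # Otherwise fall back to the dominant remaining class.
    others = {name: value for name, value in fractions.items() if name != "tree"}
    return max(others, key=others.get)


pixel = {"tree": 40.0, "grass": 35.0, "crops": 25.0}

strict_label = to_discrete(pixel, forest_threshold=60.0)   # closed-forest user
lenient_label = to_discrete(pixel, forest_threshold=30.0)  # open-forest user
```

The same fractions yield "grass" under the strict threshold and "forest" under the lenient one, which is the kind of customisation a single discrete map cannot offer.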
- Global land cover mapping
- Land cover fraction mapping
- Machine learning
- Neural network
- Random forest
- Spatial accuracy
- Support vector regression
- Time series analysis
- Zero inflation