Abstract
Estimates of the area of land cover classes or land change are frequently calculated from land cover classification maps by counting the pixels labeled as each class in the map. This procedure is known to produce biased area estimates for many widely used classification algorithms, including random forests. Poststratified estimation, using the mapped classes as strata, has been proposed to obtain unbiased estimates of class areas. However, this method requires additional sampling units, which may not be available and may not be the most efficient option, depending on the application. Alternatively, consistent estimates of class areas can be obtained from the class membership probability estimates of a random forest classification. This article demonstrates that, given a large sample and an appropriate set of explanatory variables, the error of the class membership probabilities predicted by a random forest classification converges to zero. Therefore, the expected class areas calculated from these probabilities converge to the population class areas. On average, the relative error of the expected class proportions computed from the class membership probabilities of a random forest model was 40 percentage points lower than that of the proportions estimated by pixel counting. Our proposed approach also performs comparably to the area-adjusted method, which is currently considered best practice by the remote sensing community. Based on our results, we recommend that class probability estimates always be retained and used to calculate expected class areas or area proportions. Our method reduces bias compared with statistics calculated by pixel counting and, under certain conditions, circumvents the need for poststratified area estimates.
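The contrast between pixel counting and expected class proportions can be illustrated with a minimal sketch. It assumes the per-pixel class membership probabilities are already available as an `(n_pixels, n_classes)` NumPy array (for example, the output of scikit-learn's `RandomForestClassifier.predict_proba`). For comparison, it also includes the usual poststratified (area-adjusted) estimator, p̂_j = Σ_i W_i n_ij / n_i·, computed from a reference sample. The function names and all numbers are hypothetical toy values for illustration, not the article's implementation or data.

```python
import numpy as np


def pixel_count_proportions(probs: np.ndarray) -> np.ndarray:
    """Class proportions from a hard map: each pixel gets its most probable class."""
    n_pixels, n_classes = probs.shape
    labels = probs.argmax(axis=1)  # hard label per pixel (ties go to the lowest index)
    counts = np.bincount(labels, minlength=n_classes)
    return counts / n_pixels


def expected_proportions(probs: np.ndarray) -> np.ndarray:
    """Expected class proportions: mean membership probability of each class."""
    return probs.mean(axis=0)


def poststratified_proportions(map_weights: np.ndarray,
                               error_counts: np.ndarray) -> np.ndarray:
    """Area-adjusted (poststratified) class proportions from a reference sample.

    map_weights[i]     -- proportion of the map labelled class i (stratum weight W_i)
    error_counts[i, j] -- sample units mapped as class i whose reference class is j
    Returns p_j = sum_i W_i * n_ij / n_i. for every reference class j.
    """
    row_totals = error_counts.sum(axis=1, keepdims=True)  # n_i. per map stratum
    return (map_weights[:, None] * error_counts / row_totals).sum(axis=0)


if __name__ == "__main__":
    # Toy, made-up probabilities for 4 pixels and 2 classes (rows sum to 1).
    probs = np.array([[0.60, 0.40],
                      [0.60, 0.40],
                      [0.30, 0.70],
                      [0.55, 0.45]])
    total_area_ha = 100.0  # hypothetical total mapped area

    print("pixel counting (ha):", pixel_count_proportions(probs) * total_area_ha)
    print("expected areas (ha):", expected_proportions(probs) * total_area_ha)

    # Toy, made-up reference sample: rows = map class, columns = reference class.
    error_counts = np.array([[40, 10],
                             [5, 45]])
    weights = pixel_count_proportions(probs)  # stratum weights from the hard map
    print("area-adjusted (ha):",
          poststratified_proportions(weights, error_counts) * total_area_ha)
```

On these toy numbers, pixel counting assigns 75 ha to class 0, whereas averaging the membership probabilities gives about 51 ha. The expected-area estimate needs no reference sample, while the area-adjusted estimate depends on one.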
Original language | English |
---|---|
Article number | 4402711 |
Number of pages | 11 |
Journal | IEEE Transactions on Geoscience and Remote Sensing |
Volume | 60 |
Early online date | 9 Jun 2021 |
Publication status | Published - 2022 |
Keywords
- Area estimation bias
- computational infrastructure
- optical data
- random forest and geographic information system (GIS)
- vegetation and land surface