Accounting for access costs in validation of soil maps: A comparison of design-based sampling strategies

Lin Yang, Dick J. Brus, A.X. Zhu, Xinming Li, Jingjing Shi

Research output: Contribution to journal › Article › Academic › peer-review

3 Citations (Scopus)

Abstract

The quality of soil maps can best be estimated by collecting additional data at locations selected by probability sampling. These data can be used in design-based estimation of map quality measures such as the population mean of the squared prediction errors (MSE) for continuous soil maps and overall accuracy for categorical soil maps. In areas with large differences in access costs it can be attractive to account for these differences in selecting validation locations. In this paper two types of sampling design are compared that take access costs into account: sampling with probabilities proportional to size (pps) and stratified simple random sampling (STSI). In pps the inverse of the square root of the access costs is used as a size variable. Two estimators of MSE are applied, the Hansen-Hurwitz and Hajek estimators. In STSI optimal strata are constructed based on access costs. Simple random sampling (SI) is taken as a reference design. The sampling strategies were compared on the basis of: 1) the variance of the estimated MSE; 2) the variance of the total pointwise access costs; 3) the 95th percentile of the sampling distribution of the total access costs. The comparison was done at equal expected total pointwise access costs. The sampling strategies were compared in a simulation study and a real-world case study in Anhui, China. In the case study, car travel and hiking costs were considered in computing access costs per point. The results showed that the variance of estimated MSE with pps(Hansen-Hurwitz) was larger than with pps(Hajek) and STSI. The variances of estimated MSE of pps(Hajek) and STSI were about equal and smaller than that of SI. The gain in precision compared to SI depends on the cost distribution. The larger the coefficient of variation of the costs, the larger the gain. The 95th percentile of the sampling distribution of the total pointwise access costs with STSI was smaller than with pps and SI. The gain in precision of pps(Hajek) and STSI was about 30% when accounting for hiking costs only, and about 10% when accounting for the sum of car travel and hiking costs in the case study. The proposed sampling strategies are of interest for surveying any soil property in areas with marked differences in access costs, not just for validation of soil maps.
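The pps strategy described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes draws are made with replacement, the population of costs and squared prediction errors is synthetic, and all function and variable names are hypothetical. The size variable is the inverse square root of the access cost, as stated in the abstract, and the two estimators follow their standard textbook forms.

```python
import math
import random


def pps_draw_probabilities(costs):
    """Draw probabilities proportional to 1/sqrt(cost), the size variable named in the abstract."""
    sizes = [1.0 / math.sqrt(c) for c in costs]
    total = sum(sizes)
    return [s / total for s in sizes]


def pps_sample(probs, n, rng):
    """Select n unit indices with replacement (an assumption of this sketch)."""
    return rng.choices(range(len(probs)), weights=probs, k=n)


def hansen_hurwitz_mean(values, probs, sample):
    """Hansen-Hurwitz estimate of the population mean, using the known population size."""
    n_pop = len(probs)
    return sum(values[i] / probs[i] for i in sample) / (len(sample) * n_pop)


def hajek_mean(values, probs, sample):
    """Hajek (ratio) estimate: the known population size is replaced by its estimate."""
    numerator = sum(values[i] / probs[i] for i in sample)
    denominator = sum(1.0 / probs[i] for i in sample)
    return numerator / denominator


def expected_total_cost(costs, probs, n):
    """Expected total pointwise access cost of n draws."""
    return n * sum(p * c for p, c in zip(probs, costs))


# Hypothetical population: 1000 candidate validation points with made-up
# access costs and squared prediction errors (not data from the paper).
rng = random.Random(42)
costs = [rng.uniform(1.0, 25.0) for _ in range(1000)]
squared_errors = [rng.gauss(0.0, 1.0) ** 2 for _ in range(1000)]

probs = pps_draw_probabilities(costs)
sample = pps_sample(probs, 50, rng)
mse_hh = hansen_hurwitz_mean(squared_errors, probs, sample)
mse_hajek = hajek_mean(squared_errors, probs, sample)
print(f"Hansen-Hurwitz: {mse_hh:.3f}  Hajek: {mse_hajek:.3f}")
```

Because cheap points are drawn more often, the expected cost per point in this sketch is lower than under simple random sampling, so a fixed budget buys a larger sample. The Hajek form returns exactly the constant when all values are equal, which the Hansen-Hurwitz form does not guarantee for a single sample.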
Language: English
Pages: 160-169
Journal: Geoderma
Volume: 315
DOI: 10.1016/j.geoderma.2017.11.028
Publication status: Published - 1 Apr 2018

Fingerprint

sampling
cost
hiking
prediction
comparison
soil map
case studies
travel
automobile
China
surveying
soil properties

Keywords

  • Digital soil mapping
  • Mean squared error
  • Optimal stratification
  • Probability sampling
  • Sampling with probabilities-proportional-to-size
  • Stratified random sampling

Cite this

Yang, Lin; Brus, Dick J.; Zhu, A.X.; Li, Xinming; Shi, Jingjing. Accounting for access costs in validation of soil maps: A comparison of design-based sampling strategies. In: Geoderma. 2018; Vol. 315, pp. 160-169.
@article{9b42bbb0e55c46fb8231dc086404c45f,
title = "Accounting for access costs in validation of soil maps: A comparison of design-based sampling strategies",
abstract = "The quality of soil maps can best be estimated by collecting additional data at locations selected by probability sampling. These data can be used in design-based estimation of map quality measures such as the population mean of the squared prediction errors (MSE) for continuous soil maps and overall accuracy for categorical soil maps. In areas with large differences in access costs it can be attractive to account for these differences in selecting validation locations. In this paper two types of sampling design are compared that take access costs into account: sampling with probabilities proportional to size (pps) and stratified simple random sampling (STSI). In pps the inverse of the square root of the access costs is used as a size variable. Two estimators of MSE are applied, the Hansen-Hurwitz and Hajek estimators. In STSI optimal strata are constructed based on access costs. Simple random sampling (SI) is taken as a reference design. The sampling strategies were compared on the basis of: 1) the variance of the estimated MSE; 2) the variance of the total pointwise access costs; 3) the 95th percentile of the sampling distribution of the total access costs. The comparison was done at equal expected total pointwise access costs. The sampling strategies were compared in a simulation study and a real-world case study in Anhui, China. In the case study, car travel and hiking costs were considered in computing access costs per point. The results showed that the variance of estimated MSE with pps(Hansen-Hurwitz) was larger than with pps(Hajek) and STSI. The variances of estimated MSE of pps(Hajek) and STSI were about equal and smaller than that of SI. The gain in precision compared to SI depends on the cost distribution. The larger the coefficient of variation of the costs, the larger the gain. The 95th percentile of the sampling distribution of the total pointwise access costs with STSI was smaller than with pps and SI. The gain in precision of pps(Hajek) and STSI was about 30{\%} when accounting for hiking costs only, and about 10{\%} when accounting for the sum of car travel and hiking costs in the case study. The proposed sampling strategies are of interest for surveying any soil property in areas with marked differences in access costs, not just for validation of soil maps.",
keywords = "Digital soil mapping, Mean squared error, Optimal stratification, Probability sampling, Sampling with probabilities-proportional-to-size, Stratified random sampling",
author = "Yang, Lin and Brus, {Dick J.} and Zhu, A.X. and Li, Xinming and Shi, Jingjing",
year = "2018",
month = "4",
day = "1",
doi = "10.1016/j.geoderma.2017.11.028",
language = "English",
volume = "315",
pages = "160--169",
journal = "Geoderma",
issn = "0016-7061",
publisher = "Elsevier",

}

Accounting for access costs in validation of soil maps: A comparison of design-based sampling strategies. / Yang, Lin; Brus, Dick J.; Zhu, A.X.; Li, Xinming; Shi, Jingjing.

In: Geoderma, Vol. 315, 01.04.2018, p. 160-169.

TY - JOUR

T1 - Accounting for access costs in validation of soil maps: A comparison of design-based sampling strategies

T2 - Geoderma

AU - Yang, Lin

AU - Brus, Dick J.

AU - Zhu, A.X.

AU - Li, Xinming

AU - Shi, Jingjing

PY - 2018/4/1

Y1 - 2018/4/1

N2 - The quality of soil maps can best be estimated by collecting additional data at locations selected by probability sampling. These data can be used in design-based estimation of map quality measures such as the population mean of the squared prediction errors (MSE) for continuous soil maps and overall accuracy for categorical soil maps. In areas with large differences in access costs it can be attractive to account for these differences in selecting validation locations. In this paper two types of sampling design are compared that take access costs into account: sampling with probabilities proportional to size (pps) and stratified simple random sampling (STSI). In pps the inverse of the square root of the access costs is used as a size variable. Two estimators of MSE are applied, the Hansen-Hurwitz and Hajek estimators. In STSI optimal strata are constructed based on access costs. Simple random sampling (SI) is taken as a reference design. The sampling strategies were compared on the basis of: 1) the variance of the estimated MSE; 2) the variance of the total pointwise access costs; 3) the 95th percentile of the sampling distribution of the total access costs. The comparison was done at equal expected total pointwise access costs. The sampling strategies were compared in a simulation study and a real-world case study in Anhui, China. In the case study, car travel and hiking costs were considered in computing access costs per point. The results showed that the variance of estimated MSE with pps(Hansen-Hurwitz) was larger than with pps(Hajek) and STSI. The variances of estimated MSE of pps(Hajek) and STSI were about equal and smaller than that of SI. The gain in precision compared to SI depends on the cost distribution. The larger the coefficient of variation of the costs, the larger the gain. The 95th percentile of the sampling distribution of the total pointwise access costs with STSI was smaller than with pps and SI. The gain in precision of pps(Hajek) and STSI was about 30% when accounting for hiking costs only, and about 10% when accounting for the sum of car travel and hiking costs in the case study. The proposed sampling strategies are of interest for surveying any soil property in areas with marked differences in access costs, not just for validation of soil maps.

KW - Digital soil mapping

KW - Mean squared error

KW - Optimal stratification

KW - Probability sampling

KW - Sampling with probabilities-proportional-to-size

KW - Stratified random sampling

U2 - 10.1016/j.geoderma.2017.11.028

DO - 10.1016/j.geoderma.2017.11.028

M3 - Article

VL - 315

SP - 160

EP - 169

JO - Geoderma

JF - Geoderma

SN - 0016-7061

ER -