### Abstract

In balanced sampling, a linear relation between the soil property of interest and one or more covariates with known means is exploited in selecting the sampling locations. Recent developments make this sampling design attractive for statistical soil surveys. This paper introduces balanced sampling and demonstrates its potential utility and versatility. Latin hypercube sampling appears to be a special case of balanced sampling. When Latin hypercube sampling is implemented as a balanced sampling design, the inclusion probabilities of the population units are known, so population parameters can be estimated by design-based, model-assisted, or model-based inference. In a simulation study, balanced (b) random sampling, balanced coverage (bc) random sampling, and Latin hypercube (lh) random sampling were compared in terms of the sampling distributions of three quantities: the number of unsampled marginal strata (U), which measures coverage of feature space; the Mean Squared Shortest Distance (MSSD), which measures spatial coverage; and the error in the estimated mean (e). In designs b and bc, four covariates were used as balancing variables. In bc, the four covariates and the spatial coordinates were used as spreading variables. With lh, the total sample size was random, but the size fluctuations were acceptable. Design lh clearly scored best with regard to U but had by far the largest variance of e. Based on U and MSSD, design bc outperformed design b, which can be explained by the use of spreading variables in bc. The variance of e for designs b and bc was about 0.8 times the variance for simple random sampling. With the ratio estimator for lh, this variance ratio was about 1.8, showing the poor estimation performance of lh.
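The quantities named in the abstract can be illustrated with a minimal Python sketch. This is not the author's code: the function names (`mssd`, `unsampled_marginal_strata`, `balance_gap`) are hypothetical, and the choice of equal-probability quantile bins as marginal strata is an assumption, since the abstract does not say how the strata were constructed. The balance check assumes the usual definition of a balanced sample, in which the Horvitz-Thompson estimates of the covariate means reproduce the known population means.

```python
import numpy as np

def mssd(population_xy, sample_xy):
    """Mean squared shortest distance: average, over all population units,
    of the squared distance to the nearest sampling location."""
    diff = population_xy[:, None, :] - sample_xy[None, :, :]
    sq_dist = (diff ** 2).sum(axis=2)          # shape (N, n)
    return sq_dist.min(axis=1).mean()

def unsampled_marginal_strata(covariates, sample_idx, n_strata):
    """Count marginal strata without any sampled unit, summed over covariates.
    Assumption: strata are equal-probability quantile bins of each covariate."""
    sample_idx = np.asarray(sample_idx)
    unsampled = 0
    for j in range(covariates.shape[1]):
        x = covariates[:, j]
        edges = np.quantile(x, np.linspace(0.0, 1.0, n_strata + 1))
        # Interior edges only, so bin labels run from 0 to n_strata - 1.
        bins = np.searchsorted(edges[1:-1], x[sample_idx], side="right")
        unsampled += n_strata - np.unique(bins).size
    return unsampled

def balance_gap(covariates, sample_idx, incl_prob):
    """Horvitz-Thompson estimate of each covariate mean minus the known
    population mean; a perfectly balanced sample gives zeros."""
    sample_idx = np.asarray(sample_idx)
    n_pop = covariates.shape[0]
    weights = 1.0 / incl_prob[sample_idx]
    ht_mean = (covariates[sample_idx] * weights[:, None]).sum(axis=0) / n_pop
    return ht_mean - covariates.mean(axis=0)
```

For example, with a single covariate taking the values 0 to 7 and equal inclusion probabilities of 0.25, the sample of units 0 and 7 is balanced on that covariate (gap of zero), whereas the sample of units 0 and 1 is not.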

Language | English |
---|---|
Pages | 111-121 |
Journal | Geoderma |
Volume | 253-254 |
DOIs | https://doi.org/10.1016/j.geoderma.2015.04.009 |
Publication status | Published - 2015 |

### Keywords

- Design-based estimation
- Digital soil mapping
- Latin hypercube sampling
- Probability sampling
- Random forests

### Cite this

*Geoderma*, vol. 253-254, pp. 111-121. https://doi.org/10.1016/j.geoderma.2015.04.009

**Balanced sampling: A versatile sampling approach for statistical soil surveys.** / Brus, D.J.

Research output: Contribution to journal › Article › Academic › peer-review

TY - JOUR

T1 - Balanced sampling

T2 - Geoderma

AU - Brus, D.J.

PY - 2015

Y1 - 2015

KW - Design-based estimation

KW - Digital soil mapping

KW - Latin hypercube sampling

KW - Probability sampling

KW - Random forests

U2 - 10.1016/j.geoderma.2015.04.009

DO - 10.1016/j.geoderma.2015.04.009

M3 - Article

VL - 253-254

SP - 111

EP - 121

JO - Geoderma

JF - Geoderma

SN - 0016-7061

ER -