### Abstract

In balanced sampling, a linear relation between the soil property of interest and one or more covariates with known means is exploited in selecting the sampling locations. Recent developments make this sampling design attractive for statistical soil surveys. This paper introduces balanced sampling and demonstrates its potential utility and versatility. Latin hypercube sampling appears to be a special case of balanced sampling. When Latin hypercube sampling is implemented as a balanced sampling design, the inclusion probabilities of the population units are known. Population parameters can then be estimated by design-based, model-assisted or model-based inference. In a simulation study, balanced (b) random sampling, balanced coverage (bc) random sampling, and Latin hypercube (lh) random sampling were compared in terms of the sampling distributions of three quantities: the number of unsampled marginal strata (U), measuring coverage of feature space; the Mean Squared Shortest Distance (MSSD), measuring spatial coverage; and the error in the estimated mean (e). In designs b and bc, four covariates were used as balancing variables. In bc, the four covariates and the spatial coordinates were also used as spreading variables. With lh, the total sample size was random, but the size fluctuations were acceptable. Design lh clearly scored best with regard to U, but had by far the largest variance of e. In terms of U and MSSD, design bc outperformed design b, which can be explained by the use of spreading variables in bc. The variance of e for designs b and bc was about 0.8 times the variance for simple random sampling. For lh with the ratio estimator, this variance ratio was about 1.8, showing the poor estimation performance of lh.
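As an illustration of the two coverage measures named in the abstract, the following is a minimal sketch of how U and MSSD might be computed for one selected sample. It rests on assumptions not stated in the abstract: marginal strata are taken to be equal-probability quantile intervals of each covariate, and MSSD is taken as the mean, over all population units, of the squared distance to the nearest sampled unit. The function names and arguments are hypothetical.

```python
import numpy as np


def unsampled_marginal_strata(covariates, sample_idx, n_strata):
    """Count marginal strata that contain no sampled unit (U).

    Assumption: each covariate is cut into `n_strata` quantile intervals
    computed from the whole population.
    """
    covariates = np.asarray(covariates, dtype=float)
    inner_probs = np.linspace(0.0, 1.0, n_strata + 1)[1:-1]
    unsampled = 0
    for j in range(covariates.shape[1]):
        # Interior breakpoints of the quantile strata for covariate j.
        edges = np.quantile(covariates[:, j], inner_probs)
        # Stratum label (0 .. n_strata - 1) of every sampled unit.
        labels = np.searchsorted(edges, covariates[sample_idx, j], side="right")
        unsampled += n_strata - np.unique(labels).size
    return unsampled


def mean_squared_shortest_distance(coords, sample_idx):
    """Mean squared distance from each population unit to its nearest sampled unit.

    Assumption: `coords` holds the spatial coordinates of all population
    units (for example, grid nodes), one row per unit.
    """
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[sample_idx][None, :, :]
    squared = (diff ** 2).sum(axis=2)
    return squared.min(axis=1).mean()
```

Under these assumptions, smaller values of both measures indicate better coverage: U = 0 means every marginal stratum of every covariate holds at least one sampled unit, and a small MSSD means no part of the study area is far from a sampling location.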

Original language | English |
---|---|
Pages (from-to) | 111-121 |
Journal | Geoderma |
Volume | 253-254 |
DOIs | |
Publication status | Published - 2015 |

### Keywords

- Design-based estimation
- Digital soil mapping
- Latin hypercube sampling
- Probability sampling
- Random forests