Random sampling or geostatistical modelling? Choosing between design-based and model-based sampling strategies for soil (with discussion)

D.J. Brus, J.J. de Gruijter

    Research output: Contribution to journal › Article › Academic › peer-review

    248 Citations (Scopus)

    Abstract

    Classical sampling theory has been repeatedly identified with classical statistics, which assumes that data are independent and identically distributed. This explains the switch of many soil scientists from design-based sampling strategies, based on classical sampling theory, to the model-based approach, which is based on geostatistics. However, in design-based sampling, independence has a different meaning and is determined by the sampling design, whereas in the model-based approach it is determined by the postulated model for the process studied. Design-based strategies are therefore also valid in areas with autocorrelation. Design-based and model-based estimates of spatial means are compared in a simulation study on the basis of design-based quality criteria. The simulated field consists of four homogeneous units that are realizations of models with different means, variances and variograms. Performance is compared for two sample sizes (140 and 1520) and two block sizes (8 × 6.4 km² and 1.6 × 1.6 km²). The two strategies are Stratified Simple Random Sampling combined with the Horvitz-Thompson estimator (STSI, t_HT), and Systematic Sampling combined with the block kriging predictor (SY, t_OK). Point estimates of spatial means by (SY, t_OK) were more accurate in all cases except the global mean (8 × 6.4 km² block) estimated from the small sample. For interval estimates, on the other hand, p-coverages were in general better with the design-based strategy, except when the number of sample points in the block was small. Factors that determine the effectiveness and efficiency of the two approaches are the type of request, the interest in objective estimates, the need for separate unique estimates of the estimation variance for all points or subregions, the interest in valid and accurate estimates of the estimation or prediction variance, the quality of the model, the autocorrelation between observation and prediction points, and the sample size. These factors will be assembled in a decision tree that can be helpful in choosing between the two approaches. Models can also be used in the design-based approach. They describe the population itself, whereas in the model-based approach they describe the data-generating processes. Errors in such models result in less accurate estimates, but the estimated accuracy is still valid.
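As an illustration of the design-based side of the comparison, the following is a minimal sketch of stratified simple random sampling with the corresponding Horvitz-Thompson estimator of the mean and its variance estimator. It is not the authors' simulation: the one-dimensional random-walk "field", the four units, the sample sizes, and the function name `stsi_mean_estimate` are all assumptions made for this example. The population values are fixed and autocorrelated; the only randomness the estimator relies on comes from the sampling design.

```python
import random
import statistics


def stsi_mean_estimate(strata, sample_sizes, rng):
    """Estimate the population mean from a stratified simple random sample.

    strata       -- list of lists holding the fixed population values per stratum
    sample_sizes -- number of units drawn without replacement in each stratum
    rng          -- random.Random instance that drives the sampling design
    Returns (estimated mean, estimated sampling variance of that estimate).
    """
    total_size = sum(len(values) for values in strata)
    estimate = 0.0
    variance = 0.0
    for values, n_h in zip(strata, sample_sizes):
        size_h = len(values)
        weight_h = size_h / total_size           # relative size of the stratum
        sample = rng.sample(values, n_h)         # simple random sampling without replacement
        estimate += weight_h * statistics.fmean(sample)
        # Finite population correction times sample variance over n_h.
        fpc = 1.0 - n_h / size_h
        variance += weight_h ** 2 * fpc * statistics.variance(sample) / n_h
    return estimate, variance


# Hypothetical population: four units with different means and different
# step sizes, each built as a random walk so that neighbours are correlated.
field_rng = random.Random(0)
strata = []
for unit_mean, step_sd in [(10.0, 0.5), (20.0, 1.0), (30.0, 0.2), (40.0, 2.0)]:
    level, unit = unit_mean, []
    for _ in range(500):
        level += field_rng.gauss(0.0, step_sd)
        unit.append(level)
    strata.append(unit)

true_mean = statistics.fmean([v for unit in strata for v in unit])

design_rng = random.Random(1)
estimate, est_var = stsi_mean_estimate(strata, [10, 10, 10, 10], design_rng)
print(f"true mean {true_mean:.3f}, estimate {estimate:.3f}, "
      f"estimated standard error {est_var ** 0.5:.3f}")
```

Repeating the last call many times with fresh draws and comparing the spread of the estimates with the estimated variance is one way to check empirically that the variance estimate stays valid despite the autocorrelation in the population.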
    Original language: English
    Pages (from-to): 1-59
    Journal: Geoderma
    Volume: 80
    Issue number: 1/2
    Publication status: Published - 1997

    Keywords

    • soil science
    • geostatistics
