### Abstract

In areas with marked differences in accessibility, the cost efficiency of design-based sampling strategies for estimating the population mean or total can be increased by accounting for these differences when selecting the sampling locations. This can be achieved by stratified random sampling, which raises the question of how to construct the strata. Existing optimal stratification methods, such as cum√f stratification, assume that the cost is the same for all sampling units and can therefore be suboptimal when this assumption is violated. A simulated annealing algorithm is proposed for the simultaneous optimization of the stratum breaks and the sample size under optimal allocation of the sample size, given a chosen maximum for the expected total costs. The proposed stratification method is tested in a study area of 5900 km² in Anhui province, China. Optimal stratum breaks were computed for estimating the population mean of the soil organic matter content (SOM), using predictions of SOM from a multiple linear regression model as the stratification variable. The optimal stratum breaks differed markedly from the cum√f breaks. Depending on the number of strata, the variance of the estimated mean of SOM using the optimal stratification was about 8 to 29% smaller than with cum√f stratification. This large gain in precision can be explained by the moderately strong correlation between the point-wise costs and the stratification variable. Smaller gains are expected when this correlation is weaker or when the variation in costs among the units is smaller. The proposed algorithm can also be used when no ancillary variable related to the variable of interest is available, in which case it accounts only for differences in costs among the sampling units. An R script with functions is provided as supporting information.
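For reference, the standard textbook result for cost-optimal allocation in stratified random sampling with a linear cost function is sketched below; the paper's exact formulation may differ. Here $W_h = N_h/N$ is the weight of stratum $h$, $S_h$ the within-stratum standard deviation, $\bar{c}_h$ the expected cost per sampling unit in stratum $h$ (assumed here to be the mean of the point-wise costs of its units), $c_0$ a fixed overhead cost, $C$ the chosen maximum expected total cost, and the finite population correction is ignored.

```latex
\begin{align}
\bar{y}_{\mathrm{st}} &= \sum_{h=1}^{H} W_h\,\bar{y}_h, \qquad
V(\bar{y}_{\mathrm{st}}) = \sum_{h=1}^{H} \frac{W_h^2 S_h^2}{n_h}, \\
C &= c_0 + \sum_{h=1}^{H} n_h\,\bar{c}_h, \\
n_h &= (C - c_0)\,\frac{W_h S_h / \sqrt{\bar{c}_h}}{\sum_{k=1}^{H} W_k S_k \sqrt{\bar{c}_k}}, \\
V_{\min}(\bar{y}_{\mathrm{st}}) &= \frac{\bigl(\sum_{h=1}^{H} W_h S_h \sqrt{\bar{c}_h}\bigr)^2}{C - c_0}.
\end{align}
```

With equal unit costs this reduces to Neyman allocation, and the minimal variance is proportional to $(\sum_h W_h S_h)^2$, the quantity that cum√f stratification approximately minimizes. When unit costs vary, the breaks should instead minimize $\sum_h W_h S_h \sqrt{\bar{c}_h}$, which is why strata constructed without regard to costs can be suboptimal.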
Highlights:

- A method is proposed to compute optimal strata that account for differences in costs among sampling locations.
- Simulated annealing is used to optimize the stratum breaks and the total sample size under a total-costs constraint.
- The variance of the estimated mean of SOM with the proposed method was 8 to 29% smaller than with the cum√f method.
- The proposed algorithm can also be used when no stratification variable is available (optimal costs stratification).
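The highlights can be illustrated with a minimal Python sketch, assuming NumPy. It starts from cum√f breaks, perturbs one break at a time, and accepts or rejects each move with the Metropolis rule; the objective is the variance of the stratified mean under cost-optimal allocation for a fixed budget. Several assumptions are not taken from the paper: the within-stratum standard deviation of the stratification variable stands in for that of SOM, the expected unit cost of a stratum is the mean of its point-wise costs, the finite population correction is ignored, and the sample size is treated as continuous and follows from the allocation rather than being optimized separately. All function names and cooling parameters are hypothetical, and this is not the authors' R script.

```python
import math
import random

import numpy as np


def cum_sqrt_f_breaks(x, n_strata, n_bins=100):
    """Dalenius-Hodges cum-sqrt(f) stratum breaks for a 1-D variable x."""
    counts, edges = np.histogram(x, bins=n_bins)
    csf = np.cumsum(np.sqrt(counts))
    # Cut the cumulative sqrt(f) scale into n_strata equal parts.
    targets = csf[-1] * np.arange(1, n_strata) / n_strata
    idx = np.searchsorted(csf, targets)
    return np.sort(edges[idx + 1])  # upper edges of the bins reaching each target


def _strata_summaries(breaks, x, cost):
    """Per-stratum weight W_h, SD S_h (of x, as a proxy) and mean unit cost c_h."""
    labels = np.searchsorted(breaks, x, side="right")  # stratum index 0..H-1
    w, s, c = [], [], []
    for h in range(len(breaks) + 1):
        in_h = labels == h
        n_in = int(in_h.sum())
        if n_in < 2:
            return None  # empty or singleton stratum: infeasible breaks
        w.append(n_in / len(x))
        s.append(x[in_h].std(ddof=1))
        c.append(cost[in_h].mean())
    return np.array(w), np.array(s), np.array(c)


def stratified_variance(breaks, x, cost, budget, fixed_cost=0.0):
    """V(mean) under cost-optimal allocation, no fpc: (sum W S sqrt(c))^2 / (C - c0)."""
    summ = _strata_summaries(breaks, x, cost)
    if summ is None:
        return math.inf
    w, s, c = summ
    return float(np.sum(w * s * np.sqrt(c)) ** 2 / (budget - fixed_cost))


def allocation(breaks, x, cost, budget, fixed_cost=0.0):
    """Continuous cost-optimal allocation n_h (assumes feasible breaks) and c_h."""
    w, s, c = _strata_summaries(breaks, x, cost)
    n_h = (budget - fixed_cost) * (w * s / np.sqrt(c)) / np.sum(w * s * np.sqrt(c))
    return n_h, c


def anneal_breaks(x, cost, n_strata, budget, fixed_cost=0.0, cooling=0.95,
                  n_temps=50, moves_per_temp=40, seed=0):
    """Simulated annealing of stratum breaks, started from the cum-sqrt(f) breaks."""
    rng = random.Random(seed)
    current = cum_sqrt_f_breaks(x, n_strata)
    f_cur = stratified_variance(current, x, cost, budget, fixed_cost)
    if not math.isfinite(f_cur):
        raise ValueError("cum-sqrt(f) start has an empty stratum; use fewer strata")
    best, f_best = current.copy(), f_cur
    lo, hi = float(x.min()), float(x.max())
    step = 0.05 * (hi - lo)  # SD of the random shift of one break (hypothetical)
    t = 0.1 * f_cur          # initial temperature (hypothetical choice)
    for _ in range(n_temps):
        for _ in range(moves_per_temp):
            cand = current.copy()
            j = rng.randrange(len(cand))
            cand[j] = min(max(cand[j] + rng.gauss(0.0, step), lo), hi)
            cand.sort()
            f_cand = stratified_variance(cand, x, cost, budget, fixed_cost)
            # Metropolis rule: always accept improvements, sometimes accept worse moves.
            if f_cand <= f_cur or rng.random() < math.exp(-(f_cand - f_cur) / t):
                current, f_cur = cand, f_cand
                if f_cur < f_best:
                    best, f_best = current.copy(), f_cur
        t *= cooling  # geometric cooling schedule
    return best, f_best


# Demo on synthetic data (not the Anhui data): unit costs rise with x.
data_rng = np.random.default_rng(1)
x = data_rng.lognormal(mean=3.0, sigma=0.4, size=2000)
cost = 1.0 + 0.05 * x + data_rng.uniform(0.0, 1.0, size=2000)
var_cum = stratified_variance(cum_sqrt_f_breaks(x, 4), x, cost, budget=500.0)
breaks, var_opt = anneal_breaks(x, cost, n_strata=4, budget=500.0)
n_h, c_h = allocation(breaks, x, cost, budget=500.0)
print("breaks:", np.round(breaks, 2), "variance reduction:", 1 - var_opt / var_cum)
```

Because the search starts from the cum√f breaks and the best solution is only replaced by a strictly better one, the returned variance cannot exceed the cum√f variance for the same data and budget. How much smaller it becomes depends, as the abstract notes, on how strongly the costs covary with the stratification variable.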

| Language | English |
|---|---|
| Article number | 12731 |
| Pages | 200-212 |
| Journal | European Journal of Soil Science |
| Volume | 70 |
| Issue number | 1 |
| Early online date | 9 Sep 2018 |
| DOI | https://doi.org/10.1111/ejss.12731 |
| Publication status | Published - Jan 2019 |

### Cite this

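Brus, D. J., Yang, L., & Zhu, A. X. (2019). Accounting for differences in costs among sampling locations in optimal stratification. *European Journal of Soil Science*, *70*(1), 200-212. [12731]. https://doi.org/10.1111/ejss.12731

Brus, DJ, Yang, L & Zhu, AX 2019, 'Accounting for differences in costs among sampling locations in optimal stratification', *European Journal of Soil Science*, vol. 70, no. 1, 12731, pp. 200-212. https://doi.org/10.1111/ejss.12731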

**Accounting for differences in costs among sampling locations in optimal stratification.** / Brus, D.J.; Yang, L.; Zhu, A.X.

Research output: Contribution to journal › Article › Academic › peer-review

TY - JOUR

T1 - Accounting for differences in costs among sampling locations in optimal stratification

AU - Brus, D.J.

AU - Yang, L.

AU - Zhu, A.X.

PY - 2019/1

Y1 - 2019/1

U2 - 10.1111/ejss.12731

DO - 10.1111/ejss.12731

M3 - Article

VL - 70

SP - 200

EP - 212

JO - European Journal of Soil Science

T2 - European Journal of Soil Science

JF - European Journal of Soil Science

SN - 1351-0754

IS - 1

M1 - 12731

ER -