Mapping soil organic carbon content by geographically weighted regression: A case study in the Heihe River Basin, China

Xiao Dong Song, Dick J. Brus, Feng Liu, De Cheng Li, Yu Guo Zhao, Jin Ling Yang, Gan Lin Zhang

Research output: Contribution to journal › Article › Academic › peer-review

46 Citations (Scopus)

Abstract

In large heterogeneous areas the relationship between soil organic carbon (SOC) and environmental covariates may vary across the area, which makes accurate modeling of the regional SOC variation difficult. The benefit of local, geographically weighted regression (GWR) coefficients was tested in a case study on SOC mapping across a 50,810 km² area in northwestern China. This area is composed of an alpine ecosystem in the upper reaches and oases in the middle reaches. The benefit was quantified by comparing the quality of the maps obtained by GWR and geographically weighted ridge regression (GWRR) on the one hand and multiple linear regression (MLR) on the other. In these methods the spatial dependence of model residuals is ignored. The root mean squared error (RMSE) of predictions of natural log-transformed SOC obtained with GWR was smaller than with MLR: 0.565 versus 0.618 g/kg. The use of a local ridge parameter in GWRR did not increase accuracy. In addition, we compared the quality of maps obtained by geographically weighted regression followed by simple kriging of model residuals (GWRSK) and kriging with an external drift (KED) with global regression coefficients. In these methods the spatial dependence of model residuals is incorporated in the model. The RMSE with KED was smaller than with GWRSK: 0.515 versus 0.546 g/kg. We conclude that fitting regression coefficients locally as in GWR only paid off when no spatial random effect was included in the model. When a spatial random effect was included, the flexibility of local, geographically weighted regression coefficients was not needed and was even undesirable, as it led to less accurate predictions than KED with global regression coefficients. In comparing the accuracy of prediction methods by leave-one-out cross-validation (LOOCV) of a non-probability sample, it is important to account for possible autocorrelation of pairwise differences in the prediction errors. The effective sample sizes were substantially smaller than the total number of sampling points, so that most pairwise differences in MSE were not significant at a significance level of 10% in a two-sided paired t-test.
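
The comparison between locally and globally fitted regression coefficients can be illustrated with a minimal sketch. This is not the authors' implementation: the Gaussian kernel, the fixed bandwidth, the function names and the synthetic data are all assumptions made for illustration, and residual kriging (GWRSK, KED) is not covered.

```python
import numpy as np


def design(X):
    """Prepend an intercept column to an (n, p) covariate matrix."""
    return np.column_stack([np.ones(len(X)), X])


def mlr_predict(X_train, y_train, x_new):
    """Global multiple linear regression: one coefficient vector for the whole area."""
    beta, *_ = np.linalg.lstsq(design(X_train), y_train, rcond=None)
    return float(np.concatenate([[1.0], x_new]) @ beta)


def gwr_predict(coords_train, X_train, y_train, coord_new, x_new, bandwidth):
    """GWR: weighted least squares refitted at the prediction location.

    Weights follow a Gaussian kernel of distance (an assumed kernel choice).
    """
    dist = np.linalg.norm(coords_train - coord_new, axis=1)
    sqrt_w = np.sqrt(np.exp(-0.5 * (dist / bandwidth) ** 2))
    # Scaling rows by sqrt(w) turns weighted least squares into ordinary least squares.
    A = design(X_train) * sqrt_w[:, None]
    beta, *_ = np.linalg.lstsq(A, y_train * sqrt_w, rcond=None)
    return float(np.concatenate([[1.0], x_new]) @ beta)


def loocv_rmse(coords, X, y, bandwidth=None):
    """Leave-one-out cross-validation RMSE; bandwidth=None selects global MLR."""
    n = len(y)
    errors = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        if bandwidth is None:
            pred = mlr_predict(X[keep], y[keep], X[i])
        else:
            pred = gwr_predict(coords[keep], X[keep], y[keep],
                               coords[i], X[i], bandwidth)
        errors[i] = y[i] - pred
    return float(np.sqrt(np.mean(errors ** 2)))


# Synthetic data (hypothetical): the covariate's slope drifts from west to east.
rng = np.random.default_rng(0)
n = 80
coords = rng.uniform(0.0, 1.0, size=(n, 2))
X = rng.normal(size=(n, 1))
y = 1.0 + (1.0 + 3.0 * coords[:, 0]) * X[:, 0] + 0.1 * rng.normal(size=n)

rmse_mlr = loocv_rmse(coords, X, y)
rmse_gwr = loocv_rmse(coords, X, y, bandwidth=0.2)
rmse_gwr_wide = loocv_rmse(coords, X, y, bandwidth=1e6)
print(f"LOOCV RMSE  MLR: {rmse_mlr:.3f}  GWR: {rmse_gwr:.3f}")
```

With a very large bandwidth all weights approach 1 and the GWR fit collapses to the global MLR fit, which is a convenient sanity check. In practice the bandwidth would itself be tuned, for example by cross-validation.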

Language: English
Pages: 11-22
Journal: Geoderma
Volume: 261
DOI: 10.1016/j.geoderma.2015.06.024
Publication status: Published - 1 Jan 2016

Keywords

  • Cross-validation
  • Digital soil mapping
  • Effective sample size
  • Kriging with an external drift (KED)
  • Map accuracy
  • Restricted maximum likelihood (REML)

Cite this

Song, Xiao Dong; Brus, Dick J.; Liu, Feng; Li, De Cheng; Zhao, Yu Guo; Yang, Jin Ling; Zhang, Gan Lin. / Mapping soil organic carbon content by geographically weighted regression: A case study in the Heihe River Basin, China. In: Geoderma. 2016; Vol. 261. pp. 11-22.

@article{f88c2e560edf4bb2b7b3439a8dfc4353,
title = "Mapping soil organic carbon content by geographically weighted regression: A case study in the Heihe River Basin, China",
abstract = "In large heterogeneous areas the relationship between soil organic carbon (SOC) and environmental covariates may vary across the area, which makes accurate modeling of the regional SOC variation difficult. The benefit of local, geographically weighted regression (GWR) coefficients was tested in a case study on SOC mapping across a 50,810 km$^{2}$ area in northwestern China. This area is composed of an alpine ecosystem in the upper reaches and oases in the middle reaches. The benefit was quantified by comparing the quality of the maps obtained by GWR and geographically weighted ridge regression (GWRR) on the one hand and multiple linear regression (MLR) on the other. In these methods the spatial dependence of model residuals is ignored. The root mean squared error (RMSE) of predictions of natural log-transformed SOC obtained with GWR was smaller than with MLR: 0.565 versus 0.618 g/kg. The use of a local ridge parameter in GWRR did not increase accuracy. In addition, we compared the quality of maps obtained by geographically weighted regression followed by simple kriging of model residuals (GWRSK) and kriging with an external drift (KED) with global regression coefficients. In these methods the spatial dependence of model residuals is incorporated in the model. The RMSE with KED was smaller than with GWRSK: 0.515 versus 0.546 g/kg. We conclude that fitting regression coefficients locally as in GWR only paid off when no spatial random effect was included in the model. When a spatial random effect was included, the flexibility of local, geographically weighted regression coefficients was not needed and was even undesirable, as it led to less accurate predictions than KED with global regression coefficients. In comparing the accuracy of prediction methods by leave-one-out cross-validation (LOOCV) of a non-probability sample, it is important to account for possible autocorrelation of pairwise differences in the prediction errors. The effective sample sizes were substantially smaller than the total number of sampling points, so that most pairwise differences in MSE were not significant at a significance level of 10{\%} in a two-sided paired t-test.",
keywords = "Cross-validation, Digital soil mapping, Effective sample size, Kriging with an external drift (KED), Map accuracy, Restricted maximum likelihood (REML)",
author = "Song, {Xiao Dong} and Brus, {Dick J.} and Liu, Feng and Li, {De Cheng} and Zhao, {Yu Guo} and Yang, {Jin Ling} and Zhang, {Gan Lin}",
year = "2016",
month = "1",
day = "1",
doi = "10.1016/j.geoderma.2015.06.024",
language = "English",
volume = "261",
pages = "11--22",
journal = "Geoderma",
issn = "0016-7061",
publisher = "Elsevier",
}

Mapping soil organic carbon content by geographically weighted regression: A case study in the Heihe River Basin, China. / Song, Xiao Dong; Brus, Dick J.; Liu, Feng; Li, De Cheng; Zhao, Yu Guo; Yang, Jin Ling; Zhang, Gan Lin.

In: Geoderma, Vol. 261, 01.01.2016, p. 11-22.

TY  - JOUR
T1  - Mapping soil organic carbon content by geographically weighted regression: A case study in the Heihe River Basin, China
T2  - Geoderma
AU  - Song, Xiao Dong
AU  - Brus, Dick J.
AU  - Liu, Feng
AU  - Li, De Cheng
AU  - Zhao, Yu Guo
AU  - Yang, Jin Ling
AU  - Zhang, Gan Lin
PY  - 2016/1/1
Y1  - 2016/1/1
AB  - In large heterogeneous areas the relationship between soil organic carbon (SOC) and environmental covariates may vary across the area, which makes accurate modeling of the regional SOC variation difficult. The benefit of local, geographically weighted regression (GWR) coefficients was tested in a case study on SOC mapping across a 50,810 km² area in northwestern China. This area is composed of an alpine ecosystem in the upper reaches and oases in the middle reaches. The benefit was quantified by comparing the quality of the maps obtained by GWR and geographically weighted ridge regression (GWRR) on the one hand and multiple linear regression (MLR) on the other. In these methods the spatial dependence of model residuals is ignored. The root mean squared error (RMSE) of predictions of natural log-transformed SOC obtained with GWR was smaller than with MLR: 0.565 versus 0.618 g/kg. The use of a local ridge parameter in GWRR did not increase accuracy. In addition, we compared the quality of maps obtained by geographically weighted regression followed by simple kriging of model residuals (GWRSK) and kriging with an external drift (KED) with global regression coefficients. In these methods the spatial dependence of model residuals is incorporated in the model. The RMSE with KED was smaller than with GWRSK: 0.515 versus 0.546 g/kg. We conclude that fitting regression coefficients locally as in GWR only paid off when no spatial random effect was included in the model. When a spatial random effect was included, the flexibility of local, geographically weighted regression coefficients was not needed and was even undesirable, as it led to less accurate predictions than KED with global regression coefficients. In comparing the accuracy of prediction methods by leave-one-out cross-validation (LOOCV) of a non-probability sample, it is important to account for possible autocorrelation of pairwise differences in the prediction errors. The effective sample sizes were substantially smaller than the total number of sampling points, so that most pairwise differences in MSE were not significant at a significance level of 10% in a two-sided paired t-test.
KW  - Cross-validation
KW  - Digital soil mapping
KW  - Effective sample size
KW  - Kriging with an external drift (KED)
KW  - Map accuracy
KW  - Restricted maximum likelihood (REML)
U2  - 10.1016/j.geoderma.2015.06.024
DO  - 10.1016/j.geoderma.2015.06.024
M3  - Article
VL  - 261
SP  - 11
EP  - 22
JO  - Geoderma
JF  - Geoderma
SN  - 0016-7061
ER  -