Biplots for understanding machine learning predictions in digital soil mapping

Stephan van der Westhuizen*, Gerard B.M. Heuvelink, Sugnet Gardner-Lubbe, Catherine E. Clarke

*Corresponding author for this work

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

In digital soil mapping, machine learning models are gradually replacing traditional statistical models because of their greater flexibility and better prediction performance. However, unlike traditional models, machine learning models have a notable drawback: they are “black-box” in nature, with limited ability to provide comprehensive interpretations of their predictions. Explainable machine learning (XML) methods provide visualisations that aid in understanding predictions made by machine learning models. Popular model-agnostic visualisation methods include partial dependence plots, individual conditional expectation curves, and partial dependence plots produced with Shapley values. These methods require that covariates are uncorrelated, which can be restrictive. For cases where covariates are correlated, an alternative approach is the accumulated local effect plot, which, however, is limited to depicting one or two covariates at a time. Another disadvantage of the above-mentioned methods is that they offer no readily available goodness-of-fit metric. In this paper we propose the use of a principal component analysis biplot as a model-agnostic method to gain insight into machine learning predictions in digital soil mapping. A biplot is a powerful visualisation tool for seeking patterns in multivariate data. It does not require the covariates included in the visualisation to be uncorrelated, and it comes with an analytically derived goodness-of-fit metric that allows the user to evaluate the accuracy of the approximation. We present examples from a case study in South Africa in which soil organic carbon is mapped with a random forest model. Our findings show that biplots can provide meaningful interpretations of predictions, making them a worthy addition to the XML toolkit.
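
As an illustration of the general idea only, the following is a minimal sketch of how a PCA biplot and a quality-of-display figure can be computed from a table of covariates and model predictions. It is not the authors' implementation: the data are synthetic, the column names (elevation, rainfall, ndvi, prediction) and the function `pca_biplot` are hypothetical, and the goodness-of-fit measure shown is the standard share of total variance captured by the first two principal components, which may differ from the metric used in the paper.

```python
import numpy as np


def pca_biplot(X, n_components=2):
    """Return sample scores, variable axis directions and quality of display."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                      # centre each column
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]   # sample points in the plot
    axes = Vt[:n_components].T                   # one direction per variable
    # Share of total variance retained by the plotted components.
    quality = float((s[:n_components] ** 2).sum() / (s ** 2).sum())
    return scores, axes, quality


# Synthetic, deliberately correlated covariates (assumed names, not real data).
rng = np.random.default_rng(0)
n = 200
elevation = rng.normal(size=n)
rainfall = 0.8 * elevation + 0.6 * rng.normal(size=n)
ndvi = rng.normal(size=n)
# Stand-in for predictions from a fitted model such as a random forest.
prediction = 0.5 * rainfall + 0.3 * ndvi + 0.1 * rng.normal(size=n)

X = np.column_stack([elevation, rainfall, ndvi, prediction])
X = (X - X.mean(axis=0)) / X.std(axis=0)         # standardise before PCA

scores, axes, quality = pca_biplot(X)
print(f"quality of display: {quality:.3f}")
```

Plotting `scores` as points and each row of `axes` as a calibrated axis or arrow gives the biplot; a quality value close to 1 indicates that the two-dimensional display approximates the full data well.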

Original language: English
Article number: 102892
Number of pages: 15
Journal: Ecological Informatics
Volume: 84
Publication status: Published - Dec 2024

Keywords

  • Accumulated local effect
  • Interpretable machine learning
  • Partial dependence
  • Principal component analysis
  • Shapley
  • XAI
