Synthetic bootstrapping of convolutional neural networks for semantic plant part segmentation

R. Barth*, J. IJsselmuiden, J. Hemming, E.J. Van Henten

*Corresponding author for this work

Research output: Contribution to journal › Article › Academic › peer-review

5 Citations (Scopus)

Abstract

A current bottleneck of state-of-the-art machine learning methods for image segmentation in agriculture, e.g. convolutional neural networks (CNNs), is the requirement of large datasets that are manually annotated at the pixel level. In this paper, we investigated how related synthetic images can be used to bootstrap CNNs for successful learning, as compared to other learning strategies. We hypothesised that a small manually annotated empirical dataset is sufficient for fine-tuning a synthetically bootstrapped CNN. Furthermore, we investigated (i) multiple deep learning architectures, (ii) the relation between synthetic and empirical dataset size and part segmentation performance, (iii) the effect of post-processing using conditional random fields (CRF) and (iv) the generalisation performance on other related datasets. To this end, we performed 7 experiments using the Capsicum annuum (bell or sweet pepper) dataset containing 50 empirical and 10,500 synthetic images with 7 pixel-level annotated part classes. Results confirmed our hypothesis: only 30 empirical images were required to obtain the highest performance on all 7 classes (mean IOU = 0.40) when a CNN was bootstrapped on related synthetic data. Furthermore, we found optimal empirical performance when a VGG-16 network was modified to include à trous spatial pyramid pooling. Adding CRF improved performance only on the synthetic data. Training binary classifiers did not improve results. We found a positive correlation between dataset size and performance. For the synthetic dataset, learning stabilises around 3000 images. Generalisation to other related datasets proved possible.
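The abstract reports segmentation quality as mean IOU (intersection-over-union averaged over part classes). The following is a minimal sketch of how such a score is commonly computed from a predicted and a ground-truth label map. It is not taken from the paper: the function name `mean_iou`, the nested-list image representation and the choice to skip classes absent from both maps are assumptions made for illustration, and the authors' exact evaluation protocol may differ.

```python
from typing import List, Sequence


def mean_iou(pred: Sequence[Sequence[int]],
             truth: Sequence[Sequence[int]],
             num_classes: int) -> float:
    """Mean intersection-over-union over classes.

    Hypothetical helper: `pred` and `truth` are equally sized 2D label maps
    whose entries are class indices in [0, num_classes).
    """
    intersection: List[int] = [0] * num_classes
    union: List[int] = [0] * num_classes

    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p == t:
                # Pixel counts once towards both intersection and union.
                intersection[p] += 1
                union[p] += 1
            else:
                # Pixel belongs to the union of both classes involved.
                union[p] += 1
                union[t] += 1

    # Assumption: classes that occur in neither map are left out of the mean.
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    if not ious:
        raise ValueError("no labelled pixels to evaluate")
    return sum(ious) / len(ious)


# Tiny 2x3 example with 3 classes (illustrative values, not real data).
pred = [[0, 0, 1],
        [1, 2, 2]]
truth = [[0, 1, 1],
         [1, 2, 0]]
score = mean_iou(pred, truth, num_classes=3)
```

In this toy example the per-class scores are 1/3, 2/3 and 1/2, so `score` is 0.5. A value such as the reported 0.40 would be the same kind of average, taken over the 7 part classes.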
Original language: English
Pages (from-to): 291-304
Journal: Computers and Electronics in Agriculture
Volume: 161
Early online date: 19 Dec 2017
DOI: 10.1016/j.compag.2017.11.040
Publication status: Published - Jun 2019

Keywords

  • Big data
  • Bootstrapping
  • Computer vision
  • Semantic segmentation
  • Synthetic dataset

Cite this

@article{ace18530a6b54cc48b5baa236432bcdd,
title = "Synthetic bootstrapping of convolutional neural networks for semantic plant part segmentation",
keywords = "Big data, Bootstrapping, Computer vision, Semantic segmentation, Synthetic dataset",
author = "R. Barth and J. IJsselmuiden and J. Hemming and {Van Henten}, E.J.",
year = "2019",
month = "6",
doi = "10.1016/j.compag.2017.11.040",
language = "English",
volume = "161",
pages = "291--304",
journal = "Computers and Electronics in Agriculture",
issn = "0168-1699",
publisher = "Elsevier",

}

Synthetic bootstrapping of convolutional neural networks for semantic plant part segmentation. / Barth, R.; IJsselmuiden, J.; Hemming, J.; Van Henten, E.J.

In: Computers and Electronics in Agriculture, Vol. 161, 06.2019, p. 291-304.

TY - JOUR

T1 - Synthetic bootstrapping of convolutional neural networks for semantic plant part segmentation

AU - Barth, R.

AU - IJsselmuiden, J.

AU - Hemming, J.

AU - Van Henten, E.J.

PY - 2019/6

Y1 - 2019/6

KW - Big data

KW - Bootstrapping

KW - Computer vision

KW - Semantic segmentation

KW - Synthetic dataset

DO - 10.1016/j.compag.2017.11.040

M3 - Article

VL - 161

SP - 291

EP - 304

JO - Computers and Electronics in Agriculture

JF - Computers and Electronics in Agriculture

SN - 0168-1699

ER -