Semantic Segmentation of Remote Sensing Images With Sparse Annotations

Yuansheng Hua, Diego Marcos, Lichao Mou, Xiao Xiang Zhu, Devis Tuia

Research output: Contribution to journal › Article › Academic › peer-review


Training convolutional neural networks (CNNs) for very high-resolution images requires a large quantity of high-quality pixel-level annotations, which are extremely labor-intensive and time-consuming to produce. Moreover, professional photo interpreters may need to be involved to guarantee the correctness of annotations. To alleviate this burden, we propose a framework for semantic segmentation of aerial images based on incomplete annotations, where annotators are asked to label a few pixels with easy-to-draw scribbles. To exploit these sparse scribbled annotations, we propose the FEature and Spatial relaTional regulArization (FESTA) method, which complements the supervised task with an unsupervised learning signal that accounts for neighborhood structures in both the spatial and the feature domains. To evaluate our framework, we perform experiments on two remote sensing image segmentation data sets involving aerial and satellite imagery, respectively. Experimental results demonstrate that exploiting sparse annotations can significantly reduce labeling costs, while the proposed method improves the performance of semantic segmentation when training on such annotations. The sparse labels and code are publicly available for reproducibility purposes.
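The exact FESTA loss is defined in the article itself; as a rough, hypothetical illustration of the idea of a spatial-neighborhood consistency signal, the sketch below penalizes feature disagreement between spatially adjacent pixels of a dense feature map. The function name and the plain squared-difference formulation are assumptions for illustration, not the paper's method.

```python
import numpy as np

def spatial_consistency_loss(features):
    """Toy spatial-smoothness regularizer (illustrative only, not FESTA itself).

    Computes the mean squared difference between each pixel's feature
    vector and its right/down neighbors, so spatially adjacent pixels
    are encouraged to have similar features.

    features: (H, W, C) array of per-pixel features.
    """
    dh = features[1:, :, :] - features[:-1, :, :]   # vertical neighbor differences
    dw = features[:, 1:, :] - features[:, :-1, :]   # horizontal neighbor differences
    return (dh ** 2).mean() + (dw ** 2).mean()

# A spatially constant feature map incurs zero penalty,
# while a noisy one is penalized.
flat = np.ones((4, 4, 8))
noisy = np.random.default_rng(0).normal(size=(4, 4, 8))
print(spatial_consistency_loss(flat))        # 0.0
print(spatial_consistency_loss(noisy) > 0)   # True
```

In a semi-supervised setting, a term like this would be added (with a weighting coefficient) to the supervised cross-entropy computed only on the scribbled pixels; the analogous feature-space term would compare each pixel to its nearest neighbors in feature space rather than on the image grid.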

Original language: English
Article number: 3051053
Pages (from-to): 1-5
Number of pages: 5
Journal: IEEE Geoscience and Remote Sensing Letters
Publication status: E-pub ahead of print - 25 Jan 2021


Keywords

  • Aerial image
  • Annotations
  • convolutional neural networks (CNNs)
  • Image color analysis
  • Image segmentation
  • Kernel
  • Remote sensing
  • semantic segmentation
  • Semantics
  • semisupervised learning
  • sparse scribbled annotation
  • Training

