Understanding urban landuse from the above and ground perspectives: A deep learning, multimodal solution

Shivangi Srivastava, John E. Vargas-Muñoz, Devis Tuia

Research output: Contribution to journal › Article › Academic › peer-review

1 Citation (Scopus)

Abstract

Landuse characterization is important for urban planning. It is traditionally performed with field surveys or manual photo interpretation, two practices that are time-consuming and labor-intensive. Therefore, we aim to automate landuse mapping at the urban-object level with a deep learning approach based on data from multiple sources (or modalities). We consider two image modalities: overhead imagery from Google Maps and ensembles of ground-based pictures (side-views) per urban-object from Google Street View (GSV). These modalities bring complementary visual information pertaining to the urban-objects. We propose an end-to-end trainable model, which uses OpenStreetMap annotations as labels. The model can accommodate a variable number of GSV pictures for the ground-based branch and can also function in the absence of ground pictures at prediction time. We test the effectiveness of our model over the area of Île-de-France, France, and test its generalization abilities on a set of urban-objects from the city of Nantes, France. Our proposed multimodal Convolutional Neural Network achieves considerably higher accuracies than methods that use a single image modality, making it suitable for automatic landuse map updates. Additionally, our approach could be easily scaled to multiple cities, because it is based on data sources available for many cities worldwide.
Language: English
Pages: 129-143
Journal: Remote Sensing of Environment
Volume: 228
DOI: 10.1016/j.rse.2019.04.014
Publication status: Published - Jul 2019


Cite this

@article{c1f34cb7f55145f58ac48cf946f9430e,
title = "Understanding urban landuse from the above and ground perspectives: A deep learning, multimodal solution",
abstract = "Landuse characterization is important for urban planning. It is traditionally performed with field surveys or manual photo interpretation, two practices that are time-consuming and labor-intensive. Therefore, we aim to automate landuse mapping at the urban-object level with a deep learning approach based on data from multiple sources (or modalities). We consider two image modalities: overhead imagery from Google Maps and ensembles of ground-based pictures (side-views) per urban-object from Google Street View (GSV). These modalities bring complementary visual information pertaining to the urban-objects. We propose an end-to-end trainable model, which uses OpenStreetMap annotations as labels. The model can accommodate a variable number of GSV pictures for the ground-based branch and can also function in the absence of ground pictures at prediction time. We test the effectiveness of our model over the area of {\^I}le-de-France, France, and test its generalization abilities on a set of urban-objects from the city of Nantes, France. Our proposed multimodal Convolutional Neural Network achieves considerably higher accuracies than methods that use a single image modality, making it suitable for automatic landuse map updates. Additionally, our approach could be easily scaled to multiple cities, because it is based on data sources available for many cities worldwide.",
author = "Shivangi Srivastava and Vargas-Mu{\~n}oz, {John E.} and Devis Tuia",
year = "2019",
month = "7",
doi = "10.1016/j.rse.2019.04.014",
language = "English",
volume = "228",
pages = "129--143",
journal = "Remote Sensing of Environment",
issn = "0034-4257",
publisher = "Elsevier",

}

Understanding urban landuse from the above and ground perspectives: A deep learning, multimodal solution. / Srivastava, Shivangi; Vargas-Muñoz, John E.; Tuia, Devis.

In: Remote Sensing of Environment, Vol. 228, 07.2019, p. 129-143.


TY - JOUR

T1 - Understanding urban landuse from the above and ground perspectives: A deep learning, multimodal solution

AU - Srivastava, Shivangi

AU - Vargas-Muñoz, John E.

AU - Tuia, Devis

PY - 2019/7

Y1 - 2019/7

N2 - Landuse characterization is important for urban planning. It is traditionally performed with field surveys or manual photo interpretation, two practices that are time-consuming and labor-intensive. Therefore, we aim to automate landuse mapping at the urban-object level with a deep learning approach based on data from multiple sources (or modalities). We consider two image modalities: overhead imagery from Google Maps and ensembles of ground-based pictures (side-views) per urban-object from Google Street View (GSV). These modalities bring complementary visual information pertaining to the urban-objects. We propose an end-to-end trainable model, which uses OpenStreetMap annotations as labels. The model can accommodate a variable number of GSV pictures for the ground-based branch and can also function in the absence of ground pictures at prediction time. We test the effectiveness of our model over the area of Île-de-France, France, and test its generalization abilities on a set of urban-objects from the city of Nantes, France. Our proposed multimodal Convolutional Neural Network achieves considerably higher accuracies than methods that use a single image modality, making it suitable for automatic landuse map updates. Additionally, our approach could be easily scaled to multiple cities, because it is based on data sources available for many cities worldwide.

AB - Landuse characterization is important for urban planning. It is traditionally performed with field surveys or manual photo interpretation, two practices that are time-consuming and labor-intensive. Therefore, we aim to automate landuse mapping at the urban-object level with a deep learning approach based on data from multiple sources (or modalities). We consider two image modalities: overhead imagery from Google Maps and ensembles of ground-based pictures (side-views) per urban-object from Google Street View (GSV). These modalities bring complementary visual information pertaining to the urban-objects. We propose an end-to-end trainable model, which uses OpenStreetMap annotations as labels. The model can accommodate a variable number of GSV pictures for the ground-based branch and can also function in the absence of ground pictures at prediction time. We test the effectiveness of our model over the area of Île-de-France, France, and test its generalization abilities on a set of urban-objects from the city of Nantes, France. Our proposed multimodal Convolutional Neural Network achieves considerably higher accuracies than methods that use a single image modality, making it suitable for automatic landuse map updates. Additionally, our approach could be easily scaled to multiple cities, because it is based on data sources available for many cities worldwide.

U2 - 10.1016/j.rse.2019.04.014

DO - 10.1016/j.rse.2019.04.014

M3 - Article

VL - 228

SP - 129

EP - 143

JO - Remote Sensing of Environment

T2 - Remote Sensing of Environment

JF - Remote Sensing of Environment

SN - 0034-4257

ER -