Better Generic Objects Counting When Asking Questions to Images: A Multitask Approach for Remote Sensing Visual Question Answering

Sylvain Lobry*, Diego Marcos, Benjamin Kellenberger, Devis Tuia

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference paper

Abstract

Visual Question Answering for Remote Sensing (RSVQA) aims at extracting information from remote sensing images through queries formulated in natural language. Since the answer to the query is also provided in natural language, the system is accessible to non-experts, and therefore dramatically increases the value of remote sensing images as a source of information, for example for journalism purposes or interactive land planning. Ideally, an RSVQA system should be able to provide an answer to questions that vary both in terms of topic (presence, localization, counting) and image content. However, aiming at such flexibility generates problems related to the variability of the possible answers. A striking example is counting, where the number of objects present in a remote sensing image can vary by multiple orders of magnitude, depending on both the scene and type of objects. This represents a challenge for traditional Visual Question Answering (VQA) methods, which either become intractable or result in an accuracy loss, as the number of possible answers has to be limited. To this end, we introduce a new model that jointly solves a classification problem (which is the most common approach in VQA) and a regression problem (to answer numerical questions more precisely). An evaluation of this method on the RSVQA dataset shows that this finer numerical output comes at the cost of a small loss of performance on non-numerical questions.
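The joint classification and regression objective described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the loss functions, how the two heads are weighted, or how numerical questions are routed, so everything below is an assumption for illustration: the function names `softmax` and `multitask_loss`, the `reg_weight` parameter, the use of cross-entropy for the classification head, and squared error for the counting head.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def multitask_loss(class_logits, count_pred, target_class,
                   target_count=None, reg_weight=1.0):
    """Hypothetical combined loss for one question/image pair.

    class_logits: scores over a fixed answer vocabulary (classification head).
    count_pred:   scalar output of an assumed regression head.
    target_class: index of the correct answer in the vocabulary.
    target_count: true object count for counting questions, else None.
    reg_weight:   assumed trade-off between the two terms.
    """
    # Classification term: cross-entropy on the answer vocabulary.
    probs = softmax(class_logits)
    loss = -math.log(probs[target_class])
    # Regression term: only applied when the question is numerical.
    if target_count is not None:
        loss += reg_weight * (count_pred - target_count) ** 2
    return loss
```

In this sketch, non-numerical questions contribute only the classification term, while counting questions add a regression penalty that does not depend on the size of the answer vocabulary. This is one way to avoid capping the range of possible counts.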

Original language: English
Title of host publication: ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences
Pages: 1021-1027
Number of pages: 7
Volume: 5
Edition: 2
DOIs
Publication status: Published - 3 Aug 2020
Event: 2020 24th ISPRS Congress on Technical Commission II - Nice, Virtual, France
Duration: 31 Aug 2020 to 2 Sep 2020

Publication series

Name: ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences
ISSN (Print): 2194-9042

Conference

Conference: 2020 24th ISPRS Congress on Technical Commission II
Country: France
City: Nice, Virtual
Period: 31/08/20 to 2/09/20

Keywords

  • Convolution Neural Networks
  • Deep learning
  • Natural language
  • Recurrent Neural Networks
  • Regression
  • Remote sensing
  • Visual Question Answering
