A main issue preventing the use of Convolutional Neural Networks (CNN) in end user applications is the low level of transparency in the decision process. Previous work on CNN interpretability has mostly focused either on localizing the regions of the image that contribute to the result or on building an external model that generates plausible explanations. However, the former does not provide any semantic information and the latter does not guarantee the faithfulness of the explanation. We propose an intermediate representation composed of multiple Semantically Interpretable Activation Maps (SIAM) indicating the presence of predefined attributes at different locations of the image. These attribute maps are then linearly combined to produce the final output. This gives the user insight into what the model has seen, where, and a final output directly linked to this information in a comprehensive and interpretable way. We test the method on the task of landscape scenicness (aesthetic value) estimation, using an intermediate representation of 33 attributes from the SUN Attributes database. The results confirm that SIAM makes it possible to understand what attributes in the image are contributing to the final score and where they are located. Since it is based on learning from multiple tasks and datasets, SIAM improve the explanability of the prediction without additional annotation efforts or computational overhead at inference time, while keeping good performances on both the final and intermediate tasks.
|Number of pages||9|
|Publication status||Published - 18 Sep 2019|