Making sense of CNNs: interpreting deep representations & their invariances with INNs

Bibliographic Details
Main Authors: Rombach, Robin (Author), Esser, Patrick (Author), Ommer, Björn (Author)
Format: Article (Journal)
Language: English
Published: 4 Aug 2020
In: arXiv
Year: 2020, Pages: 1-41
DOI: 10.48550/arXiv.2008.01777
Online Access: Publisher, license required, full text: https://doi.org/10.48550/arXiv.2008.01777
Publisher, license required, full text: http://arxiv.org/abs/2008.01777
Author Notes: Robin Rombach, Patrick Esser, and Björn Ommer
Description
Summary: To tackle increasingly complex tasks, it has become an essential ability of neural networks to learn abstract representations. These task-specific representations and, particularly, the invariances they capture turn neural networks into black box models that lack interpretability. To open such a black box, it is, therefore, crucial to uncover the different semantic concepts a model has learned as well as those that it has learned to be invariant to. We present an approach based on INNs that (i) recovers the task-specific, learned invariances by disentangling the remaining factor of variation in the data and that (ii) invertibly transforms these recovered invariances combined with the model representation into an equally expressive one with accessible semantic concepts. As a consequence, neural network representations become understandable by providing the means to (i) expose their semantic meaning, (ii) semantically modify a representation, and (iii) visualize individual learned semantic concepts and invariances. Our invertible approach significantly extends the abilities to understand black box models by enabling post-hoc interpretations of state-of-the-art networks without compromising their performance. Our implementation is available at https://compvis.github.io/invariances/.
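The invertibility that the summary relies on (transforming a representation without losing information, so it can be modified and mapped back) is the defining property of INNs, which are typically built from affine coupling blocks. A minimal NumPy sketch of one such coupling layer is shown below; the function names and the toy conditioning networks are illustrative stand-ins, not the authors' implementation:

```python
import numpy as np

def coupling_forward(z, scale_fn, shift_fn):
    """Affine coupling: scale and shift the second half of z,
    conditioned on the first half, which passes through unchanged."""
    z1, z2 = np.split(z, 2)
    s, t = scale_fn(z1), shift_fn(z1)
    return np.concatenate([z1, z2 * np.exp(s) + t])

def coupling_inverse(y, scale_fn, shift_fn):
    """Exact inverse of coupling_forward: the untouched first half
    lets us recompute s and t and undo the affine transform."""
    y1, y2 = np.split(y, 2)
    s, t = scale_fn(y1), shift_fn(y1)
    return np.concatenate([y1, (y2 - t) * np.exp(-s)])

# Toy conditioning functions standing in for learned subnetworks.
scale_fn = lambda x: np.tanh(x)   # bounded log-scale for numerical stability
shift_fn = lambda x: 0.5 * x

z = np.array([0.3, -1.2, 0.7, 2.0])
y = coupling_forward(z, scale_fn, shift_fn)
z_rec = coupling_inverse(y, scale_fn, shift_fn)
assert np.allclose(z, z_rec)  # invertibility: no information is lost
```

Because the mapping is bijective for any choice of `scale_fn` and `shift_fn`, a representation transformed this way remains equally expressive, which is what allows the approach to expose and edit semantic concepts post hoc without degrading the underlying model.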
Item Description: Viewed on 06.10.2022
Physical Description:Online Resource