1. Goal
In "Interpretable and Fine-grained Visual Explanations for Convolutional Neural Networks", the authors propose an optimization-based visual explanation method that highlights the evidence in the input image for a specific prediction.
1.1 Sub-goal
[A]: Defending against adversarial evidence (i.e. faulty evidence due to artifacts).
[B]: Providing explanations that are fine-grained and preserve image characteristics such as edges and colors.

2. Method
The authors provide local explanations, which focus on an individual input: given one data point, the method highlights the evidence on which the model bases its decision.
2.1 Perturbation-based visual explanations
Perturbation-based explanations can be defined in two ways:
- Explanation by preservation: the smallest region of the image that must be retained to preserve the original model output.
- Explanation by deletion: the smallest region of the image that must be deleted to change the model output.
2.1.1 Problem definition
- CNN: $f_{CNN}$
- Input image: $x \in \mathbb{R}^{3 \times H \times W}$
- Output: $f_{CNN}(x)$ (e.g. the logits of a classification network)
- Softmax scores: $y = \mathrm{softmax}(f_{CNN}(x))$ of the different classes
- Explanations: $e_x$ for a target class $c_T$
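To make this notation concrete, here is a minimal PyTorch sketch; the toy model, image size, and the choice of the top prediction as the target class are illustrative assumptions, not the paper's setup.

```python
import torch
import torch.nn as nn

# Placeholder CNN f_cnn; any image classifier can stand in here.
f_cnn = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(16, 1000),             # 1000-way classification head
)

x = torch.rand(1, 3, 224, 224)       # input image x
logits = f_cnn(x)                    # model output f_CNN(x)
y = torch.softmax(logits, dim=1)     # softmax scores y over the classes
c_T = y.argmax(dim=1).item()         # target class c_T (here: the top prediction)
```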
2.1.2 The objective function
An explanation is computed by removing either relevant or irrelevant information from the image $x$.
To do this, they use a mask-based operator:

$$e_x = m \odot x + (1 - m) \odot r \tag{1}$$

where $m \in [0, 1]^{H \times W}$ is a mask, $r$ is a reference image, and $\odot$ denotes element-wise multiplication.
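A small sketch of such a mask-based operator, assuming a zero reference image $r$ (other common choices are blurred or noisy versions of the image):

```python
import torch

def apply_mask(x, m, r):
    """Mask-based operator: e_x = m * x + (1 - m) * r (Eq. 1).

    x : image tensor of shape (1, 3, H, W)
    m : mask with values in [0, 1], broadcastable to x (e.g. shape (1, 1, H, W))
    r : reference image with the same shape as x
    """
    return m * x + (1.0 - m) * r

# With an all-ones mask the explanation reproduces the image; with an all-zeros
# mask it collapses to the reference.
x = torch.rand(1, 3, 224, 224)
r = torch.zeros_like(x)                                  # zero reference (an assumption)
e_full = apply_mask(x, torch.ones(1, 1, 224, 224), r)    # e_full == x
e_none = apply_mask(x, torch.zeros(1, 1, 224, 224), r)   # e_none == r
```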

The authors introduce a similarity metric $\varphi_{c_T}(e_x, x)$:
- Measuring the consistency of the model output generated by the explanation, $f_{CNN}(e_x)$, and the output of the image, $f_{CNN}(x)$, with respect to a target class $c_T$.
- Typical choices for the metric: (i) the cross-entropy and (ii) the negative softmax score of the target class.
- $\varphi_{c_T}(e_x, x)$ is small if the explanation preserves the output of the target class $c_T$.
- $-\varphi_{c_T}(e_x, x)$ is small if the explanation manages to significantly drop the probability of $c_T$.
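A sketch of these two metric choices; the function names are mine, not the paper's:

```python
import torch
import torch.nn.functional as F

def neg_softmax_score(logits_e, c_T):
    """phi = -y_{c_T}(e_x): small when the explanation keeps the target-class probability high."""
    return -F.softmax(logits_e, dim=1)[:, c_T].mean()

def cross_entropy_metric(logits_e, c_T):
    """phi = CE(f_CNN(e_x), c_T): small when the explanation preserves the prediction of c_T."""
    target = torch.full((logits_e.shape[0],), c_T, dtype=torch.long)
    return F.cross_entropy(logits_e, target)
```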
From this similarity metric, let's define two versions of the objective function.
(1) Preserving game
Using the mask-based definition of an explanation with a reference $r$ (Eq. (1)), a preserving explanation is computed as:

$$m^{*} = \operatorname*{argmin}_{m}\ \lambda \lVert m \rVert_{1} + \varphi_{c_T}(e_x, x) \tag{2}$$

where $e_x = m \odot x + (1 - m) \odot r$ and $\lambda$ controls how strongly the size of the retained region is penalized.
(2) Deletion game
We can compute a deleting explanation using:

$$m^{*} = \operatorname*{argmin}_{m}\ \lambda \lVert 1 - m \rVert_{1} - \varphi_{c_T}(e_x, x) \tag{3}$$

which deletes as few pixels as possible while making the probability of the target class drop.

To solve the optimization in Eq. (2) and (3), the authors utilize Stochastic Gradient Descent (SGD) and start with an explanation initialized from the original image $x$ (i.e. a mask of ones).
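Below is a minimal sketch of the preserving game (Eq. (2)) optimized with SGD; the sigmoid re-parameterization of the mask, the hyperparameters, and the zero reference are illustrative assumptions rather than the authors' exact settings.

```python
import torch
import torch.nn.functional as F

def preserving_game(f_cnn, x, c_T, lam=1e-3, lr=0.1, steps=300):
    """Optimize a mask m so that e_x = m*x + (1-m)*r is sparse (small ||m||_1)
    while still preserving the score of the target class c_T (Eq. 2)."""
    r = torch.zeros_like(x)                          # zero reference image (assumption)
    h, w = x.shape[-2:]
    # Start near m = 1, i.e. the explanation is initialized close to the image itself.
    m_logit = torch.full((1, 1, h, w), 4.0, requires_grad=True)
    optimizer = torch.optim.SGD([m_logit], lr=lr)

    for _ in range(steps):
        m = torch.sigmoid(m_logit)                   # keep mask values in [0, 1]
        e = m * x + (1.0 - m) * r                    # mask-based explanation (Eq. 1)
        phi = -F.softmax(f_cnn(e), dim=1)[:, c_T].mean()  # negative softmax score of c_T
        loss = lam * m.abs().sum() + phi             # sparsity term + similarity metric
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    with torch.no_grad():
        m = torch.sigmoid(m_logit)
        return m, m * x + (1.0 - m) * r              # final mask and explanation
```

The deletion game (Eq. (3)) can be sketched analogously by penalizing $\lVert 1 - m \rVert_1$ instead and flipping the sign of the similarity term.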
2.2 Defending against Adversarial Evidence
CNNs have been shown to be susceptible to adversarial images. Due to the computational similarity between adversarial attacks and optimization-based visual explanation approaches, adversarial noise is also a concern for this method.
To tackle this problem, the authors propose a novel adversarial defense that filters gradients during backpropagation in a targeted way. The basic idea is: a neuron within the CNN is only allowed to be activated by the explanation $e_x$ if it has also been activated by the original image $x$.
If we regard neurons as indicators for the existence of features (e.g. edges, object parts, ...), the proposed constraint enforces that the explanation $e_x$ can only contain features which exist at the same location in the original image $x$:

$$h_{i}^{l}(e_x) \le \max\!\left(h_{i}^{l}(x),\ 0\right), \quad \forall\, l,\ \forall\, i \tag{4}$$

where $h_{i}^{l}$ is the activation of the $i$-th neuron in the $l$-th layer of the network.
To solve the optimization of Eq. (2)/(3) subject to Eq. (4), one could incorporate the constraints via a penalty function, or enforce them directly by adding an additional layer after each nonlinearity that bounds the activations by $\max(h_{i}^{l}(x), 0)$.
However, since such a layer changes the architecture of the model which we explain, the authors instead clip those gradients in the backward pass of the optimization that would lead to a violation of Eq. (4). This is equivalent to adding an additional clipping layer after each nonlinearity which acts as the identity in the forward pass and uses the gradient update of Eq. (5) in the backward pass. When backpropagating an error-signal $\gamma_{i}^{l}$ through such a layer, the gradient is masked as:

$$\tilde{\gamma}_{i}^{l} = \gamma_{i}^{l} \cdot \mathbb{1}\!\left[\, h_{i}^{l}(e_x) \le \max\!\left(h_{i}^{l}(x),\ 0\right) \right] \tag{5}$$

where $\mathbb{1}[\cdot]$ is the indicator function, i.e. gradient components that would violate the constraint of Eq. (4) are set to zero.
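As a rough sketch of this gradient filtering, one can record the ReLU activations of the original image and then zero out, during backpropagation, all gradient components at positions where the explanation's activation exceeds that reference. The hook-based bookkeeping below is my approximation, not the authors' implementation, and it assumes the model uses nn.ReLU modules.

```python
import torch
import torch.nn as nn

class GradientFilterDefense:
    """Sketch of the defense of Eq. (4)/(5): identity in the forward pass; in the
    backward pass, gradients are zeroed at ReLU outputs where the explanation's
    activation exceeds the activation produced by the original image."""

    def __init__(self, model):
        self.model = model
        self.reference = {}      # per-ReLU activations h(x) of the original image
        self.recording = False
        for name, module in model.named_modules():
            if isinstance(module, nn.ReLU):
                module.register_forward_hook(self._make_hook(name))

    def _make_hook(self, name):
        def hook(module, inputs, output):
            if self.recording:
                self.reference[name] = output.detach()   # store h(x)
            elif name in self.reference and output.requires_grad:
                allowed = (output <= self.reference[name]).float()
                # Filter the incoming gradient: components violating Eq. (4) are set to zero.
                output.register_hook(lambda grad, allowed=allowed: grad * allowed)
        return hook

    def record_reference(self, x):
        """Run the original image once to record the reference activations h(x)."""
        self.recording = True
        with torch.no_grad():
            self.model(x)
        self.recording = False

# Usage sketch: record h(x) once, then run the explanation optimization as before;
# every call f_cnn(e) inside the loop then backpropagates filtered gradients.
# defense = GradientFilterDefense(f_cnn)
# defense.record_reference(x)
```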

Reference
Wagner, Jorg, et al. "Interpretable and fine-grained visual explanations for convolutional neural networks." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019.