1. How to explain the decision of a black-box model?
Given the figure on the left, we wonder why the deep network predicts the image as "dog".
To satisfy this curiosity, we aim to find the regions of the input image that are important for classifying it as the "dog" class.
\[ \downarrow \text{The idea} \]
If we find and remove those regions, the predicted probability for "dog" drops significantly.
Note: "removing" a region means replacing it with reference data (e.g. a blurred or noisy version of the pixels).
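As a quick sanity check of this idea, one can compare the model's softmax score on the original image with its score on a copy in which a candidate region has been blurred out. Below is a minimal sketch, assuming a PyTorch classifier `model`, a preprocessed input tensor `x0` of shape (1, 3, H, W), a target class index `c`, and a hypothetical binary tensor `region` of shape (1, 1, H, W) that is 1 inside the candidate region:

```python
import torch
import torchvision.transforms.functional as TF

def score_drop(model, x0, c, region, sigma=10):
    """Compare the target-class probability before and after 'removing'
    a region, i.e. replacing it with a blurred copy of the pixels."""
    model.eval()
    with torch.no_grad():
        p_orig = torch.softmax(model(x0), dim=1)[0, c].item()
        blurred = TF.gaussian_blur(x0, kernel_size=51, sigma=sigma)
        # keep the original pixels outside the region, blurred pixels inside it
        x_removed = x0 * (1 - region) + blurred * region
        p_removed = torch.softmax(model(x_removed), dim=1)[0, c].item()
    return p_orig, p_removed
```

A large gap between `p_orig` and `p_removed` suggests the region mattered for the "dog" prediction.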
2. How to find the regions that are important for classifying the image as a target class?
2.1 Problem definition
- Input image: \(x_0 \)
- Black-box model: \(f \)
- Target class: \(c\)
- Prediction score for the target class: \( f_c(x_0)\)
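To make the notation concrete, here is a minimal sketch of how these quantities could look in code; the ResNet-50 classifier, the placeholder input, and the dog class index 207 are my own illustrative assumptions, not part of the paper:

```python
import torch
from torchvision import models

# Black-box model f (any image classifier works; we only need its outputs)
f = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1).eval()

# Input image x0: a preprocessed tensor of shape (1, 3, H, W)
x0 = torch.randn(1, 3, 224, 224)  # placeholder; normally a real, normalized image

# Target class c (e.g. an ImageNet dog class; index chosen for illustration)
c = 207

# Prediction score f_c(x0): softmax probability of the target class
with torch.no_grad():
    f_c_x0 = torch.softmax(f(x0), dim=1)[0, c]
print(f"f_c(x0) = {f_c_x0.item():.4f}")
```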
2.2 Methods
In Interpretable Explanations of Black Boxes by Meaningful Perturbation, the goal is to find deletion (perturbation) regions that are maximally informative about the model's decision.
Let \( m:\Lambda \rightarrow [0,1] \) be a mask, associating each pixel \( u \in \Lambda \) with a scalar value \(m(u)\).
Then, the perturbation operator is defined as follows:
\[
\Phi(x_0; m)(u)= \left\{ \begin{array}{ll} m(u)x_0(u)+(1-m(u))\mu_0, & \text{constant}, \cr
m(u)x_0(u)+(1-m(u))\eta(u), & \text{noise}, \cr
\int g_{\sigma_0 m(u)} (v-u)x_0(v)\,dv, & \text{blur},
\end{array} \right.
\]
where \(\mu_0\) is the average color, \(\eta(u)\) are i.i.d. Gaussian noise samples for each pixel, and \(\sigma_0\) is the maximum isotropic standard deviation of the Gaussian blur kernel \(g_\sigma\).
- If \(m(u)=1\) \(\rightarrow\) preserve the original pixel
- If \(m(u)=0\) \(\rightarrow\) replace the original pixel with the corresponding pixel of the reference data
Note: I will refer to the constant, noise, and blur alternatives collectively as the reference data \(R\).
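Below is a sketch of the three perturbation variants, assuming `x0` is a (1, 3, H, W) tensor and `m` is a (1, 1, H, W) mask in [0, 1]. For the blur case I blend the image with a single maximally blurred copy, which is a simplification of the spatially varying blur integral above, not the paper's exact operator:

```python
import torch
import torchvision.transforms.functional as TF

def perturb(x0, m, mode="blur", sigma0=10):
    """Apply Phi(x0; m): m=1 keeps the original pixel, m=0 replaces it
    with the reference data (constant color, noise, or blur)."""
    if mode == "constant":
        ref = x0.mean(dim=(2, 3), keepdim=True).expand_as(x0)  # average color mu_0
    elif mode == "noise":
        ref = torch.randn_like(x0)                              # i.i.d. Gaussian noise
    elif mode == "blur":
        ref = TF.gaussian_blur(x0, kernel_size=51, sigma=sigma0)  # maximally blurred copy
    else:
        raise ValueError(f"unknown mode: {mode}")
    return m * x0 + (1 - m) * ref
```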
2.2.1 The objective function
Find the smallest deletion mask \(m\) that makes the score drop significantly, i.e. \(f_c(\Phi(x_0; m)) \ll f_c(x_0)\), where \(c\) is the target class. This indicates that the masked (deleted) regions were important for classifying \(x_0\) as \(c\).
\[ m^*= \arg\min_{m \in [0,1]^\Lambda} \lambda \Vert 1-m\Vert_1+ f_c(\Phi(x_0; m))\]
where the \(\lambda\) term encourages the deleted region to be as small as possible, i.e. it pushes most of the mask toward 1 so that most of the image is preserved.
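A minimal sketch of optimizing this objective directly by gradient descent, reusing the hypothetical `perturb` helper from above; the optimizer, learning rate, iteration count, and \(\lambda\) value are my own choices, not the paper's settings:

```python
import torch

def find_mask(f, x0, c, lam=0.05, steps=300, lr=0.1):
    """Search for the smallest deletion mask m by minimizing
    lam * ||1 - m||_1 + f_c(Phi(x0; m))."""
    m = torch.ones(1, 1, *x0.shape[2:], requires_grad=True)  # start fully preserved
    optimizer = torch.optim.Adam([m], lr=lr)
    for _ in range(steps):
        score = torch.softmax(f(perturb(x0, m)), dim=1)[0, c]
        loss = lam * (1 - m).abs().sum() + score
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        with torch.no_grad():
            m.clamp_(0, 1)  # project back onto [0, 1]^Lambda
    return m.detach()
```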
2.2.2 Dealing with artifacts
By committing to finding a single representative perturbation, this approach risks triggering artifacts of the black-box model, as shown in the figures below.
To solve this problem, two modifications are made when generating explanations.
First, generalization of the mask
This means not relying on the details of a single learned mask \(m\). Hence, the problem is reformulated so that the mask \(m\) is applied stochastically, up to small random jitter of the input image.
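A sketch of this stochastic evaluation, again reusing the hypothetical `perturb` helper: instead of scoring the masked image once, the target-class score is averaged over a few randomly shifted copies of the input (the shift range and number of samples are my own assumptions):

```python
import torch

def jittered_score(f, x0, m, c, n_samples=4, max_shift=4):
    """Monte Carlo estimate of E_tau[ f_c(Phi(x0(.-tau); m)) ]."""
    total = 0.0
    for _ in range(n_samples):
        dx, dy = torch.randint(-max_shift, max_shift + 1, (2,)).tolist()
        x_jit = torch.roll(x0, shifts=(dy, dx), dims=(2, 3))  # small spatial jitter
        total = total + torch.softmax(f(perturb(x_jit, m)), dim=1)[0, c]
    return total / n_samples
```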
Second, Total Variation (TV) regularization and upsampling
By applying TV regularization and upsampling the mask from a lower resolution, we encourage the result to have a simple, regular structure that cannot co-adapt to the model's artifacts.
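A sketch of the total-variation term \( \sum_u \Vert \nabla m(u) \Vert_\beta^\beta \), computed with forward finite differences; the default \(\beta\) below is a placeholder, not necessarily the paper's value:

```python
import torch

def tv_norm(m, beta=3.0):
    """Total-variation regularizer sum_u ||grad m(u)||_beta^beta for a
    (1, 1, H, W) mask, using forward differences for the spatial gradient."""
    dh = (m[:, :, 1:, :] - m[:, :, :-1, :]).abs().pow(beta).sum()
    dw = (m[:, :, :, 1:] - m[:, :, :, :-1]).abs().pow(beta).sum()
    return dh + dw
```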
With these two modifications, the final objective function is as follows:
\[\min_{m \in [0,1]^\Lambda} \lambda_1 \Vert 1-m \Vert_1 + \lambda_2 \sum_u \Vert \nabla m(u) \Vert^\beta_\beta + \mathbb{E}_\tau \left[ f_c(\Phi (x_0( \cdot -\tau); m)) \right] \]
where the second term is the TV regularization and the third term implements the generalization of the mask: the expectation is taken over small random jitters \(\tau\) of the input image.
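Putting the pieces together, a sketch of the final loss that reuses the hypothetical `tv_norm` and `jittered_score` helpers above; it assumes `m` is already at image resolution (the coarse-to-fine upsampling is sketched in Section 3.1), and the weights \(\lambda_1, \lambda_2\) are placeholders rather than the paper's tuned values:

```python
def total_loss(f, x0, m, c, lam1=0.05, lam2=1e-3):
    """Final objective: L1 deletion cost + TV regularizer + expected
    target-class score of the jittered, perturbed image."""
    return (lam1 * (1 - m).abs().sum()
            + lam2 * tv_norm(m)
            + jittered_score(f, x0, m, c))
```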
3. Implementation details
3.1 Procedure of the mask upsampling method
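As a rough sketch of the idea: the mask is parameterized on a coarse grid and upsampled to the image size before every forward pass, so that it cannot encode fine-grained adversarial patterns. The 28x28 mask size and bilinear interpolation below are assumptions based on common public implementations, not values taken from the paper:

```python
import torch
import torch.nn.functional as F

# Parameterize the mask on a coarse grid (size assumed here)
m_small = torch.ones(1, 1, 28, 28, requires_grad=True)

def upsample_mask(m_small, size=(224, 224)):
    """Upsample the coarse mask to image resolution; the low resolution,
    together with TV regularization, keeps the mask simple and regular."""
    return F.interpolate(m_small, size=size, mode="bilinear", align_corners=False)

# Inside the optimization loop, the upsampled mask is what gets applied:
# m = upsample_mask(m_small)
# loss = total_loss(f, x0, m, c)
```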
Reference
Fong, Ruth C., and Andrea Vedaldi. "Interpretable explanations of black boxes by meaningful perturbation." Proceedings of the IEEE International Conference on Computer Vision. 2017.
Github Code: Interpretable Explanations of Black Boxes by Meaningful Perturbation