GradCAM

1. What is the goal of GradCAM?

The goal of GradCAM is to produce a coarse localization map highlighting the important regions in the image for predicting the concept (class).

GradCAM uses the gradients of any target concept (such as "cat") flowing into the final convolutional layer.

Note: In what follows, I (da2so) consider only the image classification setting.

The property of feature map $A^{k}$ from the last convolution layer: We expect the last convolution layer to offer the best compromise between high-level semantics and detailed spatial information.

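The neuron importance weights are obtained as $w_{k}^{c} = \frac{1}{z} \sum_{i} \sum_{j} \frac{\partial y^{c}}{\partial A_{V}^{k}}$, where $V = (i, j)$ indexes a spatial position in the feature map, $z$ is the number of spatial positions, and $y^{c}$ is the score for class $c$ (before the softmax).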

This weight represents a partial linearization of the deep network downstream from $A$ , and captures the 'importance' of feature map $k$ for a target class $c$ .
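As a minimal sketch of this averaging step, assume the gradients $\partial y^{c} / \partial A_{V}^{k}$ have already been obtained (for example, by backpropagation in an autodiff framework). The weights then reduce to a per-channel mean. The function name `importance_weights`, the nested-list layout `grads[k][i][j]`, and the numbers are illustrative assumptions, not taken from the paper's code.

```python
def importance_weights(grads):
    """Average each channel's gradient map over all spatial positions.

    grads[k][i][j] is assumed to hold d y^c / d A^k at position (i, j).
    Returns one weight w_k^c per feature map k.
    """
    weights = []
    for grad_map in grads:
        z = len(grad_map) * len(grad_map[0])  # number of spatial positions
        total = sum(sum(row) for row in grad_map)
        weights.append(total / z)
    return weights


# Two hypothetical 2x2 gradient maps (made-up numbers for illustration).
grads = [
    [[0.5, 0.5], [1.0, 2.0]],    # channel 0: positive gradients
    [[-1.0, 0.0], [0.0, -1.0]],  # channel 1: negative gradients
]
weights = importance_weights(grads)
print(weights)  # [1.0, -0.5]
```

A positive weight means that increasing feature map $k$ raises the score of class $c$ on average; a negative weight means it lowers it.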

Then, we perform a weighted combination of forward activation maps and follow it with a ReLU.
$L^{c} = \mathrm{ReLU}\left(\sum_{k} w_{k}^{c} A^{k}\right) \quad \text{(Grad-CAM, Eq. 1)}$

The reason for applying ReLU is that we are only interested in the features that have a positive influence on the class of interest; negative values are likely to belong to other categories in the image.
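To make Eq. (1) concrete, here is a small sketch of the weighted combination followed by ReLU, written with plain Python lists so it runs without dependencies. The names `relu` and `grad_cam_map`, the layout `activations[k][i][j]`, and all numbers are hypothetical and chosen only for illustration.

```python
def relu(x):
    """Keep positive values, clip everything else to zero."""
    return x if x > 0 else 0.0


def grad_cam_map(weights, activations):
    """Compute ReLU(sum_k w_k * A^k) position by position.

    weights[k] is w_k^c; activations[k][i][j] is A^k at position (i, j).
    """
    height = len(activations[0])
    width = len(activations[0][0])
    cam = [[0.0] * width for _ in range(height)]
    for w_k, a_k in zip(weights, activations):
        for i in range(height):
            for j in range(width):
                cam[i][j] += w_k * a_k[i][j]
    return [[relu(v) for v in row] for row in cam]


weights = [1.0, -0.5]  # made-up importance weights for two feature maps
activations = [
    [[2.0, 0.0], [0.0, 1.0]],  # A^0
    [[0.0, 4.0], [0.0, 1.0]],  # A^1
]
cam = grad_cam_map(weights, activations)
print(cam)  # [[2.0, 0.0], [0.0, 0.5]]
```

At position (0, 1) the weighted sum is -2.0, so ReLU sets it to zero: that location only carries evidence against the target class and is dropped from the map.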

In summary, the GradCAM procedure is as follows.

Input
- Image: $x$
- Pre-trained model: $f$
  - Feature extractor (CNN): $f_{e}$
  - Classification layer (fc layer): $f_{l}$
- Category (target class): $c$

$A \leftarrow f_{e} (x)$
$y^{c} \leftarrow f_{l} (A)$
$w_{k}^{c} \leftarrow \frac{1}{z} \sum_{i} \sum_{j} \frac{\partial y^{c}}{\partial A_{V}^{k}}$
$L^{c} \leftarrow \mathrm{ReLU}\left(\sum_{k} w_{k}^{c} A^{k}\right)$

Output
- Grad-CAM: $L^{c}$
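The procedure above can be sketched end to end on a toy example. This is only an illustration under strong assumptions: the feature maps $A$ are given directly instead of being computed by a real $f_{e}(x)$, the classifier $f_{l}$ is a hypothetical global-average-pooling layer followed by a linear layer, and the gradients are approximated by central finite differences as a stand-in for backpropagation. In practice one would use the autodiff of a deep learning framework.

```python
def classifier(A, fc_weights):
    """Hypothetical f_l: global average pooling, then a linear layer.

    A[k][i][j] are feature maps; fc_weights[c][k] are class weights.
    Returns one score per class.
    """
    pooled = [sum(sum(row) for row in a_k) / (len(a_k) * len(a_k[0]))
              for a_k in A]
    return [sum(w * p for w, p in zip(row, pooled)) for row in fc_weights]


def numerical_grads(A, fc_weights, c, eps=1e-4):
    """Approximate d y^c / d A^k_ij by central finite differences."""
    grads = []
    for k, a_k in enumerate(A):
        grad_map = []
        for i, row in enumerate(a_k):
            grad_row = []
            for j, original in enumerate(row):
                A[k][i][j] = original + eps
                y_plus = classifier(A, fc_weights)[c]
                A[k][i][j] = original - eps
                y_minus = classifier(A, fc_weights)[c]
                A[k][i][j] = original  # restore the activation
                grad_row.append((y_plus - y_minus) / (2 * eps))
            grad_map.append(grad_row)
        grads.append(grad_map)
    return grads


def grad_cam(A, fc_weights, c):
    """Return (weights, map) for target class c."""
    grads = numerical_grads(A, fc_weights, c)
    height, width = len(A[0]), len(A[0][0])
    z = height * width
    # w_k^c: spatial average of the gradients of each feature map
    weights = [sum(sum(row) for row in g) / z for g in grads]
    # L^c: ReLU of the weighted sum of feature maps
    cam = [[max(0.0, sum(w * a[i][j] for w, a in zip(weights, A)))
            for j in range(width)]
           for i in range(height)]
    return weights, cam


# Made-up feature maps (standing in for f_e(x)) and classifier weights.
A = [
    [[2.0, 0.0], [0.0, 1.0]],
    [[0.0, 4.0], [0.0, 1.0]],
]
fc_weights = [[4.0, -2.0], [1.0, 1.0]]  # two classes, two feature maps
weights, cam = grad_cam(A, fc_weights, c=0)
print(weights)
print(cam)
```

For this particular classifier the gradient of $y^{c}$ with respect to every position of $A^{k}$ is the class weight divided by $z$, so the importance weights come out close to 1.0 and -0.5, and the map is nonzero only where the first feature map dominates.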

Evaluating Trust

Given two prediction explanations, the authors evaluate which model seems more trustworthy, using both Guided Backpropagation and Guided Grad-CAM visualizations. For the experiments, they use AlexNet and VGG-16, noting that VGG-16 is more accurate than AlexNet, with 79.09 mAP (vs. 69.20 mAP) on PASCAL classification. Trust scores are obtained from 54 human raters. With Guided Backpropagation, raters assign VGG-16 an average score of 1.00, meaning that it is judged more trustworthy than AlexNet. With Guided Grad-CAM the score rises to 1.27, which is closer to rating VGG-16 as clearly more reliable.

Reference

Selvaraju, Ramprasaath R., et al. "Grad-cam: Visual explanations from deep networks via gradient-based localization." Proceedings of the IEEE international conference on computer vision. 2017.

Github Code: Grad-CAM
