1. What is the goal of GradCAM?
The goal of GradCAM is to produce a coarse localization map highlighting the important regions in the image for predicting the concept (class).

GradCAM uses the gradients of any target concept (such as "cat") flowing into the final convolutional layer.
Note: I (da2so) will only deal with image classification in what follows.


The property of feature maps
Obtaining the neuron importance weights
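The neuron importance weight α_k^c is obtained by backpropagating the score y^c of class c (before the softmax) to the feature maps A^k of the final convolutional layer and global-average-pooling these gradients over the width and height of each map. This weight represents a partial linearization of the deep network downstream from A, and captures the "importance" of feature map k for the target class c.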
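Written out in the notation of the original paper, where y^c is the pre-softmax score for class c, A^k_{ij} is the activation at spatial position (i, j) of the k-th feature map, and Z is the number of spatial positions in that map:

$$
\alpha_k^c = \frac{1}{Z} \sum_i \sum_j \frac{\partial y^c}{\partial A_{ij}^k}
$$

The double sum divided by Z is global average pooling of the gradient map.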
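Then, we perform a weighted combination of the forward activation maps, using these weights, and follow it with a ReLU.
The ReLU is applied because we are only interested in the features that have a positive influence on the class of interest, i.e., pixels whose intensity should be increased in order to increase the class score.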
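A minimal sketch of these two steps in plain Python, assuming the feature maps A^k and the gradients ∂y^c/∂A^k have already been computed (in practice by a backward pass in a deep learning framework). The function names are hypothetical. It computes each α_k^c by averaging a gradient map and then $L^c = \mathrm{ReLU}(\sum_k \alpha_k^c A^k)$, skipping the upsampling to image resolution that a full implementation would add.

```python
# Sketch of the two Grad-CAM steps, assuming activations and gradients are given
# as nested lists: K maps, each H x W. Function names are hypothetical.


def importance_weights(grads):
    """alpha_k^c: global average pooling of the gradient of y^c w.r.t. A^k."""
    weights = []
    for g in grads:  # g is the H x W gradient map for one feature map
        z = sum(len(row) for row in g)  # Z = number of spatial positions
        weights.append(sum(sum(row) for row in g) / z)
    return weights


def grad_cam(activations, grads):
    """L^c = ReLU(sum_k alpha_k^c * A^k), computed position by position."""
    alphas = importance_weights(grads)
    h, w = len(activations[0]), len(activations[0][0])
    cam = [[0.0] * w for _ in range(h)]
    for alpha, fmap in zip(alphas, activations):
        for i in range(h):
            for j in range(w):
                cam[i][j] += alpha * fmap[i][j]
    # ReLU keeps only positions with a positive influence on the class score.
    return [[max(0.0, v) for v in row] for row in cam]


# Example: two 2x2 feature maps; the first supports class c, the second opposes it.
A = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 2.0], [2.0, 0.0]]]
dA = [[[1.0, 1.0], [1.0, 1.0]], [[-1.0, -1.0], [-1.0, -1.0]]]
print(grad_cam(A, dA))  # → [[1.0, 0.0], [0.0, 1.0]]
```

The positions where the opposing map is active are zeroed out by the ReLU rather than shown as negative evidence.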
In summary, the GradCAM procedure is as follows.
- Input
  - Image
  - Pre-trained model
    - Feature extractor (CNN)
    - Classification layer (fc layer)
  - Category (target class)
- Output
  - Grad-CAM
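To make the inputs and outputs above concrete, here is a toy, self-contained sketch in plain Python. The "pre-trained model" is entirely hypothetical: a fake feature extractor that returns two maps (one responding to bright pixels, one to dark pixels) and a classification layer made of global average pooling plus a linear layer. Gradients are approximated with central finite differences as a stand-in for backpropagation, and the upsampling to image resolution is skipped because the toy feature maps already match the image size.

```python
# Toy end-to-end Grad-CAM sketch. All components are hypothetical stand-ins:
# a real model would be a CNN, and gradients would come from backpropagation.


def feature_extractor(image):
    """Hypothetical 'CNN': returns K = 2 feature maps with the image's size."""
    bright = [[p for p in row] for row in image]      # fires on bright pixels
    dark = [[1.0 - p for p in row] for row in image]  # fires on dark pixels
    return [bright, dark]


def classifier(fmaps, fc_weights):
    """Hypothetical classification layer: global average pooling + linear layer.
    Returns one pre-softmax score y^c per class."""
    pooled = [sum(map(sum, f)) / sum(map(len, f)) for f in fmaps]
    return [sum(w * p for w, p in zip(ws, pooled)) for ws in fc_weights]


def grad_cam_pipeline(image, fc_weights, target, eps=1e-4):
    """Return (alphas, cam) for the target class."""
    fmaps = feature_extractor(image)
    h, w = len(fmaps[0]), len(fmaps[0][0])
    alphas = []
    for fmap in fmaps:
        grad_sum = 0.0
        for i in range(h):
            for j in range(w):
                orig = fmap[i][j]
                # Central difference approximates dy^target / dA^k_ij.
                fmap[i][j] = orig + eps
                up = classifier(fmaps, fc_weights)[target]
                fmap[i][j] = orig - eps
                down = classifier(fmaps, fc_weights)[target]
                fmap[i][j] = orig  # restore the activation
                grad_sum += (up - down) / (2 * eps)
        alphas.append(grad_sum / (h * w))  # global average pooling of gradients
    # Weighted combination of the forward feature maps, followed by ReLU.
    cam = [[max(0.0, sum(a * f[i][j] for a, f in zip(alphas, fmaps)))
            for j in range(w)]
           for i in range(h)]
    return alphas, cam


image = [[0.9, 0.1], [0.2, 0.8]]         # 2x2 grayscale "image"
fc_weights = [[4.0, -4.0], [-4.0, 4.0]]  # class 0: bright, class 1: dark
alphas, cam = grad_cam_pipeline(image, fc_weights, target=0)
print(alphas, cam)
```

For target class 0, whose linear weights favor the bright map, the weights should come out close to [1.0, -1.0] and the map close to [[0.8, 0.0], [0.0, 0.6]], so only the two bright pixels are highlighted. Choosing target class 1 flips the highlighted pixels.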
Evaluating Trust
Given the prediction explanations of two models, they evaluate which model seems more trustworthy, separately for Guided Backpropagation and Guided Grad-CAM visualizations. For the experiments, they use AlexNet and VGG-16, noting that VGG-16 is more accurate than AlexNet (79.09 mAP vs. 69.20 mAP on PASCAL classification). Trust scores are collected from 54 human subjects. With Guided Backpropagation, humans assign VGG-16 an average score of 1.00, meaning that it is slightly more trustworthy than AlexNet, while Guided Grad-CAM achieves a higher score of 1.27, which is closer to saying that VGG-16 is clearly more reliable.
Reference
Selvaraju, Ramprasaath R., et al. "Grad-CAM: Visual explanations from deep networks via gradient-based localization." Proceedings of the IEEE International Conference on Computer Vision. 2017.
GitHub code: Grad-CAM