Abstract
Explaining the predictions of black-box neural networks is crucial when such models are applied to decision-critical tasks. Attribution maps are therefore commonly used to identify important image regions, even though prior work shows that humans prefer explanations based on similar examples. To
this end, ProtoPNet learns a set of class-representative feature
vectors (prototypes) for case-based reasoning. During inference,
the similarities between latent features and the prototypes are fed into a linear classifier to form predictions, and attribution maps are provided to explain these similarities. In this work, we evaluate whether architectures for case-based reasoning fulfill established axioms required for faithful explanations, using ProtoPNet as an example. We show that such
architectures allow the extraction of faithful explanations. However,
we prove that the attribution maps used to explain the similarities
violate the axioms. We propose a new procedure to extract explanations
for trained ProtoPNets, named ProtoPFaith. Conceptually, these
explanations are Shapley values, calculated on the similarity scores
of each prototype. They make it possible to faithfully answer which prototypes are present in an unseen image and to quantify each pixel's contribution
to that presence, thereby complying with all axioms. The theoretical
violations of ProtoPNet manifest in our experiments on three datasets
(CUB-200-2011, Stanford Dogs, RSNA) and five architectures (ConvNet,
ResNet, ResNet50, WideResNet50, ResNeXt50). Our experiments show a
qualitative difference between the explanations given by ProtoPNet and
ProtoPFaith. Additionally, we quantify the explanations with the Area
Over the Perturbation Curve, on which ProtoPFaith outperforms
ProtoPNet in all experiments by a factor $>10^{3}$.
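For concreteness, the two quantities named above can be sketched as follows; the notation here is ours and need not match the paper's. The Shapley value of an input feature $i$ (e.g., a pixel or patch) with respect to the similarity score $s_p$ of prototype $p$ averages the marginal effect of $i$ over all subsets $S$ of the feature set $F$, where $x_S$ denotes the input with only the features in $S$ present and the remainder replaced by a baseline:

$$\phi_i(s_p) = \sum_{S \subseteq F \setminus \{i\}} \frac{|S|!\,(|F|-|S|-1)!}{|F|!} \left[ s_p\!\left(x_{S \cup \{i\}}\right) - s_p\!\left(x_S\right) \right].$$

The Area Over the Perturbation Curve, in its standard form, measures the average drop in the model output $f$ after successively removing the most relevant regions first, with $x^{(k)}$ denoting the input after $k$ removal steps:

$$\mathrm{AOPC} = \frac{1}{K+1}\, \mathbb{E}_x\!\left[ \sum_{k=0}^{K} \left( f\!\left(x^{(0)}\right) - f\!\left(x^{(k)}\right) \right) \right].$$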
Publication
AAAI Conference on Artificial Intelligence