flylogical: Deep Learning Analysis of COVID-19 lung X-Rays using MATLAB: Part 4

Update: see Part 5 where the grad-CAM results presented below are used to train another suite of networks to help choose between all the lung X-ray classifiers presented in Part 3.

* DISCLAIMER *

I have no medical training. Nothing presented here should be considered in any way as informative from a medical point-of-view. This is simply an exercise in image analysis via Deep Learning using MATLAB, with lung X-rays as a topical example in these times of COVID-19.

INTRODUCTION

In this Part 4 of my series of blog articles exploring Deep Learning of lung X-rays using MATLAB, the analysis of Part 3 is revisited to further compare the performance of all the pre-trained networks available via MATLAB as the basis for the Transfer Learning procedure. Specifically, the grad-CAM technique is applied to (i) gain an insight into how the various networks respond to the underlying images and (ii) investigate how the responses of the networks differ from one another. The goal is to provide some guidance as to how to choose the "best" network for the task at hand. Again, all analysis is performed in MATLAB.

grad-CAM

The grad-CAM technique is introduced here, with a MATLAB implementation provided here which is used as the basis for the present analysis. Note that grad-CAM is a more powerful and more general extension of the Class Activation Map (CAM) technique used in Part 2.
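For orientation, the core arithmetic of grad-CAM can be sketched with synthetic arrays. This is only a minimal sketch following the original grad-CAM formulation (gradient-averaged channel weights, weighted sum, ReLU). The arrays featureMap and dScoresdMap are random stand-ins I have made up for a real feature map and the gradient of the class score with respect to it, and the normalization details may differ from those in the MATLAB implementation cited above.

```matlab
% Synthetic stand-ins (assumptions, not real network outputs):
% a 7-by-7-by-4 feature map and the class-score gradient w.r.t. that map.
rng(0);
featureMap  = rand(7, 7, 4);
dScoresdMap = randn(7, 7, 4);

% Channel weights: average each gradient channel over the spatial dimensions.
weights = mean(mean(dScoresdMap, 1), 2);   % size 1-by-1-by-4

% Weighted sum of the feature-map channels (implicit expansion, R2016b+),
% followed by a ReLU so that only positive evidence for the class is kept.
cam = sum(featureMap .* weights, 3);       % size 7-by-7
cam = max(cam, 0);

% Rescale to [0, 1] for display, guarding against an all-zero map.
if max(cam(:)) > 0
    cam = cam ./ max(cam(:));
end
```

In practice the coarse map would then be resized to the input image size and overlaid on the X-ray as a heat map.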

The results presented in the following sections were generated using the gradcam function (in MATLAB) provided in the reference example here. That function is used in precisely the same manner in the present analysis, so it is not reproduced in this article.

That said, the cited reference example is directly applicable only to googlenet. Extending it to each of the other networks requires identifying the appropriate softmax and feature map layers, using the analyzeNetwork function to examine the given network and select the correct layers. The softmax layer is easily identified as the last softmax layer before the output. The feature map layer is identified as follows (from here):

"Specify either the last ReLU layer with non-singleton spatial dimensions, or the last layer that gathers the outputs of ReLU layers (such as a depth concatenation or an addition layer). If your network does not contain any ReLU layers, specify the name of the final convolutional layer that has non-singleton spatial dimensions in the output".
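Part of this identification can be narrowed down programmatically. The following is a minimal sketch, not the method used for gradCamLayerNames: it works on a made-up list of layer names and class names, and the class-name strings are assumptions that should be checked against your MATLAB release. It picks the last softmax layer and lists the candidate feature map layers; whether a candidate has non-singleton spatial dimensions still has to be confirmed in analyzeNetwork.

```matlab
% Hypothetical stand-in for a network's layer list. With a real network,
% something like the following could be used instead (an assumption):
%   names   = {net.Layers.Name};
%   classes = arrayfun(@class, net.Layers, 'UniformOutput', false);
names   = {'data', 'conv1', 'relu1', 'conv2', 'relu2', 'gap', 'fc', 'prob', 'output'};
classes = {'nnet.cnn.layer.ImageInputLayer', ...
           'nnet.cnn.layer.Convolution2DLayer', ...
           'nnet.cnn.layer.ReLULayer', ...
           'nnet.cnn.layer.Convolution2DLayer', ...
           'nnet.cnn.layer.ReLULayer', ...
           'nnet.cnn.layer.GlobalAveragePooling2DLayer', ...
           'nnet.cnn.layer.FullyConnectedLayer', ...
           'nnet.cnn.layer.SoftmaxLayer', ...
           'nnet.cnn.layer.ClassificationOutputLayer'};

% Softmax layer: the last softmax layer before the output.
isSoftmax   = strcmp(classes, 'nnet.cnn.layer.SoftmaxLayer');
softmaxName = names{find(isSoftmax, 1, 'last')};

% Candidate feature map layers: ReLU layers, or layers that gather ReLU
% outputs (depth concatenation or addition), in network order.
candidateClasses = {'nnet.cnn.layer.ReLULayer', ...
                    'nnet.cnn.layer.DepthConcatenationLayer', ...
                    'nnet.cnn.layer.AdditionLayer'};
candidateNames = names(ismember(classes, candidateClasses));

fprintf('Softmax layer: %s\n', softmaxName);
fprintf('Feature map candidates (check spatial sizes): %s\n', ...
        strjoin(candidateNames, ', '));
```

The last candidate in the list is the natural first choice, provided analyzeNetwork shows that its activations still have spatial dimensions larger than 1-by-1.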

For convenience, I have performed this identification for all the network types and bundled the results into a function named gradCamLayerNames (available via my GitHub repository).

Note: my gradCamLayerNames function returns the relevant layer names for the unmodified pre-trained networks distributed with MATLAB. For pre-trained networks which have been modified for Transfer Learning (by replacing the final few layers as described in Part 1), the relevant layer names for use with gradcam may be different (unless the original names happen to have been replicated). For example, all the networks used in the present analysis have been modified in the manner described in Part 1 for Transfer Learning, and so the relevant softmax layer name for use with gradcam is 'softmax' rather than that returned by gradCamLayerNames.
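A simple guard against this naming mismatch is to check the layer names against the actual network before using them. The sketch below is illustrative only: the list of layer names is hypothetical (apart from 'softmax', which is the name used by the networks in this analysis), and the googlenet defaults 'prob' and 'inception_5b-output' are quoted from memory of the reference example, so treat them as assumptions.

```matlab
% Default names for the unmodified googlenet (assumed, from the reference example).
defaultSoftmaxName = 'prob';
featureLayerName   = 'inception_5b-output';

% Hypothetical layer names of a googlenet modified for Transfer Learning.
% With a real network these could be read via: names = {net.Layers.Name};
names = {'data', 'inception_5b-output', 'pool5-7x7_s1', ...
         'new_fc', 'softmax', 'new_classoutput'};

% Use the default softmax name only if it survived the modification,
% otherwise fall back to the name of the replacement layer.
if ismember(defaultSoftmaxName, names)
    softmaxName = defaultSoftmaxName;
else
    softmaxName = 'softmax';
end

% Fail early, with a readable message, if either layer is missing.
assert(ismember(softmaxName, names), ...
       'Softmax layer "%s" not found in network.', softmaxName);
assert(ismember(featureLayerName, names), ...
       'Feature map layer "%s" not found in network.', featureLayerName);
```

The feature map layer sits upstream of the replaced layers, so its name would normally be unaffected by the Transfer Learning modification.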

Image Datasets and Transfer Learning Networks

The lung X-ray image datasets (arranged into Examples 1--4) and the corresponding Transfer Learning trained networks from Part 3 are used here "as is" without further introduction (refer to Part 3 for the details).

Analysis via grad-CAM

EXAMPLE 1: "YES / NO" Classification of Pneumonia

The grad-CAM analysis has been performed on all of the Example 1 Transfer Learning networks with all of the corresponding validation images. A representative sample of the results is displayed via the following links (where the network names pertain to the base networks used in the Transfer Learning):

vgg16 applied to all 224 validation images
darknet53 applied to all 224 validation images
all 19 networks applied to a single representative validation image

EXAMPLE 2: Classification of Bacterial or Viral Pneumonia

The grad-CAM analysis has been performed on all of the Example 2 Transfer Learning networks with all of the corresponding validation images. A representative sample of the results is displayed via the following links (where the network names pertain to the base networks used in the Transfer Learning):

darknet53 applied to all 640 validation images
all 19 networks applied to a single representative validation image

EXAMPLE 3: Classification of COVID-19 or Other-Viral

The grad-CAM analysis has been performed on all of the Example 3 Transfer Learning networks with all of the corresponding validation images. A representative sample of the results is displayed via the following links (where the network names pertain to the base networks used in the Transfer Learning):

vgg19 applied to all 260 validation images
all 19 networks applied to a single representative validation image

EXAMPLE 4: Classification of COVID-19 Pneumonia versus Healthy, Bacterial, or non-COVID Viral Pneumonia

The grad-CAM analysis has been performed on all of the Example 4 Transfer Learning networks with all of the corresponding validation images. A representative sample of the results is displayed via the following links (where the network names pertain to the base networks used in the Transfer Learning):

inceptionresnetv2 applied to all 44 validation images
all 19 networks applied to a single representative validation image

RESULTS & NEXT STEPS

Looking over all these grad-CAM images for all four Examples (via the links above) confirms that the networks are generally responding to regions within the lungs when making their classifications. This is a positive finding in terms of qualifying the overall Deep Learning approach to the analysis of the lung X-rays, and confirms the results of the (simpler) CAM approach from Part 2. However, the findings are not completely definitive in that it can be seen that some networks on some images are responding to inappropriate regions in the images (e.g., outside the lungs or even outside the body!), thereby reducing the validity of the approach for classifying the lung X-rays.
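The notion of "inappropriate regions" could in principle be made quantitative. The sketch below is not something done in this analysis: it assumes that a binary lung mask is available for each image (none is used here), and both the toy arrays and the 0.8 threshold are purely hypothetical. It computes the fraction of the total grad-CAM activation that falls inside the mask.

```matlab
% Toy 8-by-8 grad-CAM map (hypothetical values).
cam = zeros(8, 8);
cam(3:5, 2:4) = 1.0;   % strong activation inside the lung region
cam(1, 8)     = 0.5;   % stray activation in an image corner

% Hypothetical binary lung mask of the same size (true = inside the lungs).
lungMask = false(8, 8);
lungMask(2:7, 2:7) = true;

% Fraction of the total activation that lies inside the mask.
insideFraction = sum(cam(lungMask)) / sum(cam(:));

% Hypothetical acceptance threshold; a real value would need tuning.
minInsideFraction = 0.8;
isPlausible = insideFraction >= minInsideFraction;

fprintf('Fraction of activation inside lungs: %.3f (plausible: %d)\n', ...
        insideFraction, isPlausible);
```

A score like this would flag the cases noted above, where a network responds mainly to regions outside the lungs or outside the body.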

It is also interesting to observe how the various networks respond differently to the same image. For example, the grad-CAM images below (taken from the results for Example 4) illustrate how six different networks (base names darknet19, darknet53, densenet201, googlenet [original], googlenet [places], and inceptionresnetv2) respond to the same validation image. It can be seen that the given networks are activated by quite different regions within the image. This is perhaps not too surprising given that the networks generally have quite different layer structures. That said, the googlenet variants ([original] and [places]) have identical layer structures but have been pre-trained on different image sets, then Transfer Trained on identical lung X-ray training images. The activations observed from grad-CAM analysis are nevertheless quite different.
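One hedged way to put a number on "quite different" is to correlate the grad-CAM maps of two networks for the same image. This is just an illustrative sketch with made-up toy maps, not a measure used in this analysis. It assumes the maps have already been resized to a common size, and uses the Pearson correlation coefficient from the base MATLAB corrcoef function.

```matlab
% Toy grad-CAM maps on a common 6-by-6 grid (hypothetical values).
camA = zeros(6, 6);  camA(2:3, 2:3) = 1;   % network A: upper-left region
camB = zeros(6, 6);  camB(4:5, 4:5) = 1;   % network B: lower-right region
camC = camA;                               % network C: same region as A

% Pearson correlation between the flattened maps:
% 1 means identical patterns, values near 0 or below mean little overlap.
R = corrcoef(camA(:), camB(:));
simAB = R(1, 2);
R = corrcoef(camA(:), camC(:));
simAC = R(1, 2);

fprintf('Similarity A vs B: %.3f\n', simAB);
fprintf('Similarity A vs C: %.3f\n', simAC);
```

Computed over all validation images, such pairwise scores would give a compact summary of which networks tend to look at the same regions.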

All this goes to show that the optimal choice of networks for the task of lung X-ray classification is somewhat subtle since the various networks respond in different ways to the underlying images. It is not sufficient to only consider the classification accuracy scores (from the classification-accuracy results tables presented in Part 3). It is important to also consider the relevance and validity of the activated regions as exposed via this grad-CAM analysis.

Interesting next steps to consider would therefore be to (i) combine the results of the various networks on the classification task rather than simply trying to choose a single 'optimal' network (per Example), and (ii) whilst doing so, eliminate any network whose grad-CAM activations are in inappropriate regions (i.e., outside the lungs) on a given sample-image-under-test. This could result in a more accurate and robust COVID-19 classifier.
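As a rough illustration of how steps (i) and (ii) might fit together, the sketch below averages the softmax scores of only those networks whose grad-CAM activation lies mostly inside the lungs for the image under test. Everything in it is hypothetical: the scores, the class labels, the per-network "inside-lung" fractions, and the 0.8 threshold are invented for illustration, and score averaging is just one of several possible ways to combine networks.

```matlab
% Hypothetical results for ONE image under test.
networkNames = {'darknet19', 'darknet53', 'densenet201', 'googlenet'};
classNames   = {'covid', 'other-viral'};

% Softmax scores: one row per network, one column per class (made up).
scores = [0.90 0.10;
          0.70 0.30;
          0.20 0.80;
          0.60 0.40];

% Fraction of each network's grad-CAM activation inside the lungs (made up).
insideFraction    = [0.95; 0.85; 0.30; 0.90];
minInsideFraction = 0.8;   % hypothetical threshold

% Step (ii): drop networks responding mainly to regions outside the lungs.
keep = insideFraction >= minInsideFraction;
if ~any(keep)
    error('No network passed the grad-CAM check for this image.');
end

% Step (i): combine the remaining networks by averaging their scores.
meanScores = mean(scores(keep, :), 1);
[~, idx]   = max(meanScores);
predictedClass = classNames{idx};

fprintf('Networks kept: %s\n', strjoin(networkNames(keep), ', '));
fprintf('Combined prediction: %s (mean score %.3f)\n', ...
        predictedClass, meanScores(idx));
```

Here densenet201 is excluded for this image because most of its activation falls outside the lungs, so its dissenting vote does not affect the combined prediction.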