flylogical: May 2020

Saturday, 30 May 2020

Deep Learning Analysis of COVID-19 lung X-Rays using MATLAB: Part 5

* DISCLAIMER *

I have no medical training. Nothing presented here should be considered in any way as informative from a medical point-of-view. This is simply an exercise in image analysis via Deep Learning using MATLAB, with lung X-rays as a topical example in these times of COVID-19.

INTRODUCTION

In this Part 5 of my series of blog articles exploring Deep Learning of lung X-rays using MATLAB, the observations from Part 4 -- whereby the grad-CAM technique was used to identify which regions of the X-ray images were activating each of the 19 network architectures under consideration -- serve as the basis for a new network which determines, for a given image-under-test, which models are utilising the lung regions (as desired) and which are responding to regions outside the lungs. The resulting network can then be used as a discriminating filter applied to the outputs of the main X-ray classifiers, in order to favour those classifiers which focus (correctly) on the lung regions rather than elsewhere.

DATASET

The image dataset for the grad-CAM Discriminating Filter comprised a set of grad-CAM images as presented in Part 4. Specifically, the dataset was composed by generating a total of 14,333 grad-CAM image files across all 19 network types and X-ray sample images. For training of the Deep Neural Network, these were split into three classes: INSIDE_LUNGS (whereby the grad-CAM images contain activation regions which are focused on the interior of one or both lungs -- the desirable scenario); OUTSIDE_LUNGS (whereby the grad-CAM images contain activation regions which are focused outside of the lungs or even outside of the body -- the undesirable scenario); and RIBCAGE_CENTRAL (whereby the grad-CAM images contain activation regions which are focused in the central part of the ribcage rather than explicitly within either lung -- an intermediate scenario which occurred commonly enough that it needed its own class). Sample images of each of these are shown below.

A sample image generated from the grad-CAM technique presented in Part 4 applied to a lung X-ray analysed via a (Transfer Learning) Deep Neural Network from Part 3. This example has been assigned the label INSIDE_LUNGS for the sake of creating a dataset for the training of the Deep Neural Network Discriminating Filter, the central focus of this current article. Ideally, all the grad-CAM images generated from all the classifiers applied to all the lung X-rays would fall within this INSIDE_LUNGS class. But the results of Part 4 show this not to be the case (hence the motivation for devising the Discriminating Filter to sort the relevant classifications from the less relevant).

A sample grad-CAM image which falls within the OUTSIDE_LUNGS category. The purpose of the Discriminating Filter described in this current article is to identify such cases where the X-ray analysis classifier has wrongly focused on regions outside of the lungs (or indeed the body).

A sample grad-CAM image which falls within the RIBCAGE_CENTRAL category. This occurs quite often with the networks from Part 3. The idea is that such cases can be considered less positively definitive than INSIDE_LUNGS, but better than OUTSIDE_LUNGS, when it comes to determining the validity for lung X-ray classification.

GROUND TRUTH DATA LABELLING VIA AMAZON SAGEMAKER

In order to assign each of the 14,333 grad-CAM images into the appropriate class (INSIDE_LUNGS, OUTSIDE_LUNGS, or RIBCAGE_CENTRAL) in preparation for training the Deep Neural Network to be used as the Discriminating Filter, the Amazon Mechanical Turk service (part of the Amazon SageMaker Ground Truth for Data Labelling product suite) was utilised. This unique service leverages an on-demand, scalable, human workforce to perform the image labelling. The service employs thousands of human workers willing to do piecemeal work at their convenience, and is a far more attractive solution than attempting to manually label all the images oneself (!)

TRAINING THE DEEP NEURAL NETWORKS VIA TRANSFER LEARNING

Once the grad-CAM images had been sorted (via AWS Mechanical Turk) into the three classes (INSIDE_LUNGS, OUTSIDE_LUNGS, and RIBCAGE_CENTRAL), all 19 pre-trained networks available in MATLAB were used for Transfer Learning on these grad-CAM images, directly analogous to the approach presented in Part 3 for the underlying X-ray image classifier training.

RESULTS

The results from the (Transfer Learning) training of all the networks are summarised as follows. From consideration of the classification accuracies on the validation dataset, the "best" performing networks were found to be (where the name refers to the base pre-trained network used in the Transfer Learning): googlenet for determining if INSIDE_LUNGS (75% accuracy); darknet19 for determining if OUTSIDE_LUNGS (85% accuracy); and mobilenetv2 for determining if RIBCAGE_CENTRAL (86% accuracy). The validation Confusion Matrix for each of these is included below.

Confusion Matrix (on the validation dataset) for a network trained on grad-CAM images via Transfer Learning starting with the pre-trained googlenet. Of all the networks that were tried, this one had the highest accuracy (75%) for the INSIDE_LUNGS class.

Confusion Matrix (on the validation dataset) for a network trained on grad-CAM images via Transfer Learning starting with the pre-trained darknet19. Of all the networks that were tried, this one had the highest accuracy (85%) for the OUTSIDE_LUNGS class.

Confusion Matrix (on the validation dataset) for a network trained on grad-CAM images via Transfer Learning starting with the pre-trained mobilenetv2. Of all the networks that were tried, this one had the highest accuracy (86%) for the RIBCAGE_CENTRAL class.
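For readers who want to reproduce per-class figures of this kind from their own confusion matrix, here is a minimal sketch in base MATLAB. It assumes that "accuracy for a class" means the fraction of images truly in that class which were classified correctly (the article does not state the exact definition used), and the counts in C are made-up placeholders, not the results reported above.

```matlab
% Rows = true class, columns = predicted class (the layout used by confusionmat).
% The counts are hypothetical placeholders, NOT the results from this article.
classNames = {'INSIDE_LUNGS', 'OUTSIDE_LUNGS', 'RIBCAGE_CENTRAL'};
C = [8 1 1;
     2 7 1;
     0 1 9];

% Assumed definition: correct predictions for a class / all images truly in that class.
perClassAccuracy = diag(C) ./ sum(C, 2);

% Overall accuracy: all correct predictions / all images.
overallAccuracy = sum(diag(C)) / sum(C(:));

for k = 1:numel(classNames)
    fprintf('%-16s %5.1f%%\n', classNames{k}, 100 * perClassAccuracy(k));
end
fprintf('%-16s %5.1f%%\n', 'overall', 100 * overallAccuracy);
```

Comparing networks on a per-class basis in this way is what allows a different "best" network to be picked for each of the three classes.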

DISCUSSION & NEXT STEPS

The results demonstrate that the technique of Transfer Learning can be used to devise Deep Neural Networks which can assess (with reasonable accuracy) the validity of a given lung X-ray classifier network applied to a given X-ray image, by determining whether the corresponding grad-CAM image focuses on regions INSIDE the lungs (suggesting that the X-ray lung classification is valid), OUTSIDE the lungs (suggesting that the X-ray lung classification is not valid), or in the RIBCAGE CENTRAL region (suggesting that the lung X-ray classification may be of some validity: i.e., more relevant than OUTSIDE the lungs though not as relevant as INSIDE the lungs). The Deep Neural Networks presented here can therefore serve as a Discrimination Filter to assist in choosing between all the various networks (presented in Part 3) for X-ray lung image classification.
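As a rough illustration of how the filter output might be used to choose between classifiers, the sketch below ranks a handful of classifiers by the grad-CAM class assigned to each one for a single X-ray, using the relevance ordering described above (INSIDE, then RIBCAGE CENTRAL, then OUTSIDE). The classifier list and the assigned classes are hypothetical values chosen for illustration; this is one possible way to apply the ordering, not necessarily how the next article will do it.

```matlab
% Hypothetical filter outputs for ONE X-ray image: the grad-CAM class that the
% Discriminating Filter assigned to each classifier (illustrative values only).
classifierNames = {'vgg16', 'darknet53', 'googlenet', 'mobilenetv2'};
gradCamClass    = {'RIBCAGE_CENTRAL', 'OUTSIDE_LUNGS', 'INSIDE_LUNGS', 'INSIDE_LUNGS'};

% Relevance ordering from the discussion: position 1 is the most relevant.
relevanceOrder = {'INSIDE_LUNGS', 'RIBCAGE_CENTRAL', 'OUTSIDE_LUNGS'};
[~, relevanceRank] = ismember(gradCamClass, relevanceOrder);

% Sort classifiers from most to least relevant (ties keep their original order).
[~, order] = sort(relevanceRank);

for k = order
    fprintf('%-12s %s\n', classifierNames{k}, gradCamClass{k});
end
```

With these inputs the two INSIDE_LUNGS classifiers are listed first and the OUTSIDE_LUNGS classifier last.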

The next step will be to combine the results of this article with the results from Part 3 to determine the "best" network (or combination of networks) for lung X-ray image classification.

Thursday, 14 May 2020

Deep Learning Analysis of COVID-19 lung X-Rays using MATLAB: Part 4

Update: see Part 5 where the grad-CAM results presented below are used to train another suite of networks to help choose between all the lung X-ray classifiers presented in Part 3.

* DISCLAIMER *

INTRODUCTION

In this Part 4 of my series of blog articles exploring Deep Learning of lung X-rays using MATLAB, the analysis of Part 3 is revisited to further compare the performance of all the pre-trained networks available via MATLAB as the basis for the Transfer Learning procedure. Specifically, the grad-CAM technique is applied to (i) gain an insight into how the various networks respond to the underlying images and, moreover, (ii) investigate how the responses of the networks differ from one another. The goal is to provide some guidance as to how to choose the "best" network for the task at hand. Again, all analysis is performed in MATLAB.

grad-CAM

The grad-CAM technique is introduced here, with a MATLAB implementation provided here which is used as the basis for the present analysis. Note that grad-CAM is a more powerful and more general extension of the Class Activation Map (CAM) technique used in Part 2.

The code for generating the results presented in the following sections uses the gradcam function (in MATLAB) provided in the reference example here. The gradcam function presented there is used in precisely the same manner here, so is not repeated here.

That said, the cited reference example is directly applicable only to googlenet. Extending it to each of the other networks requires identifying the appropriate softmax and feature map layers, which can be done by examining the given network with the analyzeNetwork function. The softmax layer is easily identified as the last softmax layer before the output. The feature map layer is identified as follows (from here):

"Specify either the last ReLU layer with non-singleton spatial dimensions, or the last layer that gathers the outputs of ReLU layers (such as a depth concatenation or an addition layer). If your network does not contain any ReLU layers, specify the name of the final convolutional layer that has non-singleton spatial dimensions in the output".

For convenience, I have performed this identification for all the network types, and bundled the results into a function named gradCamLayerNames (available via my GitHub repository).

Note: my gradCamLayerNames function returns the relevant layer names for the unmodified pre-trained networks distributed with MATLAB. For pre-trained networks which have been modified for Transfer Learning (by replacing the final few layers as described in Part 1), the relevant layer names for use with gradcam may be different (unless the original names happen to have been replicated). For example, all the networks used in the present analysis have been modified in the manner described in Part 1 for Transfer Learning, and so the relevant softmax layer name for use with gradcam is 'softmax' rather than that returned by gradCamLayerNames.
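The logic in the note above can be sketched in a few lines of base MATLAB. This is a hypothetical lookup written for illustration only; it is not the gradCamLayerNames function from the repository, whose interface is not reproduced here. The googlenet entries ('prob' and 'inception_5b-output') are the names I understand the MathWorks grad-CAM reference example to use for the unmodified network, so treat them as an assumption and confirm them with analyzeNetwork.

```matlab
% Hypothetical lookup table (NOT the author's gradCamLayerNames function).
% Layer names for the unmodified pre-trained network; assumed from the
% MathWorks grad-CAM example, so verify them with analyzeNetwork.
layerTable = struct( ...
    'googlenet', struct('softmax', 'prob', 'feature', 'inception_5b-output'));

baseName          = 'googlenet';
isTransferLearned = true;   % final layers were replaced as described in Part 1

names = layerTable.(baseName);
if isTransferLearned
    % The replaced softmax layer is named 'softmax' in the networks used here.
    % The feature map layer sits before the replaced layers, so it keeps its name.
    names.softmax = 'softmax';
end

fprintf('softmax layer: %s\nfeature layer: %s\n', names.softmax, names.feature);
```

The two names are then passed to gradcam in the same way as in the reference example.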

Image Datasets and Transfer Learning Networks

The lung X-ray image datasets (arranged into Examples 1--4) and the corresponding Transfer Learning trained networks from Part 3 are used here "as is" without further introduction (refer to Part 3 for the details).

Analysis via grad-CAM

EXAMPLE 1: "YES / NO" Classification of Pneumonia

The grad-CAM analysis has been performed on all of the Example 1 Transfer Learning networks with all of the corresponding validation images. A representative sample of results are displayed on the following links (where the network names pertain to the base networks used in the Transfer Learning):

vgg16 applied to all 224 validation images
darknet53 applied to all 224 validation images
all 19 networks applied to a single representative validation image

EXAMPLE 2: Classification of Bacterial or Viral Pneumonia

The grad-CAM analysis has been performed on all of the Example 2 Transfer Learning networks with all of the corresponding validation images. A representative sample of results are displayed on the following links (where the network names pertain to the base networks used in the Transfer Learning):

darknet53 applied to all 640 validation images
all 19 networks applied to a single representative validation image

EXAMPLE 3: Classification of COVID-19 or Other-Viral

The grad-CAM analysis has been performed on all of the Example 3 Transfer Learning networks with all of the corresponding validation images. A representative sample of results are displayed on the following links (where the network names pertain to the base networks used in the Transfer Learning):

vgg19 applied to all 260 validation images
all 19 networks applied to a single representative validation image

EXAMPLE 4: Determine if COVID-19 pneumonia versus Healthy, Bacterial, or non-COVID viral pneumonia

The grad-CAM analysis has been performed on all of the Example 4 Transfer Learning networks with all of the corresponding validation images. A representative sample of results are displayed on the following links (where the network names pertain to the base networks used in the Transfer Learning):

inceptionresnetv2 applied to all 44 validation images
all 19 networks applied to a single representative validation image

RESULTS & NEXT STEPS

Looking over all these grad-CAM images for all four Examples (via the links above) confirms that the networks are generally responding to regions within the lungs when making their classifications. This is a positive finding in terms of qualifying the overall Deep Learning approach to the analysis of the lung X-rays, and confirms the results of the (simpler) CAM approach from Part 2. However, the findings are not completely definitive in that it can be seen that some networks on some images are responding to inappropriate regions in the images (e.g., outside the lungs or even outside the body!), thereby reducing the validity of the approach for classifying the lung X-rays.

It is also interesting to observe how the various networks respond differently to the same image. For example, the grad-CAM images below (taken from the results for Example 4) illustrate how six different networks (base names darknet19, darknet53, densenet201, googlenet [original], googlenet [places], and inceptionresnetv2) respond to the same validation image. It can be seen that the given networks are activated by quite different regions within the image. This is perhaps not too surprising given that the networks generally have quite different layer structures. That said, the googlenet variants ([original] and [places]) have identical layer structures but have been pre-trained on different image sets, then Transfer Trained on identical lung X-ray training images. The activations observed from grad-CAM analysis are nevertheless quite different.

All this goes to show that the optimal choice of networks for the task of lung X-ray classification is somewhat subtle since the various networks respond in different ways to the underlying images. It is not sufficient to only consider the classification accuracy scores (from the classification-accuracy results tables presented in Part 3). It is important to also consider the relevance and validity of the activated regions as exposed via this grad-CAM analysis.

Interesting next steps to consider therefore would be to (i) combine the results of the various networks on the classification task rather than simply trying to choose a single 'optimal' network (per Example task); and (ii) whilst doing so, eliminate any network whose grad-CAM activations are in inappropriate regions (i.e., outside the lungs) on a given sample-image-under-test. This could result in a more accurate and robust COVID-19 classifier.
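To make steps (i) and (ii) concrete, here is a minimal sketch in base MATLAB of one way they could be combined: discard the networks flagged as activating outside the lungs, then take a simple majority vote among the rest. The predicted labels, the outside-the-lungs flags and the choice of majority voting are all assumptions made for illustration; no such combination has been carried out in this article.

```matlab
% Hypothetical outputs for ONE X-ray image (illustrative values only).
networkNames    = {'darknet19', 'darknet53', 'densenet201', 'googlenet', 'inceptionresnetv2'};
predictedLabel  = {'covid', 'other', 'other', 'covid', 'other'};
activatedOutside = [false, true, false, false, true];   % grad-CAM focus outside the lungs?

% Step (ii): eliminate networks whose activations are in inappropriate regions.
keep = ~activatedOutside;
if ~any(keep)
    error('No network focused on the lungs for this image.');
end
votes = predictedLabel(keep);

% Step (i): combine the remaining networks by simple majority vote.
[labels, ~, idx] = unique(votes);
counts = accumarray(idx(:), 1);
[~, winner] = max(counts);   % ties resolve to the first label alphabetically
combinedLabel = labels{winner};

fprintf('Networks kept: %s\n', strjoin(networkNames(keep), ', '));
fprintf('Combined classification: %s\n', combinedLabel);
```

In this made-up case an unfiltered vote would go 3 to 2 in favour of 'other', whereas removing the two networks that looked outside the lungs flips the outcome to 'covid', which is the kind of effect the proposed elimination step is meant to capture.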