During the Covid-19 pandemic, we wondered how our skills could help in facing the emergency. AI can help in a situation like this in several ways. Many initiatives have already been launched around the world, aimed at diagnosing the virus effectively or limiting its spread (for an interesting article, see https://www.technologyreview.com/s/615342/how-baidu-is-bringing-ai-to-the-fight-against-coronavirus/).
Last night, the White House released a large set of scientific papers (about 29,000 articles) on Kaggle, a well-known data science platform, challenging anyone able to analyze the text to come up with new ideas for research.
The European Union has also joined in, with a call dedicated to the development of innovative solutions to fight the spread of the virus (https://ec.europa.eu/info/news/startups-and-smes-innovative-solutions-welcome-2020-mar-13_en).
Inspired by the work done by Alibaba [1] in China and by Adrian of PyImageSearch [2], we created a Machine Vision algorithm that analyzes chest X-ray images and estimates the presence of pneumonia due to Covid-19.
The analysis in this article is purely demonstrative and is not intended to have scientific value. It aims to show how we can use Artificial Intelligence to serve a higher purpose (#AI4GOOD).
To address this problem, we needed a consistent dataset, and we found one on GitHub. We came across the work of Joseph Cohen, a postdoc at the University of Montreal (https://josephpcohen.com/w/), who launched a project to create an “open database of COVID-19 cases with chest X-ray or CT images.”
The dataset provides X-ray and CT images of patients with various pathologies, including SARS and MERS. We kept only the images relating to Covid-19. We also decided to use only the X-rays taken in the anterior-posterior position, so that we could expand the available data with another public dataset (https://www.kaggle.com/paultimothymooney/chest-xray-pneumonia), from which we took X-ray images without pathology. In that dataset, the X-rays are taken in the anterior-posterior position.
Therefore, we had the opportunity to create a dataset consisting of:
56 images of lungs of people with Covid-19
56 images of lungs of healthy people (without Covid-19)
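A dataset like this has to be divided into a training part and a test part before any model is trained. The sketch below shows one possible way to do it with the Python standard library only. The file names are hypothetical placeholders, and the 20% hold-out (rounded up) is an assumption: the article does not say how the split was made, although this choice happens to give the 23 test images reported further on.

```python
import math
import random

# Class sizes come from the article; the file names are hypothetical placeholders.
samples = [(f"covid_{i:02d}.png", 1) for i in range(56)]      # label 1 = Covid-19
samples += [(f"healthy_{i:02d}.png", 0) for i in range(56)]   # label 0 = healthy


def hold_out_split(items, test_fraction=0.2, seed=42):
    """Shuffle a copy of items and split it into (train, test) lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded, so the split is reproducible
    n_test = math.ceil(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


train, test = hold_out_split(samples)
print(len(train), len(test))
```

With 112 images, this leaves 89 for training and 23 for testing. A plain shuffle does not guarantee that both classes are equally represented in the test set; with so few images, a stratified split may be worth considering.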
At that point, we created a simple computer vision algorithm using TensorFlow and Keras.
Since this article does not aim to show how to implement this type of solution (it is purely demonstrative), we will not go into details. We are now cleaning up the code and will share it soon. If you are interested in it, feel free to contact us.
This problem is called "classification": we must determine the class (healthy/affected by Covid-19) to which each image belongs.
To solve this classification problem, we used MobileNetV1 (https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet_v1.md), a network designed for mobile devices, with a relatively low computational load. We added a fully connected layer on top of this convolutional network, so that we could use transfer learning instead of training the whole network from scratch.
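As a rough illustration of this kind of transfer learning, here is a minimal Keras sketch. It is not our actual code: the 224x224 input size, the frozen backbone, the single sigmoid output, the Adam optimizer and the binary cross-entropy loss are all assumptions made for the example, and `build_classifier` is a hypothetical helper name.

```python
IMAGE_SIZE = (224, 224)  # assumed input resolution, not stated in the article


def build_classifier(weights="imagenet"):
    """Return a MobileNetV1 backbone with one trainable fully connected layer."""
    # Imported here so that TensorFlow is only required when a model is built.
    import tensorflow as tf

    backbone = tf.keras.applications.MobileNet(
        input_shape=IMAGE_SIZE + (3,),
        include_top=False,  # drop the original ImageNet classification head
        weights=weights,    # "imagenet" loads pretrained weights (downloaded once)
        pooling="avg",      # global average pooling gives one feature vector per image
    )
    backbone.trainable = False  # freeze the pretrained convolutional layers

    model = tf.keras.Sequential([
        backbone,
        # Assumed head: a single unit giving the probability of the Covid-19 class.
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model
```

Calling `build_classifier()` returns a compiled model that can be trained with `model.fit(...)`. Only the final layer is updated during training, which is what makes this approach workable with so few images. Input images should first be scaled with `tf.keras.applications.mobilenet.preprocess_input`, since the pretrained weights expect pixel values in the range [-1, 1].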
The convolutional network learned its job very well, correctly classifying 22 of the 23 images used for testing (it is trained on part of the dataset and then tested on data it has never seen before). The resulting confusion matrix is shown below:
This result corresponds to 100% accuracy [3] in detecting Covid-19 patients (no Covid-19 case is missed), but it also includes one false positive: a considerable problem.
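To make these numbers concrete, the short sketch below computes the usual metrics from confusion-matrix counts. The article gives 23 test images, one false positive and no false negatives, but not how the 22 correct predictions are divided between the two classes, so the values `tp=11` and `tn=11` are a hypothetical choice. The `summarize` function is a name introduced for this example.

```python
def summarize(tp, fp, tn, fn):
    """Return accuracy, sensitivity and specificity from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,   # share of all images classified correctly
        "sensitivity": tp / (tp + fn),   # share of Covid-19 cases detected
        "specificity": tn / (tn + fp),   # share of healthy cases recognized
    }


# fp=1 and fn=0 come from the article; the tp/tn split is hypothetical.
metrics = summarize(tp=11, fp=1, tn=11, fn=0)
print({name: round(value, 3) for name, value in metrics.items()})
```

Whatever the real split is, the overall accuracy is 22/23 (about 95.7%) and the sensitivity is 100%, because no Covid-19 case is missed. Only the specificity depends on the assumed split.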
The network, despite the limited data available, does not seem to overfit, as we can see in the following graph, which displays the progress of the training metrics.
Has the network learned to distinguish between Covid-19 patients and healthy ones? It would seem so, but we would need to study the dataset more closely to understand what leads the network to classify one image as belonging to a healthy person and another as belonging to a person affected by Covid-19. In fact, the network could be picking up patterns that are not related to the presence of Covid-19.
This simple analysis is aimed solely at showing how an approach based on Computer Vision could improve the efficiency of physicians, helping them in diagnosis.
Expanding the dataset with further images, such as tomographies, could allow the development of more versatile tools.
Meanwhile, the only really useful thing we can do is to stay at home!
Michele Ermidoro