How CATs help machine learning diagnose disease

07/11/2020 - By Veronika Cheplygina

CT scans and machine learning!

Machine learning (ML) is everywhere, from your Netflix recommendations to self-driving cars. In medicine, too, ML can be used for many applications, including the diagnosis of different diseases – an earlier post featured an example on brain tumors. Recently, several algorithms in medical imaging have outperformed, or performed on par with, human experts; see [1] for an overview.

To teach an ML algorithm to diagnose disease in a medical scan – for example, a magnetic resonance (MR) or a computed tomography (CT or CAT) scan – you need examples of scans with, and without, the disease. The more examples you have, and the more diverse they are – for example, scans from different hospitals – the better you can expect your algorithm to do in the future. Unfortunately, it is more difficult to collect as many medical scans as, say, pictures of cars, and diagnostic performance can suffer as a result.

One way to overcome the problem of limited data in medical imaging is to reuse information extracted from other types of data, known as “transfer learning” [2, 3]. For example, when learning to diagnose lung cancer from CT scans, we could first use brain CT scans to teach the algorithm some “basics” about images, so that fewer lung cancer CT scans are needed to learn the more specific features relating to lung cancer. In this case, the brain CT scans are the source data, and the lung CT scans are the target data.

[Figure from reference 5] Overview of transfer learning. First, an algorithm is trained on a source dataset to learn the “basics” of images. These images can be medical or non-medical. Then the algorithm is trained further on the target dataset – in the example of this figure, a skin cancer dataset. This reduces the number of skin cancer images needed to train the algorithm. (https://www.tue.nl/en/research/researchers/veronika-cheplygina/)
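To make the two steps concrete, here is a minimal toy sketch in Python. It is not the method from any of the papers cited here: instead of a deep network and real images, it assumes a tiny logistic-regression classifier and made-up data with three numeric features. The names `make_data`, `train` and `log_loss`, as well as all the numbers, are purely illustrative. The only point is the order of operations: pretrain on a large source set, then continue training from those weights on a small target set.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def log_loss(weights, data):
    """Mean logistic loss of a linear classifier on (features, label) pairs."""
    total = 0.0
    for x, y in data:
        p = sigmoid(sum(w * xi for w, xi in zip(weights, x)))
        p = min(max(p, 1e-12), 1 - 1e-12)  # keep log() well defined
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(data)


def train(data, weights, steps=200, lr=0.1):
    """Full-batch gradient descent, starting from the given weights."""
    weights = list(weights)
    for _ in range(steps):
        grad = [0.0] * len(weights)
        for x, y in data:
            err = sigmoid(sum(w * xi for w, xi in zip(weights, x))) - y
            for j, xi in enumerate(x):
                grad[j] += err * xi / len(data)
        weights = [w - lr * g for w, g in zip(weights, grad)]
    return weights


def make_data(n, true_weights, rng):
    """Made-up data: label is 1 when a hidden linear score is positive."""
    data = []
    for _ in range(n):
        x = [rng.uniform(-1, 1) for _ in true_weights]
        score = sum(w * xi for w, xi in zip(true_weights, x))
        data.append((x, 1 if score > 0 else 0))
    return data


rng = random.Random(0)
source = make_data(500, [1.0, -1.0, 0.5], rng)  # large, related source task
target = make_data(10, [1.0, -0.8, 0.6], rng)   # small target task

zeros = [0.0, 0.0, 0.0]
pretrained = train(source, zeros)                 # step 1: pretrain on source
fine_tuned = train(target, pretrained, steps=20)  # step 2: fine-tune on target
from_scratch = train(target, zeros, steps=20)     # baseline without transfer

print("target loss, fine-tuned:  ", log_loss(fine_tuned, target))
print("target loss, from scratch:", log_loss(from_scratch, target))
```

Whether the fine-tuned weights actually beat the from-scratch baseline depends on how similar the source and target tasks are, which is exactly the question the rest of this post is about.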

Around 2014, a surprising discovery was made [4] – the source images could be quite different from the target images. In this paper, the researchers used a subset of the ImageNet dataset, containing images from 10 categories such as “bird”, “car” and “cat”, to teach an algorithm the “basics”, or in other words, to pretrain it on this source dataset. They compared this approach to pretraining on brain CT scans, and it turned out that the natural images were more effective! The explanation they give is that brain CT scans lack some of the variations we expect to find in lung scans, whereas natural images provide more variation.

Transfer learning is currently quite popular in medical imaging, especially with ImageNet as the source [3], possibly because you can simply download an already pretrained algorithm from the internet, saving yourself time and resources. But there is no definite answer yet as to which type of source data works best, and researchers sometimes compare natural and medical source datasets. In my overview, “Cats or CAT scans: transfer learning from natural or medical image source data sets?” [5], I looked at several such comparisons. Half of the papers achieved better results with natural images, and half with medical images. We do not yet know for sure what’s best – but we can be certain that cats are helping algorithms, just a little bit.

References:

[1] Liu, X., Faes, L., Kale, A. U., Wagner, S. K., Fu, D. J., Bruynseels, A., … & Ledsam, J. R. (2019). A comparison of deep learning performance against health-care professionals in detecting diseases from medical imaging: a systematic review and meta-analysis. The Lancet Digital Health, 1(6), e271-e297. (https://www.thelancet.com/journals/landig/article/PIIS2589-7500(19)30123-2/fulltext)

[2] Pan, S. J., & Yang, Q. (2009). A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering, 22(10), 1345-1359.

[3] Cheplygina, V., de Bruijne, M., & Pluim, J. P. W. (2019). Not-so-supervised: a survey of semi-supervised, multi-instance, and transfer learning in medical image analysis. Medical Image Analysis, 54, 280-296. (https://www.sciencedirect.com/science/article/pii/S1361841518307588)

[4] Schlegl, T., Ofner, J., & Langs, G. (2014). Unsupervised pre-training across image domains improves lung tissue classification. In International MICCAI Workshop on Medical Computer Vision (pp. 82-93). Springer, Cham. (https://link.springer.com/chapter/10.1007/978-3-319-13972-2_8)

[5] Cheplygina, V. (2019). Cats or CAT scans: transfer learning from natural or medical image source data sets? Current Opinion in Biomedical Engineering, 9, 21-27. (https://www.sciencedirect.com/science/article/pii/S2468451118300527)

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
