
!"#$%&'()*%+ ,-..

/-*0)"123%4+2$*-+%3%23%5(

"*'25(*6*F9((*B(B#(9C4+$7*&349%(&*$(F3*3'%&*B4+3'G
+*"J*F49*,()%"B*2+)*I(3*2+*(K392*4+(

6%572*82"9
:45*;<=*;>;> ? @A*B%+*9(2) ? ,(B#(9C4+$7 ? D%&3(+

#$%"$&'%$(%)*%'+$#,%-$+''.$/')*0,0&$1(,0&$234
%)5
Even though all the algorithms and information are open-source these days, sometimes even the most well-established computer vision or deep learning methods do not produce the expected results, especially for medical imaging problems. The problem lies in an insufficient understanding of medical data and of how to use it efficiently to leverage the power of these new computational methods. This post presents a basic structure that can help in understanding the problem at hand and in implementing deep learning models that use MRI data. Although this work primarily deals with classification problems, the data exploration and preparation steps apply equally to other types of problem statements. Before we start, it is worth knowing that most of the popular machine learning libraries and deep learning frameworks used here are based on Python. The requirements list below contains the language and library requirements necessary for following the code of this tutorial.

Requirements: The main libraries and Python version used for the code mentioned in this tutorial are as follows:

Python: 3.6

SimpleITK: 1.2.4

torch (PyTorch library): 1.4.0

torchvision (datasets and transforms): 0.5.0

sklearn: 0.0

Understanding Magnetic Resonance Imaging (MRI) and DICOM

Magnetic Resonance Imaging (MRI) is a widely used imaging modality in radiology. Even though MRI has a long data acquisition time and complex scanning protocols, it is preferred over other imaging modalities such as Computed Tomography (CT) because it does not use external radiation and has better visualization potential.

Before getting into the details, we will first walk through the MR image acquisition pipeline:

[Figure: MR image acquisition pipeline]

Working principle: MRI works by aligning hydrogen atoms using a magnetic field, as they are in abundance in the human body due to the presence of water and fat.

Magnetic: putting the subject in a strong magnetic field. The purpose of using the magnetic field is to enable the protons to spin at a frequency proportional to the magnetic field; this frequency is called the resonance frequency.

Resonance: transmitting Radio Frequency (RF) energy to the subject using an RF coil, and then receiving the RF signals emitted by the subject.

Imaging: spatially modulating the magnetic field strength to distinguish signals from different locations using a gradient coil.

Pulse sequences: the pulses of current fed to the RF and gradient coils through the RF electronics and the gradient amplifier. TR and TE mentioned in the figure above are the repetition time and echo time. These values control the contrast and 'weighting' of the MR image. Although an entire tutorial could be dedicated to 'weighting' in MR imaging, in short it determines which tissues are prominently visible in the captured image.

Example: some structures, like the cerebrospinal fluid (CSF) found in the spinal canal, appear dark in a 'T1-weighted' image, whereas they appear white in a 'T2-weighted' image.

K-space: a collection of spatial frequencies captured after applying the pulse sequences. The (x, y) coordinates in this space represent the frequency and phase for every pixel of the image that is generated after applying an inverse Fourier transform to the captured frequency data.

Fourier transform: a mathematical technique that allows a signal to be decomposed into a sum of sine or cosine waves of different frequencies, phases and amplitudes. This link can be followed for further reading about the Fourier transform.

Now that we have an overview of image acquisition, let us discuss the format in which these images are stored and transmitted. The most popular format for storing radiology data is Digital Imaging and Communications in Medicine (DICOM); files generated using this standard carry metadata such as the patient name and patient ID in addition to the image data. Hence, data protection is a huge part of working with medical records, making anonymization an essential step before using the data in any machine learning or data processing model. There are specific applications known as DICOM viewers which can be used to load and inspect the '.dcm' (DICOM) files.

Some of the popular viewers are:

Horos: Mac OS

MicroDicom: Windows

DimViewer: Windows, Mac OS X, Linux

Before starting to work with DICOM files, it is always a good idea to have a look at the data. DICOM readers have important properties such as 'window level' and 'window width' that can be adjusted while viewing the data. 'Window width' means the range of pixel values available for displaying an image, whereas 'window level' refers to the mid-value of that range.
Data exploration and preparation

Now that we know how the data is acquired and have viewed the files for an initial understanding, it is time to get down to the actual computation part. This section covers the data pre-processing steps that are necessary before applying any deep learning or machine learning model to MR data.

Although there exist purely Python-based libraries like Pydicom for working with most of the DICOM metadata, they are not well suited for working with the image data stored in DICOM files. Hence, we will have a brief overview of a more established computer vision package, SimpleITK [1], which works with Python and comes in handy when dealing with medical imaging problems.

SimpleITK: It acts as a simplified interface between the Insight Toolkit (ITK) [4] and languages such as Python. ITK provides an extensive array of functionality, as it supports more than 15 file formats and has implementations of more than 200 image analysis methods. Now, coming back to our use-case of working with MR scans and how this library can make pre-processing easier and more efficient: first and foremost, re-sampling is important for MR scans, as the voxel size (spatial resolution) might vary between scans. The following code snippets demonstrate how to read, re-sample and resize an MR volume:
Loading, re-sampling and re-sizing the data in Python:

Read the MR volume:

import SimpleITK as sitk

def readMRIVolume(mri_volume_path):
    reader = sitk.ImageSeriesReader()
    # 'dicom_files' are the individual slices of an MRI volume
    dicom_files = reader.GetGDCMSeriesFileNames(mri_volume_path)
    reader.SetFileNames(dicom_files)
    retrieved_mri_volume = reader.Execute()
    return retrieved_mri_volume


Code for reading an MRI volume using SimpleITK

Re-sample the previously read MR volume:
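The original snippet embedded here did not survive the page extraction; the following is a minimal sketch of how re-sampling to a fixed voxel spacing can be done with SimpleITK. The function name resampleMRIVolume and the default 1 mm isotropic spacing are illustrative assumptions, not the author's original values.

import SimpleITK as sitk

def resampleMRIVolume(mri_volume, new_spacing=(1.0, 1.0, 1.0)):
    # Keep the physical extent of the volume unchanged while changing voxel size
    original_spacing = mri_volume.GetSpacing()
    original_size = mri_volume.GetSize()
    new_size = [int(round(osz * ospc / nspc))
                for osz, ospc, nspc in zip(original_size, original_spacing, new_spacing)]

    resampler = sitk.ResampleImageFilter()
    resampler.SetOutputSpacing(new_spacing)
    resampler.SetSize(new_size)
    resampler.SetOutputOrigin(mri_volume.GetOrigin())
    resampler.SetOutputDirection(mri_volume.GetDirection())
    resampler.SetInterpolator(sitk.sitkLinear)
    return resampler.Execute(mri_volume)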


Resize the re-sampled MR volume:
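The resize snippet was likewise lost; here is a sketch that resizes the re-sampled volume to a fixed matrix size by adjusting the output spacing so the same physical extent is covered. The target size of 256 x 256 x 128 is only an example.

import SimpleITK as sitk

def resizeMRIVolume(mri_volume, new_size=(256, 256, 128)):
    # Adjust the spacing so the resized volume still covers the same physical extent
    original_spacing = mri_volume.GetSpacing()
    original_size = mri_volume.GetSize()
    new_spacing = [ospc * osz / nsz
                   for ospc, osz, nsz in zip(original_spacing, original_size, new_size)]
    return sitk.Resample(mri_volume, new_size, sitk.Transform(), sitk.sitkLinear,
                         mri_volume.GetOrigin(), new_spacing, mri_volume.GetDirection(),
                         0, mri_volume.GetPixelID())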

After re-sampling and resizing the volume, the next step is to decide whether a 2D or a 3D model will be useful for the given data. As this tutorial aims to begin exploring this field, we will only consider 2D slices extracted using SimpleITK. But for the sake of discussion, both approaches have their advantages and disadvantages. Using 2D slices might be easier to implement and understand but might lack contextual information altogether, whereas a 3D model might be a bit more challenging, as it works on patches of the volume, thereby increasing the computational cost, but it can preserve the contextual information. Coming back to the immediate matter at hand, i.e. using 2D MR slices, the following code can be used to extract the slices from the MR volume read using the previous code snippets:
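The extraction snippet itself is missing from this copy; a minimal sketch of the idea, converting the SimpleITK image to a numpy array and indexing a slice, could look like this (the function name is an assumption):

import SimpleITK as sitk

def extractSlice(mri_volume, slice_index):
    # GetArrayFromImage returns an array of shape (slices, height, width)
    volume_array = sitk.GetArrayFromImage(mri_volume)
    return volume_array[slice_index]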

The code returns numpy arrays, which can then be stored in any lossless image format for further processing. The code snippet below can be extended to save multiple slices:
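The saving snippet is also missing here; as a sketch, each extracted slice can be rescaled to 8-bit and written as a lossless PNG. Pillow is assumed purely for illustration, and the loop shows how this extends to multiple slices.

import numpy as np
from PIL import Image

def saveSliceAsPNG(slice_array, output_path):
    # Rescale intensities to the 0-255 range before writing an 8-bit PNG
    slice_array = slice_array.astype(np.float32)
    slice_array -= slice_array.min()
    if slice_array.max() > 0:
        slice_array /= slice_array.max()
    Image.fromarray((slice_array * 255).astype(np.uint8)).save(output_path)

# Example: save every tenth slice of a volume
# for i in range(0, volume_array.shape[0], 10):
#     saveSliceAsPNG(volume_array[i], f"slice_{i}.png")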
Now that we have MR image slices stored in a regular image format, it is time to move on to the next important part, which is working with the data and using it to train your first deep learning model.

As we know, the major source of medical data is online data challenges, and they distribute the data either as training and testing sets or as training, validation and testing datasets. But this might not be the case when working with our own datasets. Hence, it is good to know how to roughly split a dataset. Generally, if training and testing sets are given separately, it is preferable to divide the training set into a 70% train and a 30% validation set. But when no test data is provided, it is good to divide the data into a 10% test, 30% validation and 60% training set. One thing to keep in mind here is to sample the test set first, then the validation set and the training set last, as it is essential that the test set contains all the labels which are likely to be present in the training and validation sets.

Splitting validation and training data using PyTorch and sklearn: As mentioned above, it is beneficial to split the training data into a train and a validation set. But before we discuss that code, we will also look at a second important aspect, i.e. data augmentation. Although there are various non-PyTorch and possibly more effective ways of doing it, for the sake of simplicity we will only consider the options offered by PyTorch. The next point briefly discusses the meaning of and need for augmentation and is then followed by a code snippet for applying data splitting and augmentation using PyTorch and sklearn.

Data augmentation: It is evident that due to all the privacy constraints and long acquisition times there can be a scarcity of data. In order to overcome this problem, there are a few techniques available which can be used to increase the size of the dataset, and they are as simple as translating, flipping and cropping the existing data to generate new image files. While data augmentation, i.e. the image transforms, is useful to capture the expressiveness of the data, it should be done within the bounds of medical imaging, because sometimes these transforms can result in a change of label. For example, if a chest X-ray is flipped by 180 degrees such that the heart lies on the right side, it depicts a rare condition some people suffer from and hence cannot use the same label as the original image. There are a lot of ways in which these transformations can be applied, but the quickest way here is to use what PyTorch offers, i.e. options like CenterCrop, RandomAffine, RandomHorizontalFlip and RandomRotation, as shown in the snippet below.
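The combined splitting-and-augmentation snippet did not survive in this copy; the sketch below follows the approach the text describes, using sklearn's train_test_split for a stratified 70/30 split and torchvision transforms for augmentation. The folder path, image size and batch size are placeholders, not the author's exact values.

from sklearn.model_selection import train_test_split
from torch.utils.data import DataLoader, Subset
from torchvision import datasets, transforms

imagenet_mean, imagenet_std = [0.485, 0.456, 0.406], [0.229, 0.224, 0.225]

# Augmentation only for training; validation is just resized and normalized
train_transforms = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(10),
    transforms.ToTensor(),
    transforms.Normalize(imagenet_mean, imagenet_std),
])
val_transforms = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(imagenet_mean, imagenet_std),
])

# Two dataset views over the same folder so each split keeps its own transforms
train_dataset = datasets.ImageFolder("mri_slices/train", transform=train_transforms)
val_dataset = datasets.ImageFolder("mri_slices/train", transform=val_transforms)

# Stratified 70/30 split of indices so class proportions are preserved
train_idx, val_idx = train_test_split(list(range(len(train_dataset))),
                                      test_size=0.3,
                                      stratify=train_dataset.targets,
                                      random_state=42)

train_loader = DataLoader(Subset(train_dataset, train_idx), batch_size=16, shuffle=True)
val_loader = DataLoader(Subset(val_dataset, val_idx), batch_size=16, shuffle=False)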
The above-mentioned code snippet applies data augmentation techniques only to the training set, while the validation set is simply resized and normalized.

One thing to notice in the code is that the transforms functionality not only offers the different image transforms mentioned under 'data augmentation', but also has a normalize attribute. This attribute takes the 'mean' and 'standard deviation' of the dataset as input, and the resulting normalization is useful for the reasons discussed below.

Another thing to notice is that it is not always necessary to calculate these values for individual datasets; the standard values (mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225]) calculated for ImageNet [12] work fine for most cases. This makes using the code from the following section completely optional.

Normalizing the dataset: Normalization is useful and necessary because the mean and standard deviation modify the data such that it lies in a particular range, making the computation easier and faster. In most cases the aim is to keep the input's mean at zero and its standard deviation at one. The following code snippet shows how to load the dataset using PyTorch and then find the mean and standard deviation of the data. As this is an optional step, it should be done in a separate code file, and the resulting values should be used while applying the transforms mentioned above.
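The normalization snippet is missing here as well; a sketch of the optional mean/std computation, loading the dataset with a plain ToTensor transform and accumulating per-channel statistics, could look like the following (folder path and batch size are placeholders):

import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

dataset = datasets.ImageFolder("mri_slices/train", transform=transforms.ToTensor())
loader = DataLoader(dataset, batch_size=64, shuffle=False)

# Accumulate per-channel sums over every pixel in the dataset
channel_sum = torch.zeros(3)
channel_sq_sum = torch.zeros(3)
num_pixels = 0
for images, _ in loader:
    # images has shape (batch, channels, height, width)
    num_pixels += images.shape[0] * images.shape[2] * images.shape[3]
    channel_sum += images.sum(dim=[0, 2, 3])
    channel_sq_sum += (images ** 2).sum(dim=[0, 2, 3])

mean = channel_sum / num_pixels
std = torch.sqrt(channel_sq_sum / num_pixels - mean ** 2)
print("mean:", mean, "std:", std)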
Transfer learning and popular image classification models:

Transfer learning, in simple words, means using the knowledge which neural networks have acquired for one domain in a slightly more specialized yet related one. For example, taking a pre-trained network which is an expert in recognizing natural images and training it further to make it recognize different images such as X-ray or MR scans.

But before we dig deeper, it would be beneficial to know about the major factors which impact the training process and the final outcome. Apart from the choice of model, it is essential to find a suitable loss function, optimizer, learning rate (schedule) and number of epochs. For initial experiments, the values, functions and schedules mentioned in the code here can be used, as they are pretty standard and applicable to most problems. But it is beneficial to have a brief overview of how these values influence network training and the outcome.

Epoch: indicates the number of passes over the entire dataset made in order to complete the training process. For example, if num_epochs=25, the training process finishes once there have been 25 passes through the entire dataset.

Batch size: the total number of images or entries from a dataset which are fed to the network at once. Since dataset_size = iterations * batch_size, one epoch is completed once dataset_size / batch_size iterations have been performed for a given batch size.

Loss function: also known as an objective function or cost function; the goal of the optimizer is to find a global minimum of this function. As this post deals with a classification problem, the loss function chosen here is 'cross-entropy'.

Learning rate (schedule): the learning rate is the parameter which determines the step size an optimizer takes in each iteration in order to move towards its goal of minimizing the loss function. A learning rate schedule helps in setting and updating the value of the learning rate for every epoch.

Optimizer: the optimizer computes the gradients of the loss w.r.t. the network parameters and then uses these gradients to update the parameters; the scale by which the parameters are updated is determined by the learning rate.

!"#$%&'"$($)*+,+"$($)*+-+.&$(/0/1'($%&+2
"$($)*'1($#

Although we have provided a brief overview of the elements which play a role in the network learning process, it is assumed that the reader has some knowledge about the working of convolutional neural networks (CNNs). This knowledge is transferable across different deep learning frameworks, but if you are new to deep learning or PyTorch it is recommended to have a look at this link.

Now we can proceed to discuss the networks that are the main element of transfer learning. Although there are a variety of network choices available, below we discuss the two most popular types of architectures used for computer vision problems.
ResNet: The concept of the residual architecture [8] is to use skip connections, also known as identities, in addition to the normal sequential connections. This is done in order to give the subsequent layers a view of what the inputs to the previous layers were, in addition to the previous layers' outputs. The combination of convolution, ReLU and batch normalization layers makes up what is known as a block. The skip connections combine the input of one block with its output to form the input of the next block. Residual learning is effective because layers not only see what their previous layers made out of the input but also the input to those layers, making up for any information which might have been lost during the learning process. The typical architecture of ResNet-18 with basic blocks can be seen below:

L(&:(3C@N*291'%3(13"9(*\N]

We have employed here the fine-tuning type of transfer learning, which means that the last classifier layer is adapted to the number of classes the dataset offers, the rest of the weights are copied from the trained model, and the network is then trained further to adapt to the changed final output layer. Also, this transfer learning code can be extended to involve logging of accuracy and loss values in both the training and validation phases. Following is the code for fine-tuning a pre-trained ResNet network:
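The fine-tuning snippet is not present in this copy; a minimal sketch of the step described above, replacing the final fully connected layer of a pre-trained ResNet-18 with one sized for the dataset's classes, is given below. The class count of 2 is an example.

import torch.nn as nn
from torchvision import models

num_classes = 2                      # adjust to the number of classes in your dataset

model = models.resnet18(pretrained=True)

# Replace the last fully connected (fc) layer; all other weights stay pre-trained
num_features = model.fc.in_features
model.fc = nn.Linear(num_features, num_classes)

# The whole network can now be trained further with the criterion, optimizer and
# scheduler defined earlier, adapting it to the new final output layer.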
DenseNet: DenseNets [9] are an extension of the concept of residual learning, but with the advantage that they reduce the redundancy in the network. Unlike ResNets, DenseNet does not add the input and the output of a block; it concatenates them. Another difference is that the information flow is not limited to the immediately following layer but is also propagated to all the following dense blocks, giving all the layers a chance to learn from the initial inputs used by the previous layers. As the information is not summed but concatenated, it is also bound to grow manifold. The growth is a function of a parameter known as the growth rate, which controls the amount of previous information that is concatenated, hence controlling the amount of data that flows through the network and making it a better version of ResNets. The information flow architecture is shown in the following figure:
6(+&(:(3*291'%3(13"9(*\^]

Instead of ResNet, DenseNet can be a more suitable model for transfer learning. One major difference is that instead of the fully connected (fc) layer of ResNet, DenseNet names this layer classifier; hence, to address this change the code can be modified as follows:
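The modified snippet is likewise missing; the analogous change for DenseNet, where the final layer is called classifier instead of fc, could look like this sketch (DenseNet-121 and the class count are illustrative choices):

import torch.nn as nn
from torchvision import models

num_classes = 2

model = models.densenet121(pretrained=True)

# DenseNet exposes its final layer as 'classifier' rather than 'fc'
num_features = model.classifier.in_features
model.classifier = nn.Linear(num_features, num_classes)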
More transfer learning examples with PyTorch can be found here.

Does accuracy as a metric tell the real story?

Now that we have trained a model and have some information about its accuracy and loss values, it would be beneficial to discuss how exactly the performance of a network can be measured. As is evident, the most widely used metric for determining whether a model performs well or not is accuracy, but it is not well suited for every type of problem statement. The following section discusses this and also mentions the metrics which can provide a more insightful view of the performance of the trained model.

Confusion matrix: Also known as a contingency table, it is a metric widely used in classification problems. It is represented as follows for binary classification but can be extended to multi-class classification problems. The meaning of TP, FN, FP and TN is as follows:

True Positive (TP): the predicted label is the same as the actual label, otherwise known as the ground truth.

False Negative (FN): the predicted label was marked as negative, despite it actually being positive.

False Positive (FP): the predicted label was marked as positive, despite it being negative in reality.

True Negative (TN): the prediction marked the label negative and it is actually negative.
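The matrix image referenced above did not survive extraction; for binary classification it is conventionally laid out with the actual class on the rows and the predicted class on the columns:

                     Predicted positive   Predicted negative
Actually positive    TP                   FN
Actually negative    FP                   TN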

These meanings can be better understood using the following example:

Total number of subjects: 82

Subjects who have Alzheimer's: 16

Subjects who do not have Alzheimer's: 66

People who have Alzheimer's and were predicted to have it: 2

People who do not have Alzheimer's and the prediction says they do not have it: 60

People who do not have Alzheimer's, but the prediction says they have it: 6

People who have Alzheimer's, but were predicted not to have it: 14

Now, the formula for accuracy can be derived from the confusion matrix:
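The formula image is missing from this copy; in its standard form it is:

Accuracy = (TP + TN) / (TP + TN + FP + FN)

For the example above this gives (2 + 60) / 82, roughly 75%.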

It can be seen that accuracy is influenced by the number of true positives and true negatives, but this can cause a problem when the datasets used are imbalanced.

Now, if we calculate the accuracy of the sample classifier used for demonstrating the confusion matrix, it comes out to be 75%, even though the classifier only detected 2 out of the total 16 positive cases. To solve this class distribution problem, two separate metrics known as sensitivity and specificity were introduced; they are defined as follows:

Specificity: the proportion of cases which are predicted negative and are actually negative, w.r.t. the total negative cases from the ground truth. It can also be derived from the confusion matrix and is represented as follows:
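The formula image did not survive extraction; in standard form:

Specificity = TN / (TN + FP)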

The specificity of the classifier above is 90%; this means that the cases which did not have the disease were predicted correctly 90% of the time by the classifier.

Sensitivity: the proportion of cases which are predicted positive and are actually positive, w.r.t. the total positive cases from the ground truth. It can also be derived from the confusion matrix and is represented as follows:
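The formula image did not survive extraction; in standard form:

Sensitivity = TP / (TP + FN)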
The sensitivity of the classifier above is 12%; this means that the cases which actually had the disease were predicted correctly only 12% of the time by the classifier.

The above calculations show why calculating accuracy alone, especially in the medical imaging domain, can be misleading due to the high class imbalance, and how calculating the confusion matrix, specificity and sensitivity can offer more insight into the functioning of the model. The example above uses a binary classification problem, but both the confusion matrix and the calculation of specificity and sensitivity can be extended to multi-class classification. The transfer learning code specified above can be modified as follows to accommodate the calculation of the confusion matrix, specificity and sensitivity, and the display of the confusion matrix:
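The modified evaluation code is not present in this copy; the sketch below computes the confusion matrix with sklearn and derives sensitivity and specificity for the binary case. The function and variable names are assumptions for illustration.

import torch
from sklearn.metrics import confusion_matrix

def evaluate(model, data_loader, device):
    # Collect predictions and ground-truth labels over the whole loader
    model.eval()
    all_preds, all_labels = [], []
    with torch.no_grad():
        for images, labels in data_loader:
            outputs = model(images.to(device))
            preds = outputs.argmax(dim=1).cpu().numpy()
            all_preds.extend(preds)
            all_labels.extend(labels.numpy())

    # For binary classification the matrix unpacks as [[TN, FP], [FN, TP]]
    tn, fp, fn, tp = confusion_matrix(all_labels, all_preds).ravel()
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity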
The last topic this tutorial discusses is visualizing what the neural networks actually focus on while they are training. As this is a beginner's tutorial, discussing the code for these techniques is beyond its scope, but the following is a brief introduction to one such technique:

Grad-CAM: Being able to visually represent what networks see can help in understanding neural networks better. This becomes a useful tool especially when working with medical data, as it is important for the model to focus on the areas which are crucial for making a distinction between healthy and unhealthy patient data. One such activation visualization technique is Grad-CAM (Gradient-weighted Class Activation Mapping) [14]; it uses the gradients from the last convolution layer or a pooling layer to find a localization map which roughly estimates the image regions on which the network focuses most. The original code from the authors can be found at Grad-CAM code. The results from using this method on a normal brain MRI image are as follows:

[Figure: Normal brain MRI scan (left); normal brain MRI scan overlaid with the heat-map generated by Grad-CAM (right)]
In conclusion, this tutorial only scratches the surface of the potential that deep learning and medical imaging hold as individual domains and in combination with each other. We briefly discussed the series of steps that might be useful to establish an initial functional pipeline when getting started with any deep learning and medical imaging problem.

References:

1. R. Beare, B. C. Lowekamp, Z. Yaniv, "Image Segmentation, Registration and Characterization in R with SimpleITK", J Stat Softw, 86(8), doi: 10.18637/jss.v086.i08, 2018.

2. Z. Yaniv, B. C. Lowekamp, H. J. Johnson, R. Beare, "SimpleITK Image-Analysis Notebooks: a Collaborative Environment for Education and Reproducible Research", J Digit Imaging, 31(3): 290-303, doi: 10.1007/s10278-017-0037-8, 2018.

3. B. C. Lowekamp, D. T. Chen, L. Ibáñez, D. Blezek, "The Design of SimpleITK", Front. Neuroinform., 7:45, doi: 10.3389/fninf.2013.00045, 2013.

4. Yoo TS, Ackerman MJ, Lorensen WE, Schroeder W, Chalana V, Aylward S, Metaxas D, Whitaker R. Engineering and Algorithm Design for an Image Processing API: A Technical Report on ITK — The Insight Toolkit. In Proc. of Medicine Meets Virtual Reality, J. Westwood, ed., IOS Press Amsterdam, pp. 586-592 (2002).

5. McCormick M, Liu X, Jomier J, Marion C, Ibanez L. ITK: enabling reproducible research and open science. Front Neuroinform. 2014;8:13. Published 2014 Feb 20. doi: 10.3389/fninf.2014.00013.

6. Johnson, McCormick, Ibanez. "The ITK Software Guide: Design and Functionality." Fourth Edition. Published by Kitware, Inc. 2015. ISBN: 9781-930934-28-3.

7. Johnson, McCormick, Ibanez. "The ITK Software Guide: Introduction and Development Guidelines." Fourth Edition. Published by Kitware, Inc. 2015. ISBN: 9781-930934-27-6.

8. He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 770-778).

9. Huang, G., Liu, Z., Van Der Maaten, L., & Weinberger, K. Q. (2017). Densely connected convolutional networks. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 4700-4708).

10. Ronneberger, O., Fischer, P., & Brox, T. (2015, October). U-Net: Convolutional networks for biomedical image segmentation. In International conference on Medical image computing and computer-assisted intervention (pp. 234-241). Springer, Cham.

11. Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., ... & Desmaison, A. (2019). PyTorch: An imperative style, high-performance deep learning library. In Advances in neural information processing systems (pp. 8026-8037).

12. Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., ... & Berg, A. C. (2015). ImageNet large scale visual recognition challenge. International journal of computer vision, 115(3), 211-252.

13. http://dx.doi.org/10.17632/rscbjbr9sj.2#file-41d542e7-7f91-47f6-9ff2-e5a5a7861

14. Selvaraju, R. R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., & Batra, D. (2017). Grad-CAM: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision (pp. 618-626).

15. Case courtesy of Assoc Prof Frank Gaillard, Radiopaedia.org (https://radiopaedia.org/). From the case rID: 22196 (https://radiopaedia.org/cases/22196).

Tags: Medical Imaging, Deep Learning, Transfer Learning, Machine Learning
