How To Get Started With Deep Learning Using MRI Data - by Divya Gaur - MICCAI Educational Initiative - Medium
Divya Gaur
Nov 2020
How to get started with deep learning using MRI data
Even though all the algorithms and information are open-source these days, sometimes even the most well-established computer vision or deep learning methods do not produce the expected results, especially for medical imaging problems. The problem lies in an insufficient understanding of medical data and of how to use it efficiently to leverage the power of these new computational methods. This post describes a basic structure that can help in understanding the problem at hand and in implementing deep learning models on MRI data. Although this work primarily deals with classification problems, the data exploration and preparation steps apply equally to other types of problem statements. Before we start, it is helpful to know that most of the popular machine learning libraries and deep learning frameworks used here are based on Python. The requirements list below contains the language and library requirements necessary for following the code in this tutorial.
Requirements: The main libraries and Python version used for the code mentioned in this tutorial are as follows:
Python: 3.6
SimpleITK: 1.2.4
sklearn: 0.0
MR image acquisition pipeline
Pulse sequences: these are the pulses of current fed to the RF and gradient coils through the RF electronics and gradient amplifier. TR and TE mentioned in the figure above are the repetition time and echo time. These values control the contrast and 'weighting' of the MR image. Although an entire tutorial could be dedicated to 'weighting' in MR imaging, in short it determines which tissues are prominently visible in the captured image.
Example: some portions, such as the cerebrospinal fluid (CSF) found in the spinal canal, appear dark in a 'T1-weighted' image, whereas they appear white in a 'T2-weighted' image.
Free DICOM viewers that can be used to inspect the data include:
Horos: Mac OS
MicroDicom: Windows
Before starting to work with DICOM files, it is always a good idea to have a look at the data. DICOM readers have important properties such as 'window level' and 'window width' that can be adjusted while viewing the data. 'Window width' means the range of pixel values available for displaying an image, whereas 'window level' refers to the mid-value of that range.
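As a rough illustration in numpy (the level and width values below are placeholders), applying a window to a slice amounts to clipping intensities to [level − width/2, level + width/2] and rescaling for display:

```python
import numpy as np

def apply_window(slice_array, level, width):
    # keep only intensities inside the window and rescale them to [0, 1]
    low, high = level - width / 2, level + width / 2
    windowed = np.clip(slice_array, low, high)
    return (windowed - low) / (high - low)

# e.g. a hypothetical window applied to a random "slice"
display = apply_window(np.random.rand(256, 256) * 1000, level=400, width=800)
```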
Data exploration and preparation
Now that we know how the data is acquired and have viewed the files for an initial understanding, it is time to get down to the actual computation part. This section covers the data pre-processing steps that are necessary before applying any deep learning or machine learning model to MR data.
Although there exist purely Python-based libraries like Pydicom for working with most of the DICOM metadata, they are not well suited for working with the image data stored in DICOM files. Hence, we will have a brief overview of the more established computer vision package SimpleITK [1], which works with Python and comes in handy when dealing with medical imaging problems.
SimpleITK: it acts as a simplified interface between the Insight Toolkit (ITK) [4] and languages such as Python. ITK provides an extensive array of functionality, as it supports more than 15 file formats and has implementations of more than 200 image analysis methods. Now coming back to our use-case of working with MR scans and how this library can make pre-processing easier and more efficient: first and foremost, re-sampling is important for MR scans, as the voxel size (spatial resolution) might vary for different scans.
Read MR volume
Code for reading MRI volume using SimpleITK
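A minimal sketch of reading and re-sampling an MR volume with SimpleITK might look as follows; the file name and the target spacing are placeholders:

```python
import SimpleITK as sitk

# Read a 3D MR volume; the file name is a placeholder. For a DICOM series,
# sitk.ImageSeriesReader would be used instead of a single-file read.
volume = sitk.ReadImage("example_mr_volume.nii.gz")
print("Size (x, y, z):", volume.GetSize())
print("Voxel spacing (mm):", volume.GetSpacing())

# Re-sample to an isotropic 1 mm spacing so the voxel size is consistent
# across scans (spacing and interpolator choices are illustrative).
new_spacing = (1.0, 1.0, 1.0)
new_size = [
    int(round(sz * spc / nspc))
    for sz, spc, nspc in zip(volume.GetSize(), volume.GetSpacing(), new_spacing)
]

resampler = sitk.ResampleImageFilter()
resampler.SetInterpolator(sitk.sitkLinear)
resampler.SetOutputSpacing(new_spacing)
resampler.SetSize(new_size)
resampler.SetOutputOrigin(volume.GetOrigin())
resampler.SetOutputDirection(volume.GetDirection())
volume = resampler.Execute(volume)
```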
After resizing and re-sampling the volume, the next step is to decide whether a 2D or a 3D model will be useful for the given data. As this tutorial aims to begin exploring this field, we will only be considering 2D slices extracted using SimpleITK. But for the sake of discussion, both approaches have their advantages and disadvantages. Using 2D slices might be easier to implement and understand but might lack contextual information altogether, whereas a 3D model might be a bit more challenging, as it works on patches of the volume, thereby increasing the computational cost, but it can preserve the contextual information. Coming back to the immediate matter at hand, i.e. using 2D MR slices, the following code can be used to extract the slices from the MR volume read using the previous code snippet:
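A minimal sketch of the slice extraction, assuming the `volume` read and re-sampled above:

```python
import SimpleITK as sitk

# Convert the (re-sampled) volume into a numpy array; SimpleITK returns it
# indexed as (z, y, x), so the first axis walks through the axial slices.
array = sitk.GetArrayFromImage(volume)

middle_slice = array[array.shape[0] // 2]   # a single 2D axial slice
print(middle_slice.shape, middle_slice.dtype)
```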
This code returns numpy arrays, which can then be stored in any lossless image format for further processing. The code snippet below can be extended to store multiple slices:
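For example, each axial slice could be rescaled and written out as a lossless PNG; the output folder `slices/` is a placeholder:

```python
import os
import numpy as np
from PIL import Image

os.makedirs("slices", exist_ok=True)

for i, slc in enumerate(array):
    slc = slc.astype(np.float32)
    # min-max rescale to [0, 255] so the slice can be stored as an 8-bit PNG
    slc = (slc - slc.min()) / (slc.max() - slc.min() + 1e-8)
    Image.fromarray((slc * 255).astype(np.uint8)).save(f"slices/slice_{i:03d}.png")
```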
Now that we have MR image slices stored in a regular image format, it is time to move on to the next important part, which is working with the data and using it to train your first deep learning model.
As we know, the major source of medical data is online data challenges, and they distribute the data either as training and testing sets or as training, validation and testing datasets. But this might not be the case when working with our own datasets. Hence, it is good to know how to roughly split a dataset. Generally, if training and testing sets are given separately, it is preferable to divide the training set into 70% and 30% train and validation sets respectively. But when no test data is provided, it is good to divide the data into 10%, 30% and 60% test, validation and training sets respectively. One thing to keep in mind here is to sample the test set first, then the validation set and at last the training set, as it is essential that the test set contains all the labels which are likely to be present in the training and validation sets.
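A minimal sketch of such a split with PyTorch/torchvision follows; the folder layout (`data/<class_name>/*.png`), the normalization statistics and the batch size are placeholder assumptions, not the article's exact setup:

```python
import torch
from torch.utils.data import DataLoader, random_split
from torchvision import datasets, transforms

data_transforms = transforms.Compose([
    transforms.Grayscale(num_output_channels=3),  # pretrained nets expect 3 channels
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.2, 0.2, 0.2], std=[0.25, 0.25, 0.25]),
])

# One folder per class, e.g. data/normal and data/alzheimer
dataset = datasets.ImageFolder("data", transform=data_transforms)

n_total = len(dataset)
n_test = int(0.1 * n_total)          # sample the test set first,
n_val = int(0.3 * n_total)           # then validation,
n_train = n_total - n_test - n_val   # and the rest is for training

test_set, val_set, train_set = random_split(
    dataset, [n_test, n_val, n_train],
    generator=torch.Generator().manual_seed(42))

train_loader = DataLoader(train_set, batch_size=16, shuffle=True)
val_loader = DataLoader(val_set, batch_size=16)
test_loader = DataLoader(test_set, batch_size=16)
```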
One thing to notice in the code is that the transforms functionality not only provides different image transforms, as mentioned in the 'data augmentation' section, but also has a normalize attribute. This attribute takes the 'mean' and 'standard deviation' of the dataset as input, and the resulting normalization is useful for the reasons discussed below.
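If the dataset statistics are unknown, they can be estimated from the training slices before being plugged into Normalize. A rough sketch, assuming a copy of the training split loaded with ToTensor() only (here called `plain_train_set`, a placeholder name):

```python
from torch.utils.data import DataLoader

# plain_train_set: the training split loaded with ToTensor() only (no Normalize)
loader = DataLoader(plain_train_set, batch_size=64, shuffle=False)

n_pixels, channel_sum, channel_sq_sum = 0, 0.0, 0.0
for images, _ in loader:
    b, c, h, w = images.shape
    n_pixels += b * h * w
    channel_sum = channel_sum + images.sum(dim=(0, 2, 3))
    channel_sq_sum = channel_sq_sum + (images ** 2).sum(dim=(0, 2, 3))

mean = channel_sum / n_pixels
std = (channel_sq_sum / n_pixels - mean ** 2).sqrt()
print("mean:", mean.tolist(), "std:", std.tolist())
```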
Transfer learning, in simple words, means using the knowledge which neural networks have acquired for one domain for a slightly more specialized yet related one. For example, taking a pre-trained network which is an expert in recognizing natural images and training it further to make it recognize different images such as X-ray or MR scans.
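As a sketch of that idea in PyTorch (the model choice and the two-class head are illustrative, not necessarily the exact setup used in the article):

```python
import torch.nn as nn
from torchvision import models

# Start from a ResNet-18 pretrained on natural images (ImageNet)
model = models.resnet18(pretrained=True)

# Optionally freeze the pretrained feature extractor...
for param in model.parameters():
    param.requires_grad = False

# ...and replace the final fully connected layer with a new, trainable head
# for the two MR classes (e.g. normal vs. Alzheimer).
model.fc = nn.Linear(model.fc.in_features, 2)
```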
Batch size: the total number of images or entries from a dataset which are fed to the network at once. dataset_size = iterations * batch_size, and one epoch is completed once dataset_size / batch_size iterations have been completed for a given batch size.
Optimizer: the optimizer function computes the loss gradients w.r.t. the network parameters and then uses these gradients to update the parameters; the scale by which the parameters are updated is determined by the learning rate.
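Put together, a single training iteration with a cross-entropy loss and an SGD optimizer might look like this, re-using the `model` and `train_loader` from the sketches above (learning rate and momentum are illustrative values):

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(
    filter(lambda p: p.requires_grad, model.parameters()),
    lr=0.001, momentum=0.9)

for images, labels in train_loader:
    optimizer.zero_grad()              # clear gradients from the previous step
    outputs = model(images)            # forward pass
    loss = criterion(outputs, labels)  # compute the loss
    loss.backward()                    # gradients of the loss w.r.t. the parameters
    optimizer.step()                   # update scaled by the learning rate
    break                              # one batch only, for illustration
```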
Now we can proceed further to discuss the networks that are the main element of transfer learning. Although there is a variety of network choices available, below we discuss the two most popular types of architectures used for computer vision problems.
ResNet: the concept of the residual architecture [8] is to use skip connections, also known as identities, in addition to the normal sequential connections. This is done in order to give the subsequent layers a view of what the inputs to the previous layers were, in addition to the previous layer outputs. The combination of convolution, ReLU and batch normalization layers makes up what is also known as a block. The skip connections connect the input of one block with its output to form the input of the next block. Residual learning is effective because layers not only see what their previous layers made out of the input but also the input of those layers, making up for any information which might have been lost during the learning process. A typical architecture of ResNet-18 with its basic block can be seen below:
ResNet-18 architecture [8]
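For illustration, a simplified version of such a basic block (identity shortcut only, no downsampling handled) can be written in PyTorch as:

```python
import torch
import torch.nn as nn

class BasicBlock(nn.Module):
    """Two 3x3 conv + batch-norm layers with an identity skip connection."""

    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        identity = x                              # saved input for the skip connection
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out = out + identity                      # skip connection
        return self.relu(out)

block = BasicBlock(64)
print(block(torch.randn(1, 64, 56, 56)).shape)    # torch.Size([1, 64, 56, 56])
```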
Now that we have trained a model and have some information about its accuracy and loss values, it would be beneficial to discuss how exactly the performance of a network can be measured. Evidently, the most widely used metric for determining whether a model performs well or not is accuracy, but it is not well suited for every type of problem statement. The following discussion covers that and also mentions the metrics which can provide a more insightful view of the performance of the trained model.
True Positive (TP): the predicted label is the same as the actual label, otherwise known as the ground truth.
False Positive (FP): the predicted label was marked as positive despite it being negative in reality.
True Negative (TN): the prediction marked the label as negative and it is actually negative.
False Negative (FN): the prediction marked the label as negative although it is actually positive.
People who have Alzheimer's and were predicted to have it: 2
People who do not have Alzheimer's and the prediction says they do not have it:
People who do not have Alzheimer's, but the prediction says they have it: 6
People who have Alzheimer's, but were predicted not to have it:
Now, the formula for accuracy can be derived from the confusion matrix:
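In terms of the confusion-matrix entries defined above, the standard definitions are:

Accuracy = (TP + TN) / (TP + TN + FP + FN)

Specificity = TN / (TN + FP)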
The specificity of the classifier above is 90%; this demonstrates that the cases which did not have the disease were predicted correctly 90% of the time by the classifier.
Sensitivity: the proportion of cases which are predicted positive and are actually positive w.r.t. the total positive cases from the ground truth. It can also be derived from the confusion matrix and is represented as follows:
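In confusion-matrix terms:

Sensitivity = TP / (TP + FN)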
The sensitivity of the classifier above is 12%; this demonstrates that the cases which actually had the disease were predicted correctly only 12% of the time by the classifier.
Grad-CAM: being able to visually represent what networks see can help in understanding neural networks better. This becomes an especially useful tool when working with medical data, as it is important for the model to focus on the areas which are crucial for making a distinction between healthy and unhealthy patient data. One such activation visualization technique is Grad-CAM (Gradient-weighted Class Activation Mapping) [14]. It uses the gradients from the last convolution layer or a pooling layer to find a localization map which roughly estimates the image regions on which the network focuses. The original code from the authors can be found at Grad-CAM code. The results from using this method on a normal brain MRI image are as follows:
Normal brain MRI scan (left); normal brain MRI scan overlapped with heat-map generated by Grad-CAM (right)
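For reference, a minimal Grad-CAM-style sketch in PyTorch might look like this; it is a simplified illustration rather than the authors' implementation, and the hooked layer and model are placeholders:

```python
import torch
import torch.nn.functional as F
from torchvision import models

# Hook the last convolutional stage of a ResNet-18 and weight its activations
# by the average-pooled gradients of the chosen class score.
model = models.resnet18(pretrained=True)
model.eval()

activations, gradients = {}, {}

def forward_hook(module, inputs, output):
    activations["value"] = output.detach()

def backward_hook(module, grad_input, grad_output):
    gradients["value"] = grad_output[0].detach()

model.layer4.register_forward_hook(forward_hook)
model.layer4.register_full_backward_hook(backward_hook)

def grad_cam(image_tensor, class_idx=None):
    """image_tensor: a normalized (1, 3, H, W) input; returns a (H, W) heat-map."""
    scores = model(image_tensor)
    if class_idx is None:
        class_idx = scores.argmax(dim=1).item()
    model.zero_grad()
    scores[0, class_idx].backward()

    acts = activations["value"][0]        # (C, h, w) feature maps
    grads = gradients["value"][0]         # (C, h, w) gradients
    weights = grads.mean(dim=(1, 2))      # global-average-pooled gradients
    cam = torch.relu((weights[:, None, None] * acts).sum(dim=0))
    cam = cam / (cam.max() + 1e-8)        # normalize to [0, 1]
    # up-sample to the input resolution so it can be overlaid on the MR slice
    cam = F.interpolate(cam[None, None], size=image_tensor.shape[2:],
                        mode="bilinear", align_corners=False)[0, 0]
    return cam, class_idx
```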
In conclusion, this tutorial only scratches the surface of the potential that deep learning and medical imaging hold as individual domains and in combination with each other. We briefly discussed the series of steps that might be useful for establishing an initial functional pipeline when getting started with any deep learning with medical imaging problem.
References:
He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 770–778).
Huang, G., Liu, Z., Van Der Maaten, L., & Weinberger, K. Q. (2017). Densely connected convolutional networks. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 4700–4708).
Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., … & Desmaison, A. (2019). PyTorch: An imperative style, high-performance deep learning library. In Advances in neural information processing systems (pp. 8026–8037).
Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., … & Berg, A. C. (2015). ImageNet large scale visual recognition challenge. International Journal of Computer Vision, 115(3), 211–252.
http://dx.doi.org/10.17632/rscbjbr9sj.2#file-41d542e7-7f91-47f6-9ff2-e5a5a7861
Selvaraju, R. R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., & Batra, D. (2017). Grad-CAM: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision (pp. 618–626).