Cardiac MRI Segmentation With a Dilated CNN Incorporating Domain-Specific Constraints

Georgios Simantiris and Georgios Tziritas, Senior Member, IEEE

Abstract—Semantic segmentation of cardiac MR images is a challenging task due to its importance in the medical assessment of heart diseases. Having a detailed localization of specific regions of interest, such as the Right and Left Ventricular Cavities and the Myocardium, doctors can infer important information about the presence of cardiovascular diseases, which are today a major cause of death globally. This paper addresses the problem of semantic segmentation in cardiac MR images using a dilated Convolutional Neural Network. Opting for dilated convolutions allowed us to work in full resolution throughout the network's layers, preserving localization accuracy while maintaining a relatively small number of trainable parameters. To assist the network's training process we designed a custom loss function. Furthermore, we developed new augmentation techniques and also adapted existing ones to cope with the lack of sufficient training images. Consequently, the training set increases not only in amount, but in substance as well, and the network trains quickly and efficiently without overfitting. Our pre- and post-processing steps are also crucial to the whole process. We apply our methodology to the Right and Left Ventricles (RV, LV) and also the Myocardium (MYO) according to the Automated Cardiac Diagnosis Challenge (ACDC), with promising results. Submitting our algorithm's predictions to the Post-2017-MICCAI-challenge testing phase, we achieved similar scores (average Dice coefficient 0.916) on the test data set compared to the state of the art featured in the ACDC leaderboard, but with fewer parameters than the leading method. Our approach outperforms other methods featuring dilated convolutions in this challenge up until now.

Index Terms—Cardiac MRI segmentation, CNN, data augmentation, loss function, dilated convolutions, MRF.

I. INTRODUCTION

Automated semantic segmentation in medical images is a very challenging task. When performed correctly, it can assist doctors who would otherwise spend a significant amount of time manually segmenting regions of interest (ROIs). In addition to saving a lot of time, it eliminates ambiguities from human intervention, which can occur when the regions are annotated by more than one expert. Furthermore, it can serve as a first step towards automatic disease diagnosis. In particular, the human heart can suffer from various cardiovascular diseases, still a major, if not the most significant, cause of death globally.

In recent years, deep Convolutional Neural Networks (CNNs) have achieved state-of-the-art results in biomedical image segmentation tasks. Especially the U-net architecture [15] proved to be task independent and has been widely applied with minor or major modifications.
Approaches with CNNs for the segmentation of the Right and Left Ventricles, as well as the myocardium, outperformed other more conventional computer vision and machine learning approaches [1]. The majority of the best performing algorithms for ventricular segmentation are inspired by the U-net architecture. Dilated, or a trous, convolutions have also proven quite effective in capturing context relations in images by drastically enlarging the CNN's receptive field [4]. They have been used as front-end modules [19], as well as in conjunction with residuals [20]. In [18] a dilated CNN in a relatively shallow architecture is used standalone in the framework of the Automatic Cardiac Diagnosis Challenge. In [9] the U-net is given a context extractor built using a dense a trous convolution block, and the new network outperforms the U-net in many medical applications. The dilated CNN with 3 x 3 convolutions is known to suffer from gridding artifacts [20]. In the residual dilated networks the gridding artifacts have been alleviated by adding more layers after the block of dilated convolutions. In [17] two methods of smoothed convolutions are proposed for addressing the gridding artifacts.

In this work, we address the task of semantic segmentation of cardiac MR images, having as main objective and motivation to obtain a network that is as simple as possible, driven by the application domain. We propose a simple architecture with cascaded dilated convolutions. In semantic segmentation, and even more so in cardiac segmentation, localization performance is primordial. We have opted for a dilated CNN in order to preserve image resolution. In this framework we propose to use large kernel-size filters at the early layers for obtaining learnable smooth convolutions. As in our design the kernels are larger at the first layers and become narrow at the last layers, dense connections between the feature maps are established through the layers.

Deep neural networks rely heavily on big data to obtain good generalisation performance. Unfortunately, our targeted application domain does not have access to big data, as the data annotation is very costly and often requires significant amounts of expertise. A data-space solution to the problem of limited annotated data is data augmentation. Data augmentation encompasses a suite of techniques that enhance the size and quality of training data sets, such that better deep learning models can be built using them [16].

We further establish, as also shown in [18], that dilated convolutions do not only serve as front-end modules or intermediate blocks to other networks, but are also capable of performing an advanced task such as semantic segmentation in a standalone architecture. The main contributions of our work are summarized as follows:

1) We propose a simple and efficient filter-bank size and filter kernel size design for smoothing the a trous convolutions and minimizing the gridding artifacts, while strengthening the expressiveness and the interpretability of the early network layers.
2) We propose a new data augmentation method adapted to the cardiac MRI data.
3) We enrich the loss function with criteria taking into consideration anatomical constraints of the heart structure.
4) We also propose domain-specific post-processing modules dealing with unavoidable anatomical errors in deep learning segmentation.

Having determined a region of interest where both ventricles appear in a specially designed pre-processing step presented in subsection II-A, we train the network end-to-end using specific data augmentation procedures presented in II-B, to cope with the lack of training data and boost training efficiency. The detailed presentation of the CNN architecture, with the rationale of the filter-bank design, is given in subsection II-C, while the new terms of the loss function are given in subsection II-D. We use data-adapted augmentation, without neglecting the already existing and successfully, widely applied image transformation techniques. Even so, we still keep the number of training images, as well as the training time, on a small scale compared to other approaches. The post-processing modules are presented in Section III. We chose to work on the Automated Cardiac Diagnosis Challenge (ACDC) data set,¹ which provides a set of annotated images covering various pathologies. Our experiments and results on the training, as well as the test, data sets are discussed in Section IV.

II. PROPOSED LEARNING METHOD

A. Data Pre-Processing

First the heart is localized using a method presented in [8]. The initial localization is based on the intensity change between corresponding images in the end diastole (ED) and end systole (ES) phases. Then, a slice in mid-cavity and in the ED phase is segmented using the Chan-Vese active contours algorithm [3] to localize the two ventricular cavities on the selected slice, obtaining for the whole data set images of size 120 x 120, reducing the computational cost and improving the accuracy of the results.

Then, the image intensity is transformed by a function resulting from the histogram equalization of the whole training data set. The transformation (solid line) is shown in Fig. 1, in comparison with the function operating a histogram matching with a Gaussian (dashed line) for the specific image illustrated at the left in the figure, while at the right the transformed image is given.

Fig. 1. Image intensity transformation.

¹[Online]. Available: https://acdc.creatis.insa-lyon.fr/
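As a rough illustration of this global intensity transformation, the following NumPy sketch builds a single look-up table from the pooled histogram of all (already cropped, 8-bit) training images and applies it to each image; the function names and the `cropped_train_images` list are illustrative, not from the paper.

```python
import numpy as np

def global_equalization_lut(images, n_levels=256):
    """Build one intensity mapping from the pooled histogram of ALL training images."""
    pooled = np.concatenate([img.ravel() for img in images])
    hist, _ = np.histogram(pooled, bins=n_levels, range=(0, n_levels))
    cdf = np.cumsum(hist).astype(np.float64)
    cdf /= cdf[-1]                                   # normalized cumulative histogram
    return np.round(cdf * (n_levels - 1)).astype(np.uint8)

def equalize(img, lut):
    """Apply the global look-up table to one uint8 image."""
    return lut[img]

# lut = global_equalization_lut(cropped_train_images)   # hypothetical list of 120x120 uint8 arrays
# img_eq = equalize(img, lut)
```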
B. Data Augmentation

Data augmentation is a technique primarily used when too few training samples are available to train the network. It becomes even more crucial when dealing with medical images, especially when semantic segmentation is involved. Annotating correctly the regions of interest within an image requires specialized expert knowledge and is a heavily time-consuming task. And still, ambiguities arise even when the ROIs are annotated twice by the same expert. This results in having even fewer images to work with, in contrast to other research areas. Having little training data, the network cannot learn the desired invariance and robustness properties, or it overfits quickly. To increase the number of training samples, basic image transformation techniques are quite commonly used, like random shifts, rotations, scalings, elastic deformations and flips. When applied on the original training data they essentially provide multiple views of the same image.

We design a novel augmentation technique specifically for cardiac MR images, based on intensity transformation. In order for the network to learn how to accurately distinguish the left ventricle (LV) cavity from the myocardium, the intensities of these two areas are transformed, producing augmented images. These two areas of interest intertwine heavily with each other and, in some cases, are hard to distinguish even with the human eye. At first, the probability density functions of the two areas, myocardium and LV cavity, are estimated. Then, the intensity contrast is reduced, adjusted by the Bhattacharyya distance [2] between the two distributions. By also adding noise, more data variation is obtained. More precisely, if $d_B(p_{MYO}, p_{LV})$ is the distance between the two distributions, the intensity of the pixels in the LV cavity is randomly transformed as
$$I_{LV,aug} = \big(1 - d_B(p_{MYO}, p_{LV})\,V + W\big)\, I_{LV},$$
while the intensity of the myocardium is transformed as
$$I_{MYO,aug} = \big(1 + d_B(p_{MYO}, p_{LV})\,V + W\big)\, I_{MYO}.$$
$V$ is a random variable distributed around 0.1 with small variance, and $W$ is a zero-mean random variable uniformly distributed with maximum value 0.05. The additive noise $W$ is determined pixel-wise. Globally, the contrast between the intensities in the LV cavity and the myocardium is reduced, forcing the training algorithm to learn difficult classifications.

To assist our augmentation technique, while also feeding the network with new images never seen before, rotation and scaling in real time is also employed. The rotations are random, but limited to a restricted range, as we found that the given training set does not feature cases where the LV cavity area and the Right Ventricle are aligned differently. For the scaling process we choose to only zoom in on images that are in the end systole (ES) phase, by a random factor no greater than 1.3. This generates more samples for the end diastole (ED) case. The opposite process, zooming out on images in the ED phase to simulate images in the ES phase, did not help the training process and was therefore finally omitted. At last, whenever the ROIs required to perform the previous rotation and scaling processes were missing, the real-time augmentation consisted of just a random rotation in the same range. Original and augmented images are presented in Fig. 2: an example of intensity augmentation is shown on the top and a rotated and scaled sample on the bottom.

Fig. 2. Original (left) and augmented (right) images: intensity augmentation (top), rotation and zoom-in (bottom).
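A minimal NumPy sketch of this intensity augmentation, assuming the image is normalized to [0, 1] and that non-empty LV-cavity and myocardium masks are available; the histogram binning and the variance of V are our own choices (the paper only states "small variance").

```python
import numpy as np

def bhattacharyya_distance(p, q, eps=1e-12):
    """d_B(p, q) = -ln(sum_i sqrt(p_i * q_i)) for two discrete distributions over the same bins."""
    return -np.log(np.sum(np.sqrt(p * q)) + eps)

def augment_lv_myo_intensities(img, lv_mask, myo_mask, n_bins=64, rng=None):
    """Reduce the LV-cavity / myocardium contrast by a random amount scaled by d_B (Section II-B)."""
    rng = np.random.default_rng() if rng is None else rng
    p_lv, _ = np.histogram(img[lv_mask], bins=n_bins, range=(0.0, 1.0))
    p_myo, _ = np.histogram(img[myo_mask], bins=n_bins, range=(0.0, 1.0))
    d_b = bhattacharyya_distance(p_lv / p_lv.sum(), p_myo / p_myo.sum())

    v = rng.normal(loc=0.1, scale=0.02)              # V: random factor around 0.1 (variance assumed)
    w = rng.uniform(-0.05, 0.05, size=img.shape)     # W: zero-mean pixel-wise noise, max 0.05

    out = img.astype(np.float32).copy()
    out[lv_mask] = (1.0 - d_b * v + w[lv_mask]) * img[lv_mask]      # scale LV-cavity intensities down on average
    out[myo_mask] = (1.0 + d_b * v + w[myo_mask]) * img[myo_mask]   # scale myocardium intensities up on average
    return np.clip(out, 0.0, 1.0)
```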
C. DCNN Architecture

We propose a simple network of cascaded modules of dilated convolutions, without max-pooling or any other method that downscales the feature maps. The convolutions are always performed at full resolution, with the network's receptive field finally covering a little bit more than half the size of the input image, due to the selected increased dilation rates.

Five similar modules are designed, which perform 2-D convolutions, followed by batch normalization, and conclude with a ReLU activation function. The filters start with large kernel sizes in the first layer (9 x 9), but shrink gradually thereafter, resulting in 3 x 3 kernel sizes at the last modules. In the first layer the filter is not up-sampled, i.e. the dilation rate equals 1. Then dilated convolutions are performed with a progressively increased dilation rate for the rest of the modules, thus increasing the network's receptive field. In each of the aforementioned modules, l2 regularization, even if to a small degree, is applied to the weights as another measure against overfitting. The last layer of the network consists of a 2-D convolution with four 1 x 1 kernels and a softmax activation function, so that each pixel is assigned the posterior probabilities for each class (Background, Right Ventricle, Myocardium and Left Ventricle).

We also found that no concatenations, or any other kind of layer aggregation, are needed to boost performance. Only a few convolutional kernels, and thus a small number of trainable network parameters, are able to achieve the desired outcome. The detailed architecture of our DCNN, featuring 229,220 parameters in total, is presented in Fig. 3, while in Table I more details about the receptive field and the parameters of each layer are given. As we can notice, the kernels start dense but later expand sparsely through dilation while their size contracts, with the analogous consequences in terms of capturing context, and the receptive field gradually increases.

Fig. 3. Dilated Convolutional Neural Network (DCNN) architecture: cascaded dilated Conv2D modules, each followed by BatchNorm and ReLU, ending in a 1 x 1 Conv2D followed by Softmax.

TABLE I. Architecture details of our dilated CNN: kernel size, number of filters, dilation rate, receptive field and number of parameters per layer.

We consider that on the one hand the localization ability of the network is improved [14], and on the other hand the gridding artifacts are restrained. Effectively, from the second to the second-to-last layer the kernel size of the previous layer is larger than the kernel size of the current layer, and therefore the inputs at neighboring sites are not completely disjoint. We found that the network tends to learn smooth feature maps in the early layers, thus limiting artifacts due to the filter up-sampling. We obtained smoothed dilated convolutions via the kernel size, in a way that differs from that of [17].
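A minimal Keras sketch of this cascaded design; the per-module kernel sizes, filter counts and dilation rates below are illustrative values consistent with the description (large kernels and no dilation first, 3 x 3 kernels and higher dilation later), not the exact configuration of Table I.

```python
from tensorflow.keras import layers, models, regularizers

def build_dcnn(input_shape=(120, 120, 1), n_classes=4, weight_decay=1e-4):
    """Cascaded dilated CNN working at full resolution: Conv2D + BatchNorm + ReLU modules
    with shrinking kernels and growing dilation rates, then a 1x1 softmax layer."""
    kernels = [9, 7, 5, 3, 3]        # illustrative: large kernels first, 3x3 at the end
    filters = [16, 32, 32, 64, 64]   # illustrative filter counts
    dilations = [1, 2, 4, 6, 8]      # no dilation in the first layer, then increasing

    x = inputs = layers.Input(shape=input_shape)
    for k, f, d in zip(kernels, filters, dilations):
        x = layers.Conv2D(f, k, padding="same", dilation_rate=d,
                          kernel_regularizer=regularizers.l2(weight_decay))(x)
        x = layers.BatchNormalization()(x)
        x = layers.ReLU()(x)
    # Per-pixel posteriors for background, RV cavity, myocardium and LV cavity.
    outputs = layers.Conv2D(n_classes, 1, activation="softmax")(x)
    return models.Model(inputs, outputs)
```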
In addition, it appears that the filters in the first layer can have interesting interpretations related to the data of the considered application domain. We show in Fig. 4, in the first row, the eight most expressive learned filters of the first layer and, in the second row, their corresponding smooth approximations as first- or second-order derivatives. More precisely, among the sixteen filters, six (6) are smoothed first-order derivatives, eight (8) are smoothed second-order derivatives and two (2) are auto-convolutions of second-order derivatives. We could say that the network learns to extract understandable and discriminative features directly relevant to the image content.

Fig. 4. Eight of the more expressive convolutional kernels of the first layer (1st row) and their corresponding smooth approximations (2nd row).

Furthermore, our experiments have shown that the network performs better and trains faster when the kernel initialization in the first layer is not left to random. Usually a typical random kernel initialization, such as the Glorot uniform [6], is employed for weight initialization, and we also do so in the other convolutional layers. However, in the first layer the kernels are initialized with the best weights that particular layer had to offer after multiple training sessions. To elaborate further, because the kernels in the first layer are quite large, we have chosen to keep their number small, so that computational cost and time are not burdened. However, having a small number of kernels cannot guarantee that all the desired low-level features, such as edges, luminances, curves etc., are learned by the training procedure, especially when we opt for faster training times. Therefore, by observing the weights of the first layer after each training in a 5-fold cross-validation scheme and repeating this several times, we automatically selected, using a clustering algorithm, sixteen different filters which clearly depicted desired low-level features according to the training data set, as shown in Fig. 5(a). Again, these filters are not manually designed and, of course, the network then also updates these weights during training, resulting in the final filters shown in Fig. 5(b). Since these weights are the least affected by the back-propagation procedure, the network has a consistent foundation to build on with that modification.

Fig. 5. Sixteen 9 x 9 convolutional kernels of the first layer before (a, initial weights) and after (b, trained weights) training for one fold.

D. Loss Function

The network's training goal is to minimize a carefully designed loss function featuring the soft Dice loss,
$$L_D = 1 - \sum_{c=1}^{4} w_c \, \frac{2\sum_{s} Y(s,c)\,T(s,c)}{\sum_{s} Y(s,c) + \sum_{s} T(s,c)}, \qquad (1)$$
where the summation on $s$ is over all the pixels of all the images in a mini-batch, and $Y$ and $T$ denote the network's prediction and the ground truth, respectively. The class weights are set as $w = [1\ 2\ 4\ 1]$, where label '1' is for the background (BGR), '2' for the RV cavity, '3' for the Myocardium and '4' for the LV cavity. The Myocardium, and more exactly the epicardium, is the most difficult to segment, followed by the RV cavity, and therefore the weights are set accordingly.

This loss is complemented by additional criteria that ensure the integrity of the predictions specifically for the ACDC data set.
The additional criteria aim to take into account the segments' topology, either generically for slices where both cavities are present, or when the learning of specific configurations should be enforced.

First, another term is added such that predictions for classes ($c = 2, 3, 4$) that do not exist at all in an image are penalized. If $T(u,n,c) = 0, \forall u$, where $u$ runs over the pixels of image $n$ in the mini-batch,
$$L_C(n,c) = \sqrt{\sum_{u} Y(u,n,c) + \epsilon}. \qquad (2)$$
Otherwise there is no cost. A small smoothness addend $\epsilon$ is present just for proper numerical computation. To understand the importance of (2), we present its derivative used in the back-propagation process for weight updating: if $T(u,n,c) = 0, \forall u$,
$$\frac{\partial L_C(n,c)}{\partial Y(u,n,c)} = \frac{1}{2\sqrt{\sum_{u} Y(u,n,c) + \epsilon}}, \qquad (3)$$
that is, if any of the classes RV, MYO, LV is not present, it becomes a punishing measure for emerging predictions of such classes.

Finally, a penalty is introduced for pixels predicted as Left Ventricle that are neighbors of pixels predicted as background or Right Ventricle, because the LV cavity is always surrounded by the myocardium. Therefore we penalize such outcomes by the term
$$L_V = \sum_{n}\sum_{u} Y(u,n,4) \sum_{v \in \mathcal{N}(u)} \big( Y(v,n,1) + Y(v,n,2) \big), \qquad (4)$$
where the first summation on $u$ is again over all the pixels of image $n$ in the mini-batch for the predictions of class '4' (LV), and the second summation on $v$ is over the immediate neighborhood $\mathcal{N}(u)$ of pixel $u$, for predictions of classes '1' and '2', namely the background and the Right Ventricle. In that way, on the one hand LV-BGR adjacency is disapproved of, and on the other hand LV-RV contiguity is also discouraged. The final loss function then takes the form shown in Equation (5),
$$L = L_D + \alpha L_C + \beta L_V. \qquad (5)$$
The terms $L_C$ and $L_V$ also incorporate the same class weights as described in Eq. (1). Therefore the total loss $L$ copes with class imbalance, such that classes with heavy presence in the training set, covering larger areas, do not bias the network towards false conclusions. In addition, the weights are fixed aiming to minimize the risk of erroneous classifications, taking into specific consideration the more ambiguous and in-homogeneous classes.
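One possible TensorFlow implementation of these loss terms is sketched below; the values of alpha, beta and epsilon, the normalization of the class weights, and the use of a 3 x 3 window (which also includes the central pixel) to approximate the immediate neighborhood in Eq. (4) are our own assumptions.

```python
import tensorflow as tf

CLASS_WEIGHTS = tf.constant([1.0, 2.0, 4.0, 1.0])   # BGR, RV, MYO, LV (reconstructed from the text)
EPS = 1e-6

def soft_dice_loss(t, y):
    """Weighted soft Dice loss, Eq. (1); t (ground truth) and y (prediction) have shape (batch, H, W, 4)."""
    inter = tf.reduce_sum(t * y, axis=[0, 1, 2])
    sums = tf.reduce_sum(t, axis=[0, 1, 2]) + tf.reduce_sum(y, axis=[0, 1, 2])
    dice = (2.0 * inter + EPS) / (sums + EPS)
    return 1.0 - tf.reduce_sum(CLASS_WEIGHTS * dice) / tf.reduce_sum(CLASS_WEIGHTS)

def absent_class_loss(t, y):
    """Eq. (2): penalize probability mass predicted for classes absent from an image."""
    present = tf.cast(tf.reduce_sum(t, axis=[1, 2]) > 0.0, y.dtype)     # (batch, 4)
    mass = tf.sqrt(tf.reduce_sum(y, axis=[1, 2]) + EPS)                 # (batch, 4)
    return tf.reduce_sum((1.0 - present) * CLASS_WEIGHTS * mass)

def lv_adjacency_loss(t, y):
    """Eq. (4): discourage LV-cavity predictions next to background or RV predictions."""
    lv = y[..., 3:4]                           # class 4: LV cavity
    bgr_rv = y[..., 0:1] + y[..., 1:2]         # classes 1 and 2: background + RV
    neigh = 9.0 * tf.nn.avg_pool2d(bgr_rv, ksize=3, strides=1, padding="SAME")  # 3x3 window sum
    return tf.reduce_sum(lv * neigh)

def total_loss(t, y, alpha=0.1, beta=1e-4):
    """Eq. (5): L = L_D + alpha * L_C + beta * L_V; alpha and beta are illustrative values."""
    return soft_dice_loss(t, y) + alpha * absent_class_loss(t, y) + beta * lv_adjacency_loss(t, y)
```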
E. Training

Summarizing the augmentation process: we start with the original ACDC training data set consisting of 1902 cardiac MR images, pre-process them as stated in II-A, and obtain another 2876 images augmented on the intensities as presented in II-B. We then randomly distribute the original pre-processed training images into 5 folds in a stratified manner, making sure each fold contains data from each one of the 5 groups (4 pathological plus 1 healthy subject groups, as stated in the ACDC challenge). For the training process, each fold that constituted the training set was enriched with the corresponding intensity-augmented images. The validation set remained as it was, so that images in this set were never introduced to the network during the training stage, and neither were intensity-augmented samples of them. Therefore, for each fold we have approximately a training set consisting of 3822 images and a validation set consisting of 381 images, from a pool of 4778 images in total. Small variations over the folds occur because the number of available slices may differ for each patient.

For optimization purposes we chose a Stochastic Gradient Descent algorithm with momentum. We started with a learning rate of 0.04 and found that decreasing it at each epoch by a factor of 0.8 leads to better results and faster convergence. In particular, 30 epochs were enough for convergence, where at each epoch twice the fold's training data, augmented, was fed to the network.

The mini-batch size was set to 32. Fewer images in the mini-batch did not favor training, and its performance was comparable to bigger-sized batches, which in addition required more GPU memory. We randomly built the mini-batch from the fold's training set, but made sure that each batch contained in any case three images which had at least one of the classes of interest (RV, MYO, LV) not present. In our experiments we noticed that the network performed poorly in such cases, especially if none of these classes was present in the image. Such images are significantly less frequent in the training set, and the network did not see enough of them to learn. With this modification we present the network with more such cases, resulting in improved performance. The loss and accuracy are reported in Table II, showing that the proposed augmentation scheme with the dilated CNN leads to consistent and stable performance throughout the folds. A sample of the learning curves is depicted in Fig. 6. All the folds show similar behavior during training.

TABLE II. Training and validation loss and accuracy values per fold.

Fig. 6. Training loss and accuracy for fold 2.

We have to emphasize that nearly 79 out of 80 images that the network trains on are augmented, since original and intensity-augmented images are subject to adapted random rotations and scaling in real time during the training phase. That means that the network rarely has a chance to train on original, non-augmented data.

Our network was able to train each fold in approximately 14 minutes on an NVidia Titan V GPU using Keras/TensorFlow. After training, inference for one image takes 1 ms. In Section IV we present scores for predictions on the validation sets for each fold before post-processing, which are already encouraging. Our post-processing method improves the segmentation results, as elaborated below.
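A short Keras sketch of this training configuration; only the initial learning rate of 0.04, the per-epoch decay factor of 0.8, the 30 epochs and the batch size of 32 come from the text, while the momentum value and the data generators are illustrative.

```python
from tensorflow.keras.optimizers import SGD
from tensorflow.keras.callbacks import LearningRateScheduler

model = build_dcnn()   # architecture sketch from Section II-C
model.compile(optimizer=SGD(learning_rate=0.04, momentum=0.9),   # momentum value assumed
              loss=total_loss,
              metrics=["accuracy"])

# Decrease the learning rate by a factor of 0.8 at every epoch, starting from 0.04.
schedule = LearningRateScheduler(lambda epoch: 0.04 * (0.8 ** epoch))

# train_gen / val_gen are hypothetical generators yielding (image, one-hot mask) batches of size 32.
model.fit(train_gen, validation_data=val_gen, epochs=30, callbacks=[schedule])
```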
III. PREDICTION POST-PROCESSING

The 5-fold cross-validated trained dilated CNN is used for computing voxel-wise probabilities for the four classes. As the training is done in 5 folds, the per-class probabilities are obtained, for every test image, as the mean values of the 5 probabilities computed from the training folds. The cardiac volume segmentation could be obtained by maximizing these probabilities voxel-wise, as they are computed on the 2-D slices. However, anatomically incoherent segmentation results might occur. Our objective is to obtain a more anatomically consistent segmentation by exploiting 3-D information and known constraints concerning region properties.

At a first stage the segmentation is posed as a probabilistic optimization problem using a 3-D discrete Markov random field (MRF), in order to obtain a regularized label field. In this manner, we aim at capturing the local interactions between voxels and the coherence of the 3-D segmentation map. The problem can be formulated as follows: we seek to assign a class label $l(v)$ to each node (voxel) $v \in V$ of a graph, so that the following cost is minimized:
$$\sum_{v \in V} c_{l(v)}(v) + \sum_{(u,v) \in E} \lambda_{uv}\, d\big(l(u), l(v)\big), \qquad (6)$$
where $E$ is the set of the graph edges. In this work, for a 3-D volume, a first-order model with 6 connections is employed. The graph is composed of the whole regular 3-D grid of voxels. The singleton potentials, or priors, are based on the voxel-wise computed probabilities; the dissimilarity of voxel $v$ to label $l$ is given by
$$c_l(v) = -\ln p_l(v). \qquad (7)$$
The pairwise potentials $d(l(u), l(v))$ are set according to a label-compatibility weight matrix. The regularization constant $\lambda_0$ is data-adapted, having as relevant statistic the average value of the minimum dissimilarities over the whole volume. For minimizing the MRF energy in Equation (6), we make use of the primal-dual method [12].

As in the ACDC data sets the identification of the most basal and apical slices is not provided, their detection is needed, because the CNN is not always able to predict them. If in the first slice (for the ED phase), or in one of the first slices (for the ES phase), possibly the most basal, no LV cavity is detected, or the mean probability of the area detected as LV cavity is less than 0.7, then this slice is considered beyond the volume of interest. Such a case is shown in Fig. 7: on the left the image of a slice is shown and on the right the computed probabilities are given; in this case the mean probability of the presumed LV cavity is less than 0.7. If in the last slice, possibly the most apical one, there is no myocardium predicted, then this slice is also considered beyond the volume of interest.

Fig. 7. Detection of a false LV cavity prediction.

Then, only for the diastolic phase, the largest 3-D 6-connected component is selected for the whole heart. In addition, the largest 3-D 6-connected component of the left ventricle, including the cavity and the myocardium, is selected. For both diastolic and systolic phases a supplementary test for the most basal slice is applied, based on the shape of the myocardium. A slice detected beyond the ventricular volume by this test is shown in Fig. 8; again, the image and the corresponding computed probabilities are illustrated.

Fig. 8. Detection of a false heart prediction.

The final segmentation map for the volume is obtained at the end of these post-processing steps.
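A small SciPy sketch of the connected-component part of this post-processing (it covers neither the MRF regularization nor the basal/apical slice tests; the 0-based label convention and function names are ours):

```python
import numpy as np
from scipy import ndimage

def largest_component(mask):
    """Return the largest 6-connected 3-D component of a binary mask."""
    structure = ndimage.generate_binary_structure(3, 1)    # 6-connectivity in 3-D
    labels, n = ndimage.label(mask, structure=structure)
    if n <= 1:
        return mask.astype(bool)
    sizes = ndimage.sum(mask, labels, index=np.arange(1, n + 1))
    return labels == (np.argmax(sizes) + 1)

def keep_largest_structures(seg):
    """seg: (slices, H, W) label volume with 0=BGR, 1=RV, 2=MYO, 3=LV (0-based labels assumed).
    Keeps the largest connected heart component and the largest LV (cavity + myocardium) component."""
    heart = largest_component(seg > 0)
    lv_and_wall = largest_component((seg == 2) | (seg == 3))
    out = np.where(heart, seg, 0)
    out = np.where(((out == 2) | (out == 3)) & ~lv_and_wall, 0, out)
    return out
```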
IV. RESULTS AND DISCUSSION

In this section we present and discuss results from our experiments. As stated previously, we used a 5-fold cross-validation scheme for our training. It allowed us to observe the network's consistency in performance. At first, we measured the Dice score directly on the predictions for the validation sets, using the trained weights of each of the corresponding folds. That way we could see how each component, added or altered, affected the network's training and whether it contributed to a better end result or not. The Dice scores were then measured for each patient for the LV, RV and MYO classes for each one of the cardiac phases, end diastole (ED) and end systole (ES). In Table III we present these metrics for the proposed CNN using only classical augmentation techniques (random rotations, random scaling in [0.7, 1.3] and minor elastic deformations) and the plain soft Dice loss as in Eq. (1). Then we show how much improvement our adapted augmentation together with our designed loss function given in Eq. (5) achieved (the Data-adapted Dilated CNN, DaDCNN). It is shown gradually, from rows 3 to 5, that the data-adapted augmentation, as well as each component of the specially designed loss function, improves performance. In DaDCNN-00 only the Dice loss function is used, in DaDCNN-01 the loss function is $L_D + \beta L_V$, in DaDCNN-10 it is $L_D + \alpha L_C$, and in DaDCNN-11 the whole loss function of Eq. (5) is employed.

TABLE III. Dice scores for LV, RV and MYO on the validation sets (mean Dice per class and phase for the plain DCNN, the DaDCNN variants, the U-net variants and DeepLabv3+).

Additionally, we report the same scores obtained from predictions produced by a U-net, trained without and with the proposed adapted augmentation and final loss function. Two versions of the U-net have been implemented and tested, one with twice as many parameters as the DCNN (Unet-2) and one with almost the same number of parameters (Unet-1). Again, improvement is achieved by utilizing the adapted augmentation and designed loss function. Our implementation of the U-net consists of three max-pooling stages, reducing the spatial dimensions from 120 x 120 to 15 x 15, and respective transposed convolution steps to return to the original dimensions. Each layer consists of two successive convolutions, batch normalization and ReLU activation, and skip connections are also applied to transfer context from the down-sampling path to the up-sampling path. To achieve a number of parameters comparable to the DCNN we used a second scheme and reduced the number of filters. The network was trained using an Adam optimizer with decay, for 30 epochs as well, batch size 32 and learning rate 0.003. Each fold trained in 15 min, respectively 7.5 min, while inference time for one image was 1 ms in both schemes.

Finally, the DeepLabv3+ model [5] was also considered, being the state of the art in image segmentation nowadays. We used the ResNet-v1-50 as a backbone and trained the model from scratch using our pre-processed ACDC data set for 300,000 iterations. No further modifications were made. The DeepLabv3+ managed to outperform by far all approaches with the standard soft Dice loss function and without the utilization of the adapted augmentation (rows marked with a star in Table III). Taking into account the data-adapted augmentation (rows starting with Da in Table III), we can see that the DCNN compares with, and even outperforms in some metrics, the U-net using our data augmentation method, with more or less the same number of parameters.

In Tables IV, V and VI the evaluation of the proposed method on the ACDC test data set is given, as submitted to the Post-2017-MICCAI-challenge testing phase.² The metrics are those used for evaluation in the ACDC challenge. There are four distance measures, two for each phase, and six clinical indices for the LV cavity, the RV cavity and the myocardium. In all the tables the bias, the standard deviation (std) and the mean absolute error concerning volumes are given in mL, the myocardium mass in g, and the Hausdorff distance in mm.

A comparison is given with the best algorithms submitted to the challenge as they appear in the leaderboard³ [10], [11], [13], [21]. We present briefly the main architectural feature of these four algorithms. Isensee et al. [10] used a network architecture inspired by the 3-D U-net. Khened et al. [11] used a densely connected fully convolutional network instead of the skip connections used in the U-net architecture. Zotti et al. [21] used a multi-resolution Gridnet architecture, which could be considered an extension of the U-net architecture. Painchaud et al. [13] proposed an adversarial variational autoencoder for guaranteeing anatomically plausible segmentations.

In the case of the LV cavity we have obtained the best score on the correlation coefficient and the standard deviation for the ejection fraction and the volume in the ED phase. In addition, we have obtained the best score on the Dice coefficient for the LV cavity in the ED phase. We have also obtained the best result on the correlation coefficient and the standard deviation for the myocardium mass. For the RV cavity and the myocardium in the ES phase we have obtained less good results, but comparable to those of other well performing deep learning methods.

We also compare the proposed method with our implementation of a U-net and with the methods of [10], [11] on the reliable clinical metric of the mean absolute error for the LV/RV ED volume, the LV/RV Ejection Fraction, and the myocardium mass. The comparison is given in Table VII, where the metrics for [10], [11] are as published in [1].

²[Online]. Available: https://acdc.creatis.insa-lyon.fr/#submission/…
³[Online]. Available: https://acdc.creatis.insa-lyon.fr/description/results.html
TABLE IV. Metrics on the test data set for the Left Ventricle.

TABLE V. Metrics on the test data set for the Right Ventricle.

TABLE VI. Metrics on the test data set for the Myocardium.

TABLE VII. Mean absolute error on the test data set for the LV/RV ED volume, the LV/RV ejection fraction and the myocardium mass.

For our algorithm we have measured the mean absolute error for the prediction resulting from the proposed neural network without post-processing, the result with only the 3-D MRF model, with only the other post-processing modules, and the whole algorithm described in Section III. We can see that the post-processing improves the mean absolute error for the five criteria. Concerning the two terms of the post-processing, we found that the more costly 3-D MRF has limited impact, only on the RV volume in the ED phase. The specific post-processing modules addressing anatomical inconsistencies seem to have a clearly measurable impact. We have also measured the performance of our implementation of the U-net, with and without the MRF and post-processing, and found that some improvement has also been obtained. Finally, our method gives the best results among the results known up to now on this data set.
V. CONCLUSION

In this work our main objective was to prove that a simple CNN can achieve top performance if its design and training are domain-driven. For this purpose, a cascaded dilated convolutional neural network is designed for semantic segmentation of cardiac MR images. The first layer starts with larger convolutional kernels and no dilation, but the kernel sizes become smaller, in opposition to the dilation rate, which increases as the architecture deepens. The network has fewer parameters compared to other deep learning architectures designed for this task, and needs a small number of epochs to train. Even though the proposed network has a small number of features and parameters, we think that it would be possible to further simplify the network, probably by accordingly designed training. This task could be facilitated by the ascertainment that the first-layer filters of the proposed network have an interesting interpretation.

The training process is assisted by carefully adapted data augmentation that relies on simple image transformation techniques such as rotation and scaling. Prior to training, a specially designed intensity augmentation takes place to enrich the training data set. Based on the probability density functions of the Left Ventricle and the myocardium, the contrast reduction is adjusted to produce more samples. Our results show that this is the decisive key to the top performance obtained.

Finally, since the network does not take into account volume information, but produces predictions for each image independently, a post-processing step incorporates anatomical constraints to produce a more consistent segmentation map for the volume.

Even though we have achieved top performance in many evaluation criteria, there is still room for improvement, especially with regard to the accuracy of the RV localization and the myocardium in the ES phase. We plan to work on adapted data augmentation for the RV as well, as up until now only the LV was specifically targeted. We would like to pursue extended testing of our adapted augmentation with other network architectures, as we did with a U-net implementation of our own, to establish its contribution in improving performance. Furthermore, since classic augmentation techniques did not improve the ES results, we have begun to investigate the use of Generative Adversarial Networks (GANs), introduced by [7], to produce augmented samples which are not manually designed but learned by a neural network. A task-related Variational Autoencoder making use of adversarials is also implemented in [13]. At last, we plan to test our approach on other data sets to determine whether the adaptation could be applied to a wide range of similar data from different clinical centers and different scanners. We also intend to develop deep learning techniques to substitute the pre- and post-processing. An NN assembly, or a single architecture trained end-to-end without any human intervention, could undertake the problem of automated, fast and accurate semantic segmentation of cardiac MR volumes.

ACKNOWLEDGMENT

The authors would like to thank the NVidia corporation for providing the TITAN V GPU used for this research as part of the NVidia GPU Grant Program.

REFERENCES

[1] O. Bernard et al., "Deep learning techniques for automatic MRI cardiac multi-structures segmentation and diagnosis: Is the problem solved?," IEEE Trans. Med. Imag., vol. 37, no. 11, pp. 2514-2525, Nov. 2018.
[2] A. Bhattacharyya, "On a measure of divergence between two statistical populations defined by their probability distributions," Bull. Calcutta Math. Soc., vol. 35, pp. 99-109, 1943.
[3] T. F. Chan and L. A. Vese, "Active contours without edges," IEEE Trans. Image Process., vol. 10, no. 2, pp. 266-277, 2001.
[4] L.-C. Chen et al., "DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs," IEEE Trans. Pattern Anal. Mach. Intell., vol. 40, no. 4, pp. 834-848, Apr. 2018.
[5] L.-C. Chen, Y. Zhu, G. Papandreou, F. Schroff, and H. Adam, "Encoder-decoder with atrous separable convolution for semantic image segmentation," in Proc. Eur. Conf. Comput. Vis., 2018, pp. 833-851.
[6] X. Glorot and Y. Bengio, "Understanding the difficulty of training deep feedforward neural networks," in Proc. 13th Int. Conf. Artif. Intell. Statist., 2010, pp. 249-256.
[7] I. Goodfellow et al., "Generative adversarial nets," in Proc. Adv. Neural Inf. Process. Syst., 2014, pp. 2672-2680.
[8] E. Grinias and G. Tziritas, "Fast fully-automatic cardiac segmentation in MRI using MRF model optimization, substructures tracking and B-spline smoothing," Lecture Notes in Computer Science, vol. 10663. Berlin, Germany: Springer, 2018.
[9] Z. Gu et al., "CE-Net: Context encoder network for 2D medical image segmentation," IEEE Trans. Med. Imag., vol. 38, no. 10, pp. 2281-2292, Oct. 2019.
[10] F. Isensee et al., "Automatic cardiac disease assessment on cine-MRI via time-series segmentation and domain specific features," Lecture Notes in Computer Science, vol. 10663. Berlin, Germany: Springer, 2018.
[11] M. Khened, V. Alex, and G. Krishnamurthi, "Densely connected fully convolutional network for short-axis cardiac cine MR image segmentation and heart diagnosis using random forest," Lecture Notes in Computer Science, vol. 10663. Berlin, Germany: Springer, 2018.
[12] N. Komodakis and G. Tziritas, "Approximate labeling via graph cuts based on linear programming," IEEE Trans. Pattern Anal. Mach. Intell., vol. 29, no. 8, pp. 1436-1453, Aug. 2007.
[13] N. Painchaud et al., "Cardiac MRI segmentation with strong anatomical guarantees," in MICCAI 2019, Lecture Notes in Computer Science, vol. 11765. Berlin, Germany: Springer, 2019.
[14] C. Peng, X. Zhang, G. Yu, G. Luo, and J. Sun, "Large kernel matters - improve semantic segmentation by global convolutional network," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2017, pp. 4353-4361.
[15] O. Ronneberger, P. Fischer, and T. Brox, "U-Net: Convolutional networks for biomedical image segmentation," in N. Navab et al. (Eds.), Medical Image Computing and Computer-Assisted Intervention (MICCAI 2015), LNCS, vol. 9351. Berlin, Germany: Springer, 2015.
[16] C. Shorten and T. M. Khoshgoftaar, "A survey on image data augmentation for deep learning," J. Big Data, vol. 6, no. 1, 2019.
[17] Z. Wang and S. Ji, "Smoothed dilated convolutions for improved dense prediction," in Proc. ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, 2018, pp. 2486-2495.
[18] J. M. Wolterink et al., "Automatic segmentation and disease classification using cardiac cine MR images," Lecture Notes in Computer Science, vol. 10663. Berlin, Germany: Springer, 2018.
[19] F. Yu and V. Koltun, "Multi-scale context aggregation by dilated convolutions," in Proc. Int. Conf. Learn. Representations, 2016.
[20] F. Yu, V. Koltun, and T. Funkhouser, "Dilated residual networks," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2017, pp. 636-644.
[21] C. Zotti, Z. Luo, A. Lalande, and P.-M. Jodoin, "Convolutional neural network with shape prior applied to cardiac MRI segmentation," IEEE J. Biomed. Health Informat., vol. 23, no. 3, pp. 1119-1128, May 2019.

Georgios Simantiris received the B.Sc. and M.Sc. degrees from the Computer Science Department, University of Crete, in 2000 and 2003, respectively. Currently he is pursuing the Ph.D. degree in computer vision. He works as an educator for Information Technology in secondary education. His current research includes deep learning in computer vision, especially for semantic segmentation in medical images.

Georgios Tziritas (Senior Member, IEEE) received the electrical engineering diploma from the National Technical University of Athens, Athens, Greece, in 1977, and the docteur es sciences degree from the Institut National Polytechnique de Grenoble, Grenoble, France, in 1985. He is currently a Professor with the Department of Computer Science, University of Crete. He has published more than 120 papers in academic journals and conferences. His current research interests include image analysis, computational vision and pattern recognition.
