Department of Electrical and Computer Engineering

North South University

Junior Design Project

Malaria Cell Segmentation Using Machine Learning & Watershed Algorithm

Mahdi Mohammed Shibli, ID # 1712784042

Md. Navid Bin Islam, ID # 1712404642

Section: 9, Group: 6

Faculty Advisor:
Dr. Tanzilur Rahman
Assistant Professor
Department of ECE
Spring 2020
ABSTRACT

Malaria is a life-threatening, mosquito-borne blood disease caused by parasites of the genus Plasmodium. The parasite enters the blood through the saliva of the mosquito during a bite. It directly infects the red blood cells (RBCs) and causes roughly 400,000 deaths per year. The conventional way to detect malaria is to examine a patient's stained blood sample under a microscope: the blood smear is placed on a slide and the infected RBCs are counted manually. The examination demands intense visual and mental concentration from an expert operator. It is a tiresome and time-consuming process whose precision depends on the skill of the operator. Machine-based blood smear analysis and detection of malaria-infected cells have opened a new area for early malaria detection and have shown the potential to overcome the drawbacks of manual strategies. Because the traditional diagnostic process is problematic and error-prone, in this report we develop a machine learning-based malaria cell segmentation system using the watershed algorithm. On a manually counted sample of 50 images, the proposed method detected 75% of the cells, and it segments a blood smear in less than a second.

Table of Contents

ABSTRACT

CHAPTER 1: INTRODUCTION
1.1 Malaria
1.2 Malaria Detection Process
1.3 Cell Segmentation
1.4 Machine Learning
1.5 Watershed Algorithm
1.6 Project Aim and Objective
1.7 Motivation

CHAPTER 2: LITERATURE REVIEW
2.1 Existing Literature Explanation

CHAPTER 3: METHODOLOGY
3.1 Workflow
3.2 Data Collection
3.3 Preprocessing
3.4 Watershed Threshold

CHAPTER 4: EXPERIMENT RESULT
4.1 Sample Test Data
4.2 Result Analysis

CHAPTER 5: CONCLUSION
5.1 Discussion
5.2 Summary
5.3 Future Work

REFERENCES

CHAPTER 1: INTRODUCTION

In this chapter, we discuss malaria, why the automated segmentation of malaria-infected blood cells is needed, and our project's aim, objectives, and motivation.

1.1 Malaria

Malaria is one of the most threatening global health problems, causing suffering and death worldwide, particularly in underdeveloped countries. It is caused by Plasmodium parasites and is transmitted by the bite of an Anopheles mosquito that is already infected with them. When the mosquito bites, the parasites are released into the human bloodstream. From there they travel to the liver of the infected person and mature.

In 2018, there were an estimated 228 million malaria cases, and the number of deaths stood at 405,000. The most vulnerable group is children under 5; in that year, they accounted for 67% of the total death toll. According to the WHO, the African Region carries a disproportionately high share of the global malaria burden: in 2018, 93% of cases and 94% of deaths occurred in this region [1].

1.2 Malaria Detection Process

Cells infected by Plasmodium parasites can be detected visually by staining the blood sample with chemical agents [2]. The staining process colors the RBCs and highlights the Plasmodium, so the parasite can be identified by looking for the highlights. Traditionally, malaria detection is done by hand, which is a very error-prone way to detect a disease as dangerous as malaria, because a small error in the detection process can cost someone's life.

Figure 1: Conventional process of malaria diagnosis (a. A patient goes for a test, b. A blood sample is taken, c. The sample is placed on a slide, d. The sample is stained with contrasting agents, e. Malaria parasites get highlighted, f. A clinician examines the slides manually) [3]

1.3 Cell Segmentation

The process of identifying individual cells in a blood smear image is called cell segmentation: the image is partitioned into regions so that each region corresponds to one cell. Cell segmentation is the most basic and essential step in the analysis of microscopic cell images. It can be done with many approaches, such as traditional machine learning, deep learning, and other methods such as morphological operators. Automated cell segmentation has its challenges, one of which is identifying blood cells separately when they overlap. Even so, with modern technology, automation is the most practical way of doing cell segmentation.

1.4 Machine Learning

Humans learn from past experience, but computers traditionally do not. Computers are strict logic machines with no common sense, so they need detailed, step-by-step instructions on exactly what has to be done to accomplish a task. Scripts are therefore written, and computers are programmed to follow the given instructions. Machine learning changes this: the concept consists of training computers based on experience drawn from past data. It is a probabilistic approach to solving real-life problems from previous records, with far less human assistance than before. Many people use the terms artificial intelligence (AI) and machine learning interchangeably, but machine learning is a subset, or an application, of AI. It gives a system the ability to learn and improve from experience automatically without being explicitly programmed.

1.5 Watershed Algorithm

The watershed algorithm is a classical algorithm for separating different objects in an image, a task known as segmentation. Simply put, the watershed is a transformation defined on grayscale images. It is typically used to segment an image when two regions of interest are close to one another or when the objects in the image touch each other. In blood smears, most of the blood cells are not adequately scattered, which makes it challenging to segment all the blood cells in a particular smear. The watershed is used for this overlapping issue because it uses the connectivity of the pixels in the given image.

Figure 2: Watershed segmentation of overlapping objects [4].

The distance transform calculates, for each non-zero (foreground) pixel, the distance to the nearest zero (background) pixel. The strategy works very well on binary images of rounded objects: once the distance map is inverted, the darkest parts of the image are the centers of the objects.
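
As a rough illustration of the distance transform described above, the sketch below computes it by brute force on a tiny hand-made binary mask of two square blobs joined by a one-pixel bridge. The mask, the function name distance_transform, and the brute-force search are illustrative assumptions, not the implementation used in this project; for real images a library routine would normally be used.

```python
import math


def distance_transform(mask):
    """Euclidean distance from each foreground pixel (1) to the nearest
    background pixel (0). Brute force, so only suitable for tiny images."""
    height, width = len(mask), len(mask[0])
    background = [(r, c) for r in range(height) for c in range(width)
                  if mask[r][c] == 0]
    dist = [[0.0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            if mask[r][c]:
                dist[r][c] = min(math.hypot(r - br, c - bc)
                                 for br, bc in background)
    return dist


# Hypothetical example: two 3x3 "cells" touching through a thin bridge.
mask = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
]

dist = distance_transform(mask)
peak = max(max(row) for row in dist)
centers = [(r, c) for r, row in enumerate(dist)
           for c, value in enumerate(row) if value == peak]
print(centers)     # [(2, 2), (2, 6)]
print(dist[2][4])  # 1.0 (the bridge pixel)
```

The two blob centers receive the largest distance (2.0), while the bridge pixel only reaches 1.0. This is why the peaks of the distance map can serve as seeds for separating touching cells.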

1.6 Project Aim and Objective

Almost everything is becoming automated, yet malaria diagnosis is still performed by traditional procedures. The traditional diagnosis process is error-prone and time-consuming, and its accuracy depends on the operator's concentration level and mental state. Malaria can seriously harm the patient and can even cause death, but with faster treatment the complications can be minimized.

The points below are our project's aims and objectives:

❖ To find an optimized algorithm that automatically segments malaria blood cells for further detection.

❖ To increase the accuracy of the segmentation of overlapping cells.

❖ To make a model that is easy to deploy in the real world for the benefit of the general public.

❖ To help medical practitioners segment the cells efficiently without spending a lot of time.

1.7 Motivation

Malaria is prevalent throughout the world, particularly in tropical regions. The motivation for this project comes from the nature and fatality of the disease. When an infected mosquito bites a person, the parasites it carries enter the blood and slowly start destroying the red blood cells. Typically, the first symptoms of malaria resemble those of the flu or a similar illness, and the affected person starts feeling unwell within a few days or weeks after the bite. However, the parasites can also live in the host for over a year without causing any problem, so a delay in proper treatment can lead to complications and eventually death. Hence, rapid and reliable malaria tests can save many patients from dying.

Computer vision techniques for malaria diagnosis represent a new area for early malaria detection. According to the WHO malaria parasite counting protocol, a clinician may have to count up to 5,000 cells manually [5]. This error-prone and time-consuming visual inspection process should therefore be replaced by an automated system.

CHAPTER 2: LITERATURE REVIEW

While working on this project, we studied different research papers to learn more about the approaches we could take to solve this problem. We selected a few of them because of their similarities with our project and tried to take ideas from their work.

2.1 Existing Literature Explanation

Kunwar et al. [6] built an image processing system for the detection and quantification of Plasmodium parasites in blood smears, and then developed machine learning algorithms to learn, detect, and determine the types of infected cells. Image acquisition is the first step; malaria-infected images that are less noisy and free of artifacts were used. The blood smears are segmented by identifying common properties. Because the pixels within a region share similar intensity, a natural way to segment such regions is thresholding, the separation of light and dark areas. Thresholding creates a binary image by setting all pixels below some threshold to zero and all pixels above it to one. Pixels labeled one denote an object, and pixels labeled zero denote background. After thresholding, the image is enhanced for further processing. Erosion and dilation, the fundamental steps of morphological processing, help in detecting the objects. Two kinds of segmentation are used: watershed segmentation, described as a relatively new approach, and color-based segmentation. They tested 40 images. Although there are some errors, if at least one parasite is found in a blood smear, the patient is declared malaria-infected.
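
The thresholding rule described above can be sketched in a few lines. The pixel values, the threshold of 128, and the function name threshold are hypothetical and chosen only for illustration; they are not taken from [6].

```python
def threshold(gray, t):
    """Binarize a grayscale image: pixels above t become 1 (object),
    all other pixels become 0 (background)."""
    return [[1 if pixel > t else 0 for pixel in row] for row in gray]


# Hypothetical 3x3 grayscale patch with intensities in the range 0-255.
gray = [
    [12, 15, 200],
    [10, 180, 220],
    [9, 11, 14],
]

binary = threshold(gray, 128)
for row in binary:
    print(row)
# [0, 0, 1]
# [0, 1, 1]
# [0, 0, 0]
```

If the objects of interest are darker than the background, the comparison would simply be inverted.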

In [7], Weikang Wang, Yi-Jiun Chen, and others used a different strategy that combines CNNs and the watershed algorithm. First, a CNN is trained to learn the Euclidean distance transform (EDT) of the binary masks corresponding to the input images (the deep distance estimator). A second CNN, a Faster R-CNN (region-based CNN), is then trained to detect individual cells in the EDT image (the deep cell detector). Finally, the watershed algorithm is applied to the outputs of the previous two steps to produce the final segmentation. The combined method and several pixel-wise classification methods achieved similar pixel-wise accuracy, but the combined approach achieved higher cell count accuracy. Pixel-wise classification had the drawback of failing to separate connected cells, including cells connected by blurry boundaries. In addition, the deep distance estimator and the deep cell detector are easy to train and converge quickly.

In [8], Yousef Al-Kofahi, Mirabela Rusu, and others designed a single-channel cell segmentation algorithm. The work uses a cytoplasm marker, which shows hypo-intense nuclear regions and hyper-intense cellular regions. In the first step, a deep learning model is trained on the images of the dataset; it is trained on image patches of 160x160 pixels to predict three different labels. The second step is deep learning inference, where the unseen image is divided into 176x176 patches. This produces a probability map of nuclei, cytoplasm, and background for each patch, and the patches are then stitched together to give the prediction for the full image. In the third step, a multi-level Laplacian of Gaussian (LoG) blob detector is applied, which enhances the blob-like nuclei regions at multiple scales. Automated multi-level Otsu thresholding is then used to extract the binary nuclear mask. For robustness, the segmented nuclei are used as seeds: the background labels and segmented nuclei identified earlier are fed into a seeded watershed segmentation.

CHAPTER 3: METHODOLOGY

This chapter gives an overview of the different parts of the work in the order they were carried out. It mainly discusses the theories, techniques, and step-by-step workflow of the work.

3.1 Workflow

A complete workflow diagram of the proposed method is shown in the figure below.

Figure 3: Overall workflow of the proposed method.

3.2 Data Collection

The images [9] are in .png or .jpg format. There are three sets of images consisting of 1364 images (~80,000 cells), each prepared by a different researcher: from Brazil (Stefanie Lopes), from Southeast Asia (Benoit Malleret), and a time course (Gabriel Rangel). The blood smears were stained with Giemsa reagent.

The data consists of two classes of uninfected cells (RBCs and leukocytes) and four classes of infected cells (gametocytes, rings, trophozoites, and schizonts). The data is heavily imbalanced: uninfected RBCs, as opposed to uninfected leukocytes and infected cells, make up over 95% of all cells.

A class label and a set of bounding box coordinates are given for every cell. For all data sets, infected cells were given a class label by Stefanie Lopes, malaria researcher at the Dr. Heitor Vieira Dourado Tropical Medicine Foundation hospital, indicating the stage of development or marking the cell as difficult.

3.3 Preprocessing

Preprocessing is applied to the dataset before any algorithm is run, so that the relevant features of the images stand out. In the first step, we convert the single-channel image into a three-channel RGB image, which helps in the next preprocessing steps.

Figure 5: Splitting the input image and merging (a. R-channel, b. G-channel, c. B-channel, d. 3-channel image)

Then we convert the 3-channel image into a grayscale image, which is used for thresholding. For thresholding we use Otsu's binarization. We then filter the resulting binary image with a dilation followed by an erosion (a morphological closing), using a 2x2 kernel.
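
The two preprocessing operations named above, Otsu's binarization and a dilation followed by an erosion with a 2x2 kernel, can be sketched in plain Python on a toy image. This is a minimal sketch under stated assumptions: the 6x6 image, the function names, the choice of treating bright pixels as foreground, and the anchoring of the 2x2 window are all illustrative, and the project itself presumably relied on an image processing library instead.

```python
def otsu_threshold(pixels):
    """Return the threshold t (0-255) that maximizes the between-class
    variance, with class 0 = pixels <= t and class 1 = pixels > t."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * n for i, n in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels in class 0 so far
    sum0 = 0  # intensity sum of class 0 so far
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (mean0 - mean1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def dilate(mask, k=2):
    """A pixel becomes 1 if any pixel in the k x k window ending at it is 1."""
    h, w = len(mask), len(mask[0])
    return [[int(any(mask[rr][cc]
                     for rr in range(max(r - k + 1, 0), r + 1)
                     for cc in range(max(c - k + 1, 0), c + 1)))
             for c in range(w)] for r in range(h)]


def erode(mask, k=2):
    """A pixel stays 1 only if every pixel in the k x k window starting at it is 1."""
    h, w = len(mask), len(mask[0])
    return [[int(all(mask[rr][cc]
                     for rr in range(r, min(r + k, h))
                     for cc in range(c, min(c + k, w))))
             for c in range(w)] for r in range(h)]


# Hypothetical grayscale image: a bright 3x3 blob with a one-pixel dark hole.
gray = [
    [20, 20, 20, 20, 20, 20],
    [20, 200, 200, 200, 20, 20],
    [20, 200, 30, 200, 20, 20],
    [20, 200, 200, 200, 20, 20],
    [20, 20, 20, 20, 20, 20],
    [20, 20, 20, 20, 20, 20],
]

t = otsu_threshold([p for row in gray for p in row])
binary = [[1 if p > t else 0 for p in row] for row in gray]
closed = erode(dilate(binary))  # dilation followed by erosion

for row in closed:
    print(row)
```

On this toy input the closing fills the one-pixel hole inside the blob while leaving its outline unchanged, which is the kind of small-gap filtering the 2x2 kernel is meant to provide.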

3.4 Watershed Threshold

In blood smears, the RBCs are very close to each other and sometimes even overlap. This leads to miscounting of the RBCs and hence to a health hazard. The watershed transformation is used in our work because it uses the connectivity of the pixels in the given image.

Figure 6: Watershed Transform (a. Greyscale, b. Threshold, c. Filtered, d. Sure Background, e. Distance Transformation, f. Sure Foreground, g. Unknown Regions, h. Markers, i. Result of Watershed Transformation)

To apply the watershed transform, we first find the sure background and the sure foreground of the image produced by the preprocessing steps. The Euclidean distance transform gives, for each foreground pixel, the distance to the nearest background pixel, which allows us to find the sure foreground of the image. The pixels that belong to neither the sure background nor the sure foreground form the unknown regions, which help us place the markers. After getting the unknown regions, we apply the watershed algorithm.
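
To make the marker-based flooding idea more concrete, here is a minimal sketch of a watershed by priority flooding on a toy mask of two touching cells. It is not the implementation used in this project: the mask, the hand-placed seed positions, the label convention (1 = sure background, 2 and 3 = cells, 0 = unknown), the 4-connectivity, and the choice of the negative distance map as the flooding surface are all assumptions made for illustration.

```python
import heapq
import math


def negative_distance(mask):
    """Flooding surface: minus the Euclidean distance to the nearest
    background pixel, so cell centers become the deepest basins."""
    zeros = [(r, c) for r, row in enumerate(mask)
             for c, v in enumerate(row) if v == 0]
    return [[-min(math.hypot(r - zr, c - zc) for zr, zc in zeros) if v else 0.0
             for c, v in enumerate(row)] for r, row in enumerate(mask)]


def watershed(elevation, markers):
    """Grow every labeled marker into its unlabeled (0) neighbors,
    always expanding from the lowest pixel reached so far."""
    h, w = len(elevation), len(elevation[0])
    labels = [row[:] for row in markers]
    heap, order = [], 0  # 'order' breaks ties between equal elevations
    for r in range(h):
        for c in range(w):
            if labels[r][c] > 0:
                heapq.heappush(heap, (elevation[r][c], order, r, c))
                order += 1
    while heap:
        _, _, r, c = heapq.heappop(heap)
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < h and 0 <= nc < w and labels[nr][nc] == 0:
                labels[nr][nc] = labels[r][c]
                heapq.heappush(heap, (elevation[nr][nc], order, nr, nc))
                order += 1
    return labels


# Hypothetical binary mask: two 3x3 cells joined by a one-pixel bridge.
mask = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
]

# Markers: 1 = sure background, 0 = unknown, 2 and 3 = hand-placed cell seeds.
markers = [[1 if v == 0 else 0 for v in row] for row in mask]
markers[2][2] = 2
markers[2][6] = 3

labels = watershed(negative_distance(mask), markers)
cell_count = len({v for row in labels for v in row if v > 1})
for row in labels:
    print(row)
print("cells found:", cell_count)
```

Although the two blobs form a single connected region in the mask, the flooding assigns them two different labels, so they are counted as two cells. In the workflow described above, the seeds would come from the sure foreground rather than being placed by hand.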

CHAPTER 4: EXPERIMENT RESULT

This chapter gives an idea of the results of our experiment. It also discusses and analyzes

different results.

4.1 Sample Test Data

After training the model, we tested it on sample test data and checked the results. We split the images into train and test sets with a ratio of 3:1. In the following figure, the segmented cells in the output images are outlined in red. The cells of the input blood smear are successfully segmented. These sample images were taken from the test images.
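
For readers who want to reproduce a 3:1 split, one possible way is sketched below. The file names, the seed, and the use of a random shuffle are assumptions; the report does not state how the images were selected.

```python
import random


def split_train_test(items, train_parts=3, test_parts=1, seed=0):
    """Shuffle a copy of the items and cut it at the train:test ratio."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * train_parts // (train_parts + test_parts)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical file names standing in for the 800 images used in this work.
image_names = [f"image_{i:04d}.png" for i in range(800)]

train, test = split_train_test(image_names)
print(len(train), len(test))  # 600 200
```

Fixing the seed keeps the split reproducible between runs.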

Figure 7: Some of the test inputs and outputs

4.2 Result Analysis

According to the results of our experiment, our technique segments the blood cells with good accuracy in some cases. In other cases, it fails to segment the blood cells with satisfactory accuracy.

Figure 8: Some of the output images (a. Densely overlapping cells with light boundaries, b. Sparsely overlapping cells, c. Sparsely overlapping cells with dark boundaries).

Figure 8.a shows that our technique does not segment cells properly where the blood smear has overlapping cells with light boundaries. Here the segmentation method identified 43 out of 69 cells (Figure 9a). In Figure 8.b, the cells are spread out and do not overlap as much as in Figure 8.a. In this case, the segmentation method detected 73 out of 76 cells (Figure 9b), which is about 96%. Lastly, in Figure 8.c, the cells are densely overlapping and have dark boundaries. Here 48 out of 48 cells were detected (Figure 9c).
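
The counts reported above translate into detection rates as follows. The short script only redoes the arithmetic from the figures in the text; the scenario labels are shorthand introduced here.

```python
# (detected cells, total cells) for the three scenarios of Figure 8.
scenarios = {
    "8.a": (43, 69),
    "8.b": (73, 76),
    "8.c": (48, 48),
}

rates = {name: round(100 * detected / total, 1)
         for name, (detected, total) in scenarios.items()}

for name, rate in rates.items():
    print(f"Figure {name}: {rate}% of cells detected")
# Figure 8.a: 62.3% of cells detected
# Figure 8.b: 96.1% of cells detected
# Figure 8.c: 100.0% of cells detected
```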

Figure 9: Comparison of three scenarios.

The bar chart in Figure 9 compares three scenarios in the data and shows how our method performed in each of them. We can observe that the number of identified cells depends on the contrast of the image.

Figure 10: Comparison of the accuracy of the three scenarios.

The bar chart in Figure 10 shows that our method segmented overlapping cells with an impressive rate of accuracy when a moderate amount of contrast was present. However, the accuracy goes down when the contrast in the smear is low.

The dataset we used came from ex vivo samples from Plasmodium vivax-infected patients in Brazil. Seven labels are used to cover all possible cell types: RBC, leukocyte, gametocyte, ring, trophozoite, schizont, and difficult. RBCs and leukocytes are uninfected cell types generally found in the blood. Some cells are marked as difficult when they do not clearly belong to any one of the classes, and cells marked as difficult were ignored in training. The data is also naturally imbalanced among the object classes.

Hence, we also measured the accuracy of our model manually on 50 images. We counted the total number of cells in those images and the number of cells recognized by the model. In this case, we achieved an accuracy of 75%, which is a satisfactory result, so we conclude that most of the cells present in the blood smears were detected successfully. The next chapter compares our results with those of other models.

CHAPTER 5: CONCLUSION

In this chapter, we discuss and compare our work with other notable works in this field, the challenges we faced while working on the project, and how we could improve our work in the future.

5.1 Discussion

A few notable works related to ours were discussed earlier. The authors of [10] used the watershed threshold as we did and achieved an accuracy of 97.7%. Their result is excellent, although their dataset was small and different from ours, containing only 250 RBC images. Because they applied the watershed algorithm to a different dataset, we cannot compare our model with theirs directly.

In [11], the authors used the same dataset [9] as we did, but their model was different. First, using traditional machine learning segmentation, their model attained an accuracy of 50% in segmenting the cells of the images. Then, two-stage classification was done using Faster R-CNN, attaining accuracies of 59% and then 98%, respectively, disregarding background, RBCs, and difficult cells. Thus they achieved a significant improvement over both the one-stage classification method and traditional cell segmentation [11].

In our work, we used 800 images taken from the Kaggle dataset [9], where separate sets for train and test are provided; therefore, the dataset is not biased. We used 600 images from the train set and 200 images from the test set and achieved an accuracy of 75%. Because we used a minimal number of features, our method requires neither high computation power nor much time.

5.2 Summary

Malaria continues to claim hundreds of thousands of lives every year. Finding an optimized algorithm for segmenting the blood cells in blood smear images might help reduce the deaths of patients suffering from this disease. Our proposed method is more straightforward and faster than the conventional manual detection process. With an accuracy of 75%, and because it takes less than a second to segment the cells in a blood smear, our model might be deployed in the real world to detect blood cells. Increasing the dataset size may make the model more credible, though our dataset is already larger than those of some other works in this field. Only minimal machine learning features are used in our model, which keeps the cell segmentation process simple and computation-friendly.

5.3 Future Work

We want to work on different datasets to test our algorithm to make it more credible. Though the

achieved accuracy is satisfactory, we would like to compare our machine learning model’s

performance with a deep learning model using the Convolutional Neural Network (CNN) on the

same dataset. Besides, we would like to work on pixel rendering for increasing the accuracy of

our model.

REFERENCES

[1] "Fact sheet about Malaria", Who.int, 2020. [Online]. Available: https://www.who.int/news-room/fact-sheets/detail/malaria. [Accessed: 11- Mar- 2020].

[2] P. Bloland, Drug resistance in malaria. Geneva: World Health Organization, 2001.

[3] "Medical Image Analyses for Malaria Detection", Medium, 2020. [Online]. Available: https://towardsdatascience.com/medical-image-analyses-for-malaria-detection-fc26dc39793b. [Accessed: 11- Mar- 2020].

[4] "Watershed segmentation — skimage v0.18.dev0 docs", Scikit-image.org, 2020. [Online]. Available: https://scikit-image.org/docs/dev/auto_examples/segmentation/plot_watershed.html. [Accessed: 16- May- 2020].

[5] "Strategy, speed and collaboration are essential to eliminate malaria", Who.int, 2020. [Online]. Available: https://www.who.int/westernpacific/news/feature-stories/detail/strategy-speed-and-collaboration-are-essential-to-eliminate-malaria. [Accessed: 19- May- 2020].

[6] S. Kunwar, M. Shrestha and R. Shikhrakar, "Malaria Detection Using Image Processing and Machine Learning", 2018. [Accessed: 11- Mar- 2020].

[7] W. Wang et al., "Learn to segment single cells with deep distance estimator and deep cell detector", Computers in Biology and Medicine, vol. 108, pp. 133-141, 2019. Available: 10.1016/j.compbiomed.2019.04.006.

[8] Y. Al-Kofahi, A. Zaltsman, R. Graves, W. Marshall and M. Rusu, "A deep learning-based algorithm for 2-D cell segmentation in microscopy images", BMC Bioinformatics, vol. 19, no. 1, 2018. Available: 10.1186/s12859-018-2375-z.

[9] "Malaria Bounding Boxes", Kaggle.com, 2020. [Online]. Available: https://www.kaggle.com/kmader/malaria-bounding-boxes?fbclid=IwAR0NUiGrFAiPeqNngxueQW4YO5mLOn0cLHi4M7USG0RxqnN1-Sg372IRUk4. [Accessed: 17- May- 2020].

[10] K. Charpe, V. Bairagi, S. Desarda and S. Barshikar, "A Novel Method for Automatic Detection of Malaria Parasite Stage in Microscopic Blood Image", International Journal of Computer Applications, vol. 128, no. 17, pp. 32-37, 2015. Available: 10.5120/ijca2015906763.

[11] J. Hung et al., "Applying Faster R-CNN for Object Detection on Malaria Images", arXiv.org, 2020. [Online]. Available: https://arxiv.org/abs/1804.09548. [Accessed: 17- May- 2020].
