CSE299 Sample Report
Section: 9, Group: 6
Faculty Advisor:
Dr. Tanzilur Rahman
Assistant Professor
Department of ECE
Spring 2020
ABSTRACT
Malaria is a disease caused by the parasite Plasmodium. The parasite enters the blood through the saliva of an infected mosquito while it bites. This parasite directly infects the red blood corpuscles and causes roughly 400,000 deaths per year. Conventional malaria detection relies on examining a patient's stained blood sample under a microscope. The blood smear is placed on a slide and observed under a microscope to count the number of infected RBCs manually. The examination must be carried out by an expert operator and demands intense visual and mental concentration. It is a tiresome and time-consuming process whose precision depends on the skill of the operator. Machine-based blood smear analysis and detection of malaria-infected cells have opened a new area for early malaria detection, and such systems have shown the potential to overcome the drawbacks of manual strategies. As the traditional diagnostic process is problematic and error-prone, in this report we develop a machine learning-based malaria cell segmentation system using the watershed algorithm. The result shows that the proposed methodology achieved more accurate results
Table of Contents
ABSTRACT
CHAPTER 1: INTRODUCTION
1.1 Malaria
1.7 Motivation
CHAPTER 4: EXPERIMENT RESULT
REFERENCES
CHAPTER 1: INTRODUCTION
In this chapter, we discuss malaria, why automated segmentation of malaria blood smears is needed, our project’s aims and objectives, and our motivation for doing this project.
1.1 Malaria
Malaria is one of the most threatening global health problems, causing suffering worldwide. It is a disease transmitted by the bite of an Anopheles mosquito that is already infected with Plasmodium parasites. The parasite is released into the human bloodstream when the mosquito bites. Once in the bloodstream, the parasites travel to the liver of the infected person.
In 2018, an estimated 228 million malaria cases occurred, and the number of deaths stood at 405,000. The most vulnerable group affected by malaria is children under 5; in that year, they accounted for 67% of the total death toll. According to the WHO, the African Region carries a disproportionately high share of the global malaria burden: in 2018, the region was home to 93% of global malaria cases and 94% of malaria deaths.
Malaria-infected cells, that is, red blood cells infected with Plasmodium parasites, can be detected visually by staining the RBCs with chemicals [2]. The staining process colorizes the RBCs and highlights the parasites. In the conventional process, malaria detection is done by hand, which is a very error-prone way to detect a dangerous disease like malaria, because a small error in the detection process can cost someone's life.
Figure 1: Conventional process of malaria diagnosis (a. A patient goes for a test, b. A blood sample is taken, c. The sample is placed on a slide, d. The sample is stained with contrasting agents, e. Malaria parasites get highlighted, f. Clinician examines the slides manually) [3]
The process of identifying individual cells in blood smear images is called cell segmentation. Cell segmentation is the most basic and essential step in the analysis of microscopic cell images. It can be done with many approaches, such as traditional machine learning, deep learning, and different
using automation, one of which is identifying blood cells separately when cells overlap. Nevertheless, with modern technology, automation is the best way to perform cell segmentation.
Humans can learn from past experience, but computers traditionally could not; this was one of the key differences between humans and computers in the past. Computers need instructions to accomplish a task because they are strict logic machines with no common sense. Hence, detailed step-by-step instructions on exactly what has to be done must be provided to them. Thus scripts are written, and computers are programmed to follow and run the given instructions. That is where machine learning comes in: it consists of training computers based on experience from past data. Machine learning is essentially a probabilistic approach to solving real-life problems based on data from previous records, one that no longer needs human assistance as before. Many people use the terms artificial intelligence (AI) and machine learning interchangeably, but machine learning is a subset of artificial intelligence (AI). The system has the power to automatically learn and improve from experience.
A classical algorithm for separating different objects in an image, a task termed ‘segmentation’, is the watershed algorithm. It is typically used to segment an image when two regions of interest are close to one another or when objects in the image touch each other. In blood smears, most of the blood cells are not well separated, which makes it challenging to segment all the blood cells in a particular smear. The watershed algorithm is used to address this overlap problem since it
The distance transform calculates, for each non-zero (foreground) pixel, the distance to the nearest zero (background) pixel. The strategy works very well on rounded objects in binary images, so that the darkest
time-consuming, and the accuracy of diagnosis depends on the operator’s concentration level
and mental state. Malaria can cause serious health hazards to the patient and can even cause
The points below are our project’s aims and objectives:
❖ To find an optimized algorithm that automatically segments blood cells in malaria blood smears for further detection.
❖ To make an easily implementable model for real-world use, for the benefit of the general public.
❖ To assist medical practitioners in segmenting cells efficiently without wasting a lot of time.
1.7 Motivation
It is quite clear that malaria is prevalent throughout the world, particularly in tropical regions. This project is motivated by the nature and fatality of this disease. When an infected mosquito bites a person, the parasites it carries enter the blood and slowly start destroying the red blood cells. Typically, the first symptoms of malaria are similar to those of the flu. The affected person starts feeling unwell within a few days or weeks after the mosquito bite. Because these lethal parasites can live in the host for over a year without causing noticeable problems, a delay in proper treatment can lead to complications and, eventually, death. Hence, rapid and effective malaria testing and detection can save many patients from dying. Computer vision techniques for malaria diagnosis represent a new area for early malaria detection. According to the WHO malaria parasite counting protocol, a clinician may have to count up to 5,000 cells [5] manually. Hence this error-prone and time-consuming visual
CHAPTER 2: LITERATURE REVIEW
While working on this project, we studied different research papers to learn more about the approaches we could take to solve this problem. We selected a few of them because of their similarities to our project and tried to take ideas from their work.
In [6], Suman Kunwar and colleagues built a new image processing system for the detection and quantification of Plasmodium parasites in blood smears. They then developed machine learning algorithms to learn, detect, and determine the types of infected cells. Image acquisition is the first step; malaria-infected images that are less noisy and free of artifacts were used. Segmentation of the blood smears was done by identifying common properties: pixels within a region share similar intensity, so a natural way to segment such regions is thresholding, i.e. separating light and dark areas. Thresholding creates a binary image by setting all pixels below some threshold to zero and all pixels above that threshold to one. Pixels labeled one denote an object, and pixels labeled zero denote the background. For further processing, the input image is enhanced after thresholding. Erosion and dilation are the fundamental operations of morphological processing, and they help in detecting the objects. Two kinds of segmentation were performed: first, watershed segmentation, and second, color-based segmentation. They tested 40 images. Although there are some errors, if at least one
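The thresholding and morphology steps described for [6] can be sketched in plain Python. This is a minimal illustration, not the code of [6]: the grayscale image is assumed to be a nested list of 8-bit values, the threshold of 128 and the 3x3 square window are arbitrary choices, the function names are hypothetical, and the toy image is made up. It shows how erosion followed by dilation (an "opening") removes an isolated bright speck while restoring the larger object.

```python
# Illustrative sketch only; function names and the toy image are hypothetical.

def threshold(image, t):
    """Binarize: pixels above t become 1 (object), all others 0 (background)."""
    return [[1 if px > t else 0 for px in row] for row in image]


def _window(binary, r, c):
    """Values in the 3x3 neighbourhood of (r, c); positions outside the image are skipped."""
    rows, cols = len(binary), len(binary[0])
    return [binary[rr][cc]
            for rr in range(max(r - 1, 0), min(r + 2, rows))
            for cc in range(max(c - 1, 0), min(c + 2, cols))]


def erode(binary):
    """A pixel stays 1 only if its whole 3x3 neighbourhood is 1 (shrinks objects)."""
    return [[int(all(_window(binary, r, c))) for c in range(len(binary[0]))]
            for r in range(len(binary))]


def dilate(binary):
    """A pixel becomes 1 if any pixel in its 3x3 neighbourhood is 1 (grows objects)."""
    return [[int(any(_window(binary, r, c))) for c in range(len(binary[0]))]
            for r in range(len(binary))]


# A bright 3x3 "cell" plus a one-pixel bright speck in the top-right corner.
image = [
    [10, 10, 10, 10, 250],
    [10, 200, 210, 205, 10],
    [10, 220, 230, 215, 10],
    [10, 210, 205, 200, 10],
    [10, 10, 10, 10, 10],
]
binary = threshold(image, 128)
opened = dilate(erode(binary))  # opening: erosion removes the speck, dilation restores the cell
```

Closing (dilation followed by erosion) is the complementary operation: it fills small gaps instead of removing small specks.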
In [7], Weikang Wang, Yi-Jiun Chen, and others used a different strategy that combines a CNN with the watershed algorithm. First, a CNN is trained to learn the Euclidean distance transform (EDT) of the binary masks of the input images (deep distance estimator). A second network, a Faster R-CNN (Region-based CNN), is then trained to detect individual cells in the EDT image (deep cell detector). Finally, the watershed algorithm is applied, using the outputs of the previous two steps, to produce the final segmentation. The combined method and several pixel-wise classification methods achieved similar pixel-wise accuracy, but the combined approach achieved higher cell count accuracy than the others. Pixel-wise classification struggled to separate touching cells, as well as cells connected by blurry boundaries. Nevertheless, deep-distance estimators and deep cell detectors are easy to
In [8], Yousef Al-Kofahi, Mirabela Rusu, and others designed a single-channel cell segmentation algorithm. The research used a cytoplasm marker, which shows hypo-intense nuclear regions and hyper-intense cellular regions. In the first step, a deep learning predictive model was trained on the images of the dataset, using image patches of 160x160 pixels to predict three different labels. The second step is deep learning inference, where an unseen image is divided into 176x176 patches, producing probability maps of nuclei, cytoplasm, and background. The patches are then stitched together to obtain the prediction for the full image. In the third step, a multi-level Laplacian of Gaussian (LoG) blob detector is applied. It results in enhancing the
implemented for extracting the binary nuclear mask. The segmented nuclei were used as seeds to make the design robust, and the background labels and segmented nuclei identified earlier were used in a seeded watershed segmentation.
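Seeded (marker-controlled) watershed, as used in the final step of [8], can be illustrated with a small priority-queue flooding sketch. This is our own simplified illustration, not the implementation from [8]: it assumes a 2-D relief image (for example, an inverted probability or distance map) stored as nested lists, a marker image in which 0 means unlabeled and positive integers are seed labels, and 4-connectivity. The function name `seeded_watershed` is hypothetical. Pixels where two basins meet are simply given the label that reaches them first, rather than being marked as watershed lines.

```python
# Illustrative sketch only; seeded_watershed is a hypothetical name.
import heapq


def seeded_watershed(relief, markers):
    """Flood `relief` from the labeled seeds in `markers` (0 = unlabeled).

    Pixels are popped in order of increasing relief value; each unlabeled
    neighbour takes the label of the pixel that first reaches it.
    """
    rows, cols = len(relief), len(relief[0])
    labels = [row[:] for row in markers]  # copy so the caller's markers stay intact
    heap = []
    counter = 0  # tie-breaker: equal heights are processed first-in, first-out
    for r in range(rows):
        for c in range(cols):
            if labels[r][c] > 0:
                heapq.heappush(heap, (relief[r][c], counter, r, c))
                counter += 1
    while heap:
        _, _, r, c = heapq.heappop(heap)
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            rr, cc = r + dr, c + dc
            if 0 <= rr < rows and 0 <= cc < cols and labels[rr][cc] == 0:
                labels[rr][cc] = labels[r][c]
                heapq.heappush(heap, (relief[rr][cc], counter, rr, cc))
                counter += 1
    return labels
```

For example, flooding the one-row relief `[[0, 1, 2, 3, 2, 1, 0]]` from seeds `1` and `2` placed at its two ends labels the left slope 1 and the right slope 2; the ridge pixel in the middle goes to whichever basin reaches it first.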
CHAPTER 3: METHODOLOGY
This chapter gives a chronological overview of the different parts of the work. It mainly discusses the theories, techniques, and step-by-step workflow of the work.
3.1 Workflow
A complete workflow diagram of the proposed method is shown in the figure below.
3.2 Data Collection
The images [9] are in .png or .jpg format. There are three sets of images, comprising 1364 images (~80,000 cells) in total, each prepared by a different researcher: from Brazil (Stefanie Lopes), from Southeast Asia (Benoit Malleret), and a time course (Gabriel
The data consists of two classes of uninfected cells (RBCs and leukocytes) and four classes of infected cells (gametocytes, rings, trophozoites, and schizonts). The data had a significant imbalance towards uninfected RBCs versus uninfected leukocytes and infected cells, making up over
A class label and a set of bounding box coordinates were given for every cell. For all data sets, infected cells were given a class label by Stefanie Lopes, a malaria researcher at the Dr. Heitor Vieira Dourado Tropical Medicine Foundation hospital, indicating the stage of
3.3 Preprocessing
Preprocessing is applied to the dataset before any algorithm, to enhance the features of the images. In the first preprocessing step, we convert the single-channel image into a three-channel RGB image, which helps in the subsequent preprocessing steps.
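A minimal plain-Python sketch of this conversion follows. It assumes the image is a nested list of 8-bit intensities and that the three output channels are copies of the single input channel, in the split-and-merge spirit of Figure 5; the function names are hypothetical. In OpenCV the merge step is commonly written with `cv2.merge`, but which calls the project actually used is not stated here.

```python
# Illustrative sketch only; function names are hypothetical.

def merge_channels(r, g, b):
    """Stack three equally sized single-channel images into one image of (R, G, B) tuples."""
    if not (len(r) == len(g) == len(b)):
        raise ValueError("channels must have the same number of rows")
    return [list(zip(r_row, g_row, b_row)) for r_row, g_row, b_row in zip(r, g, b)]


def split_channels(rgb):
    """Inverse of merge_channels: return the R, G and B planes as nested lists."""
    return tuple([[px[k] for px in row] for row in rgb] for k in range(3))


def gray_to_rgb(gray):
    """Promote a single-channel image to three channels by copying it into R, G and B."""
    return merge_channels(gray, gray, gray)
```

Splitting the result again returns three identical planes, which is a quick way to check the round trip.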
Figure 5: Splitting the input image and merging the channels (a. R-channel, b. G-channel, c. B-channel, d. 3-channel image)
Next, we convert the 3-channel image into a grayscale image, which is then thresholded using Otsu’s binarization. We then filter the resulting binary image with dilation followed by erosion (a morphological closing), using a 2x2 kernel.
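The three operations in this paragraph can be sketched in plain Python as follows. This is an illustrative re-implementation under stated assumptions, not the project's code: images are nested lists of 8-bit values, grayscale conversion uses the ITU-R BT.601 weights (0.299, 0.587, 0.114), which are also the weights of OpenCV's RGB-to-gray conversion, foreground is taken to be pixels brighter than the Otsu threshold (the comparison should be inverted if cells are darker than the background), and the anchor of the 2x2 kernel is our own choice. All function names are hypothetical. In OpenCV the same steps would typically use `cv2.cvtColor`, `cv2.threshold` with `cv2.THRESH_OTSU`, and `cv2.morphologyEx` with `cv2.MORPH_CLOSE`.

```python
# Illustrative sketch only; function names are hypothetical.

def rgb_to_gray(rgb):
    """Convert an image of (R, G, B) tuples to 8-bit gray using BT.601 weights."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
            for row in rgb]


def otsu_threshold(gray):
    """Return the threshold t (0..255) that maximises the between-class variance.

    Class 0 holds pixels <= t and class 1 holds pixels > t.
    """
    hist = [0] * 256
    for row in gray:
        for px in row:
            hist[px] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = sum0 = 0  # running pixel count and intensity sum of class 0
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, so this t separates nothing
        mu0, mu1 = sum0 / w0, (sum_all - sum0) / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2  # proportional to Otsu's criterion
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


def binarize(gray, t):
    """Foreground (1) = pixels brighter than t; flip the test for dark cells."""
    return [[1 if px > t else 0 for px in row] for row in gray]


KERNEL_2X2 = [(0, 0), (0, 1), (1, 0), (1, 1)]  # 2x2 square, top-left anchored
REFLECTED_2X2 = [(-dr, -dc) for dr, dc in KERNEL_2X2]


def _morph(binary, offsets, op):
    """Apply op (any -> dilation, all -> erosion) over the window offsets.

    Window positions that fall outside the image are skipped.
    """
    rows, cols = len(binary), len(binary[0])
    return [[int(op(binary[r + dr][c + dc] for dr, dc in offsets
                    if 0 <= r + dr < rows and 0 <= c + dc < cols))
             for c in range(cols)] for r in range(rows)]


def close_2x2(binary):
    """Closing: dilation followed by erosion with a 2x2 square kernel.

    Dilating with the reflected kernel makes the closing extensive, so it
    fills small gaps without deleting existing foreground pixels.
    """
    return _morph(_morph(binary, REFLECTED_2X2, any), KERNEL_2X2, all)
```

With this kernel, a one-pixel gap between two foreground columns is filled while the columns themselves are kept; wider gaps survive, which is why a small kernel only removes minor breaks.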
In blood smears, RBCs lie very close to each other and sometimes even overlap. This leads to miscounting of the RBCs and hence to a health hazard. Watershed transformation is
Figure 6: Watershed Transform (a. Greyscale, b. Threshold, c. Filtered, d. Sure Background, e.
the given image pixel. To apply the watershed transform, we first find the sure background and sure foreground of the image produced by the preprocessing steps. The distance transform calculates, for each foreground pixel, the distance to the nearest background (zero) pixel, which allows us to find the sure foreground of the image. We use the Euclidean distance transform for this, and from the sure background and sure foreground we generate the unknown region, which helps us to place the markers. After getting the unknown
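The marker-preparation steps above can be sketched in plain Python. This is a hedged illustration, not the project's code: it follows the widely used OpenCV watershed recipe (sure background by dilation, sure foreground by thresholding the distance transform, unknown region as their difference, markers from connected components), and the 3x3 dilation, the 0.5 ratio of the maximum distance, 4-connectivity, and all function names are our assumptions. The distance transform is computed by brute force, which is only practical for tiny images; for real images OpenCV provides `cv2.distanceTransform` and `cv2.connectedComponents`.

```python
# Illustrative sketch only; function names and fg_ratio are hypothetical choices.
import math
from collections import deque


def euclidean_distance_transform(binary):
    """Distance from each foreground (1) pixel to the nearest background (0) pixel.

    Background pixels get 0.0. Brute force over all background pixels.
    """
    rows, cols = len(binary), len(binary[0])
    background = [(r, c) for r in range(rows) for c in range(cols) if binary[r][c] == 0]
    return [[0.0 if binary[r][c] == 0 else
             min((math.hypot(r - br, c - bc) for br, bc in background), default=math.inf)
             for c in range(cols)] for r in range(rows)]


def dilate3x3(binary):
    """Sure background: grow the foreground by one pixel in every direction."""
    rows, cols = len(binary), len(binary[0])
    return [[int(any(binary[rr][cc]
                     for rr in range(max(r - 1, 0), min(r + 2, rows))
                     for cc in range(max(c - 1, 0), min(c + 2, cols))))
             for c in range(cols)] for r in range(rows)]


def label_components(binary):
    """4-connected component labelling; returns (labels, count), labels start at 1."""
    rows, cols = len(binary), len(binary[0])
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if binary[r][c] and not labels[r][c]:
                count += 1
                labels[r][c] = count
                queue = deque([(r, c)])
                while queue:
                    y, x = queue.popleft()
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if (0 <= ny < rows and 0 <= nx < cols
                                and binary[ny][nx] and not labels[ny][nx]):
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return labels, count


def make_markers(binary, fg_ratio=0.5):
    """Watershed markers: 1 = sure background, 2.. = one seed per cell, 0 = unknown."""
    sure_bg = dilate3x3(binary)
    dist = euclidean_distance_transform(binary)
    peak = max(max(row) for row in dist)
    sure_fg = [[int(d > fg_ratio * peak) for d in row] for row in dist]
    # Unknown region: inside the sure background but not in the sure foreground.
    unknown = [[int(bg and not fg) for bg, fg in zip(bg_row, fg_row)]
               for bg_row, fg_row in zip(sure_bg, sure_fg)]
    labels, n_seeds = label_components(sure_fg)
    markers = [[0 if u else lab + 1 for u, lab in zip(u_row, l_row)]
               for u_row, l_row in zip(unknown, labels)]
    return markers, n_seeds
```

On two square "cells" joined by a one-pixel bridge, the distance transform peaks at the two cell centres and stays low on the bridge, so thresholding it yields two separate seeds even though the binary mask is a single connected blob. The flooding step then grows those seeds into the unknown region.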
CHAPTER 4: EXPERIMENT RESULT
This chapter presents the results of our experiment and discusses and analyzes them.
After training the model, we tested it on sample test data to check our results. We split the images into training and test sets with a ratio of 3:1. In the following figure, the segmented cells in the output images are marked with red boundaries. The cells of the input blood smears are successfully segmented. These sample images were taken from the test set.
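A 3:1 ratio means three quarters of the images go to training and one quarter to testing; with 800 images, that is 600 and 200. A minimal Python sketch of such a split is below, assuming a random shuffle with a fixed seed. The file names and the function name are hypothetical, and since the dataset [9] already provides separate training and test sets, this only illustrates the ratio rather than the exact procedure we followed.

```python
# Illustrative sketch only; split_3_to_1 and the file names are hypothetical.
import random


def split_3_to_1(items, seed=0):
    """Shuffle a copy of `items` and return (train, test) in a 3:1 ratio.

    A fixed seed keeps the split reproducible between runs.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * 3 // 4  # three quarters go to training
    return shuffled[:cut], shuffled[cut:]


names = [f"image_{i:03d}.png" for i in range(800)]
train, test = split_3_to_1(names)
print(len(train), len(test))  # 600 200
```

Shuffling before cutting avoids putting, for example, all images from one source set into the test split.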
4.2 Result Analysis
The results of our experiment show that our technique segments the blood cells successfully, with good accuracy, in some cases. In other cases, it fails to segment the
Figure 8: Some of the output images (a. Densely overlapping cells with light boundaries, b. Sparsely overlapping cells, c. Densely overlapping cells with dark boundaries).
Figure 8.a shows that our technique does not segment cells properly where the cells in the smear overlap and have light boundaries: here, the segmentation method identified 43 out of 69 cells (Figure 9a). In Figure 8.b, the cells are more spread out and overlap far less than in Figure 8.a. In this case, the segmentation method detected 73 out of 76 cells (Figure 9b), which is very accurate. Lastly, from Figure 8.c, we can observe that the cells are densely overlapping, and dark
Figure 9: Comparison of three scenarios.
The bar chart in Figure 9 compares the three scenarios and shows how our method performed in each of them. We can observe that the number of
The bar chart in Figure 10 shows that our method was able to segment overlapping cells with high accuracy when a moderate amount of contrast was present, but the accuracy drops when the contrast in the smear is low.
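The cell counts reported for Figures 8.a and 8.b can be turned into detection rates with a few lines of Python. This sketch assumes that "accuracy" means detected cells divided by manually counted cells, which matches how the 50-image evaluation is described in this chapter; the function names are hypothetical. It also contrasts pooling counts across images with averaging per-image rates, since the two aggregates differ whenever images contain different numbers of cells.

```python
# Illustrative sketch only; function names are hypothetical.

def detection_rate(detected, total):
    """Percentage of manually counted cells that the segmentation found."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * detected / total


def pooled_rate(counts):
    """Sum detected and counted cells over all images, then divide once."""
    return detection_rate(sum(d for d, _ in counts), sum(t for _, t in counts))


def mean_rate(counts):
    """Average the per-image rates, so every image carries equal weight."""
    return sum(detection_rate(d, t) for d, t in counts) / len(counts)


# (detected, counted) pairs reported for Figures 8.a and 8.b
scenarios = {"8.a dense, light boundaries": (43, 69), "8.b sparse": (73, 76)}
for name, (d, t) in scenarios.items():
    print(f"{name}: {detection_rate(d, t):.1f}%")
# 8.a dense, light boundaries: 62.3%
# 8.b sparse: 96.1%

pairs = list(scenarios.values())
print(f"pooled {pooled_rate(pairs):.1f}%, mean {mean_rate(pairs):.1f}%")
# pooled 80.0%, mean 79.2%
```

Pooling weights each cell equally, so images with many cells dominate; averaging per-image rates weights each image equally instead.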
The dataset we used came from ex vivo samples from Plasmodium vivax-infected patients in Brazil. Seven labels are used to cover all cell types: RBC, leukocyte, gametocyte, ring, trophozoite, schizont, and difficult. RBCs and leukocytes are the uninfected cell types generally found in blood. Cells that did not clearly belong to any one class were marked as difficult, and these were ignored in training. The data is also naturally imbalanced among the object classes.
We also manually evaluated the accuracy of our model on 50 images. For these images, we counted the total number of cells and the number of cells recognized by the model, and obtained an accuracy of 75%, which is a satisfactory result. This shows that most of the cells present in the blood smears were detected successfully. The next chapter compares our efficiency with that of other models.
CHAPTER 5: CONCLUSION
In this chapter, we discuss and compare our work with other notable works in this field, the challenges we faced while working on the project, and how we could make our work better and
5.1 Discussion
A few notable works related to ours were discussed earlier. The authors of [10] used watershed-based segmentation like ours and achieved an accuracy of 97.7%. Although their dataset was small and different, containing only 250 RBC images, their result is excellent. Since they used the watershed algorithm but evaluated their system on a different dataset, we cannot compare our
In [11], the authors used the same dataset [9] as ours, but a different model. First, using traditional machine learning segmentation, their model achieved an accuracy of 50% in segmenting the cells of the images. Then, a two-stage classification using Faster R-CNN attained accuracies of 59% and then 98%, respectively, disregarding background, RBCs, and difficult cells. Thus they achieved a significant improvement over the one-stage classification method.
In our work, we used 800 images taken from the Kaggle dataset [9], which provides separate training and test sets, so our training and test images do not overlap. We used 600 images from the training set and 200 images from the test set, achieving an accuracy of 75%. Because our method uses a minimal number of features, it does not require much computation power or time.
5.2 Summary
Malaria still claims hundreds of thousands of lives every year. Finding an optimized algorithm for segmenting blood cells in blood smear images might help reduce the number of deaths among patients suffering from this disease. Our proposed method is more straightforward and efficient than the conventional manual detection process. With an accuracy of 75%, our model might be deployed in the real world to detect blood cells, as it takes less than a second to segment the cells in a blood smear. Increasing the dataset size may make the model more credible, even though our dataset is already larger than those used in other works in this field. Only a minimal set of machine learning features has been used in our model to make the cell segmentation
We want to test our algorithm on different datasets to make it more credible. Although the achieved accuracy is satisfactory, we would like to compare our machine learning model’s performance with that of a deep learning model using a Convolutional Neural Network (CNN) on the same dataset. We would also like to work on pixel rendering to increase the accuracy of our model.
REFERENCES
[2] P. Bloland, Drug resistance in malaria. Geneva: World Health Organization, 2001.
[3] "Medical Image Analyses for Malaria Detection", Medium, 2020. [Online]. Available: https://towardsdatascience.com/medical-image-analyses-for-malaria-detection-fc26dc39793b. [Accessed: 11- Mar- 2020].
[5] "Strategy, speed and collaboration are essential to eliminate malaria", Who.int, 2020. [Online]. Available: https://www.who.int/westernpacific/news/feature-stories/detail/strategy-speed-and-collaboration-are-essential-to-eliminate-malaria. [Accessed: 19- May- 2020].
[6] S. Kunwar, M. Shrestha and R. Shikhrakar, "Malaria Detection Using Image Processing and Machine Learning", 2018. [Accessed: 11- Mar- 2020].
[7] W. Wang et al., "Learn to segment single cells with deep distance estimator and deep cell detector", Computers in Biology and Medicine, vol. 108, pp. 133-141, 2019. Available: 10.1016/j.compbiomed.2019.04.006.
[8] Y. Al-Kofahi, A. Zaltsman, R. Graves, W. Marshall and M. Rusu, "A deep learning-based algorithm for 2-D cell segmentation in microscopy images", BMC Bioinformatics, vol. 19, no. 1, 2018. Available: 10.1186/s12859-018-2375-z.
[10] K. Charpe, V. Bairagi, S. Desarda and S. Barshikar, "A Novel Method for Automatic Detection of Malaria Parasite Stage in Microscopic Blood Image", International Journal of Computer Applications, vol. 128, no. 17, pp. 32-37, 2015. Available: 10.5120/ijca2015906763.
[11] J. Hung et al., "Applying Faster R-CNN for Object Detection on Malaria Images", arXiv.org, 2020. [Online]. Available: https://arxiv.org/abs/1804.09548. [Accessed: 17- May- 2020].