INTRODUCTION
Malaria, a significant cause of global mortality and illness, is caused by protozoan parasites of
the genus Plasmodium. It spreads through the bite of infected female Anopheles mosquitoes.
Symptoms of malaria can emerge within 6-8 days after being bitten or may appear only several months later.
Malaria presents with a range of symptoms, including fever, fatigue, vomiting, headaches,
jaundice, diarrhea, excessive sweating, muscle and joint pain, and seizures, and it can result in coma or
death if not promptly and properly treated. The primary method for identifying malaria parasites
in human red blood cells is microscopic examination. However, the accuracy of this
method can vary depending on the skill of the microscopist (Bayingana et al., 2019).
Malaria remains a significant global health challenge, with over 3.2 billion people in 91 countries
still at risk of the disease. The most vulnerable groups of people are children who have not
Objectives to achieve
1. To review existing literature on machine learning and deep learning techniques that aid in early diagnosis.
3. To prepare the data and tools used to create the model for diagnosis prediction.
1. Data Collection and Preprocessing: Gather and prepare relevant medical data, laboratory test
2. Feature Selection and Engineering: Select appropriate features from the dataset or perform
feature engineering to extract relevant information that can aid in the detection process.
3. Model Selection: The CNN model performs well for malaria cell detection, achieving high
This study will focus on using machine learning to create a model that can detect parasitic cells from cell images obtained with a microscope.
A successfully developed model can provide valuable insights for doctors. It can reduce the
problems caused by late diagnoses and support doctors in their diagnostic decisions.
CHAPTER TWO
REVIEW OF LITERATURE
2.0 Introduction
Malaria is a disease caused by protozoan parasites of the genus Plasmodium (Kohli & Das,
2018). It is mostly transmitted through the bite of an infected female Anopheles mosquito. Malaria
parasites have an intricate life cycle that involves both mosquito vectors and vertebrate hosts,
such as humans (Kataria et al., 2022). Malaria is a major global health concern, leading to
Several factors can increase the risk of contracting malaria. Some of the key risk factors include:
especially in sub-Saharan Africa, which accounts for most cases. Other
affected regions include parts of India, Brazil, and Southeast Asia.
healthcare access have an elevated risk owing to insufficient prevention and treatment
options.
stagnant water habitats such as marshes and swamps. Therefore, being in or in close
4. Travel History: Individuals who go to regions with a high prevalence of malaria, without
taking appropriate prophylactic or preventative measures, are more likely to get infected
5. Susceptibility: Individuals who have not previously been exposed to malaria, such as young
children or individuals without immunity, are more likely to have severe illness. Pregnant
women are more susceptible to harm because of changes in their immune system during
It is essential to take preventive measures, such as using insecticide-treated bed nets, taking
The symptoms of malaria vary depending on the type of parasite causing the infection
2. Chills and shivering: Frequently accompanying the fever, these symptoms may be quite
intense.
4. Joint Pain: Many patients experience pain in their joints and muscles.
6. Nausea and Vomiting: Gastrointestinal symptoms such as nausea and vomiting are often
noted.
9. Anemia: Anemia is a condition characterized by the process of red blood cell destruction
10. Jaundice is a condition where the skin and eyes become yellow as a result of liver
It is important to note that symptoms of malaria can develop within 6-8 days of a bite from an
infected mosquito, or only several months later. If you have reason to believe that
you may have contracted malaria, or have recently visited a region with a high incidence of malaria and
are experiencing these symptoms, it is crucial to seek medical help promptly for diagnosis and
treatment.
In terms of treatment, malaria is curable when managed well and early. Quinine, extracted
from the bark of the Andean cinchona tree, was the first widely used medication for treating
malaria. However, the parasite can quickly acquire resistance to widely used
medications such as Fansidar and chloroquine in several areas. To address this issue, a
treatment approach known as combination therapy is being used more often. This involves the use
of artemisinin, derived from the Artemisia annua plant, in conjunction with other medications like
It is crucial to swiftly seek medical aid for more accurate diagnosis and appropriate treatment if
history, and laboratory testing. Here are some common methods used for diagnosing malaria:
1. Microscopic examination is often regarded as the most reliable method for diagnosing malaria.
The process involves taking a blood sample from the patient and spreading it as a
thin layer on a glass slide. The slide is treated with a stain, often Giemsa stain, which
enhances the visibility of the malaria parasites. A skilled laboratory technician uses a
microscope to examine the stained blood smear in order to detect the presence of malaria
parasites and identify their species. This approach is very effective, but it requires
extensively trained staff and appropriate laboratory
facilities.
developed to detect specific antigens produced by malaria parasites. These
tests are easy to use and give results within 15-30 minutes. A
small quantity of blood is applied to a test strip, and if malaria
antigens are present, a visible line appears on the strip. Rapid diagnostic tests (RDTs)
are especially valuable in areas that lack access to microscopy, such as distant or
based on the test's quality and the incidence of malaria in the region.
Polymerase Chain Reaction (PCR) is a molecular method used to amplify the genetic
material of malaria parasites. This allows for the detection of even very low quantities of
parasites in the blood. This approach has a high level of sensitivity and specificity,
making it beneficial for verifying instances that are difficult to diagnose using
microscopy or RDTs. PCR may further be used for species identification of malaria
parasites and the detection of mixed infections. However, the use of this technology is
limited in some settings by the need for specialized equipment and skilled staff.
3. Serological assays identify the presence of antibodies against malaria parasites in the
patient's blood. Although serological tests may provide evidence of previous exposure to
malaria, they are not often used to diagnose current infections, because antibodies
can persist in the circulation for months or even years after the illness has
resolved. Hence, serological tests are
more valuable for conducting epidemiological research and determining the prevalence of
4. Clinical Diagnosis: In areas where malaria is often found, healthcare professionals may
diagnose malaria by evaluating the patient's clinical symptoms and medical background.
Typical manifestations of malaria are fever, chills, perspiration, headache, myalgia, and
vomiting, and diarrhea. Clinical diagnosis is often used in situations when laboratory
coincide with those of other ailments.
5. Blood cultures, although not often used for malaria diagnosis, may sometimes be applied
to identify the existence of malaria parasites in the circulation. This technique involves
placing a blood sample in a culture medium to facilitate the growth and proliferation of any
parasites that may be present. The cultivated specimen is then scrutinized under a
microscope to discern the presence of parasites. Blood cultures are mostly used for the
diagnosis of bacterial diseases. However, they may also be beneficial in certain instances
network designed specifically for analyzing data with a grid-like structure, such as images. A
digital image is a grid of pixels, each holding numeric values (for example, 0 to 255 per
color channel in an 8-bit image) that indicate its brightness and color.
The human brain rapidly and efficiently analyzes a vast quantity of information as soon as we see
a picture. Every individual neuron operates within its own receptive field and is interconnected
with other neurons in a manner that collectively covers the full visual field. Similar to how each
neuron in the biological vision system only reacts to stimuli within a certain area of the visual field
known as the receptive field, each neuron in a CNN also analyzes data exclusively within its own
receptive field. The layers are organized so that they first recognize simple
patterns, such as lines and curves, and then progress to identifying more complex patterns, such as
faces and objects. Utilizing a Convolutional Neural Network (CNN) allows for the provision of
i. Convolutional layer
A CNN works by comparing images piece by piece. Each filter has a small spatial size in terms
of width and height, but it extends through the full depth of the input image. Each filter is
designed to detect a particular feature in the input image.
During the convolution layer, the filter (kernel) is systematically shifted to every possible position
on the input matrix. At each position, the filter-sized patch of the input image is multiplied
element-wise with the filter, and the resulting values are summed to give a single output value.
By translating the filter to every possible position of the input matrix, one can determine
whether a certain feature is present anywhere in the image.
Convolutional neural networks can learn many features simultaneously, because each filter produces
its own feature map. In the final step, the output feature maps of all filters are stacked along the
depth dimension to form the layer's output. Feature maps are an essential component of Convolutional Neural Networks (CNNs).
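To make the sliding-window computation concrete, here is a minimal sketch in plain Python. It is not the TensorFlow code used in this project. It assumes a single-channel image, a stride of 1 and no padding. Like CNN libraries, it does not flip the kernel, so strictly it computes a cross-correlation. The toy image and the two 2×2 edge filters are hypothetical values chosen only for illustration.

```python
def conv2d(image, kernel):
    """Valid (no padding), stride-1 2D convolution as used in CNNs.
    The kernel is not flipped (cross-correlation), matching CNN libraries."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for r in range(ih - kh + 1):
        row = []
        for c in range(iw - kw + 1):
            # Multiply the filter-sized patch element-wise with the kernel, then sum
            total = 0
            for i in range(kh):
                for j in range(kw):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        out.append(row)
    return out

# Hypothetical 4x4 image with a vertical edge between columns 1 and 2
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hypothetical 2x2 vertical-edge filter
kernel = [
    [-1, 1],
    [-1, 1],
]
feature_map = conv2d(image, kernel)
print(feature_map)  # [[0, 2, 0], [0, 2, 0], [0, 2, 0]]: strong response at the edge

# Several filters give several feature maps; stacking them sets the output depth
filters = [kernel, [[1, 1], [-1, -1]]]  # vertical- and horizontal-edge filters
stacked = [conv2d(image, k) for k in filters]
print(len(stacked))  # 2 feature maps, i.e. output depth 2
```

The horizontal-edge filter responds with zeros everywhere because this toy image has no horizontal edges. This illustrates how each filter detects only its own feature.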
Some key terms encountered when studying Convolutional Neural Networks are described below. Local
connectivity: images are represented as matrices of pixel values, so the number of input values
grows with the size of the image. When every neuron in a layer is connected to every
neuron in the preceding layer, as in a fully connected layer, the number of parameters becomes very large.
To address this issue, each neuron is connected only to a specific
region of the input data. This spatial extent, sometimes referred to as the receptive field of the
Here is a practical example of how this works. Assume that the input image has
dimensions of 128 pixels in width, 128 pixels in height, and 3 color channels. Given a filter size
of 5 × 5 × 3, each neuron in the convolution layer will have 5 × 5 × 3 = 75 weights (plus one
bias parameter). The size and arrangement of neurons in the output volume are
determined by spatial arrangement. There are three hyperparameters that determine the
The depth of the output volume is determined by the number of filters used to detect various
characteristics in the picture. The output volume consists of stacked activation/feature maps,
Stride is the term used to describe the distance, measured in pixels, that we move the filter while
comparing it to the input picture patch. When the stride is set to one, the filters are shifted by one
pixel at a time. Increasing the stride produces spatially smaller output volumes.
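The arithmetic in the 128 × 128 × 3 example can be checked with a short sketch. It uses the standard output-size relation floor((W - F + 2P) / S) + 1, where W is the input width, F the filter size, P the zero-padding and S the stride. The function names are hypothetical helpers, not part of TensorFlow.

```python
def conv_output_size(input_size, filter_size, stride=1, padding=0):
    """Output width (or height) of a convolution: floor((W - F + 2P) / S) + 1."""
    span = input_size - filter_size + 2 * padding
    if span < 0:
        raise ValueError("filter is larger than the (padded) input")
    return span // stride + 1

def weights_per_filter(filter_w, filter_h, in_channels):
    """Weights seen by one neuron (one filter), not counting its bias."""
    return filter_w * filter_h * in_channels

print(weights_per_filter(5, 5, 3))             # 75 (plus 1 bias)
print(conv_output_size(128, 5))                # stride 1, no padding: 124
print(conv_output_size(128, 5, stride=3))      # larger stride, smaller output: 42
print(conv_output_size(128, 5, padding=2))     # zero-padding of 2 keeps the width at 128
```

The last line shows why zero-padding is useful: with P = (F - 1) / 2 and stride 1, the output keeps the spatial size of the input.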
Zero-padding is a technique that enables us to manipulate the spatial dimensions of the output
Parameter Sharing refers to the use of a single weight matrix to operate on all neurons inside a
certain feature map. This implies that the same filter is applied to various sections of the picture.
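The effect of parameter sharing can be sketched by counting parameters. The sketch assumes a hypothetical convolution layer of 32 filters of size 5 × 5 × 3. It compares that layer with a single fully connected neuron that looks at the whole 128 × 128 × 3 image. Both helper functions are illustrative, not library calls.

```python
def conv_layer_params(filter_w, filter_h, in_channels, num_filters):
    # Parameter sharing: each filter's weights (+1 bias) are reused at every
    # position, so the count does not depend on the image size
    return (filter_w * filter_h * in_channels + 1) * num_filters

def dense_params(num_inputs, num_units):
    # Fully connected: every input connects to every unit, plus one bias per unit
    return (num_inputs + 1) * num_units

print(conv_layer_params(5, 5, 3, 32))   # 2432 parameters for 32 shared filters
print(dense_params(128 * 128 * 3, 1))   # 49153 parameters for just one unit
```

Even a single fully connected unit needs about twenty times more parameters than the entire 32-filter convolution layer. This is the problem that local connectivity and parameter sharing address.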
The ReLU activation function is used in this layer, where any negative value in the output
volume from the convolution layer is substituted with zero. This is done to avoid the
Pooling layers are inserted between two convolution layers with the primary objective of
The pooling layer is characterized by two hyperparameters: the window size, which is the size of the
sliding window, and the stride, which refers to the step size
For each window, we choose either the highest value or the average of the values in the window,
The Pooling Layer performs spatial resizing on each depth slice of the input individually, and
i. Max Pooling extracts the highest value from each window of the feature map. Therefore, after
the max-pooling layer, the resulting output would be a feature map that consists of the most
ii. Average Pooling calculates the mean value of the elements within the region of the feature map
that is covered by the pooling window.
In practice, Max Pooling often performs better than Average Pooling (Baheti, 2024).
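The ReLU step and the two pooling types described above can be sketched in plain Python. The sketch assumes a 2 × 2 window with stride 2, a common choice that is assumed here rather than stated in the text. The 4 × 4 feature map is a hypothetical example.

```python
def relu(feature_map):
    # Replace every negative activation with zero
    return [[max(0, v) for v in row] for row in feature_map]

def pool2d(feature_map, window=2, stride=2, mode="max"):
    """Slide a window over the map and keep its maximum or its mean."""
    rows, cols = len(feature_map), len(feature_map[0])
    out = []
    for r in range(0, rows - window + 1, stride):
        out_row = []
        for c in range(0, cols - window + 1, stride):
            values = [feature_map[r + i][c + j]
                      for i in range(window) for j in range(window)]
            if mode == "max":
                out_row.append(max(values))
            else:  # "average"
                out_row.append(sum(values) / len(values))
        out.append(out_row)
    return out

fm = [
    [ 1, -3,  2,  4],
    [-2,  5, -1,  0],
    [ 3,  1, -4,  2],
    [ 0, -1,  6, -2],
]
activated = relu(fm)
print(pool2d(activated, mode="max"))      # [[5, 4], [3, 6]]
print(pool2d(activated, mode="average"))  # [[1.5, 1.5], [1.0, 2.0]]
```

Both variants halve the width and height. Max pooling keeps the strongest activation in each window, while average pooling dilutes it with its neighbours.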
Normalization layers standardize or normalize the output of the
preceding layers. Inserting them between the convolution and pooling layers
helps each layer learn more independently of the others and reduces the risk of
overfitting the model. Nevertheless, sophisticated designs do not use normalization layers due to
neural network where each neuron is connected to every neuron in the previous layer. (Baheti,
2024). The Convolutional Layer, together with the Pooling Layer, forms a block
inside the Convolutional Neural Network. The number of these blocks can be increased to
capture more complex features, depending on the complexity of the task, at the cost of
more computing power. A fully connected layer in a convolutional neural network (CNN) is a
layer where each neuron is connected to every neuron in the previous layer. This layer is
responsible for learning complex patterns and relationships in the input data.
After the key feature extraction process is complete, the final feature
representation is flattened and fed into a fully connected neural network for image classification.
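Flattening and the fully connected step can be sketched as follows. The two 2 × 2 feature maps, the weights and the bias are hypothetical values chosen only for illustration. In a trained network these values are learned. The single sigmoid output mirrors the binary output layer used in this project.

```python
import math

def flatten(feature_maps):
    # A list of 2D maps (one per filter) becomes a single 1D vector
    return [v for fmap in feature_maps for row in fmap for v in row]

def dense(inputs, weights, biases, activation):
    # weights[k] holds one weight per input for output unit k
    outputs = []
    for w_row, b in zip(weights, biases):
        z = sum(x * w for x, w in zip(inputs, w_row)) + b
        outputs.append(activation(z))
    return outputs

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

maps = [[[5, 4], [3, 6]], [[0, 1], [2, 0]]]   # two hypothetical 2x2 feature maps
vector = flatten(maps)                        # [5, 4, 3, 6, 0, 1, 2, 0]
weights = [[0.1, -0.2, 0.0, 0.05, 0.3, 0.0, -0.1, 0.2]]  # hypothetical, one unit
prob = dense(vector, weights, [0.0], sigmoid)[0]
print(round(prob, 3))  # probability-like score between 0 and 1
```

Every input is connected to every output unit, which is why the first dense layer after flattening usually holds most of a CNN's parameters.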
2.5 Related works
(Kazeem & Adebanji, 2021) discusses a model developed by researchers at Osun State University
for predicting malaria outbreaks. The study focuses on the relationship between meteorological
data and malaria outbreaks, utilizing machine learning algorithms to make predictions based on
this data.
1. Feature selection process using correlation matrix to identify relevant features for training
2. Data partitioning into training and testing sets, with the application of classification
algorithms like KNN, Support Vector Machine, Logistic Regression, Linear Regression,
3. Results showing the training and testing accuracies of different algorithms, with Naive
Machine learning was utilized to develop a model for predicting malaria outbreaks. Here is how
1. Data Acquisition: Meteorological data and malaria incidence data from Osun State,
Nigeria, were collected for the years 2010-2020. This data served as the basis for
2. Data Preprocessing: The collected data underwent preprocessing, which involved
learning algorithms.
3. Model Training: The data was divided into a training set (70%) and a test set (30%).
Neighbor (K-NN), Logistic Regression (LoR), Linear Regression (LiR), and Naive
4. Model Evaluation: After training, the models were tested using the remaining 30% of
the preprocessed data. The performance of the models was evaluated using metrics such
as accuracy, with Naive Bayes showing the best accuracy for predicting malaria
outbreaks.
to analyze the data and predict malaria outbreaks based on meteorological and
incidence data.
6. Confusion Matrix: The performance of the prediction models was assessed using a
confusion matrix, which helps evaluate the accuracy of the models in predicting malaria
outbreaks.
By leveraging machine learning techniques and algorithms, the researchers were able to develop
a predictive model that could assist in forecasting malaria outbreaks based on the collected data.
Table 2.1: A Summary of Related Works
Machine used to
(SVM), K- develop a
Regression outbreak
(LoR), Linear
Regression
(LiR), and
Naive Bayes,
were trained
using the
training data.
2 Zhang et Convolutional Data The CNN The study faced
categories in
classifying
the cells
3 Hamisu Malaria Developing a The accuracy obtained in The research study is
Ahmad using Bayes on decision the 4 different classification only on malaria cases
data Increasing
performance of
classification
algorithm which is
significance and
power of Decision
OneR classifiers in
supervised learning.
4 Rajaraman Pre-trained Utilized pre- Performance: Achieved an The study's results
toward feature
malaria malaria
parasite parasite
detection in detection in
images.
5 Dong et al. Evaluations of Evaluated Reported accuracy rates The dataset was
automatic automatic
identification identification.
of malaria
infected cells.
diagnosis malaria
diagnosis
7 Bajpai et Deep learning Developed Achieved accuracy levels of Models need more
al. (2020) models for and evaluated 94% to 97%. extensive testing
microscopic
blood smear
images
CHAPTER THREE
METHODOLOGY
3.0 Introduction
This chapter describes and discusses the techniques that will be used in the
study to collect and analyze data, along with the procedures and methods that will be
implemented to develop parasitized cell detection using deep learning.
The data for this project was obtained from Kaggle, a secondary data source. On the website
Kaggle, users and data scientists may collaborate with one another to solve global data science
problems, publish their datasets, and create machine learning models in a web-based environment.
malaria).
The dataset "Cell Images for Detecting Malaria" on Kaggle contains images of blood cells infected
with malaria parasites, as well as healthy cells, for the purpose of training machine learning models
Figure 3.1: Parasitized cell
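The model is a Convolutional Neural Network (CNN) built using TensorFlow and Keras and designed
for binary classification of images into parasitized or uninfected cells. It
consists of:
1. Three convolutional layers, each followed by a max-pooling layer.
2. A flattening layer.
3. A fully connected dense layer with 512 units and ReLU activation.
4. An output layer with a single neuron and sigmoid activation for binary classification.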
This architecture uses convolutional layers to extract features from the images, max-pooling layers
to reduce dimensionality, and fully connected dense layers to perform the final classification. The
model is designed to differentiate between parasitized and uninfected cells based on the learned
The diagrams that will be used to design a model for malaria-infected cells are:
The model's implementation is explained in the architecture. The initial dataset was obtained from
Kaggle; 70% of the data was used for training the model and 30% for testing it.
3.3.1 Architectural Diagram for the model
3.3.2 Flow chart diagram
3.4 System Development Life Cycle
The software development lifecycle (SDLC) is the cost-effective and time-efficient process that
development teams use to design and build high-quality software. The goal of SDLC is to minimize
project risks through forward planning so that software meets customer expectations during
production and beyond. This methodology outlines a series of steps that divide the software
development process into tasks you can assign, complete, and measure. Software development can
systematic management framework with specific deliverables at every stage of the software
development process. As a result, all stakeholders agree on software development goals and
requirements upfront and also have a plan to achieve those goals. A software development lifecycle
implement it. Different models arrange the SDLC phases in varying chronological order to
optimize the development cycle. In traditional software development, security testing was a
separate process from the software development lifecycle (SDLC). The security team discovered
security flaws only after they had built the software. This led to a high number of bugs that
remained hidden as well as increased security risks (“What Is SDLC? - Software Development
Lifecycle Explained - AWS,” n.d.). The SDLC diagram which shows the sequential steps to be
Figure 3.5: SDLC diagram for the Models to be Built
1. Problem definition: The problem addressed in this project is to detect parasitized and
uninfected cells.
2. Data selection: The dataset was obtained from Kaggle, as described in Table 3.1.
3. Exploratory Data Analysis: Tableau would be used to provide insights and graphical
representations.
4. Model selection: When evaluating which model performs best on the malaria cell image
6. Testing: The model was tested to check that it is functional before it was evaluated.
8. Review: In this last phase, all other phases were reviewed to know if the model provided a
CHAPTER FOUR
4.0 INTRODUCTION
This chapter reports the implementation procedure for detecting malaria-infected cells using
Python and visualization tools. The dataset consists of 27,558 cell images, classified into
parasitized and uninfected categories. The chapter covers data visualization of the essential attributes, the
performance evaluation of the chosen algorithm, code snippets, screenshots of the model, and the
The Kaggle dataset "Cell Images for Detecting Malaria" consists of 27,558 cell images,
classified into parasitized and uninfected categories. Each category contains thousands of images,
which can be used to train machine learning models for detecting malaria from blood smear
samples. The dataset is structured into two main folders, "Parasitized" and "Uninfected," each
containing images in PNG format. It was obtained from the link below:
https://www.kaggle.com/datasets/iarunava/cell-images-for-detecting-malaria
Before data exploration was carried out for training the model, the following libraries were imported:
1. Shutil
2. NumPy
3. TensorFlow
4. Pandas
• Shutil: This module offers a number of high-level operations on files and collections of files.
• NumPy: A fundamental package for scientific computing with Python. It supports large,
• TensorFlow: An open-source library developed by Google for machine learning and deep
learning. It is widely used for building and training neural networks.
After the necessary libraries for visualization and preprocessing had been imported, as shown in
Figure 4.1a, the next step was reading the data from its directory using the pandas
library, and the head and tail of the dataset were printed. Figure 4.1b also shows the head and tail of the
Figure 4.1a: Python libraries imported into the Jupyter Notebook
The dataset contains 27,558 images classified into "Parasitized" and "Uninfected" categories,
4.2.1 Library tools for Data Visualization
i. Anaconda environment
Directory paths were created, and functions were used to manage file operations without altering the
original dataset structure.
4.3 Data Preprocessing
Data preprocessing refers to the steps taken to prepare and manipulate the data for training a machine
learning model. This includes handling, transforming, and augmenting the data to ensure it is
suitable for input into the model. Here’s a breakdown of the data processing steps used in this
project:
Splitting images into training and testing sets based on a specified ratio. The ratio used is 80 for
Figure 4.3a: Shows the splitting of images into training and testing sets
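The exact code in Figure 4.3a is not reproduced here. A split of this kind can be sketched with the standard library, in the spirit of the shutil-based folder creation in the appendix. The sketch assumes that all images of one class sit in a single folder. The function name, the folder names and the default ratio of 0.8 (reading "80" as 80% for training) are assumptions.

```python
import os
import random
import shutil

def split_images(src_dir, train_dir, test_dir, train_ratio=0.8, seed=42):
    """Copy the files of one class folder into training/testing folders.
    Hypothetical helper: returns (number_of_training_files, number_of_testing_files)."""
    files = sorted(f for f in os.listdir(src_dir)
                   if os.path.isfile(os.path.join(src_dir, f)))
    random.Random(seed).shuffle(files)          # reproducible shuffle
    cut = int(len(files) * train_ratio)
    os.makedirs(train_dir, exist_ok=True)
    os.makedirs(test_dir, exist_ok=True)
    for name in files[:cut]:
        shutil.copyfile(os.path.join(src_dir, name), os.path.join(train_dir, name))
    for name in files[cut:]:
        shutil.copyfile(os.path.join(src_dir, name), os.path.join(test_dir, name))
    return cut, len(files) - cut

# Hypothetical usage, one call per class so both classes keep the same ratio:
# for cls in ("Parasitized", "Uninfected"):
#     split_images(os.path.join("cell_images", cls),
#                  os.path.join("images", "training", cls),
#                  os.path.join("images", "testing", cls))
```

Splitting each class folder separately keeps the proportion of parasitized and uninfected images the same in both sets. Copying rather than moving leaves the original dataset structure untouched.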
Image pixel values are rescaled and the training data is augmented using ImageDataGenerator,
which improves model generalization.
Figure 4.3b: Shows the rescaling and augmentation of image
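Conceptually, Keras' ImageDataGenerator arguments such as `rescale=1./255` and `horizontal_flip=True` perform the two operations sketched below. The exact augmentation settings used in Figure 4.3b are not restated here, so the flip probability of 0.5 and the toy image are assumptions of this sketch. The sketch works on nested lists of RGB values instead of NumPy arrays.

```python
import random

def rescale(image, factor=1.0 / 255):
    # Multiply every channel value so that 0-255 maps into 0.0-1.0
    return [[[ch * factor for ch in pixel] for pixel in row] for row in image]

def random_horizontal_flip(image, rng, p=0.5):
    # With probability p, mirror each row from left to right
    if rng.random() < p:
        return [list(reversed(row)) for row in image]
    return image

img = [[[255, 255, 255], [0, 0, 0]]]   # hypothetical 1x2 RGB image: white, black
scaled = rescale(img)
rng = random.Random(0)
augmented = random_horizontal_flip(scaled, rng, p=1.0)  # p=1.0 forces a flip for the demo
print(augmented)  # the black pixel now comes first
```

Only the training images are augmented. The test images are only rescaled, so evaluation reflects unmodified data.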
The model is created using TensorFlow Keras for image classification. Here’s a concise
explanation of how the model is structured and created based on the code:
The Sequential() function initializes a linear stack of layers. This model allows you to add layers
4.4.2 Convolutional Layers:
Convolutional layers (Conv2D) are added to extract features from the input images. Each Conv2D
layer applies a set of filters to the input, with each filter learning different features.
After each Conv2D layer, a MaxPooling2D layer is added to downsample the feature maps,
reducing their spatial dimensions. This process helps in focusing on the most important features
The Flatten() layer converts the 2D feature map into a 1D vector. This flattened output is then fed
Dense layers (Dense) are fully connected layers that perform the final classification based on the
The model architecture includes two dense layers: one with 512 units and ReLU activation for
learning complex representations, and the final layer with 1 unit and sigmoid activation for binary
4.4.5 Compilation
The compile() method configures the model for training. Here, the optimizer (RMSprop) adjusts
the model's weights during training to minimize the binary_crossentropy loss function. The metric
monitored during training is accuracy.
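The quantities that compile() sets up can be sketched in plain Python: the sigmoid output, the binary cross-entropy loss and accuracy at a 0.5 threshold. The labels and raw scores below are hypothetical. The small `eps` clip keeps the logarithm finite and is an implementation choice of this sketch. Keras' `flow_from_directory` assigns class indices in alphabetical order of the folder names, so "Parasitized" would likely be 0 and "Uninfected" 1. This should be checked against the generator's `class_indices` in the project.

```python
import math

def sigmoid(z):
    # Squash a raw score into a probability between 0 and 1
    return 1.0 / (1.0 + math.exp(-z))

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    # Mean of -[y*log(p) + (1-y)*log(1-p)]; eps keeps log() finite
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

def accuracy(y_true, y_prob, threshold=0.5):
    # Fraction of samples whose thresholded prediction matches the label
    correct = sum(int(p >= threshold) == y for y, p in zip(y_true, y_prob))
    return correct / len(y_true)

labels = [1, 0, 1, 0]                  # hypothetical labels
scores = [2.0, -1.0, 0.5, 0.3]         # hypothetical raw outputs before the sigmoid
probs = [sigmoid(s) for s in scores]
loss = binary_crossentropy(labels, probs)
acc = accuracy(labels, probs)          # 0.75: the last sample is misclassified
print(round(loss, 4), acc)
```

The loss penalizes confident wrong predictions heavily. It can therefore keep falling while accuracy stays flat, which is why both curves are worth plotting.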
4.4.6 Model Summary
The summary() method prints a detailed summary of the model architecture, including the type
and number of parameters in each layer. This summary helps in understanding the model's structure
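The parameter counts that summary() reports can be reproduced by hand. The sketch below traces shapes and parameters for the architecture described in this report: Conv2D and MaxPooling2D blocks, Flatten, a 512-unit Dense layer and a 1-unit output. The 150 × 150 × 3 input size, the 3 × 3 kernels, the 2 × 2 pooling and the 32/64/128 filter counts are hypothetical assumptions, so the numbers may differ from Figure 4.4a.

```python
def trace_model(input_shape, conv_filters, kernel=3, pool=2, dense_units=512):
    """Return (layer, output_shape, params) rows for Conv2D('valid', stride 1)
    + MaxPooling2D blocks, followed by Flatten -> Dense(dense_units) -> Dense(1)."""
    h, w, c = input_shape
    rows = []
    for f in conv_filters:
        params = (kernel * kernel * c + 1) * f   # shared weights + one bias per filter
        h, w, c = h - kernel + 1, w - kernel + 1, f
        rows.append(("Conv2D", (h, w, c), params))
        h, w = h // pool, w // pool              # pooling has no trainable parameters
        rows.append(("MaxPooling2D", (h, w, c), 0))
    flat = h * w * c
    rows.append(("Flatten", (flat,), 0))
    rows.append(("Dense", (dense_units,), (flat + 1) * dense_units))
    rows.append(("Dense", (1,), dense_units + 1))
    return rows

# Hypothetical configuration: 150x150 RGB input, 32/64/128 filters
rows = trace_model((150, 150, 3), (32, 64, 128))
for name, shape, params in rows:
    print(f"{name:<14}{str(shape):<18}{params:>12,}")
print("Total params:", f"{sum(p for _, _, p in rows):,}")
```

Under these assumptions, the first dense layer holds about 18.9 million of the roughly 19.0 million parameters. This is consistent with the overfitting risk noted among the limitations.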
Figure 4.4a: Shows the creation of model
4.5 Evaluation Metrics
Based on the observed accuracy, loss metrics, and visualizations from the code snippet:
The CNN model performs moderately well for malaria detection, achieving an average accuracy
The plotted graphs confirm the model's learning progress and stability, showcasing consistent
This indicates that the model is well-suited for classifying parasitized and uninfected cell images.
Figure 4.5a: Shows the validation and conclusion reached
Confusion Matrix: The code first resets the validation data generator and uses the model to predict
the class labels for the validation set. These predictions are compared to the true class labels to
compute the confusion matrix, which is then printed. The confusion matrix shows the counts of
true positive, true negative, false positive, and false negative predictions, providing insight into
the model's accuracy for each class. The matrix is also visualized using a color-coded plot, with
Classification Report: Following the confusion matrix, the code prints a classification report that
includes precision, recall, and F1-score for each class (Parasitized and Uninfected). These metrics
offer a detailed view of the model's performance, highlighting how well it distinguishes between
the two classes. The report helps in assessing the model's effectiveness and identifying areas for
improvement.
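The confusion matrix and per-class metrics described above can be sketched without any library. The project code likely relies on a library routine for this. The sketch uses the same layout as scikit-learn's `confusion_matrix`, with rows for true classes and columns for predicted classes. The labels, predictions and the 0 = Parasitized, 1 = Uninfected encoding are hypothetical.

```python
def confusion_matrix(y_true, y_pred, labels=(0, 1)):
    index = {lab: i for i, lab in enumerate(labels)}
    m = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        m[index[t]][index[p]] += 1   # rows: true class, columns: predicted class
    return m

def per_class_report(matrix, names):
    report = {}
    n = len(matrix)
    for k, name in enumerate(names):
        tp = matrix[k][k]
        fp = sum(matrix[r][k] for r in range(n)) - tp   # predicted k, actually other
        fn = sum(matrix[k]) - tp                        # actually k, predicted other
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        report[name] = (precision, recall, f1)
    return report

y_true = [0, 0, 0, 0, 1, 1, 1, 1]   # hypothetical labels
y_pred = [0, 0, 0, 1, 1, 1, 0, 0]   # hypothetical predictions
cm = confusion_matrix(y_true, y_pred)
print(cm)  # [[3, 1], [2, 2]]
report = per_class_report(cm, ("Parasitized", "Uninfected"))
for name, (p, r, f1) in report.items():
    print(f"{name:<12} precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

For a diagnostic task, recall on the parasitized class deserves particular attention, because each false negative is an infected cell the model missed.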
Figure 4.5b: Shows the confusion matrix code
Figure 4.5c: Shows the confusion matrix and the classification report
CHAPTER FIVE
5.0 Introduction
This chapter outlines the study's limitations and summarizes the investigation from the
5.1 Summary
This study focused on using machine learning to create a model that can detect parasitic cells
by relying on cell images obtained from the microscope. The methodology chapter details the data
collection, analysis of the model, design, and implementation procedures. Data was sourced from
Kaggle, specifically the "Cell Images for Detecting Malaria" dataset, which contains images of
parasitized and uninfected blood cells. A Convolutional Neural Network (CNN) was designed
using TensorFlow and Keras to classify these images. The CNN model consists of three
convolutional layers, a flattening layer, a dense layer, and an output layer, all aimed at
The project follows a systematic approach using the Software Development Life Cycle (SDLC),
ensuring structured development and minimizing risks. The dataset is split into training and testing
sets, and various Python libraries are employed for data visualization and preprocessing. The CNN
model's architecture is described in detail, including its layers and functions. The implementation
chapter showcases the steps taken in developing the model, including data visualization,
preprocessing, and model creation. The results demonstrate the model's effectiveness in detecting
malaria-infected cells.
5.2 Conclusion
The developed CNN model demonstrates robust performance in classifying parasitized and
uninfected cell images, achieving high accuracy and effectively reducing loss over epochs. The
systematic approach of the SDLC, combined with the use of advanced data visualization and
preprocessing techniques, contributes to the model's success. The results indicate that the model is
well-suited for the task, showcasing consistent improvements in accuracy and stability across
5.3 Limitations
• Dataset Specificity: The model was trained specifically on the "Cell Images for Detecting
Malaria" dataset, which may limit its generalizability to other types of datasets or real-
• Data Quality and Diversity: The dataset used may not encompass all variations of malaria-
infected and uninfected cells, potentially impacting the model's performance on new,
unseen data.
• Computational Resources: Training deep learning models like CNNs requires substantial
• Overfitting: Despite measures to prevent it, the model may still be prone to overfitting,
especially given the high number of parameters in CNNs and the specific characteristics of
5.4 Recommendations
increase the variability of the training data, thereby improving the model's robustness.
• Model Optimization: Explore and experiment with different model architectures and
• Real-World Testing: Validate the model with real-world data to assess its practical
model pruning or quantization, making the model more accessible for deployment in
resource-constrained environments.
REFERENCES
Abadi, M., Barham, P., Chen, J., Chen, Z., Davis, A., Dean, J., ... & Zheng, X. (2016). TensorFlow:
A system for large-scale machine learning. In 12th USENIX Symposium on Operating
Systems Design and Implementation (OSDI 16) (pp. 265-283).
Ahmad, H. I. (2019). Malaria prediction using Bayesian and other machine learning
techniques (Master's thesis, African University of Science and Technology).
Bajpai, V. K., Mishra, A., & Deep, S. (2020). Deep learning models for automated diagnosis of
malaria disease using microscopic blood smear images. In 2020 International Conference
on Computational Intelligence and Knowledge Economy (ICCIKE) (pp. 1-5). IEEE.
https://doi.org/10.1109/ICCIKE48582.2020.9319361
Dong, Y., Jiang, Z., Shen, H., Pan, W., Williams, L. A., Osen, K. K., ... & Sahni, N. (2017).
Evaluations of deep convolutional neural networks for automatic identification of malaria
infected cells. In 2017 IEEE EMBS International Conference on Biomedical & Health
Informatics (BHI) (pp. 101-104). IEEE. https://doi.org/10.1109/BHI.2017.7897226
Harris, C. R., Millman, K. J., van der Walt, S. J., Gommers, R., Virtanen, P., Cournapeau, D., ... &
Oliphant, T. E. (2020). Array programming with NumPy. Nature, 585(7825), 357-362.
https://doi.org/10.1038/s41586-020-2649-2
Iradukunda, O., Che, H., Uwineza, J., Bayingana, J. Y., Bin-Imam, M. S., & Niyonzima, I. (2019).
Malaria Disease Prediction Based on Machine Learning. School of Computer Science and
Technology, Beijing Institute of Technology, Beijing, China.
Kataria, P., Surela, N., Chaudhary, A., Das, J., & Das, J. (2022). MiRNA: Biological Regulator in
Host-Parasite Interaction during Malaria Infection. International Journal of Environmental
Research and Public Health, 19(4), 2395.
Kazeem, I., & Adebanji, S. (2021). A model for predicting malaria outbreak using machine
learning technique. Scientific Annals of Computer Science.
Kohli, K., & Das, A. K. (2018). Clinicopathological profile of malaria patients in an Central
African United Nations hospital. https://doi.org/10.18203/2320-6012.ijrms20184915
Liang, Z., Powell, A., Ersoy, I., Poostchi, M., Silamut, K., Palaniappan, K., & Hossain, M. A.
(2018). CNN-based image analysis for malaria diagnosis. In 2018 IEEE International
Conference on Bioinformatics and Biomedicine (BIBM) (pp. 2714-2721). IEEE.
https://doi.org/10.1109/BIBM.2018.8621490
Mishra, M. (2021, December 15). Convolutional Neural Networks, Explained. Towards Data
Science (Medium).
https://towardsdatascience.com/convolutional-neural-networks-explained-9cc5188c4939?gi=92ba89675a70
Python Software Foundation. (n.d.). shutil: High-level file operations. In Python
documentation. Retrieved from https://docs.python.org/3/library/shutil.html
Rajaraman, S., Antani, S. K., Poostchi, M., Silamut, K., Hossain, M. A., Maude, R. J., ... & Thoma,
G. R. (2018). Pre-trained convolutional neural networks as feature extractors toward
improved malaria parasite detection in thin blood smear images. PeerJ, 6, e4568.
https://doi.org/10.7717/peerj.4568
Tangpukdee, N., Duangdee, C., Wilairatana, P., & Krudsood, S. (2009). Malaria diagnosis: A brief
review. The Korean Journal of Parasitology, 47(2), 93-102.
https://doi.org/10.3347/kjp.2009.47.2.93
Zhang, L., Xie, J., Shen, C., Ji, X., Wang, Y., & Ye, Q. (2019). Blood cell image classification
using convolutional neural networks. Journal of Medical Systems, 43(11), 353.
APPENDIX
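import os
import random
import shutil
import zipfile

import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
from sklearn.metrics import classification_report, confusion_matrix
from tensorflow.keras.preprocessing.image import ImageDataGenerator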
# Confirm that the project folder contains the extracted dataset
print(os.listdir(r'C:\Users\Belema\Desktop\DAVID ADEBESIN\proj'))

# Create images/{training,testing}/{para,uninf}; exist_ok=True makes re-runs safe
for split in ('training', 'testing'):
    for label in ('para', 'uninf'):
        os.makedirs(os.path.join(r'C:\Users\Belema\Desktop\DAVID ADEBESIN\proj\images', split, label),
                    exist_ok=True)
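# Helper (name assumed) that copies a random split_size fraction of one class
# folder into the training folder and the remainder into the testing folder
def split_data(source, training, testing, split_size):
    # Keep only non-empty image files
    files = []
    for filename in os.listdir(source):
        file = os.path.join(source, filename)
        if os.path.getsize(file) > 0:
            files.append(filename)
        else:
            print(filename + ' is zero length, so ignoring.')

    # Shuffle, then cut the list at the split point
    training_length = int(len(files) * split_size)
    shuffled_set = random.sample(files, len(files))
    training_set = shuffled_set[:training_length]
    testing_set = shuffled_set[training_length:]

    for filename in training_set:
        this_file = os.path.join(source, filename)
        destination = os.path.join(training, filename)
        shutil.copyfile(this_file, destination)
    for filename in testing_set:
        this_file = os.path.join(source, filename)
        destination = os.path.join(testing, filename)
        shutil.copyfile(this_file, destination)

# Source folders of the dataset and the train/test destinations
PAR_SOURCE_DIR = r'C:\Users\Belema\Desktop\DAVID ADEBESIN\proj\cell_images\Parasitized'
TRAINING_PAR_DIR = r'C:\Users\Belema\Desktop\DAVID ADEBESIN\proj\images\training\para'
TESTING_PAR_DIR = r'C:\Users\Belema\Desktop\DAVID ADEBESIN\proj\images\testing\para'
UNI_SOURCE_DIR = r'C:\Users\Belema\Desktop\DAVID ADEBESIN\proj\cell_images\Uninfected'
TRAINING_UNI_DIR = r'C:\Users\Belema\Desktop\DAVID ADEBESIN\proj\images\training\uninf'
TESTING_UNI_DIR = r'C:\Users\Belema\Desktop\DAVID ADEBESIN\proj\images\testing\uninf'

# Split ratio: 80% of each class for training, 20% for testing
split_size = 0.8
split_data(PAR_SOURCE_DIR, TRAINING_PAR_DIR, TESTING_PAR_DIR, split_size)
split_data(UNI_SOURCE_DIR, TRAINING_UNI_DIR, TESTING_UNI_DIR, split_size)

# Number of images that ended up in each folder
for folder in (TRAINING_PAR_DIR, TESTING_PAR_DIR, TRAINING_UNI_DIR, TESTING_UNI_DIR):
    print(len(os.listdir(folder)))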
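The splitting step can be checked in isolation. The following is a minimal sketch, not the code used in this study: it reimplements the same shuffle-and-cut logic as a self-contained helper (`split_folder`, a hypothetical name) and runs it on a temporary folder of dummy files, so no dataset is needed. The optional `seed` argument is an added assumption that makes the split reproducible.

```python
import os
import random
import shutil
import tempfile


def split_folder(source, training, testing, split_size, seed=None):
    """Hypothetical helper: copy a random split_size fraction of the non-empty
    files in `source` into `training` and the remainder into `testing`."""
    rng = random.Random(seed)  # seeded RNG so the split can be reproduced
    files = [f for f in os.listdir(source)
             if os.path.getsize(os.path.join(source, f)) > 0]  # skip empty files
    shuffled = rng.sample(files, len(files))
    cut = int(len(shuffled) * split_size)
    for subset, target in ((shuffled[:cut], training), (shuffled[cut:], testing)):
        os.makedirs(target, exist_ok=True)
        for name in subset:
            shutil.copyfile(os.path.join(source, name), os.path.join(target, name))
    return shuffled[:cut], shuffled[cut:]


# Demo on a throwaway folder: 10 non-empty dummy "images" plus one empty file
with tempfile.TemporaryDirectory() as root:
    src = os.path.join(root, 'Parasitized')
    os.makedirs(src)
    for k in range(10):
        with open(os.path.join(src, f'cell_{k}.png'), 'wb') as fh:
            fh.write(b'x')
    open(os.path.join(src, 'empty.png'), 'wb').close()

    train, test = split_folder(src, os.path.join(root, 'train'),
                               os.path.join(root, 'test'), 0.8, seed=0)
    print(len(train), len(test))                          # 8 2
    print(set(train).isdisjoint(test))                    # True
    print(len(os.listdir(os.path.join(root, 'train'))))   # 8
```

Because the list is shuffled once and then cut at a single index, every non-empty file lands in exactly one of the two folders, and the empty file is excluded before the split.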
# Training images: rescale pixels to [0, 1] and apply random augmentation
train_datagen = ImageDataGenerator(
    rescale=1.0/255.,
    rotation_range=40,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
    fill_mode='nearest')
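# Binary classifier for 150x150 RGB cell images (matching target_size below).
# The Conv2D layers that precede each MaxPooling2D layer are missing from the
# printed listing; only the pooling and dense layers are reproduced here.
model = tf.keras.models.Sequential([
    tf.keras.Input(shape=(150, 150, 3)),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(512, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')
])
model.summary()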
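# Training and testing folders created during the split (values assumed from the folder layout)
TRAINING_DIR = r'C:\Users\Belema\Desktop\DAVID ADEBESIN\proj\images\training'
VALIDATION_DIR = r'C:\Users\Belema\Desktop\DAVID ADEBESIN\proj\images\testing'

# Validation images are only rescaled, not augmented (assumed)
validation_datagen = ImageDataGenerator(rescale=1.0/255.)

train_generator = train_datagen.flow_from_directory(TRAINING_DIR,
                                                    batch_size=10,
                                                    class_mode='binary',
                                                    target_size=(150, 150))
validation_generator = validation_datagen.flow_from_directory(VALIDATION_DIR,
                                                              batch_size=10,
                                                              class_mode='binary',
                                                              target_size=(150, 150))

# Sigmoid output with binary labels -> binary cross-entropy; metrics=['acc'] gives the
# 'acc'/'val_acc' history keys used below. Optimizer and patience are assumed values.
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['acc'])
early_stopping = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=3,
                                                  restore_best_weights=True)

history = model.fit(
    train_generator,
    epochs=5,
    validation_data=validation_generator,
    callbacks=[early_stopping])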
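%matplotlib inline

#-----------------------------------------------------------
# Retrieve accuracy and loss on the training and validation
# sets for each training epoch
#-----------------------------------------------------------
acc = history.history['acc']
val_acc = history.history['val_acc']
loss = history.history['loss']
val_loss = history.history['val_loss']
epochs = range(len(acc))

#------------------------------------------------
# Plot training and validation accuracy per epoch
#------------------------------------------------
plt.plot(epochs, acc, label='Training accuracy')
plt.plot(epochs, val_acc, label='Validation accuracy')
plt.title('Training and validation accuracy')
plt.legend()
plt.figure()

#------------------------------------------------
# Plot training and validation loss per epoch
#------------------------------------------------
plt.plot(epochs, loss, label='Training loss')
plt.plot(epochs, val_loss, label='Validation loss')
plt.title('Training and validation loss')
plt.legend()

# Re-create the validation generator without shuffling so that the order of
# the predictions matches validation_generator.classes
validation_generator = validation_datagen.flow_from_directory(VALIDATION_DIR,
                                                              target_size=(150, 150),
                                                              batch_size=10,
                                                              class_mode='binary',
                                                              shuffle=False)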
# Confusion Matrix
validation_generator.reset()
Y_pred = model.predict(validation_generator)      # sigmoid probability per image
y_pred = np.round(Y_pred).astype(int).flatten()   # round to class 0/1 (threshold 0.5)
true_classes = validation_generator.classes
class_labels = list(validation_generator.class_indices.keys())
cm = confusion_matrix(true_classes, y_pred)
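print('Confusion Matrix')
print(cm)

plt.figure(figsize=(8, 8))
plt.imshow(cm, interpolation='nearest', cmap=plt.cm.Blues)
plt.title('Confusion Matrix')
plt.colorbar()
tick_marks = np.arange(len(class_labels))
plt.xticks(tick_marks, class_labels)
plt.yticks(tick_marks, class_labels)

# Write each count in its cell: white text on dark cells, black on light ones
thresh = cm.max() / 2.
for i, j in np.ndindex(cm.shape):
    plt.text(j, i, format(cm[i, j], 'd'),
             horizontalalignment="center",
             color="white" if cm[i, j] > thresh else "black")
plt.ylabel('True label')
plt.xlabel('Predicted label')
plt.show()

# Classification Report
print('Classification Report')
print(classification_report(true_classes, y_pred, target_names=class_labels))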
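The confusion matrix and classification report follow standard definitions. As a minimal sketch using only the Python standard library, the code below computes the same quantities for a binary problem on hypothetical toy labels (not results from this study). It follows scikit-learn's convention of rows for true classes and columns for predicted classes; with `flow_from_directory`, class indices are assigned in alphanumeric order of the subfolder names, so `para` is 0 and `uninf` is 1.

```python
def confusion_counts(y_true, y_pred):
    """2x2 confusion matrix for labels 0/1: rows = true class, columns = predicted."""
    cm = [[0, 0], [0, 0]]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def per_class_metrics(cm):
    """Precision, recall and F1 for each class, computed from the confusion matrix."""
    metrics = {}
    for c in (0, 1):
        tp = cm[c][c]
        fp = sum(cm[r][c] for r in (0, 1)) - tp   # predicted c, truly the other class
        fn = sum(cm[c]) - tp                       # truly c, predicted the other class
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        metrics[c] = (precision, recall, f1)
    return metrics


# Hypothetical toy labels: 0 = para (parasitized), 1 = uninf (uninfected)
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 1, 0]
cm = confusion_counts(y_true, y_pred)
accuracy = (cm[0][0] + cm[1][1]) / len(y_true)   # correct predictions / all images
print(cm)         # [[3, 1], [1, 3]]
print(accuracy)   # 0.75
for c, (p, r, f) in per_class_metrics(cm).items():
    print(c, round(p, 2), round(r, 2), round(f, 2))
```

Precision for a class is the share of images predicted as that class that truly belong to it, recall is the share of that class's images the model recovered, and F1 is their harmonic mean; in this toy case all three equal 0.75 for both classes.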