Using Sentiment Analysis and Machine Learning Algorithms To Determine Citizens' Perceptions
Abstract—More than 400 million people in the world have diabetes. High-risk factors of diabetic individuals vary dramatically, and many patients suffer complications and avoidable harm. Improving the identification level of high-risk factors would help to reduce the rate of complications. To do this, it is essential to analyze a person's medical record, detailed health information whose review currently requires doctors and is manual, time-consuming, and subjective. In this work, we introduce an approach to automatically predict type 2 diabetes mellitus (T2DM) by applying a neural network. The objective of this paper is to find which type of model works best for predicting diabetes. We used the Pima Indian Diabetes data-set in this analysis. The analysis was carried out on this database using two methods. The first method includes data recovery followed by feature selection; we input these features to the MLP neural network classifier, which achieved an accuracy of 85.15%. In our second approach, we applied a k-means-based noise reduction method followed by feature selection. The features thus obtained are used with Random Forest, Logistic Regression, and MLP neural network classifiers. The maximum accuracy obtained among these classifiers is 77.08%. The comparison shows why data recovery with the MLP is far better than k-means-based noise reduction with the different types of classifiers.

I. INTRODUCTION

Diabetes Mellitus (DM) is a significant public health problem that is approaching epidemic proportions globally [1]. It has notably increased in the 21st century. Diabetes is caused by several factors, including obesity, consumption of unhealthy food, heredity, etc. As of 2015, about 415 million people had diabetes worldwide, and the trend suggests that the rate will continue to rise. Diabetes has some serious long-term complications, including cardiovascular disease, stroke, chronic kidney disease, foot ulcers, and damage to the eyes. For these reasons, researchers need to put more focus on this problem.

There are three kinds of diabetes. First, Type 1 DM, which is caused by a failure of the pancreas to produce sufficient insulin. Second, Type 2 DM, the most common form, whose causes are identified as excessive body weight and insufficient exercise. Third, gestational diabetes, which occurs in pregnant women with no prior history of diabetes. Type 2 DM makes up about 90% of the cases.

Data analysis has been successfully applied to various fields of human society, such as weather prognosis, market analysis, engineering diagnosis, and customer relationship management. However, the utilization of disease prediction and medical data analysis still has room for improvement. Every hospital possesses different kinds of necessary medical information, and it is essential to extract useful information from these data to support future medical analysis and diagnosis [2, 3]. It is rational to believe that several valuable patterns are waiting for researchers to examine them.

As the number of diabetes patients is increasing, it is necessary to build a model that can classify patients with a high risk of diabetes. In the future, the identified high-risk factors could potentially prevent more cases of diabetes.

II. PIMA INDIAN DIABETES DATABASE

The Pima Indian data-set [2] was originally obtained from the database of the National Institute of Diabetes and Digestive and Kidney Diseases of the United States. The objective of the data-set is to diagnostically predict whether or not a patient has diabetes, based on specific diagnostic measurements included in the data-set. Several constraints were placed on the selection of these instances from a more extensive database. In particular, all patients here are females of Pima Indian heritage who are at least 21 years old. The information consists of 768 patients (268 instances of class 1 and 500 instances of class 0) coming from a population near Phoenix, Arizona, USA; 1 and 0 indicate whether or not the patient has diabetes, respectively. Each instance comprises 8 attributes, which are all numeric.

The data-set consists of several medical predictor variables and one target variable, the outcome. The predictor variables available in the database include the number of times pregnant (preg), plasma glucose concentration at 2 h in an oral glucose tolerance test (plas), diastolic blood pressure (pres), triceps skinfold thickness (skin), 2-h serum insulin (insu), body mass index (bmi), diabetes pedigree function (pedi), age, and the class variable (class).
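To make the attribute layout concrete, the following is a minimal sketch of loading the data-set with pandas. The file name diabetes.csv is an assumption and may differ depending on where the data-set is obtained; the short column labels are those listed above.

import pandas as pd

# Assumed local copy of the Pima Indian Diabetes data-set.
COLUMNS = ["preg", "plas", "pres", "skin", "insu", "bmi", "pedi", "age", "class"]

# Many distributions ship a header row; header=0 skips it and
# names=COLUMNS applies the short labels used in this paper.
df = pd.read_csv("diabetes.csv", header=0, names=COLUMNS)

print(df.shape)                    # expected: (768, 9)
print(df["class"].value_counts())  # expected: 500 negative (0), 268 positive (1)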
III. RELATED WORKS

Artificial intelligence (AI) techniques are used today to enhance and improve our regular lifestyle. Uses of AI techniques span modeling and analysis for hemoglobin level identification [3]–[7], activity detection [8], [9], and pain level detection [10], [11], as well as prediction models to identify the high-risk factors of diabetes, including [12]–[15].
Kamer Kayaer et al. [12] used the PID dataset to evaluate a perceptron-like general regression neural network (GRNN). This study had 576 cases in the training set and 192 cases in the test set. Using the 576 training instances, the sensitivity and specificity of their algorithm were 80.21% on the remaining 192 instances. The same number of random training and test sets was used to compare the simulation results. Dilip Kumar et al. [13] used naive Bayes with a genetic algorithm to evaluate the prediction; the accuracy and specificity of their algorithm were 78.69% on the test set. Manjeevan S. et al. [14] used a fuzzy min-max (FMM) neural network to evaluate the model, and the accuracy of the algorithm was 78.39%. Hayashi, Y., & Yukita, S. (2016) [15] used the Recursive-Rule extraction algorithm with J48graft combined with sampling selection techniques and achieved 83.83% accuracy.
IV. METHODOLOGY

We introduce a rich analysis of the features available in the PIMA database to identify the risk factors of diabetes. We develop two approaches leveraging machine learning algorithms, which to our knowledge have not been previously studied. We show that our method is able both to efficiently detect the risk factors and to significantly outperform previous work on risk factor detection for diabetic diseases. Our first method is accomplished by applying a neural network model. The second process is developed based on the K-means algorithm.
A. Neural network-based method

We use a neural network model in this process. There are three steps in this section: data recovery, feature selection, and the M.L.P. classifier. As a first step, data recovery techniques are applied by replacing the missing data with the mean value to make the dataset complete for building a model. Then, we perform feature selection, which is done to find the features that have the most impact on risk factor identification. Lastly, a suitable set of hyper-parameters is selected that works well for this data-set.
B. K-means-based method

We apply the K-means algorithm in this section after selecting the features. We also use different machine learning classifiers to compare the outcomes. The k-means algorithm effectively reduces noise from the data, and the output of the k-means algorithm is used as a feature for the model. The classification methods are applied to the selected features to see the result.
V. APPLYING DATA RECOVERY WITH NEURAL NETWORK MODEL

In this process, we apply data processing, feature selection, and a machine learning algorithm sequentially.

A. Data recovery

The Pima Indian dataset contains missing data in several features, including blood pressure, insulin level, skin thickness, BMI, and glucose levels. We observe zero entries in 374 insulin, 227 skin thickness, 35 blood pressure, 11 BMI, and five glucose records. Since the missing data affect the data analysis process and mislead the prediction results, a model built with these data would be misleading. There are different methods to preprocess the data: for example, we can delete the affected records from the data set, replace the missing data with the mean value, or replace the missing data with the most likely value of the feature.

The Pima Indian dataset is minimal, only 768 samples. Therefore, the model can end up being highly biased if the incomplete training data are deleted, so removing observations from the training set is not a good idea. Different people have different insulin levels, and if we replace a large number of values with the most likely value, we may face high variance in the data. So the best option is replacing the missing values with the mean value of that particular attribute.

As a part of preprocessing, we first replace the missing data with a NaN value. Here, NaN is used for replacing the numerical missing value with a string. Afterward, we iterate over the column and find the sum of all the numerical values (NaN is a string value, so it does not add up here). Then, we calculate the mean by dividing the total summation by the number of entries in the column. Finally, we replace the NaN string with the numeric mean value.
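The following is a minimal sketch of this imputation step, assuming the pandas DataFrame df from the earlier loading sketch; it uses pandas' NaN-aware mean rather than a literal "NaN" string, which is equivalent to the procedure described above.

import numpy as np
import pandas as pd

# Columns where a zero entry actually means "missing".
MISSING_IF_ZERO = ["plas", "pres", "skin", "insu", "bmi"]

def recover_missing(df: pd.DataFrame) -> pd.DataFrame:
    """Replace zero placeholders with NaN, then fill each column with its mean."""
    recovered = df.copy()
    for col in MISSING_IF_ZERO:
        recovered[col] = recovered[col].replace(0, np.nan)
        # .mean() skips NaN values, so only the observed entries contribute.
        recovered[col] = recovered[col].fillna(recovered[col].mean())
    return recovered

df_recovered = recover_missing(df)   # df as loaded in Section II
print(df_recovered.isna().sum())     # every column should now report zero missing values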
B. Feature selection

Not all of the extracted features carry significant weight, and some of them do not have any impact on the prediction model. Using such features to train the model only adds computational cost. Fig. 1 contains the skin thickness histogram for all the patients.

Fig. 1. Skin thickness graph for the Pima Indian data set. The X-axis contains skin thickness; the Y-axis contains the number of patients that are diabetes positive (red) and diabetes negative (blue).

The first block of the histogram contains roughly the same number of elements from both classes (diabetes positive and diabetes negative), and it maintains this ratio as the skin thickness increases. So this cannot be an essential feature for the model.

The data in Fig. 2 and Fig. 3 show that when the glucose and B.M.I. levels increase, the risk of diabetes rises significantly. So these features provide a useful linear relationship between the outcome and the B.M.I. or glucose level.

Fig. 2. B.M.I. feature graph from the Pima Indian data set. The X-axis contains B.M.I.; the Y-axis contains the number of patients that are diabetes positive (red) and diabetes negative (blue).

Fig. 3. Glucose level graph for the Pima Indian data set. The X-axis contains glucose level; the Y-axis contains the number of patients that are diabetes positive (red) and diabetes negative (blue).
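Per-class histograms of this kind can be reproduced with matplotlib. This is a minimal sketch assuming the df_recovered DataFrame from the data recovery step; it is not the exact script behind Figs. 1–3.

import matplotlib.pyplot as plt

def plot_feature_by_class(df, feature, bins=20):
    """Overlay histograms of one feature for the positive and negative classes."""
    negative = df.loc[df["class"] == 0, feature]
    positive = df.loc[df["class"] == 1, feature]
    plt.hist(negative, bins=bins, alpha=0.6, color="blue", label="diabetes negative")
    plt.hist(positive, bins=bins, alpha=0.6, color="red", label="diabetes positive")
    plt.xlabel(feature)
    plt.ylabel("Number of patients")
    plt.legend()
    plt.show()

for feature in ["skin", "bmi", "plas"]:   # skin thickness, B.M.I., glucose
    plot_feature_by_class(df_recovered, feature)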
We use the Greedy Stepwise Search Algorithm to select the critical attributes. The algorithm iterates through each attribute, or set of attributes, to calculate which one gives the minimum error. The steps of the feature selection algorithm are as follows (a sketch of this procedure is given after the steps):

1) Pick a dictionary of features h0(x), ..., hD(x)
   • e.g., polynomials for linear regression
2) Greedy heuristic:
   i. Start with an empty set of features F0 = Ø (or a simple set, like just h0(x) → yi + ε)
   ii. Fit the model using the current feature set Fi to obtain the weights wi
   iii. Select the next best feature hj(x)
      • e.g., the hj(x) resulting in the lowest training error when learning with Fi + hj(x)
   iv. Set Fi+1 ⇐ Fi + hj(x)
   v. Recurse
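The sketch below illustrates this greedy forward-selection loop under stated assumptions: scikit-learn provides the scoring model, the candidate "dictionary" is simply the raw columns of df_recovered, and cross-validated accuracy stands in for the training-error criterion described above.

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def greedy_stepwise_select(df, candidates, target="class", max_features=4):
    """Greedily add the feature that most improves cross-validated accuracy."""
    selected, remaining = [], list(candidates)
    y = df[target]
    while remaining and len(selected) < max_features:
        scores = {}
        for feature in remaining:
            X = df[selected + [feature]]
            model = LogisticRegression(max_iter=1000)
            scores[feature] = cross_val_score(model, X, y, cv=5).mean()
        best = max(scores, key=scores.get)   # feature giving the lowest error / highest accuracy
        selected.append(best)
        remaining.remove(best)
    return selected

print(greedy_stepwise_select(df_recovered,
                             ["preg", "plas", "pres", "skin", "insu", "bmi", "pedi", "age"]))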
By analyzing the different graphs and investigating which features affect the outcome the most, four features were selected. Those features are given below:
• Glucose
• B.M.I.
• Diabetes Pedigree Function
• Age

TABLE I
SAMPLE DATA RESULTING FROM APPLYING THE GREEDY STEP-WISE ALGORITHM

Glucose   BMI    Diabetes Pedigree Function   Age   Outcome
148       33.6   0.62                         50    1
85        26.6   0.35                         31    0
183       23.3   0.67                         32    1
89        28.1   0.16                         21    0
137       43.1   2.28                         33    1
C. Multilayer Perceptron Classifier

A multilayer perceptron (MLP) is a class of feed-forward artificial neural network. We use this algorithm because MLPs are used in research for their ability to solve problems stochastically, which often allows approximate solutions for extremely complex problems such as fitness approximation.

Learning occurs in the perceptron by changing the connection weights after each portion of data is processed, based on the amount of error in the output compared with the expected result. This is an example of supervised learning and is carried out through back-propagation, a generalization of the least-mean-squares algorithm for the linear perceptron. We represent the error at output node j for the nth data point (training example) by

e_j(n) = Y_j(n) − a(n)                                   (1)

where Y_j(n) is the expected value and a(n) is the value produced by the network, and the total error for that data point by

E(n) = (1/2) Σ_j e_j^2(n)                                (2)
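As a small illustration of how an error of this form drives a weight update, the following sketch implements the least-mean-squares (delta) rule for a single linear output node; back-propagation generalizes this idea to the hidden layers of an MLP. The learning rate and the toy data are assumptions for illustration only.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))                 # toy inputs (100 samples, 4 features)
true_w = np.array([0.5, -1.0, 2.0, 0.3])
y = X @ true_w + 0.1 * rng.normal(size=100)   # toy targets

w = np.zeros(4)
eta = 0.01                                    # assumed learning rate
for n in range(len(X)):
    e_n = y[n] - X[n] @ w                     # error e(n), cf. Eq. (1)
    w += eta * e_n * X[n]                     # LMS update: reduces (1/2) e(n)^2, cf. Eq. (2)

print(w)  # should approach true_w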
There are many hyper-parameters for the MLP classifier, such as alpha, the hidden-layer sizes, the solver, the learning-rate decay, etc. To find the best model, different combinations of these hyper-parameters are tried randomly and iteratively. At first the model gets lower accuracy due to a high bias problem, since it gives pretty much the same test and training accuracy. Some solutions for high bias are given below:
• Build a bigger network
• Train for a longer period
• Search for a different NN (neural network) architecture

The appropriate choices are options one and three, because the training set is minimal, so training for a longer time is not a very effective way to remove the high bias problem. Analyzing the Pima dataset, we found values for the following parameters: we selected LBFGS as the optimizer (an optimizer in the family of quasi-Newton methods), alpha = 1e-5, and hidden layer sizes = (15, 7, 7, 3). In the hidden layers, the first layer has 15 nodes (neurons), the second layer has seven neurons, the third layer has seven neurons, and the fourth layer has three neurons.
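These hyper-parameters map onto scikit-learn's MLPClassifier. The sketch below is one plausible configuration under the assumptions that the four selected features come from df_recovered and that a simple held-out split is used; the exact split behind the reported accuracies is not specified here.

from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

FEATURES = ["plas", "bmi", "pedi", "age"]   # Glucose, B.M.I., pedigree function, Age
X = df_recovered[FEATURES]
y = df_recovered["class"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

mlp = MLPClassifier(solver="lbfgs", alpha=1e-5,
                    hidden_layer_sizes=(15, 7, 7, 3),
                    max_iter=2000, random_state=42)
mlp.fit(X_train, y_train)

print("Training accuracy:", mlp.score(X_train, y_train))
print("Test accuracy:", mlp.score(X_test, y_test))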
We apply the M.L.P. classifier to these features and observe the accuracy levels shown in Table II.

TABLE II
M.L.P. CLASSIFIER ACCURACY

Algorithm                         Accuracy
Training Set: M.L.P. Classifier   86.73%
Test Set: M.L.P. Classifier       85.15%
VI. APPLYING K-MEANS WITH DIFFERENT MACHINE LEARNING MODELS

A. K-means Algorithm

Cluster analysis aims at partitioning the observations into disparate clusters so that observations within the same cluster are more closely related to each other than those assigned to different clusters [14]. Fig. 4 shows the procedure of the K-means clustering algorithm on the Pima Indian data-set.

Fig. 4. Visualizing the k-means algorithm for the Pima Indian data-set.
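A minimal sketch of using k-means in this role is given below, assuming scikit-learn and the df_recovered DataFrame from Section V; the cluster label assigned to each patient is appended as an extra feature, and the choice of two clusters is an assumption rather than a value stated in this paper.

from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

feature_cols = [c for c in df_recovered.columns if c != "class"]
X_scaled = StandardScaler().fit_transform(df_recovered[feature_cols])

# Assumed k=2; the cluster index itself becomes a new "cluster" feature.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=42)
df_recovered["cluster"] = kmeans.fit_predict(X_scaled)

print(df_recovered[["plas", "bmi", "cluster", "class"]].head())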
B. Feature selection

After clustering, we again apply the greedy stepwise search algorithm to find the most useful features; it takes the set of features that gives the minimum error rate. The selected features are:
• Pregnancies
• Glucose
• B.M.I.
• Age
• Diabetes Pedigree Function
• Cluster (output of the k-means algorithm)

C. Classifier

We apply several different kinds of classifiers to examine which method works well. Different classifiers, such as the decision tree, MLP, and logistic regression, take different approaches to evaluating the model. Table III contains the results of the different classifiers.

TABLE III
THE RESULTS OF THE DIFFERENT CLASSIFIERS

Algorithm             Accuracy
Logistic Regression   77.08%
M.L.P. Classifier     75.39%
Random Forest         75.00%
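The comparison in Table III can be reproduced in outline with the sketch below. It assumes the cluster-augmented features from the previous step and scikit-learn's cross_val_score; the exact evaluation protocol (split sizes, folds) behind the reported numbers is not specified in this paper.

from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

SELECTED = ["preg", "plas", "bmi", "age", "pedi", "cluster"]
X = df_recovered[SELECTED]
y = df_recovered["class"]

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "M.L.P. Classifier": MLPClassifier(solver="lbfgs", alpha=1e-5,
                                       hidden_layer_sizes=(15, 7, 7, 3),
                                       max_iter=2000, random_state=42),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

# Report mean cross-validated accuracy for each classifier.
for name, model in models.items():
    acc = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: {acc:.4f}")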