
Data Mining Techniques For Medical Data A Review PDF

This document provides a review of data mining techniques used for medical data. It discusses how data mining can be used to extract useful knowledge and patterns from large medical databases. Different data mining tasks that are commonly used for healthcare applications are described, including summarization, association, classification, clustering, trend analysis, and regression. The uniqueness of medical data is also discussed.

Uploaded by

Apurva Hagawane
Copyright
© © All Rights Reserved

See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/318130038

Data Mining Techniques for Medical Data: A Review

Conference Paper · November 2016


DOI: 10.1109/SCOPES.2016.7955586

Citations: 3; Reads: 3,972

1 author:

Dr. Subhash Chandra Pandey


Birla Institute of Technology, Mesra
27 publications, 32 citations



All content following this page was uploaded by Dr. Subhash Chandra Pandey on 11 October 2017.



International Conference on Signal Processing, Communication, Power and Embedded System (SCOPES)-2016

Data Mining Techniques for Medical Data: A Review


Subhash Chandra Pandey
Computer Science & Engineering Department
Birla Institute of Technology, Ranchi - Allahabad Campus
Naini, Allahabad (UP), India
[email protected]

Abstract— Data mining is an important area of research and is pragmatically used in different domains such as finance, clinical research, education, and healthcare. Further, the scope of data mining in healthcare, an active interdisciplinary area of research, has been thoroughly reviewed and surveyed by many researchers. Indeed, extracting knowledge from medical data is a challenging and complex endeavor. The main motive of this paper is to review data mining in the purview of healthcare. Moreover, the intertwining and interrelation of previous research are presented in a novel manner. Furthermore, the merits and demerits of data mining techniques frequently used in the domain of healthcare and medical data are compared. The use of different data mining tasks in healthcare is also discussed, and an analytical approach regarding the uniqueness of medical data in healthcare is presented.

Keywords— Medical data; Data mining tasks; Data mining applications on medical data style; Data mining techniques; Uniqueness of medical data

Fig. 1: Role of data mining in knowledge discovery process (Raw data → Selection → Data Warehouse → Preprocessing → Prepared Data → Data Mining → Pattern → Interpretation/Evaluation → Knowledge)

I. INTRODUCTION

Medical data refers to databases that store healthcare information, such as patient records. With the development of information technology, much of this medical data is stored in electronic form, and these databases contain large volumes of data. Medical data are available from different sources, for example X-ray, computed tomography (CT) scans, magnetic resonance imaging (MRI), and ultrasound. Thus, the volume of data, and of the databases required to store the digitized data, has increased exponentially [1]. Further, raw medical data are usually huge and heterogeneous, and may be collected from different sources such as images, interviews with the patient, laboratory data, and the physician's observations and evaluations [2]. Medical data are of various types: they can take the form of images, datasets, signals, wavelengths, etc. At present, owing to research and development in information-gathering tools, a huge amount of information is available in electronic format, and storing such a large amount of data substantially increases the size of databases [3].

Medical data are available in hundreds of public and private databases, which has only been made possible by novel database technologies and the Internet [4]. It has been estimated that the healthcare industry may generate terabytes of data every year [5]. Extracting useful information for quality healthcare is both tricky and important: nowadays a great deal of data is available in our databases for this purpose, yet the knowledge extracted from it is nearly negligible. Thus, effective organization, analysis, and interpretation of data are of paramount importance for tangible knowledge extraction. In fact, different computational techniques are required to manage these large medical databases and to discover useful patterns and hidden knowledge in them [4]. In the data mining process, we often analyze enormous observational datasets and subsequently extract useful hidden patterns for the purpose of data classification. Today, data mining has also begun its tryst with healthcare and medical data, because there is a dire need for efficient techniques that detect unknown and valuable hidden information in medical data [6], so that the complex interrelations among patients, their medical conditions, and treatments can be analyzed lucidly [7]. The use of data mining in healthcare and medicine is pervasive, with applications such as detecting fraud in health insurance, providing better medical solutions to patients at a lower cost, detecting diseases and their causes, and identifying efficient medical treatment methods. Indeed, data mining is a core step of a broader process known as
knowledge discovery. The interrelation between data mining and knowledge discovery is shown in Figure 1.

II. DATA MINING TASKS AND THEIR USE IN HEALTHCARE

Data mining models vary from one application domain to another. However, they can be broadly categorized into two groups, namely predictive models and descriptive models. Some important data mining tasks pertaining to the medical and healthcare domain are enumerated below:

Summarization
Association
Classification
Clustering
Trend analysis
Regression

(i) Summarization: In summarization, a set of data is abstracted into a smaller set that gives a general overview of the data. Thus, summarization is the abstraction or generalization of the data. Summarization can be performed at many levels of abstraction and viewed from different perspectives. For example, rather than looking at the details of each call, calls can be summarized by duration, number of calls, and cost incurred. In the same way, calls can also be summarized as national or international calls. These combinations of different levels of abstraction reveal the various patterns and regularities present in the data [8].

(ii) Association: Association looks for the co-occurrence or connection of objects in large databases, and such a connection is expressed as an association rule. An association reveals relationships among objects. Its main purpose is to find interesting correlations among objects, i.e., whether the presence of one set of objects implies the presence of another [9]. Association rules are usually used in marketing, commodity management, advertising, etc. From these rules, associations and patterns among various attributes are extracted. Indeed, association-based data mining aims to find associations between attributes and then generate rules from those data sets [10]. For example, the association rule that “call waiting” is associated with “call display” says that if a customer subscribes to the “call waiting” service, that customer is very likely to subscribe to the “call display” service as well.

(iii) Classification: Classification divides data sets into target classes, and classification techniques predict the target class of each data instance. For example, using classification techniques a patient can be classified as “high risk” or “low risk” on the basis of their disease patterns. In this approach the classes are known in advance, so it is a kind of supervised learning. There are two kinds of classification task: binary and multiclass. In a classification task, the dataset is divided into training and testing sets. The classifier is trained on the training set, and its correctness is subsequently tested on the test set. The classification task of data mining is widely used in healthcare industries [6]; for example, it is often used to predict the treatment cost of different diseases [11].

(iv) Clustering: There is a subtle difference between classification and clustering. Classification is a supervised learning method, whereas clustering is unsupervised. In classification the class labels are known, but in clustering they are not. In clustering, similar data are placed in the same cluster and dissimilar data are placed in other clusters [12]. Clustering needs little or no prior information to partition the data. The drawback of clustering is that the clusters must first be identified before a new instance can be assigned to one of them [13].

(v) Trend analysis: A lot of time-dependent data can be observed in different walks of life: the sales of a company, the credit card transactions of a customer, and stock prices are all time series data. Such data can be viewed as objects with a ‘time’ attribute. It is interesting to find patterns and regularities in the data along the time dimension, and trend analysis discovers these patterns [9].

(vi) Regression: Regression is learning a function that maps a data item to a real-valued prediction variable [14]. Indeed, regression establishes a relationship between a dependent variable, whose value is to be estimated, and one or more known independent variables. Regression is a widely used technique for prediction.

A. Data Mining for Healthcare

In healthcare industries, dependence on data is increasing day by day [15]. In medical science, the diagnosis of disease and the treatment of patients are the most important tasks. In recent years, doctors' handwritten notes have been converted to electronic records with the aim of reducing treatment costs and improving the efficiency of treatment [16].

Data mining applications in healthcare can be further divided into the following categories:

a. Diagnosis and prediction of diseases – In healthcare industries, the diagnosis and prognosis of diseases are very important [17], and this is one of the most important purposes of using data mining in healthcare. Data mining has helped doctors improve the health services they provide [15]. Choosing an incorrect treatment for a patient wastes time and money and can also harm the patient's health [18].

b. Ranking of various hospitals – Data mining techniques are used to study the details of various hospitals in order to rank them [19]. Organizations rank hospitals on the basis of their capability to handle patients with serious illness, i.e., higher-ranked hospitals are more suitable for handling high-risk patients because this is their highest priority, whereas this is not the case in lower-ranked hospitals because they do not even consider the risk factor.

c. Better treatment techniques – With the help of data mining
techniques, both the doctor and the patient can choose the best treatment option by comparing all the available treatment techniques, in terms of both effectiveness and cost. Through data mining they can also discover the side effects of various treatments and thus decrease the risk to patients [6].

d. Effective treatments – By comparing factors such as causes, symptoms, side effects, and the cost of treatments, data mining is used to analyze the effectiveness of treatments. For example, one can compare the outcomes of different patients who suffered from the same disease but were treated with different drugs. In this way, we can find which treatment is effective in terms of the patient's health and cost [20].

e. Better quality services provided to patients – With the advancement of technology, voluminous data are already stored in digitized form. Data mining applied to this huge medical data can help extract many interesting, previously unknown patterns, which can be used to improve the quality of the services and care provided to patients. Data mining also helps in understanding patients' needs and requirements so that they can be treated better [6]. Milley has also stated that data mining can help analyze specific patients' needs in order to enhance the services provided by healthcare organizations [21].

f. Infection control in hospitals – Hospital infections affect millions of patients every year, and the number of drug-resistant infections is very high [22]. Data mining is used to inspect infection-control data for irregular patterns [15], which are then studied further by a knowledgeable person. Such a surveillance system, using data mining techniques to discover unknown patterns in infection-control data, was implemented at the University of Alabama [23].

g. Identifying high-risk patients – American Healthways helps hospitals with diabetes disease management services to improve the quality and reduce the cost of care for diabetic patients. To differentiate between high-risk and low-risk patients, American Healthways used a predictive modeling technique, with which healthcare providers identified the high-risk patients who needed more attention regarding their health [24].

h. Reduction in insurance fraud and abuse – Healthcare insurers construct models to identify unusual patterns of claims by patients, physicians, hospitals, etc. [25]. In 1998, the Texas Medicaid Fraud and Abuse Detection System saved millions of dollars by detecting fraud and abuse through data mining techniques [26].

i. Proper hospital resource management – Managing hospital resources is an important task in healthcare industries, and data mining can construct models for managing them. Group Health Cooperative uses data mining and provides services to hospitals at a lower cost [27]. Blue Cross manages diseases efficiently, reducing cost and improving outcomes with the help of data mining [28].

j. Medical device industry – Without medical devices, the healthcare industry could not exist. Mobile communications and inexpensive wireless bio-sensors are the most important aspect of mobile healthcare applications, which provide a safe method for monitoring patients' vital signs [29].

Ultimately, the success of data mining in healthcare depends entirely on the availability of clean and organized healthcare data. Thus, healthcare industries must also consider how to capture and store data so that it can be properly mined later [30]. The applications of data mining techniques in healthcare, along with the various data mining tasks, are shown diagrammatically in Figure 2.

Fig. 2: Data mining tasks and applications in healthcare
Data Mining Tasks: Association, Classification, Clustering, Trend Analysis, Regression, Summarization
Data Mining Applications in Health Care: Hospital Resource Management, Fraud Detection, Identify High Risk Patients, Infection Control, Better Services, Effective Treatments, Better Treatment Techniques, Diagnosis of Disease, Medical Device Industry

III. KNOWLEDGE MANAGEMENT AND DATA MINING IN HEALTHCARE

Medical healthcare has recently been gaining increasing attention and popularity. Owing to advances in technologies such as molecular and biomedical techniques, medical imaging, and electronic patient records, a large amount of medical data is generated every day. From clinical practice to individual research, these medical data are being stored in hundreds of private and public databases following the digitization of medical information such as patient records and lab reports. Today, the rate of data accumulation far exceeds the rate at which information is extracted from the data. Thus, these data need to be well organized and stored in order to be useful, and new information technology techniques are required to handle these large repositories of medical data and to extract useful patterns from them. Indeed, knowledge management and data mining have been adopted in various medical domains in recent years.
In the 20th century, management together with psychology and the cognitive sciences led to the evolution of knowledge management [31]. The term ‘knowledge management’ came into existence in the 1980s, and the academic discipline was developed in 1995 [32]. Knowledge management is the managerial approach of collecting, managing, using, analyzing, sharing, and discovering knowledge in order to maximize performance [33]. There is no agreed definition of what constitutes knowledge, but it is something abstract and inferential that is needed to support hypothesis generation and decision making. Recent studies have shown that knowledge management has positive effects on organizational and operational performance [34, 35]. The knowledge management model proposed in [36] gave substantial insight into the healthcare industry and showed that knowledge management processes lead to better organizational learning and decision making, which in turn lead to better organizational performance. Knowledge management methodologies and techniques have been used to support the storage, retrieval, sharing, and management of data in order to make biomedical knowledge explicit. Recently, knowledge management has been used in both scientific and business domains. Companies have many goals for knowledge management and face many challenges in it, because knowledge management can increase their performance, help evaluate risks, help develop partnerships, organize management, and enhance their economic value [37]. Knowledge management has also been criticized by T.D. Wilson [38]. However, knowledge management could withstand these criticisms, mainly because companies and organizations really need it.

Methods and techniques in knowledge management can be categorized into three areas: people and technology, requirements elicitation, and measurement of value. Today's frameworks take both human and technical perspectives into account. The human perspective concerns motivation and adoption. Employees are motivated to use knowledge management through financial or non-financial incentives, not only for the sake of technology but also because it affects the company. In [39], it is suggested that, apart from incentives, there should be a win-win reward system for both the employee and the company rather than a win-lose one. Another issue related to motivation was knowledge adoption, since people were not ready to use knowledge management; in [40], a model addressing the issues of knowledge adoption is proposed. Data mining is a core step of the broader process known as knowledge discovery, and it is used in different domains, e.g., to discover biological, drug, and patient-care knowledge. It is also used for the statistical analysis of patterns. Data mining is a frequently used technique in medicine [27]. The basic objective of data mining is to analyze a set of raw data and to identify and extract novel and useful patterns [41]. Various data mining techniques, such as neural networks, decision trees, fuzzy sets, support vector machines, Bayesian networks, and genetic algorithms, are used to discover knowledge and patterns that are not known to the system or the users [42, 33]. In biomedical data mining, patient data should not be ‘individually identifiable’, i.e., no record should contain enough data about a patient for anyone to identify that patient [2].

IV. DATA MINING TECHNIQUES FOR HEALTHCARE

Data mining uses various techniques for mining medical data. In fact, data mining techniques are also used for feature selection, which can be described as the process of selecting a minimum subset of features that are actually essential for classification. The full feature set may be redundant, which can decrease efficiency, and feature selection is a recognized problem in the field of medical diagnosis [43]. Feature subset generation is also known as data reduction, which is a step in data preprocessing [44]. Further, feature selection minimizes the number of features required while maximizing the accuracy of the model, which reduces the space required by the feature set.

It also removes redundancy and noise that might be present in the feature set, and thus increases the efficiency of the data mining algorithm [45]. The objective of feature selection is to produce a cost-effective and efficient model [46].

Fig. 3: Process of feature selection (Feature Set → Searching → Subset → Evaluating the Subset → Selection Criteria; if No, another subset is generated; if Yes, the Subset Feature is output)

Fig. 3 shows the complete feature selection process. It mainly consists of four stages: subset formation, evaluation of the subset, a selection criterion used as the stopping criterion, and the final feature subset [44]. In the first step, the feature set is searched after eliminating inconsistencies, such as null values, and any redundancies that are present. Subset generation then starts from this search of the feature set. Subsequently, an attribute evaluator evaluates each generated subset [47]. Subset generation and evaluation continue until the selection/stopping criteria are fulfilled; only then is the final feature subset selected.
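The four stages of feature selection can be made concrete in a few lines of code. The Python sketch below is a minimal illustration, not the method of [44] or [47]. The paper names neither a search strategy nor an attribute evaluator, so the sketch makes two assumptions:

- greedy forward search for subset generation;
- a hypothetical evaluator, the leave-one-out accuracy of a nearest-centroid classifier.

All function names (`clean`, `drop_duplicate_features`, `centroid_accuracy`, `forward_select`) and the toy records are illustrative.

```python
import math
from collections import defaultdict
from typing import Callable, List, Optional, Sequence, Tuple

Row = List[Optional[float]]
# Hypothetical evaluator signature: (rows, labels, feature subset) -> score
Evaluator = Callable[[Sequence[Row], Sequence[str], Sequence[int]], float]


def clean(rows: Sequence[Row], labels: Sequence[str]) -> Tuple[List[Row], List[str]]:
    """Step 1a: drop records containing null values (the 'inconsistencies')."""
    kept = [(r, y) for r, y in zip(rows, labels) if all(v is not None for v in r)]
    return [list(r) for r, _ in kept], [y for _, y in kept]


def drop_duplicate_features(rows: Sequence[Row], features: Sequence[int]) -> List[int]:
    """Step 1b: drop features whose column exactly repeats an earlier one (redundancy)."""
    seen, keep = set(), []
    for f in features:
        column = tuple(r[f] for r in rows)
        if column not in seen:
            seen.add(column)
            keep.append(f)
    return keep


def centroid_accuracy(rows: Sequence[Row], labels: Sequence[str], subset: Sequence[int]) -> float:
    """Hypothetical attribute evaluator: leave-one-out accuracy of a
    nearest-centroid classifier that only sees the features in `subset`."""
    correct = 0
    for i, (row, label) in enumerate(zip(rows, labels)):
        groups = defaultdict(list)
        for j, (other, other_label) in enumerate(zip(rows, labels)):
            if j != i:  # leave record i out when computing the centroids
                groups[other_label].append([other[f] for f in subset])
        centroids = {cls: [sum(col) / len(col) for col in zip(*pts)] for cls, pts in groups.items()}
        x = [row[f] for f in subset]
        predicted = min(centroids, key=lambda cls: math.dist(x, centroids[cls]))
        correct += int(predicted == label)
    return correct / len(rows)


def forward_select(rows: Sequence[Row], labels: Sequence[str], features: Sequence[int],
                   evaluate: Evaluator, min_gain: float = 1e-9) -> Tuple[List[int], float]:
    """Subset generation (greedy forward search), subset evaluation and a
    stopping criterion: stop when no single added feature improves the score."""
    selected: List[int] = []
    best = 0.0
    while True:
        candidates = [f for f in features if f not in selected]
        if not candidates:
            break
        score, feature = max((evaluate(rows, labels, selected + [f]), f) for f in candidates)
        if score - best <= min_gain:  # stopping criterion fulfilled
            break
        selected.append(feature)
        best = score
    return selected, best


if __name__ == "__main__":
    data = [
        [1.0, 5.0, 1.0], [1.2, 1.0, 1.2], [0.9, 3.0, 0.9],
        [5.0, 4.0, 5.0], [5.2, 2.0, 5.2], [4.8, 5.0, 4.8],
        [3.0, None, 3.0],  # record with a null value
    ]
    classes = ["low", "low", "low", "high", "high", "high", "low"]
    data, classes = clean(data, classes)
    feats = drop_duplicate_features(data, [0, 1, 2])  # feature 2 repeats feature 0
    subset, score = forward_select(data, classes, feats, centroid_accuracy)
    print(feats, subset, score)  # prints: [0, 1] [0] 1.0
```

Greedy forward search evaluates at most n(n+1)/2 subsets for n features instead of all 2^n, at the cost of possibly missing the best subset. Another search strategy, such as backward elimination or a genetic algorithm, could replace `forward_select` without changing the other steps.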
A. Neural Networks

Neural networks were developed in the first half of the 20th century [48]. In medicine, neural networks are one of the most popular data modeling algorithms. Before the invention of decision trees and support vector machines, neural networks were considered the best classification algorithms [49]. The main objective of using neural networks is pattern recognition and classification [50]. A neural network system is modeled on the human brain, which consists of billions of interconnected neurons. In a similar way, a neural network is an interconnection of artificial neurons, and each connection has an associated weight. Owing to its adaptive nature, the network minimizes its error by adjusting these weights [3]. The neurons work together in parallel to produce the output function. In the learning phase, the network learns by adjusting the weights so as to predict the correct class label of the input. Neural networks have the added advantage that, unlike simple modeling methods, they can model nonlinear relationships [51]. Neural networks play an important role in the analysis of medical data; their applications in this field include tissue classification, disease prediction, and drug development. For example, heart disease can be predicted with the help of a neural network [52]. A few neural network architectures are enumerated below:

i. Multi Layer Neural Network (MLNN): This type of neural network uses hidden layers to solve classification problems for non-linearly separable sets [53]. The hidden layers are usually interpreted as hyperplanes. This kind of neural network is used for classifying different categories of data.

ii. Polynomial Neural Network (PNN): Polynomial neural networks have neuron-like units, as in multilayer perceptrons, which produce multivariate polynomial mappings.

B. Decision Tree

A decision tree has terminal (leaf) and non-terminal nodes. Each non-terminal node represents a test or condition on a data item. Decision trees classify instances by sorting them down from the non-terminal nodes to the terminal nodes [54], and the branch taken at each node depends entirely on the outcome of its test. For example, with a decision tree for medical readmission, we can decide whether or not a patient needs readmission [3]. Decision trees also create a visual representation of the pros, cons, and potential values of each option [55]. Decision trees are commonly used for calculating conditional probabilities in operations research analysis [56]. The best alternatives can be chosen with the help of decision trees: when splits are chosen by maximum information gain, each traversal from the root to a leaf node indicates a unique class separation [57]. In some applications of data mining, the accuracy of a prediction may be all that is needed, and it may not be important to know how the model works. In others, such as marketing, the model must also be understood: a marketing professional who wants to launch a campaign needs overall descriptions of customer segments. For these types of applications, the decision tree algorithm is very suitable [58].

C. Fuzzy Sets

Fuzzy sets and fuzzy logic are among the most widely used methodologies in data mining for representing and processing uncertainty, and one of the best methods for dealing with imperfect and noisy data [51]. Fuzzy set theory was introduced by Zadeh [59] and helps in handling vague data. In [60], fuzzy sets and fuzzy logic are needed to implement the proposed expert system: with the help of fuzzy logic, the degree to which a particular case belongs to each cluster is calculated, and decisions are then made based on that value.

D. Support Vector Machine (SVM)

The concept of SVM was first proposed in [61-62]. SVMs are often reported to give more accurate results than many other algorithms. SVM is a classification technique based on statistical learning theory [62-63]. With various kernels, SVM has been used as a universal approximator [64]. The subset of the training data that defines the SVM is called the support vectors, and the absence of local minima is one of the main features of SVM. The SVM model is a representation of the training data, and the support vectors form a condensed version of the data set [65]. SVM finds an optimal separating hyperplane that maximizes the margin between the examples of two different classes. SVM was developed for binary classification problems but can easily be extended to multiclass problems, which is one of the most important reasons for its popularity [66-67]. In a binary classification task, such as predicting ICU mortality, the hyperplane is the division between the two outputs. Depending on the task, an SVM can create a single hyperplane or multiple hyperplanes. There are two methods for implementing SVMs: the first involves mathematical programming and the second employs kernel functions. The purpose of the hyperplane is to maximize the separation between data points [3]. In noisy data, error is minimized by maximizing the margin between the examples of the two classes, and the hyperplane is defined as the center line of the separating space. There are two types of SVM. Linear SVMs separate the data points with a linear decision boundary and perform well on datasets that can easily be separated into two parts. However, some complex datasets are difficult to classify with a linear kernel; for these, non-linear SVMs are used, which separate the data with a non-linear decision boundary. Non-linear SVMs are described as very powerful because they can obtain maximal generalization when predicting the classification of data [45]. SVMs have shown good accuracy in binary classification problems such as heart valve and heartbeat classification [68-70].

E. Bayesian Networks

A Bayesian network is a specific type of network that represents knowledge about an uncertain domain. It belongs to the family of probabilistic graphical models (GMs). In a Bayesian network, nodes represent variables and edges represent probabilistic dependencies among those
variables [71-73]. A Bayesian network specifies two types of information for each variable [74]: its parent variables in the graph, and a conditional probability distribution for the variable given those parents.

F. Rough Set

Rough set theory is similar in concept to fuzzy set theory; the main difference is that in rough set theory, uncertainty is described as the boundary region of a set. Any subset defined through its lower and upper approximations is called a rough set. This definition requires mathematical concepts, since rough sets are defined by topological operations known as approximations. Rough sets are usually combined with other methods such as classification and clustering [51].

G. Genetic Algorithm

The genetic algorithm is a search and optimization technique based on the principles of genetics and natural selection. In data mining, genetic algorithms are mainly used with neural nets, acting as a guide for the learning process of data mining algorithms rather than for finding patterns directly. They are also used, in the form of association rules or some other formalism, to formulate hypotheses about variables and the dependencies among them. The basic idea of the genetic algorithm, expressed in schema theory, is that a much better solution can be obtained by combining the good parts of other solutions, much as nature combines the DNA of living creatures [75].

In a genetic algorithm, a population of many individuals evolves under specific selection rules toward a state in which fitness is maximized [76]. Initially, a population of rules is created at random, each rule representing a candidate solution to the problem. Pairs of rules, usually the strongest ones, are then selected as parents and combined to produce offspring [77]. A genetic algorithm basically consists of three operators: selection, crossover, and mutation. Selection chooses suitable strings for breeding a new generation on the basis of fitness; crossover combines these good strings to produce better offspring; and mutation alters a string locally so that genetic diversity is maintained from one generation of the population to the next. In every generation, the population is evaluated against the termination criteria; if they are not satisfied, the three operators are applied again and the new population is evaluated.

existence. Machine learning includes many methods, but they can broadly be classified as symbolic or sub-symbolic, based on the nature of the manipulation performed while learning [78]. For symbolic learning methods, such as decision trees, the knowledge required and the level of inference performed are different [79]. On the other hand, genetic algorithms [80] and artificial neural networks [81] are examples of sub-symbolic classification methods.

In the healthcare domain, machine learning techniques and tools can help in the diagnosis and prognosis of diseases, the prediction of disease progression, and the extraction of medical knowledge. Symbolic classification, such as inductive learning, is used to add learning and knowledge management to expert systems [82]. Machine learning tools help handle some characteristic features of the medical domain, such as missing values, random noise, or the availability of only a few patient records [83]. Sub-symbolic learning methods such as neural networks help improve decision making because they are able to handle such datasets [84]. A major application in medical diagnosis is the interpretation of medical images, which provides significant assistance [85]. Indeed, as the healthcare domain becomes more and more reliant on computer systems, machine learning methods can substantially help physicians in many cases and enable real-time diagnosis.

Apart from supporting medical decisions, machine learning improves the efficiency and quality of medical decision-making systems [86]. How well a medical expert can understand and use the results obtained from a system depends considerably on the machine learning methods used. Many researchers have worked on medical expert systems for ECG diagnosis, implementing machine learning techniques to improve the knowledge of the expert system.

VI. UNIQUENESS OF DATA MINING IN HEALTHCARE

In this section, we describe the unique features of medical data mining, with the aim of making expert systems for healthcare less constrained, specifically when mining large heterogeneous medical data, because medical data itself is very rewarding yet difficult to mine in comparison to other
datasets. The medical datasets are huge and contain large
V. MACHINE LEARNING METHODS IN HEALTHCARE amount of medical information. At the same time, medical
data also possess distinct legal, ethical, and social constraints
[2]. Precisely, there are four main points that should be
There is plethora of research in machine learning domain discussed regarding the uniqueness of medical data.
and it is mostly application driven. Machine learning
researches are widely used in healthcare domain. Machine i. Medical data is heterogeneous in nature: As we already
learning methods are able to identify areas in which an know raw medical data is voluminous and heterogeneous. It
increase in research would lead to advances. In conditions may be collected from various sources like images,
where algorithmic solutions are not present and there is lack of physician’s observations, interviews with patients, laboratory
formal codes or there is poor definition of knowledge about data. All these help in diagnosis and prognosis of diseases and
the application domain, machine learning methods come into
TABLE 1. ADVANTAGES AND DISADVANTAGES OF DIFFERENT TECHNIQUES USED IN HEALTHCARE

1. Neural Networks
Advantages:
  1. It is able to handle noisy data properly during training.
  2. It is capable of producing complex relationships between input and output. It can analyze and organize data based on its own features without any external help.
  3. Various neural networks can be used for clustering and prototype creation.
Disadvantages:
  1. It does not work well with hundreds or thousands of input features, nor for complex problems.
  2. Local minima.
  3. Overfitting.
  4. The model built by a neural network is difficult to understand, and it requires high processing time.

2. Decision Trees
Advantages:
  1. It can handle all types of variables, including variables with missing values, and it is easy to interpret.
  2. Constructing a decision tree does not require knowledge of the domain, and it can handle both numerical and categorical data.
  3. It can process high-dimensional data easily; it minimizes the ambiguity of complex decisions and assigns exact values to the outputs.
  4. The performance of decision trees is not affected by collinearity and linear-separability problems.
Disadvantages:
  1. For numeric datasets, it generates complex decision trees.
  2. It is an unstable classifier, i.e., its performance depends on the dataset.
  3. It is restricted to one output attribute and generates categorical outputs.

3. Fuzzy Sets
Advantages:
  1. Unsupervised approach.
  2. Convergent approach.
Disadvantages:
  1. Larger computational time.
  2. Sensitivity to speed, local minima.
  3. Sensitivity to noise; a zero or low noise level is expected.

4. Support Vector Machines
Advantages:
  1. It provides better accuracy in comparison to other classifiers and is effective in high-dimensional spaces.
  2. It is effective in cases where the number of dimensions is greater than the number of samples.
  3. It easily handles complex nonlinear data points, and overfitting is less of a problem than in other methods.
  4. It is memory efficient because it uses only a subset of the training points (the support vectors).
  5. It is versatile because different kernel functions can be specified for the decision function.
Disadvantages:
  1. It performs poorly when the number of features is much greater than the number of samples.
  2. It is computationally expensive, and training takes longer than with other methods.
  3. Selecting the right kernel function is a problem, because different kernel functions give different results on each dataset.
  4. SVM was developed for binary classification; it solves multi-class problems by breaking them into pairs of classes.
  5. It does not provide probability estimates directly; these are calculated using an expensive five-fold cross-validation.

5. Bayesian Networks
Advantages:
  1. It is fast and accurate, even for huge datasets.
  2. It makes computations easier.
Disadvantages:
  1. In some cases where there is dependency among variables, it does not give accurate results.

6. Rough Sets
Advantages:
  1. It does not need any additional knowledge about the data, such as probabilities in statistics.
  2. It identifies relationships that would not easily be found using statistical methods.
  3. It produces sets of decision rules from data.
Disadvantages:
  1. New discretization methods are required for quantitative attributes, and more research is needed in this field.
  2. Studies of new approaches to missing data are also needed.

7. Genetic Algorithms
Advantages:
  1. The fitness function is a flexible expression of modeling criteria.
Disadvantages:
  1. Defining a suitable fitness function is critical.
thus are very important and cannot be ignored. One area of heterogeneity in medical data is its volume and complexity. It is worth mentioning that the heterogeneity lies in the fact that the data come in numeric as well as image form. Further, the huge volume of medical data requires a lot of storage space and new tools to analyze it. In fact, un-stored and unorganized data are considered of little practical use in the healthcare domain.

The second area of heterogeneity in medical data is the importance of physicians' observations. These may take the form of images or signals, or of free text that is usually written in English and is difficult to standardize and mine. Even experts from the same field find such text difficult to understand, because of the different grammatical constructs used to describe relations between medical entities or the different names used for the same disease. It has been suggested that part of the solution for processing physicians' interpretations may lie in computer translation [87-89].

The third area of heterogeneity in medical data concerns sensitivity and specificity: almost all diagnoses and findings of effective treatments in medicine have some associated error, and it is not easy to decide which levels of sensitivity and specificity should be used.

To understand the concepts of sensitivity and specificity, we should first understand what a test is. A test is one of the values used to characterize the condition of a patient. Sensitivity measures how often a test finds what it is looking for, i.e., the proportion of patients who have the condition and test positive. Specificity measures the proportion of patients who do not have the condition and correctly test negative.

The fourth area of heterogeneity in medical data stems from its poor mathematical characterization. Another unique feature of mining medical data is that the underlying structures of medical data are poorly characterized and less emphasized mathematically than in other fields of science. Medicine has no formal structure into which a data miner can organize information. Perhaps the main reason for the heterogeneity of medical data is that the logic of medicine differs from the logic of the physical sciences [90-93].

ii. Legal, ethical and social issues: Medical data is essentially patient data, so any misuse of medical data can harm the patient. Thus, there are extensive ethical and legal traditions designed to prevent the misuse of medical data.

One point of discussion under legal, ethical and social issues is the 'ownership of data'. Theoretically, ownership is the entitlement to sell an item of property [94]. The question of ownership of medical data is quite complicated because human data cannot actually be sold. Thousands of terabytes of human medical data are available for data mining, very often in heterogeneous databases. In addition, these data are scattered without any common format throughout the medical care establishment. That is why it is hard to decide the actual ownership of medical data.

The second point of discussion is the fear of lawsuits. This is another unique feature of mining medical data, and it restricts health care providers and physicians. Medical care is more expensive in some places than in others. Because of this, physicians and other producers of medical data are reluctant to hand over their medical data to mining experts for mining purposes, which in turn can lead to untoward events.

Security and privacy are the third unique feature concerning human data. In different countries, government agencies set guidelines for concealing patient identification. This encourages patients to be frank with their physicians, since they are assured that their personal data will not be made public. Another issue is data security, or rather data handling and data transfer: since the data are transferred electronically, the transfer can be insecure. US federal documents [95-97] note two important research needs for re-identifying de-identified medical data. First, accidental duplicate records of the same patient should be prevented. Second, there may be a need to re-identify the records to verify patient data or to obtain additional information.

Next is the theory of expected benefits. Patient data in public databases cannot be mined without justifying that this will create some obvious benefit to society; otherwise, the data analysis cannot be performed legally and ethically. US federal guidelines specify a number of administrative policies for patient privacy that would not be required for non-medical data mining [98].

iii. Statistical philosophy: Data mining methods, especially statistical methods, may be different for medical data. Medicine is primarily a patient-care activity and only secondarily a research resource. Generally, benefit to the patient must be justified before medical data are collected or rejected. Therefore, to reduce such complexities, a statistical philosophy specific to medicine is incorporated. When classical statistical tests are designed, rules are set up in advance on the assumption that the experiment would be repeated. The rules therefore cannot be changed in the middle of an experiment; otherwise the resulting formulas and distributions would be meaningless. Thus, classical statistical tests in medicine may lead to ambiguous results. If the investigator changes their mind during the investigation, the interpretation of the data will be polluted even if the observed values do not change. Suppose we use a neural network to examine a dataset; different training strategies will then produce different outputs [94]. Here, it is self-defeating to conceal a subset of cases from the training data set. The second point is that data mining is a superset of statistics. Data mining and statistics share a great deal, since both aim at discovering underlying structures in data; the difference is that data mining must deal with heterogeneous data fields. Further, because of the large volume and heterogeneous nature of medical databases, it is unlikely that any data mining tool can succeed with raw and unorganized data [99]. When we talk about medical data
mining and knowledge discovery, it is important to follow a set of rules from problem specification to the application of the results [100]. Knowledge discovery is the non-trivial process of identifying valid, novel, useful, and understandable patterns in large sets of data [101].

iv. Status of medicine: Medicine is a need, a must for a patient; it is not a luxury or pleasure for any human being. The outcome of healthcare is life or death, and this applies to all humans. Medicine has a special status in daily life and is a subject of common interest to all of humanity. Medical care is sometimes risky, and when it fails the desire for revenge is intense. A patient's medical information is private, and the public is fearful of its disclosure. We enjoy the benefits of medical research, but very few of us are ready to contribute our personal details for research purposes. Moreover, when medical data are published, it is expected that researchers will maintain the confidentiality of the identity of individual patients and that the results will be used for the benefit of society [102]. As a matter of fact, researchers must keep in mind that scientific advancement serves overall development, i.e., it could be used for good as well as bad purposes [103].

VII. RESULTS AND DISCUSSION

We have elucidated different aspects and techniques of machine learning with regard to medical data and healthcare. Many techniques are used in healthcare; however, this paper has mainly focused on the following: neural networks, decision trees, fuzzy sets, support vector machines, Bayesian networks, rough sets, and genetic algorithms. Each technique has its own advantages and associated disadvantages. For example, a neural network (NN) is able to handle noisy data properly during training, but it does not work well with hundreds or thousands of input features, or for complex problems. A decision tree can handle all types of variables, but its use is restricted to one output attribute. An SVM provides better accuracy in comparison to other classifiers and is effective in high-dimensional spaces; however, it performs poorly when the number of features is much greater than the number of samples. Bayesian networks are fast and accurate even for huge datasets, but in some cases, such as when there are dependencies among variables, their results are inaccurate. Therefore, it is hard to say which method is best. Indeed, a technique that performs best in one scenario may perform worst in another application. A comparative study is shown in Table 1.

VIII. CONCLUSIONS

In this paper, we have discussed how data mining can be beneficial in the medical domain. Due to the rapid increase in the volume of medical data, data mining techniques have high utility in this field. Various tasks and applications related to data mining have been analyzed within the purview of healthcare organizations. This paper explores different data mining techniques and their advantages and drawbacks. Perhaps there is no single data mining technique that can give consistent results for all types of healthcare data; indeed, the performance of techniques varies from one dataset to another. For effective utilization of these techniques in the healthcare domain, there is a need to enhance and secure health data sharing among various parties. This paper also addresses the uniqueness of data mining with respect to medical data. Further, the constraints and difficulties related to privacy sensitivity and the large volume of medical data play a vital role in the selection of a particular data mining technique. Moreover, the ethical and legal aspects of medical data are also important. Medical data can have a special status based on its applicability to all people.

Acknowledgment

The author is pleased to acknowledge the sincere effort and help extended by his student Ms. Akanksha Verma. Ms. Akanksha completed her M.Tech (Computer Engg.) at Birla Institute of Technology (Allahabad Campus), with a thesis on medical data classification under the supervision of the author.

References

[1] S. Mitra, S. K. Pal and P. Mitra, "Data mining in soft computing framework: A survey," IEEE Transactions on Neural Networks, vol. 13, no. 1, pp. 3-14, 2002.
[2] K. J. Cios and G. W. Moore, "Uniqueness of medical data mining," Artificial Intelligence in Medicine, vol. 26, pp. 1-24, 2002.
[3] Parvez Ahmad, Saqib Qamar and Syed Qasim Afser Rizvi, "Techniques of Data Mining in Healthcare: A Review," International Journal of Computer Applications (0975-8887), vol. 120, no. 15, June 2015.
[4] Hsinchun Chen, Sherrilynne S. Fuller, Carol Friedman and William Hersh, "Knowledge Management, Data Mining and Text Mining in Medical Informatics."
[5] V. Krishnaiah, G. Narsimha and N. Subhash Chandra, "A study on clinical prediction using data mining techniques," International Journal of Computer Science Engineering and Information Technology Research (IJCSEITR), ISSN 2249-6831, vol. 3, no. 1, pp. 239-248, March 2013.
[6] Divya Tomar and Sonali Agarwal, "A survey on data mining approaches for healthcare," International Journal of Bio-Science and Bio-Technology, vol. 5, pp. 241-266, 2013.
[7] Mohammed Abdul Khalid, Sateesh Kumar Pradhan, G. N. Dash and F. A. Mazarbhuiya, "A survey of data mining techniques on medical data for finding temporally frequent diseases," International Journal of Advanced Research in Computer and Communication Engineering, vol. 2, no. 12, December 2013.
[8] S. D. Gheware, A. S. Kejkar and S. M. Tondare, "Data Mining: Task, Tools, Techniques and Applications," International Journal of Advanced Research in Computer and Communication Engineering, vol. 3, no. 10, October 2014.
[9] Yongjian Fu, "Data Mining: Tasks, Techniques and Applications," http://academic.csuohio.edu/fuy/Pub/pot97.pdf.
[10] Pang-Ning Tan, Michael Steinbach and Vipin Kumar, Introduction to Data Mining, Addison Wesley, 2002.
[11] G. Beller, "The rising cost of health care in the United States: is it making the United States globally noncompetitive?" J. Nucl. Cardiol., vol. 15, no. 4, pp. 481-482, 2008.
[12] Pang-Ning Tan, Michael Steinbach and Vipin Kumar, Introduction to Data Mining, Addison Wesley, 2005.
[13] A. Gosain and A. Kumar, "Analysis of health care data using different data mining techniques," International Conference on Intelligent Agent & Multi-Agent Systems (IAMA 2009), pp. 1-6, 22-24 July 2009.
[14] M. H. Dunham, Data Mining: Introductory and Advanced Topics, Prentice Hall, 2002.
[15] A. S. Elmaghraby et al., "Data Mining from Multimedia Patient Records," 6, 2006.
[16] Nada Lavrac and Blaž Zupan, "Data Mining in Medicine," in Data Mining and Knowledge Discovery Handbook, 2005.
[17] J. Soni, U. Ansari and D. Sharma, "Predictive Data Mining for Medical Diagnosis: An Overview of Heart Disease Prediction," International Journal of Computer Applications (0975-8887), vol. 17, no. 8, March 2011.
[18] Naren Ramakrishnan, David Hanauer and Benjamin J. Keller, "Mining Electronic Health Records," IEEE Computer, vol. 43, no. 10, pp. 77-81, 2010.
[19] Mary K. Obenshain, "Applications of Data Mining Techniques to Healthcare Data," Infection Control and Hospital Epidemiology, August 2004.
[20] Hian Chye Koh and Gerald Tan, "Data mining applications in healthcare," Journal of Healthcare Information Management (JHIM), vol. 19, no. 2, pp. 64-72, 2005.
[21] A. Milley, "Healthcare and data mining," Health Management Technology, vol. 21, no. 8, pp. 44-47, 2000.
[22] R. Gaynes, C. Richards, J. Edwards et al., "Feeding back surveillance data to prevent hospital-acquired infections," Emerg Infect Dis, vol. 7, no. 2, pp. 295-298, 2001.
[23] S. E. Brossette, A. P. Sprague, W. T. Jones and S. A. Moser, "A data mining system for infection control surveillance," Methods Inf Med, vol. 39, pp. 303-310, 2000.
[24] M. Ridinger, "American Healthways uses SAS to improve patient care," DM Review, vol. 12, no. 139, 2002.
[25] M. Durairaj and V. Ranjani, "Data mining applications in healthcare sector: A Study," International Journal of Scientific & Technology Research, ISSN 2277-8616, vol. 2, no. 10, October 2013.
[26] Anonymous, "Texas Medicaid Fraud and Abuse Detection System recovers $2.2 million, wins national award," Health Management Technology, vol. 20, no. 10, 1999.
[27] H. C. Koh and G. Tan, "Data Mining Application in Healthcare," Journal of Healthcare Information Management, vol. 19, no. 2, 2005.
[28] B. K. Schuerenberg, "An information excavation," Health Data Management, vol. 11, no. 6, pp. 80-82, 2003.
[29] P. D. Haghighi et al., "Mobile Data Mining for Intelligent Healthcare Support," IEEE Xplore, 2009.
[30] Neelamadhab Padhy, Pragnyaban Mishra and Rasmita Panigrahi, "The survey of data mining applications and feature scope," Asian Journal of Computer Science and Information Technology, vol. 2, no. 4, pp. 68-77, 2012.
[31] K. Wiig, "Knowledge Management: An Emerging Discipline Rooted in a Long History," in Knowledge Management (pp. 352), Butterworth-Heinemann, 1999.
[32] M. Stankosky, Creating the Discipline of Knowledge Management, Butterworth-Heinemann, 2005.
[33] H. Chen and M. Chau, "Web Mining: Machine Learning for Web Applications," Annual Review of Information Science and Technology, vol. 38, pp. 289-329, 2004.
[34] C.-J. Chen and J.-W. Huang, "Strategic human resource practices and innovation performance: The mediating role of knowledge management capacity," Journal of Business Research, vol. 62, pp. 104-114, 2009.
[35] B. S. Fugate, T. P. Stank and J. T. Mentzer, "Linking improved knowledge management to operational and organizational performance," Journal of Operations Management, in press, corrected proof, 2008.
[36] A. J. Orzano, C. R. McInerney, D. Scharf, A. F. Tallia and B. F. Crabtree, "A knowledge management model: Implications for enhancing quality in health care," Journal of the American Society for Information Science & Technology, vol. 59, pp. 489-505, 2008.
[37] Christo El Morr and Julien Subercaze, "Knowledge Management in Health Care," DOI: 10.4018/978-1-61520-670-4.ch023, pp. 490-510.
[38] T. D. Wilson, "The nonsense of knowledge management," Information Research, vol. 8, no. 1, 2002.
[39] D. E. Zand, The Leadership Triad: Knowledge, Trust and Power, New York: Oxford University Press, 1997.
[40] S. W. Sussman and W. S. Siegal, "Informational Influence in Organizations: An Integrated Approach to Knowledge Adoption," Information Systems Research, vol. 14, no. 1, pp. 47-65, 2003.
[41] U. M. Fayyad, G. Piatetsky-Shapiro and P. Smyth, "From Data Mining to Knowledge Discovery in Databases," AI Magazine, vol. 17, no. 3, pp. 37-54, 1996.
[42] M. H. Dunham, Data Mining: Introductory and Advanced Topics, New Jersey, USA: Prentice Hall, 2002.
[43] Jihoon Yang and Vasant Honavar, "Feature subset selection using genetic algorithm," IEEE Intelligent Systems, 1998.
[44] Hany M. Harb and Abeer S. Desuky, "Feature Selection on Classification of Medical Datasets based on Particle Swarm Optimization," International Journal of Computer Applications (0975-8887), vol. 104, no. 5, October 2014.
[45] G. Ravi Kumar, G. A. Ramachandra and K. Nagamani, "An Efficient Feature Selection System to Integrating SVM with Genetic Algorithm for Large Medical Datasets," International Journal of Advanced Research in Computer Science and Software Engineering, vol. 4, no. 2, February 2014.
[46] V. Sangeetha, J. Preethi and M. Sreeshakthy, "Survey on Medical Data Cluster Analysis using Feature Selection and Neural Networks," International Journal of Advanced Research in Computer Engineering & Technology (IJARCET), vol. 3, no. 11, November 2014.
[47] Megha Aggarwal and Amrita, "Performance Analysis of Different Feature Selection Methods in Intrusion Detection," International Journal of Scientific & Technology Research, vol. 2, no. 6, June 2013.
[48] J. A. Anderson and J. Davis, An Introduction to Neural Networks, MIT, Cambridge, 1995.
[49] M. K. Obenshain, "Application of data mining techniques to healthcare data," Infect. Control Hosp. Epidemiol., vol. 25, no. 8, pp. 690-695, 2004.
[50] M. H. Dunham, Data Mining: Introductory and Advanced Topics, Upper Saddle River, NJ: Pearson Education, Inc., 2003.
[51] A. Shameem Fathima, D. Manimegalai and Nisar Hundewale, "A Review of Data Mining Classification Techniques Applied for Diagnosis and Prognosis of the Arbovirus-Dengue," IJCSI International Journal of Computer Science Issues, vol. 8, issue 6, no. 3, ISSN (Online): 1694-0814, November 2011.
[52] K. Usha Rani, "Analysis of Heart Diseases Dataset Using Neural Network Approach," International Journal of Data Mining & Knowledge Management Process (IJDKP), vol. 1, no. 5, September 2011.
[53] S. Haykin, Neural Networks: A Comprehensive Foundation, Prentice Hall, 1999.
[54] Emina Alickovic and Abdulhamit Subasi, "Data Mining Techniques for Medical Data Classification," The International Arab Conference on Information Technology (ACIT), 2011.
[55] S. Anto and S. Chandramathi, "Supervised Machine Learning Approaches for Medical Data Set Classification - A Review," IJCST, vol. 2, no. 4, Oct-Dec 2011.
[56] Goharian and Grossman, "Data Mining Classification," Illinois Institute of Technology, 2003. http://ir.iit.edu/~nazli/cs422/CS422-Slides/DMClassification.
[57] Apte and S. M. Weiss, "Data Mining with Decision Trees and Decision Rules," T.J. Watson Research Center, http://www.research.ibm.com/dar/papers/pdf/fgcsapteweissue_with_cover.pdf, 1997.
[58] V. Gayathri, M. Chanda Mona and S. Banu Chitra, "A survey of data mining techniques on medical diagnosis and research," International Journal of Data Engineering (IJDE), Singaporean Journal of Scientific Research (SJSR), vol. 6, no. 6, 2014.
[59] L. A. Zadeh, "Some reflection on soft computing, granular computing and their roles in the conception, design and utilization of information/intelligent system," Soft Computing, vol. 2, 1998.
[60] Kalyani Mali and Samayita Bhattacharya, "Soft computing on Medical-Data (SCOM) for a Countrywide Medical System using Data Mining and Cloud Computing Features," Global Journal of Computer Science and Technology: Cloud and Distributed, vol. 13, no. 3, version 1.0, 2013.
[61] V. Vapnik, Statistical Learning Theory, Wiley, ISBN 978-0-471-03003-4, 1998.
[62] V. Vapnik, "The support vector method of function estimation," AT&T Labs - Research, John Wiley and Sons, New York, USA, 1998.
[63] N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods, Cambridge University Press, 1995.
[64] B. Hammer and K. Gersmann, "A Note on the Universal Approximation Capability of Support Vector Machines," Neural Process Lett, vol. 17, pp. 1061-1085, 2003.
[65] V. N. Vapnik, The Nature of Statistical Learning Theory, Springer, New York, 2005.
[66] N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods, Cambridge University Press, 2000.
[67] N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector Machines, Cambridge University Press, 2000.
[68] Argyro Kampouraki, Christophoros Nikou and George Manis, "Robustness of Support Vector Machine-based Classification of Heart Rate Signals," Proceedings of the 28th IEEE EMBS Annual International Conference, New York, USA, Aug 30-Sep 3, 2006, 1995.
[69] Samjin Choi, "Detection of valvular heart disorders using wavelet packet decomposition and support vector machine," Expert Systems with Applications, Elsevier, vol. 35, pp. 1679-1687, 2008.
[70] Ilias Maglogiannis, Euripidis Loukis, Elias Zafiropoulos and Antonis Stasis, "Support vector machine based identification of heart valve diseases using heart sounds," Computer Methods and Programs in Biomedicine, Elsevier, vol. 95, pp. 47-61, 2009.
[71] N. Friedman, D. Geiger and M. Goldszmidt, "Bayesian network classifiers," Machine Learning, vol. 29, pp. 131-163, 1997.
[72] N. Friedman and D. Koller, "Being Bayesian About Network Structure: A Bayesian Approach to Structure Discovery in Bayesian Networks," Machine Learning, vol. 50, no. 1, pp. 95-125, 2003.
[73] Finn V. Jensen, An Introduction to Bayesian Networks, Springer, New York, 1996.
[74] N. Sebe, Ira Cohen, Ashutosh Garg and Thomas S. Huang, Machine Learning in Computer Vision, Springer, Netherlands, pp. 130-133, 2005.
[75] Ankita Agarwal, "Secret Key Encryption algorithm using genetic algorithm," IJARCSSE, ISSN 2277-128X, vol. 2, no. 4, pp. 57-61, April 2012.
[76] Jihoon Yang and Vasant Honavar, "Feature subset selection using genetic algorithm," IEEE Intelligent Systems, 1998.
[77] Sang Jun Lee and Keng Siau, "A review of data mining techniques," Industrial Management & Data Systems, 101/1, MCB University Press, ISSN 0263-5577, 2001.
[78] George D. Magoulas and Andriana Prentza, "Machine Learning in Medical Applications," in Machine Learning and Its Applications: Advanced Lectures, pp. 300-307, 2001.
[79] J. R. Quinlan, "Induction of decision trees," Machine Learning, vol. 1, no. 1, pp. 81-106, 1986.
[80] D. Goldberg, Genetic Algorithms in Search, Optimization, and Machine Learning, Addison-Wesley, 1989.
[81] D. E. Rumelhart and J. L. McClelland (eds.), Parallel Distributed Processing, Vol. 1: Foundations, Cambridge, MA: MIT Press, 1986.
[82] Ph. Bourlas, N. Sgouros, G. Papakonstantinou and P. Tsanakas, "Towards a knowledge acquisition and management system for ECG diagnosis," in Proceedings of the 13th International Congress Medical Informatics Europe (MIE96), Copenhagen, 1996.
[83] B. Zupan, J. A. Halter and M. Bohanec, "Qualitative model approach to computer assisted reasoning in physiology," in Proceedings of Intelligent Data Analysis in Medicine and Pharmacology (IDAMAP98), Brighton, UK, 1998.
[84] Y. M. Akay, M. Akay, W. Welkowitz and J. B. Kostis, "Noninvasive detection of coronary artery disease using wavelet-based fuzzy neural networks," IEEE Engineering in Medicine and Biology, pp. 761-764, 1994.
[85] G. Coppini, R. Poli and G. Valli, "Recovery of the 3-D shape of the left ventricle from echocardiographic images," IEEE Transactions on Medical Imaging, vol. 14, pp. 301-317, 1995.
[86] V. Moustakis and G. Charissis, "Machine learning and medical decision making," in Proceedings of Workshop on Machine Learning in Medical Applications, Advance Course in Artificial Intelligence (ACAI99), Chania, Greece, pp. 1-19, 1996.
[87] C. D. Manning and H. Schuetze, Foundations of Statistical Natural Language Processing, Cambridge (MA): MIT Press, 2000.
[88] W. Ceusters, "Medical natural language understanding as a supporting technology for data mining in healthcare," in K. J. Cios (ed.), Medical Data Mining and Knowledge Discovery, Heidelberg: Springer, pp. 32-60 [chapter 3], 2000.
[89] C. Friedman and G. W. Hripcsak, "Evaluating natural language processors in the clinical domain," Meth Inform Med, vol. 37, pp. 334-344, 1998.
[90] G. Brewka, J. Dix and K. Konolige, Nonmonotonic Reasoning: An Overview, CSLI Lecture Notes No. 73, ISBN 1-881526-83-6, p. 179, 1997.
[91] G. W. Moore and G. M. Hutchins, "Effort and demand logic in medical decision making," Metamedicine, vol. 1, pp. 277-304, 1980.
[92] G. W. Moore, G. M. Hutchins and R. E. Miller, "Token swap test of significance for serial medical databases," Am J Med, vol. 80, pp. 182-190, 1986.
[93] L. A. Zadeh, "Fuzzy sets and information granularity," in M. M. Gupta et al. (eds.), Advances in Fuzzy Set Theory and Applications, Dordrecht: North-Holland, pp. 3-18, 1979.
[94] G. W. Moore and J. J. Berman, "Anatomic pathology data mining," in K. J. Cios (ed.), Medical Data Mining and Knowledge Discovery, Heidelberg: Springer, pp. 61-108 [chapter 4], 2000.
[95] US Department of Health and Human Services, 45 CFR (Code of Federal Regulations), Parts 160-164, "Standards for Privacy of Individually Identifiable Health Information; Final Rule," Fed Regist 28, 65(250):82461-610, 2000. (http://aspe.hhs.gov/admnsimp/).
[96] US Code of Federal Regulations, 45 CFR Subtitle A, 10-1-95 ed., Part 46.101(b)(4), US Department of Health and Human Services (Common Rule), 56:28003, 1991. (http://ohrp.osophs.dhhs.gov/humansubjects/guidance/45cfr46.htm).
[97] US National Cancer Institute, Confidentiality Brochure, 2000. (http://www-cdp.ims.nci.nih.gov/policy.html).
[98] J. M. Saul, "Legal policy and security issues in the handling of medical data," in K. J. Cios (ed.), Medical Data Mining and Knowledge Discovery, Heidelberg: Springer, pp. 17-31 [chapter 2], 2000.
[99] Z. Pawlak, "Rough classification," Int J Man-Mach Stud, vol. 20, pp. 469-483, 1984.
[100] K. J. Cios and L. A. Kurgan, "Trends in data mining and knowledge discovery," in N. R. Pal, L. C. Jain and N. Teodoresku (eds.), Knowledge Discovery in Advanced Information Systems, Berlin: Springer, 2002.
[101] U. M. Fayyad, G. Piatetsky-Shapiro, P. Smyth and R. Uthurusamy, Advances in Knowledge Discovery and Data Mining, Boston: AAAI Press/MIT Press, 1996.
[102] J. M. Saul, "Legal policy and security issues in the handling of medical data," in K. J. Cios (ed.), Medical Data Mining and Knowledge Discovery, Heidelberg: Springer, pp. 17-31 [chapter 2], 2000.
[103] J.-P. Changeux and A. Connes, Conversations on Mind, Matter, and Mathematics [M. B. DeBevoise, Trans.], Princeton (NJ): Princeton University Press, 1995.
