Chapter Five Data Mining for Healthcare Analytics
– Hierarchical clustering
– Anomaly detection
Common Techniques Used in Mining…cont’d
3. Semi-supervised Learning: combines both labeled and unlabeled
data, typically a small labeled set guiding learning over a much
larger unlabeled set, to improve the learning process.
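A minimal sketch of this idea, assuming scikit-learn is available and using its SelfTrainingClassifier on synthetic data; the dataset, base classifier, and all numbers are illustrative assumptions, not from the slides:

```python
# Semi-supervised learning sketch: self-training on partly labeled data.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import SelfTrainingClassifier

X, y = make_classification(n_samples=200, random_state=0)

# Pretend most labels are unknown: unlabeled samples are marked with -1.
y_partial = y.copy()
rng = np.random.default_rng(0)
y_partial[rng.random(len(y)) < 0.8] = -1

# The base classifier is fit on the labeled part, its confident predictions
# on unlabeled samples are added as pseudo-labels, and it is refit.
model = SelfTrainingClassifier(LogisticRegression())
model.fit(X, y_partial)
print(model.score(X, y))  # accuracy against the true labels
```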
• Key terms in classification:
– Predictor
– Training dataset
– Testing dataset
Common Techniques Used in Mining…cont’d
• Class:
– The dependent variable of the model
– A categorical variable representing the ‘label’ assigned to
the object after classification
• Example
– Presence of myocardial infarction
– Customer loyalty
– Condition of a patient
Common Techniques Used in Mining…cont’d
• Predictor:
– An independent variable of the model, used as an input to
predict the class
• Example
– Smoking
– Alcohol consumption
– Blood pressure
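As an illustration only, the predictors and class above could be arranged as a small table; the column names echo the slide's examples and the values are made up:

```python
# Toy patient table: three predictors and one class label (fabricated data).
import pandas as pd

data = pd.DataFrame({
    "smoking":               [1, 0, 1, 0],              # predictor
    "alcohol_consumption":   [2, 0, 3, 1],              # predictor (drinks/week, assumed)
    "blood_pressure":        [150, 120, 160, 118],      # predictor (systolic, mmHg)
    "myocardial_infarction": ["yes", "no", "yes", "no"],  # class (label)
})

X = data.drop(columns="myocardial_infarction")  # predictors
y = data["myocardial_infarction"]               # class
```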
Classification
Common Techniques Used in Mining…cont’d
• Decision tree is a flowchart-like tree structure where:
• Each internal node (non-leaf node) denotes a test on an
attribute
• Each branch represents an outcome of the test
• Each leaf node (terminal node) holds a class label
• The topmost node in a tree is the root node
• A decision tree is thus a hierarchical model in which the local
region is identified through a sequence of recursive splits in a
small number of steps.
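A minimal sketch of training and inspecting such a tree with scikit-learn; the two features and the fabricated patient values are assumptions for illustration only:

```python
# Decision tree sketch: internal nodes test attributes, leaves hold class labels.
from sklearn.tree import DecisionTreeClassifier, export_text

X = [[1, 150], [0, 120], [1, 160], [0, 118], [1, 130], [0, 145]]  # [smoking, blood_pressure]
y = ["MI", "no MI", "MI", "no MI", "no MI", "MI"]

tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# export_text prints the learned tree: each internal node is a test on an
# attribute, each branch an outcome of the test, each leaf a class label.
print(export_text(tree, feature_names=["smoking", "blood_pressure"]))
print(tree.predict([[1, 155]]))  # classify a new patient
```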
Common Techniques Used in Mining…cont’d
Limitations of decision-tree classification:
• Prone to overfitting.
• Require some measure of how well they are performing.
• Need careful parameter tuning.
• Can learn biased trees if some classes dominate the data.
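A sketch of how these issues are commonly mitigated in practice; the parameter names are real scikit-learn options, while the chosen values and the synthetic imbalanced dataset are illustrative assumptions:

```python
# Mitigations: limit tree growth, reweight classes, and cross-validate.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic imbalanced dataset (90% / 10%) just for demonstration.
X, y = make_classification(n_samples=500, weights=[0.9, 0.1], random_state=0)

tree = DecisionTreeClassifier(
    max_depth=4,              # limit depth to reduce overfitting
    min_samples_leaf=10,      # require enough samples per leaf
    class_weight="balanced",  # compensate for dominant classes
    random_state=0,
)

# Measure "how well it is doing" with cross-validation rather than
# accuracy on the training data alone.
scores = cross_val_score(tree, X, y, cv=5)
print(scores.mean())
```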
Model Evaluation
• Metrics for Performance Evaluation
– How to evaluate the performance of a model?
Limitation of Accuracy
• Consider a 2-class problem:
– Number of Class 0 examples = 9990
– Number of Class 1 examples = 10
• If the model predicts everything to be class 0, accuracy is
9990/10000 = 99.9%
– Accuracy is misleading here because the model does not detect
any class 1 examples
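The slide's arithmetic can be reproduced directly; the always-class-0 "model" below is a deliberate straw man used only to show why accuracy alone is misleading:

```python
# 9,990 class-0 and 10 class-1 examples; predict class 0 for everything.
from sklearn.metrics import accuracy_score, recall_score

y_true = [0] * 9990 + [1] * 10
y_pred = [0] * 10000

print(accuracy_score(y_true, y_pred))             # 0.999 -> looks excellent
print(recall_score(y_true, y_pred, pos_label=1))  # 0.0   -> no class-1 example detected
```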
Metrics for Performance Evaluation…cont’d
• Specificity (True Negative Rate): Measures the proportion
of actual negative instances correctly identified by the
model.
• It is calculated as TN / (TN + FP).
• It is important for ruling out the condition and minimizing
false positives.
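A small sketch of computing specificity (and its counterpart, sensitivity) from a confusion matrix; the example labels and predictions are made up:

```python
# Specificity = TN / (TN + FP); sensitivity = TP / (TP + FN).
from sklearn.metrics import confusion_matrix

y_true = [0, 0, 0, 0, 1, 1, 1, 0, 1, 0]
y_pred = [0, 0, 1, 0, 1, 0, 1, 0, 1, 0]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
specificity = tn / (tn + fp)  # proportion of actual negatives correctly identified
sensitivity = tp / (tp + fn)  # proportion of actual positives correctly identified
print(specificity, sensitivity)
```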
Association Rule Mining
• Association rule mining produces rules of the form
Antecedent => Consequent; here the focus is on rules whose
consequent is the target (dependent) feature.
• This gives a way to identify the variables that contribute to
the dependent variable.
• Such rules are often referred to as classification association
rules.
• For a rule X => Y, where the feature sets X and Y are mutually
exclusive, Support, Confidence, and Lift can be expressed as:
– Rule: X => Y
– Support(X => Y) = (transactions containing both X and Y) / (total transactions)
– Confidence(X => Y) = Support(X => Y) / Support(X)
– Lift(X => Y) = Confidence(X => Y) / Support(Y)
Association Rule Mining
• Support, confidence, and lift are key measures used in association
rule mining to evaluate the significance and strength of association
rules.
1. Support: It indicates how frequently an itemset occurs in the
dataset.
Support = (frequency of itemset) / (total number of transactions)
2. Confidence: It represents the strength of the implication from the
antecedent to the consequent.
Confidence = (frequency of antecedent and consequent) / (frequency of antecedent)
3. Lift: It compares the rule's confidence with the consequent's overall
support; a lift above 1 means the antecedent and consequent occur together
more often than expected by chance.
Lift = Confidence / (support of consequent)
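A hand-rolled sketch of these formulas on a made-up set of patient-record "transactions"; the item names and the chosen rule are illustrative assumptions:

```python
# Support, confidence, and lift computed directly from their definitions.
transactions = [
    {"smoking", "high_bp", "MI"},
    {"smoking", "MI"},
    {"high_bp"},
    {"smoking", "high_bp"},
    {"smoking", "high_bp", "MI"},
]
n = len(transactions)

def support(itemset):
    """Fraction of transactions containing every item in the itemset."""
    return sum(itemset <= t for t in transactions) / n

# Rule: {smoking, high_bp} => {MI}
X, Y = {"smoking", "high_bp"}, {"MI"}
supp = support(X | Y)               # how often antecedent and consequent co-occur
conf = support(X | Y) / support(X)  # strength of the implication X => Y
lift = conf / support(Y)            # confidence relative to Y's base rate
print(supp, conf, lift)             # 0.4, 0.67, 1.11 for this toy data
```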
Challenges of Implementing Healthcare Data Mining