Chapter 5 discusses machine-learning techniques for predictive analytics, focusing on artificial neural networks (ANN), support vector machines (SVM), the k-nearest neighbor (k-NN) method, and Bayesian learning. It outlines learning objectives, applications in fields including healthcare and electric power, and compares the advantages and disadvantages of the different models. The chapter also includes case studies demonstrating practical applications of these techniques in real-world scenarios.

Analytics, Data Science and AI:

Systems for Decision Support


Eleventh Edition, Global Edition

Chapter 5
Machine-Learning Techniques for
Predictive Analytics

Slides in this presentation contain hyperlinks.


JAWS users should be able to get a list of links by
using INSERT+F7

Copyright © 2021 Pearson Education Ltd. All Rights Reserved.


Learning Outcomes
LO 1: Explain the motivations, concepts, methods, and
methodologies for different types of analytics.
LO 3: Analyze current applications of technology trends
using several methodologies.

Copyright © 2021 Pearson Education Ltd. All Rights Reserved.


Learning Objectives (1 of 2)
5.1 Understand the basic concepts and definitions of artificial neural networks (ANN)
5.2 Learn the different types of ANN architectures
5.3 Understand the concept and structure of support vector machines (SVM)
5.4 Learn the advantages and disadvantages of SVM compared to ANN
5.5 Understand the concept and formulation of the k-nearest neighbor (k-NN) algorithm

Copyright © 2021 Pearson Education Ltd. All Rights Reserved.


Learning Objectives (2 of 2)
5.6 Learn the advantages and disadvantages of k-NN compared to ANN and SVM
5.7 Understand the basic principles of Bayesian learning
and Naïve Bayes algorithm
5.8 Learn the basics of Bayesian Belief Networks and how
they are used in predictive analytics
5.9 Understand different types of ensemble models and
their pros and cons in predictive analytics

Copyright © 2021 Pearson Education Ltd. All Rights Reserved.


Opening Vignette (1 of 4)
Predictive Modeling Helps Better Understand
and Manage Complex Medical Procedures

• Situation
• Problem
• Solution
• Results
• Answer & discuss the case questions.

Copyright © 2021 Pearson Education Ltd. All Rights Reserved.


Opening Vignette (2 of 4)
Discussion Questions for the Opening Vignette:
1. Why is it important to study medical procedures? What is
the value in predicting outcomes?
2. What factors do you think are the most important in
better understanding and managing healthcare?
3. What would be the impact of predictive modeling on
healthcare and medicine? Can predictive modeling
replace medical or managerial personnel?
4. What were the outcomes of the study? Who can use
these results? How can they be implemented?
5. Search the Internet to locate two additional cases in
managing complex medical procedures.

Copyright © 2021 Pearson Education Ltd. All Rights Reserved.


Opening Vignette (3 of 4)
A Process Map for Training and Testing Four Predictive Models

Copyright © 2021 Pearson Education Ltd. All Rights Reserved.


Opening Vignette (4 of 4)
The Comparison of the Four Models

1. Acronyms for model types: artificial neural networks (ANN), support vector machines (SVM), popular decision tree algorithm (C5), classification and regression trees (CART).
2. Prediction results for the test data samples are shown in a confusion matrix where the rows represent the actuals and the columns represent the predicted cases.
3. Accuracy, sensitivity, and specificity are the three performance measures that were used in comparing the four prediction models.
Copyright © 2021 Pearson Education Ltd. All Rights Reserved.
Neural Network Concepts
• Neural networks (NN): a human brain metaphor for information processing
• Neural computing
• Artificial neural network (ANN)
• Many uses of ANN:
– pattern recognition, forecasting, prediction, and
classification
• Many application areas
– finance, marketing, manufacturing, operations,
information systems, and so on

Copyright © 2021 Pearson Education Ltd. All Rights Reserved.


Biological Neural Networks

• Two interconnected brain cells (neurons)

Copyright © 2021 Pearson Education Ltd. All Rights Reserved.


Processing Information in ANN

• A single neuron (processing element – PE) with inputs and outputs
Copyright © 2021 Pearson Education Ltd. All Rights Reserved.
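
To make the processing element concrete, here is a minimal sketch (not from the textbook; the input values, weights, and the choice of a sigmoid activation are illustrative assumptions) of how a single PE combines weighted inputs through a summation function and passes the result through an activation function.

```python
import numpy as np

def processing_element(inputs, weights, bias=0.0):
    """A single ANN processing element (PE): weighted summation + activation."""
    summation = np.dot(inputs, weights) + bias       # summation function
    return 1.0 / (1.0 + np.exp(-summation))          # sigmoid activation (one common choice)

# Illustrative values only: three inputs and their connection weights
x = np.array([0.5, 0.2, 0.9])
w = np.array([0.4, -0.7, 0.3])
print(processing_element(x, w))                      # PE output, a value between 0 and 1
```
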
Biology Analogy
Biological                     Artificial
Soma                           Node
Dendrites                      Input
Axon                           Output
Synapse                        Weight
Slow                           Fast
Many neurons (~10^9)           Few neurons (a dozen to hundreds of thousands)

Copyright © 2021 Pearson Education Ltd. All Rights Reserved.


Elements of AN N
• Processing element (PE)
• Network architecture
– Hidden layers
– Parallel processing
• Network information processing
– Inputs
– Outputs
– Connection weights
– Summation function

Copyright © 2021 Pearson Education Ltd. All Rights Reserved.


Application Case 5.1
Neural Networks Are Helping to Save Lives in
the Mining Industry

Questions for Discussion:


1. How did neural networks help save lives in the mining
industry?
2. What were the challenges, the proposed solution, and
the obtained results?

Copyright © 2021 Pearson Education Ltd. All Rights Reserved.


Neural Network Architectures
• Architecture of a neural network is driven by the task it is
intended to address
– Classification, regression, clustering, general
optimization, association
• Most popular architecture: feedforward, multi-layered perceptron with the backpropagation learning algorithm
– This ANN architecture will be covered in Chapter 6
• Other ANN architectures – recurrent networks, self-organizing feature maps, Hopfield networks, …

Copyright © 2021 Pearson Education Ltd. All Rights Reserved.


Neural Network Architectures
Recurrent Neural Networks

Copyright © 2021 Pearson Education Ltd. All Rights Reserved.


Other Popular ANN Paradigms:
Self-Organizing Maps (SOM)

• First introduced by the Finnish professor Teuvo Kohonen
• Applies to clustering-type problems

Copyright © 2021 Pearson Education Ltd. All Rights Reserved.


Other Popular ANN Paradigms:
Hopfield Networks

• First introduced by John Hopfield
• Highly interconnected neurons
• Applies to solving complex computational problems (e.g., optimization problems)

Copyright © 2021 Pearson Education Ltd. All Rights Reserved.


Application Case 5.2
Predictive Modeling Is Powering the Power
Generators

Questions for Discussion:


1. What are the key environmental concerns in the electric
power industry?
2. What are the main application areas for predictive
modeling in the electric power industry?
3. How was predictive modeling used to address a variety
of problems in the electric power industry?

Copyright © 2021 Pearson Education Ltd. All Rights Reserved.


Support Vector Machines (SVM) (1 of 4)
• SVM are among the most popular machine-learning techniques.
• SVM belong to the family of generalized linear models (capable of representing nonlinear relationships in a linear fashion).
• SVM reach a classification or regression decision based on the value of a linear combination of the input features.
• Because of their architectural similarities, SVM are also closely associated with ANN.

Copyright © 2021 Pearson Education Ltd. All Rights Reserved.


Support Vector Machines (SVM) (2 of 4)
• Goal of SVM: to generate mathematical functions that map input variables to desired outputs for classification- or regression-type prediction problems.
– First, SVM uses nonlinear kernel functions to transform the nonlinear relationships among the variables into linearly separable feature spaces.
– Then, maximum-margin hyperplanes are constructed to optimally separate the different classes from each other based on the training dataset.
• SVM have a solid mathematical foundation!

Copyright © 2021 Pearson Education Ltd. All Rights Reserved.


Support Vector Machines (SVM) (3 of 4)
• A hyperplane is a geometric concept used to describe the separation surface between different classes of things.
– In SVM, two parallel hyperplanes are constructed, one on each side of the separating surface, with the aim of maximizing the distance between them.
• A kernel function in SVM uses the kernel trick (a method for using a linear classifier algorithm to solve a nonlinear problem).
– The most commonly used kernel function is the radial basis function (RBF).

Copyright © 2021 Pearson Education Ltd. All Rights Reserved.
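
To make the RBF kernel above concrete, the short sketch below (an illustrative assumption, not the book's code) computes K(x, z) = exp(-gamma * ||x - z||^2), the similarity value an SVM uses in place of an explicit nonlinear feature mapping (the kernel trick).

```python
import numpy as np

def rbf_kernel(x, z, gamma=0.5):
    """Radial basis function kernel: exp(-gamma * ||x - z||^2)."""
    return np.exp(-gamma * np.sum((x - z) ** 2))

# Illustrative points: nearby points give a kernel value near 1, distant points near 0
a, b, c = np.array([1.0, 2.0]), np.array([1.1, 2.1]), np.array([5.0, 9.0])
print(rbf_kernel(a, b), rbf_kernel(a, c))
```
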


Support Vector Machines (SVM) (4 of 4)

• Many linear classifiers (hyperplanes) may separate the data
Copyright © 2021 Pearson Education Ltd. All Rights Reserved.
Application Case 5.3 (1 of 4)
Identifying Injury Severity Risk Factors in Vehicle Crashes with Predictive Analytics

Figure 5.7 Data Acquisition/Merging/Preparation Process.
• Problem

• Method

• Results

• Conclusions

Source: Microsoft Excel 2010, Microsoft Corporation.


Copyright © 2021 Pearson Education Ltd. All Rights Reserved.
Application Case 5.3 (2 of 4)
Identifying Injury Severity Risk Factors in
Vehicle Crashes with Predictive Analytics
Key success factors:
• Data acquisition
• Data Preparation

1. For numeric variables: mean (st. dev.); for binary or nominal variables: % frequency of the top two classes.

Copyright © 2021 Pearson Education Ltd. All Rights Reserved.


Application Case 5.3 (3 of 4)
Identifying Injury Severity Risk Factors in
Vehicle Crashes with Predictive Analytics

• Accuracy
• Variable importance

Copyright © 2021 Pearson Education Ltd. All Rights Reserved.


Application Case 5.3 (4 of 4)
Identifying Injury Severity Risk Factors in
Vehicle Crashes with Predictive Analytics

Questions for Discussion:


1. What are the most important risk factors for injury severity in vehicle crashes?
2. How can predictive analytics be used to identify and rank these risk factors?
3. How were the data acquired, merged, and prepared for the predictive models in this case?

Copyright © 2021 Pearson Education Ltd. All Rights Reserved.


How Does an SVM Work?
• Following a machine-learning process, an SVM learns from historical cases.
• The Process of Building SV M
1. Preprocess the data
 Scrub and transform the data.
2. Develop the model.
 Select the kernel type (RBF is often a natural choice).
 Determine the kernel parameters for the selected kernel
type.
 If the results are satisfactory, finalize the model,
otherwise change the kernel type and/or kernel
parameters to achieve the desired accuracy level.
3. Extract and deploy the model.
Copyright © 2021 Pearson Education Ltd. All Rights Reserved.
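
A hedged sketch of the three steps above using scikit-learn (an assumed tool choice and a stand-in dataset, not prescribed by the slides): scale the inputs, fit an RBF-kernel SVM, and search over the kernel parameters C and gamma until the cross-validated accuracy is satisfactory.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)                 # stand-in dataset
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1)

# Step 1: preprocess (scale); Step 2: develop the model (RBF kernel)
pipe = make_pipeline(StandardScaler(), SVC(kernel="rbf"))

# Tune the kernel parameters; change the kernel/parameters if accuracy is unsatisfactory
grid = GridSearchCV(pipe, {"svc__C": [0.1, 1, 10],
                           "svc__gamma": ["scale", 0.01, 0.1]}, cv=5)
grid.fit(X_tr, y_tr)

# Step 3: extract and deploy the chosen model
print(grid.best_params_, grid.score(X_te, y_te))
```
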
The Process of Building an SVM

Copyright © 2021 Pearson Education Ltd. All Rights Reserved.


SVM Applications
• SVM are the most widely used kernel-learning algorithms for a wide range of classification and regression problems
• SVM represent the state of the art by virtue of their excellent generalization performance, superior prediction power, ease of use, and rigorous theoretical foundation
• Most comparative studies show their superiority in both regression- and classification-type prediction problems
• SVM versus ANN?

Copyright © 2021 Pearson Education Ltd. All Rights Reserved.


k-Nearest Neighbor Method (k-NN) (1 of 2)
• ANNs and SVMs → time-demanding, computationally intensive iterative derivations
• k-NN → a simple and logical prediction method that produces very competitive results
• k-NN is a prediction method for classification as well as regression problems (similar to ANN and SVM)
• k-NN is a type of instance-based learning (or lazy learning) – most of the work takes place at the time of prediction (not at modeling time)
• k: the number of neighbors used in the model

Copyright © 2021 Pearson Education Ltd. All Rights Reserved.


k-Nearest Neighbor Method (k-NN) (2 of 2)

• The answer to "Which class does a data point belong to?" depends on the value of k

Copyright © 2021 Pearson Education Ltd. All Rights Reserved.


The Process of the k-NN Method

Copyright © 2021 Pearson Education Ltd. All Rights Reserved.


k-NN Model Parameter (1 of 2)
1. Similarity Measure: The Distance Metric

Minkowski distance:

$$d(i,j) = \left( \left| x_{i1}-x_{j1} \right|^{q} + \left| x_{i2}-x_{j2} \right|^{q} + \dots + \left| x_{ip}-x_{jp} \right|^{q} \right)^{1/q}$$

If q = 1, then d is called the Manhattan distance:

$$d(i,j) = \left| x_{i1}-x_{j1} \right| + \left| x_{i2}-x_{j2} \right| + \dots + \left| x_{ip}-x_{jp} \right|$$

If q = 2, then d is called the Euclidean distance:

$$d(i,j) = \sqrt{ \left| x_{i1}-x_{j1} \right|^{2} + \left| x_{i2}-x_{j2} \right|^{2} + \dots + \left| x_{ip}-x_{jp} \right|^{2} }$$

– Numeric versus nominal values?

Copyright © 2021 Pearson Education Ltd. All Rights Reserved.
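
A minimal sketch of the distance metric above (the sample points are illustrative assumptions), showing that q = 1 and q = 2 recover the Manhattan and Euclidean distances:

```python
import numpy as np

def minkowski(x, y, q=2):
    """Minkowski distance between two numeric vectors; q=1 is Manhattan, q=2 is Euclidean."""
    return np.sum(np.abs(x - y) ** q) ** (1.0 / q)

p1, p2 = np.array([1.0, 2.0, 3.0]), np.array([4.0, 6.0, 3.0])
print(minkowski(p1, p2, q=1))   # Manhattan: 3 + 4 + 0 = 7
print(minkowski(p1, p2, q=2))   # Euclidean: sqrt(9 + 16 + 0) = 5
```
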


k-NN Model Parameter (2 of 2)
2. Number of Neighbors (the value of k)
– The best value depends on the data
– Larger values reduce the effect of noise but also make the boundaries between classes less distinct
– An “optimal” value can be found heuristically
• Cross Validation is often used to determine the best value
for k and the distance measure

Copyright © 2021 Pearson Education Ltd. All Rights Reserved.
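
The sketch below (the dataset and parameter grid are illustrative assumptions) uses cross-validation, as suggested above, to pick both k and the distance metric for a k-NN classifier with scikit-learn.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)                     # stand-in dataset

# Search over the number of neighbors (k) and the distance metric
param_grid = {"n_neighbors": [1, 3, 5, 7, 9, 11],
              "metric": ["manhattan", "euclidean"]}
grid = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5)
grid.fit(X, y)

print(grid.best_params_)               # the heuristically "optimal" k and distance metric
print(round(grid.best_score_, 3))      # cross-validated accuracy for that combination
```
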


Application Case 5.4
Efficient Image Recognition and Categorization with k-NN

Questions for Discussion:


1. Why is image recognition/classification a worthy but
difficult problem?
2. How can k-NN be effectively used for image recognition/classification applications?

Copyright © 2021 Pearson Education Ltd. All Rights Reserved.


Naïve Bayes Method for Classification (1 of 2)
• Naïve Bayes is a simple probability-based classification method
– Naïve: assumption of independence among the input variables
• Can use both numeric and nominal input variables
– Numeric variables need to be discretized
• Can only be used for classification-type prediction problems (not for regression)
• Naïve Bayes models can be developed very efficiently and effectively
– Using the maximum likelihood method

Copyright © 2021 Pearson Education Ltd. All Rights Reserved.


Bayes Theorem
• Developed by Thomas Bayes (1701–1761)
• Determines conditional probabilities
• Given that X and Y are two events:

$$P(Y \mid X) = \frac{P(X \mid Y)\,P(Y)}{P(X)} \qquad \text{i.e.,} \qquad \text{Posterior} = \frac{\text{Likelihood} \times \text{Prior}}{\text{Evidence}}$$

P(Y | X): posterior probability of Y given X
P(X | Y): conditional probability of X given Y (the likelihood)
P(Y): prior probability of Y
P(X): prior probability of X (the evidence, or unconditional probability of X)

– Go through the simple example in the book (p. 315)

Copyright © 2021 Pearson Education Ltd. All Rights Reserved.
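
As a stand-in for the book's example (p. 315), which is not reproduced here, the sketch below applies the theorem to purely hypothetical numbers: a prior P(Y), a likelihood P(X | Y), and the evidence P(X) obtained by total probability.

```python
# Hypothetical numbers for illustration only (not the book's example)
p_y = 0.10                  # prior: P(Y)
p_x_given_y = 0.80          # likelihood: P(X | Y)
p_x_given_not_y = 0.20      # P(X | not Y)

# Evidence by the law of total probability: P(X)
p_x = p_x_given_y * p_y + p_x_given_not_y * (1 - p_y)

# Bayes theorem: posterior = likelihood * prior / evidence
p_y_given_x = p_x_given_y * p_y / p_x
print(round(p_y_given_x, 3))            # 0.308
```
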


Naïve Bayes Method for
Classification (2 of 2)
• Process of Developing a Naïve Bayes Classifier
• Training Phase
1. Obtain and pre-process the data
2. Discretize the numeric variables
3. Calculate the prior probabilities of all class labels
4. Calculate the likelihood for all predictor
variables/values
• Testing Phase
– Using the outputs of Steps 3 and 4 above, classify the
new samples
 See the numerical example in the book…
Copyright © 2021 Pearson Education Ltd. All Rights Reserved.
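
The sketch below (the dataset, binning choices, and use of scikit-learn are assumptions, not the book's numerical example) follows the same phases: discretize the numeric inputs, let the training step estimate the priors and likelihoods, then classify new samples.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import CategoricalNB
from sklearn.preprocessing import KBinsDiscretizer

X, y = load_iris(return_X_y=True)                          # stand-in dataset
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1)

# Training phase, step 2: discretize the numeric variables
binner = KBinsDiscretizer(n_bins=4, encode="ordinal", strategy="quantile")
X_tr_d = binner.fit_transform(X_tr)

# Training phase, steps 3-4: fit() estimates the class priors and likelihoods
nb = CategoricalNB().fit(X_tr_d, y_tr)
print(nb.class_log_prior_)                                 # log prior probability of each class

# Testing phase: classify the new (held-out) samples
print(nb.score(binner.transform(X_te), y_te))
```
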
Application Case 5.5 (1 of 2)
Predicting Disease Progress in Crohn’s Disease
Patients: A Comparison of Analytics Methods

Questions for Discussion:


1. What is Crohn’s disease and why is it important?
2. Based on the findings of this Application Case, what can
you tell about the use of analytics in chronic disease
management?
3. What other methods and data sets might be used to
better predict the outcomes of this chronic disease?

Copyright © 2021 Pearson Education Ltd. All Rights Reserved.


Application Case 5.5 (2 of 2)
Predicting Disease Progress in Crohn’s Disease
Patients: A Comparison of Analytics Methods

[Figure: comparison of the analytics methods – methodology, prediction accuracy, and variable importance]

Copyright © 2021 Pearson Education Ltd. All Rights Reserved.


Bayesian Networks (1 of 5)
• A tool for representing dependency structure in a graphical, explicit, and intuitive way
– A directed acyclic graph whose nodes correspond to the variables and whose arcs signify conditional dependencies between the variables and their possible values
– The direction of the arcs matters
– A partial causality link in student retention

Copyright © 2021 Pearson Education Ltd. All Rights Reserved.


Bayesian Networks (2 of 5)
How can a BN be constructed?
1. Manually
– By an engineer, with the help of a domain expert
– Time-demanding and expensive (for large networks)
– Experts may not even be available
2. Automatically
– Analytically …
– By learning/inducing the structure of the network from the historical data
 Availability of high-quality historical data is imperative

Copyright © 2021 Pearson Education Ltd. All Rights Reserved.


Bayesian Networks (3 of 5)
How can a BN be constructed?
• Analytically

Copyright © 2021 Pearson Education Ltd. All Rights Reserved.


Bayesian Networks (4 of 5)
How can a BN be constructed?
Tree Augmented Naïve (TAN) Bayes Network Structure

1. Compute the information function
2. Build the undirected graph
3. Build a spanning tree
4. Convert the undirected graph into a directed one
5. Construct the TAN model
(A rough sketch of steps 1–4 appears after this slide.)
Copyright © 2021 Pearson Education Ltd. All Rights Reserved.
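
As a rough illustration of steps 1–4 above (not the book's implementation; the toy data, the use of pandas/networkx, and the choice of root node are all assumptions), the sketch below estimates the class-conditional mutual information between attribute pairs, builds a maximum spanning tree over the attributes, and orients it from an arbitrarily chosen root.

```python
import itertools
import networkx as nx
import numpy as np
import pandas as pd

def cond_mutual_info(df, a, b, c="Class"):
    """Empirical conditional mutual information I(a; b | c) -- the 'information function'."""
    mi, n = 0.0, len(df)
    for (va, vb, vc), n_abc in df.groupby([a, b, c]).size().items():
        p_abc = n_abc / n
        p_c = (df[c] == vc).mean()
        p_ac = ((df[a] == va) & (df[c] == vc)).mean()
        p_bc = ((df[b] == vb) & (df[c] == vc)).mean()
        mi += p_abc * np.log((p_abc * p_c) / (p_ac * p_bc))
    return mi

# Toy categorical data (illustrative only)
df = pd.DataFrame({"A": list("xxyyxyxy"), "B": list("uuvvuvuv"),
                   "D": list("mnmnmmnn"), "Class": list("01010101")})
attrs = ["A", "B", "D"]

G = nx.Graph()                                             # step 2: undirected graph
for a, b in itertools.combinations(attrs, 2):
    G.add_edge(a, b, weight=cond_mutual_info(df, a, b))    # step 1: information function

tree = nx.maximum_spanning_tree(G)                         # step 3: spanning tree
directed = nx.bfs_tree(tree, source=attrs[0])              # step 4: orient edges from a root
print(list(directed.edges()))   # step 5 would then add the class node as a parent of every attribute
```
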
Bayesian Networks (5 of 5)

• EXAMPLE: Bayesian Belief Network for Predicting Freshmen Student Attrition
Copyright © 2021 Pearson Education Ltd. All Rights Reserved.
Ensemble Modeling (1 of 3)
• Ensemble – a combination of models (or model outcomes) for better results
• Why do we need to use ensembles?
– Better accuracy
– More stable/robust/consistent/reliable outcomes
• Reality: ensembles win competitions!
– The Netflix $1M Prize competition
– Many recent competitions at Kaggle.com
• The Wisdom of Crowds

Copyright © 2021 Pearson Education Ltd. All Rights Reserved.


Ensemble Modeling (2 of 3)
Figure 5.19 Graphical Depiction of Model Ensembles for
Prediction Modeling.

Copyright © 2021 Pearson Education Ltd. All Rights Reserved.


Types of Ensemble Modeling (1 of 4)
Figure 5.20 Simple Taxonomy for Model Ensembles.

Copyright © 2021 Pearson Education Ltd. All Rights Reserved.


Types of Ensemble Modeling (2 of 4)
Figure 5.20 Bagging-Type Decision Tree Ensembles.

Copyright © 2021 Pearson Education Ltd. All Rights Reserved.


Types of Ensemble Modeling (3 of 4)
Figure 5.20 Boosting-Type Decision Tree Ensembles.

Copyright © 2021 Pearson Education Ltd. All Rights Reserved.


Ensemble Modeling (3 of 3)
• Variants of Bagging & Boosting (decision trees) → homogeneous ensembles (a single model type, typically decision trees)
– Random Forest
– Stochastic Gradient Boosting
• Stacking → heterogeneous ensembles (different model types)
– Stack generation or super learners
• Information Fusion → heterogeneous ensembles (different model types)
– Any number of any models
– Simple/weighted combining
Copyright © 2021 Pearson Education Ltd. All Rights Reserved.
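
A hedged sketch of the four flavors above using scikit-learn (the dataset and base models are assumptions): bagging-style trees via a random forest, boosting via gradient boosting, a stacked "super learner," and a simple information-fusion-style vote over heterogeneous models.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import (GradientBoostingClassifier, RandomForestClassifier,
                              StackingClassifier, VotingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)                 # stand-in dataset

# Heterogeneous base learners for stacking and information fusion
base = [("dt", DecisionTreeClassifier(max_depth=4)),
        ("knn", KNeighborsClassifier()),
        ("lr", LogisticRegression(max_iter=5000))]

models = {
    "bagging (random forest)": RandomForestClassifier(n_estimators=200),
    "boosting (gradient boosting)": GradientBoostingClassifier(),
    "stacking (super learner)": StackingClassifier(estimators=base,
                                                   final_estimator=LogisticRegression(max_iter=5000)),
    "information fusion (soft voting)": VotingClassifier(estimators=base, voting="soft"),
}

for name, model in models.items():
    print(name, round(cross_val_score(model, X, y, cv=5).mean(), 3))
```
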


Types of Ensemble Modeling (4 of 4)
• STACKING • INFORMATION FUSION

Copyright © 2021 Pearson Education Ltd. All Rights Reserved.


Ensembles – Pros and Cons
Table 5.9 Brief List of Pros and Cons of Model Ensembles Compared to Individual Models.

PROS (Advantages):
• Accuracy – Model ensembles usually result in more accurate models than individual models.
• Robustness – Model ensembles tend to be more robust against outliers and noise in the data set than individual models.
• Reliability (stability) – Because of the variance reduction, model ensembles tend to produce more stable, reliable, and believable results than individual models.
• Coverage – Model ensembles tend to have better coverage of the hidden complex patterns in the data set than individual models.

CONS (Shortcomings):
• Complexity – Model ensembles are much more complex than individual models.
• Computationally expensive – Compared to individual models, ensembles require more time and computational power to build.
• Lack of transparency (explainability) – Because of their complexity, it is more difficult to understand the inner structure of model ensembles (how they do what they do) than individual models.
• Harder to deploy – Model ensembles are much more difficult to deploy in an analytics-based managerial decision-support system than single models.

Copyright © 2021 Pearson Education Ltd. All Rights Reserved.


Application Case 5.6 (1 of 3)
To Imprison or Not to Imprison: A Predictive Analytics-Based DSS for Drug Courts

Questions for Discussion:

1. What are drug courts and what do they do for society?
2. What are the commonalities and differences between traditional (theoretical) and modern (machine-learning-based) methods in studying drug courts?
3. Can you think of other social situations and systems for which predictive analytics can be used?

Copyright © 2021 Pearson Education Ltd. All Rights Reserved.


Application Case 5.6 (2 of 3)
To Imprison or Not to Imprison: A Predictive Analytics-Based DSS for Drug Courts
Methodology

Copyright © 2021 Pearson Education Ltd. All Rights Reserved.


Application Case 5.6 (3 of 3)
To Imprison or Not to Imprison: A Predictive Analytics-Based DSS for Drug Courts
Prediction Accuracy

ANN: artificial neural networks; DT: decision trees; LR: logistic regression; RF: random forest; HE: heterogeneous ensemble; AUC: area under the curve; G: graduated; T: terminated
Copyright © 2021 Pearson Education Ltd. All Rights Reserved.
End of Chapter 5
• Questions / Comments

Copyright © 2021 Pearson Education Ltd. All Rights Reserved.


Copyright

This work is protected by United States copyright laws and is


provided solely for the use of instructors in teaching their
courses and assessing student learning. Dissemination or sale of
any part of this work (including on the World Wide Web) will
destroy the integrity of the work and is not permitted. The work
and materials from it should never be made available to students
except by instructors using the accompanying text in their
classes. All recipients of this work are expected to abide by these
restrictions and to honor the intended pedagogical purposes and
the needs of other instructors who rely on these materials.

Copyright © 2021 Pearson Education Ltd. All Rights Reserved.


Thank you

Copyright © 2021 Pearson Education Ltd. All Rights Reserved.
