
Lecture 9_Classification_Part 2_ec0c64efddca717f99b726e6fd37c459


ICT731 - Week 9

Classification – Part 2
(See Textbook Chapter 11)

Copyright © 2020 by Jones & Bartlett Learning, LLC an Ascend Learning Company. www.jblearning.com.
Objectives

 Compare and contrast classification algorithms
 Define and describe the role of training and testing data sets
 Describe the steps to perform the classification process
Classification Using Logistic Regression

 The logistic regression classifier is best suited for binary dependent variables,
that is, classifications for which there are only two classes, such as gender or a
tumor being malignant or benign. You can use logistic regression for multiclass
problems, but your results may not prove as accurate as those of other methods.
 A logistic regression classifier does not model the dependent variable (the class
we are trying to assign) directly. Instead, it models the “logit,” the log of the odds
of class membership, as a function of the predictor variables. The logistic
regression algorithm is often called the logit algorithm.
 Behind the scenes, the algorithm uses a series of odds that correspond to whether
an event will occur. The logistic classifier determines the probability that data
belongs to each class based on this series of odds, which it produces by analyzing
each predictor variable.
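The relationship between odds, the logit, and a class probability can be sketched numerically. This is a minimal illustration, not the textbook's code: the intercept, weights, and input values below are hypothetical numbers chosen only to show the arithmetic.

```python
import numpy as np

def logit(p):
    """Log-odds of a probability p (0 < p < 1)."""
    return np.log(p / (1.0 - p))

def sigmoid(z):
    """Inverse of the logit: maps log-odds back to a probability."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical coefficients for a model with two predictor variables.
intercept = -1.0
weights = np.array([0.8, -0.5])
x = np.array([2.0, 1.0])            # one hypothetical observation

log_odds = intercept + weights @ x  # the logit is linear in the predictors
probability = sigmoid(log_odds)     # probability of the positive class
odds = probability / (1.0 - probability)

print(f"log-odds={log_odds:.3f}  odds={odds:.3f}  probability={probability:.3f}")
```

A log-odds value above 0 corresponds to odds above 1 and a probability above 0.5, which is where a binary classifier would typically pick the positive class.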
Performing Logistic Regression Using Python
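The code shown on the original slide is not available in this text, so the following is only a sketch of what logistic regression in Python might look like. It assumes the scikit-learn library and uses a synthetic two-class data set from `make_classification`; the data set and parameter values are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic two-class data (an assumption for illustration only).
X, y = make_classification(
    n_samples=500, n_features=4, n_informative=3, n_redundant=1,
    n_clusters_per_class=1, class_sep=2.0, random_state=0,
)

# Hold back 30% of the rows as a test data set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

model = LogisticRegression()
model.fit(X_train, y_train)

probabilities = model.predict_proba(X_test)  # one column per class
predictions = model.predict(X_test)          # most probable class per row
accuracy = model.score(X_test, y_test)

print("First row class probabilities:", probabilities[0])
print("Test accuracy:", round(accuracy, 3))
```

`predict_proba` exposes the per-class probabilities described above, while `predict` returns the class with the highest probability.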
Classifying Cancer with Logistic Regression Using Python
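As a hedged sketch of this slide's topic, the example below classifies tumors as malignant or benign with logistic regression. It assumes scikit-learn and its bundled breast cancer data set, which may differ from the data used in the lecture; the scaling step and `max_iter` value are choices made here so that the solver converges cleanly.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer()  # target 0 = malignant, 1 = benign

X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.3, random_state=0, stratify=data.target
)

# Scale the predictor variables, then fit the classifier.
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)
cancer_accuracy = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)  # rows = actual class, columns = predicted

print("Test accuracy:", round(cancer_accuracy, 3))
print(cm)
```

The confusion matrix shows how many malignant and benign tumors were classified correctly and how many were confused with the other class.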
Classification Using a Neural Network

 Neural networks are at the heart of machine learning and are used for a wide
range of applications, including classification
 You will examine the MLPClassifier function, so named because it uses multilayer
perceptrons (MLPs) to accomplish its processing, in this case classification
 In a neural network, a perceptron is a supervised learning algorithm that uses a
linear function to convert inputs into outputs
Classification Using a Neural Network (cont’d)

 The inputs to a perceptron can be weighted and biased
 A perceptron uses a linear function; however, many real-world problems are
not linear in nature
 As such, the problems must be decomposed into a series of linear
components and additional layers of perceptrons must be used, creating an
MLP solution
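To make the idea of layering concrete, here is a small sketch using the XOR function, a standard example of a problem that no single linear perceptron can solve. The weights and biases are hand-picked, hypothetical values (nothing is trained here); the point is only that two layers of linear units can compute what one cannot.

```python
import numpy as np

def perceptron(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, passed through a step threshold."""
    return int(np.dot(inputs, weights) + bias > 0)

def xor_mlp(x1, x2):
    """Two layers of perceptrons with hand-picked (hypothetical) weights."""
    inputs = np.array([x1, x2])
    h_or = perceptron(inputs, np.array([1.0, 1.0]), -0.5)     # fires if x1 OR x2
    h_nand = perceptron(inputs, np.array([-1.0, -1.0]), 1.5)  # fires unless x1 AND x2
    # Output layer: fires only when both hidden units fire.
    return perceptron(np.array([h_or, h_nand]), np.array([1.0, 1.0]), -1.5)

xor_table = {(a, b): xor_mlp(a, b) for a in (0, 1) for b in (0, 1)}
print(xor_table)
```

Each unit on its own draws a single straight boundary; combining the OR and NAND boundaries in a second layer yields the non-linear XOR result.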
MLP Using Python
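The slide's original code is not present in this text. The following sketch shows one way an MLP classifier might be applied, assuming scikit-learn's `MLPClassifier` and its bundled breast cancer data set; the hidden layer sizes, iteration limit, and random seed are illustrative choices.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.3, random_state=0, stratify=data.target
)

# Two hidden layers of 16 and 8 units (assumed sizes, not from the lecture).
mlp = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(16, 8), max_iter=2000, random_state=0),
)
mlp.fit(X_train, y_train)

mlp_accuracy = mlp.score(X_test, y_test)
print("MLP test accuracy:", round(mlp_accuracy, 3))
```

Scaling the inputs first is a common practice with neural networks, because unscaled features can slow or destabilize training.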
Classification Using Decision Trees

 A decision tree is a graph-based data
structure that a program can use to follow
a series of decision paths to arrive at a
decision. This figure illustrates a decision
tree that determines a student’s test
grade.
 Within machine learning, a decision-tree
classifier creates a similar structure with
decision points that are based on the
different data-set attributes. As the number
and complexity of the attributes increase,
so does the complexity of the underlying
decision tree.
Using a Decision Tree to Classify Tumors Using Python
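A minimal sketch of a decision-tree classifier for tumor data follows. It assumes scikit-learn's `DecisionTreeClassifier` and the library's bundled breast cancer data set, which may not match the lecture's data; the depth limit of 4 is an arbitrary choice to keep the tree small.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.3, random_state=0, stratify=data.target
)

# Limit the depth so the tree stays readable and is less prone to overfitting.
tree = DecisionTreeClassifier(max_depth=4, random_state=0)
tree.fit(X_train, y_train)

tree_accuracy = tree.score(X_test, y_test)
print("Tree depth:", tree.get_depth())
print("Test accuracy:", round(tree_accuracy, 3))
```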
Viewing the Decision Tree
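One way to inspect the decision points of a fitted tree, assuming scikit-learn, is the `export_text` helper, which prints the tree as indented rules. This is a sketch rather than the lecture's own approach; the shallow depth of 2 is chosen only to keep the printout short.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_breast_cancer()

# A deliberately shallow tree so the printed rules fit on a screen.
tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(data.data, data.target)

# Each indented line is a decision point or a leaf with its predicted class.
tree_text = export_text(tree, feature_names=list(data.feature_names))
print(tree_text)
```

Each line of the output names the attribute and threshold tested at a decision point, which mirrors the path-following description of decision trees above.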
Classifying Data Using Random Forests

 You have learned how to use decision-tree modeling to classify data.
 Depending on the data set and model, there may be times when the decision
tree becomes very deep (many levels of nodes).
 Such deep decision trees tend to overfit the data and have a large variance.
 A random-forest classification model creates many different decision trees for a
data set and then, for each new sample, the trees essentially vote: the class
predicted by the most trees becomes the model’s prediction.
Random-Forest Classification Using Python
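The sketch below illustrates a random forest, assuming scikit-learn's `RandomForestClassifier` and its bundled breast cancer data set; the number of trees and the seed are illustrative. Note that scikit-learn combines trees by averaging their predicted class probabilities, which behaves much like the vote described above.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.3, random_state=0, stratify=data.target
)

# 100 trees, each trained on a random sample of rows and attributes.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)

forest_accuracy = forest.score(X_test, y_test)

# Look at how the individual trees "vote" on the first test sample.
votes = [int(t.predict(X_test[:1])[0]) for t in forest.estimators_]
print("Votes for class 1:", sum(votes), "of", len(votes))
print("Forest test accuracy:", round(forest_accuracy, 3))
```

Because each tree sees a different random slice of the data, their individual errors tend to differ, and combining them reduces the variance of a single deep tree.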
Classifying Data Using a Support Vector Machine

 The support vector machine (SVM), also called an SVC (support vector classifier),
classifies data by separating values with a boundary called a hyperplane (a line,
when the data has two dimensions).
Hyperplanes

 Classes that you can separate using a line are said to be linearly separable. As
you can see, there are many lines you can use to divide the classes.

Support Vectors

 The goal of SVC is to find the line that creates the widest separation (margin)
between the classes. To calculate the best separation line, the algorithm uses two
additional lines that mark the edges of the margin; the data points that lie on
these lines are called support vectors.
Support Vectors (cont’d)
 Many classes are not linearly separable
Classifying Iris Flowers Using SVC in Python
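A hedged sketch of this slide's topic follows, assuming scikit-learn's `SVC` and its bundled Iris data set. A linear kernel is used here to match the straight-line separation described earlier; that choice, and the split parameters, are assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=0, stratify=iris.target
)

# A linear kernel looks for flat separating boundaries between the classes.
svc = SVC(kernel="linear")
svc.fit(X_train, y_train)

iris_accuracy = svc.score(X_test, y_test)
n_support = svc.support_vectors_.shape[0]  # training points that define the margins

print("Test accuracy:", round(iris_accuracy, 3))
print("Number of support vectors:", n_support)
```

Only the support vectors, typically a small fraction of the training points, determine where the separating boundaries fall.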
Classifying Tumors Using SVC in Python
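The following sketch applies an SVC to tumor data, assuming scikit-learn and its bundled breast cancer data set rather than the lecture's own files. Because many classes are not linearly separable, it uses the RBF kernel, a common choice for curved boundaries; the kernel and the scaling step are assumptions made for this example.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.3, random_state=0, stratify=data.target
)

# SVCs are sensitive to feature scale, so standardize before fitting.
tumor_svc = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
tumor_svc.fit(X_train, y_train)

tumor_accuracy = tumor_svc.score(X_test, y_test)
print("Test accuracy:", round(tumor_accuracy, 3))
```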
Key Terms You Should Know

 Bayes Theorem: a theorem that produces a probability based upon known or
related events
 Binary classification: classification that assigns data to one of two classes,
such as a loan being approved or disapproved
 Classification: the process of assigning data to matching groups
 Decision tree: a graph-based data structure that contains decision points; by
following paths through the decision points, a decision-tree classifier can
determine to which class it should assign data.
 Decision-tree classifier: a classification technique that creates a decision tree
which it applies to assign data to a class
Key Terms You Should Know (cont’d)

 Dependent variable: in data classification, the dependent variable is the class to
which the algorithm will assign the data
 K-nearest-neighbors classifier: a data classification technique that assigns data
to the class most common among its K nearest neighbors in the training set
 Logistic regression classifier: a classification algorithm that assigns data to classes
by determining the probability that the data belongs to a class; it is best suited
for binary classification
 Logit: the log-odds function used in logistic regression, from which the algorithm
determines the probability that data belongs to a class
 MLP: acronym for multilayer perceptron; the MLP classifier is a neural-network
solution that uses multiple layers of perceptrons to assign data to groups
Key Terms You Should Know (cont’d)

 Multiclass classification: a classification that assigns data to one of many classes,
such as a wine being a white, red, or rosé
 Naïve Bayes classifier: a classification technique that assigns data to groups by
applying Bayes Theorem to determine the probability that data belongs to a class.
It is called “naïve” in that it treats each predictor variable as independent
 Neural network: a machine-learning algorithm that simulates the activities of the
brain and nervous system. Behind the scenes, neural networks use mathematical
functions (called perceptrons)
 Overfitting data: when using the K-nearest-neighbors algorithm to classify data,
if you specify a value of K that is too small, you may “overfit” the model, meaning
the model may start to treat noise or errant data as valid training data
Key Terms You Should Know (cont’d)

 Perceptron: a linear function used in neural networks. Because many problems
are not linear, they must be further decomposed into linear models by creating
additional layers of perceptrons, that is, a multilayer perceptron (MLP) solution
 Predictor variable: a data-set value used by a classification algorithm to predict
the class to which the data should be assigned
 Random tree classifier: a data classification technique similar to a decision tree
but at each split, only a random subset of the attributes is considered
 Supervised learning: a machine-learning technique that uses a training data set
to teach the model how to perform a task; classification uses supervised
learning
Key Terms You Should Know (cont’d)

 SVC: acronym for support vector classifier, a classification technique that
separates classes using one or more dividing lines (hyperplanes)
 SVM: acronym for support vector machine, a machine-learning algorithm. When
applied to classification, the term SVC is often used
 Test data set: a data set with predictor variables and correct results for the
dependent variable that is used by a machine-learning algorithm to test the
accuracy of a model
 Training data set: a data set with predictor variables and correct results for the
dependent variable that is used by a machine-learning algorithm to create a
model
Key Terms You Should Know (cont’d)

 Underfitting data: with respect to the K-nearest-neighbors algorithm, if you
specify a value of K that is too large, you may “underfit” the model,
which means the model is not capable of correctly modeling the training data
 Unsupervised learning: a machine-learning technique that does not use a
labeled training data set; clustering uses unsupervised learning
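The overfitting and underfitting terms defined for K-nearest-neighbors can be seen by comparing training and test accuracy for different values of K. This is a sketch assuming scikit-learn and its bundled breast cancer data set; the three K values are illustrative, with the largest set to the full training-set size as an extreme case.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.3, random_state=0, stratify=data.target
)
n_train = len(X_train)

results = {}
for k in (1, 15, n_train):
    knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k))
    knn.fit(X_train, y_train)
    # Store (training accuracy, test accuracy) for this K.
    results[k] = (knn.score(X_train, y_train), knn.score(X_test, y_test))

for k, (train_acc, test_acc) in results.items():
    print(f"K={k:>3}  train={train_acc:.3f}  test={test_acc:.3f}")
```

With K=1 the model memorizes the training data, so training accuracy is near perfect even where it reflects noise. With K equal to the whole training set, every prediction is simply the majority class, so the model cannot represent the training data at all.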
Questions
