
Probability and Learning

Machine Learning
Unit 4

University of Vienna

28 March 2014

1
Probability and Learning

The Naive Bayes Classifier


Logistic Regression
Linear Discriminant Analysis (LDA)

2
Classification problem

The goal in classification is to take an input vector x and to assign it to one of k discrete
classes Ci where i = 1, ..., k .
In the most common scenario, the classes are taken to be disjoint, so that each input is
assigned to one and only one class.
The input space is thereby divided into decision regions whose boundaries are called
decision boundaries or decision surfaces.
In probabilistic models we define ti as the probability that the class is Ci, with Σ_{i=1}^{k} ti = 1.

Three strategies for classification

1 generative models
2 discriminative models
3 discriminant function

3
Generative models

Approaches that explicitly or implicitly model the distribution of inputs as well as outputs are known as generative models, because by sampling from them it is possible to generate synthetic data points in the input space.

1 solve the inference problem of determining the class-conditional densities p(x |Ci )
for each class Ci individually
2 separately infer the prior class probabilities p(Ci )
3 use Bayes' theorem in the form

p(Ci|x) = p(x|Ci) p(Ci) / p(x)

to find the posterior class probabilities p(Ci|x). The denominator is given by

p(x) = Σi p(x|Ci) p(Ci)

Equivalently, we can model the joint distribution p(x , Ci ) directly and then normalize to
obtain the posterior probabilities.
4
Discriminative models

Approaches that model the posterior probabilities directly are called discriminative
models.

1 solve the inference problem of determining the posterior class probabilities p(Ci |x )
2 subsequently use decision theory to assign each new x to one of the classes

5
Discriminant function

Find a function f (x ), called a discriminant function, which maps each input x directly
onto a class label.
In the case of two-class problems, f might be binary-valued, with f = 0 representing class C1 and f = 1 representing class C2.
In this case, probabilities play no role.
Example: Fisher’s discriminant, perceptron algorithm.

6
Example

Options: go to the pub, watch TV, go to a party, or study. The choice depends on whether an assignment is due, whether a party is on, and how lazy you feel.
Deadline Party Lazy Activity
Urgent Yes Yes Party
Urgent No Yes Study
Near Yes Yes Party
None Yes No Party
None No Yes Pub
None Yes No Party
Near No No Study
Near No Yes TV
Near Yes Yes Party
Urgent No No Study

7
The Bayes Theorem

There are m = 4 different classes Ci and n = 10 different examples Xj .

C1 = Pub, C2 = TV, C3 = Party, C4 = Study

For Deadline we have 3 states:

D1 = Urgent, D2 = Near, D3 = None

For Party we have 2 states:


P1 = Yes, P2 = No
For Lazy we have 2 states:
L1 = Yes, L2 = No
We estimate P(Ci) as the number of examples with class Ci divided by the total number of examples.

P (Pub) = 0.1, P (TV) = 0.1, P (Party) = 0.5, P (Study) = 0.3

8
The Bayes Theorem

The conditional probability of Ci given that x has the value X is written P(Ci|X); it tells us how likely the class is Ci given that the value of x is X.
The question is how to get this conditional probability, since we cannot read it directly from the table.
What we can read directly from the table are the probabilities P(Dj|Ci):
P(Urg|Pub)= 0, P(Near|Pub)= 0, P(None|Pub)= 1
P(Urg|TV)= 0, P(Near|TV)= 1, P(None|TV)= 0
P(Urg|Party)= 0.2, P(Near|Party)= 0.4, P(None|Party)= 0.4
P(Urg|Study)= 2/3, P(Near|Study)= 1/3, P(None|Study)= 0

9
The Bayes Theorem

We use the same procedure for the second feature, Party:
P(Party|Pub) = 0, P(No Party|Pub ) = 1
P(Party|TV) = 0, P(No Party|TV ) = 1
P(Party|Party) = 1, P(No Party|Party) = 0
P(Party|Study) = 0, P(No Party|Study) = 1
and for the third feature Lazy
P(Lazy|Pub) = 1, P(No Lazy|Pub) = 0
P(Lazy|TV) = 1, P(No Lazy|TV) = 0
P(Lazy|Party) = 0.6, P(No Lazy|Party) = 0.4
P(Lazy|Study) = 1/3, P(No Lazy|Study) = 2/3

10
The Naive Bayes classifier

The simplification: the features are assumed to be conditionally independent of each other given the class.
The Naive Bayes classifier:

feed in the values of the features
compute the probabilities of each of the possible classes
pick the most likely class

Suppose:
Deadline = Near, Party = No, Lazy = Yes

11
The Naive Bayes classifier

From conditional independence, P(BC|A) = P(B|A) P(C|A), it follows that

P(A|BCD) = P(A) P(BCD|A) / P(BCD) = P(A) P(B|A) P(C|A) P(D|A) / P(BCD)

It is sufficient to calculate the numerator, because the denominator is the same for each activity.
P(Pub) P(Near|Pub) P(No Party|Pub) P(Lazy|Pub) = 0.1 × 0 × 1 × 1 = 0
P(TV) P(Near|TV) P(No Party|TV) P(Lazy|TV) = 0.1 × 1 × 1 × 1 = 0.1
P(Party) P(Near|Party) P(No Party|Party) P(Lazy|Party) = 0.5 × 0.4 × 0 × 0.6 = 0
P(Study) P(Near|Study) P(No Party|Study) P(Lazy|Study) = 0.3 × (1/3) × 1 × (1/3) ≈ 0.033
So based on this you will be watching TV tonight.
To scale these scores into probabilities:

P(TV) = 0.1 / (0.1 + 1/30) = 0.75    P(Study) = (1/30) / (0.1 + 1/30) = 0.25
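A minimal Python sketch of this calculation, with the counts taken directly from the ten-row table above (the function and variable names are just illustrative):

```python
# Naive Bayes by hand for the pub/TV/party/study example.
data = [
    ("Urgent", "Yes", "Yes", "Party"), ("Urgent", "No",  "Yes", "Study"),
    ("Near",   "Yes", "Yes", "Party"), ("None",   "Yes", "No",  "Party"),
    ("None",   "No",  "Yes", "Pub"),   ("None",   "Yes", "No",  "Party"),
    ("Near",   "No",  "No",  "Study"), ("Near",   "No",  "Yes", "TV"),
    ("Near",   "Yes", "Yes", "Party"), ("Urgent", "No",  "No",  "Study"),
]
classes = ["Pub", "TV", "Party", "Study"]

def prior(c):
    """P(Ci): fraction of examples whose activity is c."""
    return sum(1 for row in data if row[3] == c) / len(data)

def conditional(feature, value, c):
    """P(feature = value | Ci), estimated by counting within class c."""
    rows = [row for row in data if row[3] == c]
    return sum(1 for row in rows if row[feature] == value) / len(rows)

# Query: Deadline = Near, Party = No, Lazy = Yes
query = ("Near", "No", "Yes")
scores = {}
for c in classes:
    score = prior(c)
    for i, value in enumerate(query):
        score *= conditional(i, value, c)   # naive (conditional independence) step
    scores[c] = score

total = sum(scores.values())
posterior = {c: s / total for c, s in scores.items()}
print(scores)                              # Pub: 0, TV: 0.1, Party: 0, Study: 1/30
print(posterior)                           # TV: 0.75, Study: 0.25
print(max(posterior, key=posterior.get))   # -> TV
```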

12
Probabilistic Discriminative Models

The second approach is to use the functional form of the generalized linear model
explicitly and to determine its parameters directly by using maximum likelihood.
In the direct approach, we are maximizing a likelihood function defined through the
conditional distribution p(Ci |x ), which represents a form of discriminative training.
One advantage of the discriminative approach is that there will typically be fewer
adaptive parameters to be determined.
It may also lead to improved predictive performance, particularly when the
class-conditional density assumptions give a poor approximation to the true distributions.

13
Logistic Regression

Classification
Email: Spam/Not Spam?
Online Transaction: Fraudulent (Yes/No)?
Tumor: Malignant/Benign?

Two-Class Classification problem: y ∈ {0, 1}

0: Negative Class (e.g. Benign Tumor), the absence of something
1: Positive Class (e.g. Malignant Tumor), the presence of something

Multiclass Classification problem: y ∈ {0, 1, 2, 3}

14
Logistic Regression

The posterior probability for class C1 can be written as

p(C1|x) = p(x|C1) p(C1) / (p(x|C1) p(C1) + p(x|C2) p(C2))

p(C1|x) = 1 / (1 + exp(−z)) = g(z)

with

z = ln [ p(x|C1) p(C1) / (p(x|C2) p(C2)) ]

15
Logistic Sigmoid Function

g(z) = 1 / (1 + e^(−z))

g(z) is called the sigmoid function or logistic function. The term sigmoid means S-shaped.

16
Logit function

The inverse of the logistic sigmoid is given by

z = ln( g / (1 − g) )
and is known as the logit function.
It represents the log of the ratio of probabilities for the two classes, also known as the log
odds.
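A short sketch of the sigmoid and its inverse (function names are illustrative):

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid g(z) = 1 / (1 + exp(-z)); maps any real z into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def logit(g):
    """Inverse of the sigmoid: the log odds z = ln(g / (1 - g))."""
    return np.log(g / (1.0 - g))

z = np.array([-2.0, 0.0, 2.0])
print(sigmoid(z))         # [0.119 0.5   0.881]
print(logit(sigmoid(z)))  # recovers [-2.  0.  2.]
```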

17
Logistic regression

Linear threshold classifier: output hθ(x) = θ^T x, thresholded at 0.5:

If hθ(x) ≥ 0.5, predict y = 1
If hθ(x) < 0.5, predict y = 0

Using ordinary regression for a classification problem is not a very good idea:

1 θ can be strongly influenced by outliers
2 hθ(x) can be > 1 or < 0

Logistic Regression:
we want 0 ≤ hθ(x) ≤ 1
Despite its name, logistic regression solves a classification problem, not a regression problem.
Here is a solution:
hθ(x) = g(θ^T x)

18
Hypothesis Representation

Task: fit the parameters θ to the data

hθ(x) = 1 / (1 + e^(−θ^T x)) = estimated probability that y = 1 on input x:
hθ(x) = P(y = 1 | x; θ)
Suppose:

predict y = 1 if hθ(x) ≥ 0.5
predict y = 0 if hθ(x) < 0.5

hθ(x) = g(θ^T x) ≥ 0.5 when θ^T x ≥ 0
hθ(x) = g(θ^T x) < 0.5 when θ^T x < 0

19
Decision Boundary

predict y = 1 when θ^T x ≥ 0
predict y = 0 when θ^T x < 0
E.g. hθ(x) = g(θ0 + θ1x1 + θ2x2)
Then θ0 + θ1x1 + θ2x2 = 0 is a decision boundary.

20
Non-linear decision boundary

hθ(x) = g(θ0 + θ1x1 + θ2x2 + θ3x1² + θ4x2²)

Then θ0 + θ1x1 + θ2x2 + θ3x1² + θ4x2² = 0 is a decision boundary. E.g. if θ = (−1, 0, 0, 1, 1)^T, then the decision boundary is the circle x1² + x2² = 1.
More complicated cases:
hθ(x) = g(θ0 + θ1x1 + θ2x2 + θ3x1² + θ4x1²x2 + θ5x1²x2² + θ6x1³x2 + · · · )
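A small sketch of the circular decision boundary from the example above, with θ = (−1, 0, 0, 1, 1)^T:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

theta = np.array([-1.0, 0.0, 0.0, 1.0, 1.0])   # theta from the example above

def h(x1, x2):
    """h_theta(x) = g(theta0 + theta1*x1 + theta2*x2 + theta3*x1^2 + theta4*x2^2)."""
    features = np.array([1.0, x1, x2, x1**2, x2**2])
    return sigmoid(theta.dot(features))

# Points inside the circle x1^2 + x2^2 = 1 get y = 0, points outside get y = 1.
print(h(0.0, 0.0) >= 0.5)   # False: the origin lies inside the boundary
print(h(2.0, 0.0) >= 0.5)   # True: (2, 0) lies outside the boundary
```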

21
Logistic regression cost function

Cost(hθ(x), y) = −log(hθ(x)) if y = 1
Cost(hθ(x), y) = −log(1 − hθ(x)) if y = 0

Note: y is always 0 or 1. Combining the two cost functions:

Cost(hθ(x), y) = −y log(hθ(x)) − (1 − y) log(1 − hθ(x))

J(θ) = −(1/m) Σ_{i=1}^{m} [ y^(i) log(hθ(x^(i))) + (1 − y^(i)) log(1 − hθ(x^(i))) ]

22
Cost function and gradient

The gradient of the cost function is a vector whose j-th element is defined as follows:

∂J(θ)/∂θj = (1/m) Σ_{i=1}^{m} (hθ(x^(i)) − y^(i)) xj^(i)

For the calculation of θ we use a gradient-based method:

scipy.optimize.fmin_bfgs

It will find the best parameters θ for the logistic regression cost function, given a fixed dataset (of x and y values). The parameters of scipy.optimize.fmin_bfgs are:

the initial values of the parameters you are trying to optimize;
a function that, when given the training set and a particular θ, computes the logistic regression cost and gradient with respect to θ for the dataset (x, y).

A minimal usage sketch follows below.
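The sketch uses a small synthetic dataset standing in for real data; the dataset and variable names are placeholders, not from the slides:

```python
import numpy as np
from scipy.optimize import fmin_bfgs

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y):
    """J(theta) = -(1/m) * sum[ y*log(h) + (1 - y)*log(1 - h) ]."""
    m = len(y)
    h = sigmoid(X.dot(theta))
    return -(y.dot(np.log(h)) + (1 - y).dot(np.log(1 - h))) / m

def gradient(theta, X, y):
    """dJ/dtheta_j = (1/m) * sum_i (h(x^(i)) - y^(i)) * x_j^(i)."""
    m = len(y)
    h = sigmoid(X.dot(theta))
    return X.T.dot(h - y) / m

# Synthetic two-feature data; the first column of ones is the intercept term.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(100), rng.normal(size=(100, 2))])
y = (X[:, 1] + X[:, 2] + rng.normal(scale=0.5, size=100) > 0).astype(float)

theta0 = np.zeros(X.shape[1])   # initial values of the parameters
theta = fmin_bfgs(cost, theta0, fprime=gradient, args=(X, y))
print(theta)                    # fitted parameters defining the decision boundary
```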

23
Visualizing the Data

The first two components for the first two groups of the iris data, shown with different markers.

24
Results of logistic regression for iris data
The first two components for the first two groups of the iris data, shown with different markers, together with the decision boundary
−5.672 + 7.726x1 − 11.645x2 = 0. Accuracy: 99.00%.

25
Multiclass classification

Email foldering/tagging:
Work (y = 1), Friends (y = 2), Family (y = 3), Hobby (y = 4)
Medical diagnosis:
Not ill (y = 1), Cold (y = 2), Flu (y = 3)
Weather:
Sunny (y = 1) , Cloudy (y = 2), Rain (y = 3), Snow (y = 4)

26
One-vs-all

Class 1: hθ^(1)(x)
Class 2: hθ^(2)(x)
Class 3: hθ^(3)(x)

hθ^(i)(x) = P(y = i | x; θ), i = 1, 2, 3

hθ^(i)(x) is the estimated probability that y = i on input x.
1 − hθ^(i)(x) is the estimated probability that y ≠ i on input x (one-vs-rest).

27
One-vs-all

Train a logistic regression classifier hθ^(i)(x) for each class i to predict the probability that y = i.
On a new input x, to make a prediction, pick the class i that maximizes

max_i hθ^(i)(x)
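A sketch of this prediction step, assuming the k fitted parameter vectors have been stacked into a matrix Theta (the names here are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict_one_vs_all(Theta, X):
    """Theta: (k, n) array, one fitted parameter vector theta^(i) per class.
    X: (m, n) array of inputs, including the intercept column of ones.
    Returns, for each row of X, the class i whose h_theta^(i)(x) is largest."""
    probs = sigmoid(X.dot(Theta.T))   # (m, k) matrix of estimated P(y = i | x)
    return np.argmax(probs, axis=1)   # index of the most probable class per input
```

Since the sigmoid is monotonic, taking the argmax of X.dot(Theta.T) directly would give the same predictions.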

28
Discriminant Function

The third strategy for classification is a discriminant function.


Non probabilistic method.
A discriminant is a function that takes an input vector x and assigns it to one of m
classes, denoted Ci .
Linear discriminants have hyperplanes as the decision surfaces.
If m = 2, a linear discriminant function is

y(x) = w^T x + w0
w is a weight vector and w0 is a bias.
The negative of the bias is sometimes called a threshold.
Decision rule:
x is assigned to class C1 if y (x ) > 0 and to class C2 otherwise.
The corresponding decision boundary is defined by y(x) = 0, which corresponds to a (D − 1)-dimensional hyperplane within the D-dimensional input space.

29
Linear Discriminant Analysis (LDA)

c classes of data with means µ1 , µ2 , · · · , µc and mean of the entire dataset µ


Covariance:
C = Σj (xj − µ)(xj − µ)^T

Within-class scatter: SW = Σ_{classes c} Σ_{j∈c} pc (xj − µc)(xj − µc)^T, with pc the probability of the class (that is, the number of data points in that class divided by the total number)

Between-class scatter: SB = Σ_{classes c} Σ_{j∈c} (µc − µ)(µc − µ)^T

C = SW + SB

30
Fisher’s linear discriminant

31
Linear Discriminant Analysis (LDA)

The datasets are easy to separate into different classes (i.e. the classes are
discriminable) if SB /SW is large.
The projection of the data:
z = w^T x

We want to maximize the ratio of between-class and within-class scatter, (w^T SB w) / (w^T SW w).
The optimal w are the generalised eigenvectors of SW^(−1) SB.
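A compact sketch of this computation, using the scatter definitions from the previous slide and scipy.linalg.eig for the generalised eigenproblem (names are illustrative):

```python
import numpy as np
from scipy import linalg

def lda_directions(X, labels):
    """Return the generalised eigenvectors of S_W^{-1} S_B, sorted by eigenvalue."""
    mu = X.mean(axis=0)
    n_features = X.shape[1]
    S_W = np.zeros((n_features, n_features))
    S_B = np.zeros((n_features, n_features))
    for c in np.unique(labels):
        Xc = X[labels == c]
        p_c = len(Xc) / len(X)          # probability of the class
        mu_c = Xc.mean(axis=0)
        d = Xc - mu_c
        S_W += p_c * d.T.dot(d)                          # within-class scatter
        S_B += len(Xc) * np.outer(mu_c - mu, mu_c - mu)  # between-class scatter
    # Solve the generalised eigenproblem S_B w = lambda S_W w
    evals, evecs = linalg.eig(S_B, S_W)
    order = np.argsort(-evals.real)
    return evecs[:, order].real

# Projection onto the leading discriminant direction: z = w^T x
# w = lda_directions(X, labels)[:, 0]
# z = X.dot(w)
```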

32
Linear Discriminant Analysis (LDA)
Plot of the first two dimensions of the iris data, showing the three classes before and after LDA has been applied. Only one dimension (y) is required to separate the classes after LDA has been applied.

33
Multiple classes

Extension of linear discriminants to m > 2 classes.


1 one-versus-the-rest classifier:
the use of m − 1 classifiers, each of which solves a two-class problem of separating points in a particular class Ci from points not in that class.
2 one-versus-one classifier:
the use of m(m − 1)/2 binary discriminant functions, one for every possible pair of classes. Each point is then classified according to a majority vote amongst the discriminant functions.
3 single m-class discriminant:
the use of m linear functions of the form

yi(x) = wi^T x + wi0

and then assigning a point x to class Ci if yi(x) > yj(x) for all j ≠ i (see the sketch below).
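A minimal sketch of the single m-class discriminant rule (array names are illustrative):

```python
import numpy as np

def predict(W, w0, X):
    """Single m-class linear discriminant: y_i(x) = w_i^T x + w_i0.
    W: (m, D) matrix of weight vectors, w0: (m,) bias vector, X: (N, D) inputs.
    Each point is assigned to the class with the largest discriminant value."""
    scores = X.dot(W.T) + w0      # (N, m) matrix whose columns are y_i(x)
    return np.argmax(scores, axis=1)
```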

34
