
LOGISTIC REGRESSION
In statistics, the logistic model is used to model the probability of a certain class or event existing,
such as pass/fail, win/lose, alive/dead or healthy/sick. This can be extended to model several classes
of events, such as determining whether an image contains a cat, dog, lion, etc.

Based on the number of categories, logistic regression can be classified as follows (a short code sketch follows this list):

binomial: Target variable can have only 2 possible types: “0” or “1”, which may represent “win” vs
“loss”, “pass” vs “fail”, “dead” vs “alive”, etc.

multinomial: Target variable can have 3 or more possible types which are not ordered (i.e. the types
have no quantitative significance), like “disease A” vs “disease B” vs “disease C”.

ordinal: It deals with target variables with ordered categories. For example, a
test score can be categorized as “very poor”, “poor”, “good”, “very good”. Here,
each category can be given a score like 0, 1, 2, 3.
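As a rough illustration (not part of the original slides), the sketch below fits a binomial and a multinomial logistic regression with scikit-learn; the feature values and labels are invented purely for demonstration.

```python
# Hypothetical example: binomial vs. multinomial logistic regression in scikit-learn.
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[1, 2], [2, 1], [3, 4], [4, 3], [5, 5], [6, 4]], dtype=float)
y_binary = np.array([0, 0, 0, 1, 1, 1])   # binomial: two classes ("fail" vs "pass")
y_multi = np.array([0, 0, 1, 1, 2, 2])    # multinomial: three unordered classes

binom = LogisticRegression().fit(X, y_binary)
multinom = LogisticRegression().fit(X, y_multi)   # >2 classes handled automatically

print(binom.predict_proba([[3.5, 3.5]]))   # [P(y=0), P(y=1)]
print(multinom.predict([[3.5, 3.5]]))      # predicted class label
```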
•Start with binary class problems
How do we develop a classification algorithm?
• Tumour size vs malignancy (0 or 1)
• We could use linear regression
• Then threshold the classifier output (i.e. anything over some value is yes, else no)
• In this example, linear regression with thresholding seems to work
•This does a reasonable job of stratifying the data points into one of two classes
• But what if we had a single Yes with a very large tumour
• This would shift the fitted line and lead to classifying all the existing yeses as nos
•Another issue with linear regression
• We know y is 0 or 1
• The hypothesis can give values larger than 1 or less than 0
•So, logistic regression generates a value that is always between 0 and 1
• Logistic regression is a classification algorithm - don't be confused by the name
Hypothesis representation
•What function is used to represent our hypothesis in classification?
•We want our classifier to output values between 0 and 1
• When using linear regression we had hθ(x) = θT x
• For the classification hypothesis representation we use hθ(x) = g(θT x)
• Where we define g(z) = 1 / (1 + e^(-z))
• z is a real number
• This is the sigmoid function, or the logistic function
• If we combine these equations we can write out the hypothesis as hθ(x) = 1 / (1 + e^(-θT x))
•What does the sigmoid function look like? (a short sketch of the function follows)

• Crosses 0.5 at z = 0, then flattens out
• Asymptotes at 0 and 1
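A minimal NumPy sketch of the sigmoid and the resulting hypothesis; the sample values are chosen only to show the asymptotes.

```python
# Sigmoid (logistic) function g(z) = 1 / (1 + e^(-z)) and hypothesis h_theta(x) = g(theta^T x).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def h(theta, x):
    return sigmoid(np.dot(theta, x))   # g(theta^T x)

print(sigmoid(0.0))    # 0.5 when z = 0
print(sigmoid(10.0))   # close to 1 (upper asymptote)
print(sigmoid(-10.0))  # close to 0 (lower asymptote)
```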
•Interpreting hypothesis output

When our hypothesis (hθ(x)) outputs a number, we treat that value as the estimated probability that
y=1 on input x
• Example
• If X is a feature vector with x0 = 1 (as always) and x1 = tumourSize
• hθ(x) = 0.7
• Tells a patient they have a 70% chance of a tumor being
malignant
hθ(x) = P(y=1|x ; θ)
• What does this mean?
• Probability that y=1, given x, parameterized by θ
•Since this is a binary classification task we know y = 0 or 1
• So the following must be true
• P(y=1|x ; θ) + P(y=0|x ; θ) = 1
• P(y=0|x ; θ) = 1 - P(y=1|x ; θ)
Decision boundary
•This gives a better sense of what the hypothesis function is computing
• One way of using the sigmoid function is:
• When the probability of y being 1 is greater than 0.5 then we can predict y =
1
• Else we predict y = 0
• When is it exactly that hθ(x) is greater than 0.5?
• Look at sigmoid function
• g(z) is greater than or equal to 0.5 when z is greater than or equal to 0
• So if z is positive, g(z) is greater than 0.5
• z = (θT x)
• So when
• θT x >= 0
• Then hθ(x) >= 0.5

•So what we've shown is that the hypothesis predicts y = 1 when θT x >= 0
• The corollary is that when θT x < 0 the hypothesis predicts y = 0
• Let's use this to better understand how the hypothesis makes its predictions
Consider,
hθ(x) = g(θ0 + θ1x1 + θ2x2)

•So, for example


• θ0 = -3
• θ1 = 1
• θ2 = 1
•So our parameter vector is a column vector with the above values
• So, θT is a row vector = [-3,1,1]
•What does this mean?
• The z here becomes θT x
• We predict "y = 1" if
• -3x0 + 1x1 + 1x2 >= 0
• -3 + x1 + x2 >= 0
•We can also re-write this as
• If (x1 + x2 >= 3) then we predict y = 1
• If we plot the line x1 + x2 = 3, we get the decision boundary graphically (see the sketch below)
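A small sketch of this decision rule; the θ values are the ones from the example above and the test points are made up.

```python
# Predict y = 1 when theta^T x >= 0, i.e. when x1 + x2 >= 3 for theta = [-3, 1, 1].
import numpy as np

theta = np.array([-3.0, 1.0, 1.0])

def predict(x1, x2):
    x = np.array([1.0, x1, x2])            # x0 = 1 is the bias term
    return 1 if np.dot(theta, x) >= 0 else 0

print(predict(1.0, 1.0))   # 1 + 1 < 3  -> predicts 0
print(predict(2.0, 2.0))   # 2 + 2 >= 3 -> predicts 1
```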
hθ(x) = g(θ0 + θ1x1 + θ2x2 + θ3x1² + θ4x2²)

•Say θT was [-1,0,0,1,1]; then:


•Predict that “y = 1” if
• -1 + x1² + x2² >= 0
or
• x1² + x2² >= 1
•If we plot x1² + x2² = 1, the decision boundary is the unit circle (a quick check follows)
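A quick check of this non-linear boundary, with the squared features written out directly; the sample points are invented.

```python
# With theta = [-1, 0, 0, 1, 1] over [1, x1, x2, x1^2, x2^2], predict y = 1
# outside the unit circle x1^2 + x2^2 = 1.
def predict_circle(x1, x2):
    z = -1.0 + x1 ** 2 + x2 ** 2
    return 1 if z >= 0 else 0

print(predict_circle(0.5, 0.5))   # inside the circle  -> 0
print(predict_circle(1.0, 1.0))   # outside the circle -> 1
```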
Cost function for logistic regression

Linear regression uses the following cost function to determine θ:

J(θ) = (1/2m) Σᵢ (hθ(x⁽ⁱ⁾) − y⁽ⁱ⁾)²

•If we use this cost function for logistic regression it is a non-convex function for
parameter optimization
•What do we mean by non-convex?
• We have some function - J(θ) - for determining the parameters
• Our hypothesis function contains a non-linearity (the sigmoid inside hθ(x))
• This is a complicated non-linear function
• If you take hθ(x) and plug it into the Cost() function, and then plug the
Cost() function into J(θ) and plot J(θ), we find many local optima -> a non-convex
function
• Why is this a problem?
• Lots of local minima mean gradient descent may not find
the global optimum - it may get stuck in a local minimum
• We would like a convex function so that if you run gradient descent you
converge to the global minimum
A convex logistic regression cost function

•To get around this we need a different, convex Cost() function, which means we can apply
gradient descent:

• Cost(hθ(x), y) = -log(hθ(x)) if y = 1
• Cost(hθ(x), y) = -log(1 - hθ(x)) if y = 0

The above two functions can be compressed into a single function, i.e.

J(θ) = −(1/m) Σᵢ [ y⁽ⁱ⁾ log(hθ(x⁽ⁱ⁾)) + (1 − y⁽ⁱ⁾) log(1 − hθ(x⁽ⁱ⁾)) ]
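A minimal NumPy sketch of this compressed cost; the dataset values are made up, and X includes a first column of ones for x0.

```python
# J(theta) = -(1/m) * sum( y*log(h) + (1-y)*log(1-h) ), with h = sigmoid(X @ theta).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y):
    m = len(y)
    h = sigmoid(X @ theta)                      # predictions for all m examples
    return -(1.0 / m) * np.sum(y * np.log(h) + (1 - y) * np.log(1 - h))

X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]])  # column of ones = x0
y = np.array([0, 0, 1, 1])
print(cost(np.array([-2.5, 1.0]), X, y))
```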
Gradient Descent

Now the question arises: how do we reduce the cost value? Well, this can be done by
using Gradient Descent. The main goal of Gradient Descent is to minimize the cost
value, i.e. min J(θ).
Now to minimize our cost function we need to run the gradient descent update on
each parameter, i.e.

θj := θj − α ∂J(θ)/∂θj, which for this cost function works out to
θj := θj − (α/m) Σᵢ (hθ(x⁽ⁱ⁾) − y⁽ⁱ⁾) xⱼ⁽ⁱ⁾ (updating all θj simultaneously)
Gradient descent has an analogy: imagine being left stranded and blindfolded at the top
of a mountain valley, with the objective of reaching the bottom of the hill. Feeling the
slope of the terrain around you is what anyone would do. Well, this action is analogous
to calculating the gradient, and taking a step is analogous to one iteration of the update
to the parameters.
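A sketch of batch gradient descent for this cost; the learning rate, iteration count and data are illustrative only.

```python
# theta_j := theta_j - (alpha/m) * sum( (h_theta(x_i) - y_i) * x_ij ), all j updated together.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_descent(X, y, alpha=0.1, iterations=5000):
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iterations):
        gradient = X.T @ (sigmoid(X @ theta) - y) / m   # dJ/dtheta
        theta -= alpha * gradient                       # one "step down the hill"
    return theta

X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]])  # x0 = 1 column
y = np.array([0, 0, 1, 1])
print(gradient_descent(X, y))
```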
Multiclass classification problems
•Getting logistic regression to work for multiclass classification using one vs. all
•Multiclass - more than yes or no (1 or 0)
• Classification where each example is assigned to one of several classes
•Given a dataset with three classes, how do we get a learning algorithm to work?
• Use one vs. all classification to make binary classification work for multiclass classification
•One vs. all classification
• Split the training set into three separate binary classification problems
• i.e. create a new fake training set
• Triangles (1) vs crosses and squares (0): hθ(1)(x) = P(y=1 | x; θ)
• Crosses (1) vs triangles and squares (0): hθ(2)(x) = P(y=1 | x; θ)
• Squares (1) vs triangles and crosses (0): hθ(3)(x) = P(y=1 | x; θ)

•Train a logistic regression classifier hθ(i)(x) for each class i to predict the probability that y = i
•On a new input x, to make a prediction, pick the class i that maximizes hθ(i)(x) (a code sketch follows)
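A sketch of one vs. all that reuses scikit-learn's binary logistic regression as the per-class classifier; the three-class toy dataset is invented.

```python
# Train one binary classifier per class, then pick the class with the highest probability.
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[1, 1], [2, 1], [5, 5], [6, 5], [1, 6], [2, 7]], dtype=float)
y = np.array([0, 0, 1, 1, 2, 2])                    # three classes

classifiers = {}
for c in np.unique(y):
    y_c = (y == c).astype(int)                      # class c vs. the rest
    classifiers[c] = LogisticRegression().fit(X, y_c)

def predict_one_vs_all(x):
    probs = {c: clf.predict_proba([x])[0, 1] for c, clf in classifiers.items()}
    return max(probs, key=probs.get)                # class i maximizing h_theta^(i)(x)

print(predict_one_vs_all([5.5, 5.0]))               # expected: class 1
```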
K-Nearest Neighbors
This algorithm classifies cases based on their similarity to other cases.

In K-Nearest Neighbors, data points that are near each other are said to be
neighbors.

K-Nearest Neighbors is based on this paradigm.

Similar cases with the same class labels are near each other.
Thus, the distance between two cases is a measure of their dissimilarity.
There are different ways to calculate the similarity or conversely,
the distance or dissimilarity of two data points.
For example, this can be done using Euclidean distance.
The K-Nearest Neighbors algorithm works as follows (a small sketch follows these steps):

- pick a value for K.

- calculate the distance from the new case (the held-out point) to each of the cases in the
dataset.

- search for the K observations in the training data that are nearest to the
measurements of the unknown data point.

- predict the response of the unknown data point using the most popular response
value from the K-Nearest Neighbors.
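A minimal from-scratch sketch of these four steps, assuming Euclidean distance and a tiny made-up training set.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    # 1) K is chosen by the user (the argument k)
    # 2) distance from the new case to each case in the dataset (Euclidean)
    distances = np.linalg.norm(X_train - x_new, axis=1)
    # 3) the K observations nearest to the unknown data point
    nearest = np.argsort(distances)[:k]
    # 4) the most popular response value among those neighbors
    return Counter(y_train[nearest]).most_common(1)[0][0]

X_train = np.array([[1, 1], [1, 2], [5, 5], [6, 5]], dtype=float)
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([5.5, 4.8]), k=3))   # -> 1
```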

There are two parts in this algorithm that might be a bit confusing.

- First, how to select the correct K

- Second, how to compute the similarity between cases.

Let's start with the first concern.


How to select the correct K
As mentioned, K in K-Nearest Neighbors is the number of nearest neighbors to examine.
It is supposed to be specified by the user.
So, how do we choose the right K?

Assume that we want to find the class of the customer noted as a question mark on the chart.

What happens if we choose a very low value of K?
Let's say, K equals one.
The first nearest point would be blue,
which is class one.
This would be a bad prediction,
since more of the points around it are
magenta or class four.
In fact, since its nearest neighbor is blue we can say that we capture the noise in the data or we
chose one of the points that was an anomaly in the data.

A low value of K causes a highly complex model as well, which might result in overfitting of the
model.

It means the prediction process is not generalized enough to be used for out-of-sample cases.

Out-of-sample data is data that is outside of the data set used to train the model.

In other words, it cannot be trusted to be used for prediction of unknown samples. It's important to
remember that overfitting is bad, as we want a general model that works for any data, not just the
data used for training.

Now, on the opposite side of the spectrum, if we choose a very high value of K such as K equals
20,
then the model becomes overly generalized.
So, how can we find the best value for K?

The general solution is to reserve a part of your data for testing the accuracy of the model.
Once you've done so, choose K equals one and then use the training part for modeling and
calculate the accuracy of prediction using all samples in your test set.

Repeat this process, increasing K each time, and see which K is best for your model.
For example, in our case, K equals four will give us the best accuracy (a sketch of this
procedure follows).
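A sketch of that procedure with scikit-learn; the dataset here is randomly generated, so the best K will differ from the slide's K = 4.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

rng = np.random.RandomState(0)
X = rng.rand(200, 2)
y = (X[:, 0] + X[:, 1] > 1).astype(int)            # made-up labels

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

for k in range(1, 11):                              # try K = 1 ... 10
    model = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    acc = accuracy_score(y_test, model.predict(X_test))
    print(f"K = {k}: test accuracy = {acc:.2f}")
```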


Advantages of KNN

1. No Training Period: KNN is called a Lazy Learner (instance-based learning). It does not learn
anything in the training period and does not derive any discriminative function from the training
data. In other words, there is no training period for it. It stores the training dataset and learns
from it only at the time of making real-time predictions. This makes the KNN algorithm much
faster to train than algorithms that require training, e.g. SVM, linear regression, etc.

2. Since the KNN algorithm requires no training before making predictions, new data can be
added seamlessly which will not impact the accuracy of the algorithm.

3. KNN is very easy to implement. There are only two parameters required to implement KNN, i.e.
the value of K and the distance function (e.g. Euclidean or Manhattan, etc.).
Disadvantages of KNN

1. Does not work well with large datasets: in large datasets, the cost of calculating the
distance between the new point and each existing point is huge, which degrades the
performance of the algorithm.

2. Does not work well with high dimensions: the KNN algorithm doesn't work well
with high-dimensional data because, with a large number of dimensions, distances between
points become expensive to compute and less meaningful.

3. Sensitive to noisy data, missing values and outliers: KNN is sensitive to noise in
the dataset. We need to manually impute missing values and remove outliers.
SUPPORT VECTOR MACHINE (SVM)
A Support Vector Machine is a supervised algorithm that can classify cases by finding
a separator.

SVM works by first mapping data to a high dimensional feature space so that
data points can be categorized, even when the data are not linearly separable.

Then, a separator is estimated for the data. The data should be transformed in such a
way that a separator could be drawn as a hyperplane.
Therefore, the SVM algorithm outputs an optimal hyperplane that categorizes new examples.
DATA TRANSFORMATION
For the sake of simplicity, imagine that our dataset is one-dimensional data.
This means we have only one feature x.
As you can see, it is not linearly separable.
Well, we can transform it into a two-dimensional space. For example, you can increase the dimension of
the data by mapping x into a new space using a function that outputs x and x squared.

Basically, mapping data into a higher-dimensional space is called kernelling.


The mathematical function used for the transformation is known as the kernel
function, and can be of different types, such as linear, polynomial, Radial Basis Function (RBF),
or sigmoid.
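A tiny sketch of the one-dimensional example: mapping each x to (x, x²) makes the two classes separable by a horizontal line in the new space. The data values are invented.

```python
import numpy as np

x = np.array([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0])
y = np.array([1, 1, 0, 0, 0, 1, 1])        # not linearly separable on the line

X_mapped = np.column_stack([x, x ** 2])    # the "kernelling" step: x -> (x, x^2)
print(X_mapped)                            # class 1 now has x^2 >= 4, class 0 has x^2 <= 1
```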
SVMs are based on the idea of finding
a hyperplane that best divides a data
set into two classes as shown here.
As we're in a two-dimensional space,
you can think of the hyperplane as a
line that linearly separates the blue
points from the red points.
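For reference, a hedged sketch of fitting an SVM with one of the kernels named above in scikit-learn; the toy data and default settings are assumptions for illustration.

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[1, 1], [2, 2], [1, 2], [6, 5], [7, 7], [6, 6]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="rbf", gamma="scale").fit(X, y)   # kernel could also be "linear", "poly", "sigmoid"
print(clf.support_vectors_)                        # the support vectors that define the separator
print(clf.predict([[2.0, 1.5], [6.5, 6.0]]))       # expected: [0 1]
```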

ADVANTAGES
- Accurate in high-dimensional spaces
- Memory efficient

DISADVANTAGES
- Not suited to large datasets (training can be slow)
- Prone to overfitting if the number of features is much greater than the number of samples

APPLICATIONS
- Image Recognition
- Spam detection
