
Machine Learning

Practice 2

Quan Minh Phan & Ngoc Hoang Luong

University of Information Technology (UIT)

April 7, 2021

Table of contents

1 Perceptron

2 Linear Regression

3 Adaptive Linear Neuron (Adaline)

4 Logistic Regression

Figure: The general concept of the perceptron

Recall

Problem
Using the perceptron model to classify the species of flowers ('setosa' or
'versicolor') based on the sepal width and the petal width.

2 features: the sepal width x1; the petal width x2

x = [x1, x2]

2 labels: 'setosa' = −1; 'versicolor' = 1

w = [w1, w2]

Net input function (z)
z = w1 * x1 + w2 * x2

Unit step function (φ(·))
φ(z) = 1 if z ≥ θ, −1 otherwise

Recall (next)

w = [w0, w1, w2]

Net input function (z)
z = w0 + w1 * x1 + w2 * x2 = wᵀx

Unit step function (φ(·))
φ(z) = 1 if z ≥ 0, −1 otherwise
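
A quick numeric check of these two formulas in NumPy (a minimal sketch; the weight and feature values below are made up purely for illustration):

>> import numpy as np
>> w = np.array([-0.5, 0.8, 1.2])   # [w0, w1, w2], hypothetical weights
>> x = np.array([3.1, 0.4])         # [x1, x2], one sample
>> z = w[0] + np.dot(w[1:], x)      # net input: z = w0 + w1*x1 + w2*x2
>> 1 if z >= 0 else -1              # unit step φ(z)
1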

Unit Step Function

Figure: The general concept of the perceptron

Training process

Algorithm 1 Pseudocode for the training process


1: Initialize the weights, w
2: while Stopping Criteria is not satisfied do
3: for x ∈ X do
4: Compute the output value, ŷ
5: Update the weights
6: end for
7: end while

Updating the weights

w = w + ∆w
∆wi = η * (y − ŷ) * xi
where:
η: the learning rate
y: the true class label
ŷ: the predicted class label

Examples
∆w0 = η * (y − ŷ)
∆w1 = η * (y − ŷ) * x1
∆w2 = η * (y − ŷ) * x2
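
To make the rule concrete, one hand-worked update step (a hypothetical example with η = 0.1): suppose y = 1, ŷ = −1, and x = [2.0, 0.5]. Then y − ŷ = 2, so ∆w0 = 0.1 * 2 = 0.2, ∆w1 = 0.1 * 2 * 2.0 = 0.4, and ∆w2 = 0.1 * 2 * 0.5 = 0.1. When the prediction is correct (y = ŷ), all updates are zero and the weights stay unchanged.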

Components

Hyperparameters
eta → the learning rate
max_iter → the maximum number of epochs
random_state → to make the results reproducible

Parameters
w → the weights of the model
errors → to store the number of misclassifications in each epoch

Methods
fit(X, y) → to train the model
predict(X) → to predict the output value
net_input(X) → to combine the features with the weights

Implement (code from scratch)

import numpy as np

class Perceptron:
    def __init__(self, eta=0.01, max_iter=20, random_state=1):
        self.eta = eta
        self.max_iter = max_iter
        self.random_state = random_state
        self.w = None
        self.errors = []

    def net_input(self, X):
        # weighted sum of the inputs plus the bias weight w[0]
        return np.dot(X, self.w[1:]) + self.w[0]

    def predict(self, X):
        # unit step function: 1 if the net input >= 0, else -1
        return np.where(self.net_input(X) >= 0.0, 1, -1)

    def fit(self, X, y):
        rgen = np.random.RandomState(self.random_state)
        # small random initial weights; w[0] is the bias
        self.w = rgen.normal(loc=0.0, scale=0.01, size=1 + X.shape[1])
        self.errors = []
        for n_iter in range(self.max_iter):
            n_wronglabels = 0
            # shuffle the training samples each epoch
            idx = rgen.permutation(len(y))
            X, y = X[idx], y[idx]
            for xi, yi in zip(X, y):
                error = yi - self.predict(xi)
                self.w[1:] += self.eta * error * xi
                self.w[0] += self.eta * error
                n_wronglabels += int(error != 0.0)
            self.errors.append(n_wronglabels)
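
A minimal usage sketch of the class above on a tiny invented dataset (the four points are linearly separable, so the error count should reach 0; the real iris data is used in the practice below):

>> X_toy = np.array([[1.0, 2.0], [2.0, 1.0], [4.0, 5.0], [5.0, 4.0]])
>> y_toy = np.array([-1, -1, 1, 1])
>> model = Perceptron(eta=0.1, max_iter=10, random_state=1)
>> model.fit(X_toy, y_toy)
>> model.predict(X_toy)   # should recover [-1, -1, 1, 1]
>> model.errors           # misclassifications per epoch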

Implement (library)

from sklearn.linear_model import Perceptron

Hyperparameters
eta0
max_iter
random_state

Parameters
coef_
intercept_

Methods
fit(X, y)
predict(X)
Practice

Using ’iris.csv’ dataset


How can we use the 'sepal length' and 'sepal width' to classify the
species of flowers?
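
One possible way to prepare the data (a sketch only; the column names and label values below are assumptions about the layout of 'iris.csv', so adjust them to the actual file):

>> import pandas as pd
>> from sklearn.model_selection import train_test_split
>> df = pd.read_csv('iris.csv')
>> df = df[df['species'].isin(['setosa', 'versicolor'])]  # keep 2 classes
>> y = np.where(df['species'] == 'setosa', -1, 1)          # 'setosa' = -1
>> X = df[['sepal_length', 'sepal_width']].values          # 2 features
>> X_train, X_test, y_train, y_test = train_test_split(
       X, y, test_size=0.3, random_state=1, stratify=y)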

Data visualization

>> import matplotlib.pyplot as plt

>> idx_setosa = y_train == -1
   idx_versicolor = y_train != -1

>> plt.scatter(X_train[idx_setosa, 0], X_train[idx_setosa, 1], color='red',
   marker='s', label='setosa')
   plt.scatter(X_train[idx_versicolor, 0], X_train[idx_versicolor, 1],
   color='blue', marker='x', label='versicolor')
   plt.xlabel('sepal length [cm]')
   plt.ylabel('sepal width [cm]')
   plt.legend(loc='upper left')
   plt.show()

Data visualization

Practice

>> ppn = Perceptron(eta=0.001, max_iter=30, random_state=1)
   ppn.fit(X_train, y_train)

>> from sklearn.linear_model import Perceptron

>> ppn = Perceptron(eta0=0.001, max_iter=30, random_state=1)
   ppn.fit(X_train, y_train)

Plotting the errors

>> plt.plot(range(1, len(ppn.errors) + 1), ppn.errors, marker='o')
   plt.xlabel('Epochs')
   plt.ylabel('# Misclassifications')
   plt.show()

Plotting the errors

Practice

>> w_ppn = ppn.w
   w_ppn
>> [-0.01575655  0.06508244 -0.11108172]

>> w_ppn = np.append(ppn.intercept_, ppn.coef_)
   w_ppn
>> [-0.006  0.0196 -0.0325]

Visualization

>> plot_decision_regions(X_train, y_train, classifier=ppn)
   plt.xlabel('sepal length [cm]')
   plt.ylabel('sepal width [cm]')
   plt.legend(loc='upper left')
   plt.show()

Visualization

Practice (next)

Create a new model and train it on the standardized data

Data visualization

Plotting the costs

Plotting the results

Table of contents

1 Perceptron

2 Linear Regression

3 Adaptive Linear Neuron (Adaline)

4 Logistic Regression

Figure: The general concept of Linear Regression

Minimizing cost functions with gradient descent

Cost function:

J(w) = (1/2) * Σ_i (y^(i) − φ(z^(i)))^2

Update the weights:

w := w + ∆w
∆w = −η * ∇J(w)

Partial derivatives:

∂J/∂w_j = −Σ_i (y^(i) − φ(z^(i))) * x_j^(i)

∆w_j = −η * ∂J/∂w_j = η * Σ_i (y^(i) − φ(z^(i))) * x_j^(i)

Minimizing cost functions with gradient descent

In vectorized (NumPy-style) form, the update splits into the non-bias weights and the bias:

w[1:] := w[1:] + η * X.T.dot(y − φ(z))
w[0] := w[0] + η * sum(y − φ(z))
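
The same update run to convergence on toy data (a self-contained sketch; the data and η are invented, and φ is the identity as in linear regression):

>> import numpy as np
>> rng = np.random.RandomState(1)
>> X = rng.randn(100, 1)
>> y = 2.0 * X[:, 0] + 1.0                     # toy target: y = 2*x + 1
>> w = rng.normal(scale=0.01, size=2)          # [w0, w1]
>> eta = 0.001
>> for _ in range(200):
       errors = y - (np.dot(X, w[1:]) + w[0])  # y - φ(z)
       w[1:] += eta * X.T.dot(errors)
       w[0] += eta * errors.sum()
>> w                                           # should approach [1.0, 2.0]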

Pseudocode of Training process

Algorithm 2 Gradient Descent


1: Initialize the weights, w
2: while Stopping Criteria is not satisfied do
3: Compute the output value, ŷ
4: Update the weights
5: end while

Components

Hyperparameters: eta, max_iter, random_state
Parameters: w, costs
Methods: fit(X, y), predict(X), net_input(X)

Implement (code from scratch)

class LinearRegression_GD:
    def __init__(self, eta=0.001, max_iter=20, random_state=1):
        self.eta = eta
        self.max_iter = max_iter
        self.random_state = random_state
        self.w = None
        self.costs = []

    def net_input(self, X):
        return np.dot(X, self.w[1:]) + self.w[0]

    def predict(self, X):
        # linear regression: the activation is the identity,
        # so the prediction is the net input itself
        return self.net_input(X)

    def fit(self, X, y):
        rgen = np.random.RandomState(self.random_state)
        self.w = rgen.normal(loc=0.0, scale=0.01, size=1 + X.shape[1])
        self.costs = []
        for n_iters in range(self.max_iter):
            # batch gradient descent: one update per epoch,
            # computed over the whole training set
            errors = y - self.predict(X)
            self.w[1:] += self.eta * X.T.dot(errors)
            self.w[0] += self.eta * errors.sum()
            cost = (errors**2).sum() / 2
            self.costs.append(cost)

Implement (library)

Stochastic Gradient Descent

from sklearn.linear_model import SGDRegressor

Hyperparameters: eta0, max_iter, random_state
Parameters: intercept_, coef_
Methods: fit(X, y), predict(X)

Implement (library)

Normal Equation

from sklearn.linear_model import LinearRegression

Parameters: intercept_, coef_
Methods: fit(X, y), predict(X)

Differences

Gradient Descent
w := w + ∆w
∆w = η * Σ_i (y^(i) − φ(z^(i))) * x^(i)

Stochastic Gradient Descent
w := w + ∆w
∆w = η * (y^(i) − φ(z^(i))) * x^(i)   (one update per training sample)

Normal Equation
w = (XᵀX)⁻¹ Xᵀy
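
The normal equation can be checked directly in NumPy, reusing the toy X and y from the gradient-descent sketch above (a sketch; a bias column of ones is prepended to X, and in practice np.linalg.lstsq or LinearRegression is preferred for numerical stability):

>> Xb = np.hstack([np.ones((X.shape[0], 1)), X])   # add the bias column
>> w = np.linalg.inv(Xb.T @ Xb) @ (Xb.T @ y)       # w = (X^T X)^-1 X^T y
>> w                                               # again close to [1.0, 2.0]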

Practice

Using ’housing.csv’ dataset


How can we use the ’average number of rooms’ (RM) to estimate the
’price’ of houses (MEDV)?

Plotting data

>> plt.scatter(X_train, y_train, c='steelblue', edgecolor='white', s=70)
   plt.xlabel('Average number of rooms [RM] (standardized)')
   plt.ylabel('Price in $1000s [MEDV] (standardized)')
   plt.show()

Practice

Practice

Gradient Descent
>> reg_GD = LinearRegression_GD(eta=0.001, max_iter=20, random_state=1)
   reg_GD.fit(X_train, y_train)

Stochastic Gradient Descent

>> reg_SGD = SGDRegressor(eta0=0.001, max_iter=20, random_state=1,
   l1_ratio=0, tol=None, learning_rate='constant')
   reg_SGD.fit(X_train, y_train)

Normal Equation
>> reg_NE = LinearRegression()
   reg_NE.fit(X_train, y_train)

Plotting the cost

>> plt.plot(range(1, len(reg_GD.costs) + 1), reg_GD.costs)
   plt.xlabel('Epochs')
   plt.ylabel('Cost')
   plt.title('Gradient Descent')
   plt.show()

Practice

>> w_GD = reg_GD.w
   w_GD
>> [0.00767139 0.64623542]

>> w_SGD = np.append(reg_SGD.intercept_, reg_SGD.coef_)
   w_SGD
>> [0.00783841 0.64551218]

>> w_NE = np.append(reg_NE.intercept_, reg_NE.coef_)
   w_NE
>> [0.00773059 0.64638912]

Plotting the results

>> plt.scatter(X_train, y_train, c='steelblue', edgecolor='white', s=70)
   plt.plot(X_train, reg_GD.predict(X_train), color='red', lw=10,
   label='Gradient Descent')
   plt.plot(X_train, reg_SGD.predict(X_train), color='blue', lw=6,
   label='Stochastic Gradient Descent')
   plt.plot(X_train, reg_NE.predict(X_train), color='black', lw=2,
   label='Normal Equation')
   plt.xlabel('Average number of rooms [RM] (standardized)')
   plt.ylabel('Price in $1000s [MEDV] (standardized)')
   plt.legend()
   plt.show()

Plotting the results

Practice

>> y_pred_1 = reg_GD.predict(X_test)

>> y_pred_2 = reg_SGD.predict(X_test)

>> y_pred_3 = reg_NE.predict(X_test)

Performance Evaluation

>> from sklearn.metrics import mean_absolute_error as MAE
   from sklearn.metrics import mean_squared_error as MSE
   from sklearn.metrics import r2_score as R2

Mean Absolute Error

>> print('MAE of GD:', round(MAE(y_test, y_pred_1), 6))
   print('MAE of SGD:', round(MAE(y_test, y_pred_2), 6))
   print('MAE of NE:', round(MAE(y_test, y_pred_3), 6))

Mean Squared Error

>> print('MSE of GD:', round(MSE(y_test, y_pred_1), 6))
   print('MSE of SGD:', round(MSE(y_test, y_pred_2), 6))
   print('MSE of NE:', round(MSE(y_test, y_pred_3), 6))

R² score
>> print('R2 of GD:', round(R2(y_test, y_pred_1), 6))
   print('R2 of SGD:', round(R2(y_test, y_pred_2), 6))
   print('R2 of NE:', round(R2(y_test, y_pred_3), 6))
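
For reference, the three metrics written out by hand in NumPy (a sketch matching the standard definitions; sklearn.metrics computes the same quantities):

>> def mae(y_true, y_pred):
       return np.mean(np.abs(y_true - y_pred))
>> def mse(y_true, y_pred):
       return np.mean((y_true - y_pred)**2)
>> def r2(y_true, y_pred):
       ss_res = np.sum((y_true - y_pred)**2)           # residual sum of squares
       ss_tot = np.sum((y_true - np.mean(y_true))**2)  # total sum of squares
       return 1 - ss_res / ss_tot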

Learning rate too large

Polynomial Regression

Example
X = [258.0, 270.0, 294.0, 320.0, 342.0, 368.0, 396.0, 446.0, 480.0, 586.0]
y = [236.4, 234.4, 252.8, 298.6, 314.2, 342.2, 360.8, 368.0, 391.2, 390.8]

>> X = np.array([258.0, 270.0, 294.0, 320.0, 342.0, 368.0, 396.0, 446.0,


480.0, 586.0])[:, np.newaxis]
y = np.array([236.4, 234.4, 252.8, 298.6, 314.2, 342.2, 360.8, 368.0,
391.2, 390.8])

>> plt.scatter(X, y, label=’Training points’)


plt.xlabel(’X’)
plt.ylabel(’y’)
plt.legend()
plt.show()

Plotting data

Polynomial Regression

>> from sklearn.linear_model import LinearRegression
   lr = LinearRegression()
   lr.fit(X, y)

Polynomial Regression

Syntax
from sklearn.preprocessing import PolynomialFeatures

>> from sklearn.preprocessing import PolynomialFeatures
   pr = LinearRegression()
   quadratic = PolynomialFeatures(degree=2)
   X_quad = quadratic.fit_transform(X)
   pr.fit(X_quad, y)

>> X_fit = np.arange(250, 600, 10)[:, np.newaxis]

>> y_fit_linear = lr.predict(X_fit)
   y_fit_quad = pr.predict(quadratic.fit_transform(X_fit))
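
To see what the transformation actually produces, apply it to a tiny input: with degree=2 and a single feature, each x is mapped to [1, x, x²]:

>> PolynomialFeatures(degree=2).fit_transform(np.array([[2.0], [3.0]]))
array([[1., 2., 4.],
       [1., 3., 9.]])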

>> plt.scatter(X, y, label='Training points')
   plt.xlabel('X')
   plt.ylabel('y')
   plt.plot(X_fit, y_fit_linear, label='Linear fit', linestyle='--')
   plt.plot(X_fit, y_fit_quad, label='Quadratic fit')
   plt.legend()
   plt.tight_layout()
   plt.show()

Practice

Linear regression
>> lr = LR()
   lr.fit(X_train, y_train)

Polynomial regression (quadratic)

>> quadratic = PolynomialFeatures(degree=2)
   X_quad = quadratic.fit_transform(X_train)
   pr_quad = LR()
   pr_quad = pr_quad.fit(X_quad, y_train)

Polynomial regression (cubic)

>> cubic = PolynomialFeatures(degree=3)
   X_cubic = cubic.fit_transform(X_train)
   pr_cubic = LR()
   pr_cubic = pr_cubic.fit(X_cubic, y_train)

>> X_fit = np.arange(X_train.min(), X_train.max(), 0.1)[:, np.newaxis]

>> y_linear_fit = lr.predict(X_fit)
   y_quad_fit = pr_quad.predict(quadratic.fit_transform(X_fit))
   y_cubic_fit = pr_cubic.predict(cubic.fit_transform(X_fit))

Plotting the results
>> plt.scatter(X_train, y_train, c='steelblue', edgecolor='white', s=70)
   plt.plot(X_fit, y_linear_fit, label='Linear (d=1)', color='blue', lw=2,
   linestyle=':')
   plt.plot(X_fit, y_quad_fit, label='Quadratic (d=2)', color='red', lw=2,
   linestyle='-')
   plt.plot(X_fit, y_cubic_fit, label='Cubic (d=3)', color='green', lw=2,
   linestyle='--')
   plt.xlabel('Average number of rooms [RM] (standardized)')
   plt.ylabel('Price in $1000s [MEDV] (standardized)')
   plt.legend()
   plt.show()

Table of contents

1 Perceptron

2 Linear Regression

3 Adaptive Linear Neuron (Adaline)

4 Logistic Regression

Figure: Differences between Perceptron and Adaline

Training process

Algorithm 3 Pseudocode for the training process


1: Initialize the weights, w
2: while stopping criteria is not satisfied do
3: for x ∈ X do
4: Compute the output value, ŷ
5: Update the weights
6: end for
7: end while

Updating the weights

w = w + ∆w
∆wi = η * (y − ŷ) * xi
where:
η: the learning rate
y: the true class label
ŷ: the model output (for Adaline, the continuous activation φ(z) rather than the thresholded label)

Examples
∆w0 = η * (y − ŷ)
∆w1 = η * (y − ŷ) * x1
∆w2 = η * (y − ŷ) * x2

Components

Hyperparameters
eta
max_iter
random_state

Parameters
w
costs

Methods
fit(X, y)
predict(X)
net_input(X)
activation(X)
Implement (code from scratch)

class Adaline:
    def __init__(self, eta=0.01, max_iter=50, random_state=1):
        self.eta = eta
        self.max_iter = max_iter
        self.random_state = random_state
        self.w = None
        self.costs = []

    def net_input(self, X):
        # same net input as in the Perceptron above
        return np.dot(X, self.w[1:]) + self.w[0]

    def activation(self, X):
        # identity activation: Adaline works with the raw net input
        return self.net_input(X)

    def predict(self, X):
        # class labels are obtained by thresholding the activation
        return np.where(self.activation(X) >= 0.0, 1, -1)

    def fit(self, X, y):
        rgen = np.random.RandomState(self.random_state)
        self.w = rgen.normal(loc=0.0, scale=0.01, size=1 + X.shape[1])
        self.costs = []
        for n_iter in range(self.max_iter):
            idx = rgen.permutation(len(y))
            X, y = X[idx], y[idx]
            cost = 0
            for xi, yi in zip(X, y):
                # unlike the perceptron, the error is computed from the
                # continuous activation, not the thresholded prediction
                error = yi - self.activation(xi)
                self.w[1:] += self.eta * error * xi
                self.w[0] += self.eta * error
                cost += error**2
            cost /= 2
            self.costs.append(cost)
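
A usage sketch mirroring the perceptron practice (assuming X_train, y_train are the standardized iris features with labels in {−1, 1}; standardization matters here because the updates use the raw continuous error):

>> ada = Adaline(eta=0.01, max_iter=50, random_state=1)
>> ada.fit(X_train, y_train)
>> plt.plot(range(1, len(ada.costs) + 1), ada.costs, marker='o')
   plt.xlabel('Epochs')
   plt.ylabel('Sum-squared-error')
   plt.show()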

Table of contents

1 Perceptron

2 Linear Regression

3 Adaptive Linear Neuron (Adaline)

4 Logistic Regression

Figure: Differences between Adaline and Logistic regression

Components

Hyperparameters
eta
max_iter
random_state

Parameters
w
costs

Methods
fit(X, y)
predict(X)
net_input(X)
activation(X)
Implement (code from scratch)

class LogisticRegression:
    def __init__(self, eta=0.01, max_iter=50, random_state=1):
        self.eta = eta
        self.max_iter = max_iter
        self.random_state = random_state
        self.w = None
        self.costs = []

    def net_input(self, X):
        return np.dot(X, self.w[1:]) + self.w[0]

    def activation(self, X):
        # sigmoid; the net input is clipped to avoid overflow in np.exp
        return 1. / (1. + np.exp(-np.clip(self.net_input(X), -250, 250)))

    def predict(self, X):
        # the sigmoid output is a probability, so threshold at 0.5
        return np.where(self.activation(X) >= 0.5, 1, 0)

    def fit(self, X, y):
        rgen = np.random.RandomState(self.random_state)
        self.w = rgen.normal(loc=0.0, scale=0.01, size=1 + X.shape[1])
        self.costs = []
        for n_iter in range(self.max_iter):
            output = self.activation(X)
            errors = y - output
            self.w[1:] += self.eta * X.T.dot(errors)
            self.w[0] += self.eta * errors.sum()
            # negative log-likelihood (cross-entropy) cost
            cost = -y.dot(np.log(output)) - (1 - y).dot(np.log(1 - output))
            self.costs.append(cost)
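
A short sketch of the class in use (note that this cost assumes labels in {0, 1}, unlike the {−1, 1} encoding used for the perceptron and Adaline, hence the remapping below):

>> y01_train = (y_train == 1).astype(int)   # remap {-1, 1} to {0, 1}
>> clf = LogisticRegression(eta=0.01, max_iter=50, random_state=1)
>> clf.fit(X_train, y01_train)
>> clf.activation(X_test)[:3]               # class-1 probabilities
>> clf.predict(X_test)[:3]                  # thresholded at 0.5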

Practice

>> clf_LR = LogisticRegression(eta=0.01, max_iter=20, random_state=1)
   clf_LR.fit(X_train, y_train)
>> y_pred = clf_LR.predict(X_test)

Implement (library)

Syntax (import)
from sklearn.linear_model import LogisticRegression

Examples
>> from sklearn.linear_model import LogisticRegression
>> clf_LR_lib = LogisticRegression(random_state=1)
   clf_LR_lib.fit(X_train, y_train)
>> y_pred_lib1 = clf_LR_lib.predict(X_test)
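
To compare the scratch and library classifiers, an accuracy check is the natural follow-up (a sketch using sklearn.metrics; y_test must use the same label encoding as the predictions being scored):

>> from sklearn.metrics import accuracy_score
>> accuracy_score(y_test, y_pred_lib1)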

