1_logistic_regression
import numpy as np
import pandas as pd
import seaborn as sns
from pylab import rcParams
import matplotlib.pyplot as plt
from collections import OrderedDict
from scipy.special import expit
import unittest
%matplotlib inline
rcParams['figure.figsize'] = 14, 8
RANDOM_SEED = 42
np.random.seed(RANDOM_SEED)
def run_tests():
    unittest.main(argv=[''], verbosity=1, exit=False)
Your data
In [0]: data = OrderedDict(
amount_spent = [50, 10, 20, 5, 95, 70, 100, 200, 0],
send_discount = [0, 1, 1, 1, 0, 0, 0, 0, 1]
)
In [0]: df = pd.DataFrame.from_dict(data)
df
   amount_spent  send_discount
0            50              0
1            10              1
2            20              1
3             5              1
4            95              0
5            70              0
6           100              0
7           200              0
8             0              1
Logistic regression is used for classification problems in which the target variable takes one of two values:

$$y \in \{0, 1\}$$

where 0 is the negative class (e.g. the email is not spam) and 1 is the positive class (e.g. the email is spam).
The response (target) variable y of the Linear regression model is not restricted to the [0, 1] interval. Recall its hypothesis:
$$h_w(x) = w_1 x_1 + w_0$$

where the coefficients $w_i$ are parameters of the model. Let the coefficient vector $W$ be:

$$W = \begin{pmatrix} w_1 \\ w_0 \end{pmatrix}$$

Then we can rewrite the hypothesis as:

$$h_w(x) = w^T x$$
We want to build a model whose outputs lie between 0 and 1, so we need a hypothesis that satisfies $0 \leq h_w(x) \leq 1$. For Logistic regression we modify the linear hypothesis by introducing another function $g$:

$$h_w(x) = g(w^T x)$$
$$g(z) = \frac{1}{1 + e^{-z}}$$
where $z \in \mathbb{R}$. $g$ is also known as the sigmoid function or the logistic function. So, after substitution, we end up with this definition:
$$h_w(x) = \frac{1}{1 + e^{-w^T x}}$$
$$g(z) = \frac{1}{1 + e^{-z}}$$
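The tests below exercise a sigmoid function whose definition isn't included in this extract. A minimal sketch that follows the formula above (scipy's expit, imported earlier, computes the same thing):

def sigmoid(z):
    # squashes any real number into the (0, 1) interval
    return 1 / (1 + np.exp(-z))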
class TestSigmoid(unittest.TestCase):  # test class name assumed
    def test_at_zero(self):
        self.assertAlmostEqual(sigmoid(0), 0.5)

    def test_at_negative(self):
        self.assertAlmostEqual(sigmoid(-100), 0)

    def test_at_positive(self):
        self.assertAlmostEqual(sigmoid(100), 1)
In [0]: run_tests()
...
----------------------------------------------------------------------
Ran 3 tests in 0.006s
OK
Loss function
We have a model that we can use to make decisions, but we still have to find the parameters W. To do that, we need an objective measure of how good a given set of parameters is. For that purpose, we will use a loss (cost) function:
$$J(W) = \frac{1}{m} \sum_{i=1}^{m} \mathrm{Cost}(h_w(x^{(i)}), y^{(i)})$$

$$J(W) = \frac{1}{m}\left(-y \log(h_w) - (1 - y) \log(1 - h_w)\right)$$

where

$$h_w(x) = g(w^T x)$$
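The tests below call a loss function that isn't defined in this extract. A minimal sketch, assuming it averages the cross-entropy terms given above:

def loss(h, y):
    # binary cross-entropy, averaged over the inputs (works for scalars and arrays)
    return np.mean(-y * np.log(h) - (1 - y) * np.log(1 - h))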
class TestLoss(unittest.TestCase):  # test class name assumed
    def test_zero_h_zero_y(self):
        self.assertLess(loss(h=0.000001, y=.000001), 0.0001)

    def test_one_h_zero_y(self):
        self.assertGreater(loss(h=0.9999, y=.000001), 9.0)

    def test_zero_h_one_y(self):
        self.assertGreater(loss(h=0.000001, y=0.9999), 9.0)

    def test_one_h_one_y(self):
        self.assertLess(loss(h=0.999999, y=0.999999), 0.0001)
In [0]: run_tests()
.......
----------------------------------------------------------------------
Ran 7 tests in 0.010s
OK
In [0]: X = df['amount_spent'].astype('float').values
y = df['send_discount'].astype('float').values
I am pretty lazy, and this approach (trying out parameter values by hand) seems like too much hard work for me. Still, the loss values for a range of hand-picked weights look like this:
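The cell that produced the output below isn't included in this extract; presumably it sweeps a range of candidate weight values and prints the loss for each. A minimal sketch of that idea (the exact range and step used originally aren't recoverable from the output):

# Hypothetical sweep over candidate weights; the original range and step aren't shown.
for w in np.arange(-2.0, 2.0, 0.2):
    y_pred = sigmoid(X * w)
    print(loss(y_pred, y))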
0.0
0.0
0.0
6.661338147750941e-16
9.359180097590508e-14
1.3887890837434982e-11
2.0611535832696244e-09
3.059022736706331e-07
4.539889921682063e-05
0.006715348489118056
0.6931471805599397
5.006715348489103
10.000045398900186
15.000000305680194
19.999999966169824
24.99999582410784
30.001020555434774
34.945041100449046
inf
inf
/usr/local/lib/python3.6/dist-packages/ipykernel_launcher.py:2: RuntimeWarning: divide by zero encountered in log
In Machine Learning, we use gradient descent algorithms to find "good" parameters for our models (Logistic Regression, Linear Regression, Neural Networks, etc...).
Somewhat deeper look into how Gradient descent works (Source: PyTorchZeroToAll)
Starting somewhere, we take our first step downhill in the direction specified by the negative gradient. Next, we recalculate the negative gradient and take another step in the direction it specifies. This process
continues until we get to a point where we can no longer move downhill - a local minimum.
The sigmoid has a convenient derivative:

$$g'(z) = g(z)(1 - g(z))$$

Recall the loss:

$$J(W) = \frac{1}{m}\left(-y \log(h_w) - (1 - y) \log(1 - h_w)\right)$$
Then, taking the derivative with respect to the parameters:

$$\frac{\partial J(W)}{\partial W} = \frac{1}{m}\left((1 - y)h_w - y(1 - h_w)\right)x = \frac{1}{m}(h_w - y)x$$

which gives the gradient descent update rule:

$$W := W - \alpha\left(\frac{1}{m}(h_w - y)x\right)$$
The parameter α is known as the learning rate. A high learning rate can converge quickly, but risks overshooting the lowest point. A low learning rate allows for confident moves in the direction of the negative gradient, but it is time-consuming, so it will take a long time to converge.
def fit(X, y, n_iter=100000, lr=0.01):  # signature not shown in this extract; iteration count and learning rate defaults are assumed
    W = np.zeros(X.shape[1])
    for i in range(n_iter):
        z = np.dot(X, W)
        h = sigmoid(z)
        gradient = np.dot(X.T, (h - y)) / y.size
        W -= lr * gradient
    return W
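The test below also relies on a predict helper that isn't shown in this extract. A minimal sketch, assuming at this stage it simply applies the sigmoid to the weighted inputs (it will have to account for the intercept once we add one later):

def predict(X, W):
    # probability that each example belongs to the positive class
    return sigmoid(np.dot(X, W))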
class TestGradientDescent(unittest.TestCase):
    def test_correct_prediction(self):
        global X
        global y
        if len(X.shape) != 2:
            X = X.reshape(X.shape[0], 1)
        w, _ = fit(X, y)
        y_hat = predict(X, w).round()
        self.assertTrue((y_hat == y).all())
In [0]: run_tests()
E.......
======================================================================
ERROR: test_correct_prediction (__main__.TestGradientDescent)
----------------------------------------------------------------------
Traceback (most recent call last):
File "<ipython-input-15-59757b6f0a6f>", line 8, in test_correct_prediction
w, _ = fit(X, y)
ValueError: not enough values to unpack (expected 2, got 1)
----------------------------------------------------------------------
Ran 8 tests in 0.888s
FAILED (errors=1)
Well, that's not good; after all that hustle, we're nowhere near achieving our goal of finding good parameters for our model. But what went wrong? Let's start by checking whether our algorithm improves over time. We can use our loss metric for that:
def fit(X, y, n_iter=100000, lr=0.01):  # signature not shown in this extract; defaults assumed
    W = np.zeros(X.shape[1])
    errors = []
    for i in range(n_iter):
        z = np.dot(X, W)
        h = sigmoid(z)
        gradient = np.dot(X.T, (h - y)) / y.size
        W -= lr * gradient
        if i % 10000 == 0:  # record and print the loss periodically (interval inferred from the output below)
            e = loss(h, y)
            errors.append(e)
            print(f'loss: {e}')
    return W, errors
In [0]: run_tests()
loss: 0.6931471805599453
loss: 0.41899283818630056
loss: 0.41899283818630056
loss: 0.41899283818630056
loss: 0.41899283818630056
loss: 0.41899283818630056
loss: 0.41899283818630056
loss: 0.41899283818630056
loss: 0.41899283818630056
F.......
loss: 0.41899283818630056
======================================================================
FAIL: test_correct_prediction (__main__.TestGradientDescent)
----------------------------------------------------------------------
Traceback (most recent call last):
File "<ipython-input-15-59757b6f0a6f>", line 10, in test_correct_prediction
self.assertTrue((y_hat == y).all())
AssertionError: False is not true
----------------------------------------------------------------------
Ran 8 tests in 0.938s
FAILED (failures=1)
Good, we found a possible cause of our problem. Our loss doesn't get low enough; in other words, our algorithm gets stuck at a point that is not a good enough minimum for us. How can we fix this? Perhaps try a different learning rate, or initialize our parameters with different values?
def fit(X, y, n_iter=100000, lr=0.001):  # signature not shown in this extract; a smaller learning rate is assumed here, the original values aren't recoverable
    W = np.zeros(X.shape[1])
    errors = []
    for i in range(n_iter):
        z = np.dot(X, W)
        h = sigmoid(z)
        gradient = np.dot(X.T, (h - y)) / y.size
        W -= lr * gradient
        if i % 10000 == 0:  # record and print the loss periodically
            e = loss(h, y)
            errors.append(e)
            print(f'loss: {e}')
    return W, errors
In [0]: run_tests()
loss: 0.6931471805599453
loss: 0.41899283818630056
loss: 0.41899283818630056
loss: 0.41899283818630056
loss: 0.41899283818630056
loss: 0.41899283818630056
loss: 0.41899283818630056
loss: 0.41899283818630056
loss: 0.41899283818630056
F.......
loss: 0.41899283818630056
======================================================================
FAIL: test_correct_prediction (__main__.TestGradientDescent)
----------------------------------------------------------------------
Traceback (most recent call last):
File "<ipython-input-15-59757b6f0a6f>", line 10, in test_correct_prediction
self.assertTrue((y_hat == y).all())
AssertionError: False is not true
----------------------------------------------------------------------
Ran 8 tests in 0.903s
FAILED (failures=1)
Hmm, how about adding one more parameter, an intercept, for our model to find/learn? The fit function below uses an add_intercept helper, sketched next.
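The add_intercept helper isn't included in this extract. A minimal sketch, assuming it prepends a column of ones to X (so the extra weight acts as a bias term); note that predict then has to add the intercept as well, so the shapes line up:

def add_intercept(X):
    # prepend a column of ones; the corresponding weight becomes the intercept (bias)
    intercept = np.ones((X.shape[0], 1))
    return np.concatenate((intercept, X), axis=1)

def predict(X, W):
    # apply the same intercept transformation before scoring
    X = add_intercept(X)
    return sigmoid(np.dot(X, W))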
def fit(X, y, n_iter=100000, lr=0.01):  # signature not shown in this extract; defaults assumed
    X = add_intercept(X)
    W = np.zeros(X.shape[1])
    errors = []
    for i in range(n_iter):
        z = np.dot(X, W)
        h = sigmoid(z)
        gradient = np.dot(X.T, (h - y)) / y.size
        W -= lr * gradient
    return W, errors  # the test unpacks two values, so both are returned
In [0]: run_tests()
........
----------------------------------------------------------------------
Ran 8 tests in 0.837s
OK
class TestLogisticRegressor(unittest.TestCase):
    def test_correct_prediction(self):
        global X
        global y
        X = X.reshape(X.shape[0], 1)
        clf = LogisticRegressor()
        y_hat = clf.fit(X, y).predict(X)
        self.assertTrue((y_hat == y).all())
In [0]: run_tests()
.E.......
======================================================================
ERROR: test_correct_prediction (__main__.TestLogisticRegressor)
----------------------------------------------------------------------
Traceback (most recent call last):
File "<ipython-input-25-b3cbfe737800>", line 7, in test_correct_prediction
clf = LogisticRegressor()
NameError: name 'LogisticRegressor' is not defined
----------------------------------------------------------------------
Ran 9 tests in 0.858s
FAILED (errors=1)
class LogisticRegressor:

    def fit(self, X, y, n_iter=100000, lr=0.01):  # class wrapper and signature not shown in this extract; defaults assumed
        X = self._add_intercept(X)
        self.W = np.zeros(X.shape[1])
        for i in range(n_iter):
            z = np.dot(X, self.W)
            h = sigmoid(z)
            gradient = np.dot(X.T, (h - y)) / y.size
            self.W -= lr * gradient
        return self
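The test above also calls clf.predict, and fit calls self._add_intercept; neither method is shown in this extract. A minimal sketch of both (they belong inside LogisticRegressor), assuming predictions are thresholded at 0.5 via rounding:

    def _add_intercept(self, X):
        # prepend a column of ones so the first weight acts as a bias term
        intercept = np.ones((X.shape[0], 1))
        return np.concatenate((intercept, X), axis=1)

    def predict(self, X):
        X = self._add_intercept(X)
        return sigmoid(np.dot(X, self.W)).round()  # 1 if the probability is >= 0.5, else 0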
In [0]: run_tests()
.........
----------------------------------------------------------------------
Ran 9 tests in 1.695s
OK
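Finally, the cell that builds y_test isn't included in this extract. A minimal sketch, assuming two hypothetical new customers (amounts spent of 10 and 250, values chosen for illustration) are run through the trained classifier:

# Hypothetical new customers; the actual inputs used to produce y_test aren't shown.
X_new = np.array([10, 250], dtype=float).reshape(-1, 1)
clf = LogisticRegressor().fit(X, y)
y_test = clf.predict(X_new)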
In [0]: y_test
Out[0]: array([1., 0.])