Machine Learning IV
____________________________________________________________________________
maXbox Starter 66 - Data Science with Max
There are two kinds of data scientists:
1) Those who can extrapolate from incomplete data.
A B C D (label)
[[ 1. 2. 3. 4. 0.]
[ 3. 4. 5. 6. 0.]
[ 5. 6. 7. 8. 1.]
[ 7. 8. 9. 10. 1.]
[10. 8. 6. 4. 0.]
[ 9. 7. 5. 3. 1.]]
There are two possible predicted classes: 1 as "yes" and 0 as "no". If we were
predicting for example the presence of a disease, "yes" would mean they have the
disease, and "no" would mean they don't have the disease.
If you want to learn how to carry out these tasks and concepts yourself, here is
an overview of the confusion matrix and a general overview of the topic:
http://www.softwareschule.ch/decision.jpg
http://www.softwareschule.ch/examples/machinelearning.jpg
OK, let's start with a first classifier. We split the dataset into y (target or
label) and X (predictors) with the 4 features1:
y = arr2[0:,4]
X = arr2[0:,0:4]
features = ['A','B','C','D']
1 For the sake of simplicity we don't split the data into a train and a test set
print(y,'\n',X,'\n')
[0. 0. 1. 1. 0. 1.]
[[ 1. 2. 3. 4.]
[ 3. 4. 5. 6.]
[ 5. 6. 7. 8.]
[ 7. 8. 9. 10.]
[10. 8. 6. 4.]
[ 9. 7. 5. 3.]]
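The snippets assume that arr2 and the scikit-learn imports are already defined
in the full script; a minimal sketch to reconstruct them from the values printed
above:

import numpy as np
from sklearn.svm import LinearSVC
from sklearn.metrics import confusion_matrix, accuracy_score

# the small toy dataset shown above: four features A..D plus the label column
arr2 = np.array([[ 1.,  2.,  3.,  4., 0.],
                 [ 3.,  4.,  5.,  6., 0.],
                 [ 5.,  6.,  7.,  8., 1.],
                 [ 7.,  8.,  9., 10., 1.],
                 [10.,  8.,  6.,  4., 0.],
                 [ 9.,  7.,  5.,  3., 1.]])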
svm = LinearSVC(random_state=100)
y_pred = svm.fit(X,y).predict(X) # fit and predict in one line
The confusion matrix has the form:
print(confusion_matrix(y, y_pred))
[[2 1]
[0 3]]
The first row belongs to the 0 class and the second to the 1 class (rows are the
actual classes, columns the predicted ones):
   0  1  predicted
0 [[2 1]
1  [0 3]]
As we can see, one false positive was predicted! We can spot it by comparing y
with y_pred:
print((y, y_pred))
That means we predicted yes [1] for a patient who doesn't actually have the
disease; we can also say the false positive is like a false alarm.
What can we learn from this matrix?
• There are two possible predicted classes: "yes" and "no". If we were
predicting the presence of a disease, for example, "yes" would mean
they have the disease, and "no" would mean they don't have the disease
after a diagnosis.
• The classifier made a total of 6 predictions (e.g., 6 patients were
being tested for the presence of that disease).
• Out of those 6 cases, our classifier predicted "yes" 4 times, and "no"
2 times (no=0, yes=1).
• In reality, 3 patients in the sample have the disease and 3 patients do not;
a perfect classifier (only true negatives & true positives) would give:
[[3 0]
 [0 3]]
Precision means: when the classifier predicts a class (yes or no), how often is
it correct? For the yes class: TP / predicted yes = 3/4 = 0.75
Recall means: when it's actually yes, how often does it predict yes?
TP / actual yes = 3/3 = 1.0
Note that in binary classification, recall of the positive class is also known
as “sensitivity”; recall of the negative class is “specificity”.
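These numbers can be checked directly with scikit-learn's metric functions (a
small sketch, assuming y and y_pred from the LinearSVC run above):

from sklearn.metrics import precision_score, recall_score

print(precision_score(y, y_pred))            # 0.75 -> 3 of 4 predicted yes are correct
print(recall_score(y, y_pred))               # 1.0  -> all 3 actual yes are found
print(recall_score(y, y_pred, pos_label=0))  # specificity: 2 of 3 actual no, ~0.67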
More details of that topic at:
https://www.dataschool.io/simple-guide-to-confusion-matrix-terminology/
or EKON 22 - November 2018 at Düsseldorf Session
So our data has 4 features & 3 duplicates, easy to find with a classification:
clf = SVC(random_state=100)
y_pred = clf.fit(X,y).predict(X)
print('supportvectormachine score1: ',clf.score(X,y))
print('score2: ',accuracy_score(y, y_pred))
print(confusion_matrix(y, y_pred))
#plotPredictions(clf)
I initialize the constructor with a fixed random state, so our tests will always
reproduce the same result. The random state is the seed of the pseudo-random
number generator used when shuffling the data for probability estimates.
This classification has no mislabeled data; the score is 1:
>>> supportvectormachine score1: 1.0
score2: 1.0
[[3 0]
[0 3]]
classification report:
precision recall f1-score support
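The header above presumably comes from scikit-learn's classification_report; a
sketch of the call that prints the full per-class table:

from sklearn.metrics import classification_report
print(classification_report(y, y_pred))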
Another four classifiers with scores from the same script2:
clf = GaussianNB()
y_pred = clf.fit(X,y).predict(X)
print('gaussian nb score2: ',accuracy_score(y, y_pred))
print(confusion_matrix(y, y_pred))
>>>gaussian nb score2: 0.8333333333333334
[[2 1]
[0 3]]
clf = KNeighborsClassifier(n_neighbors=3)
y_pred = clf.fit(X,y).predict(X)
print('kneighbors score2: ',accuracy_score(y, y_pred))
print(confusion_matrix(y, y_pred))
#plotPredictions(clf)
[[2 1]
[0 3]]
clf = DecisionTreeClassifier(random_state=100,max_depth=5)
y_pred = clf.fit(X,y).predict(X)
print('decision tree score2: ',accuracy_score(y, y_pred))
print(confusion_matrix(y, y_pred))
>>> decision tree score2: 1.0
[[3 0]
[0 3]]
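The same fit/score/confusion-matrix pattern can also be wrapped in a small loop
over the classifiers (a sketch, assuming the imports and data shown earlier):

from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

for clf in (SVC(random_state=100), GaussianNB(),
            KNeighborsClassifier(n_neighbors=3),
            DecisionTreeClassifier(random_state=100, max_depth=5)):
    y_pred = clf.fit(X, y).predict(X)
    print(type(clf).__name__, 'score2:', accuracy_score(y, y_pred))
    print(confusion_matrix(y, y_pred))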
Decision Trees and Random Forests are very interesting because you can turn the
implicit knowledge into an explicit decision map with the help of pydotplus:
2 http://www.softwareschule.ch/examples/classifier_compare2confusion.py.txt
from sklearn.externals.six import StringIO  # removed in newer scikit-learn versions
import pydotplus
from sklearn import tree

dot_data = StringIO()
tree.export_graphviz(clf, out_file=dot_data,
                     feature_names=features)
graph = pydotplus.graph_from_dot_data(dot_data.getvalue())
print(graph, dot_data, basePath)
#Image(graph.create_png())
graph.write_png(basePath + r'\maxboxdecisiontree_graph2.png')  # raw string for the Windows path
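Note that sklearn.externals.six has been removed from newer scikit-learn
versions; an equivalent sketch that avoids it (export_graphviz can return the
dot string directly when out_file=None):

import pydotplus
from sklearn import tree

dot_data = tree.export_graphviz(clf, out_file=None, feature_names=features)
graph = pydotplus.graph_from_dot_data(dot_data)
graph.write_png('maxboxdecisiontree_graph2.png')  # output path is illustrative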
The features of a decision tree are always randomly permuted at each split.
Therefore, the best found split may vary, even with the same training data and
max_features=n_features, if the improvement of the criterion is identical for
several splits enumerated during the search of the best split.
To obtain a deterministic behavior during fitting, random_state has to be fixed.
clf = DecisionTreeClassifier(random_state=100,max_depth=5)
https://en.wikipedia.org/wiki/Decision_tree_learning
At its core, most algorithms should provide some proof of classification, which
is nothing more than keeping track of which feature gives evidence to which
class. The way the features are designed determines the model that is used to
learn. Such a proof can be a confusion matrix, a certain confidence interval, a
t-test statistic, a p-value or something else used in hypothesis3 testing.
http://www.softwareschule.ch/examples/decision.jpg
The MLPClassifier (multi-layer perceptron) optimizes the log-loss function using
LBFGS or stochastic gradient descent. It trains iteratively: at each time step
the partial derivatives of the loss function with respect to the model
parameters are computed to update the parameters.
It can also have a regularization term added to the loss function (e.g. cross
entropy) that shrinks the model parameters to prevent over-fitting (learning by
heart). This implementation works with data represented as dense numpy arrays or
sparse scipy arrays of floating point values.
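A minimal sketch of the MLPClassifier on the same toy data (the solver and
parameters are illustrative, not the values used in the full script):

from sklearn.neural_network import MLPClassifier

clf = MLPClassifier(solver='lbfgs', random_state=100, max_iter=1000)
y_pred = clf.fit(X, y).predict(X)
print('mlp score2: ', accuracy_score(y, y_pred))
print(confusion_matrix(y, y_pred))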
"Classification with Cluster"
Now we will use linear SVC to partition our graph into clusters and split the
data into a training set and a test set for further predictions.
# Run classifier, using a model that is too regularized (C too low) to see
# the impact on the results
By setting up a dense mesh of points in the grid and classifying all of them, we
can render the regions of each cluster as distinct colors:
import numpy as np
import matplotlib.pyplot as plt

def plotPredictions(clf):
    # dense mesh over the plane of the first two features (the ranges fit the
    # two-feature example this helper was originally written for)
    xx, yy = np.meshgrid(np.arange(0, 250000, 10),
                         np.arange(10, 70, 0.5))
    # classify every grid point (assumes a classifier trained on two features)
    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])
    plt.figure(figsize=(8, 6))
    Z = Z.reshape(xx.shape)
    plt.contourf(xx, yy, Z, cmap=plt.cm.Paired, alpha=0.8)
    plt.scatter(X[:, 0], X[:, 1], c=y.astype(float))  # np.float is removed in newer NumPy
    plt.show()
A simple CNN architecture was trained on the MNIST dataset using TensorFlow with
a 1e-3 learning rate and cross-entropy loss, using four different optimizers:
SGD, Nesterov Momentum, RMSProp and Adam.
We compared different optimizers used in training neural networks and gained
intuition for how they work. We found that SGD with Nesterov Momentum and Adam
produce the best results when training a simple CNN on MNIST data in TensorFlow.
https://sourceforge.net/projects/maxbox/files/Docu/EKON_22_machinelearning_slides_scripts.zip/download
Last note concerning PCA and Data Reduction or Factor Analysis:
As PCA simply transforms the input data, it can be applied both to
classification and regression problems. In this section, we will use a
classification task to discuss the method.
The script can be found at:
http://www.softwareschule.ch/examples/811_mXpcatest_dmath_datascience.pas
..\examples\811_mXpcatest_dmath_datascience.pas
# QA is assumed to alias scikit-learn's QuadraticDiscriminantAnalysis:
# from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis as QA
clf = QA()
y_pred = clf.fit(X,y).predict(X)
print('\n QuadDiscriminantAnalysis score2: ',accuracy_score(y, y_pred))
print(confusion_matrix(y, y_pred))
Of course, it's not always this simple. Often we don't know up front what number
of dimensions is advisable. In such a case, we leave the n_components (or Nvar)
parameter unspecified when initializing PCA to let it calculate the full
transformation. After fitting the data, explained_variance_ratio_ contains an
array of ratios in decreasing order: the first value is the ratio of the basis
vector describing the direction of the highest variance, the second value is the
ratio of the direction of the second-highest variance, and so on.
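A short sketch of that usage with scikit-learn (the actual ratios depend on the
data at hand):

from sklearn.decomposition import PCA

pca = PCA()                             # n_components left unspecified: full transformation
X_trans = pca.fit_transform(X)
print(pca.explained_variance_ratio_)    # ratios in decreasing order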
https://sourceforge.net/projects/maxbox/files/Docu/EKON_22_machinelearning_slides_scripts.zip/download
http://www.softwareschule.ch/examples/classifier_compare2confusion.py.txt
http://www.softwareschule.ch/box.htm
https://scikit-learn.org/stable/modules/
https://packaging.python.org/tutorials/managing-dependencies/
https://towardsdatascience.com/understanding-data-science-classification-metrics-in-scikit-learn-in-python-3bc336865019
Doc:
http://fann.sourceforge.net/fann_en.pdf
http://www.softwareschule.ch/examples/datascience.txt
https://maxbox4.wordpress.com
Last Note:
Pipenv is a dependency manager for Python projects. If you're familiar with
Node.js' npm, PHP's Composer, or Ruby's bundler, it is similar in spirit to
those tools. While pip alone is often sufficient for personal use, Pipenv is
recommended for collaborative projects as it's a higher-level tool that
simplifies dependency management for common use cases.
Use pip to install Pipenv:
pip install --user pipenv
Keep in mind that Python is used for a great many different purposes, and
precisely how you want to manage your dependencies may change based on how you
decide to publish your software.