Barga Data Science lecture 9
Deriving Knowledge from Data at Scale
What is model overfitting?
How would you identify it?
How would you correct it?
Deriving Knowledge from Data at Scale
What is model underfitting?
How would you identify it?
How would you correct it?
Deriving Knowledge from Data at Scale
List methods for dealing with
missing values. But what must you do first?
Deriving Knowledge from Data at Scale
Describe stratification and
give an example of when
you would use it…
Deriving Knowledge from Data at Scale
Why is feature selection
important?
Deriving Knowledge from Data at Scale
Strategy to Incrementally Build on Foundation
• Select a Data Type
• Data Manipulation for Selected Data Type
• Feature Selection, Feature Creation, Transformation,…
• Machine Learning Algorithm
• Techniques and Association Tool(s)
Deriving Knowledge from Data at Scale
You can make much stronger inferences about a woman named Brittany. That name was very popular from the mid-1980s
through the mid-1990s, but it wasn’t all that common before and hasn’t been since. If you know a Brittany, she is probably
of college age or just a bit older. Half of living American Brittanys are between the ages of 19 and 25.
Deriving Knowledge from Data at Scale
Optional Reading…
Deriving Knowledge from Data at Scale
About the Experiment
Deriving Knowledge from Data at Scale
Key Points
Deriving Knowledge from Data at Scale
Read to Hack Your Machine Learning Development
Cascade Classifier is one of the most popular face detection algorithms and the
default choice in the OpenCV library as well. Highly accurate and very fast…
• 15 times faster than previous work at the time, fast enough for real time;
• Intuition: accurate and complex models are computationally expensive. The
input is subjected to a series of increasingly accurate and expensive models,
and the most expensive model is used on only the most promising input.
So it asks a series of questions: 1) Does model one think it is a face? If no,
stop. If yes, ask model two. 2) Does model two think it is a face? If no, stop. If
yes, ask model three. And so on, until the last model also says yes.
• Feature-based rather than pixel-based. Haar-like features were created for the
images. Note that working with pixels directly is generally very computationally expensive.
• Haar-like features are basically high-level features over pixels. For example,
the observation that the region of the eyes is darker than the region of
the cheeks can be used as a feature input to the model.
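As a minimal illustration, here is how the pre-trained Haar cascade that ships with OpenCV can be invoked; the image file name is a placeholder:

```python
# A minimal sketch of face detection with OpenCV's built-in Haar cascade
# (the default face-detection model the slide refers to).
import cv2

# Load the pre-trained frontal-face cascade bundled with OpenCV
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

img = cv2.imread("group_photo.jpg")   # placeholder image file
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Each candidate window passes through the cascade of increasingly
# expensive stages; most windows are rejected by the first cheap stages.
faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
for (x, y, w, h) in faces:
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
```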
Deriving Knowledge from Data at Scale
Better, Best?
Simple (single) Classifier, Bagging, or Boosting?
Bagging: multiple classifiers are trained on different resampled subsets of the data and
vote on the final decision, in contrast to using just one classifier.
Boosting: a series of classifiers is trained on the dataset, gradually putting more
emphasis on training examples that the previous classifiers failed on, in the
hope that the next classifier will focus on these harder examples. In the
end, you have a series of classifiers that are in general balanced but slightly
more focused on the hard training examples.
In practice, boosting tends to beat bagging, and either bagging or boosting will beat a
plain classifier. See ‘Bagging, Boosting and C4.5’ by J.R. Quinlan. Related work
and experience suggest that Random Forest models are as good as boosting.
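A small sketch comparing these options empirically, assuming scikit-learn and one of its bundled datasets:

```python
# Compare a single tree, bagging, boosting, and a random forest
# via cross-validation (dataset choice is illustrative).
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import (AdaBoostClassifier, BaggingClassifier,
                              RandomForestClassifier)
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
models = {
    "single tree":   DecisionTreeClassifier(),
    "bagging":       BaggingClassifier(DecisionTreeClassifier(), n_estimators=100),
    "boosting":      AdaBoostClassifier(n_estimators=100),
    "random forest": RandomForestClassifier(n_estimators=100),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=10)
    print(f"{name}: {scores.mean():.3f}")
```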
Deriving Knowledge from Data at Scale
Retrain on whole dataset after validating model?
Suppose you have a dataset split into 80% for training and 20% for validation:
do you follow ‘Plan A’ or ‘Plan B’?
Plan A
Plan B
Deriving Knowledge from Data at Scale
Retrain on whole dataset after validating model?
Plan A
hyper-parameters
Deriving Knowledge from Data at Scale
Hyperparameter Tuning
Just to be sure we’re all on the same page…
Random search
• If the close-to-optimal region of hyperparameters occupies at least 5%
of the grid surface, then random search with 60 trials will find that region
with high probability (1 - 0.95^60 ≈ 0.95).
With its utter simplicity and surprisingly reasonable performance,
random search is my go-to method for hyperparameter tuning. It’s
trivially parallelizable, just like grid search, but it takes far fewer
trials and performs almost as well most of the time.
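A sketch of the 60-trial rule using scikit-learn’s RandomizedSearchCV; the model and parameter ranges are illustrative assumptions:

```python
# Random search with 60 trials, per the rule of thumb above.
from scipy.stats import loguniform, randint
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = load_breast_cancer(return_X_y=True)
search = RandomizedSearchCV(
    GradientBoostingClassifier(),
    param_distributions={
        "learning_rate": loguniform(1e-3, 1.0),
        "max_depth": randint(1, 8),
        "n_estimators": randint(50, 500),
    },
    n_iter=60,     # 60 random trials
    cv=5,
    n_jobs=-1,     # trivially parallelizable, just like grid search
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```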
Deriving Knowledge from Data at Scale
Feature Creation
Deriving Knowledge from Data at Scale
Process of Feature Engineering
Deriving Knowledge from Data at Scale
Why?
Feature engineering is often the most important factor in model quality.
Deriving Knowledge from Data at Scale
• Google Refine
• Weka
• Brainstorming
Deriving Knowledge from Data at Scale
https://www.youtube.com/watch?v=yNccGtn3Wb0
https://www.youtube.com/watch?v=cO8NVCs_Ba0
https://www.youtube.com/watch?v=5tsyz3ibYzk
Deriving Knowledge from Data at Scale
Not all errors are equal
We need better metrics…
Deriving Knowledge from Data at Scale
Confusion matrix: true label vs. predicted label
Deriving Knowledge from Data at Scale
True Positive Rate, True Negative Rate, False Positive Rate, False Negative Rate
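A minimal sketch computing the four rates from a binary confusion matrix, assuming scikit-learn and made-up labels:

```python
# Derive the four rates from a binary confusion matrix.
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # made-up labels
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]   # made-up predictions
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

tpr = tp / (tp + fn)  # true positive rate (recall, sensitivity)
tnr = tn / (tn + fp)  # true negative rate (specificity)
fpr = fp / (fp + tn)  # false positive rate
fnr = fn / (fn + tp)  # false negative rate
print(tpr, tnr, fpr, fnr)
```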
Deriving Knowledge from Data at Scale
average per-class accuracy: the average of the accuracy for each class
Deriving Knowledge from Data at Scale
Frequently, one might look at only the top k items from the ranker, k = 5, 10,
20, 100, etc. Then the metrics would be called “precision@k” and “recall@k.”
One might average precision and recall scores for each query and look at
“average precision@k” and “average recall@k.” (Analogous to the relationship
between accuracy and average per-class accuracy for classification.)
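A minimal sketch of precision@k and recall@k for a single ranked list; the item identifiers are placeholders:

```python
# precision@k and recall@k for one ranked result list.
def precision_at_k(ranked_items, relevant, k):
    top_k = ranked_items[:k]
    return sum(item in relevant for item in top_k) / k

def recall_at_k(ranked_items, relevant, k):
    top_k = ranked_items[:k]
    return sum(item in relevant for item in top_k) / len(relevant)

ranked_items = ["d3", "d1", "d7", "d2", "d9"]   # placeholder ranking
relevant = {"d1", "d2", "d5"}                    # placeholder relevance set
print(precision_at_k(ranked_items, relevant, 3))  # 1/3
print(recall_at_k(ranked_items, relevant, 3))     # 1/3
```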
Deriving Knowledge from Data at Scale
Once you can compute precision and recall, you are often able to produce
precision/recall curves. Suppose that you are attempting to identify spam. You
run a learning algorithm to make predictions on a test set. But instead of just
taking a “yes/no” answer, you allow your algorithm to produce its confidence.
For instance, using a perceptron, you might use the distance from the
hyperplane as a confidence measure. You can then sort all of your test emails
according to this ranking. You may put the most spam-like emails at the top
and the least spam-like emails at the bottom.
Once you have this sorted list, you can choose how aggressively you want your
spam filter to be by setting a threshold anywhere on this list. One would hope
that if you set the threshold very high, you are likely to have high precision (but
low recall). If you set the threshold very low, you’ll have high recall (but low
precision). By considering every possible place you could put this threshold,
you can trace out a curve of precision/recall values. This allows us to ask the
question: for some fixed precision, what sort of recall can I get…
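A sketch of tracing that curve by sweeping the threshold over confidence scores, assuming scikit-learn and synthetic scores:

```python
# Trace a precision/recall curve by sweeping the decision threshold
# over classifier confidence scores (e.g. distance from the hyperplane).
from sklearn.metrics import precision_recall_curve

y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]                           # synthetic labels
scores = [0.9, 0.8, 0.75, 0.6, 0.55, 0.5, 0.4, 0.3, 0.2, 0.1]     # synthetic confidences

precision, recall, thresholds = precision_recall_curve(y_true, scores)
for p, r, t in zip(precision, recall, list(thresholds) + [None]):
    print(f"threshold={t}  precision={p:.2f}  recall={r:.2f}")
```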
Deriving Knowledge from Data at Scale
• Method 1
• Method 2
[Plots comparing Method 1 and Method 2: very small differences vs. quite a large difference]
Deriving Knowledge from Data at Scale
Sometimes we want a single number that informs us of the quality of the
solution. A popular way to combine precision and recall into a single number is
by taking their harmonic mean. This is known as the balanced f-measure:

F = 2PR / (P + R)

The reason to use a harmonic mean rather than an arithmetic mean is that it
favors systems that achieve roughly equal precision and recall. In the extreme
case where P = R, then F = P = R. But in the imbalanced case, for instance P =
0.1 and R = 0.9, the overall f-measure is a modest 0.18.
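A quick check of the formula in code, reproducing the two cases above:

```python
# Balanced F-measure: the harmonic mean of precision and recall.
def f_measure(p, r):
    return 2 * p * r / (p + r)

print(f_measure(0.5, 0.5))  # 0.5  (balanced case: F = P = R)
print(f_measure(0.1, 0.9))  # 0.18 (imbalanced case from the text)
```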
Deriving Knowledge from Data at Scale
NDCG
Precision and recall treat all retrieved items equally; NDCG instead discounts
items by their position in the ranking.
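A minimal sketch of DCG/NDCG with the standard log2 discount; the graded relevance values are made up:

```python
# DCG discounts each item's relevance by its rank; NDCG normalizes
# by the DCG of the ideal (best possible) ordering.
import math

def dcg(relevances, k):
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))

def ndcg(relevances, k):
    ideal = sorted(relevances, reverse=True)
    return dcg(relevances, k) / dcg(ideal, k)

print(ndcg([3, 2, 3, 0, 1, 2], k=6))   # made-up graded relevances
```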
Deriving Knowledge from Data at Scale
Regression Metrics
Deriving Knowledge from Data at Scale
This was not an exhaustive list or coverage, but
I hope the next time you encounter a new
evaluation metric you can deconstruct it, identify
what it is measuring and why, and check that it aligns with the
business objective or function of the model.
Deriving Knowledge from Data at Scale
10 minutes break…
Deriving Knowledge from Data at Scale
[Figure: points in the X-Y plane; normal regions N1 and N2; outliers o1, o2, and O3]
Deriving Knowledge from Data at Scale
Related problems
* N. Taleb, The Black Swan: The Impact of the Highly Improbable, 2007
Deriving Knowledge from Data at Scale
Tid  SrcIP          Start time  Dest IP         Dest Port  Number of bytes  Attack
1    206.135.38.95  11:07:20    160.94.179.223  139        192              No
2    206.163.37.95  11:13:56    160.94.179.219  139        195              No
3    206.163.37.95  11:14:29    160.94.179.217  139        180              No
4    206.163.37.95  11:14:30    160.94.179.255  139        199              No
5    206.163.37.95  11:14:32    160.94.179.254  139        19               Yes
6    206.163.37.95  11:14:35    160.94.179.253  139        177              No
7    206.163.37.95  11:14:36    160.94.179.252  139        172              No
8    206.163.37.95  11:14:38    160.94.179.251  139        285              Yes
9    206.163.37.95  11:14:41    160.94.179.250  139        195              No
10   206.163.37.95  11:14:44    160.94.179.249  139        163              Yes
Deriving Knowledge from Data at Scale
Tid  SrcIP          Duration  Dest IP         Number of bytes  Internal
1    206.163.37.81  0.10      160.94.179.208  150              No
2    206.163.37.99  0.27      160.94.179.235  208              No
3    160.94.123.45  1.23      160.94.179.221  195              Yes
4    206.163.37.37  112.03    160.94.179.253  199              No
5    206.163.37.41  0.32      160.94.179.244  181              No
Deriving Knowledge from Data at Scale
[Figure: points in the X-Y plane; normal regions N1 and N2; outliers o1, o2, and O3]
Deriving Knowledge from Data at Scale
[Figure: regions labelled Normal and Anomaly]
Deriving Knowledge from Data at Scale
Anomalous Subsequence
Deriving Knowledge from Data at Scale
Evaluation of Anomaly Detection – F-value
Deriving Knowledge from Data at Scale
Evaluation of Outlier Detection – ROC & AUC
Standard measures for evaluating anomaly detection problems
(anomaly class C, normal class NC):

Confusion            Predicted class
matrix               NC    C
Actual class   NC    TN    FP
               C     FN    TP
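A minimal sketch of ROC/AUC evaluation for an anomaly scorer, assuming scikit-learn; the scores and labels are synthetic:

```python
# ROC curve and AUC for an anomaly scorer (class C = anomaly = 1).
from sklearn.metrics import roc_auc_score, roc_curve

y_true = [0, 0, 1, 0, 1, 0, 0, 1]                       # synthetic labels
scores = [0.1, 0.3, 0.9, 0.2, 0.7, 0.4, 0.15, 0.6]      # synthetic anomaly scores

fpr, tpr, thresholds = roc_curve(y_true, scores)
print("AUC =", roc_auc_score(y_true, scores))
```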
Deriving Knowledge from Data at Scale
Using Support Vector Machines
SVM Classifiers
• Main idea (unsupervised)
• Normal data records belong to high-density data regions
• Anomalies belong to low-density data regions
• Use an unsupervised approach to learn the high-density and low-density data
regions
• Use an SVM to classify the data density level
• Main idea (supervised)
• Data records are labelled (normal network behaviour vs. intrusive)
• Use a standard SVM for classification
Deriving Knowledge from Data at Scale
Using Replicator Neural Networks
The network is trained to reproduce its input at the output (the target
variables are the inputs themselves); records with high reconstruction
error are flagged as anomalies.
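A sketch of the idea using an off-the-shelf regressor as a stand-in replicator network (a bottleneck MLP trained with targets equal to inputs); the data and network size are illustrative assumptions:

```python
# Replicator-network sketch: targets = inputs, anomaly score = reconstruction error.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (300, 4)),     # mostly normal records
               rng.uniform(-6, 6, (6, 4))])    # a few anomalies

# Narrow hidden layer forces the network to learn a compressed profile of "normal"
net = MLPRegressor(hidden_layer_sizes=(2,), max_iter=2000,
                   random_state=0).fit(X, X)
errors = ((X - net.predict(X)) ** 2).sum(axis=1)
print(np.argsort(errors)[-6:])                  # highest reconstruction error
```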
Deriving Knowledge from Data at Scale
Using Support Vector Machines
The one-class SVM separates the data from the origin, pushing the
hyperplane away from the origin as much as possible.
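A sketch of this one-class formulation via scikit-learn’s OneClassSVM; the nu setting and data are assumptions:

```python
# One-class SVM: learn a boundary around normal data, flag what falls outside.
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X_train = rng.normal(0, 1, (200, 2))            # assumed-normal records
X_test = np.vstack([rng.normal(0, 1, (5, 2)),   # normal-looking points
                    rng.uniform(-6, 6, (5, 2))])# far-out points

ocsvm = OneClassSVM(kernel="rbf", nu=0.05).fit(X_train)
print(ocsvm.predict(X_test))                     # +1 = normal, -1 = anomaly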
Deriving Knowledge from Data at Scale
Nearest Neighbour Based Techniques
Deriving Knowledge from Data at Scale
Nearest Neighbour Based Techniques
Deriving Knowledge from Data at Scale
Nearest Neighbor Based Techniques
Deriving Knowledge from Data at Scale
Distance-based Outlier Detection
Nearest Neighbour (NN) approach
• For each data point d, compute the distance to its k-th nearest neighbour, dk
• Sort all data points according to the distance dk
• Outliers are points that have the largest distance dk and are therefore located
in the sparsest neighbourhoods
• Usually, data points with the top n% of distances dk are identified as outliers
• n – user parameter
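A sketch of this k-th-nearest-neighbour scheme, assuming scikit-learn and synthetic data:

```python
# Distance-based outliers: rank points by distance to their k-th neighbour.
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (100, 2)),     # dense cluster
               rng.uniform(-6, 6, (5, 2))])    # a few sparse points

k = 5
nn = NearestNeighbors(n_neighbors=k + 1).fit(X)  # +1: each point is its own neighbour
dists, _ = nn.kneighbors(X)
dk = dists[:, -1]                                # distance to the k-th neighbour

n_pct = 0.05                                     # user parameter n
n_outliers = int(len(X) * n_pct)
outliers = np.argsort(dk)[-n_outliers:]          # largest dk = sparsest neighbourhoods
print(outliers)
```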
Deriving Knowledge from Data at Scale
Model-based approaches to anomaly detection construct a profile of normal
instances, then identify instances that do not conform to this profile as anomalies.
Isolation Forest is a fundamentally different model-based method that explicitly
isolates anomalies instead of profiling normal points. The iForest algorithm has
linear time complexity with a low constant and a low memory requirement.
Empirical evaluations show that iForest performs favorably against OCSVM, a near-
linear-time distance-based method, NN, and random forests in terms of AUC and
processing time, especially on large data sets. iForest also works well in
high-dimensional problems that have a large number of irrelevant attributes,
and in situations where the training set does not contain any anomalies.
Deriving Knowledge from Data at Scale
Randomly select a dimension (feature) and then
randomly select a cut point within that feature’s
range; recurse, building a tree until each leaf node
holds only n points (for example, n = 1).
Create an ensemble of such trees and merge the
results.
Outliers do not have neighbors (mass) and will
be close to the root (top) of the tree…
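A sketch using scikit-learn’s IsolationForest; the contamination level and data are assumptions:

```python
# Isolation Forest: anomalies are isolated near the roots of random trees.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (200, 2)),
               rng.uniform(-8, 8, (10, 2))])

forest = IsolationForest(n_estimators=100, contamination=0.05, random_state=0)
labels = forest.fit_predict(X)       # -1 = anomaly (isolated early), 1 = normal
print(np.where(labels == -1)[0])
```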
Deriving Knowledge from Data at Scale
“Isolation Forest”
• When building a tree, anomalies are likely to be
isolated closer to the root of the tree; whereas
normal points appear deeper in the tree structure
• No need to profile normal data points
• No distance or density measures
• Gap: lacks explanatory power
Deriving Knowledge from Data at Scale
Neural Networks
http://www.youtube.com/watch?v=GrugzF0-V3I
Deriving Knowledge from Data at Scale
X1 X2 X3 Y
1 0 0 0
1 0 1 1
1 1 0 1
1 1 1 1
0 0 1 0
0 1 0 0
0 1 1 1
0 0 0 0
[Diagram: black box with input nodes X1, X2, X3 and output node Y]
Output Y is 1 if at least two of the three inputs are equal to 1.
Deriving Knowledge from Data at Scale
X1 X2 X3 Y
1 0 0 0
1 0 1 1
1 1 0 1
1 1 1 1
0 0 1 0
0 1 0 0
0 1 1 1
0 0 0 0

[Diagram: input nodes X1, X2, X3 with weights 0.3, 0.3, 0.3 feeding an output node with threshold t = 0.4]

Y = I( 0.3 X1 + 0.3 X2 + 0.3 X3 − 0.4 > 0 ), where I(z) = 1 if z is true and 0 otherwise
Deriving Knowledge from Data at Scale
[Diagram: black box with input nodes X1, X2, X3, weights w1, w2, w3, and an output node with threshold t]

Perceptron Model

Y = I( Σi wi Xi − t > 0 ), or equivalently Y = sign( Σi wi Xi − t )
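A minimal sketch of this perceptron with the slide’s weights (0.3, 0.3, 0.3) and threshold t = 0.4, checked against the truth table above:

```python
# Perceptron: Y = I(sum_i w_i * X_i - t > 0), with the slide's weights.
import numpy as np

w = np.array([0.3, 0.3, 0.3])
t = 0.4

def perceptron(x):
    return int(np.dot(w, x) - t > 0)

# Output is 1 iff at least two of the three inputs are 1.
for x in [(1,0,0), (1,0,1), (1,1,0), (1,1,1), (0,0,1), (0,1,0), (0,1,1), (0,0,0)]:
    print(x, "->", perceptron(np.array(x)))
```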
Deriving Knowledge from Data at Scale
[Figure: linearly separable data vs. non-linearly separable data, classes marked + and −]
Deriving Knowledge from Data at Scale
[Diagram: neuron i computes the weighted sum Si of inputs I1, I2, I3 with weights wi1, wi2, wi3 and threshold t, then applies the activation function g(Si) to produce output Oi; a network with input layer (x1 … x5), hidden layer, and output layer producing y]

Training an ANN means learning the weights of the neurons
Deriving Knowledge from Data at Scale
• Inputs are presented to the input layer, flow through the hidden layer to the
output layer (feed-forward), and the network performs nonlinear regression
Deriving Knowledge from Data at Scale
Design decisions: the size of the input layer, the # of hidden
layers, the size of each hidden layer, and the output layer
• Normalize the input values
• If the output is unacceptable, retrain with a different network topology or a
different set of initial weights
Deriving Knowledge from Data at Scale
∇E(w) = ( ∂E/∂w0, ∂E/∂w1, …, ∂E/∂wd )
Deriving Knowledge from Data at Scale
E(w) = Σi ( Yi − f(w, Xi) )²
Deriving Knowledge from Data at Scale
Backpropagation
Iteratively process a set of training tuples and compare the network’s
prediction with the actual known target value; adjust the weights to
minimize the mean squared error, propagating the corrections backwards
from the output layer, hence the name backpropagation
Deriving Knowledge from Data at Scale
Stochastic Gradient Descent
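A minimal sketch of stochastic gradient descent on the squared error above for a linear model f(w, x) = w·x; the data and learning rate are synthetic assumptions:

```python
# SGD on E = sum_i (y_i - w . x_i)^2, one example at a time.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w + rng.normal(scale=0.1, size=200)

w = np.zeros(3)
lr = 0.01
for epoch in range(20):
    for i in rng.permutation(len(X)):    # visit examples in random order
        err = y[i] - X[i] @ w
        w += lr * err * X[i]             # gradient step on (y_i - w.x_i)^2
print(w)                                 # approaches true_w
```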
Deriving Knowledge from Data at Scale
• Cross-validation
Deriving Knowledge from Data at Scale
WEKA provides user control of training
parameters:
• # of hidden layers
• Learning rate
• # of iterations or epochs (“training time”)
• Increment of weight adjustments in back
propagation (“learning rate”)
• Controls on varying changes to increments
(“momentum”) and weight decay
Deriving Knowledge from Data at Scale
Multilayered Neural Networks require a lot
of experimentation to get right…
a lot…
I’m serious…
Deriving Knowledge from Data at Scale
Hands On
Deriving Knowledge from Data at Scale
Are they any good?
Deriving Knowledge from Data at Scale
rare
Deriving Knowledge from Data at Scale
That’s all for tonight….