Machine Learning
Algorithms
KNIME AG
Structure of the Course
Session Topic
§ Tom Mitchell: Machine Learning. McGraw Hill, 1997.
[Figure: example model applications — a recommendation model (IF … THEN rules), a model flagging suspicious transactions (Trx 1, Trx 2, Trx 3, …), and a model trained on spectral time series up to 31 August 2007 (training set), where only some spectral time series show the breakdown]
Model parameters
𝑦 = 𝑓( 𝜷, 𝑿 ) with 𝜷 = [𝛽1, 𝛽2, … , 𝛽𝑚]
§ The model learns / is trained during the learning / training phase to produce
the right answer y (a.k.a., label)
[Figure: input features X = (x1, x2, …) — e.g., Age, Money, Temperature, Speed, Number of taxis, …]
§ Training phase: the algorithm trains a model using the data in the training set
§ Testing phase: a metric measures how well the model is performing on data in
a new dataset (the test set)
Data Science: Process Overview
[Figure: the original data set is partitioned into a training set (used to train the model) and a test set (used to apply and score the model)]
https://en.wikipedia.org/wiki/Cross_Industry_Standard_Process_for_Data_Mining
Data Preparation: Data Manipulation, Data Blending, Missing Values Handling, Feature Generation, Dimensionality Reduction, Feature Selection, Outlier Removal, Normalization, Partitioning, …
Model Training: Model Training, Bag of Models, Model Selection, Ensemble Models, Own Ensemble Model, External Models, Import Existing Models, Model Factory, …
Model Optimization: Parameter Tuning, Parameter Optimization, Regularization, Model Size, No. of Iterations, …
Model Testing: Performance Measures, Accuracy, ROC Curve, Cross-Validation, …
Deployment: Files & DBs, Dashboards, REST API, SQL Code Export, Reporting, …
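A minimal Python/scikit-learn sketch of the partition → train → apply → score loop shown above. The course builds this as a KNIME workflow; the dataset and model choices here are only illustrative assumptions.

```python
# Partition -> train -> apply -> score, sketched with scikit-learn
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_breast_cancer(return_X_y=True)                  # original data set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)

model = DecisionTreeClassifier(max_depth=4).fit(X_train, y_train)   # train model
y_pred = model.predict(X_test)                                      # apply model
print("accuracy on the test set:", accuracy_score(y_test, y_pred))  # score model
```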
Example: class distributions in two nodes
§ Impure node: Entropy(p) = −(p₁ log₂ p₁ + p₂ log₂ p₂) ≈ 0.995
§ Pure node: p₁ = 1, p₂ = 0 → Entropy(p) = 0

Split criterion (information gain):
Gain = Entropy_before − Entropy_after
Entropy_after = w₁ · Entropy₁ + w₂ · Entropy₂, with wᵢ the fraction of samples falling into child i
Gain = Entropy_before − w₁ · Entropy₁ − w₂ · Entropy₂
Split criterion (Gini index):
Gini_split = Σⱼ wⱼ · Giniⱼ, with wⱼ the fraction of samples falling into child j
Gini_split = w₁ · Gini₁ + w₂ · Gini₂
Gini₁ = Gini(p₁, p₂) of child 1, Gini₂ = Gini(p₁, p₂) of child 2
Next splitting feature: feature with lowest Gini_split
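A small sketch of both split criteria. The class counts are made up for illustration; only the formulas come from the slides.

```python
import numpy as np

def entropy(counts):
    """Entropy of a class-count vector: -sum(p_i * log2 p_i)."""
    p = np.asarray(counts, dtype=float)
    p = p[p > 0] / p.sum()
    return -(p * np.log2(p)).sum()

def gini(counts):
    """Gini index of a class-count vector: 1 - sum(p_i^2)."""
    p = np.asarray(counts, dtype=float) / np.sum(counts)
    return 1.0 - np.sum(p ** 2)

def split_scores(parent, children):
    """Information gain and weighted Gini index of a candidate split."""
    n = sum(sum(c) for c in children)
    weights = [sum(c) / n for c in children]
    entropy_after = sum(w * entropy(c) for w, c in zip(weights, children))
    gain = entropy(parent) - entropy_after
    gini_split = sum(w * gini(c) for w, c in zip(weights, children))
    return gain, gini_split

# Hypothetical node with 7 vs. 6 samples, split into two pure children
gain, gini_split = split_scores(parent=[7, 6], children=[[7, 0], [0, 6]])
print(f"information gain = {gain:.3f}, Gini_split = {gini_split:.3f}")
```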
[Figure: splitting a numeric feature x at a threshold (x < t vs. x ≥ t); example values shown: 1.2, 1.7, 2, 2.3, 3.4, 3.6, 4.9, 7.4, 8, 9.2, 12.6]
[Figure: decision trees over the features temp and wind, with splits such as temp ≥/< 10, temp ≥/< 25, temp ≥/< 22, temp ≥/< 26 and wind ≥/< 6, shown next to the corresponding partitioning of the temp–wind plane]
[Figure: three fits of the same data — overfitting: the model memorizes the training set rather than finding the underlying patterns; good fit: the model captures the underlying correlations in the training set; underfitting: the model overlooks the patterns in the training set]

Overfitting
§ Model that fits the training data too well, including details and noise
§ Negative impact on the model's ability to generalize

Underfitting
§ A model that can neither model the training data nor generalize to new data
Techniques:
• Reduced Error Pruning
• Minimum description length
[Figure: Minimum Description Length pruning — two examples comparing a larger tree (Tree 1, splits on wind and temp) with a pruned tree (Tree 2). Example 1: many misclassified samples in Tree 1 ⇒ DL(Tree 1) > DL(Tree 2) ⇒ select Tree 2. Example 2: DL(Tree 1) < DL(Tree 2) ⇒ select Tree 1]
§ Definition:
§ Accuracy = number of correctly classified samples / total number of samples
§ Downsides:
§ Only considers the performance in general and not for the different classes
§ Therefore, not informative when the class distribution is unbalanced
Example: Accuracy = 0.96 vs. Accuracy = 0.93
Arbitrarily define one class value as POSITIVE and the remaining class as
NEGATIVE
                      Predicted class positive    Predicted class negative
True class positive   TRUE POSITIVE (TP)          FALSE NEGATIVE (FN)
True class negative   FALSE POSITIVE (FP)         TRUE NEGATIVE (TN)

§ TRUE POSITIVE (TP): actual and predicted class is positive
§ TRUE NEGATIVE (TN): actual and predicted class is negative
§ FALSE NEGATIVE (FN): actual class is positive and predicted class is negative
§ FALSE POSITIVE (FP): actual class is negative and predicted class is positive
Use these four statistics to calculate other evaluation metrics, such as overall
accuracy, true positive rate, and false positive rate
ROC Curve
§ The ROC Curve shows the false positive rate and true positive rate for
different threshold values
§ False positive rate (FPR)
§ negative events incorrectly classified as positive
§ True positive rate (TPR)
§ positive events correctly classified as positive
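A sketch computing the four confusion-matrix counts, accuracy, TPR, FPR, and an ROC curve with scikit-learn. The labels and scores below are made up for illustration.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, roc_curve, auc

# Made-up ground truth and predicted probabilities for the POSITIVE class
y_true  = np.array([1, 0, 1, 1, 0, 0, 1, 0, 0, 1])
y_score = np.array([0.9, 0.2, 0.7, 0.4, 0.3, 0.6, 0.8, 0.1, 0.5, 0.65])

# Confusion matrix at one fixed threshold (0.5)
y_pred = (y_score >= 0.5).astype(int)
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("accuracy:", (tp + tn) / (tp + tn + fp + fn))
print("TPR:", tp / (tp + fn), "FPR:", fp / (fp + tn))

# ROC curve: TPR vs. FPR over all threshold values
fpr, tpr, thresholds = roc_curve(y_true, y_score)
print("AUC:", auc(fpr, tpr))
```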
Cohen's kappa: κ = (p₀ − pₑ) / (1 − pₑ), where p₀ is the observed (overall) accuracy and pₑ the agreement expected by chance.

Model 1:
pₑ₁ = (19/100) × (20/100)
pₑ₂ = (81/100) × (80/100)
pₑ = pₑ₁ + pₑ₂ = 0.686
p₀ = 89/100 = 0.89 (overall accuracy)
κ = (p₀ − pₑ) / (1 − pₑ) = 0.204 / 0.314 ≈ 0.65

Model 2:
pₑ₁ = (11/100) × (20/100)
pₑ₂ = (89/100) × (80/100)
pₑ = pₑ₁ + pₑ₂ = 0.734
p₀ = 81/100 = 0.81 (overall accuracy)
κ = (p₀ − pₑ) / (1 − pₑ) = 0.076 / 0.266 ≈ 0.29

κ = 1: perfect model performance
κ = 0: the model performance is equal to a random classifier
§ Dataset: Sales data of individual residential properties in Ames, Iowa from 2006
to 2010.
§ One of the columns is the overall condition ranking, with values between 1 and
10.
§ Goal: train a binary classification model, which can predict whether the overall
condition is high or low.
You can download the training workflows from the KNIME Hub:
https://hub.knime.com/knime/spaces/Education/latest/Courses/
1. Right click on LOCAL and select Import KNIME Workflow…
3. Click on Finish
Regression Analysis
Applications
§ Forecasting
§ Quantitative Analysis
Methods
§ Linear
§ Polynomial
§ Regression Trees
§ Partial Least Squares
Simple Linear Regression
§ Minimize the sum of squared errors:
∑ᵢ₌₁ⁿ eᵢ² = ∑ᵢ₌₁ⁿ (yᵢ − ∑ⱼ aⱼ xⱼ,ᵢ)² = (y − Xa)ᵀ (y − Xa)
§ Solution:
â = (XᵀX)⁻¹ Xᵀ y
§ Computational issues:
§ XᵀX must have full rank, and thus be invertible
(Problems arise if linear dependencies between input features exist)
§ Solution may be unstable if input features are almost linearly dependent
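A numpy sketch of the closed-form solution â = (XᵀX)⁻¹Xᵀy on made-up data; the least-squares routine is shown as the numerically safer alternative when XᵀX is nearly singular.

```python
import numpy as np

# Toy data: y = 2 + 3*x plus noise (made up for illustration)
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
X = np.column_stack([np.ones_like(x), x])      # design matrix with intercept column
y = 2 + 3 * x + rng.normal(0, 0.5, size=50)

# Closed-form normal-equation solution
a_hat = np.linalg.inv(X.T @ X) @ X.T @ y
print("coefficients:", a_hat)

# lstsq avoids forming X^T X explicitly and is more stable
a_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print("lstsq coefficients:", a_lstsq)
```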
§ Positive:
§ Strong mathematical foundation
§ Simple to calculate and to understand
(For moderate number of dimensions)
§ High predictive accuracy
(In many applications)
§ Negative:
§ Many dependencies are non-linear
(Can be generalized)
§ Model is global and cannot adapt well to locally different data distributions
But: Locally weighted regression, CART
Mean signed difference: (1/n) ∑ᵢ₌₁ⁿ (yᵢ − f(xᵢ)) — only informative about the direction of the error
Mean absolute percentage error (MAPE): (1/n) ∑ᵢ₌₁ⁿ |yᵢ − f(xᵢ)| / |yᵢ| — requires non-zero target column values
MAE
§ Easy to interpret – mean average absolute error
§ All errors are equally weighted
§ Generally smaller than RMSE

RMSE
§ Cannot be directly interpreted as the average error
§ Larger errors are weighted more
§ Ideal when large deviations need to be avoided
Example: actual values = [2, 4, 5, 8] — compare MAE and RMSE for the corresponding predictions
Example: actual values = [2, 4, 5, 8] — compare R-squared and RMSE for the corresponding predictions
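A worked sketch of the metrics on the actual values from the slide; the predicted values here are made up, since the slide's predictions are not part of the text.

```python
import numpy as np

y_true = np.array([2, 4, 5, 8])           # actual values from the slide
y_pred = np.array([2.5, 4.5, 4.0, 9.0])   # hypothetical predictions (illustration only)

err  = y_true - y_pred
mae  = np.mean(np.abs(err))
rmse = np.sqrt(np.mean(err ** 2))
mape = np.mean(np.abs(err) / np.abs(y_true))          # needs non-zero targets
r2   = 1 - np.sum(err ** 2) / np.sum((y_true - y_true.mean()) ** 2)

print(f"MAE={mae:.3f}  RMSE={rmse:.3f}  MAPE={mape:.3f}  R^2={r2:.3f}")
```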
Regression Tree: Goal
[Figure: building a regression tree — the first split asks x ≤ 93.5? (Y/N); the splitting process is repeated within each segment, e.g., x ≤ 70.5?, yielding segment predictions such as C₁ = 33.9, C₂ = 26.4, C₃ = 17.8]
§ Extensions:
§ Fuzzy trees (better interpolation)
§ Local models for each leaf (linear, quadratic)
[Figure: random forest — bootstrap samples X1, X2, … are drawn with replacement from the training set, one tree is built per sample, and the rows left out of each bootstrap sample provide the out-of-bag predictions y1OOB, y2OOB, … from the trees P1, P2, …, Pn]
[Figure: gradient boosted trees — each regression tree with depth 1 is fit to the residual errors from the previous model]
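A minimal sketch of the boosting idea in the figure: depth-1 regression trees repeatedly fit the residuals of the current ensemble. Data and hyperparameters are made up; this is not the KNIME node's implementation.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy 1-D regression problem (made-up data)
rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.1, size=200)

# Each depth-1 tree fits the residuals of the previous model,
# and is added with a small learning rate.
learning_rate, n_trees = 0.1, 100
prediction = np.full_like(y, y.mean())
trees = []
for _ in range(n_trees):
    residuals = y - prediction
    tree = DecisionTreeRegressor(max_depth=1).fit(X, residuals)
    prediction += learning_rate * tree.predict(X)
    trees.append(tree)

print("training RMSE:", np.sqrt(np.mean((y - prediction) ** 2)))
```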
Functional relationship between features and…
§ … target value y: y = f(x₁, …, xₙ, β₀, …, βₙ), e.g., y = β₀ + β₁x₁ + ⋯ + βₙxₙ
§ … class probability P(y = class i): P(y = cᵢ) = f(x₁, …, xₙ, β₀, …, βₙ)
§ Idea: Train a function, which gives us the probability for each class (0 and 1)
based on the input features
§ Recap on probabilities
§ Probabilities are always between 0 and 1
§ The probabilities of all classes sum up to 1
P(y = 1) = p₁ ⇒ P(y = 0) = 1 − p₁
P(y = 1) = f(x₁, x₂; β₀, β₁, β₂) := 1 / (1 + e^(−(β₀ + β₁x₁ + β₂x₂)))

§ Model:
π = P(y = 1) = 1 / (1 + exp(−Xβ))
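A direct sketch of the two-feature model above; the coefficient values are made up to show how the probability responds to each feature.

```python
import numpy as np

def predict_proba(x1, x2, beta0, beta1, beta2):
    """P(y = 1) = 1 / (1 + exp(-(beta0 + beta1*x1 + beta2*x2)))."""
    z = beta0 + beta1 * x1 + beta2 * x2
    return 1.0 / (1.0 + np.exp(-z))

# Made-up coefficients: the probability rises with x1 and falls with x2
print(predict_proba(x1=2.0, x2=1.0, beta0=-1.0, beta1=1.5, beta2=-0.5))
```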
L(β; y, X) = ∏ᵢ₌₁ⁿ P(y = yᵢ) = ∏ᵢ₌₁ⁿ πᵢ^yᵢ (1 − πᵢ)^(1−yᵢ)

P(y = yᵢ) = πᵢ if yᵢ = 1, and 1 − πᵢ if yᵢ = 0, i.e., P(y = yᵢ) = πᵢ^yᵢ (1 − πᵢ)^(1−yᵢ)
Remember: πᵢ = P(y = 1); u⁰ = 1 and u¹ = u for u ∈ ℝ

max_β L(β; y, X) = max_β ∏ᵢ₌₁ⁿ πᵢ^yᵢ (1 − πᵢ)^(1−yᵢ)

max_β LL(β; y, X) = max_β ∑ᵢ₌₁ⁿ [ yᵢ ln πᵢ + (1 − yᵢ) ln(1 − πᵢ) ]
§ To find the coefficients of our model we want to find 𝜷 so that the value of the
function 𝐿𝐿 𝜷; 𝒚, 𝑿 is maximal
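A sketch of maximizing LL(β; y, X) by plain gradient ascent on made-up data; the gradient of the log-likelihood, Xᵀ(y − π), is a standard result and is used here as an assumption about the training procedure (the KNIME node may use a different optimizer).

```python
import numpy as np

# Made-up binary-classification data with two features plus an intercept column
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(300), rng.normal(size=(300, 2))])
true_beta = np.array([-0.5, 2.0, -1.0])
y = (rng.uniform(size=300) < 1 / (1 + np.exp(-X @ true_beta))).astype(float)

# Gradient ascent on LL(beta) = sum_i [y_i ln(pi_i) + (1 - y_i) ln(1 - pi_i)];
# the gradient with respect to beta is X^T (y - pi).
beta, step = np.zeros(3), 0.01
for _ in range(5000):
    pi = 1 / (1 + np.exp(-X @ beta))
    beta += step * X.T @ (y - pi)

print("estimated coefficients:", beta)
```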
[Figure: gradient descent steps Δs along the loss surface toward the optimal β̂]

§ Fixed: Δsₖ = Δs₀
§ Annealing: Δsₖ = Δs₀ / (1 + k/α), with iteration number k and decay rate α
§ Line Search: Learning rate strategy that tries to find the optimal learning rate
§ L₂ regularization (Gaussian prior on the coefficients):
l(β̂; y, X) := −LL(β̂; y, X) + (λ/2) ‖β̂‖₂²
§ L₁ regularization (coefficients are Laplace distributed):
l(β̂; y, X) := −LL(β̂; y, X) + λ ‖β̂‖₁
§ p-value < α: the input feature has a significant impact on the dependent variable.
§ Regression Exercises:
§ Goal: Predicting the house price
§ 01_Linear_Regression
§ 02_Regression_Tree
§ Classification Exercises:
§ Goal: Predicting the house condition (high /low)
§ 03_Radom_Forest (with optional exercise to build a parameter
optimization loop)
§ 04_Logistic_Regression
Artificial Neurons and Networks
Biological vs. Artificial
[Figure: an artificial neuron — inputs x₁, x₂ with weights w₁, w₂ and a bias b = w₀ feed a weighted sum and an activation σ: y = f(x₁w₁ + x₂w₂ + b); in general y = f(∑ⱼ xⱼwⱼ)]
[Figure: a fully connected, feed-forward network — inputs x₁, x₂ are connected via weights Wᵢ,ⱼ to hidden units with outputs o₁, o₂, …; the network output is y = f(W_out 𝒐), where f( ) is the activation function]
§ Sigmoid: f(a) = 1 / (1 + e^(−2ha))
§ Tanh: f(a) = (e^(2ha) − 1) / (e^(2ha) + 1)
§ ReLU: f(a) = max(0, ha)
(h: steepness parameter)
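A small sketch of the three activation functions as reconstructed above; treating h as a steepness parameter is an assumption of this sketch.

```python
import numpy as np

def sigmoid(a, h=1.0):
    return 1.0 / (1.0 + np.exp(-2 * h * a))

def tanh_act(a, h=1.0):
    return (np.exp(2 * h * a) - 1) / (np.exp(2 * h * a) + 1)

def relu(a, h=1.0):
    return np.maximum(0.0, h * a)

a = np.linspace(-3, 3, 7)
print(sigmoid(a), tanh_act(a), relu(a), sep="\n")
```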
[Figure sequence: the XOR problem — the XOR truth table over two binary inputs x and y (output 1 only when exactly one input is 1) cannot be separated by the single linear decision boundary of one neuron; combining two linear boundaries such as 1 + x − y ≷ 0 and 2 − x − y ≷ 0 in a hidden layer with weights ±1 and suitable thresholds, and feeding both into an output neuron, solves XOR]
Gradient descent:
Δw_ji = −η ∂E/∂w_ji

∂E/∂w_ji = ∂(½ (t_j − y_j)²)/∂w_ji = ∂(½ (t_j − y_j)²)/∂y_j · ∂y_j/∂w_ji = −(t_j − y_j) ∂y_j/∂w_ji = −(t_j − y_j) g′(h_j) x_i

Δw_ji = −η ∂E/∂w_ji = η (t_j − y_j) g′(h_j) x_i = −η δ_j^out x_i, with δ_j^out = −(t_j − y_j) g′(h_j)
For the hidden-layer weights w_ij^hidden, apply the chain rule through the output layer:

Δw_ij^hidden = −η ∂E/∂w_ij^hidden = −η ∑_{x∈D} ∑_{k=1}^{K} (f(a_k^out(x)) − y_k(x)) · f′(a_k^out) · w_jk^out · f′(a_j^hidden) · x_i

= −η ∑_{x∈D} ∑_{k=1}^{K} δ_k^out w_jk^out f′(a_j^hidden) x_i = ∑_{x∈D} −η · δ_j^hidden · x_i

with δ_j^hidden = f′(a_j^hidden) ∑_k δ_k^out w_jk^out

Do you understand now why the sigmoid is a commonly used activation function?
[Figure: inputs x₁, x₂ → hidden outputs o₁, o₂, … → output y; δ^hidden and δ^out propagate backwards]

1. Forward pass:
𝒐 = f(W_in 𝒙)
y = f(W_out 𝒐)

2. Backward pass:
δ_j = ∂E/∂o_j · ∂o_j/∂net_j = (o_j − t_j) o_j (1 − o_j) for an output neuron
δ_j = (∑_{k∈succ(j)} w_jk δ_k) o_j (1 − o_j) for a hidden neuron

Δw_ij = −η o_i δ_j
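A minimal sketch of one forward/backward pass for a 2-3-1 network following the update rules above; the weights, sample, and omission of bias terms are assumptions for brevity.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# One training step (made-up weights and sample)
rng = np.random.default_rng(0)
W_in, W_out = rng.normal(size=(3, 2)), rng.normal(size=(1, 3))
x, t, eta = np.array([0.5, -1.0]), np.array([1.0]), 0.1

# 1. Forward pass
o = sigmoid(W_in @ x)          # hidden activations
y = sigmoid(W_out @ o)         # network output

# 2. Backward pass
delta_out = (y - t) * y * (1 - y)                    # output-layer delta
delta_hidden = (W_out.T @ delta_out) * o * (1 - o)   # hidden-layer delta

# Weight updates: delta_w_ij = -eta * o_i * delta_j
W_out -= eta * np.outer(delta_out, o)
W_in  -= eta * np.outer(delta_hidden, x)
print("output after one update:", sigmoid(W_out @ sigmoid(W_in @ x)))
```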
§ Weight Decay:
§ try to keep weights small
§ Momentum:
§ increase weight updates as long as they have the same sign
§ Resilient Backpropagation:
§ estimate optimum for weight based on assumption that error surface is a polynomial.
§ Possible Solution:
§ Local activity of neurons in hidden layer: Local Basis Function Networks
§ Recurrent Neural Network (RNN) are a family of neural networks used for
processing of sequential data
§ RNNs are used for all sorts of tasks:
§ Language modeling / Text generation
§ Text classification
§ Neural machine translation
§ Image captioning
§ Speech to text
§ Numerical time series data, e.g. sensor data
Example: "Mag ich Schokolade?" ⇒ "Do I like chocolate?"
(word by word: Ich → I, mag → like, Schokolade → chocolate)
§ Problems:
§ Each time step is completely independent
§ For translations we need context
§ More general: we need a network that remembers inputs from the past
§ Solution: Recurrent neural networks
[Figure: a feed-forward network mapping input x to output y, one time step at a time]
[Figure: a recurrent network — the weights W₂ₓ and W₃ᵧ are reused at every time step and the hidden state feeds back into the network; unrolled over time, the inputs Ich, mag, Schokolade produce the outputs I, like, chocolate (y₀, y₁, y₂, y₃)]

Many to Many
[Figure: an unrolled chain of identical cells A processing the sequence "I like to go sailing"]
[Figure: two approaches to recommendations — association rules (IF Antecedent THEN Consequent) and collaborative filtering]
N shopping baskets:
{A, B, F, H}, {A, B, C}, {B, C, H}, {D, E, F}, {D, E}, {A, B}, {A, C}, {H, F}, …
Search for frequent itemsets, e.g., {A, B, F, H}
Build candidate rules from the frequent itemset:
{A, B, F} → H
{A, B, H} → F
{A, F, H} → B
§ Item set support s = freq(A, B, F, H) / N — how often these items are found together
§ Rule confidence c = freq(A, B, F, H) / freq(A, B, F) — how often the antecedent is found together with the consequent
§ Rule lift = support({A, B, F} ⇒ H) / (support(A, B, F) × support(H)) — how often antecedent and consequent happen together compared with random chance

The rules with support, confidence, and lift above a threshold → the most reliable ones
Two phases:
1. Find all frequent itemsets (FI) ← most of the complexity
§ Select itemsets with a minimum support S_min (user parameter):
FI = {(X, Y), X, Y ⊂ I | s(X, Y) ≥ S_min}
2. Build strong association rules
§ Select rules with a minimum confidence C_min (user parameter):
Rules: X ⇒ Y, X, Y ⊂ FI, c(X ⇒ Y) ≥ C_min
§ Support: how often are they found together across all shopping baskets?
§ Confidence: how often are they found together across all shopping baskets containing the antecedents?

TID  Transactions
1    Bread, Milk
2    Bread, Diaper, Beer, Eggs
3    Milk, Diaper, Beer, Coke
4    Bread, Milk, Diaper, Beer
5    Bread, Milk, Diaper, Coke

Support: s(milk, diaper, beer) = P(milk, diaper, beer) / |T| = 2/5 = 0.4
Confidence: c = P(milk, diaper, beer) / P(milk, diaper) = 2/3 = 0.67
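A short sketch computing support, confidence, and lift for the rule {Milk, Diaper} → Beer on the five baskets above (the lift value is an extra computation, not taken from the slide).

```python
# The five baskets from the table above
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def support(itemset):
    """Fraction of baskets containing all items of the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

antecedent, consequent = {"Milk", "Diaper"}, {"Beer"}
s = support(antecedent | consequent)
c = s / support(antecedent)
lift = c / support(consequent)
print(f"support={s:.2f} confidence={c:.2f} lift={lift:.2f}")  # 0.40, 0.67, 1.11
```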
Collaborative Filtering
Collaborative filtering systems have many forms, but many common systems can
be reduced to two steps:
1. Look for users who share the same rating patterns with the active user (the
user whom the recommendation is for)
2. Use the ratings from those like-minded users found in step 1 to calculate a
prediction for the active user
3. Implemented in Spark
https://www.knime.com/blog/movie-recommendations-with-spark-collaborative-filtering
Pearson correlation:
simil(u, u′) = ∑_{i∈I_uu′} (r_{u,i} − r̄_u)(r_{u′,i} − r̄_{u′}) / √( ∑_{i∈I_uu′} (r_{u,i} − r̄_u)² · ∑_{i∈I_uu′} (r_{u′,i} − r̄_{u′})² )
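A sketch of the user-to-user similarity above; the rating dictionaries and movie names are made up, and only items rated by both users enter the correlation.

```python
import numpy as np

def pearson_similarity(ratings_u, ratings_v):
    """Pearson correlation over the items both users have rated."""
    common = sorted(set(ratings_u) & set(ratings_v))
    if len(common) < 2:
        return 0.0
    r_u = np.array([ratings_u[i] for i in common], dtype=float)
    r_v = np.array([ratings_v[i] for i in common], dtype=float)
    du, dv = r_u - r_u.mean(), r_v - r_v.mean()
    denom = np.sqrt((du ** 2).sum() * (dv ** 2).sum())
    return float((du * dv).sum() / denom) if denom else 0.0

u = {"MovieA": 5, "MovieB": 3, "MovieC": 4}
v = {"MovieA": 4, "MovieB": 2, "MovieC": 5, "MovieD": 1}
print(pearson_similarity(u, v))
```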
§ Neural Network
§ Goal: Train an MLP to solve our
classification problem (rank: high/low)
§ 01_Simple_Neural_Network
Definition:
Given a data set D with |D| = n, determine a clustering C of D with
C = {C₁, C₂, ⋯, Cₖ}
where Cᵢ ⊆ D and ⋃_{1≤i≤k} Cᵢ = D
that best fits the given data set D.
Clustering Methods:
1. partitioning
2. hierarchical (linkage based)
3. density-based
[Figure: k-means iterations — cluster assignment alternates with the calculation of new centroids until the assignments no longer change]
§ Advantages:
§ Relatively efficient
§ Simple implementation
§ Weaknesses:
§ Often terminates at a local optimum
§ Applicable only when mean is defined (what about categorical data?)
§ Need to specify k, the number of clusters, in advance
§ Unable to handle noisy data and outliers
§ Not suitable to discover clusters with non-convex shapes
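A minimal k-means sketch with scikit-learn on made-up blob data; the normalization step reflects the fact that k-means is distance based, and k = 3 is an assumption.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import MinMaxScaler

# Made-up 2-D data with three blobs
rng = np.random.default_rng(7)
data = np.vstack([rng.normal(loc=c, scale=0.4, size=(50, 2)) for c in ([1, 1], [5, 5], [8, 1])])

# Normalize first, then cluster with k = 3
scaled = MinMaxScaler().fit_transform(data)
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(scaled)
print("centroids:", km.cluster_centers_)
print("first labels:", km.labels_[:10])
```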
Within-Cluster Variation vs. Between-Cluster Variation
[Figure: a bad clustering and a good clustering of the same points, with the cluster centroids marked]

§ Within-Cluster Variation:
TD² = ∑ᵢ₌₁ᵏ ∑_{p∈Cᵢ} dist(p, μ_{Cᵢ})²
§ Between-Cluster Variation:
BC² = ∑ⱼ₌₁ᵏ ∑ᵢ₌₁ᵏ dist(μ_{Cⱼ}, μ_{Cᵢ})²
s(x) = (b(x) − a(x)) / max{a(x), b(x)}
where a(x) is the average distance of x to the objects in its own cluster and b(x) the average distance to the objects in the nearest other cluster.

Good clustering: a(x) ≪ b(x) ⇒ s(x) ≈ 1
…not so good: a(x) ≈ b(x) ⇒ s(x) = (b(x) − a(x)) / max{a(x), b(x)} ≈ 0
…bad clustering: a(x) ≫ b(x) ⇒ s(x) ≈ −1

§ Silhouette coefficient s_C for a clustering C is the average silhouette over all objects x ∈ C:
s_C = (1/n) ∑_{x∈C} s(x)
Method
§ For 𝑘=2, 3, ⋯, 𝑛−1, determine one clustering each
§ Choose 𝑘 resulting in the highest clustering quality
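A sketch of this method using scikit-learn's silhouette score on made-up data; only a small range of k is tried here.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
data = np.vstack([rng.normal(loc=c, scale=0.5, size=(60, 2)) for c in ([0, 0], [4, 4], [8, 0])])

# Try several k and keep the one with the highest average silhouette
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(data)
    scores[k] = silhouette_score(data, labels)
best_k = max(scores, key=scores.get)
print(scores, "-> best k:", best_k)
```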
§ Lp-Metric (Minkowski distance):
dist(x, y) = ( ∑ᵢ₌₁ᵈ |xᵢ − yᵢ|ᵖ )^(1/p)
§ Euclidean distance (p = 2):
dist(x, y) = √( ∑ᵢ₌₁ᵈ (xᵢ − yᵢ)² )
§ Manhattan distance (p = 1):
dist(x, y) = ∑ᵢ₌₁ᵈ |xᵢ − yᵢ|
§ Maximum distance (p = ∞):
dist(x, y) = max_{1≤i≤d} |xᵢ − yᵢ|
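A tiny sketch of the four distances on a made-up pair of vectors.

```python
import numpy as np

def minkowski(x, y, p):
    """Lp / Minkowski distance: (sum |x_i - y_i|^p)^(1/p)."""
    return np.sum(np.abs(x - y) ** p) ** (1.0 / p)

x, y = np.array([1.0, 2.0, 3.0]), np.array([4.0, 0.0, 3.0])
print("Euclidean :", minkowski(x, y, 2))        # sqrt(9 + 4 + 0)
print("Manhattan :", minkowski(x, y, 1))        # 3 + 2 + 0
print("Maximum   :", np.max(np.abs(x - y)))     # limit p -> infinity
```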
Goal
§ Construction of a hierarchy of clusters (dendrogram)
by merging/separating clusters with minimum/maximum distance
Dendrogram:
§ A tree representing the hierarchy of clusters, with the following properties:
§ Root: single cluster with the whole data set.
§ Leaves: clusters containing a single object.
§ Branches: merges / separations between larger clusters and smaller clusters / objects
§ The vertical axis shows the distance at which clusters are merged.
§ Example dendrogram
[Figure: a dendrogram over nine objects; the height of each merge corresponds to the distance between the merged clusters]
1. Form initial clusters consisting of a single object, and compute the distance
between each pair of clusters.
2. Merge the two clusters having minimum distance.
3. Calculate the distance between the new cluster and all other clusters.
4. If there is only one cluster containing all objects:
Stop, otherwise go to step 2.
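The four steps above are what agglomerative clustering routines implement; a sketch with SciPy on made-up points (the linkage method choice is an assumption):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Made-up 2-D points forming two groups; linkage starts from singleton clusters
# and repeatedly merges the two closest clusters until one cluster is left.
rng = np.random.default_rng(3)
points = np.vstack([rng.normal(0, 0.3, (5, 2)), rng.normal(3, 0.3, (5, 2))])

Z = linkage(points, method="average")             # "single", "complete", "average", ...
labels = fcluster(Z, t=2, criterion="maxclust")   # cut the dendrogram into 2 clusters
print(labels)
```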
§ Average linkage distance between clusters C₁ and C₂:
Dist_avg(C₁, C₂) = (1 / (|C₁| · |C₂|)) ∑_{p∈C₁} ∑_{q∈C₂} dist(p, q)
§ Merge Step:
§ union of two subsets of data points
§ construct the mean point of the two clusters
- Sensitive to noise (Single-Link)
  (a "line" of objects can connect two clusters)
- Inefficient
  → runtime complexity at least O(n²) for n objects
§ Single Linkage:
§ Prefers well-separated clusters
§ Complete Linkage:
§ Prefers small, compact clusters
§ Average Linkage:
§ Prefers small, well-separated clusters…
Clusters are built by joining core and density-reachable points to one another.
Core point vs. border point vs. noise:
§ t = core point
§ s = border point
§ n = noise point
Note: t is not density-reachable from s, because s is not a core point
§ For each point, DBSCAN determines the ε-environment and checks whether it contains more than MinPts data points ⇒ core point
§ Iteratively increases the cluster by adding density-reachable points
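A DBSCAN sketch with scikit-learn on made-up data; eps corresponds to the ε-environment and min_samples to MinPts, and the chosen values are only illustrative.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Made-up data: two dense blobs plus a few scattered noise points
rng = np.random.default_rng(5)
blobs = np.vstack([rng.normal([0, 0], 0.2, (40, 2)), rng.normal([3, 3], 0.2, (40, 2))])
noise = rng.uniform(-2, 5, (10, 2))
data = np.vstack([blobs, noise])

db = DBSCAN(eps=0.5, min_samples=5).fit(data)
print("cluster labels (noise = -1):", np.unique(db.labels_))
```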
Clustering:
§ A density-based clustering 𝐶 of a dataset D w.r.t. 𝜀 and MinPts is the set of all
density-based clusters 𝐶! w.r.t. 𝜀 and MinPts in D.
§ The set 𝑁𝑜𝑖𝑠𝑒𝐶𝐿 („noise“) is defined as the set of all objects in D which do not
belong to any of the clusters.
Property:
§ Let Cᵢ be a density-based cluster and p ∈ Cᵢ be a core object.
Example:
§ Lengths in cm (100–200) and weights in kilograms (30–150) both fall on approximately the same scale
§ What about lengths in m (1–2) and weights in grams (30,000–150,000)?
→ The weight values in grams dominate over the length values for the similarity of records!
Goal of normalization:
§ Transformation of attributes to make record ranges comparable
§ min-max normalization
y = (x − x_min) / (x_max − x_min) · (y_max − y_min) + y_min
§ z-score normalization
y = (x − mean(x)) / stddev(x)
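A direct sketch of both normalizations on made-up length values.

```python
import numpy as np

x = np.array([100.0, 150.0, 175.0, 200.0])   # e.g., lengths in cm (made-up values)

# min-max normalization into the target range [y_min, y_max]
y_min, y_max = 0.0, 1.0
minmax = (x - x.min()) / (x.max() - x.min()) * (y_max - y_min) + y_min

# z-score normalization
zscore = (x - x.mean()) / x.std()

print(minmax, zscore, sep="\n")
```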
§ Missing Completely At Random (MCAR): reason does not depend on its value
or lack of value.
There may be no particular reason why some people told you their weights and others
didn’t.
§ An outlier could be, for example, rare behavior, system defect, measurement
error, or reaction to an unexpected event
§ Knowledge-based
§ Statistics-based
§ Distance from the median
§ Position in the distribution tails
§ Distance to the closest cluster center
§ Error produced by an autoencoder
§ Number of random splits to isolate a data point
from other data
https://www.knime.com/blog/four-techniques-for-outlier-detection
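A sketch of two of the listed techniques — a statistics-based rule on the distribution tails (IQR) and an isolation forest — on made-up values with two injected outliers; thresholds and contamination are assumptions.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(11)
values = np.concatenate([rng.normal(50, 5, 200), [120.0, -30.0]])   # two injected outliers

# Statistics-based: flag points far outside the interquartile range
q1, q3 = np.percentile(values, [25, 75])
iqr = q3 - q1
iqr_outliers = values[(values < q1 - 1.5 * iqr) | (values > q3 + 1.5 * iqr)]

# Isolation forest: points isolated by few random splits get label -1
iso = IsolationForest(contamination=0.01, random_state=0).fit(values.reshape(-1, 1))
iso_outliers = values[iso.predict(values.reshape(-1, 1)) == -1]

print("IQR outliers:", iqr_outliers)
print("Isolation forest outliers:", iso_outliers)
```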
§ Measure based
§ Ratio of missing values
§ Low variance
§ High Correlation
§ Transformation based
§ Principal Component Analysis (PCA)
§ Linear Discriminant Analysis (LDA)
§ t-SNE
§ Machine Learning based
§ Random Forest of shallow trees
§ Neural auto-encoder
Principal Component Analysis (PCA)
§ PC₁ describes most of the variability in the data, PC₂ adds the next big contribution, and so on. In the end, the last PCs do not bring much more information to describe the data.
§ Thus, to describe the data we could use only the top m < n components (i.e., PC₁, PC₂, ⋯, PCₘ) with little - if any - loss of information
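A PCA sketch with scikit-learn on made-up numeric data; keeping enough components for 95% explained variance is an assumption about the cut-off.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
data = rng.normal(size=(200, 5))
data[:, 3] = data[:, 0] * 2 + rng.normal(0, 0.05, 200)   # a nearly redundant column

# Normalize, then keep enough components to explain 95% of the variance
scaled = StandardScaler().fit_transform(data)
pca = PCA(n_components=0.95).fit(scaled)
reduced = pca.transform(scaled)
print("kept components:", pca.n_components_, "explained:", pca.explained_variance_ratio_)
```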
Dimensionality Reduction
§ Caveats:
§ Results of PCA are quite difficult to interpret
§ Normalization required
§ Only effective on numeric columns
§ PCA : unsupervised
§ LDA : supervised
§ LD₁ describes best the class separation in the data, LD₂ adds the next big contribution, and so on. In the end, the last LDs do not bring much more information to separate the classes.
§ Thus, for our classification problem we could use only the top m < n discriminants (i.e., LD₁, LD₂, ⋯, LDₘ) with little - if any - loss of information
§ That is, it compresses the input vector (dimension n) into a smaller vector space
on layer “code” (dimension m<n) and then it reconstructs the original vector onto
the output layer.
§ If the network was trained well, the reconstruction operation happens with
minimal loss of information.
https://thenewstack.io/3-new-techniques-for-data-dimensionality-reduction-in-machine-learning/
§ Both methods are used for reducing the number of features in a dataset.
However:
§ Feature selection is simply selecting and excluding given features without
changing them.
§ Dimensionality reduction might transform the features into a lower dimension.
§ Feature selection is often a somewhat more aggressive and more
computationally expensive process.
§ Backward Feature Elimination
§ Forward Feature Construction
1. First, train n separate models on one single input feature and keep the feature
that produces the best accuracy.
2. Then, train 𝑛 − 1 separate models on 2 input features, the selected one and
one more. At the end keep the additional feature that produces the best
accuracy.
3. And so on … Continue until an acceptable error rate is reached.
https://thenewstack.io/3-new-techniques-for-data-dimensionality-reduction-in-machine-learning/
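A sketch of forward feature construction following the three steps above; the dataset, the model, and the stopping rule (three features) are assumptions of this sketch, not the course exercise.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

selected, remaining = [], list(range(X.shape[1]))
for _ in range(3):                       # stop after 3 features for brevity
    scores = {}
    for f in remaining:                  # try adding each remaining feature
        cols = selected + [f]
        model = LogisticRegression(max_iter=5000)
        scores[f] = cross_val_score(model, X[:, cols], y, cv=5).mean()
    best = max(scores, key=scores.get)   # keep the feature with the best accuracy
    selected.append(best)
    remaining.remove(best)
    print("selected features:", selected, "accuracy:", round(scores[best], 3))
```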
§ Coordinate Transformations
Remember PCA and LDA?
Polar coordinates , …
Thank you!
§ Clustering
§ Goal: Cluster location data from California
§ 01_Clustering
§ Data Preparation
§ 02_Missing_Value_Handling
§ 03_Outlier_Detection
§ 04_Dimensionality_Reduction
§ 05_Feature_Selection
https://www.knime.com/sites/default/files/110519_KNIME_Machine_Learning_Cheat%20Sheet.pdf