Foundations of Machine Learning
Module 7: Computational Learning Theory
Part A: Finite Hypothesis Space
Sudeshna Sarkar
IIT Kharagpur
Goal of Learning Theory
• To understand
– What kinds of tasks are learnable?
– What kind of data is required for learnability?
– What are the space and time requirements of the learning algorithm?
• To develop and analyze models
– Develop algorithms that provably meet desired criteria
– Prove guarantees for successful algorithms
Goal of Learning Theory
• Two core aspects of ML
– Algorithm Design. How to optimize?
– Confidence: how well will the learned rule perform on future data?
• We need particular settings (models)
– Probably Approximately Correct (PAC)
Pr[ P𝒟(c ⊕ h) ≤ ε ] ≥ 1 − δ
[Figure: target concept c and hypothesis h over the instance space; their symmetric difference c ⊕ h is the error region]
Prototypical Concept Learning Task
• Given
– Instances X (e.g., X = ℝ^d or X = {0,1}^d)
– Distribution 𝒟 over X
– Target function c
– Hypothesis space ℋ
– Training examples S = {(x_i, c(x_i))}, with x_i drawn i.i.d. from 𝒟
[Figure: instance space X with target concept c and hypothesis h; + and − mark labeled examples]
• Determine
– A hypothesis h ∈ ℋ s.t. h(x) = c(x) for all x in S?
– A hypothesis h ∈ ℋ s.t. h(x) = c(x) for all x in X?
• The algorithm optimizes over S to find a hypothesis h.
• Goal: find an h that has small error over 𝒟.
Computational Learning Theory
• Can we be certain about how the learning algorithm
generalizes?
• We would have to see all the examples.
• Inductive inference: generalizing beyond the training data is impossible unless we add more assumptions (e.g., priors over H).
We need a bias!
[Figure: instance space X with target concept c and hypothesis h]
Function Approximation
• How many labeled examples are needed to determine which of the 2^(2^N) hypotheses is the correct one?
• All 2^N instances in X must be labeled!
• Inductive inference: generalizing beyond the training data is impossible unless we add more assumptions (e.g., a bias).
• H = { h : X → Y }, so |H| = 2^|X| = 2^(2^N) for N Boolean features.
[Figure: instance space X with target c and two hypotheses h1, h2 that both fit the labeled examples]
Error of a hypothesis
The true error of hypothesis h, with respect to the target
concept c and observation distribution 𝒟 is the probability that h
will misclassify an instance drawn according to 𝒟
error𝒟(h) ≡ Pr_{x∼𝒟}[ c(x) ≠ h(x) ]
In a perfect world, we’d like the true error to be 0.
Consistent Case
Theorem
m ≥ (1/ε) ( ln|H| + ln(1/δ) )
labeled examples are sufficient so that with probability 1 − δ, all h ∈ H with err_D(h) ≥ ε have err_S(h) > 0.
Inconsistent Case
What if there is no perfect h?
Theorem: After m examples, with probability ≥ 1 − δ, all h ∈ H have |err_D(h) − err_S(h)| < ε, for
m ≥ (1/(2ε²)) ( ln|H| + ln(2/δ) )
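As a quick numerical companion to the two theorems above (not part of the original slides; the function names are mine), both bounds can be evaluated directly:

import math

def m_consistent(h_size, eps, delta):
    # Consistent case: m >= (1/eps) * (ln|H| + ln(1/delta))
    return math.ceil((math.log(h_size) + math.log(1.0 / delta)) / eps)

def m_agnostic(h_size, eps, delta):
    # Inconsistent (agnostic) case: m >= (1/(2*eps^2)) * (ln|H| + ln(2/delta))
    return math.ceil((math.log(h_size) + math.log(2.0 / delta)) / (2.0 * eps ** 2))

print(m_consistent(1000, 0.1, 0.05))  # 100 examples suffice for |H| = 1000
print(m_agnostic(1000, 0.1, 0.05))    # 530: the 1/eps^2 factor makes the agnostic bound larger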
Sample complexity: example
• 𝒞 : Conjunction of n Boolean literals. Is 𝒞 PAC-learnable?
|ℋ| = 3^n
m ≥ (1/ε) ( n ln 3 + ln(1/δ) )
• Concrete examples:
– δ=ε=0.05, n=10 gives 280 examples
– δ=0.01, ε=0.05, n=10 gives 312 examples
– δ=ε=0.01, n=10 gives 1,560 examples
– δ=ε=0.01, n=50 gives 5,954 examples
• Result holds for any consistent learner, such as FindS.
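The concrete numbers in the list above follow directly from the bound m ≥ (1/ε)(n ln 3 + ln(1/δ)); a minimal check (the helper name is mine):

import math

def m_conjunctions(n, eps, delta):
    # Sample complexity bound for conjunctions of n Boolean literals: |H| = 3^n
    return math.ceil((n * math.log(3) + math.log(1.0 / delta)) / eps)

print(m_conjunctions(10, 0.05, 0.05))  # 280
print(m_conjunctions(10, 0.05, 0.01))  # 312
print(m_conjunctions(10, 0.01, 0.01))  # 1560
print(m_conjunctions(50, 0.01, 0.01))  # 5954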
Sample Complexity of Learning
Arbitrary Boolean Functions
• Consider any Boolean function over n Boolean features, e.g., the hypothesis space of DNF formulas or decision trees. There are 2^(2^n) of these, so a sufficient number of examples to PAC-learn such a concept is:
m ≥ (1/ε) ( ln 2^(2^n) + ln(1/δ) ) = (1/ε) ( 2^n ln 2 + ln(1/δ) )
which is exponential in n, so this bound does not give sample-efficient PAC learning of arbitrary Boolean functions.
Thank You
Concept Learning Task
“Days in which Aldo enjoys swimming”
Example Sky AirTemp Humidity Wind Water Forecast EnjoySport
1 Sunny Warm Normal Strong Warm Same Yes
2 Sunny Warm High Strong Warm Same Yes
3 Rainy Cold High Strong Warm Change No
4 Sunny Warm High Strong Cool Change Yes
Thank You
Foundations of Machine Learning
Module 7: Computational Learning Theory
Part B: Infinite Hypothesis Spaces
Sudeshna Sarkar
IIT Kharagpur
Sample Complexity: Infinite
Hypothesis Spaces
• Need some measure of the expressiveness of infinite
hypothesis spaces.
• The Vapnik-Chervonenkis (VC) dimension provides
such a measure, denoted VC(H).
• Analogous to ln|H|, there are bounds for sample complexity using VC(H).
Shattering
• Consider a hypothesis space ℋ for a 2-class problem.
• A set of N points (instances) can be labeled as + or − in 2^N ways.
• If for every such labeling a function can be found in ℋ consistent with that labeling, we say that the set of instances is shattered by ℋ.
Three points in ℝ^2
• It is enough to find one set of three points that can be shattered.
• It is not necessary to be able to shatter every possible set of three points in 2 dimensions.
Shattering Instances
• Consider 2 instances, each described by a single real-valued feature, being shattered by a single interval.
[Figure: two points x and y on the real line; every +/− labeling of {x, y} can be picked out by one interval]
Shattering Instances (cont)
But 3 instances x < y < z on the real line cannot be shattered by a single interval. The eight labelings are:

  +            −
  (none)       x, y, z
  x            y, z
  y            x, z
  z            x, y
  x, y         z
  y, z         x
  x, y, z      (none)
  x, z         y      ← cannot be done by a single interval

In every case except {x, z}+ / {y}−, the positive points form a contiguous run that one interval can pick out.
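The table above can be verified by brute force: a labeling of points on the real line is realizable by one closed interval exactly when the positive points form a contiguous run. A small sketch (the helper names are my own, not from the lecture):

from itertools import product

def realizable_by_interval(points, labels):
    # With the points sorted, a single interval can produce a labeling iff the
    # positively labeled points form one contiguous run (possibly empty).
    order = sorted(range(len(points)), key=lambda i: points[i])
    signs = [labels[i] for i in order]
    pos = [k for k, s in enumerate(signs) if s]
    if not pos:
        return True                          # no positives: use an empty interval
    return all(signs[pos[0]:pos[-1] + 1])    # no negative inside the positive run

def shattered(points):
    # Shattered iff every one of the 2^N labelings is realizable.
    return all(realizable_by_interval(points, list(lab))
               for lab in product([False, True], repeat=len(points)))

print(shattered([1.0, 2.0]))        # True: any 2 points are shattered
print(shattered([1.0, 2.0, 3.0]))   # False: the labeling (+, -, +) cannot be realized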
VC Dimension
• The Vapnik-Chervonenkis dimension, VC(H), of hypothesis space H defined over instance space X is the size of the largest finite subset of X shattered by H. If arbitrarily large finite subsets of X can be shattered, then VC(H) = ∞.
• For single intervals on the real line, all sets of 2 instances can be shattered, but no set of 3 instances can, so VC(H) = 2.
VC Dimension
• An unbiased hypothesis space shatters the entire instance
space.
• The larger the subset of X that can be shattered, the more
expressive (and less biased) the hypothesis space is.
• The VC dimension of the set of oriented lines in 2-d is
three.
VC Dimension Example
Consider axis-parallel rectangles in the real plane, i.e., conjunctions of intervals on two real-valued features. Some set of 4 instances can be shattered (and no set of 5 instances can be).
• Therefore VC(H) = 4.
• Generalizes to axis-parallel hyper-rectangles (conjunctions of
intervals in n dimensions): VC(H)=2n.
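The claim that some 4 points are shattered by axis-parallel rectangles can also be checked mechanically: a labeling is realizable iff the bounding box of the positive points contains no negative point. A sketch using a "diamond" configuration (the points and helper names are my own illustration):

from itertools import product

def realizable_by_rectangle(points, labels):
    pos = [p for p, lab in zip(points, labels) if lab]
    if not pos:
        return True                              # all-negative: take an empty rectangle
    # Smallest axis-parallel rectangle covering the positive points.
    xlo, xhi = min(x for x, _ in pos), max(x for x, _ in pos)
    ylo, yhi = min(y for _, y in pos), max(y for _, y in pos)
    # Realizable iff no negative point falls inside that rectangle.
    return not any(xlo <= x <= xhi and ylo <= y <= yhi
                   for (x, y), lab in zip(points, labels) if not lab)

diamond = [(0, 1), (0, -1), (1, 0), (-1, 0)]     # 4 points arranged as a diamond
print(all(realizable_by_rectangle(diamond, list(lab))
          for lab in product([False, True], repeat=4)))   # True -> these 4 points are shattered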
Upper Bound on Sample Complexity with VC
• Using VC(H) as the measure of hypothesis space complexity (Blumer et al., 1989), the following number of examples is sufficient for PAC learning:
m ≥ (1/ε) ( 4 log₂(2/δ) + 8·VC(H)·log₂(13/ε) )
Sample Complexity Lower Bound with VC
• There is also a general lower bound on the minimum number of
examples necessary for PAC learning (Ehrenfeucht, et al., 1989):
Consider any concept class C such that VC(C) ≥ 2, any learner L, and any 0 < ε < 1/8, 0 < δ < 1/100.
Then there exists a distribution 𝒟 and target concept in C such that if L observes fewer than
max[ (1/ε) log₂(1/δ), (VC(C) − 1) / (32ε) ]
examples, then with probability at least δ, L outputs a hypothesis
having error greater than ε.
• Ignoring constant factors, this lower bound is the same as the upper
bound except for the extra log2(1/ ε) factor in the upper bound.
Thank You
Foundations of Machine Learning
Sudeshna Sarkar
IIT Kharagpur
What is Ensemble Classification?
• Use multiple learning algorithms (classifiers)
• Combine the decisions
• Can be more accurate than the individual classifiers
• Generate a group of base-learners
• Different learners use different
– Algorithms
– Hyperparameters
– Representations (Modalities)
– Training sets
Why should it work?
• Works well only if the individual classifiers
disagree
– Error rate < 0.5 and errors are independent
– The ensemble's error rate is highly correlated with the correlation among the errors made by the different learners
Bias vs. Variance
• We would like low bias error and low variance error
• Ensembles using multiple trained (high variance/low
bias) models can average out the variance, leaving
just the bias
– Less worry about overfitting (stopping criteria, etc.) in the base models
Combining Weak Learners
• Combining weak learners
– Assume n independent models, each having accuracy of
70%.
– If all n give the same class output, then you can be confident it is correct with probability 1 − (1 − 0.7)^n = 1 − 0.3^n.
– Normally not completely independent, but unlikely that all n
would give the same output
• Using the majority output, the ensemble's accuracy can be better than the base accuracy of the individual models.
– If n1 models say class 1 and n2 < n1 models say class 2, then
P(class 1) = 1 − Binomial(n, n2, 0.7), where the binomial probability of exactly r successes out of n is
P(r) = [ n! / (r!(n − r)!) ] p^r (1 − p)^(n − r)
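As a sanity check on the independence argument above (my own illustration, not from the slides), the probability that a majority of n independent 70%-accurate classifiers votes for the correct class can be computed from the binomial distribution:

from math import comb

def majority_vote_accuracy(n, p):
    # P(more than half of n independent classifiers are correct), n odd,
    # each classifier correct independently with probability p.
    return sum(comb(n, r) * p ** r * (1 - p) ** (n - r)
               for r in range(n // 2 + 1, n + 1))

for n in (1, 5, 11, 21):
    print(n, round(majority_vote_accuracy(n, 0.7), 3))
# The ensemble accuracy climbs from 0.7 towards 1.0 as n grows,
# but only under the (optimistic) assumption of independent errors.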
Ensemble Creation Approaches
• Get less correlated errors between models
– Injecting randomness
• initial weights (e.g., in neural networks), different learning parameters, different splits (e.g., in decision trees), etc.
– Different Training sets
• Bagging, Boosting, different features, etc.
– Forcing differences
• different objective functions
– Different machine learning models
Ensemble Combining Approaches
• Unweighted Voting (e.g. Bagging)
• Weighted voting – based on accuracy (e.g. Boosting),
Expertise, etc.
• Stacking - Learn the combination function
Combine Learners: Voting
• Unweighted voting
• Linear combination
(weighted vote)
• weight ∝ accuracy
• weight ∝ 1/variance
y = Σ_{j=1}^{L} w_j d_j,   with w_j ≥ 0 and Σ_{j=1}^{L} w_j = 1
• Bayesian combination:
P(C_i | x) = Σ_{all models M_j} P(C_i | x, M_j) P(M_j)
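A minimal sketch of the linear-combination (weighted vote) rule above (the class scores and weights below are made up for illustration): each learner j reports a support value for every class, and the ensemble output is the weighted sum with w_j ≥ 0 and Σ w_j = 1.

def weighted_vote(scores, weights):
    # scores[j][i]: support learner j gives to class i; weights must be a distribution.
    assert all(w >= 0 for w in weights) and abs(sum(weights) - 1.0) < 1e-9
    n_classes = len(scores[0])
    combined = [sum(w * s[i] for w, s in zip(weights, scores))
                for i in range(n_classes)]
    return combined.index(max(combined)), combined

# Three learners, two classes; weights roughly proportional to validation accuracy.
scores = [[0.9, 0.1], [0.4, 0.6], [0.7, 0.3]]
weights = [0.5, 0.2, 0.3]
print(weighted_vote(scores, weights))   # class 0 wins: 0.5*0.9 + 0.2*0.4 + 0.3*0.7 = 0.74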
Fixed Combination Rules
Bayes Optimal Classifier
• The Bayes Optimal Classifier is an ensemble of all the
hypotheses in the hypothesis space.
• On average, no other ensemble can outperform it.
• The vote for each hypothesis is
– proportional to the likelihood that the training dataset would be sampled from a system if that hypothesis were true, and
– multiplied by the prior probability of that hypothesis.
y = argmax_{c_j ∈ C} Σ_{h_i ∈ H} P(c_j | h_i) P(T | h_i) P(h_i)
where
• y is the predicted class,
• C is the set of all possible classes,
• H is the hypothesis space,
• T is the training data.
The Bayes Optimal Classifier represents a hypothesis
that is not necessarily in H.
But it is the optimal hypothesis in the ensemble space.
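A toy sketch of the Bayes optimal vote over a small finite hypothesis space (every name, prior, and likelihood below is illustrative, not from the lecture): each hypothesis votes for its predicted class with weight P(T | h) · P(h).

def bayes_optimal_predict(x, hypotheses, priors, likelihoods, classes):
    # hypotheses: callables h(x) -> class; priors[k] = P(h_k); likelihoods[k] = P(T | h_k)
    votes = {c: 0.0 for c in classes}
    for h, p_h, p_T_given_h in zip(hypotheses, priors, likelihoods):
        votes[h(x)] += p_T_given_h * p_h      # posterior weight, up to normalization
    return max(votes, key=votes.get)

h1 = lambda x: +1 if x > 0 else -1            # threshold at 0
h2 = lambda x: +1                             # always predicts +1
h3 = lambda x: -1                             # always predicts -1
print(bayes_optimal_predict(-2.0, [h1, h2, h3],
                            priors=[1/3, 1/3, 1/3],
                            likelihoods=[0.6, 0.3, 0.1],
                            classes=[-1, +1]))   # -> -1 (h1 and h3 outvote h2)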
Practicality of Bayes Optimal Classifier
• Cannot be practically implemented.
• Most hypothesis spaces are too large
• Many hypotheses output only a class or a value, not a probability.
• Estimating the prior probability of each hypothesis is not always possible.
BMA
• If the d_j are independent:
Var(y) = Var( (1/L) Σ_j d_j ) = (1/L²) Σ_j Var(d_j) = (1/L²) · L · Var(d_j) = (1/L) Var(d_j)
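A quick simulation of the Var(y) = Var(d)/L identity (the noise model and parameters are my own choice):

import random

random.seed(0)
L, trials = 10, 20000
# Each base learner's output d_j is simulated as zero-mean noise with variance 1.
single = [random.gauss(0, 1) for _ in range(trials)]
averaged = [sum(random.gauss(0, 1) for _ in range(L)) / L for _ in range(trials)]

def var(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

print(round(var(single), 3))    # close to 1
print(round(var(averaged), 3))  # close to 1/L = 0.1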
Sudeshna Sarkar
IIT Kharagpur
Bagging
• Bagging = “bootstrap aggregation”
– Draw N items from X with replacement
• Works best with base learners that have high variance (unstable)
– Decision trees and ANNs are unstable
– K-NN is stable
• Use bootstrapping to generate L training sets and
train one base-learner with each (Breiman, 1996)
• Use voting
Bagging
• Sampling with replacement
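A minimal bagging sketch (assuming scikit-learn and NumPy are available; the dataset and all settings are placeholders): draw L bootstrap samples, fit one high-variance base learner per sample, and predict by unweighted majority vote.

import numpy as np
from sklearn.tree import DecisionTreeClassifier   # any unstable base learner works

def bagging_fit(X, y, L=25, seed=0):
    rng = np.random.default_rng(seed)
    n = len(X)
    models = []
    for _ in range(L):
        idx = rng.integers(0, n, size=n)          # bootstrap: n draws with replacement
        models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return models

def bagging_predict(models, X):
    votes = np.stack([m.predict(X) for m in models])   # shape (L, n_samples)
    # Unweighted majority vote; assumes integer class labels 0..K-1.
    return np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, votes)

# Tiny synthetic usage: two well-separated blobs with labels 0/1.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(4, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
print((bagging_predict(bagging_fit(X, y), X) == y).mean())   # close to 1.0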
AdaBoost
Given: (x_1, y_1), …, (x_m, y_m) where x_i ∈ X, y_i ∈ Y = {−1, +1}.
Initialize D_1(i) = 1/m.
For t = 1, …, T:
– Train the weak learner using distribution D_t.
– Get a weak classifier h_t : X → ℝ.
– Choose α_t ∈ ℝ to minimize the training error:
α_t = (1/2) ln( (1 − ε_t) / ε_t )
– Update:
D_{t+1}(i) = D_t(i) exp(−α_t y_i h_t(x_i)) / Z_t
where Z_t = Σ_{i=1}^{m} D_t(i) exp(−α_t y_i h_t(x_i)) is a normalization factor, so that D_{t+1} is a distribution.
Output the final classifier:
H(x) = sign( Σ_{t=1}^{T} α_t h_t(x) )
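A compact implementation of the update rules above, using depth-1 trees (decision stumps) as the weak learners; the use of scikit-learn and all parameter choices are assumptions of this sketch, not part of the lecture.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, T=50):
    # y must take values in {-1, +1}; D is the weight distribution over the m examples.
    m = len(X)
    D = np.full(m, 1.0 / m)
    stumps, alphas = [], []
    for _ in range(T):
        h = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=D)
        pred = h.predict(X)
        eps = D[pred != y].sum()                            # weighted training error eps_t
        if eps >= 0.5:                                      # no better than chance: stop
            break
        alpha = 0.5 * np.log((1 - eps) / max(eps, 1e-12))   # alpha_t = (1/2) ln((1-eps_t)/eps_t)
        D = D * np.exp(-alpha * y * pred)                   # up-weight the examples h_t got wrong
        D /= D.sum()                                        # divide by Z_t to keep D a distribution
        stumps.append(h)
        alphas.append(alpha)
        if eps == 0:                                        # perfect stump: nothing left to re-weight
            break
    return stumps, alphas

def adaboost_predict(stumps, alphas, X):
    agg = sum(a * h.predict(X) for a, h in zip(alphas, stumps))
    return np.sign(agg)                                     # H(x) = sign(sum_t alpha_t h_t(x))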
Strong weak classifiers
• If each classifier is (at least slightly) better than random, i.e., ε_t < 0.5, then the training error of the final classifier H drops exponentially fast:
(1/m) Σ_{i=1}^{m} δ(H(x_i) ≠ y_i) ≤ Π_t Z_t ≤ exp( −2 Σ_{t=1}^{T} (1/2 − ε_t)² )
Illustrating AdaBoost
[Figure: a one-dimensional toy dataset of + and − points, initially with equal weights on all data points. Three boosting rounds are shown: round 1 fits stump B1 (α = 1.9459), round 2 re-weights the misclassified points and fits B2 (α = 2.9323), round 3 fits B3 (α = 3.8744); the overall classifier is the α-weighted vote of B1, B2, and B3.]
Thank You