Machine Learning-5
B.Tech.
Books
Overview of supervised learning T1/Ch. 2
K-nearest neighbour T1/Ch. 2.3.2, R2/Ch.8.2
Multiple linear regression T1/Ch. 3.2.3
Shrinkage methods (Ridge regression, Lasso regression) T1/Ch. 3.4
Logistic regression T1/Ch. 4.3
Linear Discriminant Analysis T1/Ch. 4.4
Feature selection T1/Ch. 5.3
●
T1: T. Hastie, R. Tibshirani and J. Friedman, The Elements of Statistical Learning – Data Mining, Inference and Prediction, 2nd Edition, Springer
Module II
Model Accuracy
M1 60%
M2 62%
M3 55%
●
Boosting is a method of combining weak models M1, M2 and M3 so as to yield a
higher-accuracy model (say, 80%).
Boosting
●
Consider a two-class problem with the output variable coded as Y ∈ {−1, 1}.
●
Given a data set D = {(xi, yi)}, i = 1, 2, ..., N, of training patterns,
●
A classifier G(x) produces a prediction taking one of the two values {−1, 1}.
●
Then the error rate on the training sample is err = (1/N) Σi I(yi ≠ G(xi)).
●
The purpose of boosting is to sequentially apply the weak classification algorithm
to repeatedly modified versions of the data, thereby producing a sequence of weak
classifiers Gm(x), m = 1, 2, ..., M.
Boosting
●
The predictions from all of them are then combined through a weighted majority
vote to produce the final prediction: G(x) = sign(Σm αm Gm(x)).
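A minimal sketch of this scheme (an AdaBoost.M1-style procedure, assuming labels coded in {−1, +1} and scikit-learn decision stumps as the weak learners; the function names are illustrative):

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, M=50):
    # y must be coded in {-1, +1}
    N = len(y)
    w = np.full(N, 1.0 / N)                       # observation weights, start uniform
    stumps, alphas = [], []
    for m in range(M):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)          # weak learner on re-weighted data
        pred = stump.predict(X)
        err = np.sum(w * (pred != y)) / np.sum(w) # weighted training error rate
        err = np.clip(err, 1e-10, 1 - 1e-10)
        alpha = np.log((1 - err) / err)           # weight of this weak classifier
        w *= np.exp(alpha * (pred != y))          # up-weight misclassified points
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, alphas

def adaboost_predict(stumps, alphas, X):
    # Weighted majority vote: G(x) = sign(sum_m alpha_m * G_m(x))
    votes = sum(a * s.predict(X) for s, a in zip(stumps, alphas))
    return np.sign(votes)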
●
In chess, the game player is the decision maker (Agent) and the environment is the
board.
●
At any time, the environment is in a certain state that is one of a set of possible
states—for example, the state of the board in chess playing game.
●
The decision maker (Agent) has a set of possible actions, for example, the legal
movements of pieces on the chess board.
What is Reinforcement Learning?
●
Once an action is chosen and taken, the state changes.
●
The solution to the task requires a sequence of actions, and we rarely get feedback in the
form of a reward, generally only when the complete sequence is carried out.
●
The learning agent learns the best sequence of actions to solve a problem
– where “best” is quantified as the sequence of actions that has the maximum
cumulative reward.
●
Such is the setting of reinforcement learning.
●
It is learning to make good decisions under uncertainty.
Challenges in Reinforcement Learning
●
Simulating and setting up the environment
●
Scaling and tweaking the neural network controlling the agent
●
There is no mode of communication with the network other than rewards
and penalties.
Difference between ML, DL, and RL
●
ML is a form of AI in which computers are given the ability to
progressively (in a supervised or unsupervised manner) improve their performance on a specific
task with data.
●
DL models consist of multiple neural network layers that gradually learn more
abstract features from the data.
●
RL employs a system of rewards and penalties and compels the agent to solve a
problem by itself.
●
Fundamental challenge in AI and ML is learning to make good decisions under
uncertainty.
Reinforcement Learning Involves
●
Optimization
●
Delayed consequences
●
Exploration
●
Generalization
Optimization
●
Goal is to find an optimal way to make decisions
– Yielding best outcomes or at least very good outcomes
●
Explicit notion of utility of decisions
●
Example: finding the minimum-distance route between two cities, given a network of
roads
Delayed Consequences
●
Decisions now can impact things much later...
– Saving for retirement
– Health insurance
●
Introduces two challenges
– When planning: decisions involve reasoning about not just immediate benefit
of a decision but also its longer term ramifications
– When learning: temporal credit assignment is hard (what caused later high or
low rewards?)
– Temporal credit assignment problem – how does the agent figure out the causal
relationship between the decisions it made in the past and the outcomes observed in
the future?
Exploration
●
Learning about the world by making decisions
– Agent as scientist
– Learn to ride a bike by trying (and failing)
●
Decisions impact what we learn about
– If you choose to go to IIT instead of SIT, you will have different later
experiences...
Generalization
●
A policy is a mapping from past experience to action.
●
Why not just pre-program a policy?
RL vs Other AI and Machine Learning
Imitation Learning
●
Use experience to guide future decisions.
●
Sequential decision making under uncertainty
Modeling Real-World Problem
●
Let S : the set of possible states of the model
st : the state at time t (giving the sequence of states over time)
A : the set of possible actions
●
Agent's representation of how the world changes given the agent's action
●
Transition / dynamics model predicts next agent state
P(st+1 = s’|st = s, at = a)
●
Reward model predicts immediate reward
r(st = s, at = a) = E(rt|st = s, at = a)
Example: Mars Rover Stochastic Markov Model
Policy
●
It is a function π : S → A that maps the agent's states to actions, i.e. it determines which
action should be taken in state s.
●
Value function Vπ(s): it is the cumulative sum of (discounted) future rewards obtained by the agent starting from the
state "s" and following policy "π".
●
γ: it is the discount factor
●
Can be used to quantify goodness/badness of states and actions
Example: Mars Rover Value Function
Markov Process
●
A Markov process is a stochastic process that satisfies the Markov property,
because of which we say that a Markov process is memoryless.
●
Assumptions:
– Finite state space : the state space of the Markov process is finite. This means
that for the Markov process (s0, s1, s2, . . .), there is a state space S with |S| < ∞
– Stationary transition probabilities : The transition probabilities are time
independent. Mathematically, this means the following:
P(si = s’|si-1 = s) = P(sj = s’|sj-1 = s) , ∀ s, s’ ∈ S , ∀ i, j = 1, 2, . . . .
●
A Markov process satisfying these assumptions is called a "Markov Chain".
Markov Process
●
A Markov process is defined by the tuple (S, P) where
– S : a finite state space
– P : the transition probability P(st+1 = s'|st = s)
The matrix P is a non-negative row-stochastic matrix, i.e. each row sums to 1
– If there is a finite number (N) of states, P can be expressed as an N × N matrix
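As an illustration, a short Python sketch of sampling a trajectory from such a chain (the toy matrix below is hypothetical, not the Mars rover chain):

import numpy as np

def sample_chain(P, start_state, steps, rng=None):
    # P is a row-stochastic matrix: P[s, s'] = P(s_{t+1} = s' | s_t = s)
    rng = rng or np.random.default_rng()
    n_states = P.shape[0]
    trajectory = [start_state]
    s = start_state
    for _ in range(steps):
        s = rng.choice(n_states, p=P[s])   # draw the next state from row s of P
        trajectory.append(s)
    return trajectory

# Toy 3-state chain (each row sums to 1)
P = np.array([[0.9, 0.1, 0.0],
              [0.1, 0.8, 0.1],
              [0.0, 0.1, 0.9]])
print(sample_chain(P, start_state=0, steps=5))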
Example: Mars Rover Markov Chain Transition Matrix, P
MRP:Markov Reward Process
●
Example: Mars Rover Markov Chain Episodes
– Suppose the Mars rover is at s4
– Example: Sample episodes starting from s4
– s4,s5,s6,s7,s7,s7, . . .
– s4,s3,s2,s1,s2,s3,s4,s5 . . .
– s4,s4,s5,s6,s6 . . .
MRP:Markov Reward Process
●
Markov Reward Process is a Markov Chain + rewards
●
A Markov Reward Process (MRP) is defined as a 4-tuple (S, P, R, γ) where
– S is a (finite) set of states (s ∈ S)
– P is the dynamics/transition probability that specifies P(st+1 = s'|st = s)
– R is a reward function that maps states to rewards (real numbers),
i.e. R : S → R
– γ is the discount factor, γ ∈ [0, 1]
MRP:Markov Reward Process
●
Reward function:
– In a Markov reward process, whenever a transition happens from a current
state s to a successor state s’ , a reward is obtained depending on the current
state s.
– Thus, for the Markov process (s0, s1, s2, . . .), each transition si → si+1 is
accompanied by a reward ri, for all i = 0, 1, . . .
– A particular episode of the Markov reward process is represented as (s0, r0, s1,
r1, s2, r2, . . .)
MRP:Markov Reward Process
●
Reward function:
– Rewards can be either deterministic or stochastic.
– In the deterministic case, mathematically this means that for all realizations of
the process we must have that: ri = rj , whenever si = sj ∀ i, j = 0, 1, . . . ,
– Example: +1 in s1
+10 in s7
0 in others
MRP:Markov Reward Process
●
Horizon(H) :
– The horizon H of a Markov reward process is defined as the number of time
steps in each episode (realization) of the process.
– The horizon can be finite or infinite.
– If the horizon is finite, then the process is also called a finite Markov reward
process.
●
Return (Gt):
– The return Gt of a Markov reward process is defined as the discounted sum of
rewards starting at time t up to the horizon H.
– It is given by the following mathematical formula:
Gt = rt + γ rt+1 + γ² rt+2 + · · · + γ^(H−1−t) rH−1
MRP:Markov Reward Process
●
State value function (Vt(s)):
– The state value function Vt(s) for a Markov reward process and a state s ∈ S is
defined as the expected return starting from state s at time t, and is given by the
following expression:
– Vt(s) = E[Gt | st = s]
●
Discount factor(γ):
– If the horizon is infinite and γ = 1, then the return can become infinite even if
the rewards are all bounded.
– If this happens, then the value function V(s) can also become infinite.
– Such problems cannot then be solved using a computer.
– To avoid such mathematical difficulties and make the problems computationally
tractable, we set γ < 1. This quantity γ is called the discount factor.
MRP:Markov Reward Process
●
Discount factor(γ):
– Other than for purely computational reasons, it should be noted that humans
behave in much the same way:
●
we tend to put more importance on immediate rewards than on rewards
obtained at a later time.
– When γ = 0, we only care about the immediate reward.
– When γ = 1, we put as much importance on future rewards as on the
present.
– If the horizon of the Markov reward process is finite, i.e. H < ∞, then we can set
γ = 1, as the returns and value functions are always finite.
MRP:Markov Reward Process
●
Example of a MRP: Mars Rover
– Given the set of states S = {S1, S2, S3, S4, S5, S6, S7}, discount factor γ = 0.5, rewards
r1 = 1, r2 = r3 = r4 = r5 = r6 = 0, r7 = 10, and horizon H = 4, compute the return G0 of
the following episodes.
– S4, S5, S6, S7, S7 : G0 = 0 + 0.5·0 + 0.5²·0 + 0.5³·10 = 1.25
– S4, S4, S5, S4, S5 : G0 = 0 + 0.5·0 + 0.5²·0 + 0.5³·0 = 0
– S4, S3, S2, S1, S2 : G0 = 0 + 0.5·0 + 0.5²·0 + 0.5³·1 = 0.125
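These returns can be checked with a few lines of Python (the reward list below is simply the per-step rewards of the first episode):

def episode_return(rewards, gamma):
    # G_t = r_t + gamma*r_{t+1} + gamma^2*r_{t+2} + ...
    return sum((gamma ** k) * r for k, r in enumerate(rewards))

# Rewards along S4, S5, S6, S7 (r7 = 10, all others 0), gamma = 0.5
print(episode_return([0, 0, 0, 10], gamma=0.5))   # 1.25, matching the first episode above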
How to compute the value function of a MRP?
●
There are three different ways to compute the value function of a Markov reward
process:
– Simulation
– Analytic solution
– Iterative solution
How to compute the value function of a MRP?
●
Monte Carlo simulation:
– Generate a large number (N) of episodes starting from state "s" at time "t".
– Compute the return Gt (discounting rewards by powers of γ) for each episode.
– Estimate the value as the mean: Vt(s) ≈ ΣGt / N
– For a Markov reward process M = (S, P, R, γ), state "s", time "t", and the number of
simulation episodes N, the simulation procedure is outlined below:
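A minimal Python rendering of this Monte Carlo estimate, assuming the MRP is represented by a row-stochastic transition matrix P and a per-state reward vector R (illustrative names):

import numpy as np

def mc_value_estimate(P, R, gamma, H, s, N=10000, rng=None):
    # Monte Carlo estimate of V_t(s): average the return over N sampled episodes.
    rng = rng or np.random.default_rng()
    n_states = P.shape[0]
    total = 0.0
    for _ in range(N):
        state, G = s, 0.0
        for k in range(H):
            G += (gamma ** k) * R[state]             # reward depends on the current state
            state = rng.choice(n_states, p=P[state]) # sample the next state
        total += G
    return total / N                                 # mean return approximates V(s)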
How to compute the value function of a MRP?
●
Iterative solution for finite horizon:
– Dynamic programming based solution
–
– For both these algorithms , the computational cost of each loop is O(|S|2)
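A compact Python sketch of this finite-horizon backup, using the same matrix P and reward vector R assumed above:

import numpy as np

def mrp_value_iterative(P, R, gamma, H):
    # Finite-horizon value function via dynamic programming backups.
    # Each sweep over the states costs O(|S|^2), as noted above.
    V = np.zeros(P.shape[0])          # value is 0 at the horizon
    for _ in range(H):
        V = R + gamma * P @ V         # backup: V(s) = R(s) + gamma * sum_s' P(s'|s) V(s')
    return V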
Markov Decision Process(MDP)
●
It is formally represented using the tuple (S, A, P, R, γ) which are listed below:
– S : A finite state space.
– A : A finite set of actions which are available from each state
– P : A transition probability model that specifies P(s'|s, a).
– R : A reward function that maps a state-action pair to rewards (real numbers),
i.e. R : S×A → R.
– γ : Discount factor γ ∈ [0, 1] .
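One simple way to hold this tuple in code (the field layout below is just an illustrative convention):

from dataclasses import dataclass
import numpy as np

@dataclass
class MDP:
    # Container for the (S, A, P, R, gamma) tuple
    n_states: int        # |S|
    n_actions: int       # |A|
    P: np.ndarray        # shape (A, S, S): P[a, s, s'] = P(s'|s, a)
    R: np.ndarray        # shape (S, A):    R[s, a] = expected immediate reward
    gamma: float         # discount factor in [0, 1]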
Multi-Armed Bandit
●
It is a classic reinforcement learning problem which exemplifies the exploration-
exploitation tradeoff dilemma.
●
K-armed bandit / N-armed bandit
●
It is a problem in which a limited set of resources must be allocated among
competing choices in a way that maximizes their expected gain, when each
choice's properties are only partially known at the time of allocation and may
become better understood as time passes or as resources are allocated to the choice.
●
It is also categorized as stochastic scheduling.
Multi-Armed Bandit - Application
●
How must a given budget be distributed among research departments (whose
outcomes are only partially known) to maximize results?
●
How to design an effective financial portfolio?
●
How to design adaptive routing to minimize end-to-end delay, delay jitter,
propagation delay, etc.?
●
How to execute effective investigation of clinical trials of different experimental
treatments while minimizing patient losses?
●
How to balance EXPLOITATION vs EXPLORATION tradeoff?
Modeling Multi-Armed Bandit Model (MAB)
●
It is a set of real reward distributions:
– B = {R1, R2, ..., Rk}, one distribution for each of the k ∈ N+ levers
– μ1, μ2, μ3, ..., μk : the mean values associated with these reward distributions.
●
The gambler iteratively plays one lever per round and observes the associated
reward.
●
Objective : Maximize the sum of the collected rewards.
●
Horizon(H): The number of rounds that remain to be played.
Modeling Multi-Armed Bandit Model (MAB)
●
MAB ≡ One-state Markov Decision Process
●
ρ : the regret after T rounds, defined as ρ = T μ* − Σt=1..T r̂t
●
where μ* = maxk {μk} is the maximal reward mean
●
r̂t : the reward in round t.
●
Definition (Zero-Regret Strategy): it is a strategy whose average regret per round,
ρ/T, tends to zero as T → ∞, with probability 1.
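A tiny Python helper for this quantity (the arm means and collected rewards below are hypothetical inputs):

def regret(mu, collected_rewards):
    # rho = T * mu_star - sum of the rewards actually collected over T rounds
    mu_star = max(mu)                    # maximal reward mean over the arms
    T = len(collected_rewards)
    return T * mu_star - sum(collected_rewards)

print(regret(mu=[0.2, 0.5, 0.7], collected_rewards=[1, 0, 1, 1, 0]))   # 0.7*5 - 3 = 0.5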
Bandit Strategies
●
1. Optimal solutions:
UCBC (Upper Confidence Bounds with Clusters)
– The algorithm incorporates the clustering information by playing at two levels:
●
First picking a cluster using a UCB-like strategy at each time step
●
Subsequently picking an arm within the cluster, again using a UCB-like
strategy
Bandit Strategies
●
2. Approximate solutions:
– Epsilon-greedy strategy
– Epsilon-first strategy
– Epsilon-decreasing strategy
– Adaptive epsilon-greedy strategy based on value differences (VDBE)
– Adaptive epsilon-greedy strategy based on Bayesian ensembles (Epsilon-BMC)
– Contextual epsilon-greedy strategy
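A minimal sketch of the basic epsilon-greedy strategy from this family (the pull function and arm means are hypothetical placeholders):

import numpy as np

def epsilon_greedy(pull, K, T, epsilon=0.1, rng=None):
    # pull(k) returns a sampled reward for arm k; K arms, T rounds.
    rng = rng or np.random.default_rng()
    counts = np.zeros(K)
    values = np.zeros(K)                           # running mean reward per arm
    total = 0.0
    for _ in range(T):
        if rng.random() < epsilon:
            k = int(rng.integers(K))               # explore: pick a random arm
        else:
            k = int(np.argmax(values))             # exploit: pick the best arm so far
        r = pull(k)
        counts[k] += 1
        values[k] += (r - values[k]) / counts[k]   # incremental mean update
        total += r
    return values, total

# Example with three Bernoulli arms of (unknown) success probabilities
rng = np.random.default_rng(0)
true_means = [0.2, 0.5, 0.7]
values, total = epsilon_greedy(lambda k: float(rng.random() < true_means[k]), K=3, T=1000)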
Contextual Bandit
●
It is a generalization of the multi-armed bandit.
●
At each iteration an agent has to choose between arms by referring to a d-
dimensional (context) feature vector.
●
Approximate solutions are
– Online linear Bandits
●
LinUCB (Linear Upper Confidence Bound)
●
LinRel(Linear Associative Reinforcement Learning)
– Online non-Linear Bandits
●
UCB...
●
Generalized Linear algorithm
●
Neural Bandit algorithm
●
Kernel UCB algorithm
●
Bandit forest algorithm
●
Oracle-based algorithm
Adversarial Bandit
●
Another variant of the multi-armed bandit problem is called the adversarial bandit.
●
Working Principle:
●
At each iteration, an agent chooses an arm and an adversary simultaneously
chooses the payoff structure for each arm.
●
Example (iterated prisoner's dilemma):
– Each adversary has two arms to pull.
– They can either deny or confess.
– Standard stochastic bandit algorithms do not work very well with these
iterations.
Adversarial Bandit
– Example: if the opponent cooperates in the first 10 rounds, defects for the
next 20, then cooperates in the following 30, etc.
– After a certain point, sub-optimal arms are rarely pulled, to limit exploration
and focus on exploitation.
– When the environment changes, the algorithm is unable to adapt or may not
even detect the change.
●
Note: it is one of the strongest generalizations of the bandit problem, as it removes
all assumptions about the reward distribution, and a solution to the adversarial bandit
problem is a generalized solution to the more specific bandit problems.
Pseudocode: Approximate solutions of the Adversarial Bandit
●
Input: γ∈(0,1], V∈R
●
Initialization: wi(1) = 1 ∀ i=1 to k
●
For each t = 1, 2, 3, ..., T
1. Set pi(t) = (1 − γ) · wi(t) / Σj wj(t) + γ/k , i = 1 to k
2. Draw it randomly according to the probabilities p1(t), p2(t), ..., pk(t)
3. Receive reward xit(t) ∈ (0, 1]
4. For j = 1, 2, 3, ..., k:
x̂j(t) = xj(t)/pj(t) , if j = it
x̂j(t) = 0 , otherwise
wj(t+1) = wj(t) · exp(γ · x̂j(t) / k)
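A runnable Python sketch of this Exp3-style procedure (reward_fn is a hypothetical environment callback returning payoffs in (0, 1]):

import numpy as np

def exp3(reward_fn, K, T, gamma=0.1, rng=None):
    rng = rng or np.random.default_rng()
    w = np.ones(K)                                   # w_i(1) = 1 for every arm
    for t in range(T):
        p = (1 - gamma) * w / w.sum() + gamma / K    # mix weights with uniform exploration
        i_t = int(rng.choice(K, p=p))                # draw an arm according to p
        x = reward_fn(i_t, t)                        # observed reward for the chosen arm
        x_hat = np.zeros(K)
        x_hat[i_t] = x / p[i_t]                      # importance-weighted reward estimate
        w *= np.exp(gamma * x_hat / K)               # exponential weight update
    return w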
Pseudocode: Approximate solutions of the Adversarial Bandit
●
Explanation:
– The algorithm chooses an arm uniformly at random with probability γ (Exploration);
with probability (1 − γ) it prefers arms with higher weights (Exploitation).
– Exp3 maintains weights for each arm to calculate the pulling probability, whereas
FPL (Follow the Perturbed Leader) does not need to know the pulling probability for each arm.
– Exp3 is computationally expensive (a calculation for each arm), whereas FPL is
computationally efficient.