
Machine Learning

Sem-VII
 Rule based classification
 Bayesian Belief networks
 Hidden Markov Models
 Support Vector Machine
 Maximum Margin Linear Separators
 Quadratic Programming solution for finding maximum margin separators
 Kernels for learning non-linear functions
 Expectation Maximization Algorithm
 Supervised learning after clustering
 Radial Basis functions
 Using IF-THEN Rules for Classification
 IF condition THEN conclusion
 The IF part is the rule antecedent or precondition
 The THEN part is the rule consequent
 R1: IF age = youth AND student = yes THEN buys computer = yes
 R1 can also be written as: (age = youth) ∧ (student = yes) ⇒ (buys computer = yes)
Use the rule-based classification technique:

Training data → learned rule set {R1, R2, ..., Rn} → class label

Use the model (the rule set) to find the class label of a new record.
 Example: if HSM = H and CS = E, then class = ?
 Match the rules and predict the label
 Rule Assessment
 Coverage(R) = n(covers) / |D|, where n(covers) is the number of records that satisfy the antecedent of R and |D| is the total number of records in the dataset
 Accuracy(R) = n(correct) / n(covers), where n(correct) is the number of covered records whose class matches the consequent of R
 Example: see the sketch below
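A minimal sketch of these two measures in Python; the rule, attribute names, and toy records are made up for illustration.

```python
# Coverage and accuracy of one rule on a toy dataset.
dataset = [
    {"age": "youth",  "student": "yes", "buys computer": "yes"},
    {"age": "youth",  "student": "no",  "buys computer": "no"},
    {"age": "senior", "student": "yes", "buys computer": "yes"},
    {"age": "youth",  "student": "yes", "buys computer": "no"},
]

# R1: IF age = youth AND student = yes THEN buys computer = yes
antecedent = lambda r: r["age"] == "youth" and r["student"] == "yes"

covered = [r for r in dataset if antecedent(r)]
correct = [r for r in covered if r["buys computer"] == "yes"]

coverage = len(covered) / len(dataset)   # n(covers) / |D|       -> 2/4 = 0.50
accuracy = len(correct) / len(covered)   # n(correct) / n(covers) -> 1/2 = 0.50
print(f"coverage = {coverage:.2f}, accuracy = {accuracy:.2f}")
```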
 Rule triggering
 Rule Properties:
 Mutually exclusive rules
 Exhaustive rules
 Default rule
 If the rule set is not mutually exclusive, then a record can be covered by several rules, some of which may predict conflicting classes. There are two ways to overcome this issue: ordered rules and majority voting.
 X1 - R1
 X2 - R2, R3, R4 (classes A, B, A) - which one to select?
 X3 - R3
 X4 - R4
 X5 - R5
 X6 - R4
 X7 - R5
 The rules in a rule set are ordered in decreasing order of priority, based on coverage and accuracy or on the order in which they were generated.
 An ordered rule set is also known as a decision list.
 A test record X is classified by the highest-ranked rule that covers it.
 This approach allows a test record to trigger multiple classification rules and considers the consequent of each rule as a vote for a particular class.
 X2 = R2, R3, R4
 R2 predicts A
 R3 predicts B
 R4 predicts A
 Majority vote: class A (see the sketch below)
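A minimal sketch of majority voting among triggered rules, using the X2 example above.

```python
from collections import Counter

# Conflicting predictions from the rules triggered by record X2.
triggered = {"R2": "A", "R3": "B", "R4": "A"}

votes = Counter(triggered.values())           # {"A": 2, "B": 1}
predicted_class = votes.most_common(1)[0][0]
print(predicted_class)                        # A
```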
 Rule extraction methods:
 Direct method: extract rules directly from the data
 Indirect method: extract rules from other classification models (for example, from a decision tree)
R1: IF age = youth AND student = no THEN buys computer = no
R2: IF age = youth AND student = yes THEN buys computer = yes
R3: IF age = middle aged THEN buys computer = yes
R4: IF age = senior AND credit rating = excellent THEN buys computer = yes
R5: IF age = senior AND credit rating = fair THEN buys computer = no
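A minimal sketch applying the five rules above as an ordered rule list (decision list); the lambda encoding and the default label are illustrative choices.

```python
# The buys-computer rules R1-R5 as an ordered rule list: the first rule whose
# antecedent matches the record determines the class.
rules = [
    (lambda r: r["age"] == "youth" and r["student"] == "no",               "no"),   # R1
    (lambda r: r["age"] == "youth" and r["student"] == "yes",              "yes"),  # R2
    (lambda r: r["age"] == "middle aged",                                  "yes"),  # R3
    (lambda r: r["age"] == "senior" and r["credit rating"] == "excellent", "yes"),  # R4
    (lambda r: r["age"] == "senior" and r["credit rating"] == "fair",      "no"),   # R5
]

def classify(record, default="no"):
    for antecedent, label in rules:
        if antecedent(record):
            return label
    return default  # default rule: fires when no other rule covers the record

print(classify({"age": "senior", "student": "no", "credit rating": "fair"}))  # no
```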
Bayes' theorem:

P(A|B) = P(B|A) · P(A) / P(B)

 P(A|B) = posterior: probability that hypothesis A is true after observing evidence B
 P(B|A) = likelihood: probability of the evidence given that the hypothesis is true
 P(A) = prior: probability of the hypothesis before observing the evidence
 P(B) = marginal: probability of the evidence (the given data)
 A Bayesian network is a probabilistic graphical model which represents a set of variables and their conditional dependencies using a directed acyclic graph.
 Also known as a Bayes network, belief network, decision network, or Bayesian model.
 It is a convenient way of representing probabilistic relationships between multiple events.
 Directed Acyclic Graph (DAG)
 Node = random variable
 Each node has a set of possible events (states)
 Arrow = conditional dependence, quantified by a conditional probability table
 P(Xi | Xi-1, ..., X1) = P(Xi | Parents(Xi))
Example network (a chain of three nodes):

Rain → Local Train Cancelled → College Declares Holiday

Each edge carries a conditional probability table: P(Local Train Cancelled | Rain) on the first edge and P(College Declares Holiday | Local Train Cancelled) on the second.

Joint distribution of R (Rain) and L (Local Train Cancelled):

          R       ~R
L       9/48    18/48
~L      3/48    18/48

P(L=T, R=T) = 9/48 ≈ 0.19
P(L=T, R=F) = 18/48 = 0.375
P(L=F, R=T) = 3/48 ≈ 0.06
P(L=F, R=F) = 18/48 = 0.375


Example: a full joint probability in a Bayesian network is the product of each variable's probability given its parents.

P(S, D, A, ¬B, ¬E) = P(S|A) · P(D|A) · P(A|¬B ∧ ¬E) · P(¬B) · P(¬E)
= 0.75 × 0.91 × 0.001 × 0.998 × 0.999
= 0.00068045
 Basic classification assumes independent records; in sequence data each row depends upon the previous row or state.
 Markov Models
 E1 → E2 → E3
 Any event depends only on the previous event: the Markov property
Example states (weather): Sunny, Rainy, Cloudy

 A set of events following the Markov property forms a Markov chain
 State Space = {Sunny, Rainy}
 Initial Probabilities = {0.4, 0.6}
 Transition Matrix / Transition Probabilities:

          Sunny   Rainy
Sunny      0.8     0.2
Rainy      0.3     0.7
 G = {Sunny (today), Rainy (tomorrow), Rainy (day after tomorrow)}
 E1 = Sunny
 E2 = Rainy
 E3 = Rainy
 P(E2, E3 | E1) = P(E2 | E1) · P(E3 | E2) = 0.2 × 0.7 = 0.14
 {Sunny, Rainy, Sunny, Sunny, Rainy}: what is the probability that this sequence will happen?
 For the first state we use the initial probability
 P(S) · P(R|S) · P(S|R) · P(S|S) · P(R|S)
 = 0.4 × 0.2 × 0.3 × 0.8 × 0.2
 = 0.00384
 Evaluation problem: the probability of an observation sequence (solved with the forward algorithm)
 Decoding problem: the most likely hidden state sequence (solved with the Viterbi algorithm)
 Learning problem: estimating the model parameters (solved with the Baum-Welch / EM algorithm)
 What is clustering?
Clustering is the task of dividing the population or data points into a number of groups such that data points in the same group are more similar to one another than to data points in other groups.
Algorithms
Expectation Maximization Algorithm
Supervised learning after clustering
Radial Basis functions
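A minimal sketch of the Expectation-Maximization algorithm in practice, via scikit-learn's GaussianMixture (which is fitted with EM internally); the two-blob toy data is made up.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Two well-separated Gaussian blobs as toy data.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, size=(50, 2)),
               rng.normal(5, 1, size=(50, 2))])

# GaussianMixture alternates E-steps (soft cluster responsibilities) and
# M-steps (re-estimating means, covariances, and weights) until convergence.
gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
labels = gmm.predict(X)
print(labels[:5], labels[-5:])  # points from the two blobs fall in different components
```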
 Clustering for data exploration
 Find similarities between instances and thus group instances
 Example: customer relationship management (CRM)
 K groups based on demographic and transaction attributes
 Churn analysis
 Clustering is also used as a preprocessing stage.
 One advantage of preceding a supervised learner with unsupervised clustering or dimensionality reduction is that the unsupervised stage does not need labeled data, and labeling data is costly.
 We can use a large amount of unlabeled data for learning the cluster parameters.
 Then we use a smaller labeled dataset to learn the second stage of classification or regression, as sketched below.
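A minimal sketch of supervised learning after clustering, under assumed synthetic data; k-means stands in for the clustering stage, and distance-to-center features are one common design choice.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Plentiful unlabeled data for clustering, scarce labeled data for the classifier.
X_unlabeled = np.vstack([rng.normal(0, 1, (200, 2)), rng.normal(5, 1, (200, 2))])
X_labeled   = np.vstack([rng.normal(0, 1, (10, 2)),  rng.normal(5, 1, (10, 2))])
y_labeled   = np.array([0] * 10 + [1] * 10)

# Stage 1: learn cluster parameters from unlabeled data only.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_unlabeled)

# Stage 2: train a classifier on the small labeled set, using distances
# to the cluster centers as features.
features = kmeans.transform(X_labeled)
clf = LogisticRegression().fit(features, y_labeled)
print(clf.predict(kmeans.transform([[5.2, 4.8], [0.1, -0.3]])))  # e.g. [1 0]
```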
 Classes that are not linearly separable: hidden layers
 Convert non-linear to linear
 Increase the number of dimensions
 Types:
 Linear SVM
 Non-linear SVM
 Hard margin: does not allow any misclassification
 Soft margin: allows a little misclassification
 Classification
 Regression Analysis
 Pattern Recognition
 Outlier Detection
 Maximizing the margin helps in minimizing misclassification; see the formulation below.
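The standard quadratic-programming formulation of the maximum-margin separator, stated here for reference: minimize a quadratic objective subject to one linear constraint per training example (x_i, y_i).

```latex
\begin{aligned}
\min_{\mathbf{w},\, b} \quad & \tfrac{1}{2}\,\lVert \mathbf{w} \rVert^{2} \\
\text{subject to} \quad & y_i \left( \mathbf{w}^{\top} \mathbf{x}_i + b \right) \ge 1,
\qquad i = 1, \dots, n
\end{aligned}
```

Since the margin equals 2/||w||, minimizing ||w||² maximizes the margin; this is a quadratic program because the objective is quadratic and the constraints are linear.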
 Non-linear data
 A kernel maps data from a low-dimensional (LD) space to a high-dimensional (HD) space
 How does it work? Common kernels (compared in the sketch below):
 Polynomial Kernel
 Radial Basis Function Kernel
 Sigmoid Kernel
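A minimal sketch comparing these kernels on a stock non-linear toy problem (scikit-learn's concentric circles); the dataset and parameters are illustrative.

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Concentric circles: not linearly separable in the original 2-D space.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

for kernel in ("linear", "poly", "rbf", "sigmoid"):
    clf = SVC(kernel=kernel).fit(X, y)
    print(f"{kernel:8s} training accuracy = {clf.score(X, y):.2f}")
# The RBF kernel separates the circles easily; the linear kernel cannot.
```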
