Conditional Random Fields - A probabilistic graphical model Stefan Mutter
Motivation. Bayesian Network, Naive Bayes, Markov Random Field, Hidden Markov Model, Logistic Regression, Linear-chain Conditional Random Field, General Conditional Random Field
Outline: different views on building a conditional random field (CRF): from directed to undirected graphical models; from generative to discriminative models; sequence models: from HMMs to CRFs; CRFs and maximum entropy Markov models (MEMMs); parameter estimation / inference; applications
Overview: directed graphical models. Bayesian Network, Naive Bayes, Markov Random Field, Hidden Markov Model, Logistic Regression, Linear-chain Conditional Random Field, General Conditional Random Field
Bayesian Networks: directed graphical models. In general, a graphical model is a family of probability distributions that factorise according to an underlying graph, with a one-to-one correspondence between nodes and random variables: a set V of random variables consisting of a set X of input variables and a set Y of output variables to predict. Independence assumption using a topological ordering: a node v is conditionally independent of its predecessors given its direct parents π(v). Direct probabilistic interpretation: the family of distributions factorises into p(x, y) = ∏_{v ∈ V} p(v | π(v))
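As a minimal sketch of this factorisation (with invented conditional probability tables), a two-node network Rain → WetGrass: the joint is just the product of each node's probability given its parents.

```python
# Toy Bayesian network Rain -> WetGrass; the CPT values are invented for
# illustration. The joint factorises over nodes given their parents:
# p(rain, wet) = p(rain) * p(wet | rain).
p_rain = {True: 0.2, False: 0.8}
p_wet_given_rain = {  # keyed by (wet, rain)
    (True, True): 0.9, (False, True): 0.1,
    (True, False): 0.3, (False, False): 0.7,
}

def joint(rain, wet):
    return p_rain[rain] * p_wet_given_rain[(wet, rain)]

# A valid factorisation defines a proper distribution: it sums to one.
total = sum(joint(r, w) for r in (True, False) for w in (True, False))
```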
Overview: undirected graphical models. Bayesian Network, Naive Bayes, Markov Random Field, Hidden Markov Model, Logistic Regression, Linear-chain Conditional Random Field, General Conditional Random Field
Markov Random Field: undirected graphical models. An undirected graph for the joint probability p(x) allows no direct probabilistic interpretation. Instead, define potential functions Ψ_A on the maximal cliques A; they map a joint assignment to a non-negative real number and therefore require normalisation: p(x) = (1/Z) ∏_A Ψ_A(x_A) with Z = Σ_x ∏_A Ψ_A(x_A)
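A minimal sketch of this normalisation, with invented potentials on a single clique of two binary variables:

```python
import itertools

# Potentials are arbitrary non-negative scores, not probabilities, so the
# product over cliques must be normalised by the partition function Z.
psi = {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 4.0}  # invented values

Z = sum(psi[a] for a in itertools.product((0, 1), repeat=2))

def p(x1, x2):
    return psi[(x1, x2)] / Z
```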
Markov Random Fields and CRFs. A CRF is a Markov Random Field globally conditioned on X. What do the potential functions Ψ_A look like?
Overview: generative vs. discriminative models. Bayesian Network, Naive Bayes, Markov Random Field, Hidden Markov Model, Logistic Regression, Linear-chain Conditional Random Field, General Conditional Random Field
Generative models are based on the joint probability distribution p(y, x) and include a model of p(x), which is not needed for classification. For interdependent features, either enhance the model structure to represent them (complexity problems) or make simplifying independence assumptions, e.g. naive Bayes: once the class label is known, all features are independent
Discriminative models are based directly on the conditional probability p(y|x) and need no model for p(x). Simply put: make independence assumptions among y, but not among x. In general, p(y|x) is computed by inference. The conditional approach has more freedom to fit the data
Naive Bayes and logistic regression (1). Naive Bayes and logistic regression form a generative-discriminative pair: it can be shown that a Gaussian naive Bayes (GNB) classifier implies the parametric form of p(y|x) of its discriminative counterpart, logistic regression (LR)! LR is an MRF globally conditioned on X: use a log-linear model as the potential functions in CRFs; LR is a very simple CRF
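A sketch of logistic regression as that log-linear model, p(y|x) ∝ exp(score(x, y)); the weights and input are invented:

```python
import math

def normalise(scores):
    # Turn log-linear scores into probabilities (softmax over the labels).
    Z = sum(math.exp(s) for s in scores.values())
    return {y: math.exp(s) / Z for y, s in scores.items()}

# Binary LR: score(y=1) = w.x, score(y=0) = 0 (invented weights and input).
w = [2.0, -1.0]
x = [1.0, 0.5]
probs = normalise({1: sum(wi * xi for wi, xi in zip(w, x)), 0: 0.0})
```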
Naive Bayes and logistic regression (2). If the GNB assumptions hold, then GNB and LR converge asymptotically toward identical classifiers. In generative models, the set of parameters must represent both the input distribution and the conditional well; the parameters of discriminative models are not as strongly tied to the input distribution, e.g. LR fits its parameters to the data even if the naive Bayes assumption is violated. In other words: there are more (complex) joint models than GNB whose conditionals also have the “LR form”. GNB and LR mirror the relationship between the HMM and the linear chain CRF
Overview: sequence models. Bayesian Network, Naive Bayes, Markov Random Field, Hidden Markov Model, Logistic Regression, Linear-chain Conditional Random Field, General Conditional Random Field
Sequence models: HMMs. The power of graphical models: modelling many interdependent variables. An HMM models the joint distribution and uses two independence assumptions to do so tractably: given the direct predecessor, each state is independent of its ancestors, and each observation depends only on the current state
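The two assumptions give the familiar factorisation p(y, x) = p(y_1) p(x_1|y_1) ∏_t p(y_t|y_{t-1}) p(x_t|y_t); a sketch with invented tables:

```python
# All probability tables below are invented for illustration.
start = {'A': 0.6, 'B': 0.4}
trans = {('A', 'A'): 0.7, ('A', 'B'): 0.3, ('B', 'A'): 0.4, ('B', 'B'): 0.6}
emit = {('A', 'x'): 0.5, ('A', 'y'): 0.5, ('B', 'x'): 0.1, ('B', 'y'): 0.9}

def hmm_joint(states, obs):
    # Each state depends only on its predecessor; each observation depends
    # only on the current state.
    p = start[states[0]] * emit[(states[0], obs[0])]
    for t in range(1, len(states)):
        p *= trans[(states[t - 1], states[t])] * emit[(states[t], obs[t])]
    return p
```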
From HMMs to linear chain CRFs (1). Key: the conditional distribution p(y|x) of an HMM is a CRF with a particular choice of feature functions. The parameters are not required to be log probabilities, therefore introduce normalisation. Using feature functions f_k: p(y, x) = (1/Z) ∏_t exp(Σ_k λ_k f_k(y_t, y_{t-1}, x_t))
From HMMs to linear chain CRFs (2). Last step: write the conditional probability for the HMM, p(y|x) = p(y, x) / Σ_{y'} p(y', x). This is a linear chain CRF that includes only HMM features; richer features are possible
Linear chain conditional random fields. Definition: p(y|x) = (1/Z(x)) ∏_{t=1}^{T} exp(Σ_k λ_k f_k(y_t, y_{t-1}, x_t)) with Z(x) = Σ_y ∏_{t=1}^{T} exp(Σ_k λ_k f_k(y_t, y_{t-1}, x_t)). For general CRFs, use arbitrary cliques
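A brute-force sketch of this definition on a two-token chain; the indicator feature functions and weights are invented:

```python
import itertools
import math

LABELS = ('N', 'V')
# Invented weights on a few indicator features f_k(y_t, y_prev, x, t).
WEIGHTS = {'word_dog_N': 2.0, 'word_runs_V': 2.0, 'trans_N_V': 1.0}

def score(y, x):
    s = 0.0
    for t, y_t in enumerate(y):
        s += WEIGHTS.get('word_%s_%s' % (x[t], y_t), 0.0)
        if t > 0:
            s += WEIGHTS.get('trans_%s_%s' % (y[t - 1], y_t), 0.0)
    return s

def p(y, x):
    # Z(x) sums exp(score) over all label sequences -- feasible only for
    # tiny chains; real implementations use forward-backward instead.
    Z = sum(math.exp(score(yy, x))
            for yy in itertools.product(LABELS, repeat=len(x)))
    return math.exp(score(y, x)) / Z

x = ('dog', 'runs')
best = max(itertools.product(LABELS, repeat=len(x)), key=lambda yy: p(yy, x))
```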
Side trip: maximum entropy Markov models. Entropy is a measure of the uniformity of a distribution; a maximum entropy model maximises entropy subject to constraints imposed by the training data. MEMMs model the conditional probability of reaching a state given an observation o and the previous state s’ instead of joint probabilities (observations sit on transitions). Split P(s|s’,o) into |S| separately trained transition functions P_s’(s|o); this leads to per-state normalisation
Side trip: label bias problem. MEMMs are log-linear models like CRFs, but suffer from the label bias problem: per-state normalisation requires that the probabilities of transitions leaving a state sum to one (conservation of probability mass), so states with a single outgoing transition ignore the observation
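The effect can be seen directly: under per-state normalisation, a state with a single outgoing transition assigns it probability one, whatever the observation score says (scores below are invented):

```python
import math

def per_state_probs(transition_scores):
    # MEMM-style local normalisation over the successors of one state.
    Z = sum(math.exp(s) for s in transition_scores.values())
    return {s: math.exp(v) / Z for s, v in transition_scores.items()}

# A state with one successor ignores the observation entirely: the score
# -3.7 could be anything and the probability would still be 1.
only_choice = per_state_probs({'s2': -3.7})
two_choices = per_state_probs({'s2': 1.0, 's3': -1.0})
```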
Inference in a linear chain CRF: slight variants of the HMM algorithms. Viterbi: use the recursion from the HMM, but define the local factor Ψ_t(j, i, x) = exp(Σ_k λ_k f_k(j, i, x_t)), because the CRF model can be written as p(y|x) = (1/Z(x)) ∏_t Ψ_t(y_t, y_{t-1}, x)
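A Viterbi sketch in that form: the recursion is the HMM one, but the local factors Ψ_t are exponentiated scores rather than probabilities (the scores below are invented):

```python
import math

def viterbi(n_steps, labels, psi):
    """psi(t, y_prev, y) -> non-negative factor; y_prev is None at t = 0."""
    delta = {y: psi(0, None, y) for y in labels}
    backptr = []
    for t in range(1, n_steps):
        new_delta, ptr = {}, {}
        for y in labels:
            prev = max(labels, key=lambda yp: delta[yp] * psi(t, yp, y))
            new_delta[y] = delta[prev] * psi(t, prev, y)
            ptr[y] = prev
        delta = new_delta
        backptr.append(ptr)
    last = max(labels, key=delta.get)
    path = [last]
    for ptr in reversed(backptr):
        path.append(ptr[path[-1]])
    return path[::-1]

obs = ('dog', 'runs')

def psi(t, y_prev, y):
    # Invented emission and transition scores, exponentiated into factors.
    s = 2.0 if (obs[t], y) in {('dog', 'N'), ('runs', 'V')} else 0.0
    if (y_prev, y) == ('N', 'V'):
        s += 1.0
    return math.exp(s)

best_path = viterbi(len(obs), ('N', 'V'), psi)
```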
Parameter estimation in general. The discriminative approach so far has a major drawback: a generative model tends to have higher asymptotic error, but it approaches its asymptotic error faster than a discriminative one, with the number of training examples needed being logarithmic rather than linear in the number of parameters. Remember: discriminative models make no independence assumptions for the observations x
Principles in parameter estimation. Basic principle: maximum likelihood estimation with the conditional log likelihood ℓ(θ) = Σ_i log p(y(i) | x(i)). Advantage: the conditional log likelihood is concave, therefore every local optimum is a global one. Use gradient-based optimisation (quasi-Newton methods); runtime is in O(t m² n g), where t is the length of the sequence, m the number of labels, n the number of training instances, and g the number of required gradient computations
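The gradient of the conditional log likelihood is the observed feature counts minus the expected feature counts under the model; a gradient-ascent sketch for the logistic-regression special case, on an invented data set:

```python
import math

# Tiny invented training set: (feature vector, label).
data = [([1.0, 0.0], 1), ([0.0, 1.0], 0), ([1.0, 1.0], 1)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def grad(w):
    # d/dw_k of the conditional log likelihood: observed count of feature k
    # minus its expected count under the current model.
    g = [0.0] * len(w)
    for x, y in data:
        p1 = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
        for k, xk in enumerate(x):
            g[k] += (y - p1) * xk
    return g

w = [0.0, 0.0]
for _ in range(200):
    w = [wi + 0.5 * gi for wi, gi in zip(w, grad(w))]
```

Because the objective is concave, even plain gradient ascent reaches the global optimum; the quasi-Newton methods on the slide just get there faster.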
Application: gene prediction. Use finite-state CRFs to locate introns and exons in DNA sequences. Advantage of CRFs: the ability to straightforwardly incorporate homology evidence from protein databases. Used feature functions: e.g. frequencies of base conjunctions and disjunctions in sliding windows over 20 bases upstream and 40 bases downstream (motivation: splice site detection), such as: how many times did “C or G” occur in the prior 40 bases, with a sliding window of size 5? Or frequencies of how many times a base appears in a related protein (found via BLAST search). Outperforms a 5th-order hidden semi-Markov model with a 10% reduction in the error of the harmonic mean of precision and recall (86.09 vs. 84.55)
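A sketch of one such sliding-window feature function; the window size and the sequence are invented:

```python
def window_count(seq, pos, width, bases=('C', 'G')):
    """Count how many of the `width` bases immediately before `pos` are in `bases`."""
    start = max(0, pos - width)
    return sum(1 for b in seq[start:pos] if b in bases)

seq = 'ATGCGCGTAACC'                   # invented DNA fragment
cg_before_8 = window_count(seq, 8, 5)  # counts C/G in seq[3:8] = 'CGCGT'
```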
Summary: graphical models
The end. Questions?
References
Charles Sutton and Andrew McCallum. An Introduction to Conditional Random Fields for Relational Learning. In: Introduction to Statistical Relational Learning, edited by Lise Getoor and Ben Taskar. MIT Press, 2006. (including figures and formulae)
H. Wallach. Efficient Training of Conditional Random Fields. Master's thesis, University of Edinburgh, 2002. http://citeseer.ist.psu.edu/wallach02efficient.html
John Lafferty, Andrew McCallum, and Fernando Pereira. Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data. In Proceedings of ICML-01, pages 282-289, 2001.
Aron Culotta, David Kulp, and Andrew McCallum. Gene Prediction with Conditional Random Fields. Technical Report UM-CS-2005-028, University of Massachusetts, Amherst, April 2005.
References (cont.)
Kevin Murphy. An Introduction to Graphical Models. Intel Research Technical Report, 2001. http://citeseer.ist.psu.edu/murphy01introduction.html
Andrew Y. Ng and Michael Jordan. On Discriminative vs. Generative Classifiers: A Comparison of Logistic Regression and Naive Bayes. In NIPS 14, 2002.
T. Minka. Discriminative Models, Not Discriminative Training. Technical report, Microsoft Research Cambridge, 2005.
P. Blunsom. Maximum Entropy Classification. Lecture slides 433-680, 2005. http://www.cs.mu.oz.au/680/lectures/week06a.pdf
PowerPoint Presentation - Conditional Random Fields - A ...

  • 1. Conditional Random Fields - A probabilistic graphical model Stefan Mutter
  • 2. Motivation Bayesian Network Naive Bayes Markov Random Field Hidden Markov Model Logistic Regression Linear Chain Conditional Random Field General Conditional Random Field
  • 3. Outline different views on building a conditional random field (CRF) from directed to undirected graphical models from generative to discriminative models sequence models from HMMs to CRFs CRFs and maximum entropy markov models (MEMM) parameter estimation / inference applications
  • 4. Overview: directed graphical models Bayesian Network Naive Bayes Markov Random Field Hidden Markov Model Logistic Regression Linear Chain Conditional Random Field General Conditional Random Field
  • 5. Bayesian Networks: directed graphical models in general: a graphical model - family of probability distributions that factorise according to an underlying graph one-to-one correspondence between nodes and random variables a set V of random variables consisting of a set X of input variables and a set Y of output variables to predict independence assumption using topological ordering: a node is v conditionally independent of its predecessors given its direct parents π(v) (Markov blanket) direct probabilistic interpretation: family of distributions factorises into:
  • 6. Overview: undirected graphical models Bayesian Network Naive Bayes Markov Random Field Hidden Markov Model Logistic Regression Linear Chain Conditional Random Field General Conditional Random Field
  • 7. Markov Random Field: undirected graphical models undirected graph for joint probability p(x) allows no direct probabilistic interpretation define potential functions  on maximal cliques A map joint assignment to non-negative real number requires normalisation  green  red
  • 8. Markov Random Fields and CRFs A CRF is a Markov Random Field globally conditioned on X How do the potential functions  look like?
  • 9. Overview: generative  discriminative models Bayesian Network Naive Bayes Markov Random Field Hidden Markov Model Logistic Regression Linear Chain Conditional Random Field General Conditional Random Field  
  • 10. Generative models based on joint probability distribution p(y,x) includes a model of p(x) which is not needed for classification interdependent features either enhance model structure to represent them complexity problems or make simplifying independence assumptions e.g. naive bayes: once the class label is known, all features are independent
  • 11. Discriminative models based directly on conditional probability p(y|x) need no model for p(x) simply: make independence assumptions among y but not among x in general: computed by inference conditional approach more freedom to fit data
  • 12. Naive bayes and logistic regression (1) naive bayes and logistic regression are generative-discriminative pair naive bayes: It can be shown that a gaussian naive bayes classifier implies the parametric form of p(y|x) of its discriminative pair logistic regression! LR is a MRF globally conditioned on X Use log-linear model as potential functions in CRFs LR is a very simple CRF
  • 13. Naive bayes and logistic regression (2) if GNB assumptions hold, then GNB and LR converge asymptotically toward identical classifiers in generative models set of parameters must represent input distribution and conditional well. in discriminative models are not as strongly tied to their input distribution e.g. LR fits its parameter to the data although the naive bayes assumption might be violated in other words: there are more (complex) joint models than GNB whose conditional also have the “LR form” GNB and LR mirror relationship between HMM and linear chain CRF
  • 14. Overview: sequence models Bayesian Network Naive Bayes Markov Random Field Hidden Markov Model Logistic Regression Linear Chain Conditional Random Field General Conditional Random Field
  • 15. Sequence models: HMMs power of graphical models: model many interdependent variables HMM models joint distribution uses two independence assumptions to do it tractably given the direct predecessor, each state is independent of his ancestors each observation depends only on current state
  • 16. From HMMs to linear chain CRFs (1) key: conditional distribution p(y|x) of an HMM is a CRF with a particular choice of feature function parameters are not required to be log probabilities, therefore introduce normalisation using feature functions: with
  • 17. From HMMs to linear chain CRFs (2) last step: write the conditional probability for the HMM: p(y|x) = p(y,x) / sum_y' p(y',x). This is a linear chain CRF that includes only HMM-like features; richer features are possible.
  • 18. Linear chain conditional random fields Definition: p(y|x) = (1/Z(x)) prod_t exp{ sum_k theta_k f_k(y_t, y_{t-1}, x_t) }, with Z(x) = sum_y prod_t exp{ sum_k theta_k f_k(y_t, y_{t-1}, x_t) }. For general CRFs, use arbitrary cliques of the graph in place of the (y_t, y_{t-1}) pairs.
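The normalisation Z(x) in the definition above sums over exponentially many label sequences, but for a linear chain it is computable by the forward algorithm. A sketch (not from the slides; the layout of `log_psi` and all numbers are my own), checked against brute-force enumeration:

```python
import itertools
import numpy as np

def crf_partition(log_psi):
    """Z(x) for a linear chain CRF via the forward algorithm.
    log_psi[t, i, j] = sum_k theta_k f_k(y_t=j, y_{t-1}=i, x_t); row 0 of
    log_psi[0] holds the initial scores (dummy previous label 0)."""
    T, m, _ = log_psi.shape
    alpha = np.exp(log_psi[0, 0, :])          # forward scores at t = 0
    for t in range(1, T):
        alpha = alpha @ np.exp(log_psi[t])    # sum out the previous label
    return alpha.sum()                        # Z(x) = sum over final label

# tiny example: 3 positions, 2 labels, arbitrary log-potentials
rng = np.random.default_rng(0)
log_psi = rng.normal(size=(3, 2, 2))

# brute force: enumerate all 2^3 label sequences
Z_brute = sum(
    np.exp(log_psi[0, 0, y[0]]
           + sum(log_psi[t, y[t - 1], y[t]] for t in range(1, 3)))
    for y in itertools.product(range(2), repeat=3)
)
```

The forward recursion costs O(T m^2) instead of O(m^T), which is what makes linear chain CRF inference tractable.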
  • 19. Side trip: maximum entropy markov models entropy - a measure of the uniformity of a distribution. A maximum entropy model maximises entropy, subject to constraints imposed by the training data. MEMMs model the conditional probability of reaching a state s given an observation o and the previous state s', instead of joint probabilities; observations sit on transitions. P(s|s',o) is split into |S| separately trained transition functions P_s'(s|o), which leads to per-state normalisation.
  • 20. Side trip: label bias problem CRFs look like log-linear models, but MEMMs suffer from the label bias problem: per-state normalisation requires that the probabilities of the transitions leaving a state sum to one (conservation of probability mass), so states with only one outgoing transition effectively ignore the observation.
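The label bias effect is easy to demonstrate numerically: under per-state normalisation, a state with a single successor assigns it probability 1 no matter what the observation says. A minimal illustration (not from the slides; the scores are made up):

```python
import numpy as np

def memm_transition(scores):
    """P_s'(s|o) from unnormalised observation-dependent scores for the
    successors of state s' -- normalised per state, as in an MEMM."""
    e = np.exp(scores)
    return e / e.sum()

# state with two successors: the observation-dependent scores matter
two_way = memm_transition(np.array([2.0, 0.5]))

# state with one successor: any score, strong or weak, normalises to 1
one_way_strong = memm_transition(np.array([5.0]))
one_way_weak = memm_transition(np.array([-5.0]))
```

A CRF avoids this because it normalises globally over whole label sequences, not per state.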
  • 21. Inference in a linear chain CRF slight variants of the HMM algorithms: for Viterbi, use the recursion from the HMM but define the transition potential Psi_t(j, i, x_t) = exp{ sum_k theta_k f_k(j, i, x_t) }, because the CRF model can be written as p(y|x) = (1/Z(x)) prod_t Psi_t(y_t, y_{t-1}, x_t).
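The Viterbi variant above can be sketched in log space, where the product of potentials becomes a sum. This is a minimal implementation (not from the slides; the `log_psi` layout and toy numbers are my own):

```python
import numpy as np

def crf_viterbi(log_psi):
    """Most likely label sequence argmax_y p(y|x) for a linear chain CRF.
    log_psi[t, i, j] = sum_k theta_k f_k(y_t=j, y_{t-1}=i, x_t); row 0 of
    log_psi[0] holds the initial scores (dummy previous label 0).
    Z(x) cancels in the argmax, so it is never computed."""
    T, m, _ = log_psi.shape
    delta = log_psi[0, 0, :]                  # best log-score ending in each label
    back = np.zeros((T, m), dtype=int)        # backpointers
    for t in range(1, T):
        cand = delta[:, None] + log_psi[t]    # cand[i, j]: best path via previous label i
        back[t] = cand.argmax(axis=0)
        delta = cand.max(axis=0)
    y = [int(delta.argmax())]                 # follow backpointers from best final label
    for t in range(T - 1, 0, -1):
        y.append(int(back[t, y[-1]]))
    return y[::-1]

# toy potentials favouring the alternating sequence 0, 1, 0
log_psi = np.zeros((3, 2, 2))
log_psi[0, 0] = [1.0, 0.0]                    # start in label 0
log_psi[1] = [[0.0, 2.0], [0.0, 0.0]]         # 0 -> 1 strongly preferred
log_psi[2] = [[0.0, 0.0], [2.0, 0.0]]         # 1 -> 0 strongly preferred
path = crf_viterbi(log_psi)
```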
  • 22. Parameter estimation in general So far one major drawback: a generative model tends to have a higher asymptotic error, but it approaches its asymptotic error faster than a discriminative one, with a number of training examples logarithmic in the number of parameters rather than linear. Remember: discriminative models make no independence assumptions for the observations x.
  • 23. Principles in parameter estimation basic principle: maximum likelihood estimation with the conditional log likelihood of the training data. Advantage: the conditional log likelihood is concave, therefore every local optimum is a global one. Use gradient-based optimisation (quasi-Newton methods); runtime is in O(tm^2 ng), where t is the length of a sequence, m the number of labels, n the number of training instances, and g the number of required gradient computations.
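The maximum-likelihood principle above can be shown on the simplest CRF, logistic regression: the gradient of the conditional log likelihood is the empirical feature counts minus the model's expected feature counts, and since the objective is concave, plain gradient ascent converges toward the global optimum. A small sketch (not from the slides; data and step size are made up):

```python
import numpy as np

def cll(theta, X, y):
    """Concave conditional log likelihood sum_n log p(y_n | x_n)."""
    scores = X @ theta.T
    log_Z = np.log(np.exp(scores).sum(axis=1))
    return (scores[np.arange(len(y)), y] - log_Z).sum()

def ascent_step(theta, X, y, lr=0.5):
    """One gradient ascent step: empirical minus expected feature counts."""
    p = np.exp(X @ theta.T)
    p /= p.sum(axis=1, keepdims=True)                 # model p(y|x) for each instance
    empirical = np.array([(y == c).astype(float) @ X for c in range(theta.shape[0])])
    expected = p.T @ X
    return theta + lr * (empirical - expected) / len(y)

# toy linearly separable two-class data
X = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
y = np.array([0, 0, 1, 1])
theta = np.zeros((2, 2))
before = cll(theta, X, y)
for _ in range(100):
    theta = ascent_step(theta, X, y)
after = cll(theta, X, y)
```

For linear chain CRFs the expected feature counts require the forward-backward algorithm per instance, which is where the O(tm^2 ng) runtime comes from.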
  • 24. Application: gene prediction use finite-state CRFs to locate introns and exons in DNA sequences. Advantage of CRFs: the ability to straightforwardly incorporate homology evidence from protein databases. Used feature functions: e.g. frequencies of base conjunctions and disjunctions in sliding windows over 20 bases upstream and 40 bases downstream (motivation: splice site detection), such as: how many times did &quot;C or G&quot; occur in the prior 40 bases, with a sliding window of size 5? E.g. frequencies of how many times a base appears in a related protein (found via BLAST search). Outperforms a 5th-order hidden semi-markov model, a 10% error reduction in the harmonic mean of precision and recall (86.09 vs. 84.55).
  • 27. References An Introduction to Conditional Random Fields for Relational Learning. Charles Sutton and Andrew McCallum. In Introduction to Statistical Relational Learning. Edited by Lise Getoor and Ben Taskar. MIT Press, 2006. (including figures and formulae) H. Wallach, &quot;Efficient training of conditional random fields,&quot; Master's thesis, University of Edinburgh, 2002. http://citeseer.ist.psu.edu/wallach02efficient.html John Lafferty, Andrew McCallum, and Fernando Pereira. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In Proceedings of ICML-01, pages 282-289, 2001. Gene Prediction with Conditional Random Fields. Aron Culotta, David Kulp, and Andrew McCallum. Technical Report UM-CS-2005-028, University of Massachusetts, Amherst, April 2005.
  • 28. References Kevin Murphy. An introduction to graphical models. Technical report, Intel Research, 2001. http://citeseer.ist.psu.edu/murphy01introduction.html On Discriminative vs. Generative Classifiers: A comparison of logistic regression and Naive Bayes. Andrew Y. Ng and Michael Jordan. In NIPS 14, 2002. T. Minka. Discriminative models, not discriminative training. Technical report, Microsoft Research Cambridge, 2005. P. Blunsom. Maximum Entropy Classification. Lecture slides 433-680, 2005. http://www.cs.mu.oz.au/680/lectures/week06a.pdf