
An introduction to machine learning and probabilistic graphical models
Overview

• Supervised learning
• Unsupervised learning
• Graphical models
• Learning relational models

2
Supervised learning

Color   Shape   Size   Output
Blue    Torus   Big    Y
Blue    Square  Small  Y
Blue    Star    Small  Y
Red     Arrow   Small  N

Learn to approximate the function F(x1, x2, x3) -> t
from a training set of (x, t) pairs.
3
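As a minimal sketch (not from the original slides) of what "learning F from (x, t) pairs" looks like in practice, the snippet below fits a decision tree to the toy table above. It assumes scikit-learn is available; the encoding step and variable names are my own choices.

```python
# Fit a hypothesis to the toy colour/shape/size table (sketch, assumes scikit-learn).
from sklearn.preprocessing import OrdinalEncoder
from sklearn.tree import DecisionTreeClassifier

# One row per training example: [color, shape, size], with target label t.
X = [["Blue", "Torus", "Big"],
     ["Blue", "Square", "Small"],
     ["Blue", "Star", "Small"],
     ["Red", "Arrow", "Small"]]
t = ["Y", "Y", "Y", "N"]

# Encode the categorical attributes as integers before fitting.
enc = OrdinalEncoder()
Xe = enc.fit_transform(X)

clf = DecisionTreeClassifier().fit(Xe, t)

# The learned hypothesis predicts t for a previously unseen input.
print(clf.predict(enc.transform([["Blue", "Arrow", "Small"]])))
```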
Supervised learning

Training data:
X1 X2 X3 T
B  T  B  Y
B  S  S  Y
B  S  S  Y
R  A  S  N

Testing data:
X1 X2 X3 T
B  A  S  ?
Y  C  S  ?

(Diagram: the learner fits a hypothesis to the training data; the hypothesis then predicts T for the test inputs.)
4
Key issue: generalization

Can’t just memorize the training set (overfitting).
5
Hypothesis spaces
• Decision trees
• Neural networks
• K-nearest neighbors
• Naïve Bayes classifier
• Support vector machines (SVMs)
• Boosted decision stumps
• …

6
Perceptron
(neural net with no hidden layers)

Linearly separable data


7
Which separating hyperplane?

8
The linear separator with the largest margin is the best one to pick.

9
What if the data is not linearly separable?

10
Kernel trick

A kernel implicitly maps the data from 2D (x1, x2) to 3D (z1, z2, z3),
making the problem linearly separable.
11
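A minimal sketch of the idea, not from the slides: an explicit quadratic feature map from 2D to 3D, and the kernel that computes the same inner product without ever forming the 3D vectors. The function names phi and quad_kernel are my own.

```python
import numpy as np

def phi(x1, x2):
    """Explicit 2D -> 3D feature map corresponding to a quadratic kernel."""
    return np.array([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])

def quad_kernel(a, b):
    """The kernel computes phi(a).phi(b) implicitly, as (a.b)^2."""
    return (a @ b) ** 2

a, b = np.array([1.0, 2.0]), np.array([3.0, -1.0])
print(phi(*a) @ phi(*b), quad_kernel(a, b))  # both print 1.0
```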
Support Vector Machines
(SVMs)
• Two key ideas:
– Large margins
– Kernel trick

12
Boosting

Simple classifiers (weak learners) can have their performance boosted
by taking weighted combinations.

Boosting maximizes the margin.
13
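A minimal sketch (assuming scikit-learn, not part of the original slides) of boosting weak learners: AdaBoost's default weak learner is a depth-1 decision tree, i.e. a decision stump, and the booster takes a weighted combination of many of them.

```python
# Boost decision stumps with AdaBoost (sketch, assumes scikit-learn).
from sklearn.ensemble import AdaBoostClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# The default weak learner is a depth-1 tree ("decision stump");
# 100 boosting rounds combine them with weights.
booster = AdaBoostClassifier(n_estimators=100, random_state=0)
print(booster.fit(X, y).score(X, y))
```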


Supervised learning success stories
– Face detection
– Steering an autonomous car across the US
– Detecting credit card fraud
– Medical diagnosis
– …

14
Unsupervised learning
• What if there are no output labels?

15
K-means clustering
1. Guess number of clusters, K
2. Guess initial cluster centers, μ1, μ2

Iterate (repeat steps 3–4):
3. Assign each data point xi to the nearest cluster center
4. Re-compute the cluster centers based on the assignments
(a minimal code sketch follows below)

16
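A minimal NumPy sketch of the K-means loop described above. The function name kmeans, the data X, and K are placeholders; empty clusters are not handled.

```python
import numpy as np

def kmeans(X, K, n_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), K, replace=False)]  # step 2: initial centers
    for _ in range(n_iters):
        # step 3: assign each point to its nearest center
        assign = np.argmin(((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1), axis=1)
        # step 4: recompute each center as the mean of its assigned points
        centers = np.array([X[assign == k].mean(axis=0) for k in range(K)])
    return centers, assign
```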
AutoClass (Cheeseman et al., 1986)
• EM algorithm for mixtures of Gaussians
• “Soft” version of K-means
• Uses Bayesian criterion to select K
• Discovered new types of stars from
spectral data
• Discovered new classes of proteins and
introns from DNA/protein sequence
databases
17
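AutoClass itself is not shown here; as a rough modern analogue (an assumption of mine, assuming scikit-learn), the sketch below fits a "soft" K-means, i.e. a mixture of Gaussians trained with EM, and selects K with a Bayesian-style criterion (BIC).

```python
# Mixture of Gaussians via EM, with K chosen by BIC (sketch, assumes scikit-learn).
import numpy as np
from sklearn.mixture import GaussianMixture

X = np.random.default_rng(0).normal(size=(300, 2))   # placeholder data
best = min((GaussianMixture(n_components=k, random_state=0).fit(X) for k in range(1, 6)),
           key=lambda m: m.bic(X))
print(best.n_components)
print(best.predict_proba(X)[:3])   # "soft" cluster responsibilities
```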
Hierarchical clustering

18
Principal Component Analysis (PCA)

PCA seeks a projection that best represents the data in a least-squares sense.

PCA reduces the dimensionality of feature space by restricting attention to
those directions along which the scatter of the cloud is greatest.
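A minimal sketch (not from the slides) of PCA as a least-squares projection via the SVD of the centred data matrix; the function name pca, the array X, and the target dimension d are placeholders.

```python
import numpy as np

def pca(X, d):
    Xc = X - X.mean(axis=0)                      # centre the cloud
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:d]                          # directions of greatest scatter
    return Xc @ components.T                     # d-dimensional projection
```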
Discovering nonlinear manifolds

20
Combining supervised and
unsupervised learning

21
Discovering rules (data mining)

Occup.   Income  Educ.  Sex  Married  Age
Student  10k     MA     M    S        22
Student  20k     PhD    F    S        24
Doctor   80k     MD     M    M        30
Retired  30k     HS     F    M        60

Find the most frequent patterns (association rules):

Num in household = 1 ^ num children = 0 => language = English

Language = English ^ Income < $40k ^ Married = false ^ num children = 0
  => education ∈ {college, grad school}

22
Unsupervised learning:
summary
• Clustering
• Hierarchical clustering
• Linear dimensionality reduction (PCA)
• Non-linear dim. reduction
• Learning rules

23
Discovering networks

From data visualization to causal discovery 24


Networks in biology
• Most processes in the cell are controlled
by networks of interacting molecules:
– Metabolic Network
– Signal Transduction Networks
– Regulatory Networks
• Networks can be modeled at multiple levels of detail/realism
  (in order of decreasing detail):
– Molecular level
– Concentration level
– Qualitative level
25
Molecular level: Lysis-Lysogeny circuit in Lambda phage

Arkin et al. (1998), Genetics 149(4):1633-48

5 genes, 67 parameters based on 50 years of research.
Stochastic simulation required a supercomputer.
26
Concentration level: metabolic pathways
• Usually modeled with differential equations

(Figure: a network of genes g1–g5 with interaction weights wij)
27
Qualitative level: Boolean
Networks

28
Probabilistic graphical models
• Supports graph-based modeling at various levels
of detail
• Models can be learned from noisy, partial data
• Can model “inherently” stochastic phenomena,
e.g., molecular-level fluctuations…
• But can also model deterministic, causal
processes.
"The actual science of logic is conversant at present only with
things either certain, impossible, or entirely doubtful. Therefore
the true logic for this world is the calculus of probabilities."
-- James Clerk Maxwell
"Probability theory is nothing but common sense reduced to
calculation." -- Pierre Simon Laplace 29
Graphical models: outline
• What are graphical models?
• Inference
• Structure learning

30
Simple probabilistic model: linear regression

Y = μ + βX + noise

(The μ + βX term is the deterministic, functional part of the relationship.)
31
Simple probabilistic model: linear regression

Y = μ + βX + noise

“Learning” = estimating the parameters μ, β, σ from (x, y) pairs:
• μ is the empirical mean
• β can be estimated by least squares
• σ is the residual variance
32
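A minimal sketch (mine, not the slide's) of those estimates; fit_linear is a hypothetical helper and x, y are placeholder arrays.

```python
import numpy as np

def fit_linear(x, y):
    beta = np.cov(x, y, bias=True)[0, 1] / np.var(x)   # least-squares slope
    mu = y.mean() - beta * x.mean()                    # intercept (the empirical mean when x is centred)
    sigma2 = np.mean((y - (mu + beta * x)) ** 2)       # residual variance
    return mu, beta, np.sqrt(sigma2)
```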
Piecewise linear regression

Latent “switch” variable – hidden process at work 33


Probabilistic graphical model for piecewise linear regression

(Graph: input X → Q and X → Y; Q → Y, the output)

• Hidden variable Q chooses which set of parameters to use for predicting Y.
• The value of Q depends on the value of the input X.
• This is an example of “mixtures of experts”.

Learning is harder because Q is hidden, so we don’t know which data points to
assign to each line; this can be solved with EM (cf. K-means).
34
Classes of graphical models

Probabilistic models
  Graphical models
    Directed: Bayes nets (including DBNs)
    Undirected: MRFs

35
Bayesian Networks

Compact representation of probability distributions via conditional independence.

Qualitative part: a directed acyclic graph (DAG)
• Nodes - random variables
• Edges - direct influence
(Family-of-Alarm example: Earthquake → Radio, Earthquake → Alarm,
Burglary → Alarm, Alarm → Call)

Quantitative part: a set of conditional probability distributions, e.g. for Alarm:

 E   B    P(A | E,B)   P(¬A | E,B)
 e   b       0.9          0.1
 e   ¬b      0.2          0.8
 ¬e  b       0.9          0.1
 ¬e  ¬b      0.01         0.99

Together they define a unique distribution in a factored form.
36
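To illustrate the “factored form”, here is a minimal sketch (not from the slides) that computes the joint over the five Alarm-network variables as a product of local tables. Only P(A | E,B) comes from the table above; the other numbers are made-up placeholders.

```python
# Joint distribution of the toy Alarm network as a product of local factors (sketch).
P_E = {True: 0.01, False: 0.99}                       # P(E), assumed prior
P_B = {True: 0.01, False: 0.99}                       # P(B), assumed prior
P_A = {(True, True): 0.9, (True, False): 0.2,         # P(A=1 | E, B), from the slide's table
       (False, True): 0.9, (False, False): 0.01}
P_R = {True: 0.5, False: 0.0}                         # P(R=1 | E), assumed
P_C = {True: 0.8, False: 0.05}                        # P(C=1 | A), assumed

def joint(e, b, a, r, c):
    """P(E=e, B=b, A=a, R=r, C=c) = P(e) P(b) P(a|e,b) P(r|e) P(c|a)."""
    pa = P_A[(e, b)] if a else 1 - P_A[(e, b)]
    pr = P_R[e] if r else 1 - P_R[e]
    pc = P_C[a] if c else 1 - P_C[a]
    return P_E[e] * P_B[b] * pa * pr * pc

print(joint(True, False, True, True, True))
```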
Example: “ICU Alarm” network

Domain: monitoring intensive-care patients
• 37 variables
• 509 parameters …instead of 2^54

(Figure: the full ICU Alarm network, with nodes such as MINVOLSET, PULMEMBOLUS,
INTUBATION, KINKEDTUBE, VENTMACH, DISCONNECT, VENTLUNG, ANAPHYLAXIS,
HYPOVOLEMIA, LVFAILURE, CATECHOL, HR, CO, BP, …)
37
Success stories for graphical
models
• Multiple sequence alignment
• Forensic analysis
• Medical and fault diagnosis
• Speech recognition
• Visual tracking
• Channel coding at Shannon limit
• Genetic pedigree analysis
• …
38
Graphical models: outline
• What are graphical models?
• Inference
• Structure learning

39
Probabilistic Inference
• Posterior probabilities
– Probability of any event given any evidence
• P(X|E)

(Graph: Earthquake → Radio, Earthquake → Alarm, Burglary → Alarm, Alarm → Call)
40
Viterbi decoding
Compute the most probable explanation (MPE) of the observed data.

Hidden Markov Model (HMM):
X1 → X2 → X3 (hidden states)
Y1, Y2, Y3 (observations, e.g. the acoustic signal for “Tomato”)
41
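A minimal sketch (mine, not the slide's) of Viterbi decoding for a discrete HMM; the parameter arrays are placeholders, and a real implementation would work in log space to avoid underflow.

```python
import numpy as np

def viterbi(obs, start, trans, emit):
    """Most probable hidden state path.
    obs: observation indices; start[i], trans[i, j], emit[i, o]: probabilities."""
    delta = start * emit[:, obs[0]]                      # best score ending in each state
    back = []
    for o in obs[1:]:
        scores = delta[:, None] * trans * emit[None, :, o]
        back.append(scores.argmax(axis=0))               # best predecessor of each state
        delta = scores.max(axis=0)
    path = [int(delta.argmax())]
    for bp in reversed(back):                            # backtrack
        path.append(int(bp[path[-1]]))
    return path[::-1]
```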
Inference: computational issues

Easy: chains, trees
Hard: grids; dense, loopy graphs

(Figure: example graphs of each kind, including the full ICU Alarm network)
42
Many different inference algorithms exist, both exact and approximate.
43
Bayesian inference
• Bayesian probability treats parameters as random variables
• Learning/parameter estimation is replaced by probabilistic inference P(θ|D)
• Example: Bayesian linear regression; the parameters are θ = (μ, β, σ)

(Graph: θ is a parent of every Yi, with Xi → Yi for i = 1,…,n;
the parameters are tied/shared across repetitions of the data)
44
Bayesian inference
• + Elegant – no distinction between
parameters and other hidden variables
• + Can use priors to learn from small data
sets (c.f., one-shot learning by humans)
• - Math can get hairy
• - Often computationally intractable

45
Graphical models: outline
• What are graphical models?
• Inference
• Structure learning

46
Why Struggle for Accurate Structure?

(True structure: Earthquake → Alarm Set ← Burglary, Alarm Set → Sound)

Missing an arc:
• Cannot be compensated for by fitting parameters
• Wrong assumptions about domain structure

Adding an arc:
• Increases the number of parameters to be estimated
• Wrong assumptions about domain structure
47
Score-based Learning

Define a scoring function that evaluates how well a structure matches the data:

Data over E, B, A:
<Y,N,N>
<Y,Y,Y>
<N,N,Y>
<N,Y,Y>
...
<N,Y,Y>

(Figure: several candidate structures over E, B, A are scored against the data)

Search for a structure that maximizes the score.
48
Learning Trees

• Can find the optimal tree structure in O(n² log n) time: just find the
max-weight spanning tree (a minimal sketch follows below)
• If some of the variables are hidden, the problem becomes hard again,
but EM can be used to fit mixtures of trees
49
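A minimal sketch of the spanning-tree step, in the Chow-Liu spirit: the choice of empirical mutual information as the edge weight is an assumption beyond the slide, and networkx plus scikit-learn are assumed to be available.

```python
# Tree structure learning via a maximum-weight spanning tree (sketch).
import networkx as nx
from sklearn.metrics import mutual_info_score

def learn_tree(data):
    """data: 2-D integer array (n samples x d discrete variables)."""
    n, d = data.shape
    G = nx.Graph()
    for i in range(d):
        for j in range(i + 1, d):
            # Edge weight = empirical mutual information between variables i and j.
            G.add_edge(i, j, weight=mutual_info_score(data[:, i], data[:, j]))
    return nx.maximum_spanning_tree(G)   # the max-weight spanning-tree step
```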
Heuristic Search
• Learning arbitrary graph structure is NP-hard,
so it is common to resort to heuristic search
• Define a search space:
– search states are possible structures
– operators make small changes to structure
• Traverse the space looking for high-scoring structures
• Search techniques (a hill-climbing sketch follows below):
– Greedy hill-climbing
– Best-first search
– Simulated annealing
– ...
50
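A minimal sketch (mine) of greedy hill-climbing over structures. The score(structure, data) function and the neighbours(structure) generator, which would apply the add/delete/reverse-edge operators, are placeholders.

```python
def hill_climb(initial, data, score, neighbours):
    """Greedy hill-climbing: repeatedly move to the best-scoring neighbour."""
    current, current_score = initial, score(initial, data)
    while True:
        best, best_score = None, current_score
        for candidate in neighbours(current):      # add / delete / reverse an edge
            s = score(candidate, data)
            if s > best_score:
                best, best_score = candidate, s
        if best is None:                           # no neighbour improves: local optimum
            return current
        current, current_score = best, best_score
```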
Local Search Operations

• Typical operations on a structure over S, C, E, D:
– Add an edge (e.g., add C→D)
– Delete an edge (e.g., delete C→E)
– Reverse an edge (e.g., reverse C→E)

• Each operation changes the score locally, e.g. for adding C→D:
Δscore = S({C,E} → D) − S({E} → D)
51
Problems with local search

Easy to get stuck in local optima.

(Figure: the score surface S(G|D), with the search stuck at a local optimum far from the “truth”)
52
Problems with local search II

Picking a single best model can be misleading.

(Figure: several distinct structures over E, B, R, A, C with nearly equal posterior P(G|D))

– Small sample size ⇒ many high-scoring models
– An answer based on one model is often useless
– Want features common to many models
54
Bayesian Approach to Structure Learning

• Posterior distribution over structures
• Estimate the probability of features:
– Edge X→Y
– Path X→ … →Y
– …

P(f | D) = Σ_G f(G) P(G | D)

where f(G) is the indicator function for feature f (a feature of G, e.g. the edge X→Y)
and P(G | D) is the Bayesian score for G.
55
Bayesian approach: computational issues

• Posterior distribution over structures:
how to compute the sum over a super-exponential number of graphs?
• MCMC over networks
• MCMC over node-orderings (Rao-Blackwellisation)
56
Structure learning: other issues
• Discovering latent variables
• Learning causal models
• Learning from interventional data
• Active learning

57
Discovering latent variables

(Figure: two networks, (a) with 17 parameters and (b) with 59 parameters)

There are some techniques for automatically detecting the possible presence of
latent variables.
58
Learning causal models
• So far, we have only assumed that X -> Y
-> Z means that Z is independent of X
given Y.
• However, we often want to interpret
directed arrows causally.
• This is uncontroversial for the arrow of
time.
• But can we infer causality from static
observational data?
59
Learning causal models
• We can infer causality from static observational
data if we have at least four measured variables
and certain “tetrad” conditions hold.
• See books by Pearl and Spirtes et al.
• However, we can only learn up to Markov equivalence, no matter how much
data we have. For example, X→Y→Z, X←Y←Z and X←Y→Z are Markov equivalent
(and so indistinguishable from data), whereas X→Y←Z is not.
60
Learning from interventional data

• The only way to distinguish between Markov-equivalent networks is to
perform interventions, e.g., gene knockouts.
• We need to (slightly) modify our learning algorithms: cut the arcs coming
into nodes which were set by intervention.

(Figure: smoking → yellow fingers; under intervention, the arc into “yellow fingers” is cut)

P(smoker | observe(yellow)) >> prior,   but   P(smoker | do(paint yellow)) = prior
61
Active learning
• Which experiments (interventions) should
we perform to learn structure as efficiently
as possible?
• This problem can be modeled using
decision theory.
• Exact solutions are wildly computationally
intractable.
• Can we come up with good approximate
decision making techniques?
• Can we implement hardware to automatically perform the experiments?
• “AB: Automated Biologist”
62
Learning from relational data
Can we learn concepts from a set of relations between objects,
instead of/ in addition to just their attributes?

63
Learning from relational data:
approaches
• Probabilistic relational models (PRMs)
– Reify a relationship (arcs) between nodes (objects) by making it into a node
(hypergraph)
• Inductive Logic Programming (ILP)
– Top-down, e.g., FOIL (a generalization of C4.5)
– Bottom-up, e.g., PROGOL (inverse deduction)

64
ILP for learning protein folding: input

(Figure: positive and negative example protein structures)

TotalLength(D2mhr, 118) ^ NumberHelices(D2mhr, 6) ^ …

100 conjuncts describing the structure of each pos/neg example
65
ILP for learning protein folding:
results
• PROGOL learned the following rule to
predict if a protein will form a “four-helical
up-and-down bundle”:

• In English: “The protein P folds if it


contains a long helix h1 at a secondary
structure position between 1 and 3 and h1
is next to a second helix”
66
ILP: Pros and Cons
• + Can discover new predicates (concepts)
automatically
• + Can learn relational models from
relational (or flat) data
• - Computationally intractable
• - Poor handling of noise

67
The future of machine learning for bioinformatics?

(Figure: the learner treated as an “Oracle”)
68

The future of machine learning for bioinformatics

(Figure: a learner that combines prior knowledge, the biological literature and
replicated experiments to produce hypotheses, which drive experiment design in
the real world)
69
• “Computer-assisted pathway refinement”
The end

70
Decision trees

(Example: a tree that tests “blue?”, then “oval?”, then “big?”, with yes/no leaf labels)
71
Decision trees

+ Handles mixed variables
+ Handles missing data
+ Efficient for large data sets
+ Handles irrelevant attributes
+ Easy to understand
- Predictive power
72
Feedforward neural network

(Figure: input layer → hidden layer → output; weights on each arc, a sigmoid function at each node)
73
Feedforward neural network

- Handles mixed variables
- Handles missing data
- Efficient for large data sets
- Handles irrelevant attributes
- Easy to understand
+ Predictive power
74
Nearest Neighbor
– Remember all your data
– When someone asks a question,
• find the nearest old data point
• return the answer associated with it

75
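A minimal sketch (mine) of the nearest-neighbour rule just described; the helper name, the stored arrays train_X and train_y, and the Euclidean metric are all placeholder choices.

```python
import numpy as np

def nearest_neighbour(query, train_X, train_y):
    """Return the label of the stored point closest to the query."""
    dists = np.linalg.norm(train_X - query, axis=1)   # distance to every remembered point
    return train_y[int(dists.argmin())]               # answer associated with the nearest one
```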
Nearest Neighbor

- Handles mixed variables


- Handles missing data
- Efficient for large data sets
- Handles irrelevant attributes
- Easy to understand
+ Predictive power
76
Support Vector Machines
(SVMs)
• Two key ideas:
– Large margins are good
– Kernel trick

77
SVM: mathematical details
▪ Training data: l-dimensional vectors, each with a true/false flag
▪ Separating hyperplane
▪ Margin
▪ Inequalities
▪ Support vector expansion
▪ Support vectors
▪ Decision rule
(the formulas these bullets refer to are sketched below)
78
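The slide's own formulas were images and are not reproduced here; the following is a hedged reconstruction of the standard linear-SVM quantities the bullets refer to, in my notation rather than necessarily the slide's.

```latex
% Standard linear-SVM quantities (reconstruction, not the slide's exact notation).
\begin{align*}
\text{Training data:}\quad & \{(\mathbf{x}_i, y_i)\}_{i=1}^{N},\quad
    \mathbf{x}_i \in \mathbb{R}^{l},\; y_i \in \{+1,-1\}\\
\text{Separating hyperplane:}\quad & \mathbf{w}^{\top}\mathbf{x} + b = 0\\
\text{Margin:}\quad & 2 / \lVert \mathbf{w} \rVert\\
\text{Inequalities:}\quad & y_i\bigl(\mathbf{w}^{\top}\mathbf{x}_i + b\bigr) \ge 1 \quad \forall i\\
\text{Support vector expansion:}\quad & \mathbf{w} = \sum_i \alpha_i y_i \mathbf{x}_i
    \quad (\alpha_i > 0 \text{ only for the support vectors})\\
\text{Decision:}\quad & f(\mathbf{x}) = \operatorname{sign}\bigl(\mathbf{w}^{\top}\mathbf{x} + b\bigr)
\end{align*}
```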
Replace all inner products with
kernels

Kernel function

79
SVMs: summary
- Handles mixed variables
- Handles missing data
- Efficient for large data sets
- Handles irrelevant attributes
- Easy to understand
+ Predictive power

General lessons from SVM success:

• The kernel trick can be used to make many linear methods non-linear, e.g.,
kernel PCA, kernelized mutual information
• Large-margin classifiers are good
80
Boosting: summary
• Can boost any weak learner
• Most commonly: boosted decision
“stumps”
+ Handles mixed variables
+ Handles missing data
+ Efficient for large data sets
+ Handles irrelevant attributes
- Easy to understand
+ Predictive power

81
Supervised learning: summary
• Learn mapping F from inputs to outputs using a
training set of (x,t) pairs
• F can be drawn from different hypothesis
spaces, e.g., decision trees, linear separators,
linear in high dimensions, mixtures of linear
• Algorithms offer a variety of tradeoffs
• Many good books, e.g.,
– “The elements of statistical learning”,
Hastie, Tibshirani, Friedman, 2001
– “Pattern classification”, Duda, Hart, Stork, 2001

82
Inference
• Posterior probabilities
– Probability of any event given any evidence
• Most likely explanation
– Scenario that explains evidence
• Rational decision making
– Maximize expected utility
– Value of information
• Effect of intervention

(Graph: Earthquake → Radio, Earthquake → Alarm, Burglary → Alarm, Alarm → Call)
83
Assumption needed to make
learning work
• We need to assume “Future futures will
resemble past futures” (B. Russell)
• Unlearnable hypothesis: “All emeralds are
grue”, where “grue” means:
green if observed before time t, blue
afterwards.

84
Structure learning success stories: gene regulation network (Friedman et al.)

Yeast data [Hughes et al., 2000]
• 600 genes
• 300 experiments
85
Structure learning success stories II: phylogenetic tree reconstruction (Friedman et al.)

Input: biological sequences
Human CGTTGC…
Chimp CCTAGG…
Orang CGAACG…
…

Output: a phylogeny (figure: a tree spanning ~10 billion years, with the observed sequences at the leaves)

Uses structural EM, with max-spanning-tree in the inner loop.
86
Instances of graphical models

Probabilistic models
  Graphical models
    Directed: Bayes nets
      – Naïve Bayes classifier
      – Mixtures of experts
      – DBNs: Kalman filter model, Hidden Markov Model (HMM)
    Undirected: MRFs
      – Ising model
87
ML enabling technologies
• Faster computers
• More data
– The web
– Parallel corpora (machine translation)
– Multiple sequenced genomes
– Gene expression arrays
• New ideas
– Kernel trick
– Large margins
– Boosting
– Graphical models
– … 88
