CS194 Fall 2011 Lecture 01

This document contains notes from the introductory lecture of the CS194-10 Fall 2011 machine learning course at UC Berkeley. It gives an overview of the course outline, which covers both classical and probabilistic supervised learning as well as learning probabilistic models, and a high-level introduction to machine learning: why learning matters, the main learning settings (supervised, unsupervised, and reinforcement learning), and basic questions in the philosophy of learning such as choosing a hypothesis space and measuring fit to data.


CS194-10 Fall 2011

Introduction to Machine Learning


Machine Learning: An Overview
People
Avital Steinitz, 2nd-year CS PhD student
Stuart Russell, 30th-year CS PhD student
Mert Pilanci, 2nd-year EE PhD student

Lecture 1 8/25/11 CS 194-10 Fall 2011, Stuart Russell 2


Administrative details
• Web page
• Newsgroup

Lecture 1 8/25/11 CS 194-10 Fall 2011, Stuart Russell 3


Course outline
• Overview of machine learning (today)
• Classical supervised learning
– Linear regression, perceptrons, neural nets, SVMs, decision trees,
nearest neighbors, and all that
– A little bit of theory, a lot of applications
• Learning probabilistic models
– Probabilistic classifiers (logistic regression, etc.)
– Unsupervised learning, density estimation, EM
– Bayes net learning
– Time series models
– Dimensionality reduction
– Gaussian process models
– Language models
• Bandits and other exciting topics

Lecture 1 8/25/11 CS 194-10 Fall 2011, Stuart Russell 4


Lecture outline
• Goal: Provide a framework for understanding all the
detailed content to come, and why it matters
• Learning: why and how
• Supervised learning
– Classical: finding simple, accurate hypotheses
– Probabilistic: finding likely hypotheses
– Bayesian: updating belief in hypotheses
• Data and applications
• Expressiveness and cumulative learning
• CTBT

Lecture 1 8/25/11 CS 194-10 Fall 2011, Stuart Russell 5


Learning is….
… a computational process for improving
performance based on experience

Lecture 1 8/25/11 CS 194-10 Fall 2011, Stuart Russell 6




Learning: Why?
• The baby, assailed by eyes, ears, nose, skin, and
entrails at once, feels it all as one great blooming,
buzzing confusion …
– [William James, 1890]

Learning is essential for unknown environments,
i.e., when the designer lacks omniscience

Lecture 1 8/25/11 CS 194-10 Fall 2011, Stuart Russell 9


Learning: Why?
• Instead of trying to produce a programme to simulate the
adult mind, why not rather try to produce one which
simulates the child's? If this were then subjected to an
appropriate course of education one would obtain the adult
brain. Presumably the child brain is something like a
notebook as one buys it from the stationer's. Rather little
mechanism, and lots of blank sheets.
– [Alan Turing, 1950]
• Learning is useful as a system construction
method, i.e., expose the system to reality rather
than trying to write it down

Lecture 1 8/25/11 CS 194-10 Fall 2011, Stuart Russell 10


Learning: How?

Lecture 1 8/25/11 CS 194-10 Fall 2011, Stuart Russell 11




Structure of a learning agent

Lecture 1 8/25/11 CS 194-10 Fall 2011, Stuart Russell 15


Design of learning element
• Key questions:
– What is the agent design that will implement the desired
performance?
– Which piece of the agent system is to be improved, and how is
that piece represented?
– What data are available relevant to that piece? (In
particular, do we know the right answers?)
– What knowledge is already available?

Lecture 1 8/25/11 CS 194-10 Fall 2011, Stuart Russell 16


Examples
Agent design | Component | Representation | Feedback | Knowledge
Alpha-beta search | Evaluation function | Linear polynomial | Win/loss | Rules of game; coefficient signs
Logical planning agent (observable envt) | Transition model | Successor-state axioms | Action outcomes | Available actions; argument types
Utility-based patient monitor | Physiology/sensor model | Dynamic Bayesian network | Observation sequences | Gen. physiology; sensor design
Satellite image pixel classifier | Classifier (policy) | Markov random field | Partial labels | Coastline; continuity scales

Supervised learning: correct answers for each training instance


Reinforcement learning: reward sequence, no correct answers
Unsupervised learning: “just make sense of the data”

Lecture 1 8/25/11 CS 194-10 Fall 2011, Stuart Russell 17


Supervised learning
• To learn an unknown target function f
• Input: a training set of labeled examples (xj,yj)
where yj = f(xj)
• E.g., xj is an image, f(xj) is the label “giraffe”
• E.g., xj is a seismic signal, f(xj) is the label “explosion”
• Output: hypothesis h that is “close” to f, i.e., predicts
well on unseen examples (“test set”)
• Many possible hypothesis families for h
– Linear models, logistic regression, neural networks, decision
trees, examples (nearest-neighbor), grammars, kernelized
separators, etc.

Lecture 1 8/25/11 CS 194-10 Fall 2011, Stuart Russell 18
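
To make the training/test setup concrete, here is a minimal sketch (illustrative only, not course code; the toy data and the nearest_neighbor helper are invented for the example) of learning with one of the hypothesis families above, a nearest-neighbor predictor, and checking it on held-out examples:

# Minimal supervised-learning sketch: fit h to labeled pairs (x_j, y_j),
# then check how well h predicts on unseen examples (the "test set").
import numpy as np

def nearest_neighbor(train_x, train_y):
    """Return a hypothesis h: predict the label of the closest training point."""
    def h(x):
        distances = np.linalg.norm(train_x - x, axis=1)
        return train_y[np.argmin(distances)]
    return h

rng = np.random.default_rng(0)
# Toy data: label is 1 if the two features sum to more than 1, else 0.
X = rng.random((200, 2))
y = (X.sum(axis=1) > 1.0).astype(int)

train_x, train_y = X[:150], y[:150]    # training set
test_x, test_y = X[150:], y[150:]      # unseen examples

h = nearest_neighbor(train_x, train_y)
accuracy = np.mean([h(x) == t for x, t in zip(test_x, test_y)])
print(f"test accuracy: {accuracy:.2f}")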




Example: object recognition
x

f(x) giraffe giraffe giraffe llama llama llama

Lecture 1 8/25/11 CS 194-10 Fall 2011, Stuart Russell 21


Example: object recognition
x

f(x) giraffe giraffe giraffe llama llama llama

X= f(x)=?

Lecture 1 8/25/11 CS 194-10 Fall 2011, Stuart Russell 22


Example: curve fitting

Lecture 1 8/25/11 CS 194-10 Fall 2011, Stuart Russell 23




Basic questions
• Which hypothesis space H to choose?
• How to measure degree of fit?
• How to trade off degree of fit vs. complexity?
– “Ockham’s razor”
• How do we find a good h?
• How do we know if a good h will predict well?

Lecture 1 8/25/11 CS 194-10 Fall 2011, Stuart Russell 28


Philosophy of Science (Physics)
• Which hypothesis space H to choose?
– Deterministic hypotheses, usually mathematical formulas
and/or logical sentences; implicit relevance determination
• How to measure degree of fit?
– Ideally, h will be consistent with data
• How to trade off degree of fit vs. complexity?
– Theory must be correct up to “experimental error”
• How do we find a good h?
– Intuition, imagination, inspiration (invent new terms!!)
• How do we know if a good h will predict well?
– Hume’s Problem of Induction: most philosophers give up

Lecture 1 8/25/11 CS 194-10 Fall 2011, Stuart Russell 29


Kolmogorov complexity (also MDL, MML)
• Which hypothesis space H to choose?
– All Turing machines (or programs for a UTM)
• How to measure degree of fit?
– Fit is perfect (program has to output data exactly)
• How to trade off degree of fit vs. complexity?
– Minimize size of program
• How do we find a good h?
– Undecidable (unless we bound time complexity of h)
• How do we know if a good h will predict well?
– (recent theory borrowed from PAC learning)

Lecture 1 8/25/11 CS 194-10 Fall 2011, Stuart Russell 30


Classical stats/ML: Minimize loss function
• Which hypothesis space H to choose?
– E.g., linear combinations of features: hw(x) = wTx
• How to measure degree of fit?
– Loss function, e.g., squared error Σj (yj – wTxj)2
• How to trade off degree of fit vs. complexity?
– Regularization: complexity penalty, e.g., ||w||2
• How do we find a good h?
– Optimization (closed-form, numerical); discrete search
• How do we know if a good h will predict well?
– Try it and see (cross-validation, bootstrap, etc.)

Lecture 1 8/25/11 CS 194-10 Fall 2011, Stuart Russell 31
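
As a concrete instance of the loss-plus-penalty recipe, a minimal sketch (illustrative; ridge_fit and the regularization weight lam are assumed names, and the data are synthetic) of squared-error loss with an ||w||2 complexity penalty, solved in closed form:

# Regularized least squares (ridge regression): squared-error loss + ||w||^2 penalty.
import numpy as np

def ridge_fit(X, y, lam=0.1):
    """Closed form: w = (X^T X + lam*I)^{-1} X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=100)

w = ridge_fit(X, y, lam=0.1)
print("estimated w:", w)
print("training squared error:", np.sum((y - X @ w) ** 2))

Cross-validating over lam would be the "try it and see" step in the last bullet.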


Probabilistic: Max. likelihood, max. a priori
• Which hypothesis space H to choose?
– Probability model P(y | x,h) , e.g., Y ~ N(wTx,σ2)
• How to measure degree of fit?
– Data likelihood Πj P(yj | xj,h)
• How to trade off degree of fit vs. complexity?
– Regularization or prior: argmaxh P(h) Πj P(yj | xj,h) (MAP)
• How do we find a good h?
– Optimization (closed-form, numerical); discrete search
• How do we know if a good h will predict well?
– Empirical process theory (generalizes Chebyshev, CLT, PAC…);
– Key assumption is (i)id

Lecture 1 8/25/11 CS 194-10 Fall 2011, Stuart Russell 32
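
One step the slides leave implicit (standard algebra, not from the lecture): with the Gaussian model above, maximum likelihood coincides with the squared-error minimization of the previous slide, and a Gaussian prior on w turns MAP into the regularized objective. In LaTeX notation:

% Negative log-likelihood under Y ~ N(w^T x, sigma^2):
-\log \prod_j P(y_j \mid x_j, h) = \frac{1}{2\sigma^2} \sum_j (y_j - w^\top x_j)^2 + \text{const}

% With a Gaussian prior P(w) \propto e^{-\lambda \lVert w \rVert^2 / 2\sigma^2}, MAP becomes ridge:
\arg\max_h P(h) \prod_j P(y_j \mid x_j, h)
  = \arg\min_w \sum_j (y_j - w^\top x_j)^2 + \lambda \lVert w \rVert^2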


Bayesian: Computing posterior over H
• Which hypothesis space H to choose?
– All hypotheses with nonzero a priori probability
• How to measure degree of fit?
– Data probability, as for MLE/MAP
• How to trade off degree of fit vs. complexity?
– Use prior, as for MAP
• How do we find a good h?
– Don’t! Bayes predictor P(y|x,D) = Σh P(y|x,h) P(h|D), where P(h|D) ∝ P(D|h) P(h)
• How do we know if a good h will predict well?
– Silly question! Bayesian prediction is optimal!!

Lecture 1 8/25/11 CS 194-10 Fall 2011, Stuart Russell 33
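
A minimal numerical sketch of the Bayes predictor over a small finite hypothesis space (illustrative only; the three hypotheses, their prior, and the coin-flip data are invented for the example):

# Bayesian prediction with a finite H: no single h is chosen; instead
# predictions average over the posterior P(h|D), which is proportional to P(D|h) P(h).
import numpy as np

hypotheses = np.array([0.3, 0.5, 0.8])   # h = P(next observation is 1)
prior      = np.array([0.3, 0.4, 0.3])   # P(h), assumed for illustration

D = [1, 1, 0, 1, 1, 1]                   # observed data (coin flips)

likelihood = np.array([h**sum(D) * (1 - h)**(len(D) - sum(D)) for h in hypotheses])
posterior = prior * likelihood
posterior /= posterior.sum()             # P(h|D)

# Bayes predictor: P(y=1|D) = sum_h P(y=1|h) P(h|D)
p_next_is_1 = np.sum(hypotheses * posterior)
print("posterior over H:", posterior)
print("P(next = 1 | D):", p_next_is_1)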




Neon sculpture at Autonomy Corp.

Lecture 1 8/25/11 CS 194-10 Fall 2011, Stuart Russell 35


Lots of data
• Web: estimated Google index 45 billion pages
• Clickstream data: 10-100 TB/day
• Transaction data: 5-50 TB/day
• Satellite image feeds: ~1TB/day/satellite
• Sensor networks/arrays
– CERN Large Hadron Collider ~100 petabytes/day
• Biological data: 1-10TB/day/sequencer
• TV: 2TB/day/channel; YouTube 4TB/day uploaded
• Digitized telephony: ~100 petabytes/day
Lecture 1 8/25/11 CS 194-10 Fall 2011, Stuart Russell 37
Real data are messy

Lecture 1 8/25/11 CS 194-10 Fall 2011, Stuart Russell 39


Arterial blood pressure (high/low/mean) 1s

Lecture 1 8/25/11 CS 194-10 Fall 2011, Stuart Russell 40


Application: satellite image analysis

Lecture 1 8/25/11 CS 194-10 Fall 2011, Stuart Russell 41


Application: Discovering DNA motifs
...TTGGAACAACCATGCACGGTTGATTCGTGCCTGTGACCGCGCGCCTCACACGGAAGACGCAGCCACCG
GTTGTGATG
TCATAGGGAATTCCCCATGTCGTGAATAATGCCTCGAATGATGAGTAATAGTAAAACGCAGGGGAGGTTC
TTCAGTAGTA
TCAATATGAGACACATACAAACGGGCGTACCTACCGCAGCTCAAAGCTGGGTGCATTTTTGCCAAGTGCC
TTACTGTTAT
CTTAGGACGGAAATCCACTATAAGATTATAGAAAGGAAGGCGGGCCGAGCGAATCGATTCAATTAAGTT
ATGTCACAAGG
GTGCTATAGCCTATTCCTAAGATTTGTACGTGCGTATGACTGGAATTAATAACCCCTCCCTGCACTGACCT
TGACTGAAT
AACTGTGATACGACGCAAACTGAACGCTGCGGGTCCTTTATGACCACGGATCACGACCGCTTAAGACCTG
AGTTGGAGTT
GATACATCCGGCAGGCAGCCAAATCTTTTGTAGTTGAGACGGATTGCTAAGTGTGTTAACTAAGACTGGT
ATTTCCACTA
GGACCACGCTTACATCAGGTCCCAAGTGGACAACGAGTCCGTAGTATTGTCCACGAGAGGTCTCCTGATT
ACATCTTGAA
GTTTGCGACGTGTTATGCGGATGAAACAGGCGGTTCTCATACGGTGGGGCTGGTAAACGAGTTCCGGTCG
CGGAGATAAC
TGTTGTGATTGGCACTGAAGTGCGAGGTCTTAAACAGGCCGGGTGTACTAACCCAAAGACCGGCCCAGCG
TCAGTGA...

Lecture 1 8/25/11 CS 194-10 Fall 2011, Stuart Russell 42




Application: User website behavior from
clickstream data (from P. Smyth, UCI)
128.195.36.195, -, 3/22/00, 10:35:11, W3SVC, SRVR1, 128.200.39.181, 781, 363, 875, 200, 0, GET, /top.html, -,
128.195.36.195, -, 3/22/00, 10:35:16, W3SVC, SRVR1, 128.200.39.181, 5288, 524, 414, 200, 0, POST, /spt/main.html, -,
128.195.36.195, -, 3/22/00, 10:35:17, W3SVC, SRVR1, 128.200.39.181, 30, 280, 111, 404, 3, GET, /spt/images/bk1.jpg, -,
128.195.36.101, -, 3/22/00, 16:18:50, W3SVC, SRVR1, 128.200.39.181, 60, 425, 72, 304, 0, GET, /top.html, -,
128.195.36.101, -, 3/22/00, 16:18:58, W3SVC, SRVR1, 128.200.39.181, 8322, 527, 414, 200, 0, POST, /spt/main.html, -,
128.195.36.101, -, 3/22/00, 16:18:59, W3SVC, SRVR1, 128.200.39.181, 0, 280, 111, 404, 3, GET, /spt/images/bk1.jpg, -,
128.200.39.17, -, 3/22/00, 20:54:37, W3SVC, SRVR1, 128.200.39.181, 140, 199, 875, 200, 0, GET, /top.html, -,
128.200.39.17, -, 3/22/00, 20:54:55, W3SVC, SRVR1, 128.200.39.181, 17766, 365, 414, 200, 0, POST, /spt/main.html, -,
128.200.39.17, -, 3/22/00, 20:54:55, W3SVC, SRVR1, 128.200.39.181, 0, 258, 111, 404, 3, GET, /spt/images/bk1.jpg, -,
128.200.39.17, -, 3/22/00, 20:55:07, W3SVC, SRVR1, 128.200.39.181, 0, 258, 111, 404, 3, GET, /spt/images/bk1.jpg, -,
128.200.39.17, -, 3/22/00, 20:55:36, W3SVC, SRVR1, 128.200.39.181, 1061, 382, 414, 200, 0, POST, /spt/main.html, -,
128.200.39.17, -, 3/22/00, 20:55:36, W3SVC, SRVR1, 128.200.39.181, 0, 258, 111, 404, 3, GET, /spt/images/bk1.jpg, -,
128.200.39.17, -, 3/22/00, 20:55:39, W3SVC, SRVR1, 128.200.39.181, 0, 258, 111, 404, 3, GET, /spt/images/bk1.jpg, -,
128.200.39.17, -, 3/22/00, 20:56:03, W3SVC, SRVR1, 128.200.39.181, 1081, 382, 414, 200, 0, POST, /spt/main.html, -,
128.200.39.17, -, 3/22/00, 20:56:04, W3SVC, SRVR1, 128.200.39.181, 0, 258, 111, 404, 3, GET, /spt/images/bk1.jpg, -,
128.200.39.17, -, 3/22/00, 20:56:33, W3SVC, SRVR1, 128.200.39.181, 0, 262, 72, 304, 0, GET, /top.html, -,
128.200.39.17, -, 3/22/00, 20:56:52, W3SVC, SRVR1, 128.200.39.181, 19598, 382, 414, 200, 0, POST, /spt/main.html, -,

User 1 2 3 2 2 3 3 3 1 1 1 3 1 3 3 3 3
User 2 3 3 3 1 1 1
User 3 7 7 7 7 7 7 7 7
User 4 1 5 1 1 1 5 1 5 1 1 1 1 1 1
User 5 5 1 1 5
… …
Lecture 1 8/25/11 CS 194-10 Fall 2011, Stuart Russell 44
Application: social network analysis

HP Labs email data


500 users, 20k connections
evolving over time

Lecture 1 8/25/11 CS 194-10 Fall 2011, Stuart Russell 45


Application: spam filtering
• 200 billion spam messages sent per day
• Asymmetric cost of false positive/false negative
• Weak label: discarded without reading
• Strong label (“this is spam”) hard to come by
• Standard iid assumption violated: spammers alter
spam generators to evade or subvert spam filters
(“adversarial learning” task)

Lecture 1 8/25/11 CS 194-10 Fall 2011, Stuart Russell 46


Learning

(diagram: data → Learning → knowledge)

Lecture 1 8/25/11 CS 194-10 Fall 2011, Stuart Russell 47


Learning

(diagram: data + prior knowledge → Learning → knowledge)

Crucial open problem: weak intermediate forms of knowledge
that support future generalizations

Lecture 1 8/25/11 CS 194-10 Fall 2011, Stuart Russell 50


Example

Lecture 1 8/25/11 CS 194-10 Fall 2011, Stuart Russell 51




Example – arriving at Sao Paulo, Brazil

Bem-vindo!

Lecture 1 8/25/11 CS 194-10 Fall 2011, Stuart Russell 54




Example – arriving at Sao Paulo, Brazil

Bem-vindo!
Bem-vindo!

Lecture 1 8/25/11 CS 194-10 Fall 2011, Stuart Russell 56






Weak prior knowledge
• In this case, people in a given country (and city)
tend to speak the same language
• Where did this knowledge come from?
– Experience with other countries
– “Common sense” – i.e., knowledge of how societies and
languages work
• And where did that knowledge come from?

Lecture 1 8/25/11 CS 194-10 Fall 2011, Stuart Russell 60


Knowledge? What is knowledge?
All I know is samples!! [V. Vapnik]
• All knowledge derives, directly or indirectly, from
experience of individuals
• Knowledge serves as a directly applicable shorthand
for all that experience – better than requiring
constant review of the entire sensory/evolutionary
history of the human race

Lecture 1 8/25/11 CS 194-10 Fall 2011, Stuart Russell 61


CTBT: Comprehensive Nuclear-Test-Ban Treaty

• Bans testing of nuclear weapons on earth


– Allows for outside inspection of 1000km2
• 182/195 states have signed
• 153/195 have ratified
• Need 9 more ratifications including US, China
• US Senate refused to ratify in 1998
– “too hard to monitor”

Lecture 1 8/25/11 CS 194-10 Fall 2011, Stuart Russell 71


2053 nuclear explosions

Lecture 1 8/25/11 CS 194-10 Fall 2011, Stuart Russell 72


254 monitoring stations

Lecture 1 8/25/11 CS 194-10 Fall 2011, Stuart Russell 74


The problem
• Given waveform traces from all seismic stations,
figure out what events occurred when and where
• Traces at each sensor station may be preprocessed
to form “detections” (90% are not real)
ARID ORID STA PH BEL DELTA SEAZ ESAZ TIME TDEF AZRES ADEF SLORES SDEF WGT VMODEL LDDATE

49392708 5295499 WRA P -1.0 23.673881 342.00274 163.08123 0.19513991 d -1.2503497 d 0.24876981 d -999.0 0.61806399 IASP 2009-04-02 12:54:27
49595064 5295499 FITZ P -1.0 20.835616 4.3960142 184.18581 1.2515257 d 2.7290018 d 5.4541182 n -999.0 0.46613527 IASP 2009-04-02 12:54:27

49674189 5295499 MKAR P -1.0 58.574266 124.26633 325.35514 -0.053738765 d -4.6295428 d 1.5126035 d -999.0 0.76750542 IASP 2009-04-02 12:54:27
49674227 5295499 ASAR P -1.0 27.114852 345.18433 166.42383 -0.71255454 d -6.4901126 d 0.95510033 d -999.0 0.66453657 IASP 2009-04-02 12:54:27

Lecture 1 8/25/11 CS 194-10 Fall 2011, Stuart Russell 76


What do we know?
• Events happen randomly; each has a time, location,
depth, magnitude; seismicity varies with location
• Seismic waves of many kinds (“phases”) travel
through the Earth
– Travel time and attenuation depend on phase and
source/destination
• Arriving waves may or may not be detected,
depending on sensor and local noise environment
• Local noise may also produce false detections

Lecture 1 8/25/11 CS 194-10 Fall 2011, Stuart Russell 77


# SeismicEvents ~ Poisson[TIME_DURATION*EVENT_RATE];
IsEarthQuake(e) ~ Bernoulli(.999);
EventLocation(e) ~ If IsEarthQuake(e) then EarthQuakeDistribution()
Else UniformEarthDistribution();
Magnitude(e) ~ Exponential(log(10)) + MIN_MAG;
Distance(e,s) = GeographicalDistance(EventLocation(e), SiteLocation(s));
IsDetected(e,p,s) ~ Logistic[SITE_COEFFS(s,p)](Magnitude(e), Distance(e,s));
#Arrivals(site = s) ~ Poisson[TIME_DURATION*FALSE_RATE(s)];
#Arrivals(event=e, site=s, phase=p) = If IsDetected(e,p,s) then 1 else 0;
Time(a) ~ If (event(a) = null) then Uniform(0,TIME_DURATION)
else IASPEI(EventLocation(event(a)), SiteLocation(site(a)), Phase(a)) + TimeRes(a);
TimeRes(a) ~ Laplace(TIMLOC(site(a)), TIMSCALE(site(a)));
Azimuth(a) ~ If (event(a) = null) then Uniform(0, 360)
else GeoAzimuth(EventLocation(event(a)), SiteLocation(site(a))) + AzRes(a);
AzRes(a) ~ Laplace(0, AZSCALE(site(a)));
Slow(a) ~ If (event(a) = null) then Uniform(0,20)
else IASPEI-SLOW(EventLocation(event(a)), SiteLocation(site(a))) + SlowRes(a);

Lecture 1 8/25/11 CS 194-10 Fall 2011, Stuart Russell 88
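
For intuition, a heavily simplified forward-sampling sketch in the same spirit as the model above (illustrative Python on a flat 2-D geometry; the station coordinates, rates, logistic coefficients, and constant wave speed are all made up, and the declarative semantics of the model above is not reproduced):

# Simplified forward simulation: sample events, then arrivals at stations.
import numpy as np

rng = np.random.default_rng(0)
TIME_DURATION, EVENT_RATE, MIN_MAG = 3600.0, 0.002, 2.0
stations = np.array([[0.0, 0.0], [50.0, 10.0], [20.0, 80.0]])  # station coords (km)
WAVE_SPEED = 6.0    # km/s, stand-in for the travel-time model
FALSE_RATE = 0.001  # false detections per second per station

n_events = rng.poisson(TIME_DURATION * EVENT_RATE)
for e in range(n_events):
    loc = rng.uniform(0.0, 100.0, size=2)                 # event location
    t0 = rng.uniform(0.0, TIME_DURATION)                  # origin time
    mag = MIN_MAG + rng.exponential(1.0 / np.log(10))     # magnitude
    for s, site in enumerate(stations):
        dist = np.linalg.norm(loc - site)
        p_detect = 1.0 / (1.0 + np.exp(-(2.0 * mag - 0.05 * dist - 3.0)))
        if rng.random() < p_detect:
            arrival = t0 + dist / WAVE_SPEED + rng.laplace(0.0, 0.5)
            print(f"event {e} detected at station {s} at t={arrival:.1f}s")

# Independently, each station also produces false detections.
for s in range(len(stations)):
    for t in rng.uniform(0.0, TIME_DURATION, rng.poisson(TIME_DURATION * FALSE_RATE)):
        print(f"false detection at station {s} at t={t:.1f}s")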


Learning with prior knowledge
• Instead of learning a mapping from detection
histories to event bulletins, learn local pieces of an
overall structured model:
– Event location prior (A6)
– Predictive travel time model (A1)
– Phase type classifier (A2)

Lecture 1 8/25/11 CS 194-10 Fall 2011, Stuart Russell 89


Event location prior (A6)

Lecture 1 8/25/11 CS 194-10 Fall 2011, Stuart Russell 90


Travel time prediction (A1)
How long does it take for a
seismic signal to get from A to
B? This is the travel time T(A,B)
If we know this accurately, and
we know the arrival times t1, t2,
t3, … at several stations B1, B2,
B3, …, we can find an accurate
estimate of the location A and
time t for the event, such that
– T(A,Bi) ≈ ti – t for all i

Lecture 1 8/25/11 CS 194-10 Fall 2011, Stuart Russell 91
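
A minimal sketch of that inversion (illustrative; a brute-force grid search over candidate locations and origin times on a flat 2-D geometry, with a constant wave speed standing in for the travel-time model T(A,B)):

# Locate an event from arrival times: find (A, t) minimizing
# sum_i (T(A, B_i) - (t_i - t))^2, as described above.
import numpy as np

stations = np.array([[0.0, 0.0], [100.0, 0.0], [0.0, 100.0], [100.0, 100.0]])
SPEED = 6.0                                   # km/s (stand-in travel-time model)

def travel_time(loc, station):
    return np.linalg.norm(loc - station) / SPEED

# Synthetic "observed" arrivals from a hidden event at (30, 70), t = 12 s.
true_loc, true_t = np.array([30.0, 70.0]), 12.0
t_obs = np.array([true_t + travel_time(true_loc, s) for s in stations])

best, best_err = None, np.inf
for x in np.arange(0, 101, 1.0):
    for y in np.arange(0, 101, 1.0):
        loc = np.array([x, y])
        tt = np.array([travel_time(loc, s) for s in stations])
        t0 = np.mean(t_obs - tt)              # best origin time for this location
        err = np.sum((tt - (t_obs - t0)) ** 2)
        if err < best_err:
            best, best_err = (x, y, t0), err

print("estimated location and origin time:", best)

With a realistic travel-time model and noisy, possibly false detections, this least-squares picture becomes the probabilistic model sketched on the previous slide.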


Earth 101

Lecture 1 8/25/11 CS 194-10 Fall 2011, Stuart Russell 92


Seismic “phases” (wave types/paths)
Seismic energy is emitted
in different types of
waves; there are also
qualitatively distinct
paths (e.g., direct vs
reflected from surface
vs. refracted through
core). P and S are the
direct waves; P is faster

Lecture 1 8/25/11 CS 194-10 Fall 2011, Stuart Russell 93


Lecture 1 8/25/11 CS 194-10 Fall 2011, Stuart Russell 94
IASP91 reference velocity model
Spherically symmetric, Vphase(depth); from this, obtain Tpredicted(A,B).

Lecture 1 8/25/11 CS 194-10 Fall 2011, Stuart Russell 95


IASP91 inaccuracy is too big!
• Earth is inhomogeneous: variations in crust
thickness and rock properties (“fast” and “slow”)

Lecture 1 8/25/11 CS 194-10 Fall 2011, Stuart Russell 96


Travel time residuals (Tactual – Tpredicted)
• Residual surface (wrt a particular station) is locally
smooth; estimate by local regression

Lecture 1 8/25/11 CS 194-10 Fall 2011, Stuart Russell 97
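
A minimal sketch of one way to do that smoothing (illustrative; Gaussian-kernel locally weighted linear regression with an assumed bandwidth, on synthetic residuals):

# Kernel-weighted local regression: estimate a smooth residual surface
# r(lon, lat) = T_actual - T_predicted from scattered observations.
import numpy as np

def local_regression(query, points, residuals, bandwidth=5.0):
    """Locally weighted linear fit at `query` (Gaussian kernel, degrees)."""
    d2 = np.sum((points - query) ** 2, axis=1)
    w = np.exp(-d2 / (2 * bandwidth ** 2))
    # Weighted least squares on [1, lon, lat] features.
    A = np.hstack([np.ones((len(points), 1)), points])
    W = np.diag(w)
    beta = np.linalg.solve(A.T @ W @ A, A.T @ W @ residuals)
    return np.array([1.0, *query]) @ beta

rng = np.random.default_rng(0)
points = rng.uniform(-30, 30, size=(200, 2))          # event locations (lon, lat)
true_surface = lambda p: 0.5 * np.sin(p[:, 0] / 10) + 0.02 * p[:, 1]
residuals = true_surface(points) + 0.1 * rng.normal(size=200)

print(local_regression(np.array([5.0, 10.0]), points, residuals))
print(true_surface(np.array([[5.0, 10.0]])))          # compare to the noiseless value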
