
Lectures are 9:30am – 11am (approximately).

We will have a short break from about 10:10am to 10:20am.

SMSTC Inverse problems

Lecture 2
9:30am – 11am, 21 January 2021
Assessments

Homework solutions will be available soon (not assessed).

Assessment 1 will be released on 18th February 2021.
Deadline: 11th March 2021.
Plan for Week 2: Least squares
• Recall the linear discrete inverse problem
• Introduce the concept of prediction error and the norms (vector and matrix) that quantify it
• Develop the least squares solution
• Develop the minimum length solution
• Look at all of this from various perspectives

Exercises for self-study this week will be highlighted during the lecture; the solutions will appear on the course webpage later.
Recall: linear discrete inverse problem

The data form an M-vector 𝒅, the model parameters form an N-vector 𝒎, and K is the M × N data kernel:

𝐾𝒎 = 𝒅

The main goal is to "invert" K.

An estimate of the model parameters can be used to predict the data:

𝐾𝒎^est = 𝒅^pre

𝒎^est is an estimate of the model parameters; it could be given as bounds, or with some degree of certainty.
For example, 𝑚₁^est = 1.2 ± 0.1 (95%) would mean that there is a 0.95 probability that the true value of the model parameter 𝑚₁ = 𝑚₁^true lies between 1.1 and 1.3.
Slightly aside: weakly nonlinear problems
Main consideration

The prediction may not match the observed data (e.g. due to observational error):

𝒅^pre ≠ 𝒅^obs

We then want to concentrate on measuring the size/length of the error. This mismatch leads us to define the prediction error:

𝒆 = 𝒅^obs − 𝒅^pre

𝒆 = 0 when the model parameters exactly predict the data.
Example: least squares

Fitting a straight line: 𝒎 = [gradient, y-intercept], so N = 2.

We want 𝒅^pre as close as possible to 𝒅^obs. The word "close" suggests we need a way to measure it.

Residuals (prediction error): 𝒆 = 𝒅^obs − 𝒅^pre, with components e_i = d_i^obs − d_i^pre.

The total overall error:

E = Σ_{i=1..M} e_i² → min

[Figure: panels A) and B) plot the observations d_i^obs against z_i, the predictions d_i^pre on the fitted straight line, and the residuals e_i.]
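As an illustration (not part of the original slides), a minimal numpy sketch of these quantities for the straight-line case; the observations and the trial model below are invented for illustration only.

```python
import numpy as np

# Hypothetical observations d_obs at positions z (invented numbers)
z = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
d_obs = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Straight-line model m = [gradient, y-intercept], so d_pre = m[0]*z + m[1]
m_trial = np.array([2.0, 0.0])
d_pre = m_trial[0] * z + m_trial[1]

# Prediction error (residuals) and total error E = sum of squared residuals
e = d_obs - d_pre
E = np.sum(e**2)
print("residuals:", e)
print("total error E =", E)
```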
Measures of length

A "norm" is a rule for quantifying the overall size of the error vector 𝒆. There are many ways to define a norm.

Vector norms: the L_n norm is ‖𝒆‖_n = (Σ_i |e_i|^n)^(1/n); the L₂ norm is the Euclidean length.
Vector norms: what’s the difference?

Higher norms give increasing weight to the largest element of 𝒆.

Example: 𝒆 = [0.01, 0.1, 3]ᵀ

[Figure: plots of e, |e|, |e|², and |e|¹⁰ against z, showing how raising the power increasingly emphasises the largest element.]
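A small numpy check of the example vector above, showing how higher-order norms are increasingly dominated by the largest element (a sketch using numpy's vector norms):

```python
import numpy as np

e = np.array([0.01, 0.1, 3.0])

# L_n norm: (sum |e_i|^n)^(1/n); higher n weights the largest element more
for n in [1, 2, 10, np.inf]:
    print(f"L{n} norm:", np.linalg.norm(e, ord=n))

# For large n the norm approaches the maximum absolute element (the L-infinity norm)
```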
Which norm to choose?

[Figure: A) straight-line fits to data containing an outlier under the L₁, L₂ and L∞ norms; B) probability densities p(d) with long and short tails.]

Long tails: outliers are common but unimportant → use a low norm, which gives low weight to outliers.
Short tails: outliers are uncommon and important → use a high norm, which gives high weight to outliers.

Use the L₂ norm when the data have Gaussian-distributed errors (not justified for now).
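To illustrate the effect of an outlier, a sketch (not from the slides) comparing an L₂ (least squares) line fit with an L₁ fit obtained by simple iteratively reweighted least squares; all data values are invented.

```python
import numpy as np

# Invented straight-line data with one outlier in the last observation
z = np.array([0., 1., 2., 3., 4., 5., 6.])
d = np.array([1.0, 2.1, 2.9, 4.2, 5.0, 6.1, 12.0])
K = np.column_stack([z, np.ones_like(z)])   # columns: gradient, intercept

# L2 fit: ordinary least squares
m_l2, *_ = np.linalg.lstsq(K, d, rcond=None)

# L1 fit: iteratively reweighted least squares (weights ~ 1/|residual|)
m_l1 = m_l2.copy()
for _ in range(50):
    r = d - K @ m_l1
    W = np.diag(1.0 / np.maximum(np.abs(r), 1e-6))
    m_l1 = np.linalg.solve(K.T @ W @ K, K.T @ W @ d)

print("L2 fit (gradient, intercept):", m_l2)   # pulled towards the outlier
print("L1 fit (gradient, intercept):", m_l1)   # much less affected by it
```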
Matrix norms

L₁ matrix norm: the largest L₁ norm of the columns of the matrix.
L∞ matrix norm: the largest L₁ norm of the rows of the matrix.
L₂ matrix norm: the largest singular value of the matrix (conveniently computed via the SVD).
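A quick numerical check of these matrix norms: numpy's matrix norms with ord = 1, inf and 2 correspond to the maximum absolute column sum, the maximum absolute row sum, and the largest singular value respectively.

```python
import numpy as np

A = np.array([[1.0, -2.0],
              [3.0,  4.0]])

print("L1   norm (max column sum of |A|):", np.linalg.norm(A, ord=1))
print("Linf norm (max row sum of |A|):   ", np.linalg.norm(A, ord=np.inf))
print("L2   norm (largest singular value):", np.linalg.norm(A, ord=2))

# The L2 matrix norm can also be read off from the SVD
print("largest singular value via SVD:    ", np.linalg.svd(A, compute_uv=False)[0])
```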
Classification of Km = d according to information

(Purely) underdetermined: the data provided are not enough to reconstruct the model parameters (uniquely), usually M < N, and there are several solutions with E = 0.
Approach: minimise the norm of the estimated solution.

Overdetermined: there are more data than needed to reconstruct the model parameters, usually M > N, and in general no solution achieves E = 0.
Approach: minimise the total error.

Even-determined: exactly enough data to reconstruct m uniquely.
Approach: standard linear algebra.

Mixed-determined: some parameters can be uniquely determined, the others cannot.
Approach: separation of the two groups.
Overdetermined problems: least squares for a straight line

There is no exact solution when M > N (except when all points lie exactly on a straight line).

E = 𝒆ᵀ𝒆, so E is the square of the Euclidean length of the prediction error.

Minimize E: the Principle of Least Squares.

Calculus problem!
Overdetermined problems: least squares for a straight line: minimisation

Calculus problem! Setting ∂E/∂m_i = 0 leads to the normal equations [KᵀK]𝒎^est = Kᵀ𝒅.

[KᵀK] is a square matrix: likely to have an inverse!
Overdetermined problem: least squares
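The formulas on this slide are images in the original; as a sketch, the standard least squares estimate 𝒎^est = [KᵀK]⁻¹Kᵀ𝒅 applied to the straight-line example, with invented data:

```python
import numpy as np

# Invented straight-line data (M = 6 observations, N = 2 model parameters)
z = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
d = np.array([2.9, 5.1, 7.2, 8.8, 11.1, 13.0])

# Data kernel for d_i = m1 * z_i + m2
K = np.column_stack([z, np.ones_like(z)])

# Normal equations: [K^T K] m_est = K^T d
m_est = np.linalg.solve(K.T @ K, K.T @ d)
d_pre = K @ m_est
E = np.sum((d - d_pre) ** 2)

print("m_est (gradient, intercept):", m_est)
print("total error E:", E)
```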
Example: fitting a plane surface

Earthquakes in the Kurile subduction zone, northwest Pacific Ocean. The x-axis points north and the y-axis east. The earthquakes scatter about a dipping planar surface (colored grid), determined using least squares. Data courtesy of the US Geological Survey.
Another approach to the least squares solution

When does least squares fail?

[Figure: data points all at the same value of z; many different straight lines fit them equally well.]

[KᵀK] has zero determinant, hence no inverse.

Least squares will fail when [KᵀK] has no inverse.
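A sketch of the failure case in the figure: if every observation is made at the same z, the columns of K are linearly dependent and KᵀK is singular (numbers invented for illustration).

```python
import numpy as np

# All observations at the same z: the gradient and intercept of the line
# cannot both be determined from these data
z = np.array([2.0, 2.0, 2.0, 2.0])
d = np.array([3.1, 2.9, 3.0, 3.2])
K = np.column_stack([z, np.ones_like(z)])

KtK = K.T @ K
print("det(K^T K) =", np.linalg.det(KtK))   # (numerically) zero: no inverse
# np.linalg.solve(KtK, K.T @ d) would raise LinAlgError here
```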
Mixed-determined problems: separation of information
Purely underdetermined problems: minimum length solution

Least squares fails when [KᵀK] has no inverse, i.e. when more than one solution minimizes the error; the inverse problem is then "underdetermined".

What should we do? We use another guiding principle: a priori information may be needed (we need to add information!).

"A priori information" = preconceptions about the world. Mainly it quantifies expectations about the character of the solution that are not based on the actual data.

Examples: density is positive; the density of the Earth is likely to be close to that of rocks.
Purely underdetermined problems: minimum length solution

A priori information: the solution is small ("close to zero") or "simple" in some measure of length.

This is not the most sophisticated type of a priori information, but cases arise where it makes sense. For instance, if the model parameters correspond to the velocities of objects, then a solution that is small in the L₂ sense has small kinetic energy:

minimize ‖𝒎‖₂
Purely underdetermined problems: minimum length solution

Purely underdetermined: more than one solution has zero error. The key point is that the data can be exactly satisfied. This is a special case (problems are usually mixed-determined).

This is the formal mathematical definition of the problem: find the smallest set of model parameters that have zero prediction error,

minimize ‖𝒎‖₂²  subject to  𝒆 = 𝒅 − K𝒎 = 0.

We still want to minimise the error (but several solutions achieve this), so we have added that as a constraint: Lagrange multipliers!
Purely underdetermined problems: minimum length solution

This is the formal mathematical definition of the problem: find the smallest set of model parameters that have zero prediction error.

Lagrange multipliers! And hence no explicit constraints:

Setup: the first term is the Euclidean length L of the model parameter vector; the second term is the sum of the constraints (that the error is zero), each multiplied by a Lagrange multiplier:

Φ(𝒎, 𝝀) = Σ_i m_i² + Σ_j λ_j (d_j − Σ_i K_{ji} m_i)
Lagrange multipliers

[Figure: graphical interpretation of the method of Lagrange multipliers, in which the function L(x,y) is minimized subject to the constraint e(x,y) = 0. The solution (bold dot) occurs at the point (x₀, y₀) on the curve e(x,y) = 0 where the surface normal (gray arrows) is parallel to the gradient ∇L(x,y) (white arrows). At this point, L can only be further minimized by moving the point (x₀, y₀) off the curve, which is disallowed by the constraint.]
Purely underdetermined problems: minimum length solution
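The derivation on this slide is not reproduced in the text version; the standard result of the Lagrange-multiplier calculation is the minimum length solution 𝒎^est = Kᵀ[KKᵀ]⁻¹𝒅, sketched here on an invented underdetermined problem.

```python
import numpy as np

# Underdetermined toy problem: M = 2 data, N = 4 model parameters (invented)
K = np.array([[1.0, 1.0, 0.0, 0.0],
              [0.0, 1.0, 1.0, 1.0]])
d = np.array([2.0, 3.0])

# Minimum length solution: the smallest-norm m with zero prediction error
m_ml = K.T @ np.linalg.solve(K @ K.T, d)

print("m_ml          :", m_ml)
print("prediction err:", d - K @ m_ml)          # ~0: data satisfied exactly
print("||m_ml||_2    :", np.linalg.norm(m_ml))  # smallest among all exact solutions
```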
Weak underdetermination: damped least squares

If ε is large, we minimize the underdetermined part of the solution, BUT the overdetermined part tends towards a minimum along with it, so E is not properly minimised. If ε = 0, E is minimal, but no a priori information is provided to single out a solution.
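A sketch of the damped least squares estimate 𝒎^est = [KᵀK + ε²I]⁻¹Kᵀ𝒅 (the standard form, not copied from the slides), showing the trade-off between the prediction error E and the solution length as ε varies; the data are invented.

```python
import numpy as np

def damped_least_squares(K, d, eps):
    """Damped least squares: m = (K^T K + eps^2 I)^(-1) K^T d."""
    N = K.shape[1]
    return np.linalg.solve(K.T @ K + eps**2 * np.eye(N), K.T @ d)

# Nearly underdetermined toy problem (invented numbers)
K = np.array([[1.0, 1.0],
              [1.0, 1.001]])
d = np.array([2.0, 2.0])

for eps in [0.0, 0.01, 0.1, 1.0]:
    m = damped_least_squares(K, d, eps)
    E = np.sum((d - K @ m) ** 2)
    print(f"eps={eps:4.2f}  m={np.round(m, 3)}  E={E:.3e}  ||m||={np.linalg.norm(m):.3f}")
# Small eps: E is tiny but ||m|| is large; large eps: ||m|| shrinks but E grows.
```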
A priori information: ocean fluctuations + flat solution
A priori information: the solution is smooth
A priori information and weights together
A priori information and weights together: weighted least squares
A priori information and weights together: weighted minimum length
A priori information and weights together: weighted damped least squares
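The formulas for these slides are not included in the text version; the sketch below follows the standard weighted damped least squares form 𝒎^est = [KᵀW_e K + ε²W_m]⁻¹KᵀW_e 𝒅, where W_e weights the data and W_m = DᵀD is built here from a second-difference (smoothness) matrix D; all numbers are invented.

```python
import numpy as np

def weighted_damped_ls(K, d, We, Wm, eps):
    """Weighted damped least squares: m = (K^T We K + eps^2 Wm)^(-1) K^T We d."""
    return np.linalg.solve(K.T @ We @ K + eps**2 * Wm, K.T @ We @ d)

# Invented problem: sparse measurements of a 10-point model
N, M = 10, 4
K = np.zeros((M, N))
K[np.arange(M), [0, 3, 6, 9]] = 1.0        # each datum samples one model point
d = np.array([0.0, 1.0, 1.0, 0.0])

We = np.diag([1.0, 1.0, 4.0, 1.0])         # trust the third datum more
D = np.diff(np.eye(N), n=2, axis=0)        # second-difference (roughness) matrix
Wm = D.T @ D                               # penalise rough solutions

m = weighted_damped_ls(K, d, We, Wm, eps=0.1)
print(np.round(m, 3))                      # smooth interpolation between the data
```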
Looking at the normal equations again: the Moore–Penrose pseudoinverse

Looking at the normal equations again: the Moore–Penrose pseudoinverse: example
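The pseudoinverse slides are equations in the original; as a sketch, numpy's np.linalg.pinv computes the Moore–Penrose pseudoinverse, which reproduces the least squares solution in the overdetermined case and the minimum length solution in the underdetermined case (matrices invented for illustration).

```python
import numpy as np

# Overdetermined example: pinv reproduces the least squares solution
K_over = np.array([[1.0, 1.0], [2.0, 1.0], [3.0, 1.0]])
d_over = np.array([2.1, 3.9, 6.2])
m_ls   = np.linalg.solve(K_over.T @ K_over, K_over.T @ d_over)
m_pinv = np.linalg.pinv(K_over) @ d_over
print(np.allclose(m_ls, m_pinv))         # True

# Underdetermined example: pinv reproduces the minimum length solution
K_under = np.array([[1.0, 1.0, 0.0],
                    [0.0, 1.0, 1.0]])
d_under = np.array([2.0, 3.0])
m_ml    = K_under.T @ np.linalg.solve(K_under @ K_under.T, d_under)
m_pinv2 = np.linalg.pinv(K_under) @ d_under
print(np.allclose(m_ml, m_pinv2))        # True
```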
Exercises/Homework for week 1

Any questions about today's material?
