Session 07 (Inference)
Leland Wilkinson
Adjunct Professor, UIC Computer Science
Chief Scientist, H2O.ai
[email protected]
Inference
o Inference involves drawing conclusions from evidence
o In logic, the evidence is a set of premises
o In data analysis, the evidence is a set of data
o In statistics, the evidence is a sample from a population
o A population is assumed to have a distribution
o The sample is assumed to be random (there are ways around that)
o The population may be the same size as the sample
o There are two historical approaches to statistical inference
  o Frequentist
  o Bayesian
o There are many widespread abuses of statistical inference
  o We cherry-pick our results (scientists, journals, reporters, …)
  o We didn't have a big enough sample to detect a real difference
  o We think a large sample guarantees accuracy (the bigger the better)
Copyright © 2016 Leland Wilkinson
Inference
Deductive (top down)
  All men are mortal. (premise)
  Apollo is a man. (premise)
  Therefore, Apollo is mortal. (conclusion)
  The conclusion is guaranteed if the premises are true.
Abductive
  Bill and Jane had a fight and stopped seeing each other.
  I just saw Bill and Jane having coffee together.
  I conclude they are friends again.
  The conclusion is not guaranteed even if the premise(s) are true.
Inductive (bottom up)
  o All of the swans we have seen are white.
  o Therefore, all swans are white.
  The conclusion is not guaranteed even if the premise(s) are true.
  There exist black swans (also blue lobsters).
Mathematical proofs are deductive.
Data-analytic inference tends to be abductive.
Statistical inference tends to be inductive.
Abductive and inductive inference necessarily involve risk.
[Diagrams: two inference cycles, one with stages Collect, Combine, Evaluate, Conclude and one with stages Sample, Model, Assess, Conclude]
o The product function is rather awkward, so we log the likelihood

  l(\theta; x_1, \ldots, x_n) = \log[L(\theta; x_1, \ldots, x_n)] = \sum_{i=1}^{n} \log f(x_i; \theta)

o Maximizing requires

  \frac{\partial}{\partial \mu} l(\mu, \sigma^2; x_1, \ldots, x_n) = 0, \qquad \frac{\partial}{\partial \sigma^2} l(\mu, \sigma^2; x_1, \ldots, x_n) = 0

o The respective partial derivatives are

  \frac{\partial}{\partial \mu} l(\mu, \sigma^2; x_1, \ldots, x_n) = \frac{1}{\sigma^2} \left( \sum_{i=1}^{n} x_i - n\mu \right)

o and,

  \frac{\partial}{\partial \sigma^2} l(\mu, \sigma^2; x_1, \ldots, x_n) = \frac{1}{2\sigma^2} \left( \frac{1}{\sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2 - n \right)

o Setting these to zero implies \hat{\mu} = \bar{x} and

  \hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \hat{\mu})^2
o The standard error of the mean comes from the central limit theorem:

  \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{214.1}{\sqrt{25}} = 42.82 \quad \text{(pretty close!)}
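The closed-form estimates above can be checked numerically. A minimal sketch, using a hypothetical sample (the slide's actual data are not reproduced here):

```python
import math
import random

random.seed(1)
# Hypothetical sample: 25 draws from a normal distribution (mu=100, sigma=15)
x = [random.gauss(100, 15) for _ in range(25)]
n = len(x)

# Closed-form maximum-likelihood estimates from the derivation above
mu_hat = sum(x) / n
sigma2_hat = sum((xi - mu_hat) ** 2 for xi in x) / n

def log_likelihood(mu, sigma2):
    """Normal log-likelihood l(mu, sigma2; x1..xn)."""
    return sum(-0.5 * math.log(2 * math.pi * sigma2)
               - (xi - mu) ** 2 / (2 * sigma2) for xi in x)

# The closed-form estimates should beat nearby perturbations of themselves
best = log_likelihood(mu_hat, sigma2_hat)
assert best >= log_likelihood(mu_hat + 1.0, sigma2_hat)
assert best >= log_likelihood(mu_hat, sigma2_hat * 1.2)

# Standard error of the mean via the central limit theorem
se = math.sqrt(sigma2_hat) / math.sqrt(n)
```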
Inference
o Inferring Parameters of a Distribution via the Bootstrap
o 20 bootstrap estimates of robust piecewise regression
  [Figure: Bone Alkaline Phosphatase vs. Age (0–90), Females]
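The bootstrap idea behind the figure can be sketched in a few lines. The data and statistic here are hypothetical stand-ins for the slide's robust piecewise regression:

```python
import random

random.seed(2)
# Hypothetical sample (the slide's bone-density data are not reproduced here)
sample = [random.gauss(20, 5) for _ in range(40)]

def bootstrap_estimates(data, statistic, n_boot=20):
    """Refit the statistic on n_boot resamples drawn with replacement."""
    n = len(data)
    out = []
    for _ in range(n_boot):
        resample = [random.choice(data) for _ in range(n)]
        out.append(statistic(resample))
    return out

mean = lambda d: sum(d) / len(d)
estimates = bootstrap_estimates(sample, mean, n_boot=20)

# The spread of the 20 estimates approximates the sampling variability,
# just as the 20 regression curves on the slide do for the piecewise fit.
lo, hi = min(estimates), max(estimates)
```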
o Construct null hypothesis H0 (usually, that a result is due to chance)
o State rule for rejecting H0
o Compute likelihood of observed result under H0
o Draw a conclusion based on the decision rule
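These four steps can be sketched as a one-sample z test (the numbers are hypothetical; the decision rule is "reject H0 when p < α"):

```python
import math

def one_sample_z_test(xbar, mu0, sigma, n, alpha=0.05):
    """Two-sided z test of H0: mu = mu0 when sigma is known.

    Returns (z, p_value, reject)."""
    z = (xbar - mu0) / (sigma / math.sqrt(n))
    # Two-sided p-value from the standard normal CDF
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p, p < alpha

# Hypothetical numbers: sample mean 103 from n=25, sigma=15, H0: mu=100
z, p, reject = one_sample_z_test(103, 100, 15, 25)
# z = 1.0, p ≈ 0.317 → do not reject H0
```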
Inference
o Hypothesis testing
o Protecting against false positives – the first significance test
o An Argument for Divine Providence, taken from the Constant Regularity observed in the Births of both Sexes. By Dr. John Arbuthnot, Physician in Ordinary to her Majesty, and Fellow of the College of Physicians and the Royal Society
o There seems no more probable Cause to be assigned in Physics for this Equality of the Births, than that in our first Parents Seed there were at first formed an equal Number of both Sexes.
o […] From hence it follows, that Polygamy is contrary to the Law of Nature and Justice, and to the Propagation of the Human Race; for where Males and Females are in equal number, if one Man take Twenty Wives, Nineteen Men must live in Celibacy, which is repugnant to the Design of Nature; nor is it probable that Twenty Women will be so well impregnated by one Man as by Twenty.
P(\text{exactly equal numbers of Males and Females}) = \binom{n}{n/2} \left( \frac{1}{2} \right)^n
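Arbuthnot's binomial probability is easy to compute directly. The 82-year figure below is from his historical argument (male births exceeded female births in London every year on record), not a number on this slide:

```python
from math import comb

def p_equal_split(n):
    """P(exactly n/2 males in n births) under a fair binomial (n even)."""
    return comb(n, n // 2) * 0.5 ** n

# Even for modest n, an exact tie is unlikely; Arbuthnot's real evidence
# was 82 consecutive years with more male than female births, an event
# with probability (1/2)^82 under the chance hypothesis.
p_tie_10 = p_equal_split(10)   # 252/1024 ≈ 0.246
p_82_years = 0.5 ** 82         # ≈ 2.1e-25
```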
The lady tasting tea: 8 cups, 4 with milk poured first and 4 with tea poured first; the subject must identify which 4 had milk first.

           Truth
Answer     Milk First   Tea First   Total
Milk First     4            0         4
Tea First      0            4         4
Total          4            4         8

Distribution of possible answers (number of Milk-First cups correctly identified):

Correct:       0     1     2     3     4   Total
Ways:          1    16    36    16     1     70
Probability: .014  .229  .514  .229  .014     1
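The counts and probabilities in the table follow the hypergeometric distribution; a minimal check:

```python
from math import comb

# Lady tasting tea: 8 cups, 4 milk-first; she selects 4 as milk-first.
# The number of ways to get exactly k milk-first cups right is C(4,k)*C(4,4-k).
ways = [comb(4, k) * comb(4, 4 - k) for k in range(5)]   # [1, 16, 36, 16, 1]
total = sum(ways)                                         # 70 = C(8,4)
probs = [w / total for w in ways]

# One-sided Fisher exact p-value for a perfect score
p_all_correct = probs[4]                                  # 1/70 ≈ .014
```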
[Figures: sampling distributions f(x) of x̄ under H0 with shaded "Reject H0" regions — two one-tailed tests placing all of α in a single tail beyond µ0, and the two-tailed test of HA: µ ≠ µ0 placing α/2 in each tail around µ0]
Smeeters & Liu (2011) JEXP Thanks to Uri Simonsohn and Richard Gill
o When the Null Hypothesis is True, failing to reject yields a True Negative; rejecting yields a False Positive, a Type I error committed with probability α
  [Figure: p-values from m = 25 tests at α = .05, unadjusted vs. Bonferroni-adjusted]
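The danger of m = 25 unadjusted tests, and the Bonferroni fix, can be quantified directly (assuming independent tests of true null hypotheses):

```python
# Familywise error rate for m independent tests of true nulls at level alpha
m, alpha = 25, 0.05

fwer_unadjusted = 1 - (1 - alpha) ** m             # ≈ 0.72: a false positive is likely
alpha_bonferroni = alpha / m                       # 0.002 per-test threshold
fwer_bonferroni = 1 - (1 - alpha_bonferroni) ** m  # ≈ 0.049, held below alpha
```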
o M. Feychting and M. Ahlbom (1992). Magnetic fields and cancer in children residing near Swedish high-voltage power lines. American Journal of Epidemiology, 138, 467-481.
  • Surveyed everyone living within 300 meters of high-voltage power lines from 1960 through 1985.
  • Looked for statistically significant increases in relative risk (against baseline) of over 800 illnesses.
  • Found that there was a significant relative risk of childhood leukemia for those living near power lines.
  • The number of illnesses considered was so large, however, that there was high probability that the increased risk of at least one illness would appear statistically significant by chance alone.
  • Subsequent studies failed to show any links between power lines and childhood leukemia.
Inference
o Hypothesis testing
  A highly significant p-value doesn't mean the effect is large or strong or influential
o Suppose X is the number of heads in 12 flips of a fair coin and Y is the number of flips needed to get 3 heads.
o A frequentist tests the result that X = 3 against a Binomial, with resulting p = .073.
o But she tests the result that Y = 12 against a Negative Binomial, with p = .0327.
o The data are the same in both circumstances, but the experiments differ.
o The difference between observing X = 3 and observing Y = 12 lies not in the actual data, but merely in the design of the experiment. In the first case, one has decided in advance to try 12 flips. In the second, one has decided to keep flipping until 3 successes are observed. Bayesians say the inference about θ should be the same because the two likelihoods are proportional to each other.

  L(\theta) \propto p^3 (1 - p)^9
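Both p-values can be verified by direct enumeration:

```python
from math import comb

# Same data (3 heads, 9 tails), two stopping rules, two p-values.

# Fixed n = 12 flips: one-sided binomial p-value, P(X <= 3)
p_binomial = sum(comb(12, k) for k in range(4)) / 2 ** 12       # 299/4096

# Flip until 3 heads: P(Y >= 12) = P(at most 2 heads in first 11 flips)
p_neg_binomial = sum(comb(11, k) for k in range(3)) / 2 ** 11   # 67/2048

# p_binomial ≈ .073 and p_neg_binomial ≈ .0327, matching the slide,
# yet both likelihoods are proportional to p**3 * (1-p)**9.
```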
P(H|E) = \frac{P(E|H)\,P(H)}{P(E)}
Andrewgelman.com
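A minimal discrete illustration of the theorem (the coin hypotheses and priors here are invented for the example):

```python
# Discrete Bayes update, illustrating P(H|E) = P(E|H) P(H) / P(E).
# Hypothetical question: is a coin fair (p=0.5) or biased (p=0.8)?
priors = {"fair": 0.5, "biased": 0.5}
likelihood = {"fair": 0.5, "biased": 0.8}   # P(heads | hypothesis)

# Observe one head
evidence = sum(priors[h] * likelihood[h] for h in priors)          # P(E)
posterior = {h: priors[h] * likelihood[h] / evidence for h in priors}
# posterior: fair ≈ 0.385, biased ≈ 0.615
```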
o With a lot of work, it may be possible to elicit a one-dimensional prior.
o There may be some circumstances where, with a WHOLE lot of work, it is possible to elicit a two-dimensional prior.
o NO ONE can specify a three-dimensional prior!
o Persi Diaconis:
o It's very hard to put meaningful priors on high-dimensional real problems. And the choices can really make a difference.
o Eliciting opinions about covariances is likely to be much more difficult than eliciting opinions about means and standard deviations, and the actual values matter… a lot!
Inference
o Jerry Dallal: How to annoy a Bayesian
o If the prior is sharp relative to the likelihood, the posterior distribution will look like the prior.
o If the likelihood is sharp relative to the prior, the posterior distribution will look like the likelihood.
  …and the results will be the same as from a frequentist analysis! This is scary. It says that frequentist results that could be mistaken for probability statements really are probability statements!
o If neither the prior nor the likelihood is sharp relative to the other, the posterior distribution will be a mix of the two.
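Dallal's three cases can be sketched with the conjugate normal-normal model (known sampling variance; all numbers here are hypothetical):

```python
def posterior(prior_mean, prior_var, data_mean, data_var_of_mean):
    """Posterior mean/variance for a normal prior and normal likelihood.

    Precision-weighted average: sharper (lower-variance) component wins."""
    w_prior = 1 / prior_var
    w_data = 1 / data_var_of_mean
    mean = (w_prior * prior_mean + w_data * data_mean) / (w_prior + w_data)
    return mean, 1 / (w_prior + w_data)

# Sharp prior (tiny variance): posterior hugs the prior
m1, v1 = posterior(0.0, 0.01, 10.0, 100.0)   # m1 ≈ 0.001
# Sharp likelihood: posterior hugs the data
m2, v2 = posterior(0.0, 100.0, 10.0, 0.01)   # m2 ≈ 9.999
# Neither sharp: posterior is a mix of the two
m3, v3 = posterior(0.0, 1.0, 10.0, 1.0)      # m3 = 5.0
```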
o Whose prior?
  o Sponsors
  o Special Interest Groups*
  o Investigators
  o Reviewers
  o Policy Makers
  o Consumers
*If you are my friend, you will do your best to avoid using the word "stakeholder" in my presence, says Jerry.
Inference
o Jerry Dallal: How to annoy a Bayesian
o Something that is impossible under the prior distribution MUST be impossible under the posterior distribution. Okay, nothing is impossible, so we'll withhold a bit of prior probability to spread around (…but how much?)
o This doesn't solve the problem. Something that is unexpected under the prior must still be rare under the posterior unless there is a HUGE amount of data, or it really wasn't all that unexpected.
Inference
o Problems with Frequentist Inference
o Sir David Cox (a Frequentist)
o "… I felt, for instance, that various aspects of the Neyman-Pearson theory -- choose alpha, choose a critical region, reject or accept the null hypothesis -- give a rigid procedure, that this isn't the way to do science."
o "Neyman talked a lot about inductive rules of behavior and it seemed to me he took the view that the only thing that you could ever say is if you follow this procedure again and again, then 95% of the time something will happen; that you couldn't say anything about a particular instance. Now, I don't think that's how he actually used statistical methods when it came to applications; he took a much more flexible way.
o But even apart from that, you can say, is this notion of a 5% or 95% region -- is this just an explanation of what a 95% confidence interval would mean? A sort of hypothetical explanation: if you were to do so and so, such and such would happen? Or is it an instruction on how to do science? It seems to me okay as the first, in fact very good as the first, terrible as the second."
o "The degree to which I am a Bayesian is directly proportional to the distance I am from the computer center."
o "Most Bayesians reject frequentist ideas as being outrageous, and the next thing that comes out of their mouths is, 'Let's use these Bayesian techniques that match up with frequentist methods.' Therefore, although they object to frequentist philosophy, they follow frequentist practice."