Gaussian Copula
Yuxuan Zhao
Why am I here today?
Table of Contents
1 Motivation
3 Demo
Motivation
Figure 1: 2538 participants and 9 questions. 18.2% of entries are missing in total.
Example variables
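Generalized low rank model: find low-rank matrices $X \in \mathbb{R}^{n \times k}$ and $W \in \mathbb{R}^{k \times p}$ such that $XW$ approximates $Y \in \mathbb{R}^{n \times p}$ well:
\[
\text{minimize} \quad \sum_{(i,j) \in \Omega} \ell_j\left(Y_{ij}, x_i^T w_j\right) + \sum_{i=1}^{n} r_i(x_i) + \sum_{j=1}^{p} \tilde{r}_j(w_j)
\]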
- What $\ell_j$ to choose?
- How to assign weights to $\ell_j$ when columns have different scales?
- What regularizer $r_i$, $\tilde{r}_j$ to use?
And there are tuning parameters...
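To make the objective concrete, here is a minimal sketch that evaluates it for one particular set of choices: quadratic loss for every column and squared Euclidean regularizers sharing a single weight `gamma`. These choices, the function name `glrm_objective`, and the boolean `mask` encoding the observed set Ω are illustrative assumptions, not recommendations from the talk.

```python
import numpy as np

def glrm_objective(Y, mask, X, W, gamma=0.1):
    """Evaluate a GLRM objective under assumed quadratic loss and ridge penalties.

    Y:    (n, p) data matrix, may contain NaN at missing entries
    mask: (n, p) boolean, True where Y_ij is observed (the set Omega)
    X:    (n, k) row factors, W: (k, p) column factors
    gamma: hypothetical regularization weight (a tuning parameter)
    """
    residual = np.where(mask, Y - X @ W, 0.0)  # missing entries add no loss
    loss = np.sum(residual ** 2)
    penalty = gamma * (np.sum(X ** 2) + np.sum(W ** 2))
    return loss + penalty

Y = np.array([[1.0, 2.0], [3.0, np.nan]])
mask = ~np.isnan(Y)
X = np.ones((2, 1))
W = np.array([[1.0, 2.0]])
value = glrm_objective(Y, mask, X, W, gamma=0.5)
```

Even in this small case, the loss, the regularizer, and `gamma` each had to be chosen by hand, which is the difficulty the questions above point at.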
Figure: Histograms (y-axis: counts) of four example variables. Panels: a 3-level variable, from left to right "Very happy" to "Not too happy"; a 12-level variable, from left to right "Less than $1000" to "more than $25000"; "How many people in contact in a typical weekday" (levels 1 to 5); "Weeks r. worked last year" (0 to 50).
Figure: Ordinal x value plotted against normal z value (z-axis ticks at -∞, -1, 0, 1, ∞). Below it, count histograms of a 3-level and a 5-level variable, each shown next to a density curve on [-4, 4].
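Viewing an ordinal variable as a latent normal variable cut at thresholds can be sketched as follows. The sketch assumes the latent variable is standard normal and places the thresholds so that the probability mass between consecutive thresholds equals the observed share of each level. The helper names `ordinal_cutoffs` and `to_ordinal` and the example counts are hypothetical.

```python
from statistics import NormalDist

def ordinal_cutoffs(counts):
    """Thresholds on a standard normal z that reproduce the level frequencies.

    Assumes every level has a positive count, so each cumulative share
    lies strictly between 0 and 1.
    """
    total = sum(counts)
    cutoffs, cumulative = [], 0
    for c in counts[:-1]:  # the last level extends to +infinity
        cumulative += c
        cutoffs.append(NormalDist().inv_cdf(cumulative / total))
    return cutoffs

def to_ordinal(z, cutoffs):
    """Map a latent value z to a level in 1..len(cutoffs)+1."""
    return 1 + sum(z > c for c in cutoffs)

cutoffs = ordinal_cutoffs([250, 500, 250])  # made-up counts for 3 levels
levels = [to_ordinal(z, cutoffs) for z in (-2.0, 0.0, 2.0)]
```

With these counts the two thresholds are symmetric around zero, so half of the normal mass falls in the middle level.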
$x_j = f_j(z_j), \quad j = 1, \dots, p$
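A small simulation may help to read this relation. The sketch below assumes the usual Gaussian copula setup, in which the latent vector z is multivariate normal with a correlation matrix `Sigma` and each $f_j$ is monotone. The particular `Sigma`, the exponential map `f1`, and the thresholds in `f2` are made-up choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed latent correlation matrix for p = 2 variables.
Sigma = np.array([[1.0, 0.6],
                  [0.6, 1.0]])
Z = rng.multivariate_normal(np.zeros(2), Sigma, size=1000)

# Hypothetical monotone marginal maps f_j.
f1 = np.exp                                           # continuous, positive values
f2 = lambda z: 1 + np.digitize(z, [-1.0, 0.0, 1.0])  # ordinal with levels 1..4

# Observed data: each column is its own transform of a latent normal column.
X = np.column_stack([f1(Z[:, 0]), f2(Z[:, 1])])
```

The dependence between the columns of `X` comes entirely from `Sigma`, while the shape of each column's distribution comes entirely from its own $f_j$.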
Multiple imputation
- Sample $\hat{z}_M^{(i)}$ from the above distribution for $i = 1, \dots, m$.
- Map back to the observed space: $\hat{x}_M^{(i)} = f_M(\hat{z}_M^{(i)})$ for $i = 1, \dots, m$.
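The two steps can be sketched for a single row as follows. The sketch assumes that the sampling distribution is the conditional normal of the missing latent entries given exactly known observed latent entries (as would hold for continuous variables), with latent z ~ N(0, Sigma). The function `multiple_impute`, its arguments, and the example values are hypothetical.

```python
import numpy as np

def multiple_impute(z_obs, obs, mis, Sigma, f_mis, m=5, seed=0):
    """Draw m imputations for the missing entries of one row.

    z_obs: latent values at the observed coordinates (assumed known exactly)
    obs, mis: index lists of observed and missing coordinates
    Sigma: assumed latent correlation matrix
    f_mis: one monotone map per missing coordinate
    """
    rng = np.random.default_rng(seed)
    S_oo = Sigma[np.ix_(obs, obs)]
    S_mo = Sigma[np.ix_(mis, obs)]
    S_mm = Sigma[np.ix_(mis, mis)]
    # Conditional Gaussian: mean and covariance of z_mis given z_obs.
    cond_mean = S_mo @ np.linalg.solve(S_oo, z_obs)
    cond_cov = S_mm - S_mo @ np.linalg.solve(S_oo, S_mo.T)
    # Step 1: sample m latent vectors for the missing coordinates.
    z_draws = rng.multivariate_normal(cond_mean, cond_cov, size=m)
    # Step 2: map each draw back to the observed space.
    return np.column_stack([f(z_draws[:, k]) for k, f in enumerate(f_mis)])

Sigma = np.array([[1.0, 0.5, 0.3],
                  [0.5, 1.0, 0.4],
                  [0.3, 0.4, 1.0]])
imputations = multiple_impute(np.array([0.5, -0.2]), [0, 1], [2],
                              Sigma, f_mis=[np.exp], m=5)
```

Each row of `imputations` is one completed value for the missing coordinate; the spread across rows reflects the uncertainty of the imputation.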
Demo
- Python package: https://github.com/udellgroup/GaussianCopulaImp
- One-line installation: pip install GaussianCopulaImp
- More tutorials on multiple imputation, accelerating the algorithm for large datasets, etc.