Moment-based estimation for hierarchical models in Apache Spark

Kyle Schmaus, Stitch Fix
#DSSAIS17

Overview
• About Stitch Fix
• Hierarchical (Mixed Effects) Models
• Moment Based Estimation
• Spark Implementation & Application

Stitch Fix
• Machine learning
• Human curation
• Algorithmic recommendations

Stitch Fix
• Algorithms Team: 80+ Data Scientists and Data Engineers
• Data integrates into every aspect of the business
• Our Blog: multithreaded.stitchfix.com

Hierarchical Models Motivation
ℙ(sale) = ?

Hierarchical Models Motivation
Color
Cut
Size
Brand
Material
…

Shared Model
$\mathbb{P}(\text{sale}) = g^{-1}(x_1^T \beta)$
$\mathbb{P}(\text{sale}) = g^{-1}(x_2^T \beta)$
$g(p) = \log \frac{p}{1 - p}$
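
A minimal sketch of the shared model's link function, using only the standard library. The function names (`logit`, `inv_logit`, `sale_probability`) are illustrative choices, not taken from the talk.

```python
import math

def logit(p: float) -> float:
    """Link function g(p) = log(p / (1 - p)), defined for 0 < p < 1."""
    return math.log(p / (1.0 - p))

def inv_logit(t: float) -> float:
    """Inverse link g^{-1}(t) = 1 / (1 + exp(-t)), written to avoid overflow."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def sale_probability(x, beta):
    """P(sale) = g^{-1}(x^T beta) for one feature vector x and shared beta."""
    return inv_logit(sum(xj * bj for xj, bj in zip(x, beta)))
```

In the shared model every item is scored with the same `beta`; only the feature vector `x` changes between items.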

Individual Model Per Group
$\mathbb{P}(\text{sale}) = g^{-1}(X \beta_1)$
$\mathbb{P}(\text{sale}) = g^{-1}(X \beta_2)$
$g(p) = \log \frac{p}{1 - p}$

Hierarchical Models
• $\mathbb{E}[Y_i] = g^{-1}(X_i \beta + Z_i \gamma_i)$
• $\gamma_i \sim \mathcal{N}(0, \Sigma)$
• $i \in \{1, \dots, M\}$
• $\beta$, $\gamma_i$, and $\Sigma$ are unknown

Simulation Results
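• $i \in \{1, \dots, M\}$ with $M = 100$
• $Y_i = X_i \beta + Z_i \gamma_i + \varepsilon_i$
• $\dim \beta = \dim \gamma_i = 11 \times 1$
• $\gamma_i \sim \mathcal{N}(0, \Sigma)$
• $\varepsilon_i \sim \mathcal{N}(0, \mathbb{I})$
• $\Sigma \sim \mathcal{W}^{-1}\left(\tfrac{1}{11} \mathbb{I}, 11\right)$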
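
The simulation setup above can be sketched in NumPy as follows. The slide does not give the group sizes, the design matrices, or how $\beta$ was chosen, so `n_per_group`, the Gaussian design, `Z = X`, and the draw of `beta` are all assumptions made for illustration. The inverse-Wishart draw is built by inverting a Wishart draw, under the usual scale-matrix parameterization.

```python
import numpy as np

def simulate(M=100, n_per_group=50, q=11, seed=0):
    """Draw one data set from Y_i = X_i beta + Z_i gamma_i + eps_i."""
    rng = np.random.default_rng(seed)
    beta = rng.normal(size=q)  # assumed: the slide does not say how beta is set

    # Sigma ~ inverse-Wishart(scale = I/q, df = q).
    # The rows of G are q draws from N(0, q I), so G.T @ G ~ Wishart(q I, q)
    # and its inverse is inverse-Wishart(I/q, q).
    G = rng.normal(scale=np.sqrt(q), size=(q, q))
    Sigma = np.linalg.inv(G.T @ G)

    groups = []
    for _ in range(M):
        X = rng.normal(size=(n_per_group, q))  # assumed design matrix
        Z = X                                  # assumed: same covariates for both effects
        # gamma = G^{-1} z has covariance G^{-1} G^{-T} = Sigma.
        gamma = np.linalg.solve(G, rng.normal(size=q))
        eps = rng.normal(size=n_per_group)     # eps_i ~ N(0, I)
        y = X @ beta + Z @ gamma + eps
        groups.append((X, Z, y))
    return beta, Sigma, groups

beta, Sigma, groups = simulate()
```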

Software Implementations
R: lme4, nlme, mbest, …
Python: statsmodels
Julia: MixedModels
Probabilistic programming

Likelihood Based Methods
Expectation-Maximization, Variational Approximations, or Likelihood Maximization require an $O(Nq^2)$ initial cost, then a series of iterations costing $O(Mq^4)$, where
• $N$ is the total number of observations
• $q$ is the number of fixed and random effects
• $M$ is the number of groups

Moment Based Methods
Using a moment-based approach laid out in Perry (2015) and implemented in the mbest package, we can achieve a non-iterative fit in $O(Nq^2) + O(Mq^4)$ steps. This can be trivially spread across $K$ processors.
This improvement in computational efficiency is paid for by sacrificing some statistical efficiency.
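
A back-of-the-envelope comparison of the two cost formulas, ignoring the constants that big-O notation hides. It reads the likelihood-based cost as $O(Mq^4)$ per iteration, plugs in $N$, $M$, and $q = 7$ from the application reported later in the talk, and assumes a purely hypothetical count of 50 iterations.

```python
def moment_cost(N: int, M: int, q: int) -> int:
    """Operation count for the non-iterative fit: N q^2 + M q^4."""
    return N * q**2 + M * q**4

def likelihood_cost(N: int, M: int, q: int, iterations: int) -> int:
    """N q^2 up front, then M q^4 for each iteration (assumed reading)."""
    return N * q**2 + iterations * M * q**4

N, M, q = 120_000_000, 4_000_000, 7
iterations = 50  # hypothetical; the talk does not report iteration counts

one_shot = moment_cost(N, M, q)
iterative = likelihood_cost(N, M, q, iterations)
print(f"moment-based: {one_shot:.3e}")
print(f"likelihood:   {iterative:.3e}")
print(f"ratio:        {iterative / one_shot:.1f}x")
```

Under these assumptions the gap grows roughly linearly with the iteration count, because the $Mq^4$ term is the larger of the two.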

Moment Based Setup
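$Y_i = X_i \beta + Z_i \gamma_i + \varepsilon_i$ for $i \in \{1, \dots, M\}$
$\gamma_i \sim \mathcal{N}(0, \Sigma)$
$\varepsilon_i \sim \mathcal{N}(0, \phi \mathbb{I})$
$\gamma_i$ and $\varepsilon_i$ independent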

Moment Based Setup
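• Define $F_i \equiv [X_i \; Z_i]$ and $\eta_i^T \equiv [\beta^T \; \gamma_i^T]$
• $Y_i = X_i \beta + Z_i \gamma_i + \varepsilon_i \implies Y_i = F_i \eta_i + \varepsilon_i$
• Estimate $\hat{\eta}_i = (F_i^T F_i)^{\dagger} F_i^T y_i$ for each $i \in \{1, \dots, M\}$
• Note that when $F_i$ is rank deficient, $\hat{\eta}_i$ is not an unbiased estimator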
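
A minimal NumPy sketch of the per-group estimate, using the pseudoinverse exactly as written in the formula. The function name `estimate_eta`, the explicit `rcond` cutoff, and the toy data are assumptions for illustration. The example group has a fixed intercept and a random intercept, so $F_i$ has two identical columns and is rank deficient.

```python
import numpy as np

def estimate_eta(X, Z, y, rcond=1e-10):
    """eta_hat = (F^T F)^+ F^T y with F = [X Z].

    rcond is set explicitly so that numerically zero singular values of
    F^T F are dropped rather than inverted.
    """
    F = np.hstack([X, Z])
    return np.linalg.pinv(F.T @ F, rcond=rcond) @ F.T @ y

rng = np.random.default_rng(0)
n = 20
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])  # fixed intercept and slope
Z = np.ones((n, 1))                   # random intercept: duplicates X's first column
# True values: beta = (1, 2), gamma_i = 0.5
y = 1.0 + 2.0 * x + 0.5 + 0.1 * rng.normal(size=n)

eta_hat = estimate_eta(X, Z, y)
print(eta_hat)  # order: [beta_0, beta_1, gamma_i]
```

The pseudoinverse returns the minimum-norm least-squares solution, which splits the total intercept evenly between the two duplicated columns. That is why $\hat{\eta}_i$ cannot be unbiased for $(\beta, \gamma_i)$ here, even though the fitted values $F_i \hat{\eta}_i$ are the usual least-squares fit.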

Moment Based Setup
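• $F_i = U_i D_i V_i^T$
• $X_i = U_i D_i V_{i1}^T$
• $Z_i = U_i D_i V_{i2}^T$
• $\hat{\phi} = \hat{\sigma}^2 = \frac{1}{N - \rho} \sum_{i=1}^{M} \lVert Y_i - F_i \hat{\eta}_i \rVert^2$, where $\rho \equiv \sum_{i=1}^{M} r_i$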
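
A sketch of the decomposition and the pooled variance estimate, assuming $r_i$ denotes the rank of $F_i$ and that the SVD is the compact one (only the $r_i$ nonzero singular values are kept). The helper names `compact_svd` and `estimate_phi`, the rank tolerance, and the toy data with `Z = X` are illustrative assumptions.

```python
import numpy as np

def compact_svd(F, tol=1e-10):
    """Return U (n x r), d (r,), V ((p+q) x r) with F = U diag(d) V^T."""
    U, d, Vt = np.linalg.svd(F, full_matrices=False)
    r = int(np.sum(d > tol * d[0]))  # numerical rank r_i
    return U[:, :r], d[:r], Vt[:r].T

def estimate_phi(groups):
    """phi_hat = sum_i ||y_i - F_i eta_hat_i||^2 / (N - rho), rho = sum_i r_i."""
    rss, N, rho = 0.0, 0, 0
    for X, Z, y in groups:
        U, d, V = compact_svd(np.hstack([X, Z]))
        resid = y - U @ (U.T @ y)  # F eta_hat is the projection of y onto col(F)
        rss += resid @ resid
        N += len(y)
        rho += len(d)
    return rss / (N - rho)

# Toy data: 200 groups, noise variance phi = 1.
rng = np.random.default_rng(1)
beta = np.array([1.0, -2.0, 0.5])
groups = []
for _ in range(200):
    X = rng.normal(size=(30, 3))
    gamma = 0.5 * rng.normal(size=3)
    y = X @ (beta + gamma) + rng.normal(size=30)
    groups.append((X, X, y))

phi_hat = estimate_phi(groups)
```

With `V` partitioned by rows, `V[:p]` plays the role of $V_{i1}$ and `V[p:]` the role of $V_{i2}$, so that `(U * d) @ V[:p].T` reproduces $X_i$.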

Estimate 𝛽
$\hat{\beta}_W = \Omega^{-1} \sum_{i=1}^{M} V_{i1} W_i V_i^T \hat{\eta}_i$
is an unbiased estimator for $\beta$, for
$\Omega \equiv \sum_{i=1}^{M} V_{i1} W_i V_{i1}^T$
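
A sketch of the weighted estimator in NumPy. The slides do not say how the weights $W_i$ are chosen, so this example simply assumes $W_i = \mathbb{I}$; any fixed weights keep the estimator unbiased. It also uses the identity $V_i^T \hat{\eta}_i = D_i^{-1} U_i^T y_i$, which follows from the SVD of $F_i$. Helper names and the toy data are hypothetical.

```python
import numpy as np

def compact_svd(F, tol=1e-10):
    """Return U (n x r), d (r,), V ((p+q) x r) with F = U diag(d) V^T."""
    U, d, Vt = np.linalg.svd(F, full_matrices=False)
    r = int(np.sum(d > tol * d[0]))
    return U[:, :r], d[:r], Vt[:r].T

def estimate_beta(groups, p):
    """beta_hat_W = Omega^{-1} sum_i V_i1 W_i V_i^T eta_hat_i, with W_i = I (assumed)."""
    Omega = np.zeros((p, p))
    rhs = np.zeros(p)
    for X, Z, y in groups:
        U, d, V = compact_svd(np.hstack([X, Z]))
        V1 = V[:p]              # rows of V that belong to the fixed effects
        t = (U.T @ y) / d       # V_i^T eta_hat_i = D_i^{-1} U_i^T y_i
        W = np.eye(len(d))      # assumed weights
        Omega += V1 @ W @ V1.T
        rhs += V1 @ W @ t
    return np.linalg.solve(Omega, rhs)

# Toy data: 400 groups with Z_i = X_i.
rng = np.random.default_rng(1)
beta = np.array([1.0, -2.0, 0.5])
groups = []
for _ in range(400):
    X = rng.normal(size=(30, 3))
    gamma = 0.5 * rng.normal(size=3)
    y = X @ (beta + gamma) + rng.normal(size=30)
    groups.append((X, X, y))

beta_hat = estimate_beta(groups, p=3)
```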

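Estimate 𝚺
Say we knew $\beta$, not just an estimate.
• $A \equiv \sum_{i=1}^{M} V_{i2} W_i (V_i^T \hat{\eta}_i - V_{i1}^T \beta)(V_i^T \hat{\eta}_i - V_{i1}^T \beta)^T W_i V_{i2}^T$
• $B \equiv \sum_{i=1}^{M} V_{i2} W_i D_i^{-2} W_i V_{i2}^T$
• $\Omega_2 \equiv \sum_{i=1}^{M} V_{i2} W_i V_{i2}^T \otimes V_{i2} W_i V_{i2}^T$
• $\operatorname{vec}(\hat{\Sigma}_W) = \Omega_2^{-1} \operatorname{vec}(A - \phi B)$
• $\hat{\Sigma}_W$ is an unbiased estimator for $\Sigma$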
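
The four quantities above can be sketched in NumPy as follows, again assuming $W_i = \mathbb{I}$ and passing in the true $\beta$ and $\phi$ to match the "say we knew $\beta$" setting. The helper names and toy data are hypothetical, and `vec` is taken column-major to match the Kronecker identity $\operatorname{vec}(M \Sigma M) = (M \otimes M) \operatorname{vec}(\Sigma)$ for symmetric $M$.

```python
import numpy as np

def compact_svd(F, tol=1e-10):
    """Return U (n x r), d (r,), V ((p+q) x r) with F = U diag(d) V^T."""
    U, d, Vt = np.linalg.svd(F, full_matrices=False)
    r = int(np.sum(d > tol * d[0]))
    return U[:, :r], d[:r], Vt[:r].T

def estimate_sigma(groups, p, beta, phi):
    """vec(Sigma_hat_W) = Omega_2^{-1} vec(A - phi B), with W_i = I (assumed)."""
    q = groups[0][1].shape[1]
    A = np.zeros((q, q))
    B = np.zeros((q, q))
    Omega2 = np.zeros((q * q, q * q))
    for X, Z, y in groups:
        U, d, V = compact_svd(np.hstack([X, Z]))
        V1, V2 = V[:p], V[p:]
        W = np.eye(len(d))                      # assumed weights
        resid = (U.T @ y) / d - V1.T @ beta     # V_i^T eta_hat_i - V_i1^T beta
        v = V2 @ W @ resid
        A += np.outer(v, v)
        B += V2 @ W @ np.diag(d ** -2.0) @ W @ V2.T
        Mi = V2 @ W @ V2.T
        Omega2 += np.kron(Mi, Mi)
    rhs = (A - phi * B).reshape(-1, order="F")  # column-major vec
    return np.linalg.solve(Omega2, rhs).reshape(q, q, order="F")

# Toy data: 2000 groups with Z_i = X_i and phi = 1.
rng = np.random.default_rng(2)
p = 2
beta = np.array([1.0, -1.0])
Sigma = np.array([[1.0, 0.3], [0.3, 0.5]])
L = np.linalg.cholesky(Sigma)
groups = []
for _ in range(2000):
    X = rng.normal(size=(20, p))
    gamma = L @ rng.normal(size=p)
    y = X @ (beta + gamma) + rng.normal(size=20)
    groups.append((X, X, y))

Sigma_hat = estimate_sigma(groups, p, beta, phi=1.0)
```

The subtraction of $\phi B$ removes the contribution of the observation noise to the second moment of the residuals, which is what makes the estimator unbiased when $\beta$ and $\phi$ are known.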

Estimate 𝚺
• $\hat{\Sigma}_W$ is an unbiased estimator for $\Sigma$ if you know $\beta$ a priori. We don’t …
• Instead, we use $\hat{\beta}$. This means $\hat{\Sigma}_W$ is not an unbiased estimator in practice.
• It can be shown the bias is usually negligible.
• We can project $\hat{\Sigma}_W$ onto the set of positive semidefinite matrices.
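
One common way to do the projection is to clip negative eigenvalues at zero, which gives the nearest positive semidefinite matrix in the Frobenius norm. This is a sketch of that generic technique; the talk does not say which projection mbest or the Spark implementation uses, and `project_psd` is a hypothetical name.

```python
import numpy as np

def project_psd(S):
    """Nearest PSD matrix (Frobenius norm): symmetrize, then clip eigenvalues at 0."""
    S_sym = (S + S.T) / 2.0
    eigvals, eigvecs = np.linalg.eigh(S_sym)
    clipped = np.clip(eigvals, 0.0, None)
    return (eigvecs * clipped) @ eigvecs.T

# A symmetric matrix with eigenvalues 3 and -1, so it is not PSD.
S = np.array([[1.0, 2.0], [2.0, 1.0]])
S_psd = project_psd(S)
```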

Estimate 𝜸𝒊| 𝜼𝒊
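• Using a Gaussian approximation for $p(\hat{\eta}_i \mid \gamma_i)$ and $p(\gamma_i)$, compute the posterior distribution
– $p(\gamma_i \mid \hat{\eta}_i) \propto p(\hat{\eta}_i \mid \gamma_i)\, p(\gamma_i)$
• There exists a formula for $C_i(\Sigma)$ such that
– $\mathbb{E}(\gamma_i \mid y) = C_i V_{i2} (V_i^T \hat{\eta}_i - V_{i1}^T \hat{\beta})$
– $\operatorname{Var}(\gamma_i \mid y) = \hat{\phi}\, C_i$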
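
The slide does not spell out $C_i$, so the sketch below does not try to reproduce it. Instead it applies the textbook Gaussian conditioning formula to the linear case, where $V_i^T \hat{\eta}_i - V_{i1}^T \beta = V_{i2}^T \gamma_i + e_i$ with $\operatorname{Cov}(e_i) = \phi D_i^{-2}$. The paper's formula may be arranged differently; the function name and inputs are hypothetical, and $\beta$, $\Sigma$, $\phi$ stand for whatever estimates are plugged in.

```python
import numpy as np

def compact_svd(F, tol=1e-10):
    """Return U (n x r), d (r,), V ((p+q) x r) with F = U diag(d) V^T."""
    U, d, Vt = np.linalg.svd(F, full_matrices=False)
    r = int(np.sum(d > tol * d[0]))
    return U[:, :r], d[:r], Vt[:r].T

def posterior_gamma(X, Z, y, beta, Sigma, phi):
    """Posterior mean and covariance of gamma_i under the Gaussian linear model."""
    p = X.shape[1]
    U, d, V = compact_svd(np.hstack([X, Z]))
    V1, V2 = V[:p], V[p:]
    t = (U.T @ y) / d - V1.T @ beta       # = V2^T gamma + noise
    # Marginal covariance of t: V2^T Sigma V2 + phi D^{-2}
    S = V2.T @ Sigma @ V2 + phi * np.diag(d ** -2.0)
    # Gain K = Sigma V2 S^{-1}, computed with a solve instead of an inverse
    K = np.linalg.solve(S, V2.T @ Sigma).T
    mean = K @ t
    cov = Sigma - K @ V2.T @ Sigma
    return mean, cov
```

This form never inverts $\Sigma$, which is convenient when the projected $\hat{\Sigma}_W$ is singular.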

That was a lot of math! Read "Fast moment-based estimation for hierarchical models" on arXiv; it’s a great paper!

Moment Based Estimation Summary
• Estimate $\hat{\eta}_i$
• Use $\hat{\eta}_i$ to estimate $\hat{\phi}$
• Use $\hat{\eta}_i$ to estimate $\hat{\beta}_W$
• Use $\hat{\eta}_i$, $\hat{\phi}$, $\hat{\beta}_W$ to estimate $\hat{\Sigma}_W$
• Use $\hat{\eta}_i$, $\hat{\phi}$, $\hat{\beta}_W$, $\hat{\Sigma}_W$ to estimate $\gamma_i \mid \hat{\eta}_i$
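
Most of the steps above reduce to a per-group computation followed by a sum over groups, which is what makes the method easy to distribute. The sketch below illustrates that pattern for the $\hat{\beta}_W$ step in plain Python, with list slices standing in for partitions and `functools.reduce` standing in for a distributed reduce. It is not the MomentMixedModels code; the helper names, the choice $W_i = \mathbb{I}$, and the toy data are assumptions.

```python
import numpy as np
from functools import reduce

def compact_svd(F, tol=1e-10):
    """Return U (n x r), d (r,), V ((p+q) x r) with F = U diag(d) V^T."""
    U, d, Vt = np.linalg.svd(F, full_matrices=False)
    r = int(np.sum(d > tol * d[0]))
    return U[:, :r], d[:r], Vt[:r].T

def group_stats(X, Z, y):
    """Map step: one group's contribution to Omega and to the right-hand side."""
    p = X.shape[1]
    U, d, V = compact_svd(np.hstack([X, Z]))
    V1 = V[:p]
    t = (U.T @ y) / d              # V_i^T eta_hat_i
    return V1 @ V1.T, V1 @ t       # W_i = I (assumed)

def combine(a, b):
    """Reduce step: contributions simply add."""
    return a[0] + b[0], a[1] + b[1]

def fit_beta(groups, K=1):
    """Split groups into K partitions, reduce each, then combine the partial sums."""
    partitions = [groups[k::K] for k in range(K)]
    partials = [reduce(combine, (group_stats(*g) for g in part))
                for part in partitions if part]
    Omega, rhs = reduce(combine, partials)
    return np.linalg.solve(Omega, rhs)

rng = np.random.default_rng(4)
beta = np.array([1.0, -2.0])
groups = []
for _ in range(50):
    X = rng.normal(size=(10, 2))
    gamma = 0.5 * rng.normal(size=2)
    groups.append((X, X, X @ (beta + gamma) + rng.normal(size=10)))

beta_serial = fit_beta(groups, K=1)
beta_parallel = fit_beta(groups, K=4)
```

Because only small $p \times p$ matrices and length-$p$ vectors cross partition boundaries, the result does not depend on how the groups are split.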

Memory Constrained For Large N, M
• Current software is memory-constrained on a single machine
• My MacBook Pro (2015) starts to run out of memory for
– M = 100,000
– N = 10,000,000

Spark Implementation

Application
Logistic Model with
• N = 120,000,000
• M = 4,000,000
• p = q = 7
• 8 executors, 16 GB memory each
• Running time approximately 40 minutes

Limitations & Next Steps

Thank You
github.com/stitchfix/MomentMixedModels