Kyle Schmaus, Stitch Fix
Moment-Based Estimation for Hierarchical Models in Apache Spark
#DSSAIS17
Overview
• About Stitch Fix
• Hierarchical (Mixed Effects) Models
• Moment Based Estimation
• Spark Implementation & Application
Stitch Fix
• Algorithmic recommendations: machine learning combined with human curation
• Algorithms Team: 80+ Data Scientists and Data Engineers
• Data integrates into every aspect of the business
• Our Blog: multithreaded.stitchfix.com
Hierarchical Models Motivation
ℙ(sale) = ?
• Color
• Cut
• Size
• Brand
• Material
• …
Shared Model
• $\mathbb{P}(\text{sale}) = g^{-1}(x_1^T \beta)$
• $\mathbb{P}(\text{sale}) = g^{-1}(x_2^T \beta)$
• $g(p) = \log \frac{p}{1 - p}$
Individual Model Per Group
• $\mathbb{P}(\text{sale}) = g^{-1}(X \beta_1)$
• $\mathbb{P}(\text{sale}) = g^{-1}(X \beta_2)$
• $g(p) = \log \frac{p}{1 - p}$
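As a minimal sketch of the contrast between one shared coefficient vector and one coefficient vector per group, the logit link and its inverse can be written directly. The attribute vector, the weights, and the group names `client_1` and `client_2` are hypothetical values chosen for illustration; they do not come from the talk.

```python
import math

def logit(p):
    """Link function g(p) = log(p / (1 - p))."""
    return math.log(p / (1.0 - p))

def inv_logit(z):
    """Inverse link g^{-1}(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def p_sale(x, beta):
    """P(sale) = g^{-1}(x^T beta) for one attribute vector x."""
    return inv_logit(sum(xj * bj for xj, bj in zip(x, beta)))

# Hypothetical attribute vector: (intercept, is_blue, is_cotton).
x = [1.0, 1.0, 0.0]

# Shared model: a single beta, so every client gets the same prediction.
beta_shared = [-0.5, 0.8, 0.3]
shared = p_sale(x, beta_shared)

# Individual models: one beta per group, so predictions differ by group.
beta_by_group = {
    "client_1": [-0.5, 1.5, 0.0],
    "client_2": [-0.5, -1.0, 0.6],
}
per_group = {g: p_sale(x, b) for g, b in beta_by_group.items()}
```

The shared model cannot distinguish the two clients, while the per-group models can, at the price of fitting each group from its own data only.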
Hierarchical Models
• $\mathbb{E}[Y_i] = g^{-1}(X_i \beta + Z_i \gamma_i)$
• $\gamma_i \sim \mathcal{N}(0, \Sigma)$
• $i \in \{1, \dots, M\}$
• $\beta$, $\gamma_i$, and $\Sigma$ are unknown
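Simulation Results
• $i \in \{1, \dots, M\}$, $M = 100$
• $Y_i = X_i \beta + Z_i \gamma_i + \varepsilon_i$
• $\dim \beta = \dim \gamma_i = 11 \times 1$
• $\gamma_i \sim \mathcal{N}(0, \Sigma)$
• $\varepsilon_i \sim \mathcal{N}(0, \mathbb{I})$
• $\Sigma \sim \mathcal{W}^{-1}\left(\tfrac{1}{11} \mathbb{I}, 11\right)$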
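The simulation setup above could be generated along the following lines. This is only a sketch under stated assumptions: the slide does not give the group sizes, the design matrices, or the true $\beta$, so `n_i = 50`, standard normal designs, `Z = X`, and a random `beta` are all illustrative choices. The inverse-Wishart draw uses the fact that $\Sigma \sim \mathcal{W}^{-1}(\Psi, \nu)$ is equivalent to $\Sigma^{-1} \sim \mathcal{W}(\Psi^{-1}, \nu)$.

```python
import numpy as np

rng = np.random.default_rng(0)
M, q = 100, 11        # number of groups and dim(beta) = dim(gamma_i), from the slide
n_i = 50              # observations per group (assumption: not given on the slide)
nu = 11               # inverse-Wishart degrees of freedom, from the slide

# Sigma ~ W^{-1}(I/11, nu) is equivalent to Sigma^{-1} ~ Wishart(11 * I, nu).
# For integer nu, a Wishart(V, nu) draw is G^T G with the nu rows of G ~ N(0, V).
G = rng.normal(scale=np.sqrt(11.0), size=(nu, q))
Sigma = np.linalg.inv(G.T @ G)
Sigma = (Sigma + Sigma.T) / 2.0          # symmetrize away round-off

beta = rng.normal(size=q)                # assumed "true" fixed effects
groups = []
for _ in range(M):
    X = rng.normal(size=(n_i, q))        # assumed design matrix
    Z = X                                # assumed: random effects on the same columns
    gamma = rng.multivariate_normal(np.zeros(q), Sigma)
    eps = rng.normal(size=n_i)           # eps_i ~ N(0, I)
    groups.append((X, Z, X @ beta + Z @ gamma + eps))
```

Each entry of `groups` is a tuple `(X_i, Z_i, Y_i)` for one group, which is a convenient shape for per-group processing.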
Software Implementations
• R: lme4, nlme, mbest, …
• Python: statsmodels
• Julia: MixedModels
• Probabilistic programming
Likelihood Based Methods
Expectation-Maximization, Variational Approximations, or Likelihood Maximization require an $O(Nq^2)$ initial cost, then a series of iterations each costing $O(Mq^4)$, where
• $N$ is the total number of observations
• $q$ is the number of fixed and random effects
• $M$ is the number of groups
Moment Based Methods
Using the moment-based approach laid out in Perry (2015) and implemented in the mbest package, we can achieve a non-iterative fit in $O(Nq^2) + O(Mq^4)$ steps. This can be trivially spread across $K$ processors.
This improvement in computational efficiency is paid for by sacrificing some statistical efficiency.
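To make the cost comparison concrete, the two expressions can be evaluated for a hypothetical problem size. This is back-of-the-envelope arithmetic only: big-O notation hides constants, the even split over $K$ workers ignores communication overhead, and the values of `N`, `M`, `q`, and the iteration count are assumptions, not figures from the talk.

```python
def moment_cost(N, M, q, K=1):
    """N*q**2 + M*q**4 leading-order operations, divided evenly over K workers."""
    return (N * q**2 + M * q**4) / K

def likelihood_cost(N, M, q, iterations):
    """N*q**2 setup, then `iterations` passes costing M*q**4 each."""
    return N * q**2 + iterations * M * q**4

# Hypothetical problem size, not taken from the talk.
N, M, q = 1_000_000, 10_000, 5
single = moment_cost(N, M, q)             # one pass on one worker
spread = moment_cost(N, M, q, K=8)        # the same pass spread over 8 workers
iterated = likelihood_cost(N, M, q, iterations=20)
```

With these made-up numbers the iterative fit costs several times the single pass, and the gap widens as the iteration count or $q$ grows.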
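Moment Based Setup
• $Y_i = X_i \beta + Z_i \gamma_i + \varepsilon_i$ for $i \in \{1, \dots, M\}$
• $\gamma_i \sim \mathcal{N}(0, \Sigma)$
• $\varepsilon_i \sim \mathcal{N}(0, \phi \mathbb{I})$
• $\gamma_i$ and $\varepsilon_i$ independent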
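Moment Based Setup
• Define $F_i \equiv [X_i \; Z_i]$ and $\eta_i^T \equiv [\beta^T \; \gamma_i^T]$
• $Y_i = X_i \beta + Z_i \gamma_i + \varepsilon_i \implies Y_i = F_i \eta_i + \varepsilon_i$
• Estimate $\hat{\eta}_i = (F_i^T F_i)^{\dagger} F_i^T y_i$, for each $i \in \{1, \dots, M\}$
• Note: when $F_i$ is rank deficient, $\hat{\eta}_i$ is not an unbiased estimator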
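The per-group estimate on this slide is an ordinary least squares fit through a pseudoinverse. The sketch below uses NumPy's `np.linalg.pinv`; the function name `estimate_eta`, the `rcond` cutoff, and the toy data are assumptions made for illustration. The second example shows the rank-deficient case from the last bullet: when a column of $Z_i$ repeats a column of $X_i$, the pseudoinverse returns the minimum-norm solution, which differs from the true $\eta_i$.

```python
import numpy as np

def estimate_eta(X_i, Z_i, y_i, rcond=1e-10):
    """eta_hat_i = (F_i^T F_i)^+ F_i^T y_i with F_i = [X_i Z_i]."""
    F_i = np.hstack([X_i, Z_i])
    return np.linalg.pinv(F_i.T @ F_i, rcond=rcond) @ F_i.T @ y_i

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 2))
eta_true = np.array([1.0, 2.0, 3.0])     # (beta_1, beta_2, gamma_i), hypothetical

# Full-rank F_i: noise-free data recovers eta exactly.
Z_full = rng.normal(size=(20, 1))
eta_full = estimate_eta(X, Z_full, np.hstack([X, Z_full]) @ eta_true)

# Rank-deficient F_i: Z repeats the first column of X, so the pseudoinverse
# returns the minimum-norm solution instead of eta_true.
Z_dup = X[:, :1]
eta_dup = estimate_eta(X, Z_dup, np.hstack([X, Z_dup]) @ eta_true)
```

In the duplicated-column case the combined weight 1 + 3 = 4 on that column is split evenly, so `eta_dup` is (2, 2, 2) rather than (1, 2, 3), even though the fitted values are identical.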
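Moment Based Setup
• $F_i = U_i D_i V_i^T$
• $X_i = U_i D_i V_{i1}^T$
• $Z_i = U_i D_i V_{i2}^T$
• $\hat{\phi} = \hat{\sigma}^2 = \frac{1}{N - \rho} \sum_{i=1}^{M} \| Y_i - F_i \hat{\eta}_i \|^2$, where $\rho \equiv \sum_{i=1}^{M} r_i$ and $r_i$ is the rank of $F_i$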
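A sketch of the decomposition and the dispersion estimate, assuming a compact SVD truncated at the numerical rank. The helper names `group_svd` and `estimate_phi`, the rank tolerance, and the simulated data are illustrative assumptions. The code relies on the identity $F_i \hat{\eta}_i = U_i U_i^T y_i$, which follows from $\hat{\eta}_i = V_i D_i^{-1} U_i^T y_i$.

```python
import numpy as np

def group_svd(X_i, Z_i, tol=1e-10):
    """Compact SVD F_i = U_i D_i V_i^T; V_i is split by rows into
    V_i1 (the p rows matching X_i) and V_i2 (the q rows matching Z_i)."""
    p = X_i.shape[1]
    U, d, Vt = np.linalg.svd(np.hstack([X_i, Z_i]), full_matrices=False)
    r = int(np.sum(d > tol * d[0]))          # numerical rank r_i
    U, d, V = U[:, :r], d[:r], Vt[:r].T
    return U, d, V[:p], V[p:]

def estimate_phi(groups):
    """phi_hat = sum_i ||y_i - F_i eta_hat_i||^2 / (N - rho), rho = sum_i r_i."""
    rss, N, rho = 0.0, 0, 0
    for X_i, Z_i, y_i in groups:
        U, d, _, _ = group_svd(X_i, Z_i)
        resid = y_i - U @ (U.T @ y_i)        # F_i eta_hat_i = U_i U_i^T y_i
        rss += float(resid @ resid)
        N += y_i.shape[0]
        rho += d.shape[0]
    return rss / (N - rho)

# Simulated check with a known dispersion (all values are assumptions).
rng = np.random.default_rng(2)
phi_true = 2.0
groups = []
for _ in range(200):
    X = rng.normal(size=(30, 2))
    Z = rng.normal(size=(30, 2))
    noise = rng.normal(scale=np.sqrt(phi_true), size=30)
    y = X @ np.array([1.0, -1.0]) + Z @ rng.normal(size=2) + noise
    groups.append((X, Z, y))
phi_hat = estimate_phi(groups)
```

Because the residual only needs $U_i$, the per-group work is independent across groups and the three running totals are plain sums.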
Estimate $\beta$
• $\hat{\beta}_W = \Omega^{-1} \sum_{i=1}^{M} V_{i1} W_i V_i^T \hat{\eta}_i$ is an unbiased estimator for $\beta$, for $\Omega \equiv \sum_{i=1}^{M} V_{i1} W_i V_{i1}^T$
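The weighted estimator above can be sketched as two running sums over groups followed by one small linear solve. The slide leaves the weights $W_i$ unspecified, so this sketch assumes $W_i = \mathbb{I}$; the function name `estimate_beta` and the random-intercept toy data are also assumptions. It uses $V_i^T \hat{\eta}_i = D_i^{-1} U_i^T y_i$, which follows from the SVD of $F_i$.

```python
import numpy as np

def estimate_beta(groups, tol=1e-10):
    """beta_hat_W = Omega^{-1} sum_i V_i1 W_i V_i^T eta_hat_i, with W_i = I assumed."""
    p = groups[0][0].shape[1]
    Omega = np.zeros((p, p))
    rhs = np.zeros(p)
    for X_i, Z_i, y_i in groups:
        U, d, Vt = np.linalg.svd(np.hstack([X_i, Z_i]), full_matrices=False)
        r = int(np.sum(d > tol * d[0]))            # numerical rank r_i
        U, d, V1 = U[:, :r], d[:r], Vt[:r, :p].T   # V_i1 has shape p x r_i
        W = np.eye(r)                              # assumed weight matrix
        t = (U.T @ y_i) / d                        # V_i^T eta_hat_i = D_i^{-1} U_i^T y_i
        Omega += V1 @ W @ V1.T
        rhs += V1 @ W @ t
    return np.linalg.solve(Omega, rhs)

# Toy data: intercept plus one covariate, with a random intercept per group.
rng = np.random.default_rng(3)
beta_true = np.array([0.5, -2.0])
groups = []
for _ in range(500):
    X = np.column_stack([np.ones(10), rng.normal(size=10)])
    Z = X[:, :1]                                   # random intercept: F_i is rank deficient
    gamma = rng.normal(size=1)
    y = X @ beta_true + Z @ gamma + rng.normal(scale=0.5, size=10)
    groups.append((X, Z, y))
beta_hat = estimate_beta(groups)
```

Both `Omega` and `rhs` are sums of per-group terms, which is what makes the estimator easy to distribute: each worker can accumulate its own partial sums and only the small $p \times p$ and $p \times 1$ totals need to be combined.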
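Estimate $\Sigma$
Say we knew $\beta$, not just an estimate.
• $A \equiv \sum_{i=1}^{M} V_{i2} W_i (V_i^T \hat{\eta}_i - V_{i1}^T \beta)(V_i^T \hat{\eta}_i - V_{i1}^T \beta)^T W_i V_{i2}^T$
• $B \equiv \sum_{i=1}^{M} V_{i2} W_i D_i^{-2} W_i V_{i2}^T$
• $\Omega_2 \equiv \sum_{i=1}^{M} V_{i2} W_i V_{i2}^T \otimes V_{i2} W_i V_{i2}^T$
• $\operatorname{vec}\{\hat{\Sigma}_W\} = \Omega_2^{-1} \operatorname{vec}\{A - \hat{\phi} B\}$
• $\hat{\Sigma}_W$ is an unbiased estimator for $\Sigma$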
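The quantities $A$, $B$, and $\Omega_2$ above are again sums over groups, followed by one $q^2 \times q^2$ solve. The sketch below assumes $W_i = \mathbb{I}$ and takes $\beta$ and $\phi$ as given inputs; the function name `estimate_sigma` and the simulated data are illustrative assumptions. It uses column-major `vec`, for which $\operatorname{vec}(M \Sigma M) = (M \otimes M) \operatorname{vec}(\Sigma)$ when $M$ is symmetric.

```python
import numpy as np

def estimate_sigma(groups, beta, phi, tol=1e-10):
    """vec(Sigma_hat_W) = Omega_2^{-1} vec(A - phi * B), with W_i = I assumed."""
    p = beta.shape[0]
    q = groups[0][1].shape[1]
    A = np.zeros((q, q))
    B = np.zeros((q, q))
    Omega2 = np.zeros((q * q, q * q))
    for X_i, Z_i, y_i in groups:
        U, d, Vt = np.linalg.svd(np.hstack([X_i, Z_i]), full_matrices=False)
        r = int(np.sum(d > tol * d[0]))            # numerical rank r_i
        U, d, V = U[:, :r], d[:r], Vt[:r].T
        V1, V2 = V[:p], V[p:]
        W = np.eye(r)                              # assumed weight matrix
        resid = (U.T @ y_i) / d - V1.T @ beta      # V_i^T eta_hat_i - V_i1^T beta
        VW = V2 @ W
        A += VW @ np.outer(resid, resid) @ VW.T
        B += VW @ np.diag(d ** -2.0) @ VW.T
        M_i = VW @ V2.T                            # V_i2 W_i V_i2^T
        Omega2 += np.kron(M_i, M_i)
    rhs = (A - phi * B).reshape(-1, order="F")     # column-major vec
    return np.linalg.solve(Omega2, rhs).reshape((q, q), order="F")

# Simulated check with known beta, phi, and Sigma (all values are assumptions).
rng = np.random.default_rng(4)
beta_true = np.array([1.0, -1.0])
Sigma_true = np.array([[1.0, 0.3], [0.3, 0.5]])
groups = []
for _ in range(2000):
    X = rng.normal(size=(20, 2))
    Z = rng.normal(size=(20, 2))
    gamma = rng.multivariate_normal(np.zeros(2), Sigma_true)
    groups.append((X, Z, X @ beta_true + Z @ gamma + rng.normal(size=20)))
Sigma_hat = estimate_sigma(groups, beta_true, phi=1.0)
```

In practice $\beta$ and $\phi$ would be replaced by their estimates, which is the situation the next slide discusses.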
Estimate $\Sigma$
• $\hat{\Sigma}_W$ is an unbiased estimator for $\Sigma$ if you know $\beta$ a priori. We don’t …
• Instead, we use $\hat{\beta}_W$. This means $\hat{\Sigma}_W$ is not an unbiased estimator in practice.
• It can be shown the bias is usually negligible.
• We can project $\hat{\Sigma}_W$ to be positive semidefinite
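The slide does not say which projection is used. One common choice, shown here as an assumption rather than as the talk's method, is to symmetrize the estimate and clip its negative eigenvalues to zero, which gives the nearest positive semidefinite matrix in Frobenius norm. The function name `project_psd` is hypothetical.

```python
import numpy as np

def project_psd(S):
    """Nearest PSD matrix in Frobenius norm: symmetrize, then clip negative eigenvalues."""
    S = (S + S.T) / 2.0
    w, Q = np.linalg.eigh(S)
    return (Q * np.clip(w, 0.0, None)) @ Q.T       # Q diag(max(w, 0)) Q^T

# An indefinite 2 x 2 example with eigenvalues 3 and -1.
S = np.array([[1.0, 2.0], [2.0, 1.0]])
S_psd = project_psd(S)
```

For this example the negative eigenvalue is dropped and only the component along (1, 1) remains, so every entry of `S_psd` equals 1.5.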
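Estimate $\gamma_i \mid \hat{\eta}_i$
• Using a Gaussian approximation for $p(\hat{\eta}_i \mid \gamma_i)$ and $p(\gamma_i)$, compute the posterior distribution
– $p(\gamma_i \mid \hat{\eta}_i) \propto p(\hat{\eta}_i \mid \gamma_i) \, p(\gamma_i)$
• There exists a formula for $C_i(\Sigma)$ such that
– $\mathbb{E}(\gamma_i \mid y) = C_i V_{i2} (V_i^T \hat{\eta}_i - V_{i1}^T \beta)$
– $\operatorname{Var}(\gamma_i \mid y) = \phi C_i$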
That was a lot of math! Read "Fast moment-based estimation for hierarchical models" on arXiv, it’s a great paper!
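Moment Based Estimation Summary
• Estimate $\hat{\eta}_i$
• Use $\hat{\eta}_i$ to estimate $\hat{\phi}$
• Use $\hat{\eta}_i$ to estimate $\hat{\beta}_W$
• Use $\hat{\eta}_i$, $\hat{\phi}$, $\hat{\beta}_W$ to estimate $\hat{\Sigma}_W$
• Use $\hat{\eta}_i$, $\hat{\phi}$, $\hat{\beta}_W$, $\hat{\Sigma}_W$ to estimate $\gamma_i \mid \hat{\eta}_i$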
lme4 vs mbest
Memory Constrained For Large N, M
• Current software is memory constrained on a single machine
• My MacBook Pro (2015) starts to run out of memory for
– M = 100,000
– N = 10,000,000
Spark Implementation
Application
Logistic Model with
• N = 120,000,000
• M = 4,000,000
• p = q = 7
• 8 executors, 16 GB memory each
• Running time approximately 40 minutes
Limitations & Next Steps
Thank You
github.com/stitchfix/MomentMixedModels

Editor's Notes

  • #2: Hello, thank you all for attending my talk. Thank you Spark Summit for accepting this talk, I'm very happy to be here. My name is Kyle Schmaus, and I’m a data scientist at Stitch Fix. Today I’m going to be speaking about methods and software for fitting hierarchical models using moment-based estimation techniques.
  • #3: Here is a brief overview of my talk. First, I am going to start by telling you a little about Stitch Fix, the company I work for. Then I’ll introduce hierarchical models (or mixed effects models as they are often called). I'm not going to assume any past experience with this model type. Of course, if you have used these models before, I'm happy you're at this talk. Next, I'm going to discuss the relative benefits of fitting these models using moment-based estimation techniques. I'll close with a brief survey of the software designed to fit these models, including the Spark package we recently open sourced at Stitch Fix. A bit of a warning: this is definitely going to be a data science and methods heavy talk. But in learning about the methods, you'll see why the Spark engine is the perfect piece of technology to implement them.
  • #4: Stitch Fix is an online styling service that delivers a personalized shopping experience for our customers, using a combination of machine learning and expert human judgement.
  • #5: When a customer signs up, they fill out a style profile providing us with information about their fashion tastes, needs, and budget.
  • #6: We leverage this information, using algorithms and human judgement, to select a combination of 5 items to send to the client.
  • #7: Clients recieve a shipment in the mail (we call it a fix), and can try on their new clothes in the privacy of their own home. They can keep what they want, and send back what they don’t. When checking out, clients provide additional information about their fix, telling us what they liked, how things fit, and how we can improve in the future. So with every interaction, we learn a bit more about them and our inventory.
  • #8: I work on the styling recommendations team, where we are responsible for the companies core item-to-user recomendations algorithms. We use the output of a variety of models to inform and power the software utilized by our expert human stylists. They in-turn, curate the results of our algorithms, ultimately making the final selection decisions on what to send to our clients. We believe this "human in the loop" model for recommendation systems is more powerful and provides a better experience, than an 100% AI or 100% human powered alternative.
  • #9: The Algorithms team at Stitch Fix is some 80 strong, including Data Scientists and Data Engineers working on a variety of capabilities core to the business. If you're interested in learning more, and I hope you are, please check out our blog at multithreaded.stitchfix.com. (Hopefully you like the pun in the blog name).
  • #10: I want to motivate the use of hierarchical models with a simple example. I’ll use a Stitch Fix use case, where we want to predict the probability of selling an item to a client conditioned on sending it. Obviously there are a variety of approaches to this problem; almost any supervised learning method is arguably appropriate. For simplicity's sake, let's imagine we want to use a logistic regression model, where our only source of data is the descriptive attributes of each item.
  • #11: For instance, we can differentiate items by their item type, their size, the material they are constructed from. We can analyze how they fit, and how they are cut. We can utilize their image information, including their color and pattern, and so on.
  • #12: The important idea is that, for each item in our catalog, we can develop a fixed attribute vector representing it. Using historical sales logs, we can create a design matrix with one row per item previously sent, where each row is that item's attribute vector. We can also create a response vector of 1s and 0s, representing sale and non-sale events. From there, we can use any logistic regression software, including the implementation in Spark MLlib, to learn feature weights. This approach is quite simple, and obviously open to criticism. As a data scientist, my main criticism would be that it's not personalized to individual clients. All of our input data, and the corresponding weights, are tied to item information, so predictions from one user to the next will be identical.
  • #13: This is obviously a problem. In any recommender system, users will have different tastes, different preferences, and different needs. Our models have to be able to take these differences into account.
  • #14: So here's a modification to our first approach that can capture client variation. Instead of fitting one global model for all our clients, we could fit several models, one per cluster of clients, potentially even one per individual client. In the setup I'm imagining, the models are identically parameterized, meaning they are fit using the same features. All we have to do is partition our data, and call the model fitting function on each partition. This certainly has the capability to learn individual user preferences, but it’s potentially very inefficient. If one of our feature weights does not vary by client, this approach requires us to noisily relearn it for each model. Also, for users or clusters of users without much data, their individual model will necessarily be imprecise.
  • #15: So I've presented two more or less bad approaches to this modeling problem; this next one is a bit more useful. A hierarchical model is a natural, data-driven middle ground between using a global, shared model, and a collection of individually fit models, one per data group. For a generalized linear mixed effects model, we have M grouping levels, representing some natural partitioning of our data. We fit a model with two types of coefficient vectors. The beta coefficient vector is shared across each grouping level, while the gamma vectors are distinct for each level. The gamma coefficients are assumed to be drawn i.i.d. from a multivariate Gaussian, centered at 0 with covariance matrix Sigma. <Talk about the model on the slide, similar to both approaches.> <Talk about Z>. <Talk about fixed and random effects> This is the simplest sort of hierarchical model; we can generalize to multiple partitions of our data, either nested partitions or arbitrary partitions. Importantly, Sigma is not a tuning parameter; we do not set it ahead of time. Instead, we learn an estimate of it, alongside beta and the gammas. This approach is similar to ridge regression. In at least one way it is superior to a standard ridge model, as we can have different strengths of regularization for different features. The cost is that the model is not as general as a ridge model.
  • #16: Here is a toy simulation comparing the three methods we discussed. I’ve switched to a linear model instead of a logistic one, but the conclusions should be similar. For 100 groups, I've randomly generated a design matrix of 10 columns with entries drawn on the unit interval, plus an intercept column. I've generated random beta, gamma, and Sigma coefficients as well. We compare the performance of the three approaches I've introduced, using Root Mean Squared Error on a held-out test dataset. We repeat this process several times, varying the number of observations per grouping level. <discuss plot, log scale horizontal axis> As you can see, the shared model's performance is essentially constant with respect to the number of observations. There are only 11 parameters in the shared model, so even with 10 observations per group, we have 1000 observations in total, which is plenty for a simple model. The individually fit models do much worse for extremely low numbers of observations per group, but do substantially better as we increase the number of observations. The hierarchical model dominates both approaches. Its outputs will interpolate those of the other two models: for groups with little data we will provide outputs close to the global average, and for groups with ample observations, we will converge to the same result as an individually fit model. For our toy setup, once we have about 1000 observations per group, there really is no difference between the individual model and the hierarchical model. As the number of observations increases, this will be true in general, but the "point of convergence" is problem dependent. ...
  • #17: There are a variety of software implementations available for hierarchical models; here are a few that I've used. Probably the most commonly used implementation is the R package lme4. There are also implementations in Python and Julia. The R package mbest is relevant for us today, because it implements the moment-based estimation techniques we'll discuss shortly. If you aren't working on #BigData problems, this is the implementation I suggest.
  • #18: Finally, you can fit just about any hierarchical model in your probabilistic programming language of choice.
  • #19: The most common method for fitting hierarchical models is some form of likelihood maximization. When it comes to computational complexity, these approaches require an initial cost on the order of O(Nq^2), and then a series of iterations on the order of O(Mq^4), WHERE N = ... For industrial scale data, this can be slow, sometimes prohibitively slow. It's specifically the iterative step in the algorithm that can make this approach impractical for larger data sets. Interesting models can take hours or days to fit.
  • #20: Recent developments have produced moment-based approaches that do not require an iterative step. They still require an O(Nq^2) step, but then only have one O(Mq^4) step. What's more, this can be trivially spread across K processors.
  • #21: This improvement in computational efficiency is paid for by sacrificing some statistical efficiency. In practice, the degradation in model accuracy is negligible to non-existent. What's more, the moment-based procedure is asymptotically consistent, and has some well understood theoretical guarantees.
  • #22: So I warned you at the beginning of the talk - here's where I introduce a little math. I couldn't help myself, hopefully my fellow data scientists in the crowd will follow. I encourage you all to read the paper I'll reference after this talk – it’s really a great read. That said, I wanted to share the basic approach, for one, because it’s not too complicated, and for two, I think it’s just pretty cool. I’m going to focus only on linear models here, but the approach extends to any generalized linear model we’d be interested in fitting. <describe model> so again, we have M grouping levels ...
  • #24: Next we perform a compact singular value decomposition of the design matrix. D_i will have dimension equal to F_i's rank. We can define V_i1 and V_i2 as the first p and the last q columns of V_i, where p is the length of beta and q the length of gamma. Using this setup, an unbiased estimator of the dispersion of our model is defined as such. No guarantees it's the best unbiased estimator, but it is unbiased.
  • #26: Each eta is a combination of beta and a gamma, so in the center we have a covariance-like term where we are subtracting a scaled beta from a scaled eta. The result should be close to our scaled gamma.
  • #28: Speaker Notes: Adlib.
  • #29: No iterative steps
  • #31: collect
  • #32: Speaker Notes: Adlib
  • #33: Speaker Notes: Adlib
  • #34: broadcast phi, beta, Sigma
  • #35: We compare the performance of MLE vs MOM where our reference points are two R packages: lme4, which is written in R and C++, and mbest, which is 100% R. We use the simulation problem we explored earlier. While lme4 does outperform the mbest package, the difference is negligible. The time difference is staggering though. In practical problems, the speed-up can easily be 100x or greater.
  • #36: Speaker Notes: In practice, computation is usually not the bottleneck for the moment-based approach. Instead it’s memory. I don’t have the big O notation for the memory … Adlib
  • #37: So we ported the algorithm to Spark! As I described in the methods description, the algorithmic approach works very nicely in Spark.
  • #38: So we ported the algorithm to Spark! As I described in the methods description, the algorithmic approach works very nicely in Spark.
  • #41: - Only linear and logistic regression. We will be adding the standard distribution families. - One level only; nested levels should come. - Model introspection is fairly barebones, with no methods for generating credible intervals, etc. Mixed models are useful in A/B testing, so I'm hoping to add this functionality soon.
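
To make the non-iterative idea from the notes for slides #15 through #34 concrete, here is a minimal sketch in plain Python. It assumes the simplest possible case: a linear model with a single random intercept per group and equally sized groups, y_ij = beta + gamma_i + eps_ij. It uses the classical one-way ANOVA moment estimator, which is a much simpler stand-in and not the estimator from Perry (2015), mbest, or the Spark package. The function names (`group_stats`, `fit_random_intercept`, `group_effect`) and the toy data are hypothetical, chosen only for illustration.

```python
def group_stats(ys):
    """Per-group summary: (count, mean, within-group sum of squares)."""
    n = len(ys)
    ybar = sum(ys) / n
    ss = sum((y - ybar) ** 2 for y in ys)
    return n, ybar, ss


def fit_random_intercept(groups):
    """Moment fit of y_ij = beta + gamma_i + eps_ij for equally sized groups.

    Returns (beta, phi, sigma2): the shared intercept, the noise variance
    (dispersion), and the variance of the group effects gamma_i.
    """
    # "Map" step: each group is summarised independently of the others.
    stats = [group_stats(ys) for ys in groups.values()]
    m = len(stats)
    n = stats[0][0]
    if m < 2 or n < 2 or any(s[0] != n for s in stats):
        raise ValueError("sketch assumes >= 2 groups of equal size >= 2")

    # "Collect" step: combine the small per-group summaries in one pass.
    beta = sum(s[1] for s in stats) / m
    phi = sum(s[2] for s in stats) / (m * (n - 1))
    between = sum((s[1] - beta) ** 2 for s in stats) / (m - 1)
    # E[between] = sigma2 + phi / n, so subtract the noise part; clip at zero.
    sigma2 = max(between - phi / n, 0.0)
    return beta, phi, sigma2


def group_effect(ys, beta, phi, sigma2):
    """Shrunken estimate of gamma_i, given the shared (broadcast) estimates."""
    n, ybar, _ = group_stats(ys)
    if sigma2 == 0.0:
        return 0.0  # no between-group variance: fall back to the shared model
    weight = sigma2 / (sigma2 + phi / n)  # closer to 1 as the group has more data
    return weight * (ybar - beta)


# Hypothetical toy data: three groups with two observations each.
groups = {"a": [1.0, 3.0], "b": [5.0, 7.0], "c": [9.0, 11.0]}
beta, phi, sigma2 = fit_random_intercept(groups)
effects = {k: group_effect(ys, beta, phi, sigma2) for k, ys in groups.items()}
```

Each group is touched once to build its summary, the summaries are combined once, and the shared estimates are then reused to compute every group's effect, so there is no iteration. The shrinkage weight is what lets the hierarchical fit interpolate between the shared model and the individually fit models as the amount of data per group grows.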