Predicting Pizza in Chinatown: An Intro to Multilevel Regression
Harlan D. Harris, PhD
Jared P. Lander, MA
NYC Predictive Analytics Meetup
October 14, 2010
1. How do cost and fuel type affect pizza quality?
2. How do those factors vary by neighborhood?
Linear Regression (OLS)
$$\text{rating}_i = \beta_0 + \beta_{\text{price}} \cdot \text{price}_i + \varepsilon_i$$
Find the β's that minimize $\sum_i \varepsilon_i^2$.
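To make "find the β's that minimize the sum of squared errors" concrete, here is a minimal sketch in R. The data are simulated (the true intercept 1.0 and slope 0.6 are made-up values, not taken from the talk), and the normal-equations line is only there to show that lm() returns the least-squares solution.

```r
# Simulated (made-up) data: rating is a linear function of price plus noise.
set.seed(1)
price  <- runif(50, min = 1.25, max = 5.25)
rating <- 1.0 + 0.6 * price + rnorm(50, sd = 0.5)

# Least-squares fit with lm().
fit <- lm(rating ~ price)

# The same coefficients from the normal equations, (X'X)^{-1} X'y.
X        <- cbind(1, price)
beta_hat <- solve(t(X) %*% X, t(X) %*% rating)

print(coef(fit))
print(drop(beta_hat))

# Sum of squared residuals: the quantity OLS minimizes.
rss <- sum(residuals(fit)^2)

# Moving the intercept away from the fitted value can only increase it.
rss_perturbed <- sum((rating - X %*% (beta_hat + c(0.1, 0)))^2)
print(c(rss = rss, rss_perturbed = rss_perturbed))
```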
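Multiple Regression
$$\text{rating}_i = \beta_{\text{intercept}} \cdot 1 + \beta_{\text{price}} \cdot \text{price}_i + \beta_{\text{oven=wood}} \cdot I(\text{oven}_i = \text{wood}) + \beta_{\text{oven=coal}} \cdot I(\text{oven}_i = \text{coal}) + \text{error}_i$$
Goal: find the betas (coefficients) that minimize $\sum_i \text{error}_i^2$.
3 types of oven = 2 coefficients (gas is the reference level).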
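Multiple Regression (OLS)
$$\text{rating}_i = \beta_0 + \beta_{\text{price}} \cdot \text{price}_i + \beta_{\text{wood}} \cdot I(\text{oven}_i = \text{"wood"}) + \beta_{\text{coal}} \cdot I(\text{oven}_i = \text{"coal"}) + \varepsilon_i$$
Find the β's that minimize $\sum_i \varepsilon_i^2$.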
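The indicator terms I(oven = "wood") and I(oven = "coal") are what R builds automatically from a factor. The sketch below uses a small hypothetical data set (five invented pizzerias) to show the resulting design matrix: three oven types give two indicator columns, with the first factor level (gas) as the reference.

```r
# Hypothetical toy data with three oven types.
oven  <- factor(c("gas", "wood", "coal", "gas", "wood"),
                levels = c("gas", "wood", "coal"))  # first level is the reference
price <- c(2.00, 1.75, 3.50, 2.50, 3.00)

# Design matrix for a model of the form rating ~ price + oven.
X <- model.matrix(~ price + oven)
print(X)

# One intercept column, one price column, and two oven indicators.
print(colnames(X))
```

Rows for gas ovens have zeros in both indicator columns, so their predicted rating uses only the intercept and the price coefficient.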
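Multiple Regression (OLS) with Interactions
$$\text{rating}_i = \beta_0 + \beta_{\text{price}} \cdot \text{price}_i + \beta_{\text{wood}} \cdot I(\text{oven}_i = \text{"wood"}) + \beta_{\text{wood,price}} \cdot \text{price}_i \cdot I(\text{oven}_i = \text{"wood"}) + \beta_{\text{coal}} \cdot I(\text{oven}_i = \text{"coal"}) + \beta_{\text{coal,price}} \cdot \text{price}_i \cdot I(\text{oven}_i = \text{"coal"}) + \varepsilon_i$$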
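Plugging each oven type into the interaction model shows what the extra coefficients do. The worked step below uses the same symbols as the slide and is only a restatement of that equation: each oven type gets its own intercept and its own price slope, expressed as offsets from the gas (reference) line.

```latex
\begin{aligned}
\text{gas:}  \quad & \text{rating}_i = \beta_0 + \beta_{\text{price}}\,\text{price}_i + \varepsilon_i \\
\text{wood:} \quad & \text{rating}_i = (\beta_0 + \beta_{\text{wood}}) + (\beta_{\text{price}} + \beta_{\text{wood,price}})\,\text{price}_i + \varepsilon_i \\
\text{coal:} \quad & \text{rating}_i = (\beta_0 + \beta_{\text{coal}}) + (\beta_{\text{price}} + \beta_{\text{coal,price}})\,\text{price}_i + \varepsilon_i
\end{aligned}
```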
Groups
Examples (group / observation):
- teachers / test scores
- states / poll results
- neighborhoods / pizza ratings
Full Pooling (ignore groups)
$$\text{rating}_i = \beta_0 + \beta_{\text{price}} \cdot \text{price}_i + \varepsilon_i$$
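No Pooling (groups as factors)
$$\text{rating}_i = \beta_0 + \beta_{\text{price}} \cdot \text{price}_i + \beta_{B} \cdot I(\text{group}_i = \text{"B"}) + \beta_{B,\text{price}} \cdot \text{price}_i \cdot I(\text{group}_i = \text{"B"}) + \beta_{C} \cdot I(\text{group}_i = \text{"C"}) + \beta_{C,\text{price}} \cdot \text{price}_i \cdot I(\text{group}_i = \text{"C"}) + \varepsilon_i$$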
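One way to see what "no pooling" means: when every coefficient is interacted with the group, the fitted line for each group is the same one you would get from a separate regression on that group alone. The sketch below checks this on simulated data; the three groups A, B, C and their true intercepts and slopes are invented for illustration.

```r
# Simulated data for three hypothetical groups with different lines.
set.seed(2)
n_per_group <- 30
group <- factor(rep(c("A", "B", "C"), each = n_per_group))
price <- runif(3 * n_per_group, 1.25, 5.25)
true_intercept <- c(A = 1.0, B = 2.0, C = 0.5)
true_slope     <- c(A = 0.5, B = 0.1, C = 0.8)
g <- as.character(group)
rating <- unname(true_intercept[g] + true_slope[g] * price) +
  rnorm(3 * n_per_group, sd = 0.4)
d <- data.frame(rating, price, group)

# No-pooling model: group-specific intercepts and slopes via interactions.
fit_np <- lm(rating ~ price * group, data = d)
b <- coef(fit_np)

# Intercept and slope implied for group B (reference group A plus offsets).
b_np_B <- c(b[["(Intercept)"]] + b[["groupB"]],
            b[["price"]] + b[["price:groupB"]])

# Separate regression using only group B's rows.
fit_B <- lm(rating ~ price, data = subset(d, group == "B"))

print(b_np_B)
print(unname(coef(fit_B)))
```

The coefficients agree; the two approaches differ only in that the single no-pooling model shares one residual variance across groups.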
Pizzas

Name        Rating      $/Slice       Fuel Type      Neighborhood
Rosario’s   3.5         2.00          Gas            Lower East Side
Ray’s       2.8         2.50          Gas            Chinatown
Joe’s       3.3         1.75          Wood           East Village
Pomodoro    3.8         3.50          Coal           SoHo
(role)      Response    Continuous    Categorical    Group
Data Summary in R
    # stringsAsFactors = TRUE is needed in R >= 4.0 to get the level counts shown below
    > za.df <- read.csv("Fake Pizza Data.csv", stringsAsFactors = TRUE)
    > summary(za.df)
         Rating       CostPerSlice   HeatSource      Neighborhood
     Min.   :0.030   Min.   :1.250   Coal: 17    Chinatown  :14
     1st Qu.:1.445   1st Qu.:2.000   Gas :158    EVillage   :48
     Median :4.020   Median :2.500   Wood: 25    LES        :35
     Mean   :3.222   Mean   :2.584               LittleItaly:43
     3rd Qu.:4.843   3rd Qu.:3.250               SoHo       :60
     Max.   :5.000   Max.   :5.250
http://github.com/HarlanH/nyc-pa-meetup-multilevel-pizza
Viewing the Data in R
    > plot(za.df)
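Visualize
    library(ggplot2)
    # Rating vs. cost, colored by heat source, one panel per neighborhood,
    # with a single black least-squares line per panel.
    ggplot(za.df, aes(CostPerSlice, Rating, color=HeatSource)) +
      geom_point() +
      facet_wrap(~ Neighborhood) +
      geom_smooth(aes(color=NULL), color='black', method='lm',
                  se=FALSE, size=2)  # in ggplot2 >= 3.4, use linewidth=2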
Multiple Regression in R
    > lm.full.main <- lm(Rating ~ CostPerSlice + HeatSource, data=za.df)
    > plotCoef(lm.full.main)
http://www.jaredlander.com/code/plotCoef.r
Full-Pooling: Include Interaction
    > lm.full.int <- lm(Rating ~ CostPerSlice * HeatSource, data=za.df)
    > plotCoef(lm.full.int)
Visualize the Fit (Full-Pooling)
    > lm.full.int <- lm(Rating ~ CostPerSlice * HeatSource, data=za.df)
 
No Pooling Model
    lm(Rating ~ CostPerSlice * Neighborhood + HeatSource, data=za.df)
Visualize the Fit (No-Pooling)
    lm(Rating ~ CostPerSlice * Neighborhood + HeatSource, data=za.df)
Evaluation of Fitted Model
- Cross-Validation Error
- Adjusted R²
- AIC
- BIC
- RSS
- Tests for Normal Residuals
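Each of the measures listed on this slide can be computed in a line or two of base R. The sketch below does so for a simple lm() fit on simulated data; the data, the choice of 5 folds, and the use of the Shapiro-Wilk test as the normality check are assumptions made for illustration, not choices taken from the talk.

```r
# Simulated (made-up) data and a simple linear fit.
set.seed(3)
n      <- 100
price  <- runif(n, 1.25, 5.25)
rating <- 1 + 0.6 * price + rnorm(n, sd = 0.5)
d      <- data.frame(rating, price)
fit    <- lm(rating ~ price, data = d)

# In-sample measures.
rss     <- sum(residuals(fit)^2)          # residual sum of squares
adj_r2  <- summary(fit)$adj.r.squared     # adjusted R-squared
aic     <- AIC(fit)
bic     <- BIC(fit)
shapiro <- shapiro.test(residuals(fit))   # one common test for normal residuals

# 5-fold cross-validation error (root mean squared prediction error).
k    <- 5
fold <- sample(rep(1:k, length.out = n))  # random fold assignment
cv_sq_err <- numeric(n)
for (f in 1:k) {
  train <- d[fold != f, ]
  test  <- d[fold == f, ]
  fit_f <- lm(rating ~ price, data = train)
  cv_sq_err[fold == f] <- (test$rating - predict(fit_f, newdata = test))^2
}
cv_rmse <- sqrt(mean(cv_sq_err))

print(c(rss = rss, adj_r2 = adj_r2, aic = aic, bic = bic, cv_rmse = cv_rmse))
print(shapiro$p.value)
```

Cross-validation is the only measure here that evaluates predictions on data the model was not fitted to, which is why it is useful when comparing models of different complexity.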
Use Natural Groupings
- Cluster Sampling
- Intercluster Differences
- Intracluster Similarities
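Multilevel Characteristics
- The model gravitates toward big groups.
- Small groups gravitate toward the model.
- Works best when groups are similar to each other.

Model the effects of the groups:
$$y_i = \text{Intercept}_{j[i]} + \text{Slope}_{j[i]} \cdot x_i + \text{noise}_i$$
$$\text{Intercept}_j = \text{Intercept}_\alpha + \text{Slope}_\alpha \cdot u_j + \text{noise}^{\alpha}_j$$
$$\text{Slope}_j = \text{Intercept}_\beta + \text{Slope}_\beta \cdot u_j + \text{noise}^{\beta}_j$$
Here j[i] is the group containing observation i, x_i is an individual-level predictor, and u_j is a group-level predictor. With no group-level predictor, the Slope_α and Slope_β terms drop out and each group's intercept and slope is an overall value plus group-level noise.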
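The two "gravitate" statements can be illustrated with the standard precision-weighted approximation for a varying-intercept model with no predictors (see Gelman and Hill, listed under Resources): a group's partially pooled mean is a weighted average of its own sample mean and the overall mean, with the weight on the group's data growing with its sample size. The sketch below uses invented numbers for the overall mean and the two standard deviations; `partial_pool` is a hypothetical helper, not a function from any package.

```r
# Hypothetical values, chosen only for illustration.
ybar_all <- 3.2   # overall mean rating
sigma_y  <- 1.5   # within-group standard deviation
sigma_a  <- 0.5   # between-group standard deviation of the group means

# Partially pooled estimate of a group mean:
# precision-weighted average of the group's mean and the overall mean.
partial_pool <- function(ybar_j, n_j) {
  w_data  <- n_j / sigma_y^2   # precision of the group's own sample mean
  w_group <- 1 / sigma_a^2     # precision of the group-level distribution
  (w_data * ybar_j + w_group * ybar_all) / (w_data + w_group)
}

# Two groups with the same observed mean but very different sizes.
small <- partial_pool(ybar_j = 4.5, n_j = 3)
large <- partial_pool(ybar_j = 4.5, n_j = 60)
print(c(small_group = small, large_group = large))
```

The small group's estimate is pulled most of the way toward the overall mean, while the large group's estimate stays close to its own data.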
Multi-Names for Multilevel Models
- Multilevel
- Hierarchical
- Mixed-Effects
- Bayesian
- Partial-Pooling
Multi-Names for Multilevel Models
(1) Fixed effects are constant across individuals, and random effects vary. For example, in a growth study, a model with random intercepts a_i and fixed slope b corresponds to parallel lines for different individuals i, or the model y_it = a_i + b t. Kreft and De Leeuw (1998) thus distinguish between fixed and random coefficients.
(2) Effects are fixed if they are interesting in themselves or random if there is interest in the underlying population. Searle, Casella, and McCulloch (1992, Section 1.4) explore this distinction in depth.
(3) "When a sample exhausts the population, the corresponding variable is fixed; when the sample is a small (i.e., negligible) part of the population the corresponding variable is random." (Green and Tukey, 1960)
(4) "If an effect is assumed to be a realized value of a random variable, it is called a random effect." (LaMotte, 1983)
(5) Fixed effects are estimated using least squares (or, more generally, maximum likelihood) and random effects are estimated with shrinkage ("linear unbiased prediction" in the terminology of Robinson, 1991). This definition is standard in the multilevel modeling literature (see, for example, Snijders and Bosker, 1999, Section 4.2) and in econometrics.
http://www.stat.columbia.edu/~cook/movabletype/archives/2005/01/why_i_dont_use.html
Bayesian Interpretation
- Everything has a distribution (including the groups).
- The group-level model is prior information for the individual-level coefficients.
- The group-level model has an assumed-normal prior.
(Multilevel models can be fit with Bayesian methods, or with simpler/faster/easier approximations.)
R Options
- lme4::lmer()
- nlme::lme()
- MCMCglmm()
- BUGS
- Others/niche approaches…
Back to the Pizza
- Model the overall pattern among neighborhoods.
- The natural clustering of pizzerias in neighborhoods adds information.
- Neighborhoods with many or few pizzerias:
  - Many: trust the data, as in the no-pooling model.
  - Few: trust the overall patterns, as in the full-pooling model.
- Use neighborhoods as the natural grouping.
Multilevel Pizza
- 5 slope coefficients and 5 intercept coefficients, one of each per neighborhood.
- Slopes and intercepts are assumed to have a Gaussian distribution.
- Ideally, all 5 slopes could be described with 2 numbers (mean and variance).
- Neighborhoods with little data don’t get the freedom to set their own coefficients; they get pulled toward the overall slope or intercept.
R syntax
    lm.me.cost2 <- lmer(Rating ~ HeatSource +
                        (1 + CostPerSlice | Neighborhood), data=za.df)
Results (Partial-Pooling)
    lm.me.cost2 <- lmer(Rating ~ HeatSource +
                        (1 + CostPerSlice | Neighborhood), data=za.df)
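For readers without the original CSV, the following sketch fits a model of the same shape with lme4 on simulated data. The column names and neighborhood sizes follow the data summary slide; every simulated value (true intercepts, slopes, heat-source effects, noise level) is invented. One deliberate variation from the slide's formula: a fixed CostPerSlice term is included so that the neighborhood slopes vary around an estimated overall slope. In lme4, a random slope without the matching fixed effect is centered at zero.

```r
library(lme4)

# Simulated stand-in for the pizza data (all values are made up).
set.seed(4)
hoods <- c("Chinatown", "EVillage", "LES", "LittleItaly", "SoHo")
n_j   <- c(14, 48, 35, 43, 60)            # group sizes from the summary slide
hood  <- factor(rep(hoods, times = n_j))
n     <- length(hood)
cost  <- runif(n, 1.25, 5.25)
heat  <- factor(sample(c("Gas", "Wood", "Coal"), n, replace = TRUE,
                       prob = c(0.79, 0.125, 0.085)),
                levels = c("Gas", "Wood", "Coal"))
a <- rnorm(5, mean = 2.0, sd = 0.8)       # true neighborhood intercepts
b <- rnorm(5, mean = 0.4, sd = 0.3)       # true neighborhood slopes
rating <- a[as.integer(hood)] + b[as.integer(hood)] * cost +
  0.3 * (heat == "Wood") + 0.5 * (heat == "Coal") + rnorm(n, sd = 0.7)
sim.df <- data.frame(Rating = rating, CostPerSlice = cost,
                     HeatSource = heat, Neighborhood = hood)

# Varying intercept and varying slope by neighborhood.
fit <- lmer(Rating ~ CostPerSlice + HeatSource +
              (1 + CostPerSlice | Neighborhood), data = sim.df)

# Partially pooled coefficients, one row per neighborhood.
per_hood <- coef(fit)$Neighborhood
print(per_hood)

# Estimated group-level variances and covariance.
print(VarCorr(fit))
```

With only five groups the group-level variance estimates are imprecise, and lme4 may report a singular fit or a convergence warning on data like these.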
Predicting a New Pizzeria
- Neighborhood: Chinatown
- Cost: $4.20
- Fuel: Wood
Uncertainty in Prediction
- Fitted coefficients are uncertain: arm::sim()
- Model error term (pseudocode; note that rnorm() takes a standard deviation, not a variance):
      rnorm(1, model_matrix %*% sim$Neighborhood[ , 'Chinatown', ], sqrt(variance))
- New neighborhood: model the possible coefficients (mvrnorm() is in MASS and needs one zero mean per coefficient):
      mvrnorm(1, c(0, 0), VarCorr(model)$Neighborhood)
http://github.com/HarlanH/nyc-pa-meetup-multilevel-pizza
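The "new neighborhood" case can be sketched as a small simulation: draw a plausible intercept and slope deviation for an unseen neighborhood from the group-level distribution, then add observation noise. All numbers below (fixed effects, covariance matrix, residual standard deviation) are hypothetical placeholders rather than estimates from the talk's model, and the sketch leaves out uncertainty in the fitted coefficients themselves, which is the part arm::sim() addresses.

```r
library(MASS)  # for mvrnorm()

set.seed(5)

# Hypothetical fitted quantities (placeholders, not from the talk's model).
fixed_intercept <- 2.0    # overall intercept
fixed_slope     <- 0.4    # overall slope on cost per slice
wood_effect     <- 0.3    # offset for wood ovens relative to gas
Sigma <- matrix(c( 0.50, -0.05,
                  -0.05,  0.04), nrow = 2)  # covariance of (intercept, slope) deviations
sigma_y <- 0.7            # residual standard deviation

# New pizzeria: $4.20 per slice, wood oven, in a neighborhood not in the data.
cost   <- 4.20
n_sims <- 1000

# Step 1: draw neighborhood-level deviations for intercept and slope.
dev <- mvrnorm(n_sims, mu = c(0, 0), Sigma = Sigma)

# Step 2: expected rating under each simulated neighborhood.
mean_rating <- (fixed_intercept + dev[, 1]) +
  (fixed_slope + dev[, 2]) * cost + wood_effect

# Step 3: add observation-level noise (sd, not variance).
pred <- rnorm(n_sims, mean = mean_rating, sd = sigma_y)

# Predictive interval for the new pizzeria's rating.
print(quantile(pred, c(0.025, 0.5, 0.975)))
```

For a pizzeria in a neighborhood that is already in the data, such as Chinatown, step 1 would instead use that neighborhood's own estimated coefficients, so the interval would typically be narrower.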
Other Examples
- Red State Blue State
- Tobacco Usage
- Diabetes Prevalence
- Insufficient Fruit and Vegetable Intake
- Clean Drinking Water
Steps to Multilevel Models
- Full-Pooling Model
- No-Pooling Model
- Separate Models
- Two-Step Analysis
How Many Groups? How Many Observations?
- As few as one or two groups
- Even two observations per group
- Can have many groups with just one observation
Larger Datasets
- Andy Gelman: “The Blessing of Dimensionality”
- More data: add complexity
- Because you can
Resources
- Gelman and Hill (ARM)
- Pinheiro & Bates
- Snijders and Bosker
- R-SIG-Mixed-Models (http://glmm.wikidot.com/faq)
- (SAS/SPSS)
Thanks!
An Introduction to Multilevel Regression Modeling for Prediction

  • 1. Harlan D. Harris, PhD Jared P. Lander, MA NYC Predictive Analytics Meetup October 14, 2010 Predicting Pizza in Chinatown: An Intro to Multilevel Regression
  • 2. 1. How do cost and  fuel type affect pizza quality? 2. How do those factors vary by neighborhood ?
  • 3. Linear Regression (OLS) rating i = β 0 + β price *price i + ε i   find β’s to minimize Σ ε i 2  
  • 4. Linear Regression (OLS) rating i = β p + β price *price i + ε i   find β ’s to minimize Σ ε i 2  
  • 5. Multiple Regression rating i = beta[intercept] * 1 +               beta[price] * price i +               beta[oven=wood] * I(oven i =wood) +               beta[oven=coal] * I(oven i =coal) +               error i Goal: find betas/coefficients that minimize Σ i error i 2 3 types of oven =  2 coefficients (gas is reference)
  • 6. Multiple Regression (OLS) rating i = β 0 + β price *price i +   β wood *I(oven i = &quot;wood&quot;) + β coal *I(oven i = &quot;coal&quot;) +     ε i   find β’ s to minimize Σ ε i 2    
  • 7. Multiple Regression (OLS) with Interactions rating i = β 0 + β price *price i +   β wood *I(oven i = &quot;wood&quot;) + β wood,price *price i * I(oven i = &quot;wood&quot;) +   β coal *I(oven i = &quot;coal&quot;) +     β coal,price *price i *          I(oven i = &quot;coal&quot;) +   ε i    
  • 8. Groups Examples: teachers / test scores states / poll results pizza ratings / neighborhoods
  • 9. Full Pooling (ignore groups) Examples: teachers / test scores states / poll results pizza ratings /                 neighborhoods rating i = β 0 + β price *price i + ε i
  • 10. No Pooling (groups as factors) rating i = β 0 + β price *price i +   β B *I(group i = &quot; B &quot;) +     β B ,price *price i * I(group i = &quot; B &quot;) +       β C *I(group i = &quot; C &quot;) +     β C ,price *price i * I(group i = &quot; C &quot;) +         ε i      
  • 11. Pizzas Name Rating $/Slice Fuel Type Neighborhood Rosario’s 3.5 2.00 Gas Lower East Side Ray’s 2.8 2.50 Gas Chinatown Joe’s 3.3 1.75 Wood East Village Pomodoro 3.8 3.50 Coal SoHo Response Continuous Categorical Group
  • 12. Data Summary in R > za.df <- read.csv(&quot;Fake Pizza Data.csv&quot;) > summary(za.df)      Rating       CostPerSlice   HeatSource      Neighborhood  Min.   :0.030   Min.   :1.250   Coal: 17   Chinatown  :14    1st Qu.:1.445   1st Qu.:2.000   Gas :158   EVillage   :48    Median :4.020   Median :2.500   Wood: 25   LES        :35    Mean   :3.222   Mean   :2.584              LittleItaly:43    3rd Qu.:4.843   3rd Qu.:3.250              SoHo       :60    Max.   :5.000   Max.   :5.250                               http://github.com/HarlanH/nyc-pa-meetup-multilevel-pizza
  • 13. Viewing the Data in R > plot(za.df)  
  • 14. Visualize ggplot(za.df, aes(CostPerSlice, Rating,     color=HeatSource)) +  geom_point() + facet_wrap(~ Neighborhood) + geom_smooth(aes(color=NULL),     color='black', method='lm',      se=FALSE, size=2)
  • 15. > lm.full.main <- lm(Rating ~ CostPerSlice + HeatSource, data=za.df) > plotCoef(lm.full.main) Multiple Regression in R http://www.jaredlander.com/code/plotCoef.r
  • 16. Full-Pooling: Include Interaction > lm.full.int <- lm(Rating ~ CostPerSlice * HeatSource,data=za.df) > plotCoef(lm.full.int)                                
  • 17. Visualize the Fit (Full-Pooling) > lm.full.int <- lm(Rating ~ CostPerSlice * HeatSource,data=za.df)                                
  • 18.  
  • 19. No Pooling Model lm(Rating ~ CostPerSlice * Neighborhood + HeatSource,data=za.df)  
  • 20. Visualize the Fit (No-Pooling) lm(Rating ~ CostPerSlice * Neighborhood + HeatSource,data=za.df)
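The difference between the full-pooling and no-pooling fits can be sketched on simulated data. This is a minimal illustration, not the talk's analysis: the data frame `sim.df` and every number used to generate it are invented stand-ins for the real `za.df`, and only the column names and model formulas are taken from the slides.

```r
# Simulated stand-in for the pizza data (hypothetical values, NOT the talk's dataset)
set.seed(42)
n <- 200
sim.df <- data.frame(
  CostPerSlice = round(runif(n, 1.25, 5.25), 2),
  HeatSource   = factor(sample(c("Coal", "Gas", "Wood"), n,
                               replace = TRUE, prob = c(0.1, 0.8, 0.1))),
  Neighborhood = factor(sample(c("Chinatown", "EVillage", "LES",
                                 "LittleItaly", "SoHo"), n, replace = TRUE))
)
sim.df$Rating <- 2 + 0.4 * sim.df$CostPerSlice + rnorm(n, sd = 0.8)

# Full pooling: one intercept and one price slope shared by every neighborhood
lm.full <- lm(Rating ~ CostPerSlice + HeatSource, data = sim.df)

# No pooling: each neighborhood gets its own intercept and its own price slope
lm.none <- lm(Rating ~ CostPerSlice * Neighborhood + HeatSource, data = sim.df)

# The no-pooling model spends far more coefficients on the same 200 rows
n.coef.full <- length(coef(lm.full))
n.coef.none <- length(coef(lm.none))
c(full = n.coef.full, none = n.coef.none)
```

With five neighborhoods, the no-pooling formula adds four intercept shifts and four slope shifts on top of the full-pooling terms, which is why small neighborhoods end up with noisy estimates.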
  • 21. Evaluation of Fitted Model: Cross-Validation Error; Adjusted R²; AIC; BIC; RSS; Tests for Normal Residuals
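Each of the criteria listed on this slide can be computed in base R for an `lm` fit. The sketch below uses a small simulated data frame `d` (invented for illustration), and the leave-one-out shortcut via `hatvalues()` is one assumed way to get a cross-validation error; the slides do not say which variant the authors used.

```r
# Simulated data (hypothetical), with a continuous predictor and a grouping factor
set.seed(1)
d <- data.frame(x = runif(100, 1, 5),
                g = factor(sample(letters[1:4], 100, replace = TRUE)))
d$y <- 1 + 0.5 * d$x + rnorm(100)

fit <- lm(y ~ x * g, data = d)

rss     <- sum(residuals(fit)^2)             # residual sum of squares
adj.r2  <- summary(fit)$adj.r.squared        # adjusted R-squared
aic     <- AIC(fit)
bic     <- BIC(fit)
shapiro <- shapiro.test(residuals(fit))      # one possible test for normal residuals

# Leave-one-out cross-validation error: for OLS, the held-out residual equals
# e_i / (1 - h_ii), so no refitting is needed
loo.resid <- residuals(fit) / (1 - hatvalues(fit))
cv.mse    <- mean(loo.resid^2)

c(RSS = rss, adjR2 = adj.r2, AIC = aic, BIC = bic, CV.MSE = cv.mse)
```

The cross-validated error is never smaller than the in-sample mean squared residual, which is what makes it useful for comparing models of different complexity.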
  • 22. Use Natural Groupings: Cluster Sampling; Intercluster Differences; Intracluster Similarities
  • 23. Multilevel Characteristics: Model gravitates toward big groups; small groups gravitate toward the model. Best when groups are similar to each other. Model the effects of the groups: y_i = Intercept_j[i] + Slope_j[i] * x_i + noise; Intercept_j = Intercept_alpha + Slope_alpha * u_j + noise; Slope_j = Intercept_beta + Slope_beta * u_j + noise (here x_i is an individual-level predictor and u_j a group-level predictor).
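The "gravitates" behavior can be made concrete with the standard approximation for a varying-intercept model without predictors (as given in Gelman and Hill): a group's estimate is a precision-weighted average of its own mean and the overall mean. The function name `partial_pool` and all input values below are hypothetical, chosen only to show the direction of the shrinkage.

```r
# Precision-weighted partial-pooling estimate of one group's intercept.
# ybar_j: group mean, n_j: group size, ybar_all: overall mean,
# sigma_y: within-group sd, sigma_alpha: between-group sd
partial_pool <- function(ybar_j, n_j, ybar_all, sigma_y, sigma_alpha) {
  w_data  <- n_j / sigma_y^2      # weight on the group's own data
  w_model <- 1 / sigma_alpha^2    # weight on the overall (group-level) model
  (w_data * ybar_j + w_model * ybar_all) / (w_data + w_model)
}

# Two hypothetical neighborhoods with the same raw mean but different sizes
small <- partial_pool(ybar_j = 4.5, n_j = 2,  ybar_all = 3.2,
                      sigma_y = 1.5, sigma_alpha = 0.5)
big   <- partial_pool(ybar_j = 4.5, n_j = 60, ybar_all = 3.2,
                      sigma_y = 1.5, sigma_alpha = 0.5)
c(small = small, big = big)
```

The small group's estimate lands close to the overall mean, while the large group's estimate stays close to its own data.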
  • 24. Multi-Names for Multilevel Models: Multilevel; Hierarchical; Mixed-Effects; Bayesian; Partial-Pooling
  • 25. Multi-Names for Multilevel Models
      (1) Fixed effects are constant across individuals, and random effects vary. For example, in a growth study, a model with random intercepts a_i and fixed slope b corresponds to parallel lines for different individuals i, or the model y_it = a_i + b t. Kreft and De Leeuw (1998) thus distinguish between fixed and random coefficients.
      (2) Effects are fixed if they are interesting in themselves or random if there is interest in the underlying population. Searle, Casella, and McCulloch (1992, Section 1.4) explore this distinction in depth.
      (3) "When a sample exhausts the population, the corresponding variable is fixed; when the sample is a small (i.e., negligible) part of the population the corresponding variable is random." (Green and Tukey, 1960)
      (4) "If an effect is assumed to be a realized value of a random variable, it is called a random effect." (LaMotte, 1983)
      (5) Fixed effects are estimated using least squares (or, more generally, maximum likelihood) and random effects are estimated with shrinkage ("linear unbiased prediction" in the terminology of Robinson, 1991). This definition is standard in the multilevel modeling literature (see, for example, Snijders and Bosker, 1999, Section 4.2) and in econometrics.
      http://www.stat.columbia.edu/~cook/movabletype/archives/2005/01/why_i_dont_use.html
  • 26. Bayesian Interpretation: Everything has a distribution (including the groups). Group-level model is prior information for the individual-level coefficients. Group-level model has an assumed-normal prior. (Can fit multilevel models with Bayesian methods, or with simpler/faster/easier approximations.)
  • 27. R Options: lme4::lmer(); nlme::lme(); MCMCglmm(); BUGS; others/niche approaches…
  • 28. Back to the Pizza: Model the overall pattern among neighborhoods. Natural clustering of pizzerias in neighborhoods adds information. Neighborhoods with many/few pizzerias: Many: trust the data, à la the no-pooling model; Few: trust overall patterns, à la the full-pooling model.
  • 29. Back to the Pizza Use Neighborhoods as natural grouping  
  • 30. Multilevel Pizza: 5 slope coefficients and 5 intercept coefficients, one of each per neighborhood. Slopes/intercepts are assumed to have a Gaussian distribution. Ideally, could describe all 5 slopes with 2 numbers (mean/variance). Neighborhoods with little data don’t get freedom to set their own coefficients; they get pulled towards the overall slope or intercept.
  • 31. R syntax lm.me.cost2 <- lmer(Rating ~ HeatSource + (1+CostPerSlice | Neighborhood), data=za.df)
  • 32. Results (Partial-Pooling) lm.me.cost2 <- lmer(Rating ~ HeatSource + (1+CostPerSlice | Neighborhood), data=za.df)
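To see what this `lmer()` call returns, the same formula can be fit to simulated data. This is only a sketch: it assumes the `lme4` package is installed, the data frame `sim.df` and the true intercepts and slopes are invented, and only the group sizes are borrowed from the data summary slide. With just five groups the fit may print singular-fit or convergence messages.

```r
library(lme4)  # assumed to be installed; provides lmer(), fixef(), ranef()

set.seed(2010)
hoods <- c("Chinatown", "EVillage", "LES", "LittleItaly", "SoHo")
n.per <- c(14, 48, 35, 43, 60)          # group sizes from the data summary slide
n     <- sum(n.per)

sim.df <- data.frame(
  Neighborhood = factor(rep(hoods, times = n.per)),
  CostPerSlice = round(runif(n, 1.25, 5.25), 2),
  HeatSource   = factor(sample(c("Coal", "Gas", "Wood"), n,
                               replace = TRUE, prob = c(0.1, 0.8, 0.1)))
)

# Invented neighborhood-specific intercepts and price slopes (hypothetical)
a   <- rnorm(5, mean = 2,   sd = 0.5)
b   <- rnorm(5, mean = 0.3, sd = 0.2)
idx <- as.integer(sim.df$Neighborhood)
sim.df$Rating <- a[idx] + b[idx] * sim.df$CostPerSlice + rnorm(n, sd = 0.7)

# Same formula as the slide: fixed effect of oven type, plus intercepts and
# price slopes that vary by neighborhood
fit <- lmer(Rating ~ HeatSource + (1 + CostPerSlice | Neighborhood),
            data = sim.df)

fixed.part  <- fixef(fit)                # shared coefficients
random.part <- ranef(fit)$Neighborhood   # per-neighborhood deviations
random.part
```

Each row of `random.part` is one neighborhood's deviation in intercept and price slope from the shared coefficients in `fixed.part`.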
  • 33. Predicting a New Pizzeria. Neighborhood: Chinatown; Cost: $4.20; Fuel: Wood
  • 34. Uncertainty in Prediction. Fitted coefficients are uncertain: arm::sim(). Model error term: rnorm(1, model matrix %*% sim$Neighborhood[ , "Chinatown", ], sqrt(variance)). New neighborhood (model possible coefficients): mvrnorm(1, rep(0, 2), VarCorr(model)$Neighborhood). http://github.com/HarlanH/nyc-pa-meetup-multilevel-pizza
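Two of these sources of uncertainty, the unknown coefficients of a new neighborhood and the model error term, can be combined by simulation. The sketch below is not the authors' code: every fitted quantity (`fixed`, `Sigma`, `sigma.y`) is a hypothetical placeholder rather than a result from the talk, and uncertainty in the fixed effects themselves (the `arm::sim()` step) is left out for brevity. It assumes the `MASS` package, which ships with standard R installations.

```r
library(MASS)  # for mvrnorm()

set.seed(123)

# Hypothetical fitted quantities (placeholders, NOT results from the talk)
fixed   <- c(intercept = 3.0, wood = 0.4)       # baseline rating and wood-oven shift
Sigma   <- matrix(c(0.25, -0.02,
                    -0.02, 0.04), nrow = 2)     # covariance of (intercept, slope) deviations
sigma.y <- 0.8                                  # residual standard deviation
cost    <- 4.20                                 # price of the new slice

n.sims <- 1000

# Draw possible (intercept, price-slope) deviations for an unseen neighborhood
dev <- mvrnorm(n.sims, mu = rep(0, 2), Sigma = Sigma)

# Expected rating for a wood-oven pizzeria at this price, one value per draw
pred.mean <- fixed[["intercept"]] + fixed[["wood"]] + dev[, 1] + dev[, 2] * cost

# Add the model error term (rnorm takes a standard deviation)
pred <- rnorm(n.sims, mean = pred.mean, sd = sigma.y)

interval <- quantile(pred, c(0.025, 0.5, 0.975))
interval
```

For a neighborhood that is already in the model, the `mvrnorm()` draw would be replaced by that neighborhood's simulated coefficients, which gives a narrower interval.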
  • 35. Other Examples: Red State Blue State
  • 38. Other Examples: Insufficient Fruit and Vegetable Intake
  • 39. Other Examples: Clean Drinking Water
  • 40. Steps to Multilevel Models: Full-Pooling Model; No-Pooling Model; Separate Models; Two-Step Analysis
  • 41. How Many Groups? How Many Observations? As few as one or two groups; even two observations per group; can have many groups with just one observation.
  • 42. Larger Datasets. Andy Gelman: “The Blessing of Dimensionality”. More Data → Add Complexity, because you can.
  • 43. Resources: Gelman and Hill (ARM); Pinheiro & Bates; Snijders and Bosker; R-SIG-Mixed-Models (http://glmm.wikidot.com/faq); (SAS/SPSS)