To Explain or To Predict? Explanatory vs. Predictive Modeling in Scientific Research
Galit Shmuéli, Georgetown University, October 30, 2009
The path to discovery
Explain / Predict
What are “explaining” and “predicting”?
Statistical modeling in social science research
  • Purpose: test causal theory (“explain”)
  • Association-based statistical models
  • Prediction nearly absent
Lesson #1: Whether statisticians like it or not, in the social sciences, association-based statistical models are used for testing causal theory.
Justification: a strong underlying theoretical model provides the causality.
Definition: Explanatory Model. A statistical model used for testing causal theory (“proper” or not).
Definition: Predictive Model. An empirical model used for predicting new records/scenarios.
To Explain Or To Predict?
Multi-page sections with theoretical justifications of each hypothesis
Concept operationalization: Poverty, Trust, Anger, Economic stability, Well-being (4 pages of such tables)
Statistical model (here: path analysis)
“Statistical” conclusions
Research conclusions
Lesson #2: In the social sciences, empirical analysis is mainly used for testing causal theory. Empirical prediction is considered un-academic.
Some statisticians share this view: “The two goals in analyzing data... I prefer to describe as “management” and “science”. Management seeks profit... Science seeks truth.” (Parzen, Statistical Science 2001)
Prediction in the Information Systems literature
Predictive goal stated? Predictive power assessed?
1072 articles, of which 52 empirical with predictive claims
“Examples of [predictive] theory in IS do not come readily to hand, suggesting that they are not common” (Gregor, MISQ 2006)
Breakdown of the 52 “predictive” articles
Why Predict? Scientific use of empirical models
  • To Explain: test causal theory
  • To Predict: (utility), relevance, new theory, predictability
Why are statistical explanatory models different from predictive models?
Theory vs. its manifestation?
“The goal of finding models that are predictively accurate differs from the goal of finding models that are true.”
Given the research environment in the social sciences, two critically important points are:
  • Explanatory power and predictive accuracy cannot be inferred from one another.
  • The “best” explanatory model is (nearly) never the “best” predictive model, and vice versa.
Point #1: Explanatory Power ≠ Predictive Power. Cannot infer one from the other.
What is R²?
In-sample vs. out-of-sample evaluation
Performance Evaluation
  • interpretation, p-values, R², goodness-of-fit. Danger: type I, II errors
  • out-of-sample prediction accuracy, costs, run time. Danger: over-fitting
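The gap between in-sample fit and out-of-sample accuracy can be illustrated with a small simulation. This is only a sketch under assumed settings: the data-generating process, the 1-nearest-neighbour predictor, and all names (`simulate`, `one_nn_predict`, `r_squared`) are illustrative choices, not taken from the slides. A 1-nearest-neighbour model reproduces its own training data perfectly, so its in-sample R² is 1, while the same model scored on fresh data does worse.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is reproducible


def simulate(n):
    """Draw n records from an assumed process: y = 2x + noise."""
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [2.0 * x + rng.gauss(0, 5) for x in xs]
    return xs, ys


def one_nn_predict(train_x, train_y, x):
    """Predict y at x by copying the y of the closest training x."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]


def r_squared(y_true, y_pred):
    """R² = 1 - SS_residual / SS_total."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot


train_x, train_y = simulate(100)
hold_x, hold_y = simulate(100)  # fresh records the model has never seen

# In-sample: score the model on the data it was built from.
r2_in = r_squared(train_y, [one_nn_predict(train_x, train_y, x) for x in train_x])
# Out-of-sample: score the same model on the holdout records.
r2_out = r_squared(hold_y, [one_nn_predict(train_x, train_y, x) for x in hold_x])

print(f"in-sample R2:     {r2_in:.3f}")
print(f"out-of-sample R2: {r2_out:.3f}")
```

The perfect in-sample score says nothing about how the model predicts new records, which is why the two kinds of evaluation are kept apart.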
Suggestion for social scientists: Report predictive accuracy in addition to explanatory power
Predictive Power / Explanatory Power
Point #2: Best explanatory model ≠ Best predictive model
Predict ≠ Explain
“We should mention that not all data features were found to be useful. For example, we tried to benefit from an extensive set of attributes describing each of the movies in the dataset. Those attributes certainly carry a significant signal and can explain some of the user behavior. However, we concluded that they could not help at all for improving the accuracy of well tuned collaborative filtering models.” (Bell et al., 2008)
Predict ≠ Explain
The FDA considers two products bioequivalent if the 90% CI of the relative mean of the generic to brand formulation is within 80%-125%.
“We are planning to… develop predictive models for bioavailability and bioequivalence” (Lester M. Crawford, Acting Commissioner of Food & Drugs, 2005)
Let’s dig in
Explanatory goal: minimize model bias
Predictive goal: minimize MSE (model bias + sampling variance)
What is optimized? Bias or prediction MSE
  • Var(Y) = uncontrollable
  • bias² = model misspecification
  • estimation (sampling variance)
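The three components listed on this slide can be written as one decomposition. The notation is an assumption added for illustration: Y = f(x) + ε with E[ε] = 0, f̂ is the model estimated from a sample, and Y is a new observation at x that is independent of that sample.

```latex
\begin{aligned}
\mathrm{MSE}(x) &= E\big[(Y - \hat f(x))^2\big] \\
&= \underbrace{\mathrm{Var}(Y)}_{\text{uncontrollable}}
 + \underbrace{\big(E[\hat f(x)] - f(x)\big)^2}_{\text{bias}^2:\ \text{model misspecification}}
 + \underbrace{\mathrm{Var}\big(\hat f(x)\big)}_{\text{estimation: sampling variance}}
\end{aligned}
```

Read this way, an explanatory analysis concentrates on the middle term, while a predictive analysis accepts some bias whenever doing so lowers the sum of the last two terms.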
Linear Regression Example
  • Underspecified model and its estimated model
  • True model and its estimated model
MSE2 < MSE1 when:
  • σ² large
  • |β2| small
  • corr(x1, x2) high
  • limited range of x’s
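A small simulation can make these conditions concrete. The sketch below rests on assumptions that are not on the slide: the true model is taken to be y = β1·x1 + β2·x2 + ε with no intercept, the underspecified model drops x2, MSE1 is read as the prediction MSE of the estimated true-form model and MSE2 as that of the estimated underspecified model, and every numeric setting is an illustrative choice. It exercises three of the four listed conditions (large σ², small |β2|, high correlation) with a small training sample.

```python
import math
import random

rng = random.Random(42)  # fixed seed for reproducibility

# Illustrative settings (assumptions, not from the slides)
BETA1, BETA2 = 1.0, 0.05  # |beta2| small
SIGMA = 5.0               # noise standard deviation large
RHO = 0.95                # corr(x1, x2) high
N_TRAIN = 10              # small training sample
N_REPS = 4000             # simulation replications


def draw(n):
    """Draw n records (x1, x2, y) from the assumed true model."""
    rows = []
    for _ in range(n):
        x1 = rng.gauss(0, 1)
        x2 = RHO * x1 + math.sqrt(1 - RHO ** 2) * rng.gauss(0, 1)
        y = BETA1 * x1 + BETA2 * x2 + rng.gauss(0, SIGMA)
        rows.append((x1, x2, y))
    return rows


def fit_full(rows):
    """Least squares for y ~ x1 + x2 (no intercept) via the 2x2 normal equations."""
    s11 = sum(x1 * x1 for x1, _, _ in rows)
    s12 = sum(x1 * x2 for x1, x2, _ in rows)
    s22 = sum(x2 * x2 for _, x2, _ in rows)
    s1y = sum(x1 * y for x1, _, y in rows)
    s2y = sum(x2 * y for _, x2, y in rows)
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


def fit_under(rows):
    """Least squares for the underspecified model y ~ x1 (no intercept)."""
    return sum(x1 * y for x1, _, y in rows) / sum(x1 * x1 for x1, _, _ in rows)


sq_err_full = 0.0
sq_err_under = 0.0
for _ in range(N_REPS):
    train = draw(N_TRAIN)
    b1, b2 = fit_full(train)
    g1 = fit_under(train)
    x1_new, x2_new, y_new = draw(1)[0]  # one new record to predict
    sq_err_full += (y_new - (b1 * x1_new + b2 * x2_new)) ** 2
    sq_err_under += (y_new - g1 * x1_new) ** 2

mse_full = sq_err_full / N_REPS    # read here as MSE1
mse_under = sq_err_under / N_REPS  # read here as MSE2
print(f"MSE, estimated true-form model:      {mse_full:.2f}")
print(f"MSE, estimated underspecified model: {mse_under:.2f}")
```

Under these settings the "wrong" model predicts better because the variance saved by estimating one coefficient instead of two outweighs the small bias from omitting x2.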
Two statistical modeling paths (photo: China's Diverging Paths, by Clark Smith)
Goal Definition → Design & Collection → Data Preparation → EDA → Variables? Methods? → Evaluation, Validation & Model Selection → Model Use & Reporting
Study design & data collection
  • Hierarchical data
  • Observational or experiment?
  • Primary or secondary data?
  • Instrument (reliability + validity vs. measurement accuracy)
  • How much data?
  • How to sample?
Data preparation: missing values, reduced-feature models, partitioning
summary stats, plots, outliers, trends, interactive visualization, PCA, SVD
Which variables?
  • Multicollinearity?
  • A, B, A*B?
  • theory, associations, ex-post availability
Methods / Models: bias, variance, blackbox / interpretable, mapping to theory, ridge regression, ensembles, boosting, PLS, PCR
Evaluation, Validation & Model Selection
  • Model fit ≠ Validation: theoretical model, empirical model, data; explanatory power
  • Training data, empirical model, holdout data: over-fitting analysis; predictive power
Model Use
  • Inference: test causal theory, null hypothesis
  • Predictions: (utility), relevance, new theory, predictability; predictive performance, over-fitting analysis, naïve/baseline
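Reporting predictive performance against a naïve baseline on holdout data can be sketched as follows. All specifics are assumptions for illustration: the simulated data, the simple linear regression, and the choice of the training-set mean as the naïve benchmark. The slides name the ideas but not an implementation.

```python
import math
import random

rng = random.Random(7)  # fixed seed for reproducibility


def simulate(n):
    """Draw n records from an assumed process: y = 3 + 2x + noise."""
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [3.0 + 2.0 * x + rng.gauss(0, 1) for x in xs]
    return xs, ys


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def rmse(y_true, y_pred):
    """Root mean squared error of predictions."""
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true))


train_x, train_y = simulate(200)  # used only for fitting
hold_x, hold_y = simulate(100)    # used only for evaluation

intercept, slope = fit_line(train_x, train_y)
naive_prediction = sum(train_y) / len(train_y)  # baseline: always predict the training mean

rmse_train = rmse(train_y, [intercept + slope * x for x in train_x])
rmse_holdout = rmse(hold_y, [intercept + slope * x for x in hold_x])
rmse_naive = rmse(hold_y, [naive_prediction] * len(hold_y))

print(f"model RMSE, training: {rmse_train:.3f}")
print(f"model RMSE, holdout:  {rmse_holdout:.3f}")
print(f"naive RMSE, holdout:  {rmse_naive:.3f}")
```

Comparing training and holdout error is a basic over-fitting check, and the naïve figure shows how much of the holdout accuracy is actually due to the model.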
Goal Definition → Design & Collection → Data Preparation → EDA → Variables? Methods? → Evaluation, Validation, & Model Selection → Model Use & Reporting
How does all this impact research in the (social) sciences?
Three Current Problems
  • Prediction underappreciated
  • Distinction blurred
  • Inappropriate modeling/assessment
“While the value of scientific prediction… is beyond question… the inexact sciences [do not] have…the use of predictive expertise well in hand.” (Helmer & Rescher, 1959)
Why? What can be done? Statisticians should acknowledge the difference and teach it!
It’s time for Change: To Predict / To Explain


Editor's Notes

  • #28: Example: confidence interval vs. prediction interval. Lift, costs
  • #30: Relevance; reality check; predictability
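The contrast in note #28 between a confidence interval and a prediction interval can be sketched for simple linear regression. The data, the point x0, and the use of a normal quantile in place of the exact t quantile (reasonable only because n is large here) are all assumptions made for illustration.

```python
import math
import random
from statistics import NormalDist

rng = random.Random(1)  # fixed seed for reproducibility

# Assumed data-generating process: y = 1 + 0.5x + noise
n = 200
xs = [rng.uniform(0, 10) for _ in range(n)]
ys = [1.0 + 0.5 * x + rng.gauss(0, 2) for x in xs]

# Ordinary least squares fit
mx = sum(xs) / n
my = sum(ys) / n
sxx = sum((x - mx) ** 2 for x in xs)
slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
intercept = my - slope * mx

# Residual variance estimate with n - 2 degrees of freedom
s2 = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys)) / (n - 2)

x0 = 5.0                        # point of interest (illustrative)
y0_hat = intercept + slope * x0
z = NormalDist().inv_cdf(0.975)  # normal approximation to the t quantile

# Confidence interval: uncertainty about the mean response at x0
se_mean = math.sqrt(s2 * (1 / n + (x0 - mx) ** 2 / sxx))
ci = (y0_hat - z * se_mean, y0_hat + z * se_mean)

# Prediction interval: uncertainty about one new observation at x0
se_new = math.sqrt(s2 * (1 + 1 / n + (x0 - mx) ** 2 / sxx))
pi = (y0_hat - z * se_new, y0_hat + z * se_new)

print(f"approx. 95% confidence interval: ({ci[0]:.2f}, {ci[1]:.2f})")
print(f"approx. 95% prediction interval: ({pi[0]:.2f}, {pi[1]:.2f})")
```

The prediction interval is always the wider of the two because it adds the variance of a new observation to the estimation uncertainty, so it is the relevant interval when the goal is to predict individual records.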