The Four Questions of Data Analysis
Donald J. Wheeler
December 2009
The four questions of data analysis are the questions of description, probability, inference, and homogeneity. Any data analyst needs to know how to organize and use these four questions in order to obtain meaningful and correct results.

THE DESCRIPTION QUESTION

Given a collection of numbers, are there arithmetic values that will summarize the information contained in those numbers in some meaningful way? The objective is to capture those aspects of the data that are of interest. Intuitive summaries such as totals, averages, and proportions need little explanation. Other summaries that are less commonly used may require some explanation, and even some justification, before they make sense. However, in the end, in order to be effective a descriptive statistic has to make sense: it has to distill some essential characteristic of the data into a value that is both appropriate and understandable. In every case, this distillation takes on the form of some arithmetic operation:

Data + Arithmetic = Statistic

As soon as we have said this, it becomes apparent that the justification for computing any given statistic must come from the nature of the data themselves; it cannot come from the arithmetic, nor can it come from the statistic. If the data are a meaningless collection of values, then the summary statistics will also be meaningless; no arithmetic operation can magically create meaning out of nonsense. Therefore, the meaning of any statistic has to come from the context for the data, while the appropriateness of any statistic will depend upon the use we intend to make of that statistic.
Figure 1: The Question of Description (given a collection of data consisting of 5 black beads and 45 white beads, summarize those data in some meaningful manner)

This means that before we compute the simplest average, range, or proportion, it has to make sense to do so. Thus we have to know the context for any collection of values before we can select appropriate summary statistics. Among other things, this means that we will need to be careful to avoid mixing up apples, pineapples, and watermelons prior to computing the average weight per piece.
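As a concrete illustration, the bead data of Figure 1 can be summarized with a few lines of code. This is only a sketch: the counts are taken from the figure, and the choice of summaries (count and proportion) simply follows the examples named in the text.

    # A minimal sketch: summarizing the Figure 1 sample with simple descriptive statistics.
    black, white = 5, 45          # counts taken from Figure 1
    total = black + white         # total number of beads in the sample

    proportion_black = black / total
    print(f"Total beads:      {total}")
    print(f"Black beads:      {black}")
    print(f"Proportion black: {proportion_black:.1%}")   # 10.0%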
THE PROBABILITY QUESTION

Given a known universe, what can we say about samples drawn from this universe? Here we enter the world of deductive logic, the enumeration of possible outcomes, and mathematical models. For simplicity we usually begin with a universe that consists of a bowl filled with known numbers of black and white beads. We then consider the likelihoods of various sample outcomes that might be drawn from this bowl. This is illustrated in Figure 2.
Figure 2: The Question of Probability (the bowl contains 1,000 black beads and 4,000 white beads, so the universe proportion black is 20%; the chance of 1 black in 1 draw is 0.2, the chance of 2 blacks in 2 draws is 0.04, and so on to more complex questions, such as the probability of getting exactly five black beads in a random sample of 50 beads, which is 0.030, or 3%)

When we reason from a general situation, which is known, to descriptions of specific outcomes, which are presently unknown, we have an argument that is said to be deductive in nature. Deductive logic proceeds from generalities to specifics and always has a correct answer. It is a process of reasoning in which a conclusion follows necessarily from the premises presented.
When we begin with simple universes, such as beads in a bowl, we can often list all of the possible outcomes. From these enumerations it is then possible to characterize the likelihoods of different events.
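For a small illustration of the enumeration approach, the sketch below lists every equally likely ordered outcome of two draws, with replacement, from a bowl that is 20% black, and then counts the outcomes that make up the event "both beads black." The miniature bowl of five beads (one black, four white) is an assumption made only to keep the listing short; the proportion matches the bowl of Figure 2.

    from itertools import product

    # A tiny bowl that is 20% black: 1 black bead, 4 white beads.
    bowl = ["B", "W", "W", "W", "W"]

    # Enumerate every equally likely ordered outcome of two draws with replacement.
    outcomes = list(product(bowl, repeat=2))            # 25 outcomes in all

    both_black = [o for o in outcomes if o == ("B", "B")]
    print(len(both_black) / len(outcomes))              # 1/25 = 0.04, as in Figure 2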
Since the enumeration of outcomes quickly becomes tedious, shortcuts are sought. By developing mathematical models we can skip the enumeration step and jump directly from the known universe to the likelihoods of different possible outcomes. As the mathematical models became increasingly sophisticated, and as the methods of computing and approximating the probabilities progressed, the models could be used to characterize more complex problems: problems that could never be handled by the enumeration approach.

Thus, in probability theory we are, in effect, playing a game. We play this game to learn how things behave so that we can use this knowledge later. In introductory classes we restrict ourselves to playing this game with homogeneous and fixed universes. Obviously, before students can make much headway in probability theory, they will need to be comfortable with deductive logic and mathematical models: two more elements of the foreign language of statistics. Fortunately, while probability theory is a necessary step in the development of modern statistical techniques, it is not a step that has to be mastered in order to analyze data effectively.
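As a sketch of such a model, the binomial probability function below reproduces the values quoted in Figure 2. It assumes the 50 draws can be treated as independent with a constant 20% chance of black, which is a reasonable approximation here because the bowl of 5,000 beads is large relative to the sample.

    from math import comb

    p, n = 0.20, 50   # proportion black in the bowl; sample size from Figure 2

    def binomial_pmf(k, n, p):
        """Probability of exactly k black beads in n draws under the binomial model."""
        return comb(n, k) * p**k * (1 - p)**(n - k)

    print(binomial_pmf(1, 1, p))              # chance of 1 black in 1 draw   = 0.2
    print(binomial_pmf(2, 2, p))              # chance of 2 blacks in 2 draws = 0.04
    print(round(binomial_pmf(5, n, p), 3))    # exactly 5 blacks in 50 draws  = 0.030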
THE INFERENCE QUESTION

Given an unknown universe, and given a sample that is known to have been drawn from that unknown universe, and given that we know everything about the sample, what can we say about the unknown universe? This is usually thought of as the inverse of the problem addressed by the probability question. Here, it is the sample that is known and the universe that is unknown. Now the argument proceeds from the specific to the general, which makes it inductive in nature. Unfortunately, all inductive inference is fraught with uncertainty.
Figure 3: The Question of Inference (given a known sample drawn from one universe, characterize the unknown universe)

A sample result of 5 black beads and 45 white beads corresponds to a 90% interval estimate of 5.4% to 20.5% for the proportion of black beads in the bowl. The middle ninety percent of the plausible proportions fall in this interval. Thus, with inductive logic there is not a single right answer, but many plausible answers. Given this sample result, any percentage from 5.4% to 20.5% is plausible.

Statistical inference is the realm of tests of hypotheses, confidence intervals, and regression. These techniques allow us to estimate and evaluate the parameters of the unknown universe: proportions, means, and standard deviations. Of course such estimates make sense only when our outcomes are all obtained from a single universe. This assumption of a single universe is equivalent to the assumption that the behavior of these outcomes is described by one probability model. Once we have made this assumption, it is possible to use the probability model in reverse: given this outcome, these are the parameter values that are most consistent with the outcome. While the mathematics of using the probability model in reverse makes everything seem to be rigorous and scientific, you should note that the whole argument begins with an assumption and ends with an indefinite statement. The assumption is that all of the outcomes came from the same universe, and the indefinite statement is couched in terms of interval estimates. Again, with inductive inference there is not one right answer, but many plausible answers.
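The article does not spell out which interval construction yields the 5.4% to 20.5% figures, so the sketch below uses one common choice, the Clopper-Pearson interval for a binomial proportion (computed here with the scipy library); its endpoints will be close to, but not identical with, the values quoted above.

    from scipy.stats import beta

    x, n, alpha = 5, 50, 0.10     # 5 black beads out of 50; 90% interval

    # Clopper-Pearson interval for a binomial proportion, via beta quantiles.
    lower = beta.ppf(alpha / 2, x, n - x + 1) if x > 0 else 0.0
    upper = beta.ppf(1 - alpha / 2, x + 1, n - x) if x < n else 1.0

    print(f"90% interval estimate: {lower:.1%} to {upper:.1%}")

Whatever the particular construction, the point is the same: the sample pins the unknown proportion down only to a range of plausible values, not to a single number.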
THE HOMOGENEITY QUESTION

Given a collection of observations, is it reasonable to assume that they came from one universe, or do they show evidence of having come from multiple universes? To understand the fundamental nature of the homogeneity question, consider what happens if the collection of values does not come from one universe. Descriptive statistics are built on the assumption that we can use a single value to characterize a single property for a single universe. If the data come from different sources, how can any single value be used to describe what is, in effect, not one property but many? In Figure 4 the sample has 10 percent black. But if the 50 beads are the result of three separate draws from the three bowls at the bottom of Figure 4, each of which has a different number of black beads, which bowl is characterized by the sample result?
Figure 4: The Question of Homogeneity (did the sample of 5 black beads and 45 white beads, 10% black, come from one universe, or could these data have come from many different universes?)

Probability theory is focused on what happens to samples drawn from a known universe. If the data happen to come from different sources, then there are multiple universes with different probability models. If you cannot answer the homogeneity question, then you will not know if you have one probability model or many. Statistical inference assumes that you have a sample that is known to have come from one universe. If the data come from different sources, what does your interval estimate represent? Which of the multiple universes does it characterize?

Therefore, before you can use the structure and techniques developed to answer the first three problems, you will need to examine your data for evidence of that homogeneity which is implicitly assumed by the use of descriptive statistics, the concepts of probability theory, and the techniques of statistical inference. This implicit assumption of homogeneity, which is part of everything we do in traditional statistics classes, becomes a real obstacle whenever we try to analyze data.
HOW TO ANALYZE DATA

When we find evidence of a changing universe in a situation where there should be only one universe, we will be unable to learn anything from descriptive statistics. When the universe is changing we can gain nothing from statistical inference, nor can we make predictions using probability theory. Any nonhomogeneity in our collection of values completely undermines the techniques developed to answer each of the first three questions. The lack of homogeneity is a signal that unknown things are happening, and until we discover what is happening and remove its causes, we will continue to suffer the consequences. Computations cannot remedy the problem of a lack of homogeneity; action is required.

How can we answer the homogeneity question? We can either assume that our data possess the appropriate homogeneity, or we can examine them for signs of nonhomogeneity. Since anomalous things happen in even the most carefully controlled experiments, prudence demands that we choose the second course. The primary tool for examining a collection of values for homogeneity is the process behavior chart.

To examine our data for signs of nonhomogeneity we begin with the tentative assumption that the data are homogeneous and then look for evidence that is inconsistent with this assumption. When we reject the assumption of homogeneity, we will have strong evidence which will justify taking action to remedy the situation. When we fail to reject the assumption of homogeneity, we will know that any nonhomogeneity present is below the level of detection. While this is a weak result, we will at least have a reasonable basis for proceeding with estimation and prediction.
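As an illustration of this examination, the sketch below computes the limits of an XmR (individuals and moving range) chart, one common form of process behavior chart, and flags any value falling outside those limits as evidence of nonhomogeneity. The data values here are hypothetical; the scaling factors 2.66 and 3.27 are the standard XmR constants.

    # A minimal sketch of an XmR chart check for homogeneity.
    # The values below are hypothetical and would normally be plotted in time order.
    values = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 14.9, 12.2, 11.7, 12.1]

    average = sum(values) / len(values)
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    average_mr = sum(moving_ranges) / len(moving_ranges)

    # Standard XmR scaling factors: 2.66 for the individual values, 3.27 for the ranges.
    upper_limit = average + 2.66 * average_mr
    lower_limit = average - 2.66 * average_mr
    upper_range_limit = 3.27 * average_mr

    print(f"Natural process limits: {lower_limit:.2f} to {upper_limit:.2f}")
    for i, x in enumerate(values, start=1):
        if x > upper_limit or x < lower_limit:
            print(f"Point {i} ({x}) falls outside the limits: evidence of nonhomogeneity")

With these made-up values the seventh point falls above the upper limit, which is exactly the kind of signal that tells the practitioner to go find out what changed rather than to keep computing summary statistics.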
Figure 5: Homogeneity is the Primary Question of Analysis (flowchart with the branches Yes and No and the boxes Statistical Inference, Probabilistic Predictions, and Collect More Data)

Thus, for practitioners the first question must always be the question of homogeneity. Given a collection of data, did these data come from one universe? In fact, is it reasonable to assume that there is a universe? Only after this fundamental question has been addressed does the practitioner know how to proceed. If the assumption of a universe is reasonable, then the techniques of statistical inference may be used to characterize that universe, and then, with reasonable estimates of the parameters, probability models may be used to make predictions.
But if the assumption of a universe is not justified, the practitioner needs to find out why. This is not the way classes in statistics are taught, but it is the way you have to do data analysis. Look at your data on a process behavior chart. If there are surprises in your data, and there often will be, then learn from these surprises. If there are no surprises, then you may proceed to analyze your data as if they came from a single universe. Any attempt to analyze data that does not begin by addressing the question of homogeneity is flawed.