Data mining and the true size of a test

The document discusses the concept of rejecting a correct null hypothesis, highlighting that this can occur by chance due to the random distribution of test statistics. It warns against 'data mining' in regression analysis, where selecting variables without theoretical backing can inflate the true significance level. To mitigate this risk, it suggests using out-of-sample testing to validate model performance and avoid spurious relationships.


Recall that the probability of rejecting a correct null hypothesis is equal to the size of the test, denoted α.

The possibility of rejecting a correct null hypothesis arises from the fact that test statistics are assumed
to follow a random distribution and hence they will take on extreme values that fall in the rejection
region some of the time by chance alone. A consequence of this is that it will almost always be possible
to find significant relationships between variables if enough variables are examined. For example,
suppose that a dependent variable yt and twenty explanatory variables x2t, …, x21t (excluding a constant
term) are generated separately as independent normally distributed random variables. Then y is
regressed separately on each of the twenty explanatory variables plus a constant, and the significance of
each explanatory variable in the regressions is examined. If this experiment is repeated many times, then on average one of the twenty regressions will have a slope coefficient that is significant at the 5% level. The implication is that if enough explanatory variables are employed in a regression, one or more will often be significant by chance alone. More concretely, it
could be stated that if an α% size of test is used, on average one in every (100/α) regressions will have a
significant slope coefficient by chance alone.
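
The following Python sketch is a minimal illustration of this point; the sample size of 100, the 1,000 replications and the random seed are choices made here for demonstration and are not taken from the text. A y series and twenty unrelated regressors are drawn as independent standard normals, y is regressed on each regressor in turn (with a constant), and the number of slopes significant at the 5% level is recorded.

# Monte Carlo sketch: twenty regressors unrelated to y, tested at the 5% level.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_obs, n_regressors, n_experiments, alpha = 100, 20, 1000, 0.05

significant_counts = []
for _ in range(n_experiments):
    y = rng.standard_normal(n_obs)                  # y generated independently of all x's
    X = rng.standard_normal((n_obs, n_regressors))  # x_2t, ..., x_21t, also independent of y
    count = 0
    for j in range(n_regressors):
        x = X[:, j]
        # OLS slope from regressing y on a constant and x_j
        beta = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
        resid = y - y.mean() - beta * (x - x.mean())
        se = np.sqrt(resid @ resid / (n_obs - 2) / ((x - x.mean()) @ (x - x.mean())))
        p_value = 2 * stats.t.sf(abs(beta / se), df=n_obs - 2)
        count += p_value < alpha
    significant_counts.append(count)

print("average number of 'significant' slopes per experiment:",
      np.mean(significant_counts))   # close to 20 * 0.05 = 1

The printed average is close to one spuriously significant slope per experiment, matching the 100/α rule of thumb above.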

Trying many variables in a regression without basing the selection of the candidate variables on a
financial or economic theory is known as ‘data mining’ or ‘data snooping’. The result in such cases is that
the true significance level will be considerably greater than the nominal significance level assumed. For
example, suppose that twenty separate regressions are conducted at a 5% nominal significance level and that three of them contain a significant regressor. The true significance level is then much higher than the nominal 5%: three rejections out of twenty tests correspond to an effective rate of 3/20 = 15%. Therefore, if the researcher then reports only the three equations containing significant regressors and states that the variables are significant at the 5% level, inappropriate conclusions concerning the significance of the variables would result.
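
As a quick illustrative calculation (not part of the original text), note that when twenty independent tests are each run at the 5% level, the probability of at least one spurious rejection is 1 − (1 − α)^20 = 1 − 0.95^20 ≈ 0.64, so reporting only the 'winning' regression greatly understates the chance of a Type I error.

# Probability of at least one spurious rejection across twenty independent 5% tests
alpha, k = 0.05, 20
prob_at_least_one_rejection = 1 - (1 - alpha) ** k
print(f"P(at least one spurious rejection in {k} tests) = {prob_at_least_one_rejection:.2f}")
# prints approximately 0.64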

As well as ensuring that the selection of candidate regressors for inclusion in a model is made on the
basis of financial or economic theory, another way to avoid data mining is by examining the forecast
performance of the model in an ‘out-of-sample’ data set (see Chapter 6). The idea is essentially that a
proportion of the data is not used in model estimation, but is retained for model testing. A relationship
observed in the estimation period that is purely the result of data mining, and is therefore spurious, is
very unlikely to be repeated for the out-of-sample period. Therefore, models that are the product of data
mining are likely to fit very poorly and to give very inaccurate forecasts for the out-of-sample period.
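
A minimal sketch of this idea is given below, using simulated data and an 80/20 estimation/hold-out split chosen purely for illustration (neither the data nor the split comes from the text). The model is estimated only on the first part of the sample, and its forecasts for the hold-out part are compared with a naive benchmark.

# Out-of-sample check: a spurious in-sample relationship should not forecast well.
import numpy as np

rng = np.random.default_rng(1)
n = 200
x = rng.standard_normal(n)
y = rng.standard_normal(n)          # y is unrelated to x, so any in-sample fit is spurious

split = int(0.8 * n)                # estimation sample vs. hold-out sample
x_in, y_in = x[:split], y[:split]
x_out, y_out = x[split:], y[split:]

# OLS estimates computed on the estimation sample only
beta = np.cov(x_in, y_in, ddof=1)[0, 1] / np.var(x_in, ddof=1)
alpha_hat = y_in.mean() - beta * x_in.mean()

# Compare out-of-sample forecast errors with a naive benchmark (in-sample mean)
mse_model = np.mean((y_out - (alpha_hat + beta * x_out)) ** 2)
mse_naive = np.mean((y_out - y_in.mean()) ** 2)
print(f"model MSE: {mse_model:.3f}  vs  naive-mean MSE: {mse_naive:.3f}")
# A data-mined relationship typically fails to beat the naive benchmark out of sample.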
