ML Unit2 SimpleLinearRegression pdf-60-97
Some Concepts
Sampling distribution of a statistic
The sampling distribution of a statistic is the probability distribution of the possible values of the statistic that results when random
samples of size 𝑛 are repeatedly drawn from a population
Suppose you randomly sampled 10 people from the population of women in Houston, Texas, between the ages of 21 and 35
years and computed the mean height of your sample.
This sample mean would not be equal to the mean of all the women in Houston
It might be somewhat lower or it might be somewhat higher, but it would not equal the population mean exactly.
Similarly, if a second sample of 10 people is taken from the same population,
the mean of the second sample will not necessarily equal the mean of the first sample
A critical part of inferential statistics involves determining how far sample statistics are likely to vary from each other and from
the population parameter
Sample statistics could be:
Mean, Mean absolute value of the deviation from the mean, Standard Deviation of the sample, Variance of the sample
In the above example, the statistic is the sample mean; the corresponding population parameter is the population mean
The numerical descriptive measures calculated from the samples are known as statistics
Ref-
David Scott, Mikki Hebl, Rudy Guerra, Dan Osherson, and Heidi Zimmer, Introduction to Statistics, Online edition
William Mendenhall, Robert Beaver, Barbara Beaver, Introduction to Probability and Statistics, Cengage, 14th edition
Sampling Distributions and Inferential
Statistics
We collect sample data
From this data we estimate parameters of the sampling distribution
This knowledge of the sampling distribution is useful for knowing the degree to which means from different samples would
differ from each other and from the population mean
It would give you a sense of how close your particular sample mean is likely to be to the population mean
This information is directly available from a sampling distribution
The most common measure of how much sample means differ from each other is the standard deviation of the sampling
distribution of the mean
This standard deviation is called the standard error of the mean
If all the sample means were very close to the population mean, then the standard error of the mean would be small
On the other hand, if the sample means varied considerably, then the standard error of the mean would be large
Example
Assume the sample mean is 125 and the estimated standard error of the mean is 5
If the sampling distribution is normal, the sample mean is likely to be within 10 units of the population mean, since most of a
normal distribution lies within two standard deviations of its mean
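The "within two standard errors" claim above can be checked directly from the normal CDF. A minimal sketch (the numbers 125 and 5 come from the slide's example):

```python
import math

def prob_within(delta, se):
    """P(|sample mean - population mean| < delta), assuming the sampling
    distribution of the mean is normal with standard error `se`."""
    z = delta / se
    # For a standard normal Z, P(|Z| < z) = erf(z / sqrt(2))
    return math.erf(z / math.sqrt(2))

# Slide example: sample mean 125, estimated standard error 5.
# "Within 10 units" means within two standard errors.
p = prob_within(10, 5)
print(round(p, 4))  # ~0.9545, i.e. about 95%
```

This recovers the familiar rule of thumb that roughly 95% of a normal distribution lies within two standard deviations of its mean.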
Sampling distribution of the mean
Mean
The mean of the sampling distribution of the mean is the
mean of the population from which the scores were sampled
If a population has a mean μ, then the mean of the
sampling distribution of the mean is also μ.
The symbol μ_M is used to refer to the mean of the sampling
distribution of the mean
The formula for the mean of the sampling distribution of the
mean can be written as: μ_M = μ
Sampling distribution of the mean
Variance
The variance of the sampling distribution of the mean is
computed as follows: σ²_M = σ²/n
That is, the variance of the sampling distribution of the mean
is the population variance divided by 𝑛, the sample size (the
number of observations used to compute each mean).
Thus, the larger the sample size, the smaller the variance of
the sampling distribution of the mean
Sampling distribution of the mean
Standard Error
The standard error of the mean is the standard deviation of
the sampling distribution of the mean. It is therefore the
square root of the variance of the sampling distribution of
the mean and can be written as: 𝜎𝑀 = 𝜎/ 𝑁
The standard error is represented by a σ because it is a
standard deviation
The subscript (M) indicates that the standard error in
question is the standard error of the mean
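The two results above (μ_M = μ and σ_M = σ/√n) can be verified by simulation: draw many samples, record each sample mean, and inspect the resulting distribution. A sketch with illustrative population parameters (μ = 100, σ = 15 are arbitrary choices, not from the slides):

```python
import random
import statistics

random.seed(0)
mu, sigma, n = 100.0, 15.0, 25   # illustrative population parameters
num_samples = 20000              # number of repeated samples

# Draw many samples of size n and record each sample mean.
means = [statistics.fmean(random.gauss(mu, sigma) for _ in range(n))
         for _ in range(num_samples)]

mean_of_means = statistics.fmean(means)   # should be close to mu
se_empirical = statistics.pstdev(means)   # should be close to sigma / sqrt(n)
se_theoretical = sigma / n ** 0.5         # 15 / 5 = 3.0
print(round(mean_of_means, 2), round(se_empirical, 2), se_theoretical)
```

With 20,000 repeated samples the empirical standard deviation of the sample means lands very close to the theoretical standard error of 3.0.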
Conditions for inference
A good sample must have the following characteristics
Representative of entire population
Big enough to draw conclusions from (n ≥ 30)
Randomly picked
Sampling distribution of the sample mean needs to be approximately normal
This is true if our parent population is normal
or if sample size is reasonably large (n ≥30)
Independent
Individual observations need to be independent
If sampling is done without replacement, the sample size should not be more than 10% of
the population
Need to know
If the sampled population is normal, then the sampling
distribution will also be normal
When the sampled population is approximately
symmetric, the sampling distribution becomes
approximately normal
When the sampled population is skewed, a sample size of
𝑛 ≥ 30 should be taken so that the sampling
distribution becomes approximately normal
Central limit theorem
The central limit theorem states that: Given a
population with a finite mean μ and a finite non-zero
variance σ², the sampling distribution of the mean
approaches a normal distribution with a mean of μ
and a variance of σ²/N as N, the sample size,
increases.
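The striking part of the theorem is that it holds even when the population is far from normal. A sketch using a heavily right-skewed exponential population (the Exp(1) choice is illustrative, not from the slides):

```python
import random
import statistics

random.seed(1)
lam = 1.0      # Exp(1) population: heavily right-skewed
n = 30         # sample size suggested by the slides for skewed populations
reps = 20000   # number of repeated samples

means = [statistics.fmean(random.expovariate(lam) for _ in range(n))
         for _ in range(reps)]

# The mean and variance of Exp(1) are both 1, so the CLT predicts the
# sample mean is approximately Normal(1, 1/30), despite the skewed parent.
print(round(statistics.fmean(means), 3),
      round(statistics.pvariance(means), 4))
```

The distribution of the 20,000 sample means is close to symmetric with mean near 1 and variance near 1/30 ≈ 0.0333, as the theorem predicts.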
Analysis of variance ANOVA
Analysis of Variance (ANOVA) is a statistical
method used to test differences between two or
more means
ANOVA is used to test general rather than specific
differences among means
Analysis of variance ANOVA for linear
regression
Divide total variation in y ("total sum of squares") into two components:
• due to the change in x ("regression sum of squares")
• due to random error ("error sum of squares")
• Data= Fit + Error
• SST= SSR+SSE
• Σ_{i=1}^{n} (y_i − ȳ)² = Σ_{i=1}^{n} (ŷ_i − ȳ)² + Σ_{i=1}^{n} (y_i − ŷ_i)²
If the regression sum of squares is a "large" component of the total sum of
squares
it suggests that there is a linear association between the predictor x and the
response y
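The decomposition SST = SSR + SSE can be computed directly from a fitted least-squares line. A minimal sketch (the x, y values below are hypothetical data chosen for illustration):

```python
# Hypothetical data for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)

xbar = sum(x) / n
ybar = sum(y) / n

# Least-squares slope and intercept.
b1 = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
      / sum((xi - xbar) ** 2 for xi in x))
b0 = ybar - b1 * xbar
yhat = [b0 + b1 * xi for xi in x]   # fitted values

sst = sum((yi - ybar) ** 2 for yi in y)               # total sum of squares
ssr = sum((yh - ybar) ** 2 for yh in yhat)            # regression sum of squares
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))  # error sum of squares
print(round(sst, 4), round(ssr + sse, 4))  # the two values agree: SST = SSR + SSE
```

The identity holds exactly (up to floating-point rounding) for any least-squares fit, and the ratio SSR/SST is the familiar R-squared.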
ANOVA for linear regression
SST= SSR+SSE
The degrees of freedom associated with each of
these sums of squares follow a similar decomposition
That is
𝑑𝑓 𝑜𝑓 𝑆𝑆𝑇 = 𝑑𝑓 𝑜𝑓 𝑆𝑆𝑅 + 𝑑𝑓 𝑜𝑓 𝑆𝑆𝐸
For simple linear regression: (n − 1) = 1 + (n − 2)
Parameters of ANOVA
Verify whether the regression model provides a better fit to the data than a model that
contains no independent variables
Using the F-distribution table for α = 0.05, with numerator degrees of
freedom 2 (df for regression) and denominator degrees of freedom 9
Conclusion from the above example
We find that the F critical value is 4.2565
Since our F statistic (5.09) is greater than the F
critical value (4.2565),
We can conclude that the regression model as a
whole is statistically significant.
Example
Find the F statistic and t statistic for simple linear regression.
Do this example in your notebooks.

Math score x_i:             39  43  21  64  75  34  52
Final calculus grade y_i:   65  78  52  82  98  56  75
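The F and t statistics for this example can be computed from the standard simple-linear-regression formulas (the x, y pairs below are read off the slide's table):

```python
import math

# Data from the slide: math score x and final calculus grade y.
x = [39, 43, 21, 64, 75, 34, 52]
y = [65, 78, 52, 82, 98, 56, 75]
n = len(x)

xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
b1 = sxy / sxx              # least-squares slope
b0 = ybar - b1 * xbar       # intercept

sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
ssr = b1 * sxy              # regression sum of squares
mse = sse / (n - 2)         # df of SSE = n - 2

f_stat = ssr / mse                    # ANOVA F statistic (df 1 and n - 2)
t_stat = b1 / math.sqrt(mse / sxx)    # t statistic for H0: slope = 0
print(round(f_stat, 2), round(t_stat, 2))
```

Note that F = t² here, which is exactly the equivalence of the ANOVA F-test and the t-test for the slope in simple linear regression.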
Hypothesis Using ANOVA
[Worked ANOVA computation shown as a figure in the original slides; only the fragment (−13.33)² = 177.7 is recoverable.]
Equivalence of ANOVA F-test and t-test
For simple linear regression, the two tests are equivalent: the ANOVA F statistic equals the square of the t statistic for the slope, F = t².
[Figure: the actual relationship between the variables vs. an overfit model]
Over-fitting
Overfitting is a modeling error that occurs when a function or model fits the
training set too closely and therefore performs drastically worse on the test set
A statistical model begins to describe
the random error in the data rather than the
relationships between variables
R-squared is a popular measure of quality of fit in regression
However it does not offer significant information about how well a given
regression model can predict future values
Overfitting leads to misleading R-squared values, regression coefficients,
and p-values
Overfitting a regression model reduces its generalizability outside the original
dataset.
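The train-vs-test gap that defines overfitting can be seen with a deliberately extreme model. A sketch (the data, the noise level, and the 1-nearest-neighbour "memorizer" are all illustrative choices, not from the slides): a least-squares line captures the real trend, while a model that memorizes every training point achieves zero training error yet generalizes poorly.

```python
import random

random.seed(2)

def true_fn(x):
    # The actual relationship between the variables (illustrative).
    return 2.0 * x + 1.0

# Hypothetical dataset: linear trend plus Gaussian noise.
train = [(x, true_fn(x) + random.gauss(0, 2)) for x in range(20)]
test = [(x + 0.5, true_fn(x + 0.5) + random.gauss(0, 2)) for x in range(20)]

# Model A: least-squares line (captures the real relationship).
n = len(train)
xbar = sum(x for x, _ in train) / n
ybar = sum(y for _, y in train) / n
b1 = (sum((x - xbar) * (y - ybar) for x, y in train)
      / sum((x - xbar) ** 2 for x, _ in train))
b0 = ybar - b1 * xbar

def line(x):
    return b0 + b1 * x

# Model B: 1-nearest-neighbour lookup (memorizes the training noise).
def nearest(x):
    return min(train, key=lambda p: abs(p[0] - x))[1]

def mse(model, data):
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)

print(mse(line, train), mse(nearest, train))  # memorizer: zero training error
print(mse(line, test), mse(nearest, test))    # memorizer: typically worse on new data
```

The memorizer "describes the random error in the data rather than the relationship between the variables", which is exactly the failure mode described above.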
Detecting over-fit models: Cross validation
• We can detect overfitting by determining whether your model fits new data as
well as it fits the data used to estimate the model
• Used to estimate the behaviour of a large data set based on a small part of
the data set
• Evaluate machine learning models on a limited data sample
• Use k number of groups to split the dataset
• Called k-fold cross validation
• Randomly split the dataset into k fold/groups of equal size
• The first fold is held out as the validation set and the model is trained on the
remaining k-1 folds; the process then repeats with each fold in turn
K-fold cross validation
Choosing the right value of k is quite complex
Behaviour of the model is dependent on the dataset
Some ways of choosing the value of k are:
Each train/test group of data should be large enough to be statistically
representative
K=10, which has been found empirically to work well
K=n, where n is the size of the data set, so that each sample is
used for validation exactly once
This is called Leave One Out Cross Validation (LOOCV)
Ex: k-fold cross validation
Data samples: [1, 2, 3, 4, 5, 6]
K=3
Fold1 = [5,2], Fold2 = [1,3], Fold3 = [4,6]
Model1: trained on fold1 + fold2, tested on fold3
Model2: trained on fold2 + fold3, tested on fold1
Model3: trained on fold1 + fold3, tested on fold2
In the extreme case each fold contains a single sample (k = n, i.e., LOOCV)
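The 3-fold example above can be reproduced with a small helper that shuffles the data and partitions it into folds (the function name `kfold_splits` is a hypothetical helper for illustration; exact fold contents depend on the random shuffle):

```python
import random

def kfold_splits(data, k, seed=0):
    """Randomly partition `data` into k folds and yield
    (train, validation) pairs, one pair per fold."""
    items = list(data)
    random.Random(seed).shuffle(items)
    folds = [items[i::k] for i in range(k)]   # k roughly equal groups
    for i in range(k):
        validation = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, validation

# Slide example: 6 samples, k = 3 -> three models, each fold
# used exactly once as the validation set.
for train, val in kfold_splits([1, 2, 3, 4, 5, 6], k=3):
    print(sorted(train), sorted(val))
```

Each of the three iterations trains on 4 samples and validates on the remaining 2, so every sample is validated exactly once across the k models.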
Cross validation: The ideal procedure
• Divide data into three sets, training, validation and test sets
• Parameters of the regression model are estimated from the training data
• Accuracy is then measured on new (held-out) data
• The validation error gives an unbiased estimate of the predictive power of a
model.
K- fold Cross validation
Split the data into 5 folds
Fit a model to the training folds
Use the held-out fold to compute the cross-validation metric
Repeat the process for each remaining fold
References
Machine Learning, IBM
David Scott, Mikki Hebl, Rudy Guerra, Dan Osherson, and
Heidi Zimmer, Introduction to Statistics, Online edition
William Mendenhall, Robert Beaver, Barbara Beaver,
Introduction to probability and statistics, Cengage, 14th edition
Ref: https://online.stat.psu.edu/stat462/node/91/
https://openstax.org/details/books/introductory-business-statistics