Week 4 Lecture Q A
Week 4 – Lecture
Shafiq Ur Rehman
Lecturer (Statistics)
Department of Mathematics & Statistics
University of Central Punjab, Lahore.
F-DISTRIBUTION (INTRODUCTION)
To test for the equality of two population variances, we use the F-test, so it is necessary to know what the F-distribution is. The F-distribution is named after the English statistician Sir Ronald A. Fisher.
A major difference between the t and chi-square distributions and the F-distribution is that the former two distributions have a single number of degrees of freedom, whereas the F-distribution has two: one for the numerator and one for the denominator. These two numbers of degrees of freedom are the parameters of the F-distribution, and taken together they determine a particular F-distribution curve. The F-distribution takes only positive values, and its curve is skewed to the right; as the degrees of freedom increase, the skewness decreases.
(Cont.) F-DISTRIBUTION (INTRODUCTION)
To use the F-distribution table, we need to know three quantities: (i) the degrees of freedom for the numerator, (ii) the degrees of freedom for the denominator, and (iii) an area in the right tail of an F-distribution curve. Note that the F-distribution table is read only for an area in the right tail of the F-distribution curve.
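In practice the same lookup can be done programmatically. A minimal sketch, assuming Python with scipy (any statistics library exposing the F-distribution's inverse CDF works the same way):

```python
# Look up an F critical value from the three quantities named above:
# numerator df, denominator df, and an area in the right tail.
from scipy.stats import f

df_num = 15    # degrees of freedom for the numerator
df_den = 20    # degrees of freedom for the denominator
alpha = 0.05   # area in the right tail

# ppf is the inverse CDF, so ppf(1 - alpha) gives the right-tail cutoff.
critical = f.ppf(1 - alpha, df_num, df_den)
print(round(critical, 2))   # the tabulated F_0.05(15, 20)
```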
HYPOTHESIS TESTING: INFERENCES ABOUT A POPULATION VARIANCE
Hypothesis testing is a statistical technique used to make inferences about a population based on a sample of
data. In the case of population variance, hypothesis testing involves testing whether the variance of a
population is equal to a specified value.
The hypothesis testing procedure involves the following steps:
1. Formulating the null and alternative hypotheses: The null hypothesis, denoted by H0, is the
hypothesis that the population variance is equal to a specified value, while the alternative hypothesis,
denoted by Ha, is the hypothesis that the population variance is not equal to the specified value.
2. Selecting a significance level: The significance level, denoted by α, is the probability of rejecting the
null hypothesis when it is actually true. Commonly used values for α are 0.05 and 0.01.
3. Test Statistic: A sample of data is collected, and the sample variance, denoted by s², is calculated. The test statistic is then calculated using the formula χ² = (n − 1)s²/σ₀², where σ₀² is the specified value of the population variance; under H0 this statistic follows a chi-square distribution with n − 1 degrees of freedom.
4. Making a decision: The decision to reject or fail to reject the null hypothesis is based on the calculated
value of the test statistic and the corresponding table value or p-value.
HYPOTHESIS TESTING: INFERENCES ABOUT TWO POPULATION VARIANCES
Hypothesis testing for inferences about the difference between two population variances is a statistical
technique used to determine whether the variances of two populations are significantly different from each
other. The procedure involves testing whether the ratio of the sample variances of the two populations is
statistically different from 1.
The hypothesis testing procedure involves the following steps:
1. Formulating the null and alternative hypotheses: The null hypothesis, denoted by H0, is the
hypothesis that the variances of the two populations are equal, while the alternative hypothesis, denoted
by Ha, is the hypothesis that the variances of the two populations are not equal.
2. Selecting a significance level: The significance level, denoted by α, is the probability of rejecting the
null hypothesis when it is actually true. Commonly used values for α are 0.05 and 0.01.
3. Test Statistic: Samples are collected from both populations, and the sample variances, denoted by S₁² and S₂², are calculated. The test statistic, denoted by F, is then calculated using the formula F = S₁²/S₂², where S₁² is the larger of the two sample variances.
4. Making a decision: The decision to reject or fail to reject the null hypothesis is based on the calculated
value of the test statistic and the corresponding table value or p-value.
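Step 3 can be sketched in pure Python; the two samples below are invented, and the convention of putting the larger variance in the numerator keeps F ≥ 1:

```python
def sample_variance(xs):
    """Unbiased sample variance (divisor n - 1)."""
    n = len(xs)
    m = sum(xs) / n
    return sum((x - m) ** 2 for x in xs) / (n - 1)

def f_statistic(sample1, sample2):
    """F = larger sample variance / smaller sample variance."""
    s1_sq = sample_variance(sample1)
    s2_sq = sample_variance(sample2)
    return max(s1_sq, s2_sq) / min(s1_sq, s2_sq)

a = [12, 15, 11, 14, 13, 16]   # invented sample from population 1
b = [10, 11, 10, 12, 11]       # invented sample from population 2
F = f_statistic(a, b)          # here 3.5 / 0.7, i.e. about 5
```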
TABLE VALUE CALCULATION
From the table of the F-distribution:
One-Tail Test (Right-Tail Test): H₀: σ₁² ≤ σ₂²; H₁: σ₁² > σ₂². Table value: F_α(V₁, V₂).
One-Tail Test (Left-Tail Test): H₀: σ₁² ≥ σ₂²; H₁: σ₁² < σ₂². Table value: F₁₋α(V₁, V₂) = 1/F_α(V₂, V₁).
EXAMPLE
Sample A: n₁ = 16, x̄₁ = 1200 hr, S₁ = 60 hr
Sample B: n₂ = 21, x̄₂ = 1300 hr, S₂ = 50 hr
We have to test the hypothesis that the variability of the two processes is the same.
Solution
H₀: σ₁² = σ₂²; H₁: σ₁² ≠ σ₂²
Level of Significance: α = 0.05
Test Statistic: F = S₁²/S₂² = 60²/50² = 1.44
Table Value: Since this is a two-tail test, this F is compared against F(15, 20) for α/2 = 0.05/2 = 0.025. The critical value for the right side is F₀.₀₂₅(V₁, V₂) = F₀.₀₂₅(15, 20) ≈ 2.57.
Table Value (cont.): For a two-tail test at α = 0.05, F is compared against both critical values. The left-side critical value is F₀.₉₇₅(15, 20) = 1/F₀.₀₂₅(20, 15) = 1/2.7559 ≈ 0.36.
Decision: Since the calculated value 1.44 falls between the two critical values (i.e., in the acceptance region), we fail to reject the null hypothesis that σ₁² = σ₂², indicating that there is no significant difference in the variability of the two samples.
REGRESSION ANALYSIS
Regression analysis involves identifying the relationship between a dependent variable and one or more
independent variables. A model of the relationship is hypothesized, and estimates of the parameter values
are used to develop an estimated regression equation. If the model is believed satisfactory, the estimated
regression equation can be used to predict the value of the dependent variable given values for the
independent variables.
SIMPLE LINEAR REGRESSION MODEL:
In simple linear regression, the model used to describe the relationship between a single dependent variable y and a single independent variable x is y = α + βx + ε. α and β are referred to as the model parameters, and ε is a probabilistic error term that accounts for the variability in y that cannot be explained by the linear relationship with x.
MULTIPLE LINEAR REGRESSION MODEL:
In multiple regression analysis, the model for simple linear regression is extended to account for the
relationship between the dependent variable y and p independent variables i.e., x1, x2, . . ., xp. The general
form of the multiple regression model is
y = 𝛼 + β1 x1 + β2 x2 + . . . + βp xp + ε
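As a sketch, the parameters of this model can be estimated by least squares. The data below are invented (generated exactly from y = 2 + 2x₁ + x₂), and numpy is assumed:

```python
import numpy as np

# Invented data generated from y = 2 + 2*x1 + 1*x2 (no noise).
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 5.0])
y = np.array([6.0, 7.0, 12.0, 13.0, 17.0])

# Design matrix: a column of ones for the intercept, then the predictors.
X = np.column_stack([np.ones_like(x1), x1, x2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
alpha_hat, b1_hat, b2_hat = coef   # should recover 2, 2 and 1
```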
SCATTER PLOT:
A scatter plot (scatter graph) uses dots to represent values for two different numeric variables. The position
of each dot on the horizontal and vertical axis indicates values for an individual data point. Scatter plots are
used to observe relationships between variables.
[Figure: scatter plot of Y-Values versus X-Values]
(Cont.) SCATTER PLOT:
The primary use of scatter plots is to observe and show relationships between two numeric variables. The dots in a scatter plot not only report the values of individual data points but also reveal patterns that help identify correlational relationships.
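The pattern a scatter plot shows can be quantified with the Pearson correlation coefficient; a pure-Python sketch with invented data:

```python
def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

x = [1, 2, 3, 4, 5, 6]    # invented X-values
y = [2, 4, 5, 8, 9, 12]   # invented Y-values, rising with x
r = pearson_r(x, y)       # close to +1: an upward-sloping cloud of dots
```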
(Cont.) SIMPLE LINEAR REGRESSION MODEL:
Original Model:
𝑌 = 𝛼 + 𝛽𝑋 + 𝜖; 𝛼 = Intercept; 𝛽 = Slope or Regression coefficient; 𝜖 = Error term
Estimated Model:
Ŷ = a + bX; a = Intercept; b = Slope or Regression coefficient.
Formula(s):
b = [n ΣXY − (ΣX)(ΣY)] / [n ΣX² − (ΣX)²]; a = Ȳ − bX̄
EXAMPLE:
b = [n ΣXY − (ΣX)(ΣY)] / [n ΣX² − (ΣX)²] = [(5 × 241) − (55 × 20)] / [(5 × 90) − (20)²] = 105/50 = 2.1
a = Ȳ − bX̄ = 11 − 2.1(4) = 2.6
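The arithmetic of this example can be re-checked directly from its summary sums (n = 5, ΣX = 20, ΣY = 55, ΣXY = 241, ΣX² = 90):

```python
# Least-squares slope and intercept from the summary sums of the example.
n, sum_x, sum_y, sum_xy, sum_x2 = 5, 20, 55, 241, 90

b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)  # 105 / 50
a = sum_y / n - b * (sum_x / n)                               # 11 - 2.1 * 4

print(round(b, 1), round(a, 1))   # 2.1 2.6
```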
X 5 6 8 10 12 13 15 16 17
Y 16 19 23 28 36 41 44 45 50
2- In an experiment to measure the stiffness of a spring, the length of the spring under different loads was measured as follows:
X = Loads (lb) 3 5 6 7 10 12 15 20 22 28
Y = Length (in) 10 12 15 18 20 22 27 30 32 34
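As a check on exercise 2, the same least-squares formulas can be applied to the spring data; a pure-Python sketch that computes only the slope and intercept:

```python
loads = [3, 5, 6, 7, 10, 12, 15, 20, 22, 28]        # X: loads (lb)
lengths = [10, 12, 15, 18, 20, 22, 27, 30, 32, 34]  # Y: lengths (in)

n = len(loads)
sum_x, sum_y = sum(loads), sum(lengths)
sum_xy = sum(x * y for x, y in zip(loads, lengths))
sum_x2 = sum(x * x for x in loads)

# Least-squares slope and intercept, as in the formulas above.
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a = sum_y / n - b * (sum_x / n)
# The fitted line is roughly: length ≈ 9.25 + 1.00 * load
```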