Statistics and Probability (4TH Q)
Hypothesis Testing
Hypothesis testing is a statistical procedure for drawing a conclusion about one or more characteristics of a
population from sample data.
It proceeds through several steps that lead us to conclude whether a general assumption about the population is valid or not.
The null hypothesis is denoted by Ho and refers to any claim or assertion about the parameter of the
population. This hypothesis is intended to be rejected in conducting the hypothesis testing. It is known as the “null
hypothesis” because of the absence of statistical evidence or facts to warrant its truthfulness. In mathematics, it is
commonly equated to zero (a customary way of denoting null).
The alternative hypothesis is denoted by Ha and is an assertion or claim that contradicts the null hypothesis.
When we conduct statistical analysis, we use mathematical statements to express the null and alternative
hypotheses. The table below is a guide on how to translate them into symbols.
In symbolic form, Ho and Ha come in pairs. For example, if Ho is μ = A, then its Ha is μ ≠ A. If Ho
is μ ≠ A, then its Ha is μ = A. If Ho is μ > A, then its Ha is µ ≤ A, and so forth. But not all of these pairs are acceptable,
since the null hypothesis means no significant difference or equal: the equality sign (=, ≤ or ≥) is only allowed in Ho. So the only
acceptable symbolic forms of Ho and Ha are as follows:
If Ho: µ ≤ A, then its Ha: µ > A, and if Ho: µ ≥ A, then its Ha: µ < A. Since Ha has > or < as its symbol,
it is a one-tailed or directional test.
The equality symbols of Ho and Ha for two groups (µ1 and µ2 are the means of the two groups):
If Ho: µ1 ≤ µ2, then its Ha: µ1 > µ2, and if Ho: µ1 ≥ µ2, then its Ha: µ1 < µ2. Since Ha has > or < as its
symbol, it is a one-tailed or directional test.
The Level of Significance is the degree of uncertainty or doubtfulness about a statistical statement under specified
conditions. Researchers usually use a 1% to 10% level of significance, because various factors affecting a study keep
them from being fully confident in its results, and research practice allows for this. These notes denote the level of
significance by δ (delta); it is more commonly written α (alpha), as in the example problems.
The Level of Confidence is the degree of assurance or certainty that a particular statistical statement is correct under
specified conditions. It is the complement of the level of significance. Researchers typically use a 90% to 99% confidence
level to communicate how reliable their results are. We denote the level of confidence by 1 - δ.
Types of Error
There are two possible situations (Ho is true or Ho is false) and two possible decisions (reject Ho or fail to reject Ho):
If Ho is true and you reject Ho, you commit a Type I error.
If Ho is false and you reject Ho, your decision is correct.
If Ho is true and you fail to reject Ho (or simply accept Ho), your decision is correct.
And lastly, if Ho is false and you fail to reject Ho, you commit a Type II error.
The most commonly used values of δ are 0.10, 0.05, and 0.01. Choosing a 0.05 level of significance means that the researcher is
95% confident and accepts a 5% chance of committing a Type I error.
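The four outcomes above can be laid out as a small lookup. The following is a minimal Python sketch; the function name classify_decision is made up for illustration and is not part of the lesson. In practice the truth of Ho is unknown, so this only labels the outcomes.

```python
def classify_decision(ho_is_true: bool, reject_ho: bool) -> str:
    """Label the outcome of a test decision, given whether Ho is actually true."""
    if ho_is_true and reject_ho:
        return "Type I error"      # a true Ho was rejected
    if not ho_is_true and not reject_ho:
        return "Type II error"     # a false Ho was not rejected
    return "Correct decision"      # the two remaining combinations


# Print all four combinations of situation and decision.
for ho_is_true in (True, False):
    for reject_ho in (True, False):
        print(f"Ho true: {ho_is_true}, reject Ho: {reject_ho} -> "
              f"{classify_decision(ho_is_true, reject_ho)}")
```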
The area of rejection or critical region is the area under the normal curve wherein the null hypothesis is
rejected based on the set condition or Decision Rule.
The Critical Value (cv) or Tabular Value separates the area of rejection and the area in which the null
hypothesis is not rejected under the normal curve. Usually, the critical values or tabular values can be found on the back
pages of any statistics book.
DECISION RULE:
Reject Ho if the computed value is < -cv or > +cv; otherwise do not reject Ho.
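The tabular critical values of z can also be computed instead of looked up. The sketch below assumes Python's standard statistics module; the helper name z_critical is hypothetical. It splits the level of significance between both tails for a two-tailed test and puts all of it in one tail for a one-tailed test.

```python
from statistics import NormalDist


def z_critical(alpha: float, tails: int = 2) -> float:
    """Return the positive critical z value for a one- or two-tailed test."""
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    if not 0 < alpha < 1:
        raise ValueError("alpha must be between 0 and 1")
    # A two-tailed test splits alpha equally between the two tails.
    tail_area = alpha / tails
    return NormalDist().inv_cdf(1 - tail_area)


# Critical values for the three commonly used levels of significance.
for alpha in (0.10, 0.05, 0.01):
    one_tailed = round(z_critical(alpha, tails=1), 3)
    two_tailed = round(z_critical(alpha, tails=2), 3)
    print(f"alpha = {alpha}: one-tailed cv = {one_tailed}, two-tailed cv = ±{two_tailed}")
```

For a left-tailed test the critical value is the negative of the one-tailed value returned here.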
Parametric Test
2. Assumptions are made concerning the parameters of the population from which samples are drawn.
5. Normal distribution of the population from which the sample was drawn.
The null hypothesis (Ho) is the commonly accepted fact; it is the opposite of the alternative hypothesis (Ha).
Researchers work to reject, nullify or disprove the null hypothesis. Researchers come up with an alternative
hypothesis, one that they think explains a phenomenon and then work to reject the null hypothesis.
The alternative hypothesis is simply the opposite of the null. For example, if your null is “I’m going to win at most
1 million,” then your alternative is “I’m going to win more than 1 million.” Basically, you are looking at whether there is
enough evidence of change (for the alternative hypothesis) to be able to reject the null hypothesis.
Formulate the null and alternative hypotheses for each of the following and express them using the symbols
for Ho and Ha.
Example 1
A Barangay Captain from a certain barangay in Manila claims that the average monthly income of families with
five members from his vicinity is P12,000.
Solution:
Given the problem above, Ho: The average monthly income of families with five members from his vicinity
is P12,000. (Ho: μ = 12000) To oppose Ho, we have Ha: The average monthly income of families with five
members from his vicinity is not P12,000. (Ha: μ ≠ 12000)
Example 2
A Barangay Captain from a certain barangay in Manila claims that the average monthly income of families with
five members from his vicinity is greater than P12,000.
Solution:
Given the problem above, since the equal sign is only allowed in the null hypothesis, the statement above should
be placed in the alternative hypothesis. Ha: The average monthly income of families with five members from his
vicinity is greater than P12,000. (Ha: μ > 12000) To oppose that, the null hypothesis is Ho: The
average monthly income of families with five members from his vicinity is not greater than P12,000.
(Ho: μ ≤ 12000)
Example 3
A Barangay Captain from a certain barangay in Manila claims that the average monthly income of families with
five members from his vicinity is lower than P12,000.
Solution:
Given the problem above, since the equal sign is only allowed in the null hypothesis, the statement above should
be placed in the alternative hypothesis. Ha: The average monthly income of families with five members from his
vicinity is lower than P12,000. (Ha: μ < 12000)
To oppose that, the null hypothesis is Ho: The average monthly income of families with five
members from his vicinity is not lower than P12,000. (Ho: μ ≥ 12000)
Example 4
The teacher claims that there is no significant difference between the mean scores of the morning and afternoon
classes in their Math Exam.
Solution:
Let μ1 = mean score of students in morning class in a Math Exam
Let μ2 = mean score of students in afternoon class in a Math Exam
No significant difference means that they are just the same so null hypothesis would be:
Ho: There is no significant difference between the mean scores of morning and afternoon classes.
(Ho: μ1 = μ2)
And to oppose that, therefore alternative hypothesis would be:
Ha: There is a significant difference between the mean scores of morning and afternoon classes.
(Ha: μ1 ≠ μ2)
Example 5
The teacher claims that the mean score of morning class is higher than the mean score of afternoon class in their
Math Exam.
Solution:
Let μ1 = mean score of students in morning class in a Math Exam
Let μ2 = mean score of students in afternoon class in a Math Exam
Given the problem above, since the equal sign is only allowed in the null hypothesis, the statement above should
be placed in the alternative hypothesis.
Ha: The mean score of morning class is higher than the mean score of afternoon class in their
Math Exam. (Ha: μ1>μ2)
So its null hypothesis will be
Ho: The mean score of morning class is not higher than the mean score of afternoon class in their
Math Exam. (Ho: μ1 ≤ μ2)
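The rule used in Examples 1 to 5 (a claim that contains equality goes to Ho, otherwise it goes to Ha and Ho takes the complement) can be sketched in Python. The function name formulate and the two lookup tables are hypothetical helpers written for this illustration only.

```python
# Complement of each relation symbol.
COMPLEMENT = {"=": "≠", "≠": "=", ">": "≤", "≤": ">", "<": "≥", "≥": "<"}
# Relations that contain equality, and so are allowed in Ho.
HAS_EQUALITY = {"=", "≤", "≥"}


def formulate(claim_relation: str, value, parameter: str = "μ"):
    """Return (Ho, Ha) in symbols for a claim such as 'μ > 12000'."""
    opposite = COMPLEMENT[claim_relation]
    if claim_relation in HAS_EQUALITY:
        # The claim itself contains equality, so it becomes Ho.
        null_rel, alt_rel = claim_relation, opposite
    else:
        # The claim has no equality, so it becomes Ha.
        null_rel, alt_rel = opposite, claim_relation
    return (f"Ho: {parameter} {null_rel} {value}",
            f"Ha: {parameter} {alt_rel} {value}")


print(formulate("=", 12000))                   # Example 1
print(formulate(">", 12000))                   # Example 2
print(formulate("<", 12000))                   # Example 3
print(formulate(">", "μ2", parameter="μ1"))    # Example 5
```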
Test Statistic
It is a mathematical formula that allows researchers to determine the likelihood of obtaining the sample outcomes if the
null hypothesis were true.
The formula for the Z test of one population mean is \( z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} \), where x̄ is the sample mean, µ is the
population mean, σ is the population standard deviation, and n is the number of observations collected.
The formula for the Z test of two population means is \( z = \frac{\mu_1 - \mu_2}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} \), where µ1 and µ2 are population means,
σ1² and σ2² are population variances, and n1 and n2 are the numbers of observations in each group.
Z table:
The formula for the t test of one sample mean is \( t = \frac{\bar{x} - \mu}{s / \sqrt{n}} \), where x̄ is the sample mean, µ is the population mean,
s is the sample standard deviation, and n is the number of observations collected.
This statistic is compared to the Student t-distribution with n – 1 degrees of freedom. If n ≥ 30, then by the Central Limit
Theorem we may instead compare it to the standard normal. We use the t-test if the sample size (n) is less than 30
or if the population variance is unknown.
T table:
Recall the area of rejection or critical region discussed in the previous meeting: it is the area under the
normal curve wherein the null hypothesis is rejected based on the set condition or Decision Rule.
The Critical Value (cv) or Tabular Value separates the area of rejection and the area in which the null hypothesis is not
rejected under the normal curve. Usually, the critical values or tabular values can be found in the back pages of any
statistics book.
Recall also the table above, discussed in the previous meetings: it gives the Critical Value or Tabular
Value for the Z-test. We use this table to determine the critical region.
Example
Suppose it is known from experience that the standard deviation of the weight of 9-ounce packages of cookies is 0.20
ounces. To check whether the true average is, on a given day, 9 ounces, employees select a random sample of 43
packages and find that their mean weight is x̄ = 9.33 ounces. Using a 0.01 level of significance, determine the area of
rejection.
Solution:
This is an example of a Z test of One Population Mean.
To determine the area of rejection, we first need to identify whether it is a one-tailed or two-tailed test. Since the
claim being checked is that the true average weight of a package is 9 ounces, it is a two-tailed test, that is, Ha: µ ≠ 9.
The level of significance stated in the problem is 0.01. Therefore, δ = 0.01.
Looking at the Z table from the previous slide, the critical values are ± 2.576.
Area of Rejection
DECISION RULE:
Reject Ho if the computed value is < -2.576 or > +2.576; otherwise do not reject Ho.
Example
A Math Professor claims that there is no significant difference between the mean scores obtained by students in
afternoon and morning session in an examination in Math 101. Data are shown below:
Solution:
This is an example of a Z test of Two Population Means.
To determine the area of rejection, we first need to identify whether it is a one-tailed or two-tailed test. Since the
professor claims that there is no significant difference between the mean scores obtained by students in the
afternoon and morning sessions in an examination in Math 101, it is a two-tailed test, that is,
Ha: µ1 ≠ µ2. Since the professor is 95% confident, the level of significance is 5% or 0.05. Therefore, δ = 0.05.
Looking at the Z table from the previous slide, the critical values are ± 1.960.
Area of Rejection
DECISION RULE:
Reject Ho if the computed value is < -1.960 or > +1.960; otherwise do not reject Ho.
Based on the Area of Rejection discussed in the previous meeting, the shaded region above is the
rejection area; hence it follows the decision rule above for a two-tailed test.
Test Statistic
T table:
The table above gives the Critical Value or Tabular Value for the t-test. This table requires the degrees of freedom:
for One Sample Mean, df = n – 1, while for Two Independent Means, df = n1 + n2 – 2.
Example
A brochure inviting subscriptions for a new diet program states that the participants are expected to lose over 22
pounds in five weeks. Suppose that, from the data of the five-week weight losses of 19 participants, the sample mean
and sample standard deviation are found to be 23.5 and 10.2, respectively. Could the statement in the brochure be
substantiated on the basis of these findings? Test at the α = .05 level.
Solution:
This is an example of a t test of One Sample Mean.
To determine the area of rejection, we first need to identify whether it is a one-tailed or two-tailed test. The brochure
states that participants are expected to lose over 22 pounds in five weeks, that is, Ha: µ > 22. So it is a one-tailed test.
The level of significance stated in the problem is 0.05. Therefore, δ = 0.05. Since n = 19, df = 19 – 1 = 18.
Looking at the t table from the previous slide, the tabular value is 1.734. Since it is a right-tailed test,
we use the critical value +1.734.
Area of Rejection
DECISION RULE:
Reject Ho if the computed value is > +1.734; otherwise do not reject Ho.
Based on the Area of Rejection discussed in the previous meeting, the shaded region above is the
rejection area; hence it follows the decision rule above for a one-tailed test.
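The example above stops at the area of rejection. As a hedged sketch of the next step, the following Python code evaluates the t formula with the figures stated in the problem (x̄ = 23.5, µ = 22, s = 10.2, n = 19) and compares the result with the tabular value 1.734 from the notes. The function name t_statistic is an assumption made for this illustration.

```python
import math


def t_statistic(sample_mean: float, hypothesized_mean: float,
                sample_sd: float, n: int) -> float:
    """Compute t = (x̄ - µ) / (s / √n) for a one-sample t test."""
    standard_error = sample_sd / math.sqrt(n)
    return (sample_mean - hypothesized_mean) / standard_error


# Figures from the diet program problem.
t = t_statistic(sample_mean=23.5, hypothesized_mean=22, sample_sd=10.2, n=19)
critical_value = 1.734          # tabular t value for df = 18, one-tailed, 0.05 (from the notes)

# Right-tailed decision rule: reject Ho only if t exceeds +cv.
reject_ho = t > critical_value
print(f"t = {t:.3f}, reject Ho: {reject_ho}")
```

With these figures the computed value falls below the critical value, so under this sketch Ho would not be rejected and the brochure's statement would not be substantiated.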
Example
A study of the number of business dinners that executives in the insurance and banking industries claim as deductible
expenses per month was based on random samples and yielded the following results:
Test the null hypothesis µ1 = µ2 against the alternative hypothesis µ1 ≠ µ2 at the α = .05 significance level.
Solution:
This is an example of a t test of Two Independent Means.
To determine the area of rejection, we first need to identify whether it is a one-tailed or two-tailed test. In this
problem, the alternative hypothesis is µ1 ≠ µ2. So it is a two-tailed test. The level of significance stated in the
problem is 0.05. Therefore, δ = 0.05. Since n1 = 14 and n2 = 15, df = 14 + 15 – 2 = 27.
Looking at the t table from the previous slide, the critical values are ± 2.052.
Area of Rejection
DECISION RULE:
Reject Ho if the computed value is < -2.052 or > +2.052; otherwise do not reject Ho.
Based on the Area of Rejection discussed in the previous meeting, the shaded region above is the
rejection area; hence it follows the decision rule above for a two-tailed test.
Recall that the test used for a large sample size or a known variance is the Z-test. The Z-test is a test statistic
for which the population variance σ² is known.
Its formula is \( z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} \), where x̄ is the sample mean, µ is the population mean, σ is the population
standard deviation, and n is the number of observations collected.
Z table:
Recall also the table above, discussed in the previous meetings: it gives the Critical Value or Tabular Value
for the Z-test.
DECISION RULE:
Reject Ho if the computed value is < -cv or > +cv; otherwise do not reject Ho.
Example 1
Suppose it is known from experience that the standard deviation of the weight of 9-ounce packages of cookies is 0.20
ounces. To check whether the true average is, on a given day, 9 ounces, employees select a random sample of 43
packages and find that their mean weight is 𝑥̅ = 9.33 ounces with 0.01 level of significance.
Solution:
Ho: The average weight of the cookie packages is 9 ounces (Ho: μ = 9)
Ha: The average weight of the cookie packages is not 9 ounces (Ha: μ ≠ 9)
Decision Rule: Reject Ho if the Computed Value is > +cv or < -cv; otherwise do not reject Ho.
To solve this, we use the formula for the Z test of One Population Mean, \( z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} \), where x̄ is the sample mean,
µ is the population mean, σ is the population standard deviation, and n is the number of observations collected.
Substituting x̄ = 9.33, µ = 9, σ = 0.20 and n = 43 gives z = (9.33 − 9) / (0.20 / √43) ≈ 10.82.
Conclusion:
Since the Computed Value (10.82) is > +cv (2.576), reject Ho.
Since Ho is rejected, we accept Ha; that is, the average weight of the cookie packages is not 9 ounces. To arrive
at this decision, we need the critical value: based on the previous slides, the critical value of z
for a two-tailed test at the 0.01 level of significance is ± 2.576.
Example 2
According to the U.S. Department of Education, full-time graduate students receive an average salary of $11,800.
The dean of graduate studies at a large state university in PA claims that his graduate students earn more than this. He
surveys 52 randomly selected students and finds their average salary is $12,445 with a standard deviation of $1700. With
α = 0.05, is the dean’s claim correct?
Solution:
Ho: The average salary of full time graduate students is not greater than $11800. (Ho: µ ≤ 11800)
Ha: The average salary of full time graduate students is greater than $11800. (Ha: µ > 11800)
Decision Rule: Reject Ho if the Computed Value is > +cv; otherwise do not reject Ho.
To solve this, we use the formula for the Z test of One Population Mean, \( z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} \), where x̄ is the sample mean,
µ is the population mean, σ is the population standard deviation, and n is the number of observations collected.
Substituting x̄ = 12445, µ = 11800, σ = 1700 and n = 52 gives z = (12445 − 11800) / (1700 / √52) ≈ 2.74.
Conclusion:
Since the Computed Value (2.74) is > +cv (1.645), reject Ho.
Since Ho is rejected, we accept Ha; that is, the average salary of full time graduate students is greater than
$11800. Therefore, the dean’s claim is supported.
To arrive at this decision, we need the critical value: based on the previous slides, the critical
value of z for a one-tailed test at the 0.05 level of significance is 1.645.
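The two one-mean Z tests above can be checked numerically. This is a minimal Python sketch that uses only the figures stated in the two problems; the function name z_one_mean is a hypothetical helper, and the critical values are computed with the standard statistics module rather than read from the Z table.

```python
import math
from statistics import NormalDist


def z_one_mean(sample_mean: float, hypothesized_mean: float,
               sigma: float, n: int) -> float:
    """Compute z = (x̄ - µ) / (σ / √n) for a test of one population mean."""
    return (sample_mean - hypothesized_mean) / (sigma / math.sqrt(n))


# Example 1 (cookies): two-tailed test at the 0.01 level of significance.
z_cookies = z_one_mean(9.33, 9, 0.20, 43)
cv_cookies = NormalDist().inv_cdf(1 - 0.01 / 2)    # alpha split over two tails
reject_cookies = abs(z_cookies) > cv_cookies

# Example 2 (graduate salaries): right-tailed test at the 0.05 level.
z_salary = z_one_mean(12445, 11800, 1700, 52)
cv_salary = NormalDist().inv_cdf(1 - 0.05)         # all of alpha in the right tail
reject_salary = z_salary > cv_salary

print(f"cookies: z = {z_cookies:.2f}, cv = ±{cv_cookies:.3f}, reject Ho: {reject_cookies}")
print(f"salary:  z = {z_salary:.2f}, cv = +{cv_salary:.3f}, reject Ho: {reject_salary}")
```

Both computed values fall in the rejection region, which agrees with the conclusions reached in the two examples.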
Example 1
A Math Professor claims that there is no significant difference between the mean scores obtained by students in
afternoon and morning session in an examination in Math 101. Data are shown below:
Solution:
Let μ1 = mean score obtained by students in afternoon session in an examination in Math 101
Let μ2 = mean score obtained by students in morning session in an examination in Math 101
Ho: There is no significant difference between the mean scores obtained by students in afternoon and morning session in
an examination in Math 101. (Ho: µ1 = µ2)
Ha: There is a significant difference between the mean scores obtained by students in afternoon and morning session in
an examination in Math 101. (Ha: µ1 ≠ µ2)
Decision Rule: Reject Ho if the Computed Value is > +cv or < -cv; otherwise do not reject Ho.
To compute the test statistic value we have \( z = \frac{\mu_1 - \mu_2}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} \), where µ1 and µ2 are population means,
σ1² and σ2² are population variances, and n1 and n2 are the numbers of observations in each group.
Looking at the Z table from the previous slide, the critical values are ± 1.960.
Conclusion
Since the Computed Value (-2) is < -cv (-1.960), reject Ho.
Since Ho is rejected, we accept Ha; that is, there is a significant difference between the mean scores obtained by
students in afternoon and morning session in an examination in Math 101.
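The two-means formula can be sketched in Python as follows. The data table for this example is not reproduced in these notes, so every number passed to the function below is invented purely to keep the arithmetic easy to follow; only the critical value 1.960 comes from the notes. The function name z_two_means is hypothetical, and in practice the two means plugged in are the group means observed in the samples.

```python
import math


def z_two_means(mean_1: float, mean_2: float,
                var_1: float, var_2: float,
                n_1: int, n_2: int) -> float:
    """Compute z = (mean_1 - mean_2) / sqrt(var_1/n_1 + var_2/n_2)."""
    standard_error = math.sqrt(var_1 / n_1 + var_2 / n_2)
    return (mean_1 - mean_2) / standard_error


# HYPOTHETICAL inputs, not the data from the example above.
z = z_two_means(mean_1=78, mean_2=80, var_1=25, var_2=25, n_1=50, n_2=50)

critical_value = 1.960   # two-tailed, 0.05 level of significance (from the notes)
# Two-tailed decision rule: reject Ho if z < -cv or z > +cv.
reject_ho = z < -critical_value or z > critical_value
print(f"z = {z:.2f}, reject Ho: {reject_ho}")
```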
Example 2
In a Logic class, the students claim that those who took the Logic class in Filipino have higher average grades than those
who took it in English. Below are the data observed:
Solution:
Let μ1 = average grades of students who took Logic class in Filipino
Let μ2 = average grades of students who took Logic class in English
Ho: The average grades of students who took Logic class in Filipino is not higher than those who took it in English. (Ho:
µ1 ≤ µ2)
Ha: The average grades of students who took Logic class in Filipino is higher than those who took it in English. (Ha: µ1 >
µ2)
Decision Rule: Reject Ho if Computed Value is > +cv , otherwise do not reject Ho.
To compute the test statistic value we have \( z = \frac{\mu_1 - \mu_2}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} \), where µ1 and µ2 are population means,
σ1² and σ2² are population variances, and n1 and n2 are the numbers of observations in each group.
Then looking to the Z table from the previous slide, the Critical Values are ± 1.960.
Conclusion
Since the Computed Value (5.74) is > +cv (3.326), reject Ho.
Since Ho is rejected, we accept Ha; that is, the average grades of students who took Logic class in Filipino are
higher than those who took it in English. The students’ claim is supported.
In Population Proportion, the sampling method used is simple random sampling, where each sample point can result
in just two possible outcomes: success and failure. The sample should include at least 10 successes and 10 failures,
and the population size should be at least 20 times as big as the sample size.
In Population Proportion, the hypotheses are claims about the population proportion p. The null hypothesis is
stated as Ho: p = po, while the alternative hypothesis is the competing claim that the parameter is less than, greater than,
or not equal to po.
The alternative hypothesis can take one of three forms, depending on whether the investigator believes that the
parameter has increased, decreased or changed.
Ha: p > p0, where p0 is the comparator or null value and an
increase is hypothesized; this type of test is called an upper-tailed test;
Ha: p < p0, where a decrease is hypothesized; this is called a lower-tailed test; or
Ha: p ≠ p0, where a difference is hypothesized; this is called a two-tailed test.
Researchers usually use a 0.05, 0.01, or 0.10 level of significance, but any value between 0 and 1 can be used.
\( z = \frac{\hat{p} - p_o}{\sqrt{\frac{p_o(1 - p_o)}{n}}} \)
Z table
Area of Rejection
DECISION RULE:
Reject Ho if the computed value is < -cv or > +cv; otherwise do not reject Ho.
“A test statistic is a statistic used in statistical hypothesis testing. A Hypothesis Test is typically specified in terms of a
test statistic, considered as a numerical summary of a data-set that reduces the data to one value that can be used to
perform the hypothesis test. In general, a test statistic is selected or defined in such a way as to quantify, within observed
data, behaviors that would distinguish the null from the alternative hypothesis, where such an alternative is prescribed
or that would characterize the null hypothesis if there is no explicitly stated alternative hypothesis.
An important property of a test statistic is that its sampling distribution under the null hypothesis must be calculable,
either exactly or approximately which allows p values to be calculated. A test statistic shares some of the same qualities
of a descriptive statistic and many statistics can be used as both test statistics and descriptive statistics. However, a test
statistic is specifically intended for use in statistical testing, whereas the main quality of a descriptive is that it is easily
interpretable. Some informative descriptive statistics, such as the sample range, do not make good test statistics since it
is difficult to determine their sampling distribution. Two widely used test statistics are the t-statistic and F-test.”
(Wikipedia, n.d.)
POPULATION PROPORTION
“In Statistics, a population proportion, generally denoted by P or the Greek letter 𝜋 is a parameter that describes a
percentage value associated with a population. For example, the 2010 United States Census showed that 83.7% of the
American Population was identified as not being Hispanic or Latino; the value of 0.837 is a population proportion. In
general, the population proportion and other population parameters are unknown. A census can be conducted in order
to determine the actual value of a population parameter but often a census is not practical due to its costs and time
consumption.
A population proportion is usually estimated through an unbiased sample statistic obtained from an observational
study or experiment. For example, the National Technological Literacy Conference conducted a national survey of 2,000
adults to determine the percentage of adults who are economically illiterate. The study showed that 72% of the 2,000
adults sample did not understand what a gross domestic product is. The value of 72% is a sample proportion. The sample
proportion is generally denoted by p̂.
One of the main focuses of study in inferential statistics is determining the “true” value of a parameter. Generally, the
actual value for a parameter will never be found unless a census is conducted on the population of the study. However,
there are statistical methods that can be used to get a reasonable estimation for a parameter. These methods include
confidence intervals and hypothesis testing.
Estimating the value of a population proportion can be of great implication in the areas of agriculture, business,
economics, education, engineering, environmental studies, medicine, law, political science, psychology, and sociology.
A population proportion can be estimated through the usage of a confidence interval known as one-sample
proportion in the Z-interval.” (Wikipedia, n.d.)
Z table:
Area of Rejection
DECISION RULE:
Reject Ho if the computed value is < -cv or > +cv; otherwise do not reject Ho.
Test Statistic
\( z = \frac{\hat{p} - p_o}{\sqrt{\frac{p_o(1 - p_o)}{n}}} \)
where p̂ is the sample proportion, n is the sample size, and po is the null hypothesized proportion, that is, when Ho: p = po.
Example 1
The CEO of a company claims that 75 percent of his 1,000,000 customers are very satisfied with the service they receive.
To test this claim, the local newspaper surveyed 200 customers using simple random sampling. Among the sampled
customers, 73 percent say they are very satisfied. Based on these findings, can we reject the CEO's hypothesis that 75%
of the customers are very satisfied? Use a 0.05 level of significance.
Solution:
Decision Rule: Reject Ho if the computed value is > +cv or < -cv; otherwise do not reject Ho.
Decision: Since the Computed Value (-0.65) is greater than -cv (-1.960) and less than +cv (+1.960), do not reject Ho.
Conclusion: Ho: p = 0.75 is not rejected. No, we cannot reject the CEO’s hypothesis that 75% of the customers are very satisfied.
Example 2
Newborn babies are more likely to be girls than boys. A random sample found 10,000 boys were born among 25,000
newborn children. The sample proportion of boys was 0.4. Is this sample evidence that the birth of boys is less common
than the birth of girls in the entire population? Use α = 0.01
Solution:
Decision Rule: Reject Ho if the computed value is < -cv; otherwise do not reject Ho.
Decision: Since the Computed Value (-31.62) is less than -cv (-2.326), reject Ho.
Conclusion: Ha: p < 0.5 is accepted. Yes, this sample is evidence that the birth of boys is less common than the birth
of girls in the entire population.
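The computed values in the two proportion examples can be reproduced with a short Python sketch. It uses only the figures stated in the problems; the function name z_proportion is a hypothetical helper, and the critical values are computed with the standard statistics module instead of being read from the Z table.

```python
import math
from statistics import NormalDist


def z_proportion(p_hat: float, p_o: float, n: int) -> float:
    """Compute z = (p̂ - po) / sqrt(po(1 - po) / n)."""
    return (p_hat - p_o) / math.sqrt(p_o * (1 - p_o) / n)


# Example 1 (CEO's claim): Ho: p = 0.75, two-tailed, 0.05 level of significance.
z_ceo = z_proportion(p_hat=0.73, p_o=0.75, n=200)
cv_ceo = NormalDist().inv_cdf(1 - 0.05 / 2)
reject_ceo = abs(z_ceo) > cv_ceo

# Example 2 (newborns): Ha: p < 0.5, left-tailed, 0.01 level of significance.
z_boys = z_proportion(p_hat=10_000 / 25_000, p_o=0.5, n=25_000)
cv_boys = NormalDist().inv_cdf(1 - 0.01)
reject_boys = z_boys < -cv_boys

print(f"CEO:  z = {z_ceo:.2f}, cv = ±{cv_ceo:.3f}, reject Ho: {reject_ceo}")
print(f"boys: z = {z_boys:.2f}, cv = -{cv_boys:.3f}, reject Ho: {reject_boys}")
```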
Bivariate Data
The term bivariate comes from “bi,” meaning two, and “variate,” meaning variable. Bivariate data therefore involve two
variables, which are usually related to each other.
If one variable is influencing another variable, then you will have bivariate data that has an independent and a
dependent variable. This is because one variable depends on the other for change. An independent variable is a
condition or piece of data in an experiment that can be controlled or changed. A dependent variable is a condition or
piece of data in an experiment that is controlled or influenced by an outside factor, most often the independent variable.
Scatter Plot
A scatter plot is a graph of plotted points that show the relationship between two sets of data. In the example above,
each dot represents one person’s height versus their age.
Correlation refers to the relationships that exist between any two variables.
Types of Correlation
There is no relationship between x and y. As x increases, there is no change in y, so there is no association between the
values of x and y.
The table above gives the interpretation of the coefficient of correlation r, whose values satisfy -1 ≤ r ≤ 1.
Pearson (r) is a technique that is commonly used in determining the relationship between two sets of data. It is applicable
when the data to be compared are measured on an interval or ratio scale.
It is a parametric tool for determining the relationship between two variables, and it was first derived by a British
statistician named Karl Pearson.
Example
Data were collected from ten Grade 12 students to determine if there is a significant correlation between the time (in hours)
spent by the students in studying and their scores on a test.
Can you determine the degree of relationship between the time they spent studying and their scores on the test?
Solution:
After solving each sum, we are ready to substitute them into the formula for r. So we have,
It means that the more time a student spends studying, the higher his or her score will be.
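The substitution of sums into r can be sketched in Python with the usual computational form of Pearson's coefficient, r = (nΣxy − ΣxΣy) / √[(nΣx² − (Σx)²)(nΣy² − (Σy)²)]. The data table for the ten students is not reproduced in these notes, so the five pairs of hours and scores below are invented for illustration only, and the function name pearson_r is hypothetical.

```python
import math


def pearson_r(xs, ys) -> float:
    """Pearson correlation coefficient computed from the raw sums."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("xs and ys must have the same length (at least 2)")
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    # The denominator is zero if either variable is constant; r is undefined then.
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# HYPOTHETICAL data, not the values from the example above.
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]
r = pearson_r(hours, scores)
print(f"r = {r:.3f}")
```

A positive r, as with these made-up values, indicates that scores tend to rise as study time rises.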
Linear Regression
Linear regression models the relationship between two variables with a linear equation. It calculates the
“best-fit” line for a given set of data.
One variable is considered to be the independent variable and the other one the dependent variable.
The independent variable (also called the predictor or explanatory variable) is the controlling data or risk factor, while
the dependent variable (also called the response or outcome variable) is affected by the controlling data.
By using the least squares method (a procedure that minimizes the vertical deviations of plotted points surrounding a
straight line), we are able to construct a best-fitting straight line through the scatter diagram points and then formulate a
regression equation.
X = predictor
a = estimated y – intercept
b = slope
The y-intercept and slope are estimated from the sample data, and they are the values that minimize the sum of the
squared differences between the observed and the predicted values of the outcome.
Slope
The intercept is the point where the graph intersects the y-axis, while the slope is the inclination of the line.
Linear Equation
𝑦̂ = 𝑎 + 𝑏�
Example:
A sample of 6 persons was selected. Their ages (x variable) and weights (y variable) are shown in the
following table. Find the regression equation and the predicted weight when age is 8.5 years.
Solution:
So we have \( \bar{x} = \frac{41}{6} = 6.83 \) and \( \bar{y} = \frac{66}{6} = 11 \), and \( b = \frac{461 - \frac{41(66)}{6}}{291 - \frac{(41)^2}{6}} = 0.92 \)
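The slope computation above can be reproduced from the sums that appear in it (n = 6, Σx = 41, Σy = 66, Σxy = 461, Σx² = 291). The Python sketch below also carries the example one step further using the standard least-squares intercept a = ȳ − b·x̄, which is an assumption in the sense that it is not written out in this part of the notes. Results computed by hand with the rounded slope 0.92 will differ slightly in the last digits.

```python
# Sums taken from the substitution shown above.
n = 6
sum_x, sum_y = 41, 66
sum_xy, sum_x2 = 461, 291

mean_x = sum_x / n
mean_y = sum_y / n

# Slope: b = (Σxy - ΣxΣy/n) / (Σx² - (Σx)²/n)
b = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)

# Intercept (standard least-squares formula, assumed here): a = ȳ - b·x̄
a = mean_y - b * mean_x

# Predicted weight at age 8.5 years using ŷ = a + bX.
age = 8.5
predicted_weight = a + b * age

print(f"b = {b:.4f}, a = {a:.4f}")
print(f"predicted weight at age {age}: {predicted_weight:.2f}")
```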