
Statistics Lecture Series

BY
Frahi Fadila

Contents

0.1 Syllabus

1 Correlation and Regression
1.1 Introduction and objective
1.2 Scatter Plots
1.2.1 Components of scatter plot
1.3 Correlation
1.3.1 Correlation coefficient
1.3.2 Coefficient and Scatter Plot
1.3.3 The Significance of the Correlation Coefficient
1.4 Regression
1.4.1 Line of Best Fit
1.4.2 The Regression Line Equation
1.5 Exercises
1.6 Solution of Exercises

Bibliography


0.1 Syllabus

Course Description: Students will learn the fundamental concepts of data
description through descriptive statistics. There are three main categories
of descriptive statistics: the distribution, which describes how frequently
each value occurs; the central tendency, which describes the averages of the
values; and the dispersion (or variability), which describes how spread out
the values are.
Prerequisite(s): Students must know the basic operations and rules covered
in middle and secondary education.
Course Meeting Times: 2 class sessions per week, 1.5 hours per session.

Text(s): Elementary Statistics: A Step-by-Step Approach, 8th Edition,
McGraw-Hill Education, a business unit of The McGraw-Hill Companies,
New York, NY, 2012 [1].
Author(s): Allan G. Bluman

Course Objectives:
At the completion of this course, students will be able to:
1. Understand, interpret, and communicate statistical reasoning from data
using basic statistical terms, descriptive statistics, and charts and graphs
when appropriate.
2. Recognize and evaluate the relationship between two quantitative
variables through simple linear regression and correlation, and be able to
explain why correlation does not imply causation.
3. Calculate price and quantity index numbers using simple and weighted
averages of price relatives.
Grade Distribution:
Assignments 40%
Final Exam 60%
Course Policies:

• Attendance is expected and will be taken each class. Students are allowed
to miss 1 class during the semester without penalty. Any further absences
will result in point and/or grade deductions.
• Students are responsible for all missed work, regardless of the reason for
absence. It is also the absentee’s responsibility to get all missing notes
or materials.

Chapter 1

Correlation and Regression

1.1 Introduction and objective

Correlation and regression are areas of inferential statistics concerned
with determining the relationship between quantitative variables.
For example, a company manager may want to know whether the volume
of sales in a particular month correlates with the amount of advertising the
company does that month.
Teachers are interested in determining whether the number of hours a
student studies is related to the student's score on a particular exam.
Medical researchers are interested in questions such as: Is caffeine linked to
heart damage? Is there a relationship between a person's age and blood
pressure?
Correlation is a statistical method used to determine whether a linear
relationship between variables exists.
Regression is a statistical method used to describe the nature of the
relationship between variables, that is, positive or negative, linear or
nonlinear.
The objective of this chapter is to answer these questions statistically:
1. Are two or more variables linearly related or not?
2. If so, what is the strength of this relationship?

1.2 Scatter Plots

Definition: A scatter plot is a graph that describes the relationship between
two or more sets of data plotted as points. Scatter plots allow us to combine
the information into one graph. Only numerical data are used in scatter
plots.

Frahi Fadila Chapter5 Correlation and Regression

1.2.1 Components of scatter plot

Scatter plots usually contain the following elements:


- The x-axis represents values that we call the independent variable.
- The y-axis represents values that we call the dependent variable.
- Symbols plotted at the (X,Y) coordinates of data are usually points. A
chart can use different colored/shape symbols to represent separate groups
on the same chart.
- Example
In a school, a teacher has prepared a scatter plot on her computer showing
the marks of 8 students against the time they spent preparing for the
examination. It was as follows.

- What can be observed from this chart?
The data in the scatter plot show that:
- There is a linear relationship between the two variables.
- The marks increase as the time spent on preparation increases.
- Then what can we conclude from this observation?
It can be concluded that the longer a student spends reviewing, the better
the marks he or she will get. This means that there is a positive relationship
between review time and achievement (good grades).
- What other possible relationships can exist between the variables?
As stated in the introduction, the relationship between variables can be
linear or nonlinear, positive or negative, or there may be no relationship
at all.


1.3 Correlation

The word correlation is made of co- (meaning "together") and relation.
When two sets of data are strongly linked together, we say they have a
high correlation; when they are only weakly linked, they have a low
correlation.
Correlation is positive when the values increase together.
Correlation is negative when one value decreases as the other increases.

1.3.1 Correlation coefficient

The correlation coefficient is computed from the data to measure the
strength and direction of a linear relationship between two quantitative
variables.
The symbol for the sample correlation coefficient is r.
The symbol for the population correlation coefficient is ρ (Greek letter rho).
The correlation coefficient takes values between -1 and +1:
- r = 1: there is a perfect positive correlation.
- r = 0: there is no linear correlation (the values don't seem linked at all).
- r = -1: there is a perfect negative correlation.

1.3.2 Coefficient and Scatter Plot

The graphs in the following show the relationship between the correlation
coefficients and their corresponding scatter plots.
There are several ways to compute the value of the correlation coefficient.
One method is to use the formula shown here.
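$$ r = \frac{n\left(\sum xy\right) - \left(\sum x\right)\left(\sum y\right)}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}} $$
where n is the number of data pairs.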


- Example

The data in the table show the time spent by five students on social
media (X, in hours) together with a score for mind focus (Y), recorded on a
scale from 1 to 10.

X    2    4    6    8    10
Y    5    3    2    1    4

- Draw the scatter plot.
- By examining the relationship between these two variables, what do you
find and what does it mean?

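- Solution:

X     Y     XY    X²    Y²
2     5     10    4     25
4     3     12    16    9
6     2     12    36    4
8     1     8     64    1
10    4     40    100   16
30    15    82    220   55    (column totals)

With $\sum x = 30$, $\sum y = 15$, $\sum xy = 82$, $\sum x^2 = 220$,
$\sum y^2 = 55$ and $n = 5$, the correlation coefficient is:

$$ r = \frac{n\left(\sum xy\right) - \left(\sum x\right)\left(\sum y\right)}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}} = \frac{5(82) - (30)(15)}{\sqrt{\left[5(220) - 30^2\right]\left[5(55) - 15^2\right]}} = \frac{-40}{\sqrt{(200)(50)}} = -0.4 $$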
- We find that there is a weak negative correlation between time spent on
social media and mind focus.
- This means that as time spent on social media increases, mind focus tends
to decrease.
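
The arithmetic of this example can be checked with a few lines of Python. This is only a sketch of the formula given above; the function name correlation_coefficient is an illustrative choice and does not come from the notes.

```python
from math import sqrt

# Data from the example: hours on social media (x) and mind-focus score (y).
x = [2, 4, 6, 8, 10]
y = [5, 3, 2, 1, 4]

def correlation_coefficient(xs, ys):
    """Sample correlation coefficient r, computed from the five sums."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(a * b for a, b in zip(xs, ys))
    sum_x2 = sum(a * a for a in xs)
    sum_y2 = sum(b * b for b in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

r = correlation_coefficient(x, y)
print(round(r, 2))
```

The intermediate sums inside the function are the column totals of the solution table, so each one can be compared with the hand calculation.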

1.3.3 The Significance of the Correlation Coefficient

As stated before, the range of the correlation coefficient is between (-1) and
(+1).
When the value of r is near -1 or +1, there is a strong linear relationship.
When the value of r is near 0, the linear relationship is weak or nonexistent.
Since the value of r is computed from data obtained from samples, there are
two possibilities when r is not equal to zero: either the value of r is high
enough to conclude that there is a significant linear relationship between the
variables, or the value of r is due to chance.
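
The notes do not say how this decision is made. One common approach, assumed here rather than taken from the notes, is a t test with n - 2 degrees of freedom, using t = r·sqrt((n - 2)/(1 - r²)) and comparing |t| with a critical value from a t table. The sketch below applies it to the social media example (r = -0.4, n = 5); the critical value 3.182 is the standard two-tailed table value for a 0.05 significance level and 3 degrees of freedom.

```python
from math import sqrt

def t_statistic(r, n):
    """t value for testing whether the population correlation is zero."""
    return r * sqrt((n - 2) / (1 - r ** 2))

r, n = -0.4, 5
t = t_statistic(r, n)

# Assumed two-tailed critical value for alpha = 0.05 and n - 2 = 3 degrees of freedom.
critical_value = 3.182
significant = abs(t) > critical_value

print(round(t, 3), significant)
```

Under these assumptions |t| is well below the critical value, so for this small sample the observed r could plausibly be due to chance.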

1.4 Regression

After the scatter plot is drawn, the value of the correlation coefficient is
computed. If this value is significant, the next step is to determine the
equation of the regression line, which is the data's line of best fit.
- Note: Determining the regression line when r is not significant is
meaningless.


- The purpose of the regression line is to enable the researcher to see the
trend and make predictions on the basis of the data.

1.4.1 Line of Best Fit

The following figure shows through a scatter plot of data for two variables
that several lines can be drawn on the graph near the points. Given a scatter
plot, you must be able to draw the line of best fit. Best fit means that the
sum of the squares of the vertical distances from each point to the line is at
a minimum.
The reason you need a line of best fit is that the values of y will be predicted
from the values of x; hence, the closer the points are to the line, the better
the fit and the prediction will be.
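
Written as a formula, for a candidate line with intercept a and slope b, the quantity being minimized is the sum over all n data points of the squared vertical distances. The name SSE and the index i are notation introduced here for illustration; they are not used elsewhere in these notes.

```latex
\[
\mathrm{SSE}(a, b) = \sum_{i=1}^{n} \bigl( y_i - (a + b\,x_i) \bigr)^2
\]
```

The line of best fit is the pair (a, b) for which this sum is smallest.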

1.4.2 The Regression Line Equation

In statistics, the equation of the regression line is written as:

$$ y' = a + bx $$

where y′ is the predicted value of y, a is the y-intercept and b is the
slope of the line.
There are several methods for finding the equation of the regression line. Two
formulas are given here. These formulas use the same values that are used
in computing the value of the correlation coefficient.

$$ a = \frac{\left(\sum y\right)\left(\sum x^2\right) - \left(\sum x\right)\left(\sum xy\right)}{n\left(\sum x^2\right) - \left(\sum x\right)^2} $$

$$ b = \frac{n\left(\sum xy\right) - \left(\sum x\right)\left(\sum y\right)}{n\left(\sum x^2\right) - \left(\sum x\right)^2} $$
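
As a sketch of how the two formulas are applied, the Python below computes a and b for the social media data from Section 1.3.2. That data set is reused purely to illustrate the arithmetic (its r was weak, so the line has little predictive value). The function names are illustrative, and the last lines check that shifting the intercept makes the sum of squared vertical distances larger.

```python
def regression_line(xs, ys):
    """Return (a, b) for the line a + b*x using the two sum-based formulas."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    denominator = n * sum_x2 - sum_x ** 2
    a = (sum_y * sum_x2 - sum_x * sum_xy) / denominator
    b = (n * sum_xy - sum_x * sum_y) / denominator
    return a, b

def sum_squared_errors(xs, ys, a, b):
    """Sum of squared vertical distances from the points to the line a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

x = [2, 4, 6, 8, 10]
y = [5, 3, 2, 1, 4]

a, b = regression_line(x, y)
print(round(a, 3), round(b, 3))

# Moving the line up by half a unit should give a worse fit.
best = sum_squared_errors(x, y, a, b)
shifted = sum_squared_errors(x, y, a + 0.5, b)
print(best < shifted)
```

With these sums (n = 5, Σx = 30, Σy = 15, Σxy = 82, Σx² = 220) the formulas give a = 840/200 and b = -40/200.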


1.5 Exercises
Exercise 1.

Find the equation of the regression line for the data in the following table,
and graph the line on the scatter plot of the data.

Company    Cars (x)    Revenue (y)
A          63.0        7.0
B          29.0        3.9
C          20.8        2.1
D          19.1        2.8
E          13.4        1.4
F          8.5         1.5

Exercise 2.

The following scatter plot represents the number of assists and total points
of the scoring leaders in a game.

1. Convert the information in the figure into a table and find the following
values: $\sum x$, $\sum y$, $\sum xy$, $\sum x^2$ and $\sum y^2$.
2. Compute the value of the correlation coefficient.
3. Give a brief explanation of the type of relationship.

Exercise 3.

Consider the data in the following table:

X    10    20    60    100    210
Y    10    10    30    40     110

1. Draw a scatter plot of the data above.
2. Find the equation of the regression line for the data.
3. Graph the line on the scatter plot. What do you notice?
4. Use the equation of the regression line to predict the dependent variable
when the independent variable is equal to 300.

Exercise 4.

Suppose an analyst takes a random sample of 10 recent truck shipments
made by a company and records the distance in miles and the delivery time,
to the nearest half-day, from the time that the shipment was made available
for pick-up.
- Determine the coefficient of correlation and find the equation of the
regression line.
- Using the regression equation, estimate the delivery time, from the time
that the shipment is available for pick-up, for a shipment of 1,000 miles.


1.6 Solution of Exercises


Answers 1:

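The values needed for the equation are:
$n = 6$, $\sum x = 153.8$, $\sum y = 18.7$, $\sum xy = 682.77$, $\sum x^2 = 5859.26$.
Substituting in the formulas, we get:

$$ a = \frac{\left(\sum y\right)\left(\sum x^2\right) - \left(\sum x\right)\left(\sum xy\right)}{n\left(\sum x^2\right) - \left(\sum x\right)^2} = \frac{(18.7)(5859.26) - (153.8)(682.77)}{6(5859.26) - (153.8)^2} = 0.396 $$

$$ b = \frac{n\left(\sum xy\right) - \left(\sum x\right)\left(\sum y\right)}{n\left(\sum x^2\right) - \left(\sum x\right)^2} = \frac{6(682.77) - (153.8)(18.7)}{6(5859.26) - (153.8)^2} = 0.106 $$

Hence, the equation of the regression line is:

$$ y' = a + bx = 0.396 + 0.106x $$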
To graph the line, select any two values of x and find the corresponding
values of y′. Use any x values between 10 and 60. For example, let x = 15.
Substitute in the equation and find the corresponding y′ value:
y′ = 0.396 + 0.106(15) = 1.986
Let x = 40; then:
y′ = 0.396 + 0.106(40) = 4.636
Then plot the two points (15, 1.986) and (40, 4.636) and draw a line
connecting the two points. See the following figure:

Answers 2:

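1. Reading the points from the scatter plot of the number of assists (X) and
total points (Y) of the scoring leaders gives the following table:

X    1    2    5    3    2    1    4
Y    3    4    6    3    5    4    8

With $n = 7$, the required sums are:
$\sum x = 18$, $\sum y = 33$, $\sum xy = 96$, $\sum x^2 = 60$, $\sum y^2 = 175$.

2. The correlation coefficient is:

$$ r = \frac{n\left(\sum xy\right) - \left(\sum x\right)\left(\sum y\right)}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}} = \frac{7(96) - (18)(33)}{\sqrt{\left[7(60) - 18^2\right]\left[7(175) - 33^2\right]}} = 0.68 $$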

3. We conclude that there is a weak but positive relationship between the
number of assists and the total points of the scoring leaders; this is called
a low positive correlation.

Answers 3:

1. The scatter plot of the data in the table is:

X    10    20    60    100    210
Y    10    10    30    40     110

2. The equation of the regression line is given by the formulas:

$$ a = \frac{\left(\sum y\right)\left(\sum x^2\right) - \left(\sum x\right)\left(\sum xy\right)}{n\left(\sum x^2\right) - \left(\sum x\right)^2} $$

$$ b = \frac{n\left(\sum xy\right) - \left(\sum x\right)\left(\sum y\right)}{n\left(\sum x^2\right) - \left(\sum x\right)^2} $$

With $n = 5$, $\sum x = 400$, $\sum y = 200$, $\sum xy = 29\,200$,
$\sum x^2 = 58\,200$ and $\sum y^2 = 14\,800$, we get b ≈ 0.5 and a ≈ 0
(before rounding, b ≈ 0.504 and a ≈ -0.305). We can also find a with the
formula $a = \bar{Y} - b\bar{X}$. Hence:

$$ Y = \tfrac{1}{2} X $$

3. We graph the line on the scatter plot, and we note that there is a high
positive correlation.
4. Using the equation of the regression line, when the independent variable
is equal to 300 the predicted value of the dependent variable is
Y = 150 [2, 3, 4].
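
The solution mentions the formula a = Ȳ − bX̄. The sketch below uses that route on the Exercise 3 data, with the slope written in terms of deviations from the means. This deviation form is not stated in the notes, but it is algebraically equivalent to the sum-based formula for b. The helper name predict is illustrative.

```python
# Data from Exercise 3.
x = [10, 20, 60, 100, 210]
y = [10, 10, 30, 40, 110]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Slope from deviations about the means, then intercept from a = y_bar - b * x_bar.
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b = s_xy / s_xx
a = y_bar - b * x_bar

def predict(x_value):
    """Predicted value of the dependent variable on the fitted line."""
    return a + b * x_value

print(round(b, 3), round(a, 3), round(predict(300), 1))
```

Without rounding the coefficients, the prediction at 300 comes out slightly above the rounded answer of 150.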

Bibliography

[1] Allan Bluman. Elementary Statistics: A Step by Step Approach. 8th ed. McGraw-Hill, 2012.
[2] Thomas A. Williams and David R. Anderson. Statistics for Business & Economics. 11th ed.
Boston, USA: Cengage Learning, 2015.
[3] Zealure Holcomb. Fundamentals of Descriptive Statistics. Routledge, 2016.
[4] OpenClassrooms. March 3, 2024. URL: https://openclassrooms.com/fr/courses/7410486-nettoyez-et-analysez-votre-jeu-de-donnees/7461346-familiarisez-vous-avec-les-mesures-de-concentration
