BY
Frahi Fadila
Contents
0.1 Syllabus
Bibliography
0.1 Syllabus
Course Description: Students will learn the fundamental concepts of data
description through descriptive statistics. Descriptive statistics fall into
three main categories: the distribution, which describes how frequently each
value occurs; the central tendency, which describes the average of the values;
and the dispersion or variability, which describes how spread out the values are.
Prerequisite(s): The student must know the basic processes and rules
covered in middle and secondary education.
Course Meeting Times: Class Sessions: 2 sessions / week, 1.5 hours /
session.
Course Objectives:
At the completion of this course, students will be able to:
1. Understand, interpret, and communicate statistical reasoning from data
using basic statistical terms, descriptive statistics, charts and graphs
when appropriate.
2. Recognize and evaluate the relationship between two quantitative vari-
ables through simple linear regression and correlation and be able to
explain why correlation does not imply causation.
3. Calculate price and quantity index numbers using simple and weighted
averages of price relatives.
Grade Distribution:
Assignments 40%
Final Exam 60%
Course Policies:
• Attendance is expected and will be taken each class. Students are allowed
to miss 1 class during the semester without penalty. Any further absences
will result in point and/or grade deductions.
• Students are responsible for all missed work, regardless of the reason for
absence. It is also the absentee’s responsibility to get all missing notes
or materials.
Chapter 1
Correlation and regression are areas of inferential statistics concerned
with determining the relationship between quantitative variables.
For example, a company manager may want to know whether the volume
of sales for a particular month correlates with the amount of advertising the
company does that month.
Teachers are interested in determining whether the number of hours a stu-
dent studies is related to the student’s score on a particular exam.
Medical researchers are interested in questions such as: Is caffeine linked to
heart damage? Or is there a relationship between a person’s age and his
blood pressure?
Correlation
A statistical method used to determine whether a linear relationship
between variables exists.
Regression
A statistical method used to describe the nature of the relationship between
variables, that is, positive or negative, linear or nonlinear.
The objective of this chapter is to answer these questions statistically:
1. Are two or more variables linearly related or not?
2. If so, what is the strength of this relationship?
Frahi Fadila Chapter5 Correlation and Regression
1.3 Correlation
The following graphs show the relationship between correlation
coefficients and their corresponding scatter plots.
There are several ways to compute the value of the correlation coefficient.
One method is to use the formula shown here.
\[
r = \frac{n(\sum xy) - (\sum x)(\sum y)}{\sqrt{[n\sum x^2 - (\sum x)^2]\,[n\sum y^2 - (\sum y)^2]}}
\]
Where n is the number of data pairs.
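The formula can be evaluated by accumulating the five sums it needs. The following is a minimal sketch, not a reference implementation: the function name `pearson_r` and the data pairs are made up for illustration only.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Correlation coefficient r, computed from the sums used in the formula."""
    n = len(xs)                                   # number of data pairs
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Made-up data pairs, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = pearson_r(xs, ys)
print(round(r, 2))  # 0.77
```

Note that the denominator is zero when all x values (or all y values) are equal, so the sketch assumes both variables actually vary.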
- Example
The table below gives the time five students spent on social media
(X, in hours) and a score for their mental focus (Y), recorded on a
scale from 1 to 10.
- Draw the scatter plot.
X   2   4   6   8   10
Y   5   3   2   1   4
- Solution:
      X     Y     XY    X²    Y²
      2     5     10    4     25
      4     3     12    16    9
      6     2     12    36    4
      8     1     8     64    1
      10    4     40    100   16
Sum   30    15    82    220   55
\[
r = \frac{n(\sum xy) - (\sum x)(\sum y)}{\sqrt{[n\sum x^2 - (\sum x)^2]\,[n\sum y^2 - (\sum y)^2]}}
\]
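Substituting the column totals from the table, with n = 5 data pairs, gives the value of r for this example. The arithmetic below is a worked check added for illustration:

```latex
\[
r = \frac{5(82) - (30)(15)}{\sqrt{[5(220) - (30)^2]\,[5(55) - (15)^2]}}
  = \frac{410 - 450}{\sqrt{(200)(50)}}
  = \frac{-40}{100}
  = -0.40
\]
```

The negative sign indicates that focus scores tend to fall as time on social media rises, although a value of -0.40 is not close to -1.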
As stated before, the correlation coefficient ranges from -1 to +1.
When the value of r is near -1 or +1, there is a strong linear relationship.
When the value of r is near 0, the linear relationship is weak or nonexistent.
Since the value of r is computed from data obtained from samples, there are
two possibilities when r is not equal to zero: either the value of r is high
enough to conclude that there is a significant linear relationship between the
variables, or the value of r is due to chance.
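One common way to decide between these two possibilities is a t test on r, using the statistic t = r * sqrt((n - 2) / (1 - r^2)) with n - 2 degrees of freedom. The text above does not name a specific test, so the following is only a sketch under that assumption. The function name `t_statistic`, the inputs r = -0.4 and n = 5, and the critical value 3.182 (two-tailed, 5% level, 3 degrees of freedom, as read from a standard t table) are illustrative choices.

```python
from math import sqrt

def t_statistic(r, n):
    """t statistic for testing whether a sample correlation differs from zero."""
    return r * sqrt((n - 2) / (1 - r ** 2))

# Illustrative inputs (assumptions, not results stated in the notes).
r, n = -0.4, 5
t = t_statistic(r, n)

# Critical value from a t table: two-tailed, alpha = 0.05, n - 2 = 3 degrees of freedom.
critical = 3.182
significant = abs(t) > critical

print(round(t, 3), significant)  # -0.756 False
```

With so few data pairs, even a moderate value of r can easily be due to chance, which is why the test depends on n as well as on r.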
1.4 Regression
After drawing the scatter plot and computing the value of the correlation
coefficient, check whether this value is significant. If it is, the next step
is to determine the equation of the regression line, which is the data's line
of best fit.
- Note: Determining the regression line when r is not significant is meaning-
less.
- The purpose of the regression line is to enable the researcher to see the
trend and make predictions on the basis of the data.
The following figure shows a scatter plot of data for two variables; several
different lines can be drawn on the graph near the points. Given a scatter
plot, you must be able to draw the line of best fit. Best fit means that the
sum of the squares of the vertical distances from each point to the line is at
a minimum.
The reason you need a line of best fit is that the values of y will be predicted
from the values of x; hence, the closer the points are to the line, the better
the fit and the prediction will be.
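The idea of "best fit" can be checked numerically. The sketch below assumes the line is written as y = a + b*x and uses the standard least-squares formulas b = (n Σxy − Σx Σy) / (n Σx² − (Σx)²) and a = ȳ − b x̄. The function names and the data pairs are made up for illustration. It then compares the sum of squared vertical distances for the fitted line against another line drawn near the points.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)  # slope
    a = sum_y / n - b * sum_x / n                                 # intercept
    return a, b

def sum_squared_errors(xs, ys, a, b):
    """Sum of squared vertical distances from the points to y = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Made-up data pairs, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = fit_line(xs, ys)
best = sum_squared_errors(xs, ys, a, b)
other = sum_squared_errors(xs, ys, 2.0, 0.7)  # another plausible-looking line

print(round(a, 2), round(b, 2))  # 2.2 0.6
print(best < other)              # True
```

Any other choice of intercept and slope gives a sum of squares at least as large as the fitted line's, which is what makes it the line of best fit.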
1.5 Exercises
Exercise 1.
Find the equation of the regression line for the data in the following table,
and graph the line on the scatter plot of the data.
Company   Cars (x)   Revenue (y)
A         63.0       7.0
B         29.0       3.9
C         20.8       2.1
D         19.1       2.8
E         13.4       1.4
F         8.5        1.5
Exercise 2.
The following scatter plot represents the number of assists and total points
of the scoring leaders in a game.
1. Convert the figure information into a table and find the following values:
\[
\sum x, \quad \sum y, \quad \sum xy, \quad \sum x^2, \quad \sum y^2
\]
Exercise 3.
4. Use the equation of the regression line to predict the dependent variable
if the independent variable is equal to 300.
Exercise 4.
- Use the regression equation developed here to estimate the delivery time,
measured from the time the shipment is available for pick-up, for a
shipment of 1,000 miles.
Answers 2:
1. The following scatter plot represents the number of assists and total
points of the scoring leaders in a game.
X   1   2   5   3   2   1   4
Y   3   4   6   3   5   4   8
\[
\sum x = 18, \quad \sum y = 33, \quad \sum xy = 96, \quad \sum x^2 = 60, \quad \sum y^2 = 175
\]
2.
\[
r = \frac{n(\sum xy) - (\sum x)(\sum y)}{\sqrt{[n\sum x^2 - (\sum x)^2]\,[n\sum y^2 - (\sum y)^2]}} = 0.68
\]
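These values can be double-checked by recomputing the sums from the table. The short script below is a sketch for verification only; the variable names are arbitrary.

```python
from math import sqrt

# Data pairs read from the table in Answers 2.
xs = [1, 2, 5, 3, 2, 1, 4]
ys = [3, 4, 6, 3, 5, 4, 8]
n = len(xs)

sums = {
    "x": sum(xs),
    "y": sum(ys),
    "xy": sum(x * y for x, y in zip(xs, ys)),
    "x2": sum(x * x for x in xs),
    "y2": sum(y * y for y in ys),
}

# Same formula as in the text, written with the sums above.
r = (n * sums["xy"] - sums["x"] * sums["y"]) / sqrt(
    (n * sums["x2"] - sums["x"] ** 2) * (n * sums["y2"] - sums["y"] ** 2)
)

print(sums)         # {'x': 18, 'y': 33, 'xy': 96, 'x2': 60, 'y2': 175}
print(round(r, 2))  # 0.68
```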
Answers 3:
\[
\sum x = 400, \quad \sum y = 200, \quad \sum xy = 29.2, \quad \sum x^2 = 58.2, \quad \sum y^2 = 14.8,
\]
then b = 0.5 and a = 0. We can also find a with the formula a = Ȳ − bX̄; hence:
\[
Y = \frac{1}{2} \times X
\]
3. We graph the line on the scatter plot, and we note that there is a high
positive correlation.
4. Using the equation of the regression line, the predicted value of the
dependent variable when the independent variable equals 300 is
Y = 0.5 × 300 = 150 [2, 3, 4].
Bibliography
[1] Allan Bluman. Elementary Statistics: A Step by Step Approach. 8th ed. McGraw Hill, 2012.
[2] David R. Anderson and Thomas A. Williams. Statistics for Business & Economics. 11th ed. Boston,
USA: Cengage Learning, 2015.
[3] Zealure Holcomb. Fundamentals of Descriptive Statistics. Routledge, 2016.
[4] OpenClassrooms. March 3, 2024. URL: https://openclassrooms.com/fr/courses/7410486-nettoyez-et-analysez-votre-jeu-de-donnees/7461346-familiarisez-vous-avec-les-mesures-de-concentration