
Statistics

Uploaded by

Ali Allam
Copyright
© All Rights Reserved
Lebanese University

Faculty of Pedagogy 1, Unesco

Dr. Ali Allam


Statistics and Sports

Statistics and Evaluation

Outline
o Chapter 1: Introduction to Statistics.

o Chapter 2: Frequency distribution and graphical representations.

o Chapter 3: Measuring centers.

o Chapter 4: Measures of spread.

o Chapter 5: Bivariate distribution.

o Chapter 6: Events and probability.

o Chapter 7: Standard normal distribution.
Chapter 1

Introduction to Statistics

Outline
1. Definition.

2. Inferential and descriptive statistics.

3. Population and sample.

4. Individual and variable.

5. Qualitative and quantitative variable.

6. Discrete and continuous variable.


1. Definition
o Statistics is the science of data.

o It involves collecting, classifying, summarizing, organizing, analyzing, and interpreting numerical information.

o It is used in many different disciplines.

o It is used to make decisions and draw conclusions based on data.

o There are two types of statistics: inferential and descriptive.
2.1. Inferential Statistics

o Inferential statistics uses sample data to make estimates, decisions, predictions, or other generalizations about a larger set of data (the population).

o One of the most commonly used inferential techniques is hypothesis testing.

o The main goal of inferential statistics is to draw conclusions about a population from a sample.
2.2. Descriptive Statistics

o Descriptive statistics uses numerical and graphical methods to look for patterns in a data set, and to summarize and present the information it reveals.

o Descriptive statistics includes both numerical measures (mean, median, …) and graphical displays of data (bar graph, pie chart, …).

o The main goal of descriptive statistics is to describe a data set.
2.2. Descriptive Statistics

[Figure: a pie chart and a bar graph]
3. Population and Sample
o A population is the entire collection of individuals (or measurements) in which we are interested. It can be of any size.

o A sample is a subset of the units of a population, and is typically much smaller than the population.

o In a statistical study, all elements of a sample are available for observation, which is typically not the case for a population.
4. Individual and Variable
o Any set of data contains information about some group of individuals. This information is organized as variables.

o Individuals are the objects described by a set of data. Individuals may be people, but they may also be animals or things.

o A variable is any characteristic of an individual. A variable can take different values for different individuals.
5. Qualitative or Quantitative variable
o Qualitative (or categorical) data are measurements for which there is no natural numerical scale.
o Examples of qualitative data: color (green, blue, yellow, …), gender (male, female, …), size (small, medium, large, …).

o Quantitative data are numerical measurements that arise from a natural numerical scale; their values are numbers.
o Examples of quantitative data: age (10, 15, …), height (150 cm, 160 cm, …), weight (70 kg, 80 kg, …).
6. Discrete and Continuous variable
o Quantitative data can be discrete or continuous.

o A discrete variable can take only separate, countable values.

o Example values of a discrete variable: 12; 15; 16; …

o A continuous variable can take any value within a given range (an interval).
o The values of a continuous variable are usually grouped into class intervals, for example [0; 5[, [5; 10[, …
Applications
1. Identify each of the following data sets as either a population or a sample:

* The grade point averages (GPAs) of all students at a college: Population.

* The GPAs of a randomly selected group of students on a college campus: Sample.
Applications
2. Identify the following measures as either quantitative or qualitative:

* The genders of the first 40 newborns in a hospital in one year: Qualitative.
* The natural hair color of 20 randomly selected fashion models: Qualitative.
* The ages of 20 randomly selected fashion models: Quantitative.
* The political affiliation of 500 randomly selected voters: Qualitative.
Applications
3. A researcher wishes to estimate the average weight of newborns in South America in the last five years. He takes a random sample of 235 newborns and obtains an average of 3.27 kg.

* What is the population of interest? All newborn babies in South America in the last five years.
* What is the parameter of interest? The average birth weight of all newborn babies in South America in the last five years.
* Based on this sample, do we know the average weight of newborns in South America? Explain. No, not exactly: the sample average of 3.27 kg is only an estimate (an approximate value) of the population average.
Applications
4. A sociologist wishes to estimate the proportion of all adults in a certain region who have never married. In a random sample of 1,320 adults, 145 have never married, hence 145/1320 ≈ 0.11, or about 11%, have never married.

* What is the population? All adults in the region.

* What is the parameter of interest? The proportion of adults in the region who have never married.
* Based on this sample, do we know the proportion of all adults who have never married? Explain. No, not exactly: the sample proportion (about 11%) is only an estimate (an approximate value) of the population proportion.
Chapter 2

Frequency Distribution and Graphical Representations
Outline
1. Frequency and relative frequency.

2. Cumulative frequency.

3. Percentage.

4. Graphical representations for a qualitative variable.

5. Graphical representations for a discrete variable.

6. Graphical representations for a continuous variable.


1. Frequency and relative frequency
o The frequency ni of a particular value Xi is the number of times this value appears in the data.

o The total frequency N is the sum of the frequencies (the sum runs over all values or classes Xi):

$$N = \sum_{i} n_i$$

o The relative frequency fi is:

$$f_i = \frac{n_i}{N} \qquad \text{and} \qquad F = \sum_{i} f_i = \frac{\sum_{i} n_i}{N} = 1$$
2. Cumulative Frequency

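o It is obtained by adding each frequency to the sum of the preceding frequencies.

o Ni ↑: Increasing cumulative frequency (number of observations up to and including Xi).

o Ni ↓: Decreasing cumulative frequency (number of observations from Xi onwards).

o Fi ↑: Increasing cumulative relative frequency (Fi ↑ = Ni ↑ / N).

o Fi ↓: Decreasing cumulative relative frequency (Fi ↓ = Ni ↓ / N).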
2.1. Applications
o Let Xi be the age of the 11 players of the Al Ahed football club. Calculate fi, Ni ↑, Ni ↓, Fi ↑ and Fi ↓.
Xi 20 22 25 27 30 Total

ni 2 2 3 0 4 11

fi 2/11 2/11 3/11 0/11 4/11 1

Ni ↑ 2 4 7 7 11

Ni ↓ 11 9 7 4 4

Fi ↑ 2/11 4/11 7/11 7/11 1

Fi ↓ 1 9/11 7/11 4/11 4/11
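The cumulative columns of the table above can be checked with a minimal Python sketch. It assumes the frequencies ni are listed in increasing order of Xi; the function name `cumulative_table` is illustrative, and `fractions.Fraction` keeps fi and Fi exact, as in the table.

```python
from fractions import Fraction

def cumulative_table(counts):
    """Return fi, Ni up, Ni down, Fi up and Fi down for frequencies ni
    listed in increasing order of the values Xi."""
    N = sum(counts)
    f = [Fraction(n, N) for n in counts]   # fi = ni / N (exact fractions)
    N_up, running = [], 0
    for n in counts:                       # Ni up: observations up to and including Xi
        running += n
        N_up.append(running)
    # Ni down: observations from Xi onwards = N - (observations before Xi)
    N_down = [N - up + n for up, n in zip(N_up, counts)]
    F_up = [Fraction(x, N) for x in N_up]
    F_down = [Fraction(x, N) for x in N_down]
    return f, N_up, N_down, F_up, F_down

# Al Ahed ages Xi = 20, 22, 25, 27, 30
f, N_up, N_down, F_up, F_down = cumulative_table([2, 2, 3, 0, 4])
print(N_up)    # [2, 4, 7, 7, 11]
print(N_down)  # [11, 9, 7, 4, 4]
```

The same function reproduces the Al Riyadi table that follows when called with the frequencies [3, 2, 4, 1, 2].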

2.2. Applications
o Let Xi be the height (in cm) of the 12 players of the Al Riyadi basketball club. Calculate fi, Ni ↑, Ni ↓, Fi ↑ and Fi ↓.
Xi 185-190 190-195 195-200 200-205 205-210 Total

ni 3 2 4 1 2 12

fi 3/12 2/12 4/12 1/12 2/12 1

Ni ↑ 3 5 9 10 12

Ni ↓ 12 9 7 3 2

Fi ↑ 3/12 5/12 9/12 10/12 1

Fi ↓ 1 9/12 7/12 3/12 2/12

3. Percentage

o The percentage pi is the relative frequency times 100:

$$p_i = \frac{n_i}{N} \times 100 = f_i \times 100$$

o The percentage is the most convenient form for drawing conclusions.
4. Graphical representations for a qualitative variable.
o In general, a qualitative variable can be graphically represented by:

1. Pie-Chart and Semi Pie-Chart.

2. Bar graph (or bar chart).
4.1. Pie-Chart and Semi-Pie Chart.
o To draw a pie chart, we must calculate the angle αi of each event Xi.
o Every sector represents an event.

o For a pie chart:

$$\alpha_i = \frac{n_i}{N} \times 360^\circ = f_i \times 360^\circ$$

o For a semi-pie chart:

$$\alpha_i = \frac{n_i}{N} \times 180^\circ = f_i \times 180^\circ$$
4.1. Applications.
o The number of World Cup wins of 4 countries is given in the table below. Draw the pie chart.
Xi Italy Argentina Brazil Germany Total
ni 4 2 5 4 15

αi 96° = (4/15)×360° 48° 120° 96° 360°
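The sector angles follow directly from αi = (ni/N) × 360° (or 180° for a semi-pie chart). A minimal Python sketch, with the illustrative function name `sector_angles`:

```python
def sector_angles(counts, full_angle=360):
    """alpha_i = ni / N x full_angle: 360 for a pie chart, 180 for a semi-pie chart."""
    N = sum(counts)
    # Multiply before dividing so that these small integer tables give exact angles.
    return [n * full_angle / N for n in counts]

wins = {"Italy": 4, "Argentina": 2, "Brazil": 5, "Germany": 4}
print(dict(zip(wins, sector_angles(list(wins.values())))))
# {'Italy': 96.0, 'Argentina': 48.0, 'Brazil': 120.0, 'Germany': 96.0}
print(sector_angles([3, 2, 4, 1], full_angle=180))  # [54.0, 36.0, 72.0, 18.0]
```

The angles of a full pie chart always add up to 360°, and those of a semi-pie chart to 180°, which is a quick check on any table of αi.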
4.2. Bar graph (or Bar chart).
o The height of every bar corresponds to the frequency.

o The width of the bar has no meaning.
4.2. Applications.
o The number of World Cup wins of 4 countries is given in the table below. Draw the bar graph.
Xi Italy Argentina Brazil Germany Total

ni 4 2 5 4 15
5. Graphical representations for a discrete variable.
o In general, a discrete variable can be graphically
represented by:

1. Pie-Chart and Semi Pie-Chart.

2. Bar graph (or bar chart).

3. Polygon of frequency ni (or relative frequency fi).

4. Polygon of cumulative frequency (Ni↑, Ni↓, Fi ↑, Fi↓).

5.1. Pie-Chart and Semi-Pie Chart.
o Same as for qualitative data.

o Example: Draw the semi-pie chart for the following data. Xi represents the grades (out of 20) in statistics of 10 students in a class.
Xi 5 10 15 18 Total
ni 3 2 4 1 10

αi 54° = (3/10)×180° 36° 72° 18° 180°
5.2. Bar graph (or Bar chart).
o The height of every bar corresponds to the frequency.

o The width of the bar has no meaning.

o The form of a bar graph for a discrete variable is shown below.

[Figure: bar graphs of ni (left, with Σ ni = N) and of fi (right, with Σ fi = 1) plotted against the values x1, x2, x3, …, xm]
5.2. Applications.
o Example: Draw the bar graph for the following data. Xi represents the grades (out of 20) in statistics of 10 students in a class.
Xi 5 10 15 18 Total
ni 3 2 4 1 10
5.3. Polygon of frequency ni (or fi).
o Example: Draw the frequency polygon (ni) for the following data. Xi represents the grades (out of 20) in statistics of 10 students in a class.
Xi 5 10 15 18 Total
ni 3 2 4 1 10
5.4. Polygon of cumulative frequency.
o Example: Draw the polygons of cumulative frequency for the following data. Xi represents the grades (out of 20) in statistics of 10 students in a class.

Xi 5 10 15 18 Total
ni 3 2 4 1 10

Ni ↑ 3 5 9 10

Ni ↓ 10 7 5 1

Fi ↑ 3/10 5/10 9/10 1

Fi ↓ 1 7/10 5/10 1/10

o The polygon of Fi ↑ has the same shape as the polygon of Ni ↑, and the polygon of Fi ↓ has the same shape as the polygon of Ni ↓ (only the vertical scale changes).
5.4. Applications.
o The polygons of cumulative frequency are shown below: a) polygon of Ni↑ and b) polygon of Ni↓.

[Figure: a) polygon of Ni↑; b) polygon of Ni↓]
6. Graphical representations for a continuous variable.
o In general, a continuous variable can be graphically represented by:

1. Pie-Chart and Semi Pie-Chart.

2. Histogram.

3. Polygon of frequency ni (or relative frequency fi).

4. Polygon of cumulative frequency (Ni↑, Ni↓, Fi ↑, Fi↓).
6.1. Pie-Chart and Semi-Pie Chart.
o Same as for qualitative data.

o Example: Draw the pie chart. Xi represents the weight (in kg) of 20 newborns in a hospital in Beirut.
Xi [0; 1[ [1; 2[ [2; 3[ [3; 4[ Total
ni 1 5 8 6 20

αi 18° = (1/20)×360° 90° 144° 108° 360°
6.2. Histogram.
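o The height of every bar corresponds to the frequency (for classes of equal width; with unequal widths, the area of each bar should be proportional to the frequency).

o The width of every bar corresponds to the width of the class interval.

o There is no space between the bars.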
6.3. Polygon of frequency ni (or fi).
o Example: Draw the frequency polygon of the following data. Xi represents the weight (in kg) of 20 newborns in a hospital in Beirut.
Xi [0; 1[ [1; 2[ [2; 3[ [3; 4[ Total
ni 1 5 8 6 20

o We join the points plotted above the midpoint of each class interval, at a height equal to its frequency.
6.4. Polygon of cumulative frequency.
o Example: Draw the polygons of cumulative frequency of the following data. Xi represents the weight (in kg) of 20 newborns in a hospital in Beirut.
Xi [0; 1[ [1; 2[ [2; 3[ [3; 4[ Total
ni 1 5 8 6 20

Ni ↑ 1 6 14 20

Ni ↓ 20 19 14 6

Fi ↑ 1/20 6/20 14/20 1

Fi ↓ 1 19/20 14/20 6/20

o The polygon of Fi ↑ has the same shape as the polygon of Ni ↑, and the polygon of Fi ↓ has the same shape as the polygon of Ni ↓ (only the vertical scale changes).
6.4. Applications.
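o The polygons of cumulative frequency are shown below: a) polygon of Ni↑ and b) polygon of Ni↓.

[Figure: a) polygon of Ni↑ (or Fi ↑), obtained by joining the points plotted at the upper limits of the classes; b) polygon of Ni↓ (or Fi ↓), obtained by joining the points plotted at the lower limits of the classes]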
Chapter 3

Measuring centers

Outline
1. The Mean X̄.

2. The Mode Mo.

3. The Median Me.

4. The Quantiles.

5. Box Plot.

6. The relation between the measuring centers.

7. Other types of mean.
1. The Mean X̄
o X̄ is calculated only for quantitative variables.

o X̄ is called the arithmetic mean and is the most widely used measure of center in data analysis.

o X̄ is calculated for:

1) Simple series.

2) Discrete variables.

3) Continuous variables.
1.1. The Mean X̄
o For a simple series X1, X2, X3, …, XN:

$$\bar{X} = \frac{1}{N}\sum_{i=1}^{N} X_i$$

o Example: Calculate the mean for the following data:

2–3–3–5–3–4

$$\bar{X} = \frac{2 + 3 + 3 + 5 + 3 + 4}{6} = \frac{20}{6} \approx 3.33$$
1.2. The Mean X̄
o For a discrete variable where the data are collected in a table:

$$\bar{X} = \frac{1}{N}\sum_{i} n_i X_i = \sum_{i} f_i X_i$$

o Example: Calculate the mean for the following data:

Xi 2 6 8 Total
ni 4 3 1 8

$$\bar{X} = \frac{(2 \times 4) + (6 \times 3) + (8 \times 1)}{8} = \frac{34}{8} = 4.25$$
1.3. The Mean X̄
o For a continuous variable where the data are collected in a table:

$$\bar{X} = \frac{1}{N}\sum_{i} n_i C_i = \sum_{i} f_i C_i \qquad \text{and} \qquad C_i = \frac{a + b}{2}$$

where Ci is the center of the class interval [a; b[.

o Example: Calculate the mean for the following data:

Xi [0; 2[ [2; 4[ [4; 6[ Total
ni 2 5 3 10
Ci 1 3 5 -

$$\bar{X} = \frac{(2 \times 1) + (5 \times 3) + (3 \times 5)}{10} = \frac{32}{10} = 3.2$$
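The three mean formulas above can be sketched in Python. The function names are illustrative; `mean_grouped` assumes each class is given as a pair (a, b) and uses the class center Ci = (a + b)/2, as in the formula.

```python
def mean_simple(xs):
    """Arithmetic mean of a simple series: sum(Xi) / N."""
    return sum(xs) / len(xs)

def mean_discrete(xs, ns):
    """Mean of a discrete frequency table: sum(ni * Xi) / N."""
    return sum(n * x for x, n in zip(xs, ns)) / sum(ns)

def mean_grouped(classes, ns):
    """Mean of a continuous (grouped) table, using class centers Ci = (a + b) / 2."""
    centers = [(a + b) / 2 for a, b in classes]
    return mean_discrete(centers, ns)

print(round(mean_simple([2, 3, 3, 5, 3, 4]), 2))          # 3.33
print(mean_discrete([2, 6, 8], [4, 3, 1]))                # 4.25
print(mean_grouped([(0, 2), (2, 4), (4, 6)], [2, 5, 3]))  # 3.2
```

The grouped version is only an approximation of the true mean, since every observation of a class is replaced by the class center.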
2. The Mode Mo
o Mo is calculated for qualitative and quantitative variables.

o Mo is the most frequent value, that is, the value with the highest frequency ni (or fi).

o Some distributions have several modes (bimodal, trimodal, …).

o Mo is calculated for:
1) Qualitative variables.
2) Simple series.
3) Discrete variables.
4) Continuous variables.
2.1. The Mode Mo
o For a qualitative variable, Mo is the category with the highest ni (or fi).

o Graphically, Mo corresponds to the tallest bar.

o Example: Brazil is the Mo.

Xi Italy Argentina Brazil Germany Total
ni 4 2 5 4 15
2.2. The Mode Mo
o For a simple series X1, X2, X3, …:

o Mo is the most repeated value.

o Example: Identify the mode Mo for the following data:

2–3–3–5–3–4

o Solution: 3 is the most repeated value ⇛ Mo = 3.
2.3. The Mode Mo
o For a discrete variable, Mo is the value with the highest frequency ni (or fi).

o Graphically, Mo corresponds to the tallest bar.

o Example: Identify the Mo for the following data:

Xi 2 6 8 Total
ni 4 3 1 8

Mo = 2
2.4. The Mode Mo
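o For a continuous variable, we first identify the modal class (the class with the highest frequency), then:

$$M_o = L_i + a_i \,\frac{\Delta_1}{\Delta_1 + \Delta_2}, \qquad \Delta_1 = n_i - n_{i-1}, \qquad \Delta_2 = n_i - n_{i+1}$$

where Li is the lower limit of the modal class, ai is the width of the modal class, and n(i-1) and n(i+1) are the frequencies of the classes just before and just after it.

o Example: if [a; b[ is the modal class, then Li = a and ai = b − a.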
2.4. The Mode Mo
o Example: Calculate the mode for the following data:

Xi [0; 2[ [2; 4[ [4; 6[ Total
ni 2 5 3 10

o [2; 4[ is the modal class:

$$M_o = 2 + (4 - 2)\,\frac{(5 - 2)}{(5 - 2) + (5 - 3)} = 2 + 2 \times \frac{3}{5} = 3.2$$

o Mo = 3.2, which belongs to [2; 4[.
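The interpolation formula for the mode of a grouped table can be sketched in Python. This is a minimal illustration rather than a reference implementation: `grouped_mode` is a hypothetical helper, classes are given as (lower, upper) pairs, and a modal class at either end of the table is assumed to have a missing neighbour frequency of 0.

```python
def grouped_mode(classes, ns):
    """Mo = Li + ai * d1 / (d1 + d2), with d1 = ni - n(i-1) and d2 = ni - n(i+1)."""
    i = max(range(len(ns)), key=lambda k: ns[k])   # modal class: highest frequency
    prev_n = ns[i - 1] if i > 0 else 0             # assumption: no class before -> 0
    next_n = ns[i + 1] if i + 1 < len(ns) else 0   # assumption: no class after -> 0
    d1, d2 = ns[i] - prev_n, ns[i] - next_n
    if d1 + d2 == 0:
        raise ValueError("formula undefined: both neighbours have the modal frequency")
    lower, upper = classes[i]
    return lower + (upper - lower) * d1 / (d1 + d2)

print(round(grouped_mode([(0, 2), (2, 4), (4, 6)], [2, 5, 3]), 2))  # 3.2
```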
2.4. The Mode Mo
o Graphically, for a continuous variable, the modal class is the class with the tallest bar in the histogram.

o We then estimate Mo graphically within this class.
3. The Median Me
o Me is the central value of a statistical distribution: it divides the ordered series into two halves containing the same number of observations.

o A distribution has only one median Me.

o Me is calculated for:
1) Qualitative variables measured on an ordinal scale.
2) Simple series.
3) Discrete variables.
4) Continuous variables.
3.1. The Median Me
o For qualitative data, two scales exist:

1) Nominal scale: used to label categories that have no quantitative value and no natural order (Ex: green/blue/white/…, male/female, …). Here, Me cannot be determined and is not meaningful.

2) Ordinal scale: the order of the values is important and meaningful (Ex: unhappy/happy/very happy/…). Here, Me can be determined and is meaningful.
3.2. The Median Me
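o For a simple series X1, X2, X3, …, XN sorted in increasing order, Me is the value at position (N+1)/2 if N is odd, and the average of the values at positions N/2 and N/2 + 1 if N is even.

o Example: Identify the median Me for the following data:
2–3–3–5–3–4
o Solution: we sort the series in increasing order:
2–3–3–3–4–5

N = 6 (even), so N/2 = 3 and Me = (X3 + X4)/2 = (3 + 3)/2 = 3.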
3.3. The Median Me
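o For a discrete variable (values sorted in increasing order):

$$N \text{ odd:}\quad M_e = X_{\frac{N+1}{2}} \qquad\qquad N \text{ even:}\quad M_e = \frac{X_{\frac{N}{2}} + X_{\frac{N}{2}+1}}{2}$$

o Example: N = 8 (even).
Xi 2 6 8 Total
ni 4 3 1 8

o Me = (X4 + X5)/2 = (2 + 6)/2 = 4 (the 4th ordered value is 2 and the 5th is 6).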
3.3. The Median Me
o Example: N = 9 (odd).
Xi 1 5 7 Total
ni 2 6 1 9

o Me = X(N+1)/2 = X(10/2) = X5 = 5.

o Graphically, Me is the abscissa of the point where the horizontal line y = N/2 meets the polygon of increasing (or decreasing) cumulative frequency.

o Equivalently, Me is the abscissa of the point where the line y = 0.5 meets the polygon of increasing (or decreasing) cumulative relative frequency.
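A small Python sketch of the median rule for a discrete table: it rebuilds the ordered series from the frequencies and applies the odd/even rule above. The helper name `median_discrete` is illustrative; the standard-library `statistics.median` gives the same result on a simple series.

```python
import statistics

def median_discrete(xs, ns):
    """X_((N+1)/2) if N is odd, (X_(N/2) + X_(N/2 + 1)) / 2 if N is even
    (positions in the formula are 1-based, Python indices are 0-based)."""
    data = sorted(x for x, n in zip(xs, ns) for _ in range(n))  # ordered series
    N = len(data)
    if N % 2 == 1:
        return data[(N + 1) // 2 - 1]
    return (data[N // 2 - 1] + data[N // 2]) / 2

print(median_discrete([2, 6, 8], [4, 3, 1]))  # 4.0
print(median_discrete([1, 5, 7], [2, 6, 1]))  # 5
print(statistics.median([2, 3, 3, 5, 3, 4]))  # 3.0
```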
3.4. The Median Me
o For a continuous variable, we identify the median class: the first class whose increasing cumulative frequency reaches N/2. Then:

$$M_e = L_i + a_i \,\frac{\frac{N}{2} - N_{i-1}}{n_i}$$

where Li is the lower limit of the median class
ai is the width of the median class
ni is the frequency of the median class
N is the total frequency
N(i-1) is the increasing cumulative frequency of the class before the median class.
3.4. The Median Me
o If N is odd, we replace N/2 by (N+1)/2 in the formula.
o Example: Calculate the median Me.
Xi [0; 2[ [2; 4[ [4; 6[ Total
ni 2 5 3 10
Ni ↑ 2 7 10

o Solution: N = 10 (even).
o N/2 = 5.
o [2; 4[ is the median class (Ni ↑ goes from 2 to 7, so it reaches 5 in this class).

$$M_e = 2 + 2 \times \frac{5 - 2}{5} = 3.2$$

and 3.2 belongs to [2; 4[.
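The median interpolation can be sketched in Python with a helper that takes the rank to locate as a parameter, so the same function also works for quartiles (rank N/4 or 3N/4). The name `grouped_quantile` and the (lower, upper) class representation are assumptions of this sketch.

```python
def grouped_quantile(classes, ns, rank):
    """Li + ai * (rank - N(i-1)) / ni inside the first class whose increasing
    cumulative frequency reaches `rank` (rank = N/2 for the median)."""
    cum_before = 0                      # N(i-1): cumulative frequency before the class
    for (lower, upper), n in zip(classes, ns):
        if n > 0 and cum_before + n >= rank:
            return lower + (upper - lower) * (rank - cum_before) / n
        cum_before += n
    raise ValueError("rank is larger than the total frequency N")

classes = [(0, 2), (2, 4), (4, 6)]
ns = [2, 5, 3]
N = sum(ns)
print(round(grouped_quantile(classes, ns, N / 2), 2))  # 3.2
```

For an odd N, the rule above would pass (N + 1)/2 as the rank instead of N/2.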
3.4. The Median Me
o Graphically, for a continuous variable, the vertical line x = Me divides the histogram into two parts of equal area.
3.4. The Median Me
o Graphically, for a continuous variable, Me is the abscissa of the point where the line y = N/2 meets the polygon of increasing or decreasing cumulative frequency (equivalently, where the line y = 0.5 meets the polygon of increasing or decreasing cumulative relative frequency).
3.4. The Median Me
o Graphically, for a continuous variable, Me is the abscissa of the intersection point of the polygons of increasing and decreasing cumulative frequency (equivalently, of the polygons of increasing and decreasing cumulative relative frequency).
4. The Quantiles
o Quantiles are values taken at regular intervals of the quantile function of a variable: they divide the ordered series into parts of equal size.

o Some quantiles have special names:

1) Quartiles Q.

2) Deciles D.

3) Percentiles P.
4.1. The Quartiles Q
o The quartiles Q are values that divide the statistical
series into 4 equal parts.

o They are named Q1, Q2 and Q3.

4.1. The Quartiles Q
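o Q2 splits the series into two halves (50% - 50%). Thus, Q2 = Me.

o For a continuous variable, the quartiles are computed like the median, using the class that contains each quartile:

$$Q_1 = L_i + a_i \,\frac{\frac{N}{4} - N_{i-1}}{n_i}$$

$$Q_2 = M_e$$

$$Q_3 = L_i + a_i \,\frac{\frac{3N}{4} - N_{i-1}}{n_i}$$

where Li, ai, ni and N(i-1) refer to the class containing Q1 (respectively Q3).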
4.2. The Deciles D
o The deciles D are values that divide the statistical
series into 10 equal parts.

o They are named D1, D2, D3, D4, D5 = Me, … and D9.

4.3. The Percentiles P
o The percentiles P are values that divide the
statistical series into 100 equal parts.

o They are named P1, P2, P3, P4, P50 = Me, … and P99.

5. Box Plot
o A box plot shows the dispersion of a given series.

o To draw a box plot:

1) Draw a horizontal (or vertical) axis.

2) Mark the following values on it: minimum, Q1, Me, Q3 and maximum.

3) Construct a rectangle (the box) parallel to the axis, from Q1 to Q3, so that its length equals the interquartile range IQ.
5. Box Plot
o The interquartile range IQ is:

$$IQ = Q_3 - Q_1$$

o An example of a box plot:

[Figure: example box plot]

o In this example: Min = 71, Q1 = 210, Me = 268, Q3 = 342 and Max = 741, so IQ = 342 − 210 = 132.
6. The relation between the measuring centers
o For a unimodal, moderately skewed distribution, the median usually lies between the mode and the mean; for a symmetric distribution, the three coincide.
7. Other types of mean
o The geometric mean G is used to average growth rates (ratios):

$$\log G = \frac{1}{N}\sum_{i} n_i \log X_i$$

o The harmonic mean H is more influenced by the lowest values (whereas the arithmetic mean X̄ is more influenced by the highest values):

$$H = \frac{N}{\sum_{i} \frac{n_i}{X_i}}$$
7. Other types of mean
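o The quadratic mean Q is the square root of the mean of the squares; the standard deviation is the quadratic mean of the deviations (Xi − X̄):

$$Q = \sqrt{\frac{1}{N}\sum_{i} n_i X_i^2}$$

o For positive values, these means satisfy:

$$H \le G \le \bar{X} \le Q$$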
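The four means can be compared on the discrete table used earlier for the arithmetic mean (Xi = 2, 6, 8 with ni = 4, 3, 1). This Python sketch assumes all Xi > 0; it uses natural logarithms for G, which gives the same G as base-10 logarithms. The helper name `four_means` is illustrative.

```python
import math

def four_means(xs, ns):
    """Harmonic, geometric, arithmetic and quadratic means of a frequency table (all Xi > 0)."""
    N = sum(ns)
    H = N / sum(n / x for x, n in zip(xs, ns))                      # H = N / sum(ni / Xi)
    G = math.exp(sum(n * math.log(x) for x, n in zip(xs, ns)) / N)  # log G = (1/N) sum(ni log Xi)
    A = sum(n * x for x, n in zip(xs, ns)) / N                      # arithmetic mean
    Q = math.sqrt(sum(n * x * x for x, n in zip(xs, ns)) / N)       # Q^2 = (1/N) sum(ni Xi^2)
    return H, G, A, Q

H, G, A, Q = four_means([2, 6, 8], [4, 3, 1])
print([round(m, 2) for m in (H, G, A, Q)])  # [3.05, 3.59, 4.25, 4.85]
print(H <= G <= A <= Q)                      # True
```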
Chapter 4

Measures of Spread

Outline
1. The Range W.

2. The Interquartile Range IQ.

3. The Mean and the Median absolute Deviations.

4. The Variance V(X).

5. The Standard Deviation σ(X).

6. The Coefficient of Variation CV.

7. The Approximation Errors.


1. The Range W
o The range W of a data set is the difference between its largest and smallest values. It is measured in the same units as the data.

$$W = \text{Highest} - \text{Lowest}$$

o It gives a simple measure of the dispersion of a data set.

o For a continuous distribution, the range is the difference between the upper boundary of the last class and the lower boundary of the first class (or the difference between the centers of the last class and the first class).
1.1. The Range W
o Example for a simple series or a discrete variable:

Let the following data: 4 – 5 – 16 – 18 – 3

$$W = 18 - 3 = 15$$

o Example for a continuous variable:
Xi [0; 2[ [2; 4[ [4; 6[ Total
ni 2 5 3 10
Ci 1 3 5 -

$$W = 6 - 0 = 6 \ \text{(class boundaries)} \qquad \text{or} \qquad W = 5 - 1 = 4 \ \text{(class centers)}$$
2. The Interquartile Range IQ
o IQ is the difference between the 3rd and the 1st quartiles, Q3 and Q1. IQ is used to build box plots.

$$IQ = Q_3 - Q_1$$

o IQ measures the spread around the median.

o The middle 50% of the observations lie within the IQ range (between Q1 and Q3).

o IQ is often used to find outliers in data. Outliers are observations that fall below Q1 − 1.5 IQ or above Q3 + 1.5 IQ.
2.1. The Interquartile Range IQ
o Example for a continuous distribution: we study the salary (in $) of 130 employees. Calculate IQ.

Xi 700-900 900-1100 1100-1300 1300-1500 1500-1700 Total

ni 15 25 55 28 7 130
Ni ↑ 15 40 95 123 130 -

o Q1: N/4 = 130/4 = 32.5, so 900-1100 is the class of Q1:
Q1 = 900 + 200 × (32.5 − 15)/25 = 1040 $

o Q3: 3N/4 = (3 × 130)/4 = 97.5, so 1300-1500 is the class of Q3:
Q3 = 1300 + 200 × (97.5 − 95)/28 ≈ 1317.86 $

Thus IQ = Q3 − Q1 ≈ 1317.86 − 1040 = 277.86 $.
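Applying the interpolation formula exactly at ranks N/4 = 32.5 and 3N/4 = 97.5 can be sketched in Python. The helper is written out in full so the block runs on its own; its name and the (lower, upper) class representation are illustrative. The last lines also compute the outlier limits Q1 − 1.5 IQ and Q3 + 1.5 IQ.

```python
def grouped_quantile(classes, ns, rank):
    """Interpolate Li + ai * (rank - N(i-1)) / ni in the class containing `rank`."""
    cum_before = 0
    for (lower, upper), n in zip(classes, ns):
        if n > 0 and cum_before + n >= rank:
            return lower + (upper - lower) * (rank - cum_before) / n
        cum_before += n
    raise ValueError("rank is larger than the total frequency N")

salary_classes = [(700, 900), (900, 1100), (1100, 1300), (1300, 1500), (1500, 1700)]
employees = [15, 25, 55, 28, 7]
N = sum(employees)                                          # 130

q1 = grouped_quantile(salary_classes, employees, N / 4)      # rank 32.5
q3 = grouped_quantile(salary_classes, employees, 3 * N / 4)  # rank 97.5
iq = q3 - q1
low_fence, high_fence = q1 - 1.5 * iq, q3 + 1.5 * iq         # outlier limits

print(round(q1, 2), round(q3, 2), round(iq, 2))  # 1040.0 1317.86 277.86
```

Both limits (about 623.21 $ and 1734.64 $) fall outside [700; 1700], so no salary in this table can be flagged as an outlier.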
3.1. The Mean Absolute Deviation
o The mean absolute deviation AD(Mean) is the mean of the absolute deviations of the data around their mean: the average (absolute) distance from the mean.

$$AD(\text{Mean}) = \frac{1}{N}\sum_{i} n_i \left| X_i - \bar{X} \right|$$
3.2. The Median Absolute Deviation
o The median absolute deviation AD(Median), as defined here, is the mean of the absolute deviations of the data around their median: the average (absolute) distance from the median.

$$AD(\text{Median}) = \frac{1}{N}\sum_{i} n_i \left| X_i - M_e \right|$$
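Both absolute deviations can be sketched in Python for a simple series (every ni = 1), reusing the series 2–3–3–5–3–4 from Chapter 3. The helper name `abs_deviation` is illustrative. As defined in this course, AD(Median) averages the absolute deviations; some software instead reports the median of the absolute deviations under the name "median absolute deviation".

```python
import statistics

def abs_deviation(data, center):
    """Mean of |Xi - center| over a simple series."""
    return sum(abs(x - center) for x in data) / len(data)

data = [2, 3, 3, 5, 3, 4]
ad_mean = abs_deviation(data, statistics.mean(data))      # center = 20/6
ad_median = abs_deviation(data, statistics.median(data))  # center = 3.0
print(round(ad_mean, 3), round(ad_median, 3))  # 0.778 0.667
```

AD(Median) is never larger than AD(Mean), because the median minimizes the mean absolute distance to the data.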
4. The Variance V(X)
o The variance is the square of the standard deviation; it measures how far a set of numbers is spread out from its mean.

$$V(X) = \text{Var}(X) = \frac{1}{N}\sum_{i} n_i \left(X_i - \bar{X}\right)^2$$

o The variance is always positive or zero.

o The variance of a constant is zero; conversely, if the variance of a variable in a data set is zero, then all the entries have the same value.
4. The Variance V(X)
o The properties of the variance are (a and b are constants):

$$V(X) \ge 0$$
$$V(a) = 0$$
$$V(X + a) = V(X)$$
$$V(aX) = a^2\,V(X)$$
$$V(aX + bY) = a^2\,V(X) + b^2\,V(Y) + 2ab\,\text{Cov}(X, Y)$$
$$V(aX - bY) = a^2\,V(X) + b^2\,V(Y) - 2ab\,\text{Cov}(X, Y)$$
4. The Variance V(X)
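o Another formula for the variance is obtained by expanding the square, using (1/N) Σ ni Xi = X̄ and (1/N) Σ ni = 1:

$$
\begin{aligned}
V(X) &= \frac{1}{N}\sum_{i} n_i \left(X_i - \bar{X}\right)^2
      = \frac{1}{N}\sum_{i} n_i \left(X_i^2 - 2X_i\bar{X} + \bar{X}^2\right) \\
     &= \frac{1}{N}\sum_{i} n_i X_i^2 - 2\bar{X}\,\frac{1}{N}\sum_{i} n_i X_i + \bar{X}^2\,\frac{1}{N}\sum_{i} n_i \\
     &= \frac{1}{N}\sum_{i} n_i X_i^2 - 2\bar{X}\cdot\bar{X} + \bar{X}^2 \\
\Rightarrow\; V(X) &= \frac{1}{N}\sum_{i} n_i X_i^2 - \bar{X}^2
\end{aligned}
$$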
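The two variance formulas can be checked against each other in Python on the discrete table Xi = 2, 6, 8 with ni = 4, 3, 1. This is a minimal sketch; the function names are illustrative.

```python
def variance_definition(xs, ns):
    """V(X) = (1/N) * sum(ni * (Xi - mean)^2)."""
    N = sum(ns)
    mean = sum(n * x for x, n in zip(xs, ns)) / N
    return sum(n * (x - mean) ** 2 for x, n in zip(xs, ns)) / N

def variance_shortcut(xs, ns):
    """V(X) = (1/N) * sum(ni * Xi^2) - mean^2."""
    N = sum(ns)
    mean = sum(n * x for x, n in zip(xs, ns)) / N
    return sum(n * x * x for x, n in zip(xs, ns)) / N - mean ** 2

xs, ns = [2, 6, 8], [4, 3, 1]
print(variance_definition(xs, ns), variance_shortcut(xs, ns))  # 5.4375 5.4375
```

The shortcut formula is convenient by hand, but in floating-point arithmetic it can lose precision when the values are large compared with their spread, which is why software usually prefers the definition.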
5. The Standard Deviation σ(X)
o The standard deviation is the square root of the variance. It quantifies the amount of variation or dispersion of a set of data values.

$$\sigma(X) = \sqrt{V(X)}$$

o A low standard deviation indicates that the data points tend to be close to the mean of the set.

o A high standard deviation indicates that the data points are spread out over a wide range of values.
5. The Standard Deviation σ(X)
o The properties of the standard deviation are:

$$\sigma(X) \ge 0$$
$$\sigma(a) = 0$$
$$\sigma(X + a) = \sigma(X)$$
$$\sigma(aX) = |a|\,\sigma(X)$$
$$\sigma(X \pm Y) = \sqrt{V(X) + V(Y) \pm 2\,\text{Cov}(X, Y)}$$
6. The Coefficient of Variation CV.
o The coefficient of variation CV (also known as relative standard deviation) is the ratio of the standard deviation to the mean:

$$CV = \frac{\sigma(X)}{\bar{X}}$$

o CV is often expressed as a percentage:

$$\%CV = \frac{\sigma(X)}{\bar{X}} \times 100$$

o CV is a dimensionless number.
o In analytical chemistry, CV is used to express the precision and repeatability of an assay.
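A minimal Python sketch of the coefficient of variation, dividing by N as in this course. The helper name is illustrative and the mean is assumed to be non-zero; the table is the discrete example Xi = 2, 6, 8 with ni = 4, 3, 1.

```python
import math

def coefficient_of_variation(xs, ns):
    """CV = sigma(X) / mean, with sigma(X) = sqrt(V(X)) and V(X) divided by N."""
    N = sum(ns)
    mean = sum(n * x for x, n in zip(xs, ns)) / N
    var = sum(n * (x - mean) ** 2 for x, n in zip(xs, ns)) / N
    return math.sqrt(var) / mean

cv = coefficient_of_variation([2, 6, 8], [4, 3, 1])
print(f"CV = {cv:.3f}, %CV = {100 * cv:.1f}%")  # CV = 0.549, %CV = 54.9%
```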
7. The Approximation Errors
o The approximation error is the discrepancy between an exact value and an approximation of it.

o The three most common errors are:

$$\text{Absolute error: } AE = \left| X - X_i \right|$$
$$\text{Relative error: } RE = \frac{\left| X - X_i \right|}{\left| X_i \right|}$$
$$\text{Percent error: } PE = RE \times 100$$

where Xi is the exact value and X its approximation.
Chapter 5

Bivariate Distribution

Outline
1. Introduction.

2. Presentation in Table.

3. Means and Variances.

4. Covariance Cov(X,Y).

5. Linear Correlation Coefficient “r”.

6. Graphical Study.

7. Regression Lines.
1. Introduction
o In this chapter, we study the relation between two different variables. These variables may be qualitative, discrete or continuous.

o The first variable, denoted Xi, is the explanatory variable; the second variable, denoted Yj, is the dependent variable.

o The relation between X and Y can be positive or negative.

o To simplify the study, the frequency of every observation (Xi, Yj) is always equal to 1.
2. Presentation in Table
o Every observation (Xi, Yj) has a frequency equal to 1.

Observation 1 2 3 4 Total

Xi X1 X2 X3 X4 -

Yj Y1 Y2 Y3 Y4 -

Frequency 1 1 1 1 4
3. Means and Variances
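o For the variable Xi:

$$\bar{X} = \frac{1}{N}\sum_{i=1}^{N} X_i$$

$$V(X) = \frac{1}{N}\sum_{i=1}^{N} n_i\left(X_i - \bar{X}\right)^2 = \frac{1}{N}\sum_{i=1}^{N}\left(X_i - \bar{X}\right)^2 \qquad (n_i = 1)$$

$$V(X) = \frac{1}{N}\sum_{i=1}^{N} X_i^2 - \bar{X}^2$$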
3. Means and Variances
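o For the variable Yj:

$$\bar{Y} = \frac{1}{N}\sum_{j=1}^{N} Y_j$$

$$V(Y) = \frac{1}{N}\sum_{j=1}^{N} n_j\left(Y_j - \bar{Y}\right)^2 = \frac{1}{N}\sum_{j=1}^{N}\left(Y_j - \bar{Y}\right)^2 \qquad (n_j = 1)$$

$$V(Y) = \frac{1}{N}\sum_{j=1}^{N} Y_j^2 - \bar{Y}^2$$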
4. Covariance Cov(X,Y)
o The covariance Cov(X,Y) measures how much two variables vary together. It is similar to the variance: the variance tells how a single variable varies, while the covariance tells how two variables vary together. For N paired observations (Xi, Yi), each with frequency 1:

$$\text{Cov}(X, Y) = \frac{1}{N}\sum_{i=1}^{N}\left(X_i - \bar{X}\right)\left(Y_i - \bar{Y}\right)$$

$$\text{Cov}(X, Y) = \frac{1}{N}\sum_{i=1}^{N} X_i Y_i - \bar{X}\,\bar{Y}$$
4. Covariance Cov(X,Y)
o The properties of the covariance are:

$$\text{Cov}(X + b, Y + d) = \text{Cov}(X, Y)$$
$$\text{Cov}(aX + b, cY + d) = ac\,\text{Cov}(X, Y)$$
$$\text{Cov}(X, X) = V(X)$$
$$\left|\text{Cov}(X, Y)\right| \le \sigma_X\,\sigma_Y$$
$$V(X + Y) = V(X) + V(Y) + 2\,\text{Cov}(X, Y)$$

o If X and Y are independent, then Cov(X,Y) = 0 and V(X+Y) = V(X) + V(Y). The converse is not true.
5. Linear Correlation Coefficient “r”
o The linear correlation coefficient r measures the strength of the linear relation between two variables:

$$r = \frac{\text{Cov}(X, Y)}{\sqrt{V(X)\,V(Y)}} = \frac{\text{Cov}(X, Y)}{\sigma_X\,\sigma_Y}$$

o The value of r does not change with a change of origin or of (positive) scale of the variables.

o If X and Y are independent, then r = 0. The converse is not true.
5. Linear Correlation Coefficient “r”
o The properties of the linear correlation coefficient r are:

$$-1 \le r \le 1$$

* r close to 1: strong positive correlation
* r close to −1: strong negative correlation
* r close to 0: weak correlation
* r = 0: no linear correlation

o The sign of r is the sign of Cov(X,Y).
6. Graphical Study
o The relation between X and Y is described by a mathematical equation that links these two variables.

o We represent every observation (Xi, Yj) by a point in a Cartesian graph.

o The curve linking X and Y can be a line, a parabola, a hyperbola, a cubic curve, an exponential, a logarithmic or a power function, ….

o In this chapter, we will study only the equation of a line.
6. Graphical Study
o The set of all plotted points is called the "dispersion diagram" (scatter plot) or "cloud of points".

o We always plot the mean point G(X̄, Ȳ), also called the center of gravity.
6. Graphical Study
o Example: strong positive linear correlation, r close to +1.

o When X increases, Y increases; when X decreases, Y decreases.
6. Graphical Study
o Example: strong negative linear correlation, r close to −1.

o When X increases, Y decreases; when X decreases, Y increases.
6. Graphical Study
o Example: no correlation, r close to 0, and thus Cov(X,Y) close to 0.

o No linear relation between X and Y.
7. Regression Lines
o The regression line of Y on X is denoted (D):

$$(D_{Y/X}):\; Y = aX + b$$

o To calculate a and b:

$$a = \frac{\text{Cov}(X, Y)}{V(X)}, \qquad b = \bar{Y} - a\bar{X}$$
7. Regression Lines
o The regression line of X on Y is denoted (D'):

$$(D'_{X/Y}):\; X = a'Y + b'$$

o To calculate a' and b':

$$a' = \frac{\text{Cov}(X, Y)}{V(Y)}, \qquad b' = \bar{X} - a'\bar{Y}$$
7. Regression Lines
o The relation between a and a' is:

$$r^2 = a \cdot a', \qquad r = \pm\sqrt{a \cdot a'}$$

where r has the same sign as a and a' (the sign of Cov(X,Y)).

o (D) and (D') intersect at the mean point G(X̄, Ȳ).
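The formulas of this chapter can be chained in a short Python sketch. The four data pairs below are hypothetical, chosen only for illustration; every observation has frequency 1 and variances are divided by N, as in this chapter. The helper name `regression_summary` is illustrative.

```python
import math

def regression_summary(X, Y):
    """Covariance, r and both regression lines for paired data (frequency 1)."""
    N = len(X)
    mx, my = sum(X) / N, sum(Y) / N
    vx = sum(x * x for x in X) / N - mx ** 2              # V(X)
    vy = sum(y * y for y in Y) / N - my ** 2              # V(Y)
    cov = sum(x * y for x, y in zip(X, Y)) / N - mx * my  # Cov(X, Y)
    r = cov / math.sqrt(vx * vy)
    a, a2 = cov / vx, cov / vy                            # slopes of (D) and (D')
    b, b2 = my - a * mx, mx - a2 * my                     # intercepts b and b'
    return {"cov": cov, "r": r, "a": a, "b": b, "a'": a2, "b'": b2}

# Hypothetical illustration data (not from the course).
X = [1, 2, 3, 4]
Y = [2, 4, 5, 8]
s = regression_summary(X, Y)
print(round(s["cov"], 3), round(s["r"], 3), round(s["a"], 3))  # 2.375 0.981 1.9
print(math.isclose(s["r"] ** 2, s["a"] * s["a'"]))             # True
```

With these data, X̄ = 2.5 and Ȳ = 4.75, and the line (D) passes through G: a × 2.5 + b = 4.75.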
Calculator
o Shift Mode 3 = =

o Mode 3 1 (REG)

o Enter each pair: Xi , Yj M+

o Shift 2: X̄, Ȳ, σ(X), σ(Y). Then V(X) = σ(X)² and V(Y) = σ(Y)².

o Shift 2: B = a, A = b and r.

o Y = aX + b corresponds to Y = BX + A on the calculator.
Chapter 6

Events and Probability

Outline
1. Introduction.

2. Intersection and Union.

3. Disjoint and Independent Events.

4. Properties of Operations on Events.

5. Uniform and Statistical Probability.

6. Conditional Probability (The Tree).

7. Permutations, Arrangements and Combinations.


1. Introduction
o An event, denoted A, is a collection of outcomes of an experiment to which a probability is assigned.

o The set of all possible outcomes of a probability experiment is called the sample space Ω. Thus P(Ω) = 1.

o Ω = {A, B, C, D, …}.

o The empty set ϕ (the impossible event) never occurs. Thus P(ϕ) = 0.

o ϕ = { }.
1. Introduction
o An event A is a subset of the sample space:

A ⊂ Ω

o Any event A that consists of a single outcome of the sample space is called an elementary (or simple) event. Example of an elementary event: A = {4}.

o Thus Card(A) = 1 if A is an elementary event.
1. Introduction
o The complement event Ā is associated with the event A.

o Ā contains the elements of Ω that are not in A.

[Figure: an event A and its complement Ā in Ω]

o Ā occurs if A does not occur, and vice versa.

o Thus: P(Ā) = 1 − P(A)
2.1. Intersection
o The intersection A ∩ B of two sets A and B is the set
that contains all elements of A that also belong to B
(or equivalently, all elements of B that also belong to
A), but no other elements.

o A ∩ B = A and B.
2.2. Union
o The union A ∪ B of two sets A and B is the set of elements that are in A, in B, or in both A and B.

o A ∪ B = A or B.
3.1. Disjoint Events
o Two disjoint (mutually exclusive) events are two events that cannot occur at the same time: no elements are common to both.

o Thus: A ∩ B = ϕ, P(A ∩ B) = 0, and:

$$P(A \cup B) = P(A) + P(B) - P(A \cap B)$$
$$P(A \cup B) = P(A) + P(B)$$
3.2. Independent Events
o A and B are independent events if the occurrence of one event does not affect the probability of the other.

o A and B are two independent events if:

$$P(A \cap B) = P(A) \times P(B)$$
4. Properties of Operations on Events
o Let A, B and C be three events:

1) Commutativity:

$$A \cup B = B \cup A$$
$$A \cap B = B \cap A$$

2) Associativity:

$$A \cup (B \cup C) = (A \cup B) \cup C$$
$$A \cap (B \cap C) = (A \cap B) \cap C$$
4. Properties of Operations on Events
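3) Distributivity:
$$A \cap (B \cup C) = (A \cap B) \cup (A \cap C)$$
$$A \cup (B \cap C) = (A \cup B) \cap (A \cup C)$$
4) Negation rules (De Morgan's laws):

$$\overline{\overline{A}} = A; \quad \overline{\Omega} = \phi; \quad \overline{\phi} = \Omega$$
$$\overline{A \cup B} = \overline{A} \cap \overline{B}$$
$$\overline{A \cap B} = \overline{A} \cup \overline{B}$$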
5. Uniform and Statistical Probability
1) Uniform probability: let Ω be the sample space (with equally likely outcomes) and A an event of Ω. The probability of A is:

$$P(A) = \frac{\text{Number of ways in which } A \text{ can happen}}{\text{Number of all possible ways}} = \frac{\text{Card}(A)}{\text{Card}(\Omega)}$$

2) Statistical probability: the probability of an event A is the relative frequency f of the realization of this event. Thus, the probability of A is:

$$P(A) = f(A)$$
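Uniform probability and the independence test can be illustrated by enumerating a sample space in Python. The experiment (two fair dice) and the events A and B are hypothetical examples, not taken from the course; `fractions.Fraction` keeps the probabilities exact.

```python
from fractions import Fraction
from itertools import product

# Hypothetical experiment: two fair dice, 36 equally likely outcomes (d1, d2).
omega = list(product(range(1, 7), repeat=2))

def prob(event):
    """Uniform probability: P(A) = Card(A) / Card(Omega)."""
    return Fraction(sum(1 for w in omega if event(w)), len(omega))

def first_even(w):      # event A: the first die shows an even number
    return w[0] % 2 == 0

def sum_is_seven(w):    # event B: the two dice add up to 7
    return w[0] + w[1] == 7

p_a, p_b = prob(first_even), prob(sum_is_seven)
p_ab = prob(lambda w: first_even(w) and sum_is_seven(w))
print(p_a, p_b, p_ab)     # 1/2 1/6 1/12
print(p_ab == p_a * p_b)  # True, so A and B are independent
```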
6. Conditional Probability (The Tree)
o The conditional probability of an event A given that the event B has occurred is denoted P(A/B) and is read "probability of A given B":

$$P(A/B) = \frac{P(A \cap B)}{P(B)}, \qquad P(B) > 0$$

o Similarly, the probability of B given A, P(B/A), is:

$$P(B/A) = \frac{P(A \cap B)}{P(A)}, \qquad P(A) > 0$$
6. Conditional Probability (The Tree)
o If A and B are two independent events, then:

$$P(A/B) = \frac{P(A \cap B)}{P(B)} = \frac{P(A) \times P(B)}{P(B)} = P(A)$$
$$P(B/A) = \frac{P(A \cap B)}{P(A)} = \frac{P(A) \times P(B)}{P(A)} = P(B)$$
6. Conditional Probability (The Tree)
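o Bayes' theorem is a way to compute a conditional probability from the reverse conditional probability:

$$P(A/B) = \frac{P(B/A) \times P(A)}{P(B)}$$
$$P(B/A) = \frac{P(A/B) \times P(B)}{P(A)}$$

o Another form of the formula, obtained by expanding P(B) with the tree, is:

$$P(A/B) = \frac{P(B/A) \times P(A)}{P(B/A) \times P(A) + P(B/\overline{A}) \times P(\overline{A})}$$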
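The tree form of Bayes' theorem can be sketched in Python. The helper name `bayes` and the probabilities used below are hypothetical values chosen for illustration.

```python
def bayes(p_a, p_b_given_a, p_b_given_not_a):
    """P(A/B) from the tree: P(B) = P(B/A) P(A) + P(B/not A) P(not A)."""
    p_not_a = 1 - p_a
    p_b = p_b_given_a * p_a + p_b_given_not_a * p_not_a  # total probability of B
    return p_b_given_a * p_a / p_b

# Hypothetical values: P(A) = 0.01, P(B/A) = 0.9, P(B/A-bar) = 0.05
print(round(bayes(0.01, 0.9, 0.05), 3))  # 0.154
```

Even with P(B/A) = 0.9, P(A/B) stays small here because A itself is rare (P(A) = 0.01).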
7.1. Permutations
o Permutation: an ordered arrangement of all n objects of a set.

o The number of permutations is:

$$\text{Nb. of permutations} = n!$$

o Example: for the set {A, B, C}:

o The number of permutations is 3! = 6. The permutations are: ABC, ACB, BAC, BCA, CAB and CBA.
7.2. Arrangements
o Arrangements: we choose p objects from n objects, in a certain order.

o The number of arrangements is:

$$A_n^p = \frac{n!}{(n-p)!}$$

o Example: for the set {A, B, C}:

o The number of arrangements of 2 objects from 3 is 3!/(3−2)! = 3!/1! = 6. The arrangements are: AB, BA, AC, CA, BC and CB.
7.3. Combinations
o Combinations: we choose p objects from n objects, without any order.

o The number of combinations is:

$$C_n^p = \frac{n!}{p!\,(n-p)!} = \frac{A_n^p}{p!}$$

o Example: for the set {A, B, C}:

o The number of combinations of 2 objects from 3 is 3!/(2!(3−2)!) = 6/2 = 3. The combinations are: AB, AC and BC.
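Python's standard library (version 3.8 or later) provides these counts directly through `math.factorial`, `math.perm` and `math.comb`, while `itertools.permutations` and `itertools.combinations` list the groupings themselves. A minimal sketch on the set {A, B, C}:

```python
import math
from itertools import combinations, permutations

letters = ["A", "B", "C"]

print(math.factorial(3), math.perm(3, 2), math.comb(3, 2))  # 6 6 3

all_perms = ["".join(p) for p in permutations(letters)]        # n! orderings
arrangements = ["".join(p) for p in permutations(letters, 2)]  # ordered choices of p
combos = ["".join(c) for c in combinations(letters, 2)]        # unordered choices of p
print(all_perms)     # ['ABC', 'ACB', 'BAC', 'BCA', 'CAB', 'CBA']
print(arrangements)  # ['AB', 'AC', 'BA', 'BC', 'CA', 'CB']
print(combos)        # ['AB', 'AC', 'BC']
```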
Notes
o Some properties of probability:

$$0 \le P(A) \le 1$$
$$P(\Omega) = 1; \quad P(\phi) = 0$$
$$A \cup \overline{A} = \Omega$$
$$A \cap \overline{A} = \phi$$

o If A ⊂ B, then:

A ∩ B = A
and A ∪ B = B
Notes
o Let A and B be two events; then:

$$A = (A \cap B) \cup (A \cap \overline{B})$$
$$P(A) = P(A \cap B) + P(A \cap \overline{B})$$
