Statistics
Outline
o Chapter 1: Introduction to Statistics.
Introduction to Statistics
Outline
1. Definition.
2.2. Descriptive Statistics
3. Population and Sample
o A population is the entire collection of events in
which we are interested. It can be of any size.
4. Individual and Variable
o Any set of data contains information about some
group of individuals. The information is organized in
variables (or data).
Applications
1. Identify each of the following data sets as either a
population or a sample:
Applications
2. Identify the following measures as either
quantitative or qualitative:
Applications
3. A researcher wishes to estimate the average weight
of newborns in South America in the last five years. He
takes a random sample of 235 newborns and obtains
an average of 3.27 kg.
Outline
1. Frequency and relative frequency.
2. Cumulative frequency.
3. Percentage.
$$f_i = \frac{n_i}{N} \quad \text{and} \quad F = \sum_{i=1}^{n} f_i = 1, \qquad \text{with } N = \sum_{i=1}^{n} n_i$$
2. Cumulative Frequency
2.1. Applications
o Let Xi be the age of 11 players in Al Ahed
football club. Calculate fi, Ni ↑, Ni ↓, Fi ↑ and Fi ↓.
Xi      20   22   25   27   30   Total
ni       2    2    3    0    4    11
Ni ↑     2    4    7    7   11
Ni ↓    11    9    7    4    4
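The cumulative columns of the table above can be reproduced with a short Python sketch. The variable names are illustrative only; the data are the ages and frequencies from the table.

```python
from itertools import accumulate

ages = [20, 22, 25, 27, 30]      # X_i
counts = [2, 2, 3, 0, 4]         # n_i

total = sum(counts)                                   # N = 11
rel_freq = [n / total for n in counts]                # f_i = n_i / N
cum_up = list(accumulate(counts))                     # N_i increasing
cum_down = list(accumulate(reversed(counts)))[::-1]   # N_i decreasing
rel_cum_up = [c / total for c in cum_up]              # F_i increasing
rel_cum_down = [c / total for c in cum_down]          # F_i decreasing

for row in zip(ages, counts, cum_up, cum_down):
    print(row)
```

The increasing column accumulates from the first value, the decreasing column from the last, so both reach N = 11 at opposite ends of the table.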
2.2. Applications
o Let Xi be the height in cm of 12 players in Al
Riyadi basketball club. Calculate fi, Ni ↑, Ni ↓, Fi ↑ and
Fi ↓.
Xi      185-190   190-195   195-200   200-205   205-210   Total
ni         3         2         4         1         2       12
Ni ↑       3         5         9        10        12
Ni ↓      12         9         7         3         2
3. Percentage
$$\% p_i = \frac{n_i}{N} \times 100 = f_i \times 100$$
4. Graphical representations for a
qualitative variable.
o In general, a qualitative variable can be graphically
represented by:
4.1. Pie-Chart and Semi-Pie Chart.
o To draw a pie chart, we must calculate the angle αi of
each event Xi.
o Every sector represents an event.
o For a Pie-Chart:
$$\alpha_i = \frac{n_i}{N} \times 360^\circ = f_i \times 360^\circ$$
o For a Semi Pie-Chart:
$$\alpha_i = \frac{n_i}{N} \times 180^\circ = f_i \times 180^\circ$$
4.1. Applications.
o The number of World Cup wins for 4 countries
is given in the table below. Draw the pie-chart.
Xi    Italy   Argentina   Brazil   Germany   Total
ni      4         2          5        4       15
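As a rough check before drawing, the sector angles for this table can be computed with the pie-chart formula (ni / N) × 360, or × 180 for a semi pie-chart. The function name `sector_angles` is a hypothetical helper, not part of the course material.

```python
wins = {"Italy": 4, "Argentina": 2, "Brazil": 5, "Germany": 4}   # n_i


def sector_angles(counts, full_angle=360):
    """Angle of each sector: n_i / N * full_angle (360 for a pie, 180 for a semi-pie)."""
    total = sum(counts.values())   # N
    return {label: n * full_angle / total for label, n in counts.items()}


pie = sector_angles(wins)         # angles of the full pie chart, in degrees
semi = sector_angles(wins, 180)   # angles of the semi pie-chart, in degrees
print(pie)
print(semi)
```

The angles of a full pie add up to 360 degrees and those of a semi pie to 180 degrees, which is a quick sanity check on the computation.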
4.2. Bar graph (or Bar chart).
o The height of every bar corresponds to the frequency.
4.2. Applications.
o The number of World Cup wins for 4 countries
is given in the table below. Draw the bar graph.
Xi    Italy   Argentina   Brazil   Germany   Total
ni      4         2          5        4       15
5. Graphical representations for a
discrete variable.
o In general, a discrete variable can be graphically
represented by:
5.1. Pie-Chart and Semi-Pie Chart.
o Same as for qualitative data.
5.2. Bar graph (or Bar chart).
o The height of every bar corresponds to the frequency.
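[Figure: two bar graphs over the values x1, x2, x3, ..., xm, one with bar heights n1, n2, n3, ... (summing to N) and one with bar heights f1, f2, f3, ... (summing to 1).]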
5.2. Applications.
o Example: Draw the bar graph for the following data. Xi
represents the grades out of 20 in statistics for 10
students in a class.
Xi     5   10   15   18   Total
ni     3    2    4    1    10
5.3. Polygon of frequency ni (or fi).
o Example: Draw the polygon of frequency ni for the
following data. Xi represents the grades out of 20 in
statistics for 10 students in a class.
Xi     5   10   15   18   Total
ni     3    2    4    1    10
5.4. Polygon of cumulative frequency.
o Example: Draw the polygon of cumulative frequency for
the following data. Xi represents the grades out of 20 in
statistics for 10 students in a class.
Xi       5   10   15   18   Total
ni       3    2    4    1    10
Ni ↑     3    5    9   10
Ni ↓    10    7    5    1
2. Histogram.
6.1. Pie-Chart and Semi-Pie Chart.
o Same as for qualitative data.
6.2. Histogram.
o The height of every bar corresponds to the frequency.
6.3. Polygon of frequency ni (or fi).
o Example: Draw the polygon of frequency of the
following data. Xi represents the weight in kg for 20
newborns in a hospital in Beirut.
Xi    [0-1[   [1-2[   [2-3[   [3-4[   Total
ni      1       5       8       6      20
6.4. Polygon of cumulative frequency.
o Example: Draw the polygon of cumulative frequency of the
following data. Xi represents the weight in kg for 20
newborns in a hospital in Beirut.
Xi      [0-1[   [1-2[   [2-3[   [3-4[   Total
ni        1       5       8       6      20
Ni ↑      1       6      14      20
Ni ↓     20      19      14       6
Measuring centers
Outline
1. The Mean X̄.
4. The Quantiles.
5. Box Plot.
o X̄ is calculated for:
1) Simple series.
2) Discrete variable.
3) Continuous variable.
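1.1. The Mean X̄
o For a simple series: X1, X2, X3, …
$$\bar{X} = \sum_{i=1}^{N} \frac{X_i}{N}$$
o Example: 2 – 3 – 3 – 5 – 3 – 4
$$\bar{X} = \sum_{i=1}^{N} \frac{X_i}{N} = \frac{2 + 3 + 3 + 5 + 3 + 4}{6} = 3.33$$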
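1.2. The Mean X̄
o For a discrete variable where the data are collected
in a table:
$$\bar{X} = \sum_{i=1}^{N} \frac{n_i X_i}{N} = \sum_{i=1}^{N} f_i X_i$$
$$\bar{X} = \sum_{i=1}^{N} \frac{n_i X_i}{N} = \frac{(2 \times 4) + (6 \times 3) + (8 \times 1)}{8} = 4.25$$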
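1.3. The Mean X̄
o For a continuous variable where the data are
collected in a table:
$$\bar{X} = \sum_{i=1}^{N} \frac{n_i C_i}{N} = \sum_{i=1}^{N} f_i C_i \quad \text{and} \quad C_i = \frac{a + b}{2}$$
where Ci is the center of the interval [a; b].
$$\bar{X} = \sum_{i=1}^{N} \frac{n_i C_i}{N} = \frac{(2 \times 1) + (5 \times 3) + (3 \times 5)}{10} = 3.2$$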
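The two table-based formulas for the mean share the same weighted sum, which the sketch below makes explicit. The discrete data (values 2, 6, 8 with frequencies 4, 3, 1) come from the example in 1.2; the class boundaries [0; 2[, [2; 4[, [4; 6[ are an assumption chosen so that the centers are 1, 3, 5 as in the example in 1.3. The helper name `weighted_mean` is illustrative.

```python
def weighted_mean(values, counts):
    """Mean of tabulated data: sum(n_i * x_i) / N."""
    total = sum(counts)   # N
    return sum(n * x for x, n in zip(values, counts)) / total


# Discrete variable: the values themselves are used.
discrete_mean = weighted_mean([2, 6, 8], [4, 3, 1])

# Continuous variable: each class [a; b[ is replaced by its center C_i = (a + b) / 2.
classes = [(0, 2), (2, 4), (4, 6)]                  # assumed class boundaries
centers = [(a + b) / 2 for a, b in classes]         # 1.0, 3.0, 5.0
continuous_mean = weighted_mean(centers, [2, 5, 3])

print(discrete_mean, continuous_mean)
```

For a continuous variable the result is an approximation, since every observation in a class is treated as if it sat at the class center.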
2. The Mode Mo
o Mo is calculated for qualitative and quantitative
variables.
o Mo is calculated for:
1) Qualitative variable.
2) Simple series.
3) Discrete variable.
4) Continuous variable.
2.1. The Mode Mo
o Mo is the value that has the highest frequency ni (or fi).
2.2. The Mode Mo
o For a simple series: X1, X2, X3, …
2–3–3–5–3–4
⇛ Mo = 3.
2.3. The Mode Mo
o For a discrete variable: Mo has the highest
frequency ni (or fi).
Xi    2   6   8   Total
ni    4   3   1     8
Mo = 2
2.4. The Mode Mo
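o For a continuous variable, we identify the modal
class, which has the highest frequency, then:
$$Mo = L_i + a_i \times \frac{\Delta_1}{\Delta_1 + \Delta_2} \quad \text{and} \quad \Delta_1 = n_i - n_{i-1}, \quad \Delta_2 = n_i - n_{i+1}$$
o For the modal class [a; b[: Li = a and ai = b – a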
2.4. The Mode Mo
o Example: Calculate the mode for the following data:
$$Mo = 2 + (4 - 2) \times \frac{(5 - 2)}{(5 - 2) + (5 - 3)} = 2 + 2 \times \frac{3}{3 + 2} = 3.2$$
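The modal-class formula can be sketched in Python as follows. The classes [0; 2[, [2; 4[, [4; 6[ with frequencies 2, 5, 3 are assumed from the numbers appearing in the worked computation; `grouped_mode` is a hypothetical helper, and treating a missing neighbouring class as having frequency 0 is also an assumption.

```python
def grouped_mode(classes, counts):
    """Mode of grouped data: L_i + a_i * d1 / (d1 + d2) on the modal class."""
    i = max(range(len(counts)), key=lambda k: counts[k])   # index of the modal class
    lower, upper = classes[i]
    prev_n = counts[i - 1] if i > 0 else 0                 # assumption: 0 if no class before
    next_n = counts[i + 1] if i < len(counts) - 1 else 0   # assumption: 0 if no class after
    d1 = counts[i] - prev_n    # n_i - n_(i-1)
    d2 = counts[i] - next_n    # n_i - n_(i+1)
    return lower + (upper - lower) * d1 / (d1 + d2)


mode = grouped_mode([(0, 2), (2, 4), (4, 6)], [2, 5, 3])
print(mode)
```

The sketch picks the first class with the highest frequency, so it does not handle multimodal tables.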
3. The Median Me
o Me is the central value of a statistical
distribution: it splits the ordered series into 2
parts of equal size.
o Me corresponds to the cumulative frequency N/2.
Thus Me = 3
3.3. The Median Me
o For a discrete variable:
$$N \text{ is odd}: \quad Me = X_{\frac{N+1}{2}}$$
$$N \text{ is even}: \quad Me = \frac{X_{\frac{N}{2}} + X_{\frac{N}{2}+1}}{2}$$
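The odd and even cases above can be illustrated on a simple series. This is a minimal sketch; the function name `median` is illustrative and the series 2 – 3 – 3 – 5 – 3 – 4 is the one used earlier for the mean and the mode. The values must be sorted first, and the 1-based positions of the formulas become 0-based list indices.

```python
def median(series):
    """Median of a simple series, following the odd / even rules."""
    ordered = sorted(series)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # N odd: X_((N+1)/2) in 1-based numbering is ordered[mid] in 0-based numbering.
        return ordered[mid]
    # N even: average of X_(N/2) and X_(N/2 + 1).
    return (ordered[mid - 1] + ordered[mid]) / 2


even_case = median([2, 3, 3, 5, 3, 4])   # N = 6
odd_case = median([7, 1, 5])             # N = 3
print(even_case, odd_case)
```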
o Example: N = 8 (even).
Xi 2 6 8 Total
ni 4 3 1 8
o Me = X (10/2) = X5 = 5.
o Solution: N = 10 (even).
o N/2 = 5.
o [2; 4[ is the median class.
$$Me = 2 + 2 \times \frac{5 - 2}{5} = 3.2$$
and 3.2 belongs to [2; 4[.
3.4. The Median Me
o Graphically, for a continuous variable, Me is the
abscissa of the vertical line which divides the
histogram into two parts with equal areas.
3.4. The Median Me
o Graphically, for a continuous variable, Me is the
abscissa of the point where the polygon of increasing
or decreasing cumulative frequency meets the line
y = N/2 (or, for cumulative relative frequencies, the
line y = 0.5).
3.4. The Median Me
o Graphically, for a continuous variable, Me is the
abscissa of the point of intersection of the polygons
of increasing and decreasing cumulative frequencies
(or, equivalently, of the polygons of increasing and
decreasing cumulative relative frequencies).
4. The Quantiles
o The quantiles are values taken from regular intervals
of the quantile function of a variable.
1) Quartiles Q.
2) Deciles D.
3) Percentiles P.
4.1. The Quartiles Q
o The quartiles Q are values that divide the statistical
series into 4 equal parts.
4.1. The Quartiles Q
o Q2 divides the series into 50% - 50%. Thus, Q2 = Me.
$$Q_1 = L_i + a_i \times \frac{\frac{N}{4} - N_{i-1}}{n_i}$$
$$Q_2 = Me$$
$$Q_3 = L_i + a_i \times \frac{\frac{3N}{4} - N_{i-1}}{n_i}$$
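The quartile formulas differ only in the target rank (N/4, N/2 or 3N/4), so one function can serve all three. The sketch below assumes grouped data with classes [0; 2[, [2; 4[, [4; 6[ and frequencies 2, 5, 3 (an assumption consistent with the median computation Me = 3.2 in these notes); `grouped_quantile` is a hypothetical name, and N_(i-1) is taken to be the increasing cumulative frequency of the classes before the one containing the target rank.

```python
def grouped_quantile(classes, counts, p):
    """Quantile of order p (0 < p <= 1) for grouped data:
    L_i + a_i * (p * N - N_(i-1)) / n_i on the class containing rank p * N."""
    total = sum(counts)     # N
    target = p * total      # N/4 for Q1, N/2 for Me, 3N/4 for Q3
    cum_before = 0          # N_(i-1), increasing cumulative frequency
    for (lower, upper), n in zip(classes, counts):
        if n > 0 and cum_before + n >= target:
            return lower + (upper - lower) * (target - cum_before) / n
        cum_before += n
    raise ValueError("p must be between 0 and 1")


classes = [(0, 2), (2, 4), (4, 6)]   # assumed class boundaries
counts = [2, 5, 3]

q1 = grouped_quantile(classes, counts, 0.25)
q2 = grouped_quantile(classes, counts, 0.50)   # Q2 = Me
q3 = grouped_quantile(classes, counts, 0.75)
iq = q3 - q1                                   # interquartile range
print(q1, q2, q3, iq)
```

The same function gives deciles and percentiles by passing p = k/10 or p = k/100.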
4.2. The Deciles D
o The deciles D are values that divide the statistical
series into 10 equal parts.
o They are named D1, D2, D3, D4, D5 = Me, … and D9.
4.3. The Percentiles P
o The percentiles P are values that divide the
statistical series into 100 equal parts.
o They are named P1, P2, P3, P4, P50 = Me, … and P99.
5. Box Plot
o The box plot indicates the dispersion of a given
series.
$$IQ = Q_3 - Q_1$$
7. Other types of mean
o The Geometric mean G: To measure the growth.
$$\log G = \frac{1}{N} \sum_{i=1}^{N} n_i \log X_i$$
7. Other types of mean
o The Quadratic mean Q: is used to calculate the
standard deviation.
$$Q^2 = \frac{1}{N} \sum_{i=1}^{N} n_i X_i^2$$
$$H \le G \le \bar{X} \le Q$$
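The ordering H ≤ G ≤ X̄ ≤ Q can be checked numerically on a simple series of positive values. In this sketch every ni is 1; the harmonic mean H is not defined in these notes, so the usual definition N / Σ(1/Xi) is assumed. The function names are illustrative.

```python
import math


def harmonic_mean(xs):
    # Assumed standard definition: N / sum(1 / x_i).
    return len(xs) / sum(1 / x for x in xs)


def geometric_mean(xs):
    # log G = (1/N) * sum(log x_i); the base of the logarithm does not matter.
    return math.exp(sum(math.log(x) for x in xs) / len(xs))


def arithmetic_mean(xs):
    return sum(xs) / len(xs)


def quadratic_mean(xs):
    # Q^2 = (1/N) * sum(x_i^2)
    return math.sqrt(sum(x * x for x in xs) / len(xs))


series = [2, 3, 3, 5, 3, 4]
h = harmonic_mean(series)
g = geometric_mean(series)
x_bar = arithmetic_mean(series)
q = quadratic_mean(series)
print(h, g, x_bar, q)
```

The four means coincide only when all values in the series are equal.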
Lebanese University
Measures of Spread
Outline
1. The Range W.
$$W = \text{Highest} - \text{Lowest}$$
$$W = 6 - 0 = 6$$
$$W = 5 - 1 = 4$$
2. The Interquartile Range IQ
o IQ is the difference between the 3rd and the 1st
quartiles Q3 and Q1. The IQ is used to build box
plots.
$$IQ = Q_3 - Q_1$$
o IQ is used to calculate the spread around the Median.
$$AD(Mean) = \frac{\sum_{i=1}^{N} n_i \left| X_i - \bar{X} \right|}{N}$$
3.2. The Median Absolute Deviation
o The median absolute deviation AD(Median) is the
mean of the data's absolute deviations around the
data's median: the average (absolute) distance from
the median.
$$AD(Median) = \frac{\sum_{i=1}^{N} n_i \left| X_i - Me \right|}{N}$$
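Both absolute deviations use the same weighted sum and differ only in the center (X̄ or Me). The sketch below applies them to the discrete table used earlier (values 2, 6, 8 with frequencies 4, 3, 1); `abs_deviation` is a hypothetical helper, and the median is obtained from the expanded series with the standard-library `statistics.median`.

```python
import statistics


def abs_deviation(values, counts, center):
    """Average absolute distance to `center`: sum(n_i * |x_i - center|) / N."""
    total = sum(counts)
    return sum(n * abs(x - center) for x, n in zip(values, counts)) / total


values = [2, 6, 8]    # X_i
counts = [4, 3, 1]    # n_i

# Expand the table into the full series to obtain the mean and the median.
series = [x for x, n in zip(values, counts) for _ in range(n)]
mean = sum(series) / len(series)
med = statistics.median(series)

ad_mean = abs_deviation(values, counts, mean)
ad_median = abs_deviation(values, counts, med)
print(mean, med, ad_mean, ad_median)
```

For this particular table the two deviations happen to be equal; in general AD(Median) is never larger than AD(Mean).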
4. The Variance V(X)
o The Variance is the square of the standard deviation
and it measures how far a set of numbers is
spread out from its mean.
$$V(X) = Var(X) = \frac{\sum_{i=1}^{N} n_i (X_i - \bar{X})^2}{N}$$
o The variance is always positive or zero.
$$V(X) \ge 0$$
$$V(a) = 0$$
$$V(X + a) = V(X)$$
$$V(aX) = a^2 V(X)$$
$$V(aX + bY) = a^2 V(X) + b^2 V(Y) + 2ab \, Cov(X, Y)$$
$$V(aX - bY) = a^2 V(X) + b^2 V(Y) - 2ab \, Cov(X, Y)$$
4. The Variance V(X)
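o Another formula for the variance is:
$$V(X) = \frac{\sum_{i=1}^{N} n_i (X_i - \bar{X})^2}{N} = \frac{1}{N} \sum_{i=1}^{N} n_i \left( X_i^2 + \bar{X}^2 - 2 X_i \bar{X} \right)$$
$$= \frac{1}{N} \sum_{i=1}^{N} n_i X_i^2 + \frac{1}{N} \sum_{i=1}^{N} n_i \bar{X}^2 - \frac{2}{N} \sum_{i=1}^{N} n_i X_i \bar{X}$$
$$= \frac{1}{N} \sum_{i=1}^{N} n_i X_i^2 + \frac{N}{N} \bar{X}^2 - \frac{2\bar{X}}{N} \sum_{i=1}^{N} n_i X_i$$
$$= \frac{1}{N} \sum_{i=1}^{N} n_i X_i^2 + \bar{X}^2 - 2 \bar{X} \cdot \bar{X}$$
$$V(X) = \frac{1}{N} \sum_{i=1}^{N} n_i X_i^2 - \bar{X}^2$$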
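A quick numerical check that the definition and the shortcut formula give the same variance, using the discrete table from the mean example (values 2, 6, 8 with frequencies 4, 3, 1). The function names are illustrative only.

```python
def mean(values, counts):
    return sum(n * x for x, n in zip(values, counts)) / sum(counts)


def variance_definition(values, counts):
    """V(X) = sum(n_i * (x_i - mean)^2) / N"""
    m = mean(values, counts)
    return sum(n * (x - m) ** 2 for x, n in zip(values, counts)) / sum(counts)


def variance_shortcut(values, counts):
    """V(X) = sum(n_i * x_i^2) / N - mean^2"""
    m = mean(values, counts)
    return sum(n * x * x for x, n in zip(values, counts)) / sum(counts) - m ** 2


values = [2, 6, 8]    # X_i
counts = [4, 3, 1]    # n_i

v_def = variance_definition(values, counts)
v_short = variance_shortcut(values, counts)
print(v_def, v_short)
```

The shortcut needs only the sums of ni·Xi and ni·Xi², which is why it is convenient for hand calculation from a table.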
5. The Standard Deviation σ(X)
o The standard deviation is the square root of the
variance. It is a measure that is used to quantify the
amount of variation or dispersion of a set of data
values.
$$\sigma(X) = \sqrt{V(X)}$$
$$\sigma(X) \ge 0$$
$$\sigma(a) = 0$$
$$\sigma(X + a) = \sigma(X)$$
$$\sigma(aX) = |a| \, \sigma(X)$$
$$\sigma(X + Y) = \sqrt{V(X) + V(Y) + 2 Cov(X, Y)}$$
6. The Coefficient of Variation CV.
o The coefficient of variation CV (also known as
relative standard deviation) is defined as the ratio
of the standard deviation to the mean.
$$CV = \frac{\sigma}{\bar{X}}$$
o The CV is often expressed as a percentage.
$$\%CV = \frac{\sigma}{\bar{X}} \times 100$$
o CV is a dimensionless number.
o In analytical chemistry, CV is used to express the
precision and repeatability of an assay.
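The sketch below computes σ and CV for the discrete table used in the earlier examples (values 2, 6, 8 with frequencies 4, 3, 1) and illustrates why CV is dimensionless: rescaling the data, for instance from kg to g, changes σ and X̄ by the same factor and leaves CV unchanged. The helper name `coefficient_of_variation` and the factor 1000 are illustrative assumptions.

```python
import math


def coefficient_of_variation(values, counts):
    """CV = sigma / mean, with sigma the square root of the variance."""
    total = sum(counts)
    m = sum(n * x for x, n in zip(values, counts)) / total
    variance = sum(n * (x - m) ** 2 for x, n in zip(values, counts)) / total
    sigma = math.sqrt(variance)
    return sigma / m


values = [2, 6, 8]    # X_i
counts = [4, 3, 1]    # n_i

cv = coefficient_of_variation(values, counts)
cv_percent = cv * 100

# Same data expressed in a unit 1000 times smaller: CV does not change.
cv_rescaled = coefficient_of_variation([1000 * x for x in values], counts)
print(cv, cv_percent, cv_rescaled)
```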
7. The Approximation Errors
o The approximation error in some data is the
discrepancy between an exact value and some
approximation to it.
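$$\text{Absolute Error}: \quad AE = \left| X - X_i \right|$$
$$\text{Relative Error}: \quad RE = \frac{\left| X - X_i \right|}{X_i}$$
$$\text{Percent Error}: \quad PE = RE \times 100$$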
Bivariate Distribution
Outline
1. Introduction.
2. Presentation in Table.
4. Covariance Cov(X,Y).
6. Graphical Study.
7. Regression Lines.
1. Introduction
o In this chapter, we will study the relation between
two different variables or data. These variables may
be qualitative, discrete or continuous.
Observation    1    2    3    4    Total
Xi             X1   X2   X3   X4     -
Yj             Y1   Y2   Y3   Y4     -
Frequency      1    1    1    1      4
3. Means and Variances
o For the variable Xi:
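$$\bar{X} = \frac{\sum_{i=1}^{N} X_i}{N}$$
$$V(X) = \frac{\sum_{i=1}^{N} n_i (X_i - \bar{X})^2}{N} = \frac{\sum_{i=1}^{N} (X_i - \bar{X})^2}{N}$$
$$V(X) = \frac{1}{N} \sum_{i=1}^{N} X_i^2 - \bar{X}^2$$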
3. Means and Variances
o For the variable Yj:
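$$\bar{Y} = \frac{\sum_{j=1}^{N} Y_j}{N}$$
$$V(Y) = \frac{\sum_{j=1}^{N} n_j (Y_j - \bar{Y})^2}{N} = \frac{\sum_{j=1}^{N} (Y_j - \bar{Y})^2}{N}$$
$$V(Y) = \frac{1}{N} \sum_{j=1}^{N} Y_j^2 - \bar{Y}^2$$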
4. Covariance Cov(X,Y)
o The covariance Cov(X,Y) is a measure of how much
two random variables vary together. It’s similar to
variance, but where variance tells you how a single
variable varies, covariance tells you how two
variables vary together:
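$$Cov(X, Y) = \frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{N} (X_i - \bar{X})(Y_j - \bar{Y})$$
$$Cov(X, Y) = \frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{N} X_i Y_j - \bar{X}\bar{Y}$$
o The sums run over the observed pairs (Xi, Yj), each
counted with its frequency.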
4. Covariance Cov(X,Y)
o The properties of the covariance are:
$$Cov(X + b, Y + d) = Cov(X, Y)$$
$$Cov(aX + b, cY + d) = a \cdot c \cdot Cov(X, Y)$$
$$Cov(X, X) = V(X)$$
$$\left| Cov(X, Y) \right| \le \sigma_X \cdot \sigma_Y$$
$$V(X \pm Y) = V(X) + V(Y) \pm 2 Cov(X, Y)$$
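The two covariance formulas and some of the properties above can be checked on a small paired sample. The data below are hypothetical (four observations, each with frequency 1), and the function names are illustrative.

```python
import math


def mean(v):
    return sum(v) / len(v)


def cov(xs, ys):
    """Cov(X, Y) = (1/N) * sum((x - mean_x) * (y - mean_y)) over the observed pairs."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def cov_shortcut(xs, ys):
    """Cov(X, Y) = (1/N) * sum(x * y) - mean_x * mean_y"""
    return sum(x * y for x, y in zip(xs, ys)) / len(xs) - mean(xs) * mean(ys)


xs = [1, 2, 3, 4]   # hypothetical X values
ys = [2, 4, 5, 9]   # hypothetical Y values

c = cov(xs, ys)
a, b, k, d = 3, 1, -2, 5   # arbitrary constants for Cov(aX + b, kY + d)
c_affine = cov([a * x + b for x in xs], [k * y + d for y in ys])
var_x = cov(xs, xs)        # Cov(X, X) = V(X)
var_y = cov(ys, ys)
print(c, cov_shortcut(xs, ys), c_affine, var_x, var_y)
```

Shifting either variable leaves the covariance unchanged, while scaling multiplies it by the product of the two scale factors.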
o The value of “r” does not vary with the change of the
origin or the scale in the graph.
$$-1 \le r \le 1$$
* r → 1 : Strong Positive Correlation
* r → −1 : Strong Negative Correlation
* r → 0 : Weak Correlation
* r = 0 : No Correlation
6. Graphical Study
o Example: Strong Positive Linear Correlation: r → +1.
6. Graphical Study
o Example: Strong Negative Linear Correlation: r → −1.
6. Graphical Study
o Example: No Correlation: r → 0. Thus Cov(X,Y) → 0.
7. Regression Lines
o The regression line of Y on X is denoted (D):
$$(D_{Y/X}): \quad Y = aX + b$$
o To calculate a and b:
$$a = \frac{Cov(X, Y)}{V(X)}$$
$$b = \bar{Y} - a\bar{X}$$
7. Regression Lines
o The regression line of X on Y is denoted (D’):
$$(D'_{X/Y}): \quad X = a'Y + b'$$
$$a' = \frac{Cov(X, Y)}{V(Y)}$$
$$b' = \bar{X} - a'\bar{Y}$$
7. Regression Lines
o The relation between a and a’ is:
$$r^2 = a \cdot a'$$
$$r = \pm\sqrt{a \cdot a'}$$
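The two regression lines and the relation r² = a·a’ can be verified on a small hypothetical sample. The data and function names below are illustrative, and the correlation coefficient is computed with the usual definition r = Cov(X, Y) / (σX · σY), which is assumed here since its formula does not appear in these notes.

```python
import math


def mean(v):
    return sum(v) / len(v)


def cov(xs, ys):
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def regression_line(xs, ys):
    """Slope and intercept of the regression line of ys on xs."""
    slope = cov(xs, ys) / cov(xs, xs)        # Cov(X, Y) / V(X)
    intercept = mean(ys) - slope * mean(xs)
    return slope, intercept


xs = [1, 2, 3, 4]   # hypothetical X values
ys = [2, 4, 5, 9]   # hypothetical Y values

a, b = regression_line(xs, ys)               # (D):  Y = aX + b
a_prime, b_prime = regression_line(ys, xs)   # (D'): X = a'Y + b'

# Assumed standard definition of the linear correlation coefficient.
r = cov(xs, ys) / math.sqrt(cov(xs, xs) * cov(ys, ys))
print(a, b, a_prime, b_prime, r)
```

Both slopes share the sign of Cov(X, Y), which fixes the sign to take in r = ±√(a·a’).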
Calculator
o Shift Mode 3 = =
o Mode 3 1 REG
o Xi , Yj M+
o Shift 2: B = a
A=b
r
o Y = aX + b = BX + A (On calculator).
Outline
1. Introduction.
o Ω = {A, B, C, D, …}.
o ∅ = { }.
1. Introduction
o An event A is a subset of the sample space.
$$A \subset \Omega$$
o Ā is the complement of A.
o Ā is realized if A is not realized and vice versa.
o Thus: $P(\bar{A}) = 1 - P(A)$
2.1. Intersection
o The intersection A ∩ B of two sets A and B is the set
that contains all elements of A that also belong to B
(or equivalently, all elements of B that also belong to
A), but no other elements.
o A ∩ B = A and B.
2.2. Union
o The union A ∪ B of two sets A and B is the set of
elements which are in A, in B, or in both A and B.
o A ∪ B = A or B.
3.1. Disjoint Events
o Two disjoint events are two events that cannot occur
at the same time: they have no elements in common.
$$P(A \cup B) = P(A) + P(B)$$
4. Properties of Operations on Events
o Let A, B and C be three events:
1) Commutativity:
$$A \cap B = B \cap A$$
$$A \cup B = B \cup A$$
2) Associativity:
$$A \cap (B \cap C) = (A \cap B) \cap C$$
$$A \cup (B \cup C) = (A \cup B) \cup C$$
4. Properties of Operations on Events
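3) Distributivity:
$$A \cap (B \cup C) = (A \cap B) \cup (A \cap C)$$
$$A \cup (B \cap C) = (A \cup B) \cap (A \cup C)$$
4) Negation rule:
$$\bar{\bar{A}} = A; \quad \bar{\Omega} = \emptyset; \quad \bar{\emptyset} = \Omega$$
$$\overline{A \cap B} = \bar{A} \cup \bar{B}$$
$$\overline{A \cup B} = \bar{A} \cap \bar{B}$$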
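These identities can be tried out with Python's built-in `set` type, where `&` is the intersection, `|` the union and set difference from Ω gives the complement. The sample space (one roll of a die) and the events A, B, C are hypothetical choices for illustration.

```python
omega = {1, 2, 3, 4, 5, 6}   # sample space: one roll of a die (illustrative)
A = {2, 4, 6}                # "even"
B = {1, 2, 3}                # "at most 3"
C = {3, 4}


def complement(event, space=omega):
    """Contrary event: every outcome of the sample space that is not in `event`."""
    return space - event


# Distributivity
distrib_1 = A & (B | C) == (A & B) | (A & C)
distrib_2 = A | (B & C) == (A | B) & (A | C)

# Negation rules
double_neg = complement(complement(A)) == A
de_morgan_1 = complement(A & B) == complement(A) | complement(B)
de_morgan_2 = complement(A | B) == complement(A) & complement(B)

print(distrib_1, distrib_2, double_neg, de_morgan_1, de_morgan_2)
```

A check on one example is not a proof, but it is a convenient way to catch a misremembered identity.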
5. Uniform and Statistical Probability
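1) Uniform Probability: Let Ω be the sample space and A
an event of Ω. The probability of A is:
$$P(A) = \frac{\text{number of outcomes in } A}{\text{number of outcomes in } \Omega}$$
o The probability of A given B, P(A/B), is:
$$P(A/B) = \frac{P(A \cap B)}{P(B)}$$
o Thus, the probability of B given A, P(B/A), is:
$$P(B/A) = \frac{P(A \cap B)}{P(A)}$$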
6. Conditional Probability (The Tree)
o If A and B are two independent events, then:
$$P(A/B) = \frac{P(A \cap B)}{P(B)} = \frac{P(A) \times P(B)}{P(B)} = P(A)$$
$$P(B/A) = \frac{P(A \cap B)}{P(A)} = \frac{P(A) \times P(B)}{P(A)} = P(B)$$
6. Conditional Probability (The Tree)
o The Bayes theorem is a way to figure out
conditional probability.
$$P(A/B) = \frac{P(B/A) \times P(A)}{P(B)}$$
$$P(B/A) = \frac{P(A/B) \times P(B)}{P(A)}$$
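Conditional probability, independence and the Bayes theorem can be illustrated with exact fractions on a uniform sample space. The sketch assumes one roll of a fair die and hypothetical events A, B, C; the helper names `prob` and `cond_prob` are illustrative.

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}   # one roll of a fair die (assumed uniform)
A = {2, 4, 6}                # "even"
B = {1, 2, 3}                # "at most 3"
C = {1, 2}                   # "at most 2"


def prob(event):
    """Uniform probability: outcomes in the event / outcomes in the sample space."""
    return Fraction(len(event & omega), len(omega))


def cond_prob(a, b):
    """P(a / b) = P(a and b) / P(b)"""
    return prob(a & b) / prob(b)


p_a_given_b = cond_prob(A, B)
bayes_a_given_b = cond_prob(B, A) * prob(A) / prob(B)   # Bayes theorem

# Independence test: P(A and C) = P(A) * P(C)?
a_c_independent = prob(A & C) == prob(A) * prob(C)
a_b_independent = prob(A & B) == prob(A) * prob(B)

print(p_a_given_b, bayes_a_given_b, a_c_independent, a_b_independent)
```

Using `Fraction` avoids rounding, so the equality tests for independence are exact.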
$$\text{Number of permutations} = n!$$
$$A_n^p = \frac{n!}{(n - p)!}$$
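Both counts are available in Python's standard `math` module: `math.factorial(n)` gives n! and `math.perm(n, p)` (Python 3.8 and later) gives n! / (n − p)!. The values n = 5 and p = 2 are arbitrary illustrative choices.

```python
import math

n, p = 5, 2   # illustrative values

permutations = math.factorial(n)                            # n!
arrangements = math.perm(n, p)                              # n! / (n - p)!
by_formula = math.factorial(n) // math.factorial(n - p)     # same count from the formula

print(permutations, arrangements, by_formula)
```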
o If A ⊂ B, then:
A ⋂ B = A
and A ⋃ B = B
Notes
o Let A and B be two events, then:
$$A = (A \cap B) \cup (A \cap \bar{B})$$
$$P(A) = P(A \cap B) + P(A \cap \bar{B})$$