2nd Software Engineering
• The term ’statistics’ is derived from the Latin word status, meaning
single cause.
predetermined purpose
accuracy.
Singular sense
Statistics is the science that deals with the methods of collecting,
organizing, presenting, analyzing and interpreting data. According
to this definition, a statistical investigation has five stages
standard
or diagrams or graphs
testing procedures
quantifications
different values
common characteristics
investigations
techniques
population
4 Ratio: A scale for quantitative variables. There is a true zero on this
scale. E.g. age, weight, height, ... measurements
Data types
qualitative variable
Examples: Data on gender, religion, economic status, ethnicity of
subjects under investigation
quantitative variables
Examples: Data on age, weight, temperature, number of children a
family has, etc.
secondary sources
After the data have been collected, they must be organized in
some meaningful way. Organization of data involves
2 Find u
• Arranging data: 3 3 7 8 9 11 12 13 13 15 16 17 17 18 19 19 20 20 21
21 22 22 22 22 22 22 23 23 23 23 23 24 24 24 24 24 25 25 26 26 26 27
27 28 28 28 29 30 31 33
• In any bar chart, the Y-axis represents the frequency of each category
frequency
Examples
• Always exist.
1 1 1
• $x_1 + x_2 + \dots + x_n = \sum_{i=1}^{n} x_i$
Rules of Summation
1 For two variables x and y: $\sum_{i=1}^{n} (x_i \pm y_i) = \sum_{i=1}^{n} x_i \pm \sum_{i=1}^{n} y_i$
2 For any constant k: $\sum_{i=1}^{n} k x_i = k \sum_{i=1}^{n} x_i$
3 $\sum_{i=1}^{n} k = nk$
4 $\sum_{i=1}^{n} (x_i - k)^2 = \sum_{i=1}^{n} x_i^2 - 2k \sum_{i=1}^{n} x_i + nk^2$
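Rule 4 comes from expanding the square, so its last term is n times k squared. A minimal numerical check in Python, where the values of `xs` and `k` are illustrative and not taken from the slides:

```python
# Numerical check of summation rule 4 (illustrative data, not from the slides).
xs = [3, 7, 8, 9, 11]
k = 4
n = len(xs)

# Left-hand side: sum of squared deviations from the constant k.
lhs = sum((x - k) ** 2 for x in xs)

# Right-hand side: expanded form, whose last term is n * k^2.
rhs = sum(x ** 2 for x in xs) - 2 * k * sum(xs) + n * k ** 2

print(lhs, rhs)
```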
Types of Measures of Central Tendency
two equal parts (50% of the observations lie below the median
value whereas the remaining 50% lie above it)
• Mode: The most frequently observed values in a given data set (the
• Harmonic Mean (HM): It give good average when the observed data
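• For ungrouped data: $HM = \dfrac{n}{\frac{1}{x_1} + \frac{1}{x_2} + \dots + \frac{1}{x_n}} = \dfrac{n}{\sum_{i=1}^{n} \frac{1}{x_i}}$
• For grouped data: $HM = \dfrac{\sum_{i=1}^{n} f_i}{\sum_{i=1}^{n} \frac{f_i}{x_i}}$
• The weighted harmonic mean, when each observation has its own
weight $w_i$, is given by: $HM = \dfrac{\sum_{i=1}^{n} w_i}{\sum_{i=1}^{n} \frac{w_i}{x_i}}$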
Median
Median is the value located at the center of the ordered data; it is
considered a measure of location
• This is done after identifying the median class. Median class is the
• Mode ($\hat{x}$): The most frequently observed value. For ungrouped data the
value with the greatest frequency is the modal value
• For grouped data: $\hat{x} = LCB_{\hat{x}} + \left( \dfrac{f_{\hat{x}} - f_{\hat{x}-1}}{(f_{\hat{x}} - f_{\hat{x}-1}) + (f_{\hat{x}} - f_{\hat{x}+1})} \right) * w$
• The modal class is the class with the largest frequency
Exercise on MCT
Example 1: The heights of 7 students selected from a class are given
below in centimeters: 165, 160, 172, 168, 159, 170, 173. Calculate the
simple AM of the heights.
$\bar{x} = \dfrac{165+160+172+168+159+170+173}{7} = \dfrac{1167}{7} \approx 166.71$ cm is the average height of
the students
• Example 2: Calculate the mean amount of yield of maize, based on
the following grouped data.
Yield (in kg)   No of plots (fi)   Class mark (xi)   fi xi
171-179          3                 175                525
180-188          7                 184               1288
189-197         12                 193               2316
198-206          9                 202               1818
207-215          4                 211                844
216-224          4                 220                880
225-233          1                 229                229
Total           40                                   7900
• $\bar{x} = \dfrac{\sum f_i x_i}{\sum f_i} = \dfrac{7900}{40} = 197.5$ kg per plot is the average yield
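The grouped-mean calculation of Example 2 can be sketched in Python. The variable names are illustrative; the class marks and frequencies are the ones in the table:

```python
# Grouped mean for the maize-yield table of Example 2.
class_marks = [175, 184, 193, 202, 211, 220, 229]  # x_i
freqs = [3, 7, 12, 9, 4, 4, 1]                     # f_i

total_f = sum(freqs)                                        # sum of f_i
total_fx = sum(f * x for f, x in zip(freqs, class_marks))   # sum of f_i * x_i
grouped_mean = total_fx / total_f

print(total_f, total_fx, grouped_mean)
```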
Exercise on MCT...
Example 3: A student registered for Stat 281 and Math 261, each with four
credit hours, and for Math 224, Phil 201 and Comp 201, each with three credit
hours. If the student earned a B grade in Stat 281, Math 261 and
Phil 201 and a C grade in the remaining two courses, find the
average score of the student (taking B = 3 and C = 2 grade points)
$\bar{x}_w = \dfrac{4*3+4*3+3*3+3*2+3*2}{4+4+3+3+3} = \dfrac{45}{17} \approx 2.65$ is the average score the student earns at
the end of the semester.
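A minimal sketch of the weighted mean in Example 3, with the credit hours as weights. The grade-point mapping B = 3, C = 2 is an assumption read off the worked formula:

```python
# Weighted mean of grade points, weighted by credit hours.
grade_points = {"B": 3, "C": 2}  # assumed mapping, inferred from the worked formula

courses = [
    ("Stat 281", 4, "B"),
    ("Math 261", 4, "B"),
    ("Phil 201", 3, "B"),
    ("Math 224", 3, "C"),
    ("Comp 201", 3, "C"),
]

total_points = sum(credits * grade_points[grade] for _, credits, grade in courses)
total_credits = sum(credits for _, credits, _ in courses)
weighted_mean = total_points / total_credits

print(total_points, total_credits, round(weighted_mean, 2))
```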
• Example 5: What is the median of 180, 201, 220, 191, 219, 209 and
220
• Sorted values 180, 191, 201, 209, 219, 220, 220. Since n= 7 is odd its
• Find median and mode of 62, 63, 64, 65, 66, 66, 68 and 78.
• Sorted values 62, 63, 64, 65, 66, 66, 68, 78. Since n = 8 is even, the
• Example: Consider example 2 data and find the median and modal
value of yield.
Yield (in kg)   No of plots (fi)   Class mark (xi)   LCF
171-179          3                 175                3
180-188          7                 184               10
189-197         12                 193               22
198-206          9                 202               31
207-215          4                 211               35
216-224          4                 220               39
225-233          1                 229               40
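A sketch of the grouped median and mode for this table. Two assumptions are not stated on these slides: class boundaries lie 0.5 below the lower class limit, and the median uses the usual interpolation formula LCB + ((n/2 - cumulative frequency before the median class) / f) * w. The mode uses the grouped-data formula given earlier.

```python
# Grouped median and mode for the maize-yield table.
lower_limits = [171, 180, 189, 198, 207, 216, 225]
freqs = [3, 7, 12, 9, 4, 4, 1]
width = 9  # class width (e.g. 179.5 - 170.5)
n = sum(freqs)

# Less-than cumulative frequencies (LCF).
cum = []
running = 0
for f in freqs:
    running += f
    cum.append(running)

# Median class: first class whose cumulative frequency reaches n/2.
m = next(i for i, c in enumerate(cum) if c >= n / 2)
cf_before = cum[m - 1] if m > 0 else 0
median = (lower_limits[m] - 0.5) + (n / 2 - cf_before) / freqs[m] * width

# Modal class: class with the largest frequency.
k = freqs.index(max(freqs))
d1 = freqs[k] - (freqs[k - 1] if k > 0 else 0)
d2 = freqs[k] - (freqs[k + 1] if k < len(freqs) - 1 else 0)
mode = (lower_limits[k] - 0.5) + d1 / (d1 + d2) * width

print(cum, median, mode)
```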
x̄new = kx̄old
• The sum of the deviations taken from the mean is zero, i.e. $\sum_{i=1}^{n} (x_i - \bar{x}) = 0$
these 7 are seniors with an average weight of 165 lbs, 9 are juniors with an
average weight of 160 lbs, 13 are sophomores with an average weight of
152 lbs and 20 are freshmen with an average weight of 150 lbs. Find the
average weight of students in the department.
$\bar{x}_{combined} = \dfrac{7*165+9*160+13*152+20*150}{7+9+13+20} = \dfrac{7571}{49} \approx 154.51$ lbs
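The combined mean weights each group mean by its group size. A short Python check using the group sizes and means stated in the problem (names are illustrative):

```python
# Combined mean: each group mean is weighted by its group size.
groups = [  # (number of students, average weight in lbs)
    (7, 165),   # seniors
    (9, 160),   # juniors
    (13, 152),  # sophomores
    (20, 150),  # freshmen
]

total_weight = sum(size * mean for size, mean in groups)
total_students = sum(size for size, _ in groups)
combined_mean = total_weight / total_students

print(total_weight, total_students, round(combined_mean, 2))
```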
• The are quantiles which divides a given data sets in two more than
• Two or more data sets may have the same mean and (or) median but
still be quite different. This implies that MCT alone does not
provide enough information about the nature of the data.
Score of class A 30 30 30
Score of class B 29 30 31
Score of class C 15 30 45
Score of class D 5 30 55
• All four data sets have mean 30 and median 30. This does
not imply that the data sets are similar, and it does not give a clear picture
of the nature of the data
Objectives of Measures of Variation
tendency
• Coefficient of Range: $CR = \dfrac{R}{L+S}$
• $CV = \dfrac{S}{\bar{x}}$
• % CV = CV * 100%
Examples
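• Consider a sample with data values of 27, 25, 20, 15, 30, 34, 28, and 25.
• $\bar{x} = \dfrac{27+\dots+25}{8} = 25.5$, $\tilde{x} = \dfrac{25+27}{2} = 26$
• $MD_{\bar{x}} = \dfrac{|27-25.5|+\dots+|25-25.5|}{8} = \dfrac{34}{8} = 4.25$, $CMD_{\bar{x}} = \dfrac{4.25}{25.5} = 0.1667$,
$MD_{\tilde{x}} = \dfrac{|27-26|+\dots+|25-26|}{8} = \dfrac{34}{8} = 4.25$, $CMD_{\tilde{x}} = \dfrac{4.25}{26} = 0.163$
• $s^2 = \dfrac{(27-25.5)^2+\dots+(25-25.5)^2}{7} = 34.57$, $S = \sqrt{34.57} = 5.88$, $CV = \dfrac{5.88}{25.5} = 0.231$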
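These measures of variation can be recomputed with Python's standard `statistics` module. This is only a sketch: the eighth data value, 25, is read from the worked sums, and the variance uses the sample form with divisor n - 1.

```python
import statistics

# Sample from the example; the last value (25) is read from the worked sums.
data = [27, 25, 20, 15, 30, 34, 28, 25]
n = len(data)

mean = statistics.mean(data)
median = statistics.median(data)

# Mean deviation about the mean and about the median.
md_mean = sum(abs(x - mean) for x in data) / n
md_median = sum(abs(x - median) for x in data) / n

# Coefficients of mean deviation.
cmd_mean = md_mean / mean
cmd_median = md_median / median

# Sample variance (divisor n - 1), standard deviation, coefficient of variation.
variance = statistics.variance(data)
sd = statistics.stdev(data)
cv = sd / mean

print(mean, median, md_mean, md_median, round(variance, 2), round(sd, 2), round(cv, 3))
```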
Examples
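• $R = UCL_{last} - LCL_{first} = 38 - 11 = 27$, $CR = \dfrac{38-11}{38+11} = 0.551$
• $MD_{\bar{x}} = \dfrac{338.28}{56} = 6.04$, $CMD_{\bar{x}} = \dfrac{6.04}{25.64} = 0.24$,
$MD_{\tilde{x}} = \dfrac{339.2}{56} = 6.06$, $CMD_{\tilde{x}} = \dfrac{6.06}{26.1} = 0.23$
• $s^2 = \dfrac{2870.8576}{55} = 52.19$, $s = \sqrt{52.19} = 7.22$, $CV = \dfrac{7.22}{25.64} = 0.282$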
Properties of variance
assumed to be random
defined characteristics.
• E.g. the event of getting two heads when a fair coin is tossed twice
common (intersection)
• Exhaustive Events: events whose union forms the sample space
Counting Rules
Dawa. Suppose she can travel by plane, bus, cycle or horse, and there are 3
flights, 4 buses, 2 cycles and 3 horses available. In how many
different ways can she make her journey?
From the given problem $n_f = 3$, $n_b = 4$, $n_c = 2$ and $n_h = 3$. So she has
$n_f + n_b + n_c + n_h = 3 + 4 + 2 + 3 = 12$ different ways to make her trip
from Harar to Dire Dawa.
and 4 t-shirts. In how many possible ways can this person wear his
clothes? $n_s = 3$, $n_t = 4$, $n_{ts} = 4$, so there are 3*4*4 = 48 possible ways
• Permutation of n objects by taking r of them is given by: $\dfrac{n!}{(n-r)!}$
• Possible ways of selecting r objects from the n total objects are given
by: $\binom{n}{r} = \dfrac{n!}{r! * (n-r)!}$
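Both counting formulas are available in Python's standard library as `math.perm` and `math.comb` (Python 3.8+). A minimal sketch, with n = 5 and r = 2 chosen purely for illustration:

```python
import math

n, r = 5, 2  # illustrative values, not from the slides

# Ordered selections: n! / (n - r)!
permutations = math.perm(n, r)

# Unordered selections: n! / (r! * (n - r)!)
combinations = math.comb(n, r)

# Cross-check against the factorial formulas.
perm_by_formula = math.factorial(n) // math.factorial(n - r)
comb_by_formula = math.factorial(n) // (math.factorial(r) * math.factorial(n - r))

print(permutations, combinations)
```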
Counting Rules...
(b) one particular female must be a member: $\binom{5}{2} * \binom{6}{2} = 10 * 15 = 150$
Approaches in Probability Definition
evaluation of a problem.
Some Probability Rules or Axioms
1 $0 \le P(A) \le 1$
2 $P(S) = 1$
3 $P(A^c) = 1 - P(A)$
4 $P(A \cup B) = P(A) + P(B) - P(A \cap B)$
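balls are drawn at random, find the probability that at least one is
white. Let W be the event that at least one drawn ball is white.
$P(W) = \dfrac{\binom{6}{1}\binom{13}{2}}{\binom{19}{3}} + \dfrac{\binom{6}{2}\binom{13}{1}}{\binom{19}{3}} + \dfrac{\binom{6}{3}\binom{13}{0}}{\binom{19}{3}} = \dfrac{683}{969} = 0.7048$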
Independence Probability
and the probability that it will be well planned and well executed is
0.54. Then, what is the probability that it will be
(a) well executed given that it is well planned? Let D and E be the
events that the research project is well planned and well executed,
respectively. Then $P(D) = 0.6$ and $P(D \cap E) = 0.54$.
$P(E/D) = \dfrac{P(E \cap D)}{P(D)} = \dfrac{0.54}{0.6} = 0.9$
exhaustive events. Then for any event B, $P(B) = \sum_{i=1}^{k} P(B/A_i)P(A_i)$
$P(A_i) > 0$ for $i = 1, 2, \dots, k$, which partition the sample space, and let B be any event with $P(B) > 0$.
Then the probability of the jth event $A_j$ given B is obtained by Bayes' Theorem,
which is based on the multiplication rule and the total probability rule:
• $P(A_j/B) = \dfrac{P(A_j \cap B)}{P(B)} = \dfrac{P(B/A_j)P(A_j)}{\sum_{i=1}^{k} P(B/A_i)P(A_i)}$ for j = 1, 2, ..., k
Bayes’ Theorem Example
• E.g.: Microchips from a factory are sorted into three separate boxes.
Box 1 contains 25 microchips from shift 1, box 2 contains 35
microchips from shift 2, and box 3 contains 40 microchips from shift
3. There are 5, 10 and 5 defective microchips in the first, second and
third boxes, respectively. Let A denote the event that a defective
microchip is obtained and B1, B2 and B3 be the events of choosing
box 1, box 2 and box 3, respectively. All three boxes are equally likely
to be chosen. Given $P(B_1) = \frac{25}{100}$, $P(B_2) = \frac{35}{100}$, $P(B_3) = \frac{40}{100}$,
$P(A/B_1) = \frac{5}{25}$, $P(A/B_2) = \frac{10}{35}$, $P(A/B_3) = \frac{5}{40}$
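A sketch of how the total probability rule and Bayes' Theorem apply to these numbers, using exact fractions. It takes the box probabilities 25/100, 35/100 and 40/100 and the conditional defect probabilities exactly as given; the variable names are illustrative.

```python
from fractions import Fraction

# Probabilities of choosing each box and of a defective chip given the box, as given.
priors = [Fraction(25, 100), Fraction(35, 100), Fraction(40, 100)]   # P(B1), P(B2), P(B3)
likelihoods = [Fraction(5, 25), Fraction(10, 35), Fraction(5, 40)]   # P(A/B1), P(A/B2), P(A/B3)

# Total probability rule: P(A) = sum of P(A/Bi) * P(Bi).
p_defective = sum(l * p for l, p in zip(likelihoods, priors))

# Bayes' Theorem: P(Bj/A) = P(A/Bj) * P(Bj) / P(A).
posteriors = [l * p / p_defective for l, p in zip(likelihoods, priors)]

print(p_defective, posteriors)
```

The posteriors sum to 1, which is a quick sanity check on the calculation.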
The type of a random variable is based on the nature of the possible
outcomes of a given experiment
1 $0 \le p(x_i) \le 1$
2 $\sum_i p(x_i) = 1$
3 $P(X \le x) = \sum_{x_i \le x} p(x_i)$
Examples
X            0     1     2
P(X = xi)    1/4   2/4   1/4
Y is given by:
$P(Y = y) = c y^2$, $y = 0, 1, 2, 3, 4$. Then find the value of c.
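Since the probabilities must sum to 1, c is the reciprocal of the sum of y squared over the support. A minimal Python sketch with exact fractions:

```python
from fractions import Fraction

ys = range(5)  # support: y = 0, 1, 2, 3, 4

# The pmf must sum to 1, so c = 1 / sum(y^2).
c = Fraction(1, sum(y ** 2 for y in ys))
pmf = {y: c * y ** 2 for y in ys}

print(c, sum(pmf.values()))
```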
Probability Density function
The probability distribution of a continuous random variable is called a
probability density function
1 $f(x) \ge 0, \forall x$
2 $\int_{-\infty}^{\infty} f(x)\,dx = 1$
3 $P(X \le x) = \int_{-\infty}^{x} f(t)\,dt$
by:
$f(x) = 2x$, for $0 < x < 1$,
• (a) Verify whether f(x) is a pdf or not: $f(x) \ge 0$ and $\int_0^1 2x\,dx = x^2 \big|_0^1 = 1$, so f(x) is a pdf
• Find $P(0.5 < X < 0.75) = \int_{0.5}^{0.75} 2x\,dx = x^2 \big|_{0.5}^{0.75} = 0.75^2 - 0.5^2 = 0.3125$
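Both integrals can be checked numerically. The sketch below uses a simple midpoint rule; the helper `integrate` is written for this illustration and is not a library function.

```python
def f(x):
    """Density from the example: f(x) = 2x on (0, 1)."""
    return 2 * x

def integrate(g, a, b, steps=10_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / steps
    return sum(g(a + (i + 0.5) * h) for i in range(steps)) * h

total_area = integrate(f, 0, 1)        # should be close to 1 for a valid pdf
p_interval = integrate(f, 0.5, 0.75)   # P(0.5 < X < 0.75)

print(round(total_area, 6), round(p_interval, 6))
```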
CUMULATIVE DISTRIBUTION FUNCTION
• $Var(aX) = a^2\sigma^2$, $Var(a) = 0$
• $Var(X \pm a) = Var(X)$
Examples
A coin is tossed two times. Let X be the number of heads. Find the mean
value and the standard deviation of X.
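X            0     1     2
P(X = xi)    1/4   2/4   1/4
• $E(X) = \sum x_i p(x_i) = 0 * \frac{1}{4} + 1 * \frac{2}{4} + 2 * \frac{1}{4} = 1$
• $E(X^2) = 0^2 * \frac{1}{4} + 1^2 * \frac{2}{4} + 2^2 * \frac{1}{4} = 1.5$
$Var(X) = E(X^2) - [E(X)]^2 = 1.5 - 1 = 0.5$
• $SD(X) = \sigma = \sqrt{\sigma^2} = \sqrt{0.5} = 0.707$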
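The expectation, variance and standard deviation of the number of heads can be computed directly from the probability table. A short sketch with exact fractions (variable names are illustrative):

```python
from fractions import Fraction
from math import sqrt

# pmf of X = number of heads in two tosses of a fair coin.
pmf = {0: Fraction(1, 4), 1: Fraction(2, 4), 2: Fraction(1, 4)}

mean = sum(x * p for x, p in pmf.items())                # E(X)
second_moment = sum(x ** 2 * p for x, p in pmf.items())  # E(X^2)
variance = second_moment - mean ** 2                     # Var(X) = E(X^2) - [E(X)]^2
sd = sqrt(variance)

print(mean, second_moment, variance, round(sd, 3))
```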
Examples
$f(x) = \begin{cases} 1 + x, & \text{if } -1 < x < 0 \\ 1 - x, & \text{if } 0 < x < 1 \end{cases}$
There are situations in which two random variables are defined over a given
experiment. Suppose X and Y are random variables on the probability
space (Ω, A, P(.)). Their joint probability distribution describes
their behavior relative to each other; it is defined over $R^2$, with each
variable taking values in R.
2 0 ≤ P(X = xi , Y = yi ) ≤ 1
2 f (x, y) ≥ 0
                 Y
p(xi/yj)     0      1
x = 0        0.2    0.6
x = 1        0.4    0.2
x = 2        0.4    0.2
• The conditional probability mass function of Y given X
                 Y
p(yj/xi)     0      1
x = 0        0.25   0.75
x = 1        0.67   0.33
x = 2        0.67   0.33
Example 2
• −1 ≤ ρXY ≤ 1
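• $E(XY) = \int_0^1 \int_0^1 xy\left(\frac{3}{2}x^2 + y\right)dx\,dy = \frac{34}{96}$
• $E(X) = \int_0^1 x\left(\frac{3}{2}x^2 + \frac{1}{2}\right)dx = \frac{5}{8}$, $E(Y) = \int_0^1 y\left(\frac{1}{2} + y\right)dy = \frac{7}{12}$
• $\sigma_{XY} = E(XY) - E(X)E(Y) = \frac{34}{96} - \frac{5}{8} * \frac{7}{12} = \frac{-1}{96} = -0.01042$
• $E(X^2) = \int_0^1 x^2\left(\frac{3}{2}x^2 + \frac{1}{2}\right)dx = \frac{14}{30}$, $E(Y^2) = \int_0^1 y^2\left(\frac{1}{2} + y\right)dy = \frac{5}{12}$
• $\sigma_x^2 = \frac{14}{30} - \left(\frac{5}{8}\right)^2 = 0.0760$, $\sigma_y^2 = \frac{5}{12} - \left(\frac{7}{12}\right)^2 = 0.0764$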
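These moments can be verified exactly. The sketch assumes the joint density f(x, y) = (3/2)x^2 + y on the unit square, which is read off the integrands above. It relies on the fact that the integral of x^a over [0, 1] is 1/(a+1); the helper `moment` is written for this illustration.

```python
from fractions import Fraction
from math import sqrt

# Assumed joint pdf f(x, y) = (3/2) x^2 + y on [0, 1] x [0, 1],
# stored as {(power of x, power of y): coefficient}.
pdf_terms = {(2, 0): Fraction(3, 2), (0, 1): Fraction(1)}

def moment(p, q):
    """E[X^p Y^q], integrating each monomial term exactly over the unit square."""
    return sum(c / ((a + p + 1) * (b + q + 1)) for (a, b), c in pdf_terms.items())

total = moment(0, 0)  # must equal 1 for a valid pdf
e_x, e_y, e_xy = moment(1, 0), moment(0, 1), moment(1, 1)

cov = e_xy - e_x * e_y
var_x = moment(2, 0) - e_x ** 2
var_y = moment(0, 2) - e_y ** 2
rho = float(cov) / sqrt(var_x * var_y)  # correlation coefficient

print(total, e_x, e_y, e_xy, cov, var_x, var_y, round(rho, 4))
```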
1 n is indefinite
2 no amoeba: $P(Y = 0) = \dfrac{e^{-4} 4^0}{0!} = 0.0183$
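A minimal sketch of the Poisson probability used here, with rate 4 as in the worked formula. The function name `poisson_pmf` is illustrative:

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(Y = k) for a Poisson random variable with mean lam."""
    return exp(-lam) * lam ** k / factorial(k)

p_zero = poisson_pmf(0, 4)  # probability of observing no amoeba
print(round(p_zero, 4))
```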
normal curve.
Normal probability distribution...
To get a probability under the normal density one has to integrate
$\int \dfrac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}\,dx$, which is very complex.
To avoid this complexity, the normal distribution is transformed to the
standardized normal distribution so that the normal table can be used to find the
probability
• After the transformation the property of the normal density and the
mean or zero
property
• The area from −∞ to 0 equals the area from 0 to ∞, each of which is half of the
total (0.5)
Properties of Standardized Normal Distribution..
• P(z > 1) = P(z > 0) − P(0 < z < 1) = 0.5 − 0.3413 = 0.1587
• P(z < 1) = P(z < 0) + P(0 < z < 1) = 0.5 + 0.3413 = 0.8413
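Instead of a table, standard normal probabilities can be computed from the error function in Python's `math` module. This is a sketch; the helper name `phi` is illustrative.

```python
from math import erf, sqrt

def phi(z):
    """Standard normal CDF, P(Z <= z), written in terms of the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

p_between = phi(1) - phi(0)  # P(0 < z < 1)
p_greater = 1 - phi(1)       # P(z > 1)
p_less = phi(1)              # P(z < 1)

print(round(p_between, 4), round(p_greater, 4), round(p_less, 4))
```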
Standardized Normal Distribution Table
The college boards, which are administered each year to many thousands
of high school students, are scored so as to yield a mean of 500 and a
standard deviation of 100. These scores are close to being normally
distributed. What percentage of the scores can be expected to satisfy each
condition?
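• $\hat{\beta}_1 = \dfrac{n\sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}$
• $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$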
• The slope tell us the magnitude of the effect of the independent of the
between the heights of sons and the heights of their fathers. In other
words, do taller fathers have taller sons? The researcher took a
random sample of 6 fathers and their 6 sons. Their height in inches is
given below in an ordered array.
Father (x) 63 65 66 67 67 68
Son (y) 66 68 65 67 69 70
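• $\sum_{i=1}^{n} x_i y_i = 26740$, $\sum_{i=1}^{n} x_i = 396$, $\sum_{i=1}^{n} y_i = 405$, $\sum_{i=1}^{n} y_i^2 = 27355$, $\sum_{i=1}^{n} x_i^2 = 26152$
• $r = \dfrac{6*26740 - 396*405}{\sqrt{6*26152 - (396)^2} * \sqrt{6*27355 - (405)^2}} = \dfrac{60}{\sqrt{96*105}} = 0.598$,
$\hat{\beta}_1 = \dfrac{6*26740 - 396*405}{6*26152 - (396)^2} = \dfrac{60}{96} = 0.625$,
$\hat{\beta}_0 = 67.5 - 0.625 * 66 = 26.25$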
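The correlation coefficient and the least-squares line for the father and son heights can be recomputed from the raw data. A sketch in plain Python, with illustrative variable names:

```python
from math import sqrt

fathers = [63, 65, 66, 67, 67, 68]  # x
sons = [66, 68, 65, 67, 69, 70]     # y
n = len(fathers)

sx, sy = sum(fathers), sum(sons)
sxy = sum(x * y for x, y in zip(fathers, sons))
sxx = sum(x ** 2 for x in fathers)
syy = sum(y ** 2 for y in sons)

# Building blocks shared by r and the slope.
num = n * sxy - sx * sy      # n*sum(xy) - sum(x)*sum(y)
den_x = n * sxx - sx ** 2    # n*sum(x^2) - (sum x)^2
den_y = n * syy - sy ** 2    # n*sum(y^2) - (sum y)^2

r = num / sqrt(den_x * den_y)           # correlation coefficient
slope = num / den_x                     # estimate of beta_1
intercept = sy / n - slope * (sx / n)   # estimate of beta_0 = y_bar - beta_1 * x_bar

print(round(r, 3), slope, intercept)
```

The fitted line is then: son's height = intercept + slope * father's height.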