Course 3 & 4
A. DEFINITION
▪ A Random Variable (R.V.) is a function which associates a real number with each element in a
sample space. A R.V. is either DISCRETE (e.g., the number of defects in a production line, the number of
female students admitted to medical school, etc.) or CONTINUOUS (e.g., the diameter of shafts produced
in a turning center, the stress value in a part submitted to a varying torque, the normal/planar anisotropy of
sheet metal subjected to a forming process, etc.).
Q3.2: An urn contains 4 red balls and 3 black balls. We draw two balls without replacement. Let
X be a R.V. representing the number of red balls drawn; describe X.
▪ For X a discrete R.V., the set of pairs (x, f(x) = P(X = x)) is the PROBABILITY FUNCTION or PROB.
DENSITY FUNCTION of X:
f(x) = P(X = x),  with f(x) ≥ 0 and Σ_x f(x) = 1
For a continuous R.V., X has zero prob. of assuming exactly any of its values; hence, its prob. dist.
cannot be given in tabular form, and intervals of the continuous R.V. are considered instead.
P(a < X < b) = ∫_a^b f(x) dx,  with f(x) ≥ 0 and ∫_{−∞}^{+∞} f(x) dx = 1
CDF (CUMULATIVE DIST. FUNCTION): F(x) = P(X ≤ x) = ∫_{−∞}^{x} f(t) dt
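As an illustration of these definitions (not one of the course exercises), a minimal SymPy sketch with an assumed density f(x) = 2x on [0, 1]:

import sympy as sp

x, t = sp.symbols('x t')
f = 2*x  # assumed illustrative density on [0, 1]; any valid PDF works here

# Normalization: the total probability must equal 1
total = sp.integrate(f, (x, 0, 1))

# P(1/4 < X < 1/2): integrate the density over the interval
p_interval = sp.integrate(f, (x, sp.Rational(1, 4), sp.Rational(1, 2)))

# CDF F(t) = P(X <= t) for 0 <= t <= 1
F = sp.integrate(f, (x, 0, t))

print(total, p_interval, sp.simplify(F))  # 1, 3/16, t**2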
Q3.4: In a fan assembly line, the proportion of defective products can be described by a
continuous R.V. X with the density function
f(x) = 2(x + 2)/5,  0 < x < 1
f(x) = 0,           elsewhere
Q3.5: X is the error in the measurement of a physical quantity, given by the density function
▪ In some situations we need to record the outcomes of several RVs SIMULTANEOUSLY, e.g.,
the hardening capacity (Hc), the tensile stress (Ts), and the yield stress (Ys) occurring in a given
material subjected to a forming force. This results in a 3D sample space (Hc, Ts, Ys), and the joint
prob. dist. f(Hc, Ts, Ys) is required.
Q3.7: Given the joint density function
f(x, y) = (2/5)(2x + 3y),  0 ≤ x ≤ 1, 0 ≤ y ≤ 1
f(x, y) = 0,               elsewhere
Q1. Prove that ∫∫ f(x, y) dx dy = 1 over the whole plane.
Q2. Find P[(X, Y) ∈ A], where A = {(x, y): 0 ≤ x ≤ ½, ¼ ≤ y ≤ ½}.
Q3. Find P[(X, Y) ∈ A], where A = {(x, y): (x − 1/2)² + (y − 1/2)² ≥ 1/4}.
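A minimal SymPy sketch for Q1 and Q2 of Q3.7, assuming the density as reconstructed above:

import sympy as sp

x, y = sp.symbols('x y')
f = sp.Rational(2, 5) * (2*x + 3*y)  # joint density on 0 <= x, y <= 1, as reconstructed above

# Q1: the double integral over the support should equal 1
total = sp.integrate(f, (x, 0, 1), (y, 0, 1))

# Q2: P(0 <= X <= 1/2, 1/4 <= Y <= 1/2)
p_rect = sp.integrate(f, (x, 0, sp.Rational(1, 2)), (y, sp.Rational(1, 4), sp.Rational(1, 2)))

print(total, p_rect)  # 1, 13/160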
D. MARGINAL DISTRIBUTION
▪ Given X and Y two RVs with joint prob. dist. f(x, y), the MARGINAL DISTRIBUTION functions g(x) and h(y) are obtained by
summing/integrating f(x, y) over the values of Y and X, respectively: g(x) = Σ_y f(x, y) or ∫ f(x, y) dy, and h(y) = Σ_x f(x, y) or ∫ f(x, y) dx.
▪ X and Y are two RVs. The CONDITIONAL DISTRIBUTION of X given Y = y is f(x│y) = f(x, y)/h(y),
and the conditional distribution of Y given X = x is f(y│x) = f(x, y)/g(x).
Q3.8: For questions Q3.6 and Q3.7, calculate the marginal distribution, g(x) and h(y) of the
RVs, X, and Y.
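For Q3.8 applied to the Q3.7 density (as reconstructed above), a SymPy sketch of the marginal and conditional densities:

import sympy as sp

x, y = sp.symbols('x y')
f = sp.Rational(2, 5) * (2*x + 3*y)   # joint density of Q3.7 on the unit square

g = sp.integrate(f, (y, 0, 1))        # marginal of X: g(x) = (4x + 3)/5
h = sp.integrate(f, (x, 0, 1))        # marginal of Y: h(y) = (6y + 2)/5

f_y_given_x = sp.simplify(f / g)      # conditional density f(y|x) = f(x, y)/g(x)
print(sp.simplify(g), sp.simplify(h), f_y_given_x)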
Q3.9: Two refills of an HP color printer are randomly selected from a box which contains 3
blue refills, 2 red refills, and 3 green refills. Let X and Y be the numbers of blue and green refills
selected, respectively.
Q1. Find the marginal density functions g(x) and h(y). Find f(y|x)
Q2. Find P(Y > ½ | X = 0.25).
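For Q3.9, one possible brute-force enumeration in Python (each unordered pair of refills is equally likely):

from itertools import combinations
from collections import Counter
from fractions import Fraction

# Box contents: 3 blue, 2 red, 3 green refills
box = ['B'] * 3 + ['R'] * 2 + ['G'] * 3

# Enumerate every unordered pair of refills
pairs = list(combinations(range(len(box)), 2))
counts = Counter((sum(box[i] == 'B' for i in pair),   # x = number of blue refills drawn
                  sum(box[i] == 'G' for i in pair))   # y = number of green refills drawn
                 for pair in pairs)

# Joint pmf f(x, y) and marginals g(x), h(y)
f = {xy: Fraction(c, len(pairs)) for xy, c in counts.items()}
g = {x: sum(p for (xx, _), p in f.items() if xx == x) for x in range(3)}
h = {y: sum(p for (_, yy), p in f.items() if yy == y) for y in range(3)}
print(f, g, h)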
E. STATISTICAL INDEPENDENCE
Given X, Y two RVs having f(x, y), g(x), and h(y) as joint prob. dist., marginal dist. of X, and
marginal dist. of Y, respectively; X and Y are STATISTICALLY INDEPENDENT if and only if
f(x, y) = g(x)·h(y) for all (x, y).
Q3.11: Are the RVs X, and Y of Q3.6 and Q3.7 statistically independent?
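For Q3.11, independence can be checked symbolically by comparing f(x, y) with g(x)·h(y); a sketch using the Q3.7 density as reconstructed above:

import sympy as sp

x, y = sp.symbols('x y')
f = sp.Rational(2, 5) * (2*x + 3*y)   # joint density of Q3.7
g = sp.integrate(f, (y, 0, 1))        # marginal of X
h = sp.integrate(f, (x, 0, 1))        # marginal of Y

# X and Y are independent iff f(x, y) == g(x)*h(y) on the support
print(sp.simplify(f - g*h) == 0)      # False -> X and Y are not independent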
F. GENERALIZATION
▪ Let f(x1, x2, .., xn) be the J.P.F of the RVs, X1, X2, .., Xn
The marginal distribution of X1 is g(x1) = Σ_{x2} ⋯ Σ_{xn} f(x1, x2, .., xn) in the discrete case,
or g(x1) = ∫⋯∫ f(x1, x2, .., xn) dx2 ⋯ dxn in the continuous case.
▪ Given X and Y two RVs with a joint prob. dist. f(x, y), the expected values of G(X) and G(Y) are,
Discrete RV:   E[G(X)] = Σ_x Σ_y G(x) f(x, y) = Σ_x G(x) g(x)
Continuous RV: E[G(X)] = ∫∫ G(x) f(x, y) dx dy = ∫ G(x) g(x) dx
Discrete RV:   E[G(Y)] = Σ_x Σ_y G(y) f(x, y) = Σ_y G(y) h(y)
Continuous RV: E[G(Y)] = ∫∫ G(y) f(x, y) dx dy = ∫ G(y) h(y) dy
Here g(x) and h(y) are the marginal distributions of the RVs X, Y.
Q3.12: X a discrete R.V which represents the number of PCs sold each Saturday from 4:00 to
9:00 P.M.
X = x           4     5     6     7     8     9
f(x) = P(X=x)   1/12  1/12  1/4   1/4   1/6   1/6
Alternatively, X is a continuous R.V. having the density function
f(x) = x²/3,  −1 < x < 2
f(x) = 0,     elsewhere
Find the expected value of G(X) = 4X+3.
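A short Python check of the discrete case (table values copied from above; a sketch, not the required hand calculation):

from fractions import Fraction

# Probability table for X (number of PCs sold)
pmf = {4: Fraction(1, 12), 5: Fraction(1, 12), 6: Fraction(1, 4),
       7: Fraction(1, 4), 8: Fraction(1, 6), 9: Fraction(1, 6)}

assert sum(pmf.values()) == 1   # sanity check: probabilities sum to 1

# E[G(X)] with G(X) = 4X + 3
e_g = sum((4*x + 3) * p for x, p in pmf.items())
print(e_g)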
▪ Given X, Y two RVs with a J.P.D f(x, y), the mean of G(X, Y) is,
Discrete RV:   μG(X,Y) = E[G(X, Y)] = Σ_x Σ_y G(x, y) f(x, y)
Continuous RV: μG(X,Y) = E[G(X, Y)] = ∫∫ G(x, y) f(x, y) dx dy
For example, consider the joint density
f(x, y) = x(1 + 3y²)/4,  0 < x < 2, 0 < y < 1
f(x, y) = 0,             elsewhere
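A SymPy sketch of E[G(X, Y)] by direct double integration, assuming the density written just above and taking G(X, Y) = Y/X purely as an illustration:

import sympy as sp

x, y = sp.symbols('x y', positive=True)
f = x * (1 + 3*y**2) / 4     # joint density as written above (0 < x < 2, 0 < y < 1)

G = y / x                    # illustrative choice of G(X, Y)

# E[G(X, Y)] = double integral of G * f over the support
e_g = sp.integrate(G * f, (x, 0, 2), (y, 0, 1))
print(sp.simplify(e_g))      # 5/8 for this particular G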
▪ VARIANCES: The mean or expected value of a R.V. X describes where the probability
distribution is centered (location). The variance describes the shape and spread of the
distribution; that is, it characterizes the variability of the data about the mean. Let X be a R.V. with
P.D.F. f(x). The variance of the R.V. G(X) is,
Discrete RV:   σ²G(X) = E[(G(X) − μG(X))²] = Σ_x (G(x) − μG(X))² f(x)
Continuous RV: σ²G(X) = E[(G(X) − μG(X))²] = ∫_{−∞}^{+∞} (G(x) − μG(X))² f(x) dx
Q3.16: Calculate the variance of G(X)=2X+3 when f(x) is given by the table below
X=x 0 1 2 3
f(x) 0.51 0.38 0.10 0.01
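A short Python sketch for Q3.16 (probabilities taken from the table above):

# Probability table for X
pmf = {0: 0.51, 1: 0.38, 2: 0.10, 3: 0.01}

def G(x):
    return 2*x + 3   # the function of X whose variance is required

mu_g = sum(G(x) * p for x, p in pmf.items())                 # E[G(X)]
var_g = sum((G(x) - mu_g)**2 * p for x, p in pmf.items())    # Var[G(X)]
print(mu_g, var_g)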
Q3.17: X is a R.V. having the density function f(x). Is f(x) a P.D.F.? If not, suggest a corrected
form, then calculate the mean and the variance of G(X) = 4X + 3.
▪ The covariance, σXY, of two R.Vs X and Y is a measure (not scale-free) that evaluates
the nature of the association between X and Y. A positive covariance means that X and Y vary
monotonically together. When X and Y are statistically independent, Cov(X, Y) = 0. However,
Cov(X, Y) = 0 does not automatically imply independence; it may simply reflect a nonlinear
relationship between X and Y. A more commonly used, scale-free measure of the strength of the
linear relationship between X and Y is the correlation coefficient ρXY.
Discrete RV:   σXY = E[(X − μX)(Y − μY)] = E(XY) − μX·μY = Σ_x Σ_y (x − μX)(y − μY) f(x, y)
Continuous RV: σXY = E[(X − μX)(Y − μY)] = E(XY) − μX·μY = ∫_{−∞}^{+∞} ∫_{−∞}^{+∞} (x − μX)(y − μY) f(x, y) dx dy
Q3.19: The fraction of totally nonreworkable (X) and reworkable (Y) parts in a production line
is given by the J.P.D.
▪ Given X and Y two R.Vs having σXY, σX and σY as covariance and standard deviations,
respectively, the correlation coefficient ρXY of X and Y is given below,
ρXY = σXY / (σX·σY),  with −1 ≤ ρXY ≤ 1
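As an illustration of σXY and ρXY for a discrete joint table, a small NumPy sketch (the joint pmf below is an assumed toy example, not one of the course tables):

import numpy as np

# Assumed toy joint pmf f(x, y): rows indexed by x = 0, 1, 2 and columns by y = 0, 1
f = np.array([[0.10, 0.20],
              [0.30, 0.20],
              [0.10, 0.10]])
x_vals = np.array([0, 1, 2])
y_vals = np.array([0, 1])

g = f.sum(axis=1)                       # marginal of X
h = f.sum(axis=0)                       # marginal of Y
mu_x, mu_y = x_vals @ g, y_vals @ h     # means of X and Y
e_xy = x_vals @ f @ y_vals              # E[XY]

cov_xy = e_xy - mu_x * mu_y             # sigma_XY = E[XY] - mu_X * mu_Y
sigma_x = np.sqrt((x_vals**2) @ g - mu_x**2)
sigma_y = np.sqrt((y_vals**2) @ h - mu_y**2)
rho = cov_xy / (sigma_x * sigma_y)      # correlation coefficient
print(cov_xy, rho)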
▪ Given X a R.V and Y = G(X) nonlinear. The Taylor series approximation of G(X) around
X = E(X) = μX is:
G(x) = G(μX) + [∂G(x)/∂x]_{x=μX} (x − μX) + [∂²G(x)/∂x²]_{x=μX} (x − μX)²/2 + ⋯
If we truncate after the linear term and take the expected value of both sides, we obtain
E[G(X)] ≈ G(μX)
For nonlinear cases (keeping the second-order term),
E[G(X)] ≈ G(μX) + [∂²G(X)/∂X²]_{X=μX} σX²/2
Var[G(X)] ≈ ([∂G(X)/∂X]_{X=μX})² σX²
Given X1, X2, .., Xk a set of k independent RVs with means μ1, μ2, .., μk and variances
σ1², σ2², .., σk², respectively. Let Y = H(X1, X2, .., Xk) be a nonlinear function; then,
E(Y) ≈ H(μ1, μ2, .., μk) + (1/2) Σ_{i=1}^{k} [∂²H(x1, x2, .., xk)/∂xi²]_{xi=μi} σi²
Var(Y) ≈ Σ_{i=1}^{k} ([∂H(x1, x2, .., xk)/∂xi]_{xi=μi})² σi²
Q3.21: Given the RV X with mean μX and variance σX², give the second-order approximation to
E[exp(X)].
Q3.22: Given the RVs X and Y with μX, μY, σX² and σY², give approximations to
E[X/Y] and Var[X/Y].
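A SymPy sketch of the approximations above applied to Q3.21, with G(X) = exp(X) and μ, σ² kept symbolic:

import sympy as sp

x, mu, sigma2 = sp.symbols('x mu sigma2')
G = sp.exp(x)

# Second-order approximation: E[G(X)] ~ G(mu) + G''(mu) * sigma2 / 2
e_approx = G.subs(x, mu) + sp.diff(G, x, 2).subs(x, mu) * sigma2 / 2

# First-order approximation of the variance: Var[G(X)] ~ (G'(mu))^2 * sigma2
var_approx = sp.diff(G, x).subs(x, mu)**2 * sigma2

print(sp.simplify(e_approx), var_approx)   # E ~ exp(mu)*(1 + sigma2/2), Var ~ exp(2*mu)*sigma2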
H. CHEBYSHEV’S THEOREM
▪ The probability that a discrete/continuous RV X will assume a value within k standard
deviations of its mean is at least (1 − 1/k²):
P(μ − kσ < X < μ + kσ) ≥ 1 − 1/k²
Q3.23: A RV X has μX = 8 and σX² = 9; the prob. dist. function is unknown. Find out:
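A quick numerical illustration of the bound (a sketch; the normal distribution here is only an assumed stand-in, since Q3.23 leaves the distribution unspecified):

import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 8.0, 3.0                       # mean 8 and variance 9, as in Q3.23
x = rng.normal(mu, sigma, 1_000_000)       # assumed distribution, for illustration only

for k in (2, 3):
    inside = np.mean(np.abs(x - mu) < k * sigma)   # empirical P(|X - mu| < k*sigma)
    print(k, inside, 1 - 1/k**2)                   # always at least the Chebyshev bound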
I. PROJECT
▪ Integrals (items 7–19), including, e.g., ∫ x ln(x) dx, ∫ cos(2x) dx, and ∬ [1 − cos(x + y)] dx dy.
▪ Functions (items 1–7), including, e.g., ln(x − 3), x^(−1/3), and tanh(1 − x).
LINEAR CORRELATION
AND REGRESSION
A. INTRODUCTION
▪ In many engineering and quality-control problems, the estimation of the relationship between
two or more R.Vs is required, e.g., how does tool life vary with the cutting speed and the
depth of cut (DOC)? How does the octane number of gasoline vary with % purity? How does babies' weight
vary with age and sex? How does the stress in a section of a steel shaft vary when a twisting torque
is applied? etc.
▪ In engineering, regression equations may serve, among other purposes: i)
PREDICTION, ii) DESCRIPTION OF THE STRENGTH OF THE RELATIONSHIP BETWEEN VARIABLES, iii)
identifying IMPORTANT INDEPENDENT VARIABLES, iv) INTERPOLATION between values of a
function, v) determination of the OPTIMUM OPERATING CONDITIONS, vi) discrimination between
ALTERNATIVE MODELS, vii) and/or estimation of REGRESSION COEFFICIENTS.
Q4.1: During 6 working days, a company kept records of absent workers and defective parts
(see Table)
Day 1 2 3 4 5 6
X : Nbr. of absent workers 3 5 0 1 2 6
Y: Nbr. of Defect. Parts 15 22 7 12 20 30
Q1. Plot the scatter diagram/cross plot Y=f(X) of the bi-variate data.
Q2. Comment on the distribution of the dataset (Xi, f(Xi)).
When a single independent variable is of interest (the other variables are either held constant or
their effect on the response variable is assumed to be small), the problem is a simple linear
regression.
▪ The purpose of regression is to make predictions about Yi for some Xi OVER THE RANGE OF
THE EXPERIMENTAL DATA. The prediction equation is Ŷ = a + bX, where Ŷ is the
predicted value of Y for a given X, and a and b are the L.S. estimates of the parameters of the
model Y = α + βX + ε:
b = Sxy/Sxx = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)²,   a = ȳ − b·x̄
The residual of observation i is ei = Yi − Ŷi = Yi − (a + b·Xi).
▪ The L.S. method gives the best LINEAR UNBIASED estimates of the parameters 'a' and 'b'. One
way to determine the coefficients 'a' and 'b' is to minimize Σ_{i=1}^{n} (Yi − Ŷi)². Note that no assumption has been
made about the distribution of the random error ε.
▪ Whereas REGRESSION is about the form of the relationship, given by the equation of the
regression line, CORRELATION measures the STRENGTH of the linear relationship between the
two variables. While the concept of correlation is meaningful only when both variables, X and
Y, are random, the L.S. estimate of the parameter 'b' (rate of change of Y per unit change in X) has
meaning for both random and controlled/fixed X.
The regression line of X on Y is, then, x̂ = (x̄ − (Sxy/Syy)·ȳ) + (Sxy/Syy)·y.
Note, the prediction equation is sometimes expressed in terms of deviations from the averages;
hence, ŷ = ȳ + b(x − x̄).
▪ After the prediction equation has been calculated, it should be plotted over the data: i)
roughly half the data points should lie above the line and half below it, ii) the line should pass
through (x̄, ȳ), and iii) the scatter plot may also indicate the presence of outliers
(observations (Xi, Yi) that deviate substantially from the rest of the data).
▪ The error variance is estimated by s² = (Syy − b·Sxy)/(n − 2). The sample correlation coefficient
is r = Sxy/√(Sxx·Syy), so r² = Sxy²/(Sxx·Syy). A 100(1 − α)% CI for the parameter β is
b ± t_{α/2, n−2}·s/√Sxx, and the statistic t = b/(s/√Sxx), with n − 2 degrees of freedom, tests the
hypothesis H0: β = 0.
Note: data falling close to the regression line indicates a strong correlation between X and Y.
Q4.2: For Q4.1 calculate the product-moment correlation coefficient as well as the regression
line.
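A NumPy sketch for Q4.2 (one possible way to compute r and the regression line from the Q4.1 data):

import numpy as np

x = np.array([3, 5, 0, 1, 2, 6], dtype=float)        # number of absent workers (Q4.1)
y = np.array([15, 22, 7, 12, 20, 30], dtype=float)   # number of defective parts

sxx = np.sum((x - x.mean())**2)
syy = np.sum((y - y.mean())**2)
sxy = np.sum((x - x.mean()) * (y - y.mean()))

b = sxy / sxx                    # slope of the least-squares line
a = y.mean() - b * x.mean()      # intercept
r = sxy / np.sqrt(sxx * syy)     # product-moment correlation coefficient
print(a, b, r)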
Q4.3: The summary data consist of 33 pairs of (x, Y) values.
Q4.4: A planned experiment considered a process response (Y: tool life in min) against a
controlled factor (X: cutting speed in m/min); summary data were recorded for 16 tests.
Q4.5: A planned experiment has considered a process response (Y: Octane Number) against a
random variable (X: % Purity). The summary data are given below.
Test  1     2     3     4     5     6     7     8     9     10    11
X     99.8  99.7  99.6  99.5  99.4  99.3  99.2  99.1  99.0  98.9  98.8
Y     88.6  86.4  87.2  88.4  87.2  86.8  86.1  87.3  86.4  86.6  87.1
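For Q4.5, a possible check using scipy.stats.linregress on the data above (SciPy assumed available):

import numpy as np
from scipy import stats

purity = np.array([99.8, 99.7, 99.6, 99.5, 99.4, 99.3, 99.2, 99.1, 99.0, 98.9, 98.8])
octane = np.array([88.6, 86.4, 87.2, 88.4, 87.2, 86.8, 86.1, 87.3, 86.4, 86.6, 87.1])

# Slope, intercept, correlation coefficient, p-value of the slope test, and its standard error
res = stats.linregress(purity, octane)
print(res.slope, res.intercept, res.rvalue, res.pvalue, res.stderr)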
Minitab output:
Regression Analysis: Y41 versus x41
Q4.3 (P-Value = 0.000)
Coefficients
Term      Coef    SE Coef  T-Value  P-Value  VIF
Constant  3.83    1.77     2.17     0.038
x43       0.9036  0.0501   18.03    0.000    1.00
Analysis of Variance
Source      DF  Adj SS   Adj MS   F-Value  P-Value
Regression   1  1619.42  1619.42    66.78    0.000
x44          1  1619.42  1619.42